The AI Daily Brief: Artificial Intelligence News and Analysis - Why GPT-4-32k Will Totally Transform What We Can Do With AI

Episode Date: April 24, 2023

GPT-4 as most people use it has an 8K token limit. Some already have access to a version with a 32K limit, however, and are reporting hugely different opportunities in terms of what GPT can do, based on how much more text it can take in and output at once.

Transcript
Starting point is 00:00:00 The episode of the AI Breakdown that you're about to hear was originally released on YouTube on Monday, April 24th. In this show, we discussed GPT-4-32K and just how many new use cases the expansion of the token limits of GPT will truly open up. An AI developer recently wrote on Twitter, widespread GPT-4-32K access will be a bigger leap than GPT-3.5 to GPT-4 was. Today on the AI Breakdown, we're talking about what GPT-4-32K actually is and what it's going to change when you have access to it. Welcome back to the AI Breakdown. So for today's show, I wanted to give people a chance to weigh in on what they wanted to hear about. There were two topics that stood out to me as potentially interesting.
Starting point is 00:00:52 The first was about Grimes. Grimes last night retweeted the New York Times piece about the viral AI Drake and The Weeknd track and said that she would collaborate and give 50% to any AI music creators who used her voice. And then today she built on it and talked about what would or wouldn't be allowed. And there are all sorts of interesting things around the future of creation and new business models for music. And Grimes is weird and fascinating. So I thought, hey, maybe that would be something that people were interested in. But then I also gave folks the choice between that and GPT-4 moving to GPT-4-32K.
Starting point is 00:01:26 And of the 150 people who responded, nearly 80% wanted to hear about GPT-4-32K, which I think reflects how interested in this people are. So today we're going to talk about 32K: what it is, why you should care, what it will likely mean for you. Now, you might have seen something like this tweet recently that kind of gives you an idea of what it means to move to GPT-4-32K. So Lee Fang here says, I'm trying to crunch some numbers on various corporate filings, experimenting with ChatGPT. I just tried a 66-page document and it says the message was too long, submit something shorter. Does the premium paid version of ChatGPT allow for longer message requests? Mike Kim responds, GPT-3 and GPT-4 have 4K and 8K token limits in the
Starting point is 00:02:14 context window. A token is approximately 70% of a word. A token is not a character. Soon they will be rolling out GPT-4-32K, which has a 32K token limit, but that will be more expensive. So all right, we're getting the idea here that this shift to a 32K limit increases the amount of information that GPT can ingest and potentially changes what it can output. So to go a little bit deeper on what this actually means, I turned, of course, to ChatGPT. I asked, when discussing an AI model, what are token limits? ChatGPT says token limits refer to the maximum number of tokens, which are units of meaning, that the model can process or generate in a single input or output sequence.
Starting point is 00:02:56 They're the building blocks of natural language processing models, and they represent words, subwords, or characters depending on the specific tokenization method. So the limits are significant, they say, for three reasons. One is computational constraints, so how much it takes from a computational resource perspective to generate text and respond to questions. A second is memory limitations, how much they can deal with at a certain time, right? If the input or output sequence, they write, exceeds the token limit, the model may lose context, which can affect its performance. And that gets to the third part, response quality. Token limits, it writes, can also influence the quality of the generated text.
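As a rough illustration of what a token limit means in practice, here is a minimal Python sketch that estimates token counts from word counts and checks which limits a text would fit under. It uses the "a token is about 70% of a word" rule of thumb quoted in the episode, not a real tokenizer. The exact sizes 4,096, 8,192, and 32,768 for "4K", "8K", and "32K" are an assumption, and the function names are made up for this example.

```python
from typing import List

# Rule of thumb quoted in the episode: one token is about 70% of a word.
WORDS_PER_TOKEN = 0.7

# Assumed exact sizes for the "4K", "8K", and "32K" limits mentioned in the show.
CONTEXT_LIMITS = {"4K": 4096, "8K": 8192, "32K": 32768}


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a text with this many words would use."""
    return round(word_count / WORDS_PER_TOKEN)


def limits_that_fit(word_count: int) -> List[str]:
    """Return the names of the limits that can hold the whole text at once."""
    tokens = estimate_tokens(word_count)
    return [name for name, limit in CONTEXT_LIMITS.items() if tokens <= limit]


if __name__ == "__main__":
    # Hypothetical document sizes, in words.
    for words in (2000, 5000, 20000):
        print(words, "words ~", estimate_tokens(words), "tokens, fits:", limits_that_fit(words))
```

Under these assumptions, a 20,000-word document would only fit in the 32K window. A real application would count tokens with the model's own tokenizer rather than a word-based estimate.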
Starting point is 00:03:35 If the input or output sequence is too long, the model might generate less coherent or less accurate responses. So if you've ever messed around with ChatGPT, if you had a long string of conversations with it or questions that build on other questions, at some point it might get confused about what the original context was. Likewise, if you try to feed it a huge block of text, it might start to produce proportionally less coherent responses. So going down a little bit, I also asked ChatGPT, what would it change to move from the 4,000 token limit, which is what it was at GPT-3, to a 32,000 token limit? It says it would
Starting point is 00:04:11 significantly increase the amount of text that an AI model can process and generate in a single sequence. This expansion could lead to new uses and improvements in existing applications. It gave six examples. One, longer text analysis, so it could ingest entire books, entire research papers, entire contracts, without having to break them into smaller chunks. That means that it could provide potentially more context. That's number two, improved context retention. With a higher token limit, the AI model can retain context over longer input sequences. So you have, one, the ability to ingest bigger sets of text or bigger inputs,
Starting point is 00:04:49 but then two, over a longer series of questions, the AI will be able to remember or keep track of that context. Number three, it says better summarization. The ability to process longer text could also improve the quality of the text summarization. If you don't have to break a book into chapters, then the AI can ingest the entire book in the context of the entire book and come back presumably with a better summarization because it has the entire context all at once. That applies to enhanced translation as well. Now it says in the realm of chatbots and conversational AI,
Starting point is 00:05:22 a higher token limit would allow for more extensive conversations, for kind of the same reason: it enables the model to maintain context over a larger number of messages. And finally, creative writing and content generation. With the ability to generate longer pieces of text, the model could be used for more extensive creative writing tasks, such as writing full-length articles, stories, or even novels. Now, going back to the quote that I started with at the beginning of this episode, McKay wrote, widespread GPT-4-32K access will be a bigger leap than GPT-3.5 to GPT-4 was. That was the quote that I started with.
Starting point is 00:05:55 The 4x larger context window combined with the power of GPT-4 will be absolutely insane. People will be completely blown away by the complexity of workflows and use cases it will unlock. And indeed, you're starting to see people actually talking about this more and more. Christian here says the GPT-4-32K context window will immediately unlock two cost-effective use cases, in my opinion. One, turbocharged programming: can fit an entire project's worth of code into context. And two, meeting summarization without chunking: the average hour-long meeting is 15,000 words, for better info sharing and recap. Now, Matt Shumer here went even farther and wrote a whole thread this morning.
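To make "summarization without chunking" concrete, here is a small Python sketch that splits a transcript into pieces that fit a given context window and shows how many pieces the 15,000-word meeting quoted above would need. It is only an illustration under stated assumptions: the 70% words-per-token rule of thumb from earlier in the episode stands in for a real tokenizer, the window sizes 4,096, 8,192, and 32,768 and the 1,000 tokens reserved for the model's reply are assumed values, and `split_for_context` is a hypothetical helper, not part of any library.

```python
from typing import List

# Rule of thumb quoted in the episode: one token is about 70% of a word.
WORDS_PER_TOKEN = 0.7


def split_for_context(words: List[str], context_limit: int, reserved_for_reply: int) -> List[List[str]]:
    """Split a list of words into chunks whose estimated token count fits the window.

    reserved_for_reply is the number of tokens kept free for the model's answer.
    """
    budget_tokens = context_limit - reserved_for_reply
    if budget_tokens <= 0:
        raise ValueError("context_limit must be larger than reserved_for_reply")
    # Convert the token budget back into a word budget; always allow at least one word.
    max_words = max(1, int(budget_tokens * WORDS_PER_TOKEN))
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


if __name__ == "__main__":
    # Stand-in for a 15,000-word meeting transcript.
    meeting = ["word"] * 15000
    for limit in (4096, 8192, 32768):  # assumed sizes for 4K, 8K, 32K
        chunks = split_for_context(meeting, limit, reserved_for_reply=1000)
        print(limit, "token window ->", len(chunks), "chunk(s)")
```

With these assumptions the meeting needs several separate chunks at the smaller window sizes but fits in a single piece at 32K, which is the difference the tweet is pointing at.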
Starting point is 00:06:35 GPT-4-32K makes regular GPT-4 look like a toy. Here are some things it can do. One, summarize and answer questions about an entire research paper. He says, I literally just pasted the whole paper in the prompt, no embeddings required. It's exactly what we were talking about when we were seeing what ChatGPT said about this. When you can paste an entire big piece in there, it has the entire context of that piece when it comes to trying to summarize it. Likewise, very similarly, Matt says you can take in an entire codebase plus supporting documentation and make changes and improvements. The long
Starting point is 00:07:09 context length opens up fundamentally new opportunities to make ridiculously powerful developer tools. A more personalized use case Matt suggests is pass in dozens of full articles and get a personalized summary of the day's news. Summing up, Matt says, we're moving towards a world of infinite context lengths. Massive new possibilities are opening up. Get ready. Now, one more that I wanted to share from someone who's actually thinking about this on the front lines of an application they're coding. This is Nate Chan, and he built an application called Storytime AI a few weeks ago. I love Storytime AI. I think it's a super cool idea, and it actually is similar to an experiment I'm doing with my four-year-old daughter. So basically, what Storytime AI is, is it's an interface for using GPT to create
Starting point is 00:07:52 bedtime stories for your kids. You can toggle a bunch of different options in terms of how old they are and how long you want the app to be. Now, the experiment that I'm running that I'll have a video about soon is basically my four-year-old daughter and I are taking a character that she creates from her head. We turn it into a visual with Midjourney, and then we use ChatGPT in a mediated interface between her and ChatGPT to write a story together. So it's very similar kind of instincts. I think a lot of parents of young kids are having that idea. But the relevant part for our conversation today is that Nate writes, I built this iOS app mostly over the course of a week, and it includes a nice handful of features and some simple API calls and user interfaces. GPT-4 can write straight-line code pretty well, and Auto-GPT will eventually be able to stitch code and project files together with agents. I'm optimistic. GPT-4's API will soon be available with a 32K token context window. According to OpenAI's tokenizer tool, my app's codebase is only 25,000 tokens. It feels like an opportunity is brewing where a combination of GPT-4-32K and Auto-GPT will be able
Starting point is 00:09:00 to entirely write my app and further debug and add features with the full scope of the project in one GPT-4 API call. Pretty insane. We don't need GPT-5 right now. With multimodal input and 32K context length not even public yet, GPT-4 is just getting started. So what Nate's talking about is the ability to, in a single API call, input the entirety or create the entirety of the code needed for this application in terms of building it and debugging it, which is just pretty amazing to see. Now, what he's talking about with this idea of Auto-GPT stitching code together, other people are talking about as well. Philippe Scheiber here says trying to get BabyAGI to write code combining multiple files.
Starting point is 00:09:44 Not quite there yet, but it seems possible. This is with GPT-3.5, so 4K tokens is the context window. We'd love to try this in GPT-4 with 32K tokens, anxiously waiting. So again, what's going on here is the developers are seeing that just around the corner, this increase in the token limits is going to dramatically change the nature of the applications that GPT can help code. And that I think is why, coming back to McKay and his original post that got this whole thing started, it's such a bigger deal than even this shift from GPT-3.5 to GPT-4. It's really going to open up this entirely new set of use cases. But as if that weren't enough, there's already research
Starting point is 00:10:27 showing that we could be looking at something entirely different and an entirely different scope in the future. This research just came out called Scaling Transformer to 1M tokens and beyond with RMT. It's about a process called Recurrent Memory Transformer that can retain information up to two million tokens. Cigar here says this could fundamentally upend the current state of L-M-O-M-Ps. But remember, it is just a research paper so far. Itamar Golan writes, is this the future of large language models with unlimited tokens? Recurrent Memory Transformer is able to retain information across up to 2 million tokens. What? Itamar continues. During inference, the model effectively used memory for up to 4,096 segments with a total length of 2,048,000 tokens, significantly exceeding the largest
Starting point is 00:11:15 input size reported for transformer models. Most importantly, this augmentation maintains the base model's memory size at 3.6 gigabytes in their experiments. But here's the real juice. Itamar writes pros and cons. Pros: same memory consumption, almost unlimited length. Cons: decreased quality, probably the same decay patterns we observed in RNNs, also potentially very long inference time. To summarize, he writes, this is still not a revolution, but it may become the foundation for the next paradigms that will achieve much longer, allegedly unlimited token limitations. I want to sum up going back to Nate Chan's assertion here. We don't need GPT-5 right now.
Starting point is 00:11:55 With 32K context length not even public yet, GPT-4 is just getting started. I think that when you look at the type of applications that people are imagining, inputting entire books or research papers or legal contracts, and having that whole context be able to be spit back out, the expansion of token limits really does seem like one of the immediate term frontiers that could totally transform even already how we are using these tools. Now, unfortunately for now, we don't know when exactly GPT-4-32K will come to everyone, but I'm starting to see more and more people who have access to it, who are playing around with it, and who are consequently writing threads about all the cool things you can do with it. So that says to me that it
Starting point is 00:12:35 can't be all that far on the horizon. I will certainly keep you up to date with all of the most interesting applications when I see them. And you better believe that I will be thinking of some to use on my own. That's it for now, guys. Until next time, appreciate you listening or watching. Peace.
