The AI Daily Brief: Artificial Intelligence News and Analysis - Why OpenAI's API Updates Will Change How We Use ChatGPT

Episode Date: June 14, 2023

OpenAI has announced a set of API updates including lower prices, a larger 16k context window, and something they're calling function calling. On today's episode, NLW explains why function calling in particular is such a big deal. Before that on the Brief, updates from Adobe and Meta as well as a new superchip and HuggingFace partnership for AMD. The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI breakdown, we're discussing OpenAI's new API announcement. Before that on the brief, Meta announces a new image model, Adobe launches a new tool for Illustrator, and AMD tries to catch up to Nvidia with a new super chip. The AI breakdown is a daily podcast and video about the most important news and stories in AI. Like, subscribe and share, and go to Breakdown.network for more information. Welcome back to the AI breakdown brief. All the AI headline news you need in five minutes or less, and once again, it is going to be a challenge to fit it in five minutes.
Starting point is 00:00:33 Yesterday, Matt Wolfe tweeted, Tuesdays, what is it about Tuesdays that makes all the AI news flood in at once? And indeed, there was even more news than I could fit in my first five. So what we're going to do today is go through the most important announcements, starting with a few that didn't even make it into that top five. The first is that we've got another leaked AI document, this time from Amazon, and it's all about how the company sees opportunities to use ChatGPT and other AI at work. So what I find interesting about this is that there are many corporations around the world that are currently creating policies that effectively amount to you can't use AI.
Starting point is 00:01:07 And it's not just Luddite companies, right? Samsung and Apple have both created really strict prohibitions on what tools their employees can use, and understandably so. There are concerns around information privacy and proprietary trade secrets and all of that. But this document from Amazon that was obtained by Business Insider is called Generative AI, ChatGPT Impact and Opportunity Analysis. It was apparently created by managers at Amazon after they started asking employees to come up with ideas for how to use AI chatbot tech to improve not only Amazon products but also how they work internally. There are in this document 67 different ideas. They range from using ChatGPT to generate software code and marketing materials, to creating an engineering app that could answer questions related to AWS services, to developing a ChatGPT-style search bar for Amazon shoppers that can explain pros and cons between brands and cite and summarize user reviews. In an email response to Insider, a representative from Amazon said,
Starting point is 00:02:01 though still in its very early days, we are investing in generative AI across all of our businesses and have a significant number of unique capabilities that we already offer or that we're working hard to bring to customers in the near future. Now, this does come just a few months after Amazon's lawyers recommended that employees not share confidential information with ChatGPT, so I guess we'll have to see if the managers or the lawyers hold sway when it comes to what the company actually does. Next up, we've been talking a lot recently about Enterprise AI,
Starting point is 00:02:27 and Oracle's Larry Ellison has just confirmed on their earnings call that they are now partnering with new generative AI service Cohere. On the call, he said, Cohere and Oracle are working together to make it very, very easy for enterprise customers to train their own specialized large language models while protecting the privacy of their training data. And then confirming exactly what we were saying yesterday, he said, over the next few years, lots of companies are going to train their own specialized large language models. Next up, a cool tool release from Adobe. Yesterday, they held their annual Max event in London and one of the new tools they announced was Generative Recolor. This is a tool for Illustrator that allows people to use text to modify vector images. The value proposition is that sometimes people who are working with vector artwork need variations, either because they're trying to find the best version, or because they
Starting point is 00:03:12 simply need a lot of different versions of a thing for some sort of branded content. So the benefits here are quicker experimentation, easier modification, and different color and image combinations for unique applications. Now, speaking of image generation, it's a day that ends in Y, which means that Meta has released yet another piece of open source AI research. This one is called I-JEPA, and it's a new model for AI image generation that Meta claims is more human-like. I-JEPA stands for Image Joint Embedding Predictive Architecture. Rather than comparing pixels, as do some other image generation models, I-JEPA learns by creating what they call an internal model of the outside world. By way of example, Meta's research page shows four images in which
Starting point is 00:03:50 they gave the model an image that had part of it removed. So for example, a dog's head without the eyes and the top of the nose, a bird that was missing its feet, a wolf that was missing its legs, and a building that was missing part of the structure. The model then produces a sketch of what it thinks should be in the missing slot, and based on their research does a good job of recognizing what should go in those missing parts of the image. Alpha Signal AI sums up some of the implications and takeaways. He writes, I-JEPA can be used for many different applications without fine-tuning and is highly scalable, and the model predicts missing information at a high level of abstraction, avoiding generative model limitations. Next up, one really
Starting point is 00:04:25 cool little one. One of the big rate-limiting factors for startups right now who are developing new approaches to AI or new models is just their access to compute. We've heard over and over again about how hard getting GPUs is. It's part of why Nvidia has gone up so much in value this year. And as we heard from Sam Altman in that developer meeting a couple weeks ago, it's a huge limiting factor for companies like OpenAI who are changing their product release schedule because of the availability or lack thereof of computing power. Well, former GitHub CEO Nat Friedman and his frequent investment collaborator Daniel Gross have set up a new 10 exaflop cluster for startups that they call the Andromeda cluster. In a note on Twitter, they say it's available for experiments,
Starting point is 00:05:03 training runs, and inference, with no minimum duration and what they call superb pricing, and big enough to train the 65-billion-parameter LLaMA in around 10 days. Strikes me as a very cool value add for investors to bring to their startup ecosystem. Now, speaking of OpenAI, they made a huge announcement yesterday with a number of API updates, including what they call function calling, but that is going to be the subject for the main AI breakdown, so check out that video, which was released just a little bit after this one. And then, of course, there is AMD. Now, as we have discussed over and over on this show, one of the big stories of 2023, especially for public markets, has been the rise of Nvidia. By basically any metric, Nvidia absolutely dominates the market for AI chips. Analysts put their market share at
Starting point is 00:05:46 somewhere around 80%. There are, however, a few other players in the space, and of them, AMD is one of the most significant. Earlier this year, AMD saw a big pop in their stock price when there were rumors that they were working with Microsoft on their Project Athena, which was a new AI chip project, although ultimately Microsoft denied that rumor. But now we're getting a few more details about how AMD plans to try to counter Nvidia's dominance. While AMD had previously announced its MI300X chip, we got a lot more information about it yesterday. AMD's CEO Lisa Su said that the chip and its architecture were designed specifically for LLMs and AI models. The chip can use up to 192 gigabytes of memory, as compared to the H100's 120 gigabytes of memory. At the demo yesterday,
Starting point is 00:06:27 they showed the MI300X running a 40 billion parameter model that's called Falcon. Trying to keep parity with other chip developers, AMD also said that it's offering what they call an Infinity Architecture that combines eight of the chip accelerators into one system, and they've also announced a new software suite called ROCm, which competes with Nvidia's CUDA software package that has historically been one of the reasons why AI developers preferred Nvidia chips over AMD. Now, a lot of the mainstream financial analysis basically took all of AMD's announcements as them just trying to catch up to Nvidia and Nvidia having kind of too big of a lead for them to overcome. However, one thing that many in the developer community took note of was that they were also
Starting point is 00:07:04 announcing a partnership with Hugging Face to tap into the open source community to accelerate development of both CPU and GPU models. In their announcement post, Hugging Face writes, whether language models, large language models or foundation models, Transformers require significant computation for pre-training, fine-tuning, and inference. To help developers and organizations get the most performance bang for their infrastructure bucks, Hugging Face has long been working with hardware companies to leverage acceleration features present on their respective chips. Today, we're happy to announce that AMD has officially joined our hardware partner program. This partnership is excellent news for the Hugging Face community, which will soon benefit from the latest
Starting point is 00:07:37 AMD platforms for training and inference. The selection of deep learning hardware has been limited for years, and prices and supply are growing concerns. This new partnership will do more than match the competition and help alleviate market dynamics. It should also set new cost performance standards. You might remember a few weeks ago when that Google memo dropped, it argued that companies like Google and OpenAI were going to get beat ultimately by open source approaches to developing AI. Could AMD's partnership with Hugging Face, which is at the very epicenter of the AI open source movement, actually make a difference in their fight against Nvidia? Hard to say, but it's also hard not to welcome the new competition and the new approach to it. Anyways, guys, that is it for today's
Starting point is 00:08:13 AI Breakdown Brief. If you're enjoying, please like, subscribe, and share and hit that notification button so you never miss an episode. And I'll be back soon with the main AI breakdown, which is all about why this new OpenAI API announcement is actually very significant and reflective of a change of phase for the overall AI space. We're moving from the era of novelty to the era of real utility. Welcome back to the AI breakdown. Today, we're talking about OpenAI's big API announcement from yesterday. And while nominally this was focused on developers, I actually think it's reflective of a much broader change. Professor Ethan Mollick tweeted yesterday, it's important to remember that AI is advancing very quickly and you shouldn't mistake current capabilities for the ones LLMs will
Starting point is 00:08:56 have in months. Like today, OpenAI just released a much faster, cheaper version of GPT and a better way for the AI to work with other systems. So what we're going to do today is go through that announcement and specifically talk about why it matters not just for developers, but also for end users. The announcement post was called Function Calling and Other API Updates. And just from that title, you get a sense of where the emphasis is. However, before we get into function calling, which is undoubtedly the biggest piece of this, let's talk about some of the other updates as well. The company writes,
Starting point is 00:09:26 We released GPT-3.5 Turbo and GPT-4 earlier this year. And in only a few short months, we've seen incredible applications built by developers on top of these models. Today, we're following up with some exciting updates. Now, as I mentioned, we're about to talk about function calling in some detail, but the other updates they announced include, one, a much longer context window for GPT-3.5 Turbo. Now, longer context windows are something we've talked a lot about on this show. Context window refers to how many tokens or how much text or information an LLM can ingest in one fell swoop. The longer the context window, then, the longer a piece of information that it can ingest. So, for example, instead of breaking a book into chapters, you could just feed the entire book in at once, depending on how big that context window was.
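To make the chapters-versus-whole-book point concrete, here is a minimal sketch of how an application might split a long text to fit a given context window. It assumes a rough rule of thumb of about four characters per token, which is only an approximation (a real tokenizer would give different counts), and the function names are made up for illustration.

```python
import math

# Assumption: roughly 4 characters per token for English text (a rule of thumb only).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; a real tokenizer would give different counts."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_to_fit(text: str, context_window: int) -> list[str]:
    """Greedily pack whole words into chunks whose estimated size fits the window.

    A single word longer than the window is still emitted as its own chunk.
    """
    max_chars = context_window * CHARS_PER_TOKEN
    chunks, current, current_len = [], [], 0
    for word in text.split():
        added = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and current_len + added > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


book = "word " * 10_000  # stand-in for a long document
small_window = split_to_fit(book, 4_000)
large_window = split_to_fit(book, 16_000)
print(len(small_window), len(large_window))
```

With the smaller window the stand-in text has to be broken into several pieces, while the larger window takes it in one request, which is the practical difference a longer context window makes.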
Starting point is 00:10:12 Obviously, then, there are benefits in the context with which the LLM can interact with that piece of information. Right now, the context window for a user inputting information on ChatGPT is 8,000 tokens, which means 4,000 to 5,000 words on average. Now, that's a lot, but that's not a ton. There are even some major magazine articles that are longer than that, right? Now, throughout much of this year, the big conversation has been around a 32K context window for GPT-4. However, we've heard that one of the reasons that OpenAI hasn't been able to push forward with that 32K context window is just that they're dealing with the same thing that everyone else is dealing with, which is a shortage of computing power. In meetings with
Starting point is 00:10:49 developers as part of his world tour, OpenAI CEO Sam Altman said basically that, while the technology might be there, the GPUs just aren't. Now, when it comes to the GPT-3.5 Turbo that developers were using, they only had a standard 4K context window. It was big news then yesterday when they announced a 16,000 token context version of GPT-3.5 Turbo. That's obviously four times longer. That means it can accommodate about 20 pages of text in a single request. Now, on top of that, they're also reducing their pricing. OpenAI's most popular embeddings model is having its cost reduced by 75% to $0.0001 per 1,000 tokens, and the cost of GPT-3.5 Turbo's input tokens is going down by 25%. OpenAI writes,
Starting point is 00:11:32 developers can now use this model for just $0.0015 per 1,000 input tokens and $0.002 per 1,000 output tokens, which equates to roughly 700 pages per dollar. GPT-3.5 Turbo 16K is priced at exactly double that. So if the announcement were just that, it would probably be enough to get developers really excited, but that was far from the only part of the announcement. In fact, the main part of the announcement was what they call function calling. OpenAI writes, Developers can now describe functions to GPT-4 and GPT-3.5 Turbo and have the model intelligently choose to output
Starting point is 00:12:06 a JSON object containing arguments to call those functions. This is a new way to more reliably connect GPT's capabilities with external tools and APIs. These models have been fine-tuned to both detect when a function needs to be called, depending on the user's input, and to respond with JSON that adheres to the function's signature. Function calling allows developers to more reliably get structured data back from the model. So if you are not a developer, that could sound like Latin. Here's maybe a simplified way to think about this. When people are interfacing with LLMs, they're interfacing via natural language.
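As a rough illustration of the passage quoted above, the sketch below shows what describing a function and receiving JSON arguments back might look like. The function name, its fields, and the hand-written model output are all assumptions for illustration; no API call is made, and the helper `parse_arguments` is hypothetical.

```python
import json

# Hypothetical function description: a name, a description, and the expected
# arguments expressed as a JSON Schema object.
weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

# Hand-written stand-in for what a model might return for the question
# "What is the weather in New York right now?" (assumed shape, not real output).
model_function_call = {
    "name": "get_current_weather",
    "arguments": '{"location": "New York, NY", "unit": "fahrenheit"}',
}


def parse_arguments(call: dict, spec: dict) -> dict:
    """Decode the JSON argument string and check that required fields are present."""
    args = json.loads(call["arguments"])
    missing = [key for key in spec["parameters"]["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args


args = parse_arguments(model_function_call, weather_function)
print(args["location"], args["unit"])
```

The point of the structured output is the last step: the arguments arrive as data a program can check and pass along, rather than as free-form text that has to be picked apart.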
Starting point is 00:12:36 They're saying things like, what is the weather in New York right now? However, when computers talk to each other, they don't speak in natural language. They speak in structured data. JSON stands for JavaScript object notation. It's a lightweight data exchange format that's used primarily to help data move between a server and a web application. So, for example, a JSON object that represents a person's information might be organized into a nested structure such as name, age, hobby, profession, or address, which might then underneath address have a number of subfields, including
Starting point is 00:13:07 street, city, or country. JSON is language independent, which means it can be used with various programming languages. So a simple way to think about function calling in the context of OpenAI or GPT is that it allows developers to automatically translate natural language inputs from users into functions that can query external APIs or sources of data in a structured language that computers speak. Those external sources of data or APIs can then send back the relevant information, and then the AI can interpret the structured results and turn them back into natural language for its answer. Developer Alex Volkov writes, you know how many folks struggled to get a JSON output consistently for the use of agents and other stuff? Well, OpenAI took it
Starting point is 00:13:46 one step further and gave us function calls. Alex points out, first of all, that this is something that developers have had to develop complicated workarounds for. Alex writes, OpenAI said, why not just provide our API with your function and what it needs to get its arguments, and the model will return the right function call. He concludes, running to try this out. Seems like a major shift in the developer experience for these models, and we essentially are getting the benefits of the plugin ecosystem into the API calls. This is an analogy I've heard kind of a lot.
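For listeners who have not seen JSON before, the nested person record described earlier might look something like the sketch below. Every value is made up for illustration; the only point is the shape, with the address holding its own subfields.

```python
import json

# A made-up person record with a nested address, mirroring the fields mentioned above.
person = {
    "name": "Ada Example",
    "age": 36,
    "hobby": "chess",
    "profession": "engineer",
    "address": {"street": "1 Main St", "city": "Boston", "country": "USA"},
}

encoded = json.dumps(person, indent=2)  # Python dict -> JSON text
decoded = json.loads(encoded)           # JSON text -> Python dict

print(encoded)
print(decoded["address"]["city"])
```

Because the format is language independent, the same text could be read just as easily by a program written in JavaScript, Go, or anything else with a JSON parser.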
Starting point is 00:14:13 Basically, what plugins do for ChatGPT is they allow the user to point to specific sources of information in order to get ChatGPT to contextualize whatever the request is, whether it's a summarization or something else, in the context of that source of data. This means you can do things like pull in basketball information as with Basketball Stats, or Magic card information as with Magic Codex, MLS information as with Zillow, or stock market or crypto data as with a ton of different plugins that have been released. Now, some of these are really just novelty. For example, the creature generator for role-playing games.
Starting point is 00:14:46 Yesterday it generated something called a Frost River for me, which is apparently a fearsome creature that inhabits icy tundras, and which has 20 strength, 10 dexterity, 18 constitution, 4 intelligence, 12 wisdom, and 6 charisma. Some of them are trying to be useful. For example, with Instacart, I wrote, could you suggest a paleo meal plan for a family of four for one week? It gave me that and then asked if I wanted it to generate a shopping list with these meals using the Instacart plugin, from which I can click and then go to Instacart. Now, in this one I say trying to be useful, because one of the big open questions is the extent to which
Starting point is 00:15:16 most of these plugin creators actually want their experience to be in ChatGPT, or they want ChatGPT to be in their app. That's the way that Sam Altman has put it. And then there are some that are just dead-on useful now. And what I've found is most common to those is that they are simply the plugins that point to very specific pieces of information. The one that I use most often because of this podcast is XPapers, which allows ChatGPT to access all of the research on arXiv. So if that's how the external-facing consumer experience of ChatGPT is evolving, in other words, plugins giving us the ability to point ChatGPT to specific information sources, function calling is effectively that for developers.
Starting point is 00:15:52 So the examples that they give of what developers can do with this include creating chatbots that answer questions by calling external tools, such as ChatGPT plugins. For example, they write, converting queries such as email Anya to see if she wants to get coffee next Friday to a function that is actually sending that email, or asking what's the weather in Boston to a function that goes and pulls the current weather from some particular API source.
Starting point is 00:16:15 Another is converting natural language into API calls or database queries. So think businesses that have put proprietary information in the form of charts or spreadsheets or customer data into ChatGPT. This would allow for things like converting who are my top 10 customers this month to an internal API call such as get customers by revenue. Finally, they suggest this could extract structured data from text. The example they gave is defining a function called extract people data to extract all the people mentioned in a particular Wikipedia article.
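The examples above all come down to the same pattern on the developer's side: the model names a function and supplies JSON arguments, and the application looks that function up and runs it. Here is a minimal sketch of that lookup, with hypothetical local functions, canned data, and hand-written stand-ins for the model's output; nothing here calls a real API.

```python
import json


# Hypothetical local functions; real ones would query a weather API or a database.
def get_current_weather(location: str, unit: str = "fahrenheit") -> dict:
    return {"location": location, "temperature": 72, "unit": unit}  # canned data


def get_customers_by_revenue(limit: int) -> list[str]:
    customers = [("Acme", 900), ("Globex", 1500), ("Initech", 300)]  # made-up data
    ranked = sorted(customers, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:limit]]


# Registry mapping the function names exposed to the model to local callables.
REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_customers_by_revenue": get_customers_by_revenue,
}


def dispatch(function_call: dict):
    """Look up the named function and call it with the decoded JSON arguments."""
    func = REGISTRY[function_call["name"]]
    return func(**json.loads(function_call["arguments"]))


# Stand-ins for function calls a model might emit for the two user questions.
weather_call = {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'}
customers_call = {"name": "get_customers_by_revenue", "arguments": '{"limit": 2}'}

weather_result = dispatch(weather_call)
customers_result = dispatch(customers_call)
print(weather_result)
print(customers_result)
```

In a real application the results would then be handed back to the model so it could phrase the answer in natural language; that last step is left out of this sketch.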
Starting point is 00:16:42 So to drill down even more, they use that example of what's the weather like in Boston right now. Step one is that OpenAI's API would recognize the function that was trying to be called by the user's input. Step two is that it would structure that data and send it to the third-party API. And then step three is that OpenAI's API would get the response back and then summarize it once again in natural language. So what does this mean for end users? For those of us who are not developers,
Starting point is 00:17:06 what it means is that the developers who are building on the OpenAI API and using GPT for their applications now have a much more powerful and native tool to actually build things that are useful for us, that have specific functionality, that are less likely to hallucinate because they're being pointed to specific information in structured ways,
Starting point is 00:17:26 even though they're returning to us information in the natural language that makes this tool so appealing and human feeling. When most people experience ChatGPT for the first time, they asked it to write a poem about cats or summarize some history like it was a Taylor Swift song. Yes, I'm speaking from personal experience there. Those things are novel, they show off the capacity of the tool,
Starting point is 00:17:45 but it's different than it being actually useful. Now, of course, legions of people have come together to start creating content about how to use ChatGPT in ways that are much more effective. And of course, people all over the world are using ChatGPT for their businesses or their hobbies. So it's not to suggest that there isn't utility already. But when it comes to what people are building on this,
Starting point is 00:18:04 I think this represents a major shift in the capacity of the development tools to move from novelty to utility and really powerful utility in ways that I expect to produce an incredible new wave of applications. Now, interestingly, this comes exactly at the same time as some people are starting to say, maybe we've been a little overhyped about generative AI and what ChatGPT can do. My guess is that this answers some of that skepticism in a pretty convincing way, in pretty
Starting point is 00:18:31 short order. So again, we return to the Ethan Mollick tweet from whence we started. It is important to remember that AI is advancing very quickly, and you shouldn't mistake current capabilities for the ones LLMs will have in months. That's it for today's AI breakdown. Hopefully this was useful, hopefully this got you excited about what OpenAI's new API updates might mean. If you're enjoying the content, please like, subscribe, and share it. Click the notification button to not miss any episodes. Subscribe to the podcast and the newsletter version. You can find all of that information on breakdown.network. I appreciate you listening or watching. And until next time, peace.
