The AI Daily Brief: Artificial Intelligence News and Analysis - Why OpenAI's API Updates Will Change How We Use ChatGPT
Episode Date: June 14, 2023OpenAI has announced a set of API updates including lower prices, a larger 16k context window, and something they're calling function calling. On today's episode, NLW explains why function calling in ...particular is such a big deal. Before that on the Brief, updates from Adobe and Meta as well as a new superchip and HuggingFace partnership for AMD. The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Discussion (0)
Today on the AI breakdown, we're discussing OpenAI's new API announcement.
Before that on the brief, Meta announces a new image model,
Adobe launches a new tool for illustrator, and AMD tries to catch up to Invidio with a new super chip.
The AI breakdown is a daily podcast and video about the most important news and stories in AI.
Like, subscribe and share, and go to Breakdown.network for more information.
Welcome back to the AI breakdown brief.
All the AI headline news you need in five minutes or less,
and once again, it is going to be a challenge to fit it in five minutes.
Yesterday, Matt Wolfe tweeted Tuesdays,
what is it about Tuesdays that makes all the AI news flood in at once?
And indeed, there was even more news that I could fit in my first five.
So what we're going to do today is go through the most important announcements,
starting with a few that didn't even make it into that top five.
The first is that we've got another leaked AI document, this time from Amazon,
and it's all about how the company sees opportunities to use ChatGBTBT and other AI at work.
So what I find interesting about this is that there are many corporations around the world that are currently creating policies that effectively amount to you can't use AI.
And it's not just Luddite companies, right?
Samsung and Apple have both created really strict prohibitions on what tools their employees can use, and understandably so.
There are concerns around information privacy and proprietary trade secrets and all of that.
But this document from Amazon that was obtained by business insider is called generative AI, chat GPT impact and opportunity analysis.
It was apparently created by managers at Amazon after they started asking employees to come up with ideas for how to use AI chatbot tech to improve not only Amazon products but also how they work internally.
There are in this document 67 different ideas. They range from using ChatGPT to generate software code and marketing materials to creating an engineering app that could answer questions related to AWS services,
to developing a ChatGPT style search bar for Amazon shoppers that can explain pros and cons between brands and site and summarize user review.
In an email response to Insider, a representative from Amazon said,
though still in its very early days, we are investing in generative AI across all of our businesses
and have a significant number of unique capabilities that we already offer
or that we're working hard to bring to customers in the near future.
Now, this does come just a few months after Amazon's lawyers recommended that employees
not share confidential information with chat GPT,
so I guess we'll have to see if the managers or the lawyers hold sway when it comes to
what the company actually does.
Next up, we've been talking a lot recently about Enterprise AI,
and Oracle's Larry Ellison has just confirmed on their earnings call that they are now partnering with new generative AI service Cohere.
On the call, he said, cohere and Oracle are working together to make it very, very easy for enterprise customers to train their own specialized large language models while protecting the privacy of their training data.
And then confirming exactly what we were saying yesterday, he said, over the next few years, lots of companies are going to train their own specialized large language models.
Next up, a cool tool release from Adobe.
Yesterday, they held their annual Max event in London and one of the new tool.
tools they announced was generative recolor. This is a tool for Illustrator that allows people to use
text to modify vector images. The value proposition is that sometimes people who are working with
vector artwork need variations, either because they're trying to find the best version, or because they
simply need a lot of different versions of a thing for some sort of branded content. So the benefits
here are quicker experimentation, easier modification, and different color and image combinations
for unique applications. Now, speaking of image generation, it's a day that ends in why, which
means that Meta has released yet another piece of open source AI research. This one is called
IJPA, and it's a new model for AI image generation that meta claims is more human-like.
IJPA stands for image joint embedding predictive architecture. Rather than comparing pixels,
as do some other image generation models, IJPA learns by creating what they call an internal
model of the outside world. The way of example, Meta's research page shows four images in which
they gave the model an image that had part of it removed. So for example, a dog's head without the
eyes in the top of the nose, a bird that was missing its feet, a wolf that was missing
its legs, and a building that was missing part of the structure. The model then produces a sketch
of what it thinks should be in the missing slot, and based on their research does a good job
of recognizing what should go in those missing parts of the image. Alpha Signal AI sums up some of the
implications and takeaways. He writes, IJPEC can be used for many different applications without
fine tuning and is highly scalable, and the model predicts missing information at a high level of
abstraction avoiding generative model limitations. Next up, one really
cool little one. One of the big rate limiting factors for startups right now who are developing
new approaches to AI or new models is just their access to compute. We've heard over and over again
about how hard getting GPUs is. It's part of why Nvidia has gone up so much in value this year.
And as we heard from Sam Altman in that developer meeting a couple weeks ago, it's a huge
limiting factor for companies like OpenAI who are changing their product release schedule
because of the availability or lack thereof of computing power. Well, former GitHub CEO Nat Freeman
and his frequent investment collaborator Daniel Gross have set up a new 10 exaflop cluster for startups that
they call the Andromeda cluster. In a note on Twitter, they say it's available for experiments,
training runs, and inference, no minimum duration in what they call superb pricing, and big enough
to train Lama 65 billion parameters in around 10 days. Strikes me is a very cool value ad for
investors to bring to their startup ecosystem. Now, speaking of OpenAI, they made a huge announcement
yesterday with a number of API updates, including what they call functional calling, but that is going to
be the subject for the main AI breakdown, so check out that video, which was released just a little bit
after this one. And then, of course, there is AMD. Now, as we have discussed over and over on this show,
one of the big stories of 2023, especially for public markets, has been the rise of Nvidia. By basically
any metric, Nvidia absolutely dominates the market for AI chips. Analyst put their market share at
somewhere around 80%. There are, however, a few other players in the space, and of them, AMD is one of the
most significant. Earlier this year, AMD saw a big pop in their stock price when there were rumors that
they were working with Microsoft on their Project Athena, which was a new AI chip project, although
ultimately Microsoft denied that rumor. But now we're getting a few more details about how AMD plans
to try to counter Nvidia's dominance. While Nvidia had previously announced its MI300X chip,
we got a lot more information about it yesterday. AMD's CEO Lisa Sue said that the chip and its
architecture were designed specifically for LLMs and AI models. The chip can use up to
192 gigabytes of memory, as compared to the H-100's 120 gigabytes of memory. At the demo yesterday,
they showed the MI300x running a 40 billion parameter model that's called Falcon. Trying to keep parity
with other chip developers, AMD also said that it's offering what they call an infinity
architecture that combines eight of the chip accelerators into one system, and they've also announced
a new software suite called Rock M, which competes with Nvidia's Kuda software package that has
historically been one of the reasons why AI developers preferred Nvidia chips over AMD. Now, a lot of
the mainstream financial analysis basically took all of AMD's announcements as them just trying to
catch up to Nvidia and Nvidia having kind of too big of a lead for them to overcome.
However, one thing that many in the developer community took note of was that they were also
announcing a partnership with Hugging Face to tap into the open source community to accelerate
development of both CPU and GPU models. In their announcement post, Hugging Face writes,
whether language models, large language models or foundation models, Transformers require significant
computation for pre-training, fine-tuning, and inference. To help developers and organizations get
the most performance bang for their infrastructure bucks, Hugging Face has long been working with
hardware companies to leverage acceleration features present on their respective chips. Today, we're
happy to announce that AMD has officially joined our hardware partner program. This partnership is
excellent news for the Hugging Face community, which will soon benefit from the latest
AMD platforms for training and inference. The selection of deep learning hardware has been limited for
years, and prices and supply are growing concerns. This new partnership will do more than match the
competition and help alleviate market dynamics. It should also set new cost performance standards.
You might remember a few weeks ago when that Google memo dropped, it argued that companies like
Google and Open AI were going to get beat ultimately by open source approaches to developing AI.
Could AMD's partnership with Hugging Face, which is at the very epicenter of the AI open source
movement, actually make a difference in their fight against Nvidia? Hard to say, but it's also
hard not to welcome the new competition and the new approach to it. Anyways, guys, that is it for today's
AI Breakdown Brief. If you're enjoying, please like, subscribe, and share and hit that notification
button so you never miss an episode. And I'll be back soon with the main AI breakdown, which is all
about why this new OpenAI API announcement is actually very significant and reflective of a
change of phase for the overall AI space. We're moving from the era of novelty to the era of real
utility. Welcome back to the AI breakdown. Today, we're talking about OpenAI's big API announcement
from yesterday. And while nominally this was focused on developers, I actually think it's reflective
of a much broader change. Professor Ethan Mollick tweeted yesterday, it's important to remember that
AI is advancing very quickly and you shouldn't mistake current capabilities for the ones LLMs will
have in months. Like today, OpenAI just released a much faster, cheaper version of GPT and a better way
for the AI to work with other systems. So what we're going to do today is go through that announcement
and specifically talk about why it matters not just for developers, but also for end users.
The announcement post was called Function Calling and Other API Updates.
And just from that title, you get a sense of where the emphasis is.
However, before we get into function calling, which is undoubtedly the biggest piece of this,
let's talk about some of the other updates as well.
The company writes,
We released GPT 3.5 Turbo and GBT4 earlier this year.
And in only a few short months, we've seen incredible applications built by developers on top of these models.
Today, we're following up with some exciting updates.
Now, as I mentioned, we're about to talk about function calling in some detail, but the other updates they announced include, one, a much longer context window for GPT 3.5 Turbo.
Now, longer context windows are something we've talked a lot about on this show.
Context window refers to how many tokens or how much text or information an LLM can ingest in one fell swoop.
The longer the context window, then, the longer a piece of information that it can ingest.
So, for example, instead of breaking a book into chapters, you could just feed the entire book in at once, depending on how big that context window was.
Obviously, then, there are benefits in the context with which the LLM can interact with that piece of information.
Right now, the context window for a user inputting information on chat chpT is 8,000 tokens, which means 4 to 5,000 words on average.
Now, that's a lot, but that's not a ton.
There are even some major magazine articles that are longer than that, right?
Now, throughout much of this year, the big conversation has been around a 32K context,
window for Gpt4. However, we've heard that one of the reasons that OpenAI hasn't been able to
push forward with that 32K context window is just that they're dealing with the same thing
that everyone else is dealing with, which is a shortage of computing power. In meetings with
developers as part of his world tour, OpenAI CEO Sam Altman said basically that, well, the technology
might be there, the GPUs just aren't. Now, when it comes to GBT 3.5 Turbo, the developers
were using, they only had a standard 4K context window. It was big news then yesterday when they
announced a 16,000 context version of GPT 3.5 Turbo. That's obviously four times longer.
That means it can accommodate about 20 pages of text in a single request. Now, on top of that,
they're also reducing their pricing. OpenAI's most popular embeddings model is having its cost
reduced by 75% to 0.000000 tokens, and the cost of GPT3.5 Turbo's input tokens is going down by
25%. OpenAI writes,
developers can now use this model for just 0.0015 per 1,000 input tokens
and 0.002 per 1,000 output tokens, which equates to roughly 700 pages per dollar.
GBT 3.5 Turbo 16K is priced at exactly double that.
So if the announcement were just that, it would probably be enough to get developers really excited,
but that was far from the only part of the announcement.
In fact, the main part of the announcement was what they call function calling.
OpenAI writes,
Developers can now describe functions to GPT4 and GPT 3.5 Turbo and have the model intelligently choose to output
a JSON object containing arguments to call those functions.
This is a new way to more reliably connect GPT's capabilities with external tools and APIs.
These models have been fine-tuned to both detect when a function needs to be called,
depending on the user's input, and to respond with JSON that adheres to the function's signature.
Function calling allows developers to more reliably get structured data back from the model.
So if you are not a developer, that could sound like Latin.
Here's maybe a simplified way to think about this.
When people are interfacing with LLMs, they're interfacing via natural language.
They're saying things like, what is the weather in New York right now?
However, when computers talk to each other, they don't speak in natural language.
They speak in structured data.
JSON stands for JavaScript object notation.
It's a lightweight data exchange format that's used primarily to help data move between a server and a web application.
So, for example, a JSON object that represents a person's
information might be organized into a nested structure such as name, age, hobby, profession,
or address, which might then underneath address have a number of subfields, including
streets, city, or country. JSON is language independent, which means it can be used with various
programming languages. So a simple way to think about function calling in the context of open
AI or GPT is that it allows developers to automatically translate natural language inputs from
users into functions that can query external APIs or sources of data in a structured
language that computers speak. Those external sources of data or APIs can then send back the
relevant information, and then the AI can interpret the structured results and turn it back into
natural language for its answer. Developer Alex Volkov writes, you know how many folks
struggled to get a JSON output consistently for the use of agents and other stuff? Well, OpenAI took it
one step further and gave us function calls. Alex points out, first of all, that this is something
the developers have had to develop complicated workarounds for. Alex writes, OpenAI said,
why not just provide our API with your function and what it needs to get his arguments,
and the model will return the right function call.
He concludes running to try this out.
Seems like a major shift in the developer experience for these models,
and we essentially are getting the benefits of the plugin ecosystem into the API calls.
This is an analogy I've heard kind of a lot.
Basically, what plugins do for chat chip ET is they allow the user to point to specific sources
of information in order to get chat chipt to contextualize whatever the request is,
whether it's a summarization or something else in the context of that source of data.
This means you can do things like pull-in basketball information as with basketball stats,
or magic card information as with Magic Codex,
MLS information as with Zillow, or stock market or crypto data as with a ton of different plugins that have been released.
Now, some of these are just really novelty.
For example, the creature generator for role-playing games.
Yesterday it generated something called a Frost River for me,
which is apparently a fearsome creature that inhabits icy tundras,
and which has 20 strength, 10 dexterity, 18 constitution, 4 intelligence, 12 wisdom, and 6 charisma.
Some of them are trying to be useful.
For example, Instacart, I wrote, could you suggest a paleo meal plan for a family of four for one week?
It gave me that and then asked if it wanted me to generate a shopping list with these meals using the Instacart plugin,
from which I can click and then go to Instacart.
Now, in this one I say trying to be useful, because one of the big open questions is the extent to which
most of these plugin creators actually want their experience to be in chat GPT, or they want
want ChatGPT to be in their app. That's the way that Sam Altman has put it. And then there are some
that are just dead on useful now. And what I've found is most common to those is that they are
simply the plugins that point to very specific pieces of information. The one that I use most often
because of this podcast is XPapers, which allows ChatGPT to access all of the research on archive.
So if that's how the external facing consumer experience of ChatGPT is evolving, in other words,
plugins giving us the ability to point ChatGPT to specific information sources,
function calling is effectively that for developers.
So the examples that they give of what developers can do with this include
creating chatbots that answer question by calling external tools,
such as chatGBT plugins.
For example, they write converting queries such as email Anya
to see if she wants to get coffee next Friday to a function
that is actually sending that email,
or asking what's the weather in Boston to a function
that goes and pulls the current weather from some particular API source.
Another is converting natural language into API calls or database queries,
So think businesses that have put proprietary information in the form of charts or spreadsheets
or customer data into chat GPT.
This would allow for things like converting who are my top 10 customers this month to an
internal API call such as get customers by revenue.
Finally, they suggest this could extract structured data from text.
The example they gave is defining a function called extract people data to extract all
the people mentioned in a particular Wikipedia article.
So to drill down even more, they use that example of what's the weather like in Boston
right now.
Step one is that OpenAI's API would recognize the function that was trying to be called by the user's input.
Step two is that it would structure that data and send it to the third-party API.
And then step three is that OpenAI's API would get the response back
and then summarize it once again in natural language.
So what does this mean for end users?
For those of us who are not developers,
what it means is that the developers who are building on the OpenAI API API
and using GPT for their applications
now have a much more powerful and native tool
to actually build things that are useful for us,
that have specific functionality,
that are less likely to hallucinate
because they're being pointed to specific information
in structured ways,
even though they're returning to us information
in the natural language
that makes this tool so appealing and human feeling.
When most people experience chat GPT for the first time,
they asked it to write a poem about cats
or summarize some history like it was a Taylor Swift song.
Yes, I'm speaking from personal experience there.
Those things are novel, they show off the capacity of the tool,
but it's different than it being actually useful.
Now, of course, legions of people have come together
to start creating content about how to use chat GPT
in ways that are much more effective.
And of course, people all over the world
are using chat GPT for their businesses or their hobbies.
So it's not to suggest that there isn't utility already.
But when it comes to what people are building on this,
I think this represents a major shift
in the capacity of the development tools
to move from novelty to utility
and really powerful utility
in ways that I expect to produce an incredible,
new wave of applications. Now, interestingly, this comes exactly at the same time as some people
are starting to say, maybe we've been a little overhyped about generative AI and what chat GPT
can do. My guess is that this answers some of that skepticism in a pretty convincing way, in pretty
short order. So again, we return to the Ethan Mollick tweet from whence we started. It is important
to remember that AI is advancing very quickly, and you shouldn't mistake current capabilities for
the ones LLMs will have in months. That's it for today's AI breakdown. Hopefully this was
useful, hopefully this got you excited about what OpenAI's new API updates might mean. If you're
enjoying the content, please like, subscribe, and share it. Click the notification button to not
miss any episodes. Subscribe to the podcast and the newsletter version. You can find all of that
information on breakdown.network. I appreciate you listening or watching. And until next time,
peace.
