The AI Daily Brief: Artificial Intelligence News and Analysis - OpenAI DevDay: Everything You Need To Know
Episode Date: November 7, 2023
Yesterday OpenAI announced a 128K-context GPT-4 Turbo at one-third the price; a new text-to-speech model; Whisper 3; and proto-agent features like the Assistants API and custom GPTs.
Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown
Interested in the opportunity mentioned in today's show? jobs@breakdown.network
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're talking about all of the biggest announcements from OpenAI's Dev Day, which happened yesterday.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to The AI Breakdown.
We are back in studio, finally, after a few days of travel.
But today will still be a slightly different episode than normal.
There is so much to cover from yesterday's OpenAI Dev Day,
as well as a little follow-up from the weekend,
that instead of having the normal convention
of a brief followed by a main episode,
we will just be doing the main episode today.
Our normal format will be back, I anticipate, tomorrow.
Today we are getting into everything you need to know
from OpenAI's Dev Day, which was held yesterday in San Francisco.
But I would be remiss if I didn't at least mention the announcement from over the weekend,
which was, of course, Elon Musk and xAI announcing their new Grok.
Now, you might have caught the episode that I did entirely with AI,
including a HeyGen video avatar, which I posted as a bonus episode over the weekend, so I won't get into
this much; I just wanted to touch on it briefly as myself. The way that the xAI team described
Grok was as an AI modeled after the Hitchhiker's Guide to the Galaxy, so, they say, intended to answer
almost anything and, far harder, even suggest what questions to ask. Grok, they say, is designed to answer
questions with a bit of wit and has a rebellious streak, so please don't use it if you hate humor.
A unique and fundamental advantage of Grok is that it has real-time knowledge of the world
via the X platform. It will also answer spicy questions that are rejected by most other AI systems.
Grok is still a very early beta product, the best we could do with two months of training,
so expect it to improve rapidly with each passing week with your help. So even just in that
announcement, they hit on a number of the main things. The first is the idea of it having humor
in its responses. I mentioned before the post that Elon shared where someone asked Grok
how to make cocaine step by step, and Grok said, oh sure, just a moment while I pull up the recipe
for homemade cocaine, you know, because I'm totally going to help you with that.
The announcement also points out its access to real-time data, which Elon showed off with a set of
questions about his most recent interview on Joe Rogan, which happened just a couple of weeks ago.
Now, of course, much has also been made of Elon's anti-woke positioning of the chatbot.
That's something that I noted in that previous piece about just how much that seems to be a clear
emphasis in the way that they're trying to differentiate from ChatGPT and all the others.
Now, one thing that got a little bit lost in the announcement was the PromptIDE.
xAI describes it like this: The xAI PromptIDE is an integrated development environment
for prompt engineering and interpretability research. It accelerates prompt engineering through
an SDK that allows implementing complex prompting techniques and rich analytics that
visualize the network's outputs. We originally created PromptIDE to accelerate our development
of Grok. It has helped us iterate quickly over different prompts and prompting techniques. We're now
making the IDE available to members of our Grok early access program. Now, I think this is relevant in the
context of today's OpenAI Dev Day focus, given that what that event represents, and what we're
seeing more broadly, is competition for developer affiliation as these AI models get more advanced,
and especially as closed models face more competition from their open source peers.
Now, there is going to be a ton more to talk about when it comes to Grok in the coming weeks,
I anticipate, but that is really not the focus of today's show. The focus of today's show is, of course,
OpenAI's first developer conference, Dev Day, which happened in San Francisco yesterday, and there was
so much to cover. In fact, basically the entirety of the AI Breakdown's first five this morning was
about announcements from that event, and there could have been a lot more. So what we're going to actually
do today is break the announcements into four categories. And even with this, there are still some
things that I'm missing, but this should give you the high level of the most important parts of the
announcements and the reactions from the developer and the larger AI community. So first, let's talk about
Whisper and text-to-speech. Whisper is OpenAI's speech-to-text model. If you've ever used the ChatGPT
mobile app, you'll know that Whisper is frankly what you expect Siri to be. It's an unbelievably
accurate model that is unbelievably fast and can even handle environments where there's a lot of background
noise. Whisper is getting a third version, which will be released as open source, although it was one of the
only things that wasn't actually available upon its announcement, so it should be something that we
get in the weeks to come. Now, in addition to Whisper's speech-to-text model, we also got a new
text-to-speech model that can convert text into spoken audio. Out of the gate, text-to-speech
comes with six highly realistic voices, but they're also apparently going to allow
professional voice cloning of people's own voices, as well as the creation of up to 30 custom voices.
At first glance, AI developer Nate Chan wrote that OpenAI's text-to-speech looked to be around
10 to 20x cheaper than ElevenLabs, which, if true, would obviously have huge implications
for that company, which, as you guys know, is one that I use very frequently. Now, some people have
already begun the comparison. Justin de Guzman writes, just tried OpenAI text-to-speech.
Non-HD is about the same speed as ElevenLabs; HD is slower. Audio quality, even HD, is lower.
ElevenLabs' expressiveness of voice is still a lot better. No voice cloning yet makes prototyping less fun,
but 10x cheaper is compelling. It's interesting because this is the first time that we've seen
OpenAI really, really compete on price first versus just on quality. Now, we will in just a moment
get to another part of the announcement where again OpenAI is being at least conscious of cost,
but that's different than just focusing on it as a main competitive advantage.
Another early tester wrote: just benchmarked the new OpenAI text-to-speech, coming in at around one second for TTS-1.
However, they've massively undercut others on price.
Daniel Monge summed it up:
OpenAI came out of left field and today revealed an ElevenLabs-level TTS that costs one-sixth of the
price, or one-twelfth if you use the standard model.
That's kind of crazy.
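Those two ratios are easy to sanity-check. Here is a minimal Python sketch. It assumes launch list prices of $0.015 per 1,000 characters for the standard `tts-1` model and $0.030 for `tts-1-hd`. Neither figure is stated in the episode, so check OpenAI's current pricing page. The "competitor" rate it prints is only what the quoted fractions imply. It is not a published ElevenLabs price.

```python
# Assumed launch list prices, USD per 1,000 input characters.
# Not stated in the episode; verify against OpenAI's current pricing page.
TTS_PRICE_PER_1K_CHARS = {
    "tts-1": 0.015,     # standard model
    "tts-1-hd": 0.030,  # HD model
}


def tts_cost(text: str, model: str = "tts-1") -> float:
    """Estimated cost in USD to synthesize `text`, billed per character."""
    return len(text) / 1000 * TTS_PRICE_PER_1K_CHARS[model]


def implied_rate(model: str, fraction_of_competitor: float) -> float:
    """Per-1K-character rate implied by 'costs `fraction_of_competitor` of X'."""
    return TTS_PRICE_PER_1K_CHARS[model] / fraction_of_competitor


if __name__ == "__main__":
    script = "x" * 10_000  # a hypothetical 10,000-character script
    print(f"tts-1:    ${tts_cost(script):.2f}")              # $0.15
    print(f"tts-1-hd: ${tts_cost(script, 'tts-1-hd'):.2f}")  # $0.30
    # Both quoted ratios point at the same implied competitor rate per 1K chars.
    print(f"{implied_rate('tts-1-hd', 1 / 6):.2f} {implied_rate('tts-1', 1 / 12):.2f}")  # 0.18 0.18
```

Under these assumed prices, HD costs exactly twice the standard rate. That makes "one-sixth" and "one-twelfth" two views of the same comparison.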
Now, this was a big theme of the discourse surrounding the event, summed up quite well by
Stability AI's Emad Mostaque, who wrote, OpenAI killing it, where it is all those dead AI startups.
Now, these Whisper and text-to-speech announcements would be huge on any other day, but were totally
overshadowed by some of the other announcements at the event.
The next one that I'll mention is the new GPT-4 Turbo.
Now, this was a huge update for the GPT-4 API.
First of all, we are clearly heading towards multimodal, given that the Vision API is integrated, as are DALL-E 3 and this new text-to-speech model, but the biggest news comes in terms of the context window and the price.
GPT-4's API context window used to top out at 32K tokens, but it is now 128K.
That's effectively as long as almost any book that you might read, holding aside, I don't know, Brandon Sanderson novels or something like that.
The cost for input tokens is down 3x and the cost for output tokens is down 2x.
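To put the context and price changes in concrete terms, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of roughly 0.75 English words per token. It also assumes these launch list prices per 1,000 tokens: `gpt-4` (8K) at $0.03 input and $0.06 output, and the GPT-4 Turbo preview, `gpt-4-1106-preview`, at $0.01 input and $0.03 output. None of these figures come from the episode itself, so treat them as assumptions.

```python
# Rule-of-thumb conversion (assumption): about 0.75 English words per token.
WORDS_PER_TOKEN = 0.75

# Assumed launch list prices, USD per 1,000 tokens.
PRICES_PER_1K_TOKENS = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-1106-preview": {"input": 0.01, "output": 0.03},  # GPT-4 Turbo preview
}


def approx_words(tokens: int) -> int:
    """Rough English word count that fits in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request; input and output are priced separately."""
    p = PRICES_PER_1K_TOKENS[model]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]


if __name__ == "__main__":
    print(approx_words(32_000), approx_words(128_000))  # 24000 96000
    for model in PRICES_PER_1K_TOKENS:
        # The same 6K-in / 1K-out request priced on each model.
        print(model, f"${request_cost(model, 6_000, 1_000):.2f}")  # $0.24, then $0.09
```

At roughly 96,000 words, 128K tokens is in the range of a typical novel, which matches the book comparison above. Because input and output are cut by different factors, the saving on a whole request falls between 2x and 3x. Where it lands depends on the input/output mix.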
And this is what I was referring to when I said that OpenAI is clearly conscious of cost questions when it comes to
developer affiliation. In fact, one of the loudest and most raucous sections of applause at the event
was when Sam Altman announced those price cuts. Every co-founder Dan Shipper pointed out another set of
benefits from the new GPT-4 Turbo, including, quote, more control: GPT-4 will respond with valid JSON
and can call multiple functions. Better knowledge: the API now comes with retrieval built in,
and the knowledge cutoff is now April 2023. He also points out the multimodality that we just talked about,
with DALL-E, text-to-speech, and vision all being in the API as well. He also points out that
fine-tuning for GPT-4 is coming out today in experimental access. Now, another announcement that
once again went a little bit under the radar is that OpenAI is following the model of a number of
companies in the space, including Microsoft and Adobe, in that they're guaranteeing that they will
pay legal fees for developers who build on top of their platform and ultimately get sued for copyright
infringement. Said Sam Altman, we can defend our customers and pay the costs incurred if you face
legal claims around copyright infringement, and this applies both to ChatGPT Enterprise and the API. This
Copyright Shield program, which is what they're calling it, applies to, as Sam pointed out,
the enterprise users and to developers using the API, but not to free ChatGPT or ChatGPT Plus users.
People are, of course, already racing to figure out how they might use the new expanded context
window. And to get a sense of one of those use cases, let's listen to this short video from
AI content creator and educator Riley Brown. The new ChatGPT is here. Let's break it down.
This video is How to Write, and it has eight million views.
With one click, I can download the transcript, I can go back to ChatGPT, I can pull up the text file, and the text file is just a giant wall of text.
You see this? This is an hour and 20 minutes of a lecture.
And we're going to hit Control-A, Control-C, copy everything, go back to ChatGPT, and we are going to type:
summarize every section of this video.
Break down every important point.
Then we paste that entire transcript in here and press enter. Now let's see how this bad boy does. So
here is critique of traditional writing, challenges faced by expert writers, interference in
reader comprehension, and it's basically going through every single part of the
video, creating value in writing, differences in academic writing, and standardized testing.
Now every piece of text that you own has so much more value with this new ChatGPT, and this
doesn't even include the feature of creating GPTs, where you can
create a chatbot on hundreds of these documents right here. Then you can ask it to expand on any
part, right? So, effective problem construction. We're going to copy this right here and say, please expand
on this part based on the text I provided earlier. And we're going to hit enter. And this is just
really great information for writing scripts for any form of social media or for any
storytelling in any manner. And now a word from today's sponsor. Are you interested in how
two top-of-mind trends, AI and crypto, can work together?
If so, I have the perfect podcast recommendation for you.
Web3 with a16z crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz.
Web3 with a16z crypto is your definitive resource for the future of the internet,
whether you're already building in these spaces or simply curious about what's next.
If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh
and former Google X-er Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto.
From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all.
I highly recommend checking it out, especially if you'd like to learn more about how AI and
crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking
more ways to truly own their work, for business leaders trying to prepare for the future today,
and for innovators exploring trending tech topics. So go ahead, listen to Web3 with a16z
crypto wherever you get your podcasts.
Now, somehow, we still haven't really gotten to the part of the presentation that the most people
were excited about, but that's where we're headed next. In his keynote presentation, OpenAI CEO
Sam Altman talked a lot about the future of AI agents. This has been one of the hottest areas
of AI development throughout the entire year. You might have heard an episode or seen a video
about AutoGPT or BabyAGI or any of the numerous other projects that are trying to build
autonomous agents that can actually go out and solve problems on their own and complete tasks
on behalf of their creators and users.
Altman shared the company's belief that these agents were going to play an increasingly
important role in society and the economy, as well as the company's belief that when it
comes to these types of disruptive changes, the best way to figure out how to adapt to them
is to walk slowly down the path towards them and learn how to adapt as we get real-life
evidence of what happens. With that in mind, two of the biggest parts of the announcements from
Dev Day were the Assistants API and custom GPTs, both of which they viewed as very first
steps towards that AI agent future. Let's start with the Assistants API, and in terms of a summary,
I'll share a tweet from Sully Omar that reads, OpenAI literally flipped the entire AI landscape on its
head. Most AI code is likely tech debt. Companies have spent hundreds of millions of dollars
building their own assistant APIs, and now it's available to everyone. This is huge for the little
guys, brutal for the big guys. So what is the Assistants API? Well, it's an API for building agent
interfaces. It has a set of core primitives, including threads, retrieval, Code Interpreter, and function
calling. Here's how Conrad Nat sums it up. Threads and messages: for each user, create a stateful
thread and add messages. Improved function calling allows the AI to respond with actions on your front end.
Function calling also includes guaranteed JSON output, and retrieval allows for the upload of related documentation.
Finally, Code Interpreter figures out if there is a need to write custom code, and then can actually go generate those files.
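Function calling is easier to picture from the application side. The model returns the names of the functions it wants called, plus JSON-encoded arguments, and your code runs them. Here is a minimal Python sketch. It assumes a simplified reply shape loosely modeled on OpenAI's tool-call objects: an `id`, a function `name`, and an `arguments` JSON string. `get_weather` and `add_to_cart` are hypothetical stubs, and nothing here calls the real API.

```python
import json
from typing import Any, Callable


# Hypothetical local functions the model may ask us to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real app would query a weather service


def add_to_cart(item: str, quantity: int = 1) -> str:
    return f"Added {quantity} x {item}"


REGISTRY: dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "add_to_cart": add_to_cart,
}


def dispatch_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run every requested call (there may be several per reply) and collect outputs."""
    outputs = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        fn = REGISTRY.get(name)
        result = fn(**args) if fn else f"error: unknown function {name!r}"
        outputs.append({"tool_call_id": call["id"], "output": result})
    return outputs


if __name__ == "__main__":
    calls = [  # simplified, hypothetical model reply containing two calls
        {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_2", "function": {"name": "add_to_cart",
                                      "arguments": '{"item": "sneakers", "quantity": 2}'}},
    ]
    for out in dispatch_tool_calls(calls):
        print(out)
```

Results are keyed by call id so they can be matched up when sent back to the model. If the model names an unknown function, the sketch returns an error string instead of raising. That way, one bad call doesn't sink the whole reply.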
OpenAI writes, the Assistants API allows you to build AI assistants within your own applications.
An assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.
The Assistants API currently supports three types of tools:
Code Interpreter, Retrieval, and Function calling.
In the future, we plan to release more OpenAI-built tools and allow you to provide your own tools on our platform.
At a high level, a typical integration of the Assistants API has the following flow.
One, create an Assistant in the API by defining its custom instructions and picking a model.
If helpful, enable tools like Code Interpreter, Retrieval, and Function calling.
Two, create a Thread when a user starts a conversation.
Three, add Messages to the Thread as the user asks questions.
Four, run the Assistant on the Thread to trigger responses.
This automatically calls the relevant tools.
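The four-step flow maps onto a small amount of state. Below is a toy, in-memory Python model of those objects (assistant, thread, message, run), written for illustration only. It does not call OpenAI's SDK. The `respond` callback is a hypothetical stand-in for the model; the hosted service would also decide which enabled tools to invoke during a run.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable

_next_id = count(1)  # toy id generator shared by all objects


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class Assistant:
    # Step 1: instructions plus a model, optionally with tools enabled.
    instructions: str
    model: str
    tools: list[str] = field(default_factory=list)
    id: str = field(default_factory=lambda: f"asst_{next(_next_id)}")


@dataclass
class Thread:
    # Step 2: one thread per conversation; it holds the message history (the state).
    messages: list[Message] = field(default_factory=list)
    id: str = field(default_factory=lambda: f"thread_{next(_next_id)}")


def add_message(thread: Thread, content: str) -> None:
    """Step 3: append the user's question to the thread."""
    thread.messages.append(Message("user", content))


def run(thread: Thread, assistant: Assistant,
        respond: Callable[[Assistant, list[Message]], str]) -> Message:
    """Step 4: run the assistant over the whole thread and store its reply."""
    reply = Message("assistant", respond(assistant, thread.messages))
    thread.messages.append(reply)
    return reply


if __name__ == "__main__":
    def fake_model(assistant: Assistant, history: list[Message]) -> str:
        return f"({assistant.model}) You asked: {history[-1].content}"

    bot = Assistant("Answer briefly.", "gpt-4-1106-preview", tools=["retrieval"])
    thread = Thread()
    add_message(thread, "What launched at DevDay?")
    print(run(thread, bot, fake_model).content)
    print(len(thread.messages))  # 2: the question and the reply
```

The design point the sketch tries to capture is that the thread, not your application, holds the conversation history. Each run sees everything that came before it.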
So almost immediately, people started hacking.
Yohei Nakajima, who's built things like BabyAGI, writes,
The Assistants API is awesome, had to build something.
Open-sourcing GPT vs GPT,
a simple template to have two AI assistants converse.
What's exciting is the ability to extend these
with retrieval data and custom functions.
You set the parameters for two assistants,
then feed that in with the topic and number of messages you want.
You'll get something like this.
Topic: global warming.
Pirate speaking:
Arr, matey, the seas be a-rising and the winds be a-blowing,
for we be facing the wicked scourge of global warming.
Let's set sail on a treacherous journey to explore this dire threat to our world.
Mermaid speaking:
Like, oh my gosh, global warming is like seriously the worst.
Anyway, it's kind of a cheesy application, but it's just something that was thrown together to demonstrate what this could do.
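At its core, a "two assistants talking" template just alternates turns over a shared transcript. Here is a toy Python sketch of that idea. It is not Yohei's actual code. Each speaker is a hypothetical function that would normally wrap an assistant run; here it returns canned lines.

```python
from typing import Callable

# A speaker sees the topic and the transcript so far and returns its next line.
Speaker = Callable[[str, list[tuple[str, str]]], str]


def converse(topic: str, speakers: list[tuple[str, Speaker]],
             num_messages: int) -> list[tuple[str, str]]:
    """Let the speakers take turns, in order, until `num_messages` lines exist."""
    transcript: list[tuple[str, str]] = []
    for turn in range(num_messages):
        name, speak = speakers[turn % len(speakers)]
        transcript.append((name, speak(topic, transcript)))
    return transcript


# Hypothetical canned personas standing in for two configured assistants.
def pirate(topic: str, history: list[tuple[str, str]]) -> str:
    return f"Arr, {topic} be a wicked scourge!"


def mermaid(topic: str, history: list[tuple[str, str]]) -> str:
    return f"Like, oh my gosh, {topic} is seriously the worst."


if __name__ == "__main__":
    for name, line in converse("global warming", [("Pirate", pirate), ("Mermaid", mermaid)], 4):
        print(f"{name}: {line}")
```

Every speaker receives the full transcript, so a real implementation could feed the other assistant's last message into its own thread before each run.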
Another application that looks a little bit more like something that someone might use comes from Brian Sunter, who writes,
using the OpenAI Assistants API to make an AI chatbot from my blog posts in less than five minutes.
We see a video in the Assistants playground where Brian writes,
Instructions: you are the public-facing AI assistant for Brian Sunter,
answering questions in the style of the uploaded documents.
Answer questions about the uploaded documents.
The documents are writings from his personal blog.
He then selects the model, gpt-4-1106-preview,
and uploads a set of a dozen files.
Then in the preview section, when a user asks,
what is Brian Sunter's blog about?
the new Assistants API chatbot responds with information that comes from the posts that he
uploaded.
It doesn't take much imagination to see how this could be applied to any content creator on the web,
or frankly, any company with products or services that people might want to know more about.
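Under the hood, a chatbot like this answers by pulling the most relevant uploaded text into the prompt. OpenAI's Retrieval tool handles chunking and search on its side. The Python sketch below swaps in naive word-overlap scoring, purely to show the shape of the idea. The document names and texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; punctuation and hyphens split words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def top_k(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Names of the k documents sharing the most words with the query (ties by name)."""
    q = tokenize(query)
    # Counter & Counter keeps the minimum count of each shared word.
    scores = {name: sum((q & tokenize(body)).values()) for name, body in docs.items()}
    ranked = sorted(docs, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:k] if scores[name] > 0]


def build_prompt(instructions: str, query: str, docs: dict[str, str], k: int = 2) -> str:
    """Instructions + retrieved excerpts + the user's question, as one prompt string."""
    context = "\n\n".join(f"[{name}]\n{docs[name]}" for name in top_k(query, docs, k))
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    posts = {  # hypothetical blog posts
        "notes.md": "Notes on note-taking apps and personal knowledge.",
        "garden.md": "Growing tomatoes in a small garden.",
        "pkm.md": "Why I use an outliner for personal knowledge management.",
    }
    print(build_prompt("Answer in the style of the blog.",
                       "What does he say about personal knowledge?", posts))
```

Real retrieval systems typically use embeddings rather than raw word overlap. The flow is the same, though: rank the stored text against the question, then put the best matches in front of the model.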
Now, what about limitations? Well, Zhao Aguiam gets into some of those. He writes:
a maximum of 20 file uploads per assistant. Each file can be up to 512 MB, with a 100 GB
max at the organization level. Function calling has a maximum wait time of 10 minutes for execution.
There's no support for streaming output. Image generation is not supported; you need to call the
DALL-E 3 API separately. Image analysis is not supported; you need to call the Vision API separately.
Retrieval capabilities do not extend to XLS or CSV files. But still, and this is something
that Ben Thompson at Stratechery pointed out, as opposed to the way that tech presentations have trended recently,
where products are announced weeks before they're actually available, OpenAI was, by and large,
with the one exception of the Whisper 3 API, actually putting all these tools out into the world as
soon as they had announced them. Given that, there's going to be, I imagine, a lot more patience for
some of those limitations that we just listed. Still, I think for me and for many people,
the most potentially game-changing part of the announcement was custom GPTs,
which are basically customized, special-purpose versions of ChatGPT that anyone can create with natural
language. Here's how Sam Altman described them: GPTs are tailored versions of ChatGPT for a specific
purpose. You can build a GPT, a customized version of ChatGPT, for almost anything, with instructions,
expanded knowledge, and actions, and then you can publish it for others to use.
And because they combine instructions, expanded knowledge, and actions, they can be more helpful
to you. They can work better in many contexts and they can give you better control. They'll make it easier
for you to accomplish all sorts of tasks or just have more fun, and you'll be able to use them right
within ChatGPT. You can, in effect, program ChatGPT with language just by talking to it.
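Sam's description breaks a GPT into three ingredients: instructions, expanded knowledge, and actions. A hypothetical Python data structure makes that split explicit. The field names are illustrative and are not OpenAI's schema. The capability labels mirror the builder toggles mentioned later in this episode: web browsing, DALL-E image generation, and Code Interpreter.

```python
from dataclasses import dataclass, field

# Labels for the builder's built-in capability toggles (names are illustrative).
KNOWN_CAPABILITIES = {"web_browsing", "dalle_image_generation", "code_interpreter"}


@dataclass
class CustomGPT:
    name: str
    instructions: str  # the natural-language "program"
    knowledge_files: list[str] = field(default_factory=list)  # expanded knowledge
    capabilities: set[str] = field(default_factory=set)  # built-in tools switched on
    actions: list[str] = field(default_factory=list)  # external APIs it may call

    def __post_init__(self) -> None:
        # Reject toggles the (hypothetical) builder doesn't know about.
        unknown = self.capabilities - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"unknown capabilities: {sorted(unknown)}")


if __name__ == "__main__":
    mentor = CustomGPT(
        name="Startup Mentor",
        instructions="Give startup founders candid, practical advice.",
        knowledge_files=["startup_talk_transcript.txt"],  # hypothetical file name
        capabilities={"web_browsing"},
    )
    print(mentor)
```

The point of the split is that nothing in it is code. The "program" is the instructions string, and the knowledge and capabilities are things you attach rather than implement.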
OpenAI CTO Mira Murati writes, GPTs are not omniscient. They're custom versions of ChatGPT
tuned for specific tasks, smart tools that I'm certain we won't be able to live without. OpenAI
describes GPTs as a new way for anyone to create a tailored version of ChatGPT to be more helpful
in their daily life, at specific tasks, at work, or at home, and then share that creation with
others, no code required. So what are some examples of this? Well, one that Sam Altman himself
built live was a startup mentor that used previous speeches of his from his time leading Y Combinator
to help give founders of startups advice that he might give them, but automatically through a chatbot.
To do this, Sam goes to the GPT builder, where it asks him to describe what he wants to build.
Sam writes that he wants to build a chatbot that gives advice to founders, and in a preview
window on the side of the builder, you can actually see how the GPT Builder is interpreting
his instructions and starting to turn them into what will eventually be the custom GPT.
The GPT Builder comes up with a name and a suggested icon, both of which Sam accepts, calling it
Startup Mentor, and then he goes into the Configure menu, where, among other things, he can
upload a file, in this case the transcript from a speech he's previously given about startups.
From that same Configure menu, he can also add or subtract capabilities, including
web browsing, DALL-E image generation, and Code Interpreter. And within just a few minutes,
he's got a working version of one of these custom GPTs. Professor Ethan Mollick gave another example.
He writes, here's a little GPT, the name for the new agent-like thing released by OpenAI,
that I threw together in less than a minute. It looks up the latest trends for a product category
on the web and then creates prototype images for it.
It takes less than 90 seconds end to end.
In the demo video, the tool Ethan shares, which is called Trend Analyzer, asks the user
what type of product they're interested in.
It says, are you thinking about technology, fashion, home goods, or something else entirely?
Once I know the product category, I can look into the latest trends for you.
Ethan writes sneakers, and Trend Analyzer goes off and sources some 2023 trends
that could be built into the product design.
It comes back with six different trends, all featuring sources, and then asks
which, if any, the user wants to proceed with.
When Ethan writes, you decide, Trend Analyzer says,
we'll go with a high-top silhouette with chunky elements and bold and vibrant colorways.
From there, it says, next I'll create realistic photo shoots of our futuristic sneaker concept
incorporating these trends.
That, of course, is where DALL-E 3 comes in, and boom, all of a sudden you have this
futuristic shoe concept.
Nick Dobos had another example.
He writes, playing with the new ChatGPT custom GPTs, introducing GIFPT:
automatically turn DALL-E images into GIFs.
Now, as Nick points out, quote,
this was based on some of my earlier experiments using DALL-E and Code Interpreter to make
GIFs, but now I can package that entire workflow into a GPT that does it all for you.
And this is something that I think people are initially perhaps missing a little bit,
at least those who are inclined to be contrarian about all of the excitement surrounding this.
For example, if you go back to Ethan's post, someone writes,
how is this any different than Bing Chat with image generation?
But as Ethan points out, it uses the same tool, so it just makes it easier to share and to work
with large prompts. It also includes a lot of features that Bing hasn't implemented yet:
connection to outside systems, Code Interpreter, and working with files. I think it would be a mistake to underestimate
how much it matters to simplify workflows in the way that these custom GPTs do. My general belief
is that every simplification of a workflow leads to a massive increase in the number and variety
of use cases for the thing that's underlying that workflow. In other words, OpenAI making this a lot
easier means a lot more people are going to use it, and for a lot more purposes than they
might have before. Now, to the extent that there was any skepticism around custom GPTs, it wasn't around
their usefulness, but about whether there will actually be demand to buy these. As part of their
announcement, OpenAI said that within a month, we would have a custom GPT store. X user and AI
trend-watcher Boris writes, the idea of building a private library of custom GPTs is a great and very
useful one. I'm a little skeptical of the store, but it might turn out to be great as well.
AI entrepreneur Bindu Reddy, as part of a much larger critique post, frankly one of the only ones that I saw,
was also skeptical of this.
She writes,
To be honest, you need tens of millions of subscribers before you start worrying about a paid developer ecosystem on top of ChatGPT.
Like plugins, it seems premature.
I also suspect this is to compete with Bard, which promises to unify information from all your Google products.
If I were OpenAI, I wouldn't worry.
No one seems to be using Bard at the moment, and unless Gemini is launched and is really good,
no one will continue to use it.
Robert Scoble, however, disagrees.
He writes,
The store encourages developers to build for it,
which gives OpenAI quite a bit of lock-in
even after competitors arrive.
Think about it this way.
If you're getting paid $1,000 a month
for building a useful GPT,
will you leave just because Elon Musk has a better AI?
Nope.
So the race is on to get developers addicted to your ecosystem today.
I could see buying quite a few GPTs
to help me run my business and life.
That revenue, even if expected to be small for a while,
like Bindu says, will provide lock-in.
Now, speaking of Google and Gemini,
Nvidia's Dr. Jim Fan writes,
expectation for Google Gemini is now ridiculously high.
Gemini has to check off at least one of the following.
120% of the IQ of text-only GPT-4,
or 100% of GPT-4 but at half the cost or 2x the speed of Turbo,
or 100% of visual GPT-4, or native support for long videos,
and ship the API in Q1 of 2024.
It's about time that DeepMind recovers the glory of AlphaGo in 2016.
I'm looking forward to it.
However, they have their work cut out for them.
Investor Allie K. Miller writes,
I'm at OpenAI Dev Day in San Francisco.
I was front and center for the keynote.
I've tweeted so many tweets,
but there is one big takeaway that you're not going to see over the live stream,
one feeling that you only get in person.
And that is, compared to every other big tech event I've been to,
OpenAI Dev Day is the highest,
okay, I have to go build something with this new release immediately score.
I'm talking 11 out of 10 builder activation score.
It's incredible.
Putting an exclamation point on that,
AI entrepreneur and developer Sam Whitmore writes,
when you run away from Dev Day to go integrate everything as soon as humanly possible,
dream day for people building AI powered products.
Thank you, OpenAI.
This is certainly what I felt when I was watching these announcements,
which I was doing in an airport and an airplane on my way back from Mexico.
Right now, OpenAI has captured builders' imagination in a huge way.
They are moving quickly.
They're introducing new things.
They are, to the point of the video that I
released over the weekend, walking down a path to a very different type of relationship with
computing, and frankly, it's exciting to watch. I'm sure that over the course of this week,
we will come back to these topics and see what people are hacking on and what early experiments
are showing about the possibilities of things like the Assistants API and custom GPTs. But for now,
after this long episode, we are going to wrap it there. I appreciate you guys listening or
watching as always. I'm very excited to be back with you all. And until next time, peace.
