The AI Daily Brief: Artificial Intelligence News and Analysis - ChatGPT All Tools and What OpenAI Is Really Building
Episode Date: November 5, 2023
In the lead up to November 6th's OpenAI DevDay, NLW looks at their recent launch of PDF reading and the All Tools model and explores what it suggests about the company's bigger plans. Today's Sponsor: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're looking at what OpenAI is really building and why it's less about an application and more about the future of computing in general.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube channel, our Discord, and our newsletter.
Welcome back to The AI Breakdown.
Undisputedly, undoubtedly, the biggest event coming up in artificial intelligence in the near future is
OpenAI's DevDay, which is happening on November 6th. This event was announced back in September,
and it's the first time that OpenAI has held a developer-focused event. Now, speculation about
what is going to be announced at this event is running rampant. Altman himself, when he announced
it, made sure to say that there was not going to be any sort of GPT-4.5 or GPT-5, but he thought that
people would still be excited anyways. One of the areas of speculation, which we've talked about
previously on the show, is around autonomous agents. It's summed up here in this Peaky
Blinders meme, where a furious Cillian Murphy is saying, no effing autonomous agents. Today, we will
get to a little bit of speculation at the end about what might get announced at DevDay, but that's
actually not so much the point of today's episode. Instead, what we're looking at is a set of
recent updates from OpenAI around ChatGPT in advance of the event and what they suggest
about how OpenAI sees what it's actually building. So first, let's look at what has changed.
If you are a Plus user of ChatGPT, when you go to GPT-4, you have a set of different models that you can
choose between. You can choose default, which is their model that is not connected to the internet,
and that has been trained on data up to January 2022, or as some people have reported perhaps
a little bit later, but the point being that that is their default model, and again, it is not
connected to the internet. That is in contrast with the next model option that you have, which is
Browse with Bing. Now, Browse with Bing was in fact turned off for quite some time after users
figured out how to use it to get around paywalls. It's only a few weeks ago that it opened back up,
but for many people, if they are doing tasks that involve understanding current information,
Browse with Bing is in some ways the default model that they're likely to choose. Two of the models
that made waves earlier in the year are Advanced Data Analysis and Plugins. Now, Advanced Data Analysis used to be
called Code Interpreter. Advanced Data Analysis is probably a slightly better name, but neither really
fully captures the idea that in this version of GPT-4, the system is able to actually write code to help
it figure out how to solve problems given to it by the user. I and others have done just
innumerable videos about how different people use Code Interpreter or Advanced Data Analysis,
with some of the most frequently cited examples being things like segmenting customers,
taking a spreadsheet of data, and then using it to figure out different clusters of a market.
Another capacity that people have shown is to feed Advanced Data Analysis some set of data
and just ask it to figure out what it thinks is interesting,
to come up with hypotheses, for example. It can create basic visualizations, graph public data,
and visualize data in a ton of different ways.
Even though it's slightly old news at this point, it does remain one of the most impressive
features of any AI product.
Now, of course, plugins add functionality to ChatGPT by creating some access to different
datasets or platforms.
The one that I use most often is Xpapers, which plugs into
Cornell University's arXiv site, which is a free open access archive of nearly 2.3 million
scholarly articles. Whenever a new AI paper is released, it ends up pretty quickly on arXiv,
and the Xpapers plugin allows me to get summaries of those things, just by sharing the link to
the particular piece I'm interested in. Now, at this stage, there are an insane number of
plugins that you can use. I'm scrolling through a list currently across all different categories
that someone put together on an open Google document, and we are currently well over a thousand
plugins ranging from music to travel, to SEO and marketing, to education, to productivity,
to finance, to crypto, to science, to business, and more.
The last GPT-4 model is, of course, DALL-E 3, and that's certainly the one that I've been using
most frequently.
I already had Midjourney deeply in my workflow, particularly when it comes to YouTube thumbnails
or images for newsletters, and the ability to interact with DALL-E 3 through ChatGPT, where I can
actually use English language rather than just traditional prompting, gives me a much
better ability to get my weird requests across, such as these images on the screen of a
Thanksgiving table with Bitcoin. I really love this huge Bitcoin pumpkin pie with the little
tiny Bitcoin pumpkin pies all around it. So the new update is two parts. First of all, there is a new
ability to upload and analyze files. Specifically, you can chat now with PDFs. Now, this is a big
deal because there are numerous plugins that do exactly that, and it is a very, very compelling
feature with just a ton of different use cases. However, the second update
is what some people are calling all tools, which basically means that now, instead of manually
switching between these different GPT-4 models, ChatGPT itself can actually figure out which
are most useful based on the instructions of the user. So functionally, what this allows for is all
of these different capacities to come together much more fluidly. Let's look at a couple examples of how
people have demonstrated this. Daragh Walsh writes, Multimodal AI is the future. I uploaded The Great
Gatsby PDF to ChatGPT, then asked DALL-E 3 to create images from the key scene.
Gatsby's mansion leapt off the page.
Having all tools in one chat is a game changer.
Indeed, I think this sort of layer one use case of uploading an image
and asking for a modified version of it or an image that resembles it
is one of the simplest but clearest benefits of this all tools update.
Previously, the DALL-E 3 model didn't give you the ability to upload an image as a reference point,
which was one of its biggest limitations.
Now, Anu Akash also demonstrated how the integration of Browse with Bing could be useful,
asking GPT-4 all tools, what is the price of the
MacBook Pro M3? Create a visual of MacBook Pro M3 with that price tag. Now, of course, for those of you
who have been following along, you know that the M3 is the new advanced chip that Apple is
actually promoting as significant and important to the AI space in what really represents their
first time talking actively about the industry and trying to compete directly for people
who are in it. So with this prompt, obviously ChatGPT had to both browse with Bing to figure
out the price, and then it used DALL-E 3 to create the image that had that information that it had
searched up, integrated into the output. Now, these are all very basic demonstration uses. They're literally
all, I think, designed to figure out what the capacity of all tools is, rather than really stretch
and get creative about how all these tools in one spot changes what we can do. But there is one more
thing that some eagle-eyed folks noticed, which is that apparently all tools has an increased token
context window as well. Whereas GPT-4 previously had an 8K token context window, it appears that all
tools has a 32K token window. Now, it makes sense that they would try to roll out a larger context
window given that they are now allowing people to upload PDFs. But still, this is a huge
update and one that people have been eagerly awaiting for months and months and months now.
But where I really want to get to is the conversation that surrounds this.
And now a word from today's sponsor. Are you interested in how two top-of-mind trends,
AI and crypto, can work together? If so, I have the perfect podcast recommendation for you.
Web3 with a16z crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz.
Web3 with a16z crypto is your definitive resource for the future of the internet,
whether you're already building in these spaces or simply curious about what's next.
If you need a place to start, they recently released an excellent episode with Stanford cryptography
professor Dan Boneh and former Google X-er Ali Yahya in conversation with host Sonal Chokshi about
the intersection of AI and crypto.
From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all.
I highly recommend checking it out, especially if you'd like to learn more about how AI and
crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking
more ways to truly own their work, for business leaders trying to prepare for the future today,
and for innovators exploring trending tech topics. So go ahead, listen to Web3 with a16z
crypto, wherever you get your podcasts.
Alex Ker summed up a huge number of tweets that I saw when he wrote,
Many startups just died today because OpenAI added PDF chat.
You can also chat with data files and other document types.
We had a wave of products better suited as features rather than standalone companies.
Wrappers are being squeezed by OpenAI on one side and incumbents on the other.
It's a rough world out there.
Now, of course, what Alex is referring to is the way in which companies that filled in the gaps of the ChatGPT product are now being out-competed
as ChatGPT and OpenAI add that functionality natively.
This is always the risk of building on someone else's platform.
What makes it more brutal, and I think what Alex is recognizing,
is that there have been a lot of different pressures on startups
that perhaps they didn't anticipate going into the AI space earlier this year.
Some of those are the platforms like OpenAI themselves,
deciding to compete in a particular area,
but some of them are just the speed with which incumbents and big players have adapted
and integrated versions of a lot of different services
that previously startups might have been focused on
into the tool sets that people are already using.
Mike Butcher from TechCrunch wrote,
New ChatGPT demonstrates how small tech companies and products that relied on parsing
PDFs will eventually be wiped out by AI platforms.
AI isn't just coming for blue-collar jobs, it's also coming for engineers.
Now, my note is that actually I think it's coming more for engineers than blue-collar jobs,
but that's the subject of a different show as well.
Now, there are two analyses that I wanted to share in full
because I think they do a really good job of summarizing
and putting into context these moves in a bigger way.
Adipai quote tweeted Alex Ker and wrote,
I don't know why there is any surprise.
Here's OpenAI's product strategy for the next two years.
You will be able to upload anything to ChatGPT.
You will be able to link to any external service like Gmail and Slack.
ChatGPT will have persistent memory, no more multiple chats unless you want it.
ChatGPT will have a consistent user customizable personality, including political bias.
ChatGPT will be able to respond by text, voice, images, diagrams, and video still in this time frame.
ChatGPT will become much faster until you feel it's a real person, around a 50-millisecond
response time. Hallucinations and non-factual errors will decline rapidly. As self-moderation
improves, question rejection will decline. Now, Adipai also added a little nuance to the agent
conversation. Browsing Enabled responded to them and wrote,
Little birdie told me that OpenAI is pretty far ahead in developing agents that run
continuously and complete tasks. Adding memory and learning over time, reasoning, different fine-tuning, and
determination of output quality is far from real true loops.
To that, Adipai responded, AI initiated actions.
I hesitate to say agents because it's a loaded word.
Current browsing and data analysis tools are already AI initiated.
Would expect to see that functionality be extended to more tools.
Now, expanding this line of thinking even further was Rob Phillips, @iwasrobbed on Twitter.
Rob wrote,
As an ex-Viv (with Siri team) engineer, let me help ease everyone's future trauma as well with the fundamentals of assisted intelligence.
Make no mistake, OpenAI is building a new kind of computer, beyond just an LLM for a middleware and front end.
Key parts they'll need to pull it off.
Persistent user preferences.
The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
This is the wow moment where computers stop being scary and start being truly helpful.
We did this in 2016 at Viv, when our AI knew what you liked for each and
every service you used via Viv and mixed that in with context like what kind of flowers you told us your
mom liked. This will need to include access to your personal information to infer preference as well.
External real-time data. 50% of the utility of an LLM comes from the base training and RLHF fine-tuning,
but much more comes from extending its available data with external sources. Zapier, Airbyte,
and others will help, but expect deep integration with third-party apps and data pipelines.
Chat with PDF is a tiny, tiny part of this. If you're only building that, think much bigger.
Actual computing on virtual machines.
Context windows are limiting, so AI providers will continue benefiting from running tasks
directly on a Python or Node/Deno virtual environment so it can consume huge amounts of data
just like a computer today can.
Today, these are short-lived environments used by Data Analysis and Julius, but over time
they'll become a new type of Dropbox where your data is persisted long-term for additional
processing or cross-file inference and insights.
Agent Task and Flow Planning. Planning can't function without intent.
Understanding intent has always been a holy grail, and LLMs finally helped us
unlock what we spent years approximating at Viv with NLP tricks. Once intent is accurate,
planning can start. Creating an agent planner is incredibly nuanced and will take significant
integration with user preferences, third-party data sets, knowledge of compute capabilities, etc.
An app store of experts. Apple initially made the mistake of building a closed app store,
then it realized it could monetize a cornucopia of creativity if they opened it up. Regardless of
OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll
re-scope it and enable a long tail of specialized agents. Builders will be able to compose
multiple tools together into workflows that can specialize, and AIs over time will be able to
auto-complete these tools together as well, learning from the builders that came before them.
Persistent contextual memory. Embeddings are helpful, but they are missing fundamental parts like
context switching, conversational centroid, summarization, enrichment, etc. Most of the cost of
LLMs today comes from prompts, but as history and persistence is embedded and the inference cached,
this will unlock the ability to have long-term memory with pointers to critical subjects, topics,
feelings, and tone. Core memory is just the beginning. We still need all the rich information
our minds conjure when we think about a past sunset, a breakup, a scientific understanding,
or sensitive context for people we interact with.
Now, guys, I know that this is long, but I want to continue because I think it's hugely,
hugely relevant.
Rob goes on.
Long polling tasks.
Agent is a loaded word, but part of the intent is to have tasks that can be scheduled
and self-completing regardless of the time horizon required.
E.g., let me know when flights from Montreal to Hawaii are less than $500.
This will require coordination of compute across API providers as well as virtual
environments in the cloud.
Dynamic UI.
Chat is not the final end-all interface.
There's a reason apps have affordances like buttons, date-pickers, images.
It simplifies.
AI will be a co-pilot, but to be a co-pilot, it'll need to adjust to what works best
for a given user.
The future is personalized as optimizations require it, so UI will be dynamic.
API and tool composition.
Expect AIs to generate custom, quote-unquote, apps in the future, where we can build our
own workflows to compose together APIs without waiting for a big startup to do so.
Fewer apps and startups will be needed to generate front-ends,
and AI will be better at composing an array of tools and APIs together coupled with a gas-fee or tax.
Assistant to Assistant interactions.
There will be countless assistants in the future,
with each assisting humans and other assistants towards some greater intent.
Alongside this, assistants will need to learn to interface across text,
APIs, file systems, and other modalities used by both agents/startups and humans
as integration flows deeper into our world.
Plugin/tool stores.
Specialized assistants can only be made possible by composing and
using tools, APIs, prompts, data preferences, and much more. The current plugin store is super early
days, so expect much more work to come and expect many of those plugins to be rolled in
house as they become more mission-critical. And this is just a 10-minute brain dump. Much, much more is
needed behind the scenes including internet search and scraping, community for intent, building,
RLHF, etc. Dynamic API generators and connectors, gas fees, tool builders, ingestion via glasses,
earbuds, etc. If you think it's too late to be in AI, just know the above is about
25% of what it will actually take with much more to come as we iterate and get even more
creative. The point of all of this, this very long and super interesting perspective on where
things are going, is that what you see with ChatGPT is the very first iteration, now maybe the second
or third iteration, depending on how you look at it, of a clean break with a former
modality of computing. Thinking about it simply as an application that does some set of things
completely misses how comprehensively it's challenging the way that we interact with computers,
and indeed the way that we interact with the world and get anything done in general. Every time OpenAI
or ChatGPT adds another feature, it has this feeling of having always been meant to be
there. Of course, browsing the internet was supposed to be wired with image input, which was supposed to be
wired with image output. How could it ever have not been so? And so to the extent that one is
building a startup in this space, the old Wayne Gretzky notion of skating to where the puck is going
has never been more significant. Those who design for the static world in which some feature or another
is missing from what we see now are too likely to be steamrolled by the inevitable spur of progress.
I, for one, am excited to see what OpenAI announces at their DevDay, and I'm sure we'll be
talking about it here. For now, I appreciate you listening or watching as always, and until next time,
peace.
