The AI Daily Brief: Artificial Intelligence News and Analysis - The End of the First Phase of Generative AI
Episode Date: October 7, 2023
A reading and exploration of Prof. Ethan Mollick's The shape of the shadow of The Thing https://www.oneusefulthing.org/p/the-shape-of-the-shadow-of-the-thing The piece argues that a vision of artificial intelligence is coming into view, if haltingly, and that we're near to the end of the first phase of generative AI, which was kicked off by ChatGPT.
TAKE OUR SURVEY ON EDUCATIONAL AND LEARNING RESOURCE CONTENT: https://bit.ly/aibreakdownsurvey
ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're reading excerpts from an essay about the culmination of the first phase of the ChatGPT AI era.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Hello, friends, happy weekend.
It being the weekend at all, it means it's time for another long read.
And this one I am particularly excited about.
I first saw it recommended on the AI Breakdown Discord.
Shout out to Magister Falk for posting it. And it's called The Shape of the Shadow of the Thing.
It's by Dr. Ethan Mollick, who has to be one of the folks that I quote most on this show.
Those of you who have been listening to me for a while will know that I am a big patterns of history kind of guy.
So when I read this line, the idea of the culmination of a first phase of something, I knew that I had to dig in.
So let's read this essay, or at least parts of it, and then we'll come back for a little bit of discussion.
Again, this is The Shape of the Shadow of the Thing, with the subtitle, we can start to see dimly
what the near future of AI looks like. This was written on October 3rd, and Ethan
begins, a lot has happened in the past week or so, so I wanted to write a post taking stock
of where we are. In many ways, I see us reaching the culmination of the first phase of the AI era
that started only 10 months ago with the launch of ChatGPT. It ends with the upcoming launch
of Google's Gemini, the first LLM model likely to beat OpenAI's GPT4.
Now, enough pieces of the jigsaw puzzle are in place that we can start to see what AI
can actually do, at least in the short term.
Many pieces are still missing, though, and this is a temporary state of affairs.
AI continues to improve.
Even more importantly, the actual implications of what this phase of AI will mean for work
and education is currently unknowable.
It is unknowable to all of us who don't have insight into what the AI labs have planned,
but it is also actually unknowable
to them. I guarantee that the people at Google and OpenAI and Microsoft do not know the implications
of AI for your job or your company or your education, or even all the ways in which the systems
they are building will ultimately be used for good or bad. So we can't see the thing that is being
built, or even the shadow it is going to cast over work and education, but we can get a sense
of its general shape. That's what this post is about. With so many updates, there is a lot to discuss,
but the right place to start is with the metaphorical brain. I'm going to be anthropomorphizing
a lot in this post for ease of understanding. Please forgive it. The brain is the core LLM models themselves.
Section. Brains
One thing everyone should pay attention to is the quality of what industry insiders call
frontier models. These are the LLMs that are the most capable, the most intelligent,
the most impressive to work with. There are increasing numbers of other good models that might
be better for some uses, usually because they are cheaper or open source, but the abilities
of the frontier models show us what AI is capable of. The gold standard large language model to
date has been OpenAI's GPT4, which has been deployed in some form for over a year, although
it originally stopped training many months before that. No other AI released has beaten GPT4,
and Google's Bard AI model is notoriously mediocre. That is likely to change in the coming
weeks with the release of Google's Gemini, which all rumors suggest will take the crown of the most
powerful AI model. I suspect, however, that while Gemini will beat GPT4, it will not exceed it by
such a large margin that it takes us to a new phase of AI. Instead, we
are likely to be at a situation where OpenAI, Google, and maybe one or two other players have
very capable models that can out-innovate many humans, boost performance on complex tasks,
and also do everyone's homework. They also come with a lot of knowledge, which is why GPT4,
a general-purpose AI, can beat older, more specialized AIs trained to be good in one area,
like medicine. But these systems have flaws as well, and continue to have problems with hallucinations
and making up facts. So not bad for a year of AI releases, but also not the dreaded slash
hoped for artificial general intelligence that out thinks all humans or achieves sentience.
The future development of AI remains controversial, yet, even if we stopped AI development
today, it would likely be most of a decade before we figured out the full implications of today's
LLMs. That is in part because brains are just the start.
Section. Vision
Image recognition is not new, nor is the ability to create AI images. But when they
are combined with the quote-unquote brains of the LLM, something very different happens. So it is
significant that both Google and Microsoft slash OpenAI have introduced different levels of multimodal
capabilities. That means that they can create and see images and also receive and produce voice.
More on that in a minute. Once you give AIs vision, they gain a new method of interacting with
the world, one that expands their capabilities into industries and uses that many of us have
never considered. It does all the basics, of course, basics like deciphering handwritten treatises
on mummies written in archaic Catalan, or becoming a solid photography coach. But that is just the start.
In fact, researchers at Microsoft have written a dense paper documenting the surprising ways these
vision systems can be used, though they do not provide any statistics on how often it works
successfully, and these models are still flawed. Among the more interesting is the ability of
the AI to read an operating manual to learn how to use a machine, write an insurance report,
perform medical diagnoses, do manufacturing analysis, and even pilot a robot. All of these
applications used to require expensive and highly specialized vision systems. Now the frontier models can do
them all. Though again, we do not know how to best prompt them or how accurate they are. But giving
AI vision also lets it do things that might be more double-edged swords. Out of the box, it is extraordinarily
good at facial recognition even without training, and it can accurately assess the expressions on people's
faces, the location in which they are, and the context in which they are acting. For example,
without location information beyond a sign with half-obscured text, GPT4 was able to guess the location
of a trip to Hershey Amusement Park, track who is in which picture, figure out the context under which
they were acting, and make inferences about the sequence of events and activities. This is both really
exciting and offers the potential for misuse. OpenAI and Microsoft have both put guardrails around how
these products can be used: they won't tell you the names of people or assess them in detail,
and they refuse to solve CAPTCHAs. But these tools will soon become broadly available,
and of course, people are finding ways around these limits by convincing the AI to break its own
rules. For example, someone was able to convince the AI to solve a CAPTCHA by fooling it
into believing that it was an inscription on his mother's locket. And by hooking up LLMs with vision
and research capabilities to image creation models, AIs can actually start to design their own
images, beyond just prompting image generators. For example, I asked Bing, which uses GPT4 with
vision in creative mode, create a Nike ad in the style of a 1920s poster, make sure to really do
research in advance and translate the elements from a modern ad to that period giving lots of detail.
Bing looked up 1920s art styles and Nike ads and then decided to use the prompt, an illustration of a
person wearing Nike shoes and clothing, doing some sport activity. The person is drawn in a geometric
and abstract style with bright colors and sharp angles. The illustration is inspired by the works
of Tamara de Lempicka, a famous art deco painter who depicted modern and elegant figures.
The person is shown running on the track with the city skyline in the background. The illustration
covers most of the poster space to create a dynamic and eye-catching effect. While the results
aren't perfect, you can start to see how being good at prompting AI image generators
is going to be much less important when the AI generates its own images.
But we can go further starting a cycle of self-improvement.
Bing, here is the image you created. Can you critique and improve it?
Without further prompting, it decided that the picture needed a slogan,
so it regenerated an image with those elements.
Section. Voice
It may not seem like a big deal compared to vision,
but AIs are also gaining the ability to both listen and speak.
If you are used to yelling at Siri or Alexa,
these new AI-powered systems are going to be a big change.
They can understand accents, mixes of languages, and are not bothered by crowded, noisy rooms.
Combine that with the brains of the LLM and you can start to do interesting things.
For example, in my entrepreneurship class, students not only pitch to real venture capitalists,
but also to AI with the instructions, you are a seed stage venture capitalist who evaluates
startup pitches, evaluate the following pitch from that perspective and offer four positives
and negatives, as well as what you think about the pitch overall as an investor.
The VC in the room was impressed by the results.
So were most of the students when I surveyed them.
Everyone considered the results to be either somewhat or very realistic,
55% of students rated the feedback as very useful,
and 35% as somewhat useful,
95% reported either minor or no hallucinations.
But adding voice output turned out to be a bigger deal than I thought.
Talking with an AI is an oddly personal experience,
even though you know you are talking to a machine.
It feels like there is a real human interested in what you have to say.
This is all an illusion, but it is a convincing enough one
that, even with today's LLMs, I can see people looking forward to talking to their AI
companions. There is a lot of debate over whether this will be a good thing or a bad thing,
but I would suggest downloading Pi and trying it out for yourself for free. That same
capability will be standard on other LLMs soon.
Section. Connection
One of the limits of AI right now is that they don't quote-unquote know anything. They
are trained on a massive amount of data, which they can imperfectly recreate in response to prompts,
leading to hallucinations and errors. As you add tools to the AI that give
context and connections to other sources of data, their usefulness increases. One way to do this is connecting
the AIs to the internet so they can look up information. A potentially more powerful technique is to connect
it to your own data. Google is doing exactly that, connecting Bard to its other services like Gmail.
Since Bard is currently underpowered, though again, I expect that to change, I would not trust the results
yet. It hallucinates details, including making up messages that don't exist. But with lower
hallucination rates and human supervision, these sorts of connections can become very powerful.
Even the flawed Bard was able to identify urgent tasks in my email and draft potential replies.
As AIs learn more about you, their usefulness will go up, though the full implications of AIs
that make complex inferences about you is currently unclear.
Section. The Shape of the Shadow
We have these pieces which let us guess at the shape of the AI in front of us.
It isn't science fiction to assume that AIs will soon talk to you, see you, know about you,
do research for you, create images for you, because all of that is already built and working.
I can already pull all of these elements together myself with just a little effort.
That means AI can quite easily serve as personal assistant, intern, and companion,
answering emails, giving advice, paying attention to the world around you
in a way that makes the Siris and Alexas of the world look prehistoric.
It also suggests unexpected corporate and government uses, as AI with these capabilities
can act as a ubiquitous helpful coach or troubling panopticon by observing and listening,
intervening with advice or instructions.
In many ways, what happens next, the
actual thing that all of this becomes in the near term, depends on our agency and decisions.
It is not going to be imposed on us by machines, at least with the current generation of LLMs.
With these new capabilities, AI can either serve to empower and simplify, fill out my expense
reports, I am nervous about responding to this email, please help me, I don't understand
this confusing form should I sign it, or to remove power, who needs a human companion when you
have an AI? What happens when everyone has a perfect facial tracking system?
Some of these consequences are knowable and need regulation and responsible action by individuals,
and some is going to fall unevenly across industries and societies.
It is up to us to figure out how to use this technology to empower and uplift rather than harm.
All right, so back to NLW here.
Obviously, I decided that when I got into it, it was better presented as an entire piece.
So thanks to Ethan once again for writing a very thought-provoking and useful reflection.
I want to discuss this idea of the phase of AI or generative AI that we are in.
Ethan argues that we are reaching a crescendo of a period that started with ChatGPT.
And I think that's a useful way of looking at things.
I don't think it's particularly arguable that ChatGPT was the starting gun,
at least when it came to the recognition of generative AI for the general public.
Last November, this thing exploded onto the mainstream.
It captured everyone's attention and has barely let go at all ever since.
Now, within this, I do think that we saw a peak hype cycle end around April, May, June, and some calming over the summer.
But I think obviously the fall has come roaring back with more advances and product updates that have put AI more at the center of our lives than even before.
I think it's an interesting heuristic to use the leading frontier model power as the sort of benchmark for an era.
The era that we've been living through then is very much the GPT3.5 to GPT4 era.
Now, as I've mentioned before, I think part of what will make this an especially discrete era
is that I don't think that we're going to see much more advanced models released
without there being some new policy regime in Washington, D.C., that has something to say about
how models that are more powerful than these LLMs that we currently have today get released.
However, it is interesting to look at how, to some extent, the capabilities of GPT4
created a magnet for everything to organize itself around.
Open source started racing to see how close it could come to those capacities, and of course, a lot of the advances that we've had over the last few months are less about the power of the underlying models, and more, to Ethan's point, these things which expand the way that they can interact with the world and bring them closer into approximation of how we interact with the world.
I am sadly one of those folks who is still waiting for my ChatGPT with vision, but those who have used it tend to clearly describe it, even if they wouldn't use these terms, as a change not in scale but in kind, a fundamentally different,
more complete and more robust experience that once again shifts the paradigm of how people think
about how they can use these tools. My base case for what happens next is that everything converges
around this GPT4 or maybe GPT4.5 kind of level. Multimodality comes to all of the closed models
and open source versions inch ever closer. The most interesting question to me will be who will be
the first to move to go farther, to truly get to GPT5 level, and what will that do to our public
conversation about the capacities of these tools and the obligations of the people who are building them.
It feels to me that there is something of a standoff, a feeling of brinksmanship in some ways, at this
cusp of things more powerful than GPT4. I feel energy building up around that parabola,
and the question is just when it's going to spill over. But anyways, there's a lot that we
could talk about there. As you could see, a super interesting essay. Thanks once again to Ethan for writing
it. If you want to subscribe to Ethan's writings, go find them at One Useful Thing, one spelled out.
And until next time, peace.
