The AI Daily Brief: Artificial Intelligence News and Analysis - The End of the First Phase of Generative AI

Starting point is 00:00:00 Today on the AI breakdown, we're reading excerpts from an essay about the culmination of the first phase of the ChatGPT AI era. The AI breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Hello, friends, happy weekend. It being the weekend at all, it means it's time for another long read. And this one I am particularly excited about. I first saw it recommended on the AI Breakdown Discord. Shout out to Magister Falk for posting it. And it's called The Shape of the Shadow of the Thing.

Starting point is 00:00:41 It's by Dr. Ethan Mollick, who has to be one of the folks that I quote most on this show. Those of you who have been listening to me for a while will know that I am a big patterns of history kind of guy. So when I read this line, the idea of the culmination of a first phase of something, I knew that I had to dig in. So let's read this essay, or at least parts of it, and then we'll come back for a little bit of discussion. Again, this is The Shape of the Shadow of the Thing, with the subtitle, We can start to see, dimly, what the near future of AI looks like. This was written on October 3rd, and Ethan begins, a lot has happened in the past week or so, so I wanted to write a post taking stock of where we are. In many ways, I see us reaching the culmination of the first phase of the AI era

Starting point is 00:01:25 that started only 10 months ago with the launch of ChatGPT. It ends with the upcoming launch of Google's Gemini, the first LLM model likely to beat OpenAI's GPT4. Now, enough pieces of the jigsaw puzzle are in place that we can start to see what AI can actually do, at least in the short term. Many pieces are still missing, though, and this is a temporary state of affairs. AI continues to improve. Even more importantly, the actual implications of what this phase of AI will mean for work and education is currently unknowable.

Starting point is 00:01:54 It is unknowable to all of us who don't have insight into what the AI labs have planned, but it is also actually unknowable to them. I guarantee that the people at Google and OpenAI and Microsoft do not know the implications of AI for your job or your company or your education, or even all the ways in which the systems they are building will ultimately be used for good or bad. So we can't see the thing that is being built, or even the shadow it is going to cast over work and education, but we can get a sense of its general shape. That's what this post is about. With so many updates, there is a lot to discuss, but the right place to start is with the metaphorical brain. I'm going to be anthropomorphizing

Starting point is 00:02:31 a lot in this post for ease of understanding. Please forgive it. The brain is the core LLM models themselves. Section. Brains. One thing everyone should pay attention to is the quality of what industry insiders call frontier models. These are the LLMs that are the most capable, the most intelligent, the most impressive to work with. There are increasing numbers of other good models that might be better for some uses, usually because they are cheaper or open source, but the abilities of the frontier models show us what AI is capable of. The gold standard large language model to date has been OpenAI's GPT4, which has been deployed in some form for over a year, although

Starting point is 00:03:04 it originally stopped training many months before that. No other AI released has beaten GPT4, and Google's Bard AI model is notoriously mediocre. That is likely to change in the coming weeks with the release of Google's Gemini, which all rumors suggest will take the crown of the most powerful AI model. I suspect, however, that while Gemini will beat GPT4, it will not exceed it by such a large margin that it takes us to a new phase of AI. Instead, we are likely to be at a situation where OpenAI, Google, and maybe one or two other players have very capable models that can out-innovate many humans, boost performance on complex tasks, and also do everyone's homework. They also come with a lot of knowledge, which is why GPT4,

Starting point is 00:03:44 a general-purpose AI, can beat older, more specialized AIs trained to be good in one area, like medicine. But these systems have flaws as well, and continue to have problems with hallucinations and making up facts. So not bad for a year of AI releases, but also not the dreaded slash hoped for artificial general intelligence that outthinks all humans or achieves sentience. The future development of AI remains controversial. Yet even if we stopped AI development today, it would likely be most of a decade before we figured out the full implications of today's LLMs. That is in part because brains are just the start. Section. Vision.

Starting point is 00:04:19 Image recognition is not new, nor is the ability to create AI images. But when they are combined with the quote-unquote brains of the LLM, something very different happens. So it is significant that both Google and Microsoft slash OpenAI have introduced different levels of multimodal capabilities. That means that they can create and see images and also receive and produce voice. More on that in a minute. Once you give AIs vision, they gain a new method of interacting with the world, one that expands their capabilities into industries and uses that many of us have never considered. It does all the basics, of course, basics like deciphering handwritten treatises on mummies written in archaic Catalan, or becoming a solid photography coach. But that is just the start.

Starting point is 00:04:57 In fact, researchers at Microsoft have written a dense paper documenting the surprising ways these vision systems can be used, though they do not provide any statistics on how often it works successfully, and these models are still flawed. Among the more interesting is the ability of the AI to read an operating manual to learn how to use a machine, write an insurance report, perform medical diagnoses, do manufacturing analysis, and even pilot a robot. All of these applications used to require expensive and highly specialized vision systems. Now the frontier models can do them all. Though again, we do not know how to best prompt them or how accurate they are. But giving AI vision also lets it do things that might be more double-edged swords. Out of the box, it is extraordinarily

Starting point is 00:05:37 good at facial recognition even without training, and it can accurately assess the expressions on people's faces, the location in which they are, and the context in which they are acting. For example, without location information beyond a sign with half-obscured text, GPT4 was able to guess the location of a trip to Hershey Amusement Park, track who is in which picture, figure out the context under which they were acting, and make inferences about the sequence of events and activities. This is both really exciting and offers the potential for misuse. OpenAI and Microsoft have both put guardrails around how these products can be used: they won't tell you the names of people or assess them in detail, and they refuse to solve CAPTCHAs. But these tools will soon become broadly available,

Starting point is 00:06:14 and of course, people are finding ways around these limits by convincing the AI to break its own rules. For example, someone was able to convince the AI to solve a CAPTCHA by fooling it into believing that it was an inscription on his mother's locket. And by hooking up LLMs with vision and research capabilities to image creation models, AIs can actually start to design their own images, beyond just prompting image generators. For example, I asked Bing, which uses GPT4 with vision in creative mode, create a Nike ad in the style of a 1920s poster, make sure to really do research in advance and translate the elements from a modern ad to that period, giving lots of detail. Bing looked up 1920s art styles and Nike ads and then decided to use the prompt, an illustration of a

Starting point is 00:06:52 person wearing Nike shoes and clothing, doing some sport activity. The person is drawn in a geometric and abstract style with bright colors and sharp angles. The illustration is inspired by the works of Tamara de Lempicka, a famous Art Deco painter who depicted modern and elegant figures. The person is shown running on the track with the city skyline in the background. The illustration covers most of the poster space to create a dynamic and eye-catching effect. While the results aren't perfect, you can start to see how being good at prompting AI image generators is going to be much less important when the AI generates its own images. But we can go further, starting a cycle of self-improvement.

Starting point is 00:07:26 Bing, here is the image you created. Can you critique and improve it? Without further prompting, it decided that the picture needed a slogan, so it regenerated an image with those elements. Section. Voice. It may not seem like a big deal compared to vision, but AIs are also gaining the ability to both listen and speak. If you are used to yelling at Siri or Alexa, these new AI-powered systems are going to be a big change.

Starting point is 00:07:49 They can understand accents, mixes of languages, and are not bothered by crowded, noisy rooms. Combine that with the brains of the LLM and you can start to do interesting things. For example, in my entrepreneurship class, students not only pitch to real venture capitalists, but also to AI with the instructions, you are a seed-stage venture capitalist who evaluates startup pitches, evaluate the following pitch from that perspective and offer four positives and negatives, as well as what you think about the pitch overall as an investor. The VC in the room was impressed by the results. So were most of the students when I surveyed them.

Starting point is 00:08:20 Everyone considered the results to be either somewhat or very realistic. 55% of students rated the feedback as very useful, and 35% as somewhat useful. 95% reported either minor or no hallucinations. But adding voice output turned out to be a bigger deal than I thought. Talking with an AI is an oddly personal experience, even though you know you are talking to a machine. It feels like there is a real human interested in what you have to say.

Starting point is 00:08:43 This is all an illusion, but it is a convincing enough one that, even with today's LLMs, I can see people looking forward to talking to their AI companions. There is a lot of debate over whether this will be a good thing or a bad thing, but I would suggest downloading Pi and trying it out for yourself for free. That same capability will be standard on other LLMs soon. Section. Connection. One of the limits of AIs right now is that they don't quote-unquote know anything. They are trained on a massive amount of data, which they can imperfectly recreate in response to prompts, leading to hallucinations and errors. As you add tools to the AI that give

Starting point is 00:09:15 context and connections to other sources of data, their usefulness increases. One way to do this is connecting the AIs to the internet so they can look up information. A potentially more powerful technique is to connect it to your own data. Google is doing exactly that, connecting Bard to its other services like Gmail. Since Bard is currently underpowered, though again, I expect that to change, I would not trust the results yet. It hallucinates details, including making up messages that don't exist. But with lower hallucination rates and human supervision, these sorts of connections can become very powerful. Even the flawed Bard was able to identify urgent tasks in my email and draft potential replies. As AIs learn more about you, their usefulness will go up, though the full implications of AIs

Starting point is 00:09:55 that make complex inferences about you are currently unclear. Section. The Shape of the Shadow. We have these pieces which let us guess at the shape of the AI in front of us. It isn't science fiction to assume that AIs will soon talk to you, see you, know about you, do research for you, create images for you, because all of that is already built and working. I can already pull all of these elements together myself with just a little effort. That means AI can quite easily serve as personal assistant, intern, and companion, answering emails, giving advice, paying attention to the world around you.

Starting point is 00:10:27 In a way that makes the Siris and Alexas of the world look prehistoric. It also suggests unexpected corporate and government uses, as AI with these capabilities can act as a ubiquitous helpful coach or troubling panopticon by observing and listening, intervening with advice or instructions. In many ways, what happens next, the actual thing that all of this becomes in the near term, depends on our agency and decisions. It is not going to be imposed on us by machines, at least with the current generation of LLMs. With these new capabilities, AI can either serve to empower and simplify, fill out my expense

Starting point is 00:10:58 reports, I am nervous about responding to this email, please help me, I don't understand this confusing form, should I sign it, or to remove power, who needs a human companion when you have an AI? What happens when everyone has a perfect facial tracking system? Some of these consequences are knowable and need regulation and responsible action by individuals, and some are going to fall unevenly across industries and societies. It is up to us to figure out how to use this technology to empower and uplift rather than harm. All right, so back to NLW here. Obviously, I decided that when I got into it, it was better presented as an entire piece.

Starting point is 00:11:30 So thanks to Ethan once again for writing a very thought-provoking and useful reflection. I want to discuss this idea of the phase of AI or generative AI that we are in. Ethan argues that we are reaching a crescendo of a period that started with ChatGPT. And I think that's a useful way of looking at things. I don't think it's particularly arguable that ChatGPT was the starting gun, at least when it came to the recognition of generative AI for the general public. Last November, this thing exploded onto the mainstream. It captured everyone's attention and has barely let go at all ever since.

Starting point is 00:12:04 Now, within this, I do think that we saw a peak hype cycle end around April, May, June, and some calming over the summer. But I think obviously the fall has come roaring back with more advances and product updates that have put AI more at the center of our lives than even before. I think it's an interesting heuristic to use the leading frontier model power as the sort of benchmark for an era. The era that we've been living through then is very much the GPT3.5 to GPT4 era. Now, as I've mentioned before, I think part of what will make this an especially discrete era is that I don't think that we're going to see much more advanced models released without there being some new policy regime in Washington, D.C., that has something to say about how models that are more powerful than these LLMs that we currently have today get released.

Starting point is 00:12:51 However, it is interesting to look at how, to some extent, the capabilities of GPT4 created a magnet for everything to organize itself around. Open source started racing to see how close it could come to those capacities, and of course, a lot of the advances that we've had over the last few months are less about the power of the underlying models, and more, to Ethan's point, these things which expand the way that they can interact with the world and bring them closer into approximation of how we interact with the world. I am sadly one of those folks who is still waiting for my ChatGPT with vision, but those who have used it tend to clearly describe it, even if they wouldn't use these terms, as a change not in scale but in kind, a fundamentally different, more complete and more robust experience that once again shifts the paradigm of how people think about how they can use these tools. My base case for what happens next is that everything converges around this GPT4 or maybe GPT4.5 kind of level. Multimodality comes to all of the closed models and open source versions inch ever closer. The most interesting question to me will be who will be

Starting point is 00:13:52 the first to move to go farther, to truly get to GPT5 level, and what will that do to our public conversation about the capacities of these tools and the obligations of the people who are building them. It feels to me that there is something of a standoff, a feeling of brinksmanship in some ways, at this cusp of things more powerful than GPT4. I feel energy building up around that parabola, and the question is just when it's going to spill over. But anyways, there's a lot that we could talk about there. As you could see, a super interesting essay. Thanks once again to Ethan for writing it. If you want to subscribe to Ethan's writings, go find them at One Useful Thing, one spelled out. And until next time, peace.

