Deep Questions with Cal Newport - Is AI About to “Eat Everything”? | AI Reality Check

Starting point is 00:00:00 Last week, the AI safety and evaluation organization METR, that's M-E-T-R, released a new update on their famous AI Time Horizon chart. Look, I'm going to load it on the screen here for people who are watching. And when you zoom in, you can see these points on this chart, starting around 2025, begin to go up. And then when we get to 2026, they go way up, and then in the last update they go way up again. Now, this graph looks scary.

Starting point is 00:00:34 Even if you don't know what it means, it does create a strong sense of digital ick. And as you can imagine, the internet jumped into action to try to amplify that uneasy feeling. Now, in a recent essay posted to his newsletter, Gary Marcus did a good job of rounding up some of the more, shall we say,

Starting point is 00:00:53 concerned responses to this latest update to METR's graph. Let me show you a couple here. Here's one, a tweet that said, AI power is doubling every 103 days now. It's going to eat everything. Nothing will be spared. We are on the threshold of truly ergodic alien intelligences

Starting point is 00:01:14 in which human input will be nothing but a liability. All right. Here's another example that Gary pointed out. The tweet simply says, tick tock. It has an expertly drawn graph that shows highest intelligence on earth by time. And you see there's a point where it goes up, up, crosses a tripwire, and then shoots straight up, where human brains become smart enough to create ASI, which is artificial superintelligence.

Starting point is 00:01:40 Then below it is a version of that time horizon graph. And they're like, look, doesn't that look similar? The line goes up. The line goes up. So I guess we're about to have artificial superintelligence conquer the world. Now, there are many more tweets out there in response to this time horizon update. They all give you the same sense that this METR chart is capturing an intelligence explosion that, A, we're not ready for, B, that will change everything, and C, that vindicates

Starting point is 00:02:08 every bold or crazy thing anyone has ever claimed about AI and its capabilities. But is this right? Well, it's Thursday, which means it's time for an AI reality check episode of this show, which seems like a perfect time to look closer at what exactly the METR time horizon chart is showing and what exactly that means. As always, I'm Cal Newport, and this is Deep Questions, the show for people seeking depth in a distracted world. All right, so the first question we want to ask here is what is it exactly that the METR time horizon chart is actually showing? All right. So I spent time reading about it.

Starting point is 00:02:57 The good news is meter actually is very transparent. They published very detailed collections of notes describing their methodology and what goes into their chart. So it was actually quite a pleasure to get answers to this question. So what are they actually showing on this chart? Well, here's what they did. They came up with a collection of what they call software tasks. These are well-defined challenges that you can solve by writing and or analyzing computer code. All right.

Starting point is 00:03:23 Then for each of these tasks, they went out and asked a collection of human programmers to go do the task. Hey, go do this. I believe the instruction was as quickly as you can. And then they asked them, how long did it take you to complete this task? They would then take the geometric mean, so they would average those answers. And whatever the mean was, whatever the average was, is the human time duration that they would label that task with. So if, you know, on average it took people two hours to complete a given task, they would say this is a two-hour task. They then said, let's evaluate different large language model-based tools on these tasks.
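To make the labeling step concrete, here is a minimal sketch of a geometric mean over human completion times. The function name and the four sample times are hypothetical, chosen only for illustration; they are not taken from METR's data.

```python
import math

def geometric_mean(durations_minutes):
    """Geometric mean of positive durations: exp of the average of the logs."""
    if not durations_minutes or any(d <= 0 for d in durations_minutes):
        raise ValueError("need at least one duration, and all must be positive")
    log_sum = sum(math.log(d) for d in durations_minutes)
    return math.exp(log_sum / len(durations_minutes))

# Hypothetical completion times (in minutes) from four human testers on one task.
human_times = [60, 120, 120, 240]

task_label = geometric_mean(human_times)            # the duration the task is labeled with
arithmetic = sum(human_times) / len(human_times)    # ordinary average, for comparison

print(f"geometric mean: {task_label:.1f} min, arithmetic mean: {arithmetic:.1f} min")
```

With these made-up numbers the geometric mean comes out lower than the ordinary average, because it is less sensitive to one tester who took much longer than the rest.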

Starting point is 00:04:03 Now, of course, a given large language model can't do anything except spit out tokens. So they would take each large language model and combine it with what they call a scaffold, but what we would today also call a coding harness. So a program that can call the LLM to try to solve programming challenges. It's like Claude Code or Cursor or Codex. These are all coding harnesses. So the coding harness, when you give it the problem, for example, will query an LLM and say, give me a plan for tackling this problem.

Starting point is 00:04:33 The coding harness will then go step by step to that plan. It will call the LLM when it needs code generated. But the harness could also do a lot of stuff on its own, like do checks. It knows how to interact with various software tools that are relevant for creating software. It can go back and verify, hey, did this step really work?
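The plan, generate, verify loop described here can be sketched roughly as follows. This is an illustrative toy, not how Claude Code, Cursor, or Codex are actually built: `run_harness`, `fake_llm`, and `fake_verify` are hypothetical names, and the "model" is a stand-in function that returns canned text.

```python
from typing import Callable, List

def run_harness(task: str,
                llm: Callable[[str], str],
                verify: Callable[[str, str], bool],
                max_retries: int = 2) -> List[str]:
    """Ask the model for a plan, then generate and check code one step at a time."""
    plan = [s.strip() for s in llm(f"PLAN: {task}").split("\n") if s.strip()]
    results = []
    for step in plan:
        for _ in range(max_retries + 1):
            code = llm(f"CODE: {step}")   # the harness calls the LLM when it needs code
            if verify(step, code):        # the harness itself checks whether the step worked
                results.append(code)
                break
        else:
            # Every attempt failed verification: the harness is stuck on this step.
            raise RuntimeError(f"stuck on step: {step}")
    return results

# Stand-in "model" (hypothetical): returns a canned plan or a placeholder snippet.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("PLAN:"):
        return "read the failing test\nfix the bug\nrerun the tests"
    return f"# code for: {prompt[len('CODE: '):]}"

# Stand-in checker (hypothetical): a real harness would run tests or tools here.
def fake_verify(step: str, code: str) -> bool:
    return step in code

outputs = run_harness("fix bugs in a small Python library", fake_llm, fake_verify)
print(outputs)
```

The point of the sketch is the division of labor: the model only produces text, while ordinary hand-written control flow decides what to ask for next and whether the result is acceptable.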

Starting point is 00:04:50 The coding harnesses have gotten pretty complicated. More on that later. So they'll take an LLM with a coding harness, and one by one, they'll ask it to do each of these tasks. And they actually have it do each of these tasks six times. And if it completes the task at least half the time, they say, okay, this model plus this harness can tackle that task. And they keep going until it gets stuck.

Starting point is 00:05:13 So they say, what was the longest-duration task that this LLM plus a coding harness was able to complete at least 50% of the time, and that's what they plot. So let's go back now to this plot to make that a little bit more clear. It's a little bit confusing sometimes. So we see on the Y axis here the duration that various tasks are labeled with, right? So fix bugs in a small Python library was labeled a little bit more than an hour. That's how long it took the humans who did it, who completed the task to actually finish it.

Starting point is 00:05:41 Exploit a buffer overflow. That took a little bit more than two hours. etc. Okay, so let's zoom back out here. We've got the chart is trying to, look this. The technology, with all the AI technology we have in the world, the biggest problem we're having is the chart itself isn't loading. Let me just do a quick reload here.

Starting point is 00:05:59 It's ironic, I think, Jesse. All right. There we go. Let's go back to a linear scale. So then what they're plotting for each dot here is actually the name of a model, right? So like over here is Claude Opus 4.5. And where they're plotting the point for Claude Opus 4.5 is to correspond to the longest

Starting point is 00:06:22 duration task it was able to complete. So if we click on that, we find out that four hours and 53 minutes was the length of the longest task that it was able to successfully complete at least 50% of the time. Right. So they're plotting each model against what's the longest time duration task that it was able to complete successfully at least 50% of the time.

Starting point is 00:06:47 And the X-axis is time. So they're taking each model to the time in which it was released and then plotting it at how long of a duration task was it able to actually complete. All right. So it's a little bit confusing. If we zoom out, it's just line goes up. I guess that's scary. But this is what's really going on here, is they're trying to capture, are these models

Starting point is 00:07:10 able to tackle tasks that require more and more human time to complete as we move on to more and more advanced models? And that is, in fact, what they're actually showing, where we start to get this speed up around 2025 that then really picks up around 2026. All right? So that's what's on the plot. You can also look at 80% success

Starting point is 00:07:32 where they plot each model at the duration that it could complete 80% of the time. This curve looks similar, but if we look at the y-axis, we see it's much smaller. So if you need it to be successful 80% of the time, the very best model, Claude Mythos Preview, is able to complete a task that took humans roughly three hours, whereas if 50% of the time is enough, it's completing a task that takes 16 hours. So it does make a difference how successful you need it to be. All right.
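As a rough sketch of how the success threshold changes the plotted number, the snippet below picks the longest task a model passes at a given success rate, following the simplified description in this episode. The function `time_horizon` and every number in `runs` are hypothetical; METR's published methodology is more detailed than this and may compute the horizon differently.

```python
def time_horizon(results, threshold):
    """Longest human-labeled duration (minutes) among tasks passed at >= threshold.

    results: list of (human_minutes, successes, attempts), one entry per task.
    Returns 0 if no task meets the threshold.
    """
    passed = [minutes
              for minutes, successes, attempts in results
              if attempts > 0 and successes / attempts >= threshold]
    return max(passed, default=0)

# Hypothetical results for one model plus harness: six attempts per task.
runs = [
    (15, 6, 6),     # 15-minute task, solved 6 of 6 times
    (60, 5, 6),
    (180, 5, 6),    # 3-hour task
    (480, 4, 6),
    (960, 3, 6),    # 16-hour task, solved exactly half the time
    (1920, 1, 6),
]

h50 = time_horizon(runs, 0.5)
h80 = time_horizon(runs, 0.8)
print(f"50% horizon: {h50 / 60:.0f} h, 80% horizon: {h80 / 60:.0f} h")
```

The same made-up results give a very different headline number depending on whether half the time or 80% of the time counts as success, which is the gap between the two curves described above.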

Starting point is 00:08:04 So that's what's on this chart. Hopefully that makes sense. A little confusing. Hopefully that makes sense. What is that actually capturing? So now that we know what is on this plot, what does it actually mean? Well, first of all, now that we know this, here's our first observation.

Starting point is 00:08:20 Meter is not measuring the general capability of these LLM models, right? They're looking at a specific suite of programming tasks. So it's not the case. It's a mistake a lot of people were making. When you see like Opus 4.6 labeled with 12 hours on this plot, that doesn't mean that Opus 4.6 can now do whatever would take a human 12 hours to do. No, it means there's a particular software task that required on average 12 hours for human testers to do, that Claude Opus 4.6 can now complete accurately about 50% of the time. Right? So it's not telling us anything about the general capabilities of these models.

Starting point is 00:08:59 It's also not measuring the general programming capabilities of these models. So in other words, the fact that Opus 4.6 is on the Y-axis line for 12 hours doesn't mean that that model with the right coding harness can now successfully do any programming task that would take humans 12 hours. That's another thing I hear often. Well, now our models can do work that used to require 12 hours for people to do. Again, it means there's a particular programming task that when it was given to a collection of humans to complete, it took them 12 hours that this model can now complete on its own.

Starting point is 00:09:35 We don't know how long it takes the model, but it can complete it on its own correctly 50% of the time. So it's really measuring a specific collection of software tasks. Now, what about these durations themselves, though? I mean, are they meaningful? Like, what does it mean that a task took the humans they tested 12 hours? Like, what meaning do we get out of that? And here we have to be careful. It's not really clear what the specific number means.

Starting point is 00:10:01 Like, METR acknowledges in their notes that go along with this study that it's kind of hard to put a precise meaning on this, right? Because when you say this task, this particular task, took humans 12 hours, what does that mean? What were those humans doing for those 12 hours? And they're clear about this. Like, well, it's not clear. Like, it could be they're spending this time looking up what the task even meant, trying to learn the techniques needed to do that task, right? Maybe they're having to learn a new programming language or, you know, they've never done something like this before and they're on the internet for six hours trying to learn what it is.

Starting point is 00:10:39 We don't know what this time is actually being spent on. And this is what METR says. I'm going to quote from their study here. The time horizon is closer to what a, quote, low-context person, such as a new hire or remote internet contractor can accomplish. An eight-hour time horizon does not mean that AIs can do eight hours of work that a high-context human professional can do as part of their day-to-day job. So they're saying, look, we don't really know what to tell you about these numbers

Starting point is 00:11:04 precisely. It's just that some people, when we gave them this task, It took them a while. So probably the right way to think about what is being measured here is something like a general benchmark for programming capability. And these time durations, you should not get caught up with the particular hours, but just think of these as an abstract measure of difficulty. So I don't know what these low-context human programmers were doing, but if this task

Starting point is 00:11:32 took them twice as long as this task, then maybe that's a twice as hard task. So we don't know exactly what this abstract scale tells us, but it's like a good general way of capturing the hardness of programming task, which is smart, right? It's a good way to do a benchmark. We took a bunch of programming problems. We found some way to measure how hard they are. And what we're asking is how far can this new model get in these tasks?

Starting point is 00:11:58 How hard of a task can this new model actually complete? And so if we see this model can complete a harder task than this model, we're like, oh, that model is better at programming tasks. That's the right way probably to think about what METR is doing. I think it's a well-designed benchmark of how capable models combined with harnesses are at various programming tasks. And that's the right way to read those numbers. More like abstract difficulty numbers, the actual times aren't that meaningful. All right.

Starting point is 00:12:30 Third question, how are these models getting better? Why are we seeing these jumps? Well, I want to jump back into the chart here because the timing here is really important and it connects to some important things you need to know about how LLM-based coding models work and what has happened in the last two years. So if we look at this chart, I'll bring it back up here. Notice it's flat for a long time. From GPT2 all the way to this point right here, basically they can't do anything, right?

Starting point is 00:12:59 They can't complete any meaningful, interesting task as part of this coding suite. Now, there's a reason for this, because up until this first point right here, which is like Claude Sonnet 3.5, and we get o1-preview. This is where we first start to see a rise to, you know, hey, there are some of these tasks we can complete. Up to that point, what was happening with LLMs? The focus was on pre-training. Pre-training is that long, expensive period where you use real text written by humans and you have the model try to guess missing tokens.

Starting point is 00:13:31 This is where all of the primary smarts and intelligence and capabilities of language models come from. From GPT-2 through the attempt to make GPT-4.5, we were just trying to make pre-training longer, more data, train it longer, and wanted to make the general capability of these models better. We didn't need benchmarks like this so much to know that three was better than two or four was better than three because we could just demonstrably see without much work all these new things the models could do that the last ones couldn't. As I wrote about in the New Yorker last August, they hit a wall in the summer of 2024 where it became clear.

Starting point is 00:14:07 OpenAI discovered this first. Other companies had the same realization over the next year or so. It became clear that simply scaling up the pre-training, the quality of data, the quantity of data, and how long they train, was not giving obvious new leaps in the capabilities of these models. So it created a shift in how they thought about improving these models that we really began to see in the fall of 2024. And this shift was towards post-training, where they said, okay, we're going to take a pre-trained model, and now we're going to get very particular narrow data sets where we have like prompts and correct answers, and using complicated techniques based on reinforcement learning, we're going to tune that model to use the intelligence it already has to get better

Starting point is 00:14:51 responses for very narrow types of problems. So we're now going to start focusing on particular problems where we have really good right-and-wrong answer data. And we're going to start tuning these pre-trained models to try to do better in these particular areas. And so they surveyed the landscape of particular areas in which they could tune these models to do better. And one area that seemed really clear was computer programming. Programming languages are highly structured text. It's actually easier for a language model to deal with producing computer code, even than it is with English language. So we knew from the very beginning of these large language models that they're very good at producing code. It was just

Starting point is 00:15:29 hard to prompt them to produce exactly the right code you needed and to be sort of consistent about it. Starting in that fall of 2024, when we began tuning these models, they started to become better at not just like producing one small bit of code, but being able to produce longer, more coherent pieces of code. This is where we begin to get, we see here like o1 and Sonnet 3.5. We get these early reasoning models, which were models where they said, okay, after you've been pre-trained, we're going to tune you to try to give longer answers, to sort of think out loud, because we generate answers auto-regressively, so we always look at what we've output so far before producing the next token.
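A toy sketch of what "auto-regressively" means: each new token is chosen after looking at everything produced so far. The `generate` loop and the counting rule in `toy_next` are hypothetical stand-ins for illustration; a real language model replaces `toy_next` with a learned probability distribution over tokens.

```python
def generate(prompt_tokens, next_token, max_new_tokens=10, stop="<end>"):
    """Autoregressive loop: append one token at a time, feeding all prior output back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)   # the "model" sees the prompt plus everything generated so far
        if tok == stop:
            break
        tokens.append(tok)
    return tokens

# Toy next-token rule (hypothetical): keep counting upward, then stop at 5.
def toy_next(tokens):
    last = int(tokens[-1])
    return "<end>" if last >= 5 else str(last + 1)

out = generate(["1", "2"], toy_next)
print(out)
```

Because earlier output is part of the input for every later step, a model that writes out intermediate steps has more to condition on when it produces the final answer, at the cost of generating more tokens.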

Starting point is 00:16:07 By thinking out loud, we're basically going to have more expensive computation, but be more likely to get better answers. So that began to help planning. So if you asked one of these LLMs to come up with a simple plan for solving a problem, once we went to reasoning, they became a little better. They began tuning it on computer code that actually works. And so the code they produced was starting to have a higher quality as well. So that's where we begin to get this sort of move up the curve on these

Starting point is 00:16:34 programming tasks. Then we start to get these massive jumps. So this real big jump with Opus 4.6 and Claude Mythos that follows it. This really corresponds with the period starting in late 2025, early 2026, when suddenly you saw professional computer programmers begin using agentic coding systems. So the other thing that happened. So we started tuning these things to be better at producing plans and to produce sharper code. We also, and by we, I mean the AI companies, began to work really hard on the coding harnesses. So the programs that you hook up to the LLMs to help make the plan, produce the code, check things, to connect with the various systems that professional programmers work with.

Starting point is 00:17:17 And they began building out these coding harnesses to be better and better at solving and working on the types of software tasks that professional programmers face. And here's the key thing. We know about this because, ironically, the company that produced the new model that could detect all vulnerabilities had a vulnerability, and they leaked the source code for their coding harness, Claude Code. So we know what's actually in the source code for Claude Code. And there is a ton of hand-coded logic, just humans sitting there building this thing, tinkering from scratch: pattern recognition, giant if-then statements. They call all sorts of external tools. There is a lot of old-fashioned 1960s-style AI logic built into this coding harness. They used to call these expert systems: it's the expertise of the computer programmers at Anthropic building the Claude Code coding harness, sitting in there just building it out. How do I build a tool that's as useful as possible for actually producing computer code to solve problems in the way that computer programmers solve it? So it's a mixture of an LLM that can produce better plans and produce sharper code plus a year or a year and a half of working on these monstrous coding harnesses

Starting point is 00:18:27 with all of this hand-coded expert system type logic, just to be good at the very specific types of tasks that computer programmers face. And it's when those things came together, they crossed a certain threshold of utility, that we got the takeoff that we see with like Claude Opus 4.6. Now here's the key.

Starting point is 00:18:44 METR is testing the model plus the best coding harness they can find. So what you're seeing in this graph is not just the fruits of post-training these models that are pre-trained in the same way they were doing in 2024, but you're also seeing the fruits of these incredibly complicated hand-coded, old-fashioned 1960s AI-style coding harnesses that they put on top of it. So a huge amount of energy in the AI industry went towards trying to solve this problem because there's a good market there. How do we build tools to help computer programmers? And then those tools could now handle these much more complicated problems that require many more steps or much more time.

Starting point is 00:19:21 Well, this is exactly the type of thing these coding harnesses plus tuned models were being optimized to do. So they're jumping up this plot because we really started caring about this for the last two years, and in particular the last year or so the industry really focused on this particular problem. Okay. So is that like a bad thing? Is this like a fraud? No, like this is actually very impressive. I think from a technology perspective, there was this long period for a couple years with generative AI where the concern was

Starting point is 00:19:57 what's the killer app here? Like this is really cool. I like asking chatbots things. It's really impressive. But where are we going to make money on this? What can this actually do? And for about two years, that was the question. And they originally thought the answer was going to be,

Starting point is 00:20:13 we'll just pre-train these things until they're AGI and we could just do anything with them. That didn't work. But they pivoted. They pivoted in late 2024 to say, okay, we need to start tuning these for particular uses and building tools around them to try to solve particular problems.

Starting point is 00:20:28 And they said programming, not just like vibe code me, whatever, but like professional quality programming using professional tools over multiple steps. There's a market for that. And they were right. And they really worked hard on this. And it's impressive technology.

Starting point is 00:20:42 These harnesses are impressive technology. The tuned up LLMs are impressive technology. There's a huge amount of data you can use to tune an LLM to be better, in particular at producing compilable code. But really the coding harnesses, I think, are the story of the exponential leap here because they really

Starting point is 00:20:56 figured out, we can't just trust the LLM to come up with the right plans. We can hard code a lot of logic because we know a lot about programming as programmers. And these are really interesting, cool tools. It's still shaking out exactly how they're going to be integrated into software development. But it's a real success story. They found a lane for applying this technology that would have real commercial viability. And it worked, and that's what we're seeing in that chart. But does this mean, like it was being claimed in all those tweets we looked at at the beginning of this episode, that AI is about to, quote, eat everything?

Starting point is 00:21:26 Does that chart going up in the last year with two points mean that we are inevitably on a crash course towards artificial superintelligence in all things we care about and we're going to become inputs to an ergodic alien intelligence? Well, clearly the answer there is no. It is a measure specifically of the efforts of the AI companies to try to build programming tools. It's a measure of programming tools.

Starting point is 00:21:50 And in the last couple of years, they made some great breakthroughs. We have two data points to capture that. I think partially what's going on here is there's two different mental models for thinking about AI improvement. And depending on what mental model you adopt, it really affects the way you end up thinking about things like the METR chart. The first model, which I think is very common and is wrong, is a model I first saw in the Max Tegmark book Life 3.0, which predates all this generative AI stuff. And it's this idea that you can imagine that AI capability is like water. And as AI in general gets better, the water level rises.

Starting point is 00:22:26 And this water level is rising over a sort of like mountainous plain. And the taller the peak is, like the harder the problem. And so as the water rises above a certain peak, it can now solve problems of that difficulty. And then as it rises farther, it can solve problems of that next difficulty. That's the way a lot of people think about AI. And so when you look at the METR chart, it looks like, look, whatever this is measuring, we see the water level going up fast, so the hardness of things AI in general can solve is really improving. Oh my God, it's going so fast.

Starting point is 00:22:58 If we extrapolate that out for another three years, then we're going to have the water above like all the mountains, and AI is going to be able to do everything. That's the wrong mental model for how generative AI-based tools work. A better model is to think about AI progress as a river. And as you go down this river, you see these various openings for tributaries, like little streams coming to the river.

Starting point is 00:23:23 And think about each of these tributaries as a potential application of the AI technology, a particular area where you could build tools on the AI for it to be useful. You don't really know in advance how navigable each of these tributaries are

Starting point is 00:23:37 until you go at it and give it your best effort and you portage over the rapids and see how far you can get like it's Henry Hudson in the 17th century. Some of these tributaries end up, if you really try hard, being very navigable.

Starting point is 00:23:48 I think the computer programming example, the software development example, is a tributary that's ended up to be very navigable. It's like, oh, we found the Hudson River. This thing keeps going. This is really important. But one tributary being navigable doesn't necessarily tell you anything about another unrelated tributary. So maybe we go down this other one over here.

Starting point is 00:24:06 Oh, I'm going to build AI that's going to like handle all my email. We didn't get very far. It's rapids, and then it becomes really shallow and it just kind of disappears. Oh, okay, we'll try another. That is what it is like trying to find applications based off of a technology. You have to explore different tributaries and that requires the building of these custom tools, these harnesses. It's really hard. It took two years of concerted effort with experts at computer programming to build harnesses for computer programming. And you don't know how far

Starting point is 00:24:34 it's going to go before you hit a dead end. So there is no notion. No one says, hey, I got a hundred miles up the Hudson River, so I'm now going to assume any other opening I see is going to be one that I can go 100 miles on. They're different tributaries. They have their own challenges, some are better suited for navigation than others. And that really describes our current moment. And we know this. I'm going to load up a tweet here, but we know this in part because here's a tweet from Ramez Naam.

Starting point is 00:25:02 I saw this from Gary Marcus' newsletter as well. But here we have another index of AI model capabilities called the Epoch Capabilities Index or the ECI, which is not just computer programming, but a bunch of different things. And now, look, here's GPT-4, and then here's the latest ones over here. We have like a linear increase. It's noisy and the jumps here over the last three years are slow and steady, right? These are the same models that on the programming tests were jumping up on an exponential. So it depends what you're trying to do with them. All right. So I think that's an important way to think about this, right? I think an important way

Starting point is 00:25:42 to think about this is what application are we talking about? And I think we should treat these applications like normal technology, which means we can be excited about the things that they can do, but not extrapolate wildly about the things they can't. If you're a software developer, you can be super interested in these tools. They went all in on it, and I think they're making really interesting progress. But if you're not a software developer, that doesn't really matter to you right now. And it doesn't tell you anything about what AI is going to do in your particular corner of the world, if anything. All right, so to wrap this all up.

Starting point is 00:26:14 Let's go back to those hysterical tweets from before. AI is going to eat everything. What's really going on here? Well, partially it's the wrong mental model. It's the water rising instead of exploring the tributaries. I think that's a big part of it. But I also think a lot of these people that are making those really over-the-top tweets, they come out of a community that's known as the transhumanists.

Starting point is 00:26:34 So we talked last week about the rationalist community, and a subclass of the rationalists were the existential risk people, and these really influenced the big AI companies. But there's another group that intersects the rationalists, which is the transhumanists, and they really came out of Ray Kurzweil's work, which looked at exponentials. And it was looking at exponential increases in processing power of computer chips.

Starting point is 00:26:55 They said, well, if we extrapolate this exponential, computers will be so powerful that we'll be able to upload our consciousness into the machines and it will be a utopia. Transhumanists love this idea of following exponentials wherever they find them, extrapolating them out, and then saying, well, if we get all the way out there, life as we know it will literally be changed.

Starting point is 00:27:14 And sometimes they're super utopian and sometimes they're super dystopian, but they get meaning in their life. Their religious cult is one of exponentials delivering transcendence or destruction. It's eschatological, right? I mean, there's heaven, there's hell, and a giant event's going to happen to deliver us one or the other. It's a story that goes back to the very beginning of written stories, and the transhumanists love the story. They move from exponential to exponential. And so when they see something like the METR graph, which has an exponential,

Starting point is 00:27:45 now honestly, it's two points. But okay, an exponential, they say, yes, this is going to deliver our doom or our salvation, because that's the way they want to see the world. So the transhumanists and the existential risk communities, these have become mixed together, and they've been very influential

Starting point is 00:28:01 in the way that the AI companies talk about their technologies. They've been very influential in the internet chatter, and they've caused a lot of anxiety. So I want to end this. You know, and again, I hope I'm being clear here. I'm not trying to be skeptical about the value of programming tools, but I want to be able to talk about just tools like a normal person. Right?

Starting point is 00:28:21 Like, when someone showed me the first useful electric car, we were able to say, oh, that's cool. Like that thing, it drives really nice, it doesn't use any gas, it's simpler, it's not going to break down as much. What a cool technology. We could just say that without being like, I'm extrapolating, and soon cars will be going the speed of light, and we're all going to worship car gods.

Starting point is 00:28:39 It's like, why do we have to go crazy? Why can't we just look at a tool and say, that's really cool? Let's see what happens next. Here's my call now. Here's what I think has to happen. I think the AI companies are now at a size and importance, especially as they're considering going public, that they need to start to distance themselves from these cult-like communities.

Starting point is 00:28:59 I think the major AI companies need to distance themselves from the extreme x-riskers. I think they need to distance themselves from the transhumanists. They've become too big and too important and too influential to be mixed up with these schools of thought that have so many other out-there, exaggerated, or straight-up problematic elements to them. I really do think we're going to see in the next year or so a separation from the AI companies and Dario Amodei and Sam Altman and Elon Musk. We're going to see a separation in the way they talk about their technology to the public, to consumers, to the investors, from the way that these other communities that they were

Starting point is 00:29:37 a part of before and are very influential to them talk about it. I think it's going to be like the modern Republican Party distancing themselves from the conspiratorial John Birch Society in the 1960s. That's what we need to do today. We need a Dario Amodei or Sam Altman to look at these, AI is going to eat everything, the aliens are here, you know, whatever, and say, that's not us. That's kooky. That doesn't represent us. We're trying to build useful tools. Let us tell you how. Over here, we're building this. Here's why it's going to make everyone's life better. Over here, we're trying to build this tool. Here's why I think it's going to be useful. Here we failed, but we're still working on it. We're not destroying the world. AI is not going to eat everything.

Starting point is 00:30:12 You need to stop talking about all this exponential-worshipping cult stuff. You want to be on the internet doing that? You should be in a dark corner of the internet doing it. We're trying to build real tools that we think are going to be useful. We're going to explain why. That's what I think has to happen. It's time to distance the way we think about AI from these particular communities that are freaking everyone the hell out. They're influencing how even CEOs are talking about things. They're moving the stock markets. They're causing large anxiety. And they're often wildly wrong, exaggerated or technically incomplete. So, all right, that's my soapbox plea, but if you want to come away with one message from

Starting point is 00:30:42 the METR chart, how would I summarize it? I would summarize it by saying, in the fall of 2024, we moved from pre-training to post-training, and one of the problems we began post-training for was reasoning and computer programming. And then in 2025, we began working on the harnesses. We got good. And so our last couple of model plus harness combinations have been making leaps in the complexity of things they can solve, which exactly matches what we're

Starting point is 00:31:05 seeing in the software development world, where it wasn't until those models that they were good enough for people to use them. The METR chart is capturing the fact that the AI companies' bet on this very narrow but perhaps financially lucrative and economically useful task is paying off, that these tools are doing well. It says nothing about the fate of humanity or AI more generally or artificial superintelligence or any of these other sorts of x-risk transhumanist fever dreams. All right. So let's leave it there. We'll be back Monday with an advice episode of the show and probably another AI reality check on Thursday, but until then, remember, take AI seriously, but not everything that people say about it.
