Deep Questions with Cal Newport - AI Reality Check: Are LLMs a Dead End?
Episode Date: March 26, 2026
Cal Newport takes a critical look at recent AI news. Video from today's episode: youtube.com/calnewportmedia
SUB QUESTION #1: What is Yann LeCun Up To? [2:55]
SUB QUESTION #2: How is it possible that... LeCun could be right about LLMs being a dead end? We've been hearing non-stop recently about how fast they're advancing. [14:55]
SUB QUESTION #3: What would happen next if LeCun is right? [22:26]
Links: Buy Cal's latest book, "Slow Productivity," at www.calnewport.com/slow
https://www.nytimes.com/2026/03/10/technology/ami-labs-yann-lecun-funding.html
Thanks to Jesse Miller for production and mastering and Nate Mechler for research and newsletter.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
We've been told time and again that the massive large language models trained by companies like OpenAI and Anthropic are poised to utterly transform our world.
We've been told that huge percentages of existing jobs are soon to be automated.
We've been told that skills like writing and photography and filmmaking are all about to be outsourced.
And we've been told that if we're not careful, the systems built on these models might someday soon become sentient and even threaten the existence of the human race.
But here's the thing.
One of the AI pioneers who helped usher in this current age is not convinced.
His name is Yann LeCun, and he has long been arguing that not only will LLM-based AI fail to deliver all these disruptions, but that it is, and I'm quoting him here, a technological dead end.
People have started to listen.
Earlier this month, a syndicate of investors, including Jeff Bezos and Mark Cuban, along with a bunch of different VC firms, raised over a billion dollars to fund LeCun's new startup, Advanced Machine Intelligence Labs, which seeks to build an alternative path to true AI, one that avoids LLMs altogether.
After all of the hype and stress and hand-wringing around LLM-based tools like ChatGPT and Claude Code, is it possible that Yann LeCun was right?
That those specific types of tools won't change everything?
And if so, what's going to come next?
If you've been following AI news recently,
you've probably been asking these questions.
And today, we're going to seek some measured answers.
I'm Cal Newport, and this is the AI reality check.
Okay, so here's the plan.
I've broken down this discussion into three sub-questions.
Sub-question number one,
what exactly is Yann LeCun up to, and how does it differ from what the existing major AI companies are doing?
Sub question number two, how is it possible that he could be right about LLMs running out of steam if everything we've been hearing recently from tech CEOs and news media is about how fast LLMs are advancing and how this technology is about to change everything?
And number three, if LeCun is right, what should we expect to happen in the next few years?
And what should we expect to happen in the maybe decade time span?
All right?
So that's our game plan here.
It's going to get a little technical.
I'm going to put on my computer science hat.
But I'll try to keep things simple, which really is the worst of both worlds, because it means the technical people will say I'm oversimplifying,
and the non-technical people will say it still doesn't make sense.
So I'm going to do my best here to walk this high wire act.
Let's get started with our first sub-question.
What is Yann LeCun up to?
Well, let's just start with the basics.
I want to read a couple of quotes here from a recent article that Cade Metz wrote for the New York Times,
discussing what just happened with LeCun's new company.
All right, so I'm quoting here.
LeCun's startup, Advanced Machine Intelligence Labs, or AMI Labs,
has raised over $1 billion in seed funding from investors in the United States, Europe, and Asia.
Although AMI Labs is only a month old and employs only 12 people,
this funding round values the company at $3.5 billion.
Dr. LeCun, who is 65, was one of the three pioneering researchers who received a Turing Award,
often called the Nobel Prize of Computing, for their work on the technology that is now the foundation of modern AI.
Dr. LeCun has long argued that LLMs are not a path to truly intelligent machines.
The problem with LLMs, he said, is that they do not plan ahead.
Trained solely on digital data,
they do not have a way of understanding the complexities of the real world.
Quote, if you try to take robots into open environments, into households or into the street,
they will not be useful with current technology, end quote.
Dr. LeCun, who is the CEO of AMI Labs, told the Times,
quote, we want to help them react to new situations with more common sense, end quote.
All right, so that's kind of a high-level summary of what's going on.
Let's get into the weeds here, into the technical details of what LeCun is saying
and how his vision differs
from what the major existing frontier AI companies are actually doing.
All right, let's start with a basic idea here.
If you're an AI company,
you're trying to build artificial intelligence-based systems
that help people do useful things.
This could be answering their questions with a chatbot,
or having a system help you produce computer code,
if we're talking about coding agents.
At the core of all these products
needs to be some sort of what we can call a digital brain,
something that encapsulates the core of the artificial
intelligence that your tool or system is leveraging.
So the major AI companies like OpenAI and Anthropic have a different strategy for creating
those underlying digital brains than Yann LeCun's new company has.
All right.
So what are the existing AI companies doing?
They're all in on the idea that the digital brain behind these AI products should be a large
language model.
Now we've talked about this before.
You've heard this before.
so I'll go quick, but it's worth reiterating.
A large language model is an AI system that takes as input text,
and it outputs a prediction of what word or part of a word should follow.
So if we want to be sort of anthropomorphic here,
what it's trying to do is assume that the text it has as input
is part of a real pre-existing text,
and correctly guess what word followed that text
in the actual pre-existing document. That's
really what a language model does. So if you call it a bunch of times, so you give an input,
you get a word or part of the word as output, you then append that to your input and now put
the slightly longer input into the language model. You get another word or part of a word. And if you
add that to the input and put that through the model, you slowly expand the input into a longer
answer. This is called auto-regressive text production, that you keep taking the output and putting it
back into the input until the model finally says, I'm done.
And then you have your response.
So we can think about it then if we zoom out a little bit,
the large language model takes text as input
and then expands whatever story you told it
to try to finish it in a way that it feels is reasonable.
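To make that append-and-repeat loop concrete, here is a toy sketch in Python. The `predict_next_token` function is a hard-coded stand-in for the actual neural network (a real model outputs a probability distribution over its vocabulary), but the autoregressive structure, feeding each output back in as input until the model says it's done, is the same.

```python
def predict_next_token(tokens):
    """Stand-in for the neural network: a hard-coded toy 'model'
    that always continues one specific sentence."""
    story = ["The", "cat", "sat", "on", "the", "mat", ".", "<end>"]
    return story[len(tokens)] if len(tokens) < len(story) else "<end>"

def generate(prompt_tokens, max_steps=20):
    """Autoregressive generation: each predicted token is appended
    to the input, and the longer input is fed back into the model."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        next_token = predict_next_token(tokens)
        if next_token == "<end>":   # the model finally says "I'm done"
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate(["The", "cat", "sat"]))  # → "The cat sat on the mat ."
```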
Under the hood, they look something like this.
Jesse, can we bring this up on the screen here?
This is like a typical architecture
for a large language model.
You have input, like here it says the cat sat,
that gets broken into tokens,
those get embedded into some sort of mathematical semantic space.
Don't worry about that.
They then go through a bunch of transformer layers.
Each layer has two sublayers, an attention sublayer and a feed-forward neural network.
And out of the end of those layers come some information that goes into an output head
that selects what word or part of a word to output next.
So that's it: this kind of linear structure is the architecture of a large language model.
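As a rough sketch of that linear structure, here is a tiny forward pass in Python with NumPy. The weights are random and untrained, and real models add things this omits (positional encodings, layer normalization, causal masking, multiple attention heads), but the shape of the computation, embed the tokens, run them through layers of attention plus feed-forward, then score the vocabulary with an output head, matches the diagram.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
d = 8                                   # tiny embedding dimension, for illustration
E = rng.normal(size=(len(vocab), d))    # token embedding table

def attention(X):
    """Self-attention sublayer: each position mixes in information
    from the other positions, weighted by similarity."""
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ X

def feed_forward(X, W1, W2):
    """Feed-forward sublayer: a small two-layer network applied
    to each position independently."""
    return np.maximum(X @ W1, 0) @ W2

# A stack of transformer layers, each holding its feed-forward weights.
layers = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
          for _ in range(3)]

def forward(token_ids):
    X = E[token_ids]                       # embed the input tokens
    for W1, W2 in layers:                  # pass through each layer
        X = X + attention(X)               # attention sublayer (residual)
        X = X + feed_forward(X, W1, W2)    # feed-forward sublayer (residual)
    logits = X[-1] @ E.T                   # output head: score every vocab word
    return vocab[int(np.argmax(logits))]   # pick the next word

print(forward([0, 1, 2]))  # "the cat sat" -> some (untrained, random) next word
```

Since the weights are random, the predicted word is meaningless; training is what turns this pipeline into a useful predictor.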
So the way you train a large language model is you give it lots of real existing text.
And what you do is you knock words out of that text,
have it try to predict the missing word,
and then correct it to make it a little bit more accurate.
If you do this long enough on a big enough network with enough words,
this process, which is called pre-training,
produces language models that are really good at predicting missing words.
And to get really good at predicting missing words,
they end up encoding, into those feed-forward neural network layers within their architecture,
lots of knowledge about the world: how things work,
different types of tones.
They become really good pattern recognizers
with really good rules.
Implicitly and emergently,
within the feed-forward neural networks
in the language models,
a lot of smarts and knowledge begins to emerge.
That's the basic idea with a large language model.
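Here is a toy illustration of that pre-training loop in Python: hide a word, have the model guess it, then nudge the weights so the guess gets a little more accurate. The one-matrix "model" is obviously a stand-in for a real network, but the correction step is the same cross-entropy idea.

```python
import numpy as np

# Toy "pre-training": predict the word that follows a context word,
# then correct the model to make that prediction a bit more accurate.
vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(vocab), len(vocab)))  # toy one-matrix model

def predict(context_id):
    """Given one context word, output a probability for each vocab word."""
    logits = W[context_id]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def train_step(context_id, missing_id, lr=0.5):
    """Cross-entropy correction: raise the probability of the word
    that actually appeared, lower the others."""
    p = predict(context_id)
    target = np.eye(len(vocab))[missing_id]
    W[context_id] -= lr * (p - target)   # gradient of the cross-entropy loss

# "the cat sat on the mat": train on (context, next-word) pairs, many passes.
text = [0, 1, 2, 3, 0, 4]
for _ in range(200):
    for ctx, nxt in zip(text, text[1:]):
        train_step(ctx, nxt)

p = predict(1)                   # after "cat", the model should expect "sat"
print(vocab[int(np.argmax(p))])  # → "sat"
```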
So the AI companies, their bet is that if these things are large enough,
and we train them long enough,
and then we do enough fine-tuning afterwards
with post-training,
you can use a single massive large language model as the digital brain for many, many different applications, right?
So when you're talking with a chatbot, it's referencing the same large language model
that your coding agent might also be talking to to help figure out what computer code to produce.
It'll be the same large language model that your OpenClaw personal assistant agent is also accessing.
So it's all about one HAL 9000-style massive large language model that is so smart,
you can use it as a digital brain for anything that people might want to do in the economic sphere.
That is the model of companies like OpenAI and Anthropic.
All right.
So what is Yann LeCun's AMI Labs doing differently?
Well, he doesn't believe in this idea that having a single large model that implicitly learns how to do everything makes sense.
He thinks that's going to hit a dead end.
It's an incredibly inefficient way to try to build intelligence,
and the intelligence you get is going to be brittle, because it's all implicit and emergent.
You're going to get hallucinations, or odd flights of responses that really don't make sense in the real world.
So what is his alternative approach?
Well, he says instead of having just one large single model, he wants to shift to what we could call a modular architecture,
where your digital brain has lots of different modules in it that each specialize in different things that are all wired together.
Let me show you what this might look like.
I'm going to bring on the screen here a key paper that LeCun published in 2022 called A Path Towards Autonomous Machine Intelligence.
This has most of the ideas that are behind AMI Labs.
This paper has this diagram here I have on the screen.
It's an example of a modular architecture.
So he imagines an AI digital brain that now has multiple modules, including a world model, which is separate from an actor, which is separate from a critic, which is separate from a perception module, which is separate
from short-term memory, which is separate from an overall configurator that helps move information
between each of these different modules.
So you might have, for example, the perception module make sense of input it's getting, maybe
through text, or through cameras if it's a robot.
It passes that to an actor, which is going to propose, here's what we should do next.
Then the critic is going to analyze those different options using the world model, which has a
model of how the relevant world works, to try to figure out which of these options is best,
pulling from short-term memory. Then the actor can choose the best of those options, which
gets executed.
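Here is that perceive, propose, simulate, score loop as a toy Python sketch. Every module body below is a hypothetical hand-coded stand-in for what would really be a trained network; the point is just the wiring between separate specialized modules.

```python
# Toy sketch of a modular decision loop: separate perception, actor,
# world-model, and critic modules wired together.
# All module internals are hypothetical stand-ins for trained networks.

def perception(raw_input):
    """Turn raw input into a compact state description."""
    return {"ball_near_window": "baseball" in raw_input and "window" in raw_input}

def actor_propose(state):
    """Propose candidate actions."""
    return ["catch the ball", "do nothing", "close the curtains"]

def world_model(state, action):
    """Predict the next state if we take this action."""
    breaks = state["ball_near_window"] and action == "do nothing"
    return {"window_broken": breaks}

def critic(predicted_state):
    """Score predicted outcomes (hard-codable values: broken windows are bad)."""
    return -10.0 if predicted_state["window_broken"] else 1.0

def decide(raw_input):
    """Perceive, propose options, simulate each with the world model,
    score with the critic, and act on the best-scoring option."""
    state = perception(raw_input)
    options = actor_propose(state)
    return max(options, key=lambda a: critic(world_model(state, a)))

print(decide("camera sees: baseball flying toward the window"))
```

Notice that the critic's values are explicit and inspectable, which is part of the alignment argument that comes up later in the episode.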
So it's much more of a system where we have different pieces that do different things.
Now, another piece of the Yann LeCun vision is that you can train different modules within the
modular architecture differently.
Again, in a language model, there's like one way you train the whole model and all the
intelligence implicitly emerges.
In LeCun's architecture, he says, well, wait a second:
train each module in whatever way makes the most sense for what that
module does.
So take the perception module, and let's say
it's making sense of the world through cameras.
Well, there we want to use a sort of vision network that's trained with classic deep learning visual recognition, of the type that, you know, LeCun actually helped pioneer back in the 90s and early 2000s.
But then the world model, which is trying to build an understanding of how the world works, he's like, oh, we would train that very differently.
In fact, he has a particular technique.
So if you've heard of JEPA, joint embedding predictive architecture, this is a training technique that LeCun
came up with for training a world model. At a very high level, he says, here's the right
way to train a model that's trying to understand how a particular domain works.
Don't just train it with the low-level data, like the actual raw words from a book or raw images
from a camera. What you want to do is take these real-world experiences, convert them all to
high-level representations, and train on the high-level representations. So, like, I'm simplifying
here a lot. Let's say you have as input a picture of a baseball about to hit a window,
and then a subsequent picture where the window is broken. You don't want to train a world model,
he argues, just on those pictures. Like, oh, if I see a picture like this, the picture that
would follow is one where the glass is broken. That's how maybe something like a standard LLM-style
generative picture generator might work. He's like, instead, take both pictures and form a high-level
representation of each. So it's like a mathematical encoding of, a baseball is getting near a window.
Like, what actually matters? What are the key factors of this picture? And then the next
picture is, the window breaks. And what you really want to teach the model is that when it has this
high-level setup, a baseball is about to hit the window, that leads to the window
breaking. So it's not stuck on particular inputs, but learning causal rules about how the relevant
domain works. And anyway, there are a lot of other ideas like this. The critic and actor come out of
the RL, reinforcement learning, world and are sort of well known:
you train one network with rewards and another one to propose actions.
And so there's a lot of different ideas coming together here.
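The JEPA idea of predicting in representation space rather than raw-pixel space can be sketched like this in Python. The encoder is a hand-coded stand-in for a trained network, and the "world" is the two-frame baseball-and-window example from above; the key point is that the prediction loss lives in the small high-level representation space, not in pixel space.

```python
import numpy as np

# Toy sketch of the JEPA idea: encode both observations into high-level
# representations and learn to predict the *next representation* from the
# current one, instead of predicting raw pixels.
rng = np.random.default_rng(0)

def encode(observation):
    """Stand-in encoder: map a raw observation to a small vector
    capturing what matters (ball position, window state)."""
    return np.array([observation["ball_near_window"],
                     observation["window_broken"]], dtype=float)

P = rng.normal(scale=0.1, size=(2, 2))   # latent-space predictor, to be trained

def train_step(obs_now, obs_next, lr=0.2):
    """Minimize squared error between the predicted and the actual
    next representation; the loss lives in latent space."""
    global P
    z_now, z_next = encode(obs_now), encode(obs_next)
    error = P @ z_now - z_next
    P -= lr * np.outer(error, z_now)     # gradient step on the predictor

before = {"ball_near_window": 1, "window_broken": 0}
after = {"ball_near_window": 0, "window_broken": 1}
for _ in range(100):
    train_step(before, after)

# The trained predictor now maps "ball near window" to "window broken",
# a causal rule learned at the level of representations.
print(P @ encode(before))  # approximately [0, 1]
```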
The third piece of LeCun's vision that differs from the big AI companies is that he doesn't
believe in having just one system that you train once and that is then the digital brain for all
the different types of things you might do.
He says this architecture is the right architecture for everything,
but you train different systems for different domains.
So if I want a digital brain that we can build computer programming agent tools on,
I'm going to take one of my systems, with its world model and perception and actor and critic,
and I'm going to train it specifically for the domain of producing computer programs.
And then all my computer programming agents that people are building will use that particular system.
But if I want to help with call centers or whatever,
I might train a completely different version of the system
just to be really good at call centers.
So we don't have just one massive HAL 9000 that everything uses,
which is the OpenAI plan or the Anthropic plan.
We custom train systems that maybe all use the same general architecture,
but we train them from scratch for different types of domains.
You're going to get much better performance out of it.
All right.
So that is Yann LeCun's vision.
And he says this is how you're going to get
much more reliable and smart and useful activity out of AI. This idea that we're just going to
train, like, a massive model that can do everything based off of just text? He's like, come on,
this makes no sense. That can't possibly be the best, most efficient route towards actually
having smarter AI. All right. So that is the key tension between the existing AI companies and
Yann LeCun's idea. This brings us to our second sub-question. How is it possible that LeCun
could be right that LLMs are a dead end,
if we've been hearing nonstop in recent months
about how these LLM-based companies
are about to destroy the economy
and change everything?
How could we be so wrong?
LeCun is not surprised by that.
I think if we asked him,
I'll simulate LeCun.
If we asked him,
he would say the short answer to that question
is, look, a lot of the coverage of LLMs recently
has been a mixture of hype
and of confusing the specific
LLM strategies of the frontier companies with the ideas and possibilities of AI more generally, kind of mixing those things together. Which is fine if you're Sam Altman or Dario Amodei; that's great for you, because you need investment. But it's probably not the most accurate way to think about it. Now, if we ask LeCun in this hypothetical to give a longer explanation of how we could be so wrong about LLMs, he would probably say, okay, let me explain to you the trajectory of LLM technology in three stages, and I think this will clarify
a lot. All right. So the first stage was the pre-training scaling stage. And this is the stage
where the AI companies kept increasing the size of the LLMs, so how big those layers are inside of them,
the amount of data they trained them on, and how long they trained them.
And there was a period, starting in 2020 and lasting until 2024, where making the models
bigger and training them longer demonstrably and unambiguously increased their capabilities.
This petered out after about GPT-4. After GPT-4, when OpenAI continued to make its models bigger, it stopped getting those big performance jumps. We have evidence that xAI had the same
issue. We have evidence that Meta had the same issue. They couldn't just scale the models to make them
more capable. This led to stage two, which I think of as starting in the summer of 2024,
which is where they shifted their attention to post-training.
So now, like, we can't make the underlying smarts of these LLMs better
by making them bigger, training them longer.
So what we need to do is try to get more useful stuff out of these existing pre-trained LLMs.
And so the first approach they came up with,
and we saw this with the alphabet soup of models that were released starting in the fall of 2024,
o1, o3, Nano Banana, all these types of names.
The first approach they tried was telling the models to think out loud.
So instead of just directly producing a response, they post-trained the models to be like,
actually explain your thinking.
And it was sort of a way, because remember, it's auto-regressive.
So as the model sort of explains its thinking, that's always going back as input into the model
and it gives it more to work off of in reaching an answer.
So it turned out if you had the model think out loud, you got slightly better on certain types of benchmarks.
So these are the so-called reasoning models.
But it was a bit of a wash, because this also made it more expensive to use the models: they burned a lot more tokens,
producing a lot more tokens on the way to the answer you cared about.
So it did better, but it was unclear how much of that we actually wanted to turn on for users.
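The wash is easy to see with back-of-the-envelope arithmetic. All of the prices and token counts below are made up for illustration; the point is just that reasoning tokens are billed like any other output tokens.

```python
# Back-of-the-envelope for why "thinking out loud" is a wash: the reasoning
# tokens feed back in as context (helping accuracy) but are billed like any
# other output. These prices and token counts are made-up illustrative numbers.

price_per_1k_output_tokens = 0.01           # hypothetical price, in dollars

direct_answer_tokens = 200                  # just the answer
reasoning_tokens = 1800                     # "let me think step by step..."
reasoning_answer_tokens = reasoning_tokens + 200

direct_cost = direct_answer_tokens / 1000 * price_per_1k_output_tokens
reasoning_cost = reasoning_answer_tokens / 1000 * price_per_1k_output_tokens

print(f"direct: ${direct_cost:.4f}, reasoning: ${reasoning_cost:.4f}, "
      f"{reasoning_cost / direct_cost:.0f}x more expensive")
```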
The second approach they used in the second stage was reinforcement-style fine-tuning.
So now if you have, for example, a lot of examples of a particular type of question, prompts and correct answers, prompts and correct answers,
you could use those, combined with techniques out of reinforcement learning, to nudge the existing pre-trained model to be better on those types of tasks.
So we entered this stage, stage two, the post-training stage, where, because we couldn't make these LLM brains fundamentally smarter, we wanted to tune them to get more performance out of them on particular types of tasks.
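Here is a toy sketch of that nudging in Python, using a REINFORCE-style update on a two-answer "model". This is an illustration of the general reinforcement learning idea, not any lab's actual post-training recipe; the real thing operates on a full language model, not a two-entry logit vector.

```python
import numpy as np

# Toy post-training: start from a "pre-trained" model that's unsure between
# two answers, then use a reward signal on (prompt, correct answer) examples
# to nudge its output probabilities toward the graded-correct answer.
rng = np.random.default_rng(0)
answers = ["4", "5"]
logits = np.array([0.0, 0.0])      # pre-trained model is unsure: 50/50

def probs():
    p = np.exp(logits - logits.max())
    return p / p.sum()

def post_train_step(correct_idx, lr=0.1):
    """REINFORCE-style nudge: sample an answer, reward it if correct,
    and push probability mass toward rewarded behavior."""
    global logits
    p = probs()
    sampled = rng.choice(len(answers), p=p)
    reward = 1.0 if sampled == correct_idx else -1.0
    grad = -p
    grad[sampled] += 1.0               # gradient of log p(sampled) w.r.t. logits
    logits += lr * reward * grad

for _ in range(500):                    # many passes over the example set
    post_train_step(correct_idx=0)      # e.g. "2 + 2" -> "4"

print(f"P('4') after post-training: {probs()[0]:.2f}")  # close to 1.0
```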
This is when we began to see less of, hey, try this model and it's going to blow your socks off,
and we instead got lots of charts of inscrutable benchmarks.
Look, the chart is going up on this alphabet-soup benchmark, because, you know,
you could post-train for particular benchmarks.
In a lot of use cases, it was less obvious to the regular user that anything had changed;
the underlying smarts seemed to be the same.
We then entered stage three.
I think this started in the fall of 2025, when the LLM companies said,
really, the big gains going forward are in the applications that use
the LLMs. Let's make these applications smarter. So it's not just how capable the LLM is;
it's how capable the programs that are prompting the LLM are. Let's make those smarter.
So we saw a lot of this effort going into the programs called coding agents, which help
computer programmers edit and produce and plan computer code. Now, these types of agents had been
around for many years, but a lot of the AI companies got really serious about them last
year, coming into the fall of last year.
And how did they make those programs better?
They weren't really changing the LLMs much.
They did some fine-tuning for programming, but really the big breakthroughs in coding
agents were in the programs that call the LLMs. They figured out how to make
these coding agents capable of working with enterprise code bases.
So not just for individuals vibe coding web apps, but something you could use if you're a
professional programmer in a big company.
All of that is tool improvement, making sure that you're able to send
better prompts to the LLM.
When you hear about things like skill files
and managing hierarchies of agents,
these are all improvements in the programs that use the LLM;
none of this is breakthroughs in the digital brain itself.
And so this is the stage that we are in now:
we're spending a lot more time
building smarter programs that sit between us
and the LLMs they're querying as their digital brain,
so that in very particular domains,
the result is more useful.
So what does this all tell us?
This is what LeCun would tell you, right?
I'm channeling LeCun here.
He would say, once you understand this reality,
you see that this impression that LLM-based AI has been on a super fast
upward trajectory of lots of rapid advances
is pretty illusory.
The fundamental improvements in the underlying brain stopped a couple years ago.
What we saw was then a period of lots of bragging about benchmarks doing better,
but this was all about post-training.
And now for the last four months,
all these improvements we've been hearing about are in the programs that use the LLMs:
they're being made smarter, and they better fit particular use cases.
But there really haven't been major fundamental improvements in the underlying
smartness of the digital brains,
which is why all the problems like hallucinations and unreliability persist.
The brains are only incrementally improving, in narrow areas or in narrow ways.
And what we're building on top of them
is creating an illusion of an ever-increasing trajectory of artificial intelligence, when in reality we might just be in a very long-tail stage: now we do product-market fit, and we do the work of building more useful products on top of a mature digital brain technology that's only advancing at a very slow rate.
That would be LeCun's argument.
Therefore, we will find some good fits, but this is not a technology that's on a trajectory where it's going to be able to make massive leaps in what it's actually able to do.
All right, so there you go.
That would be the argument for how we could have gotten LLM progress so wrong.
All right.
sub question number three
let's follow through this thought experiment
what would happen
if LeCun is right about that
what would we expect then
to happen in the near future
well let's start with the
window of the next one to three years
if he is right
we would see a long tail of applications based on existing LLMs begin to fill in.
So computer coding agents have gotten more useful.
We will see other use cases like that that don't exist now,
but where people are really experimenting to try to figure out applications that are going to work in other types of fields.
So it'll be sort of Claude Code moments in other fields, which I think will be useful and exciting.
The tool sets used in many jobs will change,
but because we're now just trying to find areas
where we can build useful applications
on top of existing LLMs,
these doomsday scenarios,
like we've been talking about on these AI reality checks recently,
where knowledge workers are going to have to become pet masseuses,
and then after that they're going to have to cook the pets
on garbage can fires because there's no money left in the economy:
none of those scenarios would unfold based on LLMs
in this current vision.
There would be a big economic hit,
because if we've shifted our attention
to building better applications on top of the LLMs,
what we're going to see is a lot more companies get into that game,
and they're going to say,
I don't want to pay for a cutting-edge frontier, hyper-scaled LLM.
It's too expensive.
Let's look at cheaper LLMs.
Let's look at open-source LLMs.
Let's look at LLMs that can fit on a chip.
We saw this already with the OpenClaw framework,
which allowed people to build their own custom applications
that use LLMs to do personal assistant type roles.
And right away, people were like, I don't want to pay all that money to use Claude or GPT,
and you saw an explosion of interest in on-chip models and open-source models.
All this is going to be, I think, good news for the consumers.
That means we could have more people building these applications.
There'll be more variety of these applications and they'll be cheaper.
It's bad news for the stock market, because we've invested, depending on who you ask,
somewhere between $400 and $600 billion into these LLM hyperscalers like OpenAI and Anthropic,
and that market's not going to support it.
So there's going to be a big crash.
If this vision is correct, it would probably temporarily slow down AI progress, because investors are going to feel burnt.
All right, what's going to happen now if we zoom out to like a three to 10 year range?
That's roughly the range in which the modular architecture approach that LeCun is talking about would reach maturity.
That's what the company's CEO is saying.
Again, it's a research company now, and they've said it'll be several years until they really have
products that are ready for market.
If LeCun is right, what we're going to see, domain by domain, are these
very bespoke, domain-specifically trained modular architecture systems, which, if he's right,
are going to be way more reliable and smarter, in the sense that they do the thing
I ask them in a way that's good, as good as some of my human employees, and in a way
that I can actually trust.
We're going to see a lot more of that.
What was promised with LLMs,
we're going to see instead, on that three-to-ten-year horizon,
if LeCun is right.
Because they're based on this modular architecture,
I think these systems will, you know,
be more reliable.
They're also going to be easier to align.
LLMs are so obfuscated.
It's just like here's 600 billion parameters
in this big box that we trained for a month
on all the text on the Internet.
Let's just see what it does.
Modular architectures are way more alignable.
Like, you literally have a critic module in there that evaluates plans based on both a world model
and some sort of hard-coded value system, to say which of these do I like better.
You can just go in there and sort of hard-code: don't do these types of plans;
really give a low score to plans that lead to, whatever, a lot of variability in outcome, or something like that.
You have more direct knobs to turn.
So it does make alignment easier.
They would also be more economically efficient,
because when you have to train one model long enough
that it can do everything,
it has to be huge, and it takes a huge amount of energy.
But when you're training different modules
in a domain-specific system,
these can be much smaller.
I like to point out the example of a Google DeepMind tool called Dreamer V3,
which can learn how to play video games from scratch.
It's famous for figuring out how to find diamonds in Minecraft.
And it uses a modular architecture very similar
to what LeCun is proposing here.
And we just read a paper about it in the doctoral seminar on superintelligence
I'm teaching right now.
Dreamer V3, which can play Minecraft far better than an LLM can if you ask it to,
is domain-specific.
It requires around 200 million parameters,
a small fraction of what you would get in a standalone LLM.
It can be trained on a single GPU chip.
And it can do this domain way better than a frontier language model,
which is significantly larger and trained significantly more exhaustively.
So there would be some advantages here.
There would also be a little bit of ick around this world, because,
way more so than LLMs, these domain-specific models might actually have more of a displacement
capability.
So we'd have to keep an eye on them.
All right.
Conclusion, what do I think is going to happen here?
Well, you know, I don't know, right?
It's possible that there are more performance breakthroughs to be had with LLMs, and we're going to get more useful tools.
But gun to the head, if I had to predict, you know, through my computer science glasses:
LeCun's modular architecture feels like it has to be the right answer.
I think we're going to look back at this doubling down on LLMs as an economic mistake.
It was the first really promising, widespread AI technology built on top of deep learning.
And it did cool things.
But instead of stepping back and asking, okay, what will this be good for, and in what types of domains might we want different models,
we said, no, let's just raise half a trillion dollars and go all in on everything.
Text-based LLMs, which are trained on text and made to produce text: all artificial intelligence will run off of these things.
I just think when we zoom out on the 30-year scale, we'll say that was somewhat
naive, this idea that this was the only type of model we need for artificial intelligence.
It's super inefficient for, like, 99% of the domains we want to use it in. It's great for text-based
domains, and for computer programming, kind of. The planning is a little suspect, but the code
production is okay. But we're going to make all intelligence based off just massive LLMs
and there'll be, like, four of them, four companies that have the massive ones, and that's it.
This can't be the right way to do it. So my computer science instincts, say modular architecture,
it just makes so much more sense.
Domain specificity,
differential training of modules,
you have much more alignment capability.
They're much more economically feasible.
Like it just feels to me like that probably is going to be the right answer,
which means we're going to have to have some bumpiness in the stock market
because I don't think, if this is true,
that the hyperscalers as they exist now survive:
either they have to pivot quickly enough, before they run out of money,
or some of them are going to go out of business,
and the others are going to have to collapse before they expand again.
So I think the modular architecture approach will work better.
I don't know if Lacoon's company is going to do it or not,
but I think that architecture,
it makes a lot of sense to a lot of computer scientists.
Now, I hope they don't get too good too fast,
because I can much more easily imagine
a very well-trained modular architecture
AI digital brain
creating justified ick than I can
these Python agent programs
that access some sort of massive LLM somewhere.
All right.
So, yes, I think within a year,
we'll begin to get a sense of which of these trajectories is actually true.
I, of course, will do my best to keep you posted here on the AI reality check.
All right, that's enough computer science talk for one day.
Hopefully that made sense.
Hopefully that's useful.
I'll be back soon with another one of these checks.
And until then, remember, take AI seriously, but not everything that's written about it.
