CoRecursive: Coding Stories - The Pre-Training Wall and the Treadmill After It
Episode Date: May 9, 2026

I've been confusing Don with frontier-lab links late at night for a bit. Ilya Sutskever told a NeurIPS audience that pre-training as we know it would unquestionably end. There's only one internet, and... the data isn't growing. The frontier labs call this the pre-training wall. A leaked Google memo from 2023 argued they had no moat. R1 is on GitHub. Llama is on Hugging Face. OpenAI's secondary-market valuation has climbed past $850 billion. Don was confused. So he came over and we made an episode about it.
Transcript
So hi, I'm Adam Gordon Bell.
This is CoRecursive, and today I have Don.
Hi, I'm Don.
I'm here.
Yeah, so Adam's been obsessed with AI and LLMs for way too long.
He keeps sending me tweets and articles.
In fact, you sent me a bunch just like last night.
Like 10 o'clock, too.
It wasn't, it wasn't early.
It was late.
It was late.
And I'm like, what is?
I think it was like nine.
It was like, I don't know, close to 10.
It was like 940 or something.
I mean, it's not like I was busy.
It's fine.
920.
920.
I win this round.
Okay.
But you ended at 950.
So basically Open AI has a new release and so they're out pumping it up.
Yeah.
And the thing I sent you that I thought was super interesting was this quote from Greg Brockman,
who's the president of the company.
He said, I think of Spud as a new base, a new pre-train.
And I'd say it's like we have maybe two years worth of research that is coming to fruition in this model.
And I have no idea what those words mean.
What's Spud?
What's a base?
What's a pre-trained?
two years away. So we'll get into that. And what was the second one I sent you?
The other one was older. A leaked memo from inside Google, three years old, a line you had highlighted
said, we have no moat and neither does OpenAI. And moat, like, why are they making a castle
analogy? Like, I mean, I feel like you kind of, you know what a moat means. I do. I do know
what a moat means. They're creating walled gardens, right? So like, they're like, well, hey, you know,
we're making this thing and we've got billions and billions of dollars in funding,
but there's nothing that stops somebody else from just doing this thing,
which is like the whole core of what the internet was created for,
like way back in the day, right?
It was just a bunch of people figuring things out.
Everything was open.
Then, you know, corporations moved onto the scene,
and all of a sudden it's like,
how can we monetize and make walled gardens and force people into our ecosystems?
Which I think ties to the third quote I sent you.
Do you want to share that one?
Sure.
It says R1 is on GitHub.
Llama is on Hugging Face.
and what's this $850 billion for?
Yeah, that one, it's cryptic,
but I feel like this gets at exactly what you are getting at.
But yeah, I call this format StackTrace,
working name, we'll see how that goes.
But it's like, you know, when something blows up
and you get a giant stack trace,
and you have to kind of figure out what the error is
and peel back the layers one by one, right?
So I thought if we could peel back through these quotes
from these articles that I thought were super interesting.
The Brockman one is like brand new, right?
It's now the 29th.
I think it was like a couple days ago they released their new framework.
So I thought if we can walk backwards from whatever the business case to the engineering,
what they're building, how it works, because none of this makes sense unless you understand
what pre-training is, what a base model is, like what Open AI is even doing or trying to do
with their new models.
And like once we add some meat onto these bones, maybe we can figure out if these companies
make sense, if they'll be profitable, if the world will change, et cetera.
Yeah, no, that sounds like a good idea.
Okay, so let's start with what training is.
So did you ever use the old school co-pilot where it was like auto-complete in VS code?
I used something similar in IntelliJ.
So I didn't use VS Code too much, but IntelliJ had like auto-complete and it started
getting smarter and smarter.
Like it would sort of look at like the context in what you were writing and try and propose
something.
I would say that maybe 60% of the time it was useful, but like 40%, it was like, way off.
It was like, I don't want that.
And then it would, it would try and you get into this state where it's like, press a button to auto
complete it.
It's like, but I don't want to.
So now it's interrupted my flow, right?
Because I can't just press a button or else it'll spew out all the stuff I don't want.
So I'd have to like hit another button to cancel it out.
I don't know if this is just a me problem, but it got, it got in the way of me actually
writing the code of like, no, leave me alone.
You're suggesting something that's not useful.
Yeah, I feel like people had different reactions to it.
Like some people are still using that form factor, but many people aren't.
But that was, like, part of the first iteration of these LLMs.
It was just picking the next token.
So you have like all this code and then it's like, hey, what comes next?
And it tries to guess that.
That's the entire like training objective.
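That training objective, predict the next token, can be sketched in miniature. Here a bigram count table stands in for the transformer, and the corpus is an invented toy string; the point is only the shape of the task: see lots of text, learn what tends to come next.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    # "Training": count, for every token, what tends to follow it.
    tokens = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict, token: str) -> str:
    # "Inference": emit the most frequent continuation.
    return counts[token].most_common(1)[0][0]

# A real model does this over trillions of tokens with a transformer;
# here "the" is followed by "cat" twice and "mat" once, so "cat" wins.
model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # -> cat
```

Scaling that up, per their bet, meant the same objective with more parameters, more data, and more GPUs.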
So before Copilot even launched, back in 2017, Google publishes the paper Attention Is All You Need,
and it invented the transformer.
The transformer is the T in like GPT, right?
And it was just in Google, they figured out, hey, let's take this transformer thing,
let's feed it all the internet, every Wikipedia article, every book.
Every Reddit, yeah.
Everything you've ever posted on Reddit is preserved somewhere.
And you just get it to predict what the thing is that comes after that, right?
And Open AI, at this point, you know, they were kind of a research lab.
And they did, like, these Dota 2 battles where they were trying to beat professional players.
They had, like, a physical robot hand that was trying to solve Rubik's cubes.
Like they did all this stuff.
It was like supposed to be a researchy type organization.
The GPT was like one of their bets.
And it was complicated because to get it to consume all of the internet was this complicated
training run and it was a bit finicky.
But it worked, right?
Like it started to become good at predicting this next token.
And like as we know this all, you know,
became this huge industry.
But the first thing that they kind of figured out,
even maybe, you know, in the very early days,
was that, hey, we have something here
where if we throw more compute,
if we throw more GPUs at this,
it just gets better.
Yeah.
Like that is kind of unusual, right?
Like most problems, I don't know,
most problems can't just be solved by like,
give it more CPUs.
I find the opposite.
That throwing more hardware at it is kind of like a,
it's kind of like a trope in our,
in our line at work, right?
If you have some code that's not well optimized,
it's using a lot of memory, it's taking a long time,
you throw more hardware at it and, problem solved,
until it starts slowing down again.
That's true, right?
Like, why optimize this code?
Yeah, that is a trope.
Why optimize this code?
Just buy a bigger server.
Just buy a bigger server.
Yeah.
Just upgrade it to the next node size.
Yeah.
So they have this very clear idea.
Like, hey, if we can throw enough compute at this,
then we'll have AGI or something.
We'll have something very intelligent.
Hey, we've got something that seems like it can think a little bit.
We can't chat to it yet.
And, like, if we just throw more compute at it, it can think even better, right?
And so let's just keep doing that.
And they call this process training, simple.
All the text of the world feed it to this thing, give it as much compute as you can.
We get something smarter and smarter.
So this original hypothesis of just, like, scaling up came from 2014, even before the transformer.
There was this guy, Ilya Sutskever,
and he had this paper called Sequence to Sequence,
and he argued that, yeah, with a big dataset and enough compute,
success is guaranteed of, like, building, you know,
some sort of prediction machine.
What does success look like, like, define, like, success?
The predictions are only useful if they're accurate most of the time.
Yeah, so he had come out of the University of Toronto,
this, like, deep learning group,
and they had this great success on what had been this really hard
problem at the time, which was identifying images, like picking what the things were in images
and tagging them. And people had been competing from all different places to do the best
labeling of these images. And this group, what's the head guy's name? Hinton. So this is Hinton,
right? He was the professor. And yeah, they beat this benchmark of identifying images. They were
just like so much better at it. And they did it with deep neural networks and just like a lot more
compute, right? They blew it out of the water and revolutionized the field of machine learning.
This guy in the background, this is Ilya right here, right? He was one of his students.
They entered this ImageNet thing, which was an annual computer vision competition,
and their submission was called AlexNet, and they trained it on two consumer gaming cards
that they had in the basement of U of T. I don't know a lot about GPUs, but you probably do.
So they had two GTX 580s. Is that good, or?
Those were good, yeah, those were good cards.
I'm still rocking a 1080 TI.
It's old.
Yeah, I don't know.
I'm not up on the field of GPUs.
But the point is, like, they, they were able to use neural nets and they just blew this, this
benchmark out of the water that had been kind of, people have been inching up, right?
Like getting a little bit better at identifying things.
And that year when they submitted, like, the runner-up got 26.2% of the questions wrong, and they
got 15.3%.
So, like, everybody had been slowly climbing into the 20s and they just, like, cut it in half.
They're like, we got everything except these.
But 15% is actually not bad.
What kind of questions are we talking about?
It's identifying all these different things, but for some reason, there's a lot of dogs in it.
So like to guess the dog breed and like circle like, oh, this is a whatever.
That's better than I would do.
Because of all those poodle ones.
Like, who knows?
Yeah.
Yeah, there's like a million poodle crosses.
Yeah.
So every researcher in this room grew up to become very important because this was like
a revolution when they built this, when they beat this using.
new approach, right? So three years later, based on this, Ilya, this guy, he co-founds
OpenAI. And he co-founds it with Sam Altman and Brockman, from the original quote. And I'm
sure you've heard of Sam Altman. Then there was this other party, Elon Musk. Who's that guy?
Elon Musk. Oh, I haven't. Yeah. Not familiar. Interesting. Anyways, so Ilya is like, he's, he's the research brains,
right? He's the researcher who does all the research, right? Brockman is like the engineer, you know,
like let's actually productize all this, right?
So he becomes chief scientist, and then in 2020, there's a paper published.
They actually write a formal paper, more than just vibes, that says, like, here is how, given more compute, we can learn more things.
Right.
Like here's, this is their paper.
They say, like, here's how we do this.
Which is interesting because like before this, all these people were competing for ImageNet and it was like, you try a bunch of things, right?
you have your pile of GPUs in your basement or in your research lab, and it's like, try some stuff,
see if you can do better. But here they're like, dude, we have a graph. And obviously the graph was
based on some, like, concrete data, because, I mean, anybody can make a graph and be like, oh, you know,
I improved the performance by 20% when I, like, increased the hardware by this much, so therefore
my graph goes to the moon. Yeah, or like, you got married 10 years ago, so by the time I'm 60, I'll be
married five times. You need, like, a solid base of some comprehensive data points at the beginning of
the graph to make a prediction. So I think it makes sense to be skeptical, but from their perspective,
there was something great they could do with their graph, right? Which is to say, like, hey, if you give
us more money, we can buy more GPUs. And like, per our graph, we'll have a smarter thing. And so
it becomes like a fundraising thing. Well, I mean, like, yeah, why else would you make a graph unless you
wanted to like, you know, convince somebody to give you money? Yeah, like, I have this thing here on the
graph, but think what I could do instead of those two like Don GPUs if I had all the highest end
ones that I could fit in this room. Oh, now we're talking, right? So this is sort of what they do, right?
They kind of have this graph and this published paper, you know, and it's published in a reputable place.
So people have vetted it. Yeah, so originally it was financed by this guy who you said you didn't know,
uh, Elon Musk. And, um, I don't know. He was just like, cool. Let's
build the super AGI of the future, right?
There was a bit of a shared belief in the early days among all the people involved in this,
where they all believed in this idea that we could build a superhuman, intelligent machine.
And that belief meant that when somebody got a published graph saying, like,
GPUs go up, smarts go up,
they're like, let's do this, right?
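The graph being described, from the 2020 scaling-laws paper, is roughly a power law: loss falls predictably and smoothly as compute grows, a straight line on a log-log plot. The constants below are invented for illustration, not the paper's fitted values.

```python
def loss(compute_flops: float, a: float = 10.0, b: float = 0.05) -> float:
    # Power-law form L(C) = a * C^(-b): more compute, predictably lower loss.
    # a and b here are made-up illustrative constants.
    return a * compute_flops ** -b

# Every 100x in compute buys the same *fraction* of loss reduction,
# which is exactly what makes the graph a fundraising pitch:
# buy more GPUs, and per the curve, get a predictably smarter model.
for c in (1e18, 1e20, 1e22):
    print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")
```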
I guess where I'm getting hung up is what was the thing that they were buying for $850 billion?
Like, a smartness?
Yeah, I'll buy, you know, here's $500, give me some
units of smart. We can have a thousand units of smart. Cool. Here's $850 billion. Are they just
still buying units of smart? That's not a good business plan. Yeah. So they built an API, right? So
early like GPT3, I used it. So this is before the chat. It was just token completion. Like,
I tried to use it to write tweets for my work because like I would write a blog post and
don't want to write the tweets. But you would have to be like give it an article and then a
sample tweet and an article on a sample tweet. And then when you give it your article, then
it's like, oh, I get the pattern.
Complete it for you.
You couldn't say like, hey, man, write me a tweet.
Like, it didn't understand that.
You couldn't communicate with it.
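Concretely, using GPT-3 through the API meant building a prompt that demonstrated the pattern and letting the model continue it. The articles and tweets below are invented stand-ins for that workflow:

```python
# No chat, no instructions -- just show two example pairs and leave the
# pattern dangling for the model to complete after the final "Tweet:".
prompt = """\
Article: We benchmarked five JSON parsers and found a 10x speed spread.
Tweet: We tested 5 JSON parsers. The fastest was 10x quicker. Details inside.

Article: Our new post walks through debugging a memory leak in Go.
Tweet:"""

# The completion endpoint would then generate text continuing the prompt,
# hopefully a tweet about the Go post. Asking "write me a tweet" directly
# didn't work: the model only continued patterns.
print(prompt)
```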
Anyway, so they put this on an API.
They charge for it.
People like it.
It's exciting.
But it's small. They're still small.
But they're like, we're on to something.
Let's raise more money.
And yeah, their business strategy, the one they said you could
write down on a single grain of rice, was scale.
Like, the word scale. Because they're like, we have this thing on this API and people
are paying for it.
And it's, you know, it's one unit of
smart. If we had 10 times the amount of GPUs, we could have 10 units of smart, or whatever the graph
was. So they were like, we got to go, man. Like we have this thing. It's going to change the world.
But like, anybody can look at what we're doing and be like, oh, we could do the same thing, right?
There's no secret sauce from their perspective. They're like, dude, if people knew all we're doing
is trying to get as big as possible as fast, like we'll be in trouble.
Yeah. So like the underlying algorithms are easy to replicate.
And that's bad because they want to inevitably be the people that hold the keys?
Yeah, they had this idea that the first super intelligence that came around would be all powerful.
And so it better be us because we're great upstanding people who control it and not China or Iran or just that guy down the road.
But like the same rationale is the atom bomb.
Yeah.
but from a corporate side, because at this point,
I don't think national security has gotten into it yet.
At this point, it's just a bunch of nerds building this thing, right?
Not yet, but the U.S. will get involved.
Okay, let's keep going.
So in the year 2022, DeepMind, which was a group within Google, right,
they published this paper called Chinchilla.
So at this point, like, the GPT thing, you know, came out of Google.
Google wrote a paper, but they didn't really build anything on it except an internal
LLM, and then OpenAI ran with it.
Google sees it's something important
and they keep working on it.
And so they put out this chinchilla thing.
And it shows that their graph
is kind of wrong.
Oh. Which isn't good, right?
That's not new.
And so the chinchilla people,
they built a whole bunch of models
of varying sizes and varying training.
And they found that like, no, there's actually a very
clear relationship. It has to do not
just with the compute, and not just with how
big the model ends up,
but with the amount of data that you trained it on,
which makes a lot of sense.
It's like you give it more information for it to get smarter.
Yeah, it needs to know more
so that it can have a bigger library to draw from.
Yeah, and so you can build it bigger with less information going in,
and it's just, it's not smarter.
It's just bigger, right?
So key factor is the data.
Yeah, if we can give more resources to this thing, it'll be better, right?
So Chinchilla doesn't really break it.
It just adds a new important wrinkle, right?
It sounds like it reveals an important factor in making it work.
It's not just compute.
You need the data.
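Chinchilla's rule of thumb can be sketched numerically. The commonly cited version of the result is about 20 training tokens per parameter, with training compute approximated as C ≈ 6 · N · D FLOPs; this just solves for a balanced model under a fixed budget.

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    # Solve C = 6 * N * D with D = 20 * N for the parameter count N,
    # then derive the matching token count D.
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# At roughly Chinchilla's own budget, this lands near its actual
# configuration: about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Build it bigger without the matching data and, as they say above, it's not smarter, just bigger.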
And so at the same time, the same month as Chinchilla, OpenAI publishes a paper that they call InstructGPT.
Guess what InstructGPT is?
Are instructing it to do something?
So you're giving it data to learn on?
How would you instruct it?
You would have to feed it similar data for what you want to accomplish.
So, like, I mean, you're sort of close.
But no, this is ChatGPT.
So it's a different way of asking it to do something?
Yeah, because before, you would give it a bunch of texts and it would predict the next token.
So now you're just talking to it.
Now you're talking to it.
Instructions.
So they call it Instruct GPT.
So the thing that happens is that becomes the post-training step.
So there was training and now we have this post-training step where we make it more human.
But then they decide to rename that first training step where it consumes the whole internet pre-training.
Right.
Which means you end up with this weird world where,
They have a pre-training step and then a post-training step.
There's no actual training step.
Like, they've accidentally...
They've removed the training.
Yeah, exactly.
The training is gone, even though it still exists.
Okay, so then mid-2023, they start a new training run.
Same idea, right?
Let's just, let's do an even bigger pre-training, right?
So we've trained on the whole of the internet or a lot of it, like, let's add even more, right?
As we know, I guess they trained it on a lot of books that they probably didn't properly have access to, but it's like, let's feed it more data.
We understand the formula. Let's make it even bigger. We'll have an even smarter model. So that's
in mid-2023. They call this Orion. And this was supposed to be GPT-5. So 3.5 was the first
ChatGPT, and then there was 4. And they came really close. And they're like, let's make
5, because the difference between 3.5 and 4 was really big. But I remember when they shipped it,
because Sam Altman said something like, hey, we have this new model. It's pretty cool. Not sure if
we'll release it for a long time. But, like, he kind of downplayed it. He's like, it's all right,
you know what I mean? He wasn't like, this is the most exciting thing. And so this was, it was
supposed to be GPT-5, blow everybody's socks off. We're at the next level of smarts. But they
released it as 4.5, and it was super expensive if you use the API. And then I used it a little bit,
and it felt kind of like more natural. I used it to get critiques in my writing at the time,
like, hey, what's wrong with this essay? And it felt like more, it's hard to describe. It felt like
more human or something. I thought it was great, but they took it away, right? It's gone,
you can't use it anymore.
What was, why did they take it away?
So, interesting theories, right? But one for sure is, you know, it was 10 times the size of GPT-4-point-whatever, right? A lot more expensive. Like, it requires 10 times more servers. And it's, like, a bit better. Like, some people were like, yeah, I mean, I can tell it's a bit better. But, like, you're like, yeah, but it costs 10 times more, and, you know, the results are maybe not as obvious as you'd want.
If I play against the chess bot that's on my phone, like, it will beat me, right? If I play against
whatever alpha chess, the best chess player in the world, it will also beat me. It'll also
beat you. You don't notice. So yeah, reportedly, it's $500 million that they spent on that
first run. And it was, it was like the model was fine, right? We got to try again. We got to get
the smarter one. By that time, you're up to a billion dollars in compute. So have you ever
had a project this big fail? No, no, I haven't. I can speak from experience: I've never had a $500
million project, like, flop. Because that's a huge failure. But there's also this problem, right?
The business case is predicated on them making this forward progress. So it could be
devastating. So that's probably why they did it a second time. They're like, we're not giving up on this.
In the middle of this, Ilya, the guy we're talking about, he left. He just left OpenAI. Not a good
sign. There has to be some kind of mitigating factor as to why. But to this point, they've been operating
on the premise that if they just give it more compute and data,
it will increase according to this chart.
Yeah, we shall find out, right?
I think, like, a lot of people have this perception
that these labs, like, open AI,
like they're huge, they're making all this money,
they're sitting on these big piles of cash,
people are paying them for this product,
like it's an amazing place to work.
But if you think of it, it's really high stress, right?
Like, they need to keep this promise going.
It's very important for their valuation
that they're always able to have the next exciting model.
Like, the whole thing,
is premised upon, you know, number go up.
And that's most corporations.
But just being the hottest one with this huge, like, valuation.
And it's not like Apple, where they have phones and stuff, like installed.
It's like they have this API that you call.
And if it's not getting better, and if there's alternatives, like, it can fall apart very
quickly.
Yeah.
And I guess the thing is that when people didn't like it, and you say, well, why?
It's like, well, it just didn't feel as
good. It's like, are the results based on feelings? How do they quantify it, I guess? The benchmarks, right?
And, like, in the early days, they tested against, like, the LSAT, the lawyer test, the GRE,
the graduate test. Right. And then when they had the $500 million model that was 10 times more
expensive, did they run those benchmarks, and was it way better? Was it like 10 times better?
No, no, it was like a little bit better. Right. Oh, okay. So, like, that's what they're basing this
result on. Yeah. It's like, the old one got an 82, this one got an 84, and you're like, oh, but it's
10 times more expensive. And you're like, maybe it's one of those phenomena where, yeah, that last
bit, like, being perfect is very hard. It's like a logarithmic scale, right? Where you could put in
10 times more, but you're not going to get a 10-times improvement on that score. You're going to
have to put a hundred times in. Yeah, the final 10 percent is the hardest. So this goes on for two years. So they build this
giant model, it's not great. And SemiAnalysis, like an industry research outfit, says OpenAI's
leading researchers have not yet completed a successful full-scale pre-training run that has been
broadly deployed since May 2024. That was GPT-4o. So like that's not good, right? It's like the main
thing they do, they haven't been able to do a new one and time is passing on, right? Internally,
you have to imagine they're trying all these things, and they're not moving like they'd like.
And even inside, people didn't really agree on why this wasn't working.
So they looked into it and they couldn't figure it out? Well, I mean, what's your guess?
Something to do with the core way in which it operates. It got to the point where more hardware
isn't going to actually make up for improvements in the algorithm. So in December 2024, like,
Ilya, who left, so he left OpenAI. There was a whole kerfuffle where he tried to get Sam Altman kicked out.
He failed at that. Corporate drama.
Yeah, corporate drama, right?
The researcher guy tried to pull the power move,
and the executive people won.
I'll make a movie about that someday.
Yeah, right?
Anyway, so he starts a new company,
and then he's at NeurIPS.
He's at this big conference where he's being presented an award
for his great earlier work that led to all this,
and he gives a talk.
Pre-training, as we know it, will unquestionably end.
Why will it end?
Because while compute is growing,
better hardware, larger clusters,
the data is not growing, because we have but one internet.
You could even go as far as to say that data is the fossil fuel of AI.
So like they made these early versions.
They scrape a lot of the internet.
They scrape all these books.
They feed it to it.
And it's great.
And then you're like, okay, we need 10 times more.
So like, okay, we used to download the source code on a GitHub repo.
Now let's get every revision, right?
Let's get all the history.
Well, it's just like less good data, right?
Or, like, we got every important Reddit post, so let's go to the really obscure forums.
It's like there's just less good data out there.
Well, it seems that they've reached the limit to what the core algorithm can actually
solve given its data.
We operate every day without the whole internet to, like, figure out answers to questions,
right?
Yeah, yeah.
So if you need the whole entirety of the human internet to be a little bit better,
then maybe you're not using the data
you have as efficiently as you should be.
You've eaten all the good parts.
Yeah.
Only the crumbs are left and they're not going to get you where you want to be.
Yeah, I agree.
And so they call this the pre-training wall.
Pre-training was the original training, and now they've hit the wall.
They're like, we just can't.
There's nothing here.
Like, we can't get past this.
We've, uh, or as Ilya says it, right, like, there was these fossil fuels,
which was all of the internet and all these books and we ate it all.
We've hit peak oil.
There's nothing left.
There's nothing.
This is, so they have to find another way, right?
Okay.
So to understand what happened next, like how these improvements happened, we have to go back to DeepMind, right?
So DeepMind was the people who released the Chinchilla paper, but the more important thing
was, like, I don't know if you remember, like a decade ago,
there was this AlphaGo.
Do you remember that?
Yeah.
The game go.
Yeah.
Yeah, I remember that.
These guys, they had this DeepMind company that got bought by Google.
And originally they started it playing Atari games. Then eventually they did Go and then chess. And the way that they trained it was
this reinforcement learning. So they create something. They get it to play Go against itself. And then
whichever one wins, they, like, let that one continue, make two copies of it. So kind of like an
evolution-type thing. Yeah. So it learns. But uniquely, it doesn't need the internet. It's not reading
Go books, right? It's playing Go. Creating its own data. It's creating its own data by playing the game
itself. And when they originally created this, Go was considered, like, uncrackable. And then
Google had this big tournament against the best Go player in Korea,
Lee Sedol. And nobody thought that this thing would beat him, and of course it crushed him,
because it had been playing Go against itself for, like, the compute equivalent of a zillion
years, right? It's just learning and learning and learning. So it's creating its own data,
as you said, right? Which is like a great solution to this problem,
but it needs a scoreboard, right?
Somebody has told it what is the preferable outcome.
Like in a game, there's rules and you know when you win, right?
So you can generate data because you can always figure out, oh, did I win?
Yes or no, right?
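That self-play loop, stripped to a skeleton: a game whose rules provide the scoreboard, and winners copied forward with small variations. The toy game and skill numbers here are invented; AlphaGo's real training used deep networks and tree search, but the data-generation shape is the same.

```python
import random

def play(a: dict, b: dict) -> dict:
    # Toy game with a built-in scoreboard: skill plus luck picks a winner.
    return a if a["skill"] + random.random() > b["skill"] + random.random() else b

def self_play(population: list, rounds: int = 200) -> None:
    # Pit two players against each other; replace the loser with a
    # slightly mutated copy of the winner. No external data is ever
    # consumed -- the game itself generates the training signal.
    for _ in range(rounds):
        i, j = random.sample(range(len(population)), 2)
        winner = play(population[i], population[j])
        loser_idx = j if winner is population[i] else i
        population[loser_idx] = {"skill": winner["skill"] + random.uniform(0, 0.1)}

random.seed(0)
pop = [{"skill": 0.0} for _ in range(8)]
self_play(pop)
print(max(p["skill"] for p in pop))  # skill climbed with zero outside data
```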
So they started with this training, right?
That became pre-training.
And they added on this chat thing, the instruct step.
Now they add on this new step, which is reinforcement learning.
So they call it RLVR, reinforcement learning with verifiable rewards.
But basically, they need an action for the LLM to take
where we can verify if it got it right or not.
What's an example of something that's easy to verify if you got right?
Like math.
Yeah, math or anything that has like a right or correct answer, right?
Or I think the most impactful one of recent years is like coding, right?
You can write code.
And it will work or it won't.
It'll work or it won't, right?
You can run the compiler and see if it worked.
There's some nuance there because you can write code that will work but isn't good.
Yes, I know that.
I used to work with you.
I know.
Yeah, yeah.
Oh, thanks.
It's a cheap shot.
Cheap shot.
Yeah.
And so the cool thing is that they just lean into this, right?
So this is a new way to generate data that OpenAI comes up with in their panic, and they kind of keep it to themselves.
They can ask the LLM to, yeah, come up with the solution to a bunch of calculus problems.
And they ask it to, like, think out through all the steps, right?
So let's ask it 12 times to solve this calculus problem and think it out step by step.
And, like, most of them are wrong, but maybe one is right.
And so then they take that one where it got it right, and they can feed that back in as training data, like, update the weights.
And they just start doing this in loops, right?
Because once they get it to successfully do some calculus, then they update all the weights.
Now it's a little bit better.
And they can give it more problems, get more right answers.
Now they're doing this deep mind like go thing, right?
They can take their LLM, do, like, thousands and thousands of generations,
getting better at problems, as long as the problem has like somebody to say, like,
is this right or not?
So now they're generating their own data.
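The loop just described is essentially rejection sampling against a verifier. The model below is a stub that guesses numbers; in the real pipeline you would sample step-by-step completions from an LLM, check the final answer, and fine-tune on the traces that pass. The problems and the 12-sample count are invented to mirror the description above.

```python
import random

def model_attempt(problem: dict) -> int:
    # Stub standing in for an LLM: guesses near the truth, sometimes right.
    return problem["answer"] + random.choice([-1, 0, 0, 1])

def verify(problem: dict, answer: int) -> bool:
    # The scoreboard: math has a checkable correct answer.
    return answer == problem["answer"]

def collect_training_data(problems: list, samples_per_problem: int = 12) -> list:
    kept = []
    for prob in problems:
        for _ in range(samples_per_problem):
            ans = model_attempt(prob)
            if verify(prob, ans):
                # A verified-correct trace becomes synthetic training data;
                # the real loop would update the model's weights on it.
                kept.append((prob["question"], ans))
                break
    return kept

random.seed(1)
data = collect_training_data([
    {"question": "17 * 3", "answer": 51},
    {"question": "9 + 4", "answer": 13},
])
print(data)
```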
So this becomes O-1, this GPT model.
And so in a way, it's like they have this wall of training and like they hit this wall and then
they found just like a new dimension, right?
So they can grow by generating their own data in another direction.
Yeah.
I mean, going back to his analogy of how that's the fossil fuel of AI, it's like they've just come up
with a more efficient combustion engine.
Or a renewable resource, right?
Because here the thing, it's like you take the LLM
and it can play its own games.
And if it succeeds, you're feeding that back in, right?
So it's renewable and it's generating its own data
if you have a way to score it, right?
I mean, in the places where you can verify the answer, it can learn.
But okay.
Yeah.
So in February, so now we're like coming close to modern day, right?
So in February 2025, we haven't even talked about Anthropic,
but Anthropic releases Claude Code and the cool thing,
I don't know where it happened with them and they've never confirmed it,
but all of a sudden, these LLMs,
they don't necessarily start doing a lot better at all different trivia,
but they just start getting super good at coding.
And the theory that I think is pretty much confirmed, right,
is just like, Anthropic builds Claude Code,
but they can train on Claude Code as well, right?
So they have all these problems and then they can run Claude Code through it,
and then when it works, they're like, good job Claude Code,
and they reinforce it.
And so it gets better and better at coding.
It's not necessarily better at all kinds of other things,
but this is like a very clear signal.
If we have a bug on this project and it can solve it,
it learns to get better and better at these things, right?
And so that makes all this synthetic data, right?
If they have Claude Code run on a problem
and it solves it correctly,
then, you know, they end up with this thing
where it's step through things, right?
They're creating their own data, as we said.
But so this is a big,
this is like a second big breakthrough
by OpenAI, right?
They have a new way to generate more data.
They kind of keep it closely held.
But at the same time, the government is getting antsy about, you know, AGI.
The U.S. is like, I hope we get AGI and not China.
And then they outsmart us and destroy the world.
Or maybe the other idea is the government's just worried, like, hey, this is going to be huge
industry.
We want this industry to be American, right?
And so they start putting in place controls on NVIDIA, telling NVIDIA, like, don't sell
GPUs to China.
Like we just don't want that.
And then Nvidia doesn't love that because they're like, we like to sell these things.
We make a profit on them, right?
Companies can buy these Nvidia GPUs, but they are handicapped.
So they're super good at doing GPU stuff, but they have a very low memory.
They did that back for Bitcoin mining too.
That's when it started.
But now it's not a tariff, I guess, but it's like an export constraint.
The Chinese just can't get the ones that you can buy here.
Yeah, there are proprietary processors meant specifically for AI that Nvidia makes.
They're not allowed to sell them, yeah, to China.
So in China, there's this company called Deep Seek that we talked about at the beginning.
And they spun out of this hedge fund because the hedge fund wanted to do all this machine learning,
I'm assuming to predict the stock market.
They decide like we're going to build, we're going to build our own AI, right?
And because they couldn't get the Nvidia chips.
Yeah.
So they could get the Nvidia chips, but not the really high-end ones.
They could get the less high-end ones.
And so they couldn't get the H100, which is the frontier lab ones that cost
like $30,000, you know, apiece, and in the server, they end up putting like eight of these in.
So they're very expensive.
I mean, they would have bought them, but they weren't allowed.
Right.
So they could only get this one called the H800s that are just much less good at talking to each other.
And the problem is you need a whole cluster of these to make it work.
And so DeepSeek is like, hey, we got to crack this code, right?
They released a paper about what they did.
So here's one of the things.
Low-precision training is often limited by the presence of outliers
in activations, weights, and gradients.
So this is one of their tricks where they were able to lower the bits.
Like, it's like they're running like a, like an N64 game on like an NES 8 bit.
Like they were able to lower the bits without losing the accuracy somehow, which let them...
Like MP3s, right?
Yeah, and then...
They're able to do more with less.
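The trick they're describing, storing numbers in fewer bits without letting a single outlier wreck everything, can be illustrated with a toy block-wise quantizer: each small block of values gets its own scale, so an outlier only coarsens its own block. This is just the general idea, not DeepSeek's actual FP8 recipe.

```python
import numpy as np

def quantize_blockwise(x, block=4, bits=8):
    """Toy low-precision storage: split x into blocks, scale each block
    by its own max so a local outlier only hurts its own block, then
    round to signed integers. Returns the int codes plus per-block scales."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8 bits
    x = x.reshape(-1, block)
    scales = np.abs(x).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                        # avoid divide-by-zero
    codes = np.round(x / scales).astype(np.int8)
    return codes, scales

def dequantize(codes, scales):
    return (codes.astype(np.float32) * scales).reshape(-1)

weights = np.array([0.01, -0.02, 0.03, 0.015,       # ordinary small values
                    5.0, 0.02, -0.01, 0.03])        # one big outlier
codes, scales = quantize_blockwise(weights)
restored = dequantize(codes, scales)
# The outlier's block uses a coarse scale, but the other block stays precise.
assert np.allclose(restored, weights, atol=0.05)
```

Like the MP3 comparison in the conversation: you throw away precision where it matters least and keep the result close enough to the original.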
Only 20 SMs are sufficient to fully utilize the bandwidths of IB and NVLink.
So what happened there is these things were handicapped
at how quickly they could network to each other.
And so they found a way to use some of the layers as like a network card so that they
could more quickly talk to each other.
We employ customized PTX instructions and auto-tune the communication chunk size.
So basically they figured out the instruction set or maybe it's known for Nvidia GPUs.
And instead of using the normal like SDKs, they wrote in assembly how the instructions
would work, sidestepping, how Nvidia does things so that they get performance speed.
And so they published this whole paper on this.
They published this new LLM and it blew people's minds.
Like it's a more gritty approach, right?
It's like we're constrained.
So we need to come up with a different way.
Yeah, I mean, and that's how a lot of things were back in the early days of, you know,
software development.
And like, you had to be very aware of how many bytes of data certain fields were because
you only had so much to work with.
So the government, like the U.S. government, tried to prevent China from getting ahead
by putting these constraints in place.
But the constraints actually just taught these Chinese companies how to do more with less, right?
Maybe it even advantaged them because now they can operate on a smaller budget.
Yeah, I remember when this happened, they came out with, like, their DeepSeek release.
And it was like, Nvidia was freaking out because, well, if they can do this with their constraints,
what's going to happen to us, right?
The gravy train might be coming to an end here because obviously, like, you know, we have,
all the unlimited hardware.
And we can't perform as well as this.
It's almost like we should have been looking at how to optimize our AI instead of just
throwing hardware at it.
Yeah, exactly.
So they published in their paper that it cost them to do this, about $5.6 million,
which was a little bit misleading because they were only talking about one specific stage
of the training.
But that got published and people were using it.
And they're like, this thing's amazing.
And meanwhile, OpenAI is saying, like, we spent a billion and it didn't...
We didn't get the same results.
Yeah, if we didn't get an improvement, yeah, everybody panicked, right?
The Nvidia stock fell.
Everybody was like, what's going on here?
Other thing that happened is when they published this paper and they released this model that they called R1.
So one thing is this has to do with the moat, right?
So we found ways to work with less, right?
The other thing is the deep seek people publish in their R1 thing, this whole reinforcement learning idea.
For OpenAI, this is their new secret, right?
They're like, oh, we can give this thing rewards, have it think out, provide this feedback.
So R1 uses the same trick.
They came up with it on their own.
Hey, try to solve these problems, try to think it through, and we'll take all these results, and then we'll feed them back.
And they publish exactly how they train it.
OpenAI's new trick that's going to blow away the market, this Chinese company just...
It's just open.
Just put it in a PDF and put it on GitHub, right?
Eventually, like, enough people are going to come to the same conclusion in the end.
I mean, that's how most inventions happened, right?
Maybe, but like, not right away, right?
And not right away.
Yeah, I guess they were hoping that they could keep that secret a little bit longer.
That's our secret sauce.
That's our secret sauce.
Yeah, so they called it RL on verified reward loops, and they described this like multi-stage
pipeline and that there was this like aha moment where they saw after doing this feedback that
the LLM started to talk to itself and say, like, oh, that seems like the wrong answer,
maybe I should try this.
And in its thing, you're seeing like, oh, whether it's thinking or not, it doesn't matter,
but it's starting to be able to put out reasoning loops of like following a chain down one path,
backtracking, going down another.
They're like, oh, we're on to something, right?
So they get these reasoning loops where it's succeeding.
It's like thousands of generations of it generating a bunch of questions,
verifying which are right, feeding it back in.
It's learning.
It's generating its own data.
And they just put the model out open weight for people to use.
They put out, here's how we built it, right?
because they're part of a hedge fund.
Like, this isn't how they make their money.
And it's like, oh, my God, this is a crater into this whole, like,
capitalistic venture of building these amazing models, right?
It's because, like, on the one hand, these AI models,
they're amazing.
Like, the work that they can do, it's ridiculous, right?
But, like, it's just the most, it's such a powerful tool.
But on the other hand, yeah, you create it.
And then somebody else is right behind you.
And then quickly the value of them is, like, is, like, going towards zero.
Well, it could use like an older analogy.
If you think way back in the day when, you know, they came up with TCP/IP.
And if they had walled that technology off and they're like, only we know how TCP/IP works, right?
Somebody else would have figured it out.
Yeah, yeah.
It's tricky with networking, because networking is very much about interconnecting.
But yeah, I get your point.
Yeah, it was a technology that, you know, they could have held on to it and said, this is how, this is how it works.
And now we are the holders to anybody who wants a network.
Proprietary technology that, like, makes networking work.
and you have to pay us a license to use it.
But other people are going to figure that out eventually, right?
Or they'll come out with an open standard and be like, well, you know, everybody should use
this because it's easier.
Yeah, exactly.
So if you go back to where we started, right, we have this like pre-training that's actually
really training, and then this, like, thing to make it chat-like, and then this thing to do
this reinforcement to generate its own data, right?
So like Deepseek figured out how to do this part, right?
That nobody else could.
and they figured out how to do it very cheaply.
But like, doing that first step of consuming the whole internet is still really expensive, right?
And so, like, you could think, like, oh, that's a, that's like a moat, like getting all this data and putting it together.
And the barrier there is higher because you have to consume the whole internet.
And that's something that's logistically hard to do.
Yeah.
But enter Mark Zuckerberg, right?
So at the similar time, I'm not sure the exact time frame, right?
Facebook, meta, they start building their own base model.
They're like, we don't want to be left out of it. And then when they release it, they say, like, hey,
we're not actually in the business of being like an LLM-serving API or something. Like, we sell ads.
About... I don't know what their ads are. Yeah, I don't, I don't use Facebook. Yeah. And so
they say, like, hey, if you're a researcher, you can just download our model. Just ask for permission
and you can download it. And so they do that, and then very quickly, one of those researchers
downloads it and just puts it on BitTorrent
because literally
yeah, why not?
Why not? And then Facebook demands that they
meta, I guess, demands that they take it down
and then, but it's too late.
And so they change their stance.
They say like, no, this, I mean, you can use this.
If for non-commercial purposes, just grab it and use it.
Right. And this becomes Llama.
So this is like the first, I think, open weight model.
If you have the GPUs that you can run this on,
like you can just grab it.
and use it for free.
So Facebook spent whatever,
the $500 million to consume the whole internet,
which is weird, right?
Like, why would they do that?
I don't know.
Here's what Zuckerberg said.
In the early days of high-performance computing,
the major tech companies of the day
each invested heavily in developing their own
closed source versions of Unix.
It was hard to imagine at the time
that any other approach could develop
such advanced software.
Eventually, though, open-source Linux gained popularity.
Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices.
And we all benefit from superior products because of it.
I believe AI will develop in a similar way.
And that makes sense to me, right?
It's like the premise of open source software.
So there's like this business strategy.
I heard about it from Joel Spolsky, and it was called commoditize your complement.
Right?
And so it's like, if you sell a product, and along with that product,
something else is used, if you can actually decrease the cost of that thing, it makes your
thing more valuable.
Right.
Like, if electricity is super cheap, electric cars are more valuable, right?
So if you're an electric car company, if there was some magic trick to make electricity
cheaper, it would help the value of your car.
And so, Meta's like, we're not in the business of LLMs, but we're going to need them, right?
We're going to need them to, like, judge if somebody's spamming comments or whatever, right?
We don't want to pay these exorbitant fees to OpenAI or whatever.
We'll just build our own.
And because it's not our business, we'll just give it away.
And it also allows them, you know, probably to give the finger to these other companies, right?
Yeah, subvert them.
Subvert them in a way, right?
So that's what they did, right?
So that erodes like another thing, right?
Now the base thing that's very expensive, you can just get it.
I mean, maybe it's not as good, but it does exist, right?
It creates an atmosphere of competition.
Exactly, right?
except he's not doing it.
It's not necessarily for charitable reasons.
And then it's helpful, like, from his perspective,
if we make the one that we give away for free
and everybody else builds on it,
we benefit from all of those things, right?
Yeah.
If you build some ORM internally,
that's like a crazy Don creation,
you have to maintain it and whatever.
But if you build one and then release it
and the industry starts using it,
they make improvements, and then you can pull those in, right?
Oh, there's an interest in increasing their own business.
Yeah, I forget where I am.
So yeah, what were the original text that I sent you?
Do we, have we answered any?
Yeah, I think we have.
We've covered it because, like, R1 is on GitHub, Llama's on Hugging Face,
and what's this $850 billion for?
R1 was the model that DeepSeek came up with.
So it is, it has the more efficient algorithm because they were constrained by
hardware restrictions.
So they came up with like a better, a better way of doing it.
that wasn't locked in to like, you know,
Nvidia's model.
And Llama was the,
was the Facebook base model that had all of the internet included in it.
So you didn't have to go through all the work of combing the whole internet.
There's no,
there's no secret sauce.
There's no special sauce.
Everything's open.
You can,
you can get a model.
It's all,
it's all out there for people to develop.
There's no,
there's no reason why somebody couldn't make their own product.
So then I think,
I think the only thing we haven't answered is, like, the very first part, right? Which is, like, Greg
Brockman being like, oh yeah, I think of Spud as a new base model, like a new, a new pre-train.
It sounds like he's saying we're two years, we're two years ahead of everybody else. So Orion, we
talked about that, was like their $500 million and then a billion-dollar run. Yeah, it just
didn't amount to anything. It became, it became 4.5, but then they, they pulled the plug on it. So Spud is one of
their new models, and so Brockman, in that video I shared
was like from a couple days ago talking about Spud, their internal model,
and then they released it.
So it became GPT 5.5, right?
Six days ago, they released GPT 5.5.
It's a new class of intelligence for real work.
Okay, well, like, every time they release a new model, they say revolutionary.
Of course, yeah.
But there's some interesting things going on here, right?
And I think his quote is the only thing we haven't unpacked.
It's a new base.
It's a new pre-train.
It's a new pre-train, right?
But like, so what is it?
Because, like, we discussed this problem wherein they tried making them bigger, but there was no good data.
They made one 10 times the size and it wasn't better.
There were diminishing returns.
And now they have another big one.
So what is their secret?
Yeah.
Well, wait a couple weeks.
It'll get leaked or you'll figure it out.
Because there's this other problem that happens, right?
And so there's this process called distillation.
If I am building a new model, I have this 4.5, let's say, that was super
huge and very expensive to run and was like a little bit better. Well, I can chat with it. And
similar to this reinforcement learning process, I can take that chat log and I can take a smaller
model that's not giant and I can train it on that. Right. And so it's learning from the bigger
model. It's like it's teaching a small model that I can run at lower cost with the great answers
that the bigger model had. And like, it can't learn it all, because it's just, it's not as big,
but it will get a lot closer. It's like a senior teaching a junior, right? It's like
I went through some
the last 20 years
I've seen some stuff
Yeah
You don't have to do all that
You don't have to make all my mistakes
Just do this
Yeah and so they
They call this like distillation
And if it's like my son
He won't listen to me
He'll make the mistakes anyway
And then he'll be like
Oh yeah it turns out you were right
That's the learning process
But like the big labs do this
themselves right
There is like
GPT-4.4,
and there's, like, GPT-4.4
mini.
And mini costs less
and it's still really smart,
but it's much smaller and faster
because it's like they took this big model
and they distilled it,
they got all these important lessons from it
and gave it to the small one.
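Distillation as described, a small student trained on a big teacher's outputs, can be sketched in a toy numeric form. Here the "teacher" is just a fixed linear map and the "student" is fit by least squares on the teacher's answers; real distillation matches token probabilities over chat logs, but the shape of the idea is the same:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Teacher": the big expensive model. Here it's just a fixed linear map.
teacher_w = np.array([2.0, -1.0, 0.5])
def teacher(x):
    return x @ teacher_w

# Collect "chat logs": prompts plus the teacher's answers. The transcript's
# point is that the logs alone are enough to train a rival model.
prompts = rng.normal(size=(200, 3))
answers = teacher(prompts)

# "Student": fit a cheaper model to reproduce the teacher's answers.
student_w, *_ = np.linalg.lstsq(prompts, answers, rcond=None)

# The student now answers almost exactly like the teacher on new prompts,
# even though it never saw the teacher's weights, only its outputs.
test = rng.normal(size=(20, 3))
assert np.allclose(test @ student_w, teacher(test), atol=1e-6)
```

That last assertion is the whole moat problem in miniature: the product (the answers) is all a competitor needs.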
Well, Open AI and Anthropic
made all these allegations
against DeepSeek and other open-weight companies
that this is what they're doing.
Because whenever there's like a new super smart model out,
six months later,
there's an open weight model that seems just this smart.
The only thing that you need
to do this distillation, really,
is the logs of chats from these smart models,
which is in fact their product.
If you want to make a smarter model and I have one,
just like have lots of chats with my one
and then train your model on it.
So it's even more like competitive pressure, right?
If I come up with a really smart model,
by definition, you can use it to make your smarter, right?
So this is another moat problem, I guess, right?
And if you look at OpenAI,
they have these thinking models,
but it won't actually tell you what the thinking is.
Well, they don't want people to know.
how it's thinking, because that's their trade secret. That's their secret sauce. And so even with
that black box, they're still alleging, I don't think they're lying, that these Chinese labs and
other open-weight companies are just using their service and using that to train their models. Because,
like, why not? Then you can use their models and you can see exactly how it's reasoning.
Yeah. But so every time they come up with a super smart model, like a couple months later,
probably there'll be a new smart model from the open-weight companies because it's easy to extract
things, right? It's like the problem of MP3s. Like, it's easy to copy this information. Not as easy
as an MP3, but like, sort of easy. The moat is leaking. So if we go back to the 880 or 850,
like, what can be, where's that value hiding, right? Well, I think a lot of investors are asking
this as well. But, like, I'm not so negative on OpenAI. I think they could be a very valuable
company, but where is that value? Well, what's a good way to try and get some money? They're not
surviving on subscriptions, but they do have subscriptions. I mean, I'm a subscriber. Yeah.
But I think that overall the money that they get from subscriptions isn't at the levels that they would need in order to call it a success.
So ads, man.
That's the worst.
I don't want that.
That's, I think they're talking about it.
Like, that's how they have to monetize, right?
A lot of people are using their product, even though there are alternatives.
Brand loyalty.
Yeah, brand, whatever.
But it just means that you're going to have to meet them where they are.
I don't know.
Like, businesses are using.
I think if you look at it frozen in a moment, right?
Like the amount of spend that my company is spending.
So it's Anthropic, but it could easily be OpenAI.
Just for like coding subscriptions, right?
It's massive.
Like, the amount of money we're pouring into this company is huge.
So I think at any given moment, like, that's real, right?
They're making a profit off that.
But the challenge is, yeah, that it can quickly diminish and there's competitors.
If there's an open source alternative as well, right?
Like, why would your business keep paying a subscription for hundreds and hundreds of
dollars in credits when they could just, you know, use this open source alternative that's maybe even
locally hosted.
Yeah.
Or just like there's a company that hosts it, you know, at cost because they didn't have to
build it, right?
They just host the open source model.
Lots of those exist.
So it's a Red Queen's race, right?
There's, have you ever heard this term before?
No.
I love this term.
So it's like through the looking glass from Alice in Wonderland and the Red Queen has a race
and they're running, right?
And Alice says like, why aren't we moving?
Like they're running and it's like they're on treadmills and they're not going anywhere.
And the Red Queen says like, oh here, like you have to run as fast as you can just to stay in the same place.
If you slow down, you go backwards.
But as fast as you run, you stay in the same place.
No one can ever win.
Yeah.
As soon as you have an advantage, keeping that advantage requires working just as hard as you did before.
It's a great metaphor for this process, right?
It's like, okay, every year or so, Open AI is coming up with a new breakthrough.
that lets them push the frontier, or Anthropic does.
So the open-weight models right now are all,
just like, let's say, six months behind.
Maybe, I don't know about this new release,
but previous to Brockman saying we're two years ahead,
the gist is kind of, the open-weight models are six months behind.
But in that six months, like, the models got so much better
that everybody's paying for the premium service.
Yeah, well, it's like movie theaters, right?
You can go see it in the theater.
It's very expensive.
You get to see it first. There's a lot of people who just wait. If you want to be six months
behind, you can use a cheap model and it's fine. Right now the curve is so high that it's,
no, you've got to get on the new thing. Everybody feels that way. If that curve
flattened out, it's over, right? If that curve keeps going up, though,
oh my God, who knows where we're going to end up? It's the Red Queen's race. They,
like, all these frontier labs, if that's their product, they're running as fast as they can
to stay, right? Like right now, ChatGPT has my $20 a month, and Anthropic has my $100 a month for
the coding agent, but if something better comes out than those or just everybody else catches up
and has a cheaper service that's sold at cost, that money's gone. Anthropic has to go as hard as they can
just to keep my money, because I'll just switch. There's this line from Stewart Brand. People
always remember it's like, information wants to be free. But his actual quote was,
information wants to be free, but information also wants to be expensive. Some information is just so
valuable, but at the same time, it's free to share it. And like these companies are in this place
where they have something that's so amazing, this amazing breakthrough with these AIs that are so
valuable. But yet, it's depreciating like nothing, like a peach on a summer day, because
everybody's catching up. And so he comes up with this new thing, right? Brockman's two years of
research is coming to fruition. That's not modesty, right? He's trying to tell people
hey, actually, I think we're more than six months ahead. But the news came out. So SemiAnalysis, who we talked about before,
they said this is the first new scale-up in pre-training since GPT-4.5. A bigger model. So they're back on the curve, right? So this curve that they would follow, this curve up so far,
and then they could never get past it, now they're claiming we're up, we got past this wall. Ilya said there's nowhere else to go up here,
but we, we found a spot, right? Ilya left the company. We're here, we found a way, right? There's the
two founders, one the engineering guy and one the researcher, and the researcher left and said,
like, the fossil fuels have been exhausted, but they're saying, like, hey man, no, actually, it's still
going.
We found something.
What are their fossil fuels?
And one interesting thing is, usually these GPT models have been getting cheaper and cheaper
over time.
This 5.5 that they just released costs four times as much per token per conversation.
Okay.
So it's obviously going to be more of a moneymaker then.
Well, maybe, but 4.5 was also super expensive and then they pulled it. And the reason is, these things get bigger. Like, it just becomes more expensive to host. They're like, no, man, this costs more. Like, this is a big one, right? This is a chunky model, the biggest we've ever shipped. But will the results be, like, compelling enough for somebody like yourself?
Is it worth it, right? Four times more is a lot. So I don't know. And like, whatever, the podcast will go out. Somebody's listening to this and it's a year later. And we'll know. And we'll know.
But like, it doesn't matter because this is the next one and the next one and the next one.
The race keeps going.
I could not get a subscription and use this open weight model from a number of providers,
or I could just pile up some GPUs here and run it or whatever.
I have friends who will say, like, hey, this is all a trick.
The Open AI gets us addicted to coding using these coding agents.
Then they'll jack up the price and everybody forgets how coding works.
That's what I was worried about.
It's a risk for your company and you want to limit the exposure
that you have to that risk. If they start relying on it, then the risk is that Anthropic could
jack the price by four times. Yeah, but I'm saying, hopefully, why there's less of a concern.
If you can jump to a free model. Because the free models are always just a little bit behind.
These companies are actually fighting tooth and nail with each other. If both Anthropic and
open AI collapse, we'll just lose the latest six months because everybody's racing to keep up.
These things are not going away. You can torrent that Llama version. Like, it's not at the lead
anymore, but people use that Llama as the base model. They add all their training on. They do the
distilling that Anthropic and whoever is mad about. And like if all of these companies explode, we still
end up with just like the open weight models of six months ago. And there's a bunch of companies
that host these and you can use like open router.
Yeah, the cat's already out of the bag, right?
Like, cats out of the bag, this isn't going away.
Okay, we got to wrap this though.
Okay, so let's go through it again.
So what were the original quotes?
There was like where the open AI president says,
I think of Spud as a new base, as a new pre-train.
And then there was like the Google memo that was like,
guys, we don't have any moat and nobody does.
So what's your feeling?
True, false?
The memo leaked well before
the announcement of that new base model, right?
So it could have been true at that point, but maybe it's not.
Maybe the moat is now in the new base model.
See, I feel like it's still true because Brockman may think that they have a moat,
but he's saying we have a moat that's two years.
And, like, that's not very long.
He's like, we thought we had six months last time.
Now we have two years.
That's, we need to reinvent ourselves in the next two years.
Yeah, it seems like the underlying premise of them having something over other companies
is temporary.
It's still,
so it can't be something
that won't eventually be discovered.
Yeah, so what do you think?
We have no moat and neither does OpenAI.
True or false?
Um,
I feel like it's false.
Oh,
I feel like it's true,
but yeah,
interesting.
I mean,
I guess it depends on the timeline.
Like,
they have one,
but like,
it's a temporary one.
And like,
they maybe have one for now,
right?
But when that one gets bridged,
they will be stuck
trying to dig out another one.
They do have one,
but it's not,
it's not permanent.
Yeah.
Okay.
And then the first quote,
I think we understand. So he says, I think of Spud as a new base, a new pre-train,
and it's two years worth of research coming to fruition. So the two years of research doesn't
mean that they have two years before somebody figures it out. Like you could spend a lot of time
on the research and development and then release it and somebody like copies it. And then what's
the third one? So the the quote is, if all this stuff is already built, why are you paying
$850 billion? What are you buying with that?
Yeah, I think we actually agree on what the answer is.
The answer is people are betting on this horse.
They're saying, like, we know this one moat is only going to last so long,
but we think this company will build the next moat, right?
They'll keep the treadmill going.
Let's keep the treadmill going.
Everything's like on open source.
Why are we spending money buying $850 billion of something I can fork today?
But you're not buying that.
You're buying the process.
But I think we understand it all.
I think we got through it.
What do you think?
I think we figured it out.
Thank you.
