TBPN Live - Weekly Recap | Elon vs. Trump, Ukraine's Drone Attack, Cluely Update & OpenAI CRO

Starting point is 00:00:00 You're watching TVPN! This week, what were the top stories? What were the most interesting things that we learned? One, the Ukraine drone attack. That was huge. Operation Spiderweb, a whole ton of drones were smuggled into Russia in shipping containers. They emerged and went out and attacked bombers. We had Soren Monroe Anderson from Neiros on the show

Starting point is 00:00:29 to break that down for us. We also talked to Connor Love at Lightspeed who does a lot of defense tech investing. And of course, a couple weeks ago we had Eric Prince on the show, the founder of Blackwater, and he had kind of predicted that the Ukrainian military was perhaps underrated and we might be seeing something like this in the future.

Starting point is 00:00:48 And so that was interesting to see play out. So we will take you through those kind of interviews, recaps of those. Then obviously we had the absolute meltdown between President Donald Trump and Elon Musk that unfolded on X and Truth Social, the two leaders- Dueling. Dueling social media.eling dueling social platform exactly and although it's a highly political story there are big business implications totally talking about what's gonna happen in space between NASA

Starting point is 00:01:15 Boeing different launch providers yeah the implications for Tesla SpaceX Nuralink even the boring company right there's a lot of stuff that needs to be in your face. Many of Elon's businesses are heavily regulated and the potential impacts are substantial. Yeah, then we also had some earlier stage founders on the show. Roy from Cluely came on and went pretty viral, put on a show. You're surrounded by journalists.

Starting point is 00:01:41 Hold your position. Cluely is a service to help you cheat on everything. We got to actually try the app. Someone was pushing us to try it and see how good the product is. But regardless of the product, he's also a phenomenal marketer and he came on and put on an absolute show. And he's printing, apparently.

Starting point is 00:02:01 Yeah, he's doing great. He can't spend all the money that they're bringing in. And so we'll give you an update on Roy and see where that business is. And then Kian from Nucleus came on to launch Nucleus Embryo, which he calls the first ever genetic optimization software that helps parents give their children the best possible start in life.

Starting point is 00:02:21 Quite a lot of controversy there on the timeline this week. A lot of people hate it. A lot of people hate it. A lot of people love it. And we'll let you kind of decide for yourselves. And then we had a whole bunch of AI experts on the show from Google, OpenAI, and Anthropic, got to the front leading edge of the debates around AI, LLMs, deep research.

Starting point is 00:02:42 Poor John yesterday. You were fighting as long as you could. You didn't want to talk about drama. You got dragged into it. But we did get some great coverage from Mark Chen at OpenAI, as well as Sholto over at Anthropic. Yeah, it was a lot of fun. The big news over the weekend was the Ukraine drone attack

Starting point is 00:03:00 on Russia. They shipped shipping containers deep into Russia at which point drones flew out of the containers and hit strategic targets. We're gonna have two guests on the show today. Soren Munro-Enerson from Nero's to talk about that and also Connor Love from Lightspeed to talk about that and also defense tech investing generally. Today we have Soren from Nero's who builds drones and has been to the Ukraine and so we'll bring him into the studio and ask him how he's doing. How are you doing?

Starting point is 00:03:29 There he is. Welcome. Great. How are you guys? We're good. We have new soundboards, so expect some wild, some wild cards. Wild stuff. Could you give us a high level overview of the history of drone warfare in Ukraine? Cause I understand it's been progressing super rapidly on both sides and it'd be helpful to understand kind of the different stages. Did they ever have like predator drones, like the global war on terror type of drone, or did they jump straight to quadcopter

Starting point is 00:03:56 and kind of like leapfrog the technology? So, you know, you've had this Russian aggression war in Ukraine since 2014. Obviously, the full scale invasion was 2022. But even during that period before the full scale invasion, there was some usage of drones for surveillance and dropping explosives. These are primarily still like small drones like what you're seeing now. But this was not a proliferated technology.

Starting point is 00:04:23 Then when the full-scale invasion happened, within a few months, the Ukrainians started thinking about all these ways that they could use inexpensive drone technology to get an asymmetric advantage. And that is where FPV drones started becoming a really, really big deal. So they pioneered really this idea of putting an explosive on a racing drone and using that as a precision strike weapon.

Starting point is 00:04:47 There were instances of this happening in other places, but they really scaled it and they've really refined it. And then Russia was much slower to take it seriously, although now they tend to in some ways outproduce Ukraine and they have a much more direct line to China where most of these components are coming from. But since 2022 and FPV is just starting to get used, now it's reached an unbelievable scale. It's estimated Ukraine is going to produce four and a half million FPV drones this year. And those are ranging from ones that are this big to 15-inch

Starting point is 00:05:22 propellers, fiber optic controlled drones, many different types and sizes of warheads, different configurations, and I can talk more about the drones that were used in Operation Spiderweb as well, because those were really interesting. But what we've seen is just this vast technology landscape where new clever ideas like fiber optic

Starting point is 00:05:43 are going to be the hot thing for a few months and then they sort of just become another tool in the tool belt and it's just this constant arms race. Yeah, talk about this attack was unique in a bunch of different ways, but is this something that had been to your knowledge or just more generally known to be something that had been attempted multiple times, or maybe like, I'm curious to know, yeah, kind of the backstory on this type of attack, because it seems, it's a massive difference

Starting point is 00:06:16 to be using this technology way behind enemy lines versus using it at the front line. Yeah, so primarily FPV drones are used on the front line, say the 30 kilometer band across the zero line. What was so unique here is that it was FPV drones, short range drones, being used 4,000 kilometers inside of Russia. It was this unbelievable application

Starting point is 00:06:42 where you've seen the Ukraine using long range one-way attack drones that are going, you know, fifteen hundred kilometers to strike targets deep inside of Russia. But here these were small drones actually driven in on trucks, basically in the tops of shipping containers. And I don't know of any operations that were similar to this beforehand. I think it was not something they wanted to give away. And the drones were actually operating on cellular. They were not operating on local, like the normal low latency local radios you use for FPVs typically.

Starting point is 00:07:21 And so I think this is going to be something that a lot of people are going to look at and see if you have drones that are operating on cellular, you can't really tell them apart from cell phones. It's really hard to defend against, really hard to detect. But now it's going to be part of air-based defense is thinking about drones that are operating on cellular being piloted from basically anywhere in the world. Talk about the Russian response,

Starting point is 00:07:43 the immediate response to this incident from basically anywhere in the world. Talk about the Russian response, the immediate response to this incident from the footage that I saw. And I think most people saw that that tracked it. It seemed incredibly challenging to respond to it quickly, right? By the time you could sort of organize a response, a lot of the, the core damage had been done. What, what do you think the, the do you think the question I think that every country is asking themselves now is how do you defend against this type of attack, whether you're at war

Starting point is 00:08:14 like Ukraine and Russia are, or you're just thinking long-term? Yeah, this clearly poses a massive threat to critical infrastructure. I mean, being blatant, the US does not have any defenses Yeah, this clearly poses a massive threat to critical infrastructure. I mean, being blatant, the US does not have any defenses in place that would stop this from happening. We already know there's already news stories about drones that are flying over our air force bases and we can't do anything about it. And I think the only approach here has to be a multilayered system where you're looking at all of the different types of electronic warfare

Starting point is 00:08:45 and also considering things like satellite communications and cellular communications where you're basically able to turn those off on the flip of the switch, which is a huge inconvenience and a huge thing to build into the infrastructure. But clearly that's going to be required. Welcome to the stream Connor. How are you doing? I'm good. I'm doing alright. Good to be back, guys. He's got a suit this time. Looking great. We love to start. I won't say I dressed up just for you, but

Starting point is 00:09:14 I would have taken the suit off far before this if I wasn't coming on. Fantastic. Thanks so much for jumping on. Have you been tracking the Ukraine story closely? Any insights there? Anything in the portfolio that's at all relevant

Starting point is 00:09:29 in the defense tech world? Do you expect a response from the US government or guidance or change to any strategies? Really any takes on that? I mean, first shit, what a time to be alive. I mean, I'm sure your Twitter feeds and your group chats are going up pun intended over the weekend. I mean, it know, I'm sure your Twitter feeds and your group chats are going up, you know, pun intended over the weekend.

Starting point is 00:09:47 I mean, it's pretty crazy. I mean, let's be honest, like, first, I'm not shocked that the Ukrainians did this. I mean, the execution seemed to be flawless from what we can pull from open source Intel. I do think though, I mean, again, it's not a surprise that the Ukrainians have been mastering drone warfare for the last handful of years. And you want to call it that, they called it spider web.

Starting point is 00:10:09 Like this was their Trojan horse. This was their Israeli beeper. And the outcome is pretty impressive to be honest. I mean, from the outside looking in, like the Russians woke up over the weekend and they thought they were getting their $4 team orders. And what did they get? They got a thousand FPV drones, you know, blowing them to smithereens. So it's pretty impressive. I mean, my takeaways from this are really twofold. The first is like, there's never been a clear signal of where warfare is going.

Starting point is 00:10:40 And to be clear, what I view this from, you know, both the entrepreneurs in my portfolio, but also from my perspective, I mean, the world is about, you know, cheap, attributable, a lot of times, autonomous systems. And that's, you know, playing out warfare, that's playing out in other areas of life. And then the second thing is, you know, candidly, it's like, it's really hard to defend yourself at the pace at which things are changing. And again, like, I know really hard to defend yourself at the pace at which things are changing. And again, like, I know we do some things here

Starting point is 00:11:08 in the United States and are trying to be on the front end of a lot of this innovation, but when this happens, I think this almost just resets everyone again and says, all right, how do we respond to it? And I think it's, to your point, it's not a direct US response, it's more of, hey, what do we need to buy? What do we need to develop for our own fight

Starting point is 00:11:24 in some way, shape or form? Yeah, what do you think, obviously, you're a venture capitalist, not a geopolitical strategist, but what's the right Russian response to this? Is it, hey, we suddenly need to be wary of having cell coverage anywhere near strategic military assets? It seems like Ukraine and Ukraine

Starting point is 00:11:45 in Ukraine's perfect world, they could run this style of attack a bunch and copy and paste and hit other targets. But it feels like something that was dependent on cellular technology that that's something that the Russians can revoke fairly quickly. Sure, it'll be inconvenient, but I'm curious if you have a take.

Starting point is 00:12:05 Yeah. I mean, to be honest, when I think about how do you defend against this, I think there is, I wouldn't call this the easy answer of just turning off the cellular network. I actually think the only way to do it kind of practically is in layers or in a multitude of different ways because, yeah, the reality you know, the reality is, if you looked at how the Ukrainians carried out this attack, they did so on the local, you know, Russian cell network, which, again, I don't think any Russian kind of defense unit, any of these bases was ever thinking that they would have to turn off

Starting point is 00:12:40 their own cell network. And then there's just the practicality of how you do it. I mean, I think there was what four or five different attacks that hit all at the same time. What do you do? You turn off the network for tens of thousands, hundreds of thousands of people. And oh, by the way, this is like a dirty little secret that nobody talks about. You know, yes, you have your military systems that are protected and all that. But a lot of coordination is happening through WhatsApp, a lot of coordination on incident. All of a sudden you turn off the cell networks, you're actually inhibiting your own defense,

Starting point is 00:13:08 your own response, the first responder, you know, getting your own people out of there. So I think it's a bit more complex than that. And then the last thing I'd say is just like, even if you do this in layers, you know, you need to be resilient in a way, but you're not gonna stop everything. I mean, this was just brilliant masterclass of,

Starting point is 00:13:28 again, if maybe there was a plan, we didn't know this, but maybe there's a plan for 100 bases and we only hit five of them. And so if you think about just the broad, geopolitical, geographic coverage you have to have, I think to be 100% certain on anything, it's just, it's impossible. You can't do it.

Starting point is 00:13:46 The most capable military in Europe right now is the Ukrainian military. The lessons learned that they have are very significant. The drone tech is far and away the best, their ability to do fight against and to even conduct electronic warfare and even closer support in this environment is is leaps and bounds ahead of even what the US military's is. So that's the military to learn from. Ukraine does does have a corruption problem. I hope sincerely that Trump is able to get a ceasefire in place and to stop this killing because it's absolutely pointless.

Starting point is 00:14:26 It's just Slavs killing Slavs at this point and nobody's going to advance. And you have a blend of old and new. If you look at the pictures of the front there now, it's almost indistinguishable from the Battle of the Somme, right? Artillery duels, static lines, bunkers, all the rest. Now the problem is somebody can fly an FPV into your bunker on the other side, but between tens of millions of landmines, which make armored breakthrough very difficult, it slows down any attack so that the FPVs and artillery

Starting point is 00:15:03 can get to it. You're not going to see any kind of blitzkrieg, Hans Gadarian maneuver warfare there until some significantly different weapon systems come along. So look, Europe needs to get serious about it. They're far from it at this point. Elon post four minutes ago, the Trump terrorist will cause a recession in the second half of this year. Elon post four minutes ago, the Trump terrorists will cause a recession in the second half of this year.

Starting point is 00:15:27 Wow. Somebody else was saying, can I finally say that Trump's tariffs are super stupid? Who's who is that? Somebody else is posting, Mads posting is saying it's a Jiju Ping. He says, bro, you seeing this? And it's Putin on the other end.

Starting point is 00:15:43 He's just looking at it, Hold up, got a line. And it's, we'll start pulling some of these up. Ridiculous. What else is going on here? This is the present versus Elon. Neval says, Elon's stance is principled. Trump's stance is practical. Tech needs Republicans for the present,

Starting point is 00:16:05 Republicans need tech for the future, drop the tax cuts, cut some pork, get the bill through. This is so crazy. Antonio Garcia says remember there's a few money and then there's F the world money. Will Stancil says imagine being the ice agent suiting up for your biggest mission of all time right now. People are saying that Trump's going to deport you on back to the South Africa. Will the Pew says time to drop the really big bomb growing Daniel is in the Epstein file. That is coffee pasta.

Starting point is 00:16:41 That is a real piece there. Terrible. Oh no. Delian. We had a question from a friend of the show. They said, the real question is if Tesla is down 14%, how could SpaceX and OpenAI be trading if they were, how would they be trading if they were public? The real thing here is it's bad for everyone, right? DJT is down, Trump coin is down,

Starting point is 00:17:08 nobody's really winning here. China's up. Yeah. Oh really? Sean McGuire, I mean, I'm just saying like at a high level, China is the big beneficiary here of Sarah Guo says, if anyone has some bad news to bury, might I recommend right now? Yes, yes, Sarah gross as if anyone has some bad news to bury might I recommend

Starting point is 00:17:25 right now? Yes, yes, yes. If you have, if you, uh, what's the canonical bad startup news? Like, oh yeah, you missed earnings or something. Drop it now. Inverse Kramer says Bill Ackman is currently writing the longest posts in the history of this app. And we have, uh, we have a video from Trump here if we want. I can throw it in the tab and we can share it on the stream and react to it live.

Starting point is 00:17:55 Lex Friedman says to Elon, that escalated, quickly triple your security, be safe out there brother. Your work, SpaceX, Tesla, XAI, Neuralink is important for the world. We need to get Elon on the show today. If somebody's listening and can make that happen, I would love to hear from him. Max Meyer says, so I got this wrong.

Starting point is 00:18:13 I didn't say it never happened, but I thought it wouldn't. I'm floored at the way this has happened. Yeah. He didn't think they would have a big breakup. Many people didn't think they would have a big breakup. Even just earlier this week it seemed like they might just have a somewhat peaceful exit. Trump just posted a little bit ago, I don't mind Elon turning against me, but he should have done so months ago. This is one of the greatest bills ever

Starting point is 00:18:38 presented to Congress. It's a record cut in expenses, 1.6 trillion dollars and the biggest tax cut ever given. If this bill doesn't pass, there will be a 68% tax increase and things far worse than that. I didn't create this mess. I'm just here to fix it. Anyways, lots going on. Let's go to this Trump video. I want to see what he has to say.

Starting point is 00:19:00 This is what I've seen and I'm sure you've seen regarding Elon Musk and your big beautiful bill. What's your reaction to that? Do you think it in any way hurts passage in the Senate? Which of course what is your seeking? Well look, you know, I've always liked Elon and I was always very surprised. You saw the words he had for me, the words of, and yes it said anything about me that's bad. I'd rather have him criticize me than the bill because the bill is incredible.

Starting point is 00:19:25 Look, Elon and I had a great relationship. I don't know if we're well anymore. I was surprised because you were here, everybody in this room practically was here as we had a wonderful sendoff. He said wonderful things about me. You couldn't have nicer said the best things. He's worn the hat. Trump was right about everything. And I am right about

Starting point is 00:19:46 the great, big, beautiful bill. But I'm very disappointed because Elon knew the inner workings of this bill better than almost anybody sitting here, better than you people. He knew everything about it. He had no problem with it. All of a sudden he had a problem. And he only developed the problem when he found out that we're going gonna have to cut the EV mandate because that's billions and billions of dollars And it really is unfair. We want to have cars of all types electric. We won't have electric, but we want to have a gasoline Combustion we want to have different we want to have hybrids We want to have all we want to be able to sell everything he hasn't said bad about me personally But I'm sure that'll be next but I'm I'm very disappointed in Elon. I've helped Elon a lot.

Starting point is 00:20:28 Mr. President, did he, I just want to clarify, did he raise any of these concerns with you privately before he raised them publicly? And this is the guy you put in charge of cutting spending. Should people not take him seriously about spending now? Are you saying this is all sour grapes? No, he worked hard and he did a good job. And I'll be honest, I think he misses the place. I think he got out there and all of a sudden he wasn't in this beautiful oval office and he was and he's got nice offices too. But there's something about this when I was telling the chancellor. Folks, breaking news, Deleon, that's Ruhov, he's joining us in the temple for some live reactions. Come on in.

Starting point is 00:21:08 Surprise guest. I can't even spell surprise guest. I'm so excited about this. Surprise guest. Yeah, in other news, 11 Labs dropped a new product. I was like. Ha ha ha ha. So in other news, $2 million seed round.

Starting point is 00:21:24 Stop it, stop it. We love 11 Labs. Another news, $2 million seed round. Stop it, stop it. We love 11 Labs. No, they'll keep grinding. But just launch again tomorrow. You're going to have to launch again. Start shooting a new Vibreel, start shooting a new, writing a new blog post

Starting point is 00:21:38 because no one's going, Lulu says yes, delay the launch on TVP. So basically right now I can just pull up and just read. I'm going to just be refreshing true. So you do it. So okay, Jordy's on truth social. I'll be on X. Give us your reaction.

Starting point is 00:21:55 Tell him what's going on. I've made some point. I was like, I'm just sort of scrolling X and I like tuned into you guys like an hour ago. And I was like, they're talking about something. I think I was like, at some point they switched to like, we have to like, and then an hour ago and I was like, they're talking about something I think I was like, at some point they switched to like breaking news and then I was watching and I was like, okay, like, you know, I got it.

Starting point is 00:22:09 John resisted. I fought it for like a half an hour, but we couldn't do it. But yeah, give us your quick reaction. I mean, always, you know, sort of give it from the, you know, sort of space angle, you know, it's amazing that, you know, how much the world has shifted since Friday of last week,

Starting point is 00:22:28 whereas it was presumed that Jared Isaacman was gonna be the NASA admin to today, it was released that the Senate reconciliation package re-added budget back into NASA, largely for the SLS program, which was basically the program that Jared and Elon were sort of largely advocating to completely shut down.

Starting point is 00:22:51 It is already showing, the counter reaction is already showing up in policy. Sorry, SLS program, is that space shuttle or no? Sorry, that's the SLS launch rocket. It is based off of old space shuttle hardware. But it is basically the internal, you know, sort of NASA run competitor effectively to like a Starship heavy launch rocket. Yeah. You know, because it was sort of generally behind budget behind schedule. And there are so many commercial heavy lift rockets coming

Starting point is 00:23:19 online. Sure. The default was canceled. That is largely sort of a Boeing based program. And so, so you know if you look at you know You know three months ago, you know when they were announcing that 47 program Yeah, you know Elon walks into the secretary of the Air Force's office obviously even you sort of ranting against Man fighter jets and believe that shouldn't be what you know be what the department is prioritizing 30 minutes after that meeting was when they announced the f-47 program and so so now you're seeing basically like the equivalent in space where, you know, that was obviously awarded to Boeing. Boeing is the largest prime behind SLS.

Starting point is 00:23:53 Boeing basically is going to be the biggest winner of NASA refunding SLS and Jared Eisenman not being a NASA administrator. So tying this back to the timeline, Trump posted less than 30 minutes ago, in light of the president's statement about cancellation of my government contract, SpaceX will begin decommissioning its Dragon spacecraft immediately. Break that down. I mean, that just means that we no longer have a vehicle that can go to the International Space Station.

Starting point is 00:24:19 We no longer have a vehicle that can take astronauts up and down. We also don't have a vehicle that can de orbit the International Space Station safely, right? The Dragon was expected to be able to do that. So what that means is, you know, if you guys remember all the memes about Stranded, you know, from last year around Boeing Starliner, it now means that the space station, you know, itself is basically, you know, sort of stranded. And that's like, you know, one of the government contracts, obviously, that, you know, space is involved in.

Starting point is 00:24:44 And Elon, I've heard generally generally, like, just wants to shift all things to Starship anyways, and so in some ways, was probably kind of looking for an excuse to, you know, sort of shut down Dragon and refocus energies. There's also a part of it where it's like, look, he is, like, kind of independent in the space world, in that, you know, Starlink's total top-line revenue is gonna be passing the NASA budget in the next year or two.

Starting point is 00:25:03 And so, in terms of, like, size of, two. And so in terms of like size of you know, state actor that can influence space, you know, his own company is basically about to become you know, as large of an actor as like the entire United States. So I don't think there's going to be like a de escalation here. Like, you know, my my, you know, estimation is like on both sides, it's going to continue to escalate. You know, if we thought that we live in dynamic times, you know, when Trump got into office, if we thought that we lived in dynamic times,

Starting point is 00:25:25 you know, when Trump got into office, it's gonna be even more dynamic when there's like, three parties in the world. The dynamism will continue until morale improves. Elon the center, AOC the progressive populist, and Trump the, you know, sort of conservative populist. And, oh man, it's a, It's remarkable times.

Starting point is 00:25:43 It's hard to be on the timeline. I mean, I just have so many questions, right? How does this impact Golden Dome? What's Boeing stock doing? Will Golden Dome even be a viable project without SpaceX? I think there's just going to be more resistance probably to working with upstarts because they would be ones that would probably be more likely to collaborate with SpaceX know, sort of with, you know,

Starting point is 00:26:05 a SpaceX and so. Well, wait, wait, so, I mean, it feels like, it feels like Boeing would be a logical beneficiary of this turmoil and yet they're down today. They haven't really popped. Oh, really? Yeah. I mean, I'm not obviously, you know, one to give like, you know, public talk about.

Starting point is 00:26:20 Yeah, I know. I'm just kind of working through it myself and it's- Yeah, that's surprising. It just feels like it's just- I would have like, let a drop in a drop in Boeing to pop basically. Yeah. Yeah, that would be the expectation But there must be something here because there it feels like this is purely interpersonal between Elon and Trump and not it's not like oh Boeing was secretly behind the scenes the whole time lobbying even more effectively. It doesn't Well, where's the tinfoil hat? Maybe we need a tinfoil hat segment, who knows?

Starting point is 00:26:47 But yeah, I mean, when you're in Boeing World, it's like, hey, we're only down 1%, let's go. The coup of the century. My question is, has there ever been a crash out of this magnitude ever? In history, well, you know. In internet history. When Elon and Trump became friends.

Starting point is 00:27:02 Or, honestly, world scale. I actually. There's the probably world world history equivalent I feel like there's something in like maybe didn't have an era in the United States where you know they're crashing out crashing out used to mean calling up the New York Times and just ranting now you can just live post like all your reactions and it's just all real time this is like crash outs are actually intensifying. You actually want to be long crash outs over the next 24 hours.

Starting point is 00:27:28 And you have their own social media platform that they own. So you know, you got to be on both X and truth social to like stay on top of things. Yeah, I actually did like a deep research report a while back on like, has the richest man in America ever been close with the US president? Going back to like, you know, was Rockefeller particularly close? And, and because the narrative was like, Oh, this is like so unprecedented.

Starting point is 00:27:51 And in fact, it is unprecedented. In fact, I would have guessed that like Rockefeller was close to me too. That's what I was going for. It was like, no, I imagine this is always, it is always close, but no, I think because the president has become more powerful globally, your point about mayor of America, dictator of the world, it becomes increasingly valuable for the richest man

Starting point is 00:28:13 to have a close alliance, and so it's become more. I don't know exactly how accurate that research was. It's totally possible that behind the scenes, Rockefeller was really close to the president at the time, and we just didn't write about it in the history books. But there certainly aren't very many anecdotes about the richest man in America going on. Yeah, so Pavel had a great.

Starting point is 00:28:31 So you had a great for AP US history, 2050, you know, APCD exam. Yeah, so this is. What was the tweet where, you know, Elon Musk called the president at the time a potential pedophile. Was it A, about Epstein Island, B, about a cave in the Philippines,

Starting point is 00:28:47 C. What a mess. No, so Pavel had a good post. He was quoting the big bomb from Elon. He said, hypothetical question about the USA's power structure. Is the man with the most access to capital more or less powerful than the political head Honcho?

Starting point is 00:29:01 Purely hypothetical. It's a good question to ask. I mean, I think both like archetypes have grown both in absolute power, but also in relative power to the rest of the globe basically since the Gilded Era. If you think about like the president of the United States in 1925, I'd say pretty darn powerful, but like there was clear like, it was a sort of multipolar sort of world. Argentina was pretty darn powerful, but there was clear, it was a multi-polar world. Argentina

Starting point is 00:29:28 was pretty darn rich at the time. Obviously, Europe was still recovering from World War I, but the UK was generally doing well. It was not clear that there was a huge outweighed effect. And then if you look at probably the biggest industries at the time, I don't think you could claim that even like Standard Royal at its peak, I'd have to go look at the exact numbers, but that like it had the size of budgets relative to like the US government in terms of sort of budgets, right? Versus I feel like now for the first time, you both have sort of US president, extremely,

Starting point is 00:29:57 extremely powerful. And then you have like the sort of mag seven, effectively like the size of sort of huge states. Countries. Like they're fucking with their own state governments. and then also just more bureaucracy, more red tape. So like, when I think about the 1920s, like rubber barons, it's like, it is the, you can just do things era. And so you want to build a railroad like, yeah, you might need to get like one rubber stamp, but it's not going to be 10 years and tons of lobbying

Starting point is 00:30:15 and all this different stuff. So you can kind of just go, you can just go wild. You know, it's bad when Kanye is saying, bros, please know, we love you both so much. And then you're like, oh, I'm going to go and I'm going to go and I'm going to go and I'm going to go and I'm going to go and I'm going to go and I'm going to be 10 years and tons of lobbying and all this different stuff. So you can kind of just go, uh, you can just go, wow, you know, it's bad when Kanye is saying bros, please know we love you both so much. It's just like the voice, the voice of reason is Kanye West.

Starting point is 00:30:35 Yes. Thank you. You didn't bring them together and, uh, you know, form a peace treaty. Nikita beer just added his pronouns back to his bio. Let's go. He's got a rubber band. Elon's got a rubber band all the way back to, you know, sort of extreme woke because I'm straight back to, you know, sort of super climate change and you know,

Starting point is 00:30:51 somebody's, somebody's sharing, re-sharing the picture of the, the cyber truck blown up in front of the Trump tower. I guess it's just like this. This is in real life. It was foretold. Yeah. I, I, I, I didn't,

Starting point is 00:31:04 but it was a question of like when and what magnitude, not if. Always bad if Vladimir Putin is operating to negotiate between President Trump and Elon. I think I think a lot of the world is waiting for Roy Lee's take in the Kluge army. That's who they want been all waiting for. People have been asking him to get involved with geopolitics. Wow.

Starting point is 00:31:32 I love the Shilmoha put up a sort of meme about Narenda, like prime minister of India. He basically copied and pasted the Trump Truth social post about negotiating peace between India and Pakistan when it wasn't like actually fully negotiated. You know, posting about, you know, negotiating a ceasefire between you and Trump. Funny thing is like truth social. You can just read all of Trump's posts without creating an account. It truly shows that like, I would think that you would have to make an account to read them all, but they just, it's not gated at all

Starting point is 00:32:05 It's his this could be the biggest but you know, they clearly I don't think they care about monetization Bitcoin is actually Falling alongside falling Wow Bitcoin falling Boeing falling Tesla falling. Who's the biggest winner of the day? I think it's China China. Yeah, China China Sean McGuire of the day. I think it's China. China. China. China. China. Sean Mcguire really sold off. It's down 3% today at 101 K. So still up, but you know, yeah, rough Winnie the Pooh just dipping his hands in that pot of honey, just snacking away and watching from the sidelines. Yeah. Let's see. Chinese stocks, US stocks, Chinese

Starting point is 00:32:42 stocks. I can't find. Okay. That's probably my commentary on the day, boys. I hope that was just great. It was fantastic having you. Thanks for jumping on. Thanks for hopping on so quickly. Cheers. Next up, we have Roy from Cluely coming back for an update. He's hired 50 interns, I think, or something close to it.

Starting point is 00:32:59 He said they're bringing every intern on. They're bringing every intern on. We got every intern coming in. Well, welcome to the studio, Roy. How are you doing? Oh, let's go. There they are. I think we're overpowering you. Can you, uh, can, can, can you hear us? Yeah, yeah, we can hear you. Yeah. Make sure we're zoomed out all the way so we can see everybody. We got a small army. This is incredible. How big is the team? Kick us off. How many you got at this point?

Starting point is 00:33:26 The team is 11 full time plus the interns. How many interns you got so far? Interns, bro, we're closing in on 50, brother. Let's go. 10 there. That's amazing. Congratulations. What are they all doing?

Starting point is 00:33:40 How do you manage everything? Is it just, is it purely is purely social media? Is that what you want them to focus on growth? Yeah, yeah. Growth marketing. Like the only goal of the company is get 1 billion eyeballs onto Clueless. So you have unrestricted creative freedom and permission to do anything and everything. Uh, just, just make the company go viral. Every single person you see behind you has over a hundred thousand followers on some social media platform. Wow. Wow. Thousand plus. That's remarkable. Me too. Now. Yeah. There we go. There we go.

Starting point is 00:34:12 Probably popped. Uh, uh, what's working, what platforms have actually been, uh, been driving the most growth. I mean, I'm sure you've run a lot of tests. What have you learned that's, uh, that you can share? Bro, Ben take it away, bro. UGC has been really good. We just hit 10 million views today. 10 million views. Eight days.

Starting point is 00:34:30 Wow. There we go. Hoping to get 100 million views in the next month. What platforms specifically are the most fertile ground for targeting your specific customer? Because you can imagine that there's a lot of folks who are AI curious on X, but then there's much broader, more viral audience, more general audience on platforms like TikTok, YouTube, Instagram,

Starting point is 00:34:51 what's working and what is the next next platform that you're going to be focused on? Yeah. Well, we're trying to go viral on every platform regardless. But the main thing right now is Instagram Reels. Oh, Instagram Reels. Interesting. And what is the main value prop that you're hitting people with? Is it still the cheat on tests thing or have you evolved at all? What? Still? Like the interviews. Yeah.

Starting point is 00:35:12 Interviews. Okay. And, uh, has there been, this was controversial when you launched it. Is it still controversial in the comments? Are you getting flamed? Has anyone big dunked on you and has that driven virality? Is that actually a net positive? Instagram is not like Twitter. You could post the craziest shit on Instagram and they still will not think it's controversial.

Starting point is 00:35:34 Really? So how to make it controversial, we have to engage in faith some other way. It's hitting tool is controversial on Twitter, but on Instagram, you could have a white guy say the N-word 10 times and it's still not controversial. You know, like you need crazy shit on Instagram. That could have like a white guy say the N word 10 times and it's still not controversial enough. Like, you need crazy shit on Instagram. That's what we crack. Every single person here has like very great viral sense.

Starting point is 00:35:53 And if you watch the reels that do go viral, you see there's like ways that we've engaged in beta the videos and this is what we'll keep doing to uh, probably a billion views a month is what we're probably going to do. How long does it take to figure out if an intern is cracked? Is it like an hour, two hours? How much time do you need? For me personally, me personally probably like 10 minutes, but for anybody watching probably would take like one or two weeks. There we go, there we go.

Starting point is 00:36:12 How do you guys think about product marketing? Obviously you're just going viral everywhere, getting all this attention. How do you make sure that it, he's shaking his head. It doesn't think about, it's not about the product, it's about his head. Doesn't think about it's not about the product. It's about the attention. Attention is all you need.

Starting point is 00:36:29 You can't make anything go viral. Yeah. Yeah, but but but how do you. The side of the street, you know, you make some UGC videos, make some Twitter posts, you know, you can sell anything, you know, in 2025 product doesn't matter. You know, I could jack off off the side of a building, sell some videos of it for 20 bucks each, make $2 trillion. It's crazy.

Starting point is 00:36:48 $2 trillion, that's intense. How do you guys think about burn? Is it on your mind at all? I don't know if you saw the last tweet, but as of literally like two days ago, we're still cashflow positive. We're still fucking profitable. We're still profitable.

Starting point is 00:37:02 Let's give it up for the profitable. Let's hear it. It sounds... So you're charging for the. We're still profitable. Let's give it up for the profitable. Let's hear it. It sounds, yeah. So you're charging for the product, and people are paying. Are they at all satisfied? Or do they feel like they got scammed? Of course they're satisfied, bro. Like, the product works.

Starting point is 00:37:13 You're either using this as a consumer, and it's working because you're passing your interviews, and or if it doesn't work, you're not going to complain to me, because I'm going to go right to your employer and tell them, yo, guess who's complaining about using the product? Like, I'll get you blacklisted if you complain really. Where, how are you thinking about, how do you, how are you guys thinking about

Starting point is 00:37:31 product evolutions? What do you want to add to the product? Obviously you want to help people cheat on everything. Where, where are you going to help people cheat next? We don't care about like the product is going to be led by the virality of the content. We have video ideas right now that we're going to try to push for different use cases. We're going to see which ones go consistently the most viral. If you can make something go more viral, then like you can just build the technology after you have all the attention. So we'll figure out the exact use

Starting point is 00:37:57 cases and exact niches we're going to quintuple down on once these guys get to work. What formats on Instagram Reels are like the most modern in terms of consistently viral? You mentioned like man on the street interviews, what do you do for a living? That's always been fertile ground. What about, I see a lot of those like mobile game ads that look like, you know, you're fighting down some sort of bridge and then you go into the game. It's actually just match three. What are the different formats that you like to pull from? and then you go into the game, it's actually just a match three. Um, what, what, what are

Starting point is 00:38:25 the different formats that you like to pull from every week? There's two new ones. And at that point there's probably 10 to 20 viral trends that is happening. And these cycles so quick, you need to keep your finger on the pulse. These things will like immediately need to be on the ball. And like, if I, if I told you right now, by the time people watch this on YouTube, like it would have all been expired. Well, we're live. So give us the latest and greatest. Like what's going viral today?

Starting point is 00:38:50 Well, right now we got 10 million views using a Snapchat format. Okay. Viral for like the last three years, to be honest. And I think that like, we just have to get people who continuously scroll TikTok like six hours a day. Yeah. But what's the actual format that you use? Like describe the video. What is the hook? Like, break it down for me, like you're explaining the art behind the viral format. There's a caption. It starts with a face, usually a handsome dude or a pretty girl is saying, damn, this interview is starting with the interviewer is starting with the hard

Starting point is 00:39:26 questions. I should have been a CS major, not a business major. Cause people are saying like, bro, like CS is way harder than business. Then it turns around the interviewer asks like, like, Hey, how are you doing? Why should we hire you? And then this guy uses clue to generate a response, but he can't fucking read the spots. So he reads it a little autistically, like, Oh, I read revel in detail. And and then that's that is like another conversation point like people are cooking on the guy cuz yeah, you can't read properly Guys like a you want a really dumb interview using that's great. How are you guys using?

Starting point is 00:39:56 AI generated content internally. I know a lot of these the videos that you guys are creating are just Typical social, vertical video. Do you have an intern that's just generating basically copy and pasting, making a bunch of other- B-roll, Vio3, are any of these tools relevant? Anything clicking? Not yet. I think there's still like a 10% left

Starting point is 00:40:15 before they cross the uncanny valley. And the biggest thing is that people need to think your video is real. That is the difference between 100K views and 10 million views if people think it is real. Yeah like, like that is a difference between 100K views and 10 million views. If people think of this real, Yeah, what about AI CEOs bearish on AI? What about Google needs like 10 more Chinese researchers to like figure it out. And once,

Starting point is 00:40:36 once they push out the latest update, then, then then VO3 will be there. But right now we need real people. Yeah. Well, I mean, what about just using AI as like stock footage replacement? Not, not as the lead in for the video, not the entire video, but just like sprinkled into illustrate a point, you know, an establishing shot of like a building, a helicopter pulling into a building, like that, that historically has been kind of something that you would reach to, uh, you know, Adobe stock video for VO three feels like it's there,

Starting point is 00:41:07 but are you not drawing on that at all yet? If there's a viral format when we need it, maybe we'll use it. But right now, like it's, it's really brain dead to go viral on Instagram. Or mats are not hard. You don't need a helicopter. You need a guy, a camera, a really shitty camera. I need a computer. I mean, what about those kind of AI mashups, like Harry Potter Balenciaga, or the kangaroo with the plane ticket getting on the plane?

Starting point is 00:41:33 AI content can go viral when it's really, when it's inspired almost by a human. It's not entirely AI generated, but it's using the tools effectively to create something that's still catchy. Do you think you'll be using any of that anytime soon? Probably, probably very soon. We're scaling up. Like what you see right now is probably about less than 1% of what the size will be by the end of this year. Like we are profitable. We're not

Starting point is 00:41:56 trying to be profitable. We just keep making so much money. We can't help it. So we're really scaling this shit up. I'm not even trolling you. 1,000 creators are going to be shipping out content. We're doing a complete internet takeover. OK, so why in-house? Why do they even have to be employees? Couldn't you turn this into a multi-level marketing scheme or something, a pyramid scheme? Actually, what we're going to do, that's exactly.

Starting point is 00:42:17 Oh, that's what you're going to do. OK. MLM. MLM. MLM. I love it. Are you guys worried that you could be infiltrated by journalists? I'm sure they're circling the house right now. The hit. AI after MLM. Are you guys worried that you could be infiltrated by journalists?

Starting point is 00:42:26 I'm sure they're circling the house right now. The hit pieces are going to come. We're doing a softball interview right now. The person that's brave enough to try to do a hit piece on the Cluley army is... It's going to be... I bet they're dying too. Look, more eyeballs is better. There's no company that ever died from a founder being too controversial.

Starting point is 00:42:46 You got deal fucking infiltrating with genuine spies and they're still doing fine, bro. You got workers, 17 guys. They're still kicking like no company ever dies from being too controversial. You die because you don't make enough fucking money. Yeah, yeah, yeah, yeah. Speaking of making money,

Starting point is 00:43:02 what's the pricing model right now? Are you doing anything on price discrimination? Is there a super high tier? If you get a whale, what does it clearly whale look like? Can I spend $2,000 a month on this service? Yeah. You should add a tipping feature too. People should be able to tip you guys if they have a good experience, you get the job, really financialization pay as you go, high interest rate loans, just really push it, make it sports gambling in there. Maybe there's throw it all

Starting point is 00:43:24 in. Yeah. I mean, it's $20 a month for a consumer, $100 a year. And our top line revenue is really being driven up by Enterprise. You're going to have to talk to the sales team to get a custom quote. But you know, like, there's a lot of money. Are you serious? What is that more on the sales side? What? Who are the Enterprise? So you sell the SDRs? You guys laugh because you think I can't sell enterprise because I'm

Starting point is 00:43:48 no, I don't believe it. I trust like these 20, these fortune 500 CEOs, like these are like 35 year old dudes who sit there, school reporter laughing at my posts. Yeah, yeah, yeah. No, no, no. It seems it seems legit. It makes sense. No, I believe it. But, but I mean, you're not going even higher tier. Like what's the $2,000 a month, clearly vision for consumer. There's a lot more we can do with more compute, but right now we're like, to be honest, I didn't expect to grow this fast. The end team is quite small. I'm like, spend a lot of time trying to hire or more competent engineers. We have a lot of backlog tasks that we need to fill out, especially for this last contract that we signed. So we're full time focusing on the one big guy that we got right now.

Starting point is 00:44:26 And after that, um, then we'll, we'll try and scale this up. But right now we're focused on the one, one big client that we signed. Yeah. Uh, talk about your compensation strategy that people want to know. Uh, you, you said, uh, you can raise infinite capital and you're so confident, I believe you, uh, but, but I'm curious to get some more insight there. Bro, I feel like it's so retarded to be a company. Sorry, am I allowed to say that?

Starting point is 00:44:49 No, you're not allowed. No, this is a family friendly show. It's very stupid to be a company. I'm trying to race to the bottom to see how little you can pay your employees. Bro, if I'm making hella money, we're all making hella money. I'm trying to pay them more to see if, man,

Starting point is 00:45:04 maybe tomorrow we'll start being cashflow negative. But I can make a whole bunch of money like like it's I'm trying to pay them more to see if man like maybe tomorrow will start being like cash flow negative but I would like to pay these guys what they're worth and the output is fucking insane we did 10 million UGC views and what like eight days like like you don't see this sort of traction in any company and you don't see killers like this in any company unless you paying these motherfuckers like what they're worth, bro. Like, I don't know what like 135 maxed out contracts, maxed out contracts. Yeah. Uh, what about devices?

Starting point is 00:45:30 I mean, it seemed like this would be a natural fit for some sort of AI wearable or other platform. Um, is there an app coming or are you interested in what's happening with Johnny Ive and open AI? What's, what was your take on the device world? We're very interested in the hardware space. We've got like a million things cooking on hardware. We've got people in the garage right now working on,

Starting point is 00:45:52 you don't even know about bro. Like we're bringing manufacturing back to America and it all starts at the Cluely garage. Cluely garage. Let's go, I'd love to see it. Nobody, you know, they doubted, but you guys are re industrializing America. We're gonna brain chips down there.

Starting point is 00:46:09 Yeah. Brain chips. Brain chips. That's the future. There we go. There we go. The new Neuralink. Yeah. I mean, I, you know, there's a world in the future where you guys actually just roll up Neuralink and open AI for sure. Fully umbrella. Yeah, definitely. It's possible. I did to offer acquisitions for, for both of those companies. Yeah, it's in the roadmap. It's on the road All right. This has been a lot of fun. I'm excited for you guys. It is And I have no doubt that you'll go from, you know, 10 million views a week 10 million views a week to a hundred And I'm excited to see you guys hit that billion view mark very soon. So keep it up

Starting point is 00:46:43 We are all very entertained and rooting for you. I love the energy thanks man we appreciate you joining. Better guys keep having fun. Bye. Next up we have Keon from Nucleus coming on with a big announcement something like 10 years in the making close to it maybe seven years we'll bring Keon in. Let's play some soundboard.

Starting point is 00:47:05 How you doing? Welcome to the show. That's a great intro. The tweets are flying. Oh my God, you guys seeing this? Yeah. You seeing this? Seeing this? Break it down for us.

Starting point is 00:47:20 Explain what's happening. There's nothing like a launch day. I'm trying to figure out guys, is this, is it Gattaca or is it Theranos? Because people can't, they can't make up their mind. Oh yeah, we're going to find out. They're trying to figure it out, they're trying to figure out what's going on. Let's give some context to the audience.

Starting point is 00:47:34 Nucleus has launched Nucleus Embryo, the world's first genetic optimization software. Basically parents can give their children the best start in life. They can pick their embryo based off of physical characteristics like eye color, IQ, they can go to disease risk like cancers or heart disease. Basically really believe parents can get all the information that exists about their embryos and they can pick however they want. For me personally, it's been 10 years in the making.

Starting point is 00:47:58 The journalists actually covered it today in the Wall Street Journal was a journalist that covered my gene editing in a warehouse in Brooklyn 10 years ago. Yes, let's see. Wow. Overnight success. You know, it's a long time in genetics. Yeah, so break down the state of the art because like embryo screening exists.

Starting point is 00:48:15 I think most parents in America, at least if they have the means, do some sort of screening while the embryo is growing. Is this purely for IVF? Is this just going a layer deeper? And then I want to talk about the regulatory and FDA component as well. Yeah, let's talk about it. So basically, if you go to an IVF clinic today,

Starting point is 00:48:35 you're a couple. The vast, vast, vast majority of clinics. The first thing I actually understand is that the IVF process is principally controlled today by clinicians or doctors. Honestly, couples don't have as much liberty in our perspective as they should. It's their baby, it's their embryos, they should have the right to that information and they should be able to pick off any vertical.

Starting point is 00:48:52 However, today in the clinic, what generally happens is people test embryos for very rare and severe genetic conditions. For example, like chromosomal abnormality like Down syndrome, for example, or even a condition like cystic fibrosis or Tay-Sachs or PKU. These are conditions that are very rare that maybe someone might have a carrier for cystic fibrosis, but again, it's pretty rare. Then there are conditions that we've all heard about, things like breast cancer, things like cornea artery disease, the things that actually kill the vast majority of people today, right?

Starting point is 00:49:20 Chronic conditions kill the vast majority of people today. Those conditions are just not tested for in the clinic, even though we have very good science actually that can make those predictions. How do we know this as a DNA company as well? That's what we do, right? We build models that predict disease and the way you test those models in adults. So we go from adults to embryos is actually because we can basically well validate these models to show that they work in both the embryonic context and in the adult context. And so what we're really doing is we're going from okay, instead of just looking for really

Starting point is 00:49:45 severe like down syndrome cystic fibrosis, why not do breast cancer? Why not do heart disease? Why not do colorectal cancer? Why not do schizophrenia? Why do Parkinson's? But then why stop there? And this is really the important thing because ultimately, you know, if you think about disease and traits, the extreme version of any trait is actually a disease, right?

Starting point is 00:50:03 Height is a great example of this. One extreme end is like, you know, John, for example, he's like Mark, then the other end is like me, dwarfism, right? It's like both ends, okay? So, you know, so, you know, IQ is another example of this. One end is like, you know, autism, the other end, it can actually be some sort of, you know,

Starting point is 00:50:20 a cognitive basically challenge that people have. And so when you think about it, when you start realizing that people have drawn a line in the sand saying, you can get rare diseases, you can get common diseases, but then they really said, you can't get any traits like height, even though the best predictor we have today, actually in the world,

Starting point is 00:50:36 the best polygenic predictor is for height. So as a company, we've kind of completely reimagined this and said, wait a second, what's going on here? You should have access to the entire stack. Rare diseases, we do. Cystic fibrosis, common diseases like breast cancer, and also traits all the way up to something like IQ. Yeah, so I mean, that test,

Starting point is 00:50:54 are you just giving people the data? Because I imagine that once you get into particular recommendations, that's more of what I would expect a licensed doctor to need to do. Well yeah, my sense is that you can allow people to get the data from their doctor and then feed it into nucleus, is that correct?

Starting point is 00:51:13 So that is correct, and actually we have a couple, there was like 10 announcements today, you know how we do it, we like to do 10 announcements in one day, we are actually very, very excited to announce a huge partnership with Genomic Prediction. So Genomic Prediction's actually the oldest embryo testing company that exists. They've done genome-wide testing embryos for almost a decade at this point, and I think they've done over 120,000 couples for PGTA, which is a specific kind of test.

Starting point is 00:51:36 And so we're actually partnering with them, so we make it very easy for Genomic Prediction customers to request their files and actually port it over to Nucleus. But really, this isn't just for genomic prediction customers. Anyone who's undergoing IVF can go to their clinic and say, I want my embryos data. You can take that data. You can upload it to nucleus and then all of a sudden, you know, the application of your DNA of DNA makes this technology universally, uh, basically universally accessible.

Starting point is 00:52:00 Now how much of, how much of the benefit is, is actual, uh, algorithmic analysis bringing in other data points to contextualize the data versus just better UI and better hydration of existing text? Because we had a friend on the show who was talking about getting some medical results from a doctor. The doctor's office was closed. It took two days until the doctor was office was closed, it took two days until the doctor was gonna be able to interpret the results.

Starting point is 00:52:28 He was able to just take a photo, upload it to chat GPT, and say, hey, is this really, really bad? Should I be panicking? Because it seems somewhat out of the range. And chat GPT was able to say, hey, you still gotta talk to the doctor, but this isn't the craziest thing I've ever seen. This is way out of distribution. And so that's almost like a pure UI layer,

Starting point is 00:52:47 but extremely valuable. I know it might not be like the right narrative for some people that it's like not as innovative, but I think that like all that matters at the end of the day is giving people benefit. It's always both. It's always both. You fundamentally technology just for technology sake, it's not siliconized about, right? Siliconized by making something that people want. Okay. And people can actually use. So you think about the nucleus innovation, it's, it's not siliconized about, right? Siliconized about making something that people want, okay? That people can actually use.

Starting point is 00:53:06 And so if you think about the nucleus innovation, it's too prompt, okay? One is in the informatics, right? I've been doing this for five years. I almost, I would argue to myself that I probably spend too much time developing the science. Science in a nutshell isn't actually very useful. You need to expand it, access to it.

Starting point is 00:53:21 So on that point, we do multiple different kinds of analyses that make it such that we can actually provide the most comprehensive analyses that exist today, So you need to expand that access to it. So on that point, we do multiple different kinds of analyses. They make it such that we can actually provide the most comprehensive analyses that exist today. But moreover, and this is really, I think, a key point to your point, John, is people understand them. People can see them. I mean, you can pull up the platform.

Starting point is 00:53:36 I'm not sure if you guys have shown it already, but it's very easy to sort, compare your embryos. You can actually name your embryos, you can stack rank your embryos, you can understand what the score means. We lead with overall risk or we tell you, for example, instead of saying you're in the 99% top for genetic risk for a condition, which, you know, what does it actually mean? We say, hey, you have a 5% chance or the like of, let's say, schizophrenia or some other condition. In other words, by leaving overall risk, people have a much greater intuitive understanding

Starting point is 00:53:59 of the results we're communicating to them. We have genetic counselors on hand. So this really is a, what are we showing here? Are we showing something? Are we showing the- Yeah, yeah, we pulled up here. We're showing actually-. We have genetic counselors on hand. So this really is, what are we showing here? Are we showing something? Are we showing the- Yeah, yeah, yeah, we pulled up here. We're showing actually- Pull up your website. That's another thing.

Starting point is 00:54:09 That's a fun one. That's an Easter egg. That's an Easter egg. That's the kind of approach that we're taking here. And I think consumers are responding to it, right? People want to have access to their data. The clinician, the doctor shouldn't decide what embryo to implant.

Starting point is 00:54:22 You should. Okay, so talk to me about what requires FDA approval, obviously new medical devices. Like if you were developing a machine to take in an embryo and sequence the DNA, I would expect that the FDA would want an approval for that medical device. But if you are taking data and just showing it

Starting point is 00:54:42 to a customer in a different UI, that feels like probably a very light FDA process, and then there's probably a continuum in the middle where once you're making a recommendation, they have rules around that, right? We as a company do not tell you which embryo to implant. Sure. You know, basically parents, the couple,

Starting point is 00:54:59 has complete agency to decide how they want to use the information to implant their embryo. Moreover, let's be clear, height, right? I mean, can a height analysis be a medical device? It doesn't even make sense, right? IQ, height, these traits, for example, we all, you know, traits are something that I don't think actually belongs in even the kind of infrastructure thing about medical care, right?

Starting point is 00:55:18 These are things that go beyond medical care. These are things like, you know, that people just kind of intuitively know and that there are DNA tests done every single day due to see for these analyses, because they're not disease analyses, right? So we do both diseases and traits to be clear. My point is many of these innovations that you have to wonder, like, you know, should the government say if someone can or cannot pick their embryo based off height? That doesn't seem right to me.

Starting point is 00:55:38 I think it should be in the complete liberty of the individual to decide that. Yeah, but I mean, we're a democratic country. And so if, you know, a huge swath of the population says that the FDA should review that type of test or that type of analysis. Ice analysis? It could happen. I mean, the FDA reviews all sorts of different stuff. And so I guess the question shifts to like,

Starting point is 00:56:01 do you expect a change from FDA on the way these analysis tools are regulated? I think right now the most important thing is just putting these high quality, rigorous science results in people's hands and then helping them basically have healthier children, helping them give their child the best start in life. I think that generally speaking, that people should have more liberty, more choice in medicine. I think the broader longevity trend

Starting point is 00:56:32 actually touches on that point as well. So that's what we're excited to do at Nucleus. Yeah, I mean, the fact that you're partnering with a company on the actual medical device side, they are doing the sequencing of the embryos. That really takes it out of the Theranos question entirely in my mind. I feel like you should be beating the drum there

Starting point is 00:56:49 a little bit more. It's like, we didn't say we created some new device, but I don't know if you have to find this. We ship, we ship. That's the difference. We ship, it's law and law, baby. Don't look at it, go use it. That's the evidence.

Starting point is 00:57:00 It's the truth, okay? I love the visual of John and his wife selecting between embryos and it's like six, 10 or seven, two tough choice. Well, if we go with the six, 10, he has, you know, potentially fly commercial once in his life. Actually, we can actually play this game right now. Okay. Here, we're going to play a game right now.

Starting point is 00:57:24 I'm going to put it in the chat. Okay. Your embryo right now. Okay, here, we're gonna play a game right now. I'm gonna put it in the chat. Pickyourembryo.com. Okay, everyone listening to this, pickyourembryo.com. I'm gonna go to it. Oh my God, here we go. Little Easter egg here. Okay, let's see what's more important to you, John, intelligence or muscle strength?

Starting point is 00:57:36 Come on. Why do you jack up too smart? Oh, absolutely, muscle strength. Let's go, we're the future's bodybuilding. Let's go. Okay, a hider last week. You have a lot of hiders, right? John would take a, he would happily have a five two sun

Starting point is 00:57:46 if he had, you know, top 0.01% bodybuilding genetics. Exactly, yeah. Okay, so which one, your lifespan or height? Come on, lifespan. Lifespan, let's go. Let's go, let's go, let's go. Maybe low depression, you gotta be golden retriever mode. You gotta be-

Starting point is 00:58:02 You need low depression. You need low depression. Let's go low OCD I don't mind bouncing around a bunch. Okay Anxiety, let's go high risk taking I got an audio to the enduring athlete. Let's go, physically strong, cautious, built to last. Yeah, this is great. Is this driving a lot of attention, a lot of downloads, is this going viral yet?

Starting point is 00:58:31 This seems like something that's designed to be shareable. I think we just dropped it right now. Technology brothers, we got you the exclusive. Let's go, let's go. There you go. Let's put it out there. You know they can pick your embryo, people say, what's it like, maybe you're not doing IVF yet.

Starting point is 00:58:40 No problem. Only 9% of people choose Nadia. Okay, well, we're contrarian. We like that here. Yeah, that's fine. That's great. Oh, well, well, congratulations on the news. Congratulations on the launch. Yeah, the pace is wild. Last last thing what's going on with? Have you seen these just blood billboards? Oh, yeah, they're all over LA. So so there is there's someone who's running a campaign right now,

Starting point is 00:59:06 Justice for Elizabeth Holmes, claiming that Theranos was not the scam people think it was. And there's a documentary coming out, and there's billboards all over LA for just blood. Like, it's just blood. It's not that big of a deal. And John, to be clear, there's an exclusive on Technology Brothers next week from this person, right?

Starting point is 00:59:24 They're going to tell their story next week just to make sure. You invited them already, I hope. I want to hear their story. We are toying with the idea that someone reached out to kind of connect us. We're thinking about doing it, but we're not 100% sure that it would be appropriate for the show.

Starting point is 00:59:38 Based on the website, I don't know if it's appropriate. Yeah, it doesn't look like it was designed with Figma, so I don't know. We can't quite do it. It's a little bit. Yeah, it doesn't look like it was designed with Figma, so I don't know. We can't quite do it. It's a little bit. The team definitely doesn't use linear. Yeah, but they claim that Elizabeth Holmes has been proven innocent.

Starting point is 00:59:53 And so it's a bold claim. We like to see people making bold claims. By what jury, is my question. Yeah, the jury of someone who knows HTML. Kian, the energy is off the charts. Electric. Yeah. Jury of someone who knows HTML. Kian, always a great time. The energy is Fantastic.

Starting point is 01:00:07 Electric. Electric. Thank you for coming on, firing us up. Congratulations on the launch. We will talk to you soon. I'll see you on Twitter for sure, okay? We'll see you there. Bye guys.

Starting point is 01:00:17 Bye. We have someone from OpenAI here. We're gonna stick to technology and business, but welcome to the show, Mark Chen. Good to see you. Great to see you guys. Thanks for having me. Awkward business, but welcome to the show, Mark Chen. Good to see you. Great to see you guys. Thanks for having me. Awkward day, but I'm excited to talk about Deve Research.

Starting point is 01:00:30 I am excited to talk about AI products. Would you mind introducing yourself and kind of explaining what you do because OpenAI is such a large company now and there's so many different organizations. I'd love to know how you interact with the product and the research side and anything else you can give to contextualize this conversation. Yeah, absolutely. So first off, thanks for having me on. I'm Mark, I am the chief research officer at OpenAI.

Starting point is 01:00:53 So in practice, what that means is I work with our chief scientist, Jakub, and we set the vision for the research org, we set the pace, we hold the research work accountable for execution. And ultimately we really just want to deliver these capabilities to everyone. That's amazing. In terms of research, I feel like a lot of the what happens in the research side is actually gated by compute. Is that a different team? Because what if the researchers ask for a $500 billion data center?

Starting point is 01:01:22 That feels like maybe a bigger, a bigger task. It is useful for us to factor the problem of research and also kind of building up the capacity to do that research. So we have a different team. Greg leads that, which really thinks holistically about data center bring up and how to get the most compute for us. And of course, when it comes to allocating that computer research, you know, Yakov and myself do that. That's great. And so, uh, what, uh, what can you share that's top of mind right now on the research side? There's been this discussion of pre-training scaling wall, potentially,

Starting point is 01:02:01 the importance of reinforcement learning, uh, reasoning. There's so many different areas to go into. What's actually driving the most conversations internally right now? Yeah, absolutely. So, um, I think really it's a really exciting time to do research. Um, I would say versus two or three years ago, I think people were trying to build this very big scaling machine. Yeah. Um, and really the reasoning paradigm changed a lot of that, right?

Starting point is 01:02:24 You know, like reasoning is really taking off and it really opens this new playing ground, right? It's like there are a lot of kind of known unknowns and also unknown unknowns that, you know, we're all trying to figure out. It kind of feels like GPT-2 era, right? Well, where there's so many different hyperparameters you're trying to figure out. And then I think also, you know, like you mentioned, you know, pre-training, that's not to be forgotten either. You know, today we're in a very different regime of pre-training than we used to be, right? Today, we can't treat data as this infinite resource. Yeah, I think a lot of academic studies, you know, they've always kind of treated, you know, you have some kind of finite compute, but infinite data, I don't think there's much study of, you know, like, you know, you have some kind of finite compute, but if in a data, I don't think there's much study of, you know, like, uh, you know,

Starting point is 01:03:07 finite data and infinite compute. And I think, you know, uh, that also leads to a very rich playground for research. Do we need kind of a revision to the bitter lesson? Is that a refutation of the bitter lesson or, or do we just need to re re rethink what the definition of, of scaling laws looks like. No, I don't think of anything as a refutation of the bitter really like our company is grounded in. We want simple ideas that scale.

Starting point is 01:03:35 I think RL is an embodiment of that. I think pre-training is an embodiment of that. And really at every single scale, we face some kind of difficulty of this form. It's just like, you gotta find some innovation that gets you past the next bottleneck. And this doesn't feel fundamentally very different from that. What is, what's most important right now on the actual compute side? We heard from Nvidia earnings that, that we didn't get a ton of guidance on the shift from training to inference usage of Nvidia GPUs, but it feels like it must be coming.

Starting point is 01:04:10 It feels like this inference wave is happening. Are those even the right buckets to be thinking about tracking metrics in terms of the story of artificial intelligence? Because, yeah, I mean, it's like, if the reasoning tokens are inference tokens metrics in terms of the story of artificial intelligence. Because, yeah, I mean, it's like if the reasoning tokens are inference tokens, but they're what lead to higher intelligent, more intelligent models, like it's almost back in the training bucket again.

Starting point is 01:04:37 What bucket should we be thinking about? And, or are we, how firmly are we in the applied AI era versus the research era? Well, I think research is here to stay. And it's for all the reasons I mentioned above, right? It's such a like a rich time to be doing research, but I do think, you know, inference is going to be increasingly important as well, right?

Starting point is 01:05:02 It's such a core part of RL that you're doing rollouts. And I think, you know, we see 2025 as this year of agents, right? We think of it as a year where models are gonna do a lot more autonomous work. You can let them kind of be unsupervised for much longer periods of time. And that is just gonna put big demands on inference, right?

Starting point is 01:05:23 When you think about kind about our overall vision, we lay it out as a series of steps and levels on the way to AGI. And I think the pinnacle, really that last level, is organizational AI. You can imagine a bunch of AIs all interacting. And yeah, I think that's just going to put huge demands on inference.

Starting point is 01:05:43 On that organizational question, I remember reading AI 2027 and one of the things that they proposed was that the AIs would actually like literally be talking to each other in Slack. Does that seem like the way you imagine agents playing out, like using the same tools as humans instead of kind of- One agent says, I'm going to go talk with teams. Yeah. Talk with Slack.

Starting point is 01:06:08 I'm going to do a little negotiating. But maybe it just happens super, super fast 24 seven or, or is there like a new machine language that emerges? Yeah. I mean, I think one thing that's really helped us so far in AI development is to come in with some priors for, you know, how humans do things. And that's actually, if you bake those priors in, they typically are great starting points. So I could

Starting point is 01:06:32 imagine maybe you start with something that's Slack-like and give it enough flexibility that it can kind of develop beyond that and really figure out the way that's most effective for it to communicate. One important thing though is we want interpretability too. I think it's very helpful for us today that what the agents do is easy for us to read and interpret. I don't think you want that to go away as well. I think there's a lot of benefits just even from a pure debug the whole system perspective, so just let the models speak in a way

Starting point is 01:07:06 that is familiar with us. And you can also imagine we might want to plug in to the system too, right? So whatever interfaces we're familiar with, we would ideally like our model to be familiar with as well. I think it's also pretty compatible with, we hit a big milestone. We got, I think, three also pretty compatible with, you know, we hit a big milestone.

Starting point is 01:07:25 We got, I think, three million paying business users for free for this week. Let's go! Yeah, there we go, let's go. Yeah. Again, I think that... Three gong hits for three million. The gong will keep ringing for a while.

Starting point is 01:07:42 Sorry, we had to do it. I was hoping you would drop a number. Yeah, yeah. Congratulations, that's actually huge. That's amazing. Yeah, yeah, yeah. But I think one big part of that is, we have connectors now, right?

Starting point is 01:07:55 We're connecting into G drives, and I think, yeah, you can imagine, like Slack integrations, things like that. I think we just want the models to be familiar with the ways we communicate and get information. Yeah. Can you talk about benchmarking? It feels like we're potentially- Yeah, do you think about benchmarks at all?

Starting point is 01:08:14 Oh, yeah, a lot. I mean, but I think it's a difficult time for benchmarks, right? I think we used to be in this world where you have these human written benchmarks for other humans, right? And I think we used to be in this world where you have these human written benchmarks for other humans. I think we all have these norms for what are good benchmarks. We've all taken the SAT, we all have a good conception of what it

Starting point is 01:08:33 means to get whatever score on that. But I think the problem is the models are already at the point where for even the hardest human written benchmarks for other humans, it's really near saturated or saturated, right? I think one clear example here is the Amy, like probably the hardest autogradable like human math eval, at least in the US. And yet the models are consistently getting like 90 plus percent on these. And so what that means is, I think there's kind of two different things that people are doing, right?

Starting point is 01:09:13 They're developing kind of model based benchmarks, right? They're not kind of things that we would give to an ordinary human, things like humanities last exam, things like, you know, epic AI that are really, really at the at the frontier of what what people can do. And I think the hard thing is, it's not grounded in intuition, right? Like, you know, you don't have a lot of people who have taken these exams. So it makes it harder to kind of calibrate on whether this is a good exam or not. One of the exciting things that's on the flip side of that is, I really do think we're at the era where models are going to start innovating, right?

Starting point is 01:09:48 Because I think once you've passed the last kind of like the hardest human rating exams, that's kind of at the edge of innovation. And I think you already see that with the models, right? Like they're helping to write parts of papers. And I think the other kind of way that people have shifted is, you know, there's these, you know, ultra frontier evals, but there are also people kind of just indexing on real world impact, right? You look at your revenue, kind of the value you deliver to users.

Starting point is 01:10:15 And I think that's ultimately what we care about. Can you bring that back to interpretability research, like with these super, super hard math evals, for example, are we doing the right research to understand if the thought process mirrors, not just one-shotting the answer, oh, you memorized it or you magically got it correct, but you actually took the correct path, kind of like you're graded for your work, not just the answer. If you're in grade school.

Starting point is 01:10:48 And, and, you know, Dario said that, uh, interpret, interpret, interpret ability research will actually contribute to capabilities and even give a decisive lead. Do you agree with that? What's your reaction to that concept of interpret ability research being very important? Yeah. I mean, we care a lot about it here at OpenAI as well. So one thing that we care a lot about is interpreting how the model reasons, right?

Starting point is 01:11:11 Because I think we've had a very kind of specific and strong view on this in that we don't want to apply optimization pressure to how the model thinks so that it can be faithful in the way it thinks and to expose that to us, you know, without any kind of incentives to cater to what the user wants, right? I think it's actually very important to have that unfiltered view because, you know, oftentimes, like if the model isn't sure, you don't want to hide that fact, right? Just for it to kind of please the user. And sometimes it really isn't sure, right?

Starting point is 01:11:46 And so we've really done a lot of work to try to promote this norm of chain of thought, faithfulness and interpretability. And I think it gives you a lot of a sense into what the model is thinking and, you know, what are the pitfalls that it can go off into if it's not reasoning correctly? That's such an important point.

Starting point is 01:12:05 Cause if you have somebody on your team and they come to you and they say, Hey, you know, I think this is the right answer, but we should probably verify it. It's like, it's still valuable. Totally puts you on the right path. If somebody comes to you a hundred percent confidence, this is, this is the truth. Well, like trust is just destroyed. Yeah, totally.

Starting point is 01:12:23 You guys feel like, you know, safety felt a lot more theoretical a couple years back, right? But like today, you know, like the things that people were talking about a couple years, like scalable oversight, really having the model be able to tell you, like, and convince you that the work it did was right, it feels so much more relevant

Starting point is 01:12:39 right now. 100%. Just because the capabilities are so strong. Yeah, I mean, just personally, I've completely flipped from being like, oh, the safety research is not that valuable because I'm not that worried about getting paper clipped. It just seems like a very low likelihood

Starting point is 01:12:53 that that's kind of like the bad ending, like immediately in this foom and all this crazy, the gray goo scenarios were just so abstract in sci-fi. It just felt like economics will fall into place and there will be like a cold, like a nuclear ending, which is like we didn't build nuclear plants and we just stopped everything because we humans seem to be good at that.

Starting point is 01:13:13 But now that we're actually seeing things like that. Yeah, it's crazy how fast it's been, right? I think my personal story is it's like, what got me into AI was AlphaGo, right? Like just watching it get to that level of capability. Yeah. And you were kind of like, it was such an optimistic and also kind of a little bit of a sobering message, right?

Starting point is 01:13:33 When you saw at least it'll get beat. And I just remember, you know, like we saw the coding models, you know, when we first launched like, I think very OG codecs, you know, with GitHub Co-pilot, it was maybe like under, you know, a thousand Elo on, on Codeforces. And I still remember the meeting where I walked into where the team showed my score and they're like, Hey, with models better than you. You come full circle and it's like, wow, like I put decades of my life into this

Starting point is 01:14:00 and the capabilities are there. So like, if you know, I'm kind of at the top of my field in this thing and it's better than me, like what can it do? Yeah. Yeah. That's amazing. Uh, I have so many more questions on AlphaGo. Are there, uh, are there lessons from scaling, how scaling played out there that you can, that we can abstract, abstract into the rest of AI research. What I mean is, uh, as I remember it, the alpha go training run was not a hundred K H

Starting point is 01:14:32 two hundreds. Uh, but what would happen if we actually did an alpha go style training run? I mean, it would be an economic money pit, right? Like they've had no economic value to do, but let's just say some benevolent trillionaire decides, I'm going to spend a billion dollars on a training run to beat AlphaGo and go even bigger. Is Go at some point solved? Would we see kind of diminishing scaling curves?

Starting point is 01:14:58 Could we throw extra R out? Could we port back everything that we've doing in just general AGI research and just continue fighting it out in the world of Go? Or does that end and does that teach us anything? Yeah. Honestly, I feel like if you really are curious about these mysteries, join our team. Yeah. I mean, really like kind of the central problem of today is RL scaling. When you look at AlphaGo, it's a narrow domain, right? Yeah. I think in some sense that limits

Starting point is 01:15:28 the amount of compute you can pump into it. But even kind of small toy domains, they can teach you a lot about how you scale RL. What are the axes where it's most productive to pump scale in? I think a lot of scaling research just looks like that, whether it's on RL or pre-training. So you identify a lot of different research just looks like that, whether it's on RL or pre-training. So you identify a lot of different variables under which you can scale,

Starting point is 01:15:49 and where you get the best marginal impact for pumping scale there. I think that's a very open question for RL right now. I think what you mentioned as well, it's just like going from narrow to broad. Does that give you a lever to pump a lot more scale in as well? Um, I think when you look at our reasoning models today, there are a lot more broad based than, uh, you know, just being able to kind of an expert system on go. Um, so yeah, I really do think that, um, there are so many levers to scale.

Starting point is 01:16:22 What about move 37? That was such an iconic moment in that AlphaGo Lisa doll match. They placed move 37. It's very unconventional. Everyone thinks it's a blunder. It turns out not to be, it turns out to be critical. It turns out to be, it turns out to be innovation. Do you think we are, we're certainly post touring test in language models. We're probably post touring test in image models. We're probably post-touring test in image generation, but it feels like we're pre-Move 37 in text generation, in the sense that there hasn't been

Starting point is 01:16:54 like a fully AI-generated book that everyone is just, oh, it's the new Harry Potter, everyone has to read it, it's amazing, and it's fully generated, or this image. The images, they do go viral, but they go viral because they're AI. Move 37 in the context of Go did not go viral because it was AI, but like it was actual innovation. So is that the right frame?

Starting point is 01:17:15 Does that make any sense? Yeah, I think it's not the wrong frame. So I think some quick thoughts on that. I think kind of when you have something that's, you know, very measurable, like win or lose, right? Like, go. Yeah, it's like very easy for us to kind of just judge, right? Like, did the model do something right here? And I think the more fuzzy you get, you know, it is just harder, right? Like, when it comes to you, and this is the next Harry Potter, right? Like, you know, it's not a universally loved book, I think, very universal, but you know, it is just harder, right? Like, when it comes to you, and this is the next Harry Potter, right? Like, you know, it's not

Starting point is 01:17:45 a universally loved book, I think, very universal, but you know, there's there's some haters. And yeah, I think it is just kind of hard when it comes to these human subjective things, right? Where it's really hard to put down in words, like what makes you like Harry Potter, right? And, and so, I think those are always going to lag a little bit, but I think we're developing more and more techniques to attack kind of these more open-ended domains.

Starting point is 01:18:14 And I don't know, I wouldn't say that we're not at an innovative stage today. So I think my biggest touch with this was when we had the models compete on the IY last year. So I like it's like the International basically Olympics for for computer science Basically the the top four kids from from each country go and compete and these are really really tough problems Basically selected so that they require some innovative insight to solve. We did see the model come up with solutions,

Starting point is 01:18:52 even to some very ad hoc problems. I think there was a lot of surprise for me there. I was completely off-base about which problems the model would be able to solve the most. I think I categorized their six problems, some of them as more like, oh, this is standard, a little bit more standard, this is a little bit more out of the box. I was like, it's not going to be able to solve this more out of the box one, but it did. And I think it really does speak to these models have the capacity to do so, especially

Starting point is 01:19:25 trained with RL. Now, put that in context of what's going on with Arc AGI. Obviously, OpenAI has made incredible progress there, but it just, when I do the problems, it seems easy. And when I look at the IOI sample problems, I think this would be a 20-year process for me to figure out how to achieve that. And I can do the Arc AGI on my phone. Is this the spiky intelligence concept?

Starting point is 01:19:50 Is this something that a small tweak in algorithmic design, just one shots AGI, Arc AGI, or is there something else going on there that we should be aware of? Yeah, I mean, I think part of this is the beauty of Arc AGI as well, right? Like I think, I'm not of this is the beauty of RKG as well, right? Like, I think, I'm not sure if there's another kind of like human intuitive simpler benchmark, which is for the models. I think really, that's one of the things they really optimize for on that benchmark. I do

Starting point is 01:20:17 think when it comes to models, though, like, there's just a little bit of a perception gap as well, like, you know, models aren't used to this kind of native, you know, like just screen type input. I think there's a lot we can bridge there. Actually, even 04 Mini, it's a state of the art multimodal model in many ways, including visual reasoning. And I think, you know, you're starting to kind of build up

Starting point is 01:20:41 the capacity for the models to take images, manipulate and reason about them, models to take images, manipulate, and reason about them, generate new images, write code on images, and I think it's just been kind of under-focused, but I think when I talk to researchers in the field, they all see this as a part of intelligence, too, and we're gonna continue to focus there. Yeah, is RKGI, kind of in the,

Starting point is 01:21:06 if we're dropping a buzzword on it, is like program synthesis? Is there a world where, I know the tokens, like the images, we see them as renderings of squares in different colors, but when they're fed into the LLM, they're typically just a stream of numbers, effectively. Is there a world where actually adding a screenshot is what's important, like visual reasoning? Yeah, yeah, so I think that could be important.

Starting point is 01:21:31 It's just like kind of, you know, whenever it comes to like textual representation of grids, models today just don't really do that well. Right, and I think it's just kind of because humans don't really ever write down textual representations of groups. Right? Yeah, we have a chessboard, like no one really kind of just like types it out in a grid. And, and so the models are kind of like, undertrained a little bit on on

Starting point is 01:21:58 what that looks like and what that means. So, you know, I think with more reasoning, we'll just bridge the gap. I think with better visual perception, we'll just bridge that gap. Yeah. How are you thinking about the role of non-lab researchers in the ecosystem today? I'm sure you try to recruit some of the best ones,

Starting point is 01:22:19 but the ones that don't join your team. Tell us about the one that got away. Yeah, the one that got away. Yeah, no, I mean, I think it's still actually a fairly good time for specific domains, right, to be doing research. And I think the style is just very different. And you do feel the pull of non-lab researchers into labs because I think they feel like a lot of the burning problems

Starting point is 01:22:43 in the field are at scale, right? And that's kind of one of the unfortunate things too, right? Like when you look at reasoning, you just don't see that happen at small scale, right? There's like a certain scale at which it starts becoming signal bearing and that requires you to have resources, right? But I do think, you know, a lot of the really good work that I've seen, you know, there's experimental architectures, I think a lot of good work is happening in the academic world there, like a lot of study in optimization,

Starting point is 01:23:14 a lot of study in kind of like GANs, you know, there's certain fields where you see a lot of fruitful research that that happens in academia. Yeah, that makes a lot of sense. How about that happens in academia. Yeah, that makes a lot of sense. How about consumer agents? How are you thinking about them? You talked earlier about sort of B2B adoption, and that's all very exciting, but how much do you and the research org

Starting point is 01:23:37 think about breakout consumer agent products? Yeah, that's a fantastic question. I think we think about it a lot. I think that's the short answer. You know, we really do think like this year we're trying to focus on how we can move to the agentic world, right? And when I think about consumer agents, I think like ChatGPD proved that, you know, people got it, right? It's like people get conversational agents when a conversational kind of models. But when it comes to consumer agents, we have a couple of theses

Starting point is 01:24:08 that we've tried out in the world. I think one is deep research, right? I think this is something that can do five to 30 minutes of work autonomously, come back to you and really like kind of synthesizes information, right? It goes out there, gathers, collects, and kind of compresses the information in a form that's useful for you.

Starting point is 01:24:28 A little bit of pushback there, I can see that as a consumer product when someone like Aidan is like, I want new towels, and he uses deep research to figure out what is the best towel across every dimension. But when I think of deep research, yes, it has applications with students, but it's often, you know,

Starting point is 01:24:47 and I guess it could be consumers being like, give me a deep research report on this country and where to travel and things like that. We keep using this flight example, but I haven't actually tried to book a flight with deep research. It's totally possible that it could go and pull all the different flight routes

Starting point is 01:25:01 and calculate all the different delays and all the different parameters of, if I the different delays and all the different parameters of if I fly to this airport in park or I can use valet here or something like that. Yeah. And I guess like when I think of agents, it's deep research is like, you know, curating information on which you can take action on. But it's like at what point is action a part of that sort of loop, right?

Starting point is 01:25:23 Where you can not only curate a list of flights that you want, but then, you know, actually go out and have agency. I think one of our explorations in that space is operator. It's where you kind of just feed in raw pixels from your laptop into, or, you know, from some virtual machine into the model. And it produces, you know, either a click

Starting point is 01:25:44 or some keyboard actions. And so there it's taking action. And I think the trouble is, you don't ever want to mess up when you're taking action. I think the cost of that is super high. You only have to get it wrong once to lose trust in a user. And so we want to make sure that that feels super robust before we get to the point where we're like,

Starting point is 01:26:09 hey, look, here's a tool. That's so different than deep research because you can wind up on some news article and read one sentence that it gets a fact wrong or the commas in the wrong place and the numbers off. But that's just the expectation for just text and analysis. And if you delegated that, yeah, you're going to expect a few errors here and there, oh, that's actually a different company name or that's the, that's an

Starting point is 01:26:34 old data point, there's new data, uh, but very different if I book a flight and you book the wrong flight and I can wind up in Chicago instead of New York. Exactly. And I think the reason why we care so much about reasoning is because I think that's the path that we get reliable agents through. Sure. You know, we've talked about like reasoning helping safety,

Starting point is 01:26:53 but reasoning is also helping reliability, right? It's like, you imagine like, what makes a model so good at a math problem? It's like, it's banging its head against it. It's trying a different approach. And then it's like adapting based on what it failed at last time. And I think that's the same kind of behavior you want your agents to have. It's like, yeah, it's things like adapts and keeps going until it's, and that's, that's the humans do this every day. You're booking a flight, you keep

Starting point is 01:27:18 hitting an error. It's not which or which form you missed, right? And you're just sort of banging your head against the computer and eventually it says, okay, you're booked, right? So I think that's a great call out. Yeah. There's so many more questions we could go into, but I'm interested in the scaling of RL and kind of the balancing act between pre-training RL and inference, just the amount of energy

Starting point is 01:27:43 that goes into getting a result when you distribute it over the entire user base, how is that changing? And I guess, are we post really big runs? Is this gonna be something that's continually happening online? It feels like we're moving away from the era of, oh, some big development, some big run happened, and now we're grouping the fruit fruits of it versus a more iterative process.

Starting point is 01:28:12 Um, yeah, I mean, I don't see why it has to be so right. I think like, if you find the right levers, you can really pump a lot of compute into RL as well as pre-training. I think it is a delicate balance though, between all of these different parts of the machine. And when I look at my role with Jakob, it's just kind of like, figure out how this balance should be allocated,

Starting point is 01:28:33 where the promising kind of like nuggets are arising from and resourcing those. Yeah, it's kind of a, in some sense, I feel like part of my job is a portfolio manager. That's a lot of fun. Well, thank you so much for joining. This was a fantastic conversation. We'd love to have you back and get deeper.

Starting point is 01:28:50 Great hanging, Mark. We'll talk to you soon. Absolutely. Yeah. Peace. Have a good one. Next up, we have Shalto Douglas from Anthropic coming on this show. I'm getting so many.

Starting point is 01:28:59 I just- Jordy is giving us the update on the- No, I'm just getting a lot of messages saying why no one cares about a talk about the drama on the timeline. Well, we do care about AI. We care a lot about AI, but it is a mess out there. Wow. Yeah. The end of the Trump Elon era. Uh, I don't know. Well, maybe we have to get some people on to talk about it tomorrow or something. Anyway, we have Shalto from Anthropic in the studio.

Starting point is 01:29:29 How are you doing? How you doing? Good to see you guys. Hopefully you're staying out of the chaos on the talk. Don't open the time. Don't open, we're doing you a favor. Sweet child. We're just doing it.

Starting point is 01:29:42 We're just doing it right now. Yeah, mute everything. Stay focused on the application layer. Stay focused onute everything. Stay focused on the application layer. Stay focused on the mission. Stay focused on the next training run. Humanity really cannot afford for any AI researchers to open X today. What a hilarious day.

Starting point is 01:29:55 Anyway, I mean. You might have to back by 24 hours, guys. Yeah, how are you doing? What is new in your world? What are you focused on mostly day to day? And maybe it's just a way of an intro? Yeah.

Starting point is 01:30:07 So at the moment, focused really hard on scaling RL. That is the theme of what's happening this year. And we're still seeing these huge gains where you go 10x compute increase in RL, we're still getting very distinct linear gains based on that. And because RL wasn't really scaled anywhere close to how much pre-training was scaled at the end of last year, we have like a basically a gamut of like riches over the course of this year. So where are we in that, in that RL scaling story?

Starting point is 01:30:34 Because I remember the, the some of the rough numbers around like GPT two, GPT three, we were getting up into like, it costs a hundred million dollars. It's going to cost a billion dollars. Like, just rough order of magnitude, not even from Anthropic, just generally, like what is a big RL run cost or how many, are we talking 10K H200s or 100K? Like, are we gonna throw the same resources at it?

Starting point is 01:30:57 And if so, how soon? Yeah, so I think in Dyer's essay at the beginning of the year, he said that a lot of runs were only like a million dollars back in like December. I think you have like deep seek v3 and this kind of stuff like r1, which means that with that's like at least two rooms just to get to the scale of GPT four and GPT four was two years ago. Yeah, right. RL is also perhaps a bit more naively parallelizable and scalable than pre training.

Starting point is 01:31:20 You're pre training, you need everything in one big data center ideally, or you need some clever tricks. RL, you could, in theory, like what the prime intellect folks are doing, scale it all over the world out of it. And so you're held back far less than you are in free training. Sure, sure. So everyone and their mother has a billion dollars now. There are hundreds of thousands of GPUs

Starting point is 01:31:43 getting pumped all over the place. I feel like we're not GPU poor as a society. Maybe some companies need to justify it in different ways, but it sounds like there's some sort of like reward hacking problem that we're working through in terms of scaling RL. What are all of the problems that we're working through to actually go deploy

Starting point is 01:32:05 the capital cannon at this problem? Yes, so I mean, think about what you're asking the model to do in RL is you're asking it to achieve some goal at any cost, basically. Yeah. And this comes with a whole host of behaviors, which you may not intend. In software engineering, this is really easy.

Starting point is 01:32:21 I like to, it might try and hack unit tests or whatever. In much more longer horizon real world tasks, you might ask it to say go make money on the internet. And it might come up with all kinds of fun and interesting ways to do that unless you find ways to guide it into following the principles that you want it to obey, basically, or to align it with your like idea of what sort of best for humanity. And so it's actually, it's a pretty intense process.

Starting point is 01:32:46 It's a lot of work to find out and hunt down all the ways these models are hacking through the rewards and patch all of that. Are we going to see scaling in the number of rewards that we're RLing against, if that makes sense? I would imagine that at a certain point, unless we come up with kind of like the Genesis prompt, go forth and be fruitful or something and multiply,

Starting point is 01:33:15 you could imagine training runs on just knocking down one problem after another, and is that kind of the path that we're going down? I very much think so. There's this idea in which the world becomes an RL environment machine in some respects, because there's just so much leverage in making these models better and better

Starting point is 01:33:34 at all the things we care about, and so I think we're gonna be training on just everything in the world. Got it. And then does that lead to more model fragmentation, models that are good at programming versus writing versus poetry versus image generation, or does this all feed back into one model?

Starting point is 01:33:55 Does the idea of the consumer needing to pick a model disappear, are we in a temporary period for that paradigm? I think the main reason that we've seen that so far is because people are trying to make the best of the capital, like we are all still GPU poor in many ways. Okay. And people are focusing those GPUs

Starting point is 01:34:15 on the sort of like spectrum of wars that they think is most important. And I'm a bit of a big model guy. I really do think that similar to how we saw with large pre-trained models before, with small fine-tuned models made it, like had gains over the sort of GPT-2 era, but then were obsolete by GPT-4

Starting point is 01:34:35 being generally good at everything. I think to be honest, you're gonna see this generalization and learning across all kinds of things. That means you benefit from having large single models rather than specialization or area fine-tuned models. Can you talk a little bit about the transition or any differences between RLHF and just other RL paradigms?

Starting point is 01:34:55 Yes, so RLHF, you're trying to maximize a pretty, like lossy signal, things like airwaves, like what do humans prefer? And I don't know if you've ever tried to do this, like judge two language model responses. I get prompted for that all the time. Right. And I'm always like, I don't want to read both of those.

Starting point is 01:35:12 I'll just click the one on the left. Exactly, exactly. And I click one of the random ones sometimes. Yeah, or I click like the one that just looks bigger, or I'll read the first two sentences, but yeah, I'm not giving straight. I'm not doing my job as a human reinforcer. Exactly. Human preferences are easy to hack.

Starting point is 01:35:29 Yeah, totally. Environments in the world are much truer if you can find them. So something like, did you get your math question right? Is a very real and true reward. Does the code compile, right? Does the code compile, exactly. Did you make a scientific discovery? We've got very little rewards right now, but pretty quickly over the next year or two,

Starting point is 01:35:49 you're going to start to see much more meaningful and long horizon. You're going to see models bribing the Nobel committee to win. Good reward hacking. That's the reward you want to prevent. Right? Exactly. Yeah. Yeah. that's the real nightmare scenario. What about, there's so many different problems that we run into that feel like, it's just really, really hard to design any type of eval.

Starting point is 01:36:18 My kind of benchmark that I use whenever a new model drops is just tell me a joke, they're always bad, or even the latest VO3 video that went viral was somebody said like, uh, standup comedy joke. And it was kind of a funny joke, but it was literally the top result for joke Reddit on Google. And then it clearly just took that joke and then instigated in a video that looked amazing. Um, but it wasn't original in any way. And so we were joking about the RLHF loop for that

Starting point is 01:36:52 is like you have an endless cycle of comedians running AI generated materials and then microphones in all the comedy clubs to feed back what's getting the hafs. But. I mean honestly that would work pretty well. I can't wait. If any comedians want to focus off of the RL loop, I mean. Yeah, yeah. But for some of those less, like as you go down the curve, it feels like each one gets harder and harder to actually tighten the loop.

Starting point is 01:37:18 We see this with like longevity research where it's like, okay, it takes a hundred years to know if you extended a human life.. Like, yes, you could create a feedback loop around that, but every change is gonna be hundreds of years. And so, even if you're on the cycle, it's irrelevant for us in the context that we talk about AI. So, talk to me about, like, are you running into those problems,

Starting point is 01:37:37 or will there be, like, another approach that kind of works around those? So, there are a lot of situations where you can get around this by just running much faster than real time. Like let's say the process of building like a giant app, like building Twitter, right? It's something that would take human months. But if you got fast enough and good enough AI, you could do that in several hours.

Starting point is 01:37:56 Sure. Like realize heaps of AI agents that are building right now. And so you can get a faster reward signal in that way. In domains that are less well specified like humor, I agree it's really, really hard. And this is like why I think in some respects, like creativity is like at the top end of the spectrum, like true creativity is much, much harder to replicate

Starting point is 01:38:14 than the sort of like analytical scientific style reasoning. And that will just take more time. You know what? The models actually are pretty good at making jokes about being an AI. This feels weird, but fresh. Like everything else is kind of a weird copy of something. It just feels like it's derivative basically. It's trying to infer what humor is and it doesn't really understand it, but jokes about being an AI are quite funny. Yeah. I think this also might be, I don't know

Starting point is 01:38:39 if it was directly reward hacking, but I noticed that one of the new models dropped and a bunch of people were posting these like 4chan, like be me memes. And they were, it seemed like they were kind of hacking the humor by being hyper specific about an individual that they could find information on online. And so you're laughing at the fact that it's like, oh wow, that is like something that I've posted about it. It's making a reference,

Starting point is 01:39:02 but it's not really that funny to me. Other than it's just like, wow, they really did its research. Like, it really knows Tyler Cowan intimately, which is cool, but I didn't find it hilarious. Yeah, yeah, yeah, very interesting. Let's talk about some sort of deep research projects and products.

Starting point is 01:39:23 We were talking to Will Brown, and he he was saying like, AGI is here with some of the bigger models, but the time that AGI can feel consistent, it diverges. And so you could be working with someone who's 100 IQ, but they will stay consistent for years as an employee or they'll keep living their life. Whereas a lot of these super smart models are working really well and then after a few minutes

Starting point is 01:39:48 of work the agents kind of diverge and kind of go into odd paradigms. It feels very not human. It feels like just a, they're hyper intelligent in one way and then extremely stupid in the other. What's going on there? What is the path to extending that? Is that more, like having more better planning

Starting point is 01:40:08 and better dividing up the task? Or will this just kind of naturally happen through the RL and scale? Yeah, so there's that jaggedness, right? Which is what you're seeing, is how we call it. And I think that is largely a consequence of the fact that maybe something like deep-suit research is probably being RL'd to be really good at producing a report.

Starting point is 01:40:27 Yeah, but it's never been r l on the like, act of producing valuable information for a company over a week or a month or like making sure the stock price goes up in like, you know, a quarter or something like this, right? Like it, it doesn't have any conception of how that feeds into the broader story at play. It can kind of infer it because it's got a bit of world knowledge from the, you know, the base model and this kind of stuff, but it's never actually been trained to do that in the same way humans have.

Starting point is 01:40:49 So to extend that, you need to put them in much longer running, much like long horizon things. And so deep research needs to become deep operator company for a week kind of thing. Sure. Is that the right path? It feels like the road might be, there's the longest running LLM query used to be just like a few

Starting point is 01:41:12 seconds, maybe a few minutes. And I remember when, uh, when some of the reasoning models came out, people were almost trying to like stunt on it by saying like, Oh, I asked it a hard question. I thought for five minutes now, deep research is doing 20 minutes pretty much every time. Is the path two hours, two days, or are we gonna see more efficiency gains such that we just get the 20 minute results

Starting point is 01:41:35 in two minutes and then two seconds? Yeah, so this is somewhere where inference, in many respects, and prioritization becomes really important. So both, how fast is your inference, if that literally affects the speed at which you can think and the speed at which you can like, like, you know, do these experiments. Also, how easily you can prioritize becomes really important. Like, can you dispatch a team of sub agents to go and do deep research and

Starting point is 01:41:55 like compile like sub reports for you so that you can do everything in parallel, these kinds of like, it's, it's both like, there's an infrastructure question here that feeds up from the hardware and the chips and this kind of stuff to designing better chips for better inference and all this. And an RL question of like, how well can you parallelize and all this.

Starting point is 01:42:16 So I think we just need to compress the timelines, compress the time frames, basically. Yeah, so if I'm like an extremely big model and I'm running an agentic process, like how much am I hankering for like a middle sized model on a chip or like baked down into silicon that just runs super fast because it feels like that's probably coming. We saw that with the Bitcoin progression from CPU to GPU to FPGA to ASIC. Do you think we're,

Starting point is 01:42:46 we're at a good enough point where we can even be discussing that because I, every time I see like the latest mid journey, I'm like, this is good enough. I just want it in two seconds instead of 20. Um, but then a new model comes out. I'm like, Oh, I'm glad I didn't get stuck on that. Right. But, but yeah, like how far away from how far away are we from, okay, it's actually good enough to bake down into silicon. Well, there's a question here of baking it down to silicon versus designing a chip, which is like very suited to the architecture that you care about.

Starting point is 01:43:15 Right. And baking on the silicon, unsure, like, I think that's a bet you could take, but it's a risky one, because the pace of progress is just so fast nowadays. And I really only expect it to accelerate. But designing things that make a lot of sense for the transformers or architectures of the future should make a lot of sense. Wait, there's a big gap though,

Starting point is 01:43:36 transformers or architectures of the future. If we diverge, there's a lot of companies that are banking on the transformer sticking around. What is your view on transformer architecture sticking around for the next couple of years? I mean, look, they stuck around for five years, so they might stick around for a little while. You think about architectures in terms of this balance of memory bandwidth and flops, right? One of the big differences we've seen here is Gemini recently had actually a diffusion

Starting point is 01:44:00 model that they released at an IELTS the other day. Diffusion is inherently extremely flops intensive process. Whereas normal language model decoding is extremely memory bandwidth intensive. You're designing two very different chips depending on which bet you think makes sense. Yeah. And if you think you can make something that does flops like four times faster than diffusion and like four times cheaper than your those code, fusion makes more sense. So there's like, there's this dance basically between the chip providers and the architecture, both trying to build for each other, between the chip providers and the architecture, both trying to build for each other,

Starting point is 01:44:26 but also build for the next paradigm. It's risky. Do you, I don't know how much you've played with image generation, but do you have any idea of what's going on with images in ChatGPT? It feels like there's some diffusion in there, there's some tokenization, maybe some transformer stuff in there.

Starting point is 01:44:43 It almost feels like the text is so good that there's like an extra layer on top almost and that it's almost like reinventing Photoshop. And I guess the broader question is like, it feels like an ensemble of models, maybe the discussion around just agents and text-based LLM interactions shouldn't necessarily be transformer versus diffusion,

Starting point is 01:45:05 but maybe how will these play together? Is that a reasonable path to go down? Well, I think pretty clearly there's some kind of rich information channel between, even if there are multiple models there, there's like, it's conditioning somehow on the other model because we've seen before, let's say when models use mid-journey to produce images,

Starting point is 01:45:24 it's never quite perfect. It can't perfectly replicate what went in as an input. It can't perfectly like adjust things. So there's a link somehow, whether that's the same model producing tokens plus diffusion, I don't know. Like yeah, can't comment on what OpenAIR is doing there. Yeah.

Starting point is 01:45:40 Yeah, yeah. Are there any other kind of like super wild card, long shot research efforts that are maybe happening even, even in academia where, I mean, this was the big thing with, uh, what was his name? Gary. Uh, he was talking about, I forget what it was called. Symbolic symbol manipulation was a big one. And I feel like, you know, you can never count anyone out because it might come from behind and be relevant

Starting point is 01:46:06 in some ways. Um, but, but are there any other research areas that you think are like purely in the theory domain right now that are worth looking into or tracking that, you know, low, low, low probability, but high upside if they work. That's how fun this is tough one, this is a tough one. But we'll say it's not a symbolic thing. It's crazy how similar transformers are to systems that manipulate symbols.

Starting point is 01:46:31 What they're doing is they're taking a symbol and they're converting it into a vector and then they're manipulating and moving information around across them. This whole debate that all transformers can all represent symbols and they can't do this, it's not real. So Garry Mark is underrated or overrated, I guess.

Starting point is 01:46:51 Overrated. Yeah, yeah, but I mean, if you twist it so much, you wind up with saying, well, really, the transformer fits within that paradigm, and so maybe it's, you know, it's, you know, it like the rhetoric around it being a different path was maybe false the whole time. Yeah, something like that. But as I remember that debate, it was really the, the idea of compute scaling versus almost

Starting point is 01:47:23 like feature engineering scaling and, and will the progress scale with human hours or GPUs essentially and that has a very different economic equation and it feels like there's been some rumblings about maybe with a data wall we'll shift back to being human labor bound but do you think that there's any chance that that's relevant in the future, or is it just algorithmic progress married

Starting point is 01:47:50 with bigger and bigger data centers in the future? So I'm pretty bitter lesson built, hence that I do think removing as many of our biases and our clever ideas from the models is really important, just freeing them up to learn. Now obviously there is clever structure that we put into these models such that they're able to learn in this extremely general way. And but I am more convinced that we will be compute bound than we will be like human

Starting point is 01:48:17 researcher out human research, our bound on this kind of thing. Like, we're not going to be feature engineering and this kind of stuff. We're going to be trying to devise incredibly flexible learning systems. Yeah, that makes sense. On the scaling topic, part of my worry is that the ooms get so big that they turn into these mega projects that are,

Starting point is 01:48:44 at a certain point you're bound by the laws of physics because you have to move the sand into silicon chips and you have to dig up the silicon and at a certain point. Yeah, there's only so much sand and like the math gets really, really crazy just for the amount of energy required to move everything around to make the big thing. Where are you on how much scale we need to reach AGI, whether or not we will see

Starting point is 01:49:09 like the laws of physics start acting as a drag on progress because it certainly feels exponential, we're feeling the exponentials, but a lot of these turn into sigmoids, right? So I think we've got what, like two or three more ooms before it gets really hard. Leopold has this nice table at the end of his situational awareness where I think like 2028 or something is when under really aggressive timelines that you get to

Starting point is 01:49:34 20% of US energy production. Yeah, it's pretty hard to go exponentially beyond 20% of US energy production. Now, I think that's enough. Every indication I'm seeing says that's enough. Now, there might be some complex data engineering, we're engineering this kind of stuff that goes into, there's still a lot of algorithmic progress left to go, but I think that with those extra OOMs, we get to basically a model that is capable of assisting us in doing research and software

Starting point is 01:50:05 engineering. Yeah. Which is the beginning of the self-reinforcement. Yeah. Interesting. Is that just a coincidence? Like this feels like one of those things. This feels like one of those things where like the moon is the exact same size as the sun in the sky. It's like, Oh, it just happens that AGI happens within this time. Like, Hey, well, did you, have you unpacked that anymore? Because it feels convenient. Not to, you know, I know last month. There's a lot of weird conveniences are like weird. It's a good sci-fi story, let's say.

Starting point is 01:50:30 Totally. You know, we've got, you know, Taiwan in between China and the US, and it produces the most valuable material in the world. It's locked between the two. Incredible plot. Yeah. Incredible plot.

Starting point is 01:50:40 Yeah, really bad for the people that don't think of, that don't believe in simulation theory. It really feels like, oh, this is script theory. It really feels like it's a scripted, uh, it's fascinating. Um, well, I talked to me more about, um, uh, getting to an ML engineer in AI and, and kind of that reinforcement, I imagine that you're using AI code gen tools today and, and, and thropic is broadly and everyone is but but what

Starting point is 01:51:06 are you looking for and what are the what's the shape of the the spiky intelligence where do they fall flat and what are you looking to kind of knock down in the interim before you get something that's just like go yeah so I mean we definitely use them the other night I like I was a bit tired I asked to do something just sat watching it in front of me working for half an hour it was great. It was truly weird experience, particularly when you look back a year ago, and we're still copy pasting stuff between a chat

Starting point is 01:51:31 window and you know, a code file. Yeah. What I like meters evals for this kind of stuff. So they have a bunch of evals where they measure like the ability to write a kernel, the ability to run a small experiment and improve a loss. And they have these nice progress curves versus humans. And I think this is maybe the most accurate reflection of like what will take for it to really help us during progress. And there's a

Starting point is 01:51:55 mix here, like, where they're not so great at the moment is like large scale distributed systems engineering, right, like debugging stuff across heaps and heaps of accelerators and like the way the feedback loops are slow. And like, if your feedback loop is like an hour, then it's more you spending the time on doing something. And feedback is 15 minutes. And for context there, the hour long feedback loop is just because you have to actually compile and run the code across everything.

Starting point is 01:52:20 Exactly. You need to spin up all your machines or you need to like, you need to like run it for a while to see if something's gonna happen. Like at that point in time, you're still cheaper than the chips. So you're, you're, you're sort of, it's better than you do it. Yeah. But for things like your kernel engineering, offer, like, you're actually even just understanding these systems incredibly helpful. Like I one thing I regularly do at

Starting point is 01:52:42 the moment is in parts of the code base in like languages that I'm unfamiliar with or something like this, I'll just ask it to rewrite the entire file but with comments on every line. Game changing. It's like, yeah, or just come through like thousands of files and explain how everything interacts to me, draw diagrams, this kind of stuff.

Starting point is 01:52:58 It's really, yeah. Yeah, how important is a bigger context window in that example you gave that feels like something that's important and yet it just naively like Google's the one that has the million token context window. I imagine that all the other frontier labs could catch up, but it seems like it hasn't been as much of a priority as maybe like the PR around it sounds like.

Starting point is 01:53:19 Is that important? Should we be driving that up to like a trillion token window? Is that just gonna happen naturally? There's a nice plot in the Gemini 1.5 paper where they show the loss over tokens as a function of context length and they show that the loss goes down quite steeply, actually, as you put more and more and more

Starting point is 01:53:36 like over code base into context, you get better and better and better at predicting the rest. Yeah, that makes sense. But for context length, it's a cost. The way transformers work is that there's, you know, you have this memory that is proportional, the KB cache is proportional to how much context you've got. And so you can only fit so many of those

Starting point is 01:53:55 into like your various chips and this kind of stuff. And so a longer context actually just costs more because you're taking up more of the chip and you're sort of like, you could have otherwise been doing other requests. Bringing it back to the custom silicon, is that a unique advantage of the TPU? Is that something that Google has thought about and then wound up to put themselves in this advantage position or is it a durable advantage even? Yeah. So TPUs are good in many respects,

Starting point is 01:54:20 partially because you can connect hundreds or thousands of them really easily across really great networking. Whereas only recently has that been true for GPUs. Like, yeah, with NVLink and like the MVL 72 stuff. So it used to be like eight GPUs in a pod, and then like you connect them over worse in the cat. And now you can be 72. And then the brace down with Google, you can do like 4,000 8,000 of a really high bandwidth interconnect in one pod.

Starting point is 01:54:45 And so that is helpful for things like just general scaling in many respects. I think it's doable across any chip platform, but it is an example of somewhere that being fully vertically integrated is using a benefit. Yeah, that makes sense. Talk to me about Arc AGI. Why is it so hard?

Starting point is 01:55:04 It seems so easy. It does seem easy, doesn't it? Well, it certainly seems like more evaluatable than tell me a funny joke, right? Yeah, yeah, and I mean, I think if you RL'd on ArcGi, then you'd probably get superhumanized pretty fast. But I think we're all trying not to RL on it so that it functions as an interesting held out. Sure

Starting point is 01:55:25 Okay, wait, is that just an informal agreement between the labs? We try and have a sense of honor between us That's amazing. How many people on earth do you think are getting the full potential out of the publicly available bottles? Because we're now at a point where we have, you know Billion plus people are using AI almost daily. And yet I have to, my sense would be it's maybe like 10,000, 20,000 people on the entire planet are getting that sort of full potential. But I'm curious what your assessment would be.

Starting point is 01:55:57 Yeah, I completely agree. I mean, I think that even I don't get the full potential out of these models often. And I think as we shift from you're asking questions and it's giving you sensible answers to you're asking it to go do things for you that might take hours at a time and you can really like paralyze and spin that we're going to hit like yet another inflection point where even less people are like really effectively using these things

Starting point is 01:56:20 because it's basically going to require you to like, it's like a like Starcraft or Dota, like it's going to be like your APM of like managing all these agents and that's totally process yeah so Starcraft is such a good example you think you're just absolutely crushing it and then you realize like there's an entire area of the map you're just getting destroyed it's such a good it's such a good comp that's's great. Anything else, Jordy? I think that's it on my side. I mean, I would like this to be an evolving conversation. Yeah, this was fantastic.

Starting point is 01:56:51 We'd love to have you back and keep chatting. Absolutely, it was really fun. Love to be back on class. Yeah, we'll talk to you soon. Cheers, Sholta. Have a good one. Bye.

Your Ad Here

TBPN Live - Weekly Recap | Elon vs. Trump, Ukraine's Drone Attack, Cluely Update & OpenAI CRO

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.