TBPN Live - Grok 4 Launch Breakdown, OpenAI to Release Web Browser | Chris Paik, Will Bruey, Joel Becker, Dylan Parker, Eric Olson, Ghita Houir Alami, Elliot Hershberg, Karim Atiyeh
Episode Date: July 10, 2025
(02:28) - Grok 4 Launch Breakdown
(36:09) - OpenAI to Release Web Browser
(50:31) - Apple Plans to Release New Apple Vision Pro Model
(58:48) - Chris Paik, General Partner at Pace Capital... and former Partner at Thrive Capital, discusses the evolution of human-computer interaction, emphasizing the potential of technologies like eye tracking and gesture recognition. He explores the rise of VTubing and YouTube's impact on digital content, highlighting how virtual avatars are reshaping the creator landscape. Paik also introduces the "atomic value swap" framework for assessing market fit and business model alignment, stressing the importance of balanced value exchanges to ensure platform success.
(01:33:50) - Will Bruey, co-founder and CEO of Varda Space Industries, discusses how microgravity enables the creation of pharmaceutical formulations not possible on Earth, as gravity affects crystal growth and particle size distribution. By manufacturing drugs in space, Varda aims to produce purer and more uniform medications, potentially improving patient outcomes. Bruey also outlines the company's plans to scale up production and establish reentry sites globally to meet the growing demand for space-manufactured pharmaceuticals.
(01:53:02) - Joel Becker, a researcher at METR, discusses a study evaluating the impact of AI assistance on experienced open-source developers working on large, long-standing projects. Contrary to expectations, the study found that developers who used AI tools were actually slowed down, rather than sped up. Becker emphasizes the need for further research to understand these findings and to assess the potential for AI systems to autonomously improve their own capabilities.
(02:13:07) - Dylan Parker, Co-Founder and CEO of Moment, discusses his company's recent $36 million Series B funding led by Index Ventures, the evolution of fixed income trading from manual to electronic systems, and Moment's role in providing modern fixed income infrastructure for financial institutions, including a partnership with LPL Financial.
(02:22:35) - Eric Olson, co-founder and CEO of Consensus, an AI-powered search engine for academic research, discusses how Consensus leverages large language models to provide researchers with faster, evidence-based answers from peer-reviewed journals. He highlights the platform's dedicated focus on academic content, enabling more intelligent searches and citation-forward interfaces tailored for researchers. Olson also addresses the challenges of accessing paywalled content and emphasizes the importance of open access to scientific literature.
(02:33:27) - Ghita Houir Alami, co-founder of ZeroEntropy, holds two master's degrees in applied mathematics from École Polytechnique and UC Berkeley. She discusses her journey from computer vision to large language models, leading to the creation of ZeroEntropy, which focuses on enhancing retrieval systems for AI by building search tools for Retrieval-Augmented Generation (RAG) and agents. Houir Alami emphasizes the importance of precise and efficient retrieval to prevent AI hallucinations and highlights the company's recent release of a reranker model to improve search accuracy.
(02:42:53) - Elliot Hershberg is a biotech scientist, writer, and investor who has contributed to cancer vaccine design, developed computational tools for spatial genomics, and worked on genome browser software.
He authors the "Century of Biology" newsletter and has served as Biotech Partner at Not Boring Capital, focusing on synthetic biology investments. In the conversation, Hershberg discusses the integration of artificial intelligence in biotechnology, highlighting its transformative impact on drug discovery and the development of innovative medicines.
(02:52:22) - Karim Atiyeh, co-founder and CTO of Ramp, discusses the launch of Ramp's first AI agent designed to streamline corporate expense management by automating decisions between finance teams and other departments. This agent, knowledgeable about company expense policies and transaction details, reduces the need for manual approvals and enhances efficiency. Atiyeh also highlights the agent's ability to gather contextual information from various sources, such as calendars and emails, to make informed decisions, thereby minimizing delays and improving compliance.

TBPN.com is made possible by:
Ramp - https://ramp.com
Figma - https://figma.com
Vanta - https://vanta.com
Linear - https://linear.app
Eight Sleep - https://eightsleep.com/tbpn
Wander - https://wander.com/tbpn
Public - https://public.com
AdQuick - https://adquick.com
Bezel - https://getbezel.com
Numeral - https://www.numeralhq.com
Polymarket - https://polymarket.com
Attio - https://attio.com/tbpn
Fin - https://fin.ai/tbpn
Graphite - https://graphite.dev

Follow TBPN:
https://TBPN.com
https://x.com/tbpn
https://open.spotify.com/show/2L6WMqY3GUPCGBD0dX6p00?si=674252d53acf4231
https://podcasts.apple.com/us/podcast/technology-brothers/id1772360235
https://www.youtube.com/@TBPNLive
Transcript
Discussion (0)
We are live from the TBPN Ultradome, the Temple of Technology, the Fortress of Finance, the Capital of Capital.
Today we are covering the Grok 4 launch. We're gonna break that down.
The third browser war has begun. Every artificial intelligence company is getting in the game.
Launching new browsers.
Get yourself a browser.
The new Volkswagen electric bus is a flop, according to the Wall Street Journal. Ouch. Apparently Linda Yaccarino was not fired for the Grok dust-up with the crazy hallucinations that were going on.
We have more details there. And, well, why would anyone think that she was, considering that was the timeline? That was for sure in the timeline. People were talking about, like, oh, this happened, and she stepped down within, like, six hours.
Oh, really?
Yeah. My read on it was clearly Grok and xAI are not her domain, and Grok was saying some things about her that should never be said. Yes.
My read on it was, who knows?
When Elon commented on her post and said,
thank you for your contributions,
it's like the boilerplate text.
And so I'm sure that their relationship
is maybe not as good as it was day one.
But I almost thought it was maybe the,
you know, the MechaHitler dust-up was the straw that broke the camel's back,
and she basically said, look, like, you know,
I can no longer, you know, bet my career on this platform.
Maybe, yeah.
I mean, we can debate it.
There's more reporting in the Wall Street Journal
about what actually happened,
and that story, which we'll get into,
is kind of pointing to this idea
that it had been in the works for a while.
And that was not the straw that broke the camel's back.
That was like, the papers had been signed.
Everything had been signed before that.
And then the dust-up happened with Grok.
But the bigger news is that they actually got Grok 4 out,
and people are excited about it, so we'll talk about that.
And then there's the other Grok, Groq with a Q, whose CEO we had on the show.
Is it yesterday?
I'm losing track of time.
It was very recently, no, it was Tuesday.
Tuesday.
Apparently they're out raising at six billion,
and we have some more details on that company,
so that's interesting.
Anyway, let's tell you about ramp.com. Time is money, save both. Easy-to-use corporate cards, bill payments, and a whole lot more, all in one place.
They have a new agent launch today. Yep. He'll be joining later in the show to break it down.
So let's break down the Grok 4 launch. Deedy Das has a summary saying that Elon Musk has pulled it off again, absolutely crushing the AI wars with Grok 4. And we can go into some of the meta around the benchmark wars.
For sure, and there's a question about like,
are we post benchmark?
Does this matter?
What's the real question to be asking here?
But there's a bunch of interesting takes.
So just summarizing the core announcements,
post training RL spend was equal to pre-training spend
for this release.
That's the first time it's ever been like that.
I think when you go back to the original RLHF stuff that ChatGPT was doing,
that kind of unlocked like, oh wow, this really, really works.
I'm pretty sure the pre-training spend was an order of magnitude or two orders of
magnitude bigger. Now we are truly in this reinforcement learning regime.
$3 per million input tokens, $15 per million output tokens, a 256,000-token context window, with price 2x beyond 128K. It's number one on Humanity's Last Exam, which, interestingly, is effectively postgraduate, PhD-level problems, but across a bunch of different domains. So everything from literature to physics.
Yeah, kind of like the hardest SAT possible.
Interestingly, I believe that benchmark
was created by Scale AI.
And so Alex Wang is now at Meta trying to figure out, how can we beat our own exam?
And Elon's just like, I'm number one at your thing.
Interesting dynamic.
Yeah, the real test would be Elon doing the same problem
set himself and saying, look.
Well, yeah, I mean, I was talking to Tyler
about this before the show.
It's like, Humanity's Last Exam. It's really good at PhD-level math, PhD-level stuff.
But how often are you running into those types of problems?
Yeah, I mean, I think that's the whole thing about there's
this concept of spiky intelligence, right?
Where it's like, okay, it's really good
at this very obscure problem that I never deal with,
but if I have a super long kind of context window,
there's no kind of long-term,
it just completely loses its footing,
and then it's useless.
Yeah, we're kind of in like less of the benchmark regime and more of the agentic,
like how long can the agent run? So it's like,
we are in the 15 minute AGI regime.
Maybe this is 15 minutes of like even better AGI,
but we want to go to 30 minutes.
Well, and Dwarkesh on Monday, this, you know,
takes me back to him talking about continual learning
being the next problem that we really need to solve
because it's great if you have a PhD level expert
in your pocket that can solve any problem
in any domain almost instantly,
but if it can't learn and take feedback
and improve on certain tasks,
then it's basically useless.
If you had a PhD join your team to work on a specific problem,
but it was hard restarting at the beginning
of every single task with no prior knowledge,
it would be almost impossible for that person to succeed.
So, humans still got it on that front.
But at the same time, if you are trying to just really establish yourself as at least an API for tokens
that every business should check out against Anthropic
or the OpenAI APIs, just saying, hey, we're on the frontier.
Or Gemini, yeah.
Saying we're on the frontier is a good way, and they certainly proved that, with GPQA graduate-level problems at 88%. The really interesting news, I mean, it's worth calling out: Grok got number one on Humanity's Last Exam at 44.4%. Number two is sitting at 26.9%. And then, going down this list of all these different sort of challenges, they are consistently well beyond second place. So they are at the frontier now of all these different benchmarks.
Yeah. So Mike Knoop over at ARC-AGI says, zooming out on ARC progress, I'd say OpenAI's o-series progression on V1 is a bigger deal than Grok's progression on V2 so far.
The O-series marked a critical frontier
AI transition moment from scaling pre-training
to scaling test time adaptation.
And this was the o-series progression, if you remember that: OpenAI was spending, it was like, thousands of dollars of reasoning tokens generated at test-time inference to actually get a good score on V1 of ARC-AGI.
And so it had to think a ton,
but it was able to figure it out,
and at least it proved that throwing a ton of tokens
and a ton of inference at a problem
and letting it cook, basically,
wound up producing progress there.
So that was kind of like a new, just a new paradigm.
Says, whereas Grok 4 mostly takes existing ideas
and just executes them extremely well,
in my opinion, the notable thing is the speed
at which XAI has reached the frontier.
And it really just can't be overstated that this is crazy.
You put a post from Owen in the chat.
I'll pull it up here.
He says, Elon Musk is such a beast.
I'm not even a pure fan boy anymore.
How does he, there's a lot of swearing in here, Owen. Gotta keep the timeline PG. But how does he come out of nowhere with a cold start, late to the game, and ship Grok 4, and do it alongside everything else he's up to? He's launching new political parties. He's literally magnitudes above every founder. It's humbling.
So basically, it's almost like he was a co-founder of OpenAI. Yeah, I guess he's returned. You would have to, you know, almost be a co-founder over there to be able to do something like this.
Let me tell you about Graphite.
Code review for the age of AI.
Graphite helps teams on GitHub ship
higher quality software faster.
You can get started for free at graphite.dev.
If you want to ship like Ramp, get on Graphite.
Yeah, Chamath was saying the same thing.
Somebody in his reply says,
seriously, how does this guy produce what he produces?
Meta is buying talent at $200 million a year,
and Elon keeps his people at a fraction.
It's mind blowing.
A very deeply underappreciated edge for Elon, says Chamath. The retention of the best people happens
when you can offer them a freewheeling culture
of technical innovation, no politics, and few constraints.
And people in the comments are like, no politics.
What are you talking about?
Yeah, can get a little political over there, but.
But probably not within the engineering org at XAI, right?
Like it's probably just, okay,
how do we build the biggest thing?
Cool.
Well, you can imagine the politics of like,
who gets the best spot for their tent in the office.
The tent.
You know, there's a hierarchy
Proximity to the bathroom. I want to be directly under the air conditioning unit. I want to be closer to my desk. The windows can be nice too, so you can, you know, pull down your tent a little bit and get a little view, morning light.
I wonder what the political structure is of the tent city. The tent hierarchy.
So is there, is there democracy? Do they vote for who runs the tent city?
I guess it's just a.
The XAI tent city.
It's probably just Elon at the top,
but does he have a tent?
Something about San Francisco in tents.
Yeah, very funny.
But Swix has been chiming in saying like,
we need community notes for LLM benchmark porn,
because in the Grok 4 launch,
they highlight this AIME competition math problem,
and so Matt Shumer is basically saying, AIME is saturated, let that sink in. Grok 4 got 100%, it made no mistakes on that benchmark,
which is obviously very impressive,
but there's this extra comment about the nature of AIME,
and so it's a cautionary tale about math benchmarks
and data contamination.
Apparently, you know, predictions were that
the models weren't smart enough to actually solve these,
but he says, I used OpenAI's deep research
to see if similar problems to those in AIME
exist on the internet,
and guess what, an identical problem to Q1,
question one of AIME 2025, exists on Quora.
I thought maybe it was just a coincidence,
so I used deep research again on problem three,
and guess what, a very similar question was on Math Stack Exchange.
Still skeptical, I did problem five, and a near-identical problem appears on Math Stack Exchange.
And so, at a certain point, if people put out a benchmark,
then talk about it a lot online,
and then that gets baked into the training data,
you're just memorizing the results.
You're not necessarily actually learning everything.
It's still cool, it's good.
It's good to have everything memorized,
but it really is not beating the knowledge retrieval,
knowledge engine allegations.
And when Scott Wu was on the show earlier this year, he was basically saying AI will win an IMO gold medal this year. He felt very confident in that. And I'd be interested to see how he thinks about this new performance. I'm pretty sure the IMO gold medal questions are public once the IMO happens.
So every year they're developing new questions,
but then they go out there and then they get memorized
and the solutions become discussed
and there's all the context around that.
And so yeah, it gets kind of baked in.
So big question about how valuable are these.
At the end of the day, it's really just about adoption.
And that's why we were looking at the polymarket
for the best, which company has the best AI model
at the end of July, and XAI has just surpassed Google,
which was sitting around 80% chance for a while,
and then started dropping earlier this week, last week,
started dropping and now XAI is sitting at 48%, Google's sitting at 45%.
Well, yeah, actually, it's updating live. Google's back up at 49%.
Is Google planning to launch something new in July? Because it feels like,
it feels like this market particularly is more driven by Google's release schedule.
Because Google might have something in the lab,
but they like to release things at specific times.
They have, it's a big company.
They don't just like, who knows, drop it.
Gemini team Logan over there might be fixated
on this polymarket being like, I need this.
Yeah, yeah, yeah.
Oh, during the wait, he was like,
if you need something to kill the time, Google AI Studio.
So, I mean, people were definitely memeing the production values on the Grok 4 launch, because it was supposed to start at 8.
I think it went live at 8:45 or something like that, maybe a little bit later, Pacific time. And EigenRobot was saying the production values were terrible.
This market is based on LMArena, specifically the text leaderboard. So currently, they haven't fully updated it, so it's unclear. Right now Gemini 2.5 Pro is still at the top, but I think the expectation is once they get Grok up there, it will be the top spot, so we'll keep following this market. There's over 2 million of volume on it already. It's so interesting that Anthropic's not on this Polymarket at all, because people talk about them as having the best vibes, the best big-model smell, the best interaction, and LMArena is supposed to kind of test that with these A/B tests, and yet they don't seem to be performing there. But it almost doesn't matter, because they're just focused on the business at this point, as opposed to the benchmarks.
I don't know, it's all changing.
We have a post here from Ben Hylak.
He says, Elon Musk on AI.
So during the presentation, a lot of people
were critiquing the presentation,
saying that it didn't feel super polished or whatever.
I don't think that was the intent.
And it was pretty fixated on the models themselves,
and what went into them, and what they're good at.
But Elon did have this one quote in here where he says,
and at least if it turns out,
so he's talking about what kind of impact AI
will have on the world,
and he goes, at least if it turns out to not be good,
I'd at least like to be alive to see it happen.
It's like, if we get the Terminator ending,
I wanna be around for that.
Yeah, I want to experience it.
What does that say about his timelines?
Because it's like, is he expecting that to be alive?
Like, I feel like most people that have been
in the Doom category have been like,
the Doom's coming soon, not the Doom's coming in 200 years.
I didn't, I read into it more like, he will find it interesting if that is the outcome, and it'll be entertaining, less so, like, will I be alive when it happens.
But who knows. There was another funny quote at the end of the presentation, where Elon kind of looked around at the very end. He's like, anyone else have anything to add? And one of the engineers goes, it's a good model, sir.
And they cut it.
Extremely online crew.
Definitely on brand.
Well, Ben Hylak, as you know, he's been on the show.
He's a designer, probably working in Figma.
All day.
Think big, think bigger, build faster.
Figma helps design and development teams
build great products together.
You can get started for free at figma.com.
And we have our first product coming out very soon
with Figma Make that Tyler has been cooking on.
I've been very excited.
He showed me, he showed me it and I was like,
oh, like someone built the thing
that we were thinking about building.
And he was like, no, like I did this.
Generated this.
This is in Figma.
And I was like, this is like an iframe of another website that already exists, because it looks exactly like what we want. But it looks so good. It looks like he worked on it for a few weeks.
No, it looked like someone else did it. It looked like a professional product that stole our idea, basically. I was like, oh, someone else got to it. That was the vibe.
Yeah, well, how has the experience been?
I don't know if you want to leak exactly
what you're working on, but.
Yeah, I don't want to talk about it too closely, but.
How many prompts did it take you to get
where you showed me?
Yeah, I mean, maybe five.
That's so crazy.
This thing is so detailed.
The design is super, it's really great. It's really good.
Yeah, the fact that it came out looking basically, like, 90% there. Yeah, yeah.
And I imagine that there's probably the last 10%, if we were really strict about, like, it's got to be on this exact style guide, that might be something where, you know, Tyler winds up spending more time finalizing and customizing stuff.
But in terms of like just getting a functional prototype out,
oh man, it was mind-blowing, it was awesome.
I'm very excited about the age of vibe coding.
This is an interesting chart from Tracy Alloway.
Been on the show.
The cost to rent an NVIDIA H100 GPU hit a new low this week
with annualized revenue at 95% utilization falling from $23,000 at the start of May to less than $19,000 today. So that's not that big of a percentage drop, but I mean, it is roughly a 20% drop.
It's a consistent trend.
It's a consistent trend.
I wonder how much of this is driven just by all of the frontier labs that are driving the most adoption moving on from the H100 to the H200. I don't know what else would be driving this, because if you only take a 20% price drop when you're a full hardware refresh behind, when they're not the latest and greatest anymore, that's not bad.
It's a pricing drop, not a utilization drop.
Yeah, annualized revenue at 95% utilization.
So this is revenue per unit.
So utilization is still very high.
It's the price that these neoclouds are able to rent them for, which is dropping.
Which tracks.
Yeah, yeah, I mean the market's more competitive than ever.
There's more Neo clouds spinning up,
and more people actually inferencing these things.
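As an aside, here's a back-of-the-envelope sketch of what that chart's metric implies for hourly pricing, assuming "annualized revenue at 95% utilization" just means hourly rate times 8,760 hours times 0.95 (our reading of the chart, not Alloway's definition):

```python
# Back-of-the-envelope: convert "annualized revenue at 95% utilization"
# into an implied hourly H100 rental rate. The metric definition is an
# assumption about how the chart is constructed.
HOURS_PER_YEAR = 365 * 24  # 8,760
UTILIZATION = 0.95

def implied_hourly_rate(annualized_revenue: float) -> float:
    """Revenue per rented hour, given annual revenue at 95% utilization."""
    return annualized_revenue / (HOURS_PER_YEAR * UTILIZATION)

start_of_may = implied_hourly_rate(23_000)  # ~$2.76/hr
today = implied_hourly_rate(19_000)         # ~$2.28/hr
drop = (23_000 - 19_000) / 23_000           # ~17.4%, close to the ~20% cited
print(f"${start_of_may:.2f}/hr -> ${today:.2f}/hr ({drop:.1%} drop)")
```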
And then, I guess this is the question of like,
how stuck will certain workloads get?
Like if you have figured out a great use case
for an LLM in your organization,
and it's something that's not one-shotting
your entire stack or whatever,
but it's just like we have data flowing through our systems
and we are going to use,
you know, LLMs are gonna interact with every PDF
that gets uploaded to our website or whatever.
And so we're inferencing a lot.
You might not need to put that on the latest hardware
or update the hardware forever.
You might just be like, yep, it's Llama 3, it works. It's on H100s, and it'll be on H100s forever.
And that piece of our business will just stay there.
Just like we have a Postgres database that works
and we're not changing it every year.
We're not changing everything.
We're just like, we're just trying to cost optimize that
and just hopefully the cost just comes down on that.
But like we've solved this particular problem,
then we'll go solve new problems with new technology.
So I think that's probably what's going on here.
But it gets to the point of, like, the biggest question with Grok, which is that the model clearly is frontier, it works. The whole fine-tuning on the actual X account is a crazy final step of system prompting, and people were joking about that.
Like, oh, they're going to fix that.
That's not what they're demoing today.
They're demoing the underlying raw model, which is clearly like just engineering focused,
as you saw in the demo,
the demo which was just like benchmarks and stats.
Turns out the secret ingredient to crushing every benchmark is to have a bunch of data from schizophrenic posters.
No, obviously not.
I actually think it's the design of the RLHF stuff
and the design of the reinforcement learning pipeline
Tyler, you got anything?
Yeah, I mean, I think just, so far, what I've seen on X, like the overall response, the vibes stuff, is that people are saying maybe it was a little too overfit on the RL, like verifiable rewards. You kind of see this, even in the demo, I think it would sometimes respond with LaTeX formatting in the answers.
Oh, sure.
Which is like, OK, that means obviously they've
trained a ton on math questions, stuff like that.
Papers and stuff.
People are saying maybe it was kind of bench-maxxed. You see it, like, 100% on AIME is kind of crazy. It's sus. It's like, you don't want to be too good.
Yeah, yeah, yeah, this is the thing about democracy.
If you win like 80% of the popular vote,
it's like okay, it was a blowout.
If you win 100% of the popular vote,
like probably not a democracy.
I don't know.
I mean, in theory these things should be able to do it,
but I'm interested to know more, if we dig into ARC-AGI, is there more stuff going on there?
Are there any secrets?
Because it does seem like kind of an outlier result.
You can see it from this Aaron Levie post. Grok 4 looks very strong.
Importantly, it has a mode where multiple agents
do the same task in parallel, then compare their work
to figure out the best answer.
In the future, the amount of intelligence you will get
will just be based on how much compute you throw at it.
I was joking with Tyler about this. The individual models are mixture-of-experts models, so there's a whole bunch of parameters, right, and the different neurons light up based on a router internal to the model. So there's kind of like the math section of the brain, the literature section of the brain. And this was one of the key breakthroughs in GPT-4, right? Mixture of experts. People think so; we're not super sure. Yeah, we still don't fully know. But that's an internal decision that happens within the model, to be like, this feels like a math question, let's go down the math path in the model.
But then, Grok 4 is doing multiple,
it's running the same model multiple times
and then comparing the results.
And so now you have-
Yeah, grading it.
Yeah, you have multiple agents
running mixture of expert models.
You have a mixture of agents running mixture-of-experts models. And the next thing is gonna be like, if you want the absolute best intelligence, you need a mixture of companies.
I send one prompt, and it goes to Grok and Claude and GPT
and Gemini and a human.
Yeah, I wonder how OpenRouter is thinking about this stuff.
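Nobody outside xAI knows exactly how that multi-agent mode is wired up, but the "same task in parallel, then compare the work" idea Levie describes maps onto a simple best-of-n pattern. A minimal sketch, where ask_model is a hypothetical stand-in for whatever chat-completions call you'd actually use, not xAI's API:

```python
# Sketch of "multiple agents do the same task in parallel, then
# compare their work" as a best-of-n loop. ask_model() is a
# hypothetical placeholder, not a real provider SDK call.
import concurrent.futures

def ask_model(prompt: str, seed: int) -> str:
    """One independent sampled attempt at the task (wire up a real API here)."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 4) -> str:
    # Fan out: n independent attempts at the same prompt.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: ask_model(prompt, s), range(n)))

    # Compare: a grader pass picks the strongest answer, rather than
    # trusting any single sample.
    ballot = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = ask_model(
        f"Task: {prompt}\n\nCandidate answers:\n{ballot}\n\n"
        "Reply with only the index of the best answer.",
        seed=n,
    )
    return candidates[int(verdict.strip())]
```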
It is funny to think about the human version of that, where you have five engineers on your team build the same feature and then kind of compare notes afterwards. Wildly inefficient. But with software, when you can do these things very quickly, the incremental cost is small, and you can have more confidence in the results.
I mean, it's basically like having a brainstorming meeting
with the whole team, and just throwing up a question
and being like, hey, we have this hard problem
that we need to solve. Here's my idea.
What do you think? What does Tyler think? What does Ben think?
So you kind of like go around the table.
Everyone kind of gives their input, their various expertise.
They kind of think through the problem in different ways,
and then you can compare answers and everyone kind of coalesces around one strategy.
This is like how work happens in the real world with a meeting.
It's kind of the same thing,
but certainly expensive to do that,
so it'll be interesting to see, like, how eager companies are to jump over to Grok. Because it seems like it's been a big lever for Microsoft to have Grok in the ecosystem as kind of a stalking horse for all the other models, because Satya wants Azure to be very model-independent, serve them all. I think they have exclusivity for ChatGPT, or the GPT APIs; they obviously have a great deal there with OpenAI. And so if they can have Grok 4 as well, that's another tool in the tool chest to be this top layer.
Satya is in such a good position. It's probably not discussed enough, just by owning those end-customer relationships and being able to vend in whatever model is hot at that moment, give people optionality, and still get 20% of OpenAI's revenue, at least for now.
Yeah, he's also SOC 2 compliant.
Of course.
And if you wanna get SOC 2 compliant,
head over to Vanta, automate compliance,
manage risk, prove trust continuously.
Vanta's trust management platform
takes the manual work out of your security
and compliance process and replaces it
with continuous automation, whether you're pursuing
your first framework or managing a complex program.
So yeah, EigenRobot was talking trash
about the production values.
I don't know about trash.
They were just noticing.
I didn't think it was that bad.
I think it's really good.
Slides are worse than I'd create after getting roped into a presentation with one hour's notice.
You can tell the engineers made them themselves.
I think this is just a reflection of the culture.
It's like a screenshot.
Yeah, very clearly
it's like screenshots dropped into a slide.
But this is a reflection.
It's like light-mode screenshots on dark-mode slides.
So like, let's do black slides.
And then you come with your white screenshots
that are kind of like misaligned
and not really evenly distributed.
Like they didn't do like the distribute evenly
or whatever, distribute horizontally.
Still gets the point across.
And I think it's a reflection of their culture.
And it shows what they care about,
what they don't care about.
They're not trying to be the most polished.
They're just trying to be the best.
Yeah.
EigenRobot did a whole live-tweet here.
Yeah, so Elon was predicting the model will discover
new physics within two years.
He said, let that sink in. Long silence.
One engineer laughs awkwardly.
Is that sooner or later than his previous timeline?
Because he was talking about AI discovering new physics soon.
I don't remember if he was saying.
Dating it.
Two years or three years or one year before.
Because this could be that he's still excited about this,
he still thinks it's possible,
but he thinks it's gonna take longer
than he said previously,
and that's kind of the more important update.
I don't remember what he said originally.
See if Grok can find out.
But he was saying this at the Grok 3 launch,
that that is the goal.
And if you can get there, you've kind of solved everything.
And Sam Altman was talking about that too.
That if you can create a super intelligence,
that's probably the first thing that you'd wanna do.
Is like, hey, go discover all the new physics
and really help us figure out how the world works
so you can solve, you know, fusion and all this other stuff.
I wanna be clear, I love all you guys at xAI, I only want the best for you, but I'm gonna continue to live-post. Elon attempts to give a speech on alignment involving a very small child, a child much smarter than you. The monologue rambles with no conclusion in sight. A pause. Yeah, will this be bad or good for humanity? He says, you know, at least if it turns out to not be good, I'd like to be alive to see it happen.
Oh, yeah, they had a Polymarket integration. That was kind of interesting.
Yeah, it's interesting, basically giving the model access to real-time Polymarket data so that it
can help make predictions and sort of add context
around the market itself.
Yeah, that's interesting.
Elon asking the real questions: you say that's a weird photo, but what is a weird photo? I still don't understand why we're looking at weird photos of xAI employees, but they were charming.
They're calling it SuperGrok. Crazy features, 16-bit microprocessors. What is, I don't even understand what this is.
Oh, they built like a game in Grok.
They had a demo of a video game generated by Super Grok.
It's a Doom clone.
Every time the PC shoots an enemy,
floating text appears reading Grokdom.
Elon is fabricating timelines for product launches
on the spot.
The engineer sitting next to him
is looking at the floor, face impassive, nodding.
It's a good model, sir.
For real though, congratulations on the launch, guys.
It's a good model, sir.
I thought this post from the actual XAI engineer,
Eric Zelikman, was funny.
It was like AI model version numbers over time.
Did you see this?
So it's this chart of the version numbers over time
and you can see that Grok is versioning fastest
because it's like at this point,
what else are we measuring?
Like at least they're iterating
on the version number effectively as opposed,
and I guess this is a shot at OpenAI
because they launched 4.5 and then went to 4.1
and they're kind of like, you know,
there's this big question about like,
when will GPT-5 come?
The expectations are so high for GPT-5.
And so they've obviously, the Grok teams are like,
hey, at least every three months
we release a new full number.
So I wonder, five is a number that really no one has gone for, and I wonder if Grok will do it first. Like, if you draw the line on this,
they certainly should do it in like three months.
They should have Grok five.
And there's no reason that they shouldn't,
but maybe there's some.
And it's very possible that Colossus is the key.
Yeah.
To get into five.
The new data center.
Oh, the new data center, yeah.
Well, they'll need Linear to plan that out. Linear is a purpose-built tool for planning and building products. Meet the system for modern software development: streamline issues, projects, and product roadmaps. Linear.app. They need Linear badly. Hopefully they've gotten signed up.
Near said, on Grok 4's Humanity's Last Exam result: I'm not sure I buy, even in the general case, that there's a given Humanity's Last Exam number which implies you discover useful new physics. How would one make a benchmark of the proper shape for this? You'd have to have a validation set of questions which are outside the scope of what we are currently able to do. You could choose things on the edge of our knowledge distribution and then try and exclude.
Yeah, it is interesting.
If you are able to memorize every hard math problem, does that allow you to discover new math?
It's sort of a prerequisite because you have to-
I think where I've imagined these discoveries coming from is having a single intelligence, a single mind, that has PhD-level intelligence across every human domain, right? And being able to combine ideas from different domains.
Like historically, a lot of innovation is just taking
something from one field, bringing it over here,
making some combination of it.
I think Elon talks about the potential
of discovering new physics, but again,
didn't spend a lot of time breaking down
how that would actually happen,
but the world is unpredictable.
So we'll see.
Yeah, it's interesting.
People are really pushing this idea of,
okay, we are accelerating.
The ARC-AGI leaderboard is accelerating. But I keep seeing this and feeling deceleration. Like, I am not feeling acceleration right now. Are you, Tyler?
Yeah, I don't know. I think generally I'm kind of not that interested in a lot of these kinds of benchmarks. ARC-AGI is more interesting, but just, like, Humanity's Last Exam, the kind of general math and physics knowledge, doesn't seem to line up. Like, you see GPT-4.5 kind of does very poorly on these things, but at writing it does really great. So if I were to go long-short on different benchmarks, like the usefulness of them, I think stuff like HLE I'm kind of short on.
I'm like, have you guys seen the Minecraft benchmark? Basically, two models build something in Minecraft. There's a prompt, it's like, build a house, and then you can choose between them, and the models are ranked.
But who's grading that? The human?
It's a human who picks between them. It's kind of like an Elo, but for general kind of creative tasks. Sure. I think stuff like that, and AidanBench, is good.
Yeah, I think even on the Grok launch they showed AidanBench. AidanBench is Aidan McLaughlin's benchmark. It's kind of hard to describe how it works exactly, but it's various creative tasks: how novel its thinking is, the style of its text. Sure.
Wait, is it just whichever one he likes the most at the end of the day? Is he the only grader?
No, no, there is an objective function. You can run it. It's not just him picking.
Okay. It will be funny.
You know, there's a period of life where your SAT score
matters a lot.
Totally.
And it says something about you.
And then a decade later, it's what you can do,
what you have done starts to matter a lot more.
And so I do think we'll reach that point where it's like,
yes, you can one shot every hard exam question there
is that you can throw at it, but like, what can you do for me?
Yeah, totally. And I think that's why the bigger question is almost, like, you know, ChatGPT DAUs, and actual revenue, and app installs and stuff. Yeah. I mean, the revenue thing is interesting, because you wind up in, like, B2B cloud world, which is valuable, but it's more competitive because it's more commoditized.
And,
well, yeah, if you don't have a lot of leverage in the enterprise, if Azure is able to offer infinite models, frontier models, open-source models that are maybe just behind the frontier but great at certain tasks, the leverage isn't quite there.
There will need to be another pretty significant leap
until then.
Anthropic being really good at code gen,
there's leverage there.
We saw this yesterday with Lama switching over
to Anthropic models internally,
and then just having a consumer app
with a lot of users, also very valuable.
Yeah, the other interesting thing about the foundation model layer commoditizing and becoming like cloud, where if you have a model you'll just be vended in as an API to anything else, like a token factory, is that the hyperscaler clouds are extremely profitable. Even though AWS, GCP, and Azure are all somewhat directly competitive, and they're somewhat perfect substitutes for each other, they have not driven prices to zero the way airlines, which are deeply unprofitable, have. AWS and Google Cloud are both profitable.
Yeah, or you look in other commodity sectors,
like oil and gas.
And I don't know if that's just because there's lock-in.
I'm not exactly sure, but there's something about where,
you know, maybe the counterintuitive take is that,
yes, they do commoditize,
and there are a few major foundation models
that are frontier, and they all are roughly the same price,
but they all have decent lock-in with their customers
to the point where they're still able
to extract some level of profit,
or they're just creating so much value
that even if they're taking a small marginal slice
on top of the cost to run,
that they're creating so much value
that they still have 50% margins or something like that.
Because this was the story of AWS.
No one knew how much money it was making,
and then they had to break out the financials
in one of Amazon's earnings reports
and it was like the AWS IPO as Ben Thompson put it.
Anyway, before we get to the next story,
let's tell you about Numeral HQ.
Sales tax on autopilot, spend less than five minutes
per month on sales tax compliance.
So the big news is that the third browser war has begun.
Google stock has dropped on the news
that OpenAI is planning to launch a Google Chrome competitor
within just weeks, and this is very interesting timing
because-
It's time to browse.
Yeah, time to browse.
Certainly makes sense to become deeper,
more deeply integrated into the user's life.
Makes a ton of sense.
There's a ton of benefits that come
from having a web browser.
What was interesting is we can go into what Google
actually launched, or what OpenAI is talking about launching,
but this news, this scoop leaked the same day
that Arvind from Perplexity announced that they're finally
releasing their next big product after launching Perplexity,
Comet, the browser that's designed to be your
thought partner and assistant for every aspect
of your digital life, work and personal.
And so Perplexity launched this on July 9th, and then the OpenAI scoop goes out via Reuters the same day. And so this feels very much like,
let's not let Perplexity get a bunch of attention and drive a bunch of people to start daily-driving Comet the browser, even though we're not ready to launch our competitor.
Aravind was on the show talking about Comet over a month ago. He said it was really important to the business. This was a big bet that they're making. Yeah.
are racing to be the first to launch, but Dia, the browser from the browser company,
also launched out of, or they're still in beta, but they launched like a month ago or
something like that. So you're not going to be the first.
Oh, they launched a month ago with the Dia browser? That's interesting because I saw
Riley Brown also posted the cursor for web browser and Dia browser. And I thought Dia
browser launched the same day,
but I guess it had launched earlier.
Yeah, so anybody that was an Arc user can download Dia today and chat with their tabs.
But interestingly enough, Perplexity's browser
and OpenAI's browser are both built
on Chromium, the same open source project that underpins
Google Chrome and Microsoft Edge.
So the cool thing here, that means that they're compatible
with existing Chrome extensions.
Oh, interesting.
OK, that's cool.
Yeah, I want to talk to more people who were active and tech
during the earlier browser wars.
The first browser war was Netscape Navigator
versus Microsoft Internet Explorer.
This is in the mid 90s, early 2000s.
Netscape was super dominant and everyone loved Netscape.
It was originally the Mosaic browser,
this is the Marc Andreessen project.
And then, but Microsoft bundled Internet Explorer
with Windows 95 and the distribution was so powerful
that Internet Explorer actually wound up winning
and became really, really dominant.
But then there was this lawsuit and it went back and forth.
But then basically by the early 2000s,
Internet Explorer had over 90% market share,
but then they got kind of lazy and stagnant apparently.
And I mean, I'm not exactly sure what happened,
but there was a lot more competition.
So Firefox, which was, I believe,
like a spin out of Netscape,
or kind of like some of the same heritage there,
began getting traction.
And then Google Chrome launched in 2008
and leapfrogged everyone.
And Google Chrome was really focused on speed.
It was the fastest browser.
And they did a whole bunch of work
to optimize JavaScript so the pages would just load faster and run better on pretty much every computer that
you had.
And so, and then they had the open source project with Chromium and so they were able
to kind of standardize the entire industry.
And so everyone's always been trying to draw analogies between like the browser wars and
the LLM wars and like what's the role of open source in that, like is open source a strategy
to wind up
maintaining your dominance?
How much does distribution matter?
Chrome was probably pretty easy to distribute
because every single person was visiting Google
just every day searching.
And so you just put this bar,
hey, wanna switch to the faster browser?
And people just do it
because you can have basically
billions of ad impressions on your product every day.
It will be interesting to see if ChatGPT can get people to download their own browser on desktop. I mean, I'm using ChatGPT on desktop in Chrome all the time.
Which ChatGPT model would you want to use as a default search engine?
That's the hard part because I always run into this problem
where it defaults to o3 Pro, but that takes 10 minutes. And so then I have to go to 4o. And then if I'm in an o3 Pro flow and I'm talking to o3 Pro and I let it cook for 10 minutes, it gave me a great answer, but then I want to just be like, okay, just clean this up a little bit, or summarize this, or do some bullet points. I want 4o to do that. So I have to switch over. So I don't know, I would imagine I'd go 4o as the default, because I want speed. But even 4o could probably be faster before it truly replaces Google.
They've spent a very long time being fast.
Yeah, and I could imagine them doing a similar project to, I believe it was the V8 JavaScript engine. They sent this team out to, I wanna say, Iceland or something. They basically sent a bunch of engineers to an offsite and they were like, just go optimize JavaScript for, like, a month, or months, and come back when it's done. Like, you have no other responsibilities than optimizing this compiler. And they came back with the V8 JavaScript engine and created this whole Node.js boom. People were running JavaScript on the server then.
And I could see Google kind of doing something similar
where they're like, okay, we have Gemini.
It's good at looking stuff up.
It's a good knowledge retrieval engine.
Go figure out how to make it load all the tokens
for the full response in 100 milliseconds.
And that would be very, very cool.
And I wonder if that's like a uniquely Google advantage.
Tyler, you looked something up?
Yeah, it was in Denmark.
Denmark, okay, I was close, I was close.
Yeah, I wasn't sure if it was Finland or Iceland, and it was Denmark.
Yeah, the interesting thing here,
I'm realizing that tabs are definitely a light lock-in to
browsers.
It's not just the default. If you have six to ten tabs that you've just had open for a really long time, and they're from a bunch of different things, and you couldn't exactly remember what they were if you had to list them all off. You know, I personally end up using tabs as somewhat of a to-do list. And so if you're spinning up a new browser and you don't have your tabs, it's like, oh,
do I want to just like get rid of my tab stack? I have a bunch of tabs that just have stayed
there for years and they're basically like, it's basically like a mini operating system,
right? With like different apps that might be a Google Sheet or something else. Yeah, I know what you mean.
So there's very real lock-in.
I could bring all of those tabs over,
but I have to then log in to a bunch of different services.
And so it's really, really hard to actually win here.
I wonder if anyone's using, you know in Google Chrome,
you can actually change the default search bar.
You know when you type in the search bar and if you just type words,
it just Google-searches it. You can change that to search ChatGPT. Yeah, you can pass in a query parameter and it can just do that. But I haven't heard
of anyone actually doing that, and I used to have,
I used to be such a power user of Chrome,
I used to have different code words basically,
so if I typed like I space
and then a query, it would go to IMDB
and search that specifically.
So you could have Chrome like route to any specific search.
Any, so you could press like Y space
and it would search Yelp or you know, anything else.
But I don't know if people are doing that with Google,
with ChatGPT. I think people mostly just, like, Control or Command-T and then hang out in ChatGPT.
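For what it's worth, the mechanism being described is Chrome's site-search templating: a search engine entry is just a URL with %s where the query goes (Settings, then Search engine, then Manage search engines and site search). A rough sketch of entries matching the examples above; treat the exact URLs, especially the ChatGPT one, as assumptions to verify:

```
Name: ChatGPT   Shortcut: c   URL: https://chatgpt.com/?q=%s
Name: IMDb      Shortcut: i   URL: https://www.imdb.com/find/?q=%s
Name: Yelp      Shortcut: y   URL: https://www.yelp.com/search?find_desc=%s
```

Typing the shortcut, then space, then a query in the address bar routes the search accordingly, and setting one of these as the default replaces Google entirely.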
Well, we'll have to ask Chris in 15 minutes to get an update on the browser wars,
because he was an early investor in.
I know one of those tabs that you have pinned right now.
What's that?
Attio.
Of course.
Customer relationship magic. Attio is the AI-native CRM that builds, scales, and grows your company to the next level. You can get started for free.
I've had Attio open for thousands of hours at this point. Yeah.
So Signal kind of breaks it down, with OpenAI launching the web browser: this is the oldest play in tech. Find product-market fit with a single killer use case, then vertically integrate and horizontally expand until you control the interface layer itself, app, platform. Once you own the interface,
you own the defaults.
Welcome to the next generation of browser wars.
Yeah, what's interesting is there,
Sam Altman at OpenAI and just the fact
that OpenAI is a company, there is kind of a mandate
to vertically and horizontally integrate,
figure out code, figure out research, figure out devices.
But every company wants to do everything,
but then sometimes they run up against barriers.
There was a time when Google was like,
we want to win social networking
and we want to beat Facebook
and we're going to launch a direct Facebook competitor.
And they did, and it didn't go well,
and then they shelved it,
and then they wound up
producing trillions of dollars in market cap
just doing the thing that they do great.
And so the question is like the surface area of OpenAI,
they have to explore, they have to experiment.
It would be stupid not to see if they could get a browser
and a device and a chip and a nuclear reactor
and everything and sand, get the sand, get everything.
But there's no guarantee that they will win
the entire vertical stack and they will be the one company.
I think my question is are these gonna be,
like is OpenAI's browser gonna be an entirely new app
other than their existing mobile app?
Or their desktop app?
Yeah, that is interesting.
Because if they have to get people to re-download a separate app, then that's an entirely different challenge, even though they have a good flywheel.
It is interesting that they wouldn't just evolve the apps
that they already have installed.
Perplexity, too.
I don't know if Perplexity is planning
to release this as a new standalone app,
or it will be in the Perplexity mobile app but.
Yeah.
Yeah, I mean, I think Comet is, like, its own thing, because we were looking to download it and we need a code, and you can't just get it if you're just on Perplexity.
But I don't know.
All I know is that you should go to fin.ai,
the number one AI agent for customer service,
number one in performance benchmarks,
number one in competitive bake-offs,
number one ranking on G2.
So Aravind breaks down his philosophy of Comet, the browser that he's dropping from Perplexity. He says: you can either keep waiting for connectors and MCP servers for bringing in context from third-party apps, or you can just download and use Comet
and let the agent take care of browsing your tabs
and pulling relevant info.
It's a much cleaner way to make agents work.
So that is interesting.
So I wonder how much puppeteering will be in this, because ChatGPT and OpenAI have Operator, which operates a Chromium front, like a headless web browser basically,
but you can actually see it working
and it's clicking things.
And so if they're, like there's also the value
of like the training data,
if you're getting people using all these websites,
you have all this training data of like,
okay, they clicked on the blue button,
they clicked on the green button, they saw this,
they entered, this is how they dealt with this form,
this is how they dealt with that form.
And so that feels like very, very valuable data
if you can get it, so it's probably worth duking it out
even if it doesn't, even if it takes a long time.
For sure.
I do wonder where else they will plug in. Like, Cluely operates at a higher level of abstraction, with the screen scraping. And I wonder if we'll hear rumbles about either Perplexity or OpenAI thinking about moving up the stack to that level. Not exactly sure.
Anyway, Dan Ives says: we believe Apple needs to acquire Perplexity for AI capabilities, likely in the $30 billion range; it would be a no-brainer deal given the treadmill AI approach in Cupertino. Perplexity would be a game changer on the AI front and a rival to ChatGPT given the scale and scope of Apple's ecosystem. So people have been talking about this for a while. It feels like there were talks, and then they kind of stalled out. And I think this would be a 375x revenue multiple.
Wow.
I mean, the product sense is good.
You use the product and like there's,
like Apple hasn't been able to deliver on the product side.
They have the distribution,
but they haven't been able to get things out.
We talked about this before though.
The most expensive acquisition Apple has ever made
was Beats by Dre for $3 billion,
which was a 3X revenue multiple.
It would be a huge shift.
And I don't know that,
I think that Apple is embarrassed right now
and feels a lot of pressure to deliver.
I don't know if they're at the point
where they would pay $30 billion just yet.
And even then it's like hard to integrate.
Or even 14 billion, or whatever their last private valuation was.
Yeah, and the big question for me was like,
perplexity is built on a lot of different clouds,
a lot of different tools, a lot of different models.
Is Apple cool with that stack?
Because if all of a sudden.
Or do they want to just go direct to Anthropic or OpenAI, which they are in conversations with?
And every once in a while, these scoops
pop up around perplexity and Apple conversations.
And it's hard to read into that.
Is this like rumor mill?
Like, what's driving that rumor mill?
Yeah, well, pull up the Mag7 chart.
I want to see where Apple and Google are sitting today.
Apple at 3.2 trillion, Google at 2.1 trillion,
and Nvidia's holding strong at 4 trillion.
Not bad.
Yeah, I mean, it's 1% of market cap. They're at 3.2 trillion; a 30 billion dollar acquisition to have an AI product that clearly has a good roadmap.
Is it that crazy?
I don't know.
Well, if you're making bets on any of the Mag-7,
do it on public.com.
Investing, for those who take it seriously,
they have multi-asset investing, industry-leading yields,
and they're trusted by millions, folks.
So in other Apple news, they're preparing to launch
the new version of the Apple Vision Pro.
They're just doing a slight iteration on the chip. They're moving to the M4 chip, and they're launching a new strap, which was something people were complaining about because of the weight; maybe it'll be better distributed. People were switching out for the Pro strap earlier.
Elon announced the America Party, or I guess it came out on Monday,
stock dropped from $312 a share
all the way down to $291 a share.
That is when Dave Portnoy, I think, market bought,
he was saying, Davey Day Trader is back,
if that's not a top signal, I don't know what it is,
but he was market buying like 10 million of Tesla, being like, I just think it's gonna go back up to where it was.
It's just been climbing since then. It's back up to $308 a share.
Almost, almost recovered. It's up
4.2 percent today, on brand for Elon, and
basically it looks like it'll just recover to the price prior to the America Party. And Dave
Portnoy.
He literally was, his thesis was like,
I think it's going to go back up to where
it was in about two weeks.
And I'm going to make 10%.
And I'm going to make a million dollars.
I mean, that was your thesis on Nvidia.
You were like, wait, it's down because of DeepSeek.
Maybe it'll go back up.
It was like the most basic analysis
and it worked perfectly.
It was fascinating.
Maybe that's broadly true.
Yeah, you can see.
Just the idea of like simple analysis
and like not necessarily needing deep insight
to call the market is good.
I don't know.
Who knows?
Anyway, this Apple story is from Mark Gurman.
Of course, the master of scoops at Bloomberg.
He's got another one.
He's on his fourth or fifth this week.
He's an animal. He's on an absolute tear.
I mean, this one is a little bit minor.
They're gonna include a faster processor
and components that can better run AI stuff.
And so, not that Apple has any crazy AI stuff
that they really want to run in there.
I don't think that that's a major differentiator.
I've been thinking about how,
is AI a key unlock for VR?
And I don't think so at all.
I think it's much more about the content
and the use case.
Entertainment.
Entertainment. I think it's a replacement for a TV
to start, and they need to just make it dead simple
to use as a TV.
I don't know, we got a demo of a VR product a while back
and it had some very cool native AI features.
So there's something there.
But Apple's product doesn't feel like it's ready
for just wearing while you're making dinner.
Yeah, so that version,
the one that significantly reduces the weight of the headset,
they're planning to launch that redesigned model for 2027,
which feels so far away.
I know it's only a year and a half,
probably the end of 2027, so maybe we're talking two years,
but in the AI race, we're really like,
AGI tomorrow, AGI next week, AGI next month.
I'm like, we can't ship a better,
we can't slim down the headset and take off the screen
and use a little lighter materials
like this month? Like, let's do it.
But hardware's hard and you know, the stuff takes time.
So good luck to them.
I'm excited for it.
I'm very excited for the next Quest.
Do you still have a Vision Pro?
I don't.
I had it for a month, I took it back
because I just wasn't using it that much.
It was like heavy and I couldn't find it.
And it had a bunch of things that like,
you had to do these crazy workarounds.
I wanted just like an HDMI cable that I could plug into it
and then just be like, okay, my PS5 is in VR now.
And I couldn't do that.
It was like, you had to like,
pull the screen into the Mac and then screen share it in.
There'd be latency, it was ridiculous.
The thing, the use case that I still see is people using it on planes.
Yeah, but I just gotta check in with Tyler. Give me your, how many times
have you thrown on the VR headset in the last week? Did you play it last night?
Break it down. Is it collecting dust? No, I've been
playing a lot of Call of Duty in the VR headset. Yeah.
Okay. It's a lot of fun. There's like no latency. I'm kind of surprised. Really? And you're doing the cloud? That's usually really slow.
You're doing the cloud. Yeah. Okay. And it's online. It's multiplayer.
Multiplayer. So you play multiplayer and you play like the latest and greatest
Call of Duty, basically. Yeah. Okay. Like Black Ops 6, I think. Cool.
So you have a controller and it's a big screen on the wall and you just chill
there. But walk me through it. Is it like 30 minutes a day?
Yeah, probably like 30, 45 minutes a day. You're fired.
This is, this is true research. So yeah, I mean, I honestly think that the Quest Xbox, the Meta
Xbox Quest or whatever, I forget the name, but I think that's more,
that's better news than like a processor bump on the Vision Pro. Just like, yeah, deeper integration so that
you can just throw it on. What's the actual time,
you know, if you want to turn it on, throw it on, press start, get playing, get into a lobby,
actually get your first kill? Is that one minute?
No, it's like 30 seconds maybe?
30 seconds, it's fast.
No, maybe like a minute.
It's not like noticeably slow or anything.
But you're logged in, you don't need passwords
or anything, it's not like a hassle.
Okay, that's cool.
And I put the screen, it's funny,
like I have a TV in my apartment,
but I just put the screen right where the TV is.
It's like the perfect spot on the couch.
It's a nice black square.
Yeah. Yeah, yeah.
I'm gonna have to get this back from you now.
This sounds amazing.
Now I don't have any time to do this, but.
But I feel like the next,
what's on the feature roadmap that you would wanna see?
Like Apple is bumping the neural engine
and trying to upgrade the chip.
I'm not sure that
that's the problem with the Vision Pro. What would you like to see out of the
Quest 4, which I guess is the next one that's coming? Yeah, I think the main thing, so
I've tried the Vision Pro, and basically, I mean, the visuals are just
like vastly superior. It looks so much better. Okay.
Than the screen in the Meta Quest 3S Xbox Edition. That's what I have. Yeah.
And the screen is just way better. Okay. I think that's,
I would say that's the main thing.
So if they can just go find the supplier,
if Meta can just go find the supplier for the Vision Pro screen and put it in
the Quest 4, you'd buy it yourself?
Depends on how much it is. I'm kind of broke. I would definitely be inclined to,
I might have to drop out and go full time. You would speed run all of Halo in order to potentially
win one? I would do that. You would do that? You would do a very difficult challenge in
order to potentially win one, because you would want it. Yeah. Okay. No, I think
that's, I would say that's definitely the main thing. Are there any other,
any other nice-to-haves that you think
might shift people? I don't know. I mean, it's very light. It's way lighter than the Vision Pro. Yeah.
But still, I mean, I feel like light is very relative. Like,
it's light to the point where you can do 30 minutes or an hour.
You probably can't do like a full day
or like four hours.
Or any kind of like workout stuff.
I think I definitely would not do that.
Totally, totally.
But what I'm saying is like-
You're not training your neck enough.
I knew guys at UCLA who, I didn't go there,
but like friends that went there,
and they were so obsessed with Call of Duty
that they would take a bunch of stimulants
when the new Call of Duty came out and play it for 24 hours straight to get the max
prestige, because they were so addicted to Call of Duty that they would just
stay up all night chugging energy drinks just to beat that. And I
just don't think you could do that in VR. I think after like two or three hours
right now, it's like too much, and you have to take it off, and you get sweaty and
tired. But so, so I feel, I feel like screen first,
then probably even a little bit lighter,
a little bit more comfortable and then just drop the price as low as possible.
Because if the next one was a hundred bucks, you'd probably buy it. Right.
Yeah. And I think stuff like, like maybe I'd want another screen,
just this monitor, but that's just an issue with the, with like the visuals.
It's a screen.
Yeah. It's got to be, it's got to be competitively priced with the TV.
And the TVs are so cheap now that you've got to just be like,
yeah, I'm just picking one up.
Or the price of AirPods, or the price of,
you know, it's got to be down in low, low hundreds of dollars
to really ramp that up.
But I don't know.
It'll be interesting.
Anyway, our first guest is here.
Let's tell you about AdQuick really quickly.
Out-of-home advertising made easy and measurable.
Say goodbye to the headaches of out-of-home advertising.
Only AdQuick combines technology, out-of-home expertise,
and data to enable efficient, seamless ad buying
across the globe.
And we will welcome Chris Paik to the show.
Welcome back, Chris.
Fantastic to have you on the show.
Thanks so much for taking the time.
Great to see you.
Last time we got cut off.
Great to be back.
In the temple.
Last time we got cut off, we were having to jump
and I was like, I wish we had another hour.
So at least we have another 30 minutes here.
Yeah, that's great.
To get into it.
First off, what's top of mind for you?
Have you been tracking anything in the news
that's kind of updated your thinking?
We were digging into Grok 4 and seeing,
is this an update to, you know, agent timelines?
It seems pretty great on the benchmarks,
but is there anything else in the last week that's been like, oh,
I can't get enough of this story, just in your world?
Great question. I feel like every week is a total blur. Uh,
it seems like we're all waiting for not just these foundation models to come out, but like
the next open source models, the big open source models to come out.
I think that that's super interesting to me.
The proprietary foundation models obviously are the frontier of research, but they're
relatively inaccessible from a technology perspective because they're
fundamentally rent-seeking. You can't run them on your own hardware.
It's significantly less accessible.
And so I'm kind of waiting for the next generation of open source models.
Yeah, one maybe
underrated or under-analyzed Grok 4 thing that happened last night:
I don't know if either of you saw this, but they did this voice demo, and they were like really pushing the
accents really far. Tyler, did you see this?
Yeah, and there's the whispering, the whispering, and so, ASMR. Did you think it was uncanny valley, Tyler?
I felt very uncomfortable. Yeah. But at the same time, I think,
I think it's a path where we're in the uncanny valley,
just like we were with like six finger hands and stuff.
And when they actually sort out the accents, the whispering, the intonation,
the cadence, it's going to become a much more addictive companion
potentially. So I want to,
I want to bridge to your piece and talk about
the different use cases that you see people might
kind of flow into with these like chat companions
because you mapped out way more than just
the normal take in my opinion.
Yeah, well, so I guess it's worth asking ourselves, where do we want other humans to exist?
And where will we accept substitutes? I think the last time we were talking about it,
if you can imagine a situation where humans are getting in your way of doing something,
then you hate them, right?
Imagine you're in traffic, in gridlock traffic.
That's the most misanthropic you could possibly be.
You're like, if none of you existed, I could just get where I wanted to go without you
being here. But then there are just total other times where we would refuse to accept anything other than humans as that thing.
I know that it's very popular to dismiss the value from AI companions, to say that that value isn't real.
But at the same time, I think that when it comes to, let's call it the allocation of leisure hours, we really care about other people.
Whether it's like, I think last time I was talking about like going to fine dining or
reality TV.
I think I mentioned that like we have chess software that is way better than any human
will ever be.
And it's not entertaining to us.
Cause that's sort of like a, there's no drama.
There's no emotion.
Exactly, exactly. And so, um,
sorry, I'm just thinking about like the shape of companionship, because in your most recent piece,
you call out the imaginary friends that kids have,
Calvin and Hobbes, Toy Story, these stories resonate
because they poignantly depict how colorful
whimsical placeholders of our childhood slowly fade
as society offers real alternatives.
And I'm just thinking about like kids love imaginary friends,
but they also love multiple IPs essentially.
Like they like Batman and then they also like Spider-Man.
And so I'm wondering, there's been this narrative
for a few years in AI of like don't build a GPT wrapper
cause you're gonna get rolled.
There's gonna be immense concentration of value,
and there will actually be no middle class in this ecosystem.
And I'm wondering, because there's this company, Tolan,
that's kind of imaginary friend, AI driven,
and I'm wondering how we might actually,
is it possible that we're heading towards something
where people are essentially developing new IP,
and yes, there's still a power law
in the companionship market,
but these models and these products
are like much more opinionated
to the point where there actually isn't a one product
to rule them all.
And there's a variety of products that fit into different niches, not just for
companionship broadly; even just within like the imaginary friend niche, there's
25 different options, and yeah, there's one that's popular, but then there's one
that's one tenth as popular and one hundredth as popular. But yeah, react
to that.
Yeah, it's really interesting. Let me first
go back to this wrapper concept. I think it's really
important to distinguish when wrapper strategies work and when
they don't work. I would argue that the wrapper strategy works
the best when the underlying infrastructure is purely commoditized.
You can choose across many different options. Where the wrapper strategy doesn't work is if
you're basically building on top of a monopoly and that underlying landlord is basically just
going to be increasingly rent-seeking and squeeze you out of all margin.
So sort of embedded in the wrapper strategy is the assumption that over time, you're going
to be able to distribute your product on top of like increasingly commoditized infrastructure.
So for example, Snowflake. Snowflake launched actually just on Amazon.
And then over time distributed its product
across Azure and Google.
And actually, in doing so, was able to expand the margin
capture of its own product, because
its vendors were competing
to be its underlying infrastructure.
Going back to like this open source notion,
this is actually why I am so interested in open source.
The more that we have competitive fungible models,
the more that the application layer on top
can really flourish.
The more that we'll see distinct unique applications
and kind of until we start to maybe S-curve
near the top of the frontier.
I'm sure you guys are familiar with the,
it's a really popular essay, The Bitter Lesson,
which is basically the, you know,
no amount of fine tuning or no amount of specific training
is actually going to outcompete just the fundamental
advances when it comes to like more gains with more compute.
But if we start S-curving, if these scaling laws break, which it seems like they are
breaking on pre-training and test time compute and things like that, all of a sudden you have
largely fungible, similar capabilities at that commoditized layer.
And then we can really start to see the application layer flourish.
Yeah. My interpretation of the bitter lesson right now is that
the impact of AI should be tracked less in benchmarks and less in individual
tests of one model, and more in the actual
volume of inference tokens being generated by humanity. And it's fine
that we don't have one central AI doing all of the work. If we just give everyone an AI
copilot for every single task, they all get better, and we'll build more data centers
to inference more.
And eventually that will compound and compound and compound until the
overall impact of AI is remarkable and unmistakable in the same way the internet's
has been, but it won't be this, like, all of a sudden we unlock this one incredible algorithm.
Yeah, maybe.
So I'm going to try and map a comparison that's probably wrong for any number
of reasons.
Let's say we ported the bitter lesson to the rollout of PCs, right? I think there's been a lot of comparisons of like, AI feels like a new computer.
So let's map the bitter lesson to PCs. Maybe the analog would be, hey, don't work
on building software that optimizes for a computer speed of like 50 megahertz, because
the computer that comes out that's like 100 megahertz or 200 megahertz or, you know,
one gigahertz is just going to blow out whatever software optimization you've achieved at that compute level.
Um, and so I think about it a lot like that.
Now, I think this really begs the question of, okay, well, what happens
when the vast majority of our use cases are satisfied by the compute threshold.
You know, I feel like, you know, everybody's like,
who needs the next-gen version of this,
because largely all of the applications
that you use or want to run just work.
And so it's very clear that we're in this like rising part of the S-curve,
but when the S-curve starts to taper off, that's really when it comes in the question of, okay, well,
how do we think about the value delivery that sits on top of the underlying rent capture
from these foundational models? You could think about it also, like video game consoles, the amount of creativity that
video game developers are limited by is actually how advanced the video game consoles are,
and also how expand, like what the install base of the video game consoles are and also how, like how expand, like what the install base of the video game consoles are. And so I think one of the, one of the challenges right now
is like, we have so few developers of AI applications. Like they're, they're still like, you can
kind of like count them on maybe a few sets of hands, right? Which is insane.
Wait, really? I feel like there's like thousands of startups that count in like the AI
software developer world. I see market maps every single day.
Um, I'm sorry, uh,
there's like 10 in every B2B category.
Well, okay. So maybe,
let's split the world between like enterprise use cases and consumer use cases.
Oh, sure.
So enterprise, I would, yes, a hundred, a thousand percent.
There's a lot of companies there. It's a blue ocean sprint to vertical value delivery
within different sectors, because you're largely swapping out headcount. The addressable spend is OpEx, which is insane.
Like, that's crazy.
Your revenue opportunity is just like headcount spend.
Um, and so that's for sure.
And also legacy tool spend. Like, it's all the OpEx, to your point. I mean,
I guess not like real estate or rent or something, but basically everything else.
Yes, which is by far the biggest cost center for any company
Yeah, of course anyone who's ever run payroll or anyone who's ever scaled a company is like man like humans are expensive
Yeah, humans are so expensive
And I mean,
this is why I feel like everybody says that AI is the best candidate,
or the best argument, that UBI is coming.
Sure, sure.
Yeah, Jordy?
I think the picks and shovels meme became too dominant.
Over the last decade or so,
there were just so many amazing outcomes of people building infrastructure, and like very visible
outcomes, and it became cool to build infrastructure, right? Like the Collison
brothers made building infra cool, and Parker Conrad is like a
folk hero, sure, right, with Rippling. And Ramp is a good example of this.
Corporate spend management should not be cool.
They've built a really cool culture around it.
And there's this weird kind of pervasive meme of like,
you have two years to escape the permanent underclass.
And so I think people are like well
I'm not gonna just build something weird and fun
I'm gonna build enterprise SaaS so I can, you know, so I can escape the permanent underclass.
And so I don't think there's been enough
weird, fun
attempts from people. Like Tolan, the one you brought up earlier, is cool, and it's pretty rare. It's not the most rational thing to say,
you know, if you just want to build a big business, to be like,
I'm going to build a little alien AI friend.
But clearly, there's demand for that.
And we were talking with Scott Belsky yesterday
around just wanting new, fun, weird consumer use cases.
And I feel like, I think what you're getting at is,
that whole area is like relatively
underexplored to date. We've had, we had two browser
announcements yesterday, and they're both built on Chromium, and
that's exciting and cool, and we should talk about the
potential for new browser wars. But I think the number of people that are saying,
I'm actually, yeah, you could call it a wrapper,
but I'm actually trying to create something entirely
novel.
The example we gave yesterday is a dating app
based on a digital twin that is just constantly dating
other digital twins.
And I haven't seen, I'm sure somebody's working on that,
I haven't seen it yet.
But pick any popular consumer app category,
and there's probably a way to entirely rethink it
with this sort of LLM as a new computer
at the core of that ecosystem.
Consumer just seems so much more risky,
because it's either a billion dollar outcome or zero,
whereas in enterprise it feels like,
well, there's no way it's gonna be a zero.
It's gonna be a $10 million outcome
or a $100 million outcome or a billion dollar outcome,
but it's not gonna be a swing for the fences.
So then you have 10 companies that each have 10 to 50 million
doing the same thing.
And we've even seen this with the story of Slack,
where they were trying to go for the hits driven business
with the game, didn't work, and then they went into SaaS
and it worked.
Anyway, sorry, Chris.
I think that an additional challenge with consumer is
inference is not free right now.
Somebody has to pay for the inference bill.
And so until you can run inference on device,
it's still like every developer is doing the mental math
in their head of like, how do I,
like if any one of your users can run you out of house
and home if they abuse your service,
how do you build on top of that?
That's crazy.
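(To make that worry concrete, here's a minimal sketch of per-user inference cost; every number is a made-up assumption, not any provider's actual pricing:)

```python
# Hypothetical per-user inference cost vs a flat subscription.
PRICE_PER_1M_TOKENS = 10.0   # $/1M tokens -- assumed, not a real price sheet
TOKENS_PER_REQUEST = 2_000   # assumed average request size
SUBSCRIPTION = 20.0          # $/month flat plan -- assumed

def monthly_cost(requests_per_day: int) -> float:
    """Inference cost for one user over a 30-day month."""
    tokens = requests_per_day * TOKENS_PER_REQUEST * 30
    return tokens / 1e6 * PRICE_PER_1M_TOKENS

for rpd in (10, 100, 1_000):
    print(f"{rpd:>5} req/day -> ${monthly_cost(rpd):,.0f}/mo vs ${SUBSCRIPTION:.0f} plan")
# Under these assumptions, a 1,000 req/day power user costs ~$600/mo
# against a $20 plan -- the "run you out of house and home" case.
```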
One of the OpenAI researchers, I think, in Zurich
that's going to Meta posted, like,
I didn't realize that I had this thing running.
It was like $150 a day, just like every single day.
Luckily, I think he'll be fine.
He can afford it.
Before we fully leave this, what do you think?
Do you think that AI companions and LLMs broadly
present a real threat to traditional social media,
like the idea of a companion?
A lot of people in our world are using these tools
very functionally, like for doing research,
or getting answers, or understanding topics.
But a lot of people are using them as companions
and that is somewhat of a social entertainment experience.
And you see these charts ticking up
of kind of user minutes in LLMs.
So, ChatGPT went from about five user minutes per day
to over 30.
There's no... Around 30. Around 30, I think it's 28, 29,
over like the last six months.
There's no blip on any of the other social networks.
They're not declining yet.
But personally, I'm finding that if I'm doing research
on a topic, I used to go to YouTube,
I used to go to Instagram, and people famously
like search TikTok for answers to things, and some of that is shifting over. And a sort of
anecdotal, you know,
sort of experience for me is, when do I use social media
the least? It's when I'm with my family, which is like companionship,
it's the social time, or when I'm with friends at dinner, hanging out. It's like rude to be using,
there's no point to go hang out with a friend
and then use Instagram the whole time, right?
Totally.
I think I subscribe to this sort of cutting of our time
as like we're either allocating labor hours
or we're allocating leisure hours. So we're either
trying to be productive or we're trying to enjoy ourselves. And so I would say that all leisure
allocation effectively competes against each other. I think maybe it was like the Netflix CEO
that said that they weren't competing with HBO,
they were competing with Fortnite.
That's largely true.
We only have 24 hours in a day,
we only have a finite amount of leisure hours.
So if I'm allocating one hour to this leisure activity,
if I'm watching an episode of Love Island or whatever,
that's time that I'm not going to be able to allocate to a
different kind of leisure activity.
So to that end, I would absolutely agree that AI companions are almost certainly firmly in the
bucket of leisure; more consumption of that leisure equals less consumption of other leisure activities.
It's really zero sum.
The only thing that is going to make it non-zero sum is a fundamental advance on productivity
that allows the leisure pie to be even larger.
So, maybe we have, we definitely have more leisure hours as humanity now than we've ever
had in the history of humanity.
Let's give it up for the leisure hours.
Like proto-humans, zero leisure hours. Yeah, yeah, fun time.
Now it's just like amazing, amazing leisure hours.
When it comes to labor hour allocation, or like research or utility, I would say that doesn't necessarily encroach
on time on social media. And I think that all social media, whether it's YouTube or TikTok or
Twitter, they care less about helping people get work done, and they care much, much more about absorbing
as much attention as possible, which is why the algorithms are so insidious at serving
you exactly the kind of saccharine thing that you want to consume next.
I wonder if there will be an incentive.
I mean, there will obviously be an incentive, but I wonder how it will play out in the LLM chatbot interface, because right now OpenAI
and basically anyone who has a dominant consumer AI app is probably seeing user minutes
increase just naturally without putting in like growth hacks or retention loops or, you know,
but you could imagine a world where
to get from 30 minutes to 60 minutes,
the LLM has to not just give you the response,
but surface, hey, would you like to follow up
and learn more about this?
Click these buttons.
That's already kind of happening.
Yeah, it starts surfacing you stories
that it knows you're interested in
by what you've asked it about in the past.
Let's give you a new breakdown,
kind of pre-populating a deep research report.
It is funny that the push notification
hasn't really quite hit consumer chat apps.
Yeah.
And it undoubtedly will.
I had this thesis that push would be very important
versus like pull, like you have to go to ChatGPT and ask it for something,
and someone is going to solve kind of like,
it's almost like an AI driven newsletter or something
where it understands what you're interested in
and then generates the report before you can even ask it
because it knows that if Ferrari drops a new car,
I'm gonna want a table of all of the details
because I like consuming information that way,
in addition to just hearing commentary and watching
the Doug DiMiro video about it.
But OpenAI could pre-populate that and just send that to me.
But I don't know.
How do you think that's going to evolve?
Question for you guys, do you think we're going to pay for AI
services forever?
I've been asking that a lot.
I mean, I think the more important question
is will the average American pay for an AI, like an LLM?
And then will they pay for multiple?
Like, the comp for this is in streaming, where Americans...
I was about to say Netflix.
A lot of Americans pay for multiple streaming services,
but they are incredibly ruthless about canceling them
on average.
Like a lot of, I'm sure a lot of people listening to this
have like had some streaming service billing them monthly
for years that they haven't even watched.
But the average American is like,
I'm not getting a lot of value out of HBO right now.
I'm going to cancel, even though they might sign up again
in like four months.
Next time there's a hit show that they're going to watch.
And I think that there's not a, right now,
there's this incredible demand for what's new and what's best.
Right?
Like, Grok will drive a lot of signups today
because it is, the Grok 4 Heavy is like
a meaningful advancement, but I went and I haven't,
we've been busy this morning, I haven't had a chance
to sign up and play around with it yet,
and I was getting plenty of value from Grok 3.
Like, I was able to just search and get it.
Yeah, I was asking Grok 3 about Grok 4 as well,
and it was actually doing a pretty good job,
which is funny.
And so, yeah, and so I'm not,
I'm not, I guess I'd probably get it through X Premium, but.
So I have some data here.
Netflix made 39 billion last year.
They're on track for 44 billion this year.
1.8 billion of that was ad revenue last year.
It's estimated to be around four billion this year.
So their ad revenue's doubling
while their subscription revenue is growing by 5%.
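(Running the quoted figures back; these are the on-air numbers, and the implied subscription growth comes out in the mid-single digits, roughly in line with the claim:)

```python
# Back-of-envelope on the Netflix figures quoted above (approximate).
total_last, total_this = 39e9, 44e9
ads_last, ads_this = 1.8e9, 4e9

subs_last = total_last - ads_last  # ~$37.2B
subs_this = total_this - ads_this  # ~$40B

print(f"Ad revenue growth:   {ads_this / ads_last - 1:.0%}")   # ~122%, roughly doubling
print(f"Subscription growth: {subs_this / subs_last - 1:.1%}") # ~7.5%, mid-single digits
```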
And so my takeaway for ChatGPT would be,
I would imagine that the ChatGPT paid subscriptions
follow an S-curve, and we get to something where we see OpenAI making,
I don't know if it'll be 10 billion or 40 billion,
but they will soak up a ton of subscription demand
for ad-free frontier models, the most advanced,
the most expensive stuff, and then ads will eventually
become the dominant revenue driver,
but I feel like the subscription revenue
will be a really hard tap to turn off just from, hey, people are paying and it's a lot
of money and we don't want it to go away.
Even in the enterprise, it does. My bet is that we continue down this trend towards
paying for outcomes, because a lot of people will just say, well, I don't want a subscription
for this service because I only use it every now and then.
And when I get value from it, I'm happy to pay for it.
I think a question to ask is,
would you pay $20 a month for Instagram today
to not have ads?
And I would actually have to think about that for a while
because like once a month, I get an ad for something
that looks interesting and I discover a product
that I wouldn't have otherwise discovered, and I buy it, and sometimes it's great. Yeah.
So do I want to just completely eliminate that and rely entirely on random organic discovery? Or do I actually like that
this ad platform is spending a bunch of time and energy trying to serve me the next product that I'm gonna like?
Which is actually a service, and it's not a bad trade at all.
Totally. I think a lot
of the way that we vote with our feet, and with our wallets, is that
way. We're super happy to pay with our time and our attention if given the
option. Yeah, the vast majority of people won't pay. Not only that, but people are more than happy
to give up their attention, and privacy for that matter,
if it can save them money,
instead of paying for something.
One thought experiment I like to think about is
just how much people will trade privacy for value.
Imagine a checkout flow where you could get $5 off
if you enter your social security number.
Like how many people do you think would enter
their social security number?
Like everyone or like 90% of people.
Like an insane amount of people.
And so people really don't value their privacy
as much as I think like maybe we say
people value their privacy.
And in aggregate, obviously that data is super monetizable.
It's interesting, you know,
search obviously is the big prize, I think, for AI.
And what I mean by search, it's intent.
If you're the arbiter, or you control this fire hose of intent, you can benefit by metering
it out and having people bid for that intent.
Obviously Google, Google's like the best business,
the best business model maybe ever invented.
It's kind of insane.
What's interesting is the most valuable searches,
maybe like not what people think
and certain kinds of searches are totally worthless.
So, knowledge-based search, like fact-based search,
things like...
What is the market cap of this company?
Right, right.
It's like, what's the market cap of this?
You know, who won this game?
Or like, you know, who was president in 1936?
Yeah, they're dead ends.
They're dead ends.
You get the fact, then you leave.
No value.
Zero value.
Actually, there was a Google antitrust subpoena
that had actually surfaced some documentation
of what their most profitable keyword searches were.
It's super interesting.
I suggest people go check it out.
The number one most profitable search for Google
was just the word iPhone.
No way.
That's amazing.
Because if you think about it, like what does that mean?
What is somebody telegraphing to the market
when they search the word iPhone?
They're like, they're basically saying, in not so many words, I
am ready to spend $1,600 on a smartphone. Yep. And who's
interested in, like, jockeying to get that person's
attention? Well, Apple has to, right? Best Buy as a retailer is
interested. Samsung wants to.
Then Verizon, AT&T, T-Mobile, the networks.
And so when you think about, like, where is valuable search?
The value of search is often misunderstood,
because you have to really think about
how to capture the most valuable intent.
And not all intent is equally valuable.
There's a bunch of search that's garbage.
You actually don't want it.
The cream off the top of fact-based search would be what I would
call comparison shopping.
So it's like, what's the best headphone?
What are the best headphones?
And maybe you can slice off the top of that revenue pie. But there's
a tension between that and the objective truth that you're serving up to
the user. By far the most valuable search is not fact-based,
and it's relatively, it's where the user
kind of knows exactly what they want
and they're trying to do it
and other people are willing to bid
to get in their way and run interference.
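(A purely illustrative way to formalize that point; the simple expected-value model and every number in it are assumptions, not Google's actual auction:)

```python
# Expected ad revenue per query ~= P(ad click) * cost per click.
# Made-up numbers, chosen only to show why commercial intent
# dwarfs dead-end fact lookups.
queries = {
    # query             (P(ad click), avg CPC $)
    "iphone":            (0.30, 3.00),   # high purchase intent
    "best headphones":   (0.15, 1.50),   # comparison shopping
    "who won the game":  (0.001, 0.10),  # dead-end fact lookup
}

for q, (p_click, cpc) in queries.items():
    print(f"{q:<18} -> ${p_click * cpc:.4f} per query")
# ~$0.90/query for "iphone" vs ~$0.0001 for the fact lookup here.
```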
While we have you, how are you thinking about the new,
I wouldn't call it a browser war yet,
but a potential skirmish heating up? Dia
released to Arc users, it probably feels like maybe a month ago at this point, at least a few weeks.
And then we have OpenAI
potentially coming in with a new standalone app, sort of unclear.
It was unclear to us whether this is gonna be a new app that you download or just integrated
into the existing ChatGPT mobile app.
And then Perplexity as well does have a separate
standalone app that they're pushing now.
And it feels interesting, one, because the browser company
has spent now years working on the browser, trying to figure out
what is going to enable unlocking more value out
of this portal to the web and effectively an operating
system.
And so meanwhile, new players are basically
being like, we want to have a browser,
and just going, we're shipping.
We just got to get this out there.
So I think it'll be interesting.
The next month, two months, I think
will be very interesting.
But I'm curious what you're looking at.
I could not be more excited.
I think I'm obviously biased.
We're investors in the browser company.
I'm a daily user of DIA.
I personally get a ton of value from it,
particularly the custom skills.
And I think that the browser company
has always known that this is a really valuable position.
And it's like, honestly, just validating
to see incredible companies, you
know, OpenAI is a great company, Perplexity is a really amazing, formidable company, also
recognizing that this is a really valuable position to play for. I have supreme confidence
that the team at the browser company is the most talented, has the best
instincts, the best nuanced understanding of interaction design and how to create
and craft a great product, regardless of the underlying model or technology that
underpins it. I'll be very curious to see how it plays out. My instinct is that OpenAI is an incredible foundational model company.
But I've seen them ship a lot of different products.
To my knowledge, ChatGPT is really the only product that's quite stuck.
And it's not really even like the interface design so much as it is the underlying power
of the thing.
To answer your question, I could not be more excited. Thanks. Like, this is going to be amazing.
I mean, check back in a few weeks. I'm sure there'll be a lot more.
It's going to be huge. Plug for Dia:
if you haven't downloaded Dia, it's on Mac.
It's available. Please download it.
It'll blow your mind. It
blew my mind. All right, great talking as always. Wish we had more time. Before
bringing on Will Bruey from Varda, let me tell you about Wander. Find your happy
place. Find your happy place. Book a Wander with inspiring views, hotel-grade
amenities, dreamy beds, top-tier cleaning, and 24/7 concierge service.
It's a vacation home, but better, folks. And soon, with Varda, maybe they'll put a Wander in space.
Let's bring in Will Bruey from Varda. Is this your first time on the show?
I feel like this is a disaster that we are finally rectifying. We did it.
We made it. Thanks for having me. Finally. We've had the other, the kind of knockoff version of you at Varda, the
other guy. He's been on the show a ton, I think. Yeah, yeah.
Great to finally have you on. Amazing. Massive day. Break it down for us. What's
the news? Are we gonna make Jordy stand up?
Yeah, oh, you have the gong? That's both of us. We have the gong too. So we have the gong for big moments,
you know, either fantastic landings or, you know,
selling a mission to a customer, stuff like that. And yeah, we love that. Yeah, what's the record? Here's my advice to you: record every hit. I want to see a montage in ten years.
That's good. Every hit, and it will bring tears to your eyes. We record every hit we do.
We got you, we got you on this one. So yeah, what's the news today? Break it down.
So, wow, lots of news today. So we're announcing our Series C, and we're gonna
use that... So yeah, go for it, baby. You earned it, baby. How much did you raise? How much? Tell us,
how much did you raise? 187 million. There we go. Congratulations. There we go. Thank
you. I appreciate it. Appreciate it. Yeah. So the proceeds for this one, really, it's
about just scaling up. So we've kind of shown what we can do, both from a spacecraft perspective and a drug formulation development perspective.
So a lot of the capital allocation of this one is going to go to our biologics lab, for
preparing drugs for spaceflight, and then also just more spaceflight, ramping up cadence. That means, yeah, more
flights. So, I've been to the facility in El Segundo.
Are you going to get a bigger space, a second space for the bio lab, or
are you going to need a bigger
gong?
Right. Well, a gong per facility. We got to scale that up too.
So, so are you thinking about doing a second office, essentially,
or how do you see the actual, like, footprint of Varda growing over the next few years? Yeah. Well, immediately, we just signed a lease
down the street. Oh, congratulations. Oh, thank you. Thank you. Yeah.
All right. Big day. Yeah. That's amazing. Yeah.
So we've actually already moved in; a bunch of the pharmaceutical equipment
is already in there. We're starting to use it right now.
We got us a couple of glory shots, you know,
with folks with the lab coats on actually using it.
So that's super exciting.
Long-term, I mean, really, I guess zooming out
of what the footprint will look like is,
think about a formulation development company
that really just provides a gravity off switch
to the pharmaceutical industry.
So we go to space, but you know,
not really because we want to per se,
but because you can create new drug formulations
when you turn off gravity and you just can't turn it off
on earth, that's Einstein's principle of equivalence.
Do you mean new drug formulations
or just like purer drug formulations?
Because when I think no gravity,
I think like the way crystals form
and the way gravity pulls things to one direction,
if that doesn't happen in space,
you get just kind of like a more natural growth.
And so I've always thought it was just about purity,
but it sounds like there's actually some binary,
like you can't make this drug on earth at all.
Is that right?
Both those concepts are correct.
Wow, well read there, Coogan.
Yeah, that's actually a great way to think about it.
Purity is one aspect, but because gravity is so broad, I use the analogy of temperature sometimes
because temperature is so broad.
Making things cold doesn't necessarily make drugs better,
per se, but you can create a lot of different formulations
if you can have a cold cycle
during the manufacturing process.
And that'll be, even with chilling things,
you can make things more pure sometimes as well, right so
But to your point when you turn off gravity
Crystals will typically grow slower and that also means that they will grow more pure
And so that is one of a few applications that we look at the other one is particle size distribution
So when you create these crystals that will then go into the human body,
you want them all to be the exact same size
so that one big one doesn't get stuck in your elbow
and so that you have uniform bioavailability.
So the crystals will, particle size distribution
is also affected by gravity.
So that's a whole separate thing compared to purity,
which is another rationale for going to space.
So really the gravity knob is very broad
and there's kind of these verticals of science
of how we can improve the drug formulation.
And to your point again, as far as like what I mean
by drug formulation is going from molecule to medicine.
So is it a pill?
Is it inhalable?
Is it an IV bag?
Is it a shot?
The drug company, or a pharmaceutical company,
does a trade study to determine which of those
is the best for the patient given the disease,
given the manufacturing costs.
But ultimately all of that is limited
to what the chemistry can actually do, right?
Nobody wants to take a needle to the arm.
They only do it because they can't deliver that molecule
via a pill or something like that.
And so by opening up the chemistry outcomes
by going to microgravity,
we can also open up the formulation outcomes
and therefore get better patient experiences.
Yeah, so I mean, I imagine that this is still,
this is such an ambitious project
that it's still kind of in the R&D phase with a lot of the bio stuff.
I mean, when I think about the manufacturing
capacity of, like, GLP-1s,
they're probably making that thing in something the size of, like, you know, what they brew
Bud Light in at this point. Yeah.
But walk me through how we scale this up. I
understand launch costs falling. I understand you put up a capsule, you're doing it like every quarter now,
it's gonna be every month,
then it'll be multiple times per day.
Like, that capability seems clear,
but how much drug can you make on a single capsule?
Yeah, yeah, great question.
So this is actually a lot of fun,
because we can imagine how Varda will go
from what's real today to making tomorrow's reality.
And so to answer your question immediately, about 20 kilograms on a, on a per capsule
basis right now today. Of course, you know, we want to scale up everything and that's
one of them, but that's how much we can do today, which is actually quite a lot.
Yeah, that seems significant. If you just think about, like, you go to the doctor and
the doctor gives you, you know, a thing of pills, that's like not one kilo. So we're probably talking about, I mean, unless you're having a lot more fun than what you're being prescribed,
but for most drugs,
I feel like 20 kilos is probably enough for like a hundred people for a year or something like that.
So you're actually, I'm seeing the numbers kind of start to math out already. Correct, correct.
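(Checking the hosts' guess against the one number Will gave; only the 20 kg payload is from the conversation, the rest is the hosts' rough scenario:)

```python
# Does 20 kg per capsule plausibly cover ~100 patients for a year?
payload_kg = 20            # per-capsule capacity stated above
patients, days = 100, 365  # the hosts' rough scenario (assumption)

dose_g = payload_kg * 1000 / (patients * days)
print(f"Implied dose: {dose_g:.2f} g/day per patient")  # ~0.55 g/day
# Many small-molecule doses are in the mg-to-low-gram range per day,
# so the back-of-envelope holds for a lot of drugs.
```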
So it's a, and every drug is a little bit different.
So we go through a process of selecting which drugs make the most sense, both from a scale
perspective, like you're saying, but also unit economics, how gravity affects them.
And so we have a portfolio management team that explicitly does that for identifying
and quantifying opportunities.
But going back to like what today looks like and how it goes tomorrow,
I love the temperature analogy because it really runs deep. So for example, right now, if you think
of us having a anti-gravity oven where we can make drug formulations that you can't otherwise make
on earth, but we only get to run it four times a year and each one is a few million bucks a run,
you might use it for different use cases
than you would five years, 10 years from now, when you can run it every day
for a few thousand dollars.
And so in the near term, some of the use cases, you know, imagine yourself with the, you know,
the first refrigerator, or, in this case, the first anti-gravity bioreactor.
What you might use it for in the near term is just information,
right?
How can we isolate gravity as a variable to inform what formulations can be improved on
Earth?
Is gravity ruining this chemical reaction or not?
We can answer those questions and then that applies to the entire drug with just one flight.
Or in the very near term, we also want to do polymorph seed crystals.
And so what that means is we go to microgravity, we go to space, but just to develop the seed
crystals and then once those seed crystals are developed, we can then use them to grow
more drug crystals on the ground.
So we're only going to the nucleation event.
And that's kind of like a sourdough bread mother business model, right?
You have the mother of the sourdough bread, then you can cut it a bunch,
and then regrow it and stuff like that.
So that makes a lot of sense when we're still scaling up
the use of our anti-gravity machine, if you will.
And long-term, when we're on a daily basis,
then it totally makes sense to make every single dose,
manufacture every single dose in microgravity.
And then that's when certain use cases come online as well.
So that's how it progresses over time. Yeah. How's the geopolitical landscape
evolving for you? We saw some of the flights come down in Australia. As red-blooded Americans,
it pained me to see them take a slice of the catch. Are we getting these coming down in America anytime soon?
What's the progress there? Yeah, yeah, absolutely. So long-term,
we want to have reentry sites all over the globe, right? And,
and really that's about availability.
And the key metric to success of Varta is cadence.
How often can we go up and back?
Because the more we do that,
then the more we just look like a specialized piece of equipment to the pharmaceutical industry
that quite frankly does not care that we're going to space.
They'd much rather us have a real anti-gravity oven
in the lab.
So really reentry sites are about cadence and availability.
And right now Australia is great for us
because they have a private commercial reentry range.
Whereas the ranges in the 48 states here locally are
intended for military use exclusively.
And so if we're doing a DOD mission, that works well.
But if we're doing a commercial mission, we're not the highest priority, understandably so,
right?
And so in the near term, Australia makes the most sense,
but in the long term we want reentry
sites all over the place and why
that gets enabled is because as our
precision of landing and our cadence
goes up that data that legacy history
allows us to use a smaller and smaller
plot of land. And that's
where it really makes more sense to go
anywhere, because we don't need such wide open spaces
without that many people, like we do right now
today.
Do we have the legal infrastructure to create commercial landing sites in the United States
and it's just that nobody's done it yet or is there laws or regulations that would need
to change so that some enterprising young member of the Gundo could go buy a lot of
land out in the middle of nowhere and start landing spacecraft?
You can do it now. The constraint is the real estate cost. And so Spaceport America, for
example, right next to White Sands Missile Range, is a good example of that. So I guess
the real reason why there aren't that many of them is because there wasn't demand large enough to warrant such a real estate purchase. But, you know, thanks to Varda, that could change. So yeah, definitely let the Gundo know.
I have a question from a fan of yours, fan of the show. He says, ask him what big dogs gotta do.
So it's become a little bit of
an expression of excitement, with a long
history and a little bit of lore at Varda. But for some reason,
you know, it's really just a specific instance of conservation of mass, right? You can't have a big dog without eating, right? So that's just physics right there.
Yep. I want to talk about the evolution of the FAA. I remember I was filming a video,
I feel like I filmed with you guys, and at one point, I actually was driving back with Ben
from San Diego, and I filmed a phone call with Delian. And he's like, we just got, I don't
even know if I should say this,
but like, it was like, we got some bad news from the FAA.
You guys sorted it out.
It seems like you have a great relationship now.
How did that happen?
Is this a lobbying thing?
Is this just storytelling?
Is this structuring deals, getting better paperwork?
Like, how do you get, how do you fix a relationship
with a government entity like that?
Yeah, so it definitely got a little bit worked over in the press, obviously, but I
figured, you know, keep my head down and get the spacecraft home, more so than
worrying about what's being said in the press. So what actually happened in
the background is, we were originally going to reenter, or we did
reenter, the spacecraft at the Utah Test and Training Range, which is ultimately a
weapons range for testing weapons and training warriors.
That's their mission statement, right?
So likewise, we're not the highest priority there.
And so we got bumped for higher priority work
being done at the range.
And in doing so caused a domino effect
to lose the FAA re-entry license
or not be able to get it granted
because part of the regulations
say, hey, you need a range and all of these accommodations that come with a range. So
the second we lost the range, we lose the license. So it wasn't really about a bad relationship
with the FAA at all. Although it's very easy to say, oh, they lost their license, get another
space company and the FAA are having problems, right?
It's a good headline.
It fit that narrative well, but it actually wasn't the case.
And so what we did was we scheduled a new date
with the range farther out in advance
to give them some time and give us some time to,
we had to redo the analysis, of course,
because the atmosphere is different
and that's part of the analysis.
And so we gave ourselves a few months
and then that allowed them to reserve the dates
that allowed us to prepare
and that allowed the FAA to reorient the license
for the new dates.
And ultimately, kind of in the background here
was like this was the first time this has ever happened,
a commercial reentry capsule with drugs on board
coming back to America.
And it was onto soil, right?
We're not doing a splashdown.
And so there was no process or mechanism
to have the Utah Test and Training Range
coordinate with the FAA.
And so basically each organization saw themselves
as taking on all the risk associated with it.
So we had to do duplicative work
because there was no process to split it, right?
And so it was really cool to kind of be a trailblazer
to establish this so that now, of course,
our competitors are gonna come in and do the same thing,
right, and learn from our mistakes.
But whatever, that's part of leading the way, right?
So anyway, that's what happened.
But that six months was, it was quite the life experience,
right, because it was the first mission, right?
So we didn't have any proof that this was gonna work.
People poured three years of their lives into this thing,
and our dreams are just like orbiting the Earth like, please come home, you know?
So when it came through, man, it was certainly,
I can't think of a better day.
Yeah, that's amazing.
I have one last question, Jordy.
How's the talent market in the space economy right now?
Hasn't been in the headlines the last couple of weeks,
there's been another story in AI dominating.
What's it like today?
Quite AI-focused, for both scientists
and software engineers. If you imagine yourself as a software engineer coming out of
school right now, AI is certainly where I would be interested. There's definitely a
software bent towards AI right now.
That being said, there's a lot of disciplines
we're hiring for, software isn't the only one.
And really it comes down to the application interest.
Like we're looking for mission driven folks at Varta.
And so if you're only looking,
oh, you only want to do AI because it's cool or whatever,
that might not be the type of person we want to hire anyway.
Now, if you want to do AI for mission-driven purposes,
then great, by all means.
But we don't have that much overlap there.
We're very specific of what we're trying to do.
We're trying to make microgravity formulations
so that we can help patients on Earth
by using gravity as a knob, essentially,
in developing these formulations.
And so, you know, we always kid around,
we explicitly don't want the spacecraft to learn, you know?
And so if you're a software engineer
and you're mission-driven, bent towards that mission,
then you've got a home at Varda, no question.
And, you know, fads come and go
and that sort of thing, but there is definitely an effect.
I would certainly be allured to AI
as a graduating software engineer.
That's a good sorting function.
One last question for me.
We've, there's been headlines,
companies talking about this so far this year,
trying to dig into how real it is.
People talking about putting data centers in space.
With everything that you've learned,
why is that exciting, a good idea or bad idea?
What are some potential blind spots for people
that haven't taken something to space, but would like to?
So it all comes down to the why, right?
Why are we putting data centers in space?
It's not that data centers in space in and of themselves
are a good idea, but what's the why?
The only why that resonates with me is latency, right?
Because if you just want compute power, space is not the place to put a data center if you just want a data center, right?
I'd much rather have convection, right? That's a great way to get rid of heat,
and have it be able to be serviceable on Earth and all that sort of thing.
So, but there is one use case that comes to mind
where I think data centers in space makes sense
and that's only for very low latency use cases.
So for example, right now, if you want to use Starlink
and you're transmitting a signal to Starlink,
it goes from the ground to Starlink
to another Starlink satellite to the ground,
then to the data server and back, right?
So you can cut that trip in half
if you put the compute in the sky.
Now, that compute is way, way, way more expensive,
but if your value prop of latency warrants
that extra cost of the in orbit data center,
then you'll start to see that.
So it's kind of like edge computing.
Edge computing, yep, I was about to say.
So yeah, it sounds like a very niche use case
at least to start, but I'm sure we'll see some companies,
we already are seeing some companies test it out
and experiment with it because there's, you know,
all these things need to be evaluated in the tech tree.
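To put rough numbers on that latency argument, here's a quick back-of-the-envelope sketch in Python. The altitude, crosslink distance, and ground-haul figures are illustrative assumptions, not actual Starlink or Varda parameters.

```python
# Back-of-the-envelope: propagation delay for reaching compute via LEO
# satellites. All distances are assumed, illustrative numbers.

SPEED_OF_LIGHT_KM_S = 299_792.458
LEO_ALTITUDE_KM = 550        # assumed LEO shell altitude
CROSSLINK_KM = 1_000         # assumed satellite-to-satellite hop
GROUND_HAUL_KM = 500         # assumed ground station to data center

def one_way_ms(path_km: float) -> float:
    """Straight-line propagation delay in milliseconds."""
    return path_km / SPEED_OF_LIGHT_KM_S * 1_000

# Compute on the ground: up, crosslink, down, then haul to the server.
via_ground = one_way_ms(LEO_ALTITUDE_KM + CROSSLINK_KM
                        + LEO_ALTITUDE_KM + GROUND_HAUL_KM)

# Compute in orbit: the signal stops at the first satellite.
via_orbit = one_way_ms(LEO_ALTITUDE_KM)

print(f"ground data center: ~{via_ground:.1f} ms one way")
print(f"in-orbit compute:   ~{via_orbit:.1f} ms one way")
```

Round trips double those numbers; the point is just that dropping the down leg and the terrestrial haul can cut the trip roughly in half, at the cost of much more expensive compute.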
But thank you so much for stopping by,
this was fantastic, congratulations.
Hey, thanks for having me.
And hopefully not so long, I'll see you again soon.
Absolutely.
Yeah, yeah, hop on soon.
Got a good feeling about it.
We'll talk to you soon.
Have a good one. Cheers, Will.
Congrats to you and the team.
See you later.
Bye.
Up next, we have Joel from METR coming on to talk about the impact of AI models, the impact of Cursor on software development.
Is it Metter or Meter?
Oh, Meter probably.
M-E-T-R, we'll have him explain it to us.
Break it down.
We'll also recommend that you go to getbezel.com
because your bezel concierge is available now
to source you any watch on the planet.
Seriously, any watch. That's right.
Anyway, sorry.
METR does model evaluation and threat research.
Okay. So does bezel.
They're stopping you from buying fake watches online. Bad watch models. Threats, bad actors. Foundation models and watch models, lots of similarities.
That's right. Anyway, we got Joel in the studio. Welcome to the stream.
Hopefully I'm not lumping myself in, these guys are joking around. I'm a serious person. All right, first off. Is it, it's METR?
It's METR. It's METR. Yeah. Sorry. There we go. Gotcha. Anyway,
Please introduce yourself for those
who don't know you, the company and then the organization.
And then I want to go into the news today.
Let's do it.
And thank you very much for having me, John and Jordy.
Thanks for hopping on.
METR is a research nonprofit based in Berkeley,
dedicated to understanding the capabilities of AI today and in the near future,
especially to the extent that those capabilities
might speak to potentially dangerous risks.
And what is, what's been the latest research?
Yeah, so here's what we've been working on.
I'll start with why we've been working on it.
Yeah, please.
We've seen from previous METR research,
but I'm sure you also see from your own usage in the wild,
AIs are clearly becoming increasingly capable.
One thing that governments and labs and us here at METR
as well worry about is the possibility, timing,
and nature of AI R&D self-recursion.
That is the possibility that model capabilities get better very, very rapidly because the AIs
themselves are contributing to AI R&D research. We at METR want
to be providing the highest quality evidence that we can
that speaks to the degree to which AI R&D might today or might
soon be accelerated in the wild. So that governments, labs,
decision makers
might be better informed and so make better decisions
about what's going on.
In this study, we run an RCT with extremely experienced
open source developers working on these very long-lived large
projects, a million lines of code, 23,000 stars on GitHub.
For those of you who are familiar,
I'm thinking Hugging Face Transformers,
the Haskell compiler, scikit-learn, this sort of thing.
We randomize their issues to allow or disallow
the usage of AI, where allow means typically
using Cursor with Claude 3.5 or 3.7 Sonnet at the time.
And then we measure both forecaster expectations and developer expectations about how much they might be sped up by being allowed to use AI versus being disallowed, and then the reality. The short version is we find that the developers ahead of time are estimating they'll be sped up by 24 percent. After the study is completed, they estimate that they were sped up by 20 percent. We find, in fact, that they were slowed down by—
No way.
I think, I know, it's a shocking result.
That is shocking.
Not at all what I expected, or I think what the rest of us at METR expected.
What?
But there we go.
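For intuition on the comparison behind that result, here's a minimal sketch of the core calculation in an issue-level RCT. The completion times are invented for illustration; this is not METR's actual data or analysis code.

```python
import statistics

# Invented per-issue completion times (hours). Issues were randomized
# into AI-allowed and AI-disallowed arms; this data is made up.
ai_allowed    = [2.4, 3.6, 1.9, 4.2, 2.8]
ai_disallowed = [2.0, 3.1, 1.7, 3.5, 2.5]

# Geometric means are a common choice for right-skewed task durations.
speedup = (statistics.geometric_mean(ai_disallowed)
           / statistics.geometric_mean(ai_allowed))

# speedup > 1 would mean AI helped; the invented data shows a slowdown.
print(f"observed speedup: {speedup:.2f}x")
```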
Wow, okay, so what do you think's happening?
I have so many questions.
But yeah, just walk me through your reaction to that.
What do you think is actually happening
that's slowing people down
because this is a complete narrative violation?
Yeah, I mean, in terms of the reaction, the number of times we've checked and rechecked the data, asked people to replicate it independently, it's going through the roof. The number of stressful late nights I've had poring over this.
You're gonna be like public enemy number one, by the way.
I feel like you need security detail now,
given the stakes of what you just said.
This is crazy.
Yeah, yeah, yeah.
So I think, maybe let me start
with some things that we're not saying.
The setting that I mentioned before, these ultra-talented developers, much more talented
than me, working on these extremely large, long-lived repositories that they're extremely
familiar with already. I think that's an extremely interesting population. That's why we went
out to study it. It's also a very weird population. I still am a cursor user myself,
as I was working on the graphs for this study,
I was using cursor.
But I do think those weirdnesses are related
to the results that we end up seeing here.
So we have to put these people in a completely different category than the junior developer who's just vibe coding a little app, just building stuff, not actually trying to push the frontier of what a core piece of software can do that's very large and complex. They're just trying to, you know, get a Python app up and live, write some routes, write some functions, right? That's where Cursor is still completely viable, like autocomplete on steroids.
The question is, in terms of self recursion,
really advancing the frontier of like
the craziest software we have,
we're still kind of where we were a few years ago, in that it feels like, if you were to quantify this, we were at 0% of AI research being done by AI a couple of years ago, and we're still maybe around rounding error.
Yeah, I mean, I will say that AI R&D research, I think, does not all look like this setting. There are some large inference code bases with very expert people, and I totally agree with your interpretation that this is evidence against those kinds of settings being sped up today.
On the other hand, we might think, you know, there are some people writing one-off training scripts for their AI models and then throwing them away. And in a way, that's kind of similar to what you described. Maybe they're seeing large speedups, just like the greenfield projects that you mentioned.
Yeah. And so, I mean,
this is not overall like a really cold glass of water on AI broadly
because this still means
that it's an incredibly valuable technology
in a bunch of different ways.
It's just that we're not seeing early evidence
of some sort of self-recurring fast takeoff scenario,
which is great, probably the good outcome.
A lot of the fast takeoff scenarios
are dependent on AI becoming itself so good at doing AI research and then copy
and pasting itself a trillion times. And that's what creates speed of development that humans
today can't necessarily even comprehend.
I think that's right for today. I do think we're not really speaking to the trend exactly.
You know, these results are consistent
with these exact developers
on these exact kinds of tasks being sped up in the near future.
There's work that we actually don't show in the paper, but in preliminary work, we have autonomous agents trying to complete these issues. And indeed we find that they do struggle, but with some of the core functionality, with passing tests, the kinds of things that you might have seen in SWE-bench or something like that, they really are making a great deal of progress.
And yeah, my expectation is that AI progress will continue at a rapid pace in the future, like it has in the recent past. And so maybe even in this setting, this won't be true in the future.
Let me throw a couple of the hot takes
that are floating around in the AI world at you.
And you can let me know if anything sticks out
as something you strongly agree with
or something you disagree with.
This idea that ultra-large context windows will not solve continual learning, Dwarkesh was saying this on Monday. Maybe another one would be just that no one has figured out how to properly scale reinforcement learning. Mike Knoop from ARC AGI kind of says we need entirely new ideas. And then you kind of have the bitter lesson, which is, yes, you need new ideas, but scale is all you need.
We just need to keep building data centers.
We need to get bigger and bigger.
We might see GPT-4.5 and these huge training runs as a short-term thing, hard to quantify.
Maybe it's just the end of one S-curve,
but Stargate's coming online,
and that will be another big test.
So I don't know, I threw a lot at you, but anything in there kind of, you know, top of
mind for you.
Yeah, look, as you guys know, anyone betting against the bitter lesson in the past would have had a very bad time. And I'm not prepared to bet against the bitter lesson. Could you remind me of the first question?
The first one was, so Dwarkesh Patel pushed out his AGI timeline slightly.
I mean, he still has, he still is very optimistic about AI and maintains that it's not priced
in and people are not thinking about it as significantly as they should.
And I agree with him.
But he said that, even though we have pushed the IQ so much,
and you saw this with the Grok 4 benchmarks,
like AI can do advanced math, like for sure.
It's really, really smart, smarter than most of us at PhD-level stuff unless you're a specialist.
But in terms of just being a good employee
and remembering, oh yeah, four weeks ago,
my boss said that they liked it.
And then I got this feedback, and now I do it this way. Or I learned this really weird nuance. Even if you're just thinking about, like, how to run our business, like how to post clips on X, Dwarkesh was giving the example of transcripts: he has little things that work better for what clip will perform, and he has this intuition, and his models and his prompts, he's really pushed these things. He hasn't been able to really get them to perform above a five out of 10.
An example would be any company today, any startup: if you just had someone drop into your organization who had PhDs in like 10 different fields, but they were also an amnesiac, so every time they showed up to work, they could not remember anything that you taught them, it just wouldn't be that valuable.
And so my question to Dwarkesh was like,
is there a world where we just scale up the context window?
We've seen million token windows.
Can we get to a billion token window and just stuff
every interaction the AI's ever had with you
in every prompt?
And so it does maintain the context.
But he was saying that there's kind of a quadratic cost curve to that, so it doesn't quite work.
Other people have said the nature of the transformer
means that attention can't really spread out that much.
I don't really fully understand it,
but I wanted to know your take on different ways
people are solving these things or what are the real
constraints right now because you've identified
some potential problems where we're not breaking through it today, but what is cause for optimism?
What are the research paths, like the nodes in the tech tree that you're excited about?
Yeah, that's super interesting. I haven't thought so much about this.
I will say, I think that the developers in this study
are not using the full context window.
And so if you think there's juice in adding things
to the context window, that juice might still be
on the table.
And indeed, I think we find that there's a lot
of implicit context in this repository
that's very expensive for the developers
to be writing down into context windows. Here an example on the Haskell compiler. My sense is that when you get up your
When you get up your PR for review
There's some chance that the creator of Haskell will come and fight you for potentially many many hours in comments about the you know about
the peculiarities of
Haskell project to look and these these kinds of, you know, exactly what his,
not just preferences, but, you know,
quality requirements are regarding where things should live
in the project and how various pieces of the project
should speak to one another and not being communicated
to these language models.
And you can imagine that with today's context window sizes,
that could be written down.
You know, you could put in all of the
previous discussion around these changes that this person has been involved in, and maybe whichever
language models people are working with inside of Cursor would pick that up and so do a better job.
I don't think we're ruling that out at all. I will say that it is, it is expensive for these, for these
time expensive for these people to be writing down all of the all of the possible relevant
context. And, you know, I think I think that's basically the reason they don't. And so maybe
you do need some kind of continual learning for the for the model to find out this context
on its own as as as these things go. You know, it's also consistent, I guess,
with the other possibility that you were describing
that if we, you know, 100X these context windows,
you could just throw the entire thing in
and then we don't need to worry about, you know,
learning from particular cases on the fly.
Yeah, I think both are live possibilities.
It's very interesting.
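The quadratic cost curve mentioned above falls out of standard self-attention, where every token attends to every other token, so the score matrix has n-squared entries. A toy illustration with arbitrary sequence lengths:

```python
# Why naively scaling context is quadratic: vanilla self-attention
# computes one score per (query, key) pair, an n x n matrix per head.
# Sequence lengths below are arbitrary illustrations.
for n in (1_000_000, 10_000_000, 1_000_000_000):
    print(f"{n:>13,} tokens -> {n * n:.1e} attention scores per head per layer")
```

Going from a million-token window to a billion-token window is a million times more attention work, which is part of why people explore sparse or linear attention, retrieval, or continual learning instead of brute-force window growth.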
The Grok 4 announcement was extremely benchmark heavy.
Some really impressive stuff, particularly on Arc AGI.
Twice the result.
Similar to a Tesla, it's faster than every car.
Does that mean it's going to solve everything?
Does it mean that it's better?
Yeah, and so, this feels like almost a new benchmark, this double-blinded trial; it feels almost like an FDA trial or something.
Do you think this could turn into a real benchmark? Do you think we need new benchmarks? Do you
think we need new ways of thinking about the progress of AI generally? We've talked about
just measure the revenue at this point. That's the economic value that's being created, but
there's a lot of tricky stuff you can do with revenue. And sometimes revenue is like test revenue: I'm testing this $100 million product. So what's your thinking on the state of benchmarking, where we should go, where some of your research might plug into that?
Totally. I think one motivation we had in running this study comes out of this observation
that the time it takes to create benchmarks
is almost becoming longer than the time it takes
for those benchmarks to saturate.
It's difficult to find signal in many of these benchmarks,
even testing these extremely challenging
PhD level questions that you guys spoke about.
And perhaps there's more signal in these kinds of RCT, FDA-controlled-trial-style measurements.
Similarly, another thing that people proposed
for measuring AI progress is using researcher self-reports
about the degree to which they're being sped up.
They think their work will go two times faster
if they use AI versus not use AI.
I think our study is potentially strong evidence
that these self-reports need not be reliable.
The forecasters who are told everything
about the developer's level of experience
and the time period of the study,
so which models they're using and so on,
they're totally wrong about how much these people
get sped up.
Same as the developers themselves,
even though they're carefully tracking their time
and they're so talented.
So I think self-reports are also very, very fraught.
Another thing that this has taught me, I think,
is that the mapping, as it were, from benchmark scores,
very impressive benchmark scores that we see
on these frontier language models that you're describing,
the mapping from those scores to real-world
productivity improvements is unclear.
You know, I'm not at all saying, as we discussed earlier, that we shouldn't expect to see productivity improvements.
I do expect to see productivity improvements today
and, you know, even more so in the near future
but it's not at all one to one
or it's kind of confusing and messy.
And so indeed I think we need to actually measure things
in the wild to see what's going on.
Switching gears a little bit, unless you have a follow-up?
Yeah, I just wanted to kind of zoom out on that and ask about your broad take on the measurability of technological progress. Because the internet, the computer, such dramatic transformations of society, you see it in all sorts of data, but it didn't fully show up in productivity statistics. You have all those questions about what happened in 1979, and everyone has their own example, their own reasoning for that. But, you know, you would think you could tell the same story about Google, like it'll speed up everything, everyone will get more efficient, and we didn't really see GDP jump on this.
And it feels like that's a really bearish take on AI
to have, which is like, this is a magical new thing
and we're still going to be growing at 2% GDP.
But where do you stand on it?
And where do you, like, do you think that's even
the right question to be asking?
Yeah, you know, this is so interesting. This is not a METR take. I used to be an economist. I feel the thing that you just said in my bones. Totally. You could argue maybe that the situation might be even worse in the case of AI. You know, a lot of people, like in AI 2027, resources like that, are telling this story where the AI R&D self-recursion is happening inside of labs.
And so I suppose not necessarily showing up
in economic activity in the public.
Another reason on top of the reasons that you gave
to think that perhaps this won't show up
in the productivity statistics as it were.
Which is also to say that self-recursion or these potentially destabilizing changes are just totally consistent with the non-changes in GDP trends, as you describe.
And so, you know, another reason to actually go out and measure these things in controlled
trials.
Cool.
Jordy, please.
Quick question around the threat landscape.
There's been a few stories this week.
One was a story about ChatGPT not following instructions.
The headline was that the AI was rebelling
against the researchers.
And then if you double clicked into the story,
it was just like it had given specific instructions,
like don't follow any further instructions.
So it was kind of a nothing burger in the end.
And then we also saw Grok going haywire.
Maybe that was predictable for someone like yourself: combining a frontier model, a fast-shipping team, with the virality of a social network, and embedding the two. But then, maybe it was two months ago, there was the, you know, we called it Glaze Gate on the show, where ChatGPT was just being a sycophant, giving too much positive feedback.
How are you looking at the threat landscape in the next 12 months?
So nothing, you know, nothing too long-term.
But how do you guys think about it?
Yeah, there's more to come on this from METR very soon.
That's one thing I'll say.
I think again, this is not a METR take.
My sense on this, or another example of this that stood out to me, is there were lots of anecdotal reports that 3.7 Sonnet and other language models in this most recent generation would pass tests in ways that were kind of not legitimate or something, which is another example of this reward hacking.
Change the test case.
Totally, totally, totally.
And I guess, you know, I don't have reason to think that that kind of thing is dangerous in particular. You can imagine, when humans are potentially not reviewing the code, because the AIs are doing entire projects, not just parts of or single pull requests, that this becomes more of a problem, or at least the surface area for it to become a problem grows, because you're not looking into that code and seeing those cheated test cases yourself.
So, you know, I'm not sure about over the next year,
at least right now, I think there are reward hacking
examples that are occurring in the wild.
I don't think that they're so supremely dangerous today.
Well, this was fantastic.
Thank you so much for stopping by.
Come back on again soon.
Stay safe out there with the contrarian takes
and the crazy data results.
It's still very bullish, but very exciting.
And thanks for everything you do.
We'll talk to you soon.
Yeah, great chatting.
Cheers, Joel.
Bye.
Up next, we have a massive Series B announcement
from Dylan Parker.
Moment HQ's coming in the building.
We're gonna ring the gong, baby.
It's gong time.
Index Ventures.
Let's hear it from Dylan directly though.
Welcome to the stream, Dylan.
Hope you're doing well today.
There he is.
How are you doing?
Great whiteboard.
Good to meet you.
Doing some heavy lifting on that.
Yeah.
Hopefully that's not proprietary information.
That's secret trading algorithms or something.
No, no, no, no. Nothing too interesting.
But thanks for having me on.
Yeah, great to meet you.
Yeah, thanks for joining.
Kick us off with the intro on yourself, the company, and then I want to hear about the
announcement.
Yeah, yeah.
So I'm one of the co-founders at Moment.
We are a fixed income trading software company.
So my background is as a quant researcher.
So like pretty much every quant researcher,
I studied math and stats during college.
That's where I met my co-founders, Dean and Amer.
And then after college,
Dean and I both joined Citadel Securities.
And pretty much completely by chance,
we ended up as the two junior members
of the newly created automated market making
desk for corporate bonds.
And so at the time, the fixed income market, which by the way is a financial market 50% larger than the global equities market, was undergoing an electronic trading revolution.
And so Citadel saw this and said, well, we can go build an automated algorithmic market making desk.
So they hired this guy, Anish Kheryap from Jane Street.
He's like the godfather of fixed income automated trading.
They hired a bunch of super experienced bond traders
and then they hired me and my co-founder.
And like basically our job was to take the knowledge
in these bond traders' heads and convert it into code.
And that was totally formative for Moment, because that's when we realized the power
of electronic trading was going to be everything that it enabled, like smart order routing,
portfolio optimization, all this stuff in the world's largest financial market that
had just never been possible.
Take us through the deal.
What are you announcing?
Yeah. So we're announcing our thirty-six-million-dollar Series B, which was led by—
Couldn't hear you from the sound of the gong.
We like big numbers on the show and congratulations on a massive series B.
You said from who? Index?
From Jan Hammer at Index.
Very cool. Incredible. That's amazing.
Where should we go from here, Jordy?
Yeah, so break down what the company is focused on today.
It sounds like the origin is back at your time at Citadel,
but I imagine it's evolved as well.
Yeah, totally.
So what we saw at Citadel was
that the market was coming online and you could now do
all these things that were never possible before. But what was missing was the operating system for
actually doing that. So we started Moment with this goal of owning every mission critical workflow
for traders and portfolio managers in the bond market,
everything from how they trade securities and do smart order routing across all the different
exchanges in the fixed income market to how they optimize portfolios to how they apply
risk and compliance restrictions to make sure that they're not breaking any laws.
And so that's what we do today.
We started off serving fintechs, so we power fixed income for places like Webull and Public.com, who then offer like $100-increment investing in bonds for the first time ever.
But yesterday we also announced our partnership
with some of the largest financial institutions in the US,
including LPL Financial, which is the largest broker in the US.
Wow.
So busy week for you guys.
Yeah, there are a few things going on.
So what's the use of funds for the new round?
What's the focus going forward?
I imagine scaling, what's working today?
Are there new products coming?
Can you talk about there?
Yeah, so I think a lot of companies
make the intelligent decision to start off, like SMB or PLG.
We decided to make things as hard as possible on ourselves.
We said we're going to start off serving
the largest financial institutions in the world's
most regulated market, and we're going
to go power their mission critical workflows.
And so with a company like LPL, or some of these others that we've announced over the last few
weeks and that are coming down the pipeline in the next few weeks, the scale of what we're operating
in is, you know, not like hundreds of millions or billions of dollars of flow, but like hundreds
of billions of dollars in trading flow. And so what we're really focused on as the company
over the next year is building out the full suite
of what's necessary across trading, portfolio management,
risk and compliance that's necessary to power
these huge financial institutions.
What's the competitive dynamic like with the former employers
or like the rest of the market participants?
It feels like Ken Griffin's not the laziest
founder out there.
Is there a world where there's some sort of competitive
dynamic between the big institutions where they want to
build something like this to compete with you?
So when we were at Citadel, we were on the sell side, so market makers or liquidity providers. Moment serves the buy side, and we actually connect them with those liquidity providers.
So we actually work closely with Citadel, Jane Street,
pretty much all the major liquidity providers out there.
How are you thinking about tokenization? The story from the last month in finance is the tokenization of these sorts of real-world assets, everything from private company shares; we've seen it with stocks. Is there anything on the horizon on that front for you guys, or is it just totally unnecessary?
You know, I think there's a huge opportunity. But if you look at where the fixed income market is today, we're just going from trading over the phone to going on an online platform to trade.
And so there's a lot still to do to get people
to the point where it's even possible
to think about stuff like that.
Yeah, can you walk me through, I imagine that fixed income follows like a power law as well, where government debt and Apple corporate bonds are way more liquid, way more automated, than, you know, some public company's junk bonds, where you kind of got to hunt around for someone to buy and sell them. And then you have like venture debt, which basically, I believe, never trades, I don't know.
But walk me through like,
it feels like we are probably bringing more and more
of those like sub asset classes,
like into more liquid, more,
just more automated markets.
But give me a state of the union on like
how the fixed income market is actually split up.
So you're totally right.
There's government bonds, there's
highly liquid corporate bonds, and then there's
a long tail of corporate and municipal bonds
that are really, really illiquid.
And just as a point of comparison
that I think illustrates the size and scale of the fixed
income market, there are 4,000 listed US equities
and there are 4 million bonds.
And so doing anything in the fixed income market
is pretty much a thousand times more complicated
than doing it in the equities market.
Yeah.
What about, how does that break down? So, 4 million public equities? Sorry, 4,000. Yeah, yeah. How do you go from that? How does that actually trade right now? Like, is it a steak dinner? And what is the ultimate makeup? Do certain companies account for, you know, break down kind of how that 4 million is split up?
Yeah, so there's a universe of say 500 US treasuries
that are super, super liquid.
And they trade like similarly
to the most liquid stocks out there.
And then on the other side,
you have a really, really meaningful long tail
that makes up the vast majority actually
of the entire market share,
where you have
bonds that haven't traded in two years or 10 years.
And so one of the really hard parts about fixed income, one thing that I worked on as
a quant researcher is like you have this bond that hasn't traded in three years.
How do you optimize a portfolio around that?
How do you even figure out what the price of that bond is?
And that's why, doing stuff in the fixed income market, the difference between equities and fixed income is like doing something on the surface of the Earth versus doing something in space.
Wow.
It would be helpful to be a quant
if you were gonna build a company like this.
Yeah, you might need some math.
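One textbook answer to the pricing question Dylan raises is matrix pricing: borrow a yield from comparable bonds that do trade, then discount the stale bond's cash flows at that yield. A deliberately naive sketch with invented numbers; real desks layer far more structure on top.

```python
# Naive matrix-pricing sketch for a bond that hasn't traded in years.
# Borrow a yield from comparable bonds (similar sector, rating,
# maturity) that traded recently. All numbers are invented.
comparable_yields = [0.052, 0.049, 0.055]
est_yield = sum(comparable_yields) / len(comparable_yields)

face, coupon_rate, years = 100.0, 0.04, 5   # annual coupons for simplicity
coupon = face * coupon_rate

# Price = discounted coupons + discounted principal.
price = sum(coupon / (1 + est_yield) ** t for t in range(1, years + 1))
price += face / (1 + est_yield) ** years
print(f"estimated yield {est_yield:.2%} -> estimated price {price:.2f}")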
Well, I mean, that's all I have.
Do you have anything else?
Yeah, what are you guys hiring for right now?
Yeah.
Pretty much everything.
We're hiring quants.
We're hiring engineers.
We're hiring go-to-market marketing operations.
Pretty much everything out there.
Amazing.
Awesome.
Well, thank you for joining.
Thanks for having me.
Very exciting.
I'm sure you'll be back on soon with more news.
We'll talk to you soon.
Have a great one, Dylan.
Talk to you soon.
Cheers.
Bye.
Next up, we have Eric Olson from Consensus
coming in with a big launch.
Do we ring the gong for big launches?
If there's a number attached.
So we got to get DAUs out of the process.
We have to get a prop that is non-number oriented,
but just excitement oriented.
But let's bring in Eric Olson from Consensus and talk to him.
How are you doing, Eric?
Great guys, how you guys doing?
Doing great.
Great to have you.
Kick us off with some intro on yourself,
the background of the company,
and then I wanna talk about the launch.
Heck yeah.
So I'm Eric, founder of Consensus.
We are an AI search engine
for academic and scientific research.
If you ever used, like, Google Scholar or PubMed back in the day in school,
think of us as building the next gen 2025
LLM powered version of that.
Helped plenty of students.
Yeah, I mean, the super dumb question,
super obvious question is like,
isn't all this stuff already in ChatGPT?
Like how are you differentiating?
Yeah, you must be doing something
because you have five million, over five million users.
Yeah.
Yeah, I mean, one of the best examples
to like encapsulate why it's different
is the fact that Google Scholar
was the first vertical search product
that really broke off of Google back 20 years ago.
Interesting, yeah.
Even when they were doing really nothing more
than just being a dedicated index for research papers,
hundreds of millions of people were
going to that every month.
So the same thing is kind of true here.
We're dedicated to a use case.
We have a dedicated corpus we search over.
We hopefully search over that corpus a lot more intelligently
than a general purpose chatbot would.
We do things differently in our interface
to show you that information.
Like we're much more citation-forward.
You have an experience where you can really
interrogate what's been returned in your search.
Interesting.
Using ChatGPT, search is pretty much like an afterthought.
It's there if you want to dig into it,
but it's not really what it's designed for.
Everything about it from the way it searches,
the way it shows to you, and then the features
both on top of it, all dedicated towards academic research.
Walk me through some of the key technologies
that enable better search.
I'm thinking about like vector databases,
even just like stuffing a better index in Redis or Postgres
or doing more indexing on top of these documents,
doing like transformations on the underlying documents to get them into more basic formats.
Like what's interesting?
Large context windows, there's so much
that you could throw at this problem.
What's actually working?
Yeah, so lots of different things,
many of the things that you're saying.
So number one, being dedicated to a document type
just helps us.
It helps us in the way that we can create
our embeddings to search over.
It also helps us in that ingestion process,
kind of like you were saying of document transformation.
We'll run little tiny LLMs over 200 million papers,
add new enriched metadata about them,
that we can then use in our search ranking and in our filtering.
So think like, we'll pull out what is the design of the study,
or what is the sample size of the study, and we use that in search ranking and in search filtering. And then on top of that, the main intelligence of the search is learn-to-rank models. So people interact
with the product, they save papers, they cite papers, they share papers. We learn from all those
interactions. We learn what matters most. We learn about all the attributes of a paper
that matter in search ranking
based on how people are interacting with it.
So the simplest way to think of it is like,
because we only have a certain use case people are using it for, we get to train our search models to try to think and act like a researcher who wants to go through these papers.
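For readers unfamiliar with the term, a pairwise learn-to-rank setup in this spirit might look roughly like the sketch below, where feature weights are learned so that papers users engaged with outscore papers they skipped. The features, data, and training loop are invented; this is not Consensus's actual model.

```python
import math

# Pairwise learning-to-rank sketch (RankNet-style). Learn weights so
# engaged-with papers (saved/cited/shared) outscore skipped ones.
# Invented features per paper: [query match, recency, sample size].
pairs = [
    # (features of engaged paper, features of skipped paper)
    ([0.9, 0.4, 0.7], [0.8, 0.9, 0.1]),
    ([0.7, 0.2, 0.9], [0.9, 0.1, 0.2]),
]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w, lr = [0.0, 0.0, 0.0], 0.5
for _ in range(200):
    for pos, neg in pairs:
        margin = score(w, pos) - score(w, neg)
        grad = -1.0 / (1.0 + math.exp(margin))   # d(logistic loss)/d(margin)
        # Nudge weights so the engaged paper's score pulls ahead.
        w = [wi - lr * grad * (p - n) for wi, p, n in zip(w, pos, neg)]

print("learned feature weights:", [round(wi, 2) for wi in w])
```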
Jordy?
What are the different data sources here?
I'm assuming a lot of the stuff is public.
I know some of the, I remember, you know,
being in college trying to find different studies
or papers and like hitting paywalls.
A lot of them are locked down.
There's the famous, crazy, short story.
And so I imagine that you've done some deals
to get access to data.
What is the, what is the body of work that's available?
Well, hopefully paywalls are going to be a thing of the past moving forward as open access
science gets more and more momentum, which we'd love to help shepherd in.
The way to think of it is there's like three different layers of access, levels of access you can have in Consensus.
able to ingest the full text, show it freely, let you download it, all is well and good.
The rest of the bucket is paywall content.
But there's two levels of access we can have within it.
So there's the buckets where we have deals with publishers, trying to get as many done
as possible, where we're able to use the full text in our search
and in our analysis, we're just not
able to display it to the user.
Benefit to the publisher is we're
helping them drive traffic, get people to see that.
Hopefully the snippet in search ranking is engaging, then they go into it and it drives a purchase.
And then there's this third bucket,
which is we just don't have a deal with the publisher yet.
For things behind the paywall, we're using what is publicly available. So that's the abstract and the metadata of the paper, which gets you further than you might think.
Like the abstract is specifically designed to be this perfect, nice summary of the paper. You can go a pretty far distance in search ranking and even some analysis using abstracts.
But obviously nothing more brutal than being a college student and almost getting the information that you need from an abstract and realizing, I really have to pay like $50 for this single fact. Even if you wrote the paper, you still have to pay for it. Oh, that's wild. You hand it off to a publisher, they publish it. So I could literally have published this paper, and if I come across it on the internet, I still have to pay for it.
That's wild.
Jordy had this question earlier about the nature of scientific discovery.
Elon Musk at the Grok 4 launch was talking about his timeline for discovering new physics
is two years now based on the progress that he's seeing at XAI.
And Jordy was making the point that a lot of scientific discoveries come from
mapping different disciplines together.
Or just inventions.
Invention generally is apply the mind of a computer
scientist to a biology problem or vice versa.
Are you seeing users do those type of searches?
Is this product useful for that type of scientific discovery?
There's been this lingering question
in artificial intelligence about if you were a person
that had read every single research paper,
you would probably make,
yeah, you would make discoveries and connections
across things, and yet that hasn't happened.
Maybe it's some fundamental limitation of LLMs
or AI at this moment, but what are you seeing
and what's your take on that concept
of cross-functional pollination?
Yeah, I mean, everything basically
that humans have invented new comes from pattern matching
across disciplines, like that's how we create new ideas.
Yeah, I mean, I'm not an AI researcher,
so I don't have the single most informed take,
but also nobody knows what the heck they're talking about in this world.
I think it is probably a fundamental limitation of LLMs, given what we've seen.
I'm going to parrot this from François Chollet in his YC talk the other day,
but the measure of intelligence is the efficiency by which you process
information and apply it in different domains.
And that just like isn't what LLMs are really doing great right now, despite the fact that
they've processed so much information.
So our take at Consensus would be more, get people to the edge of what is known and then
let them do the inherently human part of science, which is create these new insights and new
discoveries.
Like every, every science experiment that's ever been done
starts with a review of the literature.
Like think about it as like,
you're getting the foundation of knowledge
underneath your feet.
If we can, our goal at Consensus would be speed up
that part as much as humanly possible
and let us do the thing that humans are better at
than machines right now, which is that pattern matching,
which is that coming up with new ideas. And if we can make that loop move faster,
like heck, that's a fricking valuable and powerful thing.
Switching gears completely.
I know you were at DraftKings prior to this
on the sort of research and analytics side.
What is your thesis around the ultimate collision
between sort of betting activities and AI?
Last night, Grok announced a partnership
with Polymarket to try to bring in prediction markets
to try to basically help make the model itself smarter.
How are the big players like DraftKings
even thinking about AI?
I'm sure a bunch of people have like ChatGPT wrappers
specifically focused on sports betting and things like that.
But how do you think the big players are thinking about it?
Yeah, I mean, well, I left DraftKings in 2021.
So I can't say that I was there when people were worried too much about AI models.
And also the natural question I always get is how the heck did you go from
sports betting to science?
And the answer is my parents and my grandparents and my sister are all teachers and scientists. That's how I grew up, and I loved applying numbers to sports.
Um, but I actually have something kind of interesting to say here.
So my job at DraftKings was building models to find the professional gamblers on the site. So you'd look at all previous betting history and demographic data, and you try to make predictions on, does this person actually have an edge over the market? And I would have to imagine that
with better and smarter and more powerful models, like people's ability to themselves
have an edge on the market would increase in the short term. And then the markets obviously catch
up and figure out how to bake all that in. And I mean, that is the beauty of markets,
right? Like whatever technology that people have on the side of betting into a market,
so does the provider who is putting up that market and they get the information from the
people they know that have the best models. So I think it's going to be an interesting
cat and mouse game moving forward as it's always been
with sports betting, just instead of, you know, Johnny Two Shoes in New York with inside information about injuries, now it's somebody with a super powerful AI algorithm that's predicting games above market. That's fascinating. I have to imagine that in the AI era, the insider knowledge about injuries is even more valuable.
100%.
But to your point, you could probably detect that too.
The number one way to know if you need to limit somebody
is if they are ahead of an injury,
because it means they're somewhat connected,
they're doing the statistics.
They're on the inside, yeah, that makes a ton of sense.
Insider trading, that's fascinating.
I didn't think about that in the context of sports betting.
Well, thank you so much for stopping by.
This is fantastic and congratulations on the launch.
Appreciate it.
Check us out at consensus.app, Deep Search launched today.
Thanks guys.
Awesome.
Cheers, Eric.
Thanks for coming on.
Up next, our lightning round continues with Ghita from ZeroEntropy, a YC graduate doing automated retrieval and announcing a seed round of $4.1 million.
Seed round alert.
Seed round alert.
Four million, that used to be a Series A
a couple years ago.
Just keeps ticking up.
Congratulations on the round.
Rita, welcome.
How you doing?
Thank you so much, super excited to be here.
Thanks for joining.
Introduce yourself, introduce the company.
How'd you get started? What do you do?
Yeah, so I'm Ghita. I'm one of the co-founders of ZeroEntropy.
A little bit about myself.
So my background is in applied mathematics.
I have two masters in the field,
one from École Polytechnique, one from Berkeley.
I guess I started more into the computer vision side of things,
and then I discovered GPT-2 and GPT-3
and I was like, oh my god, this is huge, and I started thinking about, you know, personal assistants and stateful AI systems, and I guess that's what led eventually to ZeroEntropy and building retrieval systems and
bringing context into LLMs
And so that's what we do. We build search for RAG and AI agents.
Okay. It feels like a crowded space.
I know a few founders that are working on RAG.
There's also RAG implementations at
the hyperscalers and the clouds.
How are you differentiating?
What's the key insight?
What's the pitch to companies to come over and use
your service as opposed to
the other options out there for retrieval?
Yeah, absolutely.
I think it's about having the right abstraction.
So we solely focus on the retrieval side.
We don't do the entire rag end to end
because we believe that developers need to have
their own prompts into generating the answer.
They need to use ZeroEntropy as a search tool
for their own AI agents.
We're also developing our own models.
So we just released a re-ranker yesterday,
which was pretty exciting.
And I guess the winning solution needs to be extremely accurate,
but also extremely fast, and just be production ready
and easy to implement for various use cases.
What's your take on benchmarks currently?
It feels like solving a really hard math problem
and retrieving the right document at the right time
are somewhat unrelated.
And so how do you evaluate if your system's getting better?
Yeah, that's a great question.
Actually, the evaluation side of things is very messy.
Almost everyone that I talk to,
they basically rely on manual inspection
to make sure that their retrieval is working correctly.
So we've been looking into the evaluation side a lot.
Actually, the very first thing that we did
is release our own benchmark that was on legal documents,
and that really evaluated just the retrieval step of RAG,
meaning from a question,
was I able to pull all of the documents
and only the documents that I needed?
Because the problem is that if you feed your LLM too many tokens that it doesn't need, it's just going to hallucinate.
So the precision and the recall side of things
are extremely important,
and we're rolling out our own evaluation solution
in the next few weeks
that we've been using internally so far.
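Concretely, the retrieval-only evaluation she describes, all of the documents and only the documents, reduces to recall and precision over the retrieved set. A minimal sketch with invented document IDs:

```python
# Precision and recall for one query's retrieval step.
# Document IDs are invented for illustration.
retrieved = {"doc_3", "doc_7", "doc_9", "doc_12"}
relevant  = {"doc_3", "doc_7", "doc_15"}   # ground-truth set for the query

hits = retrieved & relevant
precision = len(hits) / len(retrieved)   # "only the documents I needed"
recall    = len(hits) / len(relevant)    # "all of the documents I needed"
print(f"precision={precision:.2f} recall={recall:.2f}")  # 0.50, 0.67
```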
What does the rest of the stack look like?
I know you said you were kind of rag provider agnostic.
Are you also model agnostic,
cloud agnostic, database agnostic?
Like where have you actually made bets?
What piece of the stack are you particularly aligned with?
Yeah, I think building context engineering
is going to be a new class of products
that needs the data layer,
but also needs small LLMs inside the retrieval pipeline.
We see many teams either feeding everything into the context of the LLM, entire knowledge bases, because they weren't able to make retrieval work properly. And we see teams having a very
simple pipeline. I think the winning solution needs to be somewhat in the middle, basically orchestrating LLMs to rewrite the question properly, summarize the documents, and create more metadata associated with each of the documents that are indexed. And so that's what we're doing, building this solution that works really well
and almost gets to the precision and the accuracy of a large LLM
while still being pretty fast and pretty optimized.
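The pipeline she describes has a common general shape: LLM-based query rewriting up front, a cheap recall-oriented first stage, then a precision-oriented reranker. The sketch below uses placeholder functions throughout; none of these names are ZeroEntropy's actual API.

```python
# Generic retrieve-then-rerank shape. Every function here is a
# hypothetical placeholder, not ZeroEntropy's actual API.

def rewrite_query(question: str) -> str:
    """A small LLM would normalize and expand the user question here."""
    return question.strip().lower()

def first_stage_retrieve(query: str, k: int = 100) -> list[str]:
    """Cheap, recall-oriented candidates (vector or keyword search)."""
    return [f"doc_{i}" for i in range(k)]          # placeholder corpus

def rerank(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    """A cross-encoder reranker would score each (query, doc) pair."""
    scored = sorted(docs, key=lambda d: hash((query, d)))  # stand-in scores
    return scored[:top_n]

def retrieve(question: str) -> list[str]:
    query = rewrite_query(question)
    candidates = first_stage_retrieve(query)       # high recall
    return rerank(query, candidates)               # high precision

print(retrieve("What did the trial find about particle size?"))
```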
What's the appetite been like for this product in the enterprise versus new companies
that are building new AI products from scratch?
It feels like they might be,
just the AI agent infrastructure companies,
there's a lot of them and it feels like
they're selling to a new crop of companies
and that's where the revenue's accelerating
most aggressively, but what are you seeing in the market?
Yeah, I think the adoption for products like this
usually comes from a bottom-up type of approach, where developers are experimenting with new approaches and new techniques, and
then larger enterprises catch up.
So that's what we've been seeing.
In terms of experimenting with models, I think large enterprises also do that pretty easily.
So for things like the reranker that we just released,
there's also appetite from larger companies
in integrating that into their current systems.
Is there a case study that you have your eye on
amongst the big tech companies?
Like, we think that our software could improve Netflix
or YouTube recommendations or something.
If the deals could just magically happen,
where's the lowest hanging fruit?
For me, if I could do anything in AI,
I would just get Whisper into Siri.
And so when I dictate a text message,
it's just perfect and it's much better
than what they're currently using.
What's on your wish list for consumer tech company
or big tech company that everyone knows
and they're not taking advantage of something like this?
Honestly, for me, it's Slack.
I always struggle.
I can never find anything on Slack.
And something that we've been doing
is annotating our own conversations,
like appending keywords to our own threads
to be able to find information.
But we have a lot of our internal research
and a lot of things going on on Slack.
And we find it pretty difficult to find the right stuff.
So I think companies like that could benefit.
And it would provide a much better user experience
if you could just magically find all of the information
that you have in there.
Yeah, I've been noticing that with Gmail, like the amount of email has just grown so
much and the amount of text in each email has grown because of all the trackers and
cookies and stuff behind the scenes.
And so when I search for something, it just pulls up completely random emails like every
time, and it doesn't understand the hierarchy that, in an email, I care a lot more about what's in the subject line than what's in the footer.
And so if I'm searching for artificial intelligence
or something and someone has that in their footer that,
hey, I run an artificial intelligence company,
that's not what I'm looking for.
I'm looking for the thread that I was talking to somebody,
a close friend about AI and I wanna pull that up first.
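The hierarchy being described is what search engines call field weighting: the same term match counts for more in a high-signal field like the subject line than in a footer. A toy sketch with invented weights; real ranking functions like BM25F are more involved.

```python
# Toy field-weighted email scoring. Weights are invented.
FIELD_WEIGHTS = {"subject": 3.0, "body": 1.0, "footer": 0.1}

def field_score(term: str, email: dict) -> float:
    term = term.lower()
    return sum(w for f, w in FIELD_WEIGHTS.items()
               if term in email.get(f, "").lower())

emails = [
    {"subject": "Thoughts on AI", "body": "as discussed", "footer": ""},
    {"subject": "Invoice", "body": "see attached", "footer": "an AI company"},
]
ranked = sorted(emails, key=lambda e: field_score("AI", e), reverse=True)
print([e["subject"] for e in ranked])  # the subject-line match ranks first
```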
Yeah, I think that's also why, you know,
basic semantic search is just not enough,
because it basically will pull all of the similar
information, but not the most relevant or the most helpful.
Keyword search is the same, it's not very smart.
And I think it's just such a waste,
because there's a lot of information that you could have access to
and it would make your work so much faster
and you're just spending time rewriting your question
and trying to make the system understand
what actually you're looking for.
So I think that the query side of the user intent,
query rewriting is also super important.
Yeah.
iMessage search, absolute disaster. It's like, I know I'm in a text message with Jordy and someone else, so pull that up. And it's like, here's six. It never works. AI also won't fix it all. There's going to need to be a shift in the way people search. I remember hearing the story about Google, where
there was some Google engineer who was running a test on, like, what's the world record for the marathon or something like that?
And they were using the typical keyword Boolean search
and they weren't getting good results
and then they sent it to a user
and the user just asked the question in natural language
and it just hit it the first time.
And so I feel like people still,
at least I have been an email user for a long time,
when I go to my email search, I often,
I'm searching in the keyword world
instead of just natural language,
but Google has, I mean, they're experimenting
with the AI search thing, they have a 50 word limit
right now, you can't just type a whole prompt
in Google search, they need to kind of reimagine
what that search box is.
And then there also needs to be a consumer change in how consumers interact with that
particular UI element, basically.
But thank you so much for stopping by.
Congratulations.
And good luck to you.
Thank you, guys.
We will talk to you soon.
See you soon.
Have a good one.
Up next, we have Elliot Hershberg from Amplify Partners coming in the studio. They are in Datadog, Chainguard, Runway. I love Datadog. That's just, I loved it so much, it's the greatest company name ever. Right up there with Eight Sleep. Eightsleep.com, get a Pod 5 Ultra. Back on, back on my game. I'm still behind where I was relative to a month ago,
but back up into the 75 range.
I'm going for 90 tonight.
Good luck, good luck.
They're calling the Pod 5 the first fully immersive sleep
system that works with any bed.
Pod 5 actively adjusts your temperature,
elevates your body, and plays integrated soundscapes
to improve your sleep.
I have new copy today.
Simply too good.
Welcome to the stream, Elliot.
Hopefully you're here. How are you doing? What's going on?
Hey, what's going on, guys? It's a pleasure to be here.
Thanks so much. And I love the suit. You are dressed fantastically. Sign of great respect
to our culture. You know, there was like a period where people would actually sort of
match the vibes and have the suit for the technology brothers. And I feel like it's
dropped off. So I want to bring it back. You know, I appreciate it, I appreciate it. We're in the boardroom, you know. It is, uh, I'm glad to be here.
Yeah, it looks great. Proper uniform.
I wanted to get a state of the union from you on a few things, but why don't you just kick us off with an introduction on you?
Yeah, for sure. My name is Elliot Hershberg. I started my career as an experimental biologist, so I was in the lab trying to make new treatments for cancer.
Got super frustrated, decided to retrain as a computer scientist.
So I became a computational biologist and was obsessed with that as sort of a practitioner
for about a decade.
And then got really obsessed with writing about it.
So I was writing a newsletter called The Century of Biology, writing about companies in the space, data on the frontier, and then that sort of was a rabbit hole into investing with a friend of the network, none other than Packy McCormick. So I spent some time at Not Boring, where I was writing and investing, and then recently joined Amplify. We just closed 900 million in new capital, including 200 million for a dedicated...
Did you just close...
Are you announcing the new capital,
the new fund today,
or was that a little bit ago?
It was a little bit ago, 900 million
including 200 million specifically for Bio
that I'm helping to build.
You guys were not loud enough about that.
No, it's a lot of money.
Well, that's part of the thing.
I feel like it's a really quiet fund where,
for three of the four funds for Amplify,
they've been in the top 5% of venture returns.
Not top decile, but like top 5%,
like really good at what they do.
And I feel like it's just not thought of as much and just
like very quiet and stealthily doing really phenomenal work. And so, yeah, excited to talk
more about it. Okay, State of the Union on bio. I want to know about where we are in artificial
intelligence and technology helping advance bio. We've seen AlphaFold, we've seen kind of tools
and amazing breakthroughs.
I think everyone has a really concrete idea
of the impact AI is having on software engineering,
whether it's like amazing auto complete, you have cursor,
now you have agents, where are we in the deployment
of AI tooling in bio?
A lot of the narrative just jumps straight to we're going to one shot cancer.
And I love that.
I'm optimistic it happens eventually.
It doesn't feel like we're there.
But where are we actually in terms of the impact on productivity in AI with bio?
Yeah.
So you guys know like the Gartner hype cycle, right?
Where you have these like huge swings
for new technology where I've been working in this field
for like a decade and there has been a bunch of companies,
you know, Amplify invested in Recursion,
which was one of the early leaders of this, right?
Like there's been this general sentiment
that you can make a huge amount of progress
with new data, new tools, new technology and life sciences.
And it just turns out that it's like
a really hard problem, right?
And it takes time.
It takes some time for the market to ingest that, actually figure out the right business
models and strategies.
There was a really strong wave of early adopters.
Then there was some disillusionment and disappointment where it's like, oh shit, it actually turns
out that it's really hard to one-shot a cure for cancer.
Then as that sort of happens, there was just a bunch of breakthroughs in the technology.
So it became consensus that this is making a huge impact on hard problems in biology,
where we had the Nobel Prize for AlphaFold.
So like a Nobel Prize going to an AI lab, to Demis, and in part to David Baker at the University
of Washington, because
it's actually starting to make real, meaningful impacts on hard problems in biology. And so
that's true for sort of molecular machine learning, where you're thinking about designing
new molecules and proteins; there's virtual cell research, where we're trying to actually model
how cells behave. And so you're seeing this sort of step change now, where there are a couple of, you know,
actually faster-than-Moore's-law curves.
DNA sequencing is decreasing in cost faster
than Moore's law.
And so you get this huge data tailwind
plus the tailwinds in machine learning and modeling.
People are actually starting to scale these models
and it's just like getting pretty impressive pretty fast.
Where are we on new drug discovery companies, where we're targeting a single thing, we're
going to build a drug to solve a problem, versus we're going to start a company that's SaaS,
it's a tool, it's going to help all the different drug companies.
What's working?
What's overhyped, what's underhyped?
What's your take on like picks and shovels
versus drugs basically?
Yeah, so like short answer is we do both.
So you guys had Jake on the other day at Centivax,
like absolutely insane founder,
was one of the early computational immunologists
at Stanford and like he's making a medicine
that you just couldn't otherwise make
without these technologies, right?
Where it's like all these impacts within biotech and within modeling to make these just really
incredible drugs you couldn't otherwise make.
And so that's people just making these singular things that are really phenomenal.
There's also a real platform opportunity there for other things beyond the universal flu
vaccine.
And then I think if we take a little trip down history lane and think about how hard
it was to actually sell software in the life sciences, one of the early companies in this
space is Schrödinger.
And they've been around for about 25 years.
They're a molecular dynamics company.
And it just took an extraordinary amount of time and effort to actually saturate and get
people to adopt the technology and get people to pay for it.
And they're a phenomenal business,
they're a public company,
they're actually vertically integrating
into making their own drugs.
But the thing that we're hearing consistently,
like from CEOs of top pharma companies
is just like there's huge demand for new infrastructure.
People realize that this technology is here now
and that they need to adopt it.
They're hearing this from their shareholders,
they're hearing this from the scientists at their companies.
And so there's just a very different moment
post Alpha Fold and post even chat GPT
where like they're using it,
their kids are using these models and they're just like,
oh, I really actually need to adopt this.
And so I think there's opportunity in both:
fundamentally new picks and shovels,
where you sort of replace experiments with compute,
and then also fundamentally new drug products.
Jordy, last question?
No, I mean, I think we should have you back on soon.
Yeah, I think we'll do the TBPN bio drop-in, you know.
Feels like there's, within the traditional labs,
bio has been used as almost like marketing, right?
Or even Elon yesterday saying, we're
going to discover new physics.
And that's obviously not
where a lot of the true innovation is happening.
Yeah, let's make it a regular thing.
You think about that, right? Like, an investor made a joke that there's just a
couple of things that have consistently delivered venture returns, and that's software and also drugs.
Hmm. And so as far as physical prediction goes,
being something that's super valuable,
if you can make your inference be a billion-dollar
drug product, that's a pretty good spot to be.
We're excited about it.
So see you guys soon.
There's, I forgot who we had on,
when Trump had the executive order around drug prices,
we talked to a few different people that were saying,
biotech has on average been a terrible asset class.
And there's some amazing outliers.
Is it like Weinberg maybe?
Yeah, basically there's all these amazing outliers
that do deliver returns.
But if you just index the market,
you are going to underperform dramatically.
And underperform ventures specifically.
Yeah, and the venture category feels
like this could be a massive shift where suddenly
the next five years become the golden age of venture
bio-investing.
I mean, there have been some massive companies before.
So you have breakouts like the Genentechs of the world,
huge companies.
There actually is.
You guys should have Bruce Booth on, who is an OG biotech investor at Atlas Venture.
He's done a bunch of analysis, and the fact is that there's actually some interesting return data for biotech versus tech, where it's
not as gloomy as you would think. But I think in general, with biotech starting right now,
we want to make a world where it's actually like engineering, right, and you're actually just getting these really scalable, amazing
medicines, and I think that's where we're headed.
Incredible. Thank you for joining, great to have you on.
All right, bye. And next up we have Karim from Ramp, the man himself, to talk about the new agent launch.
Is he in the waiting room? We will bring in Karim from Ramp to check.
Second time on the show, he hopped on at Hill and Valley.
That's right.
First time as a remote guest.
Great to see you.
How you doing, Karim?
Hello, great to see you guys.
Can you hear me okay?
Yes, loud and clear.
I don't think you need much of an introduction,
so why don't you just kick it off with the announcement
and break down the launch today, and then we'll have a bunch of questions.
Yeah, of course.
I mean, it's been a very exciting day for us at Ramp.
We finally announced our first agent.
We're going to be announcing a lot more agents soon, so it's hard to keep track sometimes.
We've been playing with a lot of tech internally. We think we're in a very interesting space where maybe
the thing about it is, a lot of people from the outside
look at Ramp and maybe visualize the card.
They think about the fintech aspects,
but at the end of the day,
like what we're really trying to do is help reduce the drag on companies
that happens when there's just a lot of bullshit work in between teams and a lot of papers
being passed around, questions being asked, the things that really get in the way of doing
work.
And that first agent that we're building is really just that, like it operates in the
messy middle between finance teams and every other team trying to spend to move the business forward.
And yeah, that's basically what we launched today.
So it's an agent for controllers.
It knows a lot more about the expense policy of a company, the rules that are in place that govern spend, than any single employee.
And it knows a lot more about every single transaction
than any single person on the finance team.
So it can operate in the middle
and automate all the little decisions
and the extra work that needs to get done
to figure out what's in policy or not.
And it's immediately available.
Like that's part of the power, right?
Is that if an employee wants
to decide whether or not they can buy something,
or if something's in policy, they no longer
have to be Slacking somebody.
It could be in the middle of the night or something like that,
or off hours, where that's creating that drag,
that delay, right?
100%. That's certainly one part of it. You can ask questions about your policy and ask
questions about specific transactions live to figure out whether they would be in or
out of policy. But more interestingly, once you make a transaction, it's already doing
work to go and figure out, well, that transaction that you made at that restaurant, it looks big,
but if that was a dinner with 10 people, maybe it's not as bad as initially thought, and that's
actually in policy.
Well, that information is in your calendar, it's in your email, it's sometimes outside
of just the immediate context of a transaction.
So the agent will go out on the internet in some cases, contact vendors or pull data from APIs on your behalf
to really gather all that context
and make better decisions on behalf of the company.
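To make that concrete, here's a minimal sketch of the kind of context-aware policy check being described. Everything below is hypothetical: the dataclass, the per-person limit, and the calendar lookup are invented for illustration, not Ramp's actual implementation.

```python
# Hypothetical sketch of a context-aware policy check; not Ramp's actual code.
from dataclasses import dataclass

@dataclass
class Transaction:
    merchant: str
    amount_usd: float
    category: str

def lookup_calendar_attendees(txn: Transaction) -> int:
    """Stub: a real agent would search the cardholder's calendar and email
    for an event overlapping the transaction (e.g. a team dinner invite)."""
    return 10  # pretend we found a dinner with 10 attendees

def check_policy(txn: Transaction, per_person_limit_usd: float = 75.0) -> str:
    """A $600 restaurant charge looks out of policy on its face, but split
    across the attendees found in the calendar it may be fine."""
    if txn.category == "restaurant":
        attendees = lookup_calendar_attendees(txn)
        per_person = txn.amount_usd / max(attendees, 1)
        if per_person <= per_person_limit_usd:
            return f"in policy (${per_person:.2f}/person across {attendees} people)"
        return f"flag for review (${per_person:.2f}/person over the ${per_person_limit_usd:.0f} limit)"
    return "flag for review"

print(check_policy(Transaction("Some Restaurant", 600.0, "restaurant")))
# -> in policy ($60.00/person across 10 people)
```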
I wanna talk about, like,
the word agent and the decisions there,
like how agents are fitting into the different stack
of a tech company these days,
because there's kind of always been an agent
behind the scenes working.
You'd think of these as, like, cron jobs before.
It's like there is a long running process
that when a receipt comes in, it gets tagged.
And there has been for I think years,
I don't wanna share anything you can't,
but there's been an LLM interacting with receipt data
for a long time, but it's been fully agentic
in the sense that it was behind the scenes.
And so I've been thinking about this in the context
of like meta and some of the value that Zuck
is gonna be getting from having a frontier AI model.
It's like there's so many workloads inside a business
that has billions of users that just happens
behind the scenes and these are agents,
but they're like almost internal agents.
And so I'm wondering about your decision to,
yeah, position an agent as like,
this is a user-facing agent, versus something where
we're just going to have a process
that's running
behind the scenes entirely.
Well, there's a bit of a difference, right?
Because when you think about these processes
running behind the scenes, for the most part,
the code is pretty deterministic.
The tools are the same.
It's built for accuracy and auditability
and you have a high confidence.
You can trace back the path that the old school agent,
let's call it that, went through exactly.
And in this case, it's less deterministic.
You give the agent a set of tools.
You can tell the agent,
or you can essentially give it access to,
let's say, ability to call, ability to email,
and it can be like, go figure out a way to get the receipt,
here's what you know about the restaurant.
And it will browse the web and figure out
that that's the phone number of the restaurant,
and then try to call the restaurant,
and if that doesn't work,
then it will try to email the restaurant
until it achieves that goal of getting you the receipt,
or it fails and you can then
interact with it.
In this case, the instructions that we are giving the agent as we're building it are
very high level.
You're just giving it high level instruction and access to tools.
That's very different from the old way of building these processes in which you had to be very specific about all these paths.
So it would take a lot longer to build these systems, to debug them, to update them, etc.
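The contrast he's drawing, sketched under generic assumptions; a minimal sketch, where the tool stubs, the random failure, and the step limit are all invented for illustration.

```python
# Hypothetical sketch of deterministic vs. agentic receipt-chasing; all stubs invented.
import random

# Old school: every path is hand-coded, so the trace is fully auditable.
def search_inbox(txn: str):
    return None  # stub: pretend the receipt wasn't emailed

def query_vendor_api(txn: str):
    return f"receipt for {txn}"  # stub

def get_receipt_deterministic(txn: str):
    receipt = search_inbox(txn)          # step 1, always
    if receipt is None:
        receipt = query_vendor_api(txn)  # step 2, always next
    return receipt                       # the exact path is known in advance

# New style: a high-level goal plus a set of tools; the model picks the path at runtime.
def browse_web(txn: str):
    return None  # found the restaurant's phone number, but no receipt yet

def call_restaurant(txn: str):
    return None if random.random() < 0.5 else f"receipt for {txn}"

def email_restaurant(txn: str):
    return f"receipt for {txn}"

TOOLS = [browse_web, call_restaurant, email_restaurant]

def choose_next_tool(history):
    """Stub for the LLM's choice; a real agent would decide from context."""
    return TOOLS[len(history) % len(TOOLS)]

def get_receipt_agentic(txn: str, max_steps: int = 6):
    history = []
    for _ in range(max_steps):  # bounded, so it fails gracefully to a human
        tool = choose_next_tool(history)
        result = tool(txn)
        history.append((tool.__name__, result))
        if result is not None:
            return result
    return None

print(get_receipt_deterministic("dinner at Luigi's"))
print(get_receipt_agentic("dinner at Luigi's"))
```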
We lost you. Your background is turning like a ghost. It's very funny.
It's a super intelligence.
I think you just need a little bit more light on your face.
I think I actually lost power.
Wait, you lost power? There we go. I'm back.
That's wild. Much better. I want to talk to you about the data walls that are going up
and some of the battles that are playing out in the enterprise world because when I read
stories about companies that want to do enterprise search, you can see that, well,
maybe Google doesn't want you to be taking that,
maybe they want that for themselves.
Ramp's in a very different position,
but at the same time, there's just evolving policies
about how friendly, this is a classic with Amazon
not sending the itemized receipts to Gmail,
because they just didn't want to give
Google the data. But as a Ramp customer, I want the Amazon
details pulled in via Gmail via the Ramp integration.
So talk to me about like, how's the broader trend playing out?
And then how do you go to big companies and say, hey, like, you know, work with us,
our clients want to be able to pull data from your service, and we're not going to build
a delivery network, Amazon. So we're not a competitor for you.
Yeah, for sure. I mean, most of the data that we need at the end of the day is data that is quote-unquote owned by our users,
the businesses that are on Ramp and their employees. I think it's a little bit easier to operate
in the B2B space because, I guess, what governs who owns the data and whose data it is, is a lot clearer than in a lot of consumer applications. So in our
case, it's like, what data do we really need to know, in the
case of the agent that we just launched, to figure out whether
something's in policy or not? It's metadata about the
transaction, right? Like, what's in the receipt? At the end of the
day, your receipt, that's your receipt,
right?
We get that information.
You have information that we get through the networks, through Visa, the metadata about
the geographical location of the transaction, maybe whether it was an in-person transaction
or not.
There's data that's in your inbox, in your email, which again, like that information
is also owned by the company.
We haven't really encountered a lot of pushback and challenges. I found most of the challenges
in getting the data to be more like technical. How do you make sure you get it quickly, clean it up,
and get it accurately, as opposed to ones where there are third parties that are trying to make it harder and harder for us to access the
data. We had dealt with that at the previous company. Yeah, of course, we had a lot of
these problems. I mean, there were lots of funny moments at Paribus, our previous
company, where we were really building an agent for consumers to
help them save money on their online shopping, right? And we're trying to log
in on their behalf to Amazon accounts and Walmart accounts, etc. And of course,
they'll put blocks, they'll put captchas. And today those captchas seem like a joke.
I think any half-recent version of ChatGPT or Claude is able to solve those captchas very easily.
But that's one of the ways the internet's getting worse right now: the captchas are getting so hard and annoying.
You know, when we go to the gym in the morning,
Jordy has to log in, I get logged out, it takes like two minutes to get through the captchas at this gym.
It has like the most military-grade security for a gym login.
It just gives you a barcode that you just scan. But on that,
I'm actually interested because I can imagine
You know, Ramp has tens of thousands of customers, like high-value business customers, and other people that are building agents
I'm sure would love to actually be able to take actions on the Ramp platform. But at the same time,
you guys are trusted to handle the finance, basically the finance back office,
for these companies.
You don't want an agent hallucinating
and taking actions
on the Ramp platform.
So I'm curious how you see that dynamic playing out,
because I'm sure you've been approached by a lot of companies saying, hey, we're building this agent to do this
thing. We'd love to be able to get authorization.
Of course. We're thinking through that a lot right now. I think there are good ways of
exposing the right information to the right agent, as long as our customers are very aware of what
they're exposing and there are lots of interesting applications for us to work on.
In the case of any large purchase at a company, there are multiple parties within that company
that need to review it or approve it.
You want to review a certain vendor and look at their data protection policies.
You want to look at the legal agreements.
In some cases, you want to negotiate the price.
And you can imagine a day in the future
where a lot of our customers have an agentic tool
that they trust, or agents that they trust for legal work,
agents that they trust for IT work, etc.
And we're very interested in actually working with some of these companies, but we've got
to figure out on our end how we expose the right interface so that we're ensuring really
like the security of the data of our customers.
So it's an ongoing work stream.
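One plausible shape for "exposing the right interface" is an explicit allowlist per agent. This is a guess at the pattern, not Ramp's design; every name below is invented.

```python
# Hypothetical scoped-access check for third-party agents; all names invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentScope:
    agent_id: str
    allowed_actions: frozenset  # explicit allowlist; nothing is implicit

READ_ONLY_LEGAL = frozenset({"read_vendor_contracts", "read_data_protection_terms"})

def authorize(scope: AgentScope, action: str) -> bool:
    """Expose the right information to the right agent, and nothing more."""
    return action in scope.allowed_actions

legal_agent = AgentScope("acme-legal-agent", READ_ONLY_LEGAL)
print(authorize(legal_agent, "read_vendor_contracts"))  # True
print(authorize(legal_agent, "initiate_payment"))       # False: never in scope
```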
Last question for me. The
Grok 4 launch was very benchmark-heavy. It seems like,
you know, the consensus is that it's a good model.
And so as soon as I see that, it's now about cost per token.
And so I wanna hear from your perspective,
what drives decision making?
How big of a line item roughly,
or how much time is spent thinking about
LLM inference optimization at your scale?
Like roughly, how big of a deal is it?
And then what is the workflow to decide?
Can we use a cheaper model? How do we, do you have internal benchmarks?
Are you just checking these things?
Like how are you making decisions about which model to use for what problem?
Yeah, that's a great question. I mean,
I'm a lot more paranoid about being too slow to try the newest model and the latest
and greatest tools than I am by maybe overspending a little bit in one area.
I mean, the amount of time and money wasted at companies doing BS work is just insane.
If we're debating whether you can make something faster by spending an extra dollar or half
dollar, the value that we're able to create is so big that I don't worry about it too
much.
But we do have, internally, a somewhat imprecise stack ranking of the different places where we need to make
inference calls, and in some cases they're very simple, high volume, kind of low risk,
right, like you're trying to normalize or clean up some merchant data to figure out the
appropriate spelling and maybe the right, like, logo to use.
It's not the end of the world if it's not perfect.
We're doing it at high volume, it better be cheap.
So we have a kind of stack ranking of like,
this is something high volume where we need to be cheap.
This is something that's low volume and high stakes
where you need to be accurate.
And we'll generally try the newest and greatest models
in the places where we think they'll make
the biggest difference. And over time, like, we'll break up some workflows and some parts of them will become
cheaper, more repeatable with smaller versions or cheaper versions of the model and it will just
evolve. I mean, I remember, like, micro-optimizing every
single thing on our AWS account back in 2014, right? Like,
it was a lot harder back then. I think
we also pride ourselves on being the time-and-money
company. So we do care a lot about making sure that we don't
waste our own money and our own time. But I would say that the
TLDR is, like, our time and engineering
time is the most valuable thing here.
And I'm a lot more focused on that than anything else.
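As a toy illustration of that stack ranking, routing each inference workload by volume and stakes might look like this; the tier names and the volume threshold are made up for the example.

```python
# Hypothetical volume/stakes router; model tiers and the threshold are invented.
def pick_model(task: str, volume_per_day: int, stakes: str) -> str:
    """Route each inference workload by risk and volume, not one model for everything."""
    if stakes == "high":
        return "frontier-model"     # low volume, high stakes: pay for accuracy
    if volume_per_day > 100_000:
        return "small-cheap-model"  # high volume, low risk: it better be cheap
    return "mid-tier-model"         # in between: revisit as models get cheaper

print(pick_model("merchant_name_cleanup", volume_per_day=2_000_000, stakes="low"))
# -> small-cheap-model
print(pick_model("policy_decision", volume_per_day=500, stakes="high"))
# -> frontier-model
```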
Yeah. On the time issue, what do you think about
the various latency trade-offs? I'm sure
if an employee wants to know,
is this in policy, and you hit o3-pro and it waits 10 minutes,
they're probably just going to Slack their manager and ask them.
but you're going to get a really accurate answer. That's really detailed.
And so how do you think about those trade-offs in latency?
Yeah, I mean, it really depends on, like,
where in the workflow we're making the inference call, right?
Like if it's live in the interface and the user expects a quick answer,
we'll be using some of the faster models.
But the reality is, like, a lot of these agentic workflows that are being kicked
off at Ramp happen behind the scenes, right?
Like, you make a transaction,
you maybe get a very quick question from Ramp's AI to gather a little bit more
context. Like, that's enough.
And then from there we'll kick off another task
that can be a little bit slower,
that'll happen in the background.
And by the time it reaches a bottleneck,
or it'll reach a place where it needs additional feedback,
it'll be in someone else's notifications
or on someone else's Slack.
Like you could take a little bit of time
when the work is going from one person to another person
but less when it's, like, the same person interacting with the interface live.
That's generally the thinking. Cool.
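A toy version of that split, fast model inline while slow work runs in the background, using Python's asyncio; the sleeps stand in for model and tool calls, and all names are invented.

```python
# Hypothetical fast-inline / slow-background split; sleeps stand in for model calls.
import asyncio

async def quick_question(txn: str) -> str:
    """Live in the interface: the user is waiting, so use a fast model."""
    await asyncio.sleep(0.1)  # stand-in for a fast-model call
    return f"Was '{txn}' a team dinner? (tap yes/no)"

async def deep_review(txn: str) -> str:
    """Behind the scenes: can take minutes, surfaces later in notifications."""
    await asyncio.sleep(2.0)  # stand-in for a slower model plus tool calls
    return f"'{txn}' reviewed: in policy, receipt matched"

async def handle_transaction(txn: str) -> None:
    print(await quick_question(txn))              # answered immediately
    task = asyncio.create_task(deep_review(txn))  # slow work kicked off in background
    # ...the user moves on; the result lands in someone's Slack or notifications later
    print(await task)

asyncio.run(handle_transaction("dinner at Luigi's"))
```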
Yeah, I
tried some of the newest browsers today,
I tried Comet today, I tried Dia a couple weeks ago,
and I think what they're trying to do is incredibly cool,
but I often find myself thinking like,
damn, I wish this was a little bit faster.
And I know it's coming, but I think,
unlike some of the browser agentic calls,
you want it to be really fast.
Yeah, I was thinking about that in the context of the OpenAI
browser.
And unless they figure out something
that makes it basically 10 times as fast,
I'm still going to default to Chrome
if I have both of them open, just because I'm like, well,
I just really need a fast answer here.
So I kind of expect them to need it. I mean, that was the Chrome innovation, right?
Chrome won on speed.
Like, they just optimized the code and they nailed speed, and it was enough to leapfrog.
And so you could see, I mean, that's like the bull case for Apple coming from behind:
it feels like if xAI and Anthropic and OpenAI and
Gemini are all roughly at the frontier, if you can just get something that's at that frontier, not any new innovations,
but hyper-optimized, and it runs locally on your phone and it's spitting out like tons of tokens every second,
you have a product that would be very rapidly adopted.
Uh, it's exciting.
It matters a lot. I mean, I think one of the weirdest UX patterns on ChatGPT
now is that I have to do the work to figure out whether to use
o3 or 4o, every time.
Do I have 10 minutes, or do I want it now? And 4o is always...
it's so good that I usually don't need to,
but then I'm just like, well, I want the best, of course,
and, like, I'll come back to it.
And it's such a weird paradigm.
It's gonna be something that dates us.
And I just know our kids are gonna be like,
what did you have to do back then?
You had to rewind the VCR tape.
You had to put the disk in the Xbox.
You had to pick which model to use.
This is insane.
It's so legacy and it's going away,
but we're just in this weird, like,
we don't have a model router solved.
And it feels like the easiest thing is like,
which model should we use for this?
I don't know.
We'll see.
Yeah, and, I mean, I don't know if you guys remember,
I grew up in Lebanon, I still remember the days of dial-up,
where you would have to
kick everyone else off the phone line.
Well, exactly. Well, select the phone line, in that case. Like, okay,
which phone line am I going to use? Like, I don't know.
Can't you tell me which one is free and, like, pick it for me? Exactly.
Yeah. It seems like the easiest thing to do. And also, I mean, this is just, uh,
you know,
complaining about the app that I use 30 minutes a day
at least, ChatGPT, but I almost wish
I could just define it in the prompt
and just say, hey, use O3 Pro, and then here's the prompt,
as opposed to needing to click the UI,
change it, switch it, and then pick,
instead of just being able to go back and forth.
I don't know.
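What he's wishing for is essentially an inline directive that a router honors before dispatch. A toy parser follows; the "use:" prefix syntax is invented here, and ChatGPT doesn't actually offer this, per the conversation above.

```python
# Toy in-prompt model directive; the "use:" syntax is invented for illustration.
import re

DEFAULT_MODEL = "4o"  # hypothetical default

def route(prompt: str) -> tuple[str, str]:
    """If the prompt starts with 'use: <model>', honor it; else use the default."""
    match = re.match(r"use:\s*(\S+)\s*\n", prompt)
    if match:
        return match.group(1), prompt[match.end():]
    return DEFAULT_MODEL, prompt

print(route("use: o3-pro\nSummarize our Q2 expense anomalies."))
# -> ('o3-pro', 'Summarize our Q2 expense anomalies.')
print(route("What's 2+2?"))
# -> ('4o', "What's 2+2?")
```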
I mean, it's a good sign because people are using this stuff
so much that they're frustrated by these niche UI things.
So, it's an exciting time.
There's a lot of... I forgot who it was who posted on X,
I think it was a couple weeks ago,
that every company is one great UX breakthrough away
from something amazing.
And I think that will be true for a long time.
There's a lot of alpha right now in just great UX and good patterns. We haven't figured it out. We're still in maybe the terminal phase of personal computers, right? Like, when is the mouse going to come out? When are the right GUIs going to come out? There's a lot of that happening right now. And yeah, it's a fun time to be building.
One last question for me.
On Monday, Dwarkesh released an article
and then came on the show kind of talking
about his timelines around when an AI agent would
be able to do his taxes, right?
Sort of like basically fully agentic experience being like,
I want to do my 2025 taxes.
And then it just sort of autonomously runs.
How are, like, you know,
Fortune 500 CFOs, like, what are their timelines around this?
Maybe you just tell them what the timelines are.
Like, okay, by 2028, you know,
we're gonna be able to do this for you.
But how is the sort of finance arm of the C-suite
kind of anticipating the rate of advancement?
Obviously, the agent today is a step towards that future,
but you'll obviously need a variety of different agents.
Well, I think in terms of capabilities of LLMs,
we're there.
We have the capabilities.
The bottleneck on being able to do this today
is having the right context, right?
So some of that context is in my head,
so the AI needs to know to ask me the right questions efficiently so I can answer those,
right? Even when I'm working with my accountant.
Like, pick the best accountant in the world
for your personal taxes.
If you just tell them, like,
file my taxes, they can't do anything.
Maybe you tell them, file my taxes,
and here's access to my email,
they could do a little bit more,
but they can't get it fully.
So tell them, like, file my taxes,
here's access to my email, you can call my wife as much
as you want, you can look through my drawers, and give them more and more of these things.
Maybe it can do it, but it's going to get lost, it's going to take forever.
And really what we need to do even for businesses is like what are the right patterns for us
to extract context that's in people's heads,
organize it, get them comfortable with connecting different tools like your
inbox and things of that nature.
And I think in terms of tech and capabilities, we're there.
We're not really missing anything.
So there's a lot of UX there. Like, yeah, it
can email me a question and put it in my inbox, which is effectively my to-do list.
And that's what my accountant does when they don't have that access. They email me and say, hey, you've got to do this.
You can just take a picture of a product
there and ask it, if I buy this, you know, is it in policy? Yeah.
Yeah.
Well, thank you so much for stopping by
Great. Well, we'll definitely see you soon, Karim. This is great. Talk to you soon. Bye.
And that is the last of our guests, we are through them. In other news,
Periodic Labs. There's this scoop from Natasha Mascarenhas:
the startup being co-founded by Liam Fedus
and Ekin Dogus Cubuk, great names,
is in talks to raise hundreds of millions of dollars
in funding at above a $1 billion valuation.
The two-month-old startup is looking to apply AI
to physical science, starting with discovering novel materials.
Let's give it up for the two-month-old unicorn. Oh, we've got to have these guys on the show. That is extremely fast.
I also liked this post from David Perell, who we're getting back on the show ASAP.
We had a lot of fun talking to him a couple months ago.
He said I'm touring apartments in New York and just about every new build has the same soulless aesthetic, flat walls, white paint, no cornices, no ornamentation, just a room in
a box. Only one real estate agent said to me, if you want something with
character you're going to have to stick to pre-war buildings. Look, I'm all for
some efficiency gains, but we've created a world where new things are soulless
things, and that's not how a society as modern as ours
should function.
Intuitively, you'd think that a wealthier society
would build more beautiful things, but not ours.
And I completely agree.
What's crazy is that this isn't,
I mean, these apartments look nice,
but this continues all the way to $20 million houses
that are still bland.
And I think it's mostly because maybe time
and all the difficulties with permitting,
because if you even have the resources
to build something from scratch,
creating, okay, I want these ornaments,
and I want this, and I want something
that's really expressive
of my personality, well, now if you want that,
no one else wants that, so you have to build it
and you have to underwrite it,
and you're going to be underwater on it.
Make sure it's to code.
Make sure it's to code and then get it built
and then the secondary market value is gonna be less
because not everyone wants Hearst Castle.
Whereas if you build,
if everyone builds the exact same thing,
it's a perfectly liquid market
because every apartment is interchangeable
with every other.
So it's kind of a function of just like modernity,
but it's more a function of people not
ever just risking it on building a disaster project,
making their forever home.
People learned the lesson of William Randolph Hearst too well.
They should have just, like, never learned that lesson,
ripped it and just sent it, and just built something
that no one else will wanna buy
and will take decades to build.
That's always the best.
Well, I have a good place to end it.
Rob Petrozzo says the original Hermès Birkin bag prototype
just sold for $10 million at Sotheby's.
There was a two minute standing ovation.
He says bull market confirmed.
And a gong hit.
We love a bull market.
The original prototype.
Fascinating.
That's wild.
Makes sense.
Very cool.
It's incredible lore. And I would be excited for a bull market in
alternative assets; such as Birkins would be great, and
you should be too. But that's a great show, folks. We will be back tomorrow.
I cannot wait. We will talk to you tomorrow. Cheers. Bye.
