TBPN Live - Grok 4 Launch Breakdown, OpenAI to Release Web Browser | Chris Paik, Will Bruey, Joel Becker, Dylan Parker, Eric Olson, Ghita Houir Alami, Elliot Hershberg, Karim Atiyeh
Episode Date: July 10, 2025
(02:28) - Grok 4 Launch Breakdown
(36:09) - OpenAI to Release Web Browser
(50:31) - Apple Plans to Release New Apple Vision Pro Model
(58:48) - Chris Paik, General Partner at Pace Capital... and former Partner at Thrive Capital, discusses the evolution of human-computer interaction, emphasizing the potential of technologies like eye tracking and gesture recognition. He explores the rise of VTubing and YouTube's impact on digital content, highlighting how virtual avatars are reshaping the creator landscape. Paik also introduces the "atomic value swap" framework for assessing market fit and business model alignment, stressing the importance of balanced value exchanges to ensure platform success.
(01:33:50) - Will Bruey, co-founder and CEO of Varda Space Industries, discusses how microgravity enables the creation of pharmaceutical formulations not possible on Earth, as gravity affects crystal growth and particle size distribution. By manufacturing drugs in space, Varda aims to produce purer and more uniform medications, potentially improving patient outcomes. Bruey also outlines the company's plans to scale up production and establish reentry sites globally to meet the growing demand for space-manufactured pharmaceuticals.
(01:53:02) - Joel Becker, a researcher at METR, discusses a study evaluating the impact of AI assistance on experienced open-source developers working on large, long-standing projects. Contrary to expectations, the study found that developers who used AI tools were actually slowed down, rather than sped up. Becker emphasizes the need for further research to understand these findings and to assess the potential for AI systems to autonomously improve their own capabilities.
(02:13:07) - Dylan Parker, Co-Founder and CEO of Moment, discusses his company's recent $36 million Series B funding led by Index Ventures, the evolution of fixed income trading from manual to electronic systems, and Moment's role in providing modern fixed income infrastructure for financial institutions, including a partnership with LPL Financial.
(02:22:35) - Eric Olson, co-founder and CEO of Consensus, an AI-powered search engine for academic research, discusses how Consensus leverages large language models to provide researchers with faster, evidence-based answers from peer-reviewed journals. He highlights the platform's dedicated focus on academic content, enabling more intelligent searches and citation-forward interfaces tailored for researchers. Olson also addresses the challenges of accessing paywalled content and emphasizes the importance of open access to scientific literature.
(02:33:27) - Ghita Houir Alami, co-founder of ZeroEntropy, holds two master's degrees in applied mathematics from École Polytechnique and UC Berkeley. She discusses her journey from computer vision to large language models, leading to the creation of ZeroEntropy, which focuses on enhancing retrieval systems for AI by building search tools for Retrieval-Augmented Generation (RAG) and agents. Houir Alami emphasizes the importance of precise and efficient retrieval to prevent AI hallucinations and highlights the company's recent release of a reranker model to improve search accuracy.
(02:42:53) - Elliot Hershberg is a biotech scientist, writer, and investor who has contributed to cancer vaccine design, developed computational tools for spatial genomics, and worked on genome browser software.
He authors the "Century of Biology" newsletter and has served as Biotech Partner at Not Boring Capital, focusing on synthetic biology investments. In the conversation, Hershberg discusses the integration of artificial intelligence in biotechnology, highlighting its transformative impact on drug discovery and the development of innovative medicines.
(02:52:22) - Karim Atiyeh, co-founder and CTO of Ramp, discusses the launch of Ramp's first AI agent designed to streamline corporate expense management by automating decisions between finance teams and other departments. This agent, knowledgeable about company expense policies and transaction details, reduces the need for manual approvals and enhances efficiency. Atiyeh also highlights the agent's ability to gather contextual information from various sources, such as calendars and emails, to make informed decisions, thereby minimizing delays and improving compliance.

TBPN.com is made possible by:
Ramp - https://ramp.com
Figma - https://figma.com
Vanta - https://vanta.com
Linear - https://linear.app
Eight Sleep - https://eightsleep.com/tbpn
Wander - https://wander.com/tbpn
Public - https://public.com
AdQuick - https://adquick.com
Bezel - https://getbezel.com
Numeral - https://www.numeralhq.com
Polymarket - https://polymarket.com
Attio - https://attio.com/tbpn
Fin - https://fin.ai/tbpn
Graphite - https://graphite.dev

Follow TBPN:
https://TBPN.com
https://x.com/tbpn
https://open.spotify.com/show/2L6WMqY3GUPCGBD0dX6p00?si=674252d53acf4231
https://podcasts.apple.com/us/podcast/technology-brothers/id1772360235
https://www.youtube.com/@TBPNLive
Transcript
Discussion (0)
We are live from the TBPN Ultradome, the Temple of Technology, the Fortress of Finance, the Capital of Capital.
Today we are covering the Grok 4 launch. We're gonna break that down.
The third browser war has begun. Every artificial intelligence company is getting in the game.
Launching new browsers.
Get yourself a browser.
The new Volkswagen electric bus is a flop, according to the Wall Street Journal. Ouch. Apparently Linda Yaccarino was not fired for the Grok dust-up with the crazy hallucinations that were going on.
We have more details there. And, well, why would anyone think that she was, considering that was the timeline? That was for sure in the timeline. People were talking about, like, oh, this happened, and she stepped down within, like, six hours.
Oh, really?
Yeah. My read on it was clearly Grok and xAI are not her domain, and Grok was saying some things about her that should never be said. Yes.
My read on it was, who knows?
When Elon commented on her post and said,
thank you for your contributions,
it's like the boilerplate text.
And so I'm sure that their relationship
is maybe not as good as it was day one.
But I almost thought it was maybe the,
you know, the MechaHitler dust-up was the straw that broke the camel's back,
and she basically said, look, like, you know,
I can no longer, you know, bet my career on this platform.
Maybe, yeah.
I mean, we can debate it.
There's more reporting in the Wall Street Journal
about what actually happened,
and that story, which we'll get into,
is kind of pointing to this idea
that it had been in the works for a while.
And that was not the straw that broke the camel's back.
That was like, the papers had been signed.
Everything had been signed before that.
And then the dust-up happened with Grok.
But the bigger news is that they actually got Grok 4 out,
and people are excited about it, so we'll talk about that.
And then there's the other Grok, Groq with a Q, whose CEO we had on the show.
Is it yesterday?
I'm losing track of time.
It was very recently, no, it was Tuesday.
Tuesday.
Apparently they're out raising at six billion,
and we have some more details on that company,
so that's interesting.
Anyway, let's tell you about ramp.com. Time is money, save both. Easy-to-use corporate cards, bill payments, and a whole lot more, all in one place.
They have a new agent launch today. Yep. He'll be joining later in the show to break it down.
So let's break down the Grok 4 launch. Deedy Das has a summary saying that Elon Musk has pulled it off again, absolutely crushing the AI wars with Grok 4. And we can go into some of the meta around the benchmark wars.
For sure, and there's a question about like,
are we post benchmark?
Does this matter?
What's the real question to be asking here?
But there's a bunch of interesting takes.
So just summarizing the core announcements,
post training RL spend was equal to pre-training spend
for this release.
That's the first time it's ever been like that.
I think when you go back to the original RLHF stuff that ChatGPT was doing,
that kind of unlocked like, oh wow, this really, really works.
I'm pretty sure the pre-training spend was an order of magnitude or two orders of
magnitude bigger. Now we are truly in this reinforcement learning regime.
$3 per million input tokens, $15 per million output tokens, a 256,000-token context window, with price 2x beyond 128K. It's number one on Humanity's Last Exam, which, interestingly, is effectively postgraduate, PhD-level problems, but across a bunch of different domains. So everything from literature to physics.
Yeah, kind of like the hardest SAT possible.
Interestingly, I believe that benchmark
was created by Scale AI.
And so Alex Wang is now at Meta trying to figure out, how can we beat our own exam?
And Elon's just like, I'm number one at your thing.
Interesting dynamic.
Yeah, the real test would be Elon doing the same problem
set himself and saying, look.
Well, yeah, I mean, I was talking to Tyler
about this before the show.
It's like, Humanity's Last Exam. It's really good at PhD-level math, PhD-level stuff.
But how often are you running into those types of problems?
Yeah, I mean, I think that's the whole thing about there's
this concept of spiky intelligence, right?
Where it's like, okay, it's really good
at this very obscure problem that I never deal with,
but if I have a super long kind of context window,
there's no kind of long-term,
it just completely loses its footing,
and then it's useless.
Yeah, we're kind of in like less of the benchmark regime and more of the agentic,
like how long can the agent run? So it's like,
we are in the 15 minute AGI regime.
Maybe this is 15 minutes of like even better AGI,
but we want to go to 30 minutes.
Well, and Dwarkesh on Monday, this, you know,
takes me back to him talking about continual learning
being the next problem that we really need to solve
because it's great if you have a PhD level expert
in your pocket that can solve any problem
in any domain almost instantly,
but if it can't learn and take feedback
and improve on certain tasks,
then it's basically useless.
If you had a PhD join your team to work on a specific problem,
but it was hard restarting at the beginning
of every single task with no prior knowledge,
it would be almost impossible for that person to succeed.
So, humans still got it on that front.
But at the same time, if you are trying to just really establish yourself as at least an API for tokens
that every business should check out against Anthropic
or the OpenAI APIs, just saying, hey, we're on the frontier.
Or Gemini, yeah.
Saying we're on the frontier is a good way, and they certainly proved that, with GPQA graduate-level problems at 88%. The really interesting news, I mean, it's worth calling out: Grok got number one on Humanity's Last Exam at 44.4%. Number two is sitting at 26.9%. And then, going down this list of all these different sort of challenges, they are consistently well beyond second place. So they are at the frontier now of all these different benchmarks.
Yeah. So Mike Knoop over at ARC-AGI says, zooming out on ARC progress, I'd say OpenAI's o-series progression on V1 is a bigger deal than Grok's progression on V2 so far.
The O-series marked a critical frontier
AI transition moment from scaling pre-training
to scaling test time adaptation.
And this was the o-series progression, if you remember that: OpenAI was spending, it was like, thousands of dollars of reasoning tokens generated at test-time inference to actually get a good score on V1 of ARC-AGI.
And so it had to think a ton,
but it was able to figure it out,
and at least it proved that throwing a ton of tokens
and a ton of inference at a problem
and letting it cook, basically,
wound up producing progress there.
So that was kind of like a new, just a new paradigm.
Says, whereas Grok 4 mostly takes existing ideas
and just executes them extremely well,
in my opinion, the notable thing is the speed
at which XAI has reached the frontier.
And it really just can't be overstated that this is crazy.
You put a post from Owen in the chat.
I'll pull it up here.
He says, Elon Musk is such a beast.
I'm not even a pure fan boy anymore.
How does he, there's a lot of swearing in here, Owen. Gotta keep the timeline PG. But how does he come out of nowhere with a cold start, late to the game, and ship Grok 4, and do it alongside everything else he's up to? He's launching new political parties. He's literally magnitudes above every founder. It's humbling.
So basically, it's almost like he was a co-founder of OpenAI. Yeah, I guess he's returned. You would have to, you know, almost be a co-founder over there to be able to do something like this.
Let me tell you about Graphite.
Code review for the age of AI.
Graphite helps teams on GitHub ship
higher quality software faster.
You can get started for free at graphite.dev.
If you want to ship like Ramp, get on Graphite.
Yeah, Chamath was saying the same thing.
Somebody in his reply says,
seriously, how does this guy produce what he produces?
Meta is buying talent at $200 million a year,
and Elon keeps his people at a fraction.
It's mind blowing.
A very deeply underappreciated edge for Elon, says Chamath. The retention of the best people happens
when you can offer them a freewheeling culture
of technical innovation, no politics, and few constraints.
And people in the comments are like, no politics.
What are you talking about?
Yeah, can get a little political over there, but.
But probably not within the engineering org at XAI, right?
Like it's probably just, okay,
how do we build the biggest thing?
Cool.
Well, you can imagine the politics of like,
who gets the best spot for their tent in the office.
The tent.
You know, there's a hierarchy
Proximity to the bathroom. I want to be directly under the air conditioning unit. I want to be closer to my desk. The windows can be nice too, so you can, you know, pull down your tent a little bit and get a little view, morning light.
I wonder what the political structure is of the tent city. The tent hierarchy.
So is there, is there democracy? Do they vote for who runs the tent city?
I guess it's just a.
The XAI tent city.
It's probably just Elon at the top,
but does he have a tent?
Something about San Francisco in tents.
Yeah, very funny.
But Swix has been chiming in saying like,
we need community notes for LLM benchmark porn,
because in the Grok 4 launch,
they highlight this AIME competition math problem,
and so Matt Shumer is basically saying, AIME is saturated, let that sink in. Grok 4 got 100%, it made no mistakes on that benchmark,
which is obviously very impressive,
but there's this extra comment about the nature of AIME,
and so it's a cautionary tale about math benchmarks
and data contamination.
Apparently, you know, predictions were that
the models weren't smart enough to actually solve these,
but he says, I used OpenAI's deep research
to see if similar problems to those in AIME
exist on the internet,
and guess what, an identical problem to Q1,
question one of AIME 2025, exists on Quora.
I thought maybe it was just a coincidence,
so I used deep research again on problem three,
and guess what, a very similar question was on Math Stack Exchange.
Still skeptical, I did problem five, and a near-identical problem appears on Math Stack Exchange.
And so, at a certain point, if people put out a benchmark,
then talk about it a lot online,
and then that gets baked into the training data,
you're just memorizing the results.
You're not necessarily actually learning everything.
It's still cool, it's good.
It's good to have everything memorized,
but it really is not beating the knowledge retrieval,
knowledge engine allegations.
And when Scott Wu was on the show earlier this year, he was basically saying AI will win an IMO gold medal this year. He felt very confident in that. And I'd be interested to see how he thinks about this new performance. I'm pretty sure the IMO gold medal questions are public once the IMO happens.
So every year they're developing new questions,
but then they go out there and then they get memorized
and the solutions become discussed
and there's all the context around that.
And so yeah, it gets kind of baked in.
So big question about how valuable are these.
At the end of the day, it's really just about adoption.
And that's why we were looking at the polymarket
for the best, which company has the best AI model
at the end of July, and XAI has just surpassed Google,
which was sitting around 80% chance for a while,
and then started dropping earlier this week, last week,
started dropping and now XAI is sitting at 48%, Google's sitting at 45%.
Well, yeah, actually, it's updating live. Google's back up at 49%.
Is Google planning to launch something new in July? Because it feels like,
it feels like this market particularly is more driven by Google's release schedule.
Because Google might have something in the lab,
but they like to release things at specific times.
They have, it's a big company.
They don't just like, who knows, drop it.
Gemini team Logan over there might be fixated
on this polymarket being like, I need this.
Yeah, yeah, yeah.
Oh, during the wait, he was like,
if you need something to kill the time, Google AI Studio.
So, I mean, people were definitely memeing the production values on the Grok 4 launch, because it was supposed to start at 8.
I think it went live at 8:45 or something like that, maybe a little bit later, Pacific time. And EigenRobot was saying the production values were terrible.
This market is based on LMArena, specifically the text leaderboard. So currently, they haven't fully updated it, so it's unclear. Right now Gemini 2.5 Pro is still at the top, but I think the expectation is once they get Grok up there, it will be the top spot, so we'll keep following this market. There's over 2 million of volume on it already. It's so interesting that Anthropic's not on this Polymarket at all, because people talk about them as having the best vibes, the best big-model smell, the best interaction, and LMArena is supposed to kind of test that with these A/B tests, and yet they don't seem to be performing there. But it almost doesn't matter, because they're just focused on the business at this point, as opposed to the benchmarks.
I don't know, it's all changing.
We have a post here from Ben Hylak.
He says, Elon Musk on AI.
So during the presentation, a lot of people
were critiquing the presentation,
saying that it didn't feel super polished or whatever.
I don't think that was the intent.
And it was pretty fixated on the models themselves,
and what went into them, and what they're good at.
But Elon did have this one quote in here where he says,
and at least if it turns out,
so he's talking about what kind of impact AI
will have on the world,
and he goes, at least if it turns out to not be good,
I'd at least like to be alive to see it happen.
It's like, if we get the Terminator ending,
I wanna be around for that.
Yeah, I want to experience it.
What does that say about his timelines?
Because it's like, is he expecting that to be alive?
Like, I feel like most people that have been
in the Doom category have been like,
the Doom's coming soon, not the Doom's coming in 200 years.
I didn't, I read into it more like, he will find it interesting if that is the outcome, and it'll be entertaining, less so, like, will I be alive when it happens.
But who knows. There was another funny quote at the end of the presentation, where Elon kind of looked around at the very end. He's like, anyone else have anything to add? And one of the engineers goes, it's a good model, sir.
And they cut it.
Extremely online crew.
Definitely on brand.
Well, Ben Hylak, as you know, he's been on the show.
He's a designer, probably working in Figma.
All day.
Think big, think bigger, build faster.
Figma helps design and development teams
build great products together.
You can get started for free at figma.com.
And we have our first product coming out very soon
with Figma Make that Tyler has been cooking on.
I've been very excited.
He showed me, he showed me it and I was like,
oh, like someone built the thing
that we were thinking about building.
And he was like, no, like I did this.
Generated this.
This is in Figma.
And I was like, this is like an iframe of another website that already exists, because it looks exactly like what we want. But it looks so good. It looks like he worked on it for a few weeks.
No, it looked like someone else did it. It looked like a professional product that stole our idea, basically. I was like, oh, someone else got to it. That was the vibe.
Yeah, well, how has the experience been?
I don't know if you want to leak exactly
what you're working on, but.
Yeah, I don't want to talk about it too closely, but.
How many prompts did it take you to get
where you showed me?
Yeah, I mean, maybe five.
That's so crazy.
This thing is so detailed.
The design is super, it's really great. It's really good.
Yeah, the fact that it came out looking basically, like, 90% there. Yeah, yeah.
And I imagine that there's probably the last 10%, if we were really strict about, like, it's got to be on this exact style guide, that might be something where, you know, Tyler winds up spending more time finalizing and customizing stuff.
But in terms of like just getting a functional prototype out,
oh man, it was mind-blowing, it was awesome.
I'm very excited about the age of vibe coding.
This is an interesting chart from Tracy Alloway.
Been on the show.
The cost to rent an NVIDIA H100 GPU hit a new low this week
with annualized revenue at 95% utilization falling from $23,000 at the start of May to less than $19,000 today. So that's not that big of a percentage drop, but I mean, it is roughly a 20% drop.
It's a consistent trend.
It's a consistent trend.
I wonder how much of this is driven just by all of the frontier labs that are driving the most adoption moving on from the H100 to the H200. I don't know what else would be driving this, because if you only take a 20% price drop when you're a full hardware refresh behind, when they're not the latest and greatest anymore, that's not bad.
It's a pricing drop, not a utilization drop.
Yeah, annualized revenue at 95% utilization.
So this is revenue per unit.
So utilization is still very high.
It's the price that these neoclouds are able to rent them for, which is dropping.
Which tracks.
Yeah, yeah, I mean the market's more competitive than ever.
There's more Neo clouds spinning up,
and more people actually inferencing these things.
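As an aside, here's a back-of-the-envelope sketch of what that chart's metric implies for hourly pricing, assuming "annualized revenue at 95% utilization" just means hourly rate times 8,760 hours times 0.95 (our reading of the chart, not Alloway's definition):

```python
# Back-of-the-envelope: convert "annualized revenue at 95% utilization"
# into an implied hourly H100 rental rate. The metric definition is an
# assumption about how the chart is constructed.
HOURS_PER_YEAR = 365 * 24  # 8,760
UTILIZATION = 0.95

def implied_hourly_rate(annualized_revenue: float) -> float:
    """Revenue per rented hour, given annual revenue at 95% utilization."""
    return annualized_revenue / (HOURS_PER_YEAR * UTILIZATION)

start_of_may = implied_hourly_rate(23_000)  # ~$2.76/hr
today = implied_hourly_rate(19_000)         # ~$2.28/hr
drop = (23_000 - 19_000) / 23_000           # ~17.4%, close to the ~20% cited
print(f"${start_of_may:.2f}/hr -> ${today:.2f}/hr ({drop:.1%} drop)")
```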
And then, I guess this is the question of like,
how stuck will certain workloads get?
Like if you have figured out a great use case
for an LLM in your organization,
and it's something that's not one-shotting
your entire stack or whatever,
but it's just like we have data flowing through our systems
and we are going to use,
you know, LLMs are gonna interact with every PDF
that gets uploaded to our website or whatever.
And so we're inferencing a lot.
You might not need to put that on the latest hardware
or update the hardware forever.
You might just be like, yep, it's Llama 3, it works. It's on H100s, and it'll be on H100s forever.
And that piece of our business will just stay there.
Just like we have a Postgres database that works
and we're not changing it every year.
We're not changing everything.
We're just like, we're just trying to cost optimize that
and just hopefully the cost just comes down on that.
But like we've solved this particular problem,
then we'll go solve new problems with new technology.
So I think that's probably what's going on here.
But it gets to the point of, like, the biggest question with Grok, which is that the model clearly is frontier, it works. The whole fine-tuning on the actual X account is a crazy final step of system prompting, and people were joking about that.
Like, oh, they're going to fix that.
That's not what they're demoing today.
They're demoing the underlying raw model, which is clearly like just engineering focused,
as you saw in the demo,
the demo which was just like benchmarks and stats.
Turns out the secret ingredient to crushing every benchmark is to have a bunch of data from schizophrenic posters.
No, obviously not.
I actually think it's the design of the RLHF stuff
and the design of the reinforcement learning pipeline
Tyler, you got anything?
Yeah, I mean, I think just, so far, what I've seen on X, like the overall response, the vibes stuff, is that people are saying maybe it was a little too overfit on the RL, like verifiable rewards. You kind of see this, even in the demo, I think it would sometimes respond with LaTeX formatting in the answers.
Oh, sure.
Which is like, OK, that means obviously they've
trained a ton on math questions, stuff like that.
Papers and stuff.
People are saying maybe it was kind of bench-maxxed. You see it, like, 100% on AIME is kind of crazy. It's sus. It's like, you don't want to be too good.
Yeah, yeah, yeah, this is the thing about democracy.
If you win like 80% of the popular vote,
it's like okay, it was a blowout.
If you win 100% of the popular vote,
like probably not a democracy.
I don't know.
I mean, in theory these things should be able to do it,
but I'm interested to know more, if we dig into ARC-AGI, is there more stuff going on there?
Are there any secrets?
Because it does seem like kind of an outlier result.
You can see it from this Aaron Levie post. Grok 4 looks very strong.
Importantly, it has a mode where multiple agents
do the same task in parallel, then compare their work
to figure out the best answer.
In the future, the amount of intelligence you will get
will just be based on how much compute you throw at it.
I was joking with Tyler about this. The individual models are mixture-of-experts models, so there's a whole bunch of parameters, right, and the different neurons light up based on a router internal to the model. So there's kind of like the math section of the brain, the literature section of the brain. And this was one of the key breakthroughs in GPT-4, right? Mixture of experts. People think so; we're not super sure. Yeah, we still don't fully know. But that's an internal decision that happens within the model, to be like, this feels like a math question, let's go down the math path in the model.
But then, Grok 4 is doing multiple,
it's running the same model multiple times
and then comparing the results.
And so now you have-
Yeah, grading it.
Yeah, you have multiple agents
running mixture of expert models.
You have a mixture of agents running mixture-of-experts models. And the next thing is gonna be like, if you want the absolute best intelligence, you need a mixture of companies.
I send one prompt, and it goes to Grok and Claude and GPT
and Gemini and a human.
Yeah, I wonder how OpenRouter is thinking about this stuff.
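Nobody outside xAI knows exactly how that multi-agent mode is wired up, but the "same task in parallel, then compare the work" idea Levie describes maps onto a simple best-of-n pattern. A minimal sketch, where ask_model is a hypothetical stand-in for whatever chat-completions call you'd actually use, not xAI's API:

```python
# Sketch of "multiple agents do the same task in parallel, then
# compare their work" as a best-of-n loop. ask_model() is a
# hypothetical placeholder, not a real provider SDK call.
import concurrent.futures

def ask_model(prompt: str, seed: int) -> str:
    """One independent sampled attempt at the task (wire up a real API here)."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 4) -> str:
    # Fan out: n independent attempts at the same prompt.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: ask_model(prompt, s), range(n)))

    # Compare: a grader pass picks the strongest answer, rather than
    # trusting any single sample.
    ballot = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = ask_model(
        f"Task: {prompt}\n\nCandidate answers:\n{ballot}\n\n"
        "Reply with only the index of the best answer.",
        seed=n,
    )
    return candidates[int(verdict.strip())]
```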
It is funny to think about the human version of that, where you have five engineers on your team build the same feature and then kind of compare notes afterwards. Wildly inefficient. But with software, when you can do these things very quickly, the incremental cost is small, and you can have more confidence in the results.
I mean, it's basically like having a brainstorming meeting
with the whole team, and just throwing up a question
and being like, hey, we have this hard problem
that we need to solve. Here's my idea.
What do you think? What does Tyler think? What does Ben think?
So you kind of like go around the table.
Everyone kind of gives their input, their various expertise.
They kind of think through the problem in different ways,
and then you can compare answers and everyone kind of coalesces around one strategy.
This is like how work happens in the real world with a meeting.
It's kind of the same thing,
but certainly expensive to do that,
so it'll be interesting to see, like, how eager companies are to jump over to Grok. Because it seems like it's been a big lever for Microsoft to have Grok in the ecosystem as kind of a stalking horse for all the other models, because Satya wants Azure to be very model-independent, serve them all. I think they have exclusivity for ChatGPT, or the GPT APIs; they obviously have a great deal there with OpenAI. And so if they can have Grok 4 as well, that's another tool in the tool chest to be this top layer.
Satya is in such a good position. It's probably not discussed enough, just by owning those end-customer relationships and being able to vend in whatever model is hot at that moment, give people optionality, and still get 20% of OpenAI's revenue, at least for now.
Yeah, he's also SOC 2 compliant.
Of course.
And if you wanna get SOC 2 compliant,
head over to Vanta, automate compliance,
manage risk, prove trust continuously.
Vanta's trust management platform
takes the manual work out of your security
and compliance process and replaces it
with continuous automation, whether you're pursuing
your first framework or managing a complex program.
So yeah, EigenRobot was talking trash
about the production values.
I don't know about trash.
They were just noticing.
I didn't think it was that bad.
I think it's really good.
Slides are worse than I'd create after getting roped into a presentation with one hour's notice.
You can tell the engineers made them themselves.
I think this is just a reflection of the culture.
It's like a screenshot.
Yeah, very clearly
it's like screenshots dropped into a slide.
But this is a reflection.
It's like light-mode screenshots on dark-mode slides.
So like, let's do black slides.
And then you come with your white screenshots
that are kind of like misaligned
and not really evenly distributed.
Like they didn't do like the distribute evenly
or whatever, distribute horizontally.
Still gets the point across.
And I think it's a reflection of their culture.
And it shows what they care about,
what they don't care about.
They're not trying to be the most polished.
They're just trying to be the best.
Yeah.
EigenRobot did a whole live-tweet here.
Yeah, so Elon was predicting the model will discover
new physics within two years.
He said, let that sink in. Long silence.
One engineer laughs awkwardly.
Is that sooner or later than his previous timeline?
Because he was talking about AI discovering new physics soon.
I don't remember if he was saying.
Dating it.
Two years or three years or one year before.
Because this could be that he's still excited about this,
he still thinks it's possible,
but he thinks it's gonna take longer
than he said previously,
and that's kind of the more important update.
I don't remember what he said originally.
See if Grok can find out.
But he was saying this at the Grok 3 launch,
that that is the goal.
And if you can get there, you've kind of solved everything.
And Sam Altman was talking about that too.
That if you can create a super intelligence,
that's probably the first thing that you'd wanna do.
Is like, hey, go discover all the new physics
and really help us figure out how the world works
so you can solve, you know, fusion and all this other stuff.
I wanna be clear, I love all you guys at xAI, I only want the best for you, but I'm gonna continue to live-post. Elon attempts to give a speech on alignment involving a very small child, a child much smarter than you. The monologue rambles with no conclusion in sight. A pause. Yeah, will this be bad or good for humanity? He says, you know, at least if it turns out to not be good, I'd like to be alive to see it happen.
Oh, yeah, they had a Polymarket integration. That was kind of interesting.
Yeah, it's interesting, basically giving the model access to real-time Polymarket data so that it
can help make predictions and sort of add context
around the market itself.
Yeah, that's interesting.
Elon asking the real questions: you say that's a weird photo, but what is a weird photo? I still don't understand why we're looking at weird photos of xAI employees, but they were charming.
They're calling it SuperGrok. Crazy features, 16-bit microprocessors. What is, I don't even understand what this is.
Oh, they built like a game in Grok.
They had a demo of a video game generated by Super Grok.
It's a Doom clone.
Every time the PC shoots an enemy,
floating text appears reading Grokdom.
Elon is fabricating timelines for product launches
on the spot.
The engineer sitting next to him
is looking at the floor, face impassive, nodding.
It's a good model, sir.
For real though, congratulations on the launch, guys.
It's a good model, sir.
I thought this post from the actual XAI engineer,
Eric Zelikman, was funny.
It was like AI model version numbers over time.
Did you see this?
So it's this chart of the version numbers over time
and you can see that Grok is versioning fastest
because it's like at this point,
what else are we measuring?
Like at least they're iterating
on the version number effectively as opposed,
and I guess this is a shot at OpenAI
because they launched 4.5 and then went to 4.1
and they're kind of like, you know,
there's this big question about like,
when will GPT-5 come?
The expectations are so high for GPT-5.
And so they've obviously, the Grok teams are like,
hey, at least every three months
we release a new full number.
So I wonder, five is a number that really no one has gone for, and I wonder if Grok will do it first. Like, if you draw the line on this,
they certainly should do it in like three months.
They should have Grok five.
And there's no reason that they shouldn't,
but maybe there's some.
And it's very possible that Colossus is the key.
Yeah.
To get into five.
The new data center.
Oh, the new data center, yeah.
Well, they'll need Linear to plan that out. Linear is a purpose-built tool for planning and building products. Meet the system for modern software development: streamline issues, projects, and product roadmaps. Linear.app. They need Linear badly. Hopefully they've gotten signed up.
Near said, on Grok 4's Humanity's Last Exam result: I'm not sure I buy, even in the general case, that there's a given Humanity's Last Exam number which implies you discover useful new physics. How would one make a benchmark of the proper shape for this? You'd have to have a validation set of questions which are outside the scope of what we are currently able to do. You could choose things on the edge of our knowledge distribution and then try and exclude.
Yeah, it is interesting.
If you are able to memorize every hard math problem, does that allow you to discover new math?
It's sort of a prerequisite because you have to-
I think where I've imagined these discoveries coming from is having a single intelligence, a single mind, that has PhD-level intelligence across every human domain, right? And being able to combine ideas from different domains.
Like historically, a lot of innovation is just taking
something from one field, bringing it over here,
making some combination of it.
I think Elon talks about the potential
of discovering new physics, but again,
didn't spend a lot of time breaking down
how that would actually happen,
but the world is unpredictable.
So we'll see.
Yeah, it's interesting.
People are really pushing this idea of,
okay, we are accelerating.
The ARC-AGI leaderboard is accelerating. But I keep seeing this and feeling deceleration. Like, I am not feeling acceleration right now. Are you, Tyler?
Yeah, I don't know. I think generally I'm kind of not that interested in a lot of these kinds of benchmarks. ARC-AGI is more interesting, but just, like, Humanity's Last Exam, the kind of general math and physics knowledge, doesn't seem to line up. Like, you see GPT-4.5 kind of does very poorly on these things, but at writing it does really great. So if I were to go long-short on different benchmarks, like the usefulness of them, I think stuff like HLE I'm kind of short on.
I'm like, have you guys seen the Minecraft benchmark? Basically, two models build something in Minecraft. There's a prompt, it's like, build a house, and then you can choose between them, and the models are ranked.
But who's grading that? The human?
It's a human who picks between them. It's kind of like an Elo, but for general kind of creative tasks. Sure. I think stuff like that, and AidanBench, is good.
Yeah, I think even on the Grok launch they showed AidanBench. AidanBench is Aidan McLaughlin's benchmark. It's kind of hard to describe how it works exactly, but it's various creative tasks: how novel its thinking is, the style of its text. Sure.
Wait, is it just whichever one he likes the most at the end of the day? Is he the only grader?
No, no, there is an objective function. You can run it. It's not just him picking.
Okay. It will be funny.
You know, there's a period of life where your SAT score
matters a lot.
Totally.
And it says something about you.
And then a decade later, it's what you can do,
what you have done starts to matter a lot more.
And so I do think we'll reach that point where it's like,
yes, you can one shot every hard exam question there
is that you can throw at it, but like, what can you do for me?
Yeah, totally. And I think that's why the bigger question is almost, like, you know, ChatGPT DAUs, and actual revenue, and app installs and stuff. Yeah. I mean, the revenue thing is interesting, because you wind up in, like, B2B cloud world, which is valuable, but it's more competitive because it's more commoditized.
And,
well, yeah, if you don't have a lot of leverage in the enterprise, if Azure is able to offer infinite models, frontier models, open-source models that are maybe just behind the frontier but great at certain tasks, the leverage isn't quite there.
There will need to be another pretty significant leap
until then.
Anthropic being really good at code gen,
there's leverage there.
We saw this yesterday with Lama switching over
to Anthropic models internally,
and then just having a consumer app
with a lot of users, also very valuable.
Yeah, the other interesting thing about the foundation model layer commoditizing and becoming like cloud, where if you have a model you'll just be vended in as an API to anything else, like a token factory, is that the hyperscaler clouds are extremely profitable. Even though AWS, GCP, and Azure are all somewhat directly competitive, and they're somewhat perfect substitutes for each other, they have not driven prices to zero the way airlines, which are deeply unprofitable, have. AWS and Google Cloud are both profitable.
Yeah, or you look in other commodity sectors,
like oil and gas.
And I don't know if that's just because there's lock-in.
I'm not exactly sure, but there's something about where,
you know, maybe the counterintuitive take is that,
yes, they do commoditize,
and there are a few major foundation models
that are frontier, and they all are roughly the same price,
but they all have decent lock-in with their customers
to the point where they're still able
to extract some level of profit,
or they're just creating so much value
that even if they're taking a small marginal slice
on top of the cost to run,
that they're creating so much value
that they still have 50% margins or something like that.
Because this was the story of AWS.
No one knew how much money it was making,
and then they had to break out the financials
in one of Amazon's earnings reports
and it was like the AWS IPO as Ben Thompson put it.
Anyway, before we get to the next story,
let's tell you about Numeral HQ.
Sales tax on autopilot, spend less than five minutes
per month on sales tax compliance.
So the big news is that the third browser war has begun.
Google stock has dropped on the news
that OpenAI is planning to launch a Google Chrome competitor
within just weeks, and this is very interesting timing
because-
It's time to browse.
Yeah, time to browse.
Certainly makes sense to become deeper,
more deeply integrated into the user's life.
Makes a ton of sense.
There's a ton of benefits that come
from having a web browser.
What was interesting is we can go into what Google
actually launched, or what OpenAI is talking about launching,
but this news, this scoop leaked the same day
that Arvind from Perplexity announced that they're finally
releasing their next big product after launching Perplexity,
Comet, the browser that's designed to be your
thought partner and assistant for every aspect
of your digital life, work and personal.
And so Perplexity launched this on July 9th, and then the OpenAI scoop goes out via Reuters the same day. And so this feels very much like,
let's not let Perplexity get a bunch of attention and drive a bunch of people to start daily-driving Comet the browser, even though we're not ready to launch our competitor.
Aravind was on the show talking about Comet over a month ago. He said it was really important to the business. This was a big bet that they're making. Yeah.
are racing to be the first to launch, but Dia, the browser from the browser company,
also launched out of, or they're still in beta, but they launched like a month ago or
something like that. So you're not going to be the first.
Oh, they launched a month ago with the Dia browser? That's interesting because I saw
Riley Brown also posted the cursor for web browser and Dia browser. And I thought Dia
browser launched the same day,
but I guess it had launched earlier.
Yeah, so anybody that was an Arc user can download Dia today and chat with their tabs.
But interestingly enough, Perplexity's browser
and OpenAI's browser are both built
on Chromium, the same open source project that underpins
Google Chrome and Microsoft Edge.
So the cool thing here, that means that they're compatible
with existing Chrome extensions.
Oh, interesting.
OK, that's cool.
Yeah, I want to talk to more people who were active and tech
during the earlier browser wars.
The first browser war was Netscape Navigator
versus Microsoft Internet Explorer.
This is in the mid 90s, early 2000s.
Netscape was super dominant and everyone loved Netscape.
It was originally the Mosaic browser,
this is the Marc Andreessen project.
And then, but Microsoft bundled Internet Explorer
with Windows 95 and the distribution was so powerful
that Internet Explorer actually wound up winning
and became really, really dominant.
But then there was this lawsuit and it went back and forth.
But then basically by the early 2000s,
Internet Explorer had over 90% market share,
but then they got kind of lazy and stagnant apparently.
And I mean, I'm not exactly sure what happened,
but there was a lot more competition.
So Firefox, which was, I believe,
like a spin out of Netscape,
or kind of like some of the same heritage there,
began getting traction.
And then Google Chrome launched in 2008
and leapfrogged everyone.
And Google Chrome was really focused on speed.
It was the fastest browser.
And they did a whole bunch of work
to optimize JavaScript so the pages would just load faster and run better on pretty much every computer that
you had.
And so, and then they had the open source project with Chromium and so they were able
to kind of standardize the entire industry.
And so everyone's always been trying to draw analogies between like the browser wars and
the LLM wars and like what's the role of open source in that, like is open source a strategy
to wind up
maintaining your dominance?
How much does distribution matter?
Chrome was probably pretty easy to distribute
because every single person was visiting Google
just every day searching.
And so you just put this bar,
hey, wanna switch to the faster browser?
And people just do it
because you can have basically
billions of ad impressions on your product every day.
It will be interesting to see if ChatGPT can get people to download their own browser on desktop. I mean, I'm using ChatGPT on desktop in Chrome all the time.
Which ChatGPT model would you want to use as a default search engine?
That's the hard part because I always run into this problem
where it defaults to o3 Pro, but that takes 10 minutes. And so then I have to go to 4o. And then if I'm in an o3 Pro flow and I'm talking to o3 Pro and I let it cook for 10 minutes, it gave me a great answer, but then I want to just be like, okay, just clean this up a little bit, or summarize this, or do some bullet points. I want 4o to do that. So I have to switch over. So I don't know, I would imagine I'd go 4o as the default, because I want speed. But even 4o could probably be faster before it truly replaces Google.
They've spent a very long time being fast.
Yeah, and I could imagine them doing a similar project to, I believe it was the V8 JavaScript engine. They sent this team out to, I wanna say, Iceland or something. They basically sent a bunch of engineers to an offsite and they were like, just go optimize JavaScript for, like, a month, or months, and come back when it's done. Like, you have no other responsibilities than optimizing this compiler. And they came back with the V8 JavaScript engine and created this whole Node.js boom. People were running JavaScript on the server then.
And I could see Google kind of doing something similar
where they're like, okay, we have Gemini.
It's good at looking stuff up.
It's a good knowledge retrieval engine.
Go figure out how to make it load all the tokens
for the full response in 100 milliseconds.
And that would be very, very cool.
And I wonder if that's like a uniquely Google advantage.
Tyler, you looked something up?
Yeah, it was in Denmark.
Denmark, okay, I was close, I was close.
Yeah, I wasn't sure if it was Finland or Iceland, and it was Denmark.
Yeah, the interesting thing here,
I'm realizing that tabs are definitely a light lock-in to
browsers.
It's not just the default. If you have six to ten tabs that you've just had open for a really long time, and they're from a bunch of different things, and you couldn't exactly remember what they were if you had to list them all off. You know, I personally end up using tabs as somewhat of a to-do list. And so if you're spinning up a new browser and you don't have your tabs, it's like, oh,
do I want to just like get rid of my tab stack? I have a bunch of tabs that just have stayed
there for years and they're basically like, it's basically like a mini operating system,
right? With like different apps that might be a Google Sheet or something else. Yeah, I know what you mean.
So there's very real lock-in.
I could bring all of those tabs over,
but I have to then log in to a bunch of different services.
And so it's really, really hard to actually win here.
I wonder if anyone's using, you know in Google Chrome,
you can actually change the default search bar.
You know when you type in the search bar and if you just type words,
it just Google-searches it. You can change that to search ChatGPT. Yeah, you can pass in a query parameter and it can just do that. But I haven't heard
of anyone actually doing that, and I used to have,
I used to be such a power user of Chrome,
I used to have different code words basically,
so if I typed like I space
and then a query, it would go to IMDB
and search that specifically.
So you could have Chrome like route to any specific search.
Any, so you could press like Y space
and it would search Yelp or you know, anything else.
But I don't know if people are doing that with Google,
with ChatGPT. I think people mostly just, like, Control or Command-T and then hang out in ChatGPT.
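For what it's worth, the mechanism being described is Chrome's site-search templating: a search engine entry is just a URL with %s where the query goes (Settings, then Search engine, then Manage search engines and site search). A rough sketch of entries matching the examples above; treat the exact URLs, especially the ChatGPT one, as assumptions to verify:

```
Name: ChatGPT   Shortcut: c   URL: https://chatgpt.com/?q=%s
Name: IMDb      Shortcut: i   URL: https://www.imdb.com/find/?q=%s
Name: Yelp      Shortcut: y   URL: https://www.yelp.com/search?find_desc=%s
```

Typing the shortcut, then space, then a query in the address bar routes the search accordingly, and setting one of these as the default replaces Google entirely.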
Well, we'll have to ask Chris in 15 minutes to get an update on the browser wars,
because he was an early investor in.
I know one of those tabs that you have pinned right now.
What's that?
Attio.
Of course.
Customer relationship magic. Attio is the AI-native CRM that builds, scales, and grows your company to the next level. You can get started for free.
I've had Attio open for thousands of hours at this point. Yeah.
So Signal kind of breaks it down, with OpenAI launching the web browser: this is the oldest play in tech. Find product-market fit with a single killer use case, then vertically integrate and horizontally expand until you control the interface layer itself, app, platform. Once you own the interface,
you own the defaults.
Welcome to the next generation of browser wars.
Yeah, what's interesting is there,
Sam Altman at OpenAI and just the fact
that OpenAI is a company, there is kind of a mandate
to vertically and horizontally integrate,
figure out code, figure out research, figure out devices.
But every company wants to do everything,
but then sometimes they run up against barriers.
There was a time when Google was like,
we want to win social networking
and we want to beat Facebook
and we're going to launch a direct Facebook competitor.
And they did, and it didn't go well,
and then they shelved it,
and then they wound up
producing trillions of dollars in market cap
just doing the thing that they do great.
And so the question is like the surface area of OpenAI,
they have to explore, they have to experiment.
It would be stupid not to see if they could get a browser
and a device and a chip and a nuclear reactor
and everything and sand, get the sand, get everything.
But there's no guarantee that they will win
the entire vertical stack and they will be the one company.
I think my question is are these gonna be,
like is OpenAI's browser gonna be an entirely new app
other than their existing mobile app?
Or their desktop app?
Yeah, that is interesting.
Because if they have to get people to re-download a separate app, then that's an entirely different challenge, even though they have a good flywheel.
It is interesting that they wouldn't just evolve the apps
that they already have installed.
Perplexity, too.
I don't know if Perplexity is planning
to release this as a new standalone app,
or it will be in the Perplexity mobile app but.
Yeah.
Yeah, I mean, I think Comet is, like, its own thing, because we were looking to download it and we need a code, and you can't just get it if you're just on Perplexity.
But I don't know.
All I know is that you should go to fin.ai,
the number one AI agent for customer service,
number one in performance benchmarks,
number one in competitive bake-offs,
number one ranking on G2.
So Aravind breaks down his philosophy of Comet, the browser that he's dropping from Perplexity. He says: you can either keep waiting for connectors and MCP servers for bringing in context from third-party apps, or you can just download and use Comet
and let the agent take care of browsing your tabs
and pulling relevant info.
It's a much cleaner way to make agents work.
So that is interesting.
So I wonder how much puppeteering will be in this, because ChatGPT and OpenAI have Operator, which operates a Chromium front, like a headless web browser basically,
but you can actually see it working
and it's clicking things.
And so if they're, like there's also the value
of like the training data,
if you're getting people using all these websites,
you have all this training data of like,
okay, they clicked on the blue button,
they clicked on the green button, they saw this,
they entered, this is how they dealt with this form,
this is how they dealt with that form.
And so that feels like very, very valuable data
if you can get it, so it's probably worth duking it out
even if it doesn't, even if it takes a long time.
For sure.
I do wonder where else they will plug in. Like, Cluely operates at a higher level of abstraction, with the screen scraping. And I wonder if we'll hear rumbles about either Perplexity or OpenAI thinking about moving up the stack to that level. Not exactly sure.
Anyway, Dan Ives says: we believe Apple needs to acquire Perplexity for AI capabilities, likely in the $30 billion range; it would be a no-brainer deal given the treadmill AI approach in Cupertino. Perplexity would be a game changer on the AI front and a rival to ChatGPT given the scale and scope of Apple's ecosystem. So people have been talking about this for a while. It feels like there were talks, and then they kind of stalled out. And I think this would be a 375x revenue multiple.
Wow.
I mean, the product sense is good.
You use the product and like there's,
like Apple hasn't been able to deliver on the product side.
They have the distribution,
but they haven't been able to get things out.
We talked about this before though.
The most expensive acquisition Apple has ever made
was Beats by Dre for $3 billion,
which was a 3X revenue multiple.
It would be a huge shift.
And I don't know that,
I think that Apple is embarrassed right now
and feels a lot of pressure to deliver.
I don't know if they're at the point
where they would pay $30 billion just yet.
And even then it's like hard to integrate.
Or even 14 billion, or whatever their last private valuation was.
Yeah, and the big question for me was like,
perplexity is built on a lot of different clouds,
a lot of different tools, a lot of different models.
Is Apple cool with that stack?
Because if all of a sudden.
Or do they want to just go direct to Anthropic or OpenAI, which they are in conversations with?
And every once in a while, these scoops
pop up around perplexity and Apple conversations.
And it's hard to read into that.
Is this like rumor mill?
Like, what's driving that rumor mill?
Yeah, well, pull up the Mag7 chart.
I want to see where Apple and Google are sitting today.
Apple at 3.2 trillion, Google at 2.1 trillion,
and Nvidia's holding strong at 4 trillion.
Not bad.
Yeah, I mean, it's 1% of market cap. They're at 3.2 trillion; a 30 billion dollar acquisition to have an AI product that clearly has a good roadmap.
Is it that crazy?
I don't know.
Well, if you're making bets on any of the Mag-7,
do it on public.com.
Investing, for those who take it seriously,
they have multi-asset investing, industry-leading yields,
and they're trusted by millions, folks.
So in other Apple news, they're preparing to launch
the new version of the Apple Vision Pro.
They're just doing a slight iteration on the chip. They're moving to the M4 chip, and they're launching a new strap, which was something people were complaining about because of the weight; maybe it'll be better distributed. People were switching out for the Pro strap earlier.
Elon announced the America Party, or I guess it came out on Monday,
stock dropped from $312 a share
all the way down to $291 a share.
That is when Dave Portnoy, I think, market bought,
he was saying, Davey Day Trader is back,
if that's not a top signal, I don't know what it is,
but he was market buying like 10 million of Tesla, being like, I just think it's gonna go back up to where it was.
It's just been climbing since then. It's back up to $308 a share.
Almost, almost recovered. It's up
4.2 percent today, on brand for Elon, and
basically it looks like it'll just recover to the price prior to the America Party. And Dave
Portnoy.
He literally was, his thesis was like,
I think it's going to go back up to where
it was in about two weeks.
And I'm going to make 10%.
And I'm going to make a million dollars.
I mean, that was your thesis on Nvidia.
You were like, wait, it's down because of DeepSeek.
Maybe it'll go back up.
It was like the most basic analysis
and it worked perfectly.
It was fascinating.
Maybe that's broadly true.
Yeah, you can see.
Just the idea of like simple analysis
and like not necessarily needing deep insight
to call the market is good.
I don't know.
Who knows?
Anyway, this Apple story is from Mark Gurman.
Of course, the master of scoops at Bloomberg.
He's got another one.
He's on his fourth or fifth this week.
He's an animal. He's on an absolute tear.
I mean, this one is a little bit minor.
They're gonna include a faster processor
and components that can better run AI stuff.
And so, not that Apple has any crazy AI stuff
that they really want to run in there.
I don't think that that's a major differentiator.
I've been thinking about how,
is AI a key unlock for VR?
And I don't think so at all.
I think it's much more about the content
and the use case.
Entertainment.
Entertainment. I think it's a replacement for a TV
to start, and they need to just make it dead simple
to use as a TV.
I don't know, we got a demo of a VR product a while back
and it had some very cool native AI features.
So there's something there.
But Apple's product doesn't feel like it's ready
for just wearing while you're making dinner.
Yeah, so that version,
the one that significantly reduces the weight of the headset,
they're planning to launch that redesigned model for 2027,
which feels so far away.
I know it's only a year and a half,
probably the end of 2027, so maybe we're talking two years,
but in the AI race, we're really like,
AGI tomorrow, AGI next week, AGI next month.
I'm like, we can't ship a better,
we can't slim down the headset and take off the screen
and use a little lighter materials
like this month? Like, let's do it.
But hardware's hard and you know, the stuff takes time.
So good luck to them.
I'm excited for it.
I'm very excited for the next Quest.
Do you still have a Vision Pro?
I don't.
I had it for a month, I took it back
because I just wasn't using it that much.
It was like heavy and I couldn't find it.
And it had a bunch of things that like,
you had to do these crazy workarounds.
I wanted just like an HDMI cable that I could plug into it
and then just be like, okay, my PS5 is in VR now.
And I couldn't do that.
It was like, you had to like,
pull the screen into the Mac and then screen share it in.
There'd be latency, it was ridiculous.
The thing, the use case that I still see is people using it on planes.
Yeah, but I just gotta check in with Tyler. Give me your, how many times
have you thrown on the VR headset in the last week? Did you play it last night?
Break it down. Is it collecting dust? No, I've been
playing a lot of Call of Duty in the VR headset. Yeah.
Okay. It's a lot of fun. There's like no latency. I'm kind of surprised. Really? And you're doing the cloud? That's usually really slow.
You're doing the cloud. Yeah. Okay. And it's online. It's multiplayer.
Multiplayer. So you play multiplayer and you play like the latest and greatest
Call of Duty, basically. Yeah. Okay. Like Black Ops 6, I think. Cool.
So you have a controller and it's a big screen on the wall and you just chill
there. But walk me through it. Is it like 30 minutes a day?
Yeah, probably like 30, 45 minutes a day. You're fired.
This is, this is true research. So yeah, I mean, I honestly think that the Quest Xbox, the Meta
Xbox Quest or whatever, I forget the name, but I think that's more,
that's better news than like a processor bump on the Vision Pro. Just like, yeah, deeper integration so that
you can just throw it on. What's the actual time,
you know, if you want to turn it on, throw it on, press start, get playing, get into a lobby,
actually get your first kill? Is that one minute?
No, it's like 30 seconds maybe?
30 seconds, it's fast.
No, maybe like a minute.
It's not like noticeably slow or anything.
But you're logged in, you don't need passwords
or anything, it's not like a hassle.
Okay, that's cool.
And I put the screen, it's funny,
like I have a TV in my apartment,
but I just put the screen right where the TV is.
It's like the perfect spot on the couch.
It's a nice black square.
Yeah. Yeah, yeah.
I'm gonna have to get this back from you now.
This sounds amazing.
Now I don't have any time to do this, but.
But I feel like the next,
what's on the feature roadmap that you would wanna see?
Like Apple is bumping the neural engine
and trying to upgrade the chip.
I'm not sure that
that's the problem with the Vision Pro. What would you like to see out of the
Quest 4, which I guess is the next one that's coming? Yeah, I think the main thing, so
I've tried the Vision Pro, and basically, I mean, the visuals are just
like vastly superior. It looks so much better. Okay.
Than the screen in the Meta Quest 3S Xbox Edition. That's what I have. Yeah.
And the screen is just way better. Okay. I think that's,
I would say that's the main thing.
So if they can just go find the supplier,
if Meta can just go find the supplier for the Vision Pro screen and put it in
the Quest 4, you'd buy it yourself?
Depends on how much it is. I'm kind of broke. I would definitely be inclined to,
I might have to drop out and go full time. You would speed run all of Halo in order to potentially
win one? I would do that. You would do that? You would do a very difficult challenge in
order to potentially win one, because you would want it. Yeah. Okay. No, I think
that's, I would say that's definitely the main thing. Are there any other,
any other nice-to-haves that you think
might shift people? I don't know. I mean, it's very light. It's way lighter than the Vision Pro. Yeah.
But still, I mean, I feel like light is very relative. Like,
it's light to the point where you can do 30 minutes or an hour.
You probably can't do like a full day
or like four hours.
Or any kind of like workout stuff.
I think I definitely would not do that.
Totally, totally.
But what I'm saying is like-
You're not training your neck enough.
I knew guys at UCLA who, I didn't go there,
but like friends that went there,
and they were so obsessed with Call of Duty
that they would take a bunch of stimulants
when the new Call of Duty came out and play it for 24 hours straight to get the max
prestige, because they were so addicted to Call of Duty that they would just
stay up all night chugging energy drinks just to beat that. And I
just don't think you could do that in VR. I think after like two or three hours
right now, it's like too much, and you have to take it off, and you get sweaty and
tired. But so, so I feel, I feel like screen first,
then probably even a little bit lighter,
a little bit more comfortable and then just drop the price as low as possible.
Because if the next one was a hundred bucks, you'd probably buy it. Right.
Yeah. And I think stuff like, like maybe I'd want another screen,
just this monitor, but that's just an issue with the, with like the visuals.
It's a screen.
Yeah. It's got to be, it's got to be competitively priced with the TV.
And the TVs are so cheap now that you've got to just be like,
yeah, I'm just picking one up.
Or the price of AirPods, or the price of,
you know, it's got to be down in low, low hundreds of dollars
to really ramp that up.
But I don't know.
It'll be interesting.
Anyway, our first guest is here.
Let's tell you about AdQuick really quickly.
Out-of-home advertising made easy and measurable.
Say goodbye to the headaches of out-of-home advertising.
Only AdQuick combines technology, out-of-home expertise,
and data to enable efficient, seamless ad buying
across the globe.
And we will welcome Chris Paik to the show.
Welcome back, Chris.
Fantastic to have you on the show.
Thanks so much for taking the time.
Great to see you.
Last time we got cut off.
Great to be back.
In the temple.
Last time we got cut off, we were having to jump
and I was like, I wish we had another hour.
So at least we have another 30 minutes here.
Yeah, that's great.
To get into it.
First off, what's top of mind for you?
Have you been tracking anything in the news
that's kind of updated your thinking?
We were digging into Grok 4 and seeing,
is this an update to, you know, agent timelines?
It seems pretty great on the benchmarks,
but is there anything else in the last week that's been like, oh,
I can't get enough of this story, just in your world?
Great question. I feel like every week is a total blur. Uh,
it seems like we're all waiting for not just these foundation models to come out, but like
the next open source models, the big open source models to come out.
I think that that's super interesting to me.
The proprietary foundation models obviously are the frontier of research, but they're
relatively inaccessible from a technology perspective because they're
fundamentally rent-seeking. You can't run them on your own hardware.
It's significantly less accessible.
And so I'm kind of waiting for the next generation of open source models.
Yeah, one maybe
underrated or under-analyzed Grok 4 thing that happened last night:
I don't know if either of you saw this, but they did this voice demo, and they were like really pushing the
accents really far. Tyler, did you see this?
Yeah, and there's the whispering, the whispering, and so, ASMR. Did you think it was uncanny valley, Tyler?
I felt very uncomfortable. Yeah. But at the same time, I think,
I think it's a path where we're in the uncanny valley,
just like we were with like six finger hands and stuff.
And when they actually sort out the accents, the whispering, the intonation,
the cadence, it's going to become a much more addictive companion
potentially. So I want to,
I want to bridge to your piece and talk about
the different use cases that you see people might
kind of flow into with these like chat companions
because you mapped out way more than just
the normal take in my opinion.
Yeah, well, so I guess it's worth asking ourselves, where do we want other humans to exist?
And where will we accept substitutes? I think the last time we were talking about it,
if you can imagine a situation where humans are getting in your way of doing something,
then you hate them, right?
Imagine you're in traffic, in gridlock traffic.
That's the most misanthropic you could possibly be.
You're like, if none of you existed, I could just get where I wanted to go without you
being here. But then there are just total other times where we would refuse to accept anything other than humans as that thing.
I know that it's very popular to dismiss the value from AI companions, to say that that value isn't real.
But at the same time, I think that when it comes to, let's call it the allocation of leisure hours, we really care about other people.
Whether it's like, I think last time I was talking about like going to fine dining or
reality TV.
I think I mentioned that like we have chess software that is way better than any human
will ever be.
And it's not entertaining to us.
Cause that's sort of like a, there's no drama.
There's no emotion.
Exactly, exactly. And so, um,
sorry, I'm just thinking about like the shape of companionship, because in your most recent piece,
you call out the imaginary friends that kids have,
Calvin and Hobbes, Toy Story, these stories resonate
because they poignantly depict how colorful
whimsical placeholders of our childhood slowly fade
as society offers real alternatives.
And I'm just thinking about like kids love imaginary friends,
but they also love multiple IPs essentially.
Like they like Batman and then they also like Spider-Man.
And so I'm wondering, there's been this narrative
for a few years in AI of like don't build a GPT wrapper
cause you're gonna get rolled.
There's gonna be immense concentration of value,
and there will actually be no middle class in this ecosystem.
And I'm wondering, because there's this company, Tolan,
that's kind of imaginary friend, AI driven,
and I'm wondering how we might actually,
is it possible that we're heading towards something
where people are essentially developing new IP,
and yes, there's still a power law
in the companionship market,
but these models and these products
are like much more opinionated
to the point where there actually isn't a one product
to rule them all.
And there's a variety of products that fit into different niches, not just for
companionship broadly; even just within like the imaginary friend niche, there's
25 different options, and yeah, there's one that's popular, but then there's one
that's one tenth as popular and one hundredth as popular. But yeah, react
to that.
Yeah, it's really interesting. Let me first
go back to this wrapper concept. I think it's really
important to distinguish when wrapper strategies work and when
they don't work. I would argue that the wrapper strategy works
the best when the underlying infrastructure is purely commoditized.
You can choose across many different options. Where the wrapper strategy doesn't work is if
you're basically building on top of a monopoly and that underlying landlord is basically just
going to be increasingly rent-seeking and squeeze you out of all margin.
So sort of embedded in the wrapper strategy is the assumption that over time, you're going
to be able to distribute your product on top of like increasingly commoditized infrastructure.
So for example, Snowflake. Snowflake launched actually just on Amazon.
And then over time distributed its product
across Azure and Google.
And actually, in doing so, was able to expand the margin
capture of its own product, because
its vendors were competing
to be its underlying infrastructure.
Going back to like this open source notion,
this is actually why I am so interested in open source.
The more that we have competitive fungible models,
the more that the application layer on top
can really flourish.
The more that we'll see distinct unique applications
and kind of until we start to maybe S-curve
near the top of the frontier.
I'm sure you guys are familiar with the,
it's a really popular essay, The Bitter Lesson,
which is basically the, you know,
no amount of fine tuning or no amount of specific training
is actually going to outcompete just the fundamental
advances when it comes to like more gains with more compute.
But if we start S-curving, if these scaling laws break, which it seems like they are
breaking on pre-training and test time compute and things like that, all of a sudden you have
largely fungible, similar capabilities at that commoditized layer.
And then we can really start to see the application layer flourish.
Yeah. My interpretation of the bitter lesson right now is that
the impact of AI should be tracked less in benchmarks and less in individual
tests of one model, and more in the actual
volume of inference tokens being generated by humanity. And it's fine
that we don't have one central AI doing all of the work. If we just give everyone an AI
copilot for every single task, they all get better, and we'll build more data centers
to inference more.
And eventually that will compound and compound and compound until the
overall impact of AI is remarkable and unmistakable in the same way the internet's
has been, but it won't be this, like, all of a sudden we unlock this one incredible algorithm.
Yeah, maybe.
So I'm going to try and map a comparison that's probably wrong for any number
of reasons.
Let's say we ported the bitter lesson to the rollout of PCs, right? I think there's been a lot of comparisons of like, AI feels like a new computer.
So let's map the bitter lesson to PCs. Maybe the analog would be, hey, don't work
on building software that optimizes for a computer speed of like 50 megahertz, because
the computer that comes out that's like 100 megahertz or 200 megahertz or, you know,
one gigahertz is just going to blow out whatever software optimization you've achieved at that compute level.
Um, and so I think about it a lot like that.
Now, I think this really begs the question of, okay, well, what happens
when the vast majority of our use cases are satisfied by the compute threshold.
You know, I feel like, you know, everybody's like,
who needs the next-gen version of this,
because largely all of the applications
that you use or want to run just work.
And so it's very clear that we're in this like rising part of the S-curve,
but when the S-curve starts to taper off, that's really when it comes in the question of, okay, well,
how do we think about the value delivery that sits on top of the underlying rent capture
from these foundational models? You could think about it also, like video game consoles, the amount of creativity that
video game developers are limited by is actually how advanced the video game consoles are,
and also how expand, like what the install base of the video game consoles are and also how, like how expand, like what the install base of the video game consoles are. And so I think one of the, one of the challenges right now
is like, we have so few developers of AI applications. Like they're, they're still like, you can
kind of like count them on maybe a few sets of hands, right? Which is insane.
Wait, really? I feel like there's like thousands of startups that count in like the AI
software developer world. I see market maps every single day.
Um, I'm sorry, uh,
there's like 10 in every B2B category.
Well, okay. So maybe,
let's split the world between like enterprise use cases and consumer use cases.
Oh, sure.
So enterprise, I would, yes, a hundred, a thousand percent.
There's a lot of companies there. It's a blue ocean sprint to vertical value delivery
within different sectors, because you're largely swapping out headcount. The addressable spend is OpEx, which is insane.
Like, that's crazy.
Your revenue opportunity is just like headcount spend.
Um, and so that's for sure.
And also legacy tool spend. Like, it's all the OpEx, to your point. I mean,
I guess not like real estate or rent or something, but basically everything else.
Yes, which is by far the biggest cost center for any company
Yeah, of course anyone who's ever run payroll or anyone who's ever scaled a company is like man like humans are expensive
Yeah, humans are so expensive
And I mean,
this is why I feel like everybody says that AI is the best candidate,
or the best argument, that UBI is coming.
Sure, sure.
Yeah, Jordy?
I think the picks and shovels meme became too dominant.
Over the last decade or so,
there were just so many amazing outcomes of people building infrastructure, and like very visible
outcomes, and it became cool to build infrastructure, right? Like the Collison
brothers made building infra cool, and Parker Conrad is like a
folk hero, sure, right, with Rippling. And Ramp is a good example of this.
Corporate spend management should not be cool.
They've built a really cool culture around it.
And there's this weird kind of pervasive meme of like,
you have two years to escape the permanent underclass.
And so I think people are like well
I'm not gonna just build something weird and fun
I'm gonna build enterprise SaaS so I can, you know, so I can escape the permanent underclass.
And so I don't think there's been enough
weird, fun
attempts from people. Like Tolan, the one you brought up earlier, is cool, and it's pretty rare. It's not the most rational thing to say,
you know, if you just want to build a big business, to be like,
I'm going to build a little alien AI friend.
But clearly, there's demand for that.
And we were talking with Scott Belsky yesterday
around just wanting new, fun, weird consumer use cases.
And I feel like, I think what you're getting at is,
that whole area is like relatively
underexplored to date. We've had, we had two browser
announcements yesterday, and they're both built on Chromium, and
that's exciting and cool, and we should talk about the
potential for new browser wars. But I think the number of people that are saying,
I'm actually, yeah, you could call it a wrapper,
but I'm actually trying to create something entirely
novel.
The example we gave yesterday is a dating app
based on a digital twin that is just constantly dating
other digital twins.
And I haven't seen, I'm sure somebody's working on that,
I haven't seen it yet.
But pick any popular consumer app category,
and there's probably a way to entirely rethink it
with this sort of LLM as a new computer
at the core of that ecosystem.
Consumer just seems so much more risky,
because it's either a billion dollar outcome or zero,
whereas in enterprise it feels like,
well, there's no way it's gonna be a zero.
It's gonna be a $10 million outcome
or a $100 million outcome or a billion dollar outcome,
but it's not gonna be a swing for the fences.
So then you have 10 companies that each have 10 to 50 million
doing the same thing.
And we've even seen this with the story of Slack,
where they were trying to go for the hits driven business
with the game, didn't work, and then they went into SaaS
and it worked.
Anyway, sorry, Chris.
I think that an additional challenge with consumer is
inference is not free right now.
Somebody has to pay for the inference bill.
And so until you can run inference on device,
it's still like every developer is doing the mental math
in their head of like, how do I,
like if any one of your users can run you out of house
and home if they abuse your service,
how do you build on top of that?
That's crazy.
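(To make that worry concrete, here's a minimal sketch of per-user inference cost; every number is a made-up assumption, not any provider's actual pricing:)

```python
# Hypothetical per-user inference cost vs a flat subscription.
PRICE_PER_1M_TOKENS = 10.0   # $/1M tokens -- assumed, not a real price sheet
TOKENS_PER_REQUEST = 2_000   # assumed average request size
SUBSCRIPTION = 20.0          # $/month flat plan -- assumed

def monthly_cost(requests_per_day: int) -> float:
    """Inference cost for one user over a 30-day month."""
    tokens = requests_per_day * TOKENS_PER_REQUEST * 30
    return tokens / 1e6 * PRICE_PER_1M_TOKENS

for rpd in (10, 100, 1_000):
    print(f"{rpd:>5} req/day -> ${monthly_cost(rpd):,.0f}/mo vs ${SUBSCRIPTION:.0f} plan")
# Under these assumptions, a 1,000 req/day power user costs ~$600/mo
# against a $20 plan -- the "run you out of house and home" case.
```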
One of the OpenAI researchers, I think, in Zurich
that's going to Meta posted, like,
I didn't realize that I had this thing running.
It was like $150 a day, just like every single day.
Luckily, I think he'll be fine.
He can afford it.
Before we fully leave this, what do you think?
Do you think that AI companions and LLMs broadly
present a real threat to traditional social media,
like the idea of a companion?
A lot of people in our world are using these tools
very functionally, like for doing research,
or getting answers, or understanding topics.
But a lot of people are using them as companions
and that is somewhat of a social entertainment experience.
And you see these charts ticking up
of kind of user minutes in LLMs.
So, ChatGPT went from about five user minutes per day
to over 30.
There's no... Around 30. Around 30, I think it's 28, 29,
over like the last six months.
There's no blip on any of the other social networks.
They're not declining yet.
But personally, I'm finding that if I'm doing research
on a topic, I used to go to YouTube,
I used to go to Instagram, and people famously
like search TikTok for answers to things, and some of that is shifting over. And a sort of
anecdotal, you know,
sort of experience for me is, when do I use social media
the least? It's when I'm with my family, which is like companionship,
it's the social time, or when I'm with friends at dinner, hanging out. It's like rude to be using,
there's no point to go hang out with a friend
and then use Instagram the whole time, right?
Totally.
I think I subscribe to this sort of cutting of our time
as like we're either allocating labor hours
or we're allocating leisure hours. So we're either
trying to be productive or we're trying to enjoy ourselves. And so I would say that all leisure
allocation effectively competes against each other. I think maybe it was like the Netflix CEO
that said that they weren't competing with HBO,
they were competing with Fortnite.
That's largely true.
We only have 24 hours in a day,
we only have a finite amount of leisure hours.
So if I'm allocating one hour to this leisure activity,
if I'm watching an episode of Love Island or whatever,
that's time that I'm not going to be able to allocate to a
different kind of leisure activity.
So to that end, I would absolutely agree that AI companions are almost certainly firmly in the
bucket of leisure; more consumption of that leisure equals less consumption of other leisure activities.
It's really zero sum.
The only thing that is going to make it non-zero sum is a fundamental advance on productivity
that allows the leisure pie to be even larger.
So, maybe we have, we definitely have more leisure hours as humanity now than we've ever
had in the history of humanity.
Let's give it up for the leisure hours.
Like proto-humans, zero leisure hours. Yeah, yeah, fun time.
Now it's just like amazing, amazing leisure hours.
When it comes to labor hour allocation, or like research or utility, I would say that doesn't necessarily encroach
on time on social media. And I think that all social media, whether it's YouTube or TikTok or
Twitter, they care less about helping people get work done, and they care much, much more about absorbing
as much attention as possible, which is why the algorithms are so insidious at serving
you exactly the kind of saccharine thing that you want to consume next.
I wonder if there will be an incentive.
I mean, there will obviously be an incentive, but I wonder how it will play out in the LLM chatbot interface, because right now OpenAI
and basically anyone who has a dominant consumer AI app is probably seeing user minutes
increase just naturally without putting in like growth hacks or retention loops or, you know,
but you could imagine a world where
to get from 30 minutes to 60 minutes,
the LLM has to not just give you the response,
but surface, hey, would you like to follow up
and learn more about this?
Click these buttons.
That's already kind of happening.
Yeah, it starts surfacing you stories
that it knows you're interested in
by what you've asked it about in the past.
Let's give you a new breakdown,
kind of pre-populating a deep research report.
It is funny that the push notification
hasn't really quite hit consumer chat apps.
Yeah.
And it undoubtedly will.
I had this thesis that push would be very important
versus like pull, like you have to go to ChatGPT and ask it for something,
and someone is going to solve kind of like,
it's almost like an AI driven newsletter or something
where it understands what you're interested in
and then generates the report before you can even ask it
because it knows that if Ferrari drops a new car,
I'm gonna want a table of all of the details
because I like consuming information that way,
in addition to just hearing commentary and watching
the Doug DiMiro video about it.
But OpenAI could pre-populate that and just send that to me.
But I don't know.
How do you think that's going to evolve?
Question for you guys, do you think we're going to pay for AI
services forever?
I've been asking that a lot.
I mean, I think the more important question
is will the average American pay for an AI, like an LLM?
And then will they pay for multiple?
Like, the comp for this is in streaming, where Americans...
I was about to say Netflix.
A lot of Americans pay for multiple streaming services,
but they are incredibly ruthless about canceling them
on average.
Like a lot of, I'm sure a lot of people listening to this
have like had some streaming service billing them monthly
for years that they haven't even watched.
But the average American is like,
I'm not getting a lot of value out of HBO right now.
I'm going to cancel, even though they might sign up again
in like four months.
Next time there's a hit show that they're going to watch.
And I think that there's not a, right now,
there's this incredible demand for what's new and what's best.
Right?
Like, Grok will drive a lot of signups today
because it is, the Grok 4 Heavy is like
a meaningful advancement, but I went and I haven't,
we've been busy this morning, I haven't had a chance
to sign up and play around with it yet,
and I was getting plenty of value from Grok 3.
Like, I was able to just search and get it.
Yeah, I was asking Grok 3 about Grok 4 as well,
and it was actually doing a pretty good job,
which is funny.
And so, yeah, and so I'm not,
I'm not, I guess I'd probably get it through X Premium, but.
So I have some data here.
Netflix made 39 billion last year.
They're on track for 44 billion this year.
1.8 billion of that was ad revenue last year.
It's estimated to be around four billion this year.
So their ad revenue's doubling
while their subscription revenue is growing by 5%.
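(Running the quoted figures back; these are the on-air numbers, and the implied subscription growth comes out in the mid-single digits, roughly in line with the claim:)

```python
# Back-of-envelope on the Netflix figures quoted above (approximate).
total_last, total_this = 39e9, 44e9
ads_last, ads_this = 1.8e9, 4e9

subs_last = total_last - ads_last  # ~$37.2B
subs_this = total_this - ads_this  # ~$40B

print(f"Ad revenue growth:   {ads_this / ads_last - 1:.0%}")   # ~122%, roughly doubling
print(f"Subscription growth: {subs_this / subs_last - 1:.1%}") # ~7.5%, mid-single digits
```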
And so my takeaway for ChatGPT would be,
I would imagine that the ChatGPT paid subscriptions
follow an S-curve, and we get to something where we see OpenAI making,
I don't know if it'll be 10 billion or 40 billion,
but they will soak up a ton of subscription demand
for ad-free frontier models, the most advanced,
the most expensive stuff, and then ads will eventually
become the dominant revenue driver,
but I feel like the subscription revenue
will be a really hard tap to turn off just from, hey, people are paying and it's a lot
of money and we don't want it to go away.
Even in the enterprise, it does. My bet is that we continue down this trend towards
paying for outcomes, because a lot of people will just say, well, I don't want a subscription
for this service because I only use it every now and then.
And when I get value from it, I'm happy to pay for it.
I think a question to ask is,
would you pay $20 a month for Instagram today
to not have ads?
And I would actually have to think about that for a while
because like once a month, I get an ad for something
that looks interesting and I discover a product
that I wouldn't have otherwise discovered, and I buy it, and sometimes it's great. Yeah.
So do I want to just completely eliminate that and rely entirely on random organic discovery? Or do I actually like that
this ad platform is spending a bunch of time and energy trying to serve me the next product that I'm gonna like?
Which is actually a service, and it's not a bad trade at all.
Totally. I think a lot
of the way that we vote with our feet, and with our wallets, is that
way. We're super happy to pay with our time and our attention if given the
option. Yeah, the vast majority of people won't pay. Not only that, but people are more than happy
to give up their attention, and privacy for that matter,
if it can save them money,
instead of paying for something.
One thought experiment I like to think about is
just how much people will trade privacy for value.
Imagine a checkout flow where you could get $5 off
if you enter your social security number.
Like how many people do you think would enter
their social security number?
Like everyone or like 90% of people.
Like an insane amount of people.
And so people really don't value their privacy
as much as I think like maybe we say
people value their privacy.
And in aggregate, obviously that data is super monetizable.
It's interesting, you know,
search obviously is the big prize, I think, for AI.
And what I mean by search, it's intent.
If you're the arbiter, or you control this fire hose of intent, you can benefit by metering
it out and having people bid for that intent.
Obviously Google, Google's like the best business,
the best business model maybe ever invented.
It's kind of insane.
What's interesting is the most valuable searches,
maybe like not what people think
and certain kinds of searches are totally worthless.
So, knowledge-based search, like fact-based search,
things like...
What is the market cap of this company?
Right, right.
It's like, what's the market cap of this?
You know, who won this game?
Or like, you know, who was president in 1936?
Yeah, they're dead ends.
They're dead ends.
You get the fact, then you leave.
No value.
Zero value.
Actually, there was a Google antitrust subpoena
that had actually surfaced some documentation
of what their most profitable keyword searches were.
It's super interesting.
I suggest people go check it out.
The number one most profitable search for Google
was just the word iPhone.
No way.
That's amazing.
Because if you think about it, like what does that mean?
What is somebody telegraphing to the market
when they search the word iPhone?
They're like, they're basically saying, in not so many words, I
am ready to spend $1,600 on a smartphone. Yep. And who's
interested in, like, jockeying to get that person's
attention? Well, Apple has to, right? Best Buy as a retailer is
interested. Samsung wants to.
Then Verizon, AT&T, T-Mobile, the networks.
And so when you think about, like, where is valuable search?
The value of search is often misunderstood,
because you have to really think about
how to capture the most valuable intent.
And not all intent is equally valuable.
There's a bunch of search that's garbage.
You actually don't want it.
The cream off the top of fact-based search would be what I would
call comparison shopping.
So it's like, what's the best headphone?
What are the best headphones?
And maybe you can slice off the top of that revenue pie. But there's
a tension between that and the objective truth that you're serving up to
the user. By far the most valuable search is not fact-based,
and it's relatively, it's where the user
kind of knows exactly what they want
and they're trying to do it
and other people are willing to bid
to get in their way and run interference.
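(A purely illustrative way to formalize that point; the simple expected-value model and every number in it are assumptions, not Google's actual auction:)

```python
# Expected ad revenue per query ~= P(ad click) * cost per click.
# Made-up numbers, chosen only to show why commercial intent
# dwarfs dead-end fact lookups.
queries = {
    # query             (P(ad click), avg CPC $)
    "iphone":            (0.30, 3.00),   # high purchase intent
    "best headphones":   (0.15, 1.50),   # comparison shopping
    "who won the game":  (0.001, 0.10),  # dead-end fact lookup
}

for q, (p_click, cpc) in queries.items():
    print(f"{q:<18} -> ${p_click * cpc:.4f} per query")
# ~$0.90/query for "iphone" vs ~$0.0001 for the fact lookup here.
```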
While we have you, how are you thinking about the new,
I wouldn't call it a browser war yet,
but a potential skirmish heating up? Dia
released to Arc users, it probably feels like maybe a month ago at this point, at least a few weeks.
And then we have OpenAI
potentially coming in with a new standalone app, sort of unclear.
It was unclear to us whether this is gonna be a new app that you download or just integrated
into the existing ChatGPT mobile app.
And then Perplexity as well does have a separate
standalone app that they're pushing now.
And it feels interesting, one, because the browser company
has spent now years working on the browser, trying to figure out
what is going to enable unlocking more value out
of this portal to the web and effectively an operating
system.
And so meanwhile, new players are basically
being like, we want to have a browser,
and just going, we're shipping.
We just got to get this out there.
So I think it'll be interesting.
The next month, two months, I think
will be very interesting.
But I'm curious what you're looking at.
I could not be more excited.
I think I'm obviously biased.
We're investors in the browser company.
I'm a daily user of DIA.
I personally get a ton of value from it,
particularly the custom skills.
And I think that the browser company
has always known that this is a really valuable position.
And it's like, honestly, just validating
to see incredible companies, you
know, OpenAI is a great company, Perplexity is a really amazing, formidable company, also
recognizing that this is a really valuable position to play for. I have supreme confidence
that the team at the browser company is the most talented, has the best
instincts, the best nuanced understanding of interaction design and how to create
and craft a great product, regardless of the underlying model or technology that
underpins it. I'll be very curious to see how it plays out. My instinct is that OpenAI is an incredible foundational model company.
But I've seen them ship a lot of different products.
To my knowledge, ChatGPT is really the only product that's quite stuck.
And it's not really even like the interface design so much as it is the underlying power
of the thing.
To answer your question, I could not be more excited. Thanks. Like, this is going to be amazing.
I mean, check back in a few weeks. I'm sure there'll be a lot more.
It's going to be huge. Plug for Dia:
if you haven't downloaded Dia, it's on Mac.
It's available. Please download it.
It'll blow your mind. It
blew my mind. All right, great talking as always. Wish we had more time. Before
bringing on Will Bruey from Varda, let me tell you about Wander. Find your happy
place. Find your happy place. Book a Wander with inspiring views, hotel-grade
amenities, dreamy beds, top-tier cleaning, and 24/7 concierge service.
It's a vacation home, but better, folks. And soon, with Varda, maybe they'll put a Wander in space.
Let's bring in Will Bruey from Varda. Is this your first time on the show?
I feel like this is a disaster that we are finally rectifying. We did it.
We made it. Thanks for having me. Finally. We've had the other, the kind of knockoff version of you at Varda, the
other guy. He's been on the show a ton, I think. Yeah, yeah.
Great to finally have you on. Amazing. Massive day. Break it down for us. What's
the news? Are we gonna make Jordy stand up?
Yeah, oh, you have the gong? That's both of us. We have the gong too. So we have the gong for big moments,
you know, either fantastic landings or, you know,
selling a mission to a customer, stuff like that. And yeah, we love that. Yeah, what's the record? Here's my advice to you: record every hit. I want to see a montage in ten years.
That's good. Every hit, and it will bring tears to your eyes. We record every hit we do.
We got you, we got you on this one. So yeah, what's the news today? Break it down.
So, wow, lots of news today. So we're announcing our Series C, and we're gonna
use that... So yeah, go for it, baby. You earned it, baby. How much did you raise? How much? Tell us,
how much did you raise? 187 million. There we go. Congratulations. There we go. Thank
you. I appreciate it. Appreciate it. Yeah. So the proceeds for this one, really, it's
about just scaling up. So we've kind of shown what we can do, both from a spacecraft perspective and a drug formulation development perspective.
So a lot of the capital allocation of this one is going to go to our biologics lab, for
preparing drugs for spaceflight, and then also just more spaceflight, ramping up cadence. That means, yeah, more
flights. So, I've been to the facility in El Segundo.
Are you going to get a bigger space, a second space for the bio lab, or
are you going to need a bigger
gong?
Right. Well, a gong per facility. We got to scale that up too.
So, so are you thinking about doing a second office, essentially,
or how do you see the actual, like, footprint of Varda growing over the next few years? Yeah. Well, immediately, we just signed a lease
down the street. Oh, congratulations. Oh, thank you. Thank you. Yeah.
All right. Big day. Yeah. That's amazing. Yeah.
So we've actually already moved in; a bunch of the pharmaceutical equipment
is already in there. We're starting to use it right now.
We got us a couple of glory shots, you know,
with folks with the lab coats on actually using it.
So that's super exciting.
Long-term, I mean, really, I guess zooming out
of what the footprint will look like is,
think about a formulation development company
that really just provides a gravity off switch
to the pharmaceutical industry.
So we go to space, but you know,
not really because we want to per se,
but because you can create new drug formulations
when you turn off gravity and you just can't turn it off
on earth, that's Einstein's principle of equivalence.
Do you mean new drug formulations
or just like purer drug formulations?
Because when I think no gravity,
I think like the way crystals form
and the way gravity pulls things to one direction,
if that doesn't happen in space,
you get just kind of like a more natural growth.
And so I've always thought it was just about purity,
but it sounds like there's actually some binary,
like you can't make this drug on earth at all.
Is that right?
Both those concepts are correct.
Wow, well read there, Coogan.
Yeah, that's actually a great way to think about it.
Purity is one aspect, but because gravity is so broad, I use the analogy of temperature sometimes
because temperature is so broad.
Making things cold doesn't necessarily make drugs better,
per se, but you can create a lot of different formulations
if you can have a cold cycle
during the manufacturing process.
And that'll be, even with chilling things,
you can make things more pure sometimes as well, right so
But to your point when you turn off gravity
Crystals will typically grow slower and that also means that they will grow more pure
And so that is one of a few applications that we look at the other one is particle size distribution
So when you create these crystals that will then go into the human body,
you want them all to be the exact same size
so that one big one doesn't get stuck in your elbow
and so that you have uniform bioavailability.
So the crystals will, particle size distribution
is also affected by gravity.
So that's a whole separate thing compared to purity,
which is another rationale for going to space.
So really the gravity knob is very broad
and there's kind of these verticals of science
of how we can improve the drug formulation.
And to your point again, as far as like what I mean
by drug formulation is going from molecule to medicine.
So is it a pill?
Is it inhalable?
Is it an IV bag?
Is it a shot?
The drug company, or a pharmaceutical company,
does a trade study to determine which of those
is the best for the patient given the disease,
given the manufacturing costs.
But ultimately all of that is limited
to what the chemistry can actually do, right?
Nobody wants to take a needle to the arm.
They only do it because they can't deliver that molecule
via a pill or something like that.
And so by opening up the chemistry outcomes
by going to microgravity,
we can also open up the formulation outcomes
and therefore get better patient experiences.
Yeah, so I mean, I imagine that this is still,
this is such an ambitious project
that it's still kind of in the R&D phase with a lot of the bio stuff.
I mean, when I think about the manufacturing
capacity of, like, GLP-1s,
they're probably making that thing in something the size of, like, you know, what they brew
Bud Light in at this point. Yeah.
But walk me through how we scale this up. I
understand launch costs falling. I understand you put up a capsule, you're doing it like every quarter now,
it's gonna be every month,
then it'll be multiple times per day.
Like, that capability seems clear,
but how much drug can you make on a single capsule?
Yeah, yeah, great question.
So this is actually a lot of fun,
because we can imagine how Varda will go
from what's real today to making tomorrow's reality.
And so to answer your question immediately, about 20 kilograms on a, on a per capsule
basis right now today. Of course, you know, we want to scale up everything and that's
one of them, but that's how much we can do today, which is actually quite a lot.
Yeah, that seems significant. If you just think about, like, you go to the doctor and
the doctor gives you, you know, a thing of pills, that's like not one kilo. So we're probably talking about, I mean, unless you're having a lot more fun than what you're being prescribed,
but for most drugs,
I feel like 20 kilos is probably enough for like a hundred people for a year or something like that.
So you're actually, I'm seeing the numbers kind of start to math out already. Correct, correct.
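(Checking the hosts' guess against the one number Will gave; only the 20 kg payload is from the conversation, the rest is the hosts' rough scenario:)

```python
# Does 20 kg per capsule plausibly cover ~100 patients for a year?
payload_kg = 20            # per-capsule capacity stated above
patients, days = 100, 365  # the hosts' rough scenario (assumption)

dose_g = payload_kg * 1000 / (patients * days)
print(f"Implied dose: {dose_g:.2f} g/day per patient")  # ~0.55 g/day
# Many small-molecule doses are in the mg-to-low-gram range per day,
# so the back-of-envelope holds for a lot of drugs.
```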
So it's a, and every drug is a little bit different.
So we go through a process of selecting which drugs make the most sense, both from a scale
perspective, like you're saying, but also unit economics, how gravity affects them.
And so we have a portfolio management team that explicitly does that for identifying
and quantifying opportunities.
But going back to like what today looks like and how it goes tomorrow,
I love the temperature analogy because it really runs deep. So for example, right now, if you think
of us having a anti-gravity oven where we can make drug formulations that you can't otherwise make
on earth, but we only get to run it four times a year and each one is a few million bucks a run,
you might use it for different use cases
than you would five years, 10 years from now, when you can run it every day
for a few thousand dollars.
And so in the near term, some of the use cases, you know, imagine yourself with the, you know,
the first refrigerator, or, in this case, the first anti-gravity bioreactor.
What you might use it for in the near term is just information,
right?
How can we isolate gravity as a variable to inform what formulations can be improved on
Earth?
Is gravity ruining this chemical reaction or not?
We can answer those questions and then that applies to the entire drug with just one flight.
Or in the very near term, we also want to do polymorph seed crystals.
And so what that means is we go to microgravity, we go to space, but just to develop the seed
crystals and then once those seed crystals are developed, we can then use them to grow
more drug crystals on the ground.
So we're only going to the nucleation event.
And that's kind of like a sourdough bread mother business model, right?
You have the mother of the sourdough bread, then you can cut it a bunch,
and then regrow it and stuff like that.
So that makes a lot of sense when we're still scaling up
the use of our anti-gravity machine, if you will.
And long-term, when we're on a daily basis,
then it totally makes sense to make every single dose,
manufacture every single dose in microgravity.
And then that's when certain use cases come online as well.
So that's how it progresses over time. Yeah. How's the geopolitical landscape
evolving for you? We saw some of the flights come down in Australia. As red-blooded Americans,
it pained me to see them take a slice of the catch. Are we getting these coming down in America anytime soon?
What's the progress there? Yeah, yeah, absolutely. So long-term,
we want to have reentry sites all over the globe, right? And,
and really that's about availability.
And the key metric to success of Varta is cadence.
How often can we go up and back?
Because the more we do that,
then the more we just look like a specialized piece of equipment to the pharmaceutical industry
that quite frankly does not care that we're going to space.
They'd much rather us have a real anti-gravity oven
in the lab.
So really reentry sites are about cadence and availability.
And right now Australia is great for us
because they have a private commercial reentry range.
Whereas the ranges in the 48 states here locally are
intended for military use exclusively.
And so if we're doing a DOD mission, that works well.
But if we're doing a commercial mission, we're not the highest priority, understandably so,
right?
And so in the near term, Australia makes the most sense,
but in the long term we want reentry
sites all over the place and why
that gets enabled is because as our
precision of landing and our cadence
goes up that data that legacy history
allows us to use a smaller and smaller
plot of land. And that's
where it really makes more sense to go
anywhere, because we don't need such wide open spaces
without that many people, like we do right now
today.
Do we have the legal infrastructure to create commercial landing sites in the United States
and it's just that nobody's done it yet or is there laws or regulations that would need
to change so that some enterprising young member of the Gundo could go buy a lot of
land out in the middle of nowhere and start landing spacecraft?
You can do it now. The constraint is the real estate cost. And so Spaceport America, for
example, right next to White Sands Missile Range, is a good example of that. So I guess
the real reason why there aren't that many of them is because there wasn't demand large enough to warrant such a real estate purchase. But, you know, thanks to Varda, that could change. So yeah, definitely let the Gundo know.
I have a question from a fan of yours, fan of the show. He says, ask him what big dogs gotta do.
So it's become a little bit of
an expression of excitement, with a long
history and a little bit of lore at Varda. But for some reason,
you know, it's really just a specific instance of conservation of mass, right? You can't have a big dog without eating, right? So that's just physics right there.
Yep. I want to talk about the evolution of the FAA. I remember I was filming a video,
I feel like I filmed with you guys, and at one point, I actually was driving back with Ben
from San Diego, and I filmed a phone call with Delian. And he's like, we just got, I don't
even know if I should say this,
but like, it was like, we got some bad news from the FAA.
You guys sorted it out.
It seems like you have a great relationship now.
How did that happen?
Is this a lobbying thing?
Is this just storytelling?
Is this structuring deals, getting better paperwork?
Like, how do you get, how do you fix a relationship
with a government entity like that?
Yeah, so it definitely got a little bit worked over in the press, obviously, but I
figured, you know, keep my head down and get the spacecraft home, more so than
worrying about what's being said in the press. So what actually happened in
the background is, we were originally going to reenter, or we did
reenter, the spacecraft at the Utah Test and Training Range, which is ultimately a
weapons range for testing weapons and training warriors.
That's their mission statement, right?
So likewise, we're not the highest priority there.
And so we got bumped for higher priority work
being done at the range.
And in doing so caused a domino effect
to lose the FAA re-entry license
or not be able to get it granted
because part of the regulations
say, hey, you need a range and all of these accommodations that come with a range. So
the second we lost the range, we lose the license. So it wasn't really about a bad relationship
with the FAA at all. Although it's very easy to say, oh, they lost their license, get another
space company and the FAA are having problems, right?
It's a good headline.
It fit that narrative well, but it actually wasn't the case.
And so what we did was we scheduled a new date
with the range farther out in advance
to give them some time and give us some time to,
we had to redo the analysis, of course,
because the atmosphere is different
and that's part of the analysis.
And so we gave ourselves a few months
and then that allowed them to reserve the dates
that allowed us to prepare
and that allowed the FAA to reorient the license
for the new dates.
And ultimately, kind of in the background here
was like this was the first time this has ever happened,
a commercial reentry capsule with drugs on board
coming back to America.
And it was onto soil, right?
We're not doing a splashdown.
And so there was no process or mechanism
to have the Utah Test and Training Range
coordinate with the FAA.
And so basically each organization saw themselves
as taking on all the risk associated with it.
So we had to do duplicative work
because there was no process to split it, right?
And so it was really cool to kind of be a trailblazer
to establish this so that now, of course,
our competitors are gonna come in and do the same thing,
right, and learn from our mistakes.
But whatever, that's part of leading the way, right?
So anyway, that's what happened.
But that six months was, it was quite the life experience,
right, because it was the first mission, right?
So we didn't have any proof that this was gonna work.
People poured three years of their lives into this thing,
and our dreams are just like orbiting the Earth like, please come home, you know?
So when it came through, man, it was certainly,
I can't think of a better day.
Yeah, that's amazing.
I have one last question, Jordy.
How's the talent market in the space economy right now?
Hasn't been in the headlines the last couple of weeks,
there's been another story in AI dominating.
What's it like today?
Quite AI-focused, for both scientists
and software engineers. If you imagine yourself as a software engineer coming out of
school right now, AI is certainly where I would be interested. There's definitely a
software bent towards AI right now.
That being said, there's a lot of disciplines
we're hiring for, software isn't the only one.
And really it comes down to the application interest.
Like we're looking for mission driven folks at Varta.
And so if you're only looking,
oh, you only want to do AI because it's cool or whatever,
that might not be the type of person we want to hire anyway.
Now, if you want to do AI for mission-driven purposes,
then great, by all means.
But we don't have that much overlap there.
We're very specific of what we're trying to do.
We're trying to make microgravity formulations
so that we can help patients on Earth
by using gravity as a knob, essentially,
in developing these formulations.
And so, you know, we always kid around,
we explicitly don't want the spacecraft to learn, you know?
And so if you're a software engineer
and you're mission-driven, bent towards that mission,
then you've got a home at Varda, no question.
And, you know, fads come and go
and that sort of thing, but there is definitely an effect.
I would certainly be allured to AI
as a graduating software engineer.
That's a good sorting function.
One last question for me.
We've, there's been headlines,
companies talking about this so far this year,
trying to dig into how real it is.
People talking about putting data centers in space.
With everything that you've learned,
why is that exciting, a good idea or bad idea?
What are some potential blind spots for people
that haven't taken something to space, but would like to?
So it all comes down to the why, right?
Why are we putting data centers in space?
It's not that data centers in space in and of themselves
are a good idea, but what's the why?
The only why that resonates with me is latency, right?
Because if you just want compute power, space is not the place to put a data center if you just want a data center, right?
I'd much rather have convection, right? That's a great way to get rid of heat,
and have it be able to be serviceable on Earth and all that sort of thing.
So, but there is one use case that comes to mind
where I think data centers in space makes sense
and that's only for very low latency use cases.
So for example, right now, if you want to use Starlink
and you're transmitting a signal to Starlink,
it goes from the ground to Starlink
to another Starlink satellite to the ground,
then to the data server and back, right?
So you can cut that trip in half
if you put the compute in the sky.
Now, that compute is way, way, way more expensive,
but if your value prop of latency warrants
that extra cost of the in orbit data center,
then you'll start to see that.
So it's kind of like edge computing.
Edge computing, yep, I was about to say.
So yeah, it sounds like a very niche use case
at least to start, but I'm sure we'll see some companies,
we already are seeing some companies test it out
and experiment with it because there's, you know,
all these things need to be evaluated in the tech tree.
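To put rough numbers on that latency argument, here's a quick back-of-the-envelope sketch in Python. The altitude, crosslink distance, and ground-haul figures are illustrative assumptions, not actual Starlink or Varda parameters.

```python
# Back-of-the-envelope: propagation delay for reaching compute via LEO
# satellites. All distances are assumed, illustrative numbers.

SPEED_OF_LIGHT_KM_S = 299_792.458
LEO_ALTITUDE_KM = 550        # assumed LEO shell altitude
CROSSLINK_KM = 1_000         # assumed satellite-to-satellite hop
GROUND_HAUL_KM = 500         # assumed ground station to data center

def one_way_ms(path_km: float) -> float:
    """Straight-line propagation delay in milliseconds."""
    return path_km / SPEED_OF_LIGHT_KM_S * 1_000

# Compute on the ground: up, crosslink, down, then haul to the server.
via_ground = one_way_ms(LEO_ALTITUDE_KM + CROSSLINK_KM
                        + LEO_ALTITUDE_KM + GROUND_HAUL_KM)

# Compute in orbit: the signal stops at the first satellite.
via_orbit = one_way_ms(LEO_ALTITUDE_KM)

print(f"ground data center: ~{via_ground:.1f} ms one way")
print(f"in-orbit compute:   ~{via_orbit:.1f} ms one way")
```

Round trips double those numbers; the point is just that dropping the down leg and the terrestrial haul can cut the trip roughly in half, at the cost of much more expensive compute.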
But thank you so much for stopping by,
this was fantastic, congratulations.
Hey, thanks for having me.
And hopefully not so long, I'll see you again soon.
Absolutely.
Yeah, yeah, hop on soon.
Got a good feeling about it.
We'll talk to you soon.
Have a good one. Cheers, Will.
Congrats to you and the team.
See you later.
Bye.
Up next, we have Joel from METR coming on to talk about the impact of AI models, the impact of Cursor on software development.
Is it Metter or Meter?
Oh, Meter probably.
M-E-T-R, we'll have him explain it to us.
Break it down.
We'll also recommend that you go to getbezel.com
because your bezel concierge is available now
to source you any watch on the planet.
Seriously, any watch. That's right.
Anyway, sorry.
METR does model evaluation and threat research.
Okay. So does bezel.
They're stopping you from buying fake watches online. Bad watch models. Threats, bad actors. Foundation models and watch models, lots of similarities.
That's right. Anyway, we got Joel in the studio. Welcome to the stream.
Hopefully I'm not lumping myself in, these guys are joking around. I'm a serious person. All right, first off. Is it, it's METR?
It's METR. It's METR. Yeah. Sorry. There we go. Gotcha. Anyway,
Please introduce yourself for those
who don't know you, the company and then the organization.
And then I want to go into the news today.
Let's do it.
And thank you very much for having me, John and Jordy.
Thanks for hopping on.
METR is a research nonprofit based in Berkeley,
dedicated to understanding the capabilities of AI today and in the near future,
especially to the extent that those capabilities
might speak to potentially dangerous risks.
And what is, what's been the latest research?
Yeah, so here's what we've been working on.
I'll start with why we've been working on it.
Yeah, please.
We've seen from previous METR research,
but I'm sure you also see from your own usage in the wild,
AIs are clearly becoming increasingly capable.
One thing that governments and labs and us here at METR
as well worry about is the possibility, timing,
and nature of AI R&D self-recursion.
That is the possibility that model capabilities get better very, very rapidly because the AIs
themselves are contributing to AI R&D research. We at METR want
to be providing the highest quality evidence that we can
that speaks to the degree to which AI R&D might today or might
soon be accelerated in the wild. So that governments, labs,
decision makers
might be better informed and so make better decisions
about what's going on.
In this study, we run an RCT with extremely experienced
open source developers working on these very long-lived large
projects, a million lines of code, 23,000 stars on GitHub.
For those of you who are familiar,
I'm thinking Hugging Face Transformers,
the Haskell compiler, scikit-learn, this sort of thing.
We randomize their issues to allow or disallow
the usage of AI, where allow means typically
using Cursor with Claude 3.5 or 3.7 Sonnet at the time.
And then we measure both forecaster expectations and developer expectations about how much they might be sped up by being allowed to use AI versus being disallowed, and then the reality. The short version is we find that the developers ahead of time are estimating they'll be sped up by 24 percent. After the study is completed, they estimate that they were sped up by 20 percent. We find, in fact, that they were slowed down by—
No way.
I think, I know, it's a shocking result.
That is shocking.
Not at all what I expected, or I think what the rest of us at METR expected.
What?
But there we go.
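For intuition on the comparison behind that result, here's a minimal sketch of the core calculation in an issue-level RCT. The completion times are invented for illustration; this is not METR's actual data or analysis code.

```python
import statistics

# Invented per-issue completion times (hours). Issues were randomized
# into AI-allowed and AI-disallowed arms; this data is made up.
ai_allowed    = [2.4, 3.6, 1.9, 4.2, 2.8]
ai_disallowed = [2.0, 3.1, 1.7, 3.5, 2.5]

# Geometric means are a common choice for right-skewed task durations.
speedup = (statistics.geometric_mean(ai_disallowed)
           / statistics.geometric_mean(ai_allowed))

# speedup > 1 would mean AI helped; the invented data shows a slowdown.
print(f"observed speedup: {speedup:.2f}x")
```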
Wow, okay, so what do you think's happening?
I have so many questions.
But yeah, just walk me through your reaction to that.
What do you think is actually happening
that's slowing people down
because this is a complete narrative violation?
Yeah, I mean, in terms of the reaction, the number of times we've checked and rechecked the data, asked people to replicate it independently, it's going through the roof. The number of stressful late nights I've had poring over this.
You're gonna be like public enemy number one, by the way.
I feel like you need security detail now,
given the stakes of what you just said.
This is crazy.
Yeah, yeah, yeah.
So I think, maybe let me start
with some things that we're not saying.
The setting that I mentioned before, these ultra-talented developers, much more talented
than me, working on these extremely large, long-lived repositories that they're extremely
familiar with already. I think that's an extremely interesting population. That's why we went
out to study it. It's also a very weird population. I still am a cursor user myself,
as I was working on the graphs for this study,
I was using cursor.
But I do think those weirdnesses are related
to the results that we end up seeing here.
So we have to put these people in a completely different category than the junior developer who's just vibe coding a little app, just building stuff, not actually trying to push the frontier of what a core piece of software can do that's very large and complex. They're just trying to, you know, get a Python app up and live, write some routes, write some functions, right? That's where Cursor is still completely viable, like autocomplete on steroids.
The question is, in terms of self recursion,
really advancing the frontier of like
the craziest software we have,
we're still kind of where we were a few years ago, in that it feels like, if you were to quantify this, we were at 0% of AI research being done by AI a couple of years ago, and we're still maybe around rounding error.
Yeah, I mean, I will say that AI R&D research, I think, does not all look like this setting. There are some large inference code bases with very expert people, and I totally agree with your interpretation that this is evidence against those kinds of settings being sped up today.
On the other hand, we might think, you know, there are some people writing one-off training scripts for their AI models and then throwing them away. And in a way, that's kind of similar to what you described. Maybe they're seeing large speedups, just like the greenfield projects that you mentioned.
Yeah. And so, I mean,
this is not overall like a really cold glass of water on AI broadly
because this still means
that it's an incredibly valuable technology
in a bunch of different ways.
It's just that we're not seeing early evidence
of some sort of self-recurring fast takeoff scenario,
which is great, probably the good outcome.
A lot of the fast takeoff scenarios
are dependent on AI becoming itself so good at doing AI research and then copy
and pasting itself a trillion times. And that's what creates speed of development that humans
today can't necessarily even comprehend.
I think that's right for today. I do think we're not really speaking to the trend exactly.
You know, these results are consistent
with these exact developers
on these exact kinds of tasks being sped up in the near future.
There's work that we actually don't show in the paper, but in preliminary work, we have autonomous agents trying to complete these issues. And indeed we find that they do struggle, but with some of the core functionality, with passing tests, the kinds of things that you might have seen in SWE-bench or something like that, they really are making a great deal of progress.
And yeah, my expectation is that AI progress will continue at a rapid pace in the future, like it has in the recent past. And so maybe even in this setting, this won't be true in the future.
Let me throw a couple of the hot takes
that are floating around in the AI world at you.
And you can let me know if anything sticks out
as something you strongly agree with
or something you disagree with.
This idea that ultra-large context windows will not solve continual learning, Dwarkesh was saying this on Monday. Maybe another one would be just that no one has figured out how to properly scale reinforcement learning. Mike Knoop from ARC AGI kind of says we need entirely new ideas. And then you kind of have the bitter lesson, which is, yes, you need new ideas, but scale is all you need.
We just need to keep building data centers.
We need to get bigger and bigger.
We might see GPT-4.5 and these huge training runs as a short-term thing, hard to quantify.
Maybe it's just the end of one S-curve,
but Stargate's coming online,
and that will be another big test.
So I don't know, I threw a lot at you, but anything in there kind of, you know, top of
mind for you.
Yeah, look, as you guys know, anyone betting against the bitter lesson in the past would have had a very bad time. And I'm not prepared to bet against the bitter lesson. Could you remind me of the first question?
The first one was, so Dwarkesh Patel pushed out his AGI timeline slightly.
I mean, he still has, he still is very optimistic about AI and maintains that it's not priced
in and people are not thinking about it as significantly as they should.
And I agree with him.
But he said that, even though we have pushed the IQ so much,
and you saw this with the Grok 4 benchmarks,
like AI can do advanced math, like for sure.
It's really, really smart, smarter than most of us at PhD-level stuff unless you're a specialist.
But in terms of just being a good employee
and remembering, oh yeah, four weeks ago,
my boss said that they liked it.
And then I got this feedback, and now I do it this way. Or I learned this really weird nuance. Even if you're just thinking about, like, how to run our business, like how to post clips on X, Dwarkesh was giving the example of transcripts: he has little things that work better for what clip will perform, and he has this intuition, and his models and his prompts, he's really pushed these things. He hasn't been able to really get them to perform above a five out of 10.
An example would be any company today, any startup: if you just had someone drop into your organization who had PhDs in like 10 different fields, but they were also an amnesiac, so every time they showed up to work, they could not remember anything that you taught them, it just wouldn't be that valuable.
And so my question to Dwarkesh was like,
is there a world where we just scale up the context window?
We've seen million token windows.
Can we get to a billion token window and just stuff
every interaction the AI's ever had with you
in every prompt?
And so it does maintain the context.
But he was saying that there's kind of a quadratic cost curve to that, so it doesn't quite work.
Other people have said the nature of the transformer
means that attention can't really spread out that much.
I don't really fully understand it,
but I wanted to know your take on different ways
people are solving these things or what are the real
constraints right now because you've identified
some potential problems where we're not breaking through it today, but what is cause for optimism?
What are the research paths, like the nodes in the tech tree that you're excited about?
Yeah, that's super interesting. I haven't thought so much about this.
I will say, I think that the developers in this study
are not using the full context window.
And so if you think there's juice in adding things
to the context window, that juice might still be
on the table.
And indeed, I think we find that there's a lot
of implicit context in this repository
that's very expensive for the developers
to be writing down into context windows. Here an example on the Haskell compiler. My sense is that when you get up your
When you get up your PR for review
There's some chance that the creator of Haskell will come and fight you for potentially many many hours in comments about the you know about
the peculiarities of
Haskell project to look and these these kinds of, you know, exactly what his,
not just preferences, but, you know,
quality requirements are regarding where things should live
in the project and how various pieces of the project
should speak to one another and not being communicated
to these language models.
And you can imagine that with today's context window sizes,
that could be written down.
You know, you could put in all of the
previous discussion around these changes that this person has been involved in, and maybe whichever
language models people are working with inside of Cursor would pick that up and so do a better job.
I don't think we're ruling that out at all. I will say that it is, it is expensive for these, for these
time expensive for these people to be writing down all of the all of the possible relevant
context. And, you know, I think I think that's basically the reason they don't. And so maybe
you do need some kind of continual learning for the for the model to find out this context
on its own as as as these things go. You know, it's also consistent, I guess,
with the other possibility that you were describing
that if we, you know, 100X these context windows,
you could just throw the entire thing in
and then we don't need to worry about, you know,
learning from particular cases on the fly.
Yeah, I think both are live possibilities.
It's very interesting.
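The quadratic cost curve mentioned above falls out of standard self-attention, where every token attends to every other token, so the score matrix has n-squared entries. A toy illustration with arbitrary sequence lengths:

```python
# Why naively scaling context is quadratic: vanilla self-attention
# computes one score per (query, key) pair, an n x n matrix per head.
# Sequence lengths below are arbitrary illustrations.
for n in (1_000_000, 10_000_000, 1_000_000_000):
    print(f"{n:>13,} tokens -> {n * n:.1e} attention scores per head per layer")
```

Going from a million-token window to a billion-token window is a million times more attention work, which is part of why people explore sparse or linear attention, retrieval, or continual learning instead of brute-force window growth.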
The Grok 4 announcement was extremely benchmark heavy.
Some really impressive stuff, particularly on Arc AGI.
Twice the result.
Similar to a Tesla, it's faster than every car.
Does that mean it's going to solve everything?
Does it mean that it's better?
Yeah, and so, this feels like almost a new benchmark, this double-blinded trial; it feels almost like an FDA trial or something.
Do you think this could turn into a real benchmark? Do you think we need new benchmarks? Do you
think we need new ways of thinking about the progress of AI generally? We've talked about
just measure the revenue at this point. That's the economic value that's being created, but
there's a lot of tricky stuff you can do with revenue. And sometimes revenue is like test revenue: I'm testing this $100 million product. So what's your thinking on the state of benchmarking, where we should go, where some of your research might plug into that?
Totally. I think one motivation we had in running this study comes out of this observation
that the time it takes to create benchmarks
is almost becoming longer than the time it takes
for those benchmarks to saturate.
It's difficult to find signal in many of these benchmarks,
even testing these extremely challenging
PhD level questions that you guys spoke about.
And perhaps there's more signal in these kinds of RCT, FDA-controlled-trial-style measurements.
Similarly, another thing that people proposed
for measuring AI progress is using researcher self-reports
about the degree to which they're being sped up.
They think their work will go two times faster
if they use AI versus not use AI.
I think our study is potentially strong evidence
that these self-reports need not be reliable.
The forecasters who are told everything
about the developer's level of experience
and the time period of the study,
so which models they're using and so on,
they're totally wrong about how much these people
get sped up.
Same as the developers themselves,
even though they're carefully tracking their time
and they're so talented.
So I think self-reports are also very, very fraught.
Another thing that this has taught me, I think,
is that the mapping, as it were, from benchmark scores,
very impressive benchmark scores that we see
on these frontier language models that you're describing,
the mapping from those scores to real-world
productivity improvements is unclear.
You know, I'm not at all saying, as we discussed earlier, that we shouldn't expect to see productivity improvements.
I do expect to see productivity improvements today
and, you know, even more so in the near future
but it's not at all one to one
or it's kind of confusing and messy.
And so indeed I think we need to actually measure things
in the wild to see what's going on.
Switching gears a little bit, unless you have a follow-up?
Yeah, I just wanted to kind of zoom out on that and ask about your broad take on the measurability of technological progress. Because the internet, the computer, such dramatic transformations of society, you see it in all sorts of data, but it didn't fully show up in productivity statistics. You have all those questions about what happened in 1979, and everyone has their own example, their own reasoning for that. But, you know, you would think you could tell the same story about Google, like it'll speed up everything, everyone will get more efficient, and we didn't really see GDP jump on this.
And it feels like that's a really bearish take on AI
to have, which is like, this is a magical new thing
and we're still going to be growing at 2% GDP.
But where do you stand on it?
And where do you, like, do you think that's even
the right question to be asking?
Yeah, you know, this is so interesting. This is not a METR take. I used to be an economist. I feel the thing that you just said in my bones. Totally. You could argue maybe that the situation might be even worse in the case of AI. You know, a lot of people, like in AI 2027, resources like that, are telling this story where the AI R&D self-recursion is happening inside of labs.
And so I suppose not necessarily showing up
in economic activity in the public.
Another reason on top of the reasons that you gave
to think that perhaps this won't show up
in the productivity statistics as it were.
Which is also to say that self-recursion or these potentially destabilizing changes are just totally consistent with the non-changes in GDP trends, as you describe.
And so, you know, another reason to actually go out and measure these things in controlled
trials.
Cool.
Jordy, please.
Quick question around the threat landscape.
There's been a few stories this week.
One was a story about ChatGPT not following instructions.
The headline was that the AI was rebelling
against the researchers.
And then if you double clicked into the story,
it was just like it had given specific instructions,
like don't follow any further instructions.
So it was kind of a nothing burger in the end.
And then we also saw Grok going haywire.
Maybe that was predictable for someone like yourself: combining a frontier model, a fast-shipping team, with the virality of a social network, and embedding the two. But then, maybe it was two months ago, there was the, you know, we called it Glaze Gate on the show, where ChatGPT was just being a sycophant, giving too much positive feedback.
How are you looking at the threat landscape in the next 12 months?
So nothing, you know, nothing too long-term.
But how do you guys think about it?
Yeah, there's more to come on this from METR very soon.
That's one thing I'll say.
I think again, this is not a METR take.
My sense on this, or another example of this that stood out to me, is there were lots of anecdotal reports that 3.7 Sonnet and other language models in this most recent generation would pass tests in ways that were kind of not legitimate or something, which is another example of this reward hacking.
Change the test case.
Totally, totally, totally.
And I guess, you know, I don't have reason to think that that kind of thing is dangerous in particular. You can imagine, when humans are potentially not reviewing the code, because the AIs are doing entire projects, not just parts of or single pull requests, that this becomes more of a problem, or at least the surface area for it to become a problem grows, because you're not looking into that code and seeing those cheated test cases yourself.
So, you know, I'm not sure about over the next year,
at least right now, I think there are reward hacking
examples that are occurring in the wild.
I don't think that they're so supremely dangerous today.
Well, this was fantastic.
Thank you so much for stopping by.
Come back on again soon.
Stay safe out there with the contrarian takes
and the crazy data results.
It's still very bullish, but very exciting.
And thanks for everything you do.
We'll talk to you soon.
Yeah, great chatting.
Cheers, Joel.
Bye.
Up next, we have a massive Series B announcement
from Dylan Parker.
Moment HQ's coming in the building.
We're gonna ring the gong, baby.
It's gong time.
Index Ventures.
Let's hear it from Dylan directly though.
Welcome to the stream, Dylan.
Hope you're doing well today.
There he is.
How are you doing?
Great whiteboard.
Good to meet you.
Doing some heavy lifting on that.
Yeah.
Hopefully that's not proprietary information.
That's secret trading algorithms or something.
No, no, no, no. Nothing too interesting.
But thanks for having me on.
Yeah, great to meet you.
Yeah, thanks for joining.
Kick us off with the intro on yourself, the company, and then I want to hear about the
announcement.
Yeah, yeah.
So I'm one of the co-founders at Moment.
We are a fixed income trading software company.
So my background is as a quant researcher.
So like pretty much every quant researcher,
I studied math and stats during college.
That's where I met my co-founders, Dean and Amer.
And then after college,
Dean and I both joined Citadel Securities.
And pretty much completely by chance,
we ended up as the two junior members
of the newly created automated market making
desk for corporate bonds.
And so at the time, the fixed income market, which by the way is a financial market 50% larger than the global equities market, was undergoing an electronic trading revolution.
And so Citadel saw this and said, well, we can go build an automated algorithmic market making desk.
So they hired this guy, Anish Kheryap from Jane Street.
He's like the godfather of fixed income automated trading.
They hired a bunch of super experienced bond traders
and then they hired me and my co-founder.
And like basically our job was to take the knowledge
in these bond traders' heads and convert it into code.
And that was totally formative for Moment, because that's when we realized the power
of electronic trading was going to be everything that it enabled, like smart order routing,
portfolio optimization, all this stuff in the world's largest financial market that
had just never been possible.
Take us through the deal.
What are you announcing?
Yeah. So we're announcing our thirty-six-million-dollar Series B, which was led by—
Couldn't hear you from the sound of the gong.
We like big numbers on the show and congratulations on a massive series B.
You said from who? Index?
From Jan Hammer at Index.
Very cool. Incredible. That's amazing.
Where should we go from here, Jordy?
Yeah, so break down what the company is focused on today.
It sounds like the origin is back at your time at Citadel,
but I imagine it's evolved as well.
Yeah, totally.
So what we saw at Citadel was
that the market was coming online and you could now do
all these things that were never possible before. But what was missing was the operating system for
actually doing that. So we started Moment with this goal of owning every mission critical workflow
for traders and portfolio managers in the bond market,
everything from how they trade securities and do smart order routing across all the different
exchanges in the fixed income market to how they optimize portfolios to how they apply
risk and compliance restrictions to make sure that they're not breaking any laws.
And so that's what we do today.
We started off serving fintechs, so we power fixed income for places like Webull and Public.com, who then offer like $100-increment investing in bonds for the first time ever.
But yesterday we also announced our partnership
with some of the largest financial institutions in the US,
including LPL Financial, which is the largest broker in the US.
Wow.
So busy week for you guys.
Yeah, there are a few things going on.
So what's the use of funds for the new round?
What's the focus going forward?
I imagine scaling, what's working today?
Are there new products coming?
Can you talk about there?
Yeah, so I think a lot of companies
make the intelligent decision to start off, like SMB or PLG.
We decided to make things as hard as possible on ourselves.
We said we're going to start off serving
the largest financial institutions in the world's
most regulated market, and we're going
to go power their mission critical workflows.
And so with a company like LPL, or some of these others that we've announced over the last few
weeks and that are coming down the pipeline in the next few weeks, the scale of what we're operating
in is, you know, not like hundreds of millions or billions of dollars of flow, but like hundreds
of billions of dollars in trading flow. And so what we're really focused on as the company
over the next year is building out the full suite
of what's necessary across trading, portfolio management,
risk and compliance that's necessary to power
these huge financial institutions.
What's the competitive dynamic like with the former employers
or like the rest of the market participants?
It feels like Ken Griffin's not the laziest
founder out there.
Is there a world where there's some sort of competitive
dynamic between the big institutions where they want to
build something like this to compete with you?
So when we were at Citadel, we were on the sell side, so market makers or liquidity providers. Moment serves the buy side, and we actually connect them with those liquidity providers.
So we actually work closely with Citadel, Jane Street,
pretty much all the major liquidity providers out there.
How are you thinking about tokenization? The story from the last month in finance is the tokenization of these sorts of real-world assets, everything from private company shares; we've seen it with stocks. Is there anything on the horizon on that front for you guys, or is it just totally unnecessary?
You know, I think there's a huge opportunity. But if you look at where the fixed income market is today, we're just going from trading over the phone to going on an online platform to trade.
And so there's a lot still to do to get people
to the point where it's even possible
to think about stuff like that.
Yeah, can you walk me through, I imagine that fixed income follows like a power law as well, where government debt and Apple corporate bonds are way more liquid, way more automated, than, you know, some public company's junk bonds, where you kind of got to hunt around for someone to buy and sell them. And then you have like venture debt, which basically, I believe, never trades, I don't know.
But walk me through like,
it feels like we are probably bringing more and more
of those like sub asset classes,
like into more liquid, more,
just more automated markets.
But give me a state of the union on like
how the fixed income market is actually split up.
So you're totally right.
There's government bonds, there's
highly liquid corporate bonds, and then there's
a long tail of corporate and municipal bonds
that are really, really illiquid.
And just as a point of comparison
that I think illustrates the size and scale of the fixed
income market, there are 4,000 listed US equities
and there are 4 million bonds.
And so doing anything in the fixed income market
is pretty much a thousand times more complicated
than doing it in the equities market.
Yeah.
What about, how does that break down? So, 4 million public equities? Sorry, 4,000. Yeah, yeah. How do you go from that? How does that actually trade right now? Like, is it a steak dinner? And what is the ultimate makeup? Do certain companies account for, you know, break down kind of how that 4 million is split up?
Yeah, so there's a universe of say 500 US treasuries
that are super, super liquid.
And they trade like similarly
to the most liquid stocks out there.
And then on the other side,
you have a really, really meaningful long tail
that makes up the vast majority actually
of the entire market share,
where you have
bonds that haven't traded in two years or 10 years.
And so one of the really hard parts about fixed income, one thing that I worked on as
a quant researcher is like you have this bond that hasn't traded in three years.
How do you optimize a portfolio around that?
How do you even figure out what the price of that bond is?
And that's why, doing stuff in the fixed income market, the difference between equities and fixed income is like doing something on the surface of the Earth versus doing something in space.
Wow.
It would be helpful to be a quant
if you were gonna build a company like this.
Yeah, you might need some math.
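One textbook answer to the pricing question Dylan raises is matrix pricing: borrow a yield from comparable bonds that do trade, then discount the stale bond's cash flows at that yield. A deliberately naive sketch with invented numbers; real desks layer far more structure on top.

```python
# Naive matrix-pricing sketch for a bond that hasn't traded in years.
# Borrow a yield from comparable bonds (similar sector, rating,
# maturity) that traded recently. All numbers are invented.
comparable_yields = [0.052, 0.049, 0.055]
est_yield = sum(comparable_yields) / len(comparable_yields)

face, coupon_rate, years = 100.0, 0.04, 5   # annual coupons for simplicity
coupon = face * coupon_rate

# Price = discounted coupons + discounted principal.
price = sum(coupon / (1 + est_yield) ** t for t in range(1, years + 1))
price += face / (1 + est_yield) ** years
print(f"estimated yield {est_yield:.2%} -> estimated price {price:.2f}")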
Well, I mean, that's all I have.
Do you have anything else?
Yeah, what are you guys hiring for right now?
Yeah.
Pretty much everything.
We're hiring quants.
We're hiring engineers.
We're hiring go-to-market marketing operations.
Pretty much everything out there.
Amazing.
Awesome.
Well, thank you for joining.
Thanks for having me.
Very exciting.
I'm sure you'll be back on soon with more news.
We'll talk to you soon.
Have a great one, Dylan.
Talk to you soon.
Cheers.
Bye.
Next up, we have Eric Olson from Consensus
coming in with a big launch.
Do we ring the gong for big launches?
If there's a number attached.
So we got to get DAUs out of the process.
We have to get a prop that is non-number oriented,
but just excitement oriented.
But let's bring in Eric Olson from Consensus and talk to him.
How are you doing, Eric?
Great guys, how you guys doing?
Doing great.
Great to have you.
Kick us off with some intro on yourself,
the background of the company,
and then I wanna talk about the launch.
Heck yeah.
So I'm Eric, founder of Consensus.
We are an AI search engine
for academic and scientific research.
If you ever used, like, Google Scholar or PubMed back in the day in school,
think of us as building the next gen 2025
LLM powered version of that.
Helped plenty of students.
Yeah, I mean, the super dumb question,
super obvious question is like,
isn't all this stuff already in ChatGPT?
Like how are you differentiating?
Yeah, you must be doing something
because you have five million, over five million users.
Yeah.
Yeah, I mean, one of the best examples
to like encapsulate why it's different
is the fact that Google Scholar
was the first vertical search product
that really broke off of Google back 20 years ago.
Interesting, yeah.
Even when they were doing really nothing more
than just being a dedicated index for research papers,
hundreds of millions of people were
going to that every month.
So the same thing is kind of true here.
We're dedicated to a use case.
We have a dedicated corpus we search over.
We hopefully search over that corpus a lot more intelligently
than a general purpose chatbot would.
We do things differently in our interface
to show you that information.
Like we're much more citation-forward.
You have an experience where you can really
interrogate what's been returned in your search.
Interesting.
Using ChatGPT, search is pretty much like an afterthought.
It's there if you want to dig into it,
but it's not really what it's designed for.
Everything about it from the way it searches,
the way it shows to you, and then the features
both on top of it, all dedicated towards academic research.
Walk me through some of the key technologies
that enable better search.
I'm thinking about like vector databases,
even just like stuffing a better index in Redis or Postgres
or doing more indexing on top of these documents,
doing like transformations on the underlying documents to get them into more basic formats.
Like what's interesting?
Large context windows, there's so much
that you could throw at this problem.
What's actually working?
Yeah, so lots of different things,
many of the things that you're saying.
So number one, being dedicated to a document type
just helps us.
It helps us in the way that we can create
our embeddings to search over.
It also helps us in that ingestion process,
kind of like you were saying of document transformation.
We'll run little tiny LLMs over 200 million papers,
add new enriched metadata about them,
that we can then use in our search ranking and in our filtering.
So think like, we'll pull out what is the design of the study,
or what is the sample size of the study, and we use that in search ranking and in search filtering. And then on top of that, the main intelligence of the search is learn-to-rank models. So people interact
with the product, they save papers, they cite papers, they share papers. We learn from all those
interactions. We learn what matters most. We learn about all the attributes of a paper
that matter in search ranking
based on how people are interacting with it.
So the simplest way to think of it is like,
because we only have a certain use case people are using it for, we get to train our search models to try to think and act like a researcher who wants to go through these papers.
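For readers unfamiliar with the term, a pairwise learn-to-rank setup in this spirit might look roughly like the sketch below, where feature weights are learned so that papers users engaged with outscore papers they skipped. The features, data, and training loop are invented; this is not Consensus's actual model.

```python
import math

# Pairwise learning-to-rank sketch (RankNet-style). Learn weights so
# engaged-with papers (saved/cited/shared) outscore skipped ones.
# Invented features per paper: [query match, recency, sample size].
pairs = [
    # (features of engaged paper, features of skipped paper)
    ([0.9, 0.4, 0.7], [0.8, 0.9, 0.1]),
    ([0.7, 0.2, 0.9], [0.9, 0.1, 0.2]),
]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w, lr = [0.0, 0.0, 0.0], 0.5
for _ in range(200):
    for pos, neg in pairs:
        margin = score(w, pos) - score(w, neg)
        grad = -1.0 / (1.0 + math.exp(margin))   # d(logistic loss)/d(margin)
        # Nudge weights so the engaged paper's score pulls ahead.
        w = [wi - lr * grad * (p - n) for wi, p, n in zip(w, pos, neg)]

print("learned feature weights:", [round(wi, 2) for wi in w])
```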
Jordy?
What are the different data sources here?
I'm assuming a lot of the stuff is public.
I know some of the, I remember, you know,
being in college trying to find different studies
or papers and like hitting paywalls.
A lot of them are locked down.
There's the famous, crazy, short story.
And so I imagine that you've done some deals
to get access to data.
What is the, what is the body of work that's available?
Well, hopefully paywalls are going to be a thing of the past moving forward as open access
science gets more and more momentum, which we'd love to help shepherd in.
The way to think of it is there's like three different layers of access, levels of access you can have in Consensus.
able to ingest the full text, show it freely, let you download it, all is well and good.
The rest of the bucket is paywall content.
But there's two levels of access we can have within it.
So there's the buckets where we have deals with publishers, trying to get as many done
as possible, where we're able to use the full text in our search
and in our analysis, we're just not
able to display it to the user.
Benefit to the publisher is we're
helping them drive traffic, get people to see that.
Hopefully the snippet in search ranking is engaging, then they go into it and it drives a purchase.
And then there's this third bucket,
which is we just don't have a deal with the publisher yet.
For things behind the paywall, we're using what is publicly available. So that's the abstract and the metadata of the paper, which gets you further than you might think.
Like the abstract is specifically designed to be this perfect, nice summary of the paper. You can go a pretty far distance in search ranking and even some analysis using abstracts.
But obviously nothing more brutal than being a college student and almost getting the information that you need from an abstract and realizing, I really have to pay like $50 for this single fact. Even if you wrote the paper, you still have to pay for it. Oh, that's wild. You hand it off to a publisher, they publish it. So I could literally have published this paper, and if I come across it on the internet, I still have to pay for it.
That's wild.
Jordy had this question earlier about the nature of scientific discovery.
Elon Musk at the Grok 4 launch was talking about his timeline for discovering new physics
is two years now based on the progress that he's seeing at XAI.
And Jordy was making the point that a lot of scientific discoveries come from
mapping different disciplines together.
Or just inventions.
Invention generally is apply the mind of a computer
scientist to a biology problem or vice versa.
Are you seeing users do those type of searches?
Is this product useful for that type of scientific discovery?
There's been this lingering question
in artificial intelligence about if you were a person
that had read every single research paper,
you would probably make,
yeah, you would make discoveries and connections
across things, and yet that hasn't happened.
Maybe it's some fundamental limitation of LLMs
or AI at this moment, but what are you seeing
and what's your take on that concept
of cross-functional pollination?
Yeah, I mean, everything basically
that humans have invented new comes from pattern matching
across disciplines, like that's how we create new ideas.
Yeah, I mean, I'm not an AI researcher,
so I don't have the single most informed take,
but also nobody knows what the heck they're talking about in this world.
I think it is probably a fundamental limitation of LLMs, given what we've seen.
I'm going to parrot this from François Chollet in his YC talk the other day,
but the measure of intelligence is the efficiency by which you process
information and apply it in different domains.
And that just like isn't what LLMs are really doing great right now, despite the fact that
they've processed so much information.
So our take at Consensus would be more, get people to the edge of what is known and then
let them do the inherently human part of science, which is create these new insights and new
discoveries.
Like every, every science experiment that's ever been done
starts with a review of the literature.
Like think about it as like,
you're getting the foundation of knowledge
underneath your feet.
If we can, our goal at Consensus would be speed up
that part as much as humanly possible
and let us do the thing that humans are better at
than machines right now, which is that pattern matching,
which is that coming up with new ideas. And if we can make that loop move faster,
like heck, that's a fricking valuable and powerful thing.
Switching gears completely.
I know you were at DraftKings prior to this
on the sort of research and analytics side.
What is your thesis around the ultimate collision
between sort of betting activities and AI?
Last night, Grok announced a partnership
with Polymarket to try to bring in prediction markets
to try to basically help make the model itself smarter.
How are the big players like DraftKings
even thinking about AI?
I'm sure a bunch of people have like ChatGPT wrappers
specifically focused on sports betting and things like that.
But how do you think the big players are thinking about it?
Yeah, I mean, well, I left DraftKings in 2021.
So I can't say that I was there when people were worried too much about AI models.
And also the natural question I always get is how the heck did you go from
sports betting to science?
And the answer is my parents and my grandparents and my sister are all teachers and scientists. That's how I grew up, and I loved applying numbers to sports.
Um, but I actually have something kind of interesting to say here.
So my job at DraftKings was building models to find the professional gamblers on the site. So you'd look at all previous betting history and demographic data, and you try to make predictions on, does this person actually have an edge over the market? And I would have to imagine that
with better and smarter and more powerful models, like people's ability to themselves
have an edge on the market would increase in the short term. And then the markets obviously catch
up and figure out how to bake all that in. And I mean, that is the beauty of markets,
right? Like whatever technology that people have on the side of betting into a market,
so does the provider who is putting up that market and they get the information from the
people they know that have the best models. So I think it's going to be an interesting
cat and mouse game moving forward as it's always been
with sports betting, just instead of, you know, Johnny Two Shoes in New York with inside information about injuries, now it's somebody with a super powerful AI algorithm that's predicting games above market. That's fascinating. I have to imagine that in the AI era, the insider knowledge about injuries is even more valuable.
100%.
But to your point, you could probably detect that too.
The number one way to know if you need to limit somebody
is if they are ahead of an injury,
because it means they're somewhat connected,
they're doing the statistics.
They're on the inside, yeah, that makes a ton of sense.
Insider trading, that's fascinating.
I didn't think about that in the context of sports betting.
Well, thank you so much for stopping by.
This is fantastic and congratulations on the launch.
Appreciate it.
Check us out at consensus.app, Deep Search launched today.
Thanks guys.
Awesome.
Cheers, Eric.
Thanks for coming on.
Up next, our lightning round continues with Ghita from ZeroEntropy, a YC graduate doing automated retrieval and announcing a seed round of $4.1 million.
Seed round alert.
Seed round alert.
Four million, that used to be a Series A
a couple years ago.
Just keeps ticking up.
Congratulations on the round.
Rita, welcome.
How you doing?
Thank you so much, super excited to be here.
Thanks for joining.
Introduce yourself, introduce the company.
How'd you get started? What do you do?
Yeah, so I'm Ghita. I'm one of the co-founders of ZeroEntropy.
A little bit about myself.
So my background is in applied mathematics.
I have two masters in the field,
one from École Polytechnique, one from Berkeley.
I guess I started more into the computer vision side of things,
and then I discovered GPT-2 and GPT-3
and I was like, oh my god, this is huge, and I started thinking about, you know, personal assistants and stateful AI systems, and I guess that's what led eventually to ZeroEntropy and building retrieval systems and
bringing context into LLMs
And so that's what we do. We build search for RAG and AI agents.
Okay. It feels like a crowded space.
I know a few founders that are working on RAG.
There's also RAG implementations at
the hyperscalers and the clouds.
How are you differentiating?
What's the key insight?
What's the pitch to companies to come over and use
your service as opposed to
the other options out there for retrieval?
Yeah, absolutely.
I think it's about having the right abstraction.
So we solely focus on the retrieval side.
We don't do the entire rag end to end
because we believe that developers need to have
their own prompts into generating the answer.
They need to use ZeroEntropy as a search tool
for their own AI agents.
We're also developing our own models.
So we just released a re-ranker yesterday,
which was pretty exciting.
And I guess the winning solution needs to be extremely accurate,
but also extremely fast, and just be production ready
and easy to implement for various use cases.
What's your take on benchmarks currently?
It feels like solving a really hard math problem
and retrieving the right document at the right time
are somewhat unrelated.
And so how do you evaluate if your system's getting better?
Yeah, that's a great question.
Actually, the evaluation side of things is very messy.
Almost everyone that I talk to,
they basically rely on manual inspection
to make sure that their retrieval is working correctly.
So we've been looking into the evaluation side a lot.
Actually, the very first thing that we did
is release our own benchmark that was on legal documents,
and that really evaluated just the retrieval step of RAG,
meaning from a question,
was I able to pull all of the documents
and only the documents that I needed?
Because the problem is that if you feed your LLM too many tokens that it doesn't need, it's just going to hallucinate.
So the precision and the recall side of things
are extremely important,
and we're rolling out our own evaluation solution
in the next few weeks
that we've been using internally so far.
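Concretely, the retrieval-only evaluation she describes, all of the documents and only the documents, reduces to recall and precision over the retrieved set. A minimal sketch with invented document IDs:

```python
# Precision and recall for one query's retrieval step.
# Document IDs are invented for illustration.
retrieved = {"doc_3", "doc_7", "doc_9", "doc_12"}
relevant  = {"doc_3", "doc_7", "doc_15"}   # ground-truth set for the query

hits = retrieved & relevant
precision = len(hits) / len(retrieved)   # "only the documents I needed"
recall    = len(hits) / len(relevant)    # "all of the documents I needed"
print(f"precision={precision:.2f} recall={recall:.2f}")  # 0.50, 0.67
```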
What does the rest of the stack look like?
I know you said you were kind of rag provider agnostic.
Are you also model agnostic,
cloud agnostic, database agnostic?
Like where have you actually made bets?
What piece of the stack are you particularly aligned with?
Yeah, I think building context engineering
is going to be a new class of products
that needs the data layer,
but also needs small LLMs inside the retrieval pipeline.
We see many teams either feeding everything into the context of the LLM, entire knowledge bases, because they weren't able to make retrieval work properly. And we see teams having a very
simple pipeline. I think the winning solution needs to be somewhat in the middle, basically orchestrating LLMs to rewrite the question properly, summarize the documents, and create more metadata associated with each of the documents that are indexed. And so that's what we're doing, building this solution that works really well
and almost gets to the precision and the accuracy of a large LLM
while still being pretty fast and pretty optimized.
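The pipeline she describes has a common general shape: LLM-based query rewriting up front, a cheap recall-oriented first stage, then a precision-oriented reranker. The sketch below uses placeholder functions throughout; none of these names are ZeroEntropy's actual API.

```python
# Generic retrieve-then-rerank shape. Every function here is a
# hypothetical placeholder, not ZeroEntropy's actual API.

def rewrite_query(question: str) -> str:
    """A small LLM would normalize and expand the user question here."""
    return question.strip().lower()

def first_stage_retrieve(query: str, k: int = 100) -> list[str]:
    """Cheap, recall-oriented candidates (vector or keyword search)."""
    return [f"doc_{i}" for i in range(k)]          # placeholder corpus

def rerank(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    """A cross-encoder reranker would score each (query, doc) pair."""
    scored = sorted(docs, key=lambda d: hash((query, d)))  # stand-in scores
    return scored[:top_n]

def retrieve(question: str) -> list[str]:
    query = rewrite_query(question)
    candidates = first_stage_retrieve(query)       # high recall
    return rerank(query, candidates)               # high precision

print(retrieve("What did the trial find about particle size?"))
```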
What's the appetite been like for this product in the enterprise versus new companies
that are building new AI products from scratch?
It feels like they might be,
just the AI agent infrastructure companies,
there's a lot of them and it feels like
they're selling to a new crop of companies
and that's where the revenue's accelerating
most aggressively, but what are you seeing in the market?
Yeah, I think the adoption for products like this
usually comes from a bottom-up type of approach, where developers are experimenting with new approaches and new techniques, and
then larger enterprises catch up.
So that's what we've been seeing.
In terms of experimenting with models, I think large enterprises also do that pretty easily.
So for things like the reranker that we just released,
there's also appetite from larger companies
in integrating that into their current systems.
Is there a case study that you have your eye on
amongst the big tech companies?
Like, we think that our software could improve Netflix
or YouTube recommendations or something.
If the deals could just magically happen,
where's the lowest hanging fruit?
For me, if I could do anything in AI,
I would just get Whisper into Siri.
And so when I dictate a text message,
it's just perfect and it's much better
than what they're currently using.
What's on your wish list for consumer tech company
or big tech company that everyone knows
and they're not taking advantage of something like this?
Honestly, for me, it's Slack.
I always struggle.
I can never find anything on Slack.
And something that we've been doing
is annotating our own conversations,
like appending keywords to our own threads
to be able to find information.
But we have a lot of our internal research
and a lot of things going on on Slack.
And we find it pretty difficult to find the right stuff.
So I think companies like that could benefit.
And it would provide a much better user experience
if you could just magically find all of the information
that you have in there.
Yeah, I've been noticing that with Gmail, like the amount of email has just grown so
much and the amount of text in each email has grown because of all the trackers and
cookies and stuff behind the scenes.
And so when I search for something, it just pulls up completely random emails like every
time, and it doesn't understand the hierarchy that, in an email, I care a lot more about what's in the subject line than what's in the footer.
And so if I'm searching for artificial intelligence
or something and someone has that in their footer that,
hey, I run an artificial intelligence company,
that's not what I'm looking for.
I'm looking for the thread that I was talking to somebody,
a close friend about AI and I wanna pull that up first.
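The hierarchy being described is what search engines call field weighting: the same term match counts for more in a high-signal field like the subject line than in a footer. A toy sketch with invented weights; real ranking functions like BM25F are more involved.

```python
# Toy field-weighted email scoring. Weights are invented.
FIELD_WEIGHTS = {"subject": 3.0, "body": 1.0, "footer": 0.1}

def field_score(term: str, email: dict) -> float:
    term = term.lower()
    return sum(w for f, w in FIELD_WEIGHTS.items()
               if term in email.get(f, "").lower())

emails = [
    {"subject": "Thoughts on AI", "body": "as discussed", "footer": ""},
    {"subject": "Invoice", "body": "see attached", "footer": "an AI company"},
]
ranked = sorted(emails, key=lambda e: field_score("AI", e), reverse=True)
print([e["subject"] for e in ranked])  # the subject-line match ranks first
```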
Yeah, I think that's also why, you know,
basic semantic search is just not enough,
because it basically will pull all of the similar
information, but not the most relevant or the most helpful.
Keyword search is the same, it's not very smart.
And I think it's just such a waste,
because there's a lot of information that you could have access to
and it would make your work so much faster
and you're just spending time rewriting your question
and trying to make the system understand
what actually you're looking for.
So I think that the query side of the user intent,
query rewriting is also super important.
Yeah.
iMessage search, absolute disaster. It's like, I know I'm in a text message with Jordy and someone else, so pull that up. And it's like, here's six. It never works. AI also won't fix it all. There's going to need to be a shift in the way people search. I remember hearing the story about Google, where
there was some Google engineer who was running a test on, like, what's the world record for the marathon or something like that?
And they were using the typical keyword Boolean search
and they weren't getting good results
and then they sent it to a user
and the user just asked the question in natural language
and it just hit it the first time.
And so I feel like people still,
at least I have been an email user for a long time,
when I go to my email search, I often,
I'm searching in the keyword world
instead of just natural language,
but Google has, I mean, they're experimenting
with the AI search thing, they have a 50 word limit
right now, you can't just type a whole prompt
in Google search, they need to kind of reimagine
what that search box is.
And then there also needs to be a consumer change in how consumers interact with that
particular UI element, basically.
But thank you so much for stopping by.
Congratulations.
And good luck to you.
Thank you, guys.
We will talk to you soon.
See you soon.
Have a good one.
Up next, we have Elliot Hershberg from Amplify Partners coming in the studio. They are in Datadog, Chainguard, Runway. I love Datadog. That's just, I loved it so much, it's the greatest company name ever. Right up there with Eight Sleep. Eightsleep.com, get a Pod 5 Ultra. Back on, back on my game. I'm still behind where I was relative to a month ago,
but back up into the 75 range.
I'm going for 90 tonight.
Good luck, good luck.
They're calling the Pod 5 the first fully immersive sleep
system that works with any bed.
Pod 5 actively adjusts your temperature,
elevates your body, and plays integrated soundscapes
to improve your sleep.
I have new copy today.
Simply too good.
Welcome to the stream, Elliot.
Hopefully you're here. How are you doing? What's going on?
Hey, what's going on, guys? It's a pleasure to be here.
Thanks so much. And I love the suit. You are dressed fantastically. Sign of great respect
to our culture. You know, there was like a period where people would actually sort of
match the vibes and have the suit for the technology brothers. And I feel like it's
dropped off. So I want to bring it back. You know, I appreciate it, I appreciate it. We're in the boardroom, you know. It is, uh, I'm glad to be here.
Yeah, it looks great. Proper uniform.
I wanted to get a state of the union from you on a few things, but why don't you just kick us off with an introduction on you?
Yeah, for sure. My name is Elliot Hershberg. I started my career as an experimental biologist, so I was in the lab trying to make new treatments for cancer.
Got super frustrated, decided to retrain as a computer scientist.
So I became a computational biologist and was obsessed with that as sort of a practitioner
for about a decade.
And then got really obsessed with writing about it.
So I was writing a newsletter called The Century of Biology, writing about companies in the space, data on the frontier, and then that sort of was a rabbit hole into investing with a friend of the network, none other than Packy McCormick. So I spent some time at Not Boring, where I was writing and investing, and then recently joined Amplify. We just closed 900 million in new capital, including 200 million for a dedicated...
Did you just close...
Are you announcing the new capital,
the new fund today,
or was that a little bit ago?
It was a little bit ago, 900 million
including 200 million specifically for Bio
that I'm helping to build.
You guys were not loud enough about that.
No, it's a lot of money.
Well, that's part of the thing.
I feel like it's a really quiet fund where,
for three of the four funds for Amplify,
they've been in the top 5% of venture returns.
Not top decile, but like top 5%,
like really good at what they do.
And I feel like it's just not thought of as much and just
like very quiet and stealthily doing really phenomenal work. And so, yeah, excited to talk
more about it. Okay, State of the Union on bio. I want to know about where we are in artificial
intelligence and technology helping advance bio. We've seen AlphaFold, we've seen kind of tools
and amazing breakthroughs.
I think everyone has a really concrete idea
of the impact AI is having on software engineering,
whether it's like amazing auto complete, you have cursor,
now you have agents, where are we in the deployment
of AI tooling in bio?
A lot of the narrative just jumps straight to we're going to one shot cancer.
And I love that.
I'm optimistic it happens eventually.
It doesn't feel like we're there.
But where are we actually in terms of the impact on productivity in AI with bio?
Yeah.
So you guys know like the Gartner hype cycle, right?
Where you have these like huge swings
for new technology where I've been working in this field
for like a decade and there has been a bunch of companies,
you know, Amplify invested in Recursion,
which was one of the early leaders of this, right?
Like there's been this general sentiment
that you can make a huge amount of progress
with new data, new tools, new technology and life sciences.
And it just turns out that it's like
a really hard problem, right?
And it takes time.
It takes some time for the market to ingest that, actually figure out the right business
models and strategies.
There was a really strong wave of early adopters.
Then there was some disillusionment and disappointment where it's like, oh shit, it actually turns
out that it's really hard to one-shot a cure for cancer.
Then as that sort of happens, there was just a bunch of breakthroughs in the technology.
So it became consensus that this is making a huge impact on hard problems in biology,
where we had the Nobel Prize for AlphaFold.
So like a Nobel Prize going to an AI lab, to Demis, and in part to David Baker at the University
of Washington, because
it's actually starting to make real, meaningful impacts on hard problems in biology. And so
that's true for sort of molecular machine learning, where you're thinking about designing
new molecules and proteins; there's virtual cell research, where we're trying to actually model
how cells behave. And so you're seeing this sort of step change now, where there are a couple of, you know,
actually faster-than-Moore's-law curves.
DNA sequencing is decreasing in cost faster
than Moore's law.
And so you get this huge data tailwind
plus the tailwinds in machine learning and modeling.
People are actually starting to scale these models
and it's just like getting pretty impressive pretty fast.
Where are we on new drug discovery companies, where we're targeting a single thing, we're
going to build a drug to solve a problem, versus we're going to start a company that's SaaS,
it's a tool, it's going to help all the different drug companies.
What's working?
What's overhyped, what's underhyped?
What's your take on like picks and shovels
versus drugs basically?
Yeah, so like short answer is we do both.
So you guys had Jake on the other day at Centivax,
like absolutely insane founder,
was one of the early computational immunologists
at Stanford and like he's making a medicine
that you just couldn't otherwise make
without these technologies, right?
Where it's like all these impacts within biotech and within modeling to make these just really
incredible drugs you couldn't otherwise make.
And so that's people just making these singular things that are really phenomenal.
There's also a real platform opportunity there for other things beyond the universal flu
vaccine.
And then I think if we take a little trip down history lane and think about how hard
it was to actually sell software in the life sciences, one of the early companies in this
space is Schrödinger.
And they've been around for about 25 years.
They're a molecular dynamics company.
And it just took an extraordinary amount of time and effort to actually saturate and get
people to adopt the technology and get people to pay for it.
And they're a phenomenal business,
they're a public company,
they're actually vertically integrating
into making their own drugs.
But the thing that we're hearing consistently,
like from CEOs of top pharma companies
is just like there's huge demand for new infrastructure.
People realize that this technology is here now
and that they need to adopt it.
They're hearing this from their shareholders,
they're hearing this from the scientists at their companies.
And so there's just a very different moment
post Alpha Fold and post even chat GPT
where like they're using it,
their kids are using these models and they're just like,
oh, I really actually need to adopt this.
And so I think there's opportunity in both:
fundamentally new picks and shovels,
where you sort of replace experiments with compute,
and then also fundamentally new drug products.
Jordy, last question?
No, I mean, I think we should have you back on soon.
Yeah, I think we'll do the TBPN bio drop-in, you know.
Feels like there's, within the traditional labs,
bio has been used as almost like marketing, right?
Or even Elon yesterday saying, we're
going to discover new physics.
And that's obviously not
where a lot of the true innovation is happening.
Yeah, let's make it a regular thing.
You think about that, right? Like, an investor made a joke that there's just a
couple of things that have consistently delivered venture returns, and that's software and also drugs.
Hmm. And so as far as physical prediction goes,
being something that's super valuable,
if you can make your inference be a billion-dollar
drug product, that's a pretty good spot to be.
We're excited about it.
So see you guys soon.
There's, I forgot who we had on,
when Trump had the executive order around drug prices,
we talked to a few different people that were saying,
biotech has on average been a terrible asset class.
And there's some amazing outliers.
Is it like Weinberg maybe?
Yeah, basically there's all these amazing outliers
that do deliver returns.
But if you just index the market,
you are going to underperform dramatically.
And underperform ventures specifically.
Yeah, and the venture category feels
like this could be a massive shift where suddenly
the next five years become the golden age of venture
bio-investing.
I mean, there have been some massive companies before.
So you have breakouts like the Genentechs of the world,
huge companies.
There actually is.
You guys should have Bruce Booth on, who is an OG biotech investor at Atlas Venture.
He's done a bunch of analysis, and the fact is that there's actually some interesting return data for biotech versus tech, where it's
not as gloomy as you would think. But I think in general, with biotech starting right now,
we want to make a world where it's actually like engineering, right, and you're actually just getting these really scalable, amazing
medicines, and I think that's where we're headed.
Incredible. Thank you for joining, great to have you on.
All right, bye. And next up we have Karim from Ramp, the man himself, to talk about the new agent launch.
Is he in the waiting room? We will bring in Karim from Ramp to check.
Second time on the show, he hopped on at Hill and Valley.
That's right.
First time as a remote guest.
Great to see you.
How you doing, Karim?
Hello, great to see you guys.
Can you hear me okay?
Yes, loud and clear.
I don't think you need much of an introduction,
so why don't you just kick it off with the announcement
and break down the launch today, and then we'll have a bunch of questions.
Yeah, of course.
I mean, it's been a very exciting day for us at Ramp.
We finally announced our first agent.
We're going to be announcing a lot more agents soon, so it's hard to keep track sometimes.
We've been playing with a lot of tech internally. We think we're in a very interesting space where maybe
the thing about it is, a lot of people from the outside
look at Ramp and maybe visualize the card.
They think about the fintech aspects,
but at the end of the day,
like what we're really trying to do is help reduce the drag on companies
that happens when there's just a lot of bullshit work in between teams and a lot of papers
being passed around, questions being asked, the things that really get in the way of doing
work.
And that first agent that we're building is really just that, like it operates in the
messy middle between finance teams and every other team trying to spend to move the business forward.
And yeah, that's basically what we launched today.
So it's an agent for controllers.
It knows a lot more about the expense policy of a company, the rules that are in place that govern spend, than any single employee.
And it knows a lot more about every single transaction
than any single person on the finance team.
So it can operate in the middle
and automate all the little decisions
and the extra work that needs to get done
to figure out what's in policy or not.
And it's immediately available.
Like that's part of the power, right?
Is that if an employee wants
to decide whether or not they can buy something,
or if something's in policy, they no longer
have to be Slacking somebody.
It could be in the middle of the night or something like that,
or off hours, where that's creating that drag,
that delay, right?
100%. That's certainly one part of it. You can ask questions about your policy and ask
questions about specific transactions live to figure out whether they would be in or
out of policy. But more interestingly, once you make a transaction, it's already doing
work to go and figure out, well, that transaction that you made at that restaurant, it looks big,
but if that was a dinner with 10 people, maybe it's not as bad as initially thought, and that's
actually in policy.
Well, that information is in your calendar, it's in your email, it's sometimes outside
of just the immediate context of a transaction.
So the agent will go out on the internet in some cases, contact vendors or pull data from APIs on your behalf
to really gather all that context
and make better decisions on behalf of the company.
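To make that concrete, here's a minimal sketch of the kind of context-aware policy check being described. Everything below is hypothetical: the dataclass, the per-person limit, and the calendar lookup are invented for illustration, not Ramp's actual implementation.

```python
# Hypothetical sketch of a context-aware policy check; not Ramp's actual code.
from dataclasses import dataclass

@dataclass
class Transaction:
    merchant: str
    amount_usd: float
    category: str

def lookup_calendar_attendees(txn: Transaction) -> int:
    """Stub: a real agent would search the cardholder's calendar and email
    for an event overlapping the transaction (e.g. a team dinner invite)."""
    return 10  # pretend we found a dinner with 10 attendees

def check_policy(txn: Transaction, per_person_limit_usd: float = 75.0) -> str:
    """A $600 restaurant charge looks out of policy on its face, but split
    across the attendees found in the calendar it may be fine."""
    if txn.category == "restaurant":
        attendees = lookup_calendar_attendees(txn)
        per_person = txn.amount_usd / max(attendees, 1)
        if per_person <= per_person_limit_usd:
            return f"in policy (${per_person:.2f}/person across {attendees} people)"
        return f"flag for review (${per_person:.2f}/person over the ${per_person_limit_usd:.0f} limit)"
    return "flag for review"

print(check_policy(Transaction("Some Restaurant", 600.0, "restaurant")))
# -> in policy ($60.00/person across 10 people)
```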
I wanna talk about, like,
the word agent and the decisions there,
like how agents are fitting into the different stack
of a tech company these days,
because there's kind of always been an agent
behind the scenes working.
You'd think of these as, like, cron jobs before.
It's like there is a long running process
that when a receipt comes in, it gets tagged.
And there has been for I think years,
I don't wanna share anything you can't,
but there's been an LLM interacting with receipt data
for a long time, but it's been fully agentic
in the sense that it was behind the scenes.
And so I've been thinking about this in the context
of like meta and some of the value that Zuck
is gonna be getting from having a frontier AI model.
It's like there's so many workloads inside a business
that has billions of users that just happens
behind the scenes and these are agents,
but they're like almost internal agents.
And so I'm wondering about your decision to,
yeah, position an agent as like,
this is a user-facing agent, versus something where
we're just going to have a process
that's running
behind the scenes entirely.
Well, there's a bit of a difference, right?
Because when you think about these processes
running behind the scenes, for the most part,
the code is pretty deterministic.
The tools are the same.
It's built for accuracy and auditability
and you have a high confidence.
You can trace back the path that the old school agent,
let's call it that, went through exactly.
And in this case, it's less deterministic.
You give the agent a set of tools.
You can tell the agent,
or you can essentially give it access to,
let's say, ability to call, ability to email,
and it can be like, go figure out a way to get the receipt,
here's what you know about the restaurant.
And it will browse the web and figure out
that that's the phone number of the restaurant,
and then try to call the restaurant,
and if that doesn't work,
then it will try to email the restaurant
until it achieves that goal of getting you the receipt,
or it fails and you can then
interact with it.
In this case, the instructions that we are giving the agent as we're building it are
very high level.
You're just giving it high level instruction and access to tools.
That's very different from the old way of building these processes in which you had to be very specific about all these paths.
So it would take a lot longer to build these systems, to debug them, to update them, etc.
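The contrast he's drawing, sketched under generic assumptions; a minimal sketch, where the tool stubs, the random failure, and the step limit are all invented for illustration.

```python
# Hypothetical sketch of deterministic vs. agentic receipt-chasing; all stubs invented.
import random

# Old school: every path is hand-coded, so the trace is fully auditable.
def search_inbox(txn: str):
    return None  # stub: pretend the receipt wasn't emailed

def query_vendor_api(txn: str):
    return f"receipt for {txn}"  # stub

def get_receipt_deterministic(txn: str):
    receipt = search_inbox(txn)          # step 1, always
    if receipt is None:
        receipt = query_vendor_api(txn)  # step 2, always next
    return receipt                       # the exact path is known in advance

# New style: a high-level goal plus a set of tools; the model picks the path at runtime.
def browse_web(txn: str):
    return None  # found the restaurant's phone number, but no receipt yet

def call_restaurant(txn: str):
    return None if random.random() < 0.5 else f"receipt for {txn}"

def email_restaurant(txn: str):
    return f"receipt for {txn}"

TOOLS = [browse_web, call_restaurant, email_restaurant]

def choose_next_tool(history):
    """Stub for the LLM's choice; a real agent would decide from context."""
    return TOOLS[len(history) % len(TOOLS)]

def get_receipt_agentic(txn: str, max_steps: int = 6):
    history = []
    for _ in range(max_steps):  # bounded, so it fails gracefully to a human
        tool = choose_next_tool(history)
        result = tool(txn)
        history.append((tool.__name__, result))
        if result is not None:
            return result
    return None

print(get_receipt_deterministic("dinner at Luigi's"))
print(get_receipt_agentic("dinner at Luigi's"))
```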
We lost you. Your background is turning like a ghost. It's very funny.
It's a super intelligence.
I think you just need a little bit more light on your face.
I think I actually lost power.
Wait, you lost power? There we go. I'm back.
That's wild. Much better. I want to talk to you about the data walls that are going up
and some of the battles that are playing out in the enterprise world because when I read
stories about companies that want to do enterprise search, you can see that, well,
maybe Google doesn't want you to be taking that,
maybe they want that for themselves.
Ramp's in a very different position,
but at the same time, there's just evolving policies
about how friendly, this is a classic with Amazon
not sending the itemized receipts to Gmail,
because they just didn't want to give
Google the data. But as a Ramp customer, I want the Amazon
details pulled in via Gmail via the Ramp integration.
So talk to me about like, how's the broader trend playing out?
And then how do you go to big companies and say, hey, like, you know, work with us,
our clients want to be able to pull data from your service, and we're not going to build
a delivery network, Amazon. So we're not a competitor for you.
Yeah, for sure. I mean, most of the data that we need at the end of the day is data that is quote-unquote owned by our users,
the businesses that are on Ramp and their employees. I think it's a little bit easier to operate
in the B2B space because, I guess, what governs who owns the data and whose data it is, is a lot clearer than in a lot of consumer applications. So in our
case, it's like, what data do we really need to know, in the
case of the agent that we just launched, to figure out whether
something's in policy or not? It's metadata about the
transaction, right? Like, what's in the receipt? At the end of the
day, your receipt, that's your receipt,
right?
We get that information.
You have information that we get through the networks, through Visa, the metadata about
the geographical location of the transaction, maybe whether it was an in-person transaction
or not.
There's data that's in your inbox, in your email, which again, like that information
is also owned by the company.
We haven't really encountered a lot of pushback and challenges. I found most of the challenges
in getting the data to be more like technical. How do you make sure you get it quickly, clean it up,
and get it accurately, as opposed to ones where there are third parties that are trying to make it harder and harder for us to access the
data. We had dealt with that at the previous company. Yeah, of course, we had a lot of
these problems. I mean, there were lots of funny moments at Paribus, our previous
company, where we were really building an agent for consumers to
help them save money on their online shopping, right? And we're trying to log
in on their behalf to Amazon accounts and Walmart accounts, etc. And of course,
they'll put blocks, they'll put captchas. And today those captchas seem like a joke.
I think any half-recent version of ChatGPT or Claude is able to solve those captchas very easily.
But that's one of the ways the internet's getting worse right now: the captchas are getting so hard and annoying.
You know, when we go to the gym in the morning,
Jordy has to log in, I get logged out, it takes like two minutes to get through the captchas at this gym.
It has like the most military-grade security for a gym login.
It just gives you a barcode that you just scan. But on that,
I'm actually interested because I can imagine
You know, Ramp has tens of thousands of customers, like high-value business customers, and other people that are building agents
I'm sure would love to actually be able to take actions on the Ramp platform. But at the same time,
you guys are trusted to handle the finance, basically the finance back office,
for these companies.
You don't want an agent hallucinating
and taking actions
on the Ramp platform.
So I'm curious how you see that dynamic playing out,
because I'm sure you've been approached by a lot of companies saying, hey, we're building this agent to do this
thing. We'd love to be able to get authorization.
Of course. We're thinking through that a lot right now. I think there are good ways of
exposing the right information to the right agent, as long as our customers are very aware of what
they're exposing and there are lots of interesting applications for us to work on.
In the case of any large purchase at a company, there are multiple parties within that company
that need to review it or approve it.
You want to review a certain vendor and look at their data protection policies.
You want to look at the legal agreements.
In some cases, you want to negotiate the price.
And you can imagine a day in the future
where a lot of our customers have an agentic tool
that they trust, or agents that they trust for legal work,
agents that they trust for IT work, etc.
And we're very interested in actually working with some of these companies, but we've got
to figure out on our end how we expose the right interface so that we're ensuring really
like the security of the data of our customers.
So it's an ongoing work stream.
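One plausible shape for "exposing the right interface" is an explicit allowlist per agent. This is a guess at the pattern, not Ramp's design; every name below is invented.

```python
# Hypothetical scoped-access check for third-party agents; all names invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentScope:
    agent_id: str
    allowed_actions: frozenset  # explicit allowlist; nothing is implicit

READ_ONLY_LEGAL = frozenset({"read_vendor_contracts", "read_data_protection_terms"})

def authorize(scope: AgentScope, action: str) -> bool:
    """Expose the right information to the right agent, and nothing more."""
    return action in scope.allowed_actions

legal_agent = AgentScope("acme-legal-agent", READ_ONLY_LEGAL)
print(authorize(legal_agent, "read_vendor_contracts"))  # True
print(authorize(legal_agent, "initiate_payment"))       # False: never in scope
```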
Last question for me. The
Grok 4 launch was very benchmark-heavy. It seems like,
you know, the consensus is that it's a good model.
And so as soon as I see that, it's now about cost per token.
And so I wanna hear from your perspective,
what drives decision making?
How big of a line item roughly,
or how much time is spent thinking about
LLM inference optimization at your scale?
Like roughly, how big of a deal is it?
And then what is the workflow to decide?
Can we use a cheaper model? How do we, do you have internal benchmarks?
Are you just checking these things?
Like how are you making decisions about which model to use for what problem?
Yeah, that's a great question. I mean,
I'm a lot more paranoid about being too slow to try the newest model and the latest
and greatest tools than I am by maybe overspending a little bit in one area.
I mean, the amount of time and money wasted at companies doing BS work is just insane.
If we're debating whether you can make something faster by spending an extra dollar or half
dollar, the value that we're able to create is so big that I don't worry about it too
much.
But we do have, internally, a somewhat imprecise stack ranking of the different places where we need to make
inference calls, and in some cases they're very simple, high volume, kind of low risk,
right, like you're trying to normalize or clean up some merchant data to figure out the
appropriate spelling and maybe the right, like, logo to use.
It's not the end of the world if it's not perfect.
We're doing it at high volume, it better be cheap.
So we have a kind of stack ranking of like,
this is something high volume where we need to be cheap.
This is something that's low volume and high stakes
where you need to be accurate.
And we'll generally try the newest and greatest models
in the places where we think they'll make
the biggest difference. And over time, like, we'll break up some workflows and some parts of them will become
cheaper, more repeatable with smaller versions or cheaper versions of the model and it will just
evolve. I mean, I remember, like, micro-optimizing every
single thing on our AWS account back in 2014, right? Like,
it was a lot harder back then. I think
we also pride ourselves on being the time-and-money
company. So we do care a lot about making sure that we don't
waste our own money and our own time. But I would say that the
TLDR is, like, our time and engineering
time is the most valuable thing here.
And I'm a lot more focused on that than anything else.
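As a toy illustration of that stack ranking, routing each inference workload by volume and stakes might look like this; the tier names and the volume threshold are made up for the example.

```python
# Hypothetical volume/stakes router; model tiers and the threshold are invented.
def pick_model(task: str, volume_per_day: int, stakes: str) -> str:
    """Route each inference workload by risk and volume, not one model for everything."""
    if stakes == "high":
        return "frontier-model"     # low volume, high stakes: pay for accuracy
    if volume_per_day > 100_000:
        return "small-cheap-model"  # high volume, low risk: it better be cheap
    return "mid-tier-model"         # in between: revisit as models get cheaper

print(pick_model("merchant_name_cleanup", volume_per_day=2_000_000, stakes="low"))
# -> small-cheap-model
print(pick_model("policy_decision", volume_per_day=500, stakes="high"))
# -> frontier-model
```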
Yeah. On the time issue, what do you think about
the various latency trade-offs? I'm sure
if an employee wants to know,
is this in policy, and you hit o3-pro and it waits 10 minutes,
they're probably just going to Slack their manager and ask them.
but you're going to get a really accurate answer. That's really detailed.
And so how do you think about those trade-offs in latency?
Yeah, I mean, it really depends on, like,
where in the workflow we're making the inference call, right?
Like if it's live in the interface and the user expects a quick answer,
we'll be using some of the faster models.
But the reality is, like, a lot of these agentic workflows that are being kicked
off at Ramp happen behind the scenes, right?
Like, you make a transaction,
you maybe get a very quick question from Ramp's AI to gather a little bit more
context. Like, that's enough.
And then from there we'll kick off another task
that can be a little bit slower,
that'll happen in the background.
And by the time it reaches a bottleneck,
or it'll reach a place where it needs additional feedback,
it'll be in someone else's notifications
or on someone else's Slack.
Like you could take a little bit of time
when the work is going from one person to another person
but less when it's, like, the same person interacting with the interface live.
That's generally the thinking. Cool.
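A toy version of that split, fast model inline while slow work runs in the background, using Python's asyncio; the sleeps stand in for model and tool calls, and all names are invented.

```python
# Hypothetical fast-inline / slow-background split; sleeps stand in for model calls.
import asyncio

async def quick_question(txn: str) -> str:
    """Live in the interface: the user is waiting, so use a fast model."""
    await asyncio.sleep(0.1)  # stand-in for a fast-model call
    return f"Was '{txn}' a team dinner? (tap yes/no)"

async def deep_review(txn: str) -> str:
    """Behind the scenes: can take minutes, surfaces later in notifications."""
    await asyncio.sleep(2.0)  # stand-in for a slower model plus tool calls
    return f"'{txn}' reviewed: in policy, receipt matched"

async def handle_transaction(txn: str) -> None:
    print(await quick_question(txn))              # answered immediately
    task = asyncio.create_task(deep_review(txn))  # slow work kicked off in background
    # ...the user moves on; the result lands in someone's Slack or notifications later
    print(await task)

asyncio.run(handle_transaction("dinner at Luigi's"))
```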
Yeah, I
tried some of the newest browsers today,
I tried Comet today, I tried Dia a couple weeks ago,
and I think what they're trying to do is incredibly cool,
but I often find myself thinking like,
damn, I wish this was a little bit faster.
And I know it's coming, but I think,
unlike some of the browser agentic calls,
you want it to be really fast.
Yeah, I was thinking about that in the context of the OpenAI
browser.
And unless they figure out something
that makes it basically 10 times as fast,
I'm still going to default to Chrome
if I have both of them open, just because I'm like, well,
I just really need a fast answer here.
So I kind of expect them to need it. I mean, that was the Chrome innovation, right?
Chrome won on speed.
Like, they just optimized the code and they nailed speed, and it was enough to leapfrog.
And so you could see, I mean, that's like the bull case for Apple coming from behind:
it feels like if xAI and Anthropic and OpenAI and
Gemini are all roughly at the frontier, if you can just get something that's at that frontier, not any new innovations,
but hyper-optimized, and it runs locally on your phone and it's spitting out like tons of tokens every second,
you have a product that would be very rapidly adopted.
Uh, it's exciting.
It matters a lot. I mean, I think one of the weirdest UX patterns on ChatGPT
now is that I have to do the work to figure out whether to use
o3 or 4o, every time.
Do I have 10 minutes, or do I want it now? And 4o is always...
it's so good that I usually don't need to,
but then I'm just like, well, I want the best, of course,
and, like, I'll come back to it.
And it's such a weird paradigm.
It's gonna be something that dates us.
And I just know our kids are gonna be like,
what did you have to do back then?
You had to rewind the VCR tape.
You had to put the disk in the Xbox.
You had to pick which model to use.
This is insane.
It's so legacy and it's going away,
but we're just in this weird, like,
we don't have a model router solved.
And it feels like the easiest thing is like,
which model should we use for this?
I don't know.
We'll see.
Yeah, and, I mean, I don't know if you guys remember,
I grew up in Lebanon, I still remember the days of dial-up,
where you would have to
kick everyone else off the phone line.
Well, exactly. Well, select the phone line, in that case. Like, okay,
which phone line am I going to use? Like, I don't know.
Can't you tell me which one is free and, like, pick it for me? Exactly.
Yeah. It seems like the easiest thing to do. And also, I mean, this is just, uh,
you know,
complaining about the app that I use 30 minutes a day
at least, ChatGPT, but I almost wish
I could just define it in the prompt
and just say, hey, use O3 Pro, and then here's the prompt,
as opposed to needing to click the UI,
change it, switch it, and then pick,
instead of just being able to go back and forth.
I don't know.
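What he's wishing for is essentially an inline directive that a router honors before dispatch. A toy parser follows; the "use:" prefix syntax is invented here, and ChatGPT doesn't actually offer this, per the conversation above.

```python
# Toy in-prompt model directive; the "use:" syntax is invented for illustration.
import re

DEFAULT_MODEL = "4o"  # hypothetical default

def route(prompt: str) -> tuple[str, str]:
    """If the prompt starts with 'use: <model>', honor it; else use the default."""
    match = re.match(r"use:\s*(\S+)\s*\n", prompt)
    if match:
        return match.group(1), prompt[match.end():]
    return DEFAULT_MODEL, prompt

print(route("use: o3-pro\nSummarize our Q2 expense anomalies."))
# -> ('o3-pro', 'Summarize our Q2 expense anomalies.')
print(route("What's 2+2?"))
# -> ('4o', "What's 2+2?")
```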
I mean, it's a good sign because people are using this stuff
so much that they're frustrated by these niche UI things.
So, it's an exciting time.
There's a lot of... I forgot who it was who posted on X,
I think it was a couple weeks ago,
that every company is one great UX breakthrough away
from something amazing.
And I think that will be true for a long time.
There's a lot of alpha right now in just great UX and good patterns. We haven't figured it out. We're still in maybe the terminal phase of personal computers, right? Like, when is the mouse going to come out? When are the right GUIs going to come out? There's a lot of that happening right now. And yeah, it's a fun time to be building.
One last question for me.
On Monday, Dwarkesh released an article
and then came on the show kind of talking
about his timelines around when an AI agent would
be able to do his taxes, right?
Sort of like basically fully agentic experience being like,
I want to do my 2025 taxes.
And then it just sort of autonomously runs.
How are, like, you know,
Fortune 500 CFOs, like, what are their timelines around this?
Maybe you just tell them what the timelines are.
Like, okay, by 2028, you know,
we're gonna be able to do this for you.
But how is the sort of finance arm of the C-suite
kind of anticipating the rate of advancement?
Obviously, the agent today is a step towards that future,
but you'll obviously need a variety of different agents.
Well, I think in terms of capabilities of LLMs,
we're there.
We have the capabilities.
The bottleneck on being able to do this today
is having the right context, right?
So some of that context is in my head,
so the AI needs to know to ask me the right questions efficiently so I can answer those,
right? Even when I'm working with my accountant.
Like, pick the best accountant in the world
for your personal taxes.
If you just tell them, like,
file my taxes, they can't do anything.
Maybe you tell them, file my taxes,
and here's access to my email,
they could do a little bit more,
but they can't get it fully.
So tell them, like, file my taxes,
here's access to my email, you can call my wife as much
as you want, you can look through my drawers, and give them more and more of these things.
Maybe it can do it, but it's going to get lost, it's going to take forever.
And really what we need to do even for businesses is like what are the right patterns for us
to extract context that's in people's heads,
organize it, get them comfortable with connecting different tools like your
inbox and things of that nature.
And I think in terms of tech and capabilities, we're there.
We're not really missing anything.
So there's a lot of UX there. Like, yeah, it
can email me a question and put it in my inbox, which is effectively my to-do list.
And that's what my accountant does when they don't have that access. They email me and say, hey, you've got to do this.
You can just take a picture of a product
there and ask it, if I buy this, you know, is it in policy? Yeah.
Yeah.
Well, thank you so much for stopping by
Great. Well, we'll definitely see you soon, Karim. This is great. Talk to you soon. Bye.
And that is the last of our guests, we are through them. In other news,
Periodic Labs. There's this scoop from Natasha Mascarenhas:
the startup being co-founded by Liam Fedus
and Ekin Dogus Cubuk, great names,
is in talks to raise hundreds of millions of dollars
in funding at above a $1 billion valuation.
The two-month-old startup is looking to apply AI
to physical science, starting with discovering novel materials.
Let's give it up for the two-month-old unicorn. Oh, we've got to have these guys on the show. That is extremely fast.
I also liked this post from David Perell, who we're getting back on the show ASAP.
We had a lot of fun talking to him a couple months ago.
He said I'm touring apartments in New York and just about every new build has the same soulless aesthetic, flat walls, white paint, no cornices, no ornamentation, just a room in
a box. Only one real estate agent said to me, if you want something with
character you're going to have to stick to pre-war buildings. Look, I'm all for
some efficiency gains, but we've created a world where new things are soulless
things, and that's not how a society as modern as ours
should function.
Intuitively, you'd think that a wealthier society
would build more beautiful things, but not ours.
And I completely agree.
What's crazy is that this isn't,
I mean, these apartments look nice,
but this continues all the way to $20 million houses
that are still bland.
And I think it's mostly because maybe time
and all the difficulties with permitting,
because if you even have the resources
to build something from scratch,
creating, okay, I want these ornaments,
and I want this, and I want something
that's really expressive
of my personality, well, now if you want that,
no one else wants that, so you have to build it
and you have to underwrite it,
and you're going to be underwater on it.
Make sure it's to code.
Make sure it's to code and then get it built
and then the secondary market value is gonna be less
because not everyone wants Hearst Castle.
Whereas if you build,
if everyone builds the exact same thing,
it's a perfectly liquid market
because every apartment is interchangeable
with every other.
So it's kind of a function of just like modernity,
but it's more a function of people not
ever just risking it on building a disaster project,
making their forever home.
People learned the lesson of William Randolph Hearst too well.
They should have just, like, never learned that lesson,
ripped it and just sent it, and just built something
that no one else will wanna buy
and will take decades to build.
That's always the best.
Well, I have a good place to end it.
Rob Petrozzo says the original Hermès Birkin bag prototype
just sold for $10 million at Sotheby's.
There was a two minute standing ovation.
He says bull market confirmed.
And a gong hit.
We love a bull market.
The original prototype.
Fascinating.
That's wild.
Makes sense.
Very cool.
It's incredible lore. And I would be excited for a bull market in
alternative assets; such as Birkins would be great, and
you should be too. But that's a great show, folks. We will be back tomorrow.
I cannot wait. We will talk to you tomorrow. Cheers. Bye.
