No Priors: Artificial Intelligence | Technology | Startups - The Data Foundry for AI with Alexandr Wang from Scale
Episode Date: May 22, 2024. Alexandr Wang was 19 when he realized that gathering data will be crucial as AI becomes more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started working with autonomous vehicles, they’ve expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems and a possible future where we can build better self-improvement loops, AI in the enterprise, and where human and AI intelligence will work together to produce better outcomes. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang (0:00) Introduction (3:01) Data infrastructure for autonomous vehicles (5:51) Data abundance and organization (12:06) Data quality and collection (15:34) The role of human expertise (20:18) Building trust in AI systems (23:28) Evaluating AI models (29:59) AI and government contracts (32:21) Multi-modality and scaling challenges
Transcript
Hi, listeners, and welcome to No Priors.
Today, I'm excited to welcome Alex Wang, who started Scale AI as a 19-year-old college dropout.
Scale has since become a juggernaut in the AI industry.
Modern AI is powered by three pillars, compute, data, and algorithms.
While research labs are working on algorithms, and AI chip companies are working on the
compute pillar, scale is the data foundry,
serving almost every major LLM effort,
including OpenAI, Meta, and Microsoft.
This is a really special episode for me,
given Alex started Scale in my house in 2016,
and the company has come so far.
Alex, welcome.
I'm so happy to be talking to you today.
Thanks for having me.
Known you all for quite some time, so excited to be on the pod.
Why don't we start at the beginning just for a broader audience?
Talk a little bit about the founding story of Scale.
Right before scale,
I was studying AI and machine learning at MIT.
And this was the year when DeepMind came out with AlphaGo,
when Google released TensorFlow,
so it was sort of maybe the beginning of the deep learning hype wave or hype cycle.
And I remember I was at college.
I was trying to use neural networks.
I was trying to train image recognition neural networks.
And the thing I realized very quickly is that these models were very much
so just a product of their data.
And I sort of played this forward and thought through it, and, you know, these models, or AI in general, is the product of, you know, three fundamental pillars.
There's the algorithms, the compute and computational power that goes into them, and the data.
And at that time, it was clear, you know, there were companies working on the algorithms, labs like OpenAI or Google's labs or, you know, a number of AI research efforts.
Nvidia was already a very clear leader in building compute for these AI systems,
but there was nobody focused on data.
And it was really clear that over the long arc of this technology,
data was only going to become more and more important.
And so in 2016, I dropped out of MIT, did YC,
and really started Scale to solve the data pillar of the AI ecosystem
and be the organization that was going to solve all the hard problems associated with,
how do you actually produce and create enough data
to fuel this ecosystem?
And really, this was the start of Scale as the data foundry for AI.
It's incredible foresight, because you describe it as like the beginning of the deep learning hype cycle.
I don't think most people had noticed that a hype cycle was going on yet.
And so I just distinctly remember, you know, you working through a number of early use cases, you know,
building this company in my house at the time and discovering, I think, far before anybody else noticed that,
the AV companies were spending all of their money on data.
How did you think about, like talk a little bit about how the business has evolved since then?
Because it's certainly not just that use case today.
AI is an interesting technology because it is at the core mathematical level such a general purpose technology.
It could be, you know, it's basically functions that can approximate nearly any function,
including like intelligence.
And so it can be applied in a very wide breadth of use cases.
And I think one of the challenges in building in AI over the past, you know, we've been at it for eight years now, has really been what are the applications that are gaining traction and how do you build the right infrastructure to fuel those applications?
So as an infrastructure provider, you know, we provide the data foundry for all these AI applications.
Our burden is to be thinking ahead as to where are the breakthrough use cases in AI going to be and how do we basically lay down the tracks before the
freight train of AI comes rolling through. We, you know, when we got started in 2016, this was the
very beginning of the autonomous vehicle sort of cycle. It was, I think, right when we were doing
YC was when Cruise got acquired, and it was sort of the beginning of, you know, the sort of the wave
of autonomous driving being one of the key tech trends. And I think that, you know, we followed
the early startup advice: you have to focus early on as a company. And so we built the very first
data engine that supported sensor fusion data, so supporting
a combination of 2D data plus 3D data, so LiDAR plus cameras that were built onto the
vehicles. And then that very quickly became an industry standard across all the players, you know,
working with folks like General Motors and Toyota and Stellantis and many others. The first few years
of the company were just focused on autonomous driving and a handful of other robotics use
cases, but that was sort of the prime-time AI use case. And then starting in about
2019, 2020, it was an interesting moment where it was actually pretty unclear where the
future of AI use cases, where AI applications, were going to come from. And this is,
obviously, pre-language model, pre-generative AI, and it was a period of high uncertainty. So we
then started focusing on government applications. That was one of the areas where it was
clear that there was high applicability and it was one of the areas that was becoming more and more
important globally. So we built the very first data engines to support government data. This was
mostly geospatial and satellite and other overhead imagery. This ended up fueling the
first AI program of record for the U.S. DoD and was sort of the start of our government business.
And that technology ended up being critical years later in the Ukraine conflict.
And then also around that time was when we started working on generative AI.
So we partnered with OpenAI at that time to do the very first experiments on RLHF on top of
GPT-2.
This was like the primordial days of RLHF.
And the models back then were really rudimentary.
Like, it truly did not seem like anything to us.
But we were just like, you know, OpenAI, there are a bunch of
smart people, we should work with them, we should partner with them. And so we partnered with
the team that originally invented RLHF. And then we basically continued innovating with them
from 2019 onwards, but we didn't think that much about the underlying technological
trend. You know, they integrated all of this technology into GPT-3, and there's a paper,
InstructGPT, which is kind of the precursor to ChatGPT, that we worked with them on. And then
ultimately, you know, in 2022, DALL-E 2 and ChatGPT rolled around, and we ended up focusing a lot of
our effort as a company into, how do we fuel the data for generative AI? How do we be the data foundry
for generative AI? And today, fast forward to today, our data foundry fuels basically
every major large language model in the industry. We work with OpenAI, Meta, Microsoft, many of the
other players, and partner with them very closely in fueling their AI development.
And in that timeframe, the ambitions of AI have just, you know, totally exploded.
I mean, we've gone from, you know, GPT-3, which I think was a landmark model, but,
you know, there was a modesty to GPT-3 at the time.
And now, you know, we're looking at building, you know, agents and very complex reasoning
capabilities, multi-modality, multilinguality.
I mean, the infrastructure that we have to build to support
all the directions that developers want to take this technology has been really staggering
and quite incredible.
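As a concrete aside on the RLHF work mentioned above: at its simplest, RLHF starts from human preference data, where annotators compare candidate responses to the same prompt. A minimal sketch of that data shape follows; the field names are invented for illustration, not OpenAI's or Scale's actual schema.

```python
# Illustrative shape of RLHF preference data: a human ranks two candidate
# responses to the same prompt, and these comparisons later train a reward
# model. Field names here are made up for illustration.
preference_example = {
    "prompt": "Explain photosynthesis to a 10-year-old.",
    "chosen": "Plants are like tiny chefs that cook food using sunlight...",
    "rejected": "Photosynthesis comprises light-dependent reactions and...",
}

def to_training_pairs(examples):
    """Flatten labeled comparisons into (prompt, better, worse) tuples."""
    return [(e["prompt"], e["chosen"], e["rejected"]) for e in examples]

pairs = to_training_pairs([preference_example])
print(len(pairs))  # 1
```

A reward model trained on many such pairs then scores new responses, which is what the policy is optimized against.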
Yeah.
You've basically surfed multiple waves of AI, and one of the big shifts happening right
now is that other types of parties are starting to engage with this technology.
So you're obviously now working with a lot of the technology giants, with government, with
automotive companies.
It seems like there's emergence now of enterprise customers and a platform for that.
There's emergence of sovereign AI.
How are you engaging with these other massive use cases that are coming now on the generative
AI side? It's quite an exciting time because I think for the first time in maybe the entire
history of AI, AI truly feels like a general purpose technology, which can be applied in, you know,
a very large number of business use cases. I contrast this to, you know, the autonomous vehicle
era where it really felt like we were building a very specific use case that happened to be
very, very valuable. Now it's general purpose and can be applied across a broad span.
And as we think about what the infrastructure requirements are to support this broad
industry, and what the broad arc of the technology is, it's really one where we think:
how do we empower data abundance? There's a question that comes up a lot, you know,
are we going to run out of tokens, and what happens when we do? And I think that's a choice.
I think we as an industry can either choose data abundance or data scarcity, and we view our role
and our job in the ecosystem to be to build data abundance. The key to the scaling of
these large language models, and these language models in general, is the
ability to scale data. And I think that one of the fundamental bottlenecks, you know, what's
in the way of us getting from GPT-4 to GPT-10, is data abundance. Are we going to
have the data to actually get there? And our goal is, you know, how do we ensure that we have
enough tokens to do that? And as a community, we've had easy data, which is all
the data on the internet. And we've kind of exhausted all the easy data. And now it's about,
you know, forward data production that has high supervisory signal that is basically very
valuable. And we think about this as, you know, frontier data production. And the kinds of data
that are really relevant and valuable to the models today, there's a, you know, the quality
requirements have just increased dramatically. It's not any more the case that these models can
learn that much more from, you know, various comments on Reddit or whatnot. They need
they need truly frontier data.
And what does this look like?
This is, you know, reasoning chain of thoughts
from the world's experts or from mathematicians
or physicists or biologists or chemists
or lawyers or doctors.
This is agent workflow data of agents in enterprise use cases
or in consumer use cases or even coding agents
and other agents like that.
This is multilingual data, so data that encompasses
the full span of, you know, the many, many languages
that are spoken in the world.
This includes all the multimodal data to your point,
like, you know, how do we integrate video data,
audio data, you know, start including more
of the esoteric data types that exist within enterprises
and within a lot of industrial use cases
into these models.
There's this very large mandate, I think, for our industry
to actually figure out what is the means of production
by which we're actually going to be able to generate
and produce more tokens to fuel the future of the
industry. And I think there are a few sources, or a few answers,
to this. The first is we need the best and brightest minds in the
world to be contributing data. One of the things I think is
actually quite interesting about this technology is that very smart
humans, so PhDs or doctors or lawyers or experts in all these various
fields, can have an extremely high impact on the future of
this technology by producing data that
ultimately feeds into the algorithms.
If you think about it, their work is one of the ways
that they can have a very scaled, society-level impact.
You know, there's an argument that you can make that producing high quality data for
AI systems is near infinite impact because, you know, even if you improve the model just
a little bit, if you were to integrate that over all of the future invocations of that
model, that's like a ridiculous amount of impact.
So I think that's something that's quite exciting.
It's kind of interesting because Google's original mission
was to organize the world's information
and make it universally accessible and useful.
And they would go and they would scan in books, right,
from library archives.
And they were trying to find different ways
to collect all the world's information.
And effectively, that's what you folks are doing
or helping others do.
You're effectively saying, where is all the expert knowledge
and how do we translate that into data
that can then be used by machines so that people can ultimately
use that information?
And that's super exciting.
It's exciting to the contributors who are in our network
as well, because I think, you know, there's obviously a monetary component and they're excited
to do this work, but there's a, there's a very meaningful motivation, which is how do I leverage
my expert knowledge and expert insight and use that to fuel this entire AI movement, which I think
is, is like a deep, you know, that's kind of like the deepest scientific motivation, which is
how do I use my knowledge and capability and intelligence to fuel humanity and progress and
knowledge going into the future.
I think it's a somewhat undervalued thing where it's going to age me, but like there was a
decade or so where like the biggest thing happening in technology was digitization of different
processes.
And I think there's actually some belief that like, oh, that's happened, right?
Like, you know, interactions are digital and like information is captured in relational database
systems on, you know, customers and employees or whatever.
But one of the big discoveries, as an investor in this field over the last five years,
has been that the data is not actually captured for almost any use case you might imagine
for AI, right? Because I have multiple companies, and I'm sure Elad does too, and you in your
personal investing, where, you know, the first six months of the company is a question of where
are we going to get this data. You go to many of the incumbent software and services vendors.
And despite having done this task, you know, for years, they have not actually captured the
information you'd want to teach a model. Yeah.
And like that, you know, that knowledge capture era, I think, is happening, and Scale is a really important part of it.
To make a Dune 2 analogy, I mean, I think data production really is very similar to spice production.
It will be the lifeblood of all the future of these AI systems.
And, you know, so I think best and brightest people is one key source.
Proprietary data is definitely a very important source as well.
You know, crazy stat, but J.P. Morgan's proprietary data set is 150 petabytes of data.
GPT-4 is trained on less than one petabyte of data.
So there's clearly so much data that exists within enterprises and governments that is
proprietary data that can be used for training incredibly powerful AI systems.
And then I think there's this key question of what's the future of synthetic data
and how synthetic data needs to emerge.
And our perspective is that the critical thing is what we call hybrid human AI synthetic data.
So how can you build hybrid human AI systems such that AI are doing a lot of the heavy lifting,
but human experts and people, you know, the basically best and brightest, the smartest people,
the sort of best at reasoning can contribute all of their insight and capability to ensure that you produce data that's of extremely high quality,
of high fidelity, to ultimately fuel the future of these models.
I want to pull this thread a little bit because something you and I were talking about,
both in the context of data collection and evals, is like, what do you do when the models are actually quite good?
Right, better than humans on many measured dimensions.
And so, like, can you talk about that from both the data and perhaps, you know, we should talk about evaluation as well?
I mean, I think philosophically, the question is not, is a model better than a human unassisted by a model?
The question is, does a human plus a model together produce better output than a model alone?
And I think that will be the case for a very, very, very long time, that human intelligence is
complementary to the machine intelligence that we're building, and they're going to be able to combine to do things that are strictly better than what the models are going to be able to do on their own.
I have this optimism.
Elad and I had a debate at one point that was challenging for me philosophically about whether or not Centaur play, or like machine and human intelligence, were complementary.
My simple case for this is when we look at the machine intelligence, like the models that are produced, you know, you see things that are really weird.
You know, there's like the ROT13 versus ROT8 thing, for example, where the models know how to do ROT13 but they don't know how to do ROT8; there's the reversal curse.
You know, there's all these artifacts that indicate somehow that it is not like human intelligence or not like biological intelligence.
And I think that's a, that's the bull case for humanity, which is that, you know, there are certain qualities and attributes of human intelligence, which are somehow distinct from the very separate and very different process by which we're training these algorithms.
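For context on that example: ROT-N rotates each letter N places through the alphabet, so ROT13 and ROT8 are the same trivial algorithm with a different constant. A quick sketch:

```python
def rot_n(text: str, n: int) -> str:
    """Rotate each alphabetic character n places (ROT13 when n == 13)."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + n) % 26 + base))
        else:
            out.append(ch)  # leave punctuation and spaces untouched
    return "".join(out)

print(rot_n("hello", 13))  # uryyb
print(rot_n("hello", 8))   # pmttw  (same algorithm, different constant)
```

A system that had truly learned the algorithm would handle any N equally well; models do better at ROT13 largely because ROT13-encoded text is common on the web, which is exactly the artifact being described.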
And so then I think, you know, what does this look like in practice?
It's, you know, if a model produces an answer or response, how can a human critique that
response to improve it?
How can a human expert, you know, highlight where there's factuality errors or where there's
reasoning errors to improve the quality of it?
How can the human aid in guiding the model over like a long period of time to produce reasoning
chains that are very, that are very correct and deep and are able to drive, you know,
the capability of these models forward?
And so I think there's a lot that goes into,
this is what we spend all of our time thinking about,
what is the human expert plus model teaming
that's going to help us keep pushing the boundary
of what the models are capable of doing.
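A hedged sketch of what that expert-plus-model teaming might produce as a training record; the schema is invented for illustration, not Scale's actual format.

```python
# Hypothetical record of a human expert critiquing a model response,
# flagging factuality and reasoning errors and supplying a revision.
critique_record = {
    "prompt": "Summarize the study's findings.",
    "model_response": "The study proves the drug cures the disease.",
    "critiques": [
        {"span": "proves", "type": "reasoning",
         "note": "Overclaims; the study shows correlation, not causation."},
        {"span": "cures", "type": "factuality",
         "note": "The endpoint was symptom reduction, not a cure."},
    ],
    "revised_response": "The study found the drug was associated with "
                        "reduced symptoms.",
}

def flags(record, error_type):
    """Return the critiques of a given error type."""
    return [c for c in record["critiques"] if c["type"] == error_type]

print(len(flags(critique_record, "factuality")))  # 1
```

Records like this carry supervisory signal in both directions: the revision is new training data, and the typed critiques tell you where the model is weak.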
How long do you think human expertise continues
to play a role in that?
So if I look at certain models,
Med-PaLM 2 would be a good example,
where Google released a model
where they showed that the model output
was better than the average physician's.
You could still get better output from a cardiologist,
but if you just ask a GP a cardiology question,
the model would do better, as ranked by physician experts.
So it showed that already, for certain types of capabilities,
the model provided better insights or output
than people who were trained to do some aspects of that.
How far do you think that goes in terms of,
or when do you think human expertise no longer is additive to these models?
Is that never?
Is it three years from now?
I'm sort of curious at the time frame.
I think it's never, because, you know,
the key quality of human intelligence or biological intelligence is this ability to reason
and optimize over very long time horizons.
And this is biological, right, because our goals as biological entities are to optimize
over, you know, our lifetimes, optimize for reproduction, et cetera.
So we have the ability, as human intelligences, to set long-term goals and continue
optimizing, adjusting, and reasoning over very long time horizons.
Current models don't have this capability, because the models are trained on these little
nuggets of human intelligence. They're very good at, like,
almost a shot glass full of human intelligence, but they're very bad at continuing
that intelligence over a long time period or a long time horizon. And so this fundamental
quality of biological intelligence, I think, is something that will only be taught to the models
over time through, you know, a direct transfer via data to fuel these models.
You don't think there's an architectural breakthrough in planning that solves it?
I think there will be architectural breakthroughs that improve performance dramatically,
but I think if you think about it inherently, like these models are not trained to optimize
over long time horizons in any way. And we don't have the environments to be able to get them
to optimize for these, like, amorphous goals over long time horizons. So I think this is a somewhat
fundamental limitation.
Before we talk about some of the cool releases,
you guys have coming out and what's next for scale,
maybe we can zoom out and just congratulate you
on the fundraise that you guys just did.
A billion dollars at almost a $14 billion valuation,
with really interesting investors: AMD, Cisco, Meta.
I want to hear a little bit about the strategics.
Our mission is to serve the entire AI ecosystem,
the broader AI industry.
You know, we're an infrastructure provider.
Our role is to be, as much as possible,
supporting the entire industry to flourish.
And we thought an important part of that was how can we be an important part of the
ecosystem and build as much ecosystem around this data foundry,
which is going to fuel the future of the industry as much as possible,
which is one of the reasons why we wanted to bring along, A,
other infrastructure providers like Intel and AMD, and folks who are
also laying the groundwork for the future of the technology,
but also, you know, key players in the industry like Meta.
Folks like Cisco as well. You know, our view is that ultimately there's the stack
that we think about: there's the infrastructure, there's the technology, and there's the application.
And our goal as much as possible is how do we leverage this data capability, this data foundry
to empower every layer of that stack as much as possible and build a broad
industry viewpoint around what's needed for the future of data.
I mean, I think that this is an exciting moment for us.
I mean, we see our role, you know, going back to the framing of what's holding us back from
GPT-10.
What's in the way from GPT-4 to GPT-10?
We want to be investing into actually enabling that pretty incredible technology journey.
And, you know, there's tens of billions, maybe hundreds of billions of dollars investment going
into the compute side of this equation.
And one of the reasons why we thought it was important to raise the money and continue investing
is, you know, there's real investment that's going to have to be made into the data production
to actually get us there.
With great power comes great responsibility.
If, you know, if these AI systems are what we think they are in terms of societal impact,
like trust in those systems is a crucial question.
Like how do you guys think about this as part of your work at scale?
A lot of what we think about is, how does the data foundry
enhance the entire AI life cycle, right?
And that life cycle goes from, you know,
A, ensuring that there's data abundance,
as well as data quality going into the systems,
but also being able to measure the AI systems,
which builds confidence in AI,
and also enables further development
and further adoption of the technology.
And this is the fundamental loop
that I think every AI company goes through.
You know, they get a bunch of data
or they generate a bunch of data,
they train their models, they evaluate those systems,
and they sort of, you know, go again in the loop.
And so evaluation
and measurement of the AI systems is a critical component of the life cycle, but also a critical
component I think of society being able to build trust in these systems. You know, how are
governments going to know that these AI systems are safe and secure and fit for, you know,
broader adoption within their countries? How are enterprises going to know that when
they deploy an AI agent or an AI system, it's actually going to be good for the consumers
and is not going to create greater risk for them? How are labs going to be able to
consistently measure the intelligence of the AI systems that they
build, and how do they make sure they continue to
develop responsibly as a result?
Can you give our listeners a little bit of
intuition for what makes evals hard?
One of the hard things is that,
because we're building systems where we're trying to approximate and
build human intelligence, grading one of these AI systems is not something
that's very easy to do automatically. It's sort of like, you know, you have
to kind of build IQ tests for these models, which in and of itself is a very fraught philosophical
question: how do you measure the intelligence of a system? And there are very practical
problems as well. So most of the benchmarks that we as a community look at are the academic
benchmarks. Yeah, the academic benchmarks, which are what the industry uses to measure the performance of
these algorithms, are fraught with issues. Many of the models are overfit on these benchmarks.
They're sort of in the training data sets of these models. And so...
You guys just did some interesting research here.
Yes.
Publish some.
Yep.
So one of the things we did is we published GSM1k, which was a held-out eval.
So we basically produced a new evaluation of the math capabilities of models that there's
no way would ever exist in the training data set, to really see how much of the performance
of the models was reported capability versus actual capability.
And what you notice is some of the models performed really well, but some of them performed
much worse than their reported performance.
And so this whole question of how we're actually going to measure these models is
a really tough one.
And our answer is we have to leverage the same human experts and kind of the best and brightest
minds to do expert evaluations on top of these models, to understand, you know, where
are they powerful, where are they weak, and what's the sort of, what are the sort of risks
associated with these models?
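The contamination problem being described can be pictured as a gap between a model's score on a public benchmark and its score on a freshly written held-out set. A toy sketch follows; the model names, accuracies, and threshold are all made up for illustration, not GSM1k results.

```python
def overfit_gap(public_acc: float, heldout_acc: float) -> float:
    """Positive gap suggests the public score overstates true capability."""
    return public_acc - heldout_acc

# Purely illustrative accuracies, not real model results.
reported = {"model-a": (0.92, 0.90), "model-b": (0.95, 0.78)}

for name, (public, heldout) in reported.items():
    gap = overfit_gap(public, heldout)
    verdict = "possible contamination" if gap > 0.05 else "consistent"
    print(f"{name}: gap={gap:.2f} -> {verdict}")
```

A model that genuinely has the capability should score about the same on both sets; a large gap is the signature of the benchmark leaking into training data.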
So, you know, one of the things that we're very passionate
about is that there needs to be public visibility and transparency into the performance
of these models. So there need to be leaderboards, there need to be evaluations that
are public that demonstrate in a very rigorous scientific way, what the performance of these
models are. And then we need to build the platforms and capabilities for governments, enterprises,
labs, to be able to do constant evaluation on top of these models to ensure that we're always
developing the technology in a safe way and we're always deploying it in a safe way. So
this is something that we think is, you know, just in the same
way that our roles in infrastructure provider is to support the data needs for the entire
ecosystem.
We think that building this layer of confidence in the systems through accurate measurement
is going to be fundamental to the further adoption and further development of the technology.
You want to talk about state of AI at the application layer?
Because you have a viewpoint into that that very few people do.
You know, after GPT-4 launched, there was sort of this frenzy of sort of an application
build out. And I think that there was, you know, there were all these like agent companies,
there was excitement around agents. There were all these, like, you know, a lot of applications
that were built out. And I actually think it's an interesting moment in the
life cycle of AI, which is that, you know, GPT-4, I think, as a model, was a little early
of a technology for us to have this entire hype wave around. And I think we, you know, the community
very quickly discovered all the limitations of GPT-4. But, you know, we all know GPT-4 is
not the terminal model that we are going to be using. There are better models on the
way. And so I think there's an element by which, you know, it's sort of a classic
hype cycle: GPT-4 came out, lots of hype around building applications around GPT-4, but it was
probably a few generations too early of a model for the thousand flowers to bloom.
And so with the coming models, I think we're going to come out of this sort of
trough of disillusionment, because the future models
are going to be so much more powerful and you're actually going to have all of the fundamental
capabilities you need to build agents or all sorts of incredible things on top of it.
And we think what we're very passionate about is how do we empower application builders,
so whether that be enterprises or governments or startups, to build self-improvement into the applications
that they build.
So what we see from the large labs like OpenAI and others is that self-improvement comes
from data flywheels. So how do you have a flywheel by which you're constantly, you know,
getting new data that improves your model, you're constantly evaluating that system to understand
where there are weaknesses, and you're sort of continually iterating this workflow.
We think that fundamentally every enterprise or government or startup is going to need to build
applications that have this self-improvement loop and cycle. And it's very hard to build.
And so, you know, we built this product, our GenAI Platform, to really lay the groundwork and the platform to enable the entire ecosystem to be able to build these self-improvement loops into their products as well.
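The data flywheel described above can be sketched as a simple loop; every function here is an illustrative placeholder, not Scale's actual API, and the "training" just counts examples so the shape of the cycle is visible.

```python
# Minimal sketch of the data-flywheel / self-improvement loop: collect data
# targeted at weak areas, train on it, evaluate to find the next weak areas.
# All functions are illustrative placeholders.

def collect_data(model, focus):
    # In practice: human experts produce frontier data for the weak areas.
    return [f"example targeting {area}" for area in focus]

def train(model, data):
    # In practice: fine-tune the model; here we just record the data seen.
    return {**model, "seen": model["seen"] + len(data)}

def evaluate(model):
    # In practice: held-out expert evals; returns the weakest domains.
    return ["reasoning"] if model["seen"] < 4 else []

def flywheel(model, rounds=3):
    focus = ["reasoning", "coding"]        # initial target areas
    for _ in range(rounds):
        data = collect_data(model, focus)  # 1. gather targeted data
        model = train(model, data)         # 2. train on it
        focus = evaluate(model)            # 3. eval to find weaknesses
        if not focus:                      # stop once no weak areas remain
            break
    return model

print(flywheel({"seen": 0}))
```

The important design point is that evaluation feeds the next round of collection, so each pass of the loop targets the model's current weaknesses rather than gathering data blindly.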
I was just curious. I mean, one thing related to that is you mentioned that, for example, J.P. Morgan has 150 petabytes of data,
which is, you know, 150 times what some early GPT models trained on.
How do you work with enterprises around those loops or what are the types of customer needs
that you're seeing right now or application areas?
One of the things that all the model developers understand well, but the
enterprises understand super well, is that, you know, not all data is created equal, and high-quality
data or frontier data can be, you know, 10,000 times more valuable than just
any run-of-the-mill data within an enterprise.
And so a lot of the challenge or a lot of the problems
that we solve with enterprises are,
how do you go from this giant mountain of data
that is truly all over the place
and distributed everywhere within the enterprise
to what are the, how do you can press that down
and filter it down to the high quality data
that you can actually use to fine tune or train
or continue to enhance these models
to actually drive differentiated performance?
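To make the "compress and filter" step concrete, here is a minimal sketch of a corpus-filtering pass. The quality heuristic is purely illustrative: production pipelines typically use trained quality classifiers or model-based scoring rather than the toy repetition/length score below.

```python
import hashlib

def quality_score(doc: str) -> float:
    """Toy quality heuristic: reward longer documents with less repetition.
    A real pipeline would use a trained quality classifier instead."""
    tokens = doc.split()
    if not tokens:
        return 0.0
    unique_ratio = len(set(tokens)) / len(tokens)  # penalize repetition
    length_bonus = min(len(tokens) / 100, 1.0)     # penalize very short docs
    return unique_ratio * length_bonus

def filter_corpus(docs, keep_fraction=0.1):
    """Dedupe exact copies, score every document, keep only the top slice."""
    seen, unique_docs = set(), []
    for d in docs:
        h = hashlib.sha256(d.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique_docs.append(d)
    ranked = sorted(unique_docs, key=quality_score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

corpus = [
    "the the the the",                          # repetitive, low quality
    "a short note",                             # too short to be useful
    "a short note",                             # exact duplicate
    " ".join(f"fact{i}" for i in range(120)),   # long and varied
]
kept = filter_corpus(corpus, keep_fraction=0.5)
```

Even this crude version shows the shape of the problem: most of the "mountain" is duplicates and low-value text, and a small curated slice is what actually feeds fine-tuning.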
I think one thing that's interesting is that there are some papers out of Meta which show that narrowing the amount of data you use actually creates better models. So the output is better, and the models are smaller, which means they're cheaper and faster to run. And so, to your point, it's really interesting, because a lot of people are sitting on these massive data sets and think all that data is really important. It sounds like you're really working with enterprises to narrow that down to the data that will actually improve the model.
It's almost that information theory question in some sense.
What are some of the launches that are coming from scale now?
You know, we're building evaluations for the ecosystem.
So one is that we're going to launch private held-out evaluations, with leaderboards associated with these evals, for the leading LLMs in the ecosystem.
And we're going to rerun this contest periodically. So every few months, we're going to do a new set of held-out evals to consistently benchmark and monitor the performance of these models, and continue adding more domains.
We're going to start with areas like math, coding, instruction following, and adversarial capabilities, and then over time continue increasing the number of areas that we test these models on.
We think about it as kind of like an Olympics for LLMs, but instead of every four years, it'll be every few months.
So that's one thing we're quite excited about.
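The scoring side of such a held-out leaderboard can be sketched in a few lines. All model names and numbers below are invented for illustration; in a real harness the counts would come from grading fresh, private prompts that no model could have trained on.

```python
# Hypothetical held-out eval results: model -> domain -> (correct, total).
results = {
    "model-a": {"math": (78, 100), "coding": (65, 100), "instructions": (90, 100)},
    "model-b": {"math": (82, 100), "coding": (60, 100), "instructions": (88, 100)},
}

def leaderboard(results):
    """Rank models by mean accuracy across eval domains."""
    scores = {
        model: sum(c / t for c, t in domains.values()) / len(domains)
        for model, domains in results.items()
    }
    # Highest mean accuracy first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for rank, (model, score) in enumerate(leaderboard(results), start=1):
    print(f"{rank}. {model}: {score:.3f}")
```

The "rerun every few months" part is then just regenerating `results` against a brand-new private prompt set, which is what keeps the benchmark from being contaminated by training data.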
And then we have an exciting launch coming
with our government customers.
So one of the things that we see in the government space, as they're trying to use these capabilities, is that there are actually a lot of cases where even the current agentic capabilities of the models can be extremely valuable to the government. It's often in pretty boring use cases, like writing reports, filling out forms, or pulling information from one place to another, but it's well within the capabilities of these models. And so we're excited about launching some agentic features for our government customers with our Donovan product.
Are these applications you build yourselves, or an application-building framework?
For our government customers, we basically build an AI staff officer. So it's a full application, but it integrates with whatever model our customers think is appropriate for their use case.
And do you think Scale will invest in that for enterprise applications in the future?
Our view for enterprises is fundamentally: for the applications that enterprises are going to build, how do we help them build self-improvement into those products? So we think about it much more at the platform level for enterprises.
Does the new OpenAI or Google release change your point of view on anything fundamental? Multi-modality, the applicability of voice agents, et cetera?
You know, I think you tweeted about this. But one very interesting element is the direction we're going in terms of consumer focus.
And it's fascinating. Taking a step back, first off, I think it points to where there are still huge data needs. Multimodality as an entire space is one where, for the same reasons that we've exhausted a lot of the internet data, there's a lot of scarcity for good multimodal data that can empower these personal agents and these personal use cases.
So as we want to keep improving these systems and these personal agent use cases, we think about this a lot: what are the data needs that are going to be required to actually fuel that?
I think the other thing that's fascinating is the convergence, actually. Both labs have been working independently on various technologies, and Astra, which is Google's major flagship release, as well as GPT-4o, are shockingly similar in their demonstrations of the technology.
And so I think it was very fascinating that the labs were converging on the same end use cases, or the same visionary use cases, for the technology.
I think there are two reads of that.
One is that there's an obvious technical next step here, and very smart people have independently arrived at it.
And the other is that competitive intelligence is pretty good.
Yeah, I think both are probably true.
I think both are true.
It's funny, because when I used to work on products at Google, we'd spend two years working on something, and then the week of launch somebody else would come out with something, we'd launch, and people would claim that we copied them. So I do think a lot of this stuff just happens to be where the whole industry is heading, and people are aware that multimodality is one of the really big areas. A lot of these things have years of work going into them. So it's kind of interesting to watch as an external observer.
Yeah. I mean, this is also not a training run that is a one-week copy effort, right?
Well, and then the last thing that I've been thinking a lot about is: when are we going to get smarter models? We got multimodality capability, and that's exciting, but it's more of a lateral expansion of the models. The industry needs smarter models. We need GPT-5, or we need Gemini 2, or whatever those models are going to be.
And so, to me, I was somewhat disappointed, because I just want much smarter models that are going to enable, as we mentioned before, way more applications to be built on top of them.
Well, the year is long; it's not the end of the year yet.
Okay, so quick fire, and Elad, chime in if you have ones. What's something you believe about AI that other people don't?
My biggest belief here is that the path to AGI is one that looks a lot more like curing cancer than developing a vaccine.
And what I mean by that is that to build AGI, you're going to have to solve a bunch of small problems, where you don't get that much positive leverage from solving one problem to solving the next. It's like curing cancer: you have to zoom into each individual cancer and solve them independently. And eventually, over a multi-decade time frame, we're going to look back and realize that we've built AGI, the same way we'll have cured cancer. But the path to get there will be this quite plodding road of solving individual capabilities and building individual data flywheels to support this end mission.
Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, get there, we'll solve it in one fell swoop.
And I think there are a lot of implications for how you actually think about the technology arc and how society is going to have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology, because it's going to be consistent, slow progress for quite some time, and society will have time to fully acclimate to the technology as it develops.
Solving, like, a problem at a time, right? If we pull away from the analogy a little bit, should I think of that as: the generality of multi-step reasoning is really hard? As: Monte Carlo Tree Search is not the answer people think it might be? That we're just going to run into scaling walls? What are the dimensions of solving multiple problems?
I think the main thing, fundamentally, is that there's very limited generality that we get from these models. Even for multimodality, for example, my understanding is that there's no positive transfer from learning in one modality to another modality.
So training off of a bunch of video doesn't really help you that much with your text problems, and vice versa.
And so I think what this means is that each niche of capabilities, each area of capability, is going to require separate data flywheels to be able to push through and drive performance.
You don't yet believe in video as a basis for a world model that helps.
I think it's a reasonable idea, and it's a great narrative, but I don't think there's strong scientific evidence of that yet. Maybe there will be eventually. But I think the base case, let's say, is one where there's not that much generalization coming out of the models, and so we actually just need to slowly solve lots and lots of little problems to ultimately result in AGI.
One last question for you, as the leader of Scale, a scaling organization: what are you thinking about as a CEO?
This will almost sound cliché, but just how early we are in this technology.
I mean, it's strange, because on the one hand it feels like we're so late: the tech giants are investing so much, there are a bajillion launches all the time, and there's all sorts of investment into this space.
Markets look crowded in the obvious use cases.
Yeah, exactly.
Markets look super crowded, but I think fundamentally we're still super early, because the technology is one one-hundredth or one one-thousandth of its future capability.
And as we, as a community, as an industry, and as a society, ride that wave, there are just so many more chapters of the book.
And so, for any organization, what we think about a lot is nimbleness: how do we ensure that as this technology continues to develop, we're able to continue adapting alongside it?
All right. That's a great place to end. Thanks so much for joining us today.
Yeah. Thanks, Alex.
Thank you.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.