No Priors: Artificial Intelligence | Technology | Startups - The Data Foundry for AI with Alexandr Wang from Scale

Episode Date: May 22, 2024

Alexandr Wang was 19 when he realized that gathering data will be crucial as AI becomes more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started working with autonomous vehicles, they've expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems and a possible future where we can build better self-improvement loops, AI in the enterprise, and where human and AI intelligence will work together to produce better outcomes.

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang

(0:00) Introduction
(3:01) Data infrastructure for autonomous vehicles
(5:51) Data abundance and organization
(12:06) Data quality and collection
(15:34) The role of human expertise
(20:18) Building trust in AI systems
(23:28) Evaluating AI models
(29:59) AI and government contracts
(32:21) Multi-modality and scaling challenges

Transcript
Starting point is 00:00:00 Hi, listeners, and welcome to No Priors. Today, I'm excited to welcome Alex Wang, who started Scale AI as a 19-year-old college dropout. Scale has since become a juggernaut in the AI industry. Modern AI is powered by three pillars: compute, data, and algorithms. While research labs are working on algorithms, and AI chip companies are working on the compute pillar, Scale is the data foundry, serving almost every major LLM effort, including OpenAI, Meta, and Microsoft.
Starting point is 00:00:38 This is a really special episode for me, given Alex started Scale in my house in 2016, and the company has come so far. Alex, welcome. I'm so happy to be talking to you today. Thanks for having me. It's been quite some time, so excited to be on the pod. Why don't we start at the beginning just for a broader audience?
Starting point is 00:00:56 Talk a little bit about the founding story of Scale. Right before Scale, I was studying AI and machine learning at MIT. And this was the year when DeepMind came out with AlphaGo and Google released TensorFlow, so it was sort of maybe the beginning of the deep learning hype wave or hype cycle. And I remember I was at college. I was trying to use neural networks.
Starting point is 00:01:17 I was trying to train image recognition neural networks. And the thing I realized very quickly is that these models were very much just a product of their data. And I sort of played this forward and thought through it, and, you know, these models, or AI in general, are the product of three fundamental pillars: the algorithms; the compute, meaning the computational power that goes into them; and the data. And at that time, it was clear there were companies working on the algorithms, labs like OpenAI or Google's labs or a number of other AI research efforts. Nvidia was already a very clear leader in building compute for these AI systems. But there was nobody focused on data.
Starting point is 00:02:00 And it was really clear that over the long arc of this technology, data was only going to become more and more important. And so in 2016, I dropped out of MIT, did YC, and really started Scale to solve the data pillar of the AI ecosystem, and be the organization that was going to solve all the hard problems associated with how do you actually produce and create enough data to fuel this ecosystem. And really, this was the start of Scale as the data foundry for AI.
Starting point is 00:02:32 It's incredible foresight, because you describe it as like the beginning of the deep learning hype cycle. I don't think most people had noticed that a hype cycle was even going on yet. And I just distinctly remember, you know, you working through a number of early use cases, building this company in my house at the time, and discovering, I think, far before anybody else noticed, that the AV companies were spending all of their money on data. Talk a little bit about how the business has evolved since then, because it's certainly not just that use case today. AI is an interesting technology because it is at the core mathematical level such a general purpose technology.
Starting point is 00:03:13 It could be, you know, it's basically functions that can approximate nearly any function, including intelligence. And so it can be applied in a very wide breadth of use cases. And I think one of the challenges in building in AI over the past eight years (we've been at it that long now) has really been: what are the applications that are gaining traction, and how do you build the right infrastructure to fuel those applications? So as an infrastructure provider, we provide the data foundry for all these AI applications. Our burden is to be thinking ahead as to where the breakthrough use cases in AI are going to be, and how do we basically lay down the tracks before the freight train of AI comes rolling through. When we got started in 2016, this was the very beginning of the autonomous vehicle cycle. It was, I think, right when we were doing
Starting point is 00:04:09 YC was when Cruz got acquired, and it was sort of the beginning of, you know, the sort of the wave of autonomous driving being one of the key tech trends. And I think that, you know, we followed the early startup advice. You have to focus early on as a company. And so we, we, we followed the early on as a company. And so we built the very first data engine that supported sensor fuse data, so support a combination of 2D data plus 3D data, so LIDARs plus cameras that were built onto the, onto the vehicles. And then that very quickly became an industry standard across all the players, you know, working folks like General Motors and Toyota and Stalantis and many others. In the first few years of the company were just focused on autonomous driving and a handful of other robotics use
Starting point is 00:04:52 cases, but that was the, that was sort of the, the prime time AI use case. And then starting in about 2019, 2020, it was an interesting moment where it was actually pretty unclear where the future of, you know, AI use cases where AI applications were going to come. And this is, obviously, pre-language model, pre-generative AI, and it was a period of high uncertainty. So we then started focusing on government applications. That was one of the areas where it was clear that there was high applicability and it was one of the areas that was becoming more and more important globally. So we built the very first data engines to support government data. This was support mostly geospatial and satellite and other overhead imagery. This ended up fueling the
Starting point is 00:05:38 first AI program of record for the U.S. DoD and was sort of the start of our government business. And that technology ended up being critical years later in the Ukraine conflict. And then also around that time was when we started working on generative AI. So we partnered with Open AI at that time to do the very first experiments on RLHF on top of GPD2. This was like the primordial days of RLHF. And the models back then were really rudimentary. Like they didn't, they truly, it did not seem like anything to us.
Starting point is 00:06:14 But we were just like, you know, open AI, there are a bunch of things. smart people, we should work with them, we should partner with them. And so we, we partner with a team that, that originally invented RLHF. And then we basically continued innovating with them from 2019 onwards, but we didn't think that much about the, the underlying technological trend. You know, they integrated all of this technology into GPD3 with a, there's a paper instruct GBT, which is kind of the precursor to chat GPD that we worked with them on. And then ultimately, you know, in 2022, Dolly 2 and Chat Chip-D rolled around, and we ended up focusing a lot of our effort as a company into how do we fuel the data for gendered AI? How do we be the data foundry
Starting point is 00:06:57 for gendered AI? And today, you know, fast forward to today, our data foundry fuels basically every major large language model in the industry, work with Open AI, Meta, Microsoft, many of the other players, partner with them very closely in fueling their AI development. And in that timeframe, the ambitions of AI have just, you know, totally exploded. I mean, we've gone from, you know, GPD3, I think was, it was a landmark model, but it was, you know, there was a modesty to GPD3 at the time. And now, you know, we're looking at building, you know, agents and very complex reasoning capabilities, multi-modality, multi-legality.
Starting point is 00:07:36 I mean, the infrastructure that we have to build is to support all. all the directions that developers want to take this technology has been really staggering and quite incredible. Yeah. You've basically surfed multiple waves of AI, and one of the big shifts is happening right now is there's other types of parties that are starting to engage with this technology. So you're obviously now working with a lot of the technology giants, with government, with automotive companies.
Starting point is 00:08:01 It seems like there's emergence now of enterprise customers and a platform for that. There's emergence of sovereign AI. How are you engaging with these other massive use cases that are coming now in the general of AI side? It's quite an exciting time because I think for the first time in maybe the entire history of AI, AI truly feels like a general purpose technology, which can be applied in, you know, a very large number of business use cases. I contrast this to, you know, the autonomous vehicle era where it really felt like we were building a very specific use case that happened to be very, very valuable. Now its general purpose can be, it can be encompassed across the broad span.
Starting point is 00:08:35 And as we think about, what are the infrastructure requirements to support this broad industry and what is the what is the broad arc of the technology it's really one where we think how do we empower data abundance right there's a there's a question that comes up a lot you know are we going to run out of tokens and and what happens when we do and i think that that's a choice i think we as an industry can either choose data abundance or data scarcity um and we view our role in our job in the ecosystem to be to build data abundance um the key to the scaling of the these large language models and the, you know, these, these language models in general is the ability to scale data. And I think that one of the fundamental bottlenecks to, you know, what's,
Starting point is 00:09:21 what's in the way of us getting from GPD4 to GPD 10 is, you know, data abundance. Are we going to have the data to actually get there? And our goal is, you know, how do we, how do we ensure that we have enough tokens to do that? And we've sort of, as a community, we have, we've had easy data, which is all the data on the internet. And we've kind of exhausted all the easy data. And now it's about, you know, forward data production that has high supervisory signal that is basically very valuable. And we think about this as, you know, frontier data production. And the kinds of data that are really relevant and valuable to the models today, there's a, you know, the quality requirements have just increased dramatically. It's not any more the case that these models can
Starting point is 00:10:03 learn that much more from, you know, various comments on Reddit or whatnot. They need they need truly frontier data. And what does this look like? This is, you know, reasoning chain of thoughts from the world's experts or from mathematicians or physicists or biologists or chemists or lawyers or doctors. This is agent workflow data of agents in enterprise use cases
Starting point is 00:10:25 or in consumer use cases or even coding agents and other agents like that. This is multilingual data, so data that encompasses the full span of, you know, the many, many languages that are spoken in the world. This includes all the multimodal data to your point, like, you know, how do we integrate video data, audio data, you know, start including more
Starting point is 00:10:49 of the esoteric data types that exist within enterprises and this exists within a lot of industrial use cases into these models. There's this very large mandate, I think, for our industry to actually figure out what is the means of production by which we're actually going to be able to generate and produce more tokens to fuel the future of the industry and I think there's there's a few sources or there's a few answers
Starting point is 00:11:12 to this so the first is we need we need the best and brightest minds in the world to be contributing data I think it's one of the things I think is actually quite interesting about this technology is you know very smart humans so GHD's or doctors or lawyers or experts in all these various fields actually have a can have an extremely high impact into the future of this technology by producing data that will ultimately feeds into the algorithms. If you think about it's actually their work is one of the ways
Starting point is 00:11:43 that they can have a very scaled society level impact. You know, there's an argument that you can make that producing high quality data for AI systems is near infinite impact because, you know, even if you improve the model just a little bit, if you were to integrate that over all of the future invocations of that model, that's like a ridiculous amount of impact. So I think that's something that's quite exciting. It's kind of interesting because Google's original mission was to organize a world's information
Starting point is 00:12:10 and make it universally accessible and useful. And they would go and they would scan in books, right, from library archives. And they were trying to find different ways to collect all the world's information. And effectively, that's what you folks are doing or helping others do. You're effectively saying, where is all the expert knowledge
Starting point is 00:12:27 and how do we translate that into data that can then be used by machines so that people can ultimately use that information? And that's super exciting. It's exciting to the contributors who are in our network. as well because I think, you know, that there's obviously a monetary component and they're excited to do this work, but there's a, there's a very meaningful motivation, which is how do I leverage my expert knowledge and expert insight and use that to fuel this entire AI movement, which I think
Starting point is 00:12:55 is, is like a deep, you know, that's kind of like the deepest scientific motivation, which is how do I use my knowledge and capability and intelligence to fuel humanity and progress and knowledge going into the future. I think it's a somewhat undervalued thing where it's going to age me, but like there was a decade or so where like the biggest thing happening in technology was digitization of different processes. And I think there's actually some belief that like, oh, that's happened, right? Like, you know, interactions are digital and like information is captured in relational database
Starting point is 00:13:28 systems on, you know, customers and employees or whatever. But one of the big discoveries as a investor in this field over the last five years, has been like the data is not actually captured for almost any use case you might imagine for AI, right? Because I have multiple companies, and I'm sure a lot does too. And you and your personal investing where, you know, the first six months of the company is a question of where are we going to get this data. You go to many of the incumbent software and services vendors. And despite having done this task, you know, for years, they have not actually captured the information you'd want to teach a model. Yeah.
Starting point is 00:14:05 And like that, you know, that knowledge capture era, I think is happening in skill is a really important part. To make a Dune 2 analogy, I mean, I think it really is, you know, data production is very similar to spice production. It is the, it will be the lifeblood of all the future of these AI systems. And, you know, so I think best and brightest people is one key source. Proprietary data is definitely a very important source as well. You know, crazy staff, but J.P. Morgan's proprietary data set is 150 petabytes of data. GPD4 is trained on less than one petabyte of data. So there's clearly so much data that exists within enterprises and governments that is
Starting point is 00:14:45 proprietary data that can be used for training incredibly powerful AI systems. And then I think there's this key question of what's the what's the future of synthetic data and how synthetic data needs to emerge. And our perspective is that the critical thing is what we call hybrid human AI synthetic data. So how can you build hybrid human AI systems such that AI are doing a lot of the heavy lifting, but human experts and people, you know, the basically best and brightest, the smartest people, the sort of best at reasoning can contribute all of their insight and capability to ensure that you produce data that's of extremely high quality, of high fidelity to ultimately feel the future of these models.
Starting point is 00:15:26 I want to pull this thread a little bit because something you and I were talking about, both in the context of data collection and evals, is like, what do you do when the models are actually quite good? Right, better than humans on many measured dimensions. And so, like, can you talk about that from both the data and perhaps, you know, we should talk about evaluation as well? I mean, I think philosophically, the question is not, is a model better than a human unassisted from a model? The question is, is a human plus a model together when people produce better output than a model alone? And I think that will be the case for a very, very, very long time, that humans are still, you know, human intelligence is, complementary to machine intelligence that we're building, and they're going to be able to combine to build, you know, to do things that are strictly better than what the models are going to be able to do on their own.
Starting point is 00:16:13 I have this optimism. A lot and I had a debate at one point that was challenging for me philosophically about whether or not Centaur play or like machine and human intelligence were complementary. My simple case for this is when we look at the machine intelligence, like the models that are produced, you know, we always, you know, you see things that are really weird. You know, there's like the rot 13 versus rot 8 thing, for example, where the models know how to do rot 13, they don't know how to do rot 8, there's the reversal curse. You know, there's all these artifacts that indicate somehow that it is not like human intelligence or not like biological intelligence. And I think that's a, that's the bull case for humanity, which is that, you know, there are certain qualities and attributes of human intelligence, which are somehow distinct from the very separate and very different process by which we're training these algorithms. And so then I think, you know, what does this look like in practice? It's, you know, if a model produces an answer or response, how can a human critique that
Starting point is 00:17:11 response to improve it? How can a human expert, you know, highlight where there's factuality errors or where there's reasoning errors to improve the quality of it? How can the human aid in guiding the model over like a long period of time to produce reasoning chains that are very, that are very correct and deep and are able to drive, you know, the capability of these models forward? And so I think there's a lot that goes into, this is what we spend all of our time thinking about,
Starting point is 00:17:35 what is the human expert plus model teaming that's going to help us keep pushing the boundary of what the models are capable of doing. How long do you think human expertise continues to play a role in that? So if I look at certain models, Med Palm 2 would be a good example, where Google released a model
Starting point is 00:17:50 where they showed that the model output was better than the average position. You could still get better output from a cardiologist, but if you just ask a GP a cardiology question, the model would do better, as ranked by physician experts. So it showed that already, for certain types of capabilities, the model provided better insights or output than people who were trained to do some aspects of that.
Starting point is 00:18:13 How far do you think that goes in terms of, or when do you think human expertise no longer is additive to these models? Is that never? Is it three years from now? I'm sort of curious at the time frame. I think it's never, because I think that, you know, So the key quality of human intelligence or biological intelligence is this ability to reason and optimize over a very long time horizons.
Starting point is 00:18:36 So, and this is biological, right, because our goals as biological entities is to optimize over, you know, our lifetimes, optimize for reproduction, et cetera. So we have the ability as human intelligence is to produce long-term goals, continue optimizing, adjusting, and reasoning over very long, very long time horizons. current models don't have this capability because the models are trained on these like little nuggets of human intelligence so they're they're trained you know they're very good at like almost like a a like a shot glass full of human intelligence but they're very bad at continuing that intelligence over a long time period or a long time horizon and so this this fundamental
Starting point is 00:19:19 quality of biological intelligence i think is something that will only be taught to the model over time through, you know, through a direct transfer via data to fuel these models. You don't think there's a like a architectural breakthrough in planning that solves it? I think there will be architectural breakthroughs that improve performance dramatically, but I think if you think about it inherently, like these models are not trained to optimize over long time horizons in any way. And we don't have the environments to be able to get them to optimize for these like, you know, amorphous goals over long time horizons. So I think this is a somewhat but fundamental limitation.
Starting point is 00:19:55 Before we talk about some of the cool releases, you guys have coming out and what's next for scale, maybe we can zoom out and just congratulate you on the fundraise that you guys just did. A billion dollars at almost 14 billion in valuation with a really interesting investors, AMD, Cisco, Meta, I want to hear a little bit about the strategics. Our mission is to serve the entire AI ecosystem
Starting point is 00:20:21 the broader AI industry. You know, we're an infrastructure provider. That's our role is to be as much as possible, supporting the entire industry to flourish as much as possible. And we thought an important part of that was how can we be an important part of the ecosystem and build as much ecosystem around this data foundry, which is going to fuel the future of the industry as much as possible, which is one of the reasons why we wanted to bring along, A,
Starting point is 00:20:48 other infrastructure providers like Intel and AMD and folks who are also also laying the groundwork for the future of the technology, but also, you know, key players in the industry like meta. Folks like Cisco as well, you know, our view is that ultimately there's the stack that we think about the, there's the infrastructure, there's the technology, and there's the application. And our goal as much as possible is how do we leverage this data capability, this data foundry to empower every layer of that stack as much as possible and build a broad industry viewpoint around what's needed for the future of data.
Starting point is 00:21:25 I mean, I think that this is an exciting moment for us. I mean, we see our role, you know, going back to the framing of what's holding us back from GPD 10. What's in the way from GPD4 is GPD 10? We want to be investing into actually enabling that pretty incredible technology journey. And, you know, there's tens of billions, maybe hundreds of billions of dollars investment going into the compute side of this equation. And one of the reasons why we thought was important to raise the money and continue investing
Starting point is 00:21:54 is, you know, there's real investment that's going to have to be made into the data production to actually get us there. With great power comes great responsibility. If, you know, if these AI systems are what we think they are in terms of societal impact, like trust in those systems is a crucial question. Like how do you guys think about this as part of your work at scale? A lot of what we think about is how do we utilize, how does the data foundry, um, enhance the entire AI lifecycle, right?
Starting point is 00:22:22 And that life cycle goes from, you know, A, ensuring that there's data abundance, as well as data quality going into the systems, but also being able to measure the AI systems, which builds confidence in AI, and also enables further development and further adoption of the technology. And this is the fundamental loop
Starting point is 00:22:39 that I think every AI company goes through. You know, they get a bunch of data or they generate a bunch of data, they train their models, they evaluate those systems, and they sort of, you know, go again in the loop. And so evaluation, and measurement of the AI systems is a critical component of the life cycle, but also a critical component I think of society being able to build trust in these systems. You know, how are
Starting point is 00:23:00 governments going to know that these AI systems are safe and secure and fit for, you know, broader adoption within their countries? How do, how are enterprises going to know that when they deploy an AI agent or an AI system that it's actually going to be good for the consumers and there's not going to create greater risk for them? How do, how are labs going to be able to consistently measure what are the intelligences of my of the AI systems that we build and how are we going to you know how do they make sure they continue to develop responsibly as a result can you give our listeners a little bit of intuition for like what makes Eval's hard one of the hard things that you know
Starting point is 00:23:33 because we're building systems that we're trying to approximate and and build human intelligence grading one of these AI systems is is not something that's very easy to do automatically and it's it's sort of like you know you have to kind of build IQ tests for these models, which in and of itself is a very fraught philosophical questions, like how do you measure the intelligence of a system? And there's a very practical problems as well. So most of the benchmarks that we as a community look at for the academic benchmarks. Yeah, the academic benchmarks that are what the industry used to measure the performance these algorithms are fraught with issues. Many of the models are overfit on these benchmarks.
Starting point is 00:24:13 They're sort of in the training data sets of these models. And so- You guys just did some interesting research here. Yes. Publish some. Yep. So we, one of the things we did is we published DSM 1K, which was a held-out e-val. So we basically produced a new evaluation of the math capabilities of models that there's no way it would ever exist in the training data set to really see how much of the performance
Starting point is 00:24:36 of the models, were the reported performance of the model capability versus the actual capability. And what you notice is some of the models performed really well, but some of them perform much worse than the reported performance. And so this whole question of how we decide are actually going to measure these models is a really tough one. And our answer is we have to leverage the same human experts and kind of the best and brightest minds to do expert evaluations on top of these models, to understand, you know, where are they powerful, where are they weak, and what's the sort of, what are the sort of risks
Starting point is 00:25:08 associated with these models? So, you know, one of the things that we're very, you know, we're going to, we're very passionate about is there needs to be sort of public visibility and transparency into the performance of these models. So there need to be leaderboards, there need to be evaluations that are public that demonstrate in a very rigorous scientific way, what the performance of these models are. And then we need to build the platforms and capabilities for governments, enterprises, labs, to be able to do constant evaluation on top of these models to ensure that we're always developing the technology in a safe way and we're always deploying it in a safe way. So
Starting point is 00:25:42 this is something that we think is, you know, just in the same way that our roles in infrastructure provider is to support the data needs for the entire ecosystem. We think that building this layer of confidence in the systems through accurate measurement is going to be fundamental to the further adoption and further development of the technology. You want to talk about state of AI at the application layer? Because you have a viewpoint into that that very few people do. You know, after GPD4 launched, there was sort of this frenzy of sort of an application
Starting point is 00:26:14 build out. And I think that there was, you know, there were all these like agent companies, there was excitement around agents. There was all these, like, you know, a lot of applications that were built out. And I actually think it's, it's an interesting moment in the, in the, in the life cycle of AI, which is that, you know, GPD4, I think was as a model was a little early of a technology for us to have this entire hype wave around. And I think we, you know, the community very quickly discovered all the limitations of GPD4. But, you know, we all know, GPD4 is not the terminal model that AI that we are going to be using. There are better models on the way. And so I think there was a, there's an element by which, you know, it's sort of a classic
Starting point is 00:26:53 hype cycle, GPD4 came out, lots of hype around building applications around GPD4, but it was, it was probably a few generations too early of a model to, for the thousand flowers to bloom. And so I think in the coming models, we're going to see, we're going to, this sort of like trough of disillusionment, I think we're going to come out of because the next, the future models are going to be so much more powerful and you're actually going to have all of the fundamental capabilities you need to build agents or all sorts of incredible things on top of it. And we think what we're very passionate about is how do we empower application builders, so whether that be enterprises or governments or startups, to build self-improvement into the applications
Starting point is 00:27:36 that they build. So what we see from the large labs like Open Eye and others is that self-improvement comes from data flywheels. So how do you have a flywheel by which you're constantly, you know, getting new data that improves your model. You're constantly valuing that system to understand where there's weaknesses. And you're sort of like continually hydrating this, this workflow. We think that fundamentally every enterprise or government or startup is going to need to build applications that have this self-improvement loop and cycle. And it's very hard to build. And so, you know, we built this product, our Gen.A.I. Platform to really build, you know, lay the groundwork and the platform to enable the entire ecosystem to be able to build these cell phone improvement loops into their products as well as possible.
Starting point is 00:28:23 I was just curious. One thing related to that is you mentioned that, for example, J.P. Morgan has 150 petabytes of data, which is, you know, 150 times what some early GPT models trained on. How do you work with enterprises around those loops, and what are the types of customer needs or application areas that you're seeing right now? One of the things that all the model developers understand well, and the enterprises understand super well, is that not all data is created equal, and high-quality data, or frontier data, can be 10,000 times more valuable than just any run-of-the-mill data within an enterprise.
Starting point is 00:28:59 And so a lot of the problems that we solve with enterprises are: how do you go from this giant mountain of data, which is truly all over the place and distributed everywhere within the enterprise, and compress and filter it down to the high-quality data that you can actually use to fine-tune or train
Starting point is 00:29:22 or continue to enhance these models to actually drive differentiated performance? I think one thing that's interesting is that there are some papers out of Meta which basically show that narrowing the amount of data that you use creates better models. The output is better, and the models are smaller, which means they're cheaper and faster to run. And so to your point, it's really interesting, because a lot of people are sitting on these massive data sets and think all that data is really important. It sounds like you're really working with enterprises to narrow that down to the data that will actually improve the model.
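One way to picture that compression step: score each document with a quality heuristic and keep only the top slice for fine-tuning. The heuristic below (length weighted by lexical diversity) is purely illustrative; real curation pipelines use much richer signals like dedup, classifiers, and human review:

```python
def quality_score(doc: str) -> float:
    # Toy heuristic: reward longer, less repetitive, information-dense text.
    words = doc.split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    return len(words) * unique_ratio

corpus = [
    "the the the the the",   # pure repetition
    "transformers use attention to weigh relationships between tokens",
    "buy now buy now buy now",  # spam-like
    "scaling laws relate model size data and compute to loss",
]

# Compress the "mountain" down to the highest-quality slice.
ranked = sorted(corpus, key=quality_score, reverse=True)
curated = ranked[:2]
```

Here the two information-dense sentences survive while the repetitive and spam-like ones are dropped, which is the shape of the filtering Alex describes, just at a vastly smaller scale.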
Starting point is 00:29:52 It's almost that information theory question in some sense. What are some of the launches that are coming from Scale now? We're building evaluations for the ecosystem. So one is that we're going to launch private held-out evaluations and have leaderboards associated with these evals for the leading LLMs in the ecosystem. And we're going to rerun this contest periodically: every few months, we're going to do a new set of held-out evals to consistently benchmark and monitor the performance of these models
Starting point is 00:30:22 and continue adding more domains. We're going to start with areas like math, coding, instruction following, and adversarial capabilities, and then over time continue increasing the number of areas that we test these models on. We think about it as kind of an Olympics for LLMs, but instead of every four years,
Starting point is 00:30:39 it'll be every few months. So that's one thing we're quite excited about. And then we have an exciting launch coming with our government customers. One of the things that we see in the government space, as they try to use these capabilities, is that there are actually a lot of cases where even the current agentic capabilities of the models can be extremely valuable to the
Starting point is 00:31:07 government and it's often in in pretty boring use cases like writing reports or filling out forms or pulling information one place to another but it's well within the capabilities of these models and so we're going to be we're excited about launching some agentic features for our for our our government customers in uh with our donovan product these are applications you build yourselves or an application building framework so for our government customers we basically build a like a a i staff officer so it's a it's a full application but it integrates with whatever model our customers think is appropriate for their use case and do you think scale will invest in that for enterprise applications in the future our view for enterprises is is fundamentally like how do we how
Starting point is 00:31:50 do we, for the applications that enterprises are going to build, how do we help them build self-improvement into those products? So we think about much more at the platform level for enterprises. Does the new OpenAI or Google release change your point of view on anything fundamentally, multi-modality, you know, the applicability of voice agents, et cetera? You know, I think you tweeted about this. But one very interesting element is the direction that we're going in terms of consumer focus. And it's fascinating. I mean, I think multimodality, well, taking a step back,
Starting point is 00:32:25 first off, I think it points to where there are still huge data needs. Multimodality as an entire space is one where, for the same reasons that we've exhausted a lot of the internet data, there's a lot of scarcity of good multimodal data that can empower these personal agents and these personal use cases. So as we want to keep improving these systems and these personal agent use cases, we think a lot about what data is actually going to be required to fuel that. I think the other thing that's fascinating is the convergence, actually. Both labs have been working
Starting point is 00:33:03 independently on various technologies. And, you know, Astra, which was Google's major flagship release, as well as GPT-4o, they're both shockingly similar demonstrations of the technology. And so I think it was very fascinating that the labs were converging on the same end use cases, or the same visionary use cases, for the technology. I think there's two reads of that. One is that there's an obvious technical next step here,
Starting point is 00:33:32 and very smart people have independently arrived at it. And the other is that competitive intelligence is pretty good. Yeah, I think both are probably true. I think both are true. It's funny, because when I used to work on products at Google, we'd spend two years working on something, and then the week of launch, somebody else would come out with something similar, and people would claim that we copied them. And so I do think
Starting point is 00:33:51 a lot of this stuff, in some cases, just happens to be where the whole industry is heading, and people are aware that multimodality is one of the really big areas. And a lot of these things have years of work going into them. So it's kind of interesting to watch as an external observer. Yeah. I mean, this is also not like a training run that is a one-week copy effort, right? Well, and then the last thing that I've been thinking a lot about is: when are we going to get smarter models? You know, we got multimodality capability, and that's exciting, but it's more of a lateral expansion of the models.
Starting point is 00:34:21 And the industry needs smarter models. We need GPT-5, or we need Gemini 2, or whatever those models are going to be. And so to me, I was somewhat disappointed, because I just want much smarter models that are going to enable, as we mentioned before, way more applications to be built on top of them. The year is long; it's not the end of the year yet. Okay, so quick fire, and Elad, chime in if you have ones. Here's one: what's something you believe about AI that other people don't?
Starting point is 00:34:51 My biggest belief here is that the path to AGI is one that looks a lot more like curing cancer than developing a vaccine. And what I mean by that is, I think the path to building AGI is going to be one where you have to solve a bunch of small problems, and you don't get that much positive leverage from solving one problem to solving the next problem. It's like curing cancer, where you have to zoom into each individual cancer and solve them independently. And eventually, over a multi-decade time frame, we're going to look back and realize that we've built AGI, that we've cured cancer. But the path to get there will be this quite plodding road of
Starting point is 00:35:35 solving individual capabilities and building individual data flywheels to support this end mission. Whereas I think a lot of people in the industry paint the path to AGI as, you know, eventually we'll just, boop, we'll get there, we'll solve it in one fell swoop. And I think there's a lot of implications for how you actually think about the technology arc and how society is going to have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology, because I think it's going to be consistent, slow progress for quite some time, and society will have time to fully acclimate to the technology as it develops.
Starting point is 00:36:16 So we solve like a problem at a time, right? If we pull away from the analogy a little bit, should I think of that as: generality of multi-step reasoning is really hard, Monte Carlo Tree Search is not the answer that people think it might be, we're just going to run into scaling walls? What are the dimensions of solving multiple problems? I think the main thing, fundamentally, is that there's very limited generality that we get from these models. And even for multimodality, for example, my understanding is there's no positive transfer from learning in one modality to another modality. So training off of a bunch of video doesn't really help you that much with your text
Starting point is 00:36:53 problems, and vice versa. And so I think what this means is that each niche of capability, each area of capability, is going to require separate data flywheels to be able to push through and drive performance. You don't yet believe in video as a basis for a world model that helps? I think that's reasonable. I think it's a great narrative. I don't think there's strong
Starting point is 00:37:16 scientific evidence of that yet. Maybe there will be eventually. But I think that this is the, I think the base case, let's say, is one where, you know, there's not that much generalization coming out of the models. And so we actually just need to slowly solve lots and lots of little problems to ultimately result in AGI. One last question for you is like, you know, leader of scale a scaling organization. Like, what are you thinking about as a CEO? And this will almost sound cliche, but just how early we are in this, in this technology. I mean, I think that But there's, you know, it's strange because on the one hand, it feels like we're so late because the tech giants are investing so much and there's a bajillion launches all the time
Starting point is 00:37:55 and there's, you know, there's all sorts of investment into this space. Markets look crowded in the obvious use cases. Yeah, exactly. Markets look super crowded, but I think fundamentally we're still super early because the technology is, you know, one one hundredth or one thousandth of its future capability. And as we, as a community and as an industry and as a society ride that wave, it's just going to be, you know, there's so many more chapters of the book. And so as a, you know, you think about any organization, what we think about a lot is nimbleness. Like, how do we ensure that as this technology continues to develop, that we're able to continue adapting alongside the developments of the technology?
Starting point is 00:38:35 All right. That's a great place to add. Thanks so much for joining us today. Yeah. Thanks, Alex. Thank you. Find us on Twitter at No Prior's Pod. Subscribe to our YouTube channel if you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no dash priors.com.
