The Data Stack Show - 229: The Future of AI: Superhuman Intelligence, Autonomous Coding, and the Path to AGI with Misha Laskin of ReflectionAI
Episode Date: February 19, 2025. Highlights from this week's conversation include: Misha's Background and Journey in AI (1:13), Childhood Interest in Physics (4:43), Future of AI and Human Interaction (7:09), AI's Transformative Nature (10:12), Superhuman Intelligence in AI (12:44), Clarifying AGI and Superhuman Intelligence (15:48), Understanding AGI (18:12), Counterintuitive Intelligence (22:06), Reflection's Mission (25:00), Focus on Autonomous Coding (29:18), Future of Automation (34:00), Geofencing in Coding (38:01), Challenges of Autonomous Coding (40:46), Evaluations in AI Projects (43:27), Example of Evaluation Metrics (46:52), Starting with AI Tools and Final Takeaways (50:35). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies. Welcome back to the Data Stack Show.
We are here today with Misha Laskin.
And Misha, I don't know if we could have had a guest
who is better suited to talk about AI
because you and your co-founder have this amazing, vast background,
working in sort of the depths of AI,
doing research, building all sorts of fascinating things,
being part of the historic acquisition
by Google on the DeepMind side,
and some amazing stuff there.
So I am humbled to have you on the show.
Thank you so much for joining us.
Yeah, thanks a lot, Eric. It's great to be here.
Okay, give us just a brief background on yourself, like the quick overview. How did you get into AI?
And then, you know, what was your high-level journey?
So initially, I actually did not start in AI. I started in theoretical physics. I wanted to be a physicist since I was a kid.
And the reason was I just wanted to work on what I believe to be the most interesting and
impactful scientific problems out there. And the one miscalibration I think I made is that
when I was reading back on all these really exciting things that happened in physics,
they actually happened basically 100 years ago. And I sort of realized that I had missed my time. You know, you want to work on not just impactful
scientific problems, but the impactful scientific problems of your time. And that's how I made it
into AI. As I was working in physics, I saw the field of deep learning growing and all sorts of
interesting things being invented. Actually, what made me get into AI was seeing AlphaGo happen, which
was this system that was trained autonomously to beat kind of the
world champion at the game of Go.
And I decided I needed to get into AI then.
So after that, I ended up doing a postdoc at Berkeley in this lab
called Pieter Abbeel's lab, which specializes in reinforcement
learning and other areas of deep learning as well. And then I joined DeepMind and worked there for
a couple of years, where I met my co-founder, as we were working on Gemini and leading a lot of the
reinforcement learning efforts that were happening on Gemini at the time.
Yeah, so many topics we could dive into, Misha. So I'm gonna have to take the data topic.
There are many things I'm really interested in, but something I'm really interested in is how do you set up evaluations on a data side that ensure that you can predict where
your AIs will be successful?
Because when you deploy AIs to a customer, you don't know exactly what the customer's
tasks are.
And so you need to set up evals that allow you to kind of predict what's going to happen.
And I think that's a big part of what a data team does is setting up evaluations.
And it's maybe one of the last things that a lot of people think about when they think
about AI, because we're thinking about language models and reinforcement learning and so forth.
But actually the first thing that any team needs to get right in any AI project is setting
up clear evaluations that matter. And so on the data side, that's something I'm really interested in. Awesome. All right,
well, let's dig in because we have a ton to cover. Yeah, let's do it. Misha, I obviously want to talk
about AI and we want to dig into reinforcement learning and talk about data for the entire show,
but I have to ask about your interest
in physics as a young child.
So you mentioned that you were interested in sort of working on some of the most important
scientific problems and you realized, okay, maybe some of those problems were actually
maybe 100 years old.
But what sparked that interest as a, you know, knowing you want to get into physics,
that obviously, you know, you ended up not being a professional physicist.
But what sparked that interest at a young age?
Do you have like a story that you could share around that?
Because, you know, knowing you want to be a physicist as a child is not the most common
thing.
Yeah, I considered cowboy first, but yeah.
Cowboy, fireman, physicist.
Well, what happened is that I'm not from the States originally.
I'm Russian, Israeli, and then moved to the States as a kid.
When I moved, I didn't really speak the language very well and didn't have a community here.
And so I ended up having a lot of time on my hands.
And my parents had a library, you know, a number of different kinds of books.
But one of the sets of books that they brought with them was The Feynman Lectures on Physics.
And this is kind of a legendary set of lectures that I recommend anyone read, whether you're in physics or not, because it's kind of an
example of really clear, and in that way simple, and very beautiful thinking. And
I read those books and it was just so interesting that
the way in which Feynman described the physical world, the way in which you could make really counterintuitive predictions
about how the world works by just understanding how it works from,
you know, a set of very simple assumptions, very simple equations.
And so the short answer is I had a lot of time on my hands
and got interested actually in a lot of things.
At the time, I got interested in literature as well.
I ended up double majoring in literature and physics,
but it was literature and physics that I got interested in at the time, and then ended up going
kind of hard committing to physics. Wow, absolutely fascinating. Yeah, when we were chatting
before the show and you said, you know, I realized it was 100 years too late, I was like,
oh, theoretical physics, the answer to that problem is, you know, traveling through time,
so you can get back to, you know, back to that era.
Yeah.
Well, it might be that the problems also that we have in physics today are just so hard
that it's really hard to solve them.
I think that progress in this problem is definitely not being made nearly as quickly as it was
100 years ago, and there's so much to discover.
One of my hopes with AI is that we develop
AIs that are smart enough as scientists that they help us answer some of these fundamental
questions that we have in physics, which to me seemed like a complete sci-fi
thing even a few years ago. But now, almost counterintuitively, I think theoretical
math and theoretical physics are going to be among the first use cases for the next generation of models that are coming out today.
Let's dig into that a little bit because one of the questions, just one of my burning questions to ask you was,
what do you envision the future with AI to be like?
What does that look like for you? What
types of things do you see in the future that make you excited, in the ways that humans will interact with AI or the way that it will shape the world that we live in?
Yeah, I think that, I mean, I'm very personally quite optimistic about AI.
Obviously, there are a lot of things that we need to be careful about, especially from
a safety perspective.
But there's one quote that I heard a friend say that really stuck with me, which was,
you know, artificial and general intelligence, AGI.
He said, you know, I think AGI will come and no one will care.
I hadn't heard that before.
Then I thought about it and I think that's what's going to happen.
But think of it from this perspective, right? We have computers today,
we have personal computers today, which is a massive leap from what people had decades ago,
or personal phones. And I would say we don't care; we just don't know our lives in any other
way. We don't know what life was like before computers or before personal phones. Even though
I remember what it was like not having an iPhone, from
a day-to-day perspective, I never even think about it anymore.
So I think what's going to happen is that, you know, all of the ways in which AI is going
to transform us are going to be similar in perception to the way technology has transformed
us already.
And so what I mean by that is that I
think that in AI, there are oftentimes
really polarizing, either hyper-optimistic,
it's going to be a completely transformed
world, which obviously it is, or doomsday scenarios,
things are really going to go down poorly.
And I think the reality is that it's
a remarkable piece of technology
that's probably more transformative than mobile phones or computers themselves. But the effect
on us as people is going to be that we just live our day-to-day lives. It will have changed
our day-to-day lives, but we won't even remember what life used to be like. So yeah, I think what's
going to happen, for example, from a work perspective, is that now we don't really take
notes with pencil and paper; we have much better storage systems on the computer for our
notes and things like this. And so we've accelerated the amount of work we can do, the knowledge
work we can do, just by having a computer.
And I think there's gonna be kind of a massive increase
in like productivity, especially in knowledge work to start
and in physical work as well.
But let's just think about knowledge work.
I think in the future, and this is kind of how I
at least think about AGI is that it's a system
that does the majority of knowledge work on a computer.
So what I think that means, it's
not that it's like a zero sum pie and that we go from today doing, let's say, almost 100% of knowledge
work on a computer to us going to 10%, AI going to 90%, and now we're doing 10x less work. I think
it's going to be that we kind of work the same amount that we did before, but we're getting 10x
more things done. And we don't even remember what it was like to get the amount of
things done that we do today.
I think that's what, that's what the world is going to look like.
That fits the historical curve too, right?
Like we don't even know what it's like to sit down and handwrite a memo and then
wait several days for it to get delivered, right?
Like compare that to email, for example.
So it seems like it would fit that curve, right?
Like you get that drastically faster,
more leveraged life and it's just the life you live.
Yep.
Yeah, absolutely fascinating.
One follow-up question to that, Misha,
you talked about your co-founder developing
some pretty
amazing technology that could mimic what humans do, right? And actually you mentioned, you
know, seeing the world champion, world human champion at Go get beat by AI. And then I
believe your co-founder developed an autonomous system
that could play video games by looking at a screen,
which is pretty wild.
And one of the interesting things is that,
and maybe you can give us some insight into the research aspects of this. But one thing that's interesting about your perspective
on, you know, we sort of go about our day-to-day work
and we get 10x throughput.
One thing that's interesting about that is, you know, is that a replacement of some of the things that I'm doing as a human?
Is it augmentation? Is it... Can you speak to that a little bit? Because just replicating the keystrokes that I make in my computer isn't necessarily
the way to get 10x, right?
And I think we know that context is something
that the AI is amazing with, right?
It can take context and really do
some amazing things with it.
So can you speak to that a little bit
in terms of replicating humans, augmenting?
What does that actually look like?
Yeah, so the first thing that I'll say is that
the kinds of algorithms developed leading up to this
moment that we're in right now,
and the things that you mentioned too
that Ioannis, my co-founder, worked on,
which we call Deep Q-Networks (DQN) in the case of video games
and AlphaGo in the case of the Go example,
were actually superhuman. So they got to a human level, and then they exceeded it and became superhuman. So when you look at an AI system playing Atari,
it looks kind of alien because it's just so much better now than a human could be. And the same
thing was true for Go. And now what you said was right in that
the way these systems are trained,
especially like let's take AlphaGo as an example,
it had two phases.
The first phase was you train it to mimic human behavior.
So you have all these games, online games of Go,
like similar to how chess has online game servers.
There are a bunch of online
game servers for Go. And they picked a bunch of those games and filtered them for the expert
amateur humans and taught a model to basically imitate expert amateur human behavior. And
what that ended up getting was just a model that was pretty proficient, but still just kind of a
human model. And then after that, they trained that model, that sort of human level model,
with reinforcement learning based on feedback of whether the model was winning the game or not.
And the thing with reinforcement learning is that you don't need demonstrations from people.
You just need a criteria for whether the thing that the model did was correct or not.
And as long as you have that,
which in the case of the game of Go
is did you win the game or not?
Sure.
You can basically push it almost, you know,
if you throw enough compute at it,
it will get to a superhuman level, right?
It will just find strategies people have never even thought of.
And that's kind of what ended up happening.
So there's a famous move called move 37
in the game of AlphaGo against Lee Sedol,
the world champion in Go.
And move 37 was a move that looked really bad at first,
like analysts who were looking at it were confused
and Lee Sedol was confused.
Everyone was just really confused by it.
And then it turned out a few moves later
that it was actually a really creative play that
was just really hard for people to wrap their minds around.
And it turned out to be the right play in retrospect.
So what I'm trying to say is, we have the blueprints for
how to build superhuman intelligence systems.
And so I think we are heading into an era of super intelligence. Now, it does
not necessarily mean super intelligence at everything, but we will have models that are
super intelligent at some things.
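To make that two-phase recipe a little more concrete, here is a minimal, self-contained sketch in the spirit of what Misha describes rather than AlphaGo's actual code: a toy one-move game where a tabular policy is first fit to imitate expert demonstrations and then improved with reinforcement learning using only a win/loss signal. All names and numbers are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_MOVES = 5, 4
    # Hidden "rule of the game": the winning move for each state (unknown to the agent).
    best_move = rng.integers(N_MOVES, size=N_STATES)

    # Tabular policy: one logit per (state, move) pair.
    logits = np.zeros((N_STATES, N_MOVES))

    def policy(state):
        p = np.exp(logits[state] - logits[state].max())
        return p / p.sum()

    # Phase 1: imitation learning on "expert" demonstrations.
    # Experts usually, but not always, play the winning move.
    demos = [(s, best_move[s] if rng.random() < 0.8 else rng.integers(N_MOVES))
             for s in rng.integers(N_STATES, size=2000)]
    lr = 0.1
    for state, move in demos:
        p = policy(state)
        grad = -p
        grad[move] += 1.0                      # gradient of the log-likelihood of the expert move
        logits[state] += lr * grad

    # Phase 2: reinforcement learning (REINFORCE) with only a win/loss reward.
    for _ in range(5000):
        state = rng.integers(N_STATES)
        p = policy(state)
        move = rng.choice(N_MOVES, p=p)
        reward = 1.0 if move == best_move[state] else -1.0   # did you win the game or not?
        grad = -p
        grad[move] += 1.0
        logits[state] += lr * reward * grad    # reinforce moves that led to wins

    print("greedy policy matches the winning move:",
          np.mean(logits.argmax(axis=1) == best_move))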
Well, I think it's a great time to talk about Reflection. So tell us about Reflection, because
that's a focus of what you're trying to do. Tell us about Reflection, what you're working on.
Before we jump into that, just because I think I've seen a lot of this
thrown around in news articles and stuff.
So you've got AGI, right, and you've got this superhuman, and I think there's been some chat around that,
like, oh, we're moving past AGI to superhuman.
It'd be awesome, I think, for the listeners to just take a minute and be like,
all right, what do we mean by AGI? Obviously that's general intelligence, superhuman.
And then just parse that out for them a little bit, because I think those words
already are just getting thrown around. What does it mean to go beyond human level proficiency and be superhuman? Yeah, right. Yeah.
Yeah. And I think, you know, if we put other words into the mix that may be good to kind of talk
about later is also the word agent, right? I think the word. Yeah, yeah, let's throw that into the
soup. Yeah, for sure. And then super. Yeah, yeah, exactly. Many things. So at least the way I think about it is, first, I don't think about binary events,
like there's AGI and then there's superintelligence. I think about it more as a continuous
spectrum and that's kind of how like in the game of Go, for example, there was no, it's
really hard to pinpoint a moment when it went from, you know, human level intelligence to
superhuman. Like the curve is actually smooth.
Like, so it's kind of a smooth continuum
and even subhuman intelligence,
like it's smooth from subhuman to human to superhuman.
So it's really around,
like if we have discovered methods that scale,
that the more kind of compute and data we throw at them, they just
predictably, right, they scale in their intelligence, then those are the kinds of systems that we're talking
about. So to answer your question, to me, the distinction between sub-human intelligence,
human intelligence, and superintelligence is just where on the smooth curve of intelligence are you. Now, it's helpful to be, you know, yeah, it's helpful to define what some of these things are.
And different people have different definitions for AGI.
I don't think that there's a centralized definition, like the community hasn't converged on what people agree it to be.
But we have a version that we're working with,
a working version that is kind of meaningful to us.
And that's kind of how we think about AGI, which is,
it's a functional definition.
It's just, we're thinking about digital AGI.
We think the same thing can be applied to physical AGI.
It's a system.
We don't know exactly what form it takes,
it can be a model,
it can be a model with, you know, tools on a computer, but it's a system that does
the majority of knowledge work on a computer. And notice I'm not saying the majority of
knowledge work that people do today, because I think the majority of the knowledge work
that's done, you know, even a few years from now is going to look largely different. So just at a
given point in time, when you assess the work that's being done on a computer
that's generating economic value, is the majority of that being done by humans or
basically by the computers themselves? And to me, that's kind of what AGI is. So it's more a
functional definition. And what that means is that the only benchmark that matters
is whether AI is doing meaningful work
for you on a computer.
It doesn't matter what math benchmark it's solved.
It doesn't matter.
None of the academic benchmarks matter whatsoever.
All that matters is it doing meaningful work
for you on a computer or not.
And so what's an example of like products that I think,
you know, make meaningful impact along that kind of
benchmark?
Let's say GitHub Copilot.
GitHub Copilot, you can just track the amount of code that it writes versus the amount of
code that the person writes.
Now, of course, you also have to decouple the amount of time a software engineer thinks
about the design of the code and things like this.
But it's hard to argue that it's not doing work
on a computer.
Like it's definitely doing some work on the computer.
And so on the smooth spectrum from, you know,
sub human intelligence to human intelligence,
super intelligence, I think copilot is on that spectrum.
Right? It might not be general intelligence,
but it's on the way there.
So quick, quick followup.
And then I definitely want to dig in on Reflection's application of superhuman intelligence.
But something that's frustrated me a little bit in how we talk about this is we've got this AI curve that you just explained.
But then we treat the human intelligence as like a static factor, like some kind of standard to get to.
And like I would, I mean, the way I think about it is like that human intelligence has changed over time for sure and will continue to change. And I think there's an aspect of like whenever we talk about AGI, like when is AGI going to happen? It's like, well, I think the humans are going to get more intelligent too. And that like, you know, like even with a game of Go example, I would think it's very possible that like if somebody used,
you know, this model to essentially like learn new Go strategies and therefore like they're
better too. Now maybe, you know, maybe the AI is still better than them overall. So like
maybe just briefly like, I'd love your thoughts on that.
I think that's actually exactly what's happened that the Go community and the chess community,
they both, yeah, they both learn from
the AI systems now. So, right, what made Move37 special, people analyzed it and have incorporated
that into their gameplay. One of the things I'm really excited about is, you know, I just remember
what my life was like as a theoretical physicist, which is, I mean, it was very like theoretical
physicists, like, you know, write equations on a chalkboard
and, you know, derive things with pencil and paper. And you
basically sit in the room, think really hard, derive things, go
talk to collaborators and, you know, kind of try to sketch out
ideas on a chalkboard. And what I'm really excited about, you
know, especially AI that's super intelligent in some aspects of physics, that it's going to be this sort of patient and infinitely available thought partner for scientists to be able to do their best work.
So I think that kind of for a while, it's going to be the combination of, you know, scientists together with an AI system that works together to accomplish something because something that's kind of counterintuitive that we usually think about intelligence is this very
general thing because humans are generally intelligent and these AI systems are
generally intelligent and will continue to be as well but general in their case means
something different than in our case. That is to say, they can be intelligent across many things,
but there are some things where they're not gonna be
as intelligent that are counterintuitive to us
because you're like, wait, that's like so easy for us.
It's kind of like the, yeah, we have these systems
for playing like Go, but it's really hard to train robots
to like move a cup somewhere or something like this.
Right? Yeah, yeah, yeah.
Yeah, yeah. So yeah, that's how I kind of
see the interplay. I think that this universal generality as we see it is sort of maybe possible,
but these AI systems end up spiking at many things that are
counterintuitive to us, and they end up being, you know, pretty bad at many things that are
kind of intuitive, and we'll sort of co-evolve together with them. Yeah. Yeah, that's such a helpful perspective, Misha. I want to return to the point that you made
around the definition of AGI, or the working definition at Reflection, around,
you know, AI doing the majority of knowledge work on a computer,
but with the important distinction that, you know, that's not just
a wholesale replacement, you know, so it's not like, you know, the human is not even
interacting with the computer.
It's that the knowledge work that a human does actually changes.
And I think that's a really helpful mindset to have in that when we talk about, you know,
the future of AI, we tend to think about how it impacts the world
as we experience it today,
when in fact it will be a completely different context.
There will be new types of work that don't exist today,
which is really interesting, so just appreciate that.
And there'll be things that it's bad at,
like there'll be lots of maybe more human cup movers, or whatever the equivalent of that may be in knowledge work, which will be interesting.
Yeah, there was actually a scene from, I think it was Willy Wonka, Charlie and the Chocolate
Factory.
Yeah.
And it's, I think it's the Tim Burton Johnny Depp one, where they show his father being on the conveyor belt line and screwing the caps onto tubes of toothpaste.
And then one day he gets replaced by a robot that does that.
When I was at Berkeley, I studied robotics and, you know, how to make robots autonomous.
And then I thought about that and it was like, that's actually a really hard problem.
You know, like that requires dexterity that requires like, like it's all those things that, you know, in the movies we think like you can,
you can do that easily. That that was like one of those things that's counterintuitive.
It's really hard. Yeah, that's hilarious. Yeah. I mean, that was truly, truly fantasy,
you know, in the movie. Well, let's jump over to Reflection. So,
tell us about Reflection.
I know you're still early on the product side of things, but what can you tell us about what you're working on and what you're building?
Definitely happy to share more.
The way we think about our company and the way we thought about it since we started it
is that we've been on the path as researchers of building AGI for the better part of a decade now, or that was
kind of our entrance into the field.
Right?
Ioannis, my co-founder, joined DeepMind in 2012 as one of the founding engineers, when
it was just a crazy thing to even say. It just seemed like a complete sci-fi dream
that you want to work on AGI and in the scientific community, most people kind of even ostracized you if that's kind of what you want to do
because it was just such a crazy, almost unscientific thing to say.
It's just not serious.
And so he joined at that time.
And this is when these methods and reinforcement learning were developed that resulted in these
projects like Deep Q-Networks and AlphaGo.
But ultimately, the reason he joined, the reason I joined AI as a researcher is this
belief that at first it was pretty vague.
What does it mean?
There was a belief that maybe we can build something like AGI within our lifetime, so we might as
well try it; it seemed like the most exciting thing we could do.
But since then, I think it's gotten a bit more concrete. And now I think we're in a world where this definition of the
system that does majority of meaningful knowledge work on a computer is in the realm of possibilities.
Like it's not, it doesn't feel like sci-fi to me at all. It seems like something that we're just
inevitably headed towards. And so if that's a system you want to build, you then have to think
backwards towards what does that
mean from a product perspective, from a research perspective. And we basically started thinking
about, well, what does the world look like a few years from now? Once we start making it as a field,
a wedge into starting to do some meaningful knowledge or computer, where does that even
start? Where does that happen? And what does the world look like? And one useful place to think is that now that we have, you know, before language models,
we didn't even know what the form calculation would be, right? It was the fact that language
models work was pretty crazy. It surprised everyone. And it's still today. I just remember what
the world was like before then. And it's just kind of magic that it even worked. Like we,
this is one of those things, right?
It happened and we don't care.
Like language models are just magic.
Yeah.
I just want to stop and appreciate
that you have been researching AI for a decade.
And that's the way that you describe it
was that everyone was surprised at this.
Cause I was thinking, I wonder if Misha, you know,
sort of could see this acceleration happening, but it sounds like, you know, you were surprised at this too?
which was like, yeah, it works at this and this, but will it really scale and do these things?
And different AI researchers at different points in time
in their careers got scaling-pilled
and realized that, wow, these things do scale.
For some people it happened earlier;
I think for the OpenAI crew it happened earlier.
I was, I would say somewhere middle on that spectrum.
So, you know, early enough where I got to be,
you know, part of like the early team in Gemini and really build that out. But still, I feel
like, it felt like I was a bit late to the game.
Fascinating. Okay, sorry to interrupt. Okay, so reflection, you are looking several years
ahead, imagining what it takes, you know, for AI to do a
majority of knowledge work on a computer and you're working back. So where, where did you
like, where's the focus? Right? Like, you know, cause that's a pretty broad thing, right?
Like knowledge work on a computer is pretty broad.
Yeah. So I'll start with the punchline first and kind of explain why just to contextualize.
So we decided that the problem that needs to be solved is the problem of autonomous
coding.
So if you want to build for this future, or if you have systems that do majority of knowledge
work in a computer, you have to solve the autonomous coding problem.
It's kind of an inevitable problem that just must be solved.
The reason is the following. Language models, the way language models are most likely going to
interact with a lot of software and computer is going to be through code. We think about interactions
with computer through keyboard and mouse because the mouse was designed for this. By the way,
the mouse was invented what, like 60 years ago? Like the Engelbart "mother of all demos" was in the 1960s.
So it's, you know, it's actually like a pretty new thing.
And it was an affordance that unlocked our ability to interact with computers.
Now what we have to think about for AI is, knowing that the form factor that's really
working is language models.
What is like the most ergonomic way for them to do work on a computer?
And by and large, it turns out that they actually
understand code pretty well because there's a lot of code
on the internet.
And so the most natural way for a language model
to do work on a computer is basically through function calls,
API calls and programmatic languages.
And we're starting to see the software world kind of evolve
around that already.
Like Stripe a few months ago released an SDK that is built for a language model to basically
transact on Stripe reliably.
And we think that a lot of software, like Excel, for example, do we think that a language
model is going to drag, you know, an AI is going to drag a mouse around the way people do
to click a table in Excel and manipulate data that way? Almost certainly not. It's going to probably do it
through, again, through function calls, right? We have SQL, we have querying languages. And
so we kind of need to think about how we believe software will get re-architected in a
way that is ergonomic to AI systems. So that's how we're thinking about things. And if you think about it that way,
you just realize that, I mean, there's always
going to be a long list of things that there aren't
going to be code affordances for,
but a lot of the meaningful work will,
like a lot of those big pieces of software
that people use today and where you do most of your work
today, will have affordances through basically
programmatic affordances.
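As a rough illustration of what a programmatic affordance can look like, here is a small hypothetical sketch, not any particular vendor's SDK: a couple of ordinary functions exposed to a model as tools, and a dispatcher that executes the structured function call the model emits instead of dragging a mouse around a UI. The model output is hard-coded here for illustration.

    import json

    # Ordinary functions exposed to the model as "programmatic affordances".
    def run_sql(query: str) -> list:
        # Stand-in for a real database call.
        return [{"region": "EMEA", "revenue": 1200}, {"region": "AMER", "revenue": 3400}]

    def send_email(to: str, subject: str, body: str) -> str:
        return f"queued email to {to!r} with subject {subject!r}"

    TOOLS = {"run_sql": run_sql, "send_email": send_email}

    # What a model might emit instead of mouse clicks: a structured function call.
    # (A real system would parse this out of the model's response.)
    model_output = json.dumps({
        "name": "run_sql",
        "arguments": {"query": "SELECT region, SUM(revenue) FROM sales GROUP BY region"},
    })

    call = json.loads(model_output)
    result = TOOLS[call["name"]](**call["arguments"])
    print(result)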
So if that's what you believe the world looks like, at least, a significant part of knowledge
work on a computer is done that way, then the bottleneck problem is, okay, assume it
has all of these programmatic affordances, how do you build the intelligence around it?
And so the intelligence around that is an autonomous coder.
It's something that kind
of, you know, it's not just generating code. It's also thinking. It's reasoning. I would
say, I think now I need to go, like, and open up this file and, you know, search for this
information and then, you know, maybe send an email to this person, right? Like, it needs
to be thinking and kind of reasoning. But then it's acting on a computer through
code. So, we were thinking backwards and we thought about, okay, what is the category that today,
like basically today has like the affordances that we need to start and it's like very valuable
and something that we do all the time that we can have kind of high empathy for as product
builders.
And so we landed on autonomous coding,
both because we believe that that's sort of
the gateway problem to automating
a whole bunch of pieces of software that are not coding,
but coding is also the problem setting that
is ripe today for language models
because the ergonomics are already there.
Because language models are good at code, because there's a lot of code on the internet. And so the
ergonomics are there. They know how to, like, you can build tools to read files, to run commands in a
terminal, to read documentation. And so it's just kind of a ripe category today that's truly
valuable and that we understand very well.
So code, based on your study of AI over the last decade,
that's an area that's ripe for this.
What are the other areas you think, in the relative near term, are ripe for automation as well?
There are a bunch of them. The way I would think about it, the way we think about it, and this is true both for
what we're doing and both for other companies that are working in automation with AI, building
autonomous agents, be it for coding or something else, is that I
think a good analogy here is this sort of transportation analogy, like going from cars as they are
today to autonomous vehicles. I think that kind of lands here as well. And the way to think about it
is that chatbots like ChatGPT and Perplexity and GitHub Copilot, these products that are much more
chat, like you ask them something, they give you something back. We think about them as like the
cruise control of vehicles,
of transportation vehicles, because they kind of work everywhere. They're not fully autonomous on
anything really yet, but they work everywhere. And so there are these like general purpose
tools that are kind of, you know, that are cruise controls, augment the human. Now, if you're trying
to build a fully autonomous experience, like, you know, this is what people refer to as agents today,
the same thinking,
it's much closer to how you would think about designing an autonomous vehicle.
Autonomous vehicles don't work everywhere from day one. They have a geofencing problem. And the kind of player that won is Waymo, I think. I got on a Waymo when I was in San Francisco last, and it was just this
magical experience. And they did a fantastic job by basically nailing San Francisco and
they geofence it. And you can't go on highways, you can't do all these things that
you can do in a normal car. But within the geofenced area, it works so well that it's just a transformative
magical experience. And I think that is how people should be thinking about autonomous agents.
So we shouldn't actually be promising, you know, the equivalent of a fully
autonomous vehicle, like, in the future.
Sure, we're promising a thing that automates a lot of stuff on a computer in the
future, and it's clear where things are going.
But today the important problem is geo-fencing.
And so what are examples of that?
I think customer support is an area
that has shown this kind of workflow working really well. How does the geofencing analogy
transfer there? It's that some tickets that your customers are asking about can be fully
resolved autonomously. Maybe they have a simple question that's actually an FAQ or something like this.
And so you'll route that to an autonomous agent that will just solve that.
And the tickets that are more complex,
you'll send to a human.
Or if, like, the customer asks for it to be escalated,
you'll send it to a human.
So there's a sort of a,
like I think that successful product form factors
in agency and autonomy,
have this sort of geo-fencing baked into them,
but they kind of take on the thing they can do well,
and then help the customer outsource the thing that they can't do well yet to like the normal, you know, state of
affairs.
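A minimal sketch of that geofencing idea applied to support tickets might look like the following; the intents and routing rules here are hypothetical, not a description of any real product.

    from dataclasses import dataclass

    # The "geofence": task types the agent is trusted to handle end to end.
    AUTOMATED_INTENTS = {"password_reset", "order_status", "faq"}

    @dataclass
    class Ticket:
        text: str
        intent: str                  # assume an upstream classifier produced this label
        asked_for_human: bool = False

    def route(ticket: Ticket) -> str:
        if ticket.asked_for_human:
            return "human"           # always honor an escalation request
        if ticket.intent in AUTOMATED_INTENTS:
            return "agent"           # inside the geofence: handle autonomously
        return "human"               # outside the geofence: hand off

    print(route(Ticket("Where is my order #123?", intent="order_status")))    # agent
    print(route(Ticket("My invoice looks wrong", intent="billing_dispute")))  # human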
So I'm curious your opinion on this.
I think there's an interesting like loop here where, yeah, like it makes total sense, like
interact with this AI thing and then like human, you know, human in the loop type thing.
But I think there's also this aspect of, enough companies have to generally get this right.
So say it was a solved problem, but essentially only 5% of companies have the technology where this works well.
Humans are going to be like, I want to talk to a person.
They're just going to try to get past the AI agent as soon as possible.
So I'm curious about your thoughts with that because there's this what's possible problem
and there's like, will humans adopt it?
Will humans use it?
Because you guys must face that building a product. Yeah. So I think for us and for others like to complete the customer support thing, the
ideal experience is that the human doesn't even know.
Like it's just, the customer came in, their problem got solved, and they didn't care or
know who did it for them.
Yeah, give it a name, give it a face.
Right.
And that's the way we think about kind of autonomous coding.
So the kinds of things, you know, so when we think about geo-fencing, we think about,
we, you know, you want to go for tasks that are actually pretty straightforward for an
engineer to do, because these models aren't, you know, super capable yet, but you
want these tasks to be things that are tedious and high volume and that engineers don't like
doing.
So there's so many examples of these things,
like code migrations.
There's so much of a migration,
when you're moving from this version of Java to that one,
that is kind of thankless work. Or, you know, writing tests. Or
suppose you're relying on a bunch of third party APIs
or dependencies.
It got, you know, an API got updated.
It wasn't backwards compatible.
Your code fails.
Your engineer has to change what they were doing to go fix that.
And right, it's sort of, again, undifferentiated work that's, especially for companies that
have very sophisticated engineering teams and are doing a lot, they end up having this
sort of backlog of these kinds of tedious small tasks that are actually not really
differentiating tasks for them as a business at all.
And so these are the kinds of tasks where a product like ours comes in. You know, when customers talk to us,
they don't even think necessarily of a copilot-like product, because they think
about, if we can just automate these for them, some subset of them, right? Some subset of
these migration tasks or third party API breakages or some subset of their backlog, then it's something that engineers never even have to do.
Whereas the copilot helps them do the things that are on their plate faster. And
interestingly, from the developer's perspective, much like the customer support use case,
it should be indistinguishable for the tasks where it works from like a competent engineer
sending them a pull request to review, right? Like a failure mode for a company that does
autonomous coding is that you took on more than you could chew and your agent is sending
bad pull requests and now the developers are wasting their time reviewing junk code.
Yeah, right, right.
So as long as from like, you know,
you have to be pretty strategic about the tasks
that you take on and sort of not over-promise,
you know, set expectations correctly
and deliver an experience that is basically
indistinguishable from a competent engineer doing this.
Yeah, so that's really interesting.
So essentially, and I mean, this is an overused term,
but essentially this could look like some kind of self-healing component of an app.
So from an engineer's perspective, you could engineer this into the app
and it's able to autonomously take care of API updates
and maybe a couple other things.
That's really interesting.
One question I have is around what it takes to get to fully autonomous.
We used the example of tests or API integrations
or other things like that. With autonomous vehicles, that last stretch was
really hard because they had to deal with all these edge cases. Even geofencing, I think, helped limit the scope of that, but it was still really difficult
to solve for all these edge cases.
Is it the same way when you think about autonomous coding?
Is the last 10% really difficult, to go from mostly working to
something where it is truly autonomous?
There's kind of a yes and no part to that.
The part where I think the analogy to autonomous vehicle breaks is that
an autonomous vehicle is truly autonomous and safety is so important that there's absolutely no way it can do anything wrong, right? But in this instance, right,
suppose that a coding agent
did most of what you asked it to do,
but didn't do, you know, miss some things.
Well, if it did stuff that was pretty reasonable, right,
then you just go into code review with it
and tell it, hey, you missed this,
just like you would with a developer.
So I think that the kind of failure tolerance is higher.
Like, you know, there's more tolerance in like digital
applications like this. Now, the thing is, what you want to avoid is, you know, a model where you
asked it to do something, it came back, and it just wasted your time, basically, right? It's
like, whatever amount of time it would have saved me, going back and forth with this
thing just wasted that time. So it's similar to how, when you hire someone, if it's someone who, let's say, was just not
trained as a software engineer, and it would take longer to upskill them and train them
to be a software engineer than just to do the task yourself. So I think that the actual
eval is like, is this net beneficial to you as a developer? Like are you spending less time on doing things you don't like to do with the system or not?
Rather than like meeting that level of perfection in the time you spend.
Makes total sense.
Okay, you mentioned evals early in the show, we were talking earlier, and how you said
that's one of the most important aspects
of this, especially as it relates to data.
So I think the last topic we should cover
is your question, John, which we made everyone wait
a really long time around data teams.
Some games have been changed over here.
Yeah, exactly, exactly.
So John, why don't you revisit your question?
Because I want to wrap up by talking
about the data aspect of this.
I mean, I could keep going asking a ton of questions
because it's so interesting, but.
Yeah, I think, you know, obviously a lot of our audience,
you know, works on data teams.
And I think I'm personally curious
and I bet a lot of the audience is curious about what,
what does it look like?
So say I'm a data team that works for reflection.
I'm on that data team and dealing with AI agents
and on a daily basis, like how is it similar
to what I might do at a B2B tech company or in another industry?
And what are the main differences?
Something, as I mentioned earlier,
when you kind of first asked the question,
I think something that is possibly the most important thing
to any successful like AI project product or research
is getting your evaluations right.
So actually the most like successful AI projects,
they typically start with some phase where they spend time,
they're not training any models
or doing anything like that,
they're just figuring out,
how are we going to measure value and success?
And the reason I say this is that,
let's say,
when you see like all these coding products
and like AI products in the market,
there is sort of like shooting from the hip thing
where it's like, I put it through some workflow,
here you go customer, like does it have value or not?
Whereas the way like I've seen like successful products
like this built out, like how does, for example,
like when a company develops a language model,
a GPT model or a Gemini model or whatever,
how does it know that the thing it's running,
like the people, the users will like it, right?
You have to develop all these evaluations internally that are
really well correlated to what your customers actually care about. And so in the case of
chatbots, that evaluation is basically preferences. You have your data team, and what does a data
team do for a normal language model chatbot-like product? They get a lot of data from
human ratings on responses to different prompts.
Then those raters basically say which ones,
which responses they liked more over the others.
Typically, it means that the thing that gets
up-weighted is more helpful responses,
things that are formatted nicely,
things that are safe,
like they say they're not offensive.
And those, it's really important to set up those evals
that you're benchmarking internally
to actually correlate with what your customers
actually care about in your end product.
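As a toy illustration of that kind of preference eval, with entirely made-up data: each rating record pairs a prompt with which of two responses (candidate vs. baseline) the rater preferred, and the internal benchmark is simply the win rate of the candidate model over the baseline.

    # Each record: one prompt and which of two responses the rater preferred.
    ratings = [
        {"prompt": "Summarize this doc", "preferred": "candidate"},
        {"prompt": "Write a SQL query",  "preferred": "baseline"},
        {"prompt": "Draft an email",     "preferred": "candidate"},
    ]

    win_rate = sum(r["preferred"] == "candidate" for r in ratings) / len(ratings)
    print(f"candidate preferred over baseline in {win_rate:.0%} of comparisons")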
And I think that that's something that
it's kind of a new way of operating
because you're like, these systems aren't deterministic,
like, you know, like software as we know it. And so when you're shipping something that is probabilistic,
that is going to work in some cases and not work in other cases, you have to come in with some degree
of confidence. Like, you know, when we're coming to a customer, sometimes their use cases will not be
a good fit for us, because we built evals and we were able to predict that actually, for these use
cases, the models are not ready yet. Yeah. Can you give us an example of just a really simple
eval, like what that would look like? Yeah. So for example, like for coding,
right? That's kind of what we're building these autonomous coding models and the eval,
what is the eval there? The eval there will be, from a customer perspective,
will they actually merge the code that is proposed,
and how long of an interaction or back and forth
will it take them to merge it, right?
So then the question is, well,
we want that experience to be delightful for customers.
We're not going to, we don't want to like set up
complex evals for every customer
because that's just gonna be a waste of their time.
So it's how do we set up internal evals that are kind of representative of what our customers
care about?
And so an example of this is, well, if we care about the merge rate, like the merge
rate of pull requests from our customers, then we should be tracking like the merge
rate on similar kinds of tasks to our customers.
So you know, some things that we, right?
So we have different task categories like migrations,
cybersecurity vulnerabilities,
these sort of third-party like API breakages.
And, you know, your data team,
what it does on the eval side
is that it curates data sets that are representative of that.
And then for every version of our model,
right, we basically run a ton of evals,
and we have different evals for different use cases.
And we're seeing like where our models stack up,
you know, some of them they do better,
some they do worse,
but it allows us to come to customers,
and when we've identified a use case that is a good fit,
we have high confidence that it will be a delightful experience.
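A minimal sketch of that kind of internal eval, with made-up numbers and task categories rather than Reflection's actual data, just tracks merge rate and review rounds per task category.

    from collections import defaultdict

    # One row per agent-generated pull request on curated tasks that mirror customer work.
    results = [
        {"category": "java_migration",   "merged": True,  "review_rounds": 1},
        {"category": "java_migration",   "merged": False, "review_rounds": 3},
        {"category": "api_breakage_fix", "merged": True,  "review_rounds": 0},
        {"category": "api_breakage_fix", "merged": True,  "review_rounds": 2},
        {"category": "security_patch",   "merged": False, "review_rounds": 4},
    ]

    by_category = defaultdict(list)
    for row in results:
        by_category[row["category"]].append(row)

    # Only pitch a use case to customers once its merge rate clears a chosen threshold.
    for category, rows in by_category.items():
        merge_rate = sum(r["merged"] for r in rows) / len(rows)
        avg_rounds = sum(r["review_rounds"] for r in rows) / len(rows)
        print(f"{category:18s} merge rate {merge_rate:.0%}  avg review rounds {avg_rounds:.1f}")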
And I don't think most teams that build products like this, but do not come from a research
background, are as scientific about it, because setting up the evals takes a really long
time.
And it's just kind of a pretty complex process, right?
Where are you going to like, where are you going to source like the coding raters who
are going to basically rate whether you'd merge these things or not?
How are you going to manage that team?
Where are you going to source the tasks from that are representative of what your customers care about?
These are the kinds of questions that the data team answers and more so beyond that,
it's how do we collect the data that we need to train models to be good at the things that
the customers care about?
There are various aspects: how do we collect data for supervised fine-tuning?
How do we collect data for reinforcement learning?
You need to be as nimble on data research as you are on basically software and model
research.
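To illustrate the difference between those two kinds of data with a tiny, hypothetical example: a supervised fine-tuning record carries a demonstration of the desired output, while a reinforcement learning record only carries a success criterion that can be checked automatically.

    # A supervised fine-tuning record pairs an input with a demonstration,
    # typically written or approved by a person.
    sft_record = {
        "prompt": "Write a function rev(s) that reverses a string",
        "demonstration": "def rev(s):\n    return s[::-1]",
    }

    # A reinforcement learning record needs no demonstration, only a criterion
    # the training loop can check automatically: did the thing work or not?
    def reward(candidate_code: str) -> float:
        scope = {}
        exec(candidate_code, scope)            # run the model's attempt
        return 1.0 if scope["rev"]("abc") == "cba" else 0.0

    rl_record = {"prompt": "Write a function rev(s) that reverses a string",
                 "reward_fn": reward}

    print(rl_record["reward_fn"](sft_record["demonstration"]))   # prints 1.0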
We think a lot about algorithms and model architectures and things like that. And the thing that maybe is equally important but less frequently talked about, like in
papers, is the data research, the operational data research, that needs to go into
making sure that these systems are reliable at the things you care about.
Right.
Love it.
That's so interesting and very true of people too.
Well, I was just going to say there's a lot of timeless wisdom in that approach as well.
Probably a lot of our listeners have tried tools that can help them write code.
If you're just starting out in this space, the best thing is to use coding products like Copilot or Cursor, which are, as you're talking about, cruise control.
I think that that's how I actually started using both products.
A lot of members of our team use those products and they've been very, very informative. And as I said, in a sense, sort of complementary.
I think that getting autonomy right and getting agency to work is a more complex and nuanced
problem.
And typically what we find when we talk to customers, by the time they're thinking about
autonomy and agency, they've already been using copilot for some time.
And they're pretty well educated on what kinds of problems they believe they have or can be automated. So if
it's someone coming from a blank slate, I would, you know, take an off-the-
shelf product like a Copilot or a Cursor and give that a shot, and sort of start
just trying it out empirically and seeing what sorts of value it drives
for them. Love it. All right, well, Misha, best of luck
as you continue to dig into research and build products.
And when you're ready to come out of stealth mode,
of course, you know, tell John and I,
so we can, you know, kick the tires,
but we'd love to have you back on the show
to talk about some product specifics in the future.
That sounds great.
Thanks, Eric. Thanks, John, for having me.
The Data Stack Show is brought to you by RudderStack, the warehouse native customer data
platform.
RudderStack is purpose-built to help data teams turn customer data into competitive
advantage.
Learn more at rudderstack.com.