The Data Stack Show - Re-Air: The Future of AI: Superhuman Intelligence, Autonomous Coding, and the Path to AGI with Misha Laskin of ReflectionAI

Episode Date: October 22, 2025

This episode is a re-air of one of our most popular conversations from this year, featuring insights worth revisiting. Thank you for being part of the Data Stack community. Stay up to date with the latest episodes at datastackshow.com.

This week on The Data Stack Show, Eric and John welcome Misha Laskin, Co-Founder and CEO of ReflectionAI. Misha shares his journey from theoretical physics to AI, detailing his experiences at DeepMind. The discussion covers the development of AI technologies, the concepts of artificial general intelligence (AGI) and superhuman intelligence, and their implications for knowledge work. Misha emphasizes the importance of robust evaluation frameworks and the potential of AI to augment human capabilities. The conversation also touches on autonomous coding, geofencing in AI tasks, the future of human-AI collaboration, and more.

Highlights from this week's conversation include:

- Misha's Background and Journey in AI (1:13)
- Childhood Interest in Physics (4:43)
- Future of AI and Human Interaction (7:09)
- AI's Transformative Nature (10:12)
- Superhuman Intelligence in AI (12:44)
- Clarifying AGI and Superhuman Intelligence (15:48)
- Understanding AGI (18:12)
- Counterintuitive Intelligence (22:06)
- Reflection's Mission (25:00)
- Focus on Autonomous Coding (29:18)
- Future of Automation (34:00)
- Geofencing in Coding (38:01)
- Challenges of Autonomous Coding (40:46)
- Evaluations in AI Projects (43:27)
- Example of Evaluation Metrics (46:52)
- Starting with AI Tools and Final Takeaways (50:35)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Transcript
Starting point is 00:00:00 Hey everyone, before we dive in, we wanted to take a moment to thank you for listening and being part of our community. Today, we're revisiting one of our most popular episodes in the archives, a conversation full of insights worth hearing again. We hope you enjoy it, and remember, you can stay up to date with the latest content and subscribe to the show at datastackshow.com. Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data
Starting point is 00:00:37 technologies and how data teams are run at top companies. Welcome back to The Data Stack Show. We are here today with Misha Laskin, and Misha, I don't know if we could have had a guest who is better suited to talk about AI, because you have this amazing past, you and your co-founder working in sort of the depths of AI doing research, building all sorts of fascinating
Starting point is 00:01:10 things, you know, being part of the history of the acquisition by Google, and, you know, on the DeepMind side, and some amazing stuff there. So I am humbled to have you on the show. Thank you so much for joining us. Yeah, thanks a lot, Eric. It's great to be here. Okay, give us just a brief
Starting point is 00:01:27 background on yourself, like the quick overview: how did you get into AI, and then, you know, what was your high-level journey? So initially I actually did not start in AI. I started in theoretical physics. I wanted to be a physicist since I was a kid. And the reason was I just wanted to work on what I believed to be the most interesting, impactful scientific problems out there. And, you know, the one miscalibration that I think I made is that when I was reading about all these really exciting things that happened in physics, they actually happened basically 100 years ago. And I sort of realized that I'd missed my time.
Starting point is 00:02:05 You know, you want to work on not just impactful scientific problems, but the impactful scientific problems of your time. And that's how I made it into AI. As I was working in physics, I saw the field of deep learning growing and all sorts of interesting things being invented. What actually made me get into AI was seeing AlphaGo happen, which was this system that
Starting point is 00:02:27 was trained autonomously to beat the world champion at the game of Go, and I decided I needed to get into AI then. So after that, I ended up doing a postdoc at Berkeley in Pieter Abbeel's lab, which specializes in reinforcement learning and other areas of deep learning, and then I joined DeepMind and worked there for a couple of years, where I met my co-founder as we were working on Gemini and leading a lot of the reinforcement learning efforts that were happening there at the time. Yeah, so many topics we could dive into, Misha. So I'm going to have to take the data topic. So I'm really excited to talk about how data teams look the same and how they look a little bit different when they're working with AI data. What's a topic you're excited
Starting point is 00:03:14 to dig into? I think on the data side, there are many things I'm really interested in, but one of them is: how do you set up evaluations on the data side that ensure that, you know, you can predict where your AIs will be successful? Because when you deploy AIs to a customer, you don't know exactly what the customer's tasks are. And so you need to set up evals that allow you to predict what's going to happen. And I think that's a big part of what a data team does: setting up evaluations. And it's maybe one of the last things that a lot of people think about when they think about AI, because you think about language models
Starting point is 00:03:56 and reinforcement learning and so forth. But actually, the first thing that any team needs to get right in any AI project is setting up clear evaluations that matter. And so on the data side, that's something I'm really interested in. Awesome. All right, well, let's dig in, because we have a ton to cover. Yeah, let's do it. Misha, I obviously want to talk about AI
Starting point is 00:04:16 and we want to dig into reinforcement learning and talk about data for the entire show. But I have to ask about your interest in physics as a young child. You mentioned that you were interested in, you know, sort of working on some of the most important scientific problems, and you realized, okay, maybe some of those problems were actually, you know, 100 years old. Obviously, you ended up not being a professional physicist, but what sparked that interest at a young age? Do you have a story that you could share around that? Because, you know, knowing you want to be a physicist as a child is not the most common thing.
Starting point is 00:05:01 Yeah, I considered cowboy first, but... Yeah, cowboy, fireman, physicist. Well, what happened is that I... So, I'm not from the States originally. I'm Russian and Israeli, and I moved to the States as a kid. And when I moved, I didn't really speak the language very well and didn't have a community here, and so I ended up having a lot of kind of time on my hands. And my parents had a library, you know, a number of different kinds of books.
Starting point is 00:05:32 But one of the books that they brought with them was these lectures in physics by Feynman. And this is kind of a legendary set of books that I recommend anyone read, whether you're a physicist or not, because it's an example of really clear, and in that way simple and very beautiful, thinking. And I read those books, and it was just so interesting, the way in which Feynman described the physical world, the way in which you could make really counterintuitive predictions about how the world works just by understanding it from a set of very simple assumptions, very simple equations. So the short answer is, I had a lot of time on my hands and got interested actually in a lot of things. I at the time got
Starting point is 00:06:18 interested in literature as well, and ended up double majoring in literature and physics. But it was literature and physics that I got interested in at the time, and then I ended up going kind of hard, committing to physics. Wow, absolutely fascinating. Yeah, when we were chatting before the show and you said, you know, I realized I was 100 years too late, I was like, oh, theoretical physics, the answer to that problem, you know,
Starting point is 00:06:40 is traveling through time, you know, so you can get back to that era. Yeah. Well, it might also be that the problems we have in physics today are just so hard that it's really hard to solve them. Progress is definitely not being made nearly as quickly as it was 100 years ago, and there's so much to discover. And one of my kind of hopes with AI is that we develop AIs that are smarter than us as
Starting point is 00:07:07 scientists, so that they help us answer some of these fundamental questions that we have in physics, which to me seemed like a complete sci-fi thing even a few years ago. But now, almost counterintuitively, I think theoretical math and theoretical physics are going to be among the first use cases for this kind of next generation of models that are coming out today. Fascinating. Yeah. Let's dig into that a little bit, because one of my burning questions to ask you was: what do you envision the future with AI to be like? I mean, what does that look like for you? Maybe in sort of some of the best ways possible.
Starting point is 00:07:45 So, for example, AI can help scientists accelerate progress on these monumentally difficult problems to create breakthroughs. I mean, that's incredible. What other types of things do you see in the future that make you excited, in the ways that humans will interact with AI or the way that it will, you know, shape the world that we live in? Yeah, I mean, I'm personally quite optimistic about AI. There are a lot of things that we need to be careful about, especially from a safety perspective. But there's one quote that I heard a friend say that really stuck with me, which was about artificial general intelligence, AGI. He said, you know, I think AGI will come and no one will care.
Starting point is 00:08:34 Hmm. I hadn't heard that before. And then I thought about it, and I think that's what's going to happen. Think about it from this perspective: right, we have personal computers today, which is massively different from what people had, you know, decades ago, or personal phones. And we don't care. Like, we just don't know our lives in any other way. We don't know what life was like before computers or before personal phones, even though, right, I remember what it was like not having an iPhone; from a day-to-day perspective, I never even think about it. Sure. So I think what's going to happen is that, you know, all of the ways in which AI will, you know,
Starting point is 00:09:15 transform us are going to be similar in perception to the way technology has transformed us already. And what I mean by that is that in AI, there are oftentimes really polarizing takes: either hyper-optimistic, it's going to be a completely transformed world, which obviously it is, or doomsday scenarios, like things are really going to go poorly. And I think the reality is that it's a remarkable piece of technology that's probably more transformative than mobile phones or computers themselves.
Starting point is 00:09:50 But the effect on us as people is going to be that we just live our day-to-day lives, and it will change our day-to-day lives, but we won't even remember what our lives used to be like. So, yeah, I think what's going to happen, for example, from a work perspective, is that,
Starting point is 00:10:07 you know, now we don't really take notes with a pencil and paper, right? We have much better storage systems on the computer for our notes and things like this. And so, right, we've accelerated the amount of work, the amount of knowledge work, we can do just by having a computer. And I think there's going to be some massive increase in productivity, especially in knowledge work to start, and in physical work as well. But let's just think about knowledge work. I think in the future, and this is kind of how I at least think about AGI, it's a system
Starting point is 00:10:38 that does the majority of knowledge work on a computer. What I think that means: it's not as if it's a zero-sum pie where we go from today doing, let's say, almost 100% of knowledge work on a computer to us doing 10%, AI doing 90%, and now we're doing 10x less work. I think it's going to be that we work about the same amount that we did before, but we're getting 10x more things done. And we won't even remember what it was like to get only the amount of things done that we do today. That's what the world is going to look like. That fits the historical curve too, right? Like, we
Starting point is 00:11:15 don't even know what it's like to sit down and handwrite a memo and then wait several days for it to get delivered, right? Compare that to email, for example. So it seems like it would fit that curve, right? You get that drastically faster, more leveraged life, and it's just the life you live. Yep. Yeah, absolutely fascinating. One follow-up question to that, Misha. You talked about your co-founder developing some pretty amazing technology that could mimic what humans do, right? And actually, you mentioned seeing the world champion, the world human champion at Go, get beat by AI. And then I believe your co-founder developed an autonomous system that could play video games by looking at a screen,
Starting point is 00:12:05 which is pretty wild. And one of the interesting things is, and maybe you can give us some insight into the research aspects of this, but, you know, replicating things that humans can do seems to be a consistent pattern. But one thing that's interesting about your perspective, that we sort of go about our day-to-day work and get 10x throughput, is: you know, is that a replacement of some of the things that I'm doing as a human? Is it augmentation? Can you speak to that a little bit? Because just replicating the keystrokes that I make on my computer isn't necessarily the way to get 10x, right? And I think we know that context is something that AI is amazing with, right? It can take context and really do some amazing things with it. So can you speak to that a little bit in terms of replicating humans versus augmenting them? What does that actually look like? Yeah. So the first thing that I'll say is about the kind of algorithms developed leading up to the moment that we're in right now in AI. The things that you mentioned, which Ioannis, my co-founder, worked on, which were called Deep Q-Networks in the case of video games, and AlphaGo in the case of the Go example, were actually superhuman.
Starting point is 00:13:25 So they got to a human level, and then they exceeded it and became superhuman. So when you look at an AI system playing Atari, it looks kind of alien, because it's just so much better, you know, than a human could be. And the same thing is true for Go. Now, what you said was right in that these systems, especially if we take AlphaGo as an example, were trained in two phases. The first phase was to train it to mimic human behavior. So you have all these online games of Go, similar to how chess has online game
Starting point is 00:14:01 servers. Sure. There are a bunch of online game servers for Go, and they picked a bunch of those games, filtered them for the expert amateur humans, and taught a model to basically imitate expert amateur human behavior. And what that ended up getting
Starting point is 00:14:17 was just a model that was pretty proficient, but still just a kind of human-level model. And then after that, they trained that human-level model with reinforcement learning, based on feedback of whether the model was winning the game or not. And the thing with reinforcement learning is that you don't need demonstrations from people;
Starting point is 00:14:36 you just need a criterion for whether the thing that the model did was correct. And as long as you have that, which in the case of the game of Go is: did you win the game or not? Sure. You can basically push it almost, you know, if you throw enough compute at it,
Starting point is 00:14:52 it will get to a superhuman level. It will just find strategies people have never even thought of. And that's kind of what ended up happening. So there's a famous move called Move 37 in the games AlphaGo played against Lee Sedol, the world champion in Go. And Move 37 was a move that looked really bad at first.
Starting point is 00:15:12 Analysts looking at it were confused, and Lee Sedol was confused. Everyone was just really confused by it. And then it turned out, a few moves later, that it was actually a really creative play that was just really hard for people to wrap their minds around. And it turned out to be the right play in retrospect. So what I'm trying to say is: we have the blueprints for how to build superhuman intelligence systems. And so I think we are heading into an era of superintelligence. Now, that does not necessarily mean superintelligence at everything, but we will have models that are superintelligent at some things.
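To make the two-phase recipe above concrete, here is a minimal sketch of the reinforcement-learning half of it: no human demonstrations, just a binary did-you-win signal driving a REINFORCE-style policy-gradient update. This is a toy stand-in for a game, assuming a made-up "winning move"; it is not anything from AlphaGo's actual training code.

```python
# Toy sketch: a policy learns from nothing but a win/loss reward.
# Everything here (the game, the learning rate, the "best move") is
# illustrative; the point is that a single success criterion is enough.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
logits = np.zeros(n_actions)  # tabular policy over moves

def play_episode(logits):
    """Sample a 3-move game; pretend action 2 is the only winning line."""
    probs = np.exp(logits) / np.exp(logits).sum()
    moves = rng.choice(n_actions, size=3, p=probs)
    reward = float(np.all(moves == 2))  # the criterion: win (1) or loss (0)
    return moves, reward

lr = 0.5
for _ in range(2000):
    moves, reward = play_episode(logits)
    probs = np.exp(logits) / np.exp(logits).sum()
    for m in moves:
        grad = -probs
        grad[m] += 1.0                 # gradient of log pi(m) w.r.t. logits
        logits += lr * reward * grad   # REINFORCE: only wins update the policy

# Probability mass concentrates on the winning move, with no expert games.
print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))
```

The same shape scales up: swap the toy game for Go and the table of logits for a deep network, and the learning signal is still just "did you win."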
Starting point is 00:15:48 Well, I think that's a great time to talk about Reflection, because that's the focus of what you're trying to do there. So tell us about Reflection and what you're working on. Before we jump into that, just because I think I've seen a lot of this thrown around in news articles and stuff: so you've got AGI, right? And you've got this superhuman.
Starting point is 00:16:16 And I think there's been some chat around, like, oh, we're moving past AGI to superhuman. It'd be awesome, I think, for the listeners to just take a minute and be like, all right, what do we mean by AGI? Obviously, that's general intelligence. And superhuman. Just parse that up for them a little bit, because I think those words already are just getting thrown around. Sure.
Starting point is 00:16:35 People repeat them and, like, you know. What does it mean to go beyond human-level proficiency and be superhuman? Yeah, right. Yeah. And I think, you know, another word we might put into the mix, that may be good to talk about later, is the word agent, right? I think the word...
Starting point is 00:16:54 Yeah, yeah, let's throw that into the mix too, and then super... Yeah, exactly. It can mean many things. So at least the way I think about it: first, I don't think about binary events, like there's AGI and then there's superintelligence. AGI I think about more as a continuous spectrum. In the game of Go, for example, it's really hard to pinpoint a moment when it went from, you know, human-level intelligence to superhuman; the curve is actually smooth. So it's kind of a smooth continuum, and it's even smooth from subhuman to human up to superhuman. So it's really about whether we have discovered methods that scale: the more compute and data we throw at them, the more predictably, right, they scale in their intelligence. Those are the kinds of systems that we're talking about. So to answer your question,
Starting point is 00:17:53 to me, the distinction between subhuman intelligence, human intelligence, and superintelligence is just where on a smooth curve of intelligence you are. Now, it helps to define what some of these things are. And different people have different definitions for AGI; I don't think there's a centralized one that the community has converged on and agrees on. But we have a version that we're working with, a working version that is meaningful to us. And that's kind of how we think about AGI, which is a functional definition. We're thinking about digital AI, but we think the same
Starting point is 00:18:34 thing can be applied to physical AI. It's a system. We don't know exactly what form; it can be a model, it can be a model with, you know, tools on a computer, but it's a system that does the majority of knowledge work on a computer. And notice I'm not saying the majority of knowledge work that people do today, because I think the knowledge work that's done even a few years from now is going to look largely different. So just at a given point in time, when you assess the work that's being done on a computer that's driving economic value: is the majority of that being done by humans or by, basically, the computers themselves?
Starting point is 00:19:15 And to me, that's kind of what AGI is. So it's more a functional definition. And what that means is that the only benchmark that matters is whether AI is doing meaningful work for you on a computer. It doesn't matter what math benchmark it solves. None of the academic benchmarks matter whatsoever. All that matters is: is it doing meaningful work for you on a computer or not?
Starting point is 00:19:37 And so what's an example of a product that, I think, makes a meaningful impact along that kind of benchmark? Let's say GitHub Copilot. Right. With GitHub Copilot, you can just track the amount of code that it writes versus the amount of code that the person writes. Now, of course, you also have to decouple the amount of time a software engineer thinks about the design of the code and things like this. But it's hard to argue that it's not doing work on a computer. Like, it's definitely doing some work on a computer.
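As a hedged illustration of that functional benchmark, here is what "track the code the assistant writes versus the code the person writes" could look like as a metric. The data layout is a hypothetical sketch, not a real GitHub or Copilot API:

```python
# Hypothetical shape of the data; the only real idea here is the ratio.
from dataclasses import dataclass

@dataclass
class MergedChange:
    lines_added: int
    authored_by_ai: bool

def ai_share_of_shipped_code(changes: list[MergedChange]) -> float:
    """Fraction of merged lines the AI wrote: 'meaningful work on a
    computer' expressed as a single number."""
    total = sum(c.lines_added for c in changes)
    ai = sum(c.lines_added for c in changes if c.authored_by_ai)
    return ai / total if total else 0.0

print(ai_share_of_shipped_code(
    [MergedChange(120, True), MergedChange(80, False)]))  # -> 0.6
```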
Starting point is 00:20:07 And so on the smooth spectrum from, you know, subhuman intelligence to human intelligence to superintelligence, I think Copilot is on that spectrum, right? It might not be general intelligence, but it's on the way there. So, quick follow-up, and then I definitely want to dig in on Reflection's application of superhuman intelligence. But something that's frustrated me a little bit in how we talk about this
Starting point is 00:20:29 is we've got this AI curve that you just explained, but then we treat human intelligence as a static factor, like some kind of standard to get to. And the way I think about it
Starting point is 00:20:43 is that human intelligence has changed over time, for sure, and will continue to change. And I think there's an aspect of, whenever we talk about AGI, like when is AGI going to happen, it's like, well, I think the humans are going to get more intelligent too,
Starting point is 00:20:55 and, you know, even with the game of Go example, I would think it's very possible that if somebody used this model to essentially learn new Go strategies, then they're better too. Now maybe the AI is still better than them overall. So maybe just briefly, I'd love your thoughts on that. I think that's actually exactly what's happened. The Go community and the chess community both learn from the AI systems now. Like, what made Move 37 special? People analyzed it and have incorporated it into their gameplay. One of the things I'm really excited about is, you know, I just remember what my life was like as a theoretical physicist, which was very much, you know, write equations on a chalkboard and derive things with pencil and paper. You basically sit in the room, think really hard, derive things, go talk to collaborators, and kind of try to sketch out
Starting point is 00:21:56 ideas on the chalkboard. And what I'm really excited about with AI, especially AI that's, say, superintelligent in some aspects of physics, is that it's going to be this sort of patient and infinitely available thought partner for scientists to be able to do their best work. So I think that, for a while, it's going to be the combination of the scientist together with an AI system, working together to accomplish something. Because something that's kind of counterintuitive: we usually think about intelligence as this very general thing, because humans are generally intelligent.
Starting point is 00:22:32 And these AI systems are generally intelligent, and they will continue to be as well. But general, in their case, means something different than in our case. That is to say, they can be intelligent across many things, but there are some things where they're not going to be as intelligent, which is counterintuitive to us, because you're like, wait, that's so easy for us. It's kind of like, yeah, we have these systems for playing Go, but it's really hard to train robots to, you know, move a cup somewhere or something like this, right? Right. Yeah. So that's how I kind of see the interplay. I think that this
Starting point is 00:23:06 universal generality as we see it is sort of maybe possible, but it's somewhat an elusive goal. These AI systems end up spiking at many things that are counterintuitive to us, and end up being, you know, pretty dumb at many things that are intuitive to us, and we'll sort of co-evolve together with them. Yeah. That's such a helpful perspective. I want to return to the point that you made around the definition of AGI, or the working definition at Reflection, around, you know, AI doing the majority of knowledge work on a computer, but with the important distinction that that's not just a wholesale replacement; it's not that the human is not even interacting with a computer. It's that the knowledge work that a human
Starting point is 00:23:52 does actually changes. And I think that's a really helpful mindset to have, in that when we talk about the future of AI, we tend to think about how it impacts the world as we experience it today when, in fact, it will be a completely different context, right? There will be new types of work that don't exist today, you know, which is really interesting. So I just appreciate that. There'll be things that it's bad at. Like, there may be more human cup movers, and the equivalent of that maybe in knowledge work. That
Starting point is 00:24:28 would be interesting. Yeah, there was actually a scene from, I think it was Willy Wonka, you know, Charlie and the Chocolate Factory. Yeah. And I think it's that
Starting point is 00:24:39 Tim Burton, Johnny Depp one, where they show his father on the conveyor belt line, screwing the caps onto tubes of toothpaste, and then one day he gets replaced by a robot that does that. When I was at Berkeley, I studied
Starting point is 00:24:53 robotics, and, you know, how to make robots autonomous. And then I thought about that, and it was like, that's actually a really hard problem. Yeah. You know, that requires dexterity; it's all those things that, in the movies, we think you can do easily. That was one of those things that's counterintuitive: it's really hard. Yeah, sure. That's hilarious. Yeah, I mean, that was truly fantasy, you know, in the movie. Well, let's jump over to Reflection. So, describe Reflection for us. I mean, you and your co-founder have backgrounds in research, and so I'm assuming that's still a big part, because you're trying to solve some really hard problems, which requires research, you know, but you're also
Starting point is 00:25:33 building things that, you know, people can use. I know you're still early on the product side of things, but what can you tell us about what you're working on and what you're building? Definitely happy to share more. So the way we think about our company, and the way we've thought about it since we started it, is that we've been on the path as researchers of building AGI for the better part of a decade now. That was kind of our interest, right? Ioannis, my co-founder, joined DeepMind in 2012 as one of the founding engineers, when it was just a crazy thing to even say, when it just seemed like a complete sci-fi dream to want to work on AGI, and in the scientific community, most people kind of ostracized you if that's what you wanted to do, because it was just such a crazy, almost unscientific thing to say.
Starting point is 00:26:30 Like, it's just not serious. And so he joined at that time. And this is when these methods in reinforcement learning were developed that resulted in projects like Deep Q-Networks and AlphaGo. But ultimately, you know, the reason he joined, the reason I joined AI as a researcher, is this belief that at first was pretty vague. What does it mean? There's a belief that maybe we can build something like AGI within our lifetime, so we might as well try; it's the most exciting thing we can do. But since then, I think it's gotten a bit more concrete.
Starting point is 00:27:03 And now I think we're in a world where this definition of a system that does the majority of meaningful knowledge work on a computer is in the realm of possibilities. It doesn't feel like sci-fi to me at all. It seems like something that we're just inevitably headed towards. And so if that's the system you want to build, you then have to think backwards about what that means from a product perspective, from a research perspective. And we basically started thinking about, well, what does the world look like a few years from now? Like, you know, once we as a field start making a wedge into doing some meaningful
Starting point is 00:27:38 knowledge work on a computer, where does that even start? Where does that happen? And what does the world look like? One useful place to start: before language models, we didn't even know what the form factor would be. The fact that language models worked was pretty crazy. It surprised everyone. Still today, I just remember what the world was like before them, and it's just kind of magic that it even worked.
Starting point is 00:28:02 This is one of those things, right? It happened, and we don't care. Like, language models are just magic. Yeah. I just want to stop and appreciate that you have been researching AI for a decade, and that's the way you describe it, that everyone was surprised by this. Because I was thinking, I wonder if Misha, you know, sort of could see this acceleration happening, but it sounds like, you know, that was a pretty surprising leap forward.
Starting point is 00:28:29 You know, I saw it happening before my eyes because, like many researchers, I was on the front line. But there was always this question among many researchers, myself included, which was like, yeah, it works at this and this, but will it really scale and do these things? And different AI researchers, at different points in their careers, got scaling-pilled and realized that, wow, these things do scale. For some people, it happened earlier. I think for the OpenAI crew it happened earlier. I was, I would say, somewhere in the middle of that spectrum.
Starting point is 00:29:01 So, you know, early enough that I got to be part of the early team on Gemini and really built that out. But still, it felt like I was a bit late to the game. Fascinating. Okay, sorry to interrupt. Okay, so Reflection: you are
Starting point is 00:29:30 looking several years ahead, imagining what it takes for AI to do a majority of knowledge work on a computer, and you're working back. So where's the focus, right? Because that's a pretty broad thing, right? Like, knowledge work on a computer is pretty broad. Yeah. So I'll start with the punchline first and then kind of explain why, just to contextualize. We decided that the problem that needs to be solved is the problem of autonomous coding. If you want to build for this future, where you have systems doing the majority of knowledge work on a computer, you have to solve the autonomous coding problem. It's kind of an inevitable problem that just must be solved. And the reason is the following. The way language models are most likely going to interact with a lot of software on a computer is going to be through code.
Starting point is 00:30:13 You know, we think about interactions with a computer through keyboard and mouse because the mouse was designed for this. And by the way, right, the mouse was invented, what, like 60 years ago? Engelbart's kind of mother-of-all-demos was in the 1960s. So it's actually a pretty new thing, and it was an affordance that unlocked our ability to interact with computers. Now we have to think about, for AI, knowing that the form factor that's really working is language models, what is the most ergonomic way for them to do work on a computer?
Starting point is 00:30:44 And by and large, it turns out that they actually understand code pretty well, because there's a lot of code on the internet. And so the most natural way for a language model to do work on a computer is basically through function calls, API calls, and programmatic languages. And we're starting to see the software world kind of evolve around that already. Like, Stripe a few months ago released an SDK that is built for a language model to basically transact on Stripe reliably. And take a lot of software, like Excel, for example: do we think that a language model, an AI, is going to drag a mouse around the way
Starting point is 00:31:21 people do to click a table in Excel and manipulate data that way? Almost certainly not. It's probably going to do it through, again, function calls, right? We have SQL, we have query languages. And so we kind of need to think about how we believe software will get re-architected in a way that is ergonomic to AI systems. So that's how we're thinking about things. And if you think about it that way,
Starting point is 00:31:44 you just realize that, I mean, there's always going to be a long tail of things that, you know, aren't going to have code affordances, but a lot of the meaningful work, a lot of these big pieces of software that people use today, where you do most of your work today, will have basically programmatic affordances. So if that's what we believe the world looks like,
Starting point is 00:32:05 at least, you know, with a significant part of knowledge work on computers done that way, then the bottleneck problem is: okay, assume it all has these programmatic affordances, how do you build the intelligence around it? And the intelligence around that is an autonomous coder. It's not just generating code. It's also thinking, it's reasoning, right? It's saying, I think now I need to go open up this file and, you know, search for this information, and then maybe send an email to this person, right? Like, it needs to be thinking and reasoning, but then it's acting on a computer through code.
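Here is a minimal sketch of that reason-then-act loop: at each step the model either invokes a programmatic affordance (read a file, run a terminal command) or declares itself done. The tool set and the scripted model_step stub are assumptions for illustration, not Reflection's actual system:

```python
# Sketch of an agent loop acting on a computer through code, not a mouse.
import subprocess
from pathlib import Path

# Illustrative affordances; a real agent would have many more.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "run_terminal": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def model_step(history):
    """Stand-in for a language-model call. Canned here so the sketch runs;
    a real system would ask an LLM to pick the next action."""
    if len(history) == 1:
        return {"tool": "run_terminal", "args": {"cmd": "ls"}}
    return {"done": True, "answer": "Inspected: " + history[-1]["content"][:100]}

def agent_loop(task, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model_step(history)  # reason: decide what to do next
        if action.get("done"):
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])  # act through code
        history.append({"role": "tool", "content": str(result)[:4000]})
    return "step budget exhausted"

print(agent_loop("Summarize what is in this repository."))
```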
Starting point is 00:32:42 So we kind of thought backwards, and we asked, okay, what is the category that today basically has the affordances that we need to start, that is very valuable, that is something we do all the time, and that we can have kind of high empathy for as product builders? And so we inevitably converged on autonomous coding, both because we believe that it's sort of the gateway problem to automating a whole bunch of pieces of software that are not coding.
Starting point is 00:33:13 But coding is also the problem setting that is ripe today for language models, because the ergonomics are already there. Because language models are good at code, because there's a lot of code on the internet, you don't need new affordances. They know how to use them; you can build tools to read files, to run commands in a terminal, to read documentation. And so it's just the right category today: it's truly valuable, we understand it very well, and it also is the bottleneck category to the future.
Starting point is 00:33:43 So that was how we ended up sort of centralizing on code. Fascinating. Talk a little bit about automation. So, you know, operating autonomously makes total sense with code; based on your study of AI over the last decade, that's an area that's ripe for this. What are other areas that you think, in the relatively near term, are ripe for automation as well?
Starting point is 00:34:23 There are a bunch of them. The way I would think about it, the way we think about it, and this is true both for what we're doing and for other companies working in this sort of automation with AI, building autonomous agents, be it for coding or something else, is that a good analogy here is transportation: we're going from cars as they are today to autonomous vehicles. I think it kind of lands here as well. And the way to think about it is that chatbots like ChatGPT and Perplexity and GitHub Copilot, these products that are much more, you know, chat, where you ask them something and they give you something back, we think about them as the cruise control of
Starting point is 00:35:05 transportation vehicles, because they kind of work everywhere. They're not fully autonomous at anything really yet, but they work everywhere. So these are general-purpose tools, cruise controls that augment the human. Now, if you're trying to build a fully autonomous experience, what people refer to as agents today, it's much closer to how you would think about designing an autonomous vehicle. Autonomous vehicles don't work everywhere from day one.
Starting point is 00:35:33 They have a geofencing problem. And the kind of player that won here is, you know, Waymo. I got in a Waymo when I was in San Francisco last, and it was just this magical experience. They did a fantastic job by basically nailing San Francisco, and they geofenced it. You can't go on highways. You can't do all these things that you can do in a normal car. But within the geofenced area, it works so well that it's just a transformative, magical experience. And I think that is how people should be thinking about autonomous agents.
Starting point is 00:36:05 So we shouldn't actually be promising a fully autonomous vehicle today. Sure, for the future, we're promising a thing that automates a lot of stuff on the computer; it's clear where things are going. But today, the important problem is geofencing. And so, yeah, what are examples of that? I think customer support is an area that has shown this kind of workflow works really well. How does the geofencing analogy, you know, transfer there?
Starting point is 00:36:29 It's that some tickets that your customers, you know, are asking about can be fully resolved. Like, maybe they have a simple question that's actually an FAQ or something like this. And so you'll route that to an autonomous agent that will just solve it that way. And the tickets that are more complex, you'll send to a human. Or if the customer asks to be escalated, you'll send it to a human. So I think that successful product form factors in agency and autonomy have this sort of geofencing baked into them: they take on the thing they can do well, and then help the customer outsource the thing that they can't do well yet back to the normal, you know, human workflow.
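As a sketch of what that geofence could look like in code, with an illustrative topic list and confidence threshold standing in for a real routing model:

```python
# Geofenced routing: full autonomy only inside the fence, humans elsewhere.
FAQ_TOPICS = {"password_reset", "billing_address", "plan_pricing"}

def route(ticket: dict) -> str:
    if ticket.get("user_requested_human"):
        return "human"  # always honor an explicit escalation request
    inside_fence = ticket["topic"] in FAQ_TOPICS
    confident = ticket["model_confidence"] >= 0.9  # illustrative threshold
    return "autonomous_agent" if inside_fence and confident else "human"

print(route({"topic": "password_reset", "model_confidence": 0.97}))  # agent
print(route({"topic": "refund_dispute", "model_confidence": 0.95}))  # human
```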
Starting point is 00:37:04 So I'm curious about your opinion on this. I think there's an interesting loop here, where, yeah, it makes total sense: interact with this AI thing, and then, like, a human-in-the-loop type thing. But I think there's also this aspect of: enough companies have to generally be able to do this well from a human adoption standpoint, right? Because let's say
Starting point is 00:37:39 this was a solved problem, but essentially only 5% of companies have the technology where it works well. Humans are going to be like, I want to talk to a person. They're just going to, you know, try to get past the AI agent as soon as possible. So I'm curious about your thoughts on that,
Starting point is 00:37:54 because there's this what's-possible problem, and there's: will humans adopt it? Will humans use it? Because you guys must, you know, face that building a product. Yeah, so I think for us and for others, to complete the customer support thing, the ideal experience is that the human doesn't even know.
Starting point is 00:38:10 Like, the customer came in, their problem got solved, and they didn't care or know who solved it for them. Yeah, give it a name, give it a face. Right. Yeah. And that's the way we think about autonomous coding. So when we think about geofencing, we think, you know, you want to go for tasks
Starting point is 00:38:29 that are actually pretty straightforward for an engineer to do, because these models aren't, you know, superhuman yet. But you want these tasks to be things that are tedious and high-volume and that engineers don't like. There are so many examples of these things, like code migrations. So much of a migration, when you're moving from this version of Java to that one, is kind of tedious work. Or, you know, fixing tests.
Starting point is 00:38:53 Suppose you're relying on a bunch of third-party APIs or dependencies, and an API got updated that wasn't backwards compatible. Your code fails, and your engineer has to change what they were doing to go fix that. And right, it's, again, undifferentiated work. Especially for companies that have very sophisticated engineering teams and are doing a lot, they end up having this backlog of tedious small tasks that are actually not really differentiating tasks for them as a business at all. And so these are the kinds of tasks where a product like ours comes in.
Starting point is 00:39:25 When customers talk to us, they don't necessarily even think of a copilot-like product, because they think: if we can just automate some subset of these for them, right? Some subset of these migration tasks, or third-party API breakages, or some subset of their backlog, then it's something their engineers never even have to do. Whereas the copilot helps them do the things that are on their plate faster. Interesting.
Starting point is 00:40:06 From the developer's perspective, much like the customer support use case, it should be indistinguishable, for the tasks where it works, from a competent engineer sending them a pull request to review, right? A failure mode for a company that does autonomous coding is that you took on more than you could chew, and your agent is sending bad pull requests, and now the developers are wasting their time reviewing really junky code. Right, right. So you have to be pretty strategic about the tasks that you pick, set expectations correctly, and deliver an experience that is basically indistinguishable from a competent engineer doing this. Yeah. So that's really interesting. So essentially, and I mean, this is an overused term, but essentially this could look like some kind of self-healing component of an app. So from an engineer's perspective, you could engineer this into the app, and it's able to autonomously take care of API updates and maybe a couple other things. Yeah, that's really interesting.
Starting point is 00:41:04 One question I have is around what it takes to get to fully autonomous, right? So we used the example of tests or API integrations or other things like that. And you used the example of self-driving vehicles, right? Even within the context of geofencing for Waymo, still, in the development curve of that, you know, they had vehicles that could do a lot of stuff, but the last 20%, 10%, was really hard
Starting point is 00:41:38 because they had to deal with all these edge cases. Even though geofencing, I think, helped limit the scope of that, it was still really difficult to solve for all these edge cases. Is it the same way when you think about autonomous coding? Like, is the last 10% really difficult, to get to something where it is truly autonomous? Yeah, I think there's kind of a yes-and-no part to that. So the part where I think the analogy to the autonomous
Starting point is 00:42:14 vehicle breaks is that an autonomous vehicle is really autonomous, and safety is so important that there's absolutely no way it can do anything wrong, right? But in this instance, suppose that a coding agent did most of what you asked it to do but missed some things. Well, if it did stuff that was pretty reasonable, then you just go into code review and say, hey, you missed this and this. Just like you would with developers. So I think the failure tolerance is higher; like, you know, there's more tolerance
Starting point is 00:42:43 in digital applications like this. Now, the thing you want to avoid is a model where you asked it to do something, and it came back and just wasted your time, basically, right? It's like, whatever time it would have saved me is lost going back and forth with this thing;
Starting point is 00:43:01 it's just wasted time. It's similar to hiring: if it's someone who, let's say, was just not trained as a software engineer, it would take longer to upskill them and train them to be a software engineer than to just do the task yourself. So I think that the actual
Starting point is 00:43:18 eval is: is this net beneficial to you as a developer? Are you spending less time doing things you don't like to do with this system or not, rather than it meeting some absolute level of perfection and speed? Makes total sense. Okay, you mentioned evals early in the show
Starting point is 00:43:37 when we were talking earlier, and you said that's one of the most important aspects of this, especially as it relates to data. So, I think the last topic we should cover is your question, John, which we made everyone wait a really long time for, around, you know, data teams. Yeah, exactly, exactly. So, John, why don't you revisit your question?
Starting point is 00:44:02 Because I want to wrap up by talking about the data aspect of this. I mean, I could keep going, asking a ton of questions, because it's so interesting. But yeah, I think, you know, obviously a lot of our audience works on data teams, and I think I'm personally curious, and I bet a lot of the audience is curious, about what it looks like. So say I'm on the data team that works for Reflection, dealing with AI agents on a daily basis. How is it similar to what I might do at a B2B tech company or in another industry, and what are the main differences?
Starting point is 00:44:41 As I mentioned earlier, when you first asked the question, I think the thing that is possibly the most important to any successful AI project, product, or research is getting your evaluations right. Actually, the most successful AI projects typically start with some phase where they're not training any models, not doing anything like that; they're just figuring out, how are we going to evaluate success? And the reason: when you see all these coding products and AI products in market, there is this sort of shooting-from-the-hip thing, where it's like, I put it through some workflow, put it in front of a customer, does it have value or not? Whereas think about the way successful products like this get built:
Starting point is 00:45:29 how does, for example, a company that develops a language model, a GPT model, whatever, know that the thing it's training is something users will like, right? You have to develop all these evaluations internally that are really well correlated to what your customers actually care about. And in the case of chatbots, that evaluation is basically preference. What does a data team do for a normal language-model chatbot product? They get a lot of data from human raters. You know, they have different prompts,
Starting point is 00:46:07 and then, you know, those raters basically say which responses they liked more over the others. And so typically, it means that the thing that gets upweighted is more helpful responses, things that are formatted nicely, things that are safe, right,
Starting point is 00:46:49 like not offensive. And it's really important to set up those evals that you're benchmarking internally to actually correlate with what your customers care about in your end product. And I think that's kind of a new way of operating, because these systems aren't deterministic, like software as we know it. And so when you're shipping something that is probabilistic, that is going to work in some cases and not work in other cases, you have to come in with some degree of confidence. Like, when we're coming to a customer, sometimes a use case will not be a good fit for us, because we built evals and we were able to predict that actually, for these use cases, the models are not ready yet. Yeah. Can you give us an example of just a really simple eval, like what that would look like? Yeah.
Starting point is 00:47:11 So, for example, for coding, right, that's what we're building, these autonomous coding models. And what is the eval there? The eval there, from a customer perspective, is: will they actually merge the code that our system proposed? And how long of an interaction or back-and-forth will it take them to merge it, right? So then the question is, well, we want that experience to be delightful for customers. We don't want to set up complex evals for every customer, because
Starting point is 00:47:43 that's just going to be a waste of their time. So it's: how do we set up internal evals that are representative of what our customers care about? An example of this is, well, if we care about the merge rate of pull requests from our customers, then we should be tracking the merge rate on similar kinds of tasks to our customers'. So, you know,
Starting point is 00:48:04 we have different task categories, like migrations, cybersecurity vulnerabilities, these sorts of third-party API breakages. Right. And, you know, what your data team does on the eval side is curate datasets representing them.
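A sketch of what such an internal eval could look like: merge rate per task category, with hypothetical numbers standing in for a curated dataset (not real Reflection results):

```python
# Merge-rate eval by task category, the metric described above.
from collections import defaultdict

# (task category, was the agent's pull request merged?) -- hypothetical data
RESULTS = [
    ("java_migration", True), ("java_migration", True), ("java_migration", False),
    ("cve_patch", True), ("cve_patch", False),
    ("api_breakage_fix", True), ("api_breakage_fix", True),
]

def merge_rate_by_category(results):
    merged, total = defaultdict(int), defaultdict(int)
    for category, was_merged in results:
        total[category] += 1
        merged[category] += was_merged
    return {c: merged[c] / total[c] for c in total}

READY_THRESHOLD = 0.8  # below this, "the models are not ready yet"
for cat, rate in merge_rate_by_category(RESULTS).items():
    status = "ready" if rate >= READY_THRESHOLD else "not ready"
    print(f"{cat}: {rate:.0%} ({status})")
```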
Starting point is 00:48:24 Then, for every version of our model, we basically run it through those evals, and we have different evals for different use cases. We see where our models stack up: on some they do better, on some they do worse, but it allows us, when we've identified a use case that's a good fit, to come to customers with high confidence that it will be a delightful experience. And I don't think most teams
Starting point is 00:48:44 that build products, teams that may not come from a research background, are as scientific about it, because setting up the eval takes a really long time, and it's just a pretty complex process, right? Where are you going to source the coding raters who are going to basically rate whether you'd merge these things or not? How are you going to manage that team? Where are you going to source the tasks from that are representative of what your
Starting point is 00:49:11 customers care about? These are the kinds of questions that the data team answers. And beyond that, it's: how do we collect the data that we need to train models to be good at the things that the customers care about? In various aspects, right: how do we collect data for supervised fine-tuning? How do we collect data for reinforcement learning? So on the data team, you need to be as nimble on data research as you are on, basically, software and model research, right?
Starting point is 00:49:38 We think a lot about algorithms and model architectures and things like that. And the thing that is maybe equally important, but less frequently talked about in papers, is the data research, the operational data research, that needs to go in to make sure that these systems are reliable at the things you care about. Right. That's so interesting. And very true of people, too. Well, I was just going to say, there's a lot of timeless wisdom in that approach as well. Well, as we say, we're at the buzzer, Misha. I do want to ask one really practical question.
Starting point is 00:50:12 I know Reflection is still, you know, in stealth mode in many ways, but I know probably a lot of our listeners have tried or are exploring different tools around augmenting the technical work that they do every day. From your perspective, if someone is saying, okay, you know, I see all these posts on Hacker News about these tools, you know, bots that can help me, or copilots
Starting point is 00:50:38 that can help me write code, where would you encourage people to dig in if they feel either overwhelmed or they're kind of new to exploring that space of AI-augmented technical work, and coding specifically? I think that if people are just dipping their toes in, just getting started and trying to explore this space, the best thing is to use coding products like a Copilot or Cursor that are these kind of initial, as we were talking about, cruise-control products, right? I think that's how to start, actually. You know, I started using both products; a lot of
Starting point is 00:51:16 members of our team use those products, and, you know, they've been very informative and, as I said, in a sense complementary. I think that getting autonomy right and getting agents to work is a more complex and nuanced problem, and typically what we find when we talk to customers is that by the time they're thinking about autonomy and agency, they've already been using a copilot for some time, and they're pretty well educated on what kinds of problems they believe they have that can be automated. So if you're coming from a blank slate,
Starting point is 00:51:48 I would take an off-the-shelf product like a Copilot or a Cursor and give that a shot, and sort of start trying it out empirically and seeing what sorts of value it drives. Love it. All right. Well, Misha, best of luck as you continue to
Starting point is 00:52:09 dig into research and build product. And when you're ready to come out of stealth mode, of course, you know, tell John and me so we can kick the tires. But we'd love to have you back on the show to talk about some product specifics in the future. That sounds great. Thanks, Eric. Thanks, John, for having me. The Data Stack Show is brought to you by RudderStack,
Starting point is 00:52:23 the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com. Thank you.
