The Data Stack Show - Re-Air: The Future of AI: Superhuman Intelligence, Autonomous Coding, and the Path to AGI with Misha Laskin of ReflectionAI
Episode Date: October 22, 2025

This episode is a re-air of one of our most popular conversations from this year, featuring insights worth revisiting. Thank you for being part of the Data Stack community. Stay up to date with the latest episodes at datastackshow.com.

This week on The Data Stack Show, Eric and John welcome Misha Laskin, Co-Founder and CEO of ReflectionAI. Misha shares his journey from theoretical physics to AI, detailing his experiences at DeepMind. The discussion covers the development of AI technologies, the concepts of artificial general intelligence (AGI) and superhuman intelligence, and their implications for knowledge work. Misha emphasizes the importance of robust evaluation frameworks and the potential of AI to augment human capabilities. The conversation also touches on autonomous coding, geofencing in AI tasks, the future of human-AI collaboration, and more.

Highlights from this week's conversation include:
Misha's Background and Journey in AI (1:13)
Childhood Interest in Physics (4:43)
Future of AI and Human Interaction (7:09)
AI's Transformative Nature (10:12)
Superhuman Intelligence in AI (12:44)
Clarifying AGI and Superhuman Intelligence (15:48)
Understanding AGI (18:12)
Counterintuitive Intelligence (22:06)
Reflection's Mission (25:00)
Focus on Autonomous Coding (29:18)
Future of Automation (34:00)
Geofencing in Coding (38:01)
Challenges of Autonomous Coding (40:46)
Evaluations in AI Projects (43:27)
Example of Evaluation Metrics (46:52)
Starting with AI Tools and Final Takeaways (50:35)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
Hey everyone, before we dive in, we wanted to take a moment to thank you for listening
and being part of our community. Today, we're revisiting one of our most popular episodes in the
archives, a conversation full of insights worth hearing again. We hope you enjoy it and remember
you can stay up to date with the latest content and subscribe to the show at datastackshow.com.
Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to The Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business,
and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data
technologies and how data teams are run at top companies.
Welcome back to The Data Stack Show.
We are here today with Misha Laskin, and Misha, I don't know if we could have had a guest who
is better suited to talk about AI,
because you have this amazing
past, you and your co-founder
working in sort of the depths of AI,
doing research, building all sorts of fascinating
things, you know, being,
you know, part of the history
around the acquisition by Google
and, you know, on the DeepMind side,
and some amazing stuff there. So
I am humbled to have you on the show. Thank you so much
for joining us. Yeah, thanks a lot, Eric. It's great to be here.
Okay, give us just a brief
background on yourself, like the quick overview, how did you get into AI, and then, you know,
what was your high-level journey? So initially I actually did not start in AI. I started
in theoretical physics. I wanted to be a physicist since I was a kid. And the reason was I just
wanted to work on kind of what I believed to be the most interesting, impactful scientific problems
out there. And, you know, the one miscalibration that I think I made is that when I was reading back
on, like, all these really exciting things that happened in physics,
they actually happened basically 100 years ago.
And I sort of realized that I had missed my time.
You know, you want to work on not just impactful scientific problems,
but the impactful scientific problems of your time.
And that's how I made it into AI.
As I was working in physics,
I saw the field of deep learning growing
and all sorts of interesting things being invented.
What actually made me decide to get into AI was seeing AlphaGo happen,
which was this system that
was trained autonomously to beat the world champion at the game of Go, and I decided I
needed to get into AI then. So after that, I ended up doing a postdoc at Berkeley in this lab
called Pieter Abbeel's lab, which specializes in reinforcement learning and other areas of deep learning,
and then I joined DeepMind and worked there for a couple of years, where I met my co-founder,
as we were working on Gemini and leading a lot of the reinforcement learning efforts that were
happening there at the time. Yeah, so many topics we could dive into, Misha. So I'm going to have
to take the data topic. So I'm really excited to talk about how data teams look the same and how
they look a little bit different when they're working with AI data. What's a topic you're excited
to dig into? I think on the data side, there are many things I'm really interested in, but
something I'm really interested in is how do you set up kind of evaluations on the data side that
ensure that, you know, you can predict where your AIs will be successful.
Because when you deploy AIs to a customer, it's sort of, you know, you don't know exactly
what the customer's tasks are. And so you need to set up evals that allow you to kind of predict
what's going to happen. And I think that's a big part of what a data team does,
is setting up evaluations. And it's maybe one of the last
things that a lot of people think about when they think about AI, because you think about language models
and reinforcement learning and so forth.
But actually the first thing that any team needs to get right in any AI project
is setting up clear evaluations that matter.
And so on the data side, that's something I'm really interested in.
Awesome.
All right, well, let's dig in because we have a ton to cover.
Yeah, let's do it.
Misha, I obviously want to talk about AI
and we want to dig into reinforcement learning and talk about data for the entire show.
But I have to ask about your interest in physics as a young child.
So you mentioned that you were interested in, you know, sort of working on some of the most important, you know, scientific problems.
And you realized, you know, okay, maybe some of those problems were actually, you know, from maybe 100 years ago.
But what sparked that interest, you know, knowing you wanted to get into physics, even though obviously, you know, you ended up not being a professional physicist?
What sparked that interest at a young age?
Do you have, like, a story that you could share around that?
Because, you know, knowing you want to be a physicist as a child is not the most common thing.
Yeah, I considered cowboy first, but...
Yeah, cowboy, fireman, physicists.
Well, what happened is that I...
So, I'm not from the States originally.
I'm Russian and Israeli, and then moved to the States as a kid.
And when I moved, I didn't really speak the language very well and didn't have a community here,
and so I ended up having a lot of kind of time on my hands.
And my parents had a library, you know, a number of different kinds of books.
But one of the books that they brought with them was these lectures in physics by Feynman.
And this is kind of a legendary set of books that I recommend anyone read,
whether you're into physics or not, because it's kind of an example of really clear, and in that way,
clear, simple, and very beautiful thinking. And I read those books, and it was just so interesting,
the way in which Feynman described the physical world, the way in which you could make
really counterintuitive predictions about how the world works by just understanding how it works
from a set of very simple assumptions, very simple equations. So the short answer is I had
a lot of time on my hands and got interested actually in a lot of things. At the time I got
interested in literature as well,
ended up double majoring in literature and physics,
but it was literature and physics that I got interested in at the time,
and then ended up kind of hard committing to physics.
Wow, absolutely fascinating.
Yeah, when we were chatting before the show and you said,
you know, I realized I was 100 years too late, I was like,
oh, theoretical physics, the answer to that problem is, you know,
traveling through time, you know, so you can get back to, you know,
back to that era.
Yeah.
Well, it might be that the problems
that we have in physics today are just so hard that it's really hard to solve them.
I think that progress is probably not, it's definitely not being made nearly as quickly
as it was 100 years ago, and there's so much to discover.
And one of my kind of hopes with AI is that we develop AIs that are smarter than us as
scientists, that they help us answer some of these fundamental questions that we have in
physics, which, I think, like, to me seemed like a complete sci-fi thing even a few years ago.
But now, almost counterintuitively, I think, like, theoretical math and theoretical physics are going to be one of the first use cases that we apply this kind of next generation of models that are coming out today to.
Fascinating.
Yeah.
Let's dig into that a little bit, because one of my burning questions to ask you was, you know, what do you envision the future with AI to be like?
I mean, what does that look like for you?
Maybe in sort of some of the, like, the best ways possible.
So, for example, AI can help scientists accelerate progress on these like monumentally difficult problems to create breakthroughs.
I mean, that's incredible.
What other types of things do you see in the future that make you excited and in the ways that humans will interact with AI or the way that it will, you know, shape the world that we live in?
Yeah, I think that, I mean, I'm personally, like, quite optimistic about AI.
There are a lot of things that we need to be careful about,
especially from a safety perspective. But there's one quote that I heard a friend say
that really stuck with me, which was about, you know, artificial general intelligence,
AGI. He said, you know, I think AGI will come and no one will care.
Hmm. I hadn't heard that before. And then I thought about it. And I think that's
what's going to happen. But from the perspective of, right, we have computers
today. We have personal computers today, which is a massive leap from what people
had, you know, decades ago, or personal phones. And I would say we don't care. Like,
we just don't know our lives in any other way. Like, we don't know what life was like before computers
or before personal phones, even though, right, the iPhone, you know, I remember what it was like
not having an iPhone, but from a day-to-day perspective, I never even think about it. Sure.
So I think what's going to happen is that, you know, all of the ways in which AI, you know,
transforms us are going to be similar in perception to the way technology has transformed us
already. And so what I mean by that is that in AI, there are oftentimes, like,
really polarizing takes, either, like, hyper optimistic, you know,
it's going to be a completely transformed world, which obviously it is, or, like, doomsday scenarios,
like things are really going to go down poorly. And I think the reality is that it's a remarkable
piece of technology that's probably more
transformative than mobile
phones or computers themselves.
But the effect on us
as people is going to be that we just
live our day-to-day lives and
it changes our day-to-day lives, but we won't
even remember what our lives used to be
like. So, yeah, I think what's
going to happen, for example, from a work
perspective, is that
you know, now we don't
really take notes, right, with a pencil and paper.
We have, like, much better storage
systems on the computer for our notes and things like this. And so, right, we've accelerated
the amount of work, and knowledge work, we can do just by having a computer. And I think
there's going to be kind of some massive increase in productivity, especially in knowledge
work to start, and in physical work as well. But let's just think about knowledge work. I think
in the future, and this is kind of how I at least think about AGI, it's a system
that does the majority of knowledge work on a computer.
So what I think that means is, it's not like a zero-sum pie
where we go from today doing, let's say, almost 100% of knowledge work on a computer
to us going to 10%, AI going to 90%, and now we're doing 10x less work.
I think it's going to be that we kind of work the same amount that we did before,
but we're getting 10x more things done.
And we don't even remember what it was like to get the amount of things done that we do today.
That's what the world is going to look like. And that fits the historical curve too, right? Like, we
don't even know what it's like to sit down and handwrite a memo and then wait several days
for it to get delivered, right? Like, compare that to email, for example. So it seems like it would
fit that curve, right? Like, you get that drastically faster, more leveraged life, and it's just the life
you live. Yep. Yeah. Absolutely fascinating. One follow-
up question to that, Misha. You talked about your co-founder developing some pretty amazing technology
that could mimic what humans do, right? And actually you mentioned, you know, seeing the
world champion, the world human champion at Go, get beat by AI. And then I believe your co-founder
developed, you know, an autonomous system that could play video games by looking at a screen,
which is pretty wild. And one of the interesting things,
and maybe you can give us some insight into the research aspects of this, is that, you know, replicating things that humans can do seems to be a consistent pattern.
But one thing that's interesting about your perspective on, you know, we sort of go about our day-to-day work and we get 10x throughput, is, you know, is that a replacement of some of the things that I'm doing as a human?
Is it augmentation? Can you speak to that a little bit? Because just replicating, you know, the keystrokes that I make on my computer isn't necessarily the way to get 10x, right? And I think we know that context is something that the AI is amazing with, right? It can take context and really do some amazing things with it. So can you speak to that a little bit in terms of replicating humans versus augmenting? What does that actually look like?
Yeah. So the first thing that I'll say is that
the kind of algorithms developed leading up to the moment that we're in right now in AI,
and the things that you mentioned that Ioannis, my co-founder, worked on, which were called
Deep Q-Networks in the case of video games, and AlphaGo in the case of the Go example, were actually superhuman.
So they got to a human level, and then they exceeded it and became superhuman.
So when you look at an AI system playing Atari, it looks kind of alien,
because it's just so much better, you know, than a human could be.
And the same thing is true for Go.
Now, what you said was right, in that the way these systems are trained,
especially, like, let's take AlphaGo as an example, it had two phases.
The first phase was you train it to mimic human behavior.
So you have all these games, online games of Go, similar to how chess has online game
servers.
Sure.
There are a bunch of, like, online game servers for Go,
and they picked a bunch of those games
and filtered them for, like, the expert amateur humans,
and taught a model to basically imitate,
like, expert amateur human behavior.
And what that ended up getting
was just a model that was pretty proficient,
but still just kind of a human-level model.
And then after that, they trained that model,
you know, that sort of human-level model,
with reinforcement learning based on feedback
on whether the model was winning the game or not.
And the thing with reinforcement learning
is that you don't need demonstrations from people,
you just need a criterion
for whether the thing that the model did
was correct. And
as long as you have that, which in the case of
the game of Go is, did you win the game or not?
Sure. You can basically
push it almost, you know,
if you throw enough compute at it,
it will get to a superhuman model.
It will just find strategies
people have never even thought of.
And that's kind of what ended up happening.
So there's a famous move called Move 37
in the game AlphaGo played
against Lee Sedol, the world champion in Go.
And Move 37 was a move that looked really bad at first.
Like, analysts who were looking at it were confused, and Lee Sedol was confused.
Everyone was just really confused by it.
And then it turned out a few moves later that it was actually a really creative play
that was just really hard for people to wrap their minds around.
And it turned out to be the right play in retrospect.
So all of what I'm trying to say is, like, we have
the blueprints for how to build superhuman intelligence systems. And so I think we are heading
into an era of superintelligence. Now, it does not necessarily mean superintelligence at everything,
but we will have models that are superintelligent at some things.
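To make the two-phase recipe Misha describes concrete, here is a toy, hedged sketch in Python: a policy first imitates "human" demonstrations, then improves with reinforcement learning from nothing but a win/loss signal. The game, the win probabilities, and the simple bandit-style update are invented for illustration and are not AlphaGo's actual algorithm.

```python
import random
from collections import Counter

ACTIONS = ["a", "b", "c"]
WIN_PROB = {"a": 0.4, "b": 0.6, "c": 0.9}    # hidden "true" quality of each move (hypothetical)
HUMAN_DEMOS = ["b"] * 80 + ["a"] * 20         # human experts mostly play "b", never "c"

def play(action):
    """Return 1 for a win, 0 for a loss -- the only feedback the RL phase needs."""
    return 1 if random.random() < WIN_PROB[action] else 0

# Phase 1: imitation learning -- fit a policy to the human games (with smoothing).
counts = Counter(HUMAN_DEMOS)
imitation_policy = {a: (counts[a] + 1) / (len(HUMAN_DEMOS) + len(ACTIONS)) for a in ACTIONS}

# Phase 2: reinforcement learning -- explore, and keep track of what actually wins.
wins, plays = Counter(), Counter()
for _ in range(3000):
    if random.random() < 0.1:                                  # occasional exploration
        action = random.choice(ACTIONS)
    else:                                                      # otherwise exploit current estimates
        action = max(ACTIONS, key=lambda a: wins[a] / plays[a] if plays[a] else imitation_policy[a])
    plays[action] += 1
    wins[action] += play(action)

print("imitation favors:", max(imitation_policy, key=imitation_policy.get))                     # "b", the human favorite
print("RL favors:       ", max(ACTIONS, key=lambda a: wins[a] / plays[a] if plays[a] else 0))   # usually "c"
```

The point of the toy is only that the second phase needs no demonstrations, just a success criterion, which is why it can drift past what the human data contains.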
Well, I think that's a great time to talk about Reflection, because that's the focus of what you're trying to do at Reflection.
So tell us about Reflection and what you're working on.
Before we jump into that, just because I think I've seen a lot of this thrown around
in like news articles and stuff.
So you've got AGI, right?
And you've got this superhuman.
And I think there's been some chat around that like, oh, like we're like moving past
AGI to superhuman.
It'd be awesome, I think, for the listeners to just take a minute and be like, all right,
what do we mean by AGI?
Obviously, that's like general intelligence, superhuman.
And then, like, just parse that up for them a little bit.
Because I think those words already are just getting, like, thrown around.
Sure.
People repeat them and, like, you know.
What does it mean to go beyond human-level proficiency and be superhuman?
Yeah, right.
Yeah.
Yeah.
And I think, you know, another word to put into the mix that may be good to kind of talk about later
is also the word agent, right?
I think the word...
Yeah, yeah, let's throw that into the mix too, and then super...
Yeah, exactly.
It can mean many things.
So at least the way I think about it is, first, I don't think about binary events, like there's AGI and then there's superintelligence. Basically, AGI I think about more as a continuous spectrum. And that's kind of how, like, in the game of Go, for example, it's really hard to pinpoint a moment when it went from, you know, human-level intelligence to superhuman. Like, the curve is actually smooth. So it's kind of a smooth continuum, and
even, you know, subhuman intelligence, like, it's smooth from subhuman to human up to superhuman.
So it's really around, like, if we have discovered methods that scale, where the more kind of
compute and data we throw at them, then just predictably, right, they scale in their intelligence,
then those are kind of the systems that we're talking about. So to answer your question,
to me, the distinction between subhuman intelligence, human intelligence,
and superintelligence is just where along a smooth curve of intelligence you are.
Now, it's helpful, you know, yeah, it helps to define what some of these things are.
And different people have different definitions for AGI.
I think that there isn't, like, a centralized definition that the community has converged on and people agree it to be.
But we have a version that we're working with, a working version that is kind of meaningful to us.
And that's kind of how we think about AGI, which
is, it's a functional definition. We're thinking about digital AI, and we think the same
thing can be applied to physical AI. It's a system, and we don't know exactly what form, like,
it can be a model, it can be a model with, you know, with tools on a computer, but it's a system
that does the majority of knowledge work on a computer. And notice I'm not saying the majority
of knowledge work that people do today, because I think the knowledge work that's
done even a few years from now is going to look largely different.
So just at a given point in time, when you assess the work that's being done on a computer
that's generating economic value, is the majority of that being done by humans or,
basically, by the computers themselves?
And to me, that's kind of what AGI is.
So it's more a functional definition.
And what that means is that the only benchmark that matters is whether AI is doing meaningful
work for you on a computer.
It doesn't matter what math benchmark it's solved.
It doesn't matter.
None of the academic benchmarks matter whatsoever.
All that matters, is it doing the meaningful work for you on a computer or not?
And so what's an example of, like, products that I think, you know, make meaningful impact along that kind of benchmark?
Let's say GitHub Copilot.
Right.
With GitHub Copilot, you can just track, like, the amount of, right, like, code that it writes versus the amount of code that the person writes.
Now, of course, you also have to decouple, like, the amount of time a software engineer
thinks about the design of the code and things like this.
But it's hard to argue that it's not doing work on a computer.
Like, it's definitely doing some work on a computer.
And so on the smooth spectrum from, you know,
subhuman intelligence to human intelligence to superintelligence,
I think Copilot is on that spectrum, right?
It might not be general intelligence, but it's on the way there.
So quick follow-up, and then I definitely want to dig in
on Reflection's application of superhuman intelligence.
But something that's frustrated me a little bit
in how we talk about this
is we've got this, like, AI curve
that you just explained,
but then we treat
human intelligence
as, like, a static factor,
like some kind of standard to get to.
And, I mean, the way I think about it
is that human intelligence
has changed over time, for sure,
and will continue to change.
And I think there's an aspect of, like,
whenever we talk about AGI,
like, when is AGI going to happen,
it's like, well, I think the humans
are going to get more intelligent too.
Like, even with the game of Go example, I would think it's very possible that if somebody used, you know, this model to essentially, like, learn new Go strategies, then they're better too. Now maybe, you know, maybe the AI is still better than them overall. So, like, maybe just briefly, I'd love your thoughts on that.
I think that's actually exactly what's happened. The Go community and the chess community, they both, yeah, they both learn from the AI systems now. So, like, what made Move 37 special,
people analyzed it and have incorporated that into their gameplay. One of the things I'm really
excited about is, you know, I just remember what my life was like as a theoretical physicist,
which is, I mean, it was very, like, theoretical physics, like, you know, writing equations on a chalkboard
and, you know, deriving things with pencil and paper. And you basically sit in the room, think
really hard, derive things, go talk to collaborators and, you know, kind of try to sketch out
ideas on the chalkboard.
And what I'm really excited about with, you know, AI, especially AI that's, say, superintelligent
in some aspects of physics, is that it's going to be this sort of patient and infinitely
available thought partner for scientists to be able to do their best work.
So I think that kind of for a while, it's going to be the combination of, you know,
the scientist together with an AI system that works together to accomplish something.
Because something that's kind of counterintuitive, we usually think about intelligence
as this very general thing, because humans are generally intelligent.
And these AI systems are generally intelligent and will continue to be as well.
But general, in their case, means something different than in our case.
That is to say, they can be intelligent across many things,
but there are some things where they're not going to be as intelligent
that are counterintuitive to us, because you're like, wait, that's, like, so easy for us.
It's kind of like, yeah, we have these systems for playing, like, Go, but it's
really hard to train robots to, you know, like, move a cup somewhere or something like this, right?
Right. Yeah. Yeah. Yeah. So that's how I kind of see the interplay. I think that this
universal generality as we see it is sort of maybe possible, but somewhat of an elusive goal.
These AI systems, like, end up spiky at many things that are counterintuitive to us and
end up being, you know, pretty dumb at many things that are kind of intuitive to us, and we'll sort
of co-evolve together with them. Yeah. Yeah. That's such a helpful perspective.
I want to return to the point that you made around the definition of AGI, or the working definition
at Reflection, around, you know, AI doing the majority of knowledge work on a computer, but with the
important distinction that, you know, that's not just a wholesale replacement, you know, so it's not like,
you know, the human is not even interacting with a computer. It's that the knowledge work that a human
does actually changes. And I think that's a really
helpful mindset to have, in that when we talk about, you know, the future of AI, we tend to think
about how it impacts the world as we experience it today when, in fact, it will be a completely
different context, right? There will be new types of work that don't exist today, you know,
which is really interesting. So I just appreciate that. There'll be things that it's bad at. Like, there'll be
lots, maybe more human cup movers,
and the equivalent of that maybe
in knowledge work that, yeah,
will be interesting.
Yeah, there was actually
a scene from, I think it was, like,
Willy Wonka,
you know, Charlie and the Chocolate Factory.
Yeah. And it's,
I think it's that
Tim Burton, Johnny Depp one,
where they show, like,
his father being on the conveyor belt line
and, like, screwing the caps
onto, like, tubes of toothpaste,
and then one day he gets replaced
by a robot that does that.
When I was at Berkeley, I studied
robotics and, you know, how to make robots autonomous, and then I thought about that, and it was like,
that's actually a really hard problem. Yeah. You know, like, that requires dexterity, that requires, like,
all those things that, you know, in the movies we think, like, you can do that easily.
That was, like, one of those things that's counterintuitive. It's really hard. Yeah, sure.
That's hilarious. Yeah, I mean, that was truly fantasy, you know, in the movie. Well, let's jump
over to Reflection. So you described Reflection, I mean, you and your
over to reflection so you described reflection I mean you and your
co-founder have backgrounds and research. And so I'm assuming that's still a big part because you're
trying to solve some really hard problems, which requires research, you know, but you're also
building things that, you know, that people can use. You're, you know, I know you're still
early on the product side of things, but what can you tell us about, about what you're working
on and what you're building? Definitely happy to share more. So the way we think about our company,
and the way we thought about it since we started it, is that we've been on the path as researchers of building AGI for the better part of a decade now.
Or that was kind of our interest, right?
Ioannis, my co-founder, joined DeepMind in 2012 as one of the founding engineers, when it was just a crazy thing to even say, it just seemed like a complete sci-fi dream, that you wanted to work on AGI.
And in the scientific community, most people kind of ostracized you if that's what you wanted to do,
because it was just such a crazy, like, almost unscientific thing to say.
Like, it's just not serious.
And so he joined at that time.
And this is when, like, these methods in, like, reinforcement learning were developed that resulted in these projects like Deep Q-Networks and AlphaGo.
But ultimately, you know, the reason he joined, the reason I joined, you know, AI as a researcher, is this belief that at first was pretty vague.
Like, what does it mean?
It's a belief that, like, maybe we can build something like AGI within our lifetime,
so we might as well try it, like, it's the most exciting thing we can do.
But since then, I think it's gotten a bit more concrete.
And now I think we're in a world where this definition of a system that does the
majority of meaningful knowledge work on a computer is in the realm of possibilities.
Like, it doesn't feel like sci-fi to me at all.
It seems like something that we're just inevitably headed towards.
And so if that's the system you want to build, you then have to think
backwards towards what that means from a product perspective, from a research perspective.
And we basically started thinking about, well, what does the world look like a few years from now?
Like, you know, once we as a field start making a wedge into doing some meaningful
knowledge work on a computer, where does that even start?
Where does that happen?
And what does the world look like?
And one useful place to start is that, you know, before language models,
we didn't even know what the form factor would be.
The fact that language models worked was pretty crazy.
It surprised everyone, and even still today, I just remember what the world was like before them.
And it's just kind of magic that it even worked.
This is one of those things, right?
It happened, and we don't care.
Like, language models are just magic.
Yeah.
I just want to stop and appreciate that you have been researching AI for a decade,
and that's the way that you describe it, that everyone was surprised by this,
because I was thinking, I wonder if Misha, you know, sort of could see this acceleration happening,
but it sounds like, you know, that was a pretty surprising leap forward.
You know, I saw it happening before my eyes because I, you know, like many researchers,
was on the front line. But there was always this question among many researchers, myself included,
which was, like, yeah, it works at this and this, but will it really scale and do these things,
you know. And different AI researchers at different points in time in their careers
got scaling-pilled and realized that, wow, these things do scale.
For some people, it happened earlier.
I think for the OpenAI crew it happened earlier.
I was, I would say, somewhere in the middle on that spectrum.
So, you know, early enough where I got to be, you know,
part of, like, the early team on Gemini and really built that out.
But still, it felt like I was a bit late to the game.
Fascinating.
Okay, sorry to interrupt.
Okay, so Reflection, you are
looking several years ahead, imagining what it takes, you know, for AI to do a majority of
knowledge work on a computer, and you're working back. So where did you, like, where's the focus,
right? Like, you know, because that's a pretty broad thing, right? Like, knowledge work on a computer
is pretty broad. Yeah. So I'll start with the punchline first and kind of explain why, just
to contextualize. So we decided that the problem that needs to be solved is the problem of
autonomous coding. So if you want to build for this future, if you want systems doing the
majority of knowledge work on a computer, you have to solve the autonomous coding problem. It's
kind of an inevitable problem that just must be solved. And the reason is the following. Language models,
like, the way language models are most likely going to interact with a lot of software on
a computer is going to be through code.
You know, we think about interactions with a computer through keyboard and mouse because
the mouse was designed for this.
And by the way, right, the mouse was invented, what, like 60 years ago, like
Engelbart's kind of mother-of-all-demos was in the 1960s.
So it's, you know, it's actually, like, a pretty new thing.
And it was an affordance that unlocked our ability to interact with computers.
Now, we have to think about, for AI, knowing that the form factor that's really
working is language models, what is, like, the most ergonomic way for them to do work on a computer?
And by and large, it turns out that they actually understand code pretty well, because there's a lot
of code on the internet. And so the most natural way for a language model to do work on a computer
is basically through function calls, API calls, and programmatic languages. And we're starting to
see the software world kind of evolve around that already. Like, Stripe a few months ago
released an SDK that is built for, like, a language model to basically
transact on Stripe reliably.
And think about a lot of the software, like Excel, for example. Do we think that
a language model, an AI, is going to drag a mouse around the way
people do to, like, click a table in Excel and manipulate data that way?
Almost certainly not.
It's going to probably do it through, again, function calls, right?
We have SQL, we have querying languages.
And so we kind of need to think about how we believe software will get re-architected in a way
that is ergonomic to AI systems.
So that's how we're thinking about things.
And if you think about it that way,
you just realize that, I mean,
there's always going to be a long tail of things
that, you know, aren't going to have code affordances,
but a lot of the meaningful work, like,
a lot of these big pieces of software that people use today,
and, you know, where you do most of your work today,
will have affordances through, basically, programmatic affordances.
So if that's what we believe the world looks like,
at least, you know, with a significant part
of knowledge work on computers done that way, then the bottleneck problem is, okay, assume
it has all these programmatic affordances, how do you build the intelligence around it?
And the intelligence around that is an autonomous coder. It's something that, kind of,
you know, is not just generating code. It's also thinking, it's reasoning, right?
It's saying, I think now I need to go and, like, open up this file and, you know, search for this
information, and then, you know, maybe send an email to this person, right? Like, it needs to be
thinking and kind of reasoning, but then it's acting on a computer through code.
So we kind of thought backwards, and we thought about, okay, what is the category that
basically today has, like, the affordances that we need to start, and is, like, very valuable,
and something that we do all the time that we can have kind of high empathy for as product
builders.
And it inevitably just converged on autonomous coding for us, both because we believe
that that's sort of the gateway problem to automating
a whole bunch of pieces of software
that are not coding,
but coding is also the problem setting that is ripe today for language models,
because the ergonomics are already there.
Like, language models are good at code because there's a lot of code on the internet.
And so the ergonomics are there.
They know how to, like, you can build tools to read files, to act in a terminal,
to read documentation.
And so it's just kind of the ripe category today that's truly valuable, that we understand
very well, that also is the bottleneck category to the future.
So that was how we ended up sort of centralizing on code.
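A rough, illustrative sketch of the "programmatic affordances" idea Misha describes: instead of a mouse and keyboard, software exposes functions that a language model can call, and an autonomous coder reasons over which one to invoke. The tool names, schemas, and the hard-coded model decision below are hypothetical placeholders, not Reflection's product or any specific vendor's API.

```python
# Sketch of software exposing function-style affordances to a language model.
# Everything here (tool names, schemas, the stubbed model call) is hypothetical.

TOOLS = {
    "read_file": {
        "description": "Return the contents of a file in the repository",
        "parameters": {"path": "string"},
        "fn": lambda path: open(path).read(),
    },
    "run_query": {
        "description": "Run a SQL query against the analytics warehouse",
        "parameters": {"sql": "string"},
        "fn": lambda sql: f"(pretend result of: {sql})",   # stubbed out for the sketch
    },
}

def call_model(task, tool_schemas):
    """Stand-in for the language model choosing a function call; a real system
    would send the task plus the tool schemas to an LLM and parse its response."""
    return {"tool": "run_query", "arguments": {"sql": "SELECT count(*) FROM orders"}}

def run_agent(task):
    # Expose only descriptions/parameters to the model; keep the callables local.
    schemas = {name: {"description": t["description"], "parameters": t["parameters"]}
               for name, t in TOOLS.items()}
    decision = call_model(task, schemas)            # the model picks a programmatic affordance
    tool = TOOLS[decision["tool"]]
    return tool["fn"](**decision["arguments"])      # execute it as a function call, not clicks

print(run_agent("How many orders did we get last week?"))
```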
Fascinating.
Talk a little bit about automation.
So, you know, operating autonomously makes total sense with code.
Based on your study of AI over the last decade,
that's an area that's ripe for this. What are other areas where you think, in the relatively near term,
are ripe for automation as well?
There are a bunch of them. The way I would think about it, the way we think
about it, and this is true both for what we're doing and for other companies that are working
in kind of, yeah, automation with AI, kind of building autonomous
agents, be it for coding or something else:
I think that a good analogy here is this sort of transportation analogy,
like we're going from cars as they are today to autonomous vehicles. I think it kind of lands here
as well. And the way to think about it is that chatbots like ChatGPT and Perplexity and
GitHub Copilot, these products that are much more, you know, chat-like, you ask them something,
they give you something back, we think about them as, like, the cruise control of
transportation vehicles, because they kind of work everywhere.
They're not fully autonomous at anything really yet, but they work everywhere.
And so they're these general-purpose tools that are kind of, you know, cruise
controls that augment the human.
Now, if you're trying to build a fully autonomous experience, like, you know, what people refer to
as agents today, the same thinking, you know, it's much closer to how you would think about
designing an autonomous vehicle.
Autonomous vehicles don't work everywhere from day one.
They have a geofencing problem.
And the kind of player that won there, you know, Waymo, I think, I got in a Waymo when I was in San Francisco last.
It was just this magical experience.
And they did a fantastic job by basically nailing San Francisco, and they geofenced it.
And you can't get on highways. You can't do all these things that you can do in a normal car.
But within the geofenced area, it works so well that it's just a transformative, magical experience.
And I think that is how people should be thinking about autonomous agents.
So we shouldn't just be promising, you know, a fully autonomous
vehicle, like, in the future.
Right, we're promising a thing that automates a lot of stuff on the computer in the future.
That's clearly where things are going.
But today, the important problem is geofencing.
And so, yeah, what are examples of that?
I think customer support is an area where this kind of workflow has been shown to work really well.
How does the geofencing analogy, you know, transfer there?
It's that some tickets that your customers, you know, are asking about can be fully resolved.
Like, maybe they have a simple question that's actually an FAQ or something like this.
And so you'll route that to an autonomous agent that will just solve it that way.
And the tickets that are more complex, you'll send to a human.
Or if, like, the customer asks for it to be escalated, you'll send it to a human.
So I think that successful product form factors in agency and autonomy
have this sort of geofencing baked into them, that they kind of take on the
thing they can do well and then help the customer outsource the thing that they can't do well
yet to, like, the normal, you know, state of affairs.
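As a loose illustration of that geofencing idea, here is a tiny, hypothetical routing sketch: handle the ticket categories the agent is known to do well, and send everything else (or anything the customer wants escalated) to a person. The categories and confidence threshold are made up for the example.

```python
# Hypothetical geofence for a support agent: only handle what it does well.
FENCED_IN = {"password_reset", "billing_faq", "plan_change"}

def route_ticket(ticket):
    """Route a ticket to the autonomous agent only inside the geofence."""
    inside_fence = ticket["category"] in FENCED_IN
    confident = ticket["confidence"] >= 0.9
    if inside_fence and confident and not ticket["asked_for_human"]:
        return "autonomous_agent"
    return "human_support"

print(route_ticket({"category": "billing_faq", "confidence": 0.95, "asked_for_human": False}))    # agent
print(route_ticket({"category": "refund_dispute", "confidence": 0.60, "asked_for_human": False}))  # human
```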
So I'm curious your opinion on this. I think
there's an interesting, like, loop here where, yeah, like, it makes total sense, like, interact with this
AI thing and then, like, human, you know, human-in-the-loop type thing. But I think there's also this
aspect of, enough companies have to, like, generally be able to do this well from a, like,
human adoption standpoint, right?
Because let's say this
was a solved problem,
but essentially 5% of companies, like, have the technology
where this works well.
Like, humans are going to be like,
I want to talk to a person.
They're just going to, like, you know,
try to get past the AI agent as soon as possible.
So I'm curious about your thoughts on that,
because there's this, like,
what's-possible problem,
and there's, like,
will humans adopt it?
Will humans use it?
Because you guys must, you know, face that building a product.
Yeah, so I think for us and for others, like, to complete the customer support thing,
the ideal experience is that the human doesn't even know.
Like, it's just, the customer came in, their problem got solved,
and they didn't care or know who solved it for them.
Yeah, give it a name, give it a face.
Right.
Yeah.
And that's the way we think about kind of autonomous coding.
So when we think about geofencing, we think about,
you know, you want to go for tasks
that are actually pretty straightforward for an engineer to do,
because these models aren't, you know, super reliable yet.
But you want these tasks to be things that are tedious and high volume
and that engineers don't like doing.
So there are so many examples of these things, like code migrations.
There's so much of a migration, when you're moving from, like, this version of
Java to that one, that is kind of tedious work, or, you know,
writing tests.
Suppose you're relying on a bunch of third-party APIs or dependencies, and, you know,
an API got updated, it wasn't backwards compatible, your code fails, your engineer has to change
what they were doing to go fix that.
And right, it's sort of, again, undifferentiated work, and especially for companies that have
very sophisticated engineering teams and are doing a lot, they end up having this sort of backlog
of these kinds of tedious small tasks that are actually not really differentiating tasks
for them as a business at all.
And so these are the kinds of tasks where a product like ours comes in.
You know, when customers talk to us, they don't even think necessarily
of, like, a copilot-like product, because they think about, if we can just automate these,
you know, for them, some subset of them, right?
Some subset of these migration tasks, or, like, third-party API breakages, or some subset of their backlog,
then it's something their engineers never even have to do.
Whereas, like, a copilot helps them do the things that are on their plate faster.
And interesting,
from the developer's perspective, like, so much like the customer support use case, it should be indistinguishable, for the tasks where it works, from, like, a competent engineer sending them a pull request to review, right?
Like, a failure mode for a company that does autonomous coding is that you took on more than you could chew, and your agent is sending bad pull requests, and now the developers are wasting their time reviewing, like, really junk code.
Right, right.
So, you know, you have to be pretty strategic about the tasks that you pick on, and sort of not overpromise, you know, set expectations correctly, and deliver an experience that is basically indistinguishable from a competent engineer doing this.
Yeah. So that's really interesting. So essentially, and I mean, this is an overused term, but essentially, like, this could look like some kind of, like, self-healing component of an app. So, like, from an engineer's perspective,
like, you could engineer this into the app,
and it's able to autonomously take care of API updates
and maybe a couple other things.
Yeah, that's really interesting.
One question I have is around what it takes to get to fully autonomous, right?
So we used the example of tests or API integrations
or other things like that.
And you used the example of self-driving vehicles, right?
Even within the context of geofencing, right, for Waymo,
still, in the development curve of that,
you know, they had vehicles that could do a lot of stuff,
but, like, the last 20%, 10%, was really hard
because they had to deal with all these edge cases.
Even though, you know, geofencing, I think, helped limit the scope of that,
it was still really difficult to, you know, solve for all these edge cases.
Is it the same way when you think about autonomous coding? Like, is the last 10% really difficult,
to get to, you know, to something where it is truly autonomous?
Yeah, I think there's kind of a yes and no part to that.
So the part where I think the analogy to the autonomous vehicle breaks is that an autonomous
vehicle is really autonomous, and, like, safety is so important that there's absolutely no way
it can do anything wrong, right?
But in this instance, right, suppose that, like, a coding agent did most of what you asked
it to do, but, you know, it missed some things.
Well, if it did stuff that was pretty reasonable, right, then you just go into
code review and tell it, hey, you missed this and this.
Just like you would with a developer.
So I think that the kind of failure tolerance is higher, like, you know, there's more tolerance
in, like, digital applications like this.
Now, the thing is, what you want to avoid is, you know,
a model where you asked it to do something,
and it came back and just,
it just wasted your time, basically, right?
It's kind of,
the amount of time that it would save me to go, like,
back and forth with this thing,
it's just wasted time.
So it's similar to how, like, when you hire someone,
if it's someone who, like, let's say,
was just not trained as a software engineer,
and, like, it would take longer to, like, upskill them
and train them to be a software engineer
than to just do the task yourself.
So I think that the actual
eval is, like, is this, like,
net beneficial to you as a developer?
Like, are you spending less time
doing things you don't like to do with this system or not,
rather than, like, meeting that level of perfection
and the kind of speed that it has?
Makes total sense.
Okay, you mentioned evals early in the show,
when we were talking earlier,
and how you said that's
one of the most
important aspects of this, especially as it relates to data.
So, I think the last topic we should cover is your question, John, which we made everyone
wait a really long time around, you know, data teams.
Yeah, exactly, exactly.
So, John, why don't you revisit your question?
Because I want to wrap up by talking about the data aspect of this.
I mean, I could keep going asking a ton of questions because it's so interesting.
But yeah, I think, you know, obviously a lot of our audience, you know, works on data teams, and I think
I'm personally curious, and I bet a lot of the audience is curious, about what does it look like.
So say I'm on a data team that works for Reflection, I'm on that data team and dealing with
AI agents on a daily basis. Like, what is it, how is it similar to what I might
do at a B2B tech company or in another industry, and what are the
main differences?
Something, as I mentioned earlier, when you first asked the question, I think something that
is possibly the most important thing to any successful, like, AI project, product, or
research is getting your evaluations right.
So actually the most successful AI projects, they typically start with some phase where
they're not training any models, not doing anything like this, they're just figuring
out, like, how are we going to evaluate success?
And the reason is, this is something that, let's say, when you see all these coding products and, like, AI products in the market, there is this sort of, like, shooting-from-the-hip thing where it's like, I put it through some workflow, put it in front of a customer, like, does it have value or not?
Whereas the way, like, I've seen, like, successful products like this built out, like,
how does, for example, like, when a company develops a language model, like, a GPT model,
whatever, how does it know that the thing it's training, like, the people, that users will like
it, right?
You have to develop all these evaluations internally that are really well correlated to what
your customers actually care about.
And so in the case of chatbots, that evaluation is basically, like, preference.
You have your data team, like, what does a data team do for, like, a normal, like, language model chatbot-like product?
They get a lot of data from human raters, that is, you know, they have different prompts,
and then, you know, those raters basically, you know, say which responses they liked more over the others.
And so typically, it means that the thing that gets upweighted is, like, more helpful responses, things that are formatted nicely, things that are, you know, safe, right,
like, they're not offensive.
And it's really important to set up those evals that you're benchmarking internally
to actually correlate with what your customers actually care about in your end product.
And I think that that's something that's kind of a new way of operating, because these systems
aren't deterministic, like software as we know it.
And so when you're shipping something that is probabilistic, that is going to work in some
cases, not work in other cases, you have to come in with some degree of confidence, like,
whether, you know, we're coming to a customer, sometimes a use case will not be a good fit
for us, because we built evals and we were able to predict that actually for these use
cases, like, the models are not ready yet.
Yeah.
Can you give us an example of just a really simple eval, like, what that would look
like?
Yeah.
So, for example, like, for coding, right, that's kind of what we're building, these autonomous
coding models.
And the eval, what is the eval there?
The eval there will be, from a customer perspective, will they actually merge the code
that our system proposed?
And how long of an interaction or back-and-forth will it take them to merge it, right?
So then the question is, well, we want that experience to be delightful for customers.
We don't want to, like, set up complex evals for every customer, because
that's just going to be a waste of their time.
So it's, how do we set up internal evals that are kind of representative
of what our customers care about?
And so an example of this is, well, if we care about the merge rate,
like, the merge rate of pull requests from our customers,
then we should be tracking, like, the merge rate
on similar kinds of tasks to what our customers have.
So, you know, right,
we have different task categories, like migrations,
cybersecurity vulnerabilities,
these sort of third-party, like, API breakages.
Right.
And, you know, your data team, what it does on the eval side of it
is that it curates data sets
representing them.
And then for every version of our model,
we basically run it through these evals,
and we have different evals for different use cases.
And we see where our models stack up.
On some of them they do better, on some they do worse,
but it allows us to come to customers,
when we've identified a use case that is a good fit,
with high confidence that it will be a delightful experience.
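A bare-bones, hypothetical sketch of the kind of internal eval loop Misha describes: curate a task set per category, run each model version against it, and track the merge rate. The task lists, the agent stub, and the "would a reviewer merge this" judge are placeholders for illustration, not Reflection's actual pipeline.

```python
import random

# Hypothetical task sets per category; a real data team would curate these
# from examples representative of what customers actually ask for.
EVAL_SETS = {
    "migrations": ["bump Java version in service A", "move config format in service B"],
    "api_breakages": ["fix call sites after a third-party SDK upgrade"],
    "security_patches": ["patch a vulnerable image-parsing dependency"],
}

def run_agent(model_version, task):
    """Stand-in for the autonomous coding agent producing a pull request."""
    return {"model": model_version, "task": task, "diff": "..."}

def would_merge(pull_request):
    """Stand-in for a rater (or customer) judging whether the PR is mergeable."""
    return random.random() < 0.7   # placeholder judgment

def merge_rates(model_version):
    """Merge rate per task category for one model version."""
    return {
        category: sum(would_merge(run_agent(model_version, t)) for t in tasks) / len(tasks)
        for category, tasks in EVAL_SETS.items()
    }

print(merge_rates("model-v2"))   # e.g. {'migrations': 0.5, 'api_breakages': 1.0, ...}
```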
And I don't think most teams
that build products, that may not come
from a research background, are as scientific about it, because setting up the evals
kind of takes a really long time.
And it's just kind of a pretty complex process, right?
Where are you going to source, like, the coding raters who
are going to basically rate whether you'd merge these things or not?
How are you going to manage that team?
Where are you going to source the tasks from that are representative of what your
customers care about?
These are the kinds of questions that the data team answers.
And more so, right, beyond that, it's, how do we collect the data that we need to train models
to be good at the things that the customers care about?
In various aspects, right, how do we collect data for supervised fine-tuning?
How do we collect data for reinforcement learning?
So on the data team, you need to be as nimble on data research as you are on, like, basically
software and model research, right?
We think a lot about algorithms and model architectures and things like that.
And the thing that is maybe equally important, but less frequently talked about, like, in papers, is the data research, like, operational data research, that needs to go in to make sure that these systems are reliable at the things you care about.
Right.
That's so interesting.
And very true of people, too.
Well, I was just going to say, there's a lot of timeless wisdom in that approach as well.
Well, as we say, we're at the buzzer, Misha.
I do want to ask one really practical question.
I know Reflection is still, you know, in stealth mode in many ways,
but I know probably a lot of our listeners have tried
or are exploring different tools around augmenting
the technical work that they do every day.
From your perspective, if someone is saying,
okay, you know, I see all these posts on Hacker News
about, you know, these tools and, you know, bots
that can help me, you know, or copilots
that can help me write code, where would you encourage people to
dig in if they feel either overwhelmed or they're kind of new to
exploring that space of, like, AI-augmented technical work and coding
specifically? I think that if people are just kind of dipping their toes and
just getting started and trying to explore this space, the best thing is to
sort of use products, you know, use coding products like a Copilot or
Cursor, that are these kind of initial, you know, like, as you were talking about, like, cruise
control, right? I think that that's how I actually, you know, I started using both products. Like, a lot of
members of our team use those products, and, you know, they've been very, very informative, and as I said,
kind of, in a sense, sort of complementary. I think that getting, like, getting autonomy right and getting
agency to work is a more complex and nuanced problem, and typically what we find when we talk to customers is,
by the time they're thinking about autonomy and agency,
they've already been using a copilot for some time,
and they're pretty well educated on what kinds of problems they believe they have
or can be automated.
So if someone's coming from a blank slate,
I would kind of take an off-the-shelf product like a Copilot or a Cursor
and give that a shot and sort of start just trying it out empirically
and seeing what sort of value you derive from them.
Love it.
All right.
Well, Misha, best of luck as you continue to
dig into research and build product.
And when you're ready to come out of stealth mode,
of course, you know, tell John and I so we can,
you know, so we can kick the tires.
But we'd love to have you back on the show to talk about some product
specifics in the future.
That sounds great.
Thanks, Eric.
Thanks, John, for having me.
The Data Stack Show is brought to you by RudderStack,
the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams turn customer data into competitive
advantage.
Learn more at rudderstack.com.
Thank you.