The a16z Show - From Code Search to AI Agents: Inside Sourcegraph's Transformation with CTO Beyang Liu
Episode Date: January 20, 2026
Sourcegraph's CTO just revealed why 90% of his code now comes from agents, and why the Chinese models powering America's AI future should terrify Washington. While Silicon Valley obsesses over AGI apocalypse scenarios, Beyang Liu's team discovered something darker: every competitive open-source coding model they tested traces back to Chinese labs, and US companies have gone silent after releasing Llama 3. The regulatory fear that killed American open-source development isn't hypothetical anymore; it's already handed the infrastructure layer of the AI revolution to Beijing, one fine-tuned model at a time.
Resources:
Follow Beyang Liu on X: https://x.com/beyang
Follow Martin Casado on X: https://x.com/martin_casado
Follow Guido Appenzeller on X: https://x.com/appenz
Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see http://a16z.com/disclosures.
Transcript
This is the first time in computer science I can think of where we've actually abdicated, like, correctness and logic to us.
Like in the past, it was a resource, right?
So maybe the performance is different.
Maybe the availability is different.
But like, whatever I put in, I'm going to get back out.
But now we're like, figure out this problem for me.
You talk to some devs and they're like, you know, I've never been more productive, but coding isn't fun anymore.
That's one of the things that we're trying to solve for.
It's like amazing new technology.
It feels like magic.
Never experienced anything like this before in my life.
And the narrative was like, this thing will just run our lives for us, or it's going to kill us all.
Total annihilation.
Like Terminator.
And there's just like absolutely no danger that like this thing's going to, you know,
acquire a mind of its own and like try to reach out of the computer and kill you.
If you use this every day, right, this idea that this thing could take over, to us, it's funny.
Yeah, exactly.
That narrative, I think, has largely been dispelled within our circles.
But I think that it's sort of like taken on a life of its own in other circles.
And it's made its way to the halls of policymaking in the U.S.
This is the old adage of, like, you know, do you blame it on ignorance or malice?
I honestly don't know, but it is clearly, like, nonsensical, and I think very much against the
national interest to still be telling this story.
The United States invented the AI Revolution.
We built the chips, trained the frontier models, and created the entire ecosystem.
But right now, if you're a startup building AI products, you're probably writing your code
on Chinese models.
Today's guest is Beyang Liu, one of the co-founders of Sourcegraph.
Beyang is joined by a16z's Martin Casado and Guido Appenzeller
to talk about the shift he's seeing on the front lines of software development today.
Sourcegraph's coding agent, which has hit number one on the benchmark for merged pull requests,
runs on open source models.
Many of them are Chinese, not because of ideology, but because they work better for what the company needs.
Here's the tension.
Beyang studied machine learning under Daphne Koller at Stanford.
He spent a decade building developer tools.
He knows the technology cold.
And his view is that we're sleepwalking into a dependency problem.
Not because Chinese models are dangerous,
but because American policy has made it nearly impossible to compete in open source AI.
We dig into why the Terminator narrative around AI safety
might be our biggest strategic mistake,
whether it's already too late to catch up,
and what happens when the atomic unit of software isn't a function anymore,
but a stochastic subroutine you can't fully control.
Beyang, thanks for coming and joining.
So the topic today is AI and coding,
and I mean, I would say you're one of the world's experts on this.
And so we would love to kind of do a deep dive
into how you view the problem,
how you view the solution. Of course,
you're the co-founder and CTO of Sourcegraph.
Yeah. So we'll talk a bit about that as well.
Of course, we've got Guido. Thanks for being here.
And so maybe just to start, we can do a bit of a background on you
and then we'll just kind of dig into details.
Yeah, so background, I've been working on dev tools
for more than the past decade of my life.
I started Sourcegraph about 10 plus years ago,
brought the world's first kind of like production legit
code search engine to market
and pushed that to, I think, a good portion of the Fortune 500.
Prior to that, I was a developer at Palantir,
back in what I guess are now the early days, right?
That's where I met my co-founder Quinn
and we were working on data analysis software
and a lot of large enterprise code bases
that we were kind of, like, drop-shipped into
and realized that there was a big need for better tooling
for understanding massive code bases.
And then before that, I guess relevant now is
I actually did machine learning as a concentration
when I was doing my studies.
So I did some computer vision research.
I know that?
Yeah, under Daphne Koller at the...
I did.
I didn't know.
I was an AI guy.
Yeah, yeah.
OG AI.
Dawson Engler, like, Eiler stuff.
I thought you were a systems guy.
Yeah, yeah.
For me, like this whole phenomenon of like LLMs and coding,
it's almost like a homecoming of sorts.
Because I really...
I didn't know that.
That's awesome.
Yeah.
Daphne taught me AI as well.
She's a great teacher.
Yes, that's amazing.
I got to say, that was like the one class I was so happy I comped out of
because I didn't think I would do well.
If I didn't pass the comp, I thought it would defeat me.
I think I failed the comp too many times.
You had to take it.
There were those two.
I was the TA for that class, actually.
121?
228.
Yeah, that's right.
That's what it was.
Yeah, yeah.
So, cool.
Great.
So Sourcegraph started with code search and
navigation, but now you've been making waves with Amp, which is, like, an agent, we'd call it.
So maybe we talk through a little bit about what you've been working on, maybe pre-AI and now,
just to level set.
Yeah, so kind of like the history of the company is we were really built to make coding
a lot more efficient inside large organizations and to make the practice of actually building
software way more accessible, primarily to professional software engineers.
But I think our eventual vision was always to expand the franchise.
And we started by tackling the key problem, which is
enabling humans to understand code, because if you've ever worked inside a large code base,
you know that that probably takes anywhere from 80 to 99% of the time.
And then the remainder is when you actually understand the problem well enough to actually
write the code.
So that's where we kind of like built up our domain expertise.
And then when LLMs sort of matured, it was something that we were always kind of like monitoring
in the back of our minds. We originally looked at LLMs and embeddings as a way to
enhance the ranking signals that we were incorporating into our search engine.
And then when things really hit their stride with ChatGPT and all that,
it was fairly obvious to us that there was a big opportunity to combine LLMs,
which were this amazing technology,
with a lot of the stuff that we'd built up to that point.
And then, I guess, to round that out,
finally, our latest product is this coding agent called Amp.
What's interesting about Amp is there's this view that it's, like,
a very sophisticated, kind of opinionated take on agents.
Do you share that, or is that just kind of the outside view?
I would say there's certain things that we're doing that I think are quite unique.
It was the top recently on one of these benchmarks, right?
Yeah, I think there's like some startup out there that compares pull request merge rates or something.
We managed to claim the top spot.
It's awesome.
Yeah, excellent.
Yeah, it was very gratifying to see.
But again, like, I would say that I think we're opinionated on some parts of our philosophy of building agents.
And my own take is I think a lot of these opinions will soon become widespread.
But there are other elements of what we're doing where, like, people like to
read a lot into AI these days.
And sometimes it's just like, look, we actually did something very simple here.
We had good results, and we shipped that and it works very well.
Okay.
So I think your focus is really on large code bases.
How is that structurally different from me coding my little Homeroom 201?
Yeah, so it's funny you mentioned that.
Like historically, the company's focus has really been on large code bases.
But with Amp, we decided to build it almost completely separate from the existing code.
And the reason for that was, one, we built Amp...
Amp is really, at this point, like, seven, eight months old.
So we started Amp around, like, February, March this year.
And that was right at the wave of this new type of LLM hitting the world,
the, like, agentic tool-use LLM.
Finally worked.
Yeah.
Yeah, finally worked, right?
Like, after so many demo videos, finally there was a model that could actually do robust tool calling
and compose that with reasoning.
And our original tack was like, okay, let's build this into some of the existing things that we've created.
But the more we started playing around with the technology, the more we sort of came to the conclusion that this was actually truly disruptive.
And we should actually start from first principles and, you know, build the agent from the ground up and see what tools we really need.
So what we've arrived at is the coding agent, which works, I think, very well in large code bases because, again, we push this to a lot of our customers.
But it's also great for, like, hobby coding.
Like I spun my dad up recently on it, and he's been using it to create these, like, iPad games for our kid.
Because, you know, typical Asian dad.
Trying to teach him math, right?
I want to teach him arithmetic and whatnot.
And so my dad who's never written a single line of code in his life is able to just like, hey, make a simple game that has them count the numbers.
And then if he gets it right, the little rocket ship blasts off.
So it's kind of interesting.
It's a really interesting time to be building because even if you're building for professional
developers as we are, a lot of the technology ends up being just kind of widely accessible.
This is the new parenting.
You task the grandparents to write games for your kids that are appropriate for
the curriculum, what they're supposed to learn.
Yeah, I love it.
Another thing that's made kind of a splash is you've recently decided to go to an
advertisement-based model.
So, like, I've got this dissonance internally, which is, on one hand,
I'm like, this is the boutique, you're sophisticated.
On the other hand, I'm like, and it's also for everybody with ads.
And so, like, how do you kind of reconcile that?
It's really funny because I think we had this sort of reputation for being, like, the
primo agent.
Yeah, totally.
It's a super intelligent one, but we never had, like, flat-rate pricing. We did pure usage-based pricing, and that also meant that there was never any incentive to
switch to a cheaper model for our users. So our tack was, like, the most intelligence, and you just
pay for the inference cost. But as we built more and more, we kind of realized there's sort of this
efficient frontier that you can draw, this two by two grid and one axis is intelligence, but the other
axis is latency. And there are multiple interesting points along this tradeoff curve. It's not just that
having the smartest model makes your experience the best. The smartest model often tends to be a significant
amount slower than other models on the market. And so we felt that there was like an opportunity for
us to create like a faster top level agent that couldn't do as complex of coding tasks, but it could
do these like targeted edits. And when we started to play around with these small fast models, we realized
that, hey, actually, the inference costs are significantly lower.
And that got us thinking, like, going back to folks like my dad, right?
Like, he's just doing this stuff on the side.
He doesn't want to spend hundreds of dollars per month to create these kind of like simple games.
We're like, hmm, maybe there's like a model here.
I think it started as a joke.
Someone was like, we should just do ads and see how that works.
And everyone was like, nah, that'll never work.
But then it just kind of kept coming back up.
And at one point, we're like, all right, let's just try it and see how it works.
And we launch it.
And it's been growing very quickly since then.
Can I dig philosophically into this just a little bit?
So I had a conversation with somebody that works on Claude Code,
which is a very successful CLI tool.
And this person is like, you know, what we've done over time
is we've literally just removed, you know, stuff between the user and the model.
Like, that's it.
Like, that's kind of the way that we improve things:
we just, like, do less and let the model do more.
And so I guess that makes, you know,
it sounds kind of intellectually or intuitively interesting.
Yeah.
It kind of makes sense.
But on the other hand, it seems expensive.
Yeah.
You're like, here is this state-of-the-art model that costs a billion dollars to train.
Yeah.
And, like, now it's just the user and the model.
And so it's almost like that statement is almost contrary to an advertisement-based model
or like what you're talking about, like, you know, like a fast model or smaller models.
Yeah.
So, like, are we seeing two parallel paths in the industry?
So there's definitely different, like, working styles, right?
Like, depending on the task, or maybe depending on the person,
you talk to people using coding agents, and some of them are like,
I just want to write a paragraph long prompt and then have the agent go figure it out.
I want to come back to something that's, like, mostly working.
And then there's other people who say, like, actually, I don't want to do that
because half the time I myself don't have a clear idea of what I want yet.
The creative process is sort of one where you kind of
figure out what the software looks like as you go along.
And sometimes it's the same person saying both things, right?
Like, there are some features, say, implement billing,
where I'm like, okay, I know exactly what protocols we need to support
and the Stripe integration.
I know what feedback loops we need to hit.
Then it's like, okay, big prompt, agent go at it.
But then there's other types of development where it's like,
I want to build a brand new feature.
We just shipped this code review panel.
in our editor extension.
And that was a kind of like situation
where I was like,
I don't actually know what this review experience
should look like because it's not me reviewing other people's code,
it's me reviewing agents' code,
which is like a new workflow.
And for that, I kind of did want
like a more interactive back and forth
interaction between me and the agent.
So I don't think it's necessarily like,
these two things don't have to be completely separate products,
but they are distinct working modalities.
That's interesting.
That's a great way to put it.
How do you think about the difference between, like, using somebody else's model, like, one of the sort of labs versus...
Yeah.
Building your own model versus, you know, using an open-source model.
Like, how does that fit in your philosophy?
Yeah, so I would say our philosophy is not model-centric.
It's more agent-centric.
So we view the model as an implementation detail.
Yeah, I don't know what that means.
Okay, so let me explain.
So, like, when you're interacting with an agent,
At the end of the day, you care about how that agent is going to respond to your inputs,
you know, what tools it's going to use, what sort of trajectories it's going to take,
what sort of thinking it does.
A lot of that goes back to the model, but it's not solely dependent on the model.
There's a lot of other things that can influence how an agent behaves.
There's the system prompt.
There's a set of tools that you give it.
There's a tooling environment.
There's the tool descriptions.
There's the sort of instructions that you give it for connecting to feedback loops.
And with the same model, with wildly different, like, tool descriptions
and system prompts, you actually get, like, you know,
completely different behaviors out of that model.
Is that true in both directions?
Like, with the same prompts and two completely different models
would get different behaviors?
Oh, for sure, for sure.
It's like if you have, like, an agent harness,
like a set of tool descriptions, and you swap out the model,
then there's no guarantee that that thing is going to work well
with the model that you swapped in.
And so what we view as, like, the kind of atomic composable unit
is not the model.
It's this thing called the agent,
which is essentially this contract of like user puts text in
and gets certain behaviors out.
And that agent is really a product of both the model
plus all these other things that I just listed.
And so when it comes to, like, figuring out what models we want to use,
it's not so much like, hey, we want to use like the latest, quote unquote,
frontier model from XYZ lab.
It's really about, hey, what behavior do we want the agent to take
or in some cases the sub-agent,
and how do we find the right model
that enables that agent to do its job?
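To make the "agent, not model, as the composable unit" framing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Amp's actual architecture: the `Agent` and `Tool` classes, the `with_model` helper, and treating a model as a plain `str -> str` callable are all hypothetical. What it shows is that behavior depends on the system prompt and tool descriptions as well as the model, so the same model behind different tool descriptions can act differently, and swapping the model yields a new agent that has to be re-checked.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumption: a "model" is any callable that maps a rendered prompt to text.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # the wording the model sees; changing it can change behavior


@dataclass
class Agent:
    """Hypothetical composable unit: user text in, behavior out.

    Behavior depends on every field, not only on `model`.
    """
    model: ModelFn
    system_prompt: str
    tools: list[Tool] = field(default_factory=list)

    def render_prompt(self, user_input: str) -> str:
        tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in self.tools)
        return f"{self.system_prompt}\n\nTools:\n{tool_lines}\n\nUser: {user_input}"

    def __call__(self, user_input: str) -> str:
        return self.model(self.render_prompt(user_input))


def with_model(agent: Agent, model: ModelFn) -> Agent:
    """Swapping the model yields a new agent whose behavior must be re-checked."""
    return Agent(model=model, system_prompt=agent.system_prompt, tools=list(agent.tools))


# Toy stand-in for a model: "calls" search only if some tool description mentions grep.
def toy_model(prompt: str) -> str:
    return "call:search" if "grep" in prompt else "answer directly"


a = Agent(toy_model, "You are a coding agent.", [Tool("search", "grep the code base")])
b = Agent(toy_model, "You are a coding agent.", [Tool("search", "look things up")])
print(a("where is main defined?"), "|", b("where is main defined?"))
# call:search | answer directly
```

Same model, same user input, different tool description, different behavior: that is the sense in which the model is "an implementation detail" of the agent contract.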
It sounds so hard to me.
This is the first time in computer science
I can think of where we've actually abdicated
like correctness and logic to us.
Like in the past, it was a resource, right?
Yeah.
So like, whatever, it's not logic.
It's like, okay, so maybe the performance is different,
maybe the availability is different,
but like whatever I put in, I'm going to get back out,
whether it's a database or compute or whatever.
Yeah.
These are like, you know, but now we're like, figure out this problem for me, right?
So you're kind of abdicating like, you know, core logic and correctness.
The unit test comes back with "works in 45% of the cases."
Yeah, yeah, yeah.
The nondeterminism is something that people struggle with a lot.
But so for me, I actually do think, you know, like historically, pre-AI, like when you think
about computer systems, the basic unit of composability is like the function call in programming, right?
So it's like, when you think about your system, it's like this function,
calls out to these other functions
and those other functions
delegate to these other functions.
I do think there's still
an analog to that in the agent world.
Like the agent is really the analog
of the function,
but just updated or generalized to AI.
Can I just push on this?
Sure.
I mean, listen, call me a traditionalist.
Yeah.
But for me, like, computer infrastructure is
compute, network, and storage, right?
And, like, databases.
And these are resources that are abstracted.
Sure.
Like, so give me storage,
give me network.
Yep.
But like,
the semantics, like what actually happens, I write, right?
That's like my code.
Here we're like figure it out for me.
It's like we're abdicating actual, like, logic and correctness.
It just feels like in a way like a little bit, you know, like in your case, for example,
if you pick up, you know, let's say you're using model v2.1 and then you go to model v2.2.
Like you have wildly different answers, right?
It's almost like a new instruction set or something.
You might have different answers, but I think if you construct the agent right, they're not going to be wildly different.
So, like, for instance, we have a sub-agent that's designed to search for things, like uncover relevant context.
And, you know, it is a bit of a dice roll every time, right?
Like, it takes a slightly different trajectory.
It might search for different things.
But it's to the point now where if I want to find something in the code base, I have, like, 99% confidence that this thing will eventually be able to kind of, like, stochastically
iterate to the right answer. And so in that way of thinking, it's like, yeah, how it gets there
might vary, but if I wanted to do a specific thing, it's reliable enough that I can invoke it.
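The "stochastic subroutine" idea can be put in numbers. Assuming each attempt succeeds independently with probability p, and that some check can tell a correct result from a wrong one (both are assumptions; real agent attempts are often correlated), the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. With p = 0.6, for example, five attempts give 1 - 0.4^5 ≈ 0.9898 and six give 1 - 0.4^6 ≈ 0.9959, so six attempts are needed to clear 99%. The function names below are hypothetical.

```python
import math
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def success_probability(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def attempts_needed(p: float, target: float) -> int:
    """Smallest k with success_probability(p, k) >= target (needs 0 < p < 1, 0 < target < 1)."""
    k = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p)))
    # Nudge k to correct for floating-point rounding near the boundary.
    while success_probability(p, k) < target:
        k += 1
    while k > 1 and success_probability(p, k - 1) >= target:
        k -= 1
    return k


def retry_until_verified(
    attempt: Callable[[], T],
    verify: Callable[[T], bool],
    max_attempts: int,
) -> Optional[T]:
    """Call a stochastic subroutine until its result passes `verify`, or give up."""
    for _ in range(max_attempts):
        result = attempt()
        if verify(result):
            return result
    return None


print(attempts_needed(0.6, 0.99))  # 6
```

The retry loop is only as good as `verify`; without a reliable check, more attempts just mean more candidate answers.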
It feels like there's kind of a backlash right now in the industry to evals. Do you view, like, this as an eval problem or, like, a runtime system problem?
Yeah, so, you know, my take on evals is
evals are definitely effective as a sort of, like, unit test or a smoke test.
Because if you push a change to your agent and it breaks something, you want to know, right?
Like if there's an important workflow that you're like, hey, this should work reliably well.
Because if this doesn't work, then probably a lot of other things break.
And that's a great instance where you want an eval that will alert you when it goes from green to red.
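A hedged sketch of evals in the smoke-test role described here: compare a baseline run with a candidate run and flag only the critical workflows that went from green to red. The `EvalCase` type, the `critical` flag, the case names, and the pass/fail dictionaries are hypothetical stand-ins for whatever an eval harness actually records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    name: str
    critical: bool  # "if this doesn't work, then probably a lot of other things break"


def green_to_red(
    cases: list[EvalCase],
    baseline: dict[str, bool],
    candidate: dict[str, bool],
) -> list[str]:
    """Names of critical cases that passed on the baseline but fail on the candidate."""
    return [
        c.name
        for c in cases
        if c.critical and baseline.get(c.name, False) and not candidate.get(c.name, False)
    ]


cases = [
    EvalCase("find-definition", True),
    EvalCase("edit-single-file", True),
    EvalCase("style-nit", False),
]
baseline = {"find-definition": True, "edit-single-file": True, "style-nit": True}
candidate = {"find-definition": True, "edit-single-file": False, "style-nit": False}
print(green_to_red(cases, baseline, candidate))  # ['edit-single-file']
```

Returning a list of broken workflows, rather than a score to push upward, is a deliberate choice that keeps the eval an alarm instead of an optimization target.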
I think where it gets hairier
is treating evals as
a kind of like optimization
target
because any eval set
like what are you trying to capture
if you're building an end-user product
at the end-of-the-day what you care about
is the product experience
and so you construct the eval set
to kind of proxy the vibes of the user
using the product
and by definition that means your eval set
is always lagging a little bit
from the frontier
because it takes time to like distill
what is a good product experience
into a set of evals.
And we've had multiple times in our past
where we've picked a number,
just to take an example,
like, with, you know,
back in the kind of like code completion days
of, you know,
2023 or whatnot,
we had a coding tool
that would do coding auto-complete
and the kind of like banner,
top-line metric there was completion acceptance rate.
You know, like,
given that I suggest this change to the user,
what is the likelihood they're going to accept?
That seems like, you know,
bulletproof, right?
But actually, like, I think in building that,
we ended up over-optimizing that to a certain extent.
Because there's, like, any metric you choose,
there's going to be a way to game it eventually.
Even in this one, like, okay, so, like, the developer accepts it,
but do they end up committing it?
Yeah.
Oh, they committed it, but, you know, like, whatever.
Did, like, did the PR get used that?
Yeah, exactly.
Did the PR get.
There's like a subtle bug introduced or whatnot.
Yeah.
You know, did it get merged into main?
Like, I mean, it just feels like, you know.
Yeah, yeah.
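The acceptance-rate discussion suggests measuring further down the funnel. As a minimal sketch, assuming each suggestion can be joined to whether it was accepted, survived into a commit, and was merged to main (the `Suggestion` record is hypothetical; real telemetry would need that join), the stage rates can be reported side by side so a high acceptance rate cannot hide a low merge rate. The four-item sample is toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    accepted: bool   # user accepted the completion
    committed: bool  # the accepted text survived into a commit
    merged: bool     # that commit landed on main


def funnel(suggestions: list[Suggestion]) -> dict[str, float]:
    """Share of all suggestions reaching each stage; each stage requires the previous ones."""
    n = len(suggestions)
    if n == 0:
        return {"accept_rate": 0.0, "commit_rate": 0.0, "merge_rate": 0.0}
    accepted = [s for s in suggestions if s.accepted]
    committed = [s for s in accepted if s.committed]
    merged = [s for s in committed if s.merged]
    return {
        "accept_rate": len(accepted) / n,
        "commit_rate": len(committed) / n,
        "merge_rate": len(merged) / n,
    }


sample = [
    Suggestion(True, True, True),
    Suggestion(True, True, False),
    Suggestion(True, False, False),
    Suggestion(False, False, False),
]
print(funnel(sample))  # {'accept_rate': 0.75, 'commit_rate': 0.5, 'merge_rate': 0.25}
```

Any single stage can still be gamed, as the conversation notes; reporting the whole funnel just makes the gap between stages visible.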
You know, like, this is kind of an adjacent topic.
But something that Guido and I discuss a lot is to what extent the market is Pareto
efficient on the Pareto Frontier. Like if you can trade off like let's say performance for cost
or intelligence for costs, like will the market kind of adopt that uniformly or does it just
optimize only for speed or only for correctness? Like being on the front lines, we would love
your sense on this.
It's a simple question.
We ask this question a lot and nobody seems to know.
Like, is the question here, like, what matters more, speed or intelligence?
It's whether the whole Pareto frontier is what matters, or whether there are just certain points on the
Pareto frontier that matter, right?
So you can imagine.
So traditional pricing psychology is you're the expensive one or you're the cheap one.
Yeah.
And everything in the middle is called the value gap, which people don't use, right?
And so originally we were like, oh, the Pareto, like, that happens here.
So either you buy the most expensive one
or you buy the cheapest one.
But actually, as we kind of look in the market,
I actually feel as like most of the frontier is pretty full.
Like developers are pretty sophisticated.
Like, you know, different, you know,
there's different cost sensitivities,
different price sensitivities.
Yeah.
So, you know, it's funny that you mention this like,
you know, the cheap option versus like the premium option.
It just so happens that Amp has two top-level agents.
There's a smart agent and there's a fast agent.
Oh, that's interesting.
And the fast agent is the one that's ad-supported,
like that we can offer for free.
And the smart agent is the one where we're like, okay, we're not, we will always only do usage-based pricing for that because we want to keep that at the frontier of smartness.
But that being said, like, I don't know, like maybe there's like a third point in there that could make sense.
It really just comes out of the vibes at the end of the day, like as we use this more heavily and see the usage patterns emerge.
The mid-agent.
Yeah, the mid-agent.
Like, I honestly, yeah.
Well, if you put it that way, they're like, oh, yeah.
Galaxy-brain idea: you either want, you know, smart or fast.
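The Pareto-frontier framing from this exchange can be made concrete with a small sketch that keeps only the options no other option beats on both axes. The option names and numbers are invented toy values, not measurements of any real model or of Amp's smart and fast agents; "cost" could stand for price or latency, and "quality" for task success rate.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    cost: float     # lower is better (e.g. price per task, or latency)
    quality: float  # higher is better (e.g. task success rate)


def pareto_frontier(options: list[Option]) -> list[Option]:
    """Options that no other option matches-or-beats on both axes while strictly beating on one."""
    frontier = []
    for o in options:
        dominated = any(
            p.cost <= o.cost
            and p.quality >= o.quality
            and (p.cost < o.cost or p.quality > o.quality)
            for p in options
        )
        if not dominated:
            frontier.append(o)
    return sorted(frontier, key=lambda o: o.cost)


options = [
    Option("fast", 1.0, 0.60),
    Option("mid", 3.0, 0.75),
    Option("lopsided", 5.0, 0.70),  # costs more than "mid" and does worse
    Option("smart", 10.0, 0.90),
]
print([o.name for o in pareto_frontier(options)])  # ['fast', 'mid', 'smart']
```

The code only finds the frontier; whether users spread across all of it or cluster at the two ends is the empirical question raised in the conversation.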
Cool.
So, I mean, if you're open to it, I'd love to dig in a bit on your view on
open source models.
Yeah, sure.
Do you use them?
Yes.
You know, do you think that they are an important part of the ecosystem?
Yeah.
So we do use a variety of open source models.
You know, we use both closed source and open source models quite heavily.
but the open source ones, I think, are becoming a bigger theme now for a couple of reasons.
One is, you know, with an open source or open weight model, you can post-train them, right?
Which means, like, if you have a domain-specific task, like Amp has a growing number of sub-agents
that are specialized for a specific task like context retrieval or, like, extra reasoning, library fetching,
those are more constrained tasks
where you don't necessarily need
like frontier general intelligence
if anything you want faster, right?
And so the benefit of having open weight models
is you can look at the thing that you're trying to optimize for
like what that sub-agent needs
and post-train the model to accomplish that more effectively.
And the other element of open-weight models
that's very appealing is just the pricing aspect.
Like, there's now more
and more like
effective open weight models
that are emerging on the scene that are
actually quite robust at agentic
tool use. You know
the landscape has changed immensely since
like June
of this year. We've gone from
like you know there's really only one
really good agentic tool use model
to now there's like... Could you name them, though? I mean, it'd be great
to actually... I mean, I'm open. Yeah, I mean, like, so
you know originally there was
Claude right like
Sonnet or Opus. That was the first agentic tool use model and that sort of, you know, ushered
in the current agent wave. But now, you know, there's GPT-5, there's Kimi K2, there's Qwen3
Coder, GLM. Are these open source models, like, on par, or pretty close?
It depends on the workload. So I would say in our evaluations for kind of like the top level
smart
coding agent driver
we still tend to prefer
Sonnet or
GPT-5
but for
kind of like quick
targeted edits or
specific subagents
I think more and more we're preferring
smaller models because
they have better latency characteristics
and
because the complexity of the task isn't
high, like you reach a ceiling.
It's like once you reach a certain level of quality,
there's diminishing returns,
and then you start optimizing for latency
because that gets you more, you know, interactivity.
What's the smallest model you can use for an effective agent?
I mean, for an agent right now,
it's probably still fairly large,
like, we're talking probably hundreds of billions of parameters
for kind of like a top level agent.
But for like search agents,
you could go smaller than that.
And then we also have a model that does kind of like edit suggestions.
So, you know, for those times where you still have to go into the code and manually edit stuff,
this thing suggests the next edit that you'll make.
And for that, we use a very small model, like, you know, single-digit billions of parameters.
So do you train your own models?
Yeah, we do.
Oh, wow.
But I would say we don't train them from scratch.
No pre-training.
It's mostly...
No pre-training.
And that would be dumb.
Yeah.
At this point, it's just like it would be fiscally irresponsible.
Probably pointless.
Yeah.
Are these for special use cases?
Like, a lot of the products that we work with, let's say outside of coding, you know, I mean, here's the general view.
Pre-training is done.
Yeah.
Right.
Paying people to create data, we've hit economic equilibrium, right?
It's like, you can keep paying people, but, like, you know, we're hitting
diminishing returns there because you need kind of more expensive people and like you need 10 times
more data and so at some point you hit equilibrium. But you know like there's a lot of product data
out there and there's a lot of users out there, and, like, you know, the solution domain is
enormous, and so you can start building smaller models. So it's like, A, is
that correct, and B, the models that you train, do they kind of fit in that general pattern
of specific smaller models? I think that's spot on actually. It's like
the very large generalist models were great,
and they still are great for experimentation,
because it's almost like, you know,
you train this thing on all sorts of data,
and it's almost like a discovery process
where, like, the training team themselves
don't quite know, you know, what behaviors might emerge.
But once you map those to specific workloads,
specific agents that you want to build,
then you have a much clearer target.
And, you know, it's widely known
that, like, a lot of the model labs do this now
behind the scenes. Like they might expose an API that's like, you know, one model, but behind
the scenes they're routing to, you know, smaller models. And you can also do that at the
application layer. Like if you have an agent architecture like we do, there's all sorts of
specialized tasks. Like we've broken down the process of like software creation to various tasks,
like context fetching or, you know, debugging or things like that. And once you have a specialized
agent for each, then you take a look at, you know, what the agent needs to succeed. And you try to
get the model as small as possible while still maintaining the requisite quality bar.
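Per-subagent model selection as described here reduces to a simple rule: for each specialized task, pick the smallest candidate that clears that task's quality bar. A minimal sketch, assuming a per-task quality score already exists for each candidate model; the task names, model names, sizes, scores, and bars below are hypothetical, chosen only to echo the rough shape described earlier (large top-level agent, smaller search agent, single-digit-billion edit model).

```python
from typing import Optional

# candidate name -> (parameters in billions, measured quality on this task)
Scores = dict[str, tuple[float, float]]


def pick_smallest(candidates: Scores, quality_bar: float) -> Optional[str]:
    """Smallest model whose quality meets the bar, or None if none does."""
    eligible = [
        (size, name)
        for name, (size, quality) in candidates.items()
        if quality >= quality_bar
    ]
    return min(eligible)[1] if eligible else None


def build_routing_table(
    per_task: dict[str, Scores], bars: dict[str, float]
) -> dict[str, Optional[str]]:
    """Map each subagent/task to the model it should call."""
    return {task: pick_smallest(per_task[task], bars[task]) for task in per_task}


per_task = {
    "top_level_agent": {"small": (30, 0.55), "large": (400, 0.88)},
    "code_search": {"tiny": (3, 0.71), "small": (30, 0.90), "large": (400, 0.93)},
    "next_edit": {"tiny": (3, 0.82), "small": (30, 0.85)},
}
bars = {"top_level_agent": 0.85, "code_search": 0.85, "next_edit": 0.80}
print(build_routing_table(per_task, bars))
# {'top_level_agent': 'large', 'code_search': 'small', 'next_edit': 'tiny'}
```

Swapping size for measured latency as the sort key gives the latency-first variant discussed above, once quality has hit diminishing returns.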
So it sounds like it's not just a Pareto frontier of quality versus cost, but there's
like a use case as well.
Oh, there's also multiple graphs.
Yeah, exactly.
Yeah, it's basically per agent.
Like every agent maps to a workflow.
Yeah.
It's emulating some workflow that, you know, maybe approximately maps to something that a human
used to do, maybe it doesn't, but it's, it's like a, it's a subroutine.
This is why I go back to like the, the function analogy.
And so for any given...
It's a subroutine where you abdicate the logic.
It's a, it's a stochastic subroutine.
I mean, now we have parameters like, how much reasoning do you want?
It's a tunable circuit.
Yeah, yeah.
How powerful do you want to make this?
What's your budget?
Yeah.
But there's like a mini Pareto frontier for each of these tasks, right?
And the optimal point along that frontier is different for each
task. So I actually want to dig into, you know, like the open source models, the implications.
I mean, I know that you've got opinions on that. We've got opinions. It's an interesting topic.
But before we do that, so in 10 years, are we using an IDE? Are we using agents on a CLI?
What happens to software engineering? In 10 years. Simple question.
Okay, so.
I mean, listen, you're like the other people of this.
No, no, no. I used for quite a while. I do have a take on this. Literally like the world expert on this question.
I'm serious.
Yeah.
So here's my take.
Like, I think it's not going to be an IDE that looks like any IDE that exists today.
And it's not going to be like a terminal that looks like any terminal that exists today.
My view is that, and I don't think this is like a particularly unique view.
It's just that, you know, the effect of AI on every single knowledge domain, including coding,
is that it's just going to enable the human to level up, right?
So the job that you do already, like, my job has changed so much in the past year.
Like, I think about all the kind of like, toilsome, like, line by line editing that I did, like, a year ago today, it seems like completely foreign.
I, like, honestly, don't think I could go back at this point.
Now, when I'm doing stuff, it's more at the level of, like, telling the agent to make the specific edits or execute, like, a specific plan.
And I'm really playing the role more of, like, an orchestrator.
Now and then, you still have to, like, pop in and make some manual edits when it gets stuck.
but increasingly, like I would say by sheer lines of code volume,
probably more than 90% of the code that I write these days
is through Amp.
And I think it's only going to get higher and higher level over time.
And so when we think about the interface
that a human will interact with primarily,
I think the future looks like something that allows you
to orchestrate the job of multiple agents
and, crucially, something that allows
you as the human
to understand the essentials
of what these agents are
outputting. And I actually think that's
probably the limiting bottleneck today.
Of course. It's like the human
comprehension just on like, yeah.
Does it map to like my understanding of like
the problem needs, even on, like, the business
side? Because there are fundamental tradeoffs
in the system. Yes.
Yes. But I think you can't wish those away.
You can't wish them away. And the human is the bottleneck
but I think the human is still essential
and will still remain essential
10 years from now in software engineering
because it's fundamentally a creative process.
No, no, that's what I mean.
Sorry, I just want to make sure we're talking about the same thing.
Oh, yeah, yeah.
Like, a human has in their head what they want to accomplish.
Yes.
And only the human has that in their head.
Yeah, yeah, yeah.
And so, like, often that's going to require choosing a point between two tradeoffs.
Yes.
Like, whatever that is.
And so, like, there has to be some way that this articulation happens.
Yes.
And when you talk to, like, practitioners today, like, a lot of them are very,
it's, like, bittersweet.
Because on the one hand, it's like, oh, my God, like agents, they're writing all this code and they're actually pretty good at it.
On the other hand, it's like, oh, I'm spending like 90% of my time, like, essentially doing code review now.
Which is, you know, there's like the one in 100 dev that you talk to that says, like, I really love code review.
The rest of us are like, oh, man, it's like such a drag.
We're becoming middle managers of coding.
Yeah, yeah, exactly.
I mean, you talk to some devs and they're like, you know, I've never been more productive, but code isn't fun anymore.
And so, you know, that's one of the things
that we're trying to solve for.
The beauty, the elegance is gone.
It's not all looking at the implementations requirement.
It's, yeah, it's, it's, it's that, but also,
it's just, like, the task of, like, reviewing code, I think, is a slog.
And, like, classical code review interfaces are just not that good.
Like, I think they were never that good.
Yeah.
But it wasn't, like, blindingly obvious because the, the rate at which, like,
lines of code were shipping was...
It's a super simple example, right?
Today, if I review code from...
pretty much any coding agent out there.
Typically, it's just like file by file by file by file.
Yeah, yeah.
Like grouping this by task or something like that
or explaining it.
A couple of arrows with little boxes.
Yeah, exactly.
You are literally like, there's so much low-hanging fruit.
Yes.
So we launched a review panel in our editor extension last week.
It doesn't get all the way there, but I think it's the first step.
And it's already like, it's way better than like an existing like code host review tool.
Like, it's mind-boggling to me that, like, we live in an age where, like, you can literally have a robot, like, you know, one-shot a very large change.
And then you pop over to, like, you know, GitHub PRs and you're clicking, you know, expand-hunk, expand-hunk, expand-hunk.
No code intelligence, can't edit.
No diagrams.
Yeah, yeah.
Like, it just feels like, you know, it's like we have, we have like a Ferrari engine, but then part of our workflow still requires, like, strapping it to this, like, horse and buggy style thing.
So anyways.
It's like I create a microchip and then give you an oscilloscope.
Yeah.
Yeah, exactly, exactly.
All right, so listen, we're running short on time here.
So I actually want to get more to the policy side.
Because I do think, like, listen, a lot of the way this goes is the way the model goes.
Yep.
The open source ecosystem, we see it all over the place.
Not even talking about Sourcegraph, but I would say if a company walks in now,
that's a product company that's decided that they need to post-train
their own models, it's going to be on an open source model.
Yep. And more and more, these are Chinese models.
Yeah. And so, you mentioned that you do use open source models and Chinese models.
So, like, how do you think about that as far as, like, A, maybe just, like, the implications of the
dependency, and then B, what does this mean? Like, maybe more holistically with the United States and
the ecosystem. Yeah. So, like, first off, like, in terms of our production setup, like, every
model that we hit is hosted on American servers. So from, like, an information
security point of view, I think this is like best practice across the industry. It's like you
don't hit models that are hosted in China. So, like, from that part, it's fine.
I would say though if you take a step back, it is fairly concerning because my view is that
as the model landscape evolves, you're going to start to see a flattening in terms of model
capabilities.
Right?
Like there's going to be
a healthy competition
at the model layer
and there's going to be
a number of options
for choosing a model
at a given point
in the Pareto frontier.
And with that flattening,
there's a strong incentive
for application builders
to, you know,
at a given capability level,
use the one that's open
for the reasons stated before.
And because the most
capable open weight models
right now are of Chinese origin,
it essentially means
that like application
builders around the world are choosing to post-train on top of these models.
And so if the U.S. open-weight ecosystem doesn't catch up, we're kind of in danger of the world
migrating to a world where most systems are heavily dependent on models of Chinese origin.
Do we have competitive U.S. open source models right now?
I mean...
I think it could have none-Chinese.
You know, we've sampled, like, a good portion of the model landscape.
Because, again, like, we have all these sub-agents and agents.
We want to find the best ones for the job.
And, frankly, like, the ones that we find most effective at agentic workloads,
they're almost all – I would say they are all of Chinese origin right now.
And that's not to say that, like, there haven't been, like, good efforts by American companies.
It's just that when you plop those into like an agentic application, you know, the tool use isn't quite robust enough.
It's not quite there yet.
Do you think this is a result of policy or funding or like?
I think probably all of the above.
The easy answer is like, yes, you know, it's a regulatory thing, this and that.
I just don't know how true that is.
I mean, it just turns out
that's very sophisticated, like...
You know, so it is interesting.
Like, it's like, you know,
the AI revolution was basically, like,
born and created, uh,
in the West, right?
And I think...
Down the street.
I mean?
Yeah, down the street.
And, uh,
the U.S. still holds a lead in
basically, like, every part of the stack,
uh, you know,
whether it's like chips or, you know,
frontier intelligence,
um, like,
basically every place except open weight models.
And electronics.
Yeah.
I guess, like, that's the manufacturing aspect of it.
Yeah, yeah.
But, yeah, from where I stand,
it's like, you know, if you go back to
the quote-unquote early days of the AI revolution
back to, you know, 2022 or so,
yeah.
I feel like the narrative that was told,
that was like the dominant narrative
was this one of like AGI at that point.
Yeah.
Where it was kind of like this,
it's like amazing new technology.
It feels like magic, right?
Like, I'd never experienced anything like this before in my life.
And then the narrative that was spun was like,
Hey, AGI is nigh.
What does AGI mean?
Well, either, one, it's like utopia.
All our problems are solved.
This thing will just, you know, run our lives for us.
Or it's going to kill us all.
Total annihilation.
Like a Terminator-style outcome.
Skynet, yeah.
I love, I love the Balaji view of this.
He's like, there's this very Abrahamic view of it.
It's either like, God or the devil, right?
And then he's like, I'm Hindu.
He's like, we've got a bunch of gods.
Some are capricious, some are wise.
Yeah.
And I've chosen the Hindu view of this.
Yeah.
Arguably that view of the model landscape was the right one in retrospect.
And I think at the time, like, people, like, using these models directly kind of realize this, right?
It's like you use the models.
They can emulate intelligence of a certain kind, but it's like mostly pattern matching.
And there's just, like, absolutely no danger
that like this thing's going to, you know, acquire a mind of its own
and, like, try to reach out of the computer and kill you.
If you use this every day, right, this idea that this thing could take over the world.
Yeah, exactly.
So, like, now if you talk to practitioners, like, anyone who's building it,
and increasingly anyone who's using it, right?
Because, like, now, you know, ChatGPT has been out for, like, three-ish years
and everyone and their mom has used it.
Like, people kind of understand what the limitations are.
So, like, that narrative, I think, has largely been dispelled within our circles.
But I think that it's sort of like taking on a life of its own in other circles,
and it's made its way to some of the halls of policymaking in the U.S.
Is part of the problem that not every policymaker is using LLMs today?
To put it carefully.
Yeah, I don't know.
You know, this is the old adage of like, you know, do you blame it on, you know, ignorance or malice?
Yeah.
Yeah, like I honestly don't know.
Like, it's a black box, but it is like,
like, it is clearly, like, nonsensical, and I think very much against the national interest to still be telling this story, because, one, it leads to kind of, like, overemphasis on the model as the be-all and end-all of AI, where in reality it's about pushing the models into, like, all these different application areas where, like, the rubber meets the road and things become useful.
Yeah, yeah.
Yeah, but then also, like, when you think about making laws and regulations for this sort of stuff,
if, you know, you've been sold on this sort of like Terminator-style narrative,
that's going to put you in a very different mindset with respect to how much risk tolerance you're willing to take on,
how much innovation you're going to allow in the ecosystem,
and your tolerance for open-sourcing model weights.
So, you know, you use a bunch of open-source models,
and there's a question that we actually debate quite a bit,
which is assume the policy environment exists as it is,
even with like infinite funding and infinite talent,
could you still actually build competitive models?
Or like now are we at a place that like we're just at a disadvantage
just because of the, like, is it too late to actually assume
that we can do it without actually changing policy?
Like, build adequate open-weight models?
Well, let me just give an example.
Like, I don't know why OpenAI released the open source models
the way they did.
But it seems like they were very, very sensitive
to what data was in them.
And I presume this is a concern around copyright.
I don't know the answer to this.
I just assume that.
Interesting.
We haven't seen something come out of Meta in quite a while.
Yeah.
Like, are there even any open source models?
So, like, it's just very unusual
for the United States not to do this.
And the efforts that have done it
have seemed to be, like, handicapped in one way.
And so, like, there's one view of the world
that, like, this isn't a tech problem,
it isn't a money problem.
We're already in the overhang of policy.
Like, that's one view.
So, like, I guess my specific question is, do you think that is the case?
Or do you think we've just kind of, you know, haven't kind of gotten to it yet?
And we're going to come up with open source models.
You know, I honestly don't know.
Like, I don't have like inside knowledge of what goes on inside a lot of these research organizations.
But it is super remarkable that, like, here we are, the U.S.
You know, we were the first with open source models.
We had Llama 3.
And now we're like, listen, you're using Chinese models.
Yeah.
Like, where are the U.S. models?
and why aren't they there?
And I guess my best guess,
I mean, again, you both can gut check me on this.
It's like, actually there's like all of the rhetoric around like developer liability,
even though it didn't happen, but there was rhetoric around it,
all the policy stuff, all the copyright stuff, all the lawsuits.
My guess is that, you know, a lot of these folks are gun-shy.
Yeah, I think that could very well be the case.
And I think that the way that the regulatory landscape is evolving,
doesn't help at all as well.
Because, you know, there was an effort earlier this year
to have kind of like a federal set of standards
for AI model layer regulation,
but that I think fell apart.
And so now we're kind of like slow walking,
in some cases fast walking, towards this, like, patchwork quilt
of state-by-state regulations.
It's not going to be good.
Some of that state regulation is written so that
it applies to anybody making a model available
in that particular state.
So in theory, every state tries to drive policy
for all of the United States.
It's very vaguely worded, and it leaves a lot of room for interpretation,
which is never good, I think, for...
A lot of that hasn't been litigated either, right?
So it's...
Yeah.
It massively increases complexity.
I think for a small startup to build an open-weight model
at this point is extremely hard.
You know, what are...
Who wants to take that risk?
It reminds me, like, back in the day
when we were looking at
GDPR compliance, when that was first a thing. And like I was talking with like our legal team and
external counsel and trying to like read the text of that regulation and figure out like, oh,
you know, is this thing, you know, technically in violation? It seems kind of high level. And the
answer that I got was like, look, honestly, these are underspecified. And it's really like,
it's going to come down to some decision maker within, you know, that bureaucracy. And they're going to
make a judgment call, and hopefully they lean towards going after, you know, the bigger fish
in the pond before they come after you. So, you know, paradoxically, this is the greatest gift
you could have ever given to the large social networking giants. They're the only ones that actually
have the legal teams and the policy teams to navigate this stuff. And we saw this up close
as investors. It was like, as soon as these things came up, it basically entrenched the incumbents.
Yeah. Who could come play. Last quick topic, if, you know, if you did have, like, some recommendation
on, like, how we should think about policy going forward to kind of aid in,
you know, open source efforts for the United States.
Yeah.
What would you advise?
Do as much as possible to ensure a dynamic and competitive AI ecosystem within the U.S.
I mean, the best thing that we can do, I mean, we're America.
Like, the best thing we can do is to take a step back and let the free market function.
And so to that end, like, ensuring there's kind of like a standard, like, you know,
nationwide set of regulations that's, you know, clear and, you know, like, well-specified,
that goes after, like, specific applications
and application areas
rather than, you know, like, general,
you know, existential risk at the model layer.
That would be good.
And then two, just ensuring that there's,
like, competition at the model layer,
avoiding any sort of, like,
anti-competitive behavior.
Regulatory lock-in and any of that.
Yeah, regulatory lock-in and that sort of thing.
You know, essentially, like, you know,
don't let the, like, Internet Explorer
versus Netscape thing play out the way it did
in, like, internet 1.0, with, like, the AI ecosystem.
Yeah, we're very, very lucky that actually academia and the broad industry ended up erring on the side of openness.
Let's hope this happens this time too.
Yeah.
Thanks for listening to this episode of the a16z Podcast.
If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family.
For more episodes, go to YouTube, Apple Podcasts, and Spotify.
Follow us on X at a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening and I'll see you in the next episode.
