No Priors: Artificial Intelligence | Technology | Startups - Coding in Collaboration with AI with Sourcegraph CTO Beyang Liu
Episode Date: January 18, 2024
Coding in collaboration with AI can reduce human toil in the software development process and lead to more accurate and less tedious work for coding teams. This week on No Priors, Sarah talked with Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster. Their most recent launch was an AI coding assistant called Cody. Beyang has spent his entire career thinking about how humans can work in conjunction with AI to write better code. Sarah and Beyang talk about how Sourcegraph is thinking about augmenting the coding process in a way that ensures accuracy and efficiency, starting with robust and high-quality context. They also think about what the future of software development could look like in a world where AI can generate high-quality code on its own, and where that leaves humans in the coding process. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @beyang Show Notes: (0:00) Beyang Liu’s experience (0:52) Sourcegraph premise (2:20) AI and finding flow (4:18) Developing LLMs in code (6:46) Cody explanation (7:56) Unlocking AI code generation (11:00) Search architecture in LLMs (16:02) Quality assurance in data set (18:03) Future of Cody (22:48) Constraints in AI code generation (30:28) Lessons from Beyang’s research days (33:17) Benefits of small models (35:49) Future of software development (42:14) What skills will be valued down the line
Transcript
Hi, listeners, and welcome to another episode of No Priors.
This week, we're talking to Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster.
Their most recent launch was an AI coding assistant called Cody.
We're excited to have Beyang on to talk about how AI changes software development.
Welcome.
Cool. Thanks, Sarah. It's great to be on. Thanks for having me.
Yeah. So you guys founded Sourcegraph all the way back in 2013, right?
I feel like I met you and Quinn at GopherCon either that year or the year after.
Do you remember?
Yeah, I think that's right.
We met at one of those like after conference events.
And I remember you asked me a bunch of questions about developer productivity and code search
and what we're doing back then.
Many listeners to the podcast are technical, but can you describe the core thesis of the company?
Quinn and I are both developers by background.
We felt that there was kind of like this gap between the promise of programming, being in flow
and getting stuff done and creating something new
that everyone experiences,
it's probably the reason that many of us
got into programming in the first place,
the joy of creation.
Then you compare that with the day-to-day
of most professional software engineers,
which is a lot of toil and a lot of drudgery.
When we kind of drilled into that,
why is that?
I think we both realized that we're spending a lot of our time
in the process of reading and understanding
the existing code rather than building new features,
because all that is prerequisite
for being able to build quickly and efficiently.
And that was a pain point that we saw again and again,
both with the people that we collaborated with inside the company we were working at
at the time, Palantir, as well as a lot of the enterprise customers
that Palantir was working with.
So we were kind of drop shipping into large banks and Fortune 500 companies
and building software kind of embedded with their software teams.
And if anything, the pain points they had around understanding legacy code
and figuring out the context of the code base so they could work effectively was, you know, 10x, 100x of the challenges that
we were experiencing. So it was partially, you know, scratching our own itch and partially like,
hey, like the pain we feel is reflected across all these different industries trying
to build software. Yeah, and we're going to come back to context and how important it is for
using this generation of AI. But I want to go actually back to, like, some roots you have
in thinking about AI, and your interning at the Stanford AI Research Lab way back when.
Yeah.
Like, that wasn't the starting point for Sourcegraph.
It was more like, oh, we need like super grep, right?
Like, we just need a version of search that works in real environments and is useful for getting to flow.
When in the story of Sourcegraph did you start thinking about how advancements in AI could change the product?
My first love in terms of computer science was actually AI and machine learning.
That's what I concentrated in when I was a student at Stanford.
I worked in the Stanford AI Lab with Daphne Koller, she was my advisor,
mostly doing computer vision stuff in those days.
And it was very different in those days.
We're now living through the neural net revolution.
We're well into it.
It's just like neural nets everywhere.
And in those days, it was still kind of like the dark ages of neural nets,
where it was after the first initial successes they had in like the late 80s and 90s,
doing OCR with them.
But then after that, the use cases sort of petered out.
At the time that I was doing it, the conventional wisdom, the thing that they told us in, you know,
Machine Learning 101, was like, you know, neural nets were this thing that we tried, you know,
a decade or so ago, but it didn't really pan out. So these days we're mostly focused on
graphical models and statistical learning techniques, you know, really trying to be explicit
about modeling the probability distribution of what we're trying to represent. We actually had
Daphne and one of her other former students, Lukas Biewald, now of Weights & Biases, on the
podcast as well. And both of them were also, like, lamenting the dark ages, when neural nets
were this weird niche thing and everyone was going to work on graphical models instead. But it's
very cool to see so many people who have, like, you know, an interest and technical passion in this
emerge on the other end and be like, aha, now is the time. So at what point were you like,
okay, I'm going to look at this and we're going to try to work on it at Sourcegraph?
Yeah, it's great. It really feels like a homecoming of sorts. And I think we're very fortunate that a lot
of the underlying skill sets, I think, do transfer pretty well. I mean, it's all linear algebra
and matrix operations underneath the hood, and that stuff is still applicable. And a lot of the
intuitions, like the value of sparsity and things like that, still are kind of applicable.
I'm still waiting for the statistical learning and maybe some of the convex optimization stuff
to reemerge. I wouldn't count it entirely out yet. I feel like the pendulum always swings back
the other way. It's swung away from statistical learning and convex optimization
and toward neural models now.
But I think they'll reemerge,
especially as we try to get deeper
into interpreting how and why
neural nets and attention
is as good as it is.
But to answer your question,
when do we start thinking about this
at Sourcegraph?
I want to say it was like circa 2017-2018
that we started to kind of like revisit
some of this, because
I think 2017 was when the attention paper
came out and you started to see more applications
of LLMs in the space of code.
I think Tabnine was one of the earliest to market there
with LLM-based autocomplete.
I remember chatting with someone
who had essentially implemented that
on top of GPT-2 at the time.
And it wasn't nearly as good as it is now, even then,
you know, like two or three years ago.
But we ran some early experiments applying LLMs,
specifically embeddings, to code search,
and that yielded some interesting results.
Again, the quality wasn't at the point where we were ready to productionize it yet, but it was certainly like enough to keep us going.
I think things really picked up September or October of last year.
It was a confluence of factors.
I think one, our internal efforts just kind of reached a level of maturity where we started being more serious, devoting more time to it.
Second thing is I went on paternity leave, so I was able to step away from kind of like the day-to-day stuff a little bit,
and that gave some time and room for kind of experimentation.
And then, of course, at the end of November, ChatGPT landed,
and that just changed the game for everyone.
And there was a ton of interest and excitement that really gave us a big kick
to start exploring in-depth the efforts that we already had underway.
Awesome. And so explain what Cody is today.
Cody is an AI coding assistant.
It integrates into your editor, whether you're using VS Code or JetBrains.
We have experimental support for Neovim,
and, as an Emacs user myself, Emacs is on the way.
We've also integrated it into our web application.
So if you go to sourcegraph.com and go to a repository page,
there's an Ask Cody button that allows you
to ask high-level questions about that code base.
And in terms of feature set, it supports a lot of the features
that other coding assistants support: inline completions,
high-level Q&A informed by the context of your code base,
and kind of specific commands, like generate unit
tests or fix this compiler error, that are kind of like inline actions in the editor.
And our main point of differentiation is that across that feature surface area, we augment the
large language model that we're using underneath the hood with all the context that we can
pull in through Sourcegraph and through techniques that we've refined over the past
decade, building a really awesome code understanding tool for human developers.
Okay, so you have said, and I think it is, like, a more interesting point of view now, that there's an argument that choosing and structuring large repo context is the key unlock for code generation and AI code functionality.
Can you explain how you guys approach it?
Yeah, so in many ways the context problem, so, you know, context, another word for it is retrieval-augmented generation.
The basic idea, I mean, listeners of your pod are probably familiar with this, but just for the ones that are, you know, tuning in
and unfamiliar: the idea is that large language models get a lot smarter when they're
augmented with some sort of context fetching ability, the most common of which is typically
like a search engine. So there's a number of examples out there of doing this. Bing chat is one
example. Perplexity is another example. They're building Google competitors where they integrate
the large language model with a web search functionality. And fetching search results into the
context window of the language model basically helps
anchor it to specific facts and knowledge, which helps it hallucinate less and generate more accurate
responses. We essentially do the same thing for code, using a combination of code search and
also something we call graph context to pull in relevant code snippets and pieces of documentation
into the context window in a way that improves code generation and high-level Q&A about the code.
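To make that concrete, here is a minimal sketch of the retrieve-then-generate pattern being described. The `search_code` and `llm_complete` callables are hypothetical stand-ins, not Sourcegraph's or any vendor's actual API:

```python
# Minimal sketch of retrieval-augmented generation for code.
# `search_code` and `llm_complete` are hypothetical stand-ins for a code
# search backend and an LLM completion API.

def answer_with_context(question: str, search_code, llm_complete, k: int = 5) -> str:
    # 1. Retrieve the code snippets most relevant to the question.
    snippets = search_code(question, limit=k)

    # 2. Place them in the prompt so the model is anchored to real code
    #    instead of hallucinating APIs that don't exist.
    context = "\n\n".join(f"// {s['path']}\n{s['text']}" for s in snippets)
    prompt = (
        "Answer using only the code below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generate the grounded answer.
    return llm_complete(prompt)
```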
And so on the code search end, we're essentially incorporating the technologies that we've built to help human developers over the past decade.
So if you look at the core feature set of Sourcegraph, the bread and butter really is code search, which allows you to go from, you know, I'm thinking of a function or I'm thinking of an error message, to quickly pinpointing the needle in the haystack in a giant universe of code.
And then from there, it's sort of this walking the reference graph of code.
So go-to-definition, find-references, in a way that doesn't require you to set up a development environment or, you know, tangle with any build systems.
It just all kind of works.
So the analogy there is like we want to make exploring and searching code as easy as it is to explore and search the web.
That's a huge unlock for humans being able to take advantage of the institutional knowledge embedded in that data source.
And it turns out those same actions, the code search and then the walking of the reference graph,
it turns out to be really useful for surfacing relevant pieces of context that you can then place into a language model's context window
that makes it much better at generating code that fits within the context of your code base and also answering questions accurately without making as much stuff up.
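A toy illustration of that reference-graph walking, assuming a precomputed map from symbol names to their definitions; a real system would get this from a compiler or language server rather than a plain dict:

```python
# Sketch of graph context: after finding a seed snippet, pull in the
# definitions of the symbols it references (go-to-definition), so the
# model sees the APIs the snippet depends on. `referenced_symbols` and
# `defs` are hypothetical inputs.

def expand_with_graph_context(seed: str, referenced_symbols, defs: dict, budget: int = 3):
    context = [seed]
    for sym in referenced_symbols(seed)[:budget]:
        if sym in defs:
            context.append(defs[sym])  # the definition the model will likely need
    return context
```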
Actually, I'm very interested. Do you do both, let's say, other traditional information retrieval approaches, like ranking, along with AST traversal?
Or, like, is there information missing from the graph context that's also useful,
either for your humans using search or for the models using search?
Yeah, there's a ton of data sources.
Let's start with the search side, which the search problem is really like,
hey, the user asks a question, now find me all the pieces of code or pieces of documentation
that could be relevant to answering that question.
We really view that as a generalized search problem.
It has a lot of parallels to end user search with the difference being, you know,
for human search, it's really important to get the quote-unquote right result in the top three.
Otherwise, people will ignore it.
Whereas with language models, you actually have a little bit more flexibility, because, you know,
you have a context window of, these days, at least 2,000 tokens, in some cases much longer, right?
And then in terms of how you do that fetching, the overall architecture is very similar
to how you would design a search engine.
So you have a two-layered architecture. At the bottom layer are your kind of like underlying
retrievers. The base case here would be just keyword search, or the fancy way of saying that
nowadays is sparse vector search, if you use a kind of one-hot encoding where ones
correspond to the presence of certain dictionary words. Anyways, that's just keyword search. It actually
works reasonably well. I think if you talk to a lot of RAG practitioners, you'll find that
the kind of like dirty secret is that keyword search can probably get you
more than 90% of the way there.
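As a toy version of that baseline, keyword retrieval can be as simple as scoring documents by query-term overlap; this is generic term matching, not Sourcegraph's implementation:

```python
# Bare-bones keyword ("sparse vector") retrieval: score each document by
# how many distinct query terms it contains.
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())

def keyword_search(query: str, docs: dict[str, str], k: int = 10):
    query_terms = set(tokenize(query))
    scores = Counter()
    for doc_id, text in docs.items():
        overlap = query_terms & set(tokenize(text))
        if overlap:
            scores[doc_id] = len(overlap)
    return scores.most_common(k)  # [(doc_id, score), ...] best first
```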
Let's talk about embeddings in a little bit.
But on keyword search alone, there's a lot that we do.
It's a combination of classic keyword search,
combining that with things that work well for code,
like regular expressions and string literals.
Also really important is how you index the data.
So what you're treating as, quote, unquote,
like the document in your keyword search backend.
We found that it's absolutely essential,
if you're searching over code, to parse things,
so that you can extract specific functions and methods and classes
along with the corresponding docstring
and treat those as separate entities in your system
rather than indexing at the file level
or trying to do some more naive chunking.
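For instance, here is a sketch of that symbol-level indexing, using Python's own `ast` module as a stand-in for the language-aware parsers a multi-language indexer would actually need:

```python
# Index at the symbol level rather than the file level: parse a source
# file and emit one "document" per function or class, paired with its
# docstring, ready to feed a search index.
import ast

def extract_documents(source: str, path: str):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            yield {
                "id": f"{path}:{node.name}",
                "docstring": ast.get_docstring(node) or "",
                "text": ast.get_source_segment(source, node),
            }
```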
So there's the keyword search.
We also have embeddings-based search or dense vector search
where you basically run those same documents,
those functions and symbols, through an LLM encoder,
take out the kind of internal representation,
the embedding vector,
and then do nearest-neighbor search against that.
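A minimal sketch of that dense path, assuming some `embed` function that wraps an embedding model:

```python
# Dense retrieval: embed every document once, embed the query at search
# time, and take nearest neighbors by cosine similarity.
import numpy as np

def build_index(docs: list[str], embed) -> np.ndarray:
    vecs = np.array([embed(d) for d in docs], dtype=np.float32)
    # Unit-normalize so a dot product equals cosine similarity.
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def nearest_neighbors(query: str, index: np.ndarray, embed, k: int = 10):
    q = np.asarray(embed(query), dtype=np.float32)
    q /= np.linalg.norm(q)
    sims = index @ q              # cosine similarity against every document
    return np.argsort(-sims)[:k]  # indices of the top-k documents
```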
There's a couple of other techniques we can use to surface
relevant context, too,
like matching file names and things like that.
Anyways, you have this basket of underlying kind of retrievers.
And the goal of the retrievers is just to preserve 100% recall.
So make sure you don't miss anything,
but also get the candidate result set down to a size where you can use a fancier method to bump the really relevant stuff up in the context window.
And that's where the second layer of the architecture comes into play.
And the second layer is the re-ranking layer.
Again, if you're implementing a search engine, this is how you do it, right?
Like, after your layer-one retrievers have proposed all the candidates, you have a fancier, you know, re-ranking layer that would be too slow to invoke across the entire document corpus.
But once you've kind of scoped it down to a smaller set, you can
take the re-ranker, and the purpose of the re-ranker is really to bump the right result or the
most relevant results up to the top. So it's optimizing for precision over recall.
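The two-layer shape, in sketch form; `retrievers` and `rerank_score` are stand-ins for the components described above, and candidates are document IDs:

```python
# Layer 1: cheap retrievers preserve recall. Layer 2: a slower re-ranker,
# too expensive to run over the whole corpus, optimizes precision on the
# small candidate set.

def retrieve_then_rerank(query: str, retrievers, rerank_score, k: int = 20):
    # Union the candidates from every cheap retriever (keyword, dense, ...).
    candidates = set()
    for retrieve in retrievers:
        candidates.update(retrieve(query))

    # Spend the expensive scoring only on that small set, bumping the
    # most relevant results to the top.
    ranked = sorted(candidates, key=lambda doc_id: rerank_score(query, doc_id), reverse=True)
    return ranked[:k]
```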
So that's kind of like the general architecture of the search backend that powers Cody.
Awesome. Yeah. I think one of
the things that I believe and we believe at conviction is that people are going to build
pipelines that look like search pipelines attached to a large language model in many more domains.
And you should treat that entire pipeline as important. Like, you guys are building a very sophisticated version here,
having worked on search for a while, but the parts beyond the language model itself are quite
important. For example, like the embeddings model and your chunking strategy. And they're actually
pretty data specific. Yep. Right. We were just talking about this. And I think people are going to
end up with domain-specific and even fine-tuned embeddings models, from companies like Voyage or
in-house, because I think there's a lot of headroom on performance there.
Yep, absolutely. I think the Voyage folks are doing really interesting stuff working on an embeddings
model for code. We're kind of collaborating with them at the moment. They're a really smart
set of folks. And I think you're absolutely right. There's so many components in these AI systems
that are outside of the quote-unquote main language model that are really important.
And we found that the most important thing, I mean, really what this comes down to, is
data quality and the data processing pipeline, which has been something that people have realized
for a long time, right?
Like, your model architecture can only go so far if your data is garbage.
So you really need a high quality data pipeline.
And that means not only having, you know, in our domain, high quality code that can
serve as the underlying data to use, but also a way to structure that data in a way where
you can maximize your ability to extract signal from noise.
Do you take into account the quality of code in this pipeline in some way?
Because, you know, you're working on customer codebases, and if they're anything like
the codebases I've interacted with,
you know, there's a variance in quality. But that's the real world.
So, like, what do you mean by, you know, high-quality code here?
I mean, we kind of implicitly do right now.
Built into Cody is this notion of, like, you know, which code is it referencing?
It's going to reference the code in your code base first.
And that's probably the most relevant code if you're trying to work on day-to-day tasks in a private code base.
We're probably going to release a feature soon.
This is something that our customers have requested.
Basically, the ability to point Cody at areas of the code base that are better models of what good looks like.
We've talked with a lot of enterprise customers where when we say, like,
hey, you know, Cody has the context of your code base.
It will go and do a bunch of code searches when it's generated code for you.
Their initial reaction is like, can I tell it to ignore large parts of the code?
Because there are certain parts of the code where, like, yeah, those are anti-patterns.
We're trying to, like, deprecate that or migrate away from that pattern.
And we're like, yeah, absolutely.
That's actually like a very easy thing to do at the search layer.
And the nice part of this, too, is when you're doing RAG, you can be very explicit about
the type of information, the type of data you're fetching into the context window.
You basically can give someone like a lever that they could turn on or off, or like a slider
at query time, that kind of controls what you pull in as context.
So, you know, maybe sometimes you really do want the full code base as context, when you're
doing something like completions or you're just trying to, you know, get something out
the door.
Other times maybe you want to be a little bit more thoughtful about what context you're
attending to because you have another goal in mind.
You know, not only do you want to ship the feature, but you also want to up-level the quality of your code or make it look more like some golden copy you have somewhere in your code base.
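One plausible shape for that lever is filtering retrieved results by path patterns before they reach the context window; the field names and glob patterns here are illustrative, not Cody's actual configuration:

```python
# Drop results from deprecated or anti-pattern areas of the repo before
# they can inform generation. With fnmatch, "*" also matches "/", so
# "legacy/*" excludes everything under legacy/.
from fnmatch import fnmatch

def filter_context(results, exclude_globs=("legacy/*", "*/deprecated/*")):
    return [
        r for r in results
        if not any(fnmatch(r["path"], g) for g in exclude_globs)
    ]
```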
You just mentioned completions, and then there's the other sort of user experience model that we've seen, which is chat in terms of how people interact with code generation capabilities.
Yep.
Where do we go from here, right?
Is it like, is it agents?
Is it more reliability?
Like, what do you want to build Cody into?
Yeah.
So I think there's kind of like the short term, the long term to think about.
In the short term, I think there's a ton more surface area in the developer inner loop
and kind of like human-in-the-loop use cases.
Sorry, describe what you mean by inner loop.
When you think about the software development lifecycle, this kind of iterative cycle
through which we build software, there's kind of like an inner loop and an outer loop.
The outer loop is kind of like the entire ring of like you plan for a feature, you decide what
you want to build, you go and actually implement the feature, you write the tests,
you submit it to CI, you submit to code review, and then provided you pass all that,
then it's time to deploy it in production. Once it's in production, you've got to observe
and monitor it and react to any issues that happen along the way. So that's kind of like
the outer loop. That sort of happens at the team level or maybe the organizational level.
The inner loop is the kind of cycle that a single developer iterates on potentially multiple
times per day. And this is really the engine of how you iterate to something that is like a
working patch that actually delivers the feature. So in one invocation of the outer loop, there are
many inner loops that you go through, because as a developer, unless you're like a superstar
genius who's already written this feature before, on the first attempt at implementing a new feature,
you're going to get a lot of stuff wrong, you're going to kind of like figure stuff out along the way,
you're going to acquire more context and realize,
oh, there's this other thing that exists that I should be using.
And so it's that kind of like learning process
that you want to accelerate as much as possible.
And so if you look at the landscape of code AI today,
the systems that are actually in production and in use,
they're all inner-loop tools.
So anything that is in your editor doing inline completions or chat,
that's kind of assisting you in the process of writing the code,
assisting you in kind of accelerating your inner loop as a developer.
And there's just a ton of opportunity there.
And we think of it mainly in terms of, you know, beyond chat and completion,
there are these specific use cases that represent forms of toil or are just, you know,
a little bit tedious or repetitive or just non-creative that we can help accelerate.
And so we've broken those out into distinct use cases that map to commands in Cody.
So there's a command to generate a unit test informed by the context of your code base.
There's a command to generate docstrings.
There's a command to explain the code, again, pulling in context through the graph
and through using code search that we think can be targeted.
Basically, these are like laser beams that allow us to focus on key pain points in the developer inner loop,
things that like disrupt you, slow you down, and maybe take you out of flow.
A ton of stuff there.
That's all near term.
In the longer term, I think the vision that we and a lot of folks are working toward is, hey, can we get to the point where the system can write the feature itself? The code writes itself, so to speak. An AI engineer, yeah. An AI engineer, exactly. The kind of interface for that, the way we describe it is can you take an issue description? It's either a bug report or the description of a new feature that you want to add. And can your system generate a pull request or a change set?
that implements the spec that you provide without human intervention or human supervision
in the actual process of writing the code.
And so in the long term, we are working towards that.
I think we're still a little bit away from getting there.
There will be kind of like a range of issues that can be supported in terms of complexity.
Right.
Like there's certain bugs and issues that, you know, on the whole are kind of a form of
toil. Like, no one wants to do them because it's kind of like busy work, even though
it might be really important busy work, you know, like keeping your dependencies up to date
and things like that. Those are probably the things that we'll tackle first, be able to completely
automate first, and then we'll slowly work our way up towards more sophisticated features.
Migrating database schema. Yeah, exactly. There's probably maybe like a two-by-two you want
to draw here between, like, how tedious is it and how high-stakes is it. And, you know, we'll slowly
try to migrate up into the upper-right quadrant.
You don't trust my AI to do that yet?
Actually, I do want to talk about the constraints because like I've been thinking a bunch
about this too.
And like one, if you take inspiration from the iterative process of real humans writing
code and I'm like, okay, like, you know, there's pseudo code in my head and I'm going to
test something and then I've got to like remember how something works.
There's now, within a small community of people working on this, an
increasingly interesting vein of thought, which is like, okay, we're going to invest more
in what people sometimes call system-two thinking, or, you know, variations of test-time
search: generate more samples, and, because it is code, do different types of validation.
Right? Yeah. There's another school that's just like make the model better, right? Like,
we don't need any validation. We just need more reasoning, right? I don't know if there's others
that you think about, but, like, are those the right dimensions of constraint? Like, be more right
in terms of what we show the end user, or just, you know, have the model improve.
So, I mean, just to restate what you just said, I think that's a good way to slice it.
Like the two examples you mentioned: one is like, okay, is the approach to just integrate validation methods into the kind of like chain of thought and execution?
And maybe we can get by with like small, dumb models as long as there's a feedback loop, and make that work.
Or what we have today, right?
Or what we have today.
And then another school of thought would be like, hey, we really need just much smarter models that don't make the same sorts of stupid mistakes as are made today.
I think that's an interesting way to slice it.
Another way to slice it that has been kind of top of mind for me is, if your goal is issue-to-pull-request, one way to do it is you could take a model of whatever size and basically decompose that task
down into sub-tasks.
So if you're trying to implement this complex feature,
which files do you need to edit,
what functions do you add to each file,
and what unit tests do you need to validate that functionality?
You can keep decomposing it until you're at the level
where today's language models can solve that,
and then you kind of chain them together, right?
So that's kind of one way.
It's kind of like to break it down
and then build it from the bottom up.
The other way to do this is,
I mean, you could just say the first way is wrong.
Like, the first way is how humans do it, but, you know, it's not necessarily the case that the best way to do it for a machine is the way that humans do it.
Another way you could do it is just say, like, hey, let's expand the context window of the models that can attend to, you know, a large chunk of the existing code base, and then just ask it to generate the diff.
And, you know, it'll probably be unreliable, but if it's reliable enough such that, you know, it works 1% of the time, then you can just roll the dice
100 times. And as long as you have like a validation mechanism, you know, as long as it outputs
the unit tests, which you can kind of quickly review, then you just roll the dice 100 times.
And chances are, at least one of them will be correct. And that's the one that you go with.
And the latter approach is kind of the approach that papers or systems like
AlphaCode take when they're trying to tackle these programming-competition-type
problems. So the limiting factor in the first approach, the bottom-up approach,
is what percentage of the time a single step in your whole process works.
Because you're essentially rolling the dice, you know, N times.
And if your success rate each time is, you know, like 90%,
then your overall success rate basically decays to zero the longer your chain of execution is.
So the more steps that are required, the exponentially less likely you are to get all the way to full
success. And right now, I think the fidelity of today's systems is
far less than 90% for each step. So I think this is the issue that everyone building agents
in that way is facing, which is, you know, you have compounding failure. Yeah. You can have
compounding failure. And then, I mean, you have a kind of similar issue on the other side of
things, which is, like, if you're trying to do the AlphaCode thing, you've gotten that to work
decently well for programming-competition-style problems. But for building a new
feature within the context of a large code base, if you try to zero-shot it, I think the number of
times you'd have to roll the dice would be basically cost-prohibitive or time-prohibitive.
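The arithmetic behind both failure modes, as a tiny worked example (the probabilities are illustrative):

```python
# Chaining: if each step succeeds with probability p, an n-step plan
# succeeds with p**n, which decays fast as chains get longer.
def chain_success(p: float, n: int) -> float:
    return p ** n

# Sampling: if one zero-shot attempt succeeds with probability q and you
# can validate outputs (e.g. against unit tests), k independent rolls
# succeed with probability 1 - (1 - q)**k.
def sample_success(q: float, k: int) -> float:
    return 1 - (1 - q) ** k

print(chain_success(0.9, 10))     # ~0.35: ten 90% steps rarely all succeed
print(sample_success(0.01, 100))  # ~0.63: 100 rolls at 1% often hit once
```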
For both approaches, I think context quality can play a key role because what we found is, for
Cody, for example, when our context fetching engine works, the quality of code generated by
Cody, it's like night and day. The ability of today's LLMs to kind of like pick up on
patterns in the existing code, understand what the existing APIs are in use, pick up on
like the testing framework that you're using. It's like really, really good. And so it raises the
kind of like reliability level up from "this is a complete dice roll, we definitely
need to keep the human in the loop" to the point where you're like, okay, maybe if we improve
this context fetching engine just a little bit more, we can get to the point where we can start
chaining these, like, two-, three-, four-step workflows together into something that works. So I guess
the short answer to your question, like, how do we get to more reliable agents? For us, the
answer relies heavily on context quality and fetching the right context for the right use cases
quickly. Yeah, I guess I have a lot of optimism. If you look at this, it's just
an engineering problem with a pipeline that has a bunch of different inputs, each of which
you can improve from here, and you're doing tradeoffs against improvement in any part of that
pipeline.
And, like, that could include how we turn that natural language issue into something that a model can
plan from or that we decompose, right; to what the context quality is to solve that; to what is
the efficiency tradeoff of, like, going and sampling new solutions from the language model, versus
what is the quality of your feedback from runtime
evaluation, and there are different types of feedback you could get.
I assume that there's like some, for any given level of language model quality, there's
like some optimal pipeline.
And I think we're like very far from that today.
And then all of the dimensions are improving.
So I still kind of think the AI engineer is going to come sooner rather than later.
Yeah, I'm optimistic.
You see very promising signs, especially when the context engine works.
And I think you raise an interesting point.
I think it still is a bit of an open question.
I think maybe the question comes down to like, you know, this system, this AI engineer,
how much of the architecture of that system is going to be captured at the model layer,
you know, embedded in the parameters of some very large neural network or something that looks like a neural network
versus how much of it is going to be in, I guess, a more traditional software system,
kind of like a traditional, you know, boxes and arrows architecture.
And yeah, my honest answer is I'm not exactly sure.
Like, it's not like we don't have any model layer stuff going on at all.
It's certainly something that we're interested in.
But I think our philosophy is we always want to do the simplest thing or what feels like
the simplest thing first.
I think, you know, when I was doing machine learning research,
it was a principle that I took away,
because doing the simple thing establishes a baseline.
Like, oftentimes you'll find that
doing the fancier thing is often sexier.
And certainly these days, it's like trendier, right?
Because you can kind of claim the mantle, like,
ah, you know, I made my own LLM, the Beyang LLM.
And, you know, I trained it on my own data.
Now I have, you know, AI or ML street cred
because I did something at the model layer.
But the lesson I took away from my research days was really the importance of establishing a baseline
because oftentimes, if you do the fancy thing first, you might have something that looks like a good
result because it's going to work to some degree.
But then someone else might come along and do a much dumber, simpler thing, cheaper and one
that can be improved more iteratively.
And it's going to work as well or better than your solution.
There's like many examples of that.
The most recent example that comes to mind: there was some paper in Nature where a research group trained a very large neural network to do, like, climate prediction, you know, a very important problem, predicting the weather. It's very tough, right? And the thought was, like, you know, using the power and magic of neural networks, we could actually train something to predict the weather. And lo and behold, you know, it generated good predictions and it was published in Nature. And then a year later, there was another paper published in
Nature where another research group trained a neural network for this exact same application.
But in this case, the neural network was one neuron.
It was literally just like a single aggregator.
And that performed as well as the gigantic neural net.
So it basically established a baseline first.
And that was kind of like what informed our initial prioritization of RAG over fine-tuning.
It's not that we don't think that there's value in fine-tuning or there's value in training at the model layer.
It's that, you know, RAG helps you establish a baseline.
And I think you're still going to want to do RAG anyways.
Like, even if you have fine-tuned models in the mix, RAG is still sort of this like last-mile data or context.
And so you'll want to do that anyways.
So why not do that first and establish a baseline that will actually inform where you want to invest at the training layer?
I absolutely agree with that characterization, and I'd say if you approach RAG first, you'll benefit from improvements at the model layer, internal or external, right?
Absolutely.
One question for you before we zoom out from some of the technical stuff: the arrival of quite capable small models,
like the 7B or 8x7B size from Mistral, I think surprised a lot of people. Do small models that show higher-level reasoning change
your point of view at all, or how you guys approach this?
We're very bullish on small models. We've actually integrated Mixtral into Cody; you can use Mixtral as one of the models
in Cody chat as of last week. And it's just amazing to see the progress on that side. I mean, there's
a lot to like about small models. They're cheaper and faster, and if you can make them approach
the quality of the larger models for your specific use case, then, you know, it's a no-brainer to use
them. I think we also like them in the context of completions. The primary model that Cody uses for
inline completions right now is StarCoder 7B. And with the benefit of context,
that actually matches the performance of, you know, larger proprietary models. And we're just
scratching the surface of what's possible there with context fetching right now. So I think
we're very bullish on pushing that boundary up even further. And again, with a smaller model,
inference goes much faster. It's also much cheaper, which means we can provide a faster,
cheaper product to our users. What's not to like there? I think there is a question with the
smaller models, specifically in the context of RAG, because I think there's been some research
that shows that the kind of like in-context learning ability of large language models is a little bit
emergent, like it emerges at a certain level of model size or maybe a certain volume of like
training data size. And if you fine-tune a medium-sized-ish model, sometimes it loses the ability
to do effective in-context learning, because I think the intuition is it's devoting more of its
parameter space to kind of like memorizing the training set, so it can do better at kind of like
rote completion rather than something that approaches kind of like general reasoning
ability. So that's something that we're kind of watchful for. And it does mean in certain use
cases, chat for instance, Cody still uses some pretty large models for chat. And we have seen
better results with models that have more of a kind of like general reasoning ability because
they're able to better take advantage of the context that's fetched in.
We can't let you out of this podcast without making predictions.
So one is just you have thought about software development and how to change it for literally
a decade now, probably longer since you had to like think about it to start the company.
What does it look like five years from now?
That is a great question.
Where my mind goes is, well, I guess to answer where software development will go in the next five years,
maybe it's kind of informative to look at how it's evolved over the past.
There's a seminal work called The Mythical Man-Month that was written in the 70s about software
development that today, oddly enough, despite all the technological changes, still rings very true.
And the core thesis of that book is that software development is this strange beast of knowledge
work that's very difficult to measure.
The common mistake that people make again and again is to treat it as some sort of like factory style work where, you know, commits or lines of code are kind of commodities.
And the goal is just to try to like ship as many of those widgets out as possible.
Whereas, you know, anyone who has spent, you know, a month inside a software development organization or working as an actual software creator knows that there's such a high variance in terms of the impact that
a line of code can make. You know, you have some features that eat up many lines of code that have
very little business impact. And there's also kind of like one-line changes that can be game
changers for the product that you're building. And so when I look forward at how software
development is going to change, I like to place it in the context of solving a lot of the challenges
that that book called out in the 70s that still exist today.
And I think the core problem of software development is one of coordination and visibility.
So to develop the volume of software that we need in today's world requires teams of software developers,
often large teams, building complex systems, features that span many layers of stack.
And a lot of the slowness and a lot of the pain points and a lot of the toil of software development comes from the task of
coordinating human labor across all these different pieces among many different people with
different areas of specialization and also different incentives at play.
And I think the real potential of large language models and AI more generally is to bring
more cohesion to that process.
And I think the gold standard is to really try to get a team of software developers to
operate as if you were of one mind,
you know, one really, really insanely intelligent, productive person with kind of a coherence
of vision and a unity of goals and clarity of focus. And so there's a couple ways in which
AI can do that. Well, specifically two. One is, you know, working from the bottom up,
making individual developers more productive such that more and more scope of software can be
produced by a single human. If a single human brain is producing that software, then of course
there will be more of a coherence of vision because it's just you that's building everything.
And you can kind of ensure there's a consistency of experience and code quality there.
The other way of doing this is giving the people responsible for the overall execution of a software
engineering team, you know, the team lead or an engineering leader or director,
visibility into how the code base is changing,
actually helping you keep up to date
with the changes that are happening
across the area of code that is your responsibility.
I don't know of a single director or VP
of engineering today who reads through
the entire Git commit log of their code base
because doing so would be just literally so tedious
and so time-consuming that you wouldn't have time
for any other parts of the job that are very critical as an engineering leader.
But with the benefit of AI, I think now we have a system that can read a lot of the code on
your behalf and summarize the key bits and sort of grant engineering leaders at long last
the sort of visibility and transparency into how the system as a whole is evolving so they can
attend to the parts that need attention and also make visible to all the other people on the team
how things are evolving so that everyone has kind of the context of the overall narrative
that you're trying to drive when you're kind of shipping day to day and making changes to
code base. I'm just going to take this to its logical conclusion, Beyang. So, like, Brooks's law
from this book was that adding manpower to a late software project makes it later, right? So I think
the future is just me and, like, you know, a Jira/Linear/Shortcut interface, a really good
spec, and, like, one sprint later, my AI engineer is done because I didn't have to communicate with my
team. That's it. Yeah. If your goal is to build software as it exists today, then yes, I think in
the future, a single human will be able to build applications that today require large numbers
of people to coordinate. On the other side of things, though, I think that the demand for
software, we're nowhere close to reaching the demand for good, high quality software.
And I think human beings have a tendency to take any system or technology that we're given
and kind of push it to the limits or stretch it as far as we can.
So I think the other thing that's going to happen is that our ambitions as a species
for building complex, sophisticated software are going to kind of grow with the capabilities
that we have.
And so I still think we will have large teams of software developers in the future.
It's just, you know, each individual will be responsible for far more feature scope than they
are today. And the system as a whole will be more sophisticated and more powerful. But people still have to
coordinate. So what do you think will matter in that, like, future, in terms of what
software engineers need to know how to do, right? And the little bit of color I'll give you here is we
ran this hackathon early in the year for a bunch of talented undergrads.
You know, they're working on startups.
They'd built, like, really good machine learning demos or done interesting research or something.
And so there are people who are like, I learned to code around AI tools, which is a wild idea to me.
Yeah, yeah, yeah.
Like, I started on Cursor, it was my first IDE or whatever.
And a point of view that was a little surprising to me, and that I think, like, emerged around March of this year, was, we just don't need to learn to code anymore.
Right.
And I'm like, how could you say that?
Like, you know, they don't even teach garbage collection anymore.
Like, grumpy old man, where are the CS fundamentals?
Like, what do you think people need to know?
Like, what would be valued?
So my take on this, and here's the advice I would give to myself or, you know, a younger
sibling or my child, you know, if they were, you know, at that age where they're trying
to determine what skills they should invest in.
I think coding is still going to be an incredibly valuable skill moving forward.
I think in the limit, the things that are going to be valuable that are going to differentiate
humans operating in collaboration with AI, if you think about layers through which software delivers
value, at the very top, you have kind of like the product level concerns, the user level
concerns, like how do I design the appropriate user experience, how do I make this piece
of software meet the business objectives that I'm trying to achieve?
And then you have at the very bottom the very low level,
okay, like what data structures, what algorithms,
what sort of specific things underneath the hood
are happening that are gonna roll up
to the high level goals that I wanna achieve.
And then you have like a lot of stuff in the middle
that is really just mapping the low level capabilities
that you're implementing to the high level goals
that you're trying to achieve.
And I think what AI will do is it will compress
the middle because in the middle is really just a lot of like abstractions and middleware
and other things that are today necessary and today, you know, require a lot of human labor
to implement. It's more boilerplatey. It's more tedious, repetitive, non-differentiating.
It's more mechanical, but it's all necessary today because you got to connect the dots from the high-level goals to low-level functionality.
But the actual, like, creative points, the real linchpins around which software design turns are really going to be the high-level goals, like what you're trying to achieve, and then the low-level capabilities.
My maybe-a-bit-contrarian hot take here is that CS fundamentals, if anything, are going to grow in importance.
You know, the stuff you learn in a coding boot camp, maybe that gets, you know, automated away.
But the fundamentals of knowing, you know, which data structures, what their properties are,
how you can compose them creatively into solutions that meet high-level goals,
that is kind of the creative essence of software development.
And I think humans will have the ability to spend more time connecting those dots in the future
because they'll just need less time spent on kind of like that middleware piece.
So I still think CS fundamentals are very important, and also domain expertise.
So if you're trying to build software in a given domain,
really understanding what moves the needle in that domain is going to be really important.
Awesome.
Beyang, I think we're out of time.
This was a great conversation.
Thank you so much for doing this.
Thank you so much for having me.
This is really fun.
Find us on Twitter, @NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.