No Priors: Artificial Intelligence | Technology | Startups - Coding in Collaboration with AI with Sourcegraph CTO Beyang Liu
Episode Date: January 18, 2024
Coding in collaboration with AI can reduce human toil in the software development process and lead to more accurate and less tedious work for coding teams. This week on No Priors, Sarah talked with Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster. Their most recent launch was an AI coding assistant called Cody. Beyang has spent his entire career thinking about how humans can work in conjunction with AI to write better code. Sarah and Beyang talk about how Sourcegraph is thinking about augmenting the coding process in a way that ensures accuracy and efficiency, starting with robust and high-quality context. They also think about what the future of software development could look like in a world where AI can generate high-quality code on its own, and where that leaves humans in the coding process. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @beyang Show Notes: (0:00) Beyang Liu’s experience (0:52) Sourcegraph premise (2:20) AI and finding flow (4:18) Developing LLMs in code (6:46) Cody explanation (7:56) Unlocking AI code generation (11:00) Search architecture in LLMs (16:02) Quality assurance in data set (18:03) Future of Cody (22:48) Constraints in AI code generation (30:28) Lessons from Beyang’s research days (33:17) Benefits of small models (35:49) Future of software development (42:14) What skills will be valued down the line
Transcript
Hi, listeners, and welcome to another episode of No Priors.
This week, we're talking to Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster.
Their most recent launch was an AI coding assistant called Cody.
We're excited to have Beyang on to talk about how AI changes software development.
Welcome.
Cool. Thanks, Sarah. It's great to be on. Thanks for having me.
Yeah. So you guys founded Sourcegraph all the way back in 2013, right?
I feel like I met you and Quinn at GopherCon either that year or the year after.
Do you remember?
Yeah, I think that's right.
We met at one of those like after conference events.
And I remember you asked me a bunch of questions about developer productivity and code search
and what we're doing back then.
Many listeners to the podcast are technical, but can you describe the core thesis of the company?
Quinn and I are both developers by background.
We felt that there was kind of like this gap between the promise of programming, being in flow
and getting stuff done and creating something new
that everyone experiences,
it's probably the reason that many of us
got into programming in the first place,
the joy of creation.
Then you compare that with the day-to-day
of most professional software engineers,
which is a lot of toil and a lot of drudgery.
When we kind of drilled into that,
why is that?
I think we both realized that we're spending a lot of our time
in the process of reading and understanding
the existing code rather than building new features,
because all that is prerequisite
for being able to build quickly and efficiently.
And that was a pain point that we saw again and again,
both with the people that we collaborated with inside the company we were working at
at the time, Palantir, as well as a lot of the enterprise customers
that Palantir was working with.
So we were kind of drop shipping into large banks and Fortune 500 companies
and building software kind of embedded with their software teams.
And if anything, the pain points they had around understanding legacy code
and figuring out the context of the code base so they could work effectively was, you know, 10x, 100x of the challenges that
we were experiencing. So it was partially, you know, scratching our own itch and partially like,
hey, like the pain we feel is reflected across all these different industries trying
to build software. Yeah, and we're going to come back to context and how important it is for
using this generation of AI. But I want to go actually back to, like, some roots you have
in thinking about AI, and your interning at the Stanford AI Research Lab way back when.
Yeah.
Like, that wasn't the starting point for Sourcegraph.
It was more like, oh, we need like super grep, right?
Like, we just need a version of search that works in real environments and is useful for getting to flow.
When in the story of Sourcegraph did you start thinking about how advancements in AI could change the product?
My first love in terms of computer science was actually AI and machine learning.
That's what I concentrated in when I was a student at Stanford.
I worked in the Stanford AI Lab with Daphne Koller, she was my advisor,
mostly doing computer vision stuff in those days.
And it was very different in those days.
We're now living through the neural net revolution.
We're well into it.
It's just like neural nets everywhere.
And in those days, it was still kind of like the dark ages of neural nets,
where it was after the first initial successes they had in like the late 80s and 90s,
doing OCR with them.
But then after that, the use cases sort of petered out.
At the time that I was doing it, the conventional wisdom, the thing that they told us in, you know,
Machine Learning 101, was like, you know, neural nets were this thing that we tried, you know,
a decade or so ago, but it didn't really pan out. So these days we're mostly focused on
graphical models and statistical learning techniques, you know, really trying to be explicit
about modeling the probability distribution of what we're trying to represent. We actually had
Daphne and one of her other former students, Lukas Biewald, now of Weights & Biases, on the
podcast as well. And both of them were also, like, lamenting the dark ages, when neural nets
were this weird niche thing and everyone was going to work on graphical models instead. But it's
very cool to see so many people who have, like, you know, an interest and technical passion in this
emerge on the other end and be like, aha, now is the time. So at what point were you like,
okay, I'm going to look at this and we're going to try to work on it at Sourcegraph?
Yeah, it's great. It really feels like a homecoming of sorts. And I think we're very fortunate that a lot
of the underlying skill sets, I think, do transfer pretty well. I mean, it's all linear algebra
and matrix operations underneath the hood, and that stuff is still applicable. And a lot of the
intuitions, like the value of sparsity and things like that, still are kind of applicable.
I'm still waiting for the statistical learning and maybe some of the convex optimization stuff
to reemerge. I wouldn't count it entirely out yet. I feel like the pendulum always swings back
the other way. It's swung away from statistical learning and convex optimization
and toward neural models now.
But I think they'll reemerge,
especially as we try to get deeper
into interpreting how and why
neural nets and attention
is as good as it is.
But to answer your question,
when do we start thinking about this
at Sourcegraph?
I want to say it was like circa 2017-2018
that we started to kind of like revisit
some of this, because
I think 2017 was when the attention paper
came out and you started to see more applications
of LLMs in the space of code.
I think Tabnine was one of the earliest to market there
with LLM-based autocomplete.
I remember chatting with someone
who had essentially implemented that
on top of GPT-2 at the time.
And it wasn't nearly as good as it is now, even then,
you know, like two or three years ago.
But we ran some early experiments applying LLMs,
specifically embeddings, to code search,
and that yielded some interesting results.
Again, the quality wasn't at the point where we were ready to productionize it yet, but it was certainly like enough to keep us going.
I think things really picked up September or October of last year.
It was a confluence of factors.
I think one, our internal efforts just kind of reached a level of maturity where we started being more serious, devoting more time to it.
Second thing is I went on paternity leave, so I was able to step away from kind of like the day-to-day stuff a little bit,
and that gave some time and room for kind of experimentation.
And then, of course, at the end of November, ChatGPT landed,
and that just changed the game for everyone.
And there was a ton of interest and excitement that really gave us a big kick
to start exploring in-depth the efforts that we already had underway.
Awesome. And so explain what Cody is today.
Cody is an AI coding assistant.
It integrates into your editor, whether you're using VS Code or JetBrains.
We have experimental support for Neovim,
and, as an Emacs user myself, Emacs is on the way.
We've also integrated it into our web application.
So if you go to sourcegraph.com and go to a repository page,
there's an Ask Cody button that allows you
to ask high-level questions about that code base.
And in terms of feature set, it supports a lot of the features
that other coding assistants support: inline completions,
high-level Q&A informed by the context of your code base,
and kind of specific commands, like generate unit
tests or fix this compiler error, that are kind of like inline actions in the editor.
And our main point of differentiation is that across that feature surface area, we augment the
large language model that we're using underneath the hood with all the context that we can
pull in through Sourcegraph and through techniques that we've refined over the past
decade, building a really awesome code understanding tool for human developers.
Okay, so you have said, and I think it is, like, a more interesting point of view now, that there's an argument that choosing and structuring large repo context is the key unlock for code generation and AI code functionality.
Can you explain how you guys approach it?
Yeah, so in many ways the context problem, so, you know, context, another word for it is retrieval-augmented generation.
The basic idea, I mean, listeners of your pod are probably familiar with this, but just for the ones that are, you know, tuning in
and unfamiliar: the idea is that large language models get a lot smarter when they're
augmented with some sort of context fetching ability, the most common of which is typically
like a search engine. So there's a number of examples out there of doing this. Bing chat is one
example. Perplexity is another example. They're building Google competitors where they integrate
the large language model with a web search functionality. And fetching search results into the
context window of the language model basically helps
anchor it to specific facts and knowledge, which helps it hallucinate less and generate more accurate
responses. We essentially do the same thing for code, using a combination of code search and
also something we call graph context to pull in relevant code snippets and pieces of documentation
into the context window in a way that improves code generation and high-level Q&A about the code.
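To make that concrete, here is a minimal sketch of the retrieve-then-generate pattern being described. The `search_code` and `llm_complete` callables are hypothetical stand-ins, not Sourcegraph's or any vendor's actual API:

```python
# Minimal sketch of retrieval-augmented generation for code.
# `search_code` and `llm_complete` are hypothetical stand-ins for a code
# search backend and an LLM completion API.

def answer_with_context(question: str, search_code, llm_complete, k: int = 5) -> str:
    # 1. Retrieve the code snippets most relevant to the question.
    snippets = search_code(question, limit=k)

    # 2. Place them in the prompt so the model is anchored to real code
    #    instead of hallucinating APIs that don't exist.
    context = "\n\n".join(f"// {s['path']}\n{s['text']}" for s in snippets)
    prompt = (
        "Answer using only the code below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generate the grounded answer.
    return llm_complete(prompt)
```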
And so on the code search end, we're essentially incorporating the technologies that we've built to help human developers over the past decade.
So if you look at the core feature set of Sourcegraph, the bread and butter really is code search, which allows you to go from, you know, I'm thinking of a function or I'm thinking of an error message, to quickly pinpointing the needle in the haystack in a giant universe of code.
And then from there, it's sort of this walking the reference graph of code.
So go-to-definition, find-references, in a way that doesn't require you to set up a development environment or, you know, tangle with any build systems.
It just all kind of works.
So the analogy there is like we want to make exploring and searching code as easy as it is to explore and search the web.
That's a huge unlock for humans being able to take advantage of the institutional knowledge embedded in that data source.
And it turns out those same actions, the code search and then the walking of the reference graph,
it turns out to be really useful for surfacing relevant pieces of context that you can then place into a language model's context window
that makes it much better at generating code that fits within the context of your code base and also answering questions accurately without making as much stuff up.
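A toy illustration of that reference-graph walking, assuming a precomputed map from symbol names to their definitions; a real system would get this from a compiler or language server rather than a plain dict:

```python
# Sketch of graph context: after finding a seed snippet, pull in the
# definitions of the symbols it references (go-to-definition), so the
# model sees the APIs the snippet depends on. `referenced_symbols` and
# `defs` are hypothetical inputs.

def expand_with_graph_context(seed: str, referenced_symbols, defs: dict, budget: int = 3):
    context = [seed]
    for sym in referenced_symbols(seed)[:budget]:
        if sym in defs:
            context.append(defs[sym])  # the definition the model will likely need
    return context
```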
Actually, I'm very interested. Do you do both, let's say, other traditional information retrieval approaches, like ranking, along with AST traversal?
Or, like, is there information missing from the graph context that's also useful,
either for your humans using search or for the models using search?
Yeah, there's a ton of data sources.
Let's start with the search side, which the search problem is really like,
hey, the user asks a question, now find me all the pieces of code or pieces of documentation
that could be relevant to answering that question.
We really view that as a generalized search problem.
It has a lot of parallels to end user search with the difference being, you know,
for human search, it's really important to get the quote-unquote right result in the top three.
Otherwise, people will ignore it.
Whereas with language models, you actually have a little bit more flexibility, because, you know,
you have a context window of, these days, at least 2,000 tokens, in some cases much longer, right?
And then in terms of how you do that fetching, the overall architecture is very similar
to how you would design a search engine.
So you have a two-layered architecture. At the bottom layer are your kind of like underlying
retrievers. The base case here would be just keyword search, or the fancy way of saying that
nowadays is sparse vector search, if you use a kind of one-hot encoding where ones
correspond to the presence of certain dictionary words. Anyways, that's just keyword search. It actually
works reasonably well. I think if you talk to a lot of RAG practitioners, you'll find that
the kind of like dirty secret is that keyword search can probably get you
more than 90% of the way there.
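As a toy version of that baseline, keyword retrieval can be as simple as scoring documents by query-term overlap; this is generic term matching, not Sourcegraph's implementation:

```python
# Bare-bones keyword ("sparse vector") retrieval: score each document by
# how many distinct query terms it contains.
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())

def keyword_search(query: str, docs: dict[str, str], k: int = 10):
    query_terms = set(tokenize(query))
    scores = Counter()
    for doc_id, text in docs.items():
        overlap = query_terms & set(tokenize(text))
        if overlap:
            scores[doc_id] = len(overlap)
    return scores.most_common(k)  # [(doc_id, score), ...] best first
```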
Let's talk about embeddings in a little bit.
But on keyword search alone, there's a lot that we do.
It's a combination of classic keyword search,
combining that with things that work well for code,
like regular expressions and string literals.
Also really important is how you index the data.
So what you're treating as, quote, unquote,
like the document in your keyword search backend.
We found that it's absolutely essential,
if you're searching over code, to parse things,
so that you can extract specific functions and methods and classes
along with the corresponding docstring
and treat those as separate entities in your system
rather than indexing at the file level
or trying to do some more naive chunking.
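For instance, here is a sketch of that symbol-level indexing, using Python's own `ast` module as a stand-in for the language-aware parsers a multi-language indexer would actually need:

```python
# Index at the symbol level rather than the file level: parse a source
# file and emit one "document" per function or class, paired with its
# docstring, ready to feed a search index.
import ast

def extract_documents(source: str, path: str):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            yield {
                "id": f"{path}:{node.name}",
                "docstring": ast.get_docstring(node) or "",
                "text": ast.get_source_segment(source, node),
            }
```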
So there's the keyword search.
We also have embeddings-based search or dense vector search
where you basically run those same documents,
those functions and symbols, through an LLM encoder,
take out the kind of internal representation,
the embedding vector,
and then do nearest-neighbor search against that.
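A minimal sketch of that dense path, assuming some `embed` function that wraps an embedding model:

```python
# Dense retrieval: embed every document once, embed the query at search
# time, and take nearest neighbors by cosine similarity.
import numpy as np

def build_index(docs: list[str], embed) -> np.ndarray:
    vecs = np.array([embed(d) for d in docs], dtype=np.float32)
    # Unit-normalize so a dot product equals cosine similarity.
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def nearest_neighbors(query: str, index: np.ndarray, embed, k: int = 10):
    q = np.asarray(embed(query), dtype=np.float32)
    q /= np.linalg.norm(q)
    sims = index @ q              # cosine similarity against every document
    return np.argsort(-sims)[:k]  # indices of the top-k documents
```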
There's a couple of other techniques we can use to surface
relevant context, too,
like matching file names and things like that.
Anyways, you have this basket of underlying kind of retrievers.
And the goal of the retrievers is just to preserve 100% recall.
So make sure you don't miss anything,
but also get the candidate result set down to a size where you can use a fancier method to bump the really relevant stuff up in the context window.
And that's where the second layer of the architecture comes into play.
And the second layer is the re-ranking layer.
Again, if you're implementing a search engine, this is how you do it, right?
Like, after your layer-one retrievers have proposed all the candidates, you have a fancier, you know, re-ranking layer that would be too slow to invoke across the entire document corpus.
But once you've kind of scoped it down to a smaller set, you can
take the re-ranker, and the purpose of the re-ranker is really to bump the right result or the
most relevant results up to the top. So it's optimizing for precision over recall.
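The two-layer shape, in sketch form; `retrievers` and `rerank_score` are stand-ins for the components described above, and candidates are document IDs:

```python
# Layer 1: cheap retrievers preserve recall. Layer 2: a slower re-ranker,
# too expensive to run over the whole corpus, optimizes precision on the
# small candidate set.

def retrieve_then_rerank(query: str, retrievers, rerank_score, k: int = 20):
    # Union the candidates from every cheap retriever (keyword, dense, ...).
    candidates = set()
    for retrieve in retrievers:
        candidates.update(retrieve(query))

    # Spend the expensive scoring only on that small set, bumping the
    # most relevant results to the top.
    ranked = sorted(candidates, key=lambda doc_id: rerank_score(query, doc_id), reverse=True)
    return ranked[:k]
```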
So that's kind of like the general architecture of the search backend that powers Cody.
Awesome. Yeah. I think one of
the things that I believe and we believe at conviction is that people are going to build
pipelines that look like search pipelines attached to a large language model in many more domains.
And you should treat that entire pipeline as important. Like, you guys are building a very sophisticated version here,
having worked on search for a while, but the parts beyond the language model itself are quite
important. For example, like the embeddings model and your chunking strategy. And they're actually
pretty data specific. Yep. Right. We were just talking about this. And I think people are going to
end up with domain-specific and even fine-tuned embeddings models, from companies like Voyage or
in-house, because I think there's a lot of headroom on performance there.
Yep, absolutely. I think the Voyage folks are doing really interesting stuff working on an embeddings
model for code. We're kind of collaborating with them at the moment. They're a really smart
set of folks. And I think you're absolutely right. There's so many components in these AI systems
that are outside of the quote-unquote main language model that are really important.
And we found that the most important thing, I mean, really what this comes down to, is
data quality and the data processing pipeline, which has been something that people have realized
for a long time, right?
Like, your model architecture can only go so far if your data is garbage.
So you really need a high quality data pipeline.
And that means not only having, you know, in our domain, high quality code that can
serve as the underlying data to use, but also a way to structure that data in a way where
you can maximize your ability to extract signal from noise.
Do you take into account the quality of code in this pipeline in some way?
Because, you know, you're working on customer codebases, and if they're anything like
the codebases I've interacted with,
you know, there's a variance in quality. But that's the real world.
So, like, what do you mean by, you know, high-quality code here?
I mean, we kind of implicitly do right now.
Built into Cody is this notion of, like, you know, which code is it referencing?
It's going to reference the code in your code base first.
And that's probably the most relevant code if you're trying to work on day-to-day tasks in a private code base.
We're probably going to release a feature soon.
This is something that our customers have requested.
Basically, the ability to point Cody at areas of the code base that are better models of what good looks like.
We've talked with a lot of enterprise customers where when we say, like,
hey, you know, Cody has the context of your code base.
It will go and do a bunch of code searches when it's generated code for you.
Their initial reaction is like, can I tell it to ignore large parts of the code?
Because there are certain parts of the code where, like, yeah, those are anti-patterns.
We're trying to, like, deprecate that or migrate away from that pattern.
And we're like, yeah, absolutely.
That's actually like a very easy thing to do at the search layer.
And the nice part of this, too, is when you're doing RAG, you can be very explicit about
the type of information, the type of data you're fetching into the context window.
You basically can give someone like a lever that they could turn on or off, or like a slider
at query time, that kind of controls what you pull in as context.
So, you know, maybe sometimes you really do want the full code base as context, when you're
doing something like completions or you're just trying to, you know, get something out
the door.
Other times maybe you want to be a little bit more thoughtful about what context you're
attending to because you have another goal in mind.
You know, not only do you want to ship the feature, but you also want to up-level the quality of your code or make it look more like some golden copy you have somewhere in your code base.
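One plausible shape for that lever is filtering retrieved results by path patterns before they reach the context window; the field names and glob patterns here are illustrative, not Cody's actual configuration:

```python
# Drop results from deprecated or anti-pattern areas of the repo before
# they can inform generation. With fnmatch, "*" also matches "/", so
# "legacy/*" excludes everything under legacy/.
from fnmatch import fnmatch

def filter_context(results, exclude_globs=("legacy/*", "*/deprecated/*")):
    return [
        r for r in results
        if not any(fnmatch(r["path"], g) for g in exclude_globs)
    ]
```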
You just mentioned completions, and then there's the other sort of user experience model that we've seen, which is chat in terms of how people interact with code generation capabilities.
Yep.
Where do we go from here, right?
Is it like, is it agents?
Is it more reliability?
Like, what do you want to build Cody into?
Yeah.
So I think there's kind of like the short term, the long term to think about.
In the short term, I think there's a ton more surface area in the developer inner loop
and kind of like human-in-the-loop use cases.
Sorry, describe what you mean by inner loop.
When you think about the software development lifecycle, this kind of iterative cycle
through which we build software, there's kind of like an inner loop and an outer loop.
The outer loop is kind of like the entire ring of like you plan for a feature, you decide what
you want to build, you go and actually implement the feature, you write the tests,
you submit it to CI, you submit to code review, and then provided you pass all that,
then it's time to deploy it in production. Once it's in production, you've got to observe
and monitor it and react to any issues that happen along the way. So that's kind of like
the outer loop. That sort of happens at the team level or maybe the organizational level.
The inner loop is the kind of cycle that a single developer iterates on potentially multiple
times per day. And this is really the engine of how you iterate to something that is like a
working patch that actually delivers the feature. So in one invocation of the outer loop, there are
many inner loops that you go through, because as a developer, unless you're like a superstar
genius who's already written this feature before, on the first attempt at implementing a new feature,
you're going to get a lot of stuff wrong, you're going to kind of like figure stuff out along the way,
you're going to acquire more context and realize,
oh, there's this other thing that exists that I should be using.
And so it's that kind of like learning process
that you want to accelerate as much as possible.
And so if you look at the landscape of code AI today,
the systems that are actually in production and in use,
they're all inner-loop tools.
So anything that is in your editor doing inline completions or chat,
that's kind of assisting you in the process of writing the code,
assisting you in kind of accelerating your inner loop as a developer.
And there's just a ton of opportunity there.
And we think of it mainly in terms of, you know, beyond chat and completion,
there are these specific use cases that represent forms of toil or are just, you know,
a little bit tedious or repetitive or just non-creative that we can help accelerate.
And so we've broken those out into distinct use cases that map to commands in Cody.
So there's a command to generate a unit test informed by the context of your code base.
There's a command to generate docstrings.
There's a command to explain the code, again, pulling in context through the graph
and through using code search that we think can be targeted.
Basically, these are like laser beams that allow us to focus on key pain points in the developer inner loop,
things that like disrupt you, slow you down, and maybe take you out of flow.
A ton of stuff there.
That's all near term.
In the longer term, I think the vision that we and a lot of folks are working toward is, hey, can we get to the point where the system can write the feature itself? The code writes itself, so to speak. An AI engineer, yeah. An AI engineer, exactly. The kind of interface for that, the way we describe it is can you take an issue description? It's either a bug report or the description of a new feature that you want to add. And can your system generate a pull request or a change set?
that implements the spec that you provide without human intervention or human supervision
in the actual process of writing the code.
And so in the long term, we are working towards that.
I think we're still a little bit away from getting there.
There will be kind of like a range of issues that can be supported in terms of complexity.
Right.
Like there's certain bugs and issues that, you know, on the whole are kind of a form of
toil. Like, no one wants to do them because it's kind of like busy work, even though
it might be really important busy work, you know, like keeping your dependencies up to date
and things like that. Those are probably the things that we'll tackle first, be able to completely
automate first, and then we'll slowly work our way up towards more sophisticated features.
Migrating database schema. Yeah, exactly. There's probably maybe like a two-by-two you want
to draw here between, like, how tedious is it and how high-stakes is it. And, you know, we'll slowly
try to migrate up into the upper-right quadrant.
You don't trust my AI to do that yet?
Actually, I do want to talk about the constraints because like I've been thinking a bunch
about this too.
And like one, if you take inspiration from the iterative process of real humans writing
code and I'm like, okay, like, you know, there's pseudo code in my head and I'm going to
test something and then I've got to like remember how something works.
There's now, within a small community of people working on this, an
increasingly interesting vein of thought, which is like, okay, we're going to invest more
in what people sometimes call system-two thinking, or, you know, variations of test-time
search: generate more samples, and, because it is code, do different types of validation.
Right? Yeah. There's another school that's just like make the model better, right? Like,
we don't need any validation. We just need more reasoning, right? I don't know if there's others
that you think about, but, like, are those the right dimensions of constraint? Like, be more right
in terms of what we show the end user, or just, you know, have the model improve.
So, I mean, just to restate what you just said, I think that's a good way to slice it.
Like the two examples you mentioned: one is like, okay, is the approach to just integrate validation methods into the kind of like chain of thought and execution?
And maybe we can get by with like small, dumb models as long as there's a feedback loop, and make that work.
Or what we have today, right?
Or what we have today.
And then another school of thought would be like, hey, we really need just much smarter models that don't make the same sorts of stupid mistakes as are made today.
I think that's an interesting way to slice it.
Another way to slice it that has been kind of top of mind for me is, if your goal is issue-to-pull-request, one way to do it is you could take a model of whatever size and basically decompose that task
down into sub-tasks.
So if you're trying to implement this complex feature,
which files do you need to edit,
what functions do you add to each file,
and what unit tests do you need to validate that functionality?
You can keep decomposing it until you're at the level
where today's language models can solve that,
and then you kind of chain them together, right?
So that's kind of one way.
It's kind of like to break it down
and then build it from the bottom up.
The other way to do this is,
I mean, you could just say the first way is wrong.
Like, the first way is how humans do it, but, you know, it's not necessarily the case that the best way to do it for a machine is the way that humans do it.
Another way you could do it is just say, like, hey, let's expand the context window of the models that can attend to, you know, a large chunk of the existing code base, and then just ask it to generate the diff.
And, you know, it'll probably be unreliable, but if it's reliable enough such that, you know, it works 1% of the time, then you can just roll the dice
100 times. And as long as you have like a validation mechanism, you know, as long as it outputs
the unit tests, which you can kind of quickly review, then you just roll the dice 100 times.
And chances are, at least one of them will be correct. And that's the one that you go with.
And the latter approach is kind of the approach that papers or systems like
AlphaCode take when they're trying to tackle these programming-competition-type
problems. So the limiting factor in the first approach, the bottom-up approach,
is what percentage of the time a single step in your whole process works.
Because you're essentially rolling the dice, you know, N times.
And if your success rate each time is, you know, like 90%,
then your overall success rate basically decays to zero the longer your chain of execution is.
So the more steps that are required, the exponentially less likely you are to get all the way to full
success. And right now, I think the fidelity of today's systems is
far less than 90% for each step. So I think this is the issue that everyone building agents
in that way is facing, which is, you know, you have compounding failure. Yeah. You can have
compounding failure. And then, I mean, you have a kind of similar issue on the other side of
things, which is, like, if you're trying to do the AlphaCode thing, you've gotten that to work
decently well for programming-competition-style problems. But for building a new
feature within the context of a large code base, if you try to zero-shot it, I think the number of
times you'd have to roll the dice would be basically cost-prohibitive or time-prohibitive.
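The arithmetic behind both failure modes, as a tiny worked example (the probabilities are illustrative):

```python
# Chaining: if each step succeeds with probability p, an n-step plan
# succeeds with p**n, which decays fast as chains get longer.
def chain_success(p: float, n: int) -> float:
    return p ** n

# Sampling: if one zero-shot attempt succeeds with probability q and you
# can validate outputs (e.g. against unit tests), k independent rolls
# succeed with probability 1 - (1 - q)**k.
def sample_success(q: float, k: int) -> float:
    return 1 - (1 - q) ** k

print(chain_success(0.9, 10))     # ~0.35: ten 90% steps rarely all succeed
print(sample_success(0.01, 100))  # ~0.63: 100 rolls at 1% often hit once
```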
For both approaches, I think context quality can play a key role because what we found is, for
Cody, for example, when our context fetching engine works, the quality of code generated by
Cody, it's like night and day. The ability of today's LLMs to kind of like pick up on
patterns in the existing code, understand what the existing APIs are in use, pick up on
like the testing framework that you're using. It's like really, really good. And so it raises the
kind of like reliability level up from "this is a complete dice roll, we definitely
need to keep the human in the loop" to the point where you're like, okay, maybe if we improve
this context fetching engine just a little bit more, we can get to the point where we can start
chaining these, like, two-, three-, four-step workflows together into something that works. So I guess
the short answer to your question, like, how do we get to more reliable agents? For us, the
answer relies heavily on context quality and fetching the right context for the right use cases
quickly. Yeah, I guess I have a lot of optimism. If you look at this, it's just
an engineering problem with a pipeline that has a bunch of different inputs, each of which
you can improve from here, and you're doing tradeoffs against improvement in any part of that
pipeline.
And, like, that could include how we turn that natural language issue into something that a model can
plan from or that we decompose, right; to what the context quality is to solve that; to what is
the efficiency tradeoff of, like, going and sampling new solutions from the language model, versus
what is the quality of your feedback from runtime
evaluation, and there are different types of feedback you could get.
I assume that there's like some, for any given level of language model quality, there's
like some optimal pipeline.
And I think we're like very far from that today.
And then all of the dimensions are improving.
So I still kind of think the AI engineer is going to come sooner rather than later.
Yeah, I'm optimistic.
You see very promising signs, especially when the context engine works.
And I think you raise an interesting point.
I think it still is a bit of an open question.
I think maybe the question comes down to like, you know, this system, this AI engineer,
how much of the architecture of that system is going to be captured at the model layer,
you know, embedded in the parameters of some very large neural network or something that looks like a neural network
versus how much of it is going to be in, I guess, a more traditional software system,
kind of like a traditional, you know, boxes and arrows architecture.
And yeah, my honest answer is I'm not exactly sure.
Like, it's not like we don't have any model layer stuff going on at all.
It's certainly something that we're interested in.
But I think our philosophy is we always want to do the simplest thing or what feels like
the simplest thing first.
I think, you know, when I was doing machine learning research,
it was a principle that I took away,
because doing the simple thing establishes a baseline.
Like, oftentimes you'll find that
doing the fancier thing is often sexier.
And certainly these days, it's like trendier, right?
Because you can kind of claim the mantle, like,
ah, you know, I made my own LLM, the Beyang LLM.
And, you know, I trained it on my own data.
Now I have, you know, AI or ML street cred
because I did something at the model layer.
But the lesson I took away from my research days was really the importance of establishing a baseline
because oftentimes, if you do the fancy thing first, you might have something that looks like a good
result because it's going to work to some degree.
But then someone else might come along and do a much dumber, simpler thing, cheaper and one
that can be improved more iteratively.
And it's going to work as well or better than your solution.
There's like many examples of that.
The most recent example that comes to mind: there was some paper in Nature where a research group trained a very large neural network to do, like, climate prediction, you know, a very important problem, predicting the weather. It's very tough, right? And the thought was, like, you know, using the power and magic of neural networks, we could actually train something to predict the weather. And lo and behold, you know, it generated good predictions and it was published in Nature. And then a year later, there was another paper published in
Nature where another research group trained a neural network for this exact same application.
But in this case, the neural network was one neuron.
It was literally just like a single aggregator.
And that performed as well as the gigantic neural net.
So it basically established a baseline first.
And that was kind of like what informed our initial prioritization of RAG over fine-tuning.
It's not that we don't think that there's value in fine-tuning or there's value in training at the model layer.
It's that, you know, RAG helps you establish a baseline.
And I think you're still going to want to do RAG anyways.
Like, even if you have fine-tuned models in the mix, RAG is still sort of this like last-mile data or context.
And so you'll want to do that anyways.
So why not do that first and establish a baseline that will actually inform where you want to invest at the training layer?
I absolutely agree with that characterization, and I'd say if you approach RAG first, you'll benefit from improvements at the model layer, internal or external, right?
Absolutely.
One question for you before we zoom out from some of the technical stuff: the arrival of quite capable small models,
like the 7B or 8x7B size from Mistral, I think surprised a lot of people. Do small models that show higher-level reasoning change
your point of view at all, or how you guys approach this?
We're very bullish on small models. We've actually integrated Mixtral into Cody; you can use Mixtral as one of the models
in Cody chat as of last week. And it's just amazing to see the progress on that side. I mean, there's
a lot to like about small models. They're cheaper and faster, and if you can make them approach
the quality of the larger models for your specific use case, then, you know, it's a no-brainer to use
them. I think we also like them in the context of completions. The primary model that Cody uses for
inline completions right now is StarCoder 7B. And with the benefit of context,
that actually matches the performance of, you know, larger proprietary models. And we're just
scratching the surface of what's possible there with context fetching right now. So I think
we're very bullish on pushing that boundary up even further. And again, with a smaller model,
inference goes much faster. It's also much cheaper, which means we can provide a faster,
cheaper product to our users. What's not to like there? I think there is a question with the
smaller models, specifically in the context of RAG, because I think there's been some research
that shows that the kind of like in-context learning ability of large language models is a little bit
emergent, like it emerges at a certain level of model size or maybe a certain volume of like
training data size. And if you fine-tune a medium-sized-ish model, sometimes it loses the ability
to do effective in-context learning, because I think the intuition is it's devoting more of its
parameter space to kind of like memorizing the training set, so it can do better at kind of like
rote completion rather than something that approaches kind of like general reasoning
ability. So that's something that we're kind of watchful for. And it does mean in certain use
cases, chat for instance, Cody still uses some pretty large models for chat. And we have seen
better results with models that have more of a kind of like general reasoning ability because
they're able to better take advantage of the context that's fetched in.
We can't let you out of this podcast without making predictions.
So one is just you have thought about software development and how to change it for literally
a decade now, probably longer since you had to like think about it to start the company.
What does it look like five years from now?
That is a great question.
Where my mind goes is, well, I guess to answer where software development will go in the next five years,
maybe it's kind of informative to look at how it's evolved over the past.
There's a seminal work called The Mythical Man-Month that was written in the 70s about software
development that today, oddly enough, despite all the technological changes, still rings very true.
And the core thesis of that book is that software development is this strange beast of knowledge
work that's very difficult to measure.
The common mistake that people make again and again is to treat it as some sort of like factory style work where, you know, commits or lines of code are kind of commodities.
And the goal is just to try to like ship as many of those widgets out as possible.
Whereas, you know, anyone who has spent, you know, a month inside a software development organization or working as an actual software creator knows that there's such a high variance in terms of the impact that
a line of code can make. You know, you have some features that eat up many lines of code that have
very little business impact. And there's also kind of like one-line changes that can be game
changers for the product that you're building. And so when I look forward at how software
development is going to change, I like to place it in the context of solving a lot of the challenges
that that book called out in the 70s that still exist today.
And I think the core problem of software development is one of coordination and visibility.
So to develop the volume of software that we need in today's world requires teams of software developers,
often large teams, building complex systems, features that span many layers of stack.
And a lot of the slowness and a lot of the pain points and a lot of the toil of software development comes from the task of
coordinating human labor across all these different pieces among many different people with
different areas of specialization and also different incentives at play.
And I think the real potential of large language models and AI more generally is to bring
more cohesion to that process.
And I think the gold standard is to really try to get a team of software developers to
operate as if you were of one mind,
you know, one really, really insanely intelligent, productive person with kind of a coherence
of vision and a unity of goals and clarity of focus. And so there's a couple ways in which
AI can do that. Well, specifically two. One is, you know, working from the bottom up,
making individual developers more productive such that more and more scope of software can be
produced by a single human. If a single human brain is producing that software, then of course
there will be more of a coherence of vision because it's just you that's building everything.
And you can kind of ensure there's a consistency of experience and code quality there.
The other way of doing this is giving the people responsible for the overall execution of a software
engineering team, you know, the team lead or an engineering leader or director,
visibility into how the code base is changing,
actually helping you keep up to date
with the changes that are happening
across the area of code that is your responsibility.
I don't know of a single director or VP
of engineering today who reads through
the entire Git commit log of their code base
because doing so would be just literally so tedious
and so time-consuming that you wouldn't have time
for any other parts of the job that are very critical as an engineering leader.
But with the benefit of AI, I think now we have a system that can read a lot of the code on
your behalf and summarize the key bits and sort of grant engineering leaders at long last
the sort of visibility and transparency into how the system as a whole is evolving so they can
attend to the parts that need attention and also make visible to all the other people on the team
how things are evolving so that everyone has kind of the context of the overall narrative
that you're trying to drive when you're kind of shipping day to day and making changes to
code base. I'm just going to take this to its logical conclusion, Beyang. So, like, Brooks's law
from this book was that adding manpower to a late software project makes it later, right? So I think
the future is just me and, like, you know, a Jira/Linear/Shortcut interface, a really good
spec, and, like, one sprint later, my AI engineer is done because I didn't have to communicate with my
team. That's it. Yeah. If your goal is to build software as it exists today, then yes, I think in
the future, a single human will be able to build applications that today require large numbers
of people to coordinate. On the other side of things, though, I think that the demand for
software, we're nowhere close to reaching the demand for good, high quality software.
And I think human beings have a tendency to take any system or technology that we're given
and kind of push it to the limits or stretch it as far as we can.
So I think the other thing that's going to happen is that our ambitions as a species
for building complex, sophisticated software are going to kind of grow with the capabilities
that we have.
And so I still think we will have large teams of software developers in the future.
It's just, you know, each individual will be responsible for far more feature scope than they
are today. And the system as a whole will be more sophisticated and more powerful. But people still have to
coordinate. So what do you think will matter in that, like, future, in terms of what
software engineers need to know how to do, right? And the little bit of color I'll give you here is we
ran this hackathon early in the year for a bunch of talented undergrads.
You know, they're working on startups.
They'd built, like, really good machine learning demos or done interesting research or something.
And so there are people who are like, I learned to code around AI tools, which is a wild idea to me.
Yeah, yeah, yeah.
Like, I started on Cursor, it was my first IDE or whatever.
And a point of view that was a little surprising to me, and that I think, like, emerged around March of this year, was, we just don't need to learn to code anymore.
Right.
And I'm like, how could you say that?
Like, you know, they don't even teach garbage collection anymore.
Like, grumpy old man, where are the CS fundamentals?
Like, what do you think people need to know?
Like, what would be valued?
So my take on this, and here's the advice I would give to myself or, you know, a younger
sibling or my child, you know, if they were, you know, at that age where they're trying
to determine what skills they should invest in.
I think coding is still going to be an incredibly valuable skill moving forward.
I think in the limit, the things that are going to be valuable that are going to differentiate
humans operating in collaboration with AI, if you think about layers through which software delivers
value, at the very top, you have kind of like the product level concerns, the user level
concerns, like how do I design the appropriate user experience, how do I make this piece
of software meet the business objectives that I'm trying to achieve?
And then you have at the very bottom the very low level,
okay, like what data structures, what algorithms,
what sort of specific things underneath the hood
are happening that are gonna roll up
to the high level goals that I wanna achieve.
And then you have like a lot of stuff in the middle
that is really just mapping the low level capabilities
that you're implementing to the high level goals
that you're trying to achieve.
And I think what AI will do is it will compress
the middle because in the middle is really just a lot of like abstractions and middleware
and other things that are today necessary and today, you know, require a lot of human labor
to implement. It's more boilerplatey. It's more tedious, repetitive, non-differentiating.
It's more mechanical, but it's all necessary today because you got to connect the dots from the high-level goals to low-level functionality.
But the actual, like, creative points, the real linchpins around which software design turns are really going to be the high-level goals, like what you're trying to achieve, and then the low-level capabilities.
My maybe-a-bit-contrarian hot take here is that CS fundamentals, if anything, are going to grow in importance.
You know, the stuff you learn in a coding boot camp, maybe that gets, you know, automated away.
But the fundamentals of knowing, you know, which data structures, what their properties are,
how you can compose them creatively into solutions that meet high-level goals,
that is kind of the creative essence of software development.
And I think humans will have the ability to spend more time connecting those dots in the future
because they'll just need less time spent on kind of like that middleware piece.
So I still think CS fundamentals are very important, and also domain expertise.
So if you're trying to build software in a given domain,
really understanding what moves the needle in that domain is going to be really important.
Awesome.
Beyang, I think we're out of time.
This was a great conversation.
Thank you so much for doing this.
Thank you so much for having me.
This is really fun.
Find us on Twitter, @NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.