Software at Scale - Software at Scale 9 - Beyang Liu: CTO, Sourcegraph

Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav Shah, and thank you for listening. Hey everyone, thanks for joining us on another edition of the Software at Scale podcast. Joining here with me today is Beyang Liu, who's the CTO and co-founder of Sourcegraph, which is an advanced code search system is the best way I can think of describing it. Prior to Sourcegraph, Beyang was a software engineer at Palantir, where he developed new data analysis software on a small customer facing team. He studied science at Stanford, and he was also an intern at Google for some time.

Starting point is 00:00:44 And my suspicion is that's where he got the idea or started looking at the idea of better code search. But I'll ask him to clarify that. So, yeah, just can you tell me a little bit more about yourself and how did you come up with the idea of Sourcegraph? Yeah, sure. So, first of all, you know, thanks for having me on the show. This seems like a really cool podcast and happy to be here chatting with you. So I guess like, indeed, you're right. Google Code Search was part of the inspiration for Sourcegraph. The kind of broader backstory is I'd always kind of been into trying out different developer

Starting point is 00:01:23 tools. I think that's a side hobby or passion that a lot of programmers get into as they learn how to program and they kind of explore ways of making themselves more productive. So, you know, when I first got into programming, I tried out many different editors, many different command line tools. And when I landed at Google for an internship, one of the cool things about that experience was Google had this amazing suite of tools that they had built to make their developers more productive. So, you know, various infrastructure tools, you know, some of which are now open source, like Bazel began its life as Blaze, Google's internal build system.

Starting point is 00:02:08 And so I got to play around with Blaze back in 2010, long before there was kind of a public version of it. But there was another tool that was called Google Code Search, which was especially useful. It was something that I used every single day, probably multiple times per day, and so did everyone I worked with. And it was this cool thing that, it is what it sounds like. It indexed all the code inside Google

Starting point is 00:02:34 and made it accessible at every engineer's fingertips. And so you'd type in a query, search for code, you click into a result, and you'd have kind of the standard editor navigation facilities like jump to def, find references and things like that. And it was super useful because what it did was it reduced the friction of you diving into some arbitrary part of the code base. And as a consequence, it really felt like much more of the code was accessible to you

Starting point is 00:03:02 as a developer, because you didn't have to go and, like, clone down some other repository, you didn't have to play around with setting it up in your editor, you didn't have to, you know, think about any of the manual overhead that we typically associate with diving into a new code base. You could just type in the thing you're searching for and, you know, explore it in your web browser. So that was pretty cool. After the internship ended, the next year rolled around and I was like, all right, so the next internship or job I have, I look forward to playing around with their code search as well. And obviously, I was a bit disappointed to find out that code search was kind of a non-standard thing. I've since learned that the percentage of companies that have really built

Starting point is 00:03:46 code search for themselves is relatively small. It's like Google, and Facebook has an internal app, and maybe like a handful of other large tech firms, which is a shame because I think it was a really useful tool and it was a tool that I felt was always missing from my toolkit after that experience. And so later on when I got to Palantir, you know, still missing a code search like tool, a lot of what I did at Palantir was dive into legacy code and kind of refactor stuff, make it clean and robust and well-tested and ultimately, you know, try to build new features against it. And I just felt like so much of my day was spent like trying to read and understand the existing code and not enough of it was spent writing and building

Starting point is 00:04:37 stuff, which is in my opinion, the fun part of the job, right? There's nothing, I think that replaces that dopamine rush of building stuff, seeing it work, and then ultimately shipping it. And so I was feeling that pain point. Meanwhile, my co-founder Quinn, also a developer, was also feeling that same pain point. And we happened to be working on this project together where we were working with large Fortune 500 clients of the company. And our interaction with our technical counterparts at these customers was fairly close. And we got to see our pain point reflected across in these development teams at these large Fortune 500 companies. And so that was kind of where we sort of made the connection between, hey, we have a pain that we both experienced day to day.

Starting point is 00:05:26 We want to be more productive. We want to spend more time, you know, writing code and building stuff. And guess what? You know, that pain is not unique to us. It's also reflected across these teams in these really large companies. And, you know, I think a big part of the reason that a lot of those companies were paying Palantir a lot of money was in order to effectively develop software more efficiently. So then we got to thinking, hey, if we can build this tool that brings really high quality code search to everyone, we can have an enormous impact. It'll be super valuable to these other developers and the companies they work for. And guess what? We also get to scratch our own itch in doing so.

Starting point is 00:06:06 So that was kind of the kernel out of which Sourcegraph grew from my vantage point. Yeah. So maybe you can walk listeners through, like, how is Sourcegraph different compared to, you know, just using like ripgrep or something locally? Yeah. Yeah.

Starting point is 00:06:22 Yeah, so first I'll take a step back and say, you know, over the years I've kind of bucketed software developers as like a set of people into like two broad buckets. The first bucket is people who have used code search before. And then the second one is people who have never used code search before. And I find that I have to explain the value of Sourcegraph in very different ways to these two subsets of the population. So if you've used code search before, if you've used something like, you know, mirth inside Dropbox or Google Code Search or tbgs inside Facebook or maybe like Debian Code Search, it kind of instantly clicks because it's very similar to a tool that you've relied on to navigate

Starting point is 00:07:05 this other code base that you've worked on in the past and you know how it fits into your day-to-day workflow. The second set of people, the ones who've never used code search before, it's a much larger set. It's a little bit hard to explain because it's like you said, you know, how there's a lot of search utilities that we already use as developers. There's the command line tools like ripgrep or grep. There's the search in your editor. And there's also kind of like the search that's built into code hosts. And so I'll start by, you know, I'll do my best to kind of paint a picture. I'll start with kind of the command line and local search tools. So I think the difference between the user experience of those tools and Sourcegraph ultimately boils down to

Starting point is 00:07:51 the universe of code that you're searching over and that is relevant to you. I guess those are two distinct universes. Like what is the universe of code that could be relevant to you that you might want to know about as you're building stuff? And what is the universe that is easily searchable by your tool? And related to that is how much friction is involved in any given search. And so with a command line tool or a local editor search, that searches over all the code

Starting point is 00:08:28 that you've checked out to your local machine. So as long as the thing that you're looking for is on your local machine, you can find it. Maybe it's a little bit more friction because if you're using ripgrep, then you have to go into your editor and open up the files represented in the search set and then continue exploring. But it's like mostly straightforward. The issue is what if you're trying to search over something that may not be checked out to your local machine, or maybe you don't, maybe you've checked it out, but you're not sure where it is, right?

Starting point is 00:09:00 You know, like, is it in some other repository? You don't want to search over all the repositories that you've checked out locally, because that'll take a long time. And so once you start thinking about all the manual effort it takes to look for that thing, most of the time your brain just goes like, ah, screw it. Like, I'm just going to make an assumption or, you know, continue on my way. I'm just going to not bother with this because it's too much effort. It will take me out of my current focus area. And as a consequence, you don't end up finding the thing that you're looking for.

Starting point is 00:09:32 Maybe it was a particular error message or maybe it was a function whose implementation you're interested in reading or understanding. Or maybe it's a bunch of examples of how this particular function is used. Sourcegraph kind of expands your horizons and at the same time minimizes the friction it takes to search over the universe of code that's relevant to you. So you'd go to Sourcegraph with that query. And then the idea is that you can just type in whatever you're looking for into a single search box, not have to think about it, and see essentially everywhere that query or that pattern exists in your code. It could be the code that you're currently working on. It could be code in another repository in your organization. It could be any of the thousands or millions of open source repositories out there.

Starting point is 00:10:26 And in kind of today's development world, the universe of code that is actually relevant to you is those things, right? It's quite large. It's not just the code in your local machine. It's also code elsewhere inside your organization and open source repositories that you might be pulling in as dependencies or you might want to consider using so you don't have to, you know, reinvent a particular wheel. So that's how Sourcegraph is kind of better than those existing tools. But, you know, long story short, it's impossible to convey what this feels like without actually trying to use it. And so, yeah, I guess for, like, listeners who are in this second set of people, what I recommend is just go to sourcegraph.com. We have a public index that indexes a good chunk of the open source repositories on GitHub. And

Starting point is 00:11:17 just try typing in a query and see how instantly that gets to what you're looking for. I think the speed is definitely a key part. You don't have to think about where it is. With, like, ripgrep, you have to open up every file and read it, basically. But Sourcegraph can maintain an index and do that much faster. Yeah. I think Sourcegraph, I can definitely attest to the fact that our company had, as you said, mirth, right? And then we moved to Stripe's open source livegrep in the middle.

Starting point is 00:11:49 And I think that's the currently used one. And I think we're transitioning to Sourcegraph as we speak. Yeah. We had Google Analytics on livegrep. And we also have Google Analytics on the internal code review system and all of that. And I've seen analytics on every single developer tool and by far code search is the most widely used in terms of just page views, number of times, you know, like the number of users, number of people that click on links. Cause

Starting point is 00:12:17 there's many times when even people who are not writing code, like they don't ever really see the code review system, but they might see code search, like a product manager, somebody in QA, and sometimes even people in support. So that's by far the number of people, like in terms of number of users, number of views, like our code search tool was pretty much like above everything else. And as soon as it would have issues and would break down, because like our repository is getting bigger and bigger, people would complain much more than they would complain if the CI system or something was down. Yeah. That's actually quite representative of what we hear across our customers.

Starting point is 00:12:58 Like when Sourcegraph goes down for whatever reason, a lot of people complain. It's not quite a work stopper because you can still technically push code, but it just feels like you're a lot more sluggish. Everyone on the team is a lot more sluggish because they don't have this nice facility to explore the code. And the other thing that you mentioned, which is non-programmer users jumping into the code, that's also something that we see across many, many users. You know, we're all about making code more accessible to everyone. Like our eventual mission as a company is to make it so that more and more people can access code, including people who aren't developers now or may not have the title, you know, software engineer in the future, you know, might be someone

Starting point is 00:13:43 in sales or marketing or product whose day-to-day job is not to write code, but they're now code adjacent. Most jobs are becoming code adjacent. And so it's becoming increasingly relevant to be able to jump into a piece of logic that affects you and understand how it works. And being able to do that just by navigating to a URL in your web browser, that's amazing. That's nice and easy. Cloning that to your local machine and opening up in your editor and futzing with config and getting it to build so that you have like jump to def and find references.

Starting point is 00:14:17 That's not so much. It's like challenging for even programmers to do that, much less non-technical users. Yeah. I think Sourcegraph also has this one, like, really large or important feature, which is that it's kind of like an IDE in your browser, right? You can't edit code, but it gives you, like, find references and, like, a bunch of stuff. Maybe you can talk about that. I think that's the reason why I'm so excited about Sourcegraph versus just using something like livegrep. Yeah. Yeah. So, you know, the product is universal code search. And that's how we describe it. But it's, you know, not just code search, because a big part of how you use this product is you

Starting point is 00:14:56 search for something, you see a bunch of results, you click on a result, and then it plops you in the middle of some file of code, right? And then the question is like, okay, what do you want to do from there? I think a lot of what we do as developers when we're trying to understand what a piece of code does is we kind of like walk the references forward, and then a find references query would be, like, show me all examples of how this is used. That's like a backwards walking. And that's really important because, you know, when you're trying to understand a piece of code, it calls into some subroutine and you want to, you know, dive into that subroutine and see what's going on there, or maybe you

Starting point is 00:15:44 know, you're trying to figure out how to use a piece of code. You found the definition, but you want to see usage examples. And then find references just lets you see real world examples of how that particular thing is used elsewhere in the code base. And that's really important. Obviously, we know it's important because it's built into basically every single IDE. Coding without those types of features is really, really painful. But it's also extremely useful in this kind of like read-only setting that Sourcegraph represents because essentially what you're doing is you're trying to build this mental model of code.

Starting point is 00:16:22 And if it required you to kind of go to a different tool in order to do that, if you couldn't just like search for something, jump into a file and immediately start doing that, that would be more mental overhead and it would kind of defeat the purpose. Because we're all about minimizing the friction it takes to like understand the code. Like you should focus on the creative part of the job, which is understanding why a piece of logic is written the way it is and what the semantics of that are. Not so much the low level mechanics of how do I even navigate this code in easy fashion. And yeah, that's what I love so much about it. Just the fact that all of this

Starting point is 00:17:00 stuff, I don't have to set up an IDE locally and wait for like IntelliJ to just index for like 30 minutes on, like, a large repository. You get all of that just for free. And like you haven't pulled in a long time. And as soon as you do git pull, like your index is out of date, all of that stuff, like I don't have to think about that anymore. But I'm sure it must be really hard to like build a system that can do this well and scalably for the number of repositories you're handling. So can you talk a little bit about the engineering challenges or maybe just the implementation? How does this all work under the hood and how do you actually make it work efficiently? Yeah, totally.

Starting point is 00:17:41 Performance is a big area for us and there's many different places I could dive into. So I guess there's multiple dimensions to making this work. And I'll start with the different types of queries we support. So to start with, the string literal queries, where you have an exact string that you're trying to find somewhere in your code base. That's one type of query we support. Another kind is a regular expression pattern match that describes a pattern that you'd like to find. And then we also support this other pattern matching syntax called Comby.

Starting point is 00:18:18 It's actually a newer pattern matching syntax developed by one of our teammates as part of his PhD thesis. I'm going to butcher the description, but the TLDR description is that it's like regex, but more powerful and more ergonomic for use with code. So things like matching balanced parens or other delimiters with Comby syntax is much easier than doing that with regex. So those are the types of queries that we support. And the question is, you know, how do we make all those things fast, especially at scale? So, you know, we have customers who have on the order of hundreds of thousands of repositories or,

Starting point is 00:19:04 you know, other customers have monorepos that are as large as hundreds of thousands of normal sized repositories. And supporting both string literal and pattern matching search at that scale is non-trivial. And the way we've kind of broken that up, it's kind of a combination of many different things. So maybe I'll just start with kind of the bottom of the stack and move up. So there is an index that we build that is perhaps like the foundation for why a lot of queries are fast.

Starting point is 00:19:44 So it's something called a trigram index. The way to think about it is you're building this index of trigrams. Trigram here means a sequence of three characters. So think about like any sequence of three characters that could occur in code. That becomes a trigram that then we index

Starting point is 00:20:06 and this is actually the same sort of index that Google Code Search uses internally to support regular expression queries. I believe livegrep does the same. We use this open source repository called Zoekt, Z-O-E-K-T, as kind of the thing that handles the trigram indexing for us. And the idea is when you have a query, whether it's a string literal or a regular expression, you break up that query into literal trigrams. So if it's a string literal, you just kind of chop it up. Every sequence of three characters becomes a thing you look up in the index. For regular expressions, you kind of discard all the special syntax, you know, the dots and the stars, and just focus on the literal pieces of the query. And then you search for those against that index. And that gives you back a set of things.
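As a rough illustration of the trigram lookup described here, the sketch below builds a tiny in-memory index and answers a string literal query by intersecting the file sets of the query's trigrams, then confirming each candidate with a plain substring check. It is not Zoekt's implementation; the function names (trigrams, build_index, candidates, search_literal) and the dict-of-files input are assumptions made for the example, and regular expressions would need an extra step to pull out their literal pieces, which is not shown.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names that contain it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def candidates(index, literal, all_files):
    """Files containing every trigram of the literal (may include false positives)."""
    grams = trigrams(literal)
    if not grams:
        # Fewer than 3 characters: the index cannot narrow anything down.
        return set(all_files)
    postings = [index.get(gram, set()) for gram in grams]
    return set.intersection(*postings)


def search_literal(files, index, literal):
    """Narrow with the index, then confirm each candidate with a substring check."""
    return sorted(
        name for name in candidates(index, literal, files) if literal in files[name]
    )


# Hypothetical three-file corpus, used only for illustration.
files = {
    "a.go": "func ParseConfig(path string) error {",
    "b.go": 'func parseFlags() { log.Print("config") }',
    "c.py": "def fig_con(): return 'Conf' + 'ig('",
}
index = build_index(files)
print(search_literal(files, index, "ParseConfig"))
```

The final substring check matters because the intersection only proves that a file contains all of the trigrams somewhere, not that they appear next to each other in the right order.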

Starting point is 00:21:01 For each trigram, you intersect those and that helps you narrow down quickly the universe of possible matches in the code base. So that's a really important part of our stack. We've been using it for a long time. We shard it so that we can shard the index by repository essentially or by a chunk of the code base to make it so that we can scale things horizontally. So that's the trigram index. Another part of what we do is supporting really fast kind of in memory search. So there is a question of, you know, what goes into the index, we can put the master or the main branch of every repository into the index. But at some point, you know, we can't put every single branch or every single revision into the index because then the size of that index will explode. So the question is, you know, what if

Starting point is 00:21:54 you're browsing some non-master or non-main revision of code and you want to do a search of that repository? So we have another search backend that is just optimized for reading Git data into memory and then just doing a linear search quickly over all that. And I'm less familiar with internals of the in-memory searcher, but I know that we've borrowed a lot of ideas from ripgrep, the way it does kind of fast linear scans. I think particularly the way its regular expression engine

Starting point is 00:22:27 operates, because I know that regular expression performance can often become a bottleneck in those scenarios. So thank you, Andrew Gallant, for writing that and making it open source so that we could take inspiration from that. Let's see, what else? There's also, so one thing that we shipped recently is also streaming search. And what that essentially means is, up and down the stack, our application is now designed to be able to stream search results back as they are kind of found in the index or in memory to the UI. So in cases where it's an extremely large code base and you

Starting point is 00:23:06 search for something, maybe it's like super common, there's a ton of results. We'll stream those results back to you rather than waiting for the entire result set to be collected. And that improves performance a lot, time to first result. And yeah, I think that I've covered kind of, like, the high level. I mean, there's obviously, like, a long tail of optimizations. But I think those are the big points of how we've made this fast. I think that's really interesting. I have so many more questions that I can just jump off. But the main part, it seems like, is that for the common use case of, like, the master or main branch

Starting point is 00:23:45 you can store, like, an in-memory index, and for the other, non-regular branch names that are not common ones, you're reading some amount of Git data into memory, and search happens on that in-memory information on the fly, basically. Exactly. Super interesting. And I'm guessing you, like, measure performance of all of these and you have some estimates. How much slower is just reading the non-master, non-main branch case versus reading stuff from the index?

Starting point is 00:24:20 It's, well, I think it depends on the size of what you're searching over. If it's a single kind of medium-sized repository, I think the in-memory searcher can be, well, the honest answer is I don't know the exact numbers. But the in-memory search, if it's just a single normal size repository, will be less than 100 milliseconds typically because it's super fast. Where it gets trickier is when you have to load gigabytes of code data into memory. And then it starts to get slow. And that's when the index helps because the index kind of gives you a much smaller set of things that you need to load into memory. And then you load those things into memory and then run the regular expression pattern matching over that.
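The two-stage flow described here, where the index yields a small candidate set and the regular expression only runs over those files, might look roughly like the sketch below. Results are yielded one at a time, in the spirit of the streaming search mentioned earlier. This is an assumption-laden toy rather than Sourcegraph's code: stream_regex_search and its required_literals parameter are hypothetical, and the literal fragments are supplied by hand, whereas a real engine would derive them from the pattern.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names that contain it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def stream_regex_search(files, index, pattern, required_literals):
    """Yield (file, line_number, line) matches one at a time.

    required_literals lists substrings that every match must contain.
    They are passed in by hand here (an assumption for this sketch).
    """
    # Stage 1: use the index to discard files that cannot possibly match.
    candidate_names = set(files)
    for literal in required_literals:
        for gram in trigrams(literal):
            candidate_names &= index.get(gram, set())
    # Stage 2: run the real regular expression only over the survivors,
    # handing back each match as soon as it is found.
    regex = re.compile(pattern)
    for name in sorted(candidate_names):
        for number, line in enumerate(files[name].splitlines(), start=1):
            if regex.search(line):
                yield name, number, line


# Hypothetical corpus, used only for illustration.
files = {
    "a.py": "import os\ndef load_config(path):\n    return open(path)\n",
    "b.py": "def save(path):\n    pass\n",
    "c.py": "x = 1\ndef load_data(p):\n    pass\n",
}
index = build_index(files)
results = stream_regex_search(files, index, r"def load_\w+\(", ["def load_"])
print(next(results))  # the first match is available before the scan finishes
```

Because the function is a generator, a caller can show the first match without waiting for the remaining files to be scanned, which is the time-to-first-result benefit described for streaming.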

Starting point is 00:25:10 And so like on code bases that are, you know, on the order of like, say like 10,000 repositories or more, if you're just using the in-memory searcher and you search for something that requires searching over the entire result set before you get the result, in kind of the earlier days of the product, we would see things take on the order of like 10 seconds, 20 seconds, some cases 60 seconds. That was our timeout because we were like, you know, no one's going to wait longer than 60 seconds for a search result to complete. And that motivated adding the index and being able to search larger corpuses of data quickly. Yeah. Doesn't the Git checkout itself take time or have you optimized that somehow? Because

Starting point is 00:26:03 I've seen on our larger repositories, just the Git checkout of a three-week-old commit can take five seconds. Yeah. Yeah. So we have a bunch of optimizations around loading Git data. We do cache all the Git data within our application. So typically we index from GitHub or GitLab or Bitbucket or, like, any Git code host. Actually, really, we're supporting more than just Git now, so just, like, any code repository, but we kind of clone a local copy of that and then use that to serve results in memory. Interesting.

Starting point is 00:26:41 And I guess the biggest, like, question that I have is around updating the index, right? Like, for high throughput repositories, they're getting a lot of commits. And I'm guessing you're not rebuilding the index from scratch for every commit. So, how does that work? Yeah. Yeah. So, we have this thing called repo updater, which is this background loop that periodically scans all the repositories in our index and decides which ones need to be updated. I'm actually less familiar with what we've done to kind of take a look at preexisting indexes. I hope I'm not misspeaking, but I think we've actually done very little around, like, reusing

Starting point is 00:27:26 existing search indexes to make, like, well, I think what you're talking about is, like, incremental indexing. Like, you have an existing index and there's some diff. The diff is relatively small relative to the size of the overall code base. I believe we just reindex everything right now, which, you know, is suboptimal. I agree with you. But I think it makes sense because most code doesn't change most of the time. That's why the diff is small, right? So you can just search through the old index and it'll work for 95, 98% of searches. Yeah, absolutely. Yeah. But yeah, it seems like it's a problem with such a simple user interface. Somebody just typed

Starting point is 00:28:04 something and they're getting results. And there's so much happening behind the scenes. Around technical implementation, the last question is around ranking. Do you all do any kind of ranking at all? And how does that work with the streaming results? Because I'm sure that gets pretty complicated. Yeah. So currently, there's not any sort of ranking.

Starting point is 00:28:24 It's just, you know, the results get streamed in as they arrive. And the reason for that is because a lot of the queries are either of the form, I want to find something very specific, and so there's going to be like a handful of results and they all fit on one page, or I'm trying to do a comprehensive search, in which case you're probably going to go through all the results anyways. I think in the future, there's a couple of, I should say that like this works for some workflows, but it doesn't work for others, right? Like if you're more generally searching for like a function name, maybe there's multiple matching functions and you're not familiar with each of them, it would be helpful to rank those functions, say, by the number of times they're used in the code base. Or maybe ranked by how recently you visited them in Sourcegraph or in your editor.
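To make the first of those ranking ideas concrete, here is a minimal sketch that orders candidate function names by how often they appear as calls in a set of files. It illustrates the criterion only and does not describe anything Sourcegraph ships: rank_by_usage is a hypothetical name, and the count is a crude textual match of "name(" that also counts the definition line itself.

```python
import re
from collections import Counter


def rank_by_usage(candidates, files):
    """Order candidate function names by how often they appear as `name(`.

    This is a plain text count, so definitions such as `def name(` are
    counted too; a real system would use proper reference data instead.
    """
    counts = Counter()
    for name in candidates:
        call = re.compile(r"\b" + re.escape(name) + r"\s*\(")
        for content in files.values():
            counts[name] += len(call.findall(content))
    # Most-used first; ties are broken alphabetically for a stable order.
    return sorted(candidates, key=lambda n: (-counts[n], n))


# Hypothetical corpus, used only for illustration.
files = {
    "a.py": "def parse(x): ...\ndef parse_args(y): ...\nparse(1)\nparse(2)\n",
    "b.py": "parse_args(3)\nparse(4)\n",
}
print(rank_by_usage(["parse_config", "parse_args", "parse"], files))
```

A recency-based ranking, the other idea mentioned, could reuse the same sort with a per-user "last visited" timestamp as the key instead of the usage count.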

Starting point is 00:29:15 And those are all ranking criteria that we're kind of looking at and will eventually incorporate into the product. But that's, as of today, still part of our product roadmap. But it sounds like a really fun problem to like go over this, like what kind of user testing do you all do when you want to like come up with a new feature or like, yeah, ship something like, oh, we have this idea that ranking might be useful. How do you go about what is the product feedback loop with customers? Yeah. So there's, I think, a feedback in kind of multiple layers. The first layer is, you know, you yourself, the person building it. In this case, we have the advantage of, hey, like every engineer on the team is also a user of the product. And we should rely on our intuitions as developers to guide us in terms of like what sort of user experience we end up

Starting point is 00:30:14 building. So like at the first level, it's like, you know, what feels right to you as the person building it. The next level is kind of dogfooding the product as a team. So, you know, we have sourcegraph.com that indexes a good chunk of the open source repositories out there, including our own. Our own code base is mostly open. It's kind of open core licensed where the core is open source. But there's kind of a couple of packages that are enterprise specific. But it's all publicly available and therefore indexed by sourcegraph.com. And so we use that heavily.

Starting point is 00:30:50 If you push a feature, you'll very quickly get feedback, both positive and negative from the rest of the team because people will see it like, oh, huh, you know, like this thing feels new or like, you know, this feels awkward. I liked it the way it was before. And that's always a really good source of feedback to follow. And then the kind of like last layer of feedback

Starting point is 00:31:11 is chatting with your customers, people who are external to Sourcegraph. And this layer of feedback is, it's the last layer for a reason. It's because the iteration cycles tend to be a little bit longer. Obviously you're taking a dependency on someone outside your organization. But it's really important to get to because not everyone codes in the same way.

Starting point is 00:31:33 Our repository at Sourcegraph may not be reflective of other repositories in terms of size, language, like file hierarchy, all these sorts of qualities that vary from code base to code base. So we take a pretty active stance in reaching out to our users. I think we're lucky in that we have pretty close relationships with people like software developers at most of our customer sites. And so when we want specific feedback for a given feature, and we know that this is a feature that, you know, a given customer will like,

Starting point is 00:32:12 we often reach out to our technical point of contact there and say like, Hey, you know, just wanted to let you know, there's this new thing. We'd love to hear your feedback or in some cases, you know, do you have some time to hop on a 30 minute call later this week? We'd love to walk you through this and get your reaction. Or in some cases it's, you know, they'll come to us. Cause we actually add a lot of people in our Slack from outside the company and they'll be like, Hey, you know, in the latest version of Sourcegraph, this behavior changed and it feels kind of weird. Can someone help us out? And that's kind of nice because like any member of the team

Starting point is 00:32:47 can hop in and respond to that. I guess like another dimension to that last layer of feedback is what we're trying to do more this year is reach out to users in open source and maintainers of open source repositories and understand how they're using Sourcegraph. I think that's an important set of people

Starting point is 00:33:08 that we really want to make happy. I think we're going to grow the size of our open source search index substantially this year, hopefully to the point where it indexes basically every open source repository out there. And we also have a couple of features in mind that we think will be particularly useful to the set of open source communities out there in terms of enabling

Starting point is 00:33:35 people who want to onboard to that project, enabling maintainers to make more productive use of their own time, and enabling all the different contributors and stakeholders in open source communities to very easily dive into the code and learn how stuff works. So to that end, I think we'll be explicitly reaching out to members of different open source communities

Starting point is 00:33:58 much more over the next year or so just to get their explicit feedback on how those features are working for them. Yeah. Hearing that you're trying to index every single open source repository sounds like super audacious just from somebody like with an outside perspective. But it's amazing that you can aim for something like that. And I would be really scared to see your AWS cloud bill or whatever. Yeah. It's substantial. But I think it's worth it

Starting point is 00:34:26 because I think what we really want to strive towards is like, imagine there's just a single search box where you could go in and like, it's just like Google, right? Like you never worry about if something's not in Google's index because like everything's in Google's index. Literally the entire internet

Starting point is 00:34:42 feels like it's at your fingertips. And we want to do the same thing for the world of open source code. I think that'll be awesome. I've always wondered why GitHub search just doesn't seem to do what I want it to do. I don't know if you've ever experienced this, but it never has the right results. Because I guess there's so many repositories, like forks and all of that, that come in the way, and the same result shows up a bunch of times. Yeah. But I imagine that ranking might become more important once you can actually index every single repository. Yeah, for sure. And, you know, I want to give credit where it's due. I think GitHub search has improved over the years. I think they're actively investing in it now,

Starting point is 00:35:22 but, you know, like you said, I think there are gaps, and I think it's kind of a question of focus at the end of the day. I do sincerely hope that GitHub search improves. I hope search on all the code hosts improves because I'm a user of GitHub. I'm also a user of the other code hosts as well, GitLab, and I've used Bitbucket before as well. And they all need to do what they need to do to help their user communities and prioritize accordingly relative to all the other features that they want to work on. And for code hosts, they're very featureful applications. They have a lot of other features that need attention. Whereas our focus is really on code search and code understanding. So I think that historically it's boiled down to code search and then the browsing interface

Starting point is 00:36:14 that you use to navigate the reference graph of code. That really is our laser focus. Our worldview is that that is almost like the central focal point of software development, especially in like the modern world. Because everything you do kind of revolves around understanding the source code. And if you look at the way that people work, inside organizations that use Sourcegraph or organizations that use some other code search engine, a lot of the day-to-day, the stuff that you do many times per day is hopping back and forth between your editor and code search. And so we think that there's enough there to warrant a dedicated application to focus

Starting point is 00:36:59 on the code search aspect of that. Because it's so important, so central to what you do day to day as a software engineer. I think that makes total sense. And I was surprised when I found out when I started working at Dropbox, like, why don't we have something like Sourcegraph? Or at that time, I didn't know what Sourcegraph was. Like, why is this not how Google used to do it? They could manage for such a large monorepo. Why can't we do it? I think everyone should have something like Sourcegraph. Just the whole experience of being able to share links to code, that is just so essential. And it seems so basic, but you miss it so much when you don't have

Starting point is 00:37:35 something like that. Yeah. The link sharing is almost like an unintended feature because it sounds so trivial. Any sort of web app you can share links to, but once you're able to do that with code, it's like it opens up a new world of possibilities because you can link someone to a place in code and start a conversation with them using that link. Then they can click that link and kind of poke around themselves. They don't have to go

Starting point is 00:37:58 through kind of the obstacle course of pulling that into their editor to explore and get a sense. And I think like I've benefited from it both as a link sender and a link receiver. Like when I'm sending a link, it's most of the time, it's like, I need to ask someone a question about the code. I want it like a quick answer. And so this helps me like instantly point them to what I have a question about. And then as a receiver, it's like oftentimes they're

Starting point is 00:38:26 asking me a question about code that I wrote a long time ago or maybe haven't touched in a while. So before I can answer, I have to go and kind of relearn the code a bit myself to gain enough familiarity in order to be sure that my answer is going to be correct. Yeah. How many engineers does Sourcegraph have now? We are up to mid 40s.

Starting point is 00:38:50 Wow. Okay. And Sourcegraph started like seven years, eight years ago now, right? 2013. Yeah. Mid 2013. Yeah.

Starting point is 00:38:57 How has your role evolved? I'm sure in the beginning you were writing all the code or most of the code with your co-founder. And yeah, how has your role evolved like over time? Yeah. So the role has definitely evolved. There's many different hats. I guess I'll speak in terms of like hats, I guess.

Starting point is 00:39:21 Like I've definitely worn the like software engineer hat. I've worn the kind of engineering manager hat. I've worn the customer support hat, like hopping on calls and helping people work through bugs and kinks in the product. I've written a lot of blog posts, like an editor. Or recently, podcast host. We did kind of a podcast run last year interviewing like other developer tool creators. And so that's kind of like a sampling of different roles, I guess, I've filled. And it's been interesting. It definitely like takes you out of your comfort zone.

Starting point is 00:40:03 At the moment, I'm actually writing a decent amount of code. I'm effectively operating as an individual contributor engineer right now for most of my time, focusing on one or two new features that we're trying to get off the ground in 2021. Interesting. Yeah. I think that makes a lot of sense. Just customer support

Starting point is 00:40:27 and that I certainly see you need to be able to help figure out what's wrong with the product and fix that. And also attention is the scarcest commodity of the 21st century. So making people find out about it through blog posts,

Starting point is 00:40:43 that makes a lot of sense. So I guess I had one question around, how do you maintain quality, in a sense? Since you've been working on this product for so long, it might be basically kind of like your baby, right? You don't want the product to be bad, but then there's so many other engineers working on it. How do you ensure that you're shipping high-quality stuff while you're shipping new features, not adding many regressions? What is that process like? Yeah. So I think there's multiple layers to quality. There's kind of quality at the end user experience level. There's quality in terms of how well tested the code is.

Starting point is 00:41:31 And there's quality in terms of code style. And there might be more layers than that. Those are just like the first three that come to mind. And I think there's like different things we try to do to maintain a quality bar at all those levels. So I guess I'll start with at the product level. I think it's tricky, right? Because there is this natural tension between maintaining robustness in the application, but also enabling people to ship things quickly.

Starting point is 00:42:09 Kind of three layers to quality: product level, testing, and code quality. And they're all sort of interrelated. I think one of the things that... So it's interesting. I think a lot of companies have a dedicated QA team as kind of like a final backstop to handle quality. We don't have a separate QA role. Inside Sourcegraph, quality really falls on the shoulders of like every single

Starting point is 00:42:37 engineer. And we try to encourage kind of like product level as well as code level ownership among engineering teams, which means like there's no kind of like, oh, I get to throw this over the fence. It's someone else's problem to resolve the bugs associated with it now. A couple things we do that allow this all to work. I think one is we dogfood the product heavily, which means if there's a bug or an issue, chances are that it'll get surfaced by us first before it reaches customers. Another thing is we have an automated end-to-end test suite. So this is a test suite that runs headed tests, so UI-driven tests that were based off of common workflows

Starting point is 00:43:25 that we used to test manually. So we kind of came up with them manually first and then tested them every iteration and then gradually automated them so that we could run them automatically. And I forgot what was the other thing we do. I think the general idea is, oh, sorry, yeah. Another thing we do is we make engineers kind of,

Starting point is 00:43:53 there is no hard boundary between the engineering team and customers at Sourcegraph. So a lot of times when we're troubleshooting a bug, a customer engineer or customer support member of the team who are kind of like the front lines of interacting with the customers will tag a member of the software engineering team into a conversation.

Starting point is 00:44:13 And then that engineer will go and kind of help the customer or the user live debug the issue. And that is nice because I think it accelerates the feedback loop of resolving something quickly, but also helps reinforce this kind of end-to-end responsibility that we feel every engineer on the team has, which is ensuring the end-to-end quality of their work for the end user and end customer. And yeah, I mean, in addition to that, we have a robust unit test suite.

Starting point is 00:44:45 We have a lot of members of the team. I think we've kind of selected for engineers who are quality conscious and mindful of things like technical debt. And so they serve as kind of a natural voice on the team advocating for that, both at kind of like a planning level, like, hey, we should take some time to clean this part of the code base this iteration, as well as, you know, in code review saying like, you know, this change does not meet our quality bar. It needs to be reworked in this manner. I myself have run up against this kind of barrier. You know, when I'm trying to push something, oftentimes there'll

Starting point is 00:45:19 be a member of the team that says like, nope, you know, like this is not, you gotta fix that. And, you know, in the short term, I'm, you know, of course miffed, but in the long run I'm very, very thankful to have voices like that on the team, because I think it's helped us execute sustainably over time. Yeah, certainly, people who've gone through and had to dig up legacy code bases, I think they're the ones who are definitely thankful about being able to maintain good code. Are there any metrics that you track in terms of crash rates or just latency or something to make sure your customers are having a good experience? Yeah, we do track application level metrics. We've instrumented our application with

Starting point is 00:46:11 top line metrics related to search latency and page load times. Everything gets dumped into Prometheus and there's Grafana that gets packaged up with the application that lets you kind of see at a glance how these metrics are trending over time. Yeah, so we do track those. Okay. Interesting. And I'm guessing, like, do you all have any kind of visibility into when you ship something, like, on-prem to customers? Like, how do you know that something's wrong? How do you go debug that?

Starting point is 00:46:44 Yeah, so for most customers, it's when the customer lets us know. That's one of the drawbacks of going with a self-hosted model. But it's also kind of by design, right? Like by design, we don't have direct access to the instances running inside customer environments, because that would imply we have access to the customer's code, which the customer doesn't want. And so a lot of times in those scenarios, we do rely on people on the customer side to report issues to us. Because of that, we do take care to kind of test things during the pre-release process to ensure that, as much as we can, we catch any sort of major performance regressions before they get to the customer. But in the off chance that one does make it into deployments and hit

Starting point is 00:47:35 a customer, we are very responsive, and we've built facilities into the application that kind of streamline the bug reporting and resolution process. So there is a panel in the site admin section of the application that allows you to dump the config of your instance so that you can easily forward that to someone at Sourcegraph so that they can go and look at that config and replicate the issue on a local instance.
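As a rough illustration of the config-dump idea described above, here is a minimal Python sketch, not Sourcegraph's actual implementation. The function names (`redact`, `dump_config`), the list of sensitive key fragments, and the example field names are all hypothetical. The sketch assumes that a dump meant for forwarding to a vendor should mask anything that looks like a credential before it leaves the customer's environment.

```python
import json

# Hypothetical key fragments treated as sensitive; not Sourcegraph's real list.
SENSITIVE_MARKERS = ("token", "secret", "password", "key")
REDACTED = "REDACTED"


def redact(value, key=""):
    """Return a copy of value with anything under a sensitive-looking key masked."""
    if key and any(marker in key.lower() for marker in SENSITIVE_MARKERS):
        return REDACTED
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def dump_config(config):
    """Serialize a redacted copy of the config as stable, readable JSON."""
    return json.dumps(redact(config), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example instance config with made-up field names and values.
    example = {
        "externalURL": "https://sourcegraph.example.com",
        "auth": {"provider": "builtin", "clientSecret": "abc123"},
        "codeHosts": [{"url": "https://git.example.com", "token": "t0ps3cret"}],
    }
    print(dump_config(example))
```

Sorting keys and redacting on a copy keeps the dump easy to diff between instances while leaving the live config untouched; a real system would likely need a more careful notion of what counts as sensitive.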

Starting point is 00:48:05 Yeah, it seems like there's so many challenges, like scale, but there's also like debuggability, performance, and all of these things. So taking maybe a step back, Sourcegraph basically is a developer tool startup. I think that's a good way to describe it. How would you say a developer tool startup is different from a B2B or B2C startup, like organizationally, just in terms of sales? Anything that you've found that's different, like, no, there's actually more account executives or something like that. Just any interesting observations you've had or you've seen? Yeah, so it's hard to speak generally because I've seen a lot of other developer tool companies

Starting point is 00:48:54 and B2B companies and B2C companies from the outside, but Sourcegraph is really the only company I've seen really close up. So I don't know if I can answer the general question, but I can maybe try to think in terms of, you know, what my impression is of, you know, our company versus other companies that don't necessarily sell directly to developers. I think there's a lot that is different when you're set up to, you know, build a product specifically aimed at developers and you're also marketing to developers and a lot of times you're selling to developers. I guess, you know, maybe one thing that is really nice is,

Starting point is 00:49:50 I think communications between the engineering team and your users and customers are a lot more streamlined because it's just like, you know, let's hop on a phone call and talk developer to developer. You know, we don't need someone in the middle being kind of like a translator for this. So that part is nice. I think from a sales perspective, developers in general, I think, tend to be very sensitive to overly pushy salesmanship or BS, you know. And so not that like non-developer companies market and sell on the basis of, you know, BS, but like I think with developer tool startups, there is a premium on being like precise about what value it is you're providing and not wasting

Starting point is 00:50:43 people's time and almost like avoiding buzzwords to a certain extent, because those are commonly associated with, you know, low quality sales efforts. I think all these things are true generally of any startup, right? You don't want to market or sell your product in a way that comes off as BS-y or noisy. But I think among the developer audience, you'll get an especially strong reaction if that's what you're doing. Let's see, what else? I think hiring for product has been tricky for us because we kind of waffled between two worlds of like, oh, let's hire people

Starting point is 00:51:27 as product managers who are developers in the past but want to do product. Or let's try and hire people who are product managers by background who it's probably been a while since they've coded or maybe they've never coded in their careers. And there's great people you can get from both those camps, but there's also anti-patterns. And so it's been tricky for us. A lot of this is due to like, you know, us figuring out how we want to structure the product organization and figure out how it relates to engineering. But it's been challenging because I think that our initial tendency was like, oh, let's just hire technical people who want to do product without of course, appreciating or realizing that product management is a very distinct skillset.

Starting point is 00:52:13 And, you know, someone who is a really good developer and has lots of great ideas, you know, for the product, is not necessarily going to be a great product manager, because that role involves very little coding and a lot of building consensus among stakeholders and describing at kind of like a human or a high level what the objectives are and kind of ensuring consensus around that across the team. So I know I'm rambling here. I'm just kind of like taking things off the top of my head.

Starting point is 00:52:48 No, I think that product management part was something that I didn't expect at all. That makes a lot of sense. Like who do you hire for product management for a developer tool startup? My first impression would have been just hire programmers because they're the people who you're selling this product to. They'll be thinking about what the product should have. But that's actually a pretty cool insight. Yeah. And how is fundraising, has that changed over the past, the whole fundraising ecosystem for developer tools? Yeah. So the fundraising landscape has changed dramatically, I would say. So back in 2013, when Quinn and I first started hacking on what would eventually become Sourcegraph,

Starting point is 00:53:28 I'd say for developer tools, we couldn't even say developer tools to potential investors, because that would be the conversation killer. Because the consensus was, I think, among the network of Silicon Valley based investors at that time was there's no money in developer tools. Every single developer tool that has been built has been like, you know, it's like they've been single player tools that have failed to generate sufficient revenue, or, you know, if you build a really good developer tool, your customers are just going to copy your product because they're developers and they know how to like build stuff on their own. These were all, you know, among the many different objections that we heard. And it's important to realize, like, I'm not blaming anyone.

Starting point is 00:54:22 I think most investors just look at the history of companies and build a mental model of what's going to be a good company from that. As of 2013, there had been, at least in recent memory, no very big developer tool companies created. Remember, GitHub existed, but this was before the big round they raised from Andreessen and before their acquisition by Microsoft. GitLab, I think, was just getting started in those days. And there had been a long history of companies that had built like new IDEs, like web-based IDEs. They're optimized for like a given individual to use it. And a lot of those had not panned out the way

Starting point is 00:55:09 that their founders had originally envisioned. And so it was kind of a very tough environment to be in. We were fortunate enough to find investors who kind of saw the long-term vision of what we were shooting for and kind of believed in what is now, I think, the conventional wisdom of like, you know, hey, if software is eating the world, which, you know, I think it now obviously is, then that also means, you know, software

Starting point is 00:55:36 development is going to become a core part of the DNA of every major company and they're going to need tools in order to facilitate building software well at scale. And so like the contrast from 2013 to now is like night and day, you know, after GitHub's acquisition by Microsoft, after GitLab became, you know, a multi-billion dollar company, after, you know, companies like HashiCorp came along. And also I think, you know, API-driven companies like Stripe and Twilio and folks like that really elevated the status and importance of developers to the broader economy and to, you know, the way you go to market as, you know, many startups do today. And so now the conventional wisdom is very different. Like developer tools are very hot. You know, investors are investing in dev tool startups left and right at all kind of parts of the software development lifecycle from, you know, writing source code, reading and understanding source code, which is where

Starting point is 00:56:47 we sit, to, you know, things on CI/CD, the deployment, the production side. It's really a booming ecosystem. It's really awesome to see. Yeah, I think I might have noticed some of this, but it was great to get your perspective on it. Like, it just seems like people are more excited about developer tools. I also used to think maybe three or four years ago that most engineers and most developers would just use the open source stuff versus going through the procurement process and buying a developer tool and going through the sales cycle or convincing their manager that we should really buy this and pay however much of our money. Has that also changed? I'm sure that might still be a struggle

Starting point is 00:57:30 for you, or has that changed over time? Yeah, so I think the key to building a successful developer business is to make it easy for people to try your product and see the value. But then there needs to be some gating function, at which point, you know, they can't just keep using it for free. Their team can't keep using it for free. The organization has to pay if they want to like install it for everyone inside the organization. And so I think a lot of developer tool startups today have this open core model, which I think companies like HashiCorp and GitLab really pioneered. And I think it's a great one because open core essentially means splitting your product: part of it is open

Starting point is 00:58:19 source. Typically the majority of the features and functionality are open source, but then there's a certain set of features that are absolutely necessary for a large enterprise to use the product, you know, things like authentication or enforcing permissions or other kinds of like compliance driven requirements or needs, or maybe it's like security related features that, that help give the administrator visibility into like what users are accessing the instance. Those features are gated behind an enterprise license. And what that allows you to do is make your core value proposition accessible to any engineer that

Starting point is 00:59:01 wants to try it. So with Sourcegraph, it's like there's a single Docker run command that allows you to spin up Sourcegraph wherever you are inside your network, and put all your private code into it in a safe and secure environment and allow you to like play around with that. So you can run it locally on your local machine. It's got all your private code and you're searching against it and you kind of build conviction like, hey, you know, this is actually pretty cool. I should spread this around. And then from there, it's easy to get other people on your team involved in it. And then at some point, there are enough people on it that management or leadership or whatever starts to pay attention. And then when they want to go and spread to their entire team, the gating function is that beyond a certain number of users,

Starting point is 00:59:47 we require you to pay for the enterprise version of the product. And there's also all these other important features like authentication and permissions and security that are built into the enterprise-only part of the product that requires the organization to pay. And so at that point, you already have all these like fans inside the company, which is great. There's all these people who are like, we want Sourcegraph, we want Sourcegraph. And that makes the case to the ultimate economic buyer, which is in our case, typically, you know, maybe their director of engineering or the lead of developer tools or developer productivity, or in some cases, you know, the CTO or VP of engineering,

Starting point is 01:00:30 that makes the case that we make to them very, very compelling. Because it's like, you know, the members of your organization are already using this product, they already love it. And, you know, here's all the ways that we provide value to your organization. And then we also have features that are specifically targeted towards that director persona. You know, here's all the ways that we can help resolve headaches that you have around understanding the code and how it's changing in your organization as well. And so that model is very effective. That makes sense. And like, so there's this kind of like bottom-up adoption of the product. What happens next? Is it that like, because you can't find out that people are just trying out your product.

Starting point is 01:01:27 Then do you just hear from a company or like a director or like a CTO saying, turns out we really want to buy this product because the kind of single Docker image, the first thing it asks you to do is input your email address to create an admin account on the instance. And that sends the email address to us. So we're aware of everyone who goes through kind of like the standard installation process and inputs their email. We are very explicit that we collect your email. And that is about the only thing, the only piece of information we do collect. That and I think very like high level usage statistics, like how many users have signed up for the instance. Everything else we try not to collect because we know that the code is very sensitive

Starting point is 01:02:01 and developers are very privacy conscious. And for the super privacy conscious ones, they can always just go and run the open source version themselves. And then they don't have to ever, you know, alert us about, you know, the fact that they're using Sourcegraph. So we try to keep that minimal, but also, you know, we are running a business. And so the fact that we can see, you know, oh, so-and-so with this email address at this company just installed the product kind of gives us an idea of who's trying it out.

Starting point is 01:02:30 And then we can see, you know, if there's more and more users hopping onto the product. And at that point, we may, you know, email the person with basically like a friendly intro, like, hey, you know, we noticed you installed Sourcegraph. How can we help? By the way, we'd also like to sell you software if that's what you're up for. And here's all the reasons why you might want to pay for it. And so far that's been good enough. Like there are a lot of people who just use Sourcegraph for free, typically, you know, smaller organizations or, you know, small groups of individuals. We feel that's great. Like, I think the vast majority of our income as a business is never going to come from individual developers or

Starting point is 01:03:14 small teams. It's really like, the way the economics work out, the bulk of the income really comes from selling to large organizations and enterprises. And so those are the people that we're focused on charging at the moment. Yeah, yeah, it's like the freemium model in a sense. Yeah, that makes a lot of sense. And so a key part of the strategy is like, you know, getting the word out and showing people that Sourcegraph is awesome. Like, what's the highest-leverage way you've seen? Is it like going to conferences, posting blogs on Hacker News? Like how do you get the word out that there's this tool and you should use it because it's awesome? Yeah. So, you know, conferences

Starting point is 01:03:53 are great. It's a shame, you know, we have this pandemic thing going on because that was, you know, I'm really looking forward to things opening back up because it was great to just like go to those and meet new people and exchange ideas. I think that was a big element of us getting the word out in the early days. Just like, it's not necessarily about like having someone try the product and become a user then and there, it's more just about like, hey, you know, there's this thing called Sourcegraph, you might want to, you know, check it out. And then later on, they'll hear our name through some other channel. And they'll be like, oh, I've heard that before. And that kind of increases the likelihood that they'll actually investigate it. I think Hacker News, just like

Starting point is 01:04:35 publishing great technical content has been really good for us. We do a lot of interesting engineering work. And we also write content about like software engineering, developer tools, and open source software, a lot of which the Hacker News audience finds engaging and interesting. We've also like live blogged a lot of the conferences that we've attended, and that tends to be like popular among the Hacker News audience as well. Yeah. I mean, it's mostly that.

Starting point is 01:05:09 And then actually like, you know, on Twitter too, just like hopping on, on Twitter and, you know, someone complains about the product, engaging them,

Starting point is 01:05:17 or if someone, you know, tweets about code search, you know, someone will reply to them. I think Twitter is actually a great channel to, to reach developers because like a lot of amazing developers

Starting point is 01:05:27 are on Twitter and you can use it as a way to just like discover like what new things people are building out there. And it's great for that. A lot of developers are on Twitter and they're complaining or making memes.

Starting point is 01:05:43 Yeah. I've certainly like been a part of at least a couple of conversations. But that's cool. Maybe just a few questions to close out, just some broader questions around the ecosystem of developer tools. We spoke about how the fundraising climate changed. And I think that kind of can tell us that developer tools, if they're going to get funds easier, there are probably going to be more of them. But in general, like, how do you think the

Starting point is 01:06:11 developer tools ecosystem is changing in general? Like, what's the future of like a software organization going to look like? Is it just going to be a smaller organization that uses a bunch of tools versus what we see today, which is like larger organizations? I know there's kind of two questions packed in there, but yeah. What do you think of this? Yeah, that's a great question. So I don't know if I can answer the question directly.

Starting point is 01:06:44 Maybe I'll just, I'll, I'll talk about some things that have been on my mind recently and maybe hopefully, you know, an answer will emerge from that. So, you know, one, one big trend I think is software development is, is becoming something that every company needs to do. So, you know, traditionally we thought of like technology or software, high-tech startups or high-tech companies as like a separate sector of the economy. You know, you had the Googles and the Microsofts and the Amazons of the world. But increasingly I think it's like the John Deeres of the world or the, you know, JP Morgans of the world, or, you know, I don't know,

Starting point is 01:07:28 I'm trying to think of like more like a small, small company names, but like essentially every company is, is, needs to understand how to build software because there's some element of their business that is going to rely on software development. And so the ability to build software at scale, I think before that was only a thing that a few companies in the world needed to know how to do. And now every company needs to know how to do that. And so I think one trend is you're going to start to see a lot of tools that a lot of which may be inspired by internal tools built by earlier technology focused firms, you know, Sourcegraph being one of them,

Starting point is 01:08:06 but there are many other examples of this. Those sorts of developer tools that are similar to those previous internal tools, but built for the broader ecosystem that integrate with, you know, open source technology stack, for example, or that handle a lot of different development environments as opposed to just a single development environment.

Starting point is 01:08:29 Another big trend is just the explosion of open source code. This is not new news by any means. It's been going on for a while now, but it continues to balloon. I think it's really a wonderful thing about the software world that there's this like giant shared commons, knowledge commons that everyone can kind of draw on and use to build whatever application they're trying to build. And I think this is like one way in which the current software revolution is perhaps different from like the first software revolution. Like the first software revolution, you know, 1990s, early 2000s, I think was really about software products impacting the end user, software products being developed by a small set of companies or being fostered by a small set of companies.

Starting point is 01:09:24 You know, in those days, Windows was huge, right? Like everything was built on Windows. If you weren't working on top of Windows, you didn't have a business. And Microsoft did. I kind of like to say that Microsoft was the original developer tools company because what they did was they built this amazing developer ecosystem that fueled their offering of third party apps and integrations, which kind of elevated them as a business.

Starting point is 01:09:51 Like I think the reason that Windows was so successful is because Microsoft really prioritized the developer experience and made it so that like developers could build this like wide range of business and end user applications on top of the Windows platform. I think the current world we live in, there's no single proprietary ecosystem that is going to monopolize software development because of such a large shared open source commons. And so I think one of the ways in which the new world is, I mean, it's already different and it will continue to be different is that it's going to be built on these kind of open technology stacks, open protocols, open libraries and frameworks. And there's not going to be like one single like mothership or organization that is

Starting point is 01:10:37 responsible for ensuring, you know, good developer experience. It's more going to be like this ecosystem of tooling companies that identifies pain points associated with usage of different technologies or like frameworks or libraries in this kind of new world we live in and tackling those and solving those for some set of customers in the market. Yeah, I don't know if that answers the question. No, I think that makes sense. Like one thing that maybe worries me based on what you said was, if a software organization is going to be buying a bunch of different tools,

Starting point is 01:11:18 doesn't the complexity kind of explode? Like there's so many tools, nobody knows what exactly each tool does, and like, we're paying for each one of these tools, does anybody actually use this tool? Like, do you have any thoughts around that? Yeah, yeah, a lot of thoughts. I mean, so I think that happens on at least two levels. Like one is like there's all these tools that do like CI/CD now, for instance, or there's all these like tools that like help me deploy my software. Do I use Kubernetes or do I use Docker? Do I use serverless?

Starting point is 01:11:50 Or it is kind of like this paradox of choice dilemma. And that's already been reflected in some sense at the, like the library level. Like, so forget, you know, tools that target a particular part of your development process, but like, just think about the, like the potential libraries that you could use for say, you know, SSO integration or, you know, HTTP routing, like for any given thing that you want to do, there's, there's a bunch of different choices that you have to make. Whereas in the olden days, it's kind of like you had much

Starting point is 01:12:22 less choice, but it was easy because you didn't have to decide. You didn't have to expend as much energy deciding. And so, you know, I think there will be need for something that helps people make these decisions. At kind of like the library level, you know, I personally think Sourcegraph is going to be very useful there because it's going to allow people to kind of dive into the nuts and bolts of any given library and figure out how it's different from the alternatives and really make an informed decision about your choice of library that goes beyond just using what you're most familiar with or using the thing that has like the nicest landing page or the most GitHub stars. I think, you know, those are decent first order proxies perhaps, but you know, it's not, not at the level that we need to really be able to, you know, make informed choices and,

Starting point is 01:13:15 and develop software effectively. At the tools level, I think it's kind of an open question, like what are going to be like the channels through which people discover and learn about new tools and determine which tool is the best for them? These days, it's like Hacker News is the way that I find out about a lot of tools. And then it's kind of like some rough process of seeing what folks I follow on Twitter and like what my friends are using and trusting their judgment. I think we are gonna see new kind of like channels

Starting point is 01:13:53 for developer tools. So like one of the ways in which Sourcegraph is exploring this is we have an extension API that allows us to integrate third-party developer tools and allows those tools to inject contextual information into the Sourcegraph UI. So things like, I think like code coverage, line by line coverage information,

Starting point is 01:14:12 or pulling in logs from production into a particular line that has some instrumentation or monitoring associated with it. So we hope to be a channel of discovery for other developer tools as well. And we think that kind of our status as this like attention hub for developers, because, you know, people use us, go back to us multiple times per day, can help surface information from these other tools in a way that's helpful and immediately valuable to users. But that's just like one, that's just one kind of way in which the distribution

Starting point is 01:14:49 and awareness channels of developer tools might evolve. The truth is, I really don't know. Like I imagine there might be entire companies that spring up to evaluate developer tools. In some sense, like there's a company called StackShare right now, which you probably have heard of. They cover what technology stacks are used by all these different companies. Jonas, the founder, is a good friend of mine. And they're all about just helping companies learn from each other based on what other tools are being used.

Starting point is 01:15:28 So yeah, I don't know if that is a very complete answer, but I certainly think there'll be a lot of interesting opportunities to help people make sense of this booming new world that we live in. So I think what you said makes sense, right? There's gonna be a lot of new tools that solve maybe particular use cases, and then there'll be aggregators or these like attention hubs, like Sourcegraph for example, that integrate with a bunch of the like different developer tools that you might have that can all show up in one place.

Starting point is 01:15:56 Yeah, I wonder. And that could also be maybe like bundling, where you can have, have you seen this a lot? Like this might be a specific question, but like have you seen this where like developer tools are like bundled with other developer tools and sold together, and that you could have some kind of like best-in-suite, like observability, code search, and all of that? Yeah, have you explored a business model like that? I don't know. Yeah, that's interesting. I don't know if I'm aware of any like services that have been bundled together. I know there's some companies out there that are trying to bundle services together. Like GitHub and GitLab are both trying to bundle observability tools and CI/CD. And I mean, I guess GitHub was the original pioneer.

Starting point is 01:16:39 They bundled an issue tracker and code review tool into a code host. So I think there probably is value there. I think there's always gonna be this tension of having a more like vertically integrated stack that you can just hand someone and they don't have to think about the choices versus having specific tools that are tailored to the specific needs of a given customer

Starting point is 01:17:01 at any part of the software development lifecycle. I think there's no hard and fast rule here. I think it's all just a matter of details. I think there's some person who famously once said like, there's only two ways to make money in business, bundling and unbundling. And so I imagine we'll see like a lot of cycles of bundling and unbundling as the ecosystem continues to evolve and innovate. Yeah.

Starting point is 01:17:30 And maybe like a tongue-in-cheek question. When will Sourcegraph allow me to edit code and just hit like git add, git push? Because it has everything else I need. Like I don't run my code before submitting it anyway. Yeah, that's a great question. So, I mean, if the question is when will Sourcegraph allow you to edit code, the short answer to that is very soon it's going to allow you to edit code, but probably not in the way that you think. So we have this feature that's currently in beta right now, and already in the hands of some customers, called campaigns, which allows people to do these like large scale refactorings or code modifications.

Starting point is 01:18:13 And this is a pain point that we've seen at a lot of, especially larger customers where once you're, you have a code base at scale, you want to make some small change to shared library API. And that one change involves modifying all the upstream dependencies of that particular thing that you're changing. And that can become a huge headache. We have customers who've spent over a year just trying to make one such change and just managing the collection of code reviews and conversations with different teams that have to happen in order to do that effectively. So long story short,

Starting point is 01:18:49 we just decided to build that into Sourcegraph. So there's a way to make these large-scale simple changes. I think your question was probably more directed at making the small-scale deep changes that you typically make in your editor. Or like, you know, when is Sourcegraph gonna become just like a web-based editor? Like the GitHub Codespaces or something they have?

Starting point is 01:19:10 Yeah, yeah. And I think, you know, my answer to that is maybe we'll start introducing like simple edit capabilities over time, really depending on what we hear from our users in terms of like what would be useful to them. But I think one of our core convictions is that there really is something very different about the frame of mind you're in when you're writing code versus reading code and trying to

Starting point is 01:19:42 like really figure out how a piece of code works. Because when you're writing code, it's like you're in the zone, you're focused, you know, on where your cursor is. Hopefully all the context, all the necessary context is already in your brain. And you're just focused on typing the stuff that you're currently typing. When you're reading code, you kind of have all these files open. You're trying to draw these connections between different files. You're hopping around a lot. And I think it's really a very different mode to be in.

Starting point is 01:20:12 And to kind of illustrate this fact, like just think about the last time you tried to like switch from writing code to reading code in your editor. Like when that happens to me, I always end up opening a bunch of tabs and I get kind of lost. And then when I finally, you know, finish that like rabbit hole, like train of thought, I go back and I was like, oh crap, what was I doing again? And like, there's, you almost have to like reconstruct your editor state to get back to like where your cursor was and what it is you were doing. And that to me just implies very different workflows and consequently

Starting point is 01:20:46 very different applications. Like if you build an application that is focused on writing code, that's complex enough. Like, IDEs are already complex enough. They have so many features. And if you're going to try to introduce more features targeted towards like reading code and exploring code and building a mental model of how unfamiliar code works, you're just going to end up overcomplicating this thing, and it's really better to have another application that's focused on that. And you know, maybe the two worlds will start to converge at some point. I don't know. It will really be a product of like, you know, we start from a focus on reading and understanding code, and to the extent that people want to make little edits while they're in that frame of mind, we'll support that, and maybe over time that will

Starting point is 01:21:36 develop into something more, but it's not really like a near-term focus area. No, I think that makes a lot of sense. You're probably one of the people who've thought a lot about how do people actually like write code and read code and what the whole workflow is. So, but yeah,

Starting point is 01:21:54 just for those small edits that, you know, when there's like a comment that has a typo, that's basically it. Yeah, yeah. Yeah, but anyways, thanks so much for being a guest on my podcast. This was a great conversation.

Starting point is 01:22:08 Thanks so much for having me. This was a, this was great.
