The Changelog: Software Development, Open Source - AI-assisted development is here to stay (Interview)

Episode Date: December 17, 2021

We're joined by Eran Yahav — talking about AI assistants for developers. Eran has been working on this problem for more than a decade. We talk about his path to now and how the idea for Tabnine came to life, this AI revolution taking place and the role it will play in developer productivity, and we talk about the elephant in the room - how Tabnine compares to GitHub Copilot, and what they're doing to make Tabnine the AI assistant for every developer regardless of the IDE or editor you choose.

Transcript
Starting point is 00:00:00 Hey, welcome back. This is the Changelog. On this show, we feature the hackers, the leaders, and the innovators of the software world. We face our imposter syndrome, so you don't have to. And today, Jared and I did just that. Today, we're joined by Eran Yahav, CTO and co-founder of Tab9. And we're talking about AI assistants for developers. He's been working on this problem for more than a decade. We talk about his path to now and how the idea for Tab9 came to life, this AI revolution taking place, and the role it will play in developer productivity. And we talk about the elephant in the room, how Tab9 compares to GitHub Copilot and what they're doing to make Tab9 the AI assistant for every developer,
Starting point is 00:00:36 regardless of the IDE or editor you choose. And I want to give a shout-out to our partners, Linode, Fastly, and LaunchDarkly. We love Linode. They keep it fast and simple. Check them out at linode.com/changelog. Our bandwidth is provided by Fastly; learn more at fastly.com. And get your feature flags powered by LaunchDarkly; get a demo at launchdarkly.com. This episode is brought to you by InfluxData, the makers of InfluxDB, a time-series platform for building and operating time-series applications. And I'm here with Josh VanDeraa from Network to Code. Josh, tell me about how you're using InfluxDB and Telegraf. Thanks, Adam.
Starting point is 00:01:17 Network to Code helps enterprises bring DevOps ideas into network organizations. We love using open-source tools like InfluxDB and Telegraf to help our clients collect, enrich, and analyze the data on their networks. Normally, we would have to build out this type of tooling, but InfluxDB and Telegraf meet all of our requirements. Plus, InfluxDB and Telegraf are open source, so we're able to contribute changes and use their SDKs to write custom plugins whenever we have specific needs. All right, learn more about the wide range of use cases of InfluxDB at influxdata.com/changelog. Network monitoring, IoT monitoring, infrastructure and application monitoring. InfluxDB does it all. To get started, head to influxdata.com/changelog. Again,
Starting point is 00:01:55 influxdata.com/changelog. So, Eran, you are CTO and co-founder of Tab9. You're also a professor of CS at Israel's Technion. Welcome to the Changelog. Thanks, guys, for having me. It's great to be here. Wow. It's great to have you. Professor. Yes.
Starting point is 00:02:31 That's very cool. Yeah, you know, it is cool. I'm not a professor, so I think it's super cool. I also read that prior to this, you also worked at IBM on the Watson project. Is that right? Working on Watson? No, I did not work on the Watson project. I worked in the TJ Watson lab. The Watson project was actually just a few rooms
Starting point is 00:02:50 down the corridor from me. So I heard all the sessions where they were doing the Jeopardy training and all that. It was super exciting, but I was not involved in that at all. Oh, bummer. But I got to witness a few of the first runs; it was literally in the same corridor. So that was cool. Yeah, I connected those dots because you're doing ML stuff and I figured, well, he's doing AI. He was at IBM. He's got to be working on
Starting point is 00:03:16 Watson. So I worked on very cool things that were not Watson, that were program synthesis, which is related to what we're doing also in Tab9. And at the time, we worked on program synthesis
Starting point is 00:03:33 for synthesizing low-level concurrent programs. So the idea is that you would write some sequential code and hit a button, and magically you will get it to run efficiently on concurrent systems. And I don't know if this sounds very hard, but it is very hard. It was a super exciting technology at the time. Yeah. So from there to here, now you have Tab9,
Starting point is 00:03:53 which is your company trying to do AI-assisted development workflows. Tell us how you got from that place to this place. So I've always been fascinated with programs that run on programs. That has been my kind of long-standing fascination. I worked on compilers for a long time in the early days, then worked on program synthesis, which is programs that generate other programs. And generally speaking, I also worked on program verification, trying to verify that a given program satisfies some spec, that it does what you want it to do. But very early you
Starting point is 00:04:36 start to realize that once the artifact has been built, it's just too late to try and fix it. If it was built the wrong way, just don't bother. And then I got super interested in synthesis and this idea that you can generate things from scratch the right way as you build them, and therefore get sort of correctness by construction, or at least have some good properties of what you're building by construction.
Starting point is 00:05:04 And so working on these kinds of synthesis ideas in the early days, I guess a decade ago, we started hitting stuff that seemed to have actual practical value, as opposed to just academic papers and fascinating ideas. And at the time, Dror, who is the CEO of Tab9, he's a longtime friend. So I just met him for coffee. I said, hey, Dror, you know, we're kind of like, hey, look, look at this.
Starting point is 00:05:33 This is kind of like, it looks like this could actually work. He said, hey, man, this is like, that's the future. That's amazing. That's like, we've got to do something about that. I said, oh, no, I don't know. You know, like doing stuff for real is so far from what we usually do in research. It's probably too far for us to bridge the gap. I said, no, man, we're doing it.
Starting point is 00:05:54 I said, OK, if you insist, let's do it. And so this is how it all started. Initially, it was just, you know, the two of us building some technology pretty much in coffee shops and as these things go. And at the time it was called Codota, by the way. But then we started rolling with that and things started to work. It was very, very exciting. So that's like the early days of Codota at the time. So the chasm between research and real world, so to speak. Is that quite distant then?
Starting point is 00:06:25 Because I guess with research, you're sort of thinking like, what's so far in the future? You're just trying to kind of meet different target functions in a sense. Like when you're doing research, you're trying to do innovative stuff, stuff that has interesting core ideas, even if the ability to realize them lies 10 years, 20 years down the road, or never ever. You just want this core idea, this clean idea, to be generated.
Starting point is 00:06:51 So that's kind of the target function that you're optimizing for. And, you know, in real life, in the real world, what you're optimizing for is something that works, even if it's really, really boring and mundane, or even if you have to cut corners and it's not a beautiful artifact; if it gets the job done, then it's much more important. So you may be working on exactly the same problem, but on one trajectory you're trying to optimize some core idea, and on the other trajectory you're trying to make it work. This often implies solutions that come from different thoughts,
Starting point is 00:07:29 different kind of... Yeah. What about program synthesis in terms of like application? Is it a trend in parallel with the possibilities of AI and machine learning and training models and deploying things? Or is it a whole different animal? No. So program synthesis is, it is a different animal, but it has been using techniques from
Starting point is 00:07:52 AI and language models to kind of make synthesis more practical and maybe expand it to additional domains. So the idea of program synthesis is super old. The original papers are from the 50s, the idea of being able to synthesize something. But with the technology that came from ML and language models and things like that, you could do much more practical stuff in program synthesis. But program synthesis is a general concept.
Starting point is 00:08:25 It's a beautiful thing. I'm sorry, I can geek out forever on program synthesis. Yeah, for sure. So I'm trying to see if I'm following you. So program synthesis is an idea or a concept of which deep learning models can be used in order to accomplish it or in order. It's like they are parts of a larger concept. Is that what you're saying?
Starting point is 00:08:47 Yeah, that's correct. Like the problem of program synthesis, which is I give you some sort of specification or intent and you generate the program for me, guaranteed that it does what I asked you to do. This is like an ages old problem. The ability to solve it for realistic payloads kind of
Starting point is 00:09:08 is a thing of recent years. And it is really due to the convergence of several kind of trends. One of them is the availability of training data, like all high quality open source code that you can train on. That's like
Starting point is 00:09:24 one ingredient that you need. The second ingredient is the maturity of program analysis techniques, static program analysis techniques, the ability to kind of extract some essence from this text, from the bunch of code. And the third thing is the maturity of ML models and the availability of really powerful ML techniques. And the fourth thing is just computational power. All these things
Starting point is 00:09:48 require massive computational power. So that's just a thing that you couldn't get a decade ago. Is this high-performance computing power, or is that the kind of machine you have to run this kind of stuff on? It's just like today, it's like clusters
Starting point is 00:10:03 of GPUs or TPUs that you can get on Amazon, GCP, Azure. It's a commodity these days. Right. I saw somebody recently ask their employees on Twitter, they said, hey, here's another Christmas present. Do you want a beefy Linux laptop, or would you
Starting point is 00:10:20 rather like a phenomenal cloud dev environment? They're like, I want a phenomenal dev environment on the cloud kind of thing. Like, I want somewhere I could run my stuff with extreme power, right? Rather than an extremely powerful laptop. Yeah, I like both, right? Yeah, why not? Can't I have both, please? Yeah, I'll take both, please. Why is it either or? Would you say that art that's artificially generated, I don't know how to describe that, but like art in the AI space, like how you see that AI-generated art, is that a version of program synthesis? Because you say, here's an intent,
Starting point is 00:10:49 and you got some models, and then here's this artifact. It may not be a program, but is that a variation? Art synthesis? Yeah. It is synthesis. It's art synthesis, as you said. It's not program synthesis. It's interesting because it kind of involves the same act of curation that you get with a lot of program synthesis: the machine generates something for you, and now you have to look at the code that has been generated and say, hey, is this what I really wanted? And with art, this kind of judgment is easier, right? Like, is this pretty or is this not pretty? Right, it's subjective. It's much easier to do than to get a hundred lines of code generated for you and say, oh yeah, I'm sure that this actually connects to Kafka, gets a record, you know, puts it in Mongo, and then sends the email through
Starting point is 00:11:35 Twilio, whatever. Which is, I think, much harder to do. It all seems very hard. I agree. I think program synthesis sounds a lot harder. We're getting to a point now where, well, you can talk about where Tab9 is today and maybe where it's headed, but there's so many avenues we can go down with this conversation. I guess where we should start is: when you had that idea and you were showing your co-founder in that coffee shop, what exactly were you showing him that excited you? And then how close is Tab9, and you can describe exactly what it is today as a marketable piece of software or a tool.
Starting point is 00:12:14 How far is that from what you showed him, or how far have you diverged since then? Yeah, so I'll start by saying what Tab9 does, just so we have context. So what Tab9 does is it's an AI assistant that connects to your IDE and generates code for you based on the context in which you're operating. So a massive part of the challenge in generating code is figuring out the intent: what is it that you'd like the machine to generate for you, right? And so
Starting point is 00:12:46 if you ask me to start writing specs of what it is that you want the machine to generate, this is never going to work, because writing the spec is going to be as hard as or harder than writing the code in many cases. And so the magic of Tab9 is that you don't need to write any spec. It contextualizes on what you currently have in the editor, in your dev environment, on your machine, in your project, and kind of predicts the next thing that you're going to write, and writes it for you. So in a sense, if you're familiar with Google Smart Compose, Gmail Smart Compose, it's like Smart Compose for code: it generates the next thing for you. You just have to tab through and accept. That's kind of what Tab9 does. And I can talk again forever on the challenges of what that is,
Starting point is 00:13:35 and I think we can go down that route later, but just going back to the coffee shop. Yeah, is that what you showed him? Absolutely not, not even close. It was like... I think it's totally unrecognizable. Some of the core technology is still there, but it is utilized in completely different ways.
Starting point is 00:13:57 So what I showed him back in the day was being able to take a huge code base, train on the code base, and generate a model such that if you give it a small prompt, like a part of a program, it gives you a larger program that contains that piece. And so it's kind of like a super sophisticated code search based on ML models. That's kind of what it was at the time. So it could be utilized to do a similar task to what Tab9 is doing today, which is kind of,
Starting point is 00:14:28 I give you the first three lines of the program, you give me the next three lines. But it wasn't even close, again. It's funny how these things evolve, right? Because a lot of the challenges, the bottleneck for a lot of these processes is actually the human, which is surprising, right? So you have this very tight loop of human and machine who are writing a program together.
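The early prototype described here, train on a code base and then continue a small prompt, can be caricatured with a toy n-gram model. This is purely an illustrative sketch (Tab9's real models are neural and vastly more capable), and the whitespace-tokenized "corpus" below is invented for the example:

```python
from collections import Counter, defaultdict

def train(corpus, n=2):
    """Count, for every n-token context in the corpus, which
    tokens followed it and how often."""
    model = defaultdict(Counter)
    tokens = corpus.split()
    for i in range(len(tokens) - n):
        model[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    return model

def complete(model, prompt, length=4, n=2):
    """Greedily extend the prompt, one token at a time, with the
    most frequent continuation of its last n tokens."""
    tokens = prompt.split()
    for _ in range(length):
        counts = model.get(tuple(tokens[-n:]))
        if not counts:
            break  # unseen context: nothing to suggest
        tokens.append(counts.most_common(1)[0][0])
    return " ".join(tokens)

# A tiny made-up "code base" to train on (tokens separated by spaces).
corpus = """
for line in f : print ( line )
for line in f : print ( line )
for line in f : process ( line )
"""
model = train(corpus)
print(complete(model, "for line in"))  # → for line in f : print (
```

The "give me three lines, I'll give you the next three" behavior falls out of repeatedly asking for the most likely next token; real systems replace the frequency table with a learned model, but the interaction loop is the same.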
Starting point is 00:14:51 This is what's going on in Tab9 and in other synthesis kinds of systems. And the limiting factor is the human, because the human in this tight loop has to say, oh yeah, that's what I want. Oh yeah, that's what I want. Oh yeah, that's what I want. And only the human knows the intent. And so the machine can generate this massive map of possible futures, and the developer has to navigate through this map and say, yeah, that's what I want. And it's very tricky to get the granularity of that interaction right.
Starting point is 00:15:27 Like how much code should I generate? Right. Well, if you think about the end game of program synthesis, you kind of remove what a developer now is: a person who takes human intent and turns it into something that the machine can execute. There's a lot to that as well. But if you think of the end game, perhaps, this is what I would think it is.
Starting point is 00:15:44 Maybe you have a different idea of where it could potentially go. You know, I used to be a contract developer. I'd have people come to me and say, I want Facebook, but for dogs, or whatever, right? Like, that's their spec. That's their intent. And between that and a working product is like the world, right? And granularity and abstractions and drilling down. And what level of abstraction can this synthesis get to? Like, right now it's at three lines of code, or down in a function. But is the end game for synthesis to go higher and higher level, to where maybe I'm writing the user story and that's all I have to write? What do you think it is? Where could this potentially go? I don't think so because, you know,
Starting point is 00:16:25 a lot of programming is actually discovering the spec. I would say that programming is actually mostly discovering the spec. Yeah, because you don't actually know what you want, right? You don't know what you want until you wrote it and you've seen what it does and say, oh, no, that's totally not what I wanted here, right? And so, and this is also why all this discussion
Starting point is 00:16:44 of AI replacing programmers and all these things, that's not going to happen, exactly because the hard part is discovering the spec, and the code is the spec. And so this is your real job. Your real job is not knowing the syntax of how to do a certain thing in Python, and this is exactly what Tab9 is trying to help you with: kind of remove the syntactic barriers, make this foreign language easier to speak for you. So you don't have to get bogged down in the syntactic details of how exactly to read a file line by line in Python, something that you've done a million times before. And if I show you the code, you know that that's what you want, right? That's the typical use case of Tab9: I tab through, I say,
Starting point is 00:17:28 oh, Tab9 guesses that the next thing that I want to do is read a file line by line in Python. It presents the code to me in the editor and I just tab through. I say, yeah, that's exactly what I want. I just don't want to write it again. I've written it a thousand times. I get it out of the way.
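That canonical completion, reading a file line by line in Python, is small enough to show in full. The snippet below is just the generic idiom, not Tab9's actual output; the function name and file path are made up for illustration:

```python
def read_lines(path):
    """Return the lines of a text file with trailing newlines stripped,
    the boilerplate you've 'written a thousand times'."""
    lines = []
    # The with-statement closes the file even if an exception is raised;
    # iterating the file object streams one line at a time.
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines.append(line.rstrip("\n"))
    return lines
```

The value of a completion engine here isn't novelty but recall: the idiom is fixed, and accepting it with a keystroke beats retyping it for the thousandth time.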
Starting point is 00:17:42 This is not the part that is interesting. And it kind of relieves me from this mundane stuff. Right. And lets me focus on really discovering the spec, doing the interesting parts. Or what's even more empowering is, I know what I want to do, but I'm not quite sure how to do it. And it can show me, here's one way you can do it. Okay, maybe I'm not sure if that's the best way, but I'll use it for now and I'll try it, and maybe I'll find out later if there's a better way of writing it, but at least it's empowering me right now. You said too that you're not sure what works well. Is there a feedback loop, at least in
Starting point is 00:18:14 Tab9, where you present the user or the developer with the option, so to speak? I start to write this function and it auto-completes it for me. Is there a way you can step through the various options available? Yeah. What's that feedback loop for you to know? Like, oh, this was a good one, so you can retrain. Okay, this was a positive code sample to share with that user, and they used it and it worked out. Yeah, there is a feedback loop.
Starting point is 00:18:38 Right now, for privacy reasons, it mostly remains on your machine. It doesn't go to a central model. So, Tab9 is, we always joke that, like, both Dror and I are developers. So it's kind of a developer company led by developers. And so it's typical for American companies to have values or something, all these fancy things that developers typically hate,
Starting point is 00:19:04 and we hate them as well. So we only have really one value at Tab9, which is loyalty to the developer. So we are super transparent: your code never leaves your machine unless you opt in for it to leave your machine, and stuff like that. We're super careful about that, because we are developers and we understand the sensitivity around that. So the feedback loop that you ask about happens locally unless you opt in for it to happen on the cloud. And it happens locally unless you opt in
Starting point is 00:19:37 for it to happen on your team. And everything is kind of very carefully designed to keep the loyalty to the developer. On the privacy section of your site, you talk about the custody of your code. You say that Tab9 never stores or shares any of your code. And so you give that choice to the end user. And I think I had a conversation with the CEO of Sourcegraph. And early on in their career, they had, I guess, career in terms of product, not so much career in terms of the individual.
Starting point is 00:20:09 But what they had initially, a challenge, was gaining the trust. So you may have the best possible technological solution, the best product that can actually achieve what the user wants, but gaining that trust is hard. And maybe that's jumping ahead a bit, but let's remark on that, or answer it now if you want to. I'm imagining that's probably the challenge: gaining the trust of people. You can say you have that loyalty, but you have to showcase it. You have to show up and do it. And gaining that trust, to get people to use Tab9, is potentially half the battle. Yeah, I think people trust Tab9. I think we gained this trust pretty early in the process because we committed to running on localhost
Starting point is 00:20:54 unless you opt in for something else. And it's easy to see that this is what's happening, right? And so Tab9 today is used by millions of people in their IDEs daily. And so I think it's very popular. I can talk a lot about why I think it gained such popularity and also why people like it. Let me just say one thing about that. I think, at least from my conversations with people, one of the things that you like is kind of the variable reward. You don't know, because you're working with an AI system that is not deterministic. You don't know what
Starting point is 00:21:36 it is that you're going to get from Tab9. And sometimes it's magical, and sometimes it's magical in a bad way, like, what the hell just happened? And because of this variable reward, it becomes very engaging. You're always kind of curious what's going to happen next and what you'll find out. Because this is for developers, and developers like advanced stuff, right? They like to geek out with the edge of the technology. They are actually rooting for Tab9 to succeed, which is kind of unheard of for this kind of product.
Starting point is 00:22:13 You see people and they're sitting there like, oh, yes, I'm so happy it predicted the thing that I wanted it to predict. I'm happy for both of us. Sharing it on, almost. Yeah, exactly. And you see people sharing on Twitter, like, here's some prediction that I got, which I find cute. This episode is brought to you by Teleport. Teleport lets engineers operate as if all cloud computing resources they have access to are in the same room with them.
Starting point is 00:22:57 SSO allows discovery and instant access to all layers of your tech stack, behind NAT, across clouds, data centers, or on the edge. I have Ev Kontsevoy here with me, co-founder and CEO of Teleport. Ev, help me understand industry best practices and how Teleport's Access Plane gives engineers unified access in the most secure way possible. So the industry best practice for remote access means that the access needs to be identity-based, which means that you're logging in as yourself. You're not sharing credentials with anybody. And the best way to implement this is certificates.
Starting point is 00:23:29 It also means that you need to have unified audit for all the different actions. With all these difficulties that you would experience configuring everything you have, every server, every cluster, with certificate-based authentication and authorization, that's the state of the world today. You have to do it. But if you are using Teleport, that creates a single endpoint. It's a multi-protocol proxy
Starting point is 00:23:51 that natively speaks all of these different protocols that you're using. It makes you go through SSO, single sign-on, and then it transparently allows you to receive certificates for all of your cloud resources. And the beauty of certificates is that they have your identity encoded, and they also expire. So when the day is over and you go home, your access is automatically revoked. And that's what Teleport allows you to
Starting point is 00:24:16 do. So it allows engineers to enjoy the superpowers of accessing all of cloud computing resources as if they were in the same room with them. That's why it's called Teleport. And at the same time, when the day is over, the access is automatically revoked. That's the beauty of Teleport. All right. You can try Teleport today in the cloud, self-hosted or open source. Head to goteleport.com to learn more and get started. Again, goteleport.com. So how long have you been working on this? I think personally, I've been working on this probably for more than a decade now, around this area. But Dror and I, I think we started around 2014. It was just us.
Starting point is 00:25:17 We kind of played with ideas for a couple of years until we felt that it had some value. And then we went to get funding, so that was probably a 2016-17 kind of timeline. And at the time it was called Codota, which is a great name in Hebrew, maybe, but an awful name for all other intents and purposes. And in 2019, when we acquired the company called Tab9, we actually changed the name to Tab9, which I think is a huge improvement over Codota. So that's kind of the timeline for us. Yes, good name, solid name.
Starting point is 00:25:54 And so you've been working on this for a very long time. Curious how you felt, so we haven't mentioned GitHub Copilot or OpenAI Codex yet, but the 800-pound gorilla kind of made a big splash when they announced GitHub Copilot. GitHub, of course, the repository for most open source code in the world. And so a huge splash went out. And I'm wondering how you felt when that happened. And I'm sure you
Starting point is 00:26:17 have reactions since then or positioning, but did you feel validated? Did you feel offended? Were you like, oh, great. What were you feeling? No, definitely not offended. Definitely more on the validated side. Like when we started, like people were like, you guys are crazy. This is never going to work. And if it's going to work, nobody's going to use it. Right. So it's like, it's not even a category.
Starting point is 00:26:38 Nobody cares. Yeah. There is autocomplete in the editor. We don't understand why you need another autocomplete. Right. It's like it doesn't parse to people. And so trying to kind of educate and tell the world about AI assisted software development being a thing is a huge hurdle. And definitely Copilot is
Starting point is 00:27:00 validating this domain. And I think the important thing is that AI-assisted development is here to stay. And the world really needs an independent platform for doing that, other than Microsoft. And so, Microsoft is great. They can have Copilot, and it's awesome. I think the product is very nice and we like it a lot. We see a lot of the evolution that we've gone through.
Starting point is 00:27:29 They're going through obviously much faster than what we could do right at the time. But it's very interesting to see how it evolves. And so there's definitely going to be, as you said, the 800-pound gorilla of Microsoft, AI-assisted development in the room, but there's always going to be also some independent platform such as Tab9 in the room, and we're happy. We're happy that at this point it's us and Microsoft. We're actually proud. I think we're actually, at least in terms of users,
Starting point is 00:28:05 we are definitely the leader in this category. And so we're very proud that it's Copilot and Tab9. It's actually a great sentence to say, Microsoft and Tab9. Yeah, great. So actually, validation above all. Yeah, I try to empathize. I hear what you say, I think. But then again, I don't know what's behind the scenes.
Starting point is 00:28:24 I don't know what your user base is like. I don't know what your company growth is like. I know that you're at a Series A right now, probably approaching a Series B, or at least due for one. I think you might make that Series B a lot easier considering you have a large gorilla in the room, because you could be a large gorilla too: you've been in the space for so long, and you all have such domain knowledge. I would definitely bet on your horse in the race. So I would imagine that your valuation and fundraising possibilities just went up. Having said all that, I think I'd be a lot more excited, because now your future is probably a lot brighter,
Starting point is 00:28:58 whereas before, you were fighting the uphill battle. You had to keep proving what you could do. And obviously, developers are probably like, Tab9 is awesome when they use it, but everyone else who doesn't get it is probably like, what? Why do I need this?
Starting point is 00:29:13 let's call it, is something that Microsoft can do much more efficiently than us. And we're happy to benefit from that, you know, just as GitLab benefited back in the day, exactly from that. Yeah, so y'all are much more optimistic than I am. So I go straight to Shark Tank and Mr. Wonderful
Starting point is 00:29:32 and I just think to myself, what's stopping Microsoft from squashing you like the cockroach that you are? No offense, but that's Mr. Wonderful. He's quoting Mr. Wonderful directly. That's exactly what he would ask you, right? He'd say, why don't they just squash you like the cockroach that you are? And so I get that you have independence and you have existing customer base and it's great.
Starting point is 00:29:53 And you also probably don't have a massive payroll like they do. Like, you're not at the scale they are. You're not an 800-pound gorilla, so you don't have to have 800-pound-gorilla revenue. But what do you do against that size of a beast? How do you differentiate? How do you stand apart? How do you stay alive? Yeah, it's a very good question. I think, again, the reality is closer to what Adam said, in that the opportunities being opened up and the category being validated make this a
Starting point is 00:30:22 transformative time for Tab9, actually, in a positive way. And our approach is just very different from Copilot's at this point. We are technically creating models that are tailored for you and your team. So we allow you to train models on your data sets, on your code base, and give you specialized solutions that are aware of your team's vocabulary, so to speak. So imagine that I have this huge model that has been trained on the universe, but it knows nothing about Eran's code. It is just not aware of what I'm doing in my company, my team, and the practices that I'm using locally. And this is exactly what Tab9 lets me do. It lets me connect to my repo, train my own models.
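One way to picture the difference between "a huge model trained on the universe" and a team-tailored one is to mix statistics from a generic corpus with up-weighted statistics from your own repo. The bigram sketch below is deliberately crude and purely illustrative; the corpora, the `open_config` helper, and the weighting scheme are all invented, and bear no relation to how Tab9 actually trains models:

```python
from collections import Counter, defaultdict

def train(model, corpus, weight=1):
    """Add bigram counts from a corpus into a shared model; a higher
    weight lets this corpus's idioms outvote the others."""
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += weight
    return model

def next_token(model, word):
    """Most frequent token observed after `word`, or None."""
    counts = model.get(word)
    return counts.most_common(1)[0][0] if counts else None

model = defaultdict(Counter)
# "Universe" corpus: the generic way everyone opens a file.
train(model, "with open ( path ) as f :")
# Team repo, weighted up: this (hypothetical) codebase always
# goes through a config helper instead of a bare open().
train(model, "with open_config ( path ) as f :", weight=5)

print(next_token(model, "with"))  # → open_config
```

The point of the toy: after mixing, the team corpus's idiom outvotes the generic one, so the suggested token after `with` becomes the team helper rather than the global default.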
Starting point is 00:31:09 And these models are tailored for my setting and for my practices, basically. And this is the differentiation that Tab9 has right now. And obviously, we're building much more functionality in other facets of the development lifecycle to complete the picture. So Tab9 is really, it's not a code completion tool. It is the single source of truth for how to write code in your organization. And it's an active single source of truth, in that it learns from your code and then helps all the developers on your team kind of align and be better and write better code for your particular setting. And so that's what Tab9 does. But don't say too much about what you're doing, because I can imagine someone
Starting point is 00:31:55 from GitHub listening to this thinking, okay, this is what we're going to do next. I'm kidding, I'm kidding. I mean, you can't keep your secrets, right? You have to put your stuff out there. I'm just kidding around, mostly. Yeah, putting your stuff out there. And, you know, there are plenty of smart people. I'm sure they don't need us to figure out potential next steps, right? And so you never know. You never know. Well, I think the couple of downsides I would say to Copilot, having not used it either.
Starting point is 00:32:20 I've seen many people talk about it. And Jared, I think you said you had some anecdotes to share. I think that the thing with Copilot, where it's at, is that it's GitHub, one. It's inside of Visual Studio Code. So there's some downsides there, where it's sort of like in its box. And I know that even the same thing we had with the conversation with Cory Wilkerson on Codespaces is that they have bigger plans to support beyond VS Code, that this is their starting point. I get that. So maybe at some point they will have the differentiation that you currently have. Copilot, I think, is available in Vim now, according to GitHub Universe. Yeah, it's going to be available on more platforms, I'm sure.
Starting point is 00:32:58 And that might just be enough. Yeah, I think that VS Code is another advantage because they have built-in distribution. It's like Slack versus Microsoft Teams. Why are all these companies on Microsoft Teams when Slack is clearly the market leader in all this? And it's like, well, because it just comes for free as part of this other thing that we have. And VS Code is kind of a de facto installation engine,
Starting point is 00:33:21 whereas Tab9, you have to go get it. And y'all are available everywhere, which is awesome. Like all the common editors, all the IDEs, it seems like for the most part, you're available there. But you do have to go and want it and install it and go from there. Whereas like just having it bundled, it's the old Microsoft bundling thing. I mean, it's going to be a force, I think. But, you know, it's not like Slack is struggling. They're doing all right.
Starting point is 00:33:43 I mean, they got bought by Salesforce for whatever reason. Slack's killing it. People still use Slack, and I'm sure people are still going to use Tab9. Yeah. There's probably a market for both. Yeah. I think that we'll see that play out.
Starting point is 00:33:54 I don't think you have to choose, but I do think there's an advantage for sure. I'm sure that you're actually going to see more competitors in this space. It's not going to remain Microsoft and Tab9 forever. Now that the category is established, I'm sure that more people would like a slice of that pie. Let's say this then. So you started in 2014 and then in 2016 you got funding. You're at Series A right now, probably approaching Series B considering just the timeline. I'm imagining in the last year
Starting point is 00:34:20 you've got something happening in that space. You could tease it if you want to, but whatever. How do you stay financially competitive? How do you win customers? How are you working in that space? How have you evolved? And how are you planning for this future? Because you're going to have more competitors. You already have a big gorilla in the room, and you can assume more coming.
Starting point is 00:34:39 Sure. How are you preparing for that? Tab9 is, again, a developer-first company. And so our growth, and also our financial growth, is based on bottom-up motion. So we basically have developers love us. They onboard their teams. They train their own models on their code, get even more value from Tab9, stay with Tab9, and bring more teams from the organization until there's a critical mass.
Starting point is 00:35:09 And then we basically expand to the entire, let's say department in the org. So that's kind of the natural motion of tab nine. I believe this is the way it's going to be for a while now. That's the natural order of things. So you make developers happy and they continue to showcase how it's useful to them, their team, other teams see it, get adopted. Yeah, that's the dynamics that we're seeing. And it's all based on the developer love.
Starting point is 00:35:38 Do you have a lot of outreach to those developing teams? Do you have team managers or people inside Tab9 who maintain relationships? No, none. Almost none. Yeah, Tab9 is developer-centric, almost to a fault, in a sense. We have zero salespeople. We have almost no top-down sales at all. And everything is happening organically, from the developers up. Is that right? Yeah, we have some community on, you know, Discord, and the usual thing that you have around
Starting point is 00:36:13 community, but it's mostly actually a support channel above all. Yeah, that's great. What's growth been like, then, in the last year? So let's say Copilot came out what, roughly a year ago? Nine months ago? What's the time frame? I don't even know. Time's weird these days. Yeah, it's been trickling to more and more users. Growth has been very strong. Very good. You're happy then. You're smiling. You're not frowning.
Starting point is 00:36:36 No, definitely not frowning. And actually, as things go, Copilot creates awareness. So even if it takes some piece of the pie, the overall size of the pie increases. And so Tab9 is actually happy. So one of the major differentiators comes down to really the ability to synthesize, right?
Starting point is 00:36:53 Or the ability to generate the code and have it be useful more often than not, or more often than my shallow experience with Copilot or whatever. Because, you know, developers are kind of fickle, and we can have one bad generation and you're like, ah, heck with this tool, I'm switching to Tab9. But then you show up and you've got to have some good generations. So it seems like the models at play
Starting point is 00:37:15 and the data are going to be important to differentiation. Also, the way that you go about deploying and everything is important as well. So the free Tab9 that you can just install and use is based on a public data set, right? So it has the open source thing that Copilot also uses, which is OpenAI's... Yeah, Codex. That's their thing. You have your own thing. They're both trained on data sets, correct? How do those data sets differ? Do you know how they're doing it?
Starting point is 00:37:42 Yeah, I think I have a pretty good idea. So I think maybe one of the differences is that Tab9 has only been trained on source code with permissive open source licenses. So when we started, we were not sure what the legal implications were going to be of training on code that is GPL or not. And we said, let's just not worry about that at all. Let's exclude anything that is not a permissive open source license. I don't want to worry about the ethical implications or the legal implications of training on GPL. And let's keep it clean.
Starting point is 00:38:22 I think in retrospect, that has been a pretty solid decision. I'm sure that a lot of people don't care, but some people do. And as developers, I guess we do. Yeah. So those licenses are MIT, Apache 2, BSD 2-Clause, and BSD 3-Clause. So those are the four licenses, the permissive open source public licenses that you leverage in terms of where you can source code from. Yeah, that's right. Again, the beautiful thing about Tab9 is that you're always in control. Like if you're a developer
Starting point is 00:38:55 and you feel that you need a model that does something else, you want to train your own model for your own use using some other data sets, then we facilitate that. You can do that. You can train on your project. You can train on some other code and build your own model. So again, you're in control. You control what is the data set.
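The permissive-license filtering Eran describes (keeping only MIT, Apache 2, BSD 2-Clause, and BSD 3-Clause code in the training set) could be sketched roughly like this. The repo objects and `license` field are invented for illustration; a real pipeline would read SPDX identifiers from repository metadata.

```javascript
// Hypothetical sketch: keep only repos whose license is in the permissive
// set mentioned in the conversation. Repo objects are invented examples.
const PERMISSIVE = new Set(["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"]);

function permissiveOnly(repos) {
  // Exclude anything without a clearly permissive SPDX identifier,
  // including GPL-family licenses and unknown/missing licenses.
  return repos.filter((repo) => PERMISSIVE.has(repo.license));
}

// Example corpus (hypothetical)
const corpus = [
  { name: "left-pad", license: "MIT" },
  { name: "linux", license: "GPL-2.0-only" },
  { name: "some-lib", license: "Apache-2.0" },
  { name: "mystery", license: null },
];

console.log(permissiveOnly(corpus).map((r) => r.name)); // → [ 'left-pad', 'some-lib' ]
```

Note that this errs on the side of exclusion: a repo with no detectable license is dropped, which matches the "let's keep it clean" stance described above.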
Starting point is 00:39:16 You control where inference runs on your machine, in the cloud, in your cloud, in public cloud, wherever. So you control all the kind of the mechanics from the source up to the inference in tab nine. So let's say I got myself a copy of, somehow the Windows source code was leaked and I happen to have it. Could I train at my own will?
Starting point is 00:39:39 Obviously I'm breaking laws and all these things if I want to, but is it my choice to do that? Is that what you're saying? Like if I have the source code? Yeah, if you have the source code, you can train. I wouldn't know that this is the Windows source code, right? We don't know what our customers... Of course not. I'm just saying. Yeah, we don't know what our customers are training on. So you could, in effect, do that. I was trying to go as far as I could toward potentially offending, but obviously as a hypothetical, not a realization. But if I had a copy of source code that was not mine, whatever, I could train on any code I have, essentially. Yeah, you could, and we
Starting point is 00:40:12 have no way of knowing what you're training on, and we actually don't want to know. But the beautiful thing is that, assuming you've trained on something of value to you and obtained it legally, now the model is able to generate code that follows the practices it's seen in that code base. And that's super important, again, because you're getting a model that is specialized for your kind of ecosystem, for your own way of working and the technology that you work in. So I guess one advantage of being David in the David and Goliath story, besides the fact that David wins, so you got that going for you.
Starting point is 00:40:51 But like these things will probably play out in courts of law at some point. I mean, you're avoiding the whole GPL thing entirely by excluding it. But I mean, if anyone's going to get sued, right, it's going to be Microsoft, I think, or GitHub. And so you can at least sit on the sideline and see what happens in court and decide what you do. I don't think so.
Starting point is 00:41:09 I don't think there are going to be any lawsuits here. Maybe, I don't know. I'm not a lawyer, right? So I have absolutely zero legal training. I don't know that this is something that David cares about deeply. David cares about generating value to developers in a way that respects developers. And this is what David is doing in this story. Right. Well, I guess my point wasn't them getting sued. My point is, if it plays out so that it's totally fine to use GPL code in your model. I mean, you guys are limiting quite a bit
Starting point is 00:41:44 the amount of data that you're using. Correct. So you would think, you know, Joe Blow over here watching from the sidelines thinks, I bet OpenAI's Codex is better out of the box than Tab9, because they have way more data. Now, once you can start customizing it and training your own models and stuff, maybe not, but like that first user experience... And maybe that's not the case. The universal model, again, there are so many trade-offs at play here: the size of the model, where inference runs, whether it's Tab9 cloud or Tab9 local. So I wouldn't go into the breakdown of this whole thing, right? But let me just say, there's enough data that comes with a clear permissive open source license.
Starting point is 00:42:25 Sure, but like you're missing the Linux kernel, you know, like there's so much knowledge inside the Linux kernel. Wouldn't you want to be able to use that? I would like to be able to use it, but I restrict myself because the trade-off, I think, is still that I have sufficient amounts of code with the permissive licenses. So even if it came to where it was no big deal, all open source code, all licenses that are open source, you would leave it with permissive. At this point, I think that's the right choice.
Starting point is 00:42:49 If we know down the road, we're convinced that we're not infringing on people's work in a way that offends them, in a sense, that is unethical to them, we may change that decision. It also depends on the granularity of the predictions that you're making. If what you're predicting is a line of code, then I think this is less of an issue, right? Because I trained a model, it predicts like, you know, 10 words now. Is there really a meaning that these 10 words were trained on the Linux kernel?
Starting point is 00:43:18 I'm not sure. But if I give you a snippet of 10 lines that comes verbatim from the Linux kernel, then I think it's easy to make the case that this is not respecting what you intended when you licensed your code in a certain way, right? We're really not focused on the legal-ethical space. I think you can spend your life just pondering these questions. We're focused on delivering value and doing that cleanly. Tab9 works on line completions and snippet completions, so it can work at both levels.
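The line-versus-snippet distinction Eran draws can be made concrete with a toy overlap check. This is a hypothetical sketch, not anything Tabnine is described as doing: flag a completion only when a long verbatim run of its tokens also appears in a known source, so a 10-word line rarely trips it but a wholesale copy does.

```javascript
// Hypothetical sketch: flag a generated completion when it shares a long
// verbatim token run with a known source text.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean);
}

function hasVerbatimRun(completion, source, runLength = 12) {
  const src = tokenize(source).join(" ");
  const toks = tokenize(completion);
  // Slide a window of `runLength` tokens over the completion and look for
  // an exact match inside the whitespace-normalized source.
  for (let i = 0; i + runLength <= toks.length; i++) {
    if (src.includes(toks.slice(i, i + runLength).join(" "))) return true;
  }
  return false;
}

// Toy demo: a short line passes, a verbatim copy is flagged.
const kernelish = "for ( i = 0 ; i < n ; i ++ ) { sum += buf [ i ] ; } return sum ;";
console.log(hasVerbatimRun("sum += buf [ i ] ;", kernelish)); // → false (too short to matter)
console.log(hasVerbatimRun(kernelish, kernelish));            // → true (verbatim snippet)
```

The `runLength` threshold is an invented knob; the point is only that "10 words" and "10 lines verbatim" sit on opposite sides of any such threshold.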
Starting point is 00:44:00 We find that most developers actually get most of the value from the shorter completions, and the reason is exactly the tight loop that I mentioned earlier. The developer wants to say, tab three, tab three, yeah, accept, accept, accept. And they want to be able to make the snap judgment that what tab nine gave them is actually what they meant. And so when you give me a line as a developer, it's easy for me to make the snap judgment. Yeah, that's the line I wanted. If you give me 30 lines, that's like, oh, wait, you just made me read a bunch of code that was written by a machine and can have subtle bugs somewhere. And now as a developer, you made me do the thing that I hate the most, which is read
Starting point is 00:44:39 other people's code, right? There is definitely a question of the human interaction, the granularity of completion that is right for human consumption. So let's say I write a function name. I'm in JavaScript and I write function uploadToS3, and I hit tab. Is Tab9 going to generate the code that gets that done, or is it going to give me var bucket equals some bucket? It's just going to give you the next line? Hopefully it's going to give you const, I guess,
Starting point is 00:45:08 and not var or let or whatever it is that it likes. Sorry, I'm old school still. He's linting you on the fly here, Jared. I'm pseudocoding over here. Come on, help me out. Yeah, so Tab9 will likely, even if it knows the entire snippet, unroll it for you line by line.
Starting point is 00:45:26 So even if internally the model has predicted the entire snippet, it would in fact unroll it for you line by line, for two reasons. One, it gives you more control, and basically you can choose; it gives you more than one option every line. And there are subtleties in these snippets, right? Like, connect to S3 bucket... wait, there's a timeout, there's a default, there's this, there's that. There are many nuances to this snippet that, if you're just going to accept it as a whole, you may be missing out on something, right? And so there is a sense of walking together with you through the snippet
Starting point is 00:46:02 line by line, saying, oh yeah, yeah, yeah, no, I want the timeout. Oh, no, I want some other default value. Oh, yes, do some error handling if you don't find it. And so this guided, let's call it guided walkthrough of the snippet is valuable to you as a developer. You actually learn from it. You become better than just copy-pasting an entire snippet. Yeah, this is
Starting point is 00:46:26 the Stack Overflow generation. I like that. And you can actually see this in action at tab9.com slash, I believe it's slash pro, where, and I'm not trying to advertise anybody to go sign up, but you can try it if you want, there's a great video there that showcases Tab9 Free versus Tab9 Pro, and you can kind of see this in real time. It begins with a to-do comment, which as any developer you might often put, forward slash forward slash TODO, all caps, and you start writing out what you might want to do, and it goes through essentially what you just said there, which in this case, the use case wasn't the S3 bucket, but it was exporting defaults and setting consts and stuff like that. I believe it's JavaScript and HTML, just by looking at the code sample.
Starting point is 00:47:13 But it's walking you through all the process and it's showing you, OK, each little piece. The function is a user panel. Is it dot react element? All those good different things. It's going through each little piece versus spewing code at you. It's a little bit by little bit as you might do on your own. And that's interesting because it does help you learn things. I might learn new things about a different function I haven't used before because if this model is trained on my code base, well, I might write a new page or new something in our application
Starting point is 00:47:43 that Jared has done several times, and maybe I'm new to that piece, and I can leverage all of his experience by way of machine learning through our own application. And I can step through those things, and I'm not necessarily cheating. I'm leveraging our current people power, just Jared's work on our code base. And I can leverage that through tab completion with Tab9. I think it's very, very interesting. I think this is definitely interesting in comparison to just open source at large, training on an open-source-at-large model that gives me what the consensus has chosen as good
Starting point is 00:48:19 or bad, potentially, because there's buggy code that gets generated, like you said. But it lets me leverage our own code base, which I don't think Copilot is doing currently. And maybe it's in their future, but it's not there now. I think that's a very differentiating thing, to learn on my own code base and base my tab completion on that. And we've seen, you asked earlier about trust in Tab9, and we're seeing people trust us with code all the time.
Starting point is 00:48:44 So huge numbers of teams and companies are basically sharing their code with Tab9 to train models on. Obviously, we don't store this code anywhere; we don't want to. We train the model and the code goes away, right? But they are trusting us with this code.
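The guided, line-by-line unrolling described above, where the model may have predicted a whole snippet but the editor surfaces it one accepted line at a time, can be caricatured with a generator. The snippet content and helper names are invented for illustration; this is not real Tabnine output.

```javascript
// Hypothetical sketch: the model predicts a full snippet up front, but the
// editor hands it to the developer one line per Tab press, so each choice
// (bucket, timeout, error handling) can be vetted before accepting the next.
function* unrollSnippet(lines) {
  for (const line of lines) {
    yield line; // the caller "accepts" a line by pulling the next value
  }
}

// Invented completion for `function uploadToS3(key, body)`; illustration only.
const predicted = [
  "const bucket = process.env.S3_BUCKET;",
  "const timeoutMs = 30000; // a default the developer may veto",
  "const client = makeClient({ timeout: timeoutMs });",
  "return client.put(bucket, key, body);",
];

const accepted = [];
for (const line of unrollSnippet(predicted)) {
  accepted.push(line); // in an editor, each iteration is one accept
}
console.log(accepted.length); // → 4
```

In a real editor the loop body would be interactive, letting the developer accept, edit, or pick an alternative at each line, which is the "walking together through the snippet" described in the conversation.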
Starting point is 00:49:47 And these runbooks can create a Slack incident channel, notify particular team members, create tickets, schedule a Zoom meeting, execute a script or send a web hook. Here's how it works. Your app goes down, an alert gets sent to a specific Slack channel, which can then be turned into an incident.
Starting point is 00:50:01 That would trigger a workflow you've created already in a runbook, a pinned message inside Slack which will show off all the details, the JIRA or Clubhouse ticket, the Zoom meeting, and all of this is contained in your dedicated incident channel that everyone on the team pays attention to. Now you're spending less time thinking about what to do next, and you're getting to work actually resolving the issue faster. What would normally be manual tickets across the entire spectrum of responding to an incident can now be automated in every single way
Starting point is 00:50:28 with Fire Hydrant. And here's the best part. You can try it free for 14 days. You get access to every single feature, no credit card required at all. That way you can prove to yourself and your team that this works for you. Get started at firehydrant.io.
Starting point is 00:50:42 Again, firehydrant.io. This episode is brought to you by Sourcegraph. Sourcegraph is universal code search that lets you move fast, even in big code bases. Here's CTO and co-founder Beyang Liu explaining the problems that Sourcegraph solves for software teams. Yeah, so at a high level, the problems that Sourcegraph solves, it's this problem of, for any given developer, there's kind of two types of code in the world, roughly speaking. There's the code that you wrote and understand like the back of your hand. And then there's the code that some idiot out there wrote.
Starting point is 00:51:15 Or, you know, alternatively, if you don't like the term idiot, it's the code that some inscrutable genius wrote and that you're trying to understand. And oftentimes that inscrutable genius is like you from, you know, a year ago and you're going back and trying to make heads or tails of what's going on. And really Sourcegraph is about making that code that some idiot or inscrutable genius wrote feel more like the code that you wrote and understand kind of intuitively. It's all about helping you grok all the code that's out there, all the code that's in your organization, all the code that is relevant to you in open source, all the code that you need to understand in order to do your job, which is to build the feature, write the new code, fix the bug, etc. All right, learn how Sourcegraph can help your team at
Starting point is 00:51:58 info.sourcegraph.com slash changelog. Again, info.sourcegraph.com. Eran, do you think this level of granularity is ideal for an AI assistant? Or do you think it's ideal for now? Like, is there a world in which I'm not writing my functions anymore? Or is this the best way to interact with this kind of a system? I think for idiomatic things, there is a world in which you will not write your functions anymore. But it'd have to be very idiomatic. In a sense, this is already happening with libraries, right? Like you don't write, you use the library instead of writing the function
Starting point is 00:52:51 because what the library has encapsulated is so repeatable and so common that now every language or library has a sort function because everybody knows what sort does. You don't have to redefine it. So you never write because everybody knows what sort does. You don't have to redefine it. So you never write the code for quick sort anymore. You don't need synthesis for that. So once something becomes repeatable enough, it just falls into the framework, right? It falls under the hood, basically. And so these functions are already not written. What is still being written and maybe can be avoided in the future is simple
Starting point is 00:53:26 compositions of these functions that are also kind of semi-repeatable. This is what Tab9, in effect, is doing right now. It's generating for you these ad hoc compositions that we're doing, like, you know, read from Kafka, call to Twilio, whatever. We're all connecting APIs all day, right? And so this is what Tab9 allows you to do, to generate the code that connects these. But to your question, unless it's very idiomatic, it's very hard to generate the entire function body, because again, you don't know what the intent is.
Starting point is 00:53:57 The function name is not sufficient for capturing all the nuances. So if you're working in a special domain, maybe there are defaults that the synthesizer can kind of understand to give the scaffolding of the function. But in general coding, there's just not enough information in the function name. If you have maybe a doc string
Starting point is 00:54:20 or some documentation in English that explains what the function is supposed to do, that may be helpful in giving you the first shot of the function, but you will still have to refine it, I think, again, because of all the nuances. So even if I give you the code, you will probably have to edit it a little bit to bring it to the final form. Again, remember that what we're all doing is kind of specification discovery by iterative refinement, right? So maybe you can get the synthesizer, you can get tab9 to generate the first shot of the function, but then you'll have to refine it on your own. And then there's really
Starting point is 00:54:56 a question which is kind of human psychology. What is easier for you? Generating it line by line, following some line of thought or getting an artifact that is 30 lines and trying to refactor it to what was your original intent. And this is really, as I said, the bottleneck here is the human. And it's interesting. I don't know. One of the things that you see with tab nine, for experienced users of tab nine, they start to change how they work to get even more from the tool. So they kind of get a feel of what tab nine does. And then, for example, they start writing kind of the building block functions before they compose them.
Starting point is 00:55:40 And then Tab9 is aware of them and can compose them. So the order in which you build things, whether it's bottom-up or top-down as you're writing your program, matters for Tab9 to contextualize on. Again, I'm not sure that this is clear; it's clear in my head, but it's maybe too abstract. Yeah, I followed you. Can we talk about how the developer interacts with, I would call it, probing Tab9 for something? Maybe one of the ways you do that is by writing code, and it starts to think about what you're going to write.
Starting point is 00:56:13 Or as you said, writing out a scaffolding. So it's aware of the whole entire current file you're working in and what maybe your intent is because it's learning on the fly, I assume. Or maybe similar to the way you have Slack commands, you could do like forward slash and then like some sort of command kind of thing where you can say like, okay, here I want to interface with a Kafka API.
Starting point is 00:56:33 And rather than like starting to write code, maybe you can like command into it. I'm just thinking that's where I think we should talk, because that's where I think developers will begin to really get the aha. They get the idea that you can learn on open source.
Starting point is 00:56:50 You can learn on your own source code, et cetera, et cetera. But how does the developer interface with Tab9 to become a better developer, probe it for information to complete on, et cetera? I think that's the interesting part, you know, how that works out. Like, how does that interface work now, and potentially in the future? Yeah, that's super interesting. And in fact, I always say that Tab9 is half an AI company and half a user interaction company, because half the challenge here is really... the model knows a lot.
Starting point is 00:57:21 The model knows much more than you can imagine, actually. And the question is, how do you expose these things without overwhelming the user? A lot of it is actually quite scary. If I show you what the model knows, you'll be like, holy, this is like, that's great. That's like crazy stuff. And so the question is, how do you engage with the user?
Starting point is 00:57:40 How do you, there's like also an important aspect of, I'm trying to put it tactfully, an illusion of control. So even if I walk you through something that is completely deterministic in a sense, the fact that I walk you through it gives you a sense
Starting point is 00:57:57 that you're still in charge, which is really important for the developer, right? It's like a logical thing for sure. When you make your own choices, you feel more sturdy in those choices because you were a part of making the choice. Yeah, exactly, exactly.
Starting point is 00:58:08 And even if I walked you deliberately through a very deterministic kind of path, the fact that you think that you've made certain choices is less offensive to you as a human than just consuming code generated by the machine, right? Huh, it's like holding your hand. Yeah, it's very important. It is really important, and this is part of the interaction model. And we tried a lot of interaction models over the years, including ones that had larger snippets.
Starting point is 00:58:36 We even had, like, click something and we'll show you the five snippets that you want on the side, kind of a sidebar thing. And we thought that this is, like, you know, when I saw that, I said, man, this is the most amazing thing, this is going to blow people's heads, it's amazing. And people hated it. The users hated it. And the reason they hated it is it was not in the flow.
Starting point is 00:59:12 between these pieces of code. And as a user, rather than accelerating me, it was like extra work for me. I came for the tool to get help and the tool is sending me to do like additional work, additional kind of vetting through kind of snippets. And so people really hated that. What you're saying there is cognitive load. If your model has so much information, I'm actually listening to this book right now. And I say listening because I listen to a lot
Starting point is 00:59:41 of books instead of reading. And it's pretty interesting. It's called We Are Legion, and in parentheses, We Are Bob. And I don't know if you've seen this, but it's a series. And essentially, I won't give the full plot away. It basically talks about AI. Bob is AI long term. I don't want to ruin the story for anybody, but it's pretty interesting how I'm listening to this book about AI. Bob used to be a human: cryogenically frozen, died, became AI. That's the basic premise of the book. I'm not plot-killing it by any means. So the book really now, the narrator is Bob, and
Starting point is 01:00:18 Bob is AI, and Bob makes more Bobs. So Bob isn't by himself. And as Bob clones himself, he thinks, because he's still human as AI, he has human tendencies, he thinks, if I clone myself, well, now I'm just making more Bobs. And he finds out that they're not really Bobs. They're just variations of what Bob was. And I think it's interesting, because this perspective of Bob in relation to, say, even Tab9... as this model knows so much, I think if the model could just have a feeling, I suppose, like, if only I could tell the human this, their program would be better. I kind of feel like there's that tension between that UX of the developer probing Tab9 for stuff based upon intent and awareness of their code
Starting point is 01:01:03 base and behind the scenes, tab nine is kind of Bob just wishing they can tell the human all they know about this model. Like I just imagine this tension. There's definitely, again, I say that tab nine is half an AI company and half a UX company
Starting point is 01:01:19 because a lot of the challenges are really in getting the humans to interact with the model in a way that accelerates them and does not slow them down. And so we found out through experimentation over the years that you can surface a lot of information from the model, information that is totally accurate and that can help you in principle, and still flood the developer with so much of it that you slow them down. And so we've optimized Tabnine to fit into the process that you're familiar with. If you just keep on typing and ignore what Tabnine does, it will never break
Starting point is 01:02:00 your flow, right? So we always optimize it so it does not break your flow: it suggests only at certain places that we think are useful for you as a developer, and it makes suggestions of the right length, a length we think you can process and that is useful to you. And this is a very iterative process of optimizing the cognitive load that Tabnine generates on the human, making sure that you're actually accelerating people and not slowing them down.
Starting point is 01:02:33 And this is really throttling, in a sense. It is throttling the model so as not to kill the human. It is really like strapping a rocket to the human, a rocket with such massive force that it would just tear the human apart, and you have to throttle it. Right, I love all the figurative language and imagery that comes about with this. No, but it is exactly that. You need to control the acceleration such that you make the human faster but don't destroy them in the process. Yeah, let's not destroy the humans. Yeah.
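The throttling Eran describes, surfacing only suggestions that are confident enough and short enough to help rather than overwhelm, can be sketched roughly like this. Everything here is hypothetical: the thresholds, names, and single-suggestion policy are illustrative, not Tabnine's actual logic.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    text: str
    confidence: float  # model's score for this completion, 0..1

# Hypothetical thresholds; a real product would tune these by experimentation.
MIN_CONFIDENCE = 0.6
MAX_LENGTH = 40  # characters a developer can process at a glance

def throttle(suggestions, min_confidence=MIN_CONFIDENCE, max_length=MAX_LENGTH):
    """Keep only suggestions likely to accelerate rather than slow the user."""
    kept = [s for s in suggestions
            if s.confidence >= min_confidence and len(s.text) <= max_length]
    # Show at most one suggestion so the flow is never broken.
    return kept[:1]

cands = [
    Suggestion("response.json()", 0.9),
    Suggestion("a_very_long_low_value_completion_nobody_actually_reads()", 0.95),
    Suggestion("resp", 0.3),
]
print(throttle(cands))  # only the short, confident candidate survives
```

The interesting design choice is that the filter happily discards accurate, high-confidence output (the long candidate) purely because of the cognitive load it would impose.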
Starting point is 01:03:10 I like that you're so focused on loyalty to developers. Jared asked the question before about, okay, well, Codex seems to be better because it's got more code samples it's learned from. You're optimizing for the best algorithm that can help the developer, versus the best knowledge base, so to speak, that you can autocomplete on. You're optimizing for the best tool, not so much the best model based upon the best data. You're not going to get him to agree to that. No, I'm not going to agree to that. I know he's not going to agree to that. Obviously not.
Starting point is 01:03:51 But I think it is important to keep the user in mind and optimize for the user. It often does mean that you need the better model and the better code base to do that. But you have to realize that the bottleneck to a lot of these processes is the human. It's important to acknowledge that. That's what I was trying to get back to. You're trying to help the human be the best they can be because they have the intent,
Starting point is 01:04:17 they have the human interaction with the team, they understand where the product is going. It's their vision. The model and the AI may be able to predict what code might come next, and that could be helpful, but it's the human who's driving, at least for now, driving the decisions on the next steps and what the product should do. Rather than something like: this model knows everything, so it can do it all. It still needs the human; the creation is still happening because of the human's desire to go in a certain direction and provide certain value to, hopefully, other humans. Maybe I'm writing code for a Roomba and it's not a human directly.
Starting point is 01:04:50 Maybe it's the Roomba I'm helping, and the Roomba helps the humans. So maybe it's by proxy, so to speak. But, you know, we're humans helping humans. That's right. So this seems like it adds another layer of complexity to what you're trying to do here. Focusing on the UX now, and less on the model, your product manifests itself in like 26 places, right? I don't know how many IDE integrations you have, but the actual user experience is spread out across all these products that you don't own.
Starting point is 01:05:21 And each of those products has its own idiosyncrasies, and the way people use them is different; they have their own feels. Yeah. Does that exacerbate the UX problem, where it's like, hey, how much am I going to show and when do I show it? But then also, how do I actually integrate with these? Or are those simply shells that are kind of left alone, and you're just modifying what gets sent to the shell? How does that play out for you guys in practice? That's a very interesting question. So definitely there are subtleties to different IDEs, or actually I should say editors, that behave differently,
Starting point is 01:05:56 and they do get different behavior from Tabnine. So Tabnine does behave differently in Vim and in VS Code, just by the nature, as you said, of the interaction that is expected in the editor. But the plugins themselves, by the way, are open source for all these IDEs. So the Tabnine VS Code plugin or Vim plugin is open source. Some of them are actually community plugins
Starting point is 01:06:23 for other editors, and you can write your own. They all connect to the Tabnine binary inference engine that runs on your machine and provides the completions that drive all of these IDE extensions and plugins, whose behavior may differ; they may be asking for different things from the underlying binary, or getting them differently. But the inference engine is shared between all of those IDEs. So definitely, though, there are subtleties of the UX per particular editor. So when you have an aha moment as a designer of this product, you and your team, like you went back to the point
Starting point is 01:07:06 where it was like showing you five things on the side, and that became homework, or it wasn't empowering. When you decide, you know what, we're going to do it this way now, do you have to go out and touch 26, I'm just estimating based on your grid of 25 or so, editor integrations in order to roll that idea out? So some of the, again,
Starting point is 01:07:26 a lot of the infrastructure is shared between the IDEs. Like the Sublime plugin (I use mostly Sublime; Sublime is my tool of choice), the Sublime plugin itself is like 400 lines of code. Most of the heavy lifting is done by the Tabnine engine and not by the plugin. And so... I see.
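The architecture being described, many thin editor plugins all talking to one local inference engine, can be sketched in miniature. Everything below is hypothetical: the JSON wire format and the function names are invented for illustration, and Tabnine's real binary defines its own protocol. The point is only that the plugin side stays small because serializing requests is nearly all it does.

```python
import json

def engine_handle(raw_request: str) -> str:
    """Stand-in for the local inference binary: JSON in, JSON out."""
    req = json.loads(raw_request)
    prefix = req["prefix"]
    # A real engine would run ML inference here; we fake one completion.
    completions = [prefix + "_completed"] if prefix else []
    return json.dumps({"completions": completions})

def plugin_request(prefix: str, filename: str) -> list:
    """The few-hundred-line editor plugin: serialize, send, deserialize."""
    raw = json.dumps({"prefix": prefix, "filename": filename})
    return json.loads(engine_handle(raw))["completions"]

# Two different "editors" reuse the same engine logic.
print(plugin_request("foo", "main.py"))  # ['foo_completed']
print(plugin_request("bar", "lib.rs"))   # ['bar_completed']
```

In a real deployment the `engine_handle` call would be a round trip to a separate process over stdio or a local socket, which is what lets one engine upgrade serve every editor at once.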
Starting point is 01:07:46 So the engine is shared, but it's local. Yeah, you do have to touch the plugins, but it's not like massive changes, right? Some of the more sophisticated UIs are provided by the engine itself and not in the editor. But again, these are subtleties. Right. So what's the engine written in, and how do you guys roll that out?
Starting point is 01:08:09 I'm getting into the weeds here. Yeah, the engine is amazing. It's like, I don't know, at this point it's like half a million lines probably of Rust, highly optimized Rust code. I think we're probably one of the biggest Rust shops in Israel for sure. And I'm not sure that we're actually not one of the most sophisticated ones
Starting point is 01:08:28 in the world in Rust, because we're doing a specialized ML inference engine in Rust. And we're doing a lot of the other Tabnine infrastructure in Rust, even stuff that does not necessarily belong in Rust. That's just the way we roll, right? You must really like Rust. Yeah, we do. Not the best tool for the job, but we're going to use it anyway.
Starting point is 01:08:51 You know, once you get used to it, it's a great language, a great ecosystem. It also solves a lot of the cross-platform problems that you typically have when you run something as sophisticated as an inference engine, cross OS, cross architecture, right? And so this is great for us. That is super cool. So does any of that stuff leak its way back out into the open source world at all? Any of your byproducts make their way out to the Rust community?
Starting point is 01:09:17 I don't think that we've released... at least up to this point, we have not been releasing a lot of code to open source. Not by choice, just by being busy with the core thing and, you know, not cleaning things up enough to be useful to others yet. So I think that has been the limiting factor, not any strategic decision to keep things closed or otherwise, right? It's just not useful for others at this point. What's the size of the company
Starting point is 01:09:50 currently like in engineering staff just to give a picture? So I think we're what is the number now it's growing all the time so we're around 40 people now and most of it is still engineering so that's the size.
Starting point is 01:10:06 And yeah, a lot of Rust, some JavaScript, some Python on the ML pipelines. But yeah. Well, we have a lot of listeners who use and love Rust. So if you are hiring, I'm not sure if you are, but this would be a great time to say that because you probably have some interested ears. Yeah, we're always hiring
Starting point is 01:10:26 We love Rust, and we've onboarded a few non-Rust people to the team, right? And Rust has a learning curve, which I'm sure people can appreciate when they move in from other languages, but again, once you get acquainted, it's amazing. So we love Rust. We are hiring. I don't know that we will hire to the core engineering team outside of Israel, right? That's always kind of a tension for us. But again, for the right candidates,
Starting point is 01:10:57 we are always flexible. Awesome. Anything else that we didn't ask you that's on your mind, that you want to make sure we touch on before we call the show? I don't know if we touched on it or not, but I think it's really important for people to understand that AI-assisted development is here to stay. For me, as someone who's been around in this area and a believer in AI-assisted development for many years now,
Starting point is 01:11:21 it's super exciting to see these things unfold from, really, hallucinations, right? Pipe dreams, into stuff that works in production and that people are using every day. And I think this is just going to expand to more and more stages of the development lifecycle. Tabnine definitely started in code completion because that's the high-frequency surface: you are with the developer all day in the IDE, and you can really build a relationship and accelerate development. But we are definitely looking at other stages of the development lifecycle, injecting AI there as well and accelerating those stages too. And I think this is going to grow much further beyond what we're seeing right now. This is just the beginning of the AI-assisted development revolution.
Starting point is 01:12:14 And I think it's a really exciting time to work in this space. Given that you have that opinion, given all your work in the field, what could developers do differently today and in the coming year or two, knowing that AI-assisted development is, in your words, here to stay? What are some actions individual developers or teams could take, aside from just using it? How can development teams get prepared for this future, or change their mindset? Yeah, it's a really great question. I don't know. I think one thing to keep in mind is basically the data sets.
Starting point is 01:12:57 What would you train on if you could train AI models on your code? Like, what would it be? How do I keep at least part of the code ready to be served as training data, right? Because what we're seeing with customers is that it's actually quite tricky. If you go to a large enough organization and they say, train on my code, then there's a lot of discussion in terms of say like,
Starting point is 01:13:21 but wait, actually don't train on project X, because that's some legacy thing; God forbid we should ever propagate that kind of knowledge anywhere. And you should actually train on Eran's code if you're doing Python, but on Adam's code if you're doing Rust. So there's a lot of subtlety in what you'd like to train on, and I think, moving forward, that's going to be a really interesting space: how people do the curation of what they would like to train on. We're already seeing early signs of that happening.
Starting point is 01:13:58 Like people asking us: can I exclude this guy? You know, he just joined like a month ago; whatever he's doing, I don't want to train on it. And so there are a lot of subtleties around that which I think are going to be interesting. Basically, how do you tend your garden of code in order to train the future code you want to write? Yeah, exactly. Exactly that. How do you tend your current garden of code?
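The kind of curation being described, choosing which parts of a codebase are eligible to become training data, could be expressed as a simple include/exclude policy. This is a hypothetical sketch: the config shape and patterns are invented for illustration, not a Tabnine feature. Note that Python's `fnmatch` treats `*` as matching any characters, including path separators, which is why these loose globs work here.

```python
from fnmatch import fnmatch

# Hypothetical curation config: which code an org allows into training data.
CURATION = {
    "exclude": ["legacy-*/**", "**/vendored/**"],  # never train on these
    "include": ["services/**", "libs/**"],         # only these are eligible
}

def eligible_for_training(path: str, config=CURATION) -> bool:
    """A file qualifies if it matches an include glob and no exclude glob."""
    if any(fnmatch(path, pat) for pat in config["exclude"]):
        return False
    return any(fnmatch(path, pat) for pat in config["include"])

print(eligible_for_training("services/api/handlers.py"))  # True
print(eligible_for_training("legacy-billing/core.py"))    # False
```

Exclusions winning over inclusions is the conservative default: a file that matches both a legacy pattern and a services pattern stays out of the training set.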
Starting point is 01:14:21 Yeah. Yeah. Interesting. Which, in a sense, you're supposed to do anyway when you're reducing technical debt, but nobody actually has time to do that, right? One more reason to do so, right?
Starting point is 01:14:32 One more reason to do so, yeah. Well, Eran, thank you for sharing that sentiment: AI-assisted development is here to stay. He shared a brief idea on how you can begin to prepare. But Eran, thank you so much for sharing this history of the project, and now the company, and this future that's imminent, basically. Thank you. Thank you very much, guys.
Starting point is 01:14:52 Thanks for having me. Lots of fun. All right, that's it for this episode of The Changelog. Thank you for tuning in. We have a bunch of podcasts for you at changelog.com that you should check out. Subscribe to the master feed at changelog.com slash master and get everything we ship in a single feed. And I want to personally invite you to join the community at changelog.com slash community. It's free to join. Come hang with us in Slack. There are no imposters and
Starting point is 01:15:20 everyone is welcome. Huge thanks again to our partners, Linode, Fastly, and LaunchDarkly. Also, thanks to Breakmaster Cylinder for making all of our awesome beats. That's it for this week. We'll see you next week. Thank you.
