Programming Throwdown - 127: AI for Code with Eran Yahav
Episode Date: February 14, 2022

Brief Summary: Programming is difficult as it is, but imagine how difficult it was without all the current tools, compilers, synthesizers, etc. that we have today. Eran Yahav, Chief Technology Officer at Tabnine, shares how AI is currently helping with code writing and how it could change in the future.

00:00:16 Introduction
00:00:51 Eran Yahav's programming background
00:08:11 Balance between Human and the Machine
00:11:49 Static Analysis
00:25:30 Average vs Tailored Tooling
00:29:42 Similarities in Programming Constructs
00:36:19 Machine Learning Quality Metrics
00:38:27 Rollbar
00:40:19 Model Training vs Statistic Matching
00:50:19 Developers Interacting with their Code in the Future
01:00:18 Tabnine
01:08:17 Farewells

Resources mentioned in this episode:

Companies:
Tabnine
Website: https://www.tabnine.com/
Twitter: https://twitter.com/Tabnine_
LinkedIn: https://www.linkedin.com/company/tabnine/

Social Media:
Eran Yahav, Chief Technology Officer at Tabnine
Twitter: https://twitter.com/yahave
LinkedIn: https://www.linkedin.com/in/eranyahav/

Sponsor:
Rollbar
Website: https://rollbar.com/
Freebies: https://try.rollbar.com/pt/

If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/
Reach out to us via email: programmingthrowdown@gmail.com
You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM
Join the discussion on our Discord
Help support Programming Throwdown through our Patreon
Transcript
Welcome to another episode of exciting,
deep programming talk with Programming Throwdown. Jason made me feel a little bit sad the other day.
He referenced something that we did like a decade ago related to Programming Throwdown. Then it
made me realize, oh, that's really good. We've been doing this podcast a long time. And then
I got really sad because I'm like, oh, it means I'm old. So no, I'm just kidding.
I enjoy doing it.
Jason, I know it's great.
Okay.
Anyways, that was off topic.
So we're here with Eran today.
Welcome to the show, Eran.
Tell us a little bit about yourself, how you got started in programming.
Yeah.
Thanks for having me, guys.
Yeah.
I think you started by saying you feel old, so I should really keep quiet on that front.
I mean, I started programming many, many years ago.
I think my first computer was the IBM PC, I think with a five-megabyte hard drive or something like that.
That was like the super duper.
You could never fill that up.
Yeah, exactly right.
And dial-up internet and stuff like that. That has been a long time ago. My first programming language was BASIC, if you guys even know what that is.
And so it's been a long time since then. I've been doing programming from a young age, I don't even know since when.
Did undergraduate at Technion CS, then did military service, mandatory in Israel, for six years.
What else?
PhD, computer science.
Went over to the US to do some work on program analysis,
program synthesis at IBM Research at the time,
TJ Watson Center in New York.
And then came back as faculty at the Technion, doing machine learning over code, program analysis, program synthesis, all these great things. And during that time, I got really deep into program synthesis as something that can transform how we write code in reality and not just in academia.
And for many, many years, I've been fascinated by the idea of programs that work on programs, compilers, synthesizers, debuggers, any program that operates on other programs.
I found that fascinating from a very young age.
And this is what I've been doing for a long, long time.
Awesome.
Well, that was a very sweet moment.
I got a lot of stuff, a lot of questions already.
So yeah, so starting in BASIC, I mean, I can relate to that.
And then I started in C.
And so now that makes me the guy who always has to write the
bit shift operations.
So, yeah, I mean, that's quite a sweeping thing.
So even in our intro, we were saying that you consider yourself sort of academic, or you're teaching.
I spent a lot of time in university.
I know that's similar to Jason.
Maybe just a bit before we dig into the other stuff.
Like, what are your thoughts for people who are debating, you know, sort of spending extra
time in school versus sort of entering the workforce?
What are your thoughts about that?
We get that question a lot.
Yeah, I think I'm quite biased, probably.
I think that going to school is great.
It really opens your mind and gives you a lot of taste and kind of experience and things
that you would never experience in the workforce, right?
Like in a daily job, just because the incentives and the objectives are very different.
So I spent four years in undergrad.
I think today many programs only require three years.
And I think that fourth year, in which I did, like, courses in networks and AI and, you know, whatever, natural language processing, things that, you know, I would never even know exist, let alone go into at the depth that you can go in undergrad, if I only met them while working at a company.
It's just impossible to get this kind of like wide perspective.
And I highly recommend for anyone to just spend this extra year
or extra couple of years getting exposed to things
because this really changes the way you think about problems.
It happens to me very frequently that I bump into some problems and say,
but wait, that was something that I actually met in computational biology.
There was an algorithm like that in computational biology.
And even if I don't remember the details, I kind of have the pointer in my head
and I can go and look for it and kind of refresh my knowledge about it.
So I think that has been like building this kind of reference in your head
of all the topics early
when you're young and things are
really big. It's easy to
get them recorded at
the infrastructure level of your brain.
I think that's super helpful
and it's an investment that
really
pays itself back very quickly.
Nice. Yeah, thanks for that input.
I mean, I think that's something that people debate.
I'm not sure there's a right answer,
and I think it varies person to person,
but I mean, I think those are really good observations.
And you said something which I think a lot about,
and I know Jason does too,
which is this word incentives.
And I think it's really interesting to view a lot of things,
I don't want to say in life, that sounds too deep,
but like a lot of things, at least in, you know,
interacting with people at work and thinking about incentives.
For me, that's been something that helps explain a lot.
And so as you point out, when you're in school,
I think your incentives are very different than when you're in the workplace.
And I think, like you said, the experiences you'll have,
even just thinking in that lens,
you can sort of understand how they may be starkly different.
I think it's important to understand that every endeavor at the end is a human endeavor.
So research is, and the workplace is. And people are, at the end, motivated by incentives; even if the incentives or the kind of target functions are very implicit, they are there.
And in the workplace,
you're supposed to get something done at the end, right?
A kind way of saying humans are greedy.
Yes.
No, no, it's not greedy.
I think it's not greedy.
It's natural, right?
That's true.
Greedy ascribes some sort of moral aspect to it, I guess.
Yeah, exactly.
All right.
So, you know, we talked about it in Jason's intro, he said that this is going to be AI for code.
So, I mean, I think talking about how code development works, and even in your sort of introduction talk, you talked about synthesis, analysis, programming.
So back when I was first writing QBasic and, you know, sort of inputting that, and like, line 10, goto 10, whatever, I wrote an infinite loop. Okay, anyways. You know, it was just an editor. The editor was very simple. I don't even recall it having syntax highlighting. It may have, I can't remember, it's been too long, those early days.
And we're starting to drop some of those words, so everyone kind of, I think, understands: you open vi, Emacs, notepad.exe, depending on your platform, and you start putting things that represent a program as text into that sheet. And that's sort of where everything starts.
And then sort of take us on how you think about,
you know, I think there's a lot more
than just editing the code,
but maybe starting there,
like for me at least,
well, maybe it starts before
and how you think about
what you want to put on the paper as it were,
or we want to put down.
But maybe take us through a little bit
like how you think about those early days
and how it was approached
so that we can sort of set up
how it's sort of undergoing some amount of transformation.
Yeah, I think it's all a question of balance between the human and the machine, in a sense.
In the early, early days, you had to work really hard to satisfy the requirements of the machine.
And the machine was kind of brutally unforgiving, right?
Like you could make one single mistake,
nothing told you that you made that mistake
and the whole thing would go awry when you run it, right?
And so over the years, people had this brilliant idea.
I think Fortran was the first,
that you can use the machine
to help the human deal with the machine,
which is, I think, the idea behind the compiler.
The early compiler is exactly that, like help the human write something that is slightly higher level.
The compiler will do some work to check it and maybe give you some useful errors.
And then the machine would run it for you. And so this is kind of
really the early idea of having a machine help the
human program it and making it more forgiving in a sense.
Yeah, I was right there. So maybe people don't realize or know. I mean, I wasn't exposed to it initially and then found out later. But originally, people would write literally the hex codes for assembly, to basically script the exact flipping of the gates inside the microprocessor. Oh, this is my background, I could go on here all day, so I'm not going to talk about this. But, you know, actually writing hex that is stored in an EPROM or whatever, and that the CPU actually executes, was step zero. The very first people had nothing else, so I guess that's what you kind of got to do, right?
Then people realized, oh, hey, hang on, we can use mnemonics, I believe the word is, right? Like, I can say this hex code is actually the word MOV, move, right? And why it wasn't longer, ask someone else. But like MOV, you know, branch if equal, BEQ, branch if not equal, BNE, right? All this stuff, and moving the registers, and still thinking about the physicality of the computer.
Then the very early thing was a two-pass assembler, right? So I could use labels. I could use a label for a line number or an address and use it somewhere else, and the assembler would scan through once, pick up all your labels, figure out where they go, and then scan through again. Super, super low level.
And so what you're talking about with Fortran, this idea that you don't need to write one-to-one assembly instructions, was, yeah, I think I agree, a very early breakthrough in how to get the machine to do the work that you really didn't need to be doing.
Yeah, I think since then, there's been a lot more kind of development
in how the machine can help you.
You mentioned syntax highlighting earlier,
but also all sorts of more sophisticated compilers
that can give you really deep error messages
on what you did wrong.
And also linters and static analysis
that can point out common
errors and maybe even suggest corrections and all these things that make our lives easier.
But at the end, at the very end, programming is still extremely unforgiving, right?
If you compare it to having a conversation in natural language, it's still like you can spend the whole day, you know, just because you flipped two parameters to a function
or just forgot the semicolon somewhere, right?
And this is like extremely unforgiving,
even today.
And it's been a while since Fortran, yeah.
Oh, that's true.
Okay, so I think people probably know
like what syntax highlighting is,
and if they've had any experience, they probably kind of get that.
So static analysis, though,
maybe can you explain what static analysis is?
Yeah, in essence, it's really kind of an expansion of the compiler to check properties that are more than maybe just type checking, or the standard things that the compiler checks, to more sophisticated things. Maybe properties like, I know your program does not divide by zero, or you don't have a null dereference, or no overflow, and properties of that nature. And all the way to full program verification, which is checking that your program satisfies kind of a functional, logical specification. You know, that a given function that is supposed to sort an array actually returns an array that is sorted, and you check that statically, meaning without running the program, right?
So the static part here means that I'm not going to run the program; I'm going to statically check it at compile time and give you a result.
And now some of the listeners may scratch their head and say, wait, this sounds like something that is undecidable. Because if they've heard about the halting problem, they'd say, wait, can you check that the program terminates? Probably you cannot.
And so indeed the problem is undecidable, but you can solve approximations of this problem. And typically the way that static analysis works is that it gives you conservative errors, or conservative reports, meaning that if it says that there is no error, then it's guaranteed to be correct. But it may give you false alarms, saying, hey, your program may divide by zero, when actually it doesn't.
And so that has been quite useful at the end.
But it is also often quite frustrating for developers
because they chase down all sorts of reports
that turn out to be false alarms.
And that can be really frustrating.
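To make the "conservative report" idea concrete, here is a toy sketch (entirely invented for illustration, not from any real analyzer): a checker that abstracts every value to just "zero", "nonzero", or "unknown", and warns on any division it cannot prove safe. That over-approximation is exactly where sound-but-imprecise tools produce false alarms.

```python
# Minimal sketch of a conservative "may divide by zero" check.
# The abstraction tracks only three facts about a value:
# "zero", "nonzero", or "unknown" (could be either).

def abstract_value(expr, env):
    """Evaluate an expression to an abstract value."""
    if isinstance(expr, int):
        return "zero" if expr == 0 else "nonzero"
    return env.get(expr, "unknown")

def check_division(numerator, denominator, env):
    """Return a (possibly false) alarm if the denominator may be zero."""
    d = abstract_value(denominator, env)
    if d == "nonzero":
        return None  # proven safe: never a missed error
    return f"warning: {denominator!r} may be zero"

# Sound result: dividing by a nonzero literal is proven safe.
assert check_division("x", 2, {}) is None

# False alarm: suppose `y` was assigned len(items) + 1 and can
# never be zero, but the abstraction only recorded "unknown",
# so the checker complains anyway.
print(check_division("x", "y", {"y": "unknown"}))
```

The one-sided guarantee is the key property: "no warning" is trustworthy, but a warning is only a "may".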
So I think, yeah, I mean, so I've run static analysis before. And I mean, I think the compiler does, you know, some form of static analysis, and even now the better, but separate, tools often go into more depth. And this frustration you're expressing was sort of my experience, that you get a lot of stuff where it sort of just gets confused.
Which, already, we're sort of talking about the computer doing more, and talking about it getting confused or not understanding. And we're talking somewhat about how humans have some intent when they write their code, but then how they express it may not match, and then something else is trying to, like, understand it. So when we say the computer tries to understand what you're trying to do, or tries to check for a divide by zero, what are we actually meaning there?
So, yeah, without getting all academic here,
I think the right view is that the static analyzer, or the machine, let's call it, is constructing some abstraction of your program that it can reason about. And that abstraction does not always match your idea of what the program does. And this is where the confusion or the mismatch comes from.
And it's also the reason that sometimes or often
these tools cannot explain why they got what they got, right?
Or not in a way that you would understand it,
not in a way that would be useful for the human.
So the chain of reasoning that led them to the conclusion that you are amazed by or shocked by is not easily explainable to a human.
But you're spot on in the point of kind of understanding the intent and communicating intent to the machine.
And all these questions are really central, both for program analysis and program synthesis.
How do you know what is it that the human was trying to do?
I mean, I think like when I tried to explain to people who aren't in programming,
like what debugging or what these things are like, I just get blank stares. But I mean,
for people who have programmed a lot, I think a lot of people experience that where
you write some code, you have some intent, you run it. And sometimes it even turns out that actually the computer got it right and you got it wrong, that, you know, you coded it better than you, like, understood your own problem.
And I think for me, the interesting thing about,
you know, static analysis,
or even just compilers adding that
to their repertoire over the years
is like the number of things
you can hold in your head at one time.
So the interaction between functions and that,
you know, oh, hey, you're calling this thing over here
and it has this or that.
And did it really... well, I come from a C/C++ background. So like, you know, for me,
it'd be like, can this function ever return a null pointer? Well, if it's simple, maybe it can never
return a null pointer unless a null pointer was provided into it. You know, you can kind of start
keeping track of all of those things. And we know a computer is really great at doing repetitive things very quickly. And so holding all of those simple functions in its head, it can do those sorts of bounds checking,
like what is the biggest number that this could get? Is it an overflow or underflow?
And even if it can't tell you you're right or wrong, what I've seen be useful is that sort
of cooperation where it sort of come back and tells you like, hey, based on what I see here
right now, this is the range of values you can sort of expect out of this. And that's when
you sort of start to see, oh, wait, there's actually this like, iterative thing where
I'm doing something, the computer's telling me what it thinks I meant to do, I'm telling it that, you know, and you sort of do this. That happens to me, I guess, more now, as the error messages are getting better, and whatever, versus when I first started out,
what I would find myself doing is running, you know,
enormous blocks of code and, you know,
trying to run the program as a whole and just see if the right thing comes out
at the end.
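The range feedback described above, the tool reporting what values an expression can take without running anything, can be sketched with toy interval arithmetic (the functions here are invented for illustration, not any particular analyzer's API): track each value as a (lo, hi) pair and propagate bounds through arithmetic.

```python
# Toy interval analysis: represent each value as a (lo, hi) range
# and propagate ranges through arithmetic, so a tool can report
# the possible output range of an expression statically.

def add(a, b):
    """Interval addition: lows add, highs add."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """Interval multiplication: extremes come from the corner products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

# Suppose the analyzer knows x came from a byte (0..255) and
# y is a percentage (0..100). What range can x * y + 10 take?
x = (0, 255)
y = (0, 100)
result = add(mul(x, y), (10, 10))
print(result)  # (10, 25510)
```

This is the kind of answer that lets the tool say "based on what I see here, this is the range of values to expect", even when it cannot say whether that range is right or wrong.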
Absolutely.
So that's exactly the progress that we're seeing this tight loop between the
human and the machine, that the human is doing something.
The machine is like giving you feedback. And now, essentially, over the years,
we are getting closer and closer
to working together with the machine
when you write the program.
What you described effectively
is the iterative discovery of the specification.
When you start programming, you have kind of a loose idea of what it should do, but you don't have all the edge cases and all the subtleties, because, you know, you start with a high-level thought, a high-level spec. And as you write the code, you start to discover and unravel kind of the low-level details and the hidden complexity that is in there.
And what you described is the machine kind of helping you unravel that
as you make that progress.
And I think this is also where synthesis comes in,
which is the idea that if you express your intent clearly enough,
the machine can actually predict what is it that you are trying to do
from your intent and from context and
complete the code or complete the thought for you in a way that also kind of prevents
you from falling into the standard or like the common pitfalls around that area.
Yeah, that makes sense.
I think one of the things that is really, I think, where program synthesis becomes important is in figuring out all of the assumptions that matter and the ones that don't. Like, the tool complains that somebody could pass in the biggest possible integer ever, and now your program will not work or will overflow.
And it's like, okay, yes, that could have happened, but it's not useful, right?
And so it becomes, as you said, this symbiotic thing.
And so I think one of the big challenges is how to know what sort of errors are useful
for people.
That seems to be kind of where all the magic has to happen.
Right.
And I think this is exactly why I got excited by program synthesis
as opposed to kind of the static analysis tools.
So I've worked for, I don't know, several, many years
on static analysis tools.
And really, as you said, the symbiotic thing
with static analysis tools is quite tricky
because you have this thing that is always complaining and complaining about things that
are kind of like, hey, I don't care about that.
That's not the problem here at all.
And so it actually ends up distracting you more than helping you.
So it's a symbiotic thing.
But there's like this nitpicker that is
like complaining about the immaterial stuff and distracting me from my actual thing and breaking
my train of thought, right? And this is why I think that's the challenge with negative tools that complain all the time. And this is why I find program synthesis so compelling, because it says, hang on, I see what you're doing.
This may be really helpful for you, like this piece of information or this next line of code.
And I say, you know what?
I don't care.
Let me keep on typing.
I'm like, I'm on a roll, so I'll keep on typing, I won't even notice it. But if I'm stuck for this extra fraction of a second, and I look at the suggestion
of the tool and say, oh, yeah, that's exactly what I wanted. Thank you, dear program synthesis,
and I can consume this and keep on going. So rather than complaining, it is suggesting things.
And I think maybe the simplest analogy is kind of like assume that instead of type ahead on your phone, you had just a spell checker that complains all the time when you make the typing mistakes.
Right. And so type ahead is infinitely more useful. Right.
Yeah, I mean, I think like to take like a very specific example, I guess.
So, you know, Jason's thing, like I just write a function called, you know, increment or plus one. And, you know, the thing is complaining,
squiggling lines everywhere,
being like potential overflow,
like no bounds checking, whatever.
Like you're right, that's really annoying.
And I guess there's like,
as programmers progress
or even like their understanding of it,
what they want to do,
as Jason pointed out,
I'm going to be like,
yeah, that's not really practical.
I don't really care.
But on the flip side,
I know that is a risk.
And if we start talking about, like, how components get reused in ways that weren't expected,
if the tool that I'm using says, you know, hey, actually, let me add in the bounds check
for you, then unless it's performance critical, again, my background, I'll probably let it
do it, right?
Like, oh, if equal to int max, you know, just do nothing.
Like, okay, well, at least it didn't increment,
but at least it doesn't, you know, overflow
and give me a number smaller than what I put in.
So I can at least say that, you know,
now I know this function never, you know, shrinks
and only grows, but maybe is equal in some small case.
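The bounds-checked increment being discussed is a saturating increment: at the top of the range it does nothing rather than wrap around. In C it would be a comparison against INT_MAX; here is a quick sketch simulating a 32-bit int in Python for illustration.

```python
# Saturating increment: clamp at the maximum instead of overflowing.
# INT32_MAX stands in for C's INT_MAX on a 32-bit int.
INT32_MAX = 2**31 - 1

def saturating_increment(x):
    """Return x + 1, except at the top of the range, where it stays put.

    This guarantees the result is never smaller than the input, which
    is exactly the property discussed: the function never "shrinks",
    it only grows, or stays equal in the one edge case.
    """
    if x >= INT32_MAX:
        return INT32_MAX
    return x + 1

assert saturating_increment(5) == 6
assert saturating_increment(INT32_MAX) == INT32_MAX
```

A synthesis tool suggesting this guard (versus, say, throwing an exception) is exactly the framework- and language-dependent judgment call the conversation turns to next.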
And so having the synthesis, like you say, actually applying it usefully is in some ways more complicated, because you first need to understand there's a problem, and understand, like, an acceptable solution. That, like, hey, an acceptable solution here is to bounds check and do nothing, or to bounds check and throw an exception. And what framework I'm in, what language I'm in, I guess, like, that could vary wildly.
Yeah, in a sense, it is kind of more complex, but it's also more in your flow, right?
So abstractly you're correct that I need to understand more,
but I need to act less in a sense.
I need to look at it and say,
yeah,
I don't care about this.
And like,
whereas with the static thing,
I'll have to work for the tool to satisfy it,
right?
Because I'll have to, like, add an ignore-bounds-check on the thing, or I'll have to do something actively to make these things stop complaining.
And I think that is the frustration
that a lot of people have around that.
I think synthesis is kind of like the cure
for a lot of these things.
Do you find, like, you know, when you talk about static analysis or synthesis, I mean, some languages aren't used so heavily anymore, so maybe excluding those, but of languages that people would encounter every day, do you find that people's acceptance of those kinds of tools varies based on the kind of language they're using or the kind of development they're doing? Or are people generally pretty open to having the computer help them out more?
Yeah, I think people are very open.
I think there's kind of like the bottom 5% of developers, you know, the really, really new ones who don't know what's going on, who are going to get really confused by synthesis tools, or potentially confused, because they're getting these suggestions and they don't know what it is that they're getting and why. So it's kind of like, it's going to be tricky.
And the top like 1% of developers who are doing device driver development, one-off thing, algorithmic,
this or that, the suggestions are not going to be probably on target because their intent
is so one-off and complicated that this doesn't match any of the common distributions of how
people write code.
So either they would need some specialized model for device driver development, which is kind of like almost its own language, right? It's like its own set of idioms. Or they would rather not use any tools, because, also, you know, they work in vi, and all the stereotypes around that, right?
So, I mean, I think like
we've had those editor debates before.
I think even on this show,
we've talked about it.
And I think people change over time.
And I think you can get lots of tools
to do lots of things,
depending on how much you're willing
to sort of spend on it.
But for me, the realization I've had
of where I am now, at least,
is this computer cooperation,
which is, again, that the computer, just as simple as, hey, there's this function,
where is this function coming from? And sometimes it actually turns out it wasn't the function I
thought it was, it's pulling it from somewhere else. And having used various tools, sometimes
that is a guess based on, you know, code match.
And sometimes it's based on actually, you know, understanding how it's being built.
And this is leading me, I am going somewhere.
And that is that, you know, across the different platforms, different code bases, different
frameworks, there's a lot of difference.
How does the tooling, sort of, like we said, build an abstraction? But, like, just operationally, how does that work? If we have programs in C++, Java, Python, is that intermediate common enough to be useful and reasoned about, you know, across all of them, with, you know, frontends and backends into and out of those things? Or do we need sort of some understanding that differs? You know, like you were saying, for a device driver, you might need something just entirely different. How common or unique are each of those scenarios?
So, yeah, it's hard to discuss this in the abstract,
so I think we need to concretize it a bit more.
So most of the synthesis tools, or let's say the AI-for-code tools, do have some semantic aspects that are language-specific. They have some, let's call it, extraction procedure that extracts some representation, an intermediate representation, from the language, which includes information about maybe types, maybe how data flows between variables, maybe some other things, which is quite language-specific. And they extract it to a common representation, much like a compiler does before it generates code for the backend.
So for people who don't know, like, you know, GCC or something like that, like standard compilers, they may have a lot of frontends and a lot of backends, but they communicate by extracting, or by first translating, the frontend language, or the source language, into some intermediate representation, and the backend works off that.
So similar to that model, static analysis tools or program synthesis tools also typically have frontends that extract some semantic information into an intermediate representation, and then the entire backend works off that.
So yeah, there are some language-specific, let's call them, features, but most of the machinery is language-agnostic.
I see.
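One way to picture the frontend/common-representation split just described (all names here are invented for illustration, not from any real tool): each language-specific frontend extracts a crude common representation, here just the sequence of function-call names, and a single language-agnostic backend rule runs over it.

```python
# Sketch: per-language frontends extract a common representation
# (here, simply an ordered list of called function names), and one
# shared, language-agnostic backend check works on that form.
import re

def python_frontend(source):
    """Extract call names from Python-ish source text."""
    return re.findall(r"(\w+)\s*\(", source)

def java_frontend(source):
    """Extract call names from Java-ish source, stripping receivers
    like `stream.` so both frontends emit the same representation."""
    return [name.split(".")[-1] for name in re.findall(r"([\w.]+)\s*\(", source)]

def backend_check(calls):
    """Language-agnostic rule: an open() should be paired with close()."""
    if "open" in calls and "close" not in calls:
        return "warning: open without close"
    return None

# The same backend rule fires or stays quiet regardless of source language.
assert backend_check(python_frontend("f = open(path)\nprocess(f)")) is not None
assert backend_check(java_frontend("stream.open(); stream.close();")) is None
```

Real tools extract far richer facts (types, data flow), but the shape is the same: N frontends into one representation, one backend out.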
So I don't know too much about GCC other than it's GCC, so we can talk about Clang, and then I'd like to contrast it with, like, the JVM or, you know, Python. So for many languages that can be compiled by Clang, Clang has this IR, this intermediate representation. And it's actually a very powerful thing, because anything you can get into the intermediate representation, you can get machine code out for, for any backend that Clang supports. So, you know, PowerPC, x86, ARM, whatever your specifics. And if you want to support a new host system, it's just as simple as writing another one of these backends. And then optimizations can take place in this intermediate representation.
Okay, so Java has something, I guess, a bit similar, right? You compile to this sort of bytecode, and other machines can target that bytecode. I don't know what it is for Python. But once they're in these representations, and we're sort of at that, you know, intermediate level, how similar are the programming constructs? I've never really thought about it. Like, in Python versus, you know, Java versus sort of what would have come out of C++?
Yeah, I don't think they're necessarily very similar at that level,
but kind of the concepts of dynamic binding
or all these kind of fundamental programming language concepts are there.
The modeling can be quite involved.
For example, if you've ever looked at how Scala is compiled
to the JVM, if you had the kind of, I guess,
maybe misfortune of looking into these details,
this is extremely, extremely involved.
It's really beautiful conceptual and engineering work by the Scala team. It's really amazing, but it's really complicated.
Sure, sure. Okay. So yeah, so when we have the static analysis tools and synthesis tools,
and we have them per kind of thing someone is doing, I, I mean, I guess we can, we can sort of like keep moving
up the stack. So you have your code base, right? So I have, I have my code base and how my code base
behaves may be very different than Jason's code base. We do entirely different kinds of
programming. So how did these tools sort of balance the, like, like you sort of said,
the meat of the distribution, the sort of like average case versus the tailored case of, hey, on this project, we've made these, I don't want to say artistic, these opinion-based choices.
You know, like how does the tooling sort of balance between those?
Right. So you build some universal model, and that universal model captures how code behaves
in the, let's call it the common case, in the wild.
Let's say I look at all the C++ projects on GitHub,
and I will get some abstraction of what these projects do,
how they're supposed to behave.
I'll get some distribution of the expected code completions
or code predictions
that I need to do for these projects.
They would work quite well for the majority of new C++ projects.
But if you have made your own opinionated kind of decisions
that are veering off significantly from, let's call it a general population,
gen pop, then you would benefit greatly from training a custom model for your project,
your organization to kind of capture these notions, right? And so the bigger your code base is, effectively,
the more you would benefit from having your own private model
to better capture how you do things,
how your team is doing stuff.
So then, I mean, I guess you're starting to veer, obviously,
where we're going.
But the static analysis, code completion,
doesn't have to always be done by a trained
system, right?
I assume that prior, these were things that were hand done.
So hand modeling, hand feature extraction, hand suggestion,
and tuning, and then, you know.
Yeah, that's exactly right.
So maybe there's kind of a pause that we need to make here
and kind of distinguish, I guess.
I don't know if it's the second, third, or fourth wave.
So I'll not put a number on kind of which wave is it
of the static analysis tools.
But let's call it the previous wave of static analysis tools.
It uses hand-coded rules that say, oh, if you call foo
and then you call bar, then this is really a bad idea.
Like a check in, say, Java:
if X equals null and then the next line says X.foo,
then this is not a good thing, right?
Because you're likely to get a null reference.
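As an aside, a hand-coded rule of exactly this previous-generation flavor fits in a few lines. Here is a minimal sketch using Python's `ast` module, transposing the Java null-check idea to the closest Python equivalent: flagging `== None` comparisons, which should be written `is None`.

```python
import ast

# A hand-coded lint rule in the spirit of the previous generation of tools:
# flag comparisons written as "x == None", which should be "x is None".
def find_eq_none(source: str) -> list:
    """Return the line numbers of every `== None` comparison."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (isinstance(op, ast.Eq)
                        and isinstance(right, ast.Constant)
                        and right.value is None):
                    hits.append(node.lineno)
    return hits

print(find_eq_none("if x == None:\n    pass\ny = x is None\n"))  # → [1]
```

Like any rule of this kind, it encodes one expert opinion and applies it everywhere, whether or not it fits your project.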
And so these are like hard-coded rules
that capture the common case.
They're manually crafted rules.
And this is how linters like ESLint work, right?
They have like, I wanted to use like the S word,
but they have a bunch of rules that were written over the years
that capture a lot of common anti-patterns. And these may not be the anti-patterns
for you and your project, and this is why you get a ton of complaints from these tools,
because they're capturing generic things. So that was the previous generation of tools. The new
generation of tools is using AI to learn what it is that is being
done in your code base.
What are the patterns?
What are the anti-patterns?
What are things that should be avoided?
And it's actually using that information to give you much more targeted kind of alarms
and reports on your code in the case of checking, and also much better predictions of what code should be written or how
to complete your code when you're doing program synthesis, right? So this is kind of
the difference between handcrafted things and learned things. And the reason that we can do that, that we can learn all this rich information about
your project or about code in the universe, is because of the progress that has been made in
recent years in static analysis technology, in machine learning algorithms and models, and also in computational power. So, you know, we can throw huge GPUs and memory and data sets at the problem
and kind of train models that are able to capture information or rules that would otherwise
have to be handwritten by experts.
So when you're writing these handwritten rules, you're saying, you know, this set of operations
can cause, like we talked about before, overflow or underflow or undefined behavior. Or it's my
opinion that if you return a value, you should always use that value. And so, like I say, that's a
rule, right? So some set of experts deems sort of what is good or bad. But when you analyze code bases and sort of try to apply
machine learning, you know, in my head, I'm trying to think, how do you get those
sort of quality metrics?
I mean, that the code compiles is insufficient, that the code runs is, I mean, maybe that's
how you, that seems a bit tricky.
Yeah, I think that the crux of the matter is specification at the end, right?
Like the reason that I say a value that is being returned should be assigned somewhere.
This is a specification, that's a property.
And that property is probably kind of like a hygiene condition, right?
It's like, it's something that is good to have regardless of what your program is doing.
So that's kind of like a generic universal specification
that should hold anywhere, allegedly, right?
And this is why we check it.
And I don't know how to check for your program
that, you know, a student class
should always have the ID field assigned
because as a general rule, I don't know what is a student.
I don't know what is an ID.
And I don't know even how to express this as a general rule.
And I'm definitely not going to put it in the list of rules of ESLint
to check across the universe, because most of the universe
does not have the class Student
and a field called ID. So these kinds of specialized specifications for your project,
for your setting, are exactly what the machine learning algorithms can pick up and check for
consistency, right? And then, more importantly, with program synthesis, they can generate the right assignments to
make sure that as you're doing these things, you always assign the ID of the student.
They can make sure by construction that you're using these classes or using these pieces
of code the way they were intended.
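A toy version of this kind of learned consistency check can be sketched directly: mine how often each function's return value is consumed, then flag call sites that deviate from the project's own majority convention. The `insert_student` name is illustrative, not from any real codebase.

```python
import ast
from collections import Counter

def _callee(call):
    """Best-effort name of the function being called."""
    f = call.func
    if isinstance(f, ast.Name):
        return f.id
    if isinstance(f, ast.Attribute):
        return f.attr
    return None

def find_inconsistent_calls(source, threshold=0.8):
    """Flag calls whose result is discarded, for functions whose result
    is consumed at least `threshold` of the time elsewhere in the code."""
    tree = ast.parse(source)
    # Calls whose results are thrown away appear as bare expression statements.
    bare_ids = {id(n.value) for n in ast.walk(tree)
                if isinstance(n, ast.Expr) and isinstance(n.value, ast.Call)}
    total, discarded, sites = Counter(), Counter(), {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and _callee(node):
            name = _callee(node)
            total[name] += 1
            if id(node) in bare_ids:
                discarded[name] += 1
                sites.setdefault(name, []).append(node.lineno)
    flagged = []
    for name, n_discarded in discarded.items():
        used_fraction = 1 - n_discarded / total[name]
        if used_fraction >= threshold:
            flagged.extend((name, line) for line in sites[name])
    return flagged

# Nine call sites assign the result; the tenth silently drops it.
src = "\n".join(f"ok = insert_student(db, s{i})" for i in range(9))
src += "\ninsert_student(db, s9)"
print(find_inconsistent_calls(src))  # → [('insert_student', 10)]
```

The rule was never written by hand; it falls out of the codebase's own statistics, which is the shape of the learned approach described above.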
Today's sponsor is Rollbar.
Rollbar is the leading platform that enables developers
to proactively discover and resolve issues in their code,
allowing them to work on continuous code improvement
throughout the software development lifecycle.
Rollbar has plans for all situations,
from free to large enterprise. With Rollbar,
developers deploy better software faster and can quickly recover from critical errors as they
happen. We have a special URL at https://try.rollbar.com/pt/ for Programming Throwdown.
There you can find two free ebooks, How Debugging is Changing
and How Dev Experience Matters,
as well as sign up for a free trial of Rollbar.
Yeah, so maybe,
I'll use Jason's term I like here,
let's double-click on that for a second.
To go into this:
okay, so I've seen this, you know,
on Hacker News or Reddit or whatever.
Someone trains a hidden Markov model over a code base
and I can generate something
that on first pass looks like code, right?
So, and to kind of like unpack that a bit,
it's not my field,
but like what I understand about that is
I can generate new code,
which matches the pattern or statistics
of your code base, right?
So whenever you call this function,
insert new student into database,
everywhere in the code base,
you do that on the left hand side,
you always say Boolean successful equals this function.
And I can tell you when I emit this code,
I always put Boolean successful,
but I don't know why you do that
or why in this other case,
and if it's split 50-50,
then the generation will 50% of the time do it,
50% not, right? I can match the
statistics. So I think people probably have used tools where they've seen that pattern matching or,
hey, other places in this file, you always follow this word with that word. I mean,
I've used those tools before. I didn't find them that useful. So what is the objective when you're
doing this training of these models that differs from just sort of matching the statistics and actually doing what, I think, you were kind of alluding to, the right thing?
Yeah, so I think it's kind of the difference between a bicycle and a spaceship.
At the essence, they're kind of doing the same thing. They're like vehicles, they're moving stuff, but one is just insanely more powerful than the other.
And really it comes down to a question of how powerful these models are
in their ability to tailor to the context in which you make the prediction
and generalize over that.
And so we're kind of,
these days we're using models
with hundreds of millions
or billions of parameters,
neural networks with billions of parameters
to solve kind of this exact predictive question
of here's like the context
that I have in my editor
and what is it that I should be typing next?
And these models contextualize not only on, you know, the last five words that you wrote, which is like hopelessly naive.
They contextualize on the entire context that you have in the file, including natural language, including other peripheral information from the project,
including all sorts of other signals that you have in your environment in order to make a prediction.
And this, being able to contextualize and generalize over that, is the magic that gives you really accurate predictions that people appreciate and can use,
as opposed to just, you know, I flip a coin
and I suggest that the next word should be either, you know,
foo or bar, right?
Or like, yeah, you typed DB.
And I guess the next word is either add or remove, right?
It's like, and so it's much more,
the models are just like much more powerful than that.
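For contrast, the naive statistics-matching baseline the hosts described, where you suggest whatever token most often followed the previous one in training, really is only a few lines, which is exactly why it cannot capture intent. A minimal bigram sketch (token stream invented for illustration):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which token follows it and how often."""
    model = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        model[a][b] += 1
    return model

def suggest(model, prev_token):
    """Most frequent follower of the previous token, or None if unseen."""
    followers = model.get(prev_token)
    return followers.most_common(1)[0][0] if followers else None

tokens = "ok = insert_student ( db ) ok = insert_student ( db )".split()
model = train_bigram(tokens)
print(suggest(model, "="))  # → insert_student
```

This model only ever sees one token of context; everything Eran describes, file-wide context, natural language, project signals, is precisely what this baseline throws away.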
So are the, I guess like, you know,
again, like for people who may not be aware,
I mean, is it that you have a sort of hybrid system
where you're trying to actually sort of like
imbue some human architecture to these models?
Or are you sort of just setting up the problem
and allowing sort of an end-to-end solution to become trained?
Yeah, so, you know, like in reality,
all these systems are not really end-to-end
just because of kind of the engineering cost
of doing the inference end-to-end
is typically prohibitive,
and you need to do a combination of several models
in order to get the response on time.
And then there's some bias using semantic information.
And again, there are a ton of details
that I'm not sure that this is useful to discuss here.
But yeah, ideally, you would like to have it completely end
to end.
But in the world of practical engineering,
you need to do something more sophisticated than that.
Yeah, that makes sense.
Again, we're working in the realm
of near real-time synthesis, right?
You're typing stuff, and you have
to generate these predictions of what comes next in near
time to be useful.
And this is where the engineering gets really clever.
That's what I was going to say.
So, I mean, I guess it's one thing to be given an infinite amount of time to make a suggestion
versus I'm going to be typing it and you have to beat me to it or it's not useful.
And so, as you mentioned, I can imagine the amount
of engineering that goes in from, oh, hey, I read an academic paper that says, you know, we can,
you know, you know, suggest code completion at an, you know, x percent accuracy to actually being
able to deliver that in an IDE. So maybe to talk about that a minute. So, I mean, you guys have
built a system to do some of this. This is, you know, why you've thought about this so much and, you know, trying to integrate it.
At an architectural level, you know, how does that end up working?
So, you know, I'm typing in my editor and, you know, words are appearing on screen.
What's happening sort of in the background?
So there's kind of, I guess, half a million lines of Rust code that are running under the hood, doing very efficient neural network inference that involves between two and four different models that are being kind of combined together
and have different trade-offs in terms of response time and accuracy. And, you know,
if you're typing slightly slower, the engine may catch up and you may get slightly better predictions
because the stronger model kind of made it in time.
And if not, you may be getting results from a slightly inferior model. And the challenge really for a lot of it is actually the balance between the human and
the machine.
How do you make predictions at the right places that do not interrupt the human, right? We're kind
of obsessed with exactly this: finding the balance of when to make predictions, what kind of
predictions to make, where, what should be the confidence from the model before we throw it in
your face, what other kind of barriers are there before you interact with the human?
And how do you get into the flow of the human in a natural way, such that the human
can easily ignore it if it's not helpful, but actually consume it if it is helpful?
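The multi-model trade-off described here, where a stronger, slower model's answer is used only if it beats the latency budget, can be sketched roughly as follows. The model functions, names, and timings are placeholders for illustration, not Tabnine's actual architecture.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Placeholder models: real systems use neural networks with very
# different latency/accuracy trade-offs.
def fast_model(context):
    return context + " <fast suggestion>"

def strong_model(context, delay):
    time.sleep(delay)  # stands in for heavy inference
    return context + " <strong suggestion>"

def complete(context, strong_delay, budget_s=0.2):
    """Return the strong model's answer if it arrives within the latency
    budget, otherwise fall back to the cheap model's answer."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(strong_model, context, strong_delay)
        fallback = fast_model(context)  # cheap, computed inline
        try:
            return future.result(timeout=budget_s)
        except TimeoutError:
            return fallback

print(complete("def read_lines", strong_delay=0.0))  # strong model in time
print(complete("def read_lines", strong_delay=1.0))  # falls back to fast
```

One simplification worth noting: on timeout, the pool shutdown here still waits for the slow call to finish before returning; a production system would keep a persistent worker and discard late results instead.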
All right, and there's a lot around that.
So yeah, to pick maybe just even an example there,
how, in practice,
do you guys approach that problem?
One example I can think of is how much code to suggest.
So there's, I could complete your line.
I could suggest your function.
I could suggest your whole program.
I mean, we probably can't do a whole program yet.
Maybe we get there in a minute.
But how do you guys sort of balance off
trying to figure out how much
to end up sending up onto the screen?
Right. So luckily, we've been doing this for a while, and we've been serving millions of users.
So we actually run some experiments in the wild to find out, like, what kind of prediction horizon is most useful for people.
And it also depends on kind of the intended,
again, it depends on context,
but let me put that aside for a second.
It turns out that what humans like the most
is pieces of code that they can make snap judgments
about the correctness of.
So the tight loop that works best is: there's sufficient
context, the human writes something, and the machine suggests something that is easily
identifiable as useful.
So like kind of, let's say complete to the end of the line,
but something that is very idiomatic.
So if you see it, you will know that, yeah, that's what I meant, right?
So this is kind of what we call internally the remind me model, right?
It's kind of code that I've written 80,000 times.
I know what I'm going to write.
If I ask a human next to me, they
know what I'm going to write. My classical example for that is read the Python file line
by line. That's code I wrote I don't know how many times in my life, probably thousands of
times, right? And if I see it, I know that it's correct. I don't need to
ruminate about it. It's like, yeah, that's it. And so that's
kind of the classic case in which
I can complete more than one line because
it's really idiomatic. When you see it,
you know it's that. It's what you need.
It's kind of code that otherwise you would
copy from Stack Overflow, right?
That's kind of the
idiomatic thing to
think about it, right?
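For concreteness, the "remind me" snippet cited above, reading a Python file line by line, really is the kind of completion you can judge at a glance:

```python
import os
import tempfile

def read_lines(path):
    """The idiom: iterate over a file lazily, one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Quick demonstration with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n")
print(list(read_lines(tmp.name)))  # → ['alpha', 'beta']
os.unlink(tmp.name)
```

There is nothing to ruminate about in the generator function itself, which is exactly what makes it a safe multi-line suggestion.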
And so all of this is happening
sort of local on the developer's machine.
And then how does that split happen?
Obviously, some stuff probably needs
to happen in a server somewhere.
Yeah, so what we do technically,
what we do in Tab9 is we allow you
to configure the architecture or the kind of configuration that you'd like
to run in.
You can run everything completely locally, even air-gapped, so you can run it without
network at all.
Or you can run some of the inference on the cloud, which obviously gives you better completions
because you run with stronger GPUs than what you would otherwise have
CPU-only inference on your machine.
Or you can run completely on the Tab9 cloud, or
you can create your own server in your organization and run it.
So basically, you can control which kinds of models are being deployed
and where do you want to deploy them. I think that's really useful for developers
to be able to control where inference happens,
especially because inference does require
some context from your IDE, right?
So you may be quite sensitive
to where this stuff is being sent,
depending mostly on policy of your workplace.
Yeah, that can vary wildly.
So before we dive into like, you know, talking about tab nine specifically, sort of my last
comment here is you mentioned trying to maybe come up with how many waves and you refuse
to give a number, which is fine.
But then like going forward, I mean, what do you sort of see the future?
I mean, since I was in school, everyone always said, oh, maybe one day we'll just, you know,
tell the computer what we want and it'll just, you know, write the program for us.
And people scoff at that.
And, you know, maybe that's not exactly the future that that shaped up.
But I mean, and it's fine that this probably gets highly into opinion.
But, you know, if we sort of look maybe enough out sort of 10 years, you know, 20 years, what do you think will be the direction we head in for how developers
interact with their code?
So I think, first of all, it is pretty safe to say that two or three years down the line,
all code will be touched by AI in one way or another.
Either it will be generated by AI in parts, or it will be reviewed by AI, or tests will be generated by AI.
Something will be done by AI to automate the mundane parts of the job, right?
There are so many repetitive work being done, and a lot of it really can be and should be automated by existing AI machinery.
And I believe that this is already happening and it's going to accelerate as these tools
become more mainstream.
So I think that's like an easy prediction to make and I hold it very strongly.
Looking 10 years down the line, this is really like speculation and opinion.
I think in specific domains, it is going to be the case that you're going to see a lot of automation.
Like if what you're doing is writing components for UI of a particular area, right?
I know UI for medical devices, right?
It has to be very specific.
And if it is, then a lot can be automated from intent. The reason that it has
to be specific, or domain-specific, is because, again, it's all a question of intent. How do I
express as a human the intent to the machine? And this intent is always going to be very partial,
right? Otherwise, I'm going to have to write a lot of English prose to express what it is that I want. And this English prose is going to be ambiguous. And this English prose is going to
be harder to debug than actual code. So I don't believe that general-purpose programming is
ever going to be replaced by English. That sounds preposterous to me. Maybe I'm wrong.
But in domain-specific things,
I think we're going to see a lot more automation,
but it's not going to be...
I conjecture that it's going to be more along the kind of low-code,
no-code idioms that, you know...
Yeah, yeah.
It's similar to Wix, in a sense, doing websites, right?
If you're doing website design,
you can get a lot done without writing a single line of code.
But this is a kind of domain-specific system
that can assume sensible defaults
when you don't provide them, right?
And so you can provide zero information almost to Wix
and it will do something sensible because the defaults have been baked into the system.
And this is going to pop up more and more. More domains are going to be automated, I think, in that flavor.
And for general purpose programming, which is always going to be around, I don't think anyone should be worried about their job.
The job is so much about specification discovery and not about specification implementation.
It's not like no developer gets like five pages
of English descriptions of the function
and then goes to implement it, right?
They get the higher level specification
and what they actually do is like carve out
the real specification with all the details from it.
And programming is about specification discovery, really.
It's not about translating English to code.
Just to riff on like what you're saying,
I mean, I think we have to get there.
And the reason why is because
something I've noticed happening and
it makes sense is when we say programmer, I mean, we tend to think about, you know, someone at a,
what do they call it? I always forget what the latest acronym is, but like a Silicon Valley
company or startup or whatever. But in practice, there are way more people writing code outside of
those areas than inside them. And I think that's going to continue to grow
as software permeates just everything.
I was talking to a gentleman just like a few days ago
and he was saying, well, I have this idea,
but like, you know, I don't know how to find programmers
and I'm just thinking like, yeah, like,
I mean, what you're saying sounds cool,
but like, you're going to have to convince somebody
who can be paid a lot more
to do something a lot more worthwhile.
And that's not to say your idea isn't.
It's just like, and I was very, I didn't say that, right?
But like, it's not that your idea is bad.
It's just that it's so niche
that the return there
is just never going to be big enough to pay,
you know, someone what they can make,
you know, at one of these other, you know,
larger addressable-market companies.
And so I think as you look across just every company needing,
you know, some amount of programs to be written and software to be developed,
we have to figure out collectively, or I guess someone figuring out collectively how to like,
empower people to do program writing at the level that can be trained more easily,
rather than requiring this, you know,
sort of academic background, you call it a sort of general purpose program. I think it's a fair
term for it, you know, being able to fully debug something, write completely custom greenfield code,
you know, this kind of stuff, but instead just turn out another version of something and make
small tweaks. And, you know, people in mom and pop shops are going to have to be able to do that.
And something like you're saying is a way that happens. I don't know if it's the way it happens, but it's a way
we sort of get this stratification, where programming becomes a much more diffuse term, all the way from
this sort of low-code, no-code down to the people optimizing assembly inner loops for
microsecond improvements on frame rates of AAA video game titles.
I think there's another
trend that will for sure
become stronger, which
is making programming
more forgiving in general.
I write a program,
it should run and do something.
And we've made huge
progress moving from
C, C++ to, like, Python and JavaScript, right?
The barrier to get hello world, right?
The time to hello world has shrunk considerably
moving to these languages.
And I think we're going to see more of that
by the environment making also some sensible assumptions.
If what you're doing is writing like some website using Vue, React, whatever
is the latest cool thing to write website frontends,
then there are sensible defaults that could be made,
simplifying a lot of the grunt work that has to be done right now.
I think this is another trend that will continue to grow with the assistance of AI, because
you do need some sort of intelligence to figure out how to deal with the complexities there.
Yeah, one thing I've seen that's been really cool is, I mean, I saw this first with Wolfram
Alpha, and now there are some startups doing it,
but basically ways to query knowledge stores, like query databases. And you're seeing things like,
give me the average revenue for all the people in Switzerland, all the customers in Switzerland.
And just from that English sentence. So that's, as you said, that's something very domain specific.
If you just have an arbitrary database called Foo and all the columns are called bar and baz, it can't do anything. But
it can pull out unique things, like, oh, this column is called country, and I just know when
people make a column called country, I can assume what they're going to do. And so, yeah, you're
starting to see this with databases and those queries, and I think it's just a matter of time before it gets to code as well.
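The Switzerland example maps to a tiny, deliberately domain-specific translator. The sketch below invents the table layout, column names, and supported phrasing purely for illustration; it leans on exactly the convention described, that a column named "country" means what it usually means.

```python
import re
import sqlite3

def to_sql(question):
    """Translate one narrow English pattern into parameterized SQL.
    Note: the column name is interpolated directly (it can't be a SQL
    parameter), which is fine only for a trusted toy example."""
    m = re.search(r"average (\w+) .*customers in (\w+)", question, re.I)
    if not m:
        raise ValueError("question not in the supported pattern")
    column, country = m.groups()
    return f"SELECT AVG({column}) FROM customers WHERE country = ?", (country,)

# Toy data for the illustrative schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, country TEXT, revenue REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [("a", "Switzerland", 100.0),
                ("b", "Switzerland", 300.0),
                ("c", "Italy", 50.0)])

sql, params = to_sql("give me the average revenue for all customers in Switzerland")
print(db.execute(sql, params).fetchone()[0])  # → 200.0
```

Notice how much the sketch bakes in; the ambiguity Eran raises next (what exactly counts as "in Switzerland"?) is precisely what this hard-coded mapping papers over.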
Yeah, you're starting to see it,
but it's always super subtle.
And from what I've seen, at least,
you write this in English and then say,
but wait, when I said Switzerland,
what exactly did I mean?
Did I mean, like, also the part of Italy
that is, you know, working in Switzerland,
or only Swiss citizens,
or what exactly?
You get into all these
subtleties, and it's actually the job of
the developer to kind of go back and say
but wait, there's actually a question
here. Is it the citizens of the
country or
people who work in the country or what
exactly is the definition of these terms
that you're using?
And this is exactly where the programming language doesn't let you be ambiguous.
Yeah, that makes sense.
And this gets back to your code synthesis.
Like, well, you know, I've seen 10,000 people
ask the same question,
and 9,900 of them had these assumptions.
So we can use statistics to just give people a prototype.
And then they'll still have to look at it, but we'll get it right so often that it's super useful.
It's like Alexa.
You know, Alexa, you know, my parents have like a thick Italian accent.
And so Alexa gets maybe 90% of what they say right now, which is amazing.
It used to be 0%.
But at some point, it climbed up.
And even before it hit 90%, it got to the point where it became useful.
There was a crossover point where they could just tell Alexa, set a timer or set the timer or whatever.
I did a terrible mom accent.
Sorry, mom.
But they do, like, you know,
set a timer or something like that. And it's correct enough that, you know, when it's wrong, they just try again or use their phone or something. And it's still a net positive
productivity. Yeah. Cool. Well, let's talk about Tab9 for a minute. So tell us a bit
about what it's like to work at Tab9. Are you
guys currently, like, wholly remote? I mean, the world's a crazy place these days. How are you
guys sort of handling that?
Yeah, so we're mostly working remotely due to kind of COVID restrictions. We do have a physical office, which is mostly empty, I guess. But people do come in.
Yeah, we have some remote people in the US as well.
And we're always hiring interesting talent.
On hiring, do you guys do internships?
What kind of background are you looking for from people?
I think internships are, again, really hard to do remote. I think you're missing a lot of
the experience and the immersion and, like, meeting a lot of people. So I'm not a huge fan in the
current, uh, climate, let's call it. So watch this space.
Okay, yeah, exactly. But definitely interesting. And we're always hiring people who have, like, an ML-and-PL background,
like a machine learning, programming languages background.
Tab9 works on the intersection of programming languages
and machine learning on representations of code
and learning models of code
and doing efficient inference for models of code
and stuff like that.
We're kind of probably the biggest Rust shop in Israel.
And we take a lot of pride in being Rust and Rust enthusiasts
and dealing also with all sorts of low level inference code using Rust and also assembly, actually, to be honest.
Yeah, I mean, that's fascinating in itself.
I have many questions about Rust and about, you know,
this sort of like low latency response time things
are very interesting.
I think that's an area of growth.
Anyways, can't get into that.
We're running out of time.
So, yeah, how can't get into that. We're running out of time. So, yeah,
how can people try out
Tab9?
Like,
what is it?
Are people able to use it?
Does it cost an arm and a leg?
How does it kind of work?
Yeah,
so Tab9 is completely free.
If you're like
a developer,
you can download it now,
install it,
and use it completely free
forever.
The free product
is very powerful
and useful. and we have millions
of developers using it in their IDE as we speak. I think it's really, really powerful
that the free is almost too powerful, the free version.
That might be biased, but yeah.
No, I'm serious, actually. But if you are working in a team, this gets exactly to what you said earlier, Patrick,
that you would benefit greatly from training a model for your team.
And these private models are how Tab9 makes money at the end.
We train private models for teams. You just connect your repo,
and you get a private model delivered completely
automatically for your team that knows the idioms of your team and the vocabulary, you
know, the entities and what's going on in your code base, and gives you much better tailored
completions for how you do stuff. And you can also customize it to say things like,
oh, you know, don't train on that legacy code.
We're actually trying to escape that.
Yeah, sure.
If you keep training on it, you get like this effect.
Yeah, you get the Jupiter-level kind of gravity
of the legacy code that you can never escape, right?
Yeah, I mean, that's like anytime we point our code counter at things.
We use protocol buffers, and they generate tons of code,
and your line-of-code count is swamped
by all this auto-generated code.
And I think there that, you know, it used to be,
I always thought, you know, oh man, stuff's expensive.
These developer tools are expensive.
But then now like starting to think a bit more
with a manager hat,
I realized actually like programmer time is very expensive.
Anything that helps people sort of go faster
and not break things and make mistakes.
Oh wait, I think I'm misquoting.
Anyways, you know,
anytime you can sort of help people in that domain,
like it's enormously valuable.
And I think people are catching up
that like programmer efficiency is a huge bottleneck. It's a huge bottleneck. And we see, you know, a lot of improvement when
using tab nine, like users report between 15 to 30% improvement in their productivity, depending,
again, it depends on how, let's call it, mundane the part of your coding is.
Or if you're using languages that contain a lot of boilerplate,
and Java comes to mind, for example,
then it's really, really useful.
If you're writing front-end code, it's extremely useful.
I think in terms of languages,
our distribution is kind of JavaScript,
Python, and Java,
I think maybe are the top three languages
of tab nine users.
Yeah, I think it's extremely useful
in all languages,
but these three see, like, huge adoption. Also PHP
and, I think, Rust. We use it ourselves, obviously, for Rust. This thing that you kind of alluded to also
intrigues me because this is, again, thinking with Manager Hat. It's sort of not talked about,
and it's pretty expensive, which is when you bring new people onto the project,
even if they know what they're doing,
like getting them adapted to the sort of idioms and styles
and all of that through code review,
takes probably even more than a one-to-one number of hours:
for the hours those new people spend writing the code,
other people spend reviewing and correcting the code.
But it's also frustrating.
You battle sort of, like, this isn't a personal thing, but egos come in. It's a very
expensive drain of hours on the team to bring new people up. So having a tool that sits
alongside of you and, in a low-ego-risk way, sort of encourages you to write
conformant code, I mean, to me, this actually is super exciting.
Yeah, it's low-ego.
And also, it's there for you in the sense that you don't reach the code review and get
slammed, right?
Because you already get something that is...
The worst case is that you've done the common thing, right?
So it lets you really punch above your weight, in a sense, right?
Because, like, you know, I'm at least as bad as the average. I will not fall below that. And I
think actually much better. And an interesting thing that happens with Tab9 is that when
you see the suggestion it also makes you think sometimes, like when I use tab 9,
I get some suggestions like, but wait,
that's not the way I intended
to do this, so what am I
missing? And actually, it turned out
several times that there was just a new
API that I didn't know about, right?
And people started using, but I'm like,
I haven't touched that
for a long time, so discovery there,
even if you end up not using it,
just being aware that this thing has changed is really interesting.
And this also kind of gives you a picture of what's going on, right?
As you're programming things that you would not be aware of otherwise.
So that's another aspect of that.
Nice.
So in the show notes, we'll have a link to Tab9,
to their Twitter handle, to Eran's Twitter handle.
If you have any questions or reach out to him
or you want to check out the product,
I mean, it's super exciting.
I'm going to go try it,
even though you didn't name my favorite languages.
We're going to go see how this goes and give it a whirl.
But I had a very enjoyable time.
Thank you for coming on with us.
Thank you so much for having me. I had a lot of fun.
Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons
Attribution-ShareAlike 2.0
license. You're free to
share, copy, distribute, transmit
the work, to remix, adapt the work,
but you must provide an attribution
to Patrick
and I, and
share alike in kind.