Programming Throwdown - 127: AI for Code with Eran Yahav
Episode Date: February 14, 2022

Brief Summary: Programming is difficult as it is, but imagine how difficult it was without all the current tools, compilers, synthesizers, etc. that we have today. Eran Yahav, Chief Technology Officer at Tabnine, shares how AI is currently helping with code writing and how it could change in the future.

00:00:16 Introduction
00:00:51 Eran Yahav's programming background
00:08:11 Balance between Human and the Machine
00:11:49 Static Analysis
00:25:30 Average vs Tailored Tooling
00:29:42 Similarities in Programming Constructs
00:36:19 Machine Learning Quality Metrics
00:38:27 Rollbar
00:40:19 Model Training vs Statistic Matching
00:50:19 Developers Interacting with their Code in the Future
01:00:18 Tabnine
01:08:17 Farewells

Resources mentioned in this episode:

Companies:
Tabnine
Website: https://www.tabnine.com/
Twitter: https://twitter.com/Tabnine_
LinkedIn: https://www.linkedin.com/company/tabnine/

Social Media:
Eran Yahav, Chief Technology Officer at Tabnine
Twitter: https://twitter.com/yahave
LinkedIn: https://www.linkedin.com/in/eranyahav/

Sponsor:
Rollbar
Website: https://rollbar.com/
Freebies: https://try.rollbar.com/pt/

If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/
Reach out to us via email: programmingthrowdown@gmail.com
You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM
Join the discussion on our Discord
Help support Programming Throwdown through our Patreon
Transcript
Welcome to another episode of exciting,
deep programming talk with Programming Throwdown. Jason made me feel a little bit sad the other day.
He referenced something that we did like a decade ago related to Programming Throwdown. Then it
made me realize, oh, that's really good. We've been doing this podcast a long time. And then
I got really sad because I'm like, oh, it means I'm old. So no, I'm just kidding.
I enjoy doing it.
Jason, I know it's great.
Okay.
Anyways, that was off topic.
So we're here with Eran today.
Welcome to the show, Eran.
Tell us a little bit about yourself, how you got started in programming.
Yeah.
Thanks for having me, guys.
Yeah.
I think you started by saying you feel old, so I should really keep quiet on that front.
I mean, I started programming many, many years ago.
I think my first computer was the IBM PC, I think with a five-megabyte hard drive or something like that.
That was like the super duper.
You could never fill that up.
Yeah, exactly right.
And dial-up internet and stuff like that. That has been a long time ago. My first programming language was BASIC, if you guys even know what that is.
And so it's been a long time since then. I've been doing programming from a young age, I don't even know since when.
Did undergraduate at Technion CS, then did military service, mandatory in Israel, for six years.
What else?
PhD, computer science.
Went over to the US to do some work on program analysis,
program synthesis at IBM Research at the time,
TJ Watson Center in New York.
And then came back as faculty at the Technion, doing machine learning over code, program analysis, program synthesis, all these great things. And during that time, I got really deep into program synthesis as something that can transform how we write code in reality and not just in academia.
And for many, many years, I've been fascinated by the idea of programs that work on programs, compilers, synthesizers, debuggers, any program that operates on other programs.
I found that fascinating from a very young age.
And this is what I've been doing for a long, long time.
Awesome.
Well, that was a very sweet moment.
I got a lot of stuff, a lot of questions already.
So yeah, so starting in BASIC, I mean, I can relate to that.
And then I started in C.
And so now that makes me the guy who always has to write the
bit shift operations.
So, yeah, I mean, that's quite a sweeping thing.
So even in our intro, we were saying that you consider yourself sort of academic, or you're teaching.
I spent a lot of time in university.
I know that's similar to Jason.
Maybe just a bit before we dig into the other stuff.
Like, what are your thoughts for people who are debating, you know, sort of spending extra
time in school versus sort of entering the workforce?
What are your thoughts about that?
We get that question a lot.
Yeah, I think I'm quite biased, probably.
I think that going to school is great.
It really opens your mind and gives you a lot of taste and kind of experience and things
that you would never experience in the workforce, right?
Like in a daily job, just because the incentives and the objectives are very different.
So I spent four years in undergrad.
I think today many programs only require three years.
And I think that fourth year, in which I did, like, courses in networks and AI and, you know, whatever, natural language processing, things that, you know, I would never even know exist, let alone go into at the depth that you can go in undergrad, if I only met them while working at a company.
It's just impossible to get this kind of like wide perspective.
And I highly recommend for anyone to just spend this extra year
or extra couple of years getting exposed to things
because this really changes the way you think about problems.
It happens to me very frequently that I bump into some problems and say,
but wait, that was something that I actually met in computational biology.
There was an algorithm like that in computational biology.
And even if I don't remember the details, I kind of have the pointer in my head
and I can go and look for it and kind of refresh my knowledge about it.
So I think that has been like building this kind of reference in your head
of all the topics early
when you're young and things are
really big. It's easy to
get them recorded at
the infrastructure level of your brain.
I think that's super helpful
and it's an investment that
really
pays itself back very quickly.
Nice. Yeah, thanks for that input.
I mean, I think that's something that people debate.
I'm not sure there's a right answer,
and I think it varies person to person,
but I mean, I think those are really good observations.
And you said something which I think a lot about,
and I know Jason does too,
which is this word incentives.
And I think it's really interesting to view a lot of things,
I don't want to say in life, that sounds too deep,
but like a lot of things, at least in, you know,
interacting with people at work and thinking about incentives.
For me, that's been something that helps explain a lot.
And so as you point out, when you're in school,
I think your incentives are very different than when you're in the workplace.
And I think, like you said, the experiences you'll have,
even just thinking in that lens,
you can sort of understand how they may be starkly different.
I think it's important to understand that every endeavor at the end is a human endeavor.
So research is, and the workplace is. And people are, at the end, motivated by incentives; even if the incentives or the kind of target functions are very implicit, they are there.
And in the workplace,
you're supposed to get something done at the end, right?
A kind way of saying humans are greedy.
Yes.
No, no, it's not greedy.
I think it's not greedy.
It's natural, right?
That's true.
Greedy ascribes some sort of moral aspect to it, I guess.
Yeah, exactly.
All right.
So, you know, we talked about it in Jason's intro, he said that this is going to be AI for code.
So, I mean, I think talking about how code development works, and even in your sort of introduction talk, you talked about synthesis, analysis, programming.
So back when I was first writing QBasic and, you know, sort of inputting that, and like, line 10, goto 10, whatever, I wrote an infinite loop. Okay, anyways. You know, it was just an editor. The editor was very simple. I don't even recall it having syntax highlighting. It may have, I can't remember, it's been too long, those early days.
And we're starting to drop some of those words, so everyone kind of, I think, understands: you open vi, Emacs, notepad.exe, depending on your platform, and you start putting things that represent a program as text into that sheet. And that's sort of where everything starts.
And then sort of take us on how you think about,
you know, I think there's a lot more
than just editing the code,
but maybe starting there,
like for me at least,
well, maybe it starts before
and how you think about
what you want to put on the paper as it were,
or we want to put down.
But maybe take us through a little bit
like how you think about those early days
and how it was approached
so that we can sort of set up
how it's sort of undergoing some amount of transformation.
Yeah, I think it's all a question of balance between the human and the machine, in a sense.
In the early, early days, you had to work really hard to satisfy the requirements of the machine.
And the machine was kind of brutally unforgiving, right?
Like you could make one single mistake,
nothing told you that you made that mistake
and the whole thing would go awry when you run it, right?
And so over the years, people had this brilliant idea.
I think Fortran was the first,
that you can use the machine
to help the human deal with the machine,
which is, I think, the idea behind the compiler.
The early compiler is exactly that, like help the human write something that is slightly higher level.
The compiler will do some work to check it and maybe give you some useful errors.
And then the machine would run it for you. And so this is kind of
really the early idea of having a machine help the
human program it and making it more forgiving in a sense.
Yeah, I was right there. So maybe people don't realize or know. I mean, I wasn't exposed to it initially and then found out later. But originally, people would write literally the hex codes for assembly, to basically script the exact flipping of the gates inside the microprocessor. Oh, this is my background, I could go on here all day, so I'm not going to talk about this. But, you know, actually writing hex that is stored in an EPROM or whatever, and that the CPU actually executes, was step zero. The very first people had nothing else, so I guess that's what you kind of got to do, right?
Then people realized, oh, hey, hang on, we can use mnemonics, I believe the word is, right? Like, I can say this hex code is actually the word MOV, move, right? And why it wasn't longer, ask someone else. But like MOV, you know, branch if equal, BEQ, branch if not equal, BNE, right? All this stuff, and moving the registers, and still thinking about the physicality of the computer.
Then the very early thing was a two-pass assembler, right? So I could use labels. I could use a label for a line number or an address and use it somewhere else, and the assembler would scan through once, pick up all your labels, figure out where they go, and then scan through again. Super, super low level.
And so what you're talking about with Fortran, this idea that you don't need to write one-to-one assembly instructions, was, yeah, I think I agree, a very early breakthrough in how to get the machine to do the work that you really didn't need to be doing.
Yeah, I think since then, there's been a lot more kind of development
in how the machine can help you.
You mentioned syntax highlighting earlier,
but also all sorts of more sophisticated compilers
that can give you really deep error messages
on what you did wrong.
And also linters and static analysis
that can point out common
errors and maybe even suggest corrections and all these things that make our lives easier.
But at the end, at the very end, programming is still extremely unforgiving, right?
If you compare it to having a conversation in natural language, it's still like you can spend the whole day, you know, just because you flipped two parameters to a function
or just forgot the semicolon somewhere, right?
And this is like extremely unforgiving,
even today.
And it's been a while since Fortran, yeah.
Oh, that's true.
Okay, so I think people probably know
like what syntax highlighting is,
and if they've had any experience, they probably kind of get that.
So static analysis, though,
maybe can you explain what static analysis is?
Yeah, in essence, it's really kind of an expansion of the compiler to check properties that are more than maybe just type checking, or the standard things that the compiler checks, to more sophisticated things. Maybe properties like, I know your program does not divide by zero, or you don't have a null dereference, or no overflow, and properties of that nature. And all the way to full program verification, which is checking that your program satisfies kind of a functional, logical specification. You know, that a given function that is supposed to sort an array actually returns an array that is sorted, and you check that statically, meaning without running the program, right?
So the static part here means that I'm not going to run the program; I'm going to statically check it at compile time and give you a result.
And now some of the listeners may scratch their head and say, wait, this sounds like something that is undecidable. Because if they've heard about the halting problem, they'd say, wait, can you check that the program terminates? Probably you cannot.
And so indeed the problem is undecidable, but you can solve approximations of this problem. And typically the way that static analysis works is that it gives you conservative errors, or conservative reports, meaning that if it says that there is no error, then it's guaranteed to be correct. But it may give you false alarms, saying, hey, your program may divide by zero, when actually it doesn't.
And so that has been quite useful at the end.
But it is also often quite frustrating for developers
because they chase down all sorts of reports
that turn out to be false alarms.
And that can be really frustrating.
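To make the "conservative report" idea concrete, here is a toy sketch (entirely invented for illustration, not from any real analyzer): a checker that abstracts every value to just "zero", "nonzero", or "unknown", and warns on any division it cannot prove safe. That over-approximation is exactly where sound-but-imprecise tools produce false alarms.

```python
# Minimal sketch of a conservative "may divide by zero" check.
# The abstraction tracks only three facts about a value:
# "zero", "nonzero", or "unknown" (could be either).

def abstract_value(expr, env):
    """Evaluate an expression to an abstract value."""
    if isinstance(expr, int):
        return "zero" if expr == 0 else "nonzero"
    return env.get(expr, "unknown")

def check_division(numerator, denominator, env):
    """Return a (possibly false) alarm if the denominator may be zero."""
    d = abstract_value(denominator, env)
    if d == "nonzero":
        return None  # proven safe: never a missed error
    return f"warning: {denominator!r} may be zero"

# Sound result: dividing by a nonzero literal is proven safe.
assert check_division("x", 2, {}) is None

# False alarm: suppose `y` was assigned len(items) + 1 and can
# never be zero, but the abstraction only recorded "unknown",
# so the checker complains anyway.
print(check_division("x", "y", {"y": "unknown"}))
```

The one-sided guarantee is the key property: "no warning" is trustworthy, but a warning is only a "may".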
So I think, yeah, I mean, so I've run static analysis before. And I mean, I think the compiler does, you know, some form of static analysis, and even now the better, but separate, tools often go into more depth. And this frustration you're expressing was sort of my experience, that you get a lot of stuff where it sort of just gets confused.
Which, already, we're sort of talking about the computer doing more, and talking about it getting confused or not understanding. And we're talking somewhat about how humans have some intent when they write their code, but then how they express it may not match, and then something else is trying to, like, understand it. So when we say the computer tries to understand what you're trying to do, or tries to check for a divide by zero, what are we actually meaning there?
So, yeah, without getting all academic here,
I think the right view is that the static analyzer, or the machine, let's call it, is constructing some abstraction of your program that it can reason about. And that abstraction does not always match your idea of what the program does. And this is where the confusion or the mismatch comes from.
And it's also the reason that sometimes or often
these tools cannot explain why they got what they got, right?
Or not in a way that you would understand it,
not in a way that would be useful for the human.
So the chain of reasoning that led them to the conclusion that you are amazed by or shocked by is not easily explainable to a human.
But you're spot on in the point of kind of understanding the intent and communicating intent to the machine.
And all these questions are really central, both for program analysis and program synthesis.
How do you know what is it that the human was trying to do?
I mean, I think like when I tried to explain to people who aren't in programming,
like what debugging or what these things are like, I just get blank stares. But I mean,
for people who have programmed a lot, I think a lot of people experience that where
you write some code, you have some intent, you run it. And sometimes it even turns out that actually the computer got it right and you got it wrong, that, you know, you coded it better than you, like, understood your own problem.
And I think for me, the interesting thing about,
you know, static analysis,
or even just compilers adding that
to their repertoire over the years
is like the number of things
you can hold in your head at one time.
So the interaction between functions and that,
you know, oh, hey, you're calling this thing over here
and it has this or that.
And did it really... well, I come from a C/C++ background. So like, you know, for me,
it'd be like, can this function ever return a null pointer? Well, if it's simple, maybe it can never
return a null pointer unless a null pointer was provided into it. You know, you can kind of start
keeping track of all of those things. And we know a computer is really great at doing repetitive things very quickly. And so holding all of those simple functions in its head, it can do those sorts of bounds checking,
like what is the biggest number that this could get? Is it an overflow or underflow?
And even if it can't tell you you're right or wrong, what I've seen be useful is that sort
of cooperation where it sort of come back and tells you like, hey, based on what I see here
right now, this is the range of values you can sort of expect out of this. And that's when
you sort of start to see, oh, wait, there's actually this like, iterative thing where
I'm doing something, the computer's telling me what it thinks I meant to do, I'm telling it that, you know, and you sort of do this. That happens to me, I guess, more now, as the error messages are getting better, and whatever, versus when I first started out,
what I would find myself doing is running, you know,
enormous blocks of code and, you know,
trying to run the program as a whole and just see if the right thing comes out
at the end.
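The range feedback described above, the tool reporting what values an expression can take without running anything, can be sketched with toy interval arithmetic (the functions here are invented for illustration, not any particular analyzer's API): track each value as a (lo, hi) pair and propagate bounds through arithmetic.

```python
# Toy interval analysis: represent each value as a (lo, hi) range
# and propagate ranges through arithmetic, so a tool can report
# the possible output range of an expression statically.

def add(a, b):
    """Interval addition: lows add, highs add."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """Interval multiplication: extremes come from the corner products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

# Suppose the analyzer knows x came from a byte (0..255) and
# y is a percentage (0..100). What range can x * y + 10 take?
x = (0, 255)
y = (0, 100)
result = add(mul(x, y), (10, 10))
print(result)  # (10, 25510)
```

This is the kind of answer that lets the tool say "based on what I see here, this is the range of values to expect", even when it cannot say whether that range is right or wrong.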
Absolutely.
So that's exactly the progress that we're seeing this tight loop between the
human and the machine, that the human is doing something.
The machine is like giving you feedback. And now, essentially, over the years,
we are getting closer and closer
to working together with the machine
when you write the program.
What you described effectively
is the iterative discovery of the specification.
When you start programming, you have kind of a loose idea of what it should do, but you don't have all the edge cases and all the subtleties, because, you know, you start with a high-level thought, a high-level spec. And as you write the code, you start to discover and unravel kind of the low-level details and the hidden complexity that is in there.
And what you described is the machine kind of helping you unravel that
as you make that progress.
And I think this is also where synthesis comes in,
which is the idea that if you express your intent clearly enough,
the machine can actually predict what is it that you are trying to do
from your intent and from context and
complete the code or complete the thought for you in a way that also kind of prevents
you from falling into the standard or like the common pitfalls around that area.
Yeah, that makes sense.
I think one of the things that is really, I think, where program synthesis becomes important is in figuring out all of the assumptions that matter and the ones that don't. Like, the tool complains that somebody could pass in the biggest possible integer ever, and now your program will not work or will overflow.
And it's like, okay, yes, that could have happened, but it's not useful, right?
And so it becomes, as you said, this symbiotic thing.
And so I think one of the big challenges is how to know what sort of errors are useful
for people.
That seems to be kind of where all the magic has to happen.
Right.
And I think this is exactly why I got excited by program synthesis
as opposed to kind of the static analysis tools.
So I've worked for, I don't know, several, many years
on static analysis tools.
And really, as you said, the symbiotic thing
with static analysis tools is quite tricky
because you have this thing that is always complaining and complaining about things that
are kind of like, hey, I don't care about that.
That's not the problem here at all.
And so it actually ends up distracting you more than helping you.
So it's a symbiotic thing.
But there's like this nitpicker that is
like complaining about the immaterial stuff and distracting me from my actual thing and breaking
my train of thought, right? And this is why I think that's the challenge with negative tools that complain all the time. And this is why I find program synthesis so compelling, because it says, hang on, I see what you're doing.
This may be really helpful for you, like this piece of information or this next line of code.
And I say, you know what?
I don't care.
Let me keep on typing.
I'm like, I'm on a roll, so I'll keep on typing, I won't even notice it. But if I'm stuck for this extra fraction of a second, and I look at the suggestion
of the tool and say, oh, yeah, that's exactly what I wanted. Thank you, dear program synthesis,
and I can consume this and keep on going. So rather than complaining, it is suggesting things.
And I think maybe the simplest analogy is kind of like assume that instead of type ahead on your phone, you had just a spell checker that complains all the time when you make the typing mistakes.
Right. And so type ahead is infinitely more useful. Right.
Yeah, I mean, I think like to take like a very specific example, I guess.
So, you know, Jason's thing, like I just write a function called, you know, increment or plus one. And, you know, the thing is complaining,
squiggling lines everywhere,
being like potential overflow,
like no bounds checking, whatever.
Like you're right, that's really annoying.
And I guess there's like,
as programmers progress
or even like their understanding of it,
what they want to do,
as Jason pointed out,
I'm going to be like,
yeah, that's not really practical.
I don't really care.
But on the flip side,
I know that is a risk.
And if we start talking about, like, how components get reused in ways that weren't expected,
if the tool that I'm using says, you know, hey, actually, let me add in the bounds check
for you, then unless it's performance critical, again, my background, I'll probably let it
do it, right?
Like, oh, if equal to int max, you know, just do nothing.
Like, okay, well, at least it didn't increment,
but at least it doesn't, you know, overflow
and give me a number smaller than what I put in.
So I can at least say that, you know,
now I know this function never, you know, shrinks
and only grows, but maybe is equal in some small case.
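The bounds-checked increment being discussed is a saturating increment: at the top of the range it does nothing rather than wrap around. In C it would be a comparison against INT_MAX; here is a quick sketch simulating a 32-bit int in Python for illustration.

```python
# Saturating increment: clamp at the maximum instead of overflowing.
# INT32_MAX stands in for C's INT_MAX on a 32-bit int.
INT32_MAX = 2**31 - 1

def saturating_increment(x):
    """Return x + 1, except at the top of the range, where it stays put.

    This guarantees the result is never smaller than the input, which
    is exactly the property discussed: the function never "shrinks",
    it only grows, or stays equal in the one edge case.
    """
    if x >= INT32_MAX:
        return INT32_MAX
    return x + 1

assert saturating_increment(5) == 6
assert saturating_increment(INT32_MAX) == INT32_MAX
```

A synthesis tool suggesting this guard (versus, say, throwing an exception) is exactly the framework- and language-dependent judgment call the conversation turns to next.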
And so having the synthesis, like you say, actually applying it usefully is in some ways more complicated, because you first need to understand there's a problem, and understand, like, an acceptable solution. That, like, hey, an acceptable solution here is to bounds check and do nothing, or to bounds check and throw an exception. And what framework I'm in, what language I'm in, I guess, like, that could vary wildly.
Yeah, in a sense, it is kind of more complex, but it's also more in your flow, right?
So abstractly you're correct that I need to understand more,
but I need to act less in a sense.
I need to look at it and say,
yeah,
I don't care about this.
And like,
whereas with the static thing,
I'll have to work for the tool to satisfy it,
right?
Because I'll have to, like, add an ignore-bounds-check on the thing, or I'll have to do something actively to make these things stop complaining.
And I think that is the frustration
that a lot of people have around that.
I think synthesis is kind of like the cure
for a lot of these things.
Do you find, like, you know, when you talk about static analysis or synthesis, I mean, some languages aren't used so heavily anymore, so maybe excluding those, but of languages that people would encounter every day, do you find that people's acceptance of those kinds of tools varies based on the kind of language they're using or the kind of development they're doing? Or are people generally pretty open to having the computer help them out more?
Yeah, I think people are very open.
I think there's kind of like the bottom 5% of developers, you know, the really, really new ones who don't know what's going on, who are going to get really confused by synthesis tools, or potentially confused, because they're getting these suggestions and they don't know what it is that they're getting and why. So it's kind of like, it's going to be tricky.
And the top like 1% of developers who are doing device driver development, one-off thing, algorithmic,
this or that, the suggestions are not going to be probably on target because their intent
is so one-off and complicated that this doesn't match any of the common distributions of how
people write code.
So either they would need some specialized model for device driver development, which is kind of like almost its own language, right? It's like its own set of idioms. Or they would rather not use any tools, because, also, you know, they work in vi, and all the stereotypes around that, right?
So, I mean, I think like
we've had those editor debates before.
I think even on this show,
we've talked about it.
And I think people change over time.
And I think you can get lots of tools
to do lots of things,
depending on how much you're willing
to sort of spend on it.
But for me, the realization I've had
of where I am now, at least,
is this computer cooperation,
which is, again, that the computer, just as simple as, hey, there's this function,
where is this function coming from? And sometimes it actually turns out it wasn't the function I
thought it was, it's pulling it from somewhere else. And having used various tools, sometimes
that is a guess based on, you know, code match.
And sometimes it's based on actually, you know, understanding how it's being built.
And this is leading me, I am going somewhere.
And that is that, you know, across the different platforms, different code bases, different
frameworks, there's a lot of difference.
How does the tooling, sort of, like we said, build an abstraction? But, like, just operationally, how does that work? If we have programs in C++, Java, Python, is that intermediate common enough to be useful and reasoned about, you know, across all of them, with, you know, frontends and backends into and out of those things? Or do we need sort of some understanding that differs? You know, like you were saying, for a device driver, you might need something just entirely different. How common or unique are each of those scenarios?
So, yeah, it's hard to discuss this in the abstract,
so I think we need to concretize it a bit more.
So most of the synthesis tools, or let's say the AI-for-code tools, do have some semantic aspects that are language-specific. They have some, let's call it, extraction procedure that extracts some representation, an intermediate representation, from the language, which includes information about maybe types, maybe how data flows between variables, maybe some other things, which is quite language-specific. And they extract it to a common representation, much like a compiler does before it generates code for the backend.
So for people who don't know, like, you know, GCC or something like that, like standard compilers, they may have a lot of frontends and a lot of backends, but they communicate by extracting, or by first translating, the frontend language, or the source language, into some intermediate representation, and the backend works off that.
So similar to that model, static analysis tools or program synthesis tools also typically have frontends that extract some semantic information into an intermediate representation, and then the entire backend works off that.
So yeah, there are some language-specific, let's call them, features, but most of the machinery is language-agnostic.
I see.
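One way to picture the frontend/common-representation split just described (all names here are invented for illustration, not from any real tool): each language-specific frontend extracts a crude common representation, here just the sequence of function-call names, and a single language-agnostic backend rule runs over it.

```python
# Sketch: per-language frontends extract a common representation
# (here, simply an ordered list of called function names), and one
# shared, language-agnostic backend check works on that form.
import re

def python_frontend(source):
    """Extract call names from Python-ish source text."""
    return re.findall(r"(\w+)\s*\(", source)

def java_frontend(source):
    """Extract call names from Java-ish source, stripping receivers
    like `stream.` so both frontends emit the same representation."""
    return [name.split(".")[-1] for name in re.findall(r"([\w.]+)\s*\(", source)]

def backend_check(calls):
    """Language-agnostic rule: an open() should be paired with close()."""
    if "open" in calls and "close" not in calls:
        return "warning: open without close"
    return None

# The same backend rule fires or stays quiet regardless of source language.
assert backend_check(python_frontend("f = open(path)\nprocess(f)")) is not None
assert backend_check(java_frontend("stream.open(); stream.close();")) is None
```

Real tools extract far richer facts (types, data flow), but the shape is the same: N frontends into one representation, one backend out.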
So I don't know too much about GCC other than it's GCC, so we can talk about Clang, and then I'd like to contrast it with, like, the JVM or, you know, Python. So for many languages that can be compiled by Clang, Clang has this IR, this intermediate representation. And it's actually a very powerful thing, because anything you can get into the intermediate representation, you can get machine code out for, for any backend that Clang supports. So, you know, PowerPC, x86, ARM, whatever your specifics. And if you want to support a new host system, it's just as simple as writing another one of these backends. And then optimizations can take place in this intermediate representation.
Okay, so Java has something, I guess, a bit similar, right? You compile to this sort of bytecode, and other machines can target that bytecode. I don't know what it is for Python. But once they're in these representations, and we're sort of at that, you know, intermediate level, how similar are the programming constructs? I've never really thought about it. Like, in Python versus, you know, Java versus sort of what would have come out of C++?
Yeah, I don't think they're necessarily very similar at that level,
but kind of the concepts of dynamic binding
or all these kind of fundamental programming language concepts are there.
The modeling can be quite involved.
For example, if you've ever looked at how Scala is compiled
to the JVM, if you had the kind of, I guess,
maybe misfortune of looking into these details,
this is extremely, extremely involved.
It's really beautiful conceptual and engineering work by the Scala team. It's really amazing, but it's really complicated.
Sure, sure. Okay. So yeah, so when we have the static analysis tools and synthesis tools,
and we have them per kind of thing someone is doing, I, I mean, I guess we can, we can sort of like keep moving
up the stack. So you have your code base, right? So I have, I have my code base and how my code base
behaves may be very different than Jason's code base. We do entirely different kinds of
programming. So how did these tools sort of balance the, like, like you sort of said,
the meat of the distribution, the sort of like average case versus the tailored case of, hey, on this project, we've made these, I don't want to say artistic, these opinion-based choices.
You know, like how does the tooling sort of balance between those?
Right. So you build some universal model, and that universal model captures how code behaves
in the, let's call it the common case, in the wild.
Let's say I look at all the C++ projects on GitHub,
and I will get some abstraction of what these projects do,
how they're supposed to behave.
I'll get some distribution of the expected code completions
or code predictions
that I need to do for these projects.
They would work quite well for the majority of new C++ projects.
But if you have made your own opinionated kind of decisions
that are veering off significantly from, let's call it a general population,
gen pop, then you would benefit greatly from training a custom model for your project,
your organization to kind of capture these notions, right? And so the bigger your code base is, effectively,
the more you would benefit from having your own private model
to better capture how you do things,
how your team is doing stuff.
So then, I mean, I guess you're starting to veer, obviously,
where we're going.
But the static analysis, code completion,
doesn't have to always be done by a trained
system, right?
I assume that prior, these were things that were hand done.
So hand modeling, hand feature extraction, hand suggestion,
and tuning, and then, you know.
Yeah, that's exactly right.
So maybe there's kind of a pause that we need to make here
and kind of distinguish, I guess.
I don't know if it's the second, third, or fourth wave.
So I'll not put a number on kind of which wave is it
of the static analysis tools.
But let's call it the previous wave of static analysis tools.
It uses hand-coded rules that say, oh, if you call foo
and then you call bar, then this is really a bad idea.
Like a check in, say, Java:
if X equals null and then the next line says X.foo,
then this is not a good thing, right?
Because you're likely to get a null reference.
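As an aside, a hand-coded rule of exactly this previous-generation flavor fits in a few lines. Here is a minimal sketch using Python's `ast` module, transposing the Java null-check idea to the closest Python equivalent: flagging `== None` comparisons, which should be written `is None`.

```python
import ast

# A hand-coded lint rule in the spirit of the previous generation of tools:
# flag comparisons written as "x == None", which should be "x is None".
def find_eq_none(source: str) -> list:
    """Return the line numbers of every `== None` comparison."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (isinstance(op, ast.Eq)
                        and isinstance(right, ast.Constant)
                        and right.value is None):
                    hits.append(node.lineno)
    return hits

print(find_eq_none("if x == None:\n    pass\ny = x is None\n"))  # → [1]
```

Like any rule of this kind, it encodes one expert opinion and applies it everywhere, whether or not it fits your project.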
And so these are like hard-coded rules
that capture the common case.
They're manually crafted rules.
And this is how linters like ESLint work, right?
They have like, I wanted to use like the S word,
but they have a bunch of rules that were written over the years
that capture a lot of common anti-patterns. And these may not be the anti-patterns
for you and your project, and this is why you get a ton of complaints from these tools,
because they're capturing generic things. So that was the previous generation of tools. The new
generation of tools is using AI to learn what it is that is being
done in your code base.
What are the patterns?
What are the anti-patterns?
What are things that should be avoided?
And it's actually using that information to give you much more targeted kind of alarms
and reports on your code in the case of checking, and also much better predictions of what code should be written or how
to complete your code when you're doing program synthesis, right? So this is kind of
the difference between handcrafted things and learned things. And the reason that we can do that, that we can learn all this rich information about
your project or about code in the universe, is because of the progress that has been made in
recent years in static analysis technology, in machine learning algorithms and models, and also in computational power. So, you know, we can throw huge GPUs and memory and data sets at the problem
and kind of train models that are able to capture information or rules that would otherwise
have to be handwritten by experts.
So when you're writing these handwritten rules, you're saying, you know, this set of operations
can cause, like we talked about before, overflow or underflow or undefined behavior. Or it's my
opinion that if you return a value, you should always use that value. And so, like I say, that's a
rule, right? So some set of experts deems sort of what is good or bad. But when you analyze code bases and sort of try to apply
machine learning, you know, in my head, I'm trying to think, how do you get those
sort of quality metrics?
I mean, that the code compiles is insufficient, that the code runs is, I mean, maybe that's
how you, that seems a bit tricky.
Yeah, I think that the crux of the matter is specification at the end, right?
Like the reason that I say a value that is being returned should be assigned somewhere.
This is a specification, that's a property.
And that property is probably kind of like a hygiene condition, right?
It's like, it's something that is good to have regardless of what your program is doing.
So that's kind of like a generic universal specification
that should hold anywhere, allegedly, right?
And this is why we check it.
And I don't know how to check for your program
that, you know, a student class
should always have the ID field assigned
because as a general rule, I don't know what is a student.
I don't know what is an ID.
And I don't know even how to express this as a general rule.
And I'm definitely not going to put it in the list of rules of ESLint
to check across the universe, because most of the universe
does not have the class Student
and a field called ID. So these kinds of specialized specifications for your project,
for your setting, are exactly what the machine learning algorithms can pick up and check for
consistency, right? And then, more importantly, with program synthesis, they can generate the right assignments to
make sure that as you're doing these things, you always assign the ID of the student.
They can make sure by construction that you're using these classes or using these pieces
of code the way they were intended.
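A toy version of this kind of learned consistency check can be sketched directly: mine how often each function's return value is consumed, then flag call sites that deviate from the project's own majority convention. The `insert_student` name is illustrative, not from any real codebase.

```python
import ast
from collections import Counter

def _callee(call):
    """Best-effort name of the function being called."""
    f = call.func
    if isinstance(f, ast.Name):
        return f.id
    if isinstance(f, ast.Attribute):
        return f.attr
    return None

def find_inconsistent_calls(source, threshold=0.8):
    """Flag calls whose result is discarded, for functions whose result
    is consumed at least `threshold` of the time elsewhere in the code."""
    tree = ast.parse(source)
    # Calls whose results are thrown away appear as bare expression statements.
    bare_ids = {id(n.value) for n in ast.walk(tree)
                if isinstance(n, ast.Expr) and isinstance(n.value, ast.Call)}
    total, discarded, sites = Counter(), Counter(), {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and _callee(node):
            name = _callee(node)
            total[name] += 1
            if id(node) in bare_ids:
                discarded[name] += 1
                sites.setdefault(name, []).append(node.lineno)
    flagged = []
    for name, n_discarded in discarded.items():
        used_fraction = 1 - n_discarded / total[name]
        if used_fraction >= threshold:
            flagged.extend((name, line) for line in sites[name])
    return flagged

# Nine call sites assign the result; the tenth silently drops it.
src = "\n".join(f"ok = insert_student(db, s{i})" for i in range(9))
src += "\ninsert_student(db, s9)"
print(find_inconsistent_calls(src))  # → [('insert_student', 10)]
```

The rule was never written by hand; it falls out of the codebase's own statistics, which is the shape of the learned approach described above.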
Today's sponsor is Rollbar.
Rollbar is the leading platform that enables developers
to proactively discover and resolve issues in their code,
allowing them to work on continuous code improvement
throughout the software development lifecycle.
Rollbar has plans for all situations,
from free to large enterprise. With Rollbar,
developers deploy better software faster and can quickly recover from critical errors as they
happen. We have a special URL at https://try.rollbar.com/pt/ for Programming Throwdown.
There you can find two free ebooks, How Debugging is Changing
and How Dev Experience Matters,
as well as sign up for a free trial of Rollbar.
Yeah, so maybe,
I'll use Jason's term I like here,
let's double-click on that for a second.
To go into this:
okay, so I've seen this, you know,
on Hacker News or Reddit or whatever.
Someone trains a hidden Markov model over a code base
and I can generate something
that on first pass looks like code, right?
So, and to kind of like unpack that a bit,
it's not my field,
but like what I understand about that is
I can generate new code,
which matches the pattern or statistics
of your code base, right?
So whenever you call this function,
insert new student into database,
everywhere in the code base,
you do that on the left hand side,
you always say Boolean successful equals this function.
And I can tell you when I emit this code,
I always put Boolean successful,
but I don't know why you do that
or why in this other case,
and if it's split 50-50,
then the generation will 50% of the time do it,
50% not, right? I can match the
statistics. So I think people probably have used tools where they've seen that pattern matching or,
hey, other places in this file, you always follow this word with that word. I mean,
I've used those tools before. I didn't find them that useful. So what is the objective when you're
doing this training of these models that differs from just sort of matching the statistics and actually doing what, I think, you were kind of alluding to, the right thing?
Yeah, so I think it's kind of the difference between a bicycle and a spaceship.
At the essence, they're kind of doing the same thing. They're like vehicles, they're moving stuff, but one is just insanely more powerful than the other.
And really it comes down to a question of how powerful these models are
in their ability to tailor to the context in which you make the prediction
and generalize over that.
And so we're kind of,
these days we're using models
with hundreds of millions
or billions of parameters,
neural networks with billions of parameters
to solve kind of this exact predictive question
of here's like the context
that I have in my editor
and what is it that I should be typing next?
And these models contextualize not only on, you know, the last five words that you wrote, which is like hopelessly naive.
They contextualize on the entire context that you have in the file, including natural language, including other peripheral information from the project,
including all sorts of other signals that you have in your environment in order to make a prediction.
And this, being able to contextualize and generalize over that, is the magic that gives you really accurate predictions that people appreciate and can use,
as opposed to just, you know, I flip a coin
and I suggest that the next word should be either, you know,
foo or bar, right?
Or like, yeah, you typed DB.
And I guess the next word is either add or remove, right?
It's like, and so it's much more,
the models are just like much more powerful than that.
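For contrast, the naive statistics-matching baseline the hosts described, where you suggest whatever token most often followed the previous one in training, really is only a few lines, which is exactly why it cannot capture intent. A minimal bigram sketch (token stream invented for illustration):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which token follows it and how often."""
    model = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        model[a][b] += 1
    return model

def suggest(model, prev_token):
    """Most frequent follower of the previous token, or None if unseen."""
    followers = model.get(prev_token)
    return followers.most_common(1)[0][0] if followers else None

tokens = "ok = insert_student ( db ) ok = insert_student ( db )".split()
model = train_bigram(tokens)
print(suggest(model, "="))  # → insert_student
```

This model only ever sees one token of context; everything Eran describes, file-wide context, natural language, project signals, is precisely what this baseline throws away.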
So are the, I guess like, you know,
again, like for people who may not be aware,
I mean, is it that you have a sort of hybrid system
where you're trying to actually sort of like
imbue some human architecture to these models?
Or are you sort of just setting up the problem
and allowing sort of an end-to-end solution to become trained?
Yeah, so, you know, like in reality,
all these systems are not really end-to-end
just because of kind of the engineering cost
of doing the inference end-to-end
is typically prohibitive,
and you need to do a combination of several models
in order to get the response on time.
And then there's some bias using semantic information.
And again, there are a ton of details
that I'm not sure that this is useful to discuss here.
But yeah, ideally, you would like to have it completely end
to end.
But in the world of practical engineering,
you need to do something more sophisticated than that.
Yeah, that makes sense.
Again, we're working in the realm
of near real-time synthesis, right?
You're typing stuff, and you have
to generate these predictions of what comes next in near
time to be useful.
And this is where the engineering gets really clever.
That's what I was going to say.
So, I mean, I guess it's one thing to be given an infinite amount of time to make a suggestion
versus I'm going to be typing it and you have to beat me to it or it's not useful.
And so, as you mentioned, I can imagine the amount
of engineering that goes in from, oh, hey, I read an academic paper that says, you know, we can,
you know, you know, suggest code completion at an, you know, x percent accuracy to actually being
able to deliver that in an IDE. So maybe to talk about that a minute. So, I mean, you guys have
built a system to do some of this. This is, you know, why you've thought about this so much and, you know, trying to integrate it.
At an architectural level, you know, how does that end up working?
So, you know, I'm typing in my editor and, you know, words are appearing on screen.
What's happening sort of in the background?
So there's kind of, I guess, half a million lines of Rust code that are running under the hood, doing very efficient neural network inference that involves between two and four different models that are being kind of combined together
and have different trade-offs in terms of response time and accuracy. And, you know,
if you're typing slightly slower, the engine may catch up and you may get slightly better predictions
because the stronger model kind of made it in time.
And if not, you may be getting results from a slightly inferior model. And the challenge really for a lot of it is actually the balance between the human and
the machine.
How do you make predictions at the right places that do not interrupt the human, right? We're kind
of obsessed with exactly this: finding the balance of when to make predictions, what kind of
predictions to make, where, what should be the confidence from the model before we throw it in
your face, what other kind of barriers are there before you interact with the human?
And how do you get into the flow of the human in a natural way, such that the human
can easily ignore it if it's not helpful, but actually consume it if it is helpful?
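The multi-model trade-off described here, where a stronger, slower model's answer is used only if it beats the latency budget, can be sketched roughly as follows. The model functions, names, and timings are placeholders for illustration, not Tabnine's actual architecture.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Placeholder models: real systems use neural networks with very
# different latency/accuracy trade-offs.
def fast_model(context):
    return context + " <fast suggestion>"

def strong_model(context, delay):
    time.sleep(delay)  # stands in for heavy inference
    return context + " <strong suggestion>"

def complete(context, strong_delay, budget_s=0.2):
    """Return the strong model's answer if it arrives within the latency
    budget, otherwise fall back to the cheap model's answer."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(strong_model, context, strong_delay)
        fallback = fast_model(context)  # cheap, computed inline
        try:
            return future.result(timeout=budget_s)
        except TimeoutError:
            return fallback

print(complete("def read_lines", strong_delay=0.0))  # strong model in time
print(complete("def read_lines", strong_delay=1.0))  # falls back to fast
```

One simplification worth noting: on timeout, the pool shutdown here still waits for the slow call to finish before returning; a production system would keep a persistent worker and discard late results instead.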
All right, and there's a lot around that.
So yeah, to pick maybe just even an example there,
how, in practice,
do you guys approach that problem?
One example I can think of is how much code to suggest.
So there's, I could complete your line.
I could suggest your function.
I could suggest your whole program.
I mean, we probably can't do a whole program yet.
Maybe we get there in a minute.
But how do you guys sort of balance off
trying to figure out how much
to end up sending up onto the screen?
Right. So luckily, we've been doing this for a while, and we've been serving millions of users.
So we actually run some experiments in the wild to find out, like, what kind of prediction horizon is most useful for people.
And it also depends on kind of the intended,
again, it depends on context,
but let me put that aside for a second.
It turns out that what humans like the most
is pieces of code that they can make snap judgments
about the correctness of.
So the tight loop that works best is: there's sufficient
context, the human writes something, and the machine suggests something that is easily
identifiable as useful.
So like kind of, let's say complete to the end of the line,
but something that is very idiomatic.
So if you see it, you will know that, yeah, that's what I meant, right?
So this is kind of what we call internally the remind me model, right?
It's kind of code that I've written 80,000 times.
I know what I'm going to write.
If I ask a human next to me, they
know what I'm going to write. My classical example for that is read the Python file line
by line. That's code I wrote I don't know how many times in my life, probably thousands of
times, right? And if I see it, I know that it's correct. I don't need to
ruminate about it. It's like, yeah, that's it. And so that's
kind of the classic case in which
I can complete more than one line because
it's really idiomatic. When you see it,
you know it's that. It's what you need.
It's kind of code that otherwise you would
copy from Stack Overflow, right?
That's kind of the
idiomatic thing to
think about it, right?
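For concreteness, the "remind me" snippet cited above, reading a Python file line by line, really is the kind of completion you can judge at a glance:

```python
import os
import tempfile

def read_lines(path):
    """The idiom: iterate over a file lazily, one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Quick demonstration with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n")
print(list(read_lines(tmp.name)))  # → ['alpha', 'beta']
os.unlink(tmp.name)
```

There is nothing to ruminate about in the generator function itself, which is exactly what makes it a safe multi-line suggestion.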
And so all of this is happening
sort of local on the developer's machine.
And then how does that split happen?
Obviously, some stuff probably needs
to happen in a server somewhere.
Yeah, so what we do technically,
what we do in Tab9 is we allow you
to configure the architecture or the kind of configuration that you'd like
to run in.
You can run everything completely locally, even air-gapped, so you can run it without
network at all.
Or you can run some of the inference on the cloud, which obviously gives you better completions
because you run with stronger GPUs than what you would otherwise have
CPU-only inference on your machine.
Or you can run completely on the Tab9 cloud, or
you can create your own server in your organization and run it.
So basically, you can control which kinds of models are being deployed
and where do you want to deploy them. I think that's really useful for developers
to be able to control where inference happens,
especially because inference does require
some context from your IDE, right?
So you may be quite sensitive
to where this stuff is being sent,
depending mostly on policy of your workplace.
Yeah, that can vary wildly.
So before we dive into like, you know, talking about tab nine specifically, sort of my last
comment here is you mentioned trying to maybe come up with how many waves and you refuse
to give a number, which is fine.
But then like going forward, I mean, what do you sort of see the future?
I mean, since I was in school, everyone always said, oh, maybe one day we'll just, you know,
tell the computer what we want and it'll just, you know, write the program for us.
And people scoff at that.
And, you know, maybe that's not exactly the future that that shaped up.
But I mean, and it's fine that this probably gets highly into opinion.
But, you know, if we sort of look maybe enough out sort of 10 years, you know, 20 years, what do you think will be the direction we head in for how developers
interact with their code?
So I think, first of all, it is pretty safe to say that two or three years down the line,
all code will be touched by AI in one way or another.
Either it will be generated by AI in parts, or it will be reviewed by AI, or tests will be generated by AI.
Something will be done by AI to automate the mundane parts of the job, right?
There are so many repetitive work being done, and a lot of it really can be and should be automated by existing AI machinery.
And I believe that this is already happening and it's going to accelerate as these tools
become more mainstream.
So I think that's like an easy prediction to make and I hold it very strongly.
Looking 10 years down the line, this is really like speculation and opinion.
I think in specific domains, it is going to be the case that you're going to see a lot of automation.
Like if what you're doing is writing components for UI of a particular area, right?
I know UI for medical devices, right?
It has to be very specific.
And if it is, then a lot can be automated from intent. The reason that it has
to be specific, or domain-specific, is because, again, it's all a question of intent. How do I
express as a human the intent to the machine? And this intent is always going to be very partial,
right? Otherwise, I'm going to have to write a lot of English prose to express what it is that I want. And this English prose is going to be ambiguous. And this English prose is going to
be harder to debug than actual code. So I don't believe that general-purpose programming is
ever going to be replaced by English. That sounds preposterous to me. Maybe I'm wrong.
But in domain-specific things,
I think we're going to see a lot more automation,
but it's not going to be...
I conjecture that it's going to be more along the kind of low-code,
no-code idioms that, you know...
Yeah, yeah.
It's similar to Wix, in a sense, doing websites, right?
If you're doing website design,
you can get a lot done without writing a single line of code.
But this is a kind of domain-specific system
that can assume sensible defaults
when you don't provide them, right?
And so you can provide zero information almost to Wix
and it will do something sensible because the defaults have been baked into the system.
And this is going to pop up more and more. More domains are going to be automated, I think, in that flavor.
And for general purpose programming, which is always going to be around, I don't think anyone should be worried about their job.
The job is so much about specification discovery and not about specification implementation.
It's not like no developer gets like five pages
of English descriptions of the function
and then goes to implement it, right?
They get the higher level specification
and what they actually do is like carve out
the real specification with all the details from it.
And programming is about specification discovery, really.
It's not about translating English to code.
Just to riff on like what you're saying,
I mean, I think we have to get there.
And the reason why is because
something I've noticed happening and
it makes sense is when we say programmer, I mean, we tend to think about, you know, someone at a,
what do they call it? I always forget what the latest acronym is, but like a Silicon Valley
company or startup or whatever. But in practice, there are way more people writing code outside of
those areas than inside them. And I think that's going to continue to grow
as software permeates just everything.
I was talking to a gentleman just like a few days ago
and he was saying, well, I have this idea,
but like, you know, I don't know how to find programmers
and I'm just thinking like, yeah, like,
I mean, what you're saying sounds cool,
but like, you're going to have to convince somebody
who can be paid a lot more
to do something a lot more worthwhile.
And that's not to say your idea isn't.
It's just like, and I was very, I didn't say that, right?
But like, it's not that your idea is bad.
It's just that it's so niche
that the return there
is just never going to be big enough to pay,
you know, someone what they can make,
you know, at one of these other, you know,
larger addressable-market companies.
And so I think as you look across just every company needing,
you know, some amount of programs to be written and software to be developed,
we have to figure out collectively, or I guess someone figuring out collectively how to like,
empower people to do program writing at the level that can be trained more easily,
rather than requiring this, you know,
sort of academic background, you call it a sort of general purpose program. I think it's a fair
term for it, you know, being able to fully debug something, write completely custom greenfield code,
you know, this kind of stuff, but instead just turn out another version of something and make
small tweaks. And, you know, people in mom and pop shops are going to have to be able to do that.
And something like you're saying is a way that happens. I don't know if it's the way it happens, but it's a way
we sort of get this stratification, where programming becomes a much more diffuse term, all the way from
this sort of low-code, no-code down to the people optimizing assembly inner loops for
microsecond improvements on frame rates of AAA video game titles.
I think there's another
trend that will for sure
become stronger, which
is making programming
more forgiving in general.
I write a program,
it should run and do something.
And we've made huge
progress moving from
C, C++ to, like, Python and JavaScript, right?
The barrier to get hello world, right?
The time to hello world has shrunk considerably
moving to these languages.
And I think we're going to see more of that
by the environment making also some sensible assumptions.
If what you're doing is writing like some website using Vue, React, whatever
is the latest cool thing to write website frontends,
then there are sensible defaults that could be made,
simplifying a lot of the grunt work that has to be done right now.
I think this is another trend that will continue to grow with the assistance of AI, because
you do need some sort of intelligence to figure out how to deal with the complexities there.
Yeah, one thing I've seen that's been really cool is, I mean, I saw this first with Wolfram
Alpha, and now there are some startups doing it,
but basically ways to query knowledge stores, like query databases. And you're seeing things like,
give me the average revenue for all the people in Switzerland, all the customers in Switzerland.
And just from that English sentence. So that's, as you said, that's something very domain specific.
If you just have an arbitrary database called Foo and all the columns are called bar and baz, it can't do anything. But
it can pull out unique things, like, oh, this column is called country, and I just know when
people make a column called country, I can assume what they're going to do. And so, yeah, you're
starting to see this with databases and those queries, and I think it's just a matter of time before it gets to code as well.
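The Switzerland example maps to a tiny, deliberately domain-specific translator. The sketch below invents the table layout, column names, and supported phrasing purely for illustration; it leans on exactly the convention described, that a column named "country" means what it usually means.

```python
import re
import sqlite3

def to_sql(question):
    """Translate one narrow English pattern into parameterized SQL.
    Note: the column name is interpolated directly (it can't be a SQL
    parameter), which is fine only for a trusted toy example."""
    m = re.search(r"average (\w+) .*customers in (\w+)", question, re.I)
    if not m:
        raise ValueError("question not in the supported pattern")
    column, country = m.groups()
    return f"SELECT AVG({column}) FROM customers WHERE country = ?", (country,)

# Toy data for the illustrative schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, country TEXT, revenue REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [("a", "Switzerland", 100.0),
                ("b", "Switzerland", 300.0),
                ("c", "Italy", 50.0)])

sql, params = to_sql("give me the average revenue for all customers in Switzerland")
print(db.execute(sql, params).fetchone()[0])  # → 200.0
```

Notice how much the sketch bakes in; the ambiguity Eran raises next (what exactly counts as "in Switzerland"?) is precisely what this hard-coded mapping papers over.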
Yeah, you're starting to see it,
but it's always super subtle.
And from what I've seen, at least,
you write this in English and then say,
but wait, when I said Switzerland,
what exactly did I mean?
Did I mean, like, also the part of Italy
that is, you know, working in Switzerland,
or only Swiss citizens,
or what exactly?
You get into all these
subtleties, and it's actually the job of
the developer to kind of go back and say
but wait, there's actually a question
here. Is it the citizens of the
country or
people who work in the country or what
exactly is the definition of these terms
that you're using?
And this is exactly where the programming language doesn't let you be ambiguous.
Yeah, that makes sense.
And this gets back to your code synthesis.
Like, well, you know, I've seen 10,000 people
ask the same question,
and 9,900 of them had these assumptions.
So we can use statistics to just give people a prototype.
And then they'll still have to look at it, but we'll get it right so often that it's super useful.
It's like Alexa.
You know, Alexa, you know, my parents have like a thick Italian accent.
And so Alexa gets maybe 90% of what they say right now, which is amazing.
It used to be 0%.
But at some point, it climbed up.
And even before it hit 90%, it got to the point where it became useful.
There was a crossover point where they could just tell Alexa, set a timer or set the timer or whatever.
I did a terrible mom accent.
Sorry, mom.
But they do, like, you know,
set a timer or something like that. And it's correct enough that, you know, when it's wrong, they just try again or use their phone or something. And it's still a net positive
productivity. Yeah. Cool. Well, let's talk about Tab9 for a minute. So tell us a bit
about what it's like to work at Tab9. Are you
guys currently, like, wholly remote? I mean, the world's a crazy place these days. How are you
guys sort of handling that?
Yeah, so we're mostly working remotely due to kind of COVID restrictions. We do have a physical office, which is mostly empty, I guess. But people do come in.
Yeah, we have some remote people in the US as well.
And we're always hiring interesting talent.
On hiring, do you guys do internships?
What kind of background are you looking for from people?
I think internships are, again, really hard to do remote. I think you're missing a lot of
the experience and the immersion and, like, meeting a lot of people. So I'm not a huge fan in the
current, uh, climate, let's call it. So watch this space.
Okay, yeah, exactly. But definitely interesting. And we're always hiring people who have, like, an ML-and-PL background,
like a machine learning, programming languages background.
Tab9 works on the intersection of programming languages
and machine learning on representations of code
and learning models of code
and doing efficient inference for models of code
and stuff like that.
We're kind of probably the biggest Rust shop in Israel.
And we take a lot of pride in being Rust and Rust enthusiasts
and dealing also with all sorts of low level inference code using Rust and also assembly, actually, to be honest.
Yeah, I mean, that's fascinating in itself.
I have many questions about Rust and about, you know,
this sort of like low latency response time things
are very interesting.
I think that's an area of growth.
Anyways, can't get into that.
We're running out of time.
So, yeah, how can't get into that. We're running out of time. So, yeah,
how can people try out
Tab9?
Like,
what is it?
Are people able to use it?
Does it cost an arm and a leg?
How does it kind of work?
Yeah,
so Tab9 is completely free.
If you're like
a developer,
you can download it now,
install it,
and use it completely free
forever.
The free product
is very powerful
and useful. and we have millions
of developers using it in their IDE as we speak. I think it's really, really powerful
that the free is almost too powerful, the free version.
That might be biased, but yeah.
No, I'm serious, actually. But if you are working in a team, this gets exactly to what you said earlier, Patrick,
that you would benefit greatly from training a model for your team.
And these private models are how Tab9 makes money at the end.
We train private models for teams. You just connect your repo,
and you get a private model delivered completely
automatically for your team that knows the idioms of your team and the vocabulary, you
know, the entities and what's going on in your code base, and gives you much better tailored
completions for how you do stuff. And you can also customize it to say things like,
oh, you know, don't train on that legacy code.
We're actually trying to escape that.
Yeah, sure.
If you keep training on it, you get like this effect.
Yeah, you get the Jupiter-level kind of gravity
of the legacy code that you can never escape, right?
Yeah, I mean, that's like anytime we point our code counter at things.
We use protocol buffers, and they generate tons of code,
and your line-of-code count is swamped
by all this auto-generated code.
And I think there that, you know, it used to be,
I always thought, you know, oh man, stuff's expensive.
These developer tools are expensive.
But then now like starting to think a bit more
with a manager hat,
I realized actually like programmer time is very expensive.
Anything that helps people sort of go faster
and not break things and make mistakes.
Oh wait, I think I'm misquoting.
Anyways, you know,
anytime you can sort of help people in that domain,
like it's enormously valuable.
And I think people are catching up
that like programmer efficiency is a huge bottleneck. It's a huge bottleneck. And we see, you know, a lot of improvement when
using tab nine, like users report between 15 to 30% improvement in their productivity, depending,
again, it depends on how, let's call it, mundane the part of your coding is.
Or if you're using languages that contain a lot of boilerplate,
and Java comes to mind, for example,
then it's really, really useful.
If you're writing front-end code, it's extremely useful.
I think in terms of languages,
our distribution is kind of JavaScript,
Python, and Java,
I think maybe are the top three languages
of tab nine users.
Yeah, I think it's extremely useful
in all languages,
but these three see, like, huge adoption. Also PHP
and, I think, Rust. We use it ourselves, obviously, for Rust. This thing that you kind of alluded to also
intrigues me because this is, again, thinking with Manager Hat. It's sort of not talked about,
and it's pretty expensive, which is when you bring new people onto the project,
even if they know what they're doing,
like getting them adapted to the sort of idioms and styles
and all of that through code review,
takes probably even more than a one-to-one number of hours:
for the hours those new people spend writing the code,
other people spend reviewing and correcting the code.
But it's also frustrating.
You battle sort of, like, this isn't a personal thing, but egos come in. It's a very
expensive drain of hours on the team to bring new people up. So having a tool that sits
alongside of you and, in a low-ego-risk way, sort of encourages you to write
conformant code, I mean, to me, this actually is super exciting.
Yeah, it's low-ego.
And also, it's there for you in the sense that you don't reach the code review and get
slammed, right?
Because you already get something that is...
The worst case is that you've done the common thing, right?
So it lets you really punch above your weight, in a sense, right?
Because, like, you know, I'm at least as bad as the average. I will not fall below that. And I
think actually much better. And an interesting thing that happens with Tab9 is that when
you see the suggestion it also makes you think sometimes, like when I use tab 9,
I get some suggestions like, but wait,
that's not the way I intended
to do this, so what am I
missing? And actually, it turned out
several times that there was just a new
API that I didn't know about, right?
And people started using, but I'm like,
I haven't touched that
for a long time, so discovery there,
even if you end up not using it,
just being aware that this thing has changed is really interesting.
And this also kind of gives you a picture of what's going on, right?
As you're programming things that you would not be aware of otherwise.
So that's another aspect of that.
Nice.
So in the show notes, we'll have a link to Tab9,
to their Twitter handle, to Eran's Twitter handle.
If you have any questions or reach out to him
or you want to check out the product,
I mean, it's super exciting.
I'm going to go try it,
even though you didn't name my favorite languages.
We're going to go see how this goes and give it a whirl.
But I had a very enjoyable time.
Thank you for coming on with us.
Thank you so much for having me. I had a lot of fun.
Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons
Attribution-ShareAlike 2.0
license. You're free to
share, copy, distribute, transmit
the work, to remix, adapt the work,
but you must provide an attribution
to Patrick
and I, and
share alike in kind.