The Infra Pod - When will AI take over developers? Debate with Guido Appenzeller
Episode Date: August 21, 2023. YAIG maxis Ian Livingstone (Snyk) and Timothy Chen (Essence VC) are back, this time with another maxi, Guido Appenzeller (ex-VMware & Intel Group CTO), to debate how AI will change (replace) developers in the near future. There are more questions generated than answers, but dive in with us to talk about some of the possible things that can happen!
Transcript
So welcome to yet another Infra Deep Dive podcast and we got Maxis this time.
So we're super excited to get all of us quick intros to start.
I'm Tim from Essence VC. I'll let you take it away.
I'm Ian. I help Snyk turn itself into a platform and do a little angel investing on the side. And I'm super excited
to be joined by Guido from Andreessen. Could you tell us a little about yourself? You have like a
storied past. I recently looked at your LinkedIn and it was like, you know, 20 links deep of like
historic history. So I'd love to learn a little about you
before we dive into it.
I think I'll have a hard time keeping you in the job here.
Yes, thank you.
I'm originally from Germany,
came to the Bay Area to do my PhD,
you know, succumbed to the bad habit
of so many Stanford students of starting companies.
I started two companies,
one sold to HP, the other to Arista.
Then I was CTO for VMware for cloud security and networking, ran product for Yubico, you know, YubiKeys.
And then most recently before Andreessen Horowitz was CTO for Intel's data center division.
So, you know, Xeon CPUs, networking, storage, IoT carrier.
It's like a very, very big business unit.
And then since October here at Andreessen Horowitz,
having a ton of fun with AI and infra.
You know, I guess we really wanted to focus this whole episode on AI and LLMs and that
future.
And we're going to have a very specific topic here.
Tell us more, like, how did you get from all that founder research and a ton of lower-level infra hardware to jumping all the way up here, doing lots of software, AI, LLM stuff? What was the primary motivation and goal when you got into this space?
So to some degree, it's actually taken me back to my roots.
During my undergrad, I did AI, like computer vision, autonomous robots kind of things.
And back then, I think the neural networks were about a factor of a billion smaller or
so.
But other than that, some of the same technologies applied.
And so for a long time, AI was sort of a little bit the technology that never quite worked.
It used to be a joke that if something works, it's in systems.
If something's not supposed to work, it's theory.
And if it's supposed to work but doesn't, it's AI, right?
And 10 years ago, that was sort of correct, right?
You know, most AI demoed fantastically well, but it didn't have the robustness or the sort of unit economics for it to make sense for anybody to use it, right? And then the big hyperscalers figured it out, I think in the mid-2010s, right? And then, sort of, you know, suddenly, roughly last August, it really took off for the mainstream. The technology has always interested me, and, you know, I jumped right back in, and it turned out to be an incredible time. I think it's the first time, probably since the internet, right, that we've really developed a fundamentally new building primitive for systems, and that usually has very profound impacts on the entire way we build pretty much anything.
If you look at what the internet did, it means we can suddenly send communication at, I don't know, one hundred-thousandth of the cost of before. But it also meant that Walmart got replaced as the largest retailer in the US by Amazon, right? So it really sent ripple effects through the entire economy. And that's, I think, what's exciting me, right? We're building a completely new primitive here. It has very different properties. It's, you know, non-deterministic. It's not constructed bottom-up, but it can solve problems that we don't understand in a theoretical way. And it's just
amazing. So I'm having a ton of fun and trying to keep up with everything that's happening.
That's the best answer to jumping back in I've heard in a long time. Oh, that thing I studied for my PhD is now relevant, and I'm so excited about the future. It also goes to show how long
it takes for new novel things to actually make it to market.
Like if you look at the original neural networks in terms of research to where we are today,
that arc is super long.
One of the things I'm very interested in, I'm sort of intrigued to pick your brain about
is Tim and I spent a lot of time talking about trends and infrastructure.
Obviously, with LLMs, there's multiple layers here in terms of what they represent for the next generation, how software is built, who builds software, and the underlying, like literally the wafer that powers said software. Curious to zoom in and get your take on the stack, from the raw hardware to the infrastructure, which when I talk about infrastructure, I think
with things like, you know, the database or, you know, the object store or, you know, the network
router up to the like the usual layer of the people that are programming and stitching these
different components together. I'd love to sort of get your perspective, how LLMs actually change
it. It's the question I'm trying to pull apart all the time. And I think a lot of our listeners
are as well. So I'm curious to get your top-level take on those. The honest answer is we don't know yet. It's like the internet or email was just invented. Now predict how this will affect everything, right? It's an impossible task,
right? I think it's going to have ripple effect for a long period of time. We're still in a phase
where we're trying to take LLMs and retrofit existing software with LLMs. And something else is starting to emerge where basically you build things bottoms up around
this new compute primitive, right?
And I think that gives you complete structurally different solutions, if that makes sense.
That makes tons of sense.
When I think about the way that LLMs are making their way into infrastructure software, the
way we build, is that they're very much focused today on the user side, with GitHub Copilot being the greatest example. It's like, how can I expedite what a developer is
already doing? In many ways, it's very much, we think of the model of infrastructure, it's sitting
on the periphery, right? We're out here on the edge. And most of the times when I think about
the way that we adopt, like if I'm a buyer buying a new system, I'm not going to make a bet on a
core piece of my fundamental infrastructure, you know, switch out my database to some new database. I'm going to go try it out on the edge
and slowly build confidence and prove it out. And over time, it gets sucked in deeper
and deeper and deeper. And as Martin would love to say, it gets layered as you build
on things on top of it. When it comes to the developers, lifecycle developers, what they're
doing, what are the limitations of, let's say, GitHub Copilot today or LLMs in terms of the
developer feedback loop? I'd love to have a discussion around how that moves forward.
It's a great set of questions. So let's try to structure it a little bit. So I think the first
observation is coding is something that requires correctness. AI is much better at tasks that don't require correctness, like creating a really cool image, where there are many possible answers. Whereas with something like writing source code or designing a bridge, if you're 90% there, usually you don't get much in terms of brownie points. You have to nail the
answer, right? And I think the net result, as you pointed out, is that it's not yet the main thing,
but more in terms of periphery or assisting function, right? It's called Copilot for a reason and not Pilot, right?
It doesn't fly the plane.
It just maybe gives you hints.
Honestly, an actual copilot is a lot more useful than GitHub Copilot, in a sense, for flying planes, because they can do it if necessary.
But Copilot will just suggest stuff,
and it's more the intern, right, for your coding project.
For me personally, I think I'll never again write code without it, right?
It clearly gives me a productivity gain, specifically for new frameworks, new software.
It reduced my stack overflow searches by a large amount.
And the data that we're seeing seems to suggest that depending on how hard what you're doing is,
it can get you between a couple of tens of percent and up to a 100% speedup, right? Which is, hey, I have a $100,000-a-year developer, you know, in terms of cost to me as a company, and that person now can be twice as productive if I pay $300 a year. That's a pretty easy economic case. Not hard to understand; that's a good idea.
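As a rough, back-of-the-envelope sketch of that economic case (the numbers are the ones mentioned in the conversation, and the calculation itself is only illustrative):

```python
# Back-of-the-envelope sketch of the economic case above; numbers are illustrative.
developer_cost_per_year = 100_000  # fully loaded developer cost ($/year)
tool_cost_per_year = 300           # assistant subscription ($/year)
speedup = 2.0                      # "twice as productive" from the conversation

# Value of the extra output, priced at what it would otherwise cost to hire.
extra_output_value = developer_cost_per_year * (speedup - 1)
print(f"Extra output worth ~${extra_output_value:,.0f}/year "
      f"for a ${tool_cost_per_year}/year tool "
      f"(~{extra_output_value / tool_cost_per_year:.0f}x return)")
```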
The really interesting thing for me, as you pointed out, is, you know,
let's assume I would redesign the whole coding process, all the coding tools that we have, under the assumption that LLMs are sort of an everyday tool that's free and everybody has access to,
what would you end up with? It seems pretty clear that it will be completely different from what
you have today. I have not seen anybody yet who has a comprehensive vision for it. But I think
the way how I imagine it is that today, right, if I'm writing software, I run it through a compiler, and then I iterate until it compiles successfully, right? And then
I check it in. We have Git. So, you know, so other people may look at what I've written and then say,
Guido, let's modify this a little bit because of some other changes elsewhere in the program.
I think we need to take all of this methodology and map it back into LLMs, right? So maybe
in the future, I have a tool where I'm putting in a clever prompt that says,
this is a problem I want to solve.
Write some code for me or break it down.
And it breaks it down to subtasks, then writes code for each of these subtasks.
Then I look at the whole thing.
They're like, well, this looks pretty good.
But this particular subtask, you did the completely wrong thing.
So let me annotate the prompt there until it works.
And then I probably want to take all of this prompting
and check it into some kind of system, right?
So that if somewhere else in the program,
something changes in this,
and the LLM should take that into account,
I can basically re-execute that.
For this to make any sense,
we need a fully deterministic, like temperature zero LLM.
And we probably need a very different
code development environment, you know,
like VS Code doesn't quite make sense in that context.
I don't know.
I think it's a super fascinating question.
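To make that imagined workflow a bit more concrete, here is a minimal sketch of "check in the prompts and re-execute them deterministically." Everything in it is hypothetical: call_llm stands in for whatever model API you would actually use, and temperature zero is the determinism knob mentioned above.

```python
import json
import pathlib

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a real model call.

    Temperature 0 (greedy decoding) is what would make re-running the same
    checked-in prompt reproducible, as discussed above.
    """
    raise NotImplementedError("wire up a model provider here")

def build(spec_dir: str = "specs", out_dir: str = "generated") -> None:
    """Treat checked-in prompt specs as the source and regenerate code from them."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    for spec_file in sorted(pathlib.Path(spec_dir).glob("*.json")):
        # Each spec file is assumed to look like:
        # {"prompt": "...", "filename": "subtask_x.py"}
        spec = json.loads(spec_file.read_text())
        code = call_llm(spec["prompt"], temperature=0.0)
        (out / spec["filename"]).write_text(code)
        print(f"regenerated {spec['filename']} from {spec_file.name}")

if __name__ == "__main__":
    build()
```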
Yeah, yeah.
So I think we're already jumping into what we actually
want to go for.
So let's just start the spicy future.
Spicy futures.
You probably don't know what it is, but it's very simple.
We're talking about just hot takes, right?
And we usually structure this basically to talk about changes. Two to three years, five years, and 10 to 15 years.
Based on what we understand right now, what do we expect to happen both from the technology point
of view and from the product point of view? I guess you already started, so we can let you
kind of roll. If you just had to take a complete guess, you know, do you see more changes around developer-related workflows and things impacting even further? And what do they look like? To illustrate a little bit more, what would you like to see happen that's feasible in the near term?
Yeah, three years is not near term, man. It's really far out. I mean, let's put out some starting
assumptions because I think it might help a little bit, right? 100% of developers will use Copilot or
something similar. And Copilot is not just going to be suggest code to me, but it's going to be
a chat box where I can ask questions about code or give it instructions for the code.
The latest beta of VS Code with Copilot literally has a "fix bug" button,
or in Python, an add types button, right? Which, you know, you would have told me that a year ago,
I would have said you're completely crazy, right? How's that supposed to work? But this idea that
I basically have more high-level code-shaping primitives, right? You know, write a unit test,
these kind of things, right, that I can use for my code.
So that seems kind of, I think, just the linear extrapolation or just proliferation of what we
have today. If I'm too far off the reservation, please stop me here. No, you're killing it.
The question is, where do we go from there? And one thing is, so much about software development
today is around collaboration.
Right. You know, GitHub and pull requests has completely changed how we work on these things.
This so far is non-overlapping with LLMs. Is that fair?
There's nothing and no collaborative feature in Copilot whatsoever.
That probably needs to change next. Right. If you have a highly collaborative process, then that piece needs to become collaborative as well.
So how would that look like?
That's the question I keep asking.
You spoke about, like, we're going to check in prompts.
And, like, in reality, we're checking in prompts.
What is a prompt?
Well, it sounds a lot to me like a very detailed product requirements document that you supply to the LLM to generate the output.
And part of that definition of the prompt is going to be,
well, we know already when you write a prompt, you say, well, here's an example of what I have, and here's an example of what I want. And please generate, you know, take this high-level thing and then translate it into a lot of low-level stuff.
You know, take this sentence or two sentences and this base image and turn it into, you know,
Guido is like Iron Man or Captain America.
I think that's your image, right?
That's what these LLMs are great at, is really extrapolating from some seed text or some seed
into a fully-fledged output.
You said it's like a requirements document.
A requirements document is targeted at humans.
If I take input for an LLM to write code, that is
ultimately targeted at a machine, which is the same as source code, right? Source code gets compiled
to machine code. So are you saying this, in what sense is it closer to something for humans? In
what sense is it closer to something for machines? Does that question make sense?
It does make sense. It's a great question. It's a question I keep asking myself.
If you're checking in a series of prompts, which are generally input to an LLM, those prompts are also human-readable, right? Is it not possible
then just to flip that on its head and be like, well, skip the developer, I'm just going to write
a definition for what I want, and the system emerges. And then the next question of, okay,
well, that's amazing, because now you have moved the bar of who can create like an app, fully functional working app, way up the stack in terms of like people's ability. And so then you also have the
final question, well, what do developers do in the future? That's a good question. You know,
that's for another time, maybe for later in the podcast. But the next question is, how do you know
it's right? And I mean, we already have that today, right? We have unit tests, we have integration tests, we write them in code. There's no reason you couldn't write the tests as part of your doc, in the same way that we have: here's your input, and here's my desired output. That could all work.
And so to that point, to ask your question is,
we always go through step functions, right?
We always have linear step functions.
So today, we have human-readable, machine-compilable programming languages. But in the future, if you were to, say, go to where you and I are discussing, then you end up with human-readable, machine-readable, no intermediate, straight-to-the-machine language, which is a vastly interesting potential change in the way you would build, if you imagine that future.
Yeah. I mean, does that mean we're going to go back to test-driven development,
basically?
Potentially, maybe the job isn't to write code.
It's to just write the test
to whatever you want the output to be
and let the AI model just continue
to search for whatever it writes.
If we're going to see an even more powerful model in the future and you can't even reason about the behavior inside, let's just forget reasoning about any behavior. The only way to make sure it's going in the right direction is we just basically write objective functions up front, which are basically integration tests and system tests: this is all the behavior we want, this is all the things we're looking for. Let it just go and do its little evolutionary-algorithm type of thing until it finds a prompt that's able to describe it. Is that where we can go? But I can see that as actually maybe one approach to having such a powerful AI.
And our job is to just make sure the output looks the way we want.
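A minimal sketch of that "tests as the objective function" loop, assuming a hypothetical generate_code model call and pytest as the local test runner; this is the shape of the idea, not a real product.

```python
import pathlib
import subprocess
import tempfile

def generate_code(spec: str, feedback: str = "") -> str:
    """Hypothetical model call: turn a spec (plus prior test failures) into code."""
    raise NotImplementedError("wire up a model provider here")

def run_tests(code: str, test_file: str) -> tuple[bool, str]:
    """Drop the candidate code next to the checked-in tests and run pytest.

    The checked-in tests are assumed to import from candidate.py.
    """
    workdir = pathlib.Path(tempfile.mkdtemp())
    (workdir / "candidate.py").write_text(code)
    (workdir / "test_candidate.py").write_text(pathlib.Path(test_file).read_text())
    result = subprocess.run(["pytest", "-q", str(workdir)],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def search(spec: str, test_file: str, max_iters: int = 5) -> str | None:
    """Keep regenerating until the tests (the objective function) pass."""
    feedback = ""
    for _ in range(max_iters):
        code = generate_code(spec, feedback)
        ok, report = run_tests(code, test_file)
        if ok:
            return code
        feedback = report  # feed the failures back into the next attempt
    return None
```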
I see one issue there, which is, look, why are programming languages what they are?
It's not because we're trying to make something hard to read.
Maybe some programming languages are guilty of that, but the majority, I think,
is not, right? It's more that accurately describing a solution in a very informal
language is actually extremely hard, and having a formal language makes it vastly easier.
The nice thing about a programming language is that it easily lets me express something that
takes into consideration all the edge cases, which plain English does not. Now, why does it still work if I write a specification that, at the end of the day, the right software comes out? I mean, often,
in many cases, it only works because the developer comes back and says, like, hey, Guido, what on
earth did you write there, right? I have no idea if this means A or B or C, right? So there's a
dialogue back and forth, right? The other half why it works is because the developer knows a
tremendous amount of context, right? They've sat in the
meetings, they've heard, you know, they understand the high level picture. So my current guess is,
if you give an LLM, just natural language and ask it to write a program based on that natural
language without anything else, that's an unsolvable problem, because it's an ill defined
problem in a sense, in the sense that you can't describe the context a human would have
into text and then give it to the machine. Like, the error rate is too high, or the
amount of knowledge is too difficult. Is that what you're saying? Yes. And the context window is too
short. There needs to be an interactive process in many cases. If you write a spec and throw it
over a wall, what's the chance of a program coming back that does exactly what you want,
right? I mean, I think that has never happened in my career. It's usually like,
what on earth is this, right? And then you sift through it.
It's a super solid point. And so, I mean, it comes to the point that to have a fully described
working product may be impossible. I guess the other question is, how do you simplify
what you're describing to the LLM, and the repertoire of tools the LLM has to pull solutions together, so that it's a solvable problem space.
One of the things I think about in this discussion
that prompted it as we were discussing is,
well, the LLM doesn't actually have to understand
how to generate every piece of line of code.
I mean, in this, developers, we use libraries
and we buy services and we spend a lot of money with Amazon
and we use Snowflake.
There's no reason an LLM can't do the same thing, right?
It can know that, oh, I am Snowflake's LLM, this data engineering project,
I'm going to use all Snowflake tools, and I have been programmed to use the ecosystem.
So on that level, it seems highly solvable.
So I guess it's about level of abstraction, how far you're going to go up the stack
away from the core problem that LLM can actually significantly solve.
But then it brings up another interesting problem,
and this is where you get error rate on top of error rate on top of error rate,
which is, well, then you could string multiple of these agents together
where they're interacting and programming,
which I think is probably the more middle ground place we end up with.
I wouldn't say three years.
I think that's ridiculously ambitious.
But a decade from now, where a company buys the Datadog agent, and they buy the Amazon
agent, those are your programming pairs that live across your entire SDLC, from the time
you write code to operations and back.
That's where I go based on this conversation that we're having.
Is that insane?
Are you following where I'm going?
I think it's spot on to me. I mean, I think it was Yann LeCun who pointed out that for any sort of autoregressive model, error accrues exponentially, right? If I always look at the
past, whenever I make a small error, that error is now reflected in the past and I add error on top,
right? And that just gets worse over time. So I think whatever we do there, my guess is you need a solution
where occasionally,
most likely through human intervention,
you pare back that error, right?
So it needs to be an interactive process.
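A quick back-of-the-envelope way to see that compounding (my framing of the point, assuming an independent per-step error rate):

```latex
% If each of n autoregressive steps is wrong with probability \epsilon
% (assumed independent), the chance the whole output is error-free is
P(\text{correct}) = (1 - \epsilon)^n \approx e^{-\epsilon n}.
% For example, \epsilon = 0.01 over n = 500 steps gives e^{-5} \approx 0.7\%,
% which is why periodically paring back accumulated error matters.
```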
I mean, how are we using this today, right?
If I'm using Copilot,
very often I'm writing a comment, right?
This function does something
and it suggests a function to me.
I go like, well, that's pretty good,
but didn't think of that particular edge case.
So I'm starting to expand my spec, so to speak, right? You know, my little comment on top
that says, you know, and if the input is X, the function should do Y, right? And then, you know,
it gets a little bit better until it works. I mean, to me, it's really taking this process and
making it scalable. You can almost think of an LLM as a compilation step. In goes your mini spec,
and out comes a program. And in many cases, you may have to change your input code to get the right output code.
And then we want to work this together.
I think this is all the thing we have to figure out.
One thought I have: today, OpenAI has this huge model, and we're all just kind of seeing the potential of it. But we can't rely on it. I can't check in a GPT-3.5 Turbo prompt without suddenly everything just breaking. As you know, the papers already show that it's being fine-tuned on its own. We don't know what's happening behind the scenes, and the model changes too. And so in the future, it feels like there might be possible iterations where you're going to have a bunch of uber models that we somehow need to be constantly adjusting for. It doesn't seem like we can check in any prompts at all. So then, should we fine-tune a bunch of smaller models that don't change as much, that we as developers actually understand? Our prompts are basically our source code to a specific compiler, in some way. Like, if we treat LLMs as compilers, then we have fine-tuned LLMs as different compilers that do some particular task really well.
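One way to picture "fine-tuned LLMs as different compilers" is just a routing table from task to a pinned, task-specific model. The model identifiers and the call_model function below are invented for illustration.

```python
# Hypothetical routing of tasks to pinned, task-specific fine-tuned models.
# Model identifiers are invented placeholders, not real model names.
PINNED_MODELS = {
    "generate_sql": "sql-codegen-ft@v1.3",       # fine-tuned for SQL generation
    "write_unit_tests": "testgen-ft@v0.9",       # fine-tuned for test writing
    "summarize_diff": "diff-summarizer-ft@v2.0", # fine-tuned for change summaries
}

def call_model(model_id: str, prompt: str) -> str:
    """Hypothetical stand-in for invoking one specific pinned model version."""
    raise NotImplementedError

def run_task(task: str, prompt: str) -> str:
    # Because each model version is pinned, the same checked-in prompt keeps
    # "compiling" to the same kind of output until we deliberately upgrade.
    return call_model(PINNED_MODELS[task], prompt)
```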
When I think of a PRD turning into the program, the developer's job, one, is to actually understand the business context, and also to understand the sort of service and technical aspects, like, what are all the edge cases I need to test for? What's all the site reliability stuff I need to do, right? What's all the, you know, the HA and all that stuff? We've built those technical muscles in our backgrounds. Can we take that as individual LLMs that can be integrated into some workflows? That's the SLA LLM that understands how to take a workflow engine and add some...
I don't know.
To me, the only way to have prompts to be transferable
means the models can't change as much.
If they change, all hell breaks loose. You don't know exactly
what the output is anymore.
If you look at it, OpenAI, after some pushback on the changes to GPT-3.5 and 4, said they're going to keep the older versions... Sorry, I think it might have just been 3.5 Turbo. They said they're going to keep the older versions around longer,
which I think is good.
I mean, look, the other thing that left a crater in the last couple of weeks was Llama, right? Where we now have an open-source model where you can simply grab the weights and freeze them, right? And say, like, look, I'm going to run with this. And I mean, if you switch from Python 2 to Python 3, you know, you probably have to edit your code. If you're switching from Llama 2 to Llama 3 in the future for your code generation exercise, maybe you have to, you know, re-edit some of your prompts as well, or your specs, or whatever they will be called. Like, how big is that switch,
right? Because I think for Python, we can learn all the rules, we can learn everything. Prompts,
I don't know how to learn the rules, right? It's like a black box, so we kind of have to like
morph with it. I feel like the developer workflow in the future will change quite a bit depending on what models exist. What is the right way we treat
the models in our workflows? And how do we consistently have the right outputs in different
situations? What is the right modality based on the technology? It's hard to tell.
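For the "grab the weights and freeze them" option, here is a minimal sketch using Hugging Face transformers, assuming you have a local snapshot of an open-weights model (the directory path is a placeholder):

```python
# Minimal sketch: run a frozen local copy of an open-weights model so the
# "compiler" can't change underneath your prompts. The path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./models/llama-2-7b-frozen"  # local snapshot you control

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False is greedy decoding, i.e. the "temperature zero" behavior
    # discussed elsewhere in the episode.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```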
One of the things I often think about is, what's the source of truth for the program?
Is the source of truth for the program in this world the prompts, or is it the code?
And today, in our world, with Copilot, the source of truth, it's my Python code, it's my Go code,
and the version of the compiler it's pinned to that's turning that into a machine,
and the instruction set that it's outputting.
Those three things tie together the source of truth and create some form of determinism in our world,
so that every time I compile, I usually get a program that works.
And that's not always true for various reasons, as we all know,
but largely speaking, it's true.
And so it's interesting,
we have a couple of challenges.
One is you have model drift.
Fine-tuning under the hood that you're unaware of is model drift.
The output is different every time
you run the prompt over time.
The next problem you have is your context window challenges, which is, how can I literally fit all of this text? Think about at-scale enterprise codebases, 10 million lines of code, 100 million lines of code, into an LLM fresh. And there are ways that you could pare that down. I'm sure we can optimize that problem to a certain extent so that it has all of the context, so you can then generate a semi-accurate piece of code.
And then to continuously do the loops, you have to have a system that's checking that,
hey, this thing it generated actually fits in.
That workflow is the part that I think is still deeply... one, we don't really understand it. And two, the two challenges I mentioned, which are context window size and the model drift problem. Which, you know, freeze the weights, and if you could stick with that, maybe that does solve the problem.
Those are things that challenge me to say,
actually, the source of truth still has to be the source code,
not the prompt. Is the source code really always the source of truth? Let's take Windows 11. You know, there's so many dependencies on libraries, tools,
build environments, unless you have access to sort of the standard one or the prescribed one,
you might actually end up with a different binary. Yes, you can pin all libraries and pin the compiler and so on. But I would say it's at least the source code plus the build system
in a sense, right? That's the source of truth. And for practical purposes, honestly, if I file
a bug report against Windows, I'm going to specify the binary, right? This is the version XYZ with
the following hash. That is the source of truth, right. But I think it's a bit more nuanced there, isn't it?
One question just leads to the 10 questions.
There's really no answer
in this kind of style of discussion.
But it's actually great
because this is where we're excited
to see the future might be, right?
I feel like we should move on
to talk about maybe in a more,
even distant future,
which I think is even more uncomfortable
because we haven't even solved anything
in the near term.
But do you see in 10 years software development change drastically?
Yes, I think it will change no matter what, right?
What kind of change do you like to see happen?
I guess maybe that's a better way to ask.
This is a big discontinuous innovation, right?
Which usually means a couple of things.
People who are able to harness it get more powerful.
People who ignore it or don't want to take it up, sort of fall behind and get sort of pigeonholed a little bit in the legacy corner.
So I would expect all those things happen just the same way.
I don't know, when the cloud revolution happened, if you were purely operating below the cloud layer, life got a little less interesting.
And you can only work for a small number of companies and there's less growth.
If you're jumping on that train,
there's huge opportunities ahead of you, right?
And I think the same thing is true now.
At the end of the day,
whenever something makes a task more efficient,
that usually increases demand for the task
just because it lowers cost.
So my guess is we'll see more software development
in the future because it's easier.
My guess is if you're a software developer who can harness these new technologies, you're probably going to end up in a much better spot than before, because you've essentially gotten more powerful, right? You've got superpowers, and you can now, with like a couple of prompts, you know, create crazy complex programs. That's an incredible opportunity, right? So I think if you're a software developer, this is a super exciting time to be alive. There's sometimes people asking, like, you know, is it still worth studying computer science? My answer is always like, hell yes, right? We just found something that makes
it even more powerful, this is a great time to do it. I mean, the way I think about it is,
so in order to answer your question, I have to scroll back and say, what do I think LLMs are
going to like first be autonomously doing? I think it's going to be small changes. If I think back
to the first time Dependabot on GitHub opened up and upgraded my package.json, I was like, this is magic, and then quickly also found it annoying. It's incorrect, it doesn't do the full source code upgrade, and it's annoying.
But it shows the
brief idea of what
the very beginnings of autonomous agents could do and how that focuses engineers
on solving more interesting problems.
I don't have to deal with upgrading my packages or fixing some CVE
or rewriting some console.log someplace.
I'm focusing more time on being creative and using my brain
for what it's good for.
So I think the first place of innovation, the first place
we'll see real application
that drives real concrete productivity gains
both for the individual developer,
but also at large-scale enterprises
will probably be in these workflows around remediation.
Nobody wants to do it.
And it is very low on the priority scale
of some program manager, right?
To the point where you end up with a program manager
running some program that has to keep on winding.
I think if I scroll out and think more from where that takes us and how that gets productized, I think we end up with what I
talked about earlier, which is this world where it's kind of going to be like
in the Avengers when Thanos gets the glove. You're going to buy
different agents, and it's like each one's going to give you different powers. Some are going to give you the power of observability, right? The power of sight. And that's going to help you generate more correct code
at the front end of the stack.
As I'm programming, it's going to look into a bunch of runtime
context about how the system's behaving.
You should not run the for loop this way because that's going to blow up
the stack. This is a really hot piece of code to run at scale.
You could think about
security superpower,
which is like, hey, if you do this, you're going to
send PII in a log someplace.
That's a really bad idea.
More to what you said, which is
Copilot's a single-player experience. I actually think it will
probably remain relatively single-player
and the workflow that developers have
won't drastically change initially
in the next 10 years.
I do think there's an interesting question
of like 25-year timeframes
when we've really got these things in production
and we really have the next evolution
of these architectures,
both on the hardware layer,
but also the actual fundamental neural network.
And everyone has, like, these app-scale data systems, and we have years and years and years of feedback on what's working, and the UX is nailed down. I think that's interesting, and I find it difficult to think about what it looks like. But certainly, some of the conversations we had around the future, of how do you actually get to the level where I write a doc and there's a product, like, I think that future is possible. I just don't think we're anywhere near close enough to it actually existing in a way where you can, you know, in a night, actually build an app and turn it into 10 million users running it. We're just not there. But certainly, I do think we end up with the
copilot analogy is the right analogy. And I think the biggest problem we have, it's the fact that
they're non-deterministic. That gives us these great powers of creation. But on the flip side,
we need something else being the
reinforcement that's saying, hey, the thing you just
generated, it's wrong, and here's why.
We can change them to be
deterministic easily. Set the temperature
to zero, and you have a deterministic
LLM. Honestly, I think
there'll be many LLMs in the future that run
in a deterministic way for that reason.
Yeah, when I was thinking determinism, I'm not thinking
just of the input-output determinism.
I'm thinking of the fact that the context limitation
introduces error, right?
There's only so much data you can give the LLM,
so there's only so much of a, like...
I kind of think of the context window as
how big are the blinders you have on the eyeball?
Right now, the blinder is like a speckle of light,
and in the future, maybe it'll get bigger.
But I also think of non-determinism in that.
So that's what I meant.
But I think your point still stands.
Look, I think what we've seen with other LLM applications
is there's this notion of in-context learning
where basically, somehow, with a database query or something like that, you try to find the relevant data for this particular query, right?
And then just add it to the prompt.
And my guess is it'll look the same way in the future, right?
If I'm asking to write some code to call an API,
then I'm probably going to go out,
fetch the API documentation,
the relevant parts and stick that in, right?
And if not, it'll just come back
and ask an agent to fetch it
because it sort of needs that, right?
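A minimal sketch of that in-context retrieval pattern, where embed() and the toy index stand in for whatever embedding model and vector database you would actually use:

```python
# Sketch of in-context retrieval: find relevant documentation chunks and
# stuff them into the prompt. embed() and the index are placeholders.

def embed(text: str) -> list[float]:
    """Hypothetical embedding call (any sentence-embedding model would do)."""
    raise NotImplementedError

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class ToyVectorIndex:
    """Toy stand-in for a vector database like the ones discussed here."""
    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.items, key=lambda item: -dot(qv, item[0]))
        return [chunk for _, chunk in ranked[:k]]

def build_prompt(task: str, index: ToyVectorIndex) -> str:
    context = "\n\n".join(index.search(task))
    return (
        "Use the API documentation below to write the code.\n\n"
        f"{context}\n\nTask: {task}"
    )
```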
Actually, that's an interesting question to ask.
I think right now, if you're trying to play forward,
we want this PRD-to-product future to happen somehow.
Will developer tools companies in the future not ship developer IDE kind of tools anymore? Will we just ship vector databases with an API that's, like, very stuffed with very particular context of what you want to do? And also give you a particular LLM with a context knowledge graph for you to actually pull from.
And that's my API.
Maybe that's not crazy to think about.
Because if you use an LLM, right now the assumption is the LLM needs to be completely trained or, you know, fine-tuned or something. What if we give you all the context, that knowledge graph, for you to be able to grab from? Like, this is all the C# coding, you know, all the syntax and everything. And then it gives you the most complete, correct code, and then you use that as a way to figure out how to build some intermediate language.
You could go a step further, right, and just say, like, an API ships with an LLM that can answer questions about the API.
Yeah, that's true.
and so it's kind of meant it's maybe that LLM itself either is already completely
trained for the knowledge. The developers
in the future will be picking LLMs, which
ones to be able to compose my stack.
The future stack
is not, you know, I'm using
Go with Node.
This is like
I want LLM, you know, A,
LLM B, Ian, Guido, and
Tim, LL, you know, as my stack., and Tim, LL, that's my stack.
And those three have superpowers able to generate these kind of infrastructure and code
and somehow able to kind of piece that together with an output.
So basically, my first call to an API would be just a natural language description of what I'm trying to do, and it returns to me the code snippet that does the right thing for that version of the API, right?
Yeah, yeah. And we can guarantee that when the API upgrades, it only upgrades our agents, and the downstream tasks cascade; we can take the LoRA, you know, layers and just propagate them to everybody or something like that.
Yeah, Swagger++ or something.
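As a sketch of what "an API that ships with its own LLM" might look like from the caller's side: a hypothetical endpoint that takes a natural-language intent and returns a code snippet pinned to an API version. The URL and response shape are invented for illustration.

```python
# Hypothetical client for an API that ships with a code-generating LLM.
# The endpoint URL and response fields are invented for illustration only.
import json
import urllib.request

def snippet_for(intent: str, language: str = "python",
                api_version: str = "2023-08") -> str:
    payload = json.dumps({
        "intent": intent,           # natural-language description of the task
        "language": language,       # target language for the generated snippet
        "api_version": api_version, # pin the snippet to a specific API version
    }).encode()
    req = urllib.request.Request(
        "https://api.example.com/v1/codegen",  # placeholder endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["snippet"]

# Hypothetical usage:
# print(snippet_for("upload a CSV and kick off an ingest job"))
```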
Yeah, yeah. I don't know, it's interesting just to think about how do I ship knowledge when I ship the tools. Because if you just ship the tools, we have to absorb all the knowledge and figure out how to do all of that. But I think developers in the future should be able to be equipped with the knowledge way faster, because the AI LLMs are involved here, and our job is to know the right boundaries of context, where I know what this tool knows and doesn't know, and know how to compose.
I'm still doing the Lego-ing, but this Lego is no longer just piecing together code and Node Express ourselves. It's actually more like, okay, this LLM knows these ecosystems really well.
I know these LLMs know these things really well.
I can start putting the prompts together. I know intuitively,
and there's maybe another LLM that knows how to do the business context function testing
really well as well. And I know, I understand the boundaries somehow, which means temperature zero and having a way to actually see how that AI has been developed somehow. I don't know exactly how that happens, but it feels like that's what people will be interfacing with in the future.
There's sort of an interesting analogy there. And boy, we're pretty far out there. So, you know,
I have no idea if any of this is true or not. But if you look at image models for a second,
it's a different area. But in the past, if I wanted to explain to you a particular style, let's say the style of a famous painter, in a way that allows you to generate images in the same style, that was really hard. I mean, probably a textual description doesn't really work, and I can show you a bunch of examples or so, all right, but you know, even if I did all of that, it would still be extremely difficult for you to actually generate a painting that looks like his. And then what we figured out with Stable Diffusion is that we could actually essentially create a model, like a fine-tune,
right, that basically encapsulates that style. If I send you that model, you can simply give
that model a prompt and you get a picture that looks like that particular painter, right? And
that has caused all kinds of excitement and problems on the internet with copyrights,
but also with, I think, some really amazing, cool applications. If you take that and translate it to APIs, you could say in the
future, I might be able to give you an API documentation in a way where you can specify
a natural language thing you want to achieve with the API. And the model that I give you or that I
host will then generate the code in your particular language of choice to actually perform that task.
That's kind of a mind-blowing idea, right?
I mean, so much of what we do in software development today is sort of cobbling together APIs and understanding frameworks, and then this could potentially automate some of it.
It's really interesting.
And it also strikes me that when I think about this future that we're talking about, this
idea where we have these LLMs generating this code, that they're really taking a lot of
mindshare from me.
It also strikes me that while the barrier to entry is decreasing,
the debugging experience,
the amount of information you need to go in and figure out,
because you're not actually writing the system.
I remember the first time I used Ruby on Rails
and trying to get that thing to work because I made some changes.
It was like magic to me in the sense that it generated a hello world page.
And then it was like the most befuddling experience to me and trying to actually get it to work
once I messed it up.
Right.
Like, and I can only imagine what the world will look like and the experience of engineering
will be, or software development would be, using all of these LLMs tied together.
And like, at the end of the day, there's still going to be points where something's not going
to work the way you want it to work. And you're still going to get in there, you're starting
to figure out, well, is this, you know, a state synchronization issue? Is this some type of timing
problem? Is this like, there's all these complexities that just exist, maybe we can, you know, have
auto scale, beautiful underlying infrastructure that scales dynamically, and system just understands
how to do it. But it's very difficult to solve some of these problems that even humans have trouble understanding,
which is state synchronization issues is a good example.
So it strikes me that debugging the output of this
at the end of the day is still going to be
one of the most challenging things.
It's actually fascinating, right?
Because there aren't many opportunities to just kind of think about what might happen.
But I actually do think some of the things we talk about are not 10 years. These are doable in the short term.
I've never been able to predict the future even remotely correctly in a 10-year
timeframe in infra. But here's one example. When we looked at Pinecone early this year,
like in January or so, I'm not sure I understood the use case in the AI context. Four months later
or so, I think it was obvious to most people in the AI industry, this idea of in-context retrieval,
where you pull basically data, you use it as a way to find the relevant text chunks to feed into the
small context window of an LLM. Sometimes, going from not understanding something too well to everybody understanding it can happen in months at the moment. So, you know, in that sense, looking two years ahead is incredibly hard, right? We're all students. This is a big revolution.
Yeah, yeah, that's true. Well, I guess maybe we'll get you on next time to do another, like, three-months-to-six-months prediction.
I'd love that.
And then we can grade ourselves on how wrong or correct we are.
I think, on some of the stuff we talked about, my gut is that a lot of that will happen sooner.
We'll see the very beginnings,
the early beginnings of it
and how it influences the developer lifecycle
a lot earlier than we're all talking about.
And I just think that the impact those workflows have on the developer's day-to-day is a question.
It's like, is it a massive magnitude impact?
Maybe, maybe not.
But it's beginning and you can see the seeds of it and it's proving value.
Cool.
Well, thanks again, Guido, for your time.
And hopefully not the last time you come on our show.
Thank you so much.
Bye.