The Infra Pod - When will AI take over developers? Debate with Guido Appenzeller

Episode Date: August 21, 2023

YAIG Maxis Ian Livingstone (Snyk) and Timothy Chen (Essence VC) are back, this time with another maxi, Guido Appenzeller (ex-VMware & Intel Group CTO), to debate how AI will change (replace) developers in the near future. There are more questions generated than answers, but dive in with us to talk about some of the possible things that can happen!

Transcript
Starting point is 00:00:00 So welcome to yet another Infra Deep Dive podcast, and we got Maxis this time. So we're super excited. Quick intros to start. I'm Tim from Essence VC. I'll let you take it away. I'm Ian. I help Snyk turn itself into a platform and do a little angel investing on the side. And I'm super excited to be joined by Guido from Andreessen. Could you tell us a little about yourself? You have like a storied past. I recently looked at your LinkedIn and it was like, you know, 20 links deep of history. So I'd love to learn a little about you
Starting point is 00:00:45 before we dive into it. I think I'll have a hard time keeping you in the job here. Yes, thank you. I'm originally from Germany, came to the Bay Area to do my PhD, you know, succumbed to the bad habit of so many Stanford students of starting companies. I started two companies,
Starting point is 00:00:58 one sold to HP, the other to Arista. Then I was CTO for VMware, for cloud security and networking, ran product for Ubico, you know two, Arista. Then I was CTO for VMware for cloud security and networking, brand product for Ubico, you know, Ubiquiz. And then most recently before Andreessen Horowitz was CTO for Intel's data center division. So, you know, Xeon CPUs, networking, storage, IoT carrier. It's like a very, very big business unit. And then since October here at Andreessen Horowitz,
Starting point is 00:01:22 having a ton of fun with AI and infra. You know, I guess we really wanted to focus on this whole episode on AI and LLMs and that future. And we're going to have a very specific topic here. Tell us more, like how did you get from all that founder research and a ton of like lower level infra hardware, and you just jumped all the way up here now doing lots of software, AI, LLM stuff. What is the way up here now doing lots of software, AI, LLM stuff.
Starting point is 00:01:46 What is the primary motivation and goal when you got into this space? So to some degree, it's actually taken me back to my roots. During my undergrad, I did AI, like computer vision, autonomous robots kind of things. And back then, I think the neural networks were about a factor of a billion smaller or so. But other than that, some of the same technologies applied. And so for a long time, AI was sort of a little bit the technology that never quite worked. It used to be a joke that if something works, it's in systems.
Starting point is 00:02:16 If something's not supposed to work, it's theory. And if it's supposed to work but doesn't, it's AI, right? And 10 years ago, that was sort of correct, right? You know, most AI demoed fantastically well but it didn't have the robustness and the sort of unit economics that it made sense for anybody to use it right and then the big hyperscalers they figured it out i think the mid 2010s right and then sort of you know suddenly roughly last august right it really took off for the mainstream technology has always interested me and you know i jumped right back in and it turned out to be an incredible time. I think it's the first time probably since
Starting point is 00:02:51 the internet, right, that we've really developed a fundamentally new building primitive for systems that usually has very profound impacts on the entire way how we build pretty much anything. If you look at what the internet did, it means we can suddenly send communication at, I don't know, 100,000 of the cost as previously, but it also meant that Walmart got replaced as the largest retailer in the US by Amazon, right? So it really sends ripple effect to the entire economy. And that's, I think, what's exciting me, right? We're building a completely new primitive here. It has very different properties. It's, you know, non-deterministic. It's not constructed bottoms up, but it can solve problems that we don't understand in a theoretical way. And it's just amazing. So I'm having a ton of fun and trying to keep up with everything that's happening.
Starting point is 00:03:33 That's the best answer to jump back in. I've heard a long time. Oh, that thing I studied for my PhD is now relevant, and I'm so excited with the future. It also goes to show how long it takes for new novel things to actually make it to market. Like if you look at the original neural networks in terms of research to where we are today, that arc is super long. One of the things I'm very interested in, I'm sort of intrigued to pick your brain about is Tim and I spent a lot of time talking about trends and infrastructure. Obviously, LOMs, there's multiple layers here in terms of what they represent for the next
Starting point is 00:04:04 generation, how software is built, who builds software, and the underlying, like literally the wafer that powers said software. Curious to zoom in and get your take. stack of the raw hardware, the infrastructure, which when I talk about infrastructure, I think with things like, you know, the database or, you know, the object store or, you know, the network router up to the like the usual layer of the people that are programming and stitching these different components together. I'd love to sort of get your perspective, how LLMs actually change it. It's the question I'm trying to pull apart all the time. And I think a lot of our listeners are as well. So I'm curious to get your top level take on those. The honest answer is we don't know yet. It's like your internet email was just invented. Now predict how this will affect everything, right? It's an impossible task,
Starting point is 00:04:54 right? I think it's going to have ripple effect for a long period of time. We're still in a phase where we're trying to take LLMs and retrofit existing software with LLMs. And something else is starting to emerge where basically you build things bottoms up around this new compute primitive, right? And I think that gives you complete structurally different solutions, if that makes sense. That makes tons of sense. When I think about the way that LLMs are making their way into infrastructure software, the way we build is they're very much focused today on the user side, which is GitHub Copilot being the greatest example. It's like, how can I expedite what a developer is already doing? In many ways, it's very much, we think of the model of infrastructure, it's sitting
Starting point is 00:05:33 on the periphery, right? We're out here on the edge. And most of the times when I think about the way that we adopt, like if I'm a buyer buying a new system, I'm not going to make a bet on a core piece of my fundamental infrastructure, you know, switch out my database to some new database. I'm going to go try it out on the edge and slowly build confidence and prove it out. And over time, it gets sucked in deeper and deeper and deeper. And as Martine would love to say, it gets layered as you build on things on top of it. When it comes to the developers, lifecycle developers, what they're doing, what are the limitations of, let's say, GitHub Copilot today or LOMs in terms of the
Starting point is 00:06:05 developer feedback loop? I'd love to have a discussion around how that moves forward. It's a great set of questions. So let's try to structure it a little bit. So I think the first observation is coding is something that requires correctness. AI is much better in tasks that don't require correctness. Like creating a really cool image that don't require correctness, like creating a really cool image. There's many possible answers, you know, like writing source code or designing a bridge. If you're 90% there, usually you don't get much in terms of brownie points. You have to nail the answer, right? And I think the net result, as you pointed out, is that it's not yet the main thing, but more in terms of periphery or assisting function, right? It's called co-pilot for a
Starting point is 00:06:44 reason and not pilot, right? It doesn't Copilot for a reason and not Pilot, right? It doesn't fly the plane. It just maybe gives you hints. Honestly, Copilot is a lot more useful than GitHub Copilot, in a sense, for flying planes, because they can do it if necessary. But Copilot will just suggest stuff, and it's more the intern, right, for your coding project.
Starting point is 00:07:00 For me personally, I think I'll never again write code without it, right? It clearly gives me a productivity gain, specifically for new frameworks, new software. It reduced my stack overflow searches by a large amount. And the data that we're seeing seems to suggest that depending on how hard what you're doing is, it can get you between a couple of 10% and or to 100% speed up, right? And which is, hey, I have $100,000 a year developer, you know, in terms of cost to me as a company, and that person now can be twice as productive if I pay $300 a year, that's a pretty easy economic case. Not hard to understand, that's a good idea. The really interesting thing for me, as you pointed out, is, you know,
Starting point is 00:07:39 let's assume I would redesign all the coding process, all the coding tool that we had, under the assumption that LLMs is sort of an everyday tool that's free, and everybody has access to it, what would you end up with? It seems pretty clear that it will be completely different from what you have today. I have not seen anybody yet who has a comprehensive vision for it. But I think the way how I imagine it is that today, right, if I'm writing software, I run it through a compiler, and then I iterate until it compiles successfully, right? And then I check it in. We have Git. So, you know, so other people may look at what I've written and then say, Guido, let's modify this a little bit because of some other changes elsewhere in the program. I think we need to take all of this methodology and map it back into LLMs, right? So maybe
Starting point is 00:08:22 in the future, I have a tool where I'm putting in a clever prompt that says, this is a problem I want to solve. Write some code for me or break it down. And it breaks it down to subtasks, then writes code for each of these subtasks. Then I look at the whole thing. They're like, well, this looks pretty good. But this particular subtask, you did the completely wrong thing. So let me annotate the prompt there until it works.
Starting point is 00:08:43 And then I probably want to take all of this prompting and check it into some kind of system, right? So that if somewhere else in the program, something changes in this, and the LLM should take that into account, I can basically re-execute that. For this to make any sense, we need a fully deterministic, like temperature zero LLM.
Starting point is 00:08:59 And we probably need a very different code development environment, you know, like VS Code doesn't quite make sense in that content. I don't know. I think it's a super fascinating question. Yeah, yeah. So I think we're already jumping into what we actually want to go for.
Starting point is 00:09:12 So let's just start the spicy future. Spicy futures. You probably don't know what it is, but it's very simple. We're talking about just hot takes, right? And we usually structure this basically to talk about changes. Two to three years, five years, and 10 to 15 years. Based on what we understand right now, what do we expect to happen both from the technology point of view and from the product point of view? I guess you already started, so we can let you kind of roll. If you just got to take a complete guess,
Starting point is 00:09:51 you know, do you see more changes around developer related workflows and things to impact even further? And what do they look like? If we kind of illustrate a little bit more, what do you like to see happen? Feasible in the near term? Yeah, three years is not near term, man. It's really far out. I mean, let's put out some starting assumptions because I think it might help a little bit, right? 100% of developers will use Copilot or something similar. And Copilot is not just going to be suggest code to me, but it's going to be a chat box where I can ask questions about code or give it instructions for the code. The latest beta VS Code with Copiloty literally has a fixed bug button, or in Python, an add types button, right? Which, you know, you would have told me that a year ago,
Starting point is 00:10:31 I would have said you're completely crazy, right? How's that supposed to work? But this idea that I basically have more high-level code-shaping primitives, right? You know, write a unit test, these kind of things, right, that I can use for my code. So that seems kind of, I think, just the linear extrapolation or just proliferation of what we have today. If I'm too far off the reservation, please stop me here. No, you're killing it. The question is, where do we go from there? And one thing is, so much about software development today is around collaboration. Right. You know, GitHub and pull requests has completely changed how we work on these things.
Starting point is 00:11:15 This so far is non-overlapping with LLMs. Is that fair? There's nothing and no collaborative feature in Copilot whatsoever. That probably needs to change next. Right. If you have a highly collaborative process, then that piece needs to become collaborative as well. So how would that look like? That's the question I keep asking. You spoke about, like, we're going to check in prompts. And, like, in reality, we're checking in prompts. What is a prompt?
Starting point is 00:11:42 Well, it sounds a lot to me like a very detailed product requirements document that you supply to the LM to generate the output. And part of that definition of the prompt is going to be, well, we know already when you write a prompt, you say, well, here's an example of what I, is it working? And here's an example of what I want. And please generate, you know, take this high level thing and then translate it into a lot of O-level stuff. You know, turn the, take this sentence or two sentences and this base image and turn it into, you know,
Starting point is 00:12:04 Guido is like Iron Man or Captain America. I think that's your image, right? That's what these LLMs are great at, is really extrapolating from some seed text or some seed into a fully-fledged output. You said it's like a requirements document. A requirements document is targeted at humans. If I take input for an LLM to write code, that is ultimately targeted at a machine, which is the same as source code, right? Source code gets compiled
Starting point is 00:12:29 to machine code. So are you saying this, in what sense is it closer to something for humans? In what sense is it closer to something for machines? Does that question make sense? It does make sense. It's a great question. It's a question I keep asking myself. If you're checking a series of prompts, which are generally input to an LOM, those prompts are also human readable, right? Is it not possible then just to flip that on its head and be like, well, skip the developer, I'm just going to write a definition for what I want, and the system emerges. And then the next question of, okay, well, that's amazing, because now you have moved the bar of who can create like an app, fully functional working app, way up the stack in terms of like people's ability. And so then you also have the final question, well, what do developers do in the future? That's a good question. You know,
Starting point is 00:13:13 that's for another time, maybe for later in the podcast. But the next question is, how do you know it's right? And I mean, we already have that today, right? We have unit tasks, we have integration tasks, we write them in code. There's no reason you could write the tests as a part of your doc in the same way that we have, here's your input, and here's my desired output, that could all work. And so to that point, to ask your question is, we always go through step functions, right? We always have linear step functions.
Starting point is 00:13:38 So today, we have human-readable, machine-compilable programming languages. But in the future, if you were to say, go to where you and I are discussing, then you end up with human-readable, machine-readable, machine-compilable programming languages. But in the future, if you were to, say, go to where you and I are discussing, then you end up with human-readable, machine-readable, no intermediate, straight to the machine language, which is vastly interesting potential change in the way that you would if you imagine that future. Yeah. I mean, does that mean we're going to go back to test-driven development, basically?
Starting point is 00:14:03 Potentially, maybe the job isn't to write code. It's to just write the test to whatever you want the output to be and let the AI model just continue to search for whatever it writes. If we're going to see an even more powerful model in the future and you can't even reason with the behavior inside,
Starting point is 00:14:19 let's just forget reasoning any behavior. The only way to make sure it's going to the right direction is we just basically write objection objective functions which is basically integration tests and system tests up front this is all the behavior we want this is all the things we're looking for let it just go and do your little like evolutionary algorithm type of thingy but it just go finds a prompt that could able to describe is that to where we can go? But I can see that as actually maybe one approach to having such a powerful AI.
Starting point is 00:14:48 And our job is to just make sure the output looks the way we want. I see one issue there, which is, look, why are programming languages what they are? It's not because we're trying to make something hard to read. Maybe some programming languages are guilty of that, but the majority, I think, is not, right? It's more that accurately describing a solution in a very informal language is actually extremely hard, and having a formal language makes it vastly easier. The nice thing about a programming language is that it easily lets me express something that takes into consideration all the edge cases, which plain English does not. Now, why does it still
Starting point is 00:15:24 work if I write a specification that at the end of the day, the which plain English does not. Now, why does it still work if I write a specification that at the end of the day, the right software comes out? I mean, often, in many cases, it only works because the developer comes back and says, like, hey, Guido, what on earth did you write there, right? I have no idea if this means A or B or C, right? So there's a dialogue back and forth, right? The other half why it works is because the developer knows a tremendous amount of context, right? They've sat in the meetings, they've heard, you know, they understand the high level picture. So my current guess is, if you give an LLM, just natural language and ask it to write a program based on that natural
Starting point is 00:15:55 language without anything else, that's an unsolvable problem, because it's an ill defined problem in a sense, in the sense that you can't describe the context a human would have into text and then give it to a machine for the machine. Like the error rate is too high or the amount of knowledge is too difficult. Is that what you're saying? Yes. And the context window is too short. There needs to be an interactive process in many cases. If you write a spec and throw it over a wall, what's the chance of a program coming back that does exactly what you want, right? I mean, I think that has never happened in my career. It's usually like, what on earth is this, right? And then you sift through it.
Starting point is 00:16:31 It's a super solid point. And so, I mean, it comes to the point that to have a fully described working product may be impossible. I guess the other question is, how do you simplify what you're describing to the LLM and its repertoire of tools the LLM has to pull the solutions together so that it's a solvable problem space. One of the things I think about in this discussion that prompted it as we were discussing is, well, the LLM doesn't actually have to understand
Starting point is 00:16:56 how to generate every piece of line of code. I mean, in this, developers, we use libraries and we buy services and we spend a lot of money with Amazon and we use Snowflake. There's no reason an LLM can't do the same thing, right? It can know that, oh, I am Snowflake's LLM, this data engineering project, I'm going to use all Snowflake tools, and I have been programmed to use the ecosystem. So on that level, it seems highly solvable.
Starting point is 00:17:18 So I guess it's about level of abstraction, how far you're going to go up the stack away from the core problem that LLM can actually significantly solve. But then it brings up another interesting problem, and this is where you get error rate on top of error rate on top of error rate, which is, well, then you could string multiple of these agents together where they're interacting and programming, which I think is probably the more middle ground place we end up with. I wouldn't say three years.
Starting point is 00:17:41 I think that's ridiculously ambitious. But a decade from now, where a company buys the Datadog agent, and they buy the Amazon agent, those are your programming pairs that live across your entire SDLC, from the time you write code to operations and back. That's where I go based on this conversation that we're having. Is that insane? Are you following where I'm going? I think it's spot on to me. I mean, I think it was Jan de Koon who pointed out that for any sort
Starting point is 00:18:09 of auto-aggressive model, error accrues exponentially, right? If I always look at the past, whenever I make a small error, that error is now reflected in the past and I add error on top, right? And that just gets worse over time. So I think whatever we do there, my guess is you need a solution where occasionally, most likely through human intervention, you pare back that error, right? So it needs to be an interactive process. I mean, how are we using this today, right?
Starting point is 00:18:33 If I'm using Copilot, very often I'm writing a comment, right? This function does something and it suggests a function to me. I go like, well, that's pretty good, but didn't think of that particular edge case. So I'm starting to expand my spec, so to speak, right? You know, my little comment on top that says, you know, and if the input is X, the function should do Y, right? And then, you know,
Starting point is 00:18:53 it gets a little bit better until it works. I mean, to me, it's really taking this process and making it scalable. You can almost think of an LLM as a compilation step. In goes your mini spec, and out comes a program. And in many cases, you may have to change your input code to get the right output code. And then we want to work this together. I think this is all the thing we have to figure out. One thought I have, today, OpenAI started this huge model. And we all just kind of like seeing the potential of it. But we can't rely on it.
Starting point is 00:19:21 I can't check in a GPT 3.5 turbo prompt, and then suddenly everything just breaks. As you know, the paper already shows that it's fine-tuning on its own. We don't know what's happening behind the scenes. And the model changes too. And so in the future, it feels like there might be possible iterations where you're going to have a bunch of Uber models that we somehow need to be constantly adjusting for it doesn't seem like we can check in any prompts at all so then which means should we fine-tune a bunch of smaller models that doesn't change as much that we as developers actually understand
Starting point is 00:19:55 our prompts are basically our source code to the specific compiler in some way like if we treat llms compilers then we have fine-tuned LLMs as different compilers that does some particular task really well. When I think of a PRD, turns to the program, the developer job, one is to actually understand the business context, also to understand the sort of service and the technical aspects, like what are all the edge cases I need to test for? What are all the service liability stuff I need to do, right? What are all the, you know, the HA and all that stuff.
Starting point is 00:20:25 We built that technical and built muscles into the backgrounds. Can we take that as individual LLMs that can be integratable into some workflows? That's the SLA LLM that understands how to take a workflow engine and add some... I don't know. To me, the only way to have prompts to be transferable means the models can't change as much. If it changes, all hells break loose. You don't know exactly
Starting point is 00:20:52 what the output is anymore. If you look at it, OpenAI, after some pushback on the changes at GPT 3.5 and 4, they said they're going to keep the older versions... It might have just been 3.5 Turbo. They said they're going to keep the older versions. Sorry, I think it might have just been 3.5 Turbo. They said they're going to keep the older versions around longer, which I think is good.
Starting point is 00:21:07 I mean, look, the other thing that left a crater in the last couple of weeks was Lama, right? Where we now have an open source model where you can simply grab the weights and freeze them, right? And say like, look, I'm going to run with this. And I mean, if you switch from Python 2 to Python 3, you know, probably have to edit your code. If you're switching from Lama 2 to Lama 3 in the future for your code generation exercise, maybe you have to, you know, probably have to edit your code. If you're switching from Lama 2 to Lama 3
Starting point is 00:21:25 in the future for your code generation exercise, maybe you have to, you know, re-edit some of your prompts as well, or your specs, or whatever they will be called. Like how big of that switch is, right? Because I think for Python, we can learn all the rules, we can learn everything. Prompts, I don't know how to learn the rules, right? It's like a black box, so we kind of have to like morph with it. I feel like the developer workflow in the future will change quite a bit depending on what the models exist. What is the right way we treat the models in our workflows? And how do we consistently have the right outputs in different situations? What is the right modality based on the technology? It's hard to tell. One of the things is like, I often think is like, what's the source of truth to the program?
Starting point is 00:22:04 Is the source of truth for the program in this world the prompts, or is it the code? And today, in our world, with Copilot, the source of truth, it's my Python code, it's my Go code, and the version of the compiler it's pinned to that's turning that into a machine, and the instruction set that it's outputting. Those three things tie together the source of truth and create some form of determinism in our world, so that every time I compile, I usually get a program that works. And that's not always true for various reasons, as we all know, but largely speaking, it's true.
Starting point is 00:22:30 And so it's interesting, we have a couple of challenges. One is you have model drift. Fine-tuning under the hood that you're unaware of is model drift. The output is different than every time you run the prompt over time. The next problem you have is you contact window challenges, which is, how can I literally fit all of this text? We think about at scale enterprise code
Starting point is 00:22:49 basis, 10 million lines of code, 100 million lines of code into an LLM fresh. And there's ways that you could pare that down. I'm sure we can optimize that problem to a certain extent. But so it has all of the context. So you can then generate semi-accurate piece of code. And then to continuously do the loops, you have to have a system that's checking that, hey, this thing it generated actually fits in. That workflow is the part that I think is still deeply, one, we don't really understand it. And two, the two challenges I mentioned,
Starting point is 00:23:15 which is contract window size and the model drift problem, which, you know, freeze weights. And if you could stick it, maybe that does solve the problem. Those are things that challenge to me to say, actually, the source of truth still has to be the source code, not the prompt. Is the source code really always the source of truth? Let's take Windows 11. You know, there's so many dependencies on libraries, tools, build environments, unless you have access to sort of the standard one or the prescribed one, you might actually end up with a different binary. Yes, you can pin all libraries and pin the compiler and so on. But I would say it's at least the source code plus the build system
Starting point is 00:23:49 in a sense, right? That's the source of truth. And for practical purposes, honestly, if I file a bug report against Windows, I'm going to specify the binary, right? This is the version XYZ with the following hash. That is the source of truth, right. But I think it's a bit more nuanced there, isn't it? One question just leads to the 10 questions. There's really no answer in this kind of style of discussion. But it's actually great because this is where we're excited
Starting point is 00:24:14 to see the future might be, right? I feel like we should move on to talk about maybe in a more, even distant future, which I think is even more uncomfortable because we haven't even solved anything in the near term. But do you see in 10 years software development change drastically?
Starting point is 00:24:30 Yes, I think it will change no matter what, right? What kind of change do you like to see happen? I guess maybe that's a better way to ask. This is a big discontinuous innovation, right? Which usually means a couple of things. People who are able to harness it get more powerful. People who ignore it or don't want to take it up, sort of fall behind and get sort of pigeonholed a little bit in the legacy corner. So I would expect all those things happen just the same way.
Starting point is 00:24:53 I don't know, when the cloud revolution happened, if you were purely operating below the cloud layer, life got a little less interesting. And you can only work for a small number of companies and there's less growth. If you're jumping on that train, there's huge opportunities ahead of you, right? And I think the same thing is true now. At the end of the day, whenever something makes a task more efficient, that usually increases demand for the task
Starting point is 00:25:18 just because it lowers cost. So my guess is we'll see more software development in the future because it's easier. My guess is if you're a software developer who can harness these new technologies, you're probably going to end up in a much better spot than before because you've essentially got more powerful, right? You've gone super side and you can now with like a couple of prompts, you know, create crazy complex programs. That's an incredible opportunity, right? So I think if you're a software developer, this is a super exciting time to be alive.'s sometimes people asking, like, you know, is it still worth studying computer
Starting point is 00:25:48 science? My answer is always like, hell yes, right? Or if you just found something that makes it even more powerful, this is a great time to do it. I mean, the way I think about it is, so in order to answer your question, I have to scroll back and say, what do I think LLMs are going to like first be autonomously doing? I think it's going to be small changes. If I think back to the first time Dependabot on GitHub opened up and upgraded to my package JSON, I was like, this is magic
Starting point is 00:26:13 and then quickly also found it annoying. It's incorrect, it doesn't do the full source create up trade, and it's annoying. But it shows the brief idea of what the very beginnings of autonomous agents could do and how that focuses engineers on solving more interesting problems. I don't have to deal with upgrading my packages or fixing some CVE
Starting point is 00:26:35 or rewriting some console.log someplace. I'm focusing more time on being creative and using my brain for what it's good for. So I think the first place of innovation, the first place we'll see real application that drives real concrete productivity gains in both for the individual developer, but also at large-scale enterprises
Starting point is 00:26:52 will probably be in these workflows around remediation. Nobody wants to do it. And it is very low on the priority scale of some program manager, right? To the point where you end up with a program manager running some program that has to keep on winding. I think if I scroll out and think more from where that takes us and how that gets productized, I think we end up with what I talked about earlier, which is this world where it's kind of going to be like
Starting point is 00:27:15 in the Avengers when Thanos gets the glove. You're going to buy different agents and it's like each one's going to give you different powers. And some are going to give you the power of observability, right? The power of sight. And that's going to give you different powers. Some are going to give you the power of observability, the power of sight, and that's going to help you generate more correct code at the front end of the stack. As I'm programming, it's going to look into a bunch of runtime context about how the system's behaving. You should not run the for loop this way because that's going to blow up
Starting point is 00:27:41 the stack. This is a really hot piece of code to run at scale. You could think about security superpower, which is like, hey, if you do this, you're going to send PII in a log someplace. That's a really bad idea. More to what you said, which is Copilot's a single-player experience. I actually think it will
Starting point is 00:27:58 probably retain relatively single-player and the workflow that developers have won't drastically change initially in the next 10 years. I do think there's an interesting question of like 25-year timeframes when we've really got these things in production and we really have the next evolution
Starting point is 00:28:15 of these architectures, both on the hardware layer, but also the actual fundamental neural network. And everyone has like these app-scale data systems and we have years and years and years of feedback from what's working, the UX is down i think that is that's interesting and i find it difficult to like think about what it looks like but certainly some of the conversations we had around the future of like how do you actually you get to the level where i write a doc and there's
Starting point is 00:28:38 a product like i think that future is possible i just don't think we're anywhere near close enough to it actually existing in a way that you know, in a night, actually build an app and turn it into 10 million users running it. We're just not there. But certainly, I do think we end up with the copilot analogy is the right analogy. And I think the biggest problem we have, it's the fact that they're non-deterministic. That gives us these great powers of creation. But on the flip side, we need something else being the reinforcement that's saying, hey, the thing you just generated, it's wrong, and here's why.
Starting point is 00:29:10 We can change them to be deterministic easily. Set the temperature to zero, and you have a deterministic LLM. Honestly, I think there'll be many LLMs in the future that run in a deterministic way for that reason. Yeah, when I was thinking determinism, I'm not thinking just of the input-output determinism.
Starting point is 00:29:26 I'm thinking of the fact that the context limitation introduces error, right? There's only so much data you can give the LLM, so there's only so much of a, like... I kind of think of the context window as how big are the blinders do you have on the eyeball? Right now, the blinder is like a speckle of light, and in the future, maybe it'll get bigger.
Starting point is 00:29:42 But I also think of non-determinism in that. So that's what I meant. But I think your point, your point is still stands. Look, I think what we've seen with other LLM applications is there's this notion of in-context learning where basically somehow with a database query or something like that, try to find the relevant data for this particular query, right?
Starting point is 00:30:01 And then just add it to the prompt. And my guess is it'll look the same way in the future, right? If I'm asking to write some code to call an API, then I'm probably going to go out, fetch the API documentation, the relevant parts and stick that in, right? And if not, it'll just come back and ask an agent to fetch it
Starting point is 00:30:15 because it sort of needs that, right? Actually, that's an interesting question to ask. I think right now, if you're trying to play forward, we want this PRD to product future to happen somehow. Will developer tools company in the future not ship developer ID kind of tools anymore? Will we just ship vector databases with an API that's like very stuffed with very particular context
Starting point is 00:30:40 of what you want to do? And also give you a particular LLM with a context knowledge graph for you to actually pull and also give you a particular LLM with a context knowledge graph for you to actually pull from. And that's my API. Maybe that's not crazy to think about. Because if you use an LLM, right now the assumption is LLM need to be completely trained or, you know, fine-tuned or something.
Starting point is 00:30:58 What if we give you all the context, that knowledge graph for you to be able to grab from? Like, this is all the C-sharp coding, you know, all the syntax and everything. And then give you the most complete output, correct correct code and then use that as a way to then figure out how to build some intermediate language you could go a step further right and just say like an api ships with an llm that can answer questions about the api yeah that's true and so it's kind of meant it's maybe that LLM itself either is already completely trained for the knowledge. The developers in the future will be picking LLMs, which
Starting point is 00:31:29 ones to be able to compose my stack. The future stack is not, you know, I'm using Go with Node. This is like I want LLM, you know, A, LLM B, Ian, Guido, and Tim, LL, you know, as my stack., and Tim, LL, that's my stack.
Starting point is 00:31:46 And those three have superpowers able to generate these kind of infrastructure and code and somehow able to kind of piece that together with an output. So basically, my first call to an API would be just a natural language description of what I'm trying to do when it returns to me the code snippet that does the right thing for that version of the API, right? Yeah, yeah. And we can able to guarantee and when API upgrades and only upgrades our agents, like the downstream desk cascading, we can take the LoRa, you know, layers and just propagate to everybody or something like that. Yeah, swagger plus plus plus or something.
Starting point is 00:32:22 Yeah, yeah. I don't know it's interesting just to think about like how do i ship knowledge when i ship the tools because if you just ship the tools we have to absorb all the knowledge and figure out how to do all of that but i think developers in the future should able to be equipped with the knowledge way faster because of the ai lms are involved here and our job is to be able to know the right boundaries of context where I know this tool knows and doesn't know and know how to compose. I'm still doing the Legoing, but this Lego is no longer just piecing code and node express and ourselves
Starting point is 00:32:53 is actually more like, okay, this LLM knows these ecosystems really well. I know these LLMs know these things really well. I can start putting the prompts together. I know intuitively, and there's maybe another LLM that knows how to do the business context function testing really well as well. And I know, understand the boundaries somehow, which means temperature zero and have a way to actually see how that AI has been developed somehow. I don't know exactly how that happened, but it feels like that's what people will be interfacing in the future. There's sort of an interesting analogy there. And boy, we're pretty far out there. So, you know,
Starting point is 00:33:29 I have no idea if any of this is true or not. But if you look at image models for a second, it's a different area. But if in the past, I wanted to explain to you a particular style, let's say the style of a famous painter, in a way that allows you to generate images in the same style that was really hard i mean probably a textual description doesn't really work and can show you a bunch of examples or so all right but you know it's sort of even if i did all of that it would still be extremely difficult for you to actually generate a painting that looks like him and then what we figured out with with stable effusions that we could actually essentially create a model like a fine-tune,
Starting point is 00:34:05 right, that basically encapsulates that style. If I send you that model, you can simply give that model a prompt and you get a picture that looks like that particular painter, right? And that has caused us all kinds of excitement and problems and the internet with copyrights, but also with, I think, some really amazing, cool applications. If you take that and transform it on APIs, you could say in the future, I might be able to give you an API documentation in a way where you can specify a natural language thing you want to achieve with the API. And the model that I give you or that I host will then generate the code in your particular language of choice to actually perform that task. That's kind of a mind-blowing idea, right?
Starting point is 00:34:45 I mean, that's so much what we do in software development today is sort of cobbling together APIs and understanding frameworks, and then they could potentially automate some of them. It's really interesting. And it also strikes me that when I think about this future that we're talking about, this idea where we have these LMs generating this code, that they're really taking a lot of mindshare from me. It also strikes me that while the barrier to entry is decreasing, the debugging experience,
Starting point is 00:35:09 the amount of information you need to go in and figure out, because you're not actually writing the system. I remember the first time I used Ruby on Rails and trying to get that thing to work because I made some changes. It was like magic to me in the sense that it generated a hello world page. And then it was like the most befuddling experience to me and trying to actually get it to work once I messed it up. Right.
Starting point is 00:35:29 Like, and I can only imagine what the world will look like and the experience of engineering will be or software development would be using all of these LOMs tied together. And like, at the end of the day, there's still going to be points where something's not going to work the way you want it to work. And you're still going to get in there, you're starting to figure out, well, is this, you know, a state synchronization issue? Is this some type of timing problem? Is this like, there's all these complexities that just exist, maybe we can, you know, have auto scale, beautiful underlying infrastructure that scales dynamically, and system just understands how to do it. But it's very difficult to solve some of these problems that even humans have trouble understanding,
Starting point is 00:36:06 which is state synchronization issues is a good example. So it strikes me that debugging the output of this at the end of the day is still going to be one of the most challenging things. It's actually fascinating, right? Because there's not many of the opportunity to just kind of just think about what might happen. But I actually do think some of the things we talk about are not 10 years. These are doable in the short term.
Starting point is 00:36:29 I've never been able to predict the future even remotely correctly in a 10-year timeframe in infra. But here's one example. When we looked at Pinecone early this year, like in January or so, I'm not sure I understood the use case in the AI context. Four months later or so, I think it was obvious to most people in the AI industry, this idea of in-context retrieval, where you pull basically data, you use it as a way to find the relevant text chunks to feed into the small context window of an LLM. Sometimes, not understanding something too well everybody understands it is is you know can happen in months at the moment so you know in that sense looking two years ahead is incredibly hard right we're all students this is a big revolution yeah yeah that's true well i
Starting point is 00:37:18 guess maybe we'll get you on next time to do another like three months to six months prediction i'd love that. And then we can grade ourselves on how wrong or correct we are. I think some of the stuff we talked about in my gut is that a lot of that will happen sooner. We'll see the very beginnings, the early beginnings of it and how it influences the developer lifecycle
Starting point is 00:37:38 a lot earlier than we're all talking about. And I just think that how the impact those workflows have on the developers day to day is a question. It's like, is it a massive magnitude impact? Maybe, maybe not. But it's beginning and you can see the seeds of it and it's proving value. Cool. Well, thanks again, Gudo, for your time.
Starting point is 00:37:56 And hopefully not the last time you come on our show. Thank you so much. Bye. you you
