No Priors: Artificial Intelligence | Technology | Startups - AI Agents That Reason and Code with Imbue Co-Founders Kanjun Qiu and Josh Albrecht

Episode Date: November 16, 2023

The future of tech is 25-person companies powered by AI agents that help us accomplish our larger goals. Imbue is working on building AI agents that reason, code and generally make our lives easier. Sarah Guo and Elad Gil sit down with co-founders Kanjun Qiu (CEO) and Josh Albrecht (CTO) to discuss how they define reasoning, the spectrum of specialized and generalized agents, and the path to improved agent performance. Plus, what’s behind their $200M Series B fundraise.

Kanjun Qiu is the CEO and co-founder of Imbue. Kanjun is also a partner at angel fund Outset Capital, where she invests in promising pre-seed companies. Previously, Kanjun was the co-founder and CEO of Sourceress, a machine learning recruiting startup backed by YC and DFJ. She was previously Chief of Staff to Drew Houston at Dropbox, where she helped scale the company from 300 employees to 1200.

Josh Albrecht is the CTO and co-founder of Imbue. He also invests in other founders via his fund, Outset Capital. He has published machine learning papers as an academic researcher; founded an AI recruiting company that went through YC and a 3D injection molding software company that was acquired; helped build Addepar as an early engineer; and served as a Thiel Fellow mentor. He started programming as a kid and began working professionally as a software engineer in high school.

Show Links:
Kanjun’s LinkedIn | Website | Google Scholar
Josh’s LinkedIn | Website | Google Scholar
Imbue raises $200M to build AI systems that can reason and code

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Kanjun | @JoshAlbrecht

Show Notes:
(00:00) - Introduction to Imbue
(04:55) - The Spectrum of Agent Tasks
(08:43) - Specialization and Generalization With Agents
(13:03) - Code and Language in AI Agents

Transcript
Starting point is 00:00:00 Imbue is a company developing AI agents that can reason and code. Today, Elad and I sit down with Kanjun Qiu and Josh Albrecht, co-founders of Imbue, to discuss training large foundation models for high-level reasoning, why agents require architectures different from large language models or language token prediction models, and how current computers are getting in the way of their users. Kanjun, Josh, welcome to No Priors. Thank you. Thanks. So perhaps you can start by just telling us the story of how you guys know each other and where the idea for Imbue came from.
Starting point is 00:00:37 Josh and I met at a conference and then started a big house together. It was a big, 20-person house, and we also started our first company around the same time. I've always been really interested in agency and kind of like, how do we enable humans to have more agency? And Josh has always been really interested in AI. And so it kind of made sense. We, at that time, talked about, like, oh, you know, someday we're going to be able to have AI systems that give humans a lot more agency. Fast forward to 2018 or so, we were running an AI recruiting company called Sourceress. And that was actually kind of the first AI agent that we built. It was, you know, not transformer models, more old-school NLP. But it was a system that recruiters used and kind of automatically got candidates in their
Starting point is 00:01:30 inbox. And we learn a lot about, oh, if you have an autonomous system like this, like, what do you actually need to make it work? And around that time, some of our housemates were building GPD3. And we were seeing like, oh, scaling works. You know, if we just keep scaling, actually, you're going to get pretty far with a lot of these language models. So our question at that time was, you know, how far can we get with language models? Does this kind of self-supervised learning, which is working so well on language, work in other modalities as well? So, you In early 2020, that's when we first started seeing self-supervised learning working across video and images and language. And we were like, huh, there's something really interesting here where maybe machines are learning the same kinds of representations or similar representations to what humans are learning. And maybe they can get to a point where they can actually do the types of things that humans are able to do. And that's when we first started in view. We're starting talking about in view. you clearly know a bunch of people working at sort of large language model research labs well when you looked at what they were doing how did the focus come to be on agents in particular
Starting point is 00:02:36 and how is that different from a general language model yeah i think we've always been interested in agents in not just you know recommender systems or classifiers or things like that but in systems that are going to go do real work for us right that are going to actually be useful in the real world. Right now, you can ask some kind of chatbot, something, and it'll give you back a response, but the burden is sort of on you to go do something with that, to verify whether it's correct or not. I think the real promise of AI is if we can get systems that can actually act on our behalf and can accomplish goals, and kind of do these larger things and sort of free us up to focus on the things we're interested in. Yeah, one thing that I think
Starting point is 00:03:11 we often forget because we're in it every day is our computers are actually, like, they need to be micromanaged. The reason we're in front of our computers every day is because nothing really happens. Like, they can't make decisions on their own. Nothing really happens unless I'm in front of it. And I'm, like, in front of it, doing all this really detailed stuff, kind of like operating a factory machine with all of these, like, little knobs that are really specific. And there is a future where computers don't need to be micromanaged that.
Starting point is 00:03:36 I can, like, give a computer an instruction, whether it's natural language or something, some other kind of instruction, like, it can go off and understand what I'm trying to do and help me do it. The diff between this, where we are today, and that is kind of like the diff between the first calculator and where computers are today. Like the very first digital computer was a room size calculator. All it did was calculate Fourier transforms and things like that. And I think that's kind of the potential for where AI can be given the current, like,
Starting point is 00:04:08 where the technology is going. It's very possible. What do you think is, so when I look at technologies, there's almost like three types, right? There's things that just are never going to work. And maybe some aspects. of Theranos for that, right? Like there were questions whether the physics of there knows would ever work as you miniaturized things sufficiently. There's things that can work immediately today or with a little bit of work or engineering that you can get there.
Starting point is 00:04:32 And then there are things that are clearly going to happen. And there at some point in the future. So, for example, in the 90s, people to find out almost everything that cell phones would do. And then eventually it got there once you had better processors on phones and more bandwidth in terms of cellular networks. Like, you need to build a bunch of stuff in terms of infrastructure. And it was clear what was going to happen. What do you think is missing, if anything, technologically, to start to build real worlds for the perform in agents? The way we think about agent tasks is it's a spectrum of difficulty. So some agents are very possible today.
Starting point is 00:05:03 Like we see a lot of them. There are these conversational bots that take over some of the customer success workflows, and they'll fill over to real customer success people if the agent doesn't know how to deal with it. And those are actually, what we see inside companies is that they actually have pretty, complex reasoning workflows. They're somewhat hard-coded, and so they're not general. They don't generalize to other company workflows, but, you know, we are already seeing agents. And then maybe there are two spectra.
Starting point is 00:05:34 There's like specific to general. So today we have very specific agents, and over time, we can, you know, if they are better at reasoning and better at certain other things, interacting with your computer, then they become more general, be able to use the same agent and it'll learn something new. And then there's also a spectrum from, like, co-pilot to more autonomous. So today we see a lot of co-pilots and the human in the loop. And over time, it becomes kind of incrementally more autonomous. And so I don't see it as being so binary, like, oh, there is a technology missing for agents.
Starting point is 00:06:07 But rather that as capabilities improve, we're going to see more and more of these use cases be eaten up by more general, more autonomous agents. There are a few categories of things that are in the way today. So I would say where we are today, we're kind of in the era of like maybe lossy Ethernet with no air correction or like analog computer or something like that, where we have these models and they don't work reliably. And that's, you know, when we talk to founders building agents, that's really the biggest thing. It's really hard to get these systems to work reliably and output exactly what I wanted to output and kind of do the right thing at every step all the time. And so the question is, okay, how do you get it to work more reliably? Well, there's, you know, a lot of why we work on reasoning. And when we say reasoning, it's kind of all of the things around getting tasks done in the world. Like, when does a system come back to you? How does it know it's not certain about its output? Can it kind of think through different action plans and figure out, okay, this plan is the better plan. And we should try going down this path first. Reasoning is one big piece of improving liability. And the second chunk of things is like all of this error correction and I think like chain of thought, tree of thought, these are air correction techniques. We have a lot of other
Starting point is 00:07:29 techniques internally. And that also helps improve reliability. And so if we think about this problem as a reliability problem, then you can incrementally make a lot of progress on it. I loved your framework of sort of generalizability and sort of that two by two that you had. If I look at a lot of the language models today, what I'm observing a lot of people doing is they basically start off prototyping something, say, on GPT4, because it's the most advanced model. They see if it works or not. And if it works, and they have any sort of scale, in some cases, they move to GPT3.5, and sometimes, you know, they've thought about fine-tuning or not. But sometimes they'll move to an open-source model, which works dramatically less well in some cases.
Starting point is 00:08:09 But then they'll fine-tune it for a high-volume use case. And it's all because of cost optimization. Basically, as you know, if you have a really big model, it costs a lot more for inference in terms of compute than like a smaller model. How do you think about that relative to generalizability? Because I guess if you make something really generalizable, my assumption, which may be incorrect, is it's more expensive, right? You'll need some forms of memory for it. You need some broader logical capabilities versus just saying, I'm just going to do the thing that's going to, like, order flights really well or whatever maybe in terms of agents. And so I'm sort of a little bit curious about that framework. I think that's the right way of thinking about it.
Starting point is 00:08:43 I think when Ken June was saying, you know, the spectrum from more specialized to more generalizable, I think we're talking about the ability to solve more general problems, like the ability to do these problems that you've only seen once or twice. I think even as that ability goes up, we're still going to see kind of, you know, a thing coming behind that, a force that takes each of those things. Like maybe you start out by doing your plane booking with GPD4, but eventually you realize, like, oh, actually, like this is so expensive and slow. Like, I just want the thing to be really good at it.
Starting point is 00:09:08 But what you can do is you can apply these agents. And this is part of the reason why we're interested in agents that code. you can apply those agents to the original general system to have it go make a more specialized version of that. So it's kind of specializing the things that you're doing a lot. And you can look at each of those things like, okay, I'm making 10,000 calls of this. This is super expensive. Can I just write a piece of Python code that does this? As you have more general capabilities, you actually can use those more general capabilities to kind of do that specialization.
Starting point is 00:09:33 What we see with a lot of the agent builders today is that they'll use, you know, an agent workflow is complicated. It has lots of different pieces. And so they may use a specialized model for parts of it and a general model for other parts of it. And the way we think about it, it's quite pragmatic, basically that as capabilities increase, what we want is, like, minimal viable models for each capability. And so a lot of the models are much smaller and very specific and, like, pretty specifically trained. In the personal computer kind of revolution, around that time, I think there was kind of like branching. So some people built supercomputers. They're like, oh, we're going to like make the more powerful computer.
Starting point is 00:10:11 other people built personal computers. And it turns out like personal computers much bigger market and supercomputers not that many people needed that much computing power. And I suspect we're going to see something similar where a lot of use cases are going to be able to be addressed by something pretty pragmatic and relatively small. We're like not, we're definitely not pushing the bounds of what we can do with data today on small models. And so, you know, smaller things can work well.
Starting point is 00:10:35 I want to go back to what I think is like a really deep topic of discussion at Embu. in terms of how you define reasoning and like this being an area of differentiation in terms of your research efforts. Like, you know, we all have a bunch of friends at OpenAI and other labs working now publicly on multi-step reasoning and more process supervision, as you were describing. What makes you excited and confident that there needs to be a different approach versus just general language models in order to make the reasoning you need for agents to work? I think there's a different approach. process. Like language models are great. They're really good, you know, predicting the next word. They're good at, you know, making, like a very easy classifier. They're good at all sorts of things. But there is, and there are obvious limits. Like we know even in theoretical senses, like they cannot learn to do multiplication in the general sense because it literally doesn't fit in the context window, right? Like multiplication, they can learn to do addition in a modular sense and they can learn to do it actually almost perfectly if you train them in the proper way. But they're not learning the general algorithm for addition. Instead, if you want,
Starting point is 00:11:41 something to actually execute the general algorithm for addition you need to have a thing that works in a different way that has some sort of outer loop about what step should i take next right that's just a kind of like definitional thing there has to be some other sort of wrapper there has to be a different sort of outside process everyone at open ai and at imview and anthropic we all like know how this works i don't think anyone is proposing like it's just you know shove it all in the language model you can get really far but i think we're interested in what is that other higher level system How do we decide what is the right next step to take? When should I go collect more information?
Starting point is 00:12:11 Am I certain about this? All of these kinds of other things, those are, I think, the questions that are much more interesting. I think there's actually a lot of work to be done there. I think we're still very early in the days of creating these systems. Natural language is not a bad medium for it. Code is also another example of a medium for it. Language is pretty compressed.
Starting point is 00:12:28 And so that's helpful for dealing with these situations. Is that one of the reasons you all decided to focus on code is one of the first types of agents that you have started with? Or could you explain more about the logic behind that? Yeah, I think for us, code is useful when we're thinking about reasoning. One way that we're sort of making collectively reasoning agents today is founders are just hard coding, the reasoning process of like, okay, if there's a customer support complaint about this thing, then I do this. If it's like this, then I do that.
Starting point is 00:13:00 And so you have this very special case version of the thing, right? And there's a spectrum between code and language or more kind of general reasoning abilities, but it's a spectrum. It's not a binary thing, I think. And so you can have code now that we have these language models that kind of mixes the language models and the code layer, right? Where it's like sometimes you're using a language model to decide what to do. Sometimes you're using an if statement. And so it's more about like a fusing or like melding of these two different things and being able to like be in the right place on that spectrum. And so code is actually like a really important part of this.
Starting point is 00:13:29 And as you do things that you want to do more robustly and you want to do in a more repeat. repeatable way, then you want to move it more towards code, right? And so to the extent that you've never seen this task before, maybe you should be doing it in this more kind of nebulous, intuitive sense, and then over time, get better at it, critique it and turn it more into code, actually. Yeah, and when we see founders and ourselves building these agents and people shipping them into production and us shipping them internally for ourselves, like, basically the agent loop can be very complex and breaks down into different chunks, and we can like turn certain chunks into code. it really feels like programming in a lot of ways.
Starting point is 00:14:04 So there's something kind of interesting. Can you talk a little bit about just where you begin in terms of like how to structure the research effort? Like if there are certain tasks you work on, if you start by working on policy or reinforcement on certain tasks or there's data you want to collect. Like how do you start? Yeah. So we have this idea we call serious use where basically we should be building agents that we want to use every day. This is actually one of the biggest blockers. It's really hard to get agents we want to use every day because of the reliability issues.
Starting point is 00:14:37 And so a lot of what we work on is coding agents, but we also work on agents for other operational business processes. And that kind of helps drive, oh, okay, like these parts of the agent loop are really complicated. Like, can we simplify them? Can we make them more reliable? And in a lot of ways, it is an incremental kind of set of work that helps us get from like, you know, 60% reliable to 70% reliable to 7%. 70% to 80%. And that's what forces development of new techniques. It's not like, oh, magical, you know, we train a giant model and stick everything into it and then magically it works. Like, it does not work. It'll get better at random parts of the agent loop, but that's not what we want.
Starting point is 00:15:16 And is the premise here, like you start with a serious use, smaller task in code or something like a, like a recruiting communication automation task? Or how do you choose? Yeah. We pick tasks kind of depending on a bunch of. different factors. One, like how useful, how frequent, how possible is this going to be to do, right? How generally applicable is it? How much is it going to help push the techniques that we want to push forward? Does it scale to more complex versions of the task? Yeah. So we're purposely trying to pick, you know, some with some diversity. Like we have, you know, one agent that will just go do a random to do in your code base. And so this can be super, super general. It can take a really, really long time to do
Starting point is 00:15:54 this, right? And we have another, on the opposite end of that spectrum is we have an agent that we'll look at every single pull request and run linter against it and ask, like, okay, are there any type errors? Okay, how do I fix them? All right, great. Here's like a PR with me fixing the type errors for you, but very, very specific. But really, you can imagine how, you know, you can invoke the to-do agent to fix a specific type error, and you can expand the type error fixer to do unit tests and to do security flaws
Starting point is 00:16:20 and to do renaming these variables. And they sort of meet in the middle as you kind of make these things both more capable. And so there are just different ways of kind of looking at the problem of how do we make a useful coding agent. And Eli, to your point of kind of the specialized versus general dichotomy, one thing that is kind of interesting that we're seeing in agents is, like, agents can call subagents. So our to-do agent can figure out, like, oh, there's already a sub-agent for this thing for this function you're trying to write. And like, let me call that sub-agent because it seems likely to succeed. And then if I try it and it doesn't succeed, I'll do something else. And so
Starting point is 00:16:52 you can kind of have this like more general reasoning layer and also a bunch of sub-agents where That general reasoning layer is actually very specific. It's a specific planner. It's not that good at, like, browsing the web and things like that. But the system itself altogether is more general as a result. How do you guys do evaluation for both these, like, let's say, categories of agents that you're working with today from the, I assume, more closer to production grade to do to broader coding agents? Yeah. The evaluations, I think, are actually one of the most important parts and one of the places where we spend the most,
Starting point is 00:17:28 time and think about it kind of the most. There's a lot of work in specifying exactly what you want from the to-do agent, for example, right? Like, how do you know, like, it gives you back some code. Okay, is that good? There's sort of a spectrum, but like, if it's faster, it's better. If it gives you less code, that's better. But if there's bugs, that's not good. So you really need to take it and, like, break down, what did I really want to happen here? And I think when you start to break this down, you start to say, okay, there's some things that are kind of qualitative, like, do I trust it? Did it come back with tests? Can I run this code immediately, like the kind of feel of it. There's other things that are just for the code itself. There are different attributes.
Starting point is 00:18:01 Is it in the same style? Does it have good variable names? Like, is it a minimal change? Or did it all sorts of stuff that it didn't really need to change? Each of those things are actually some that you can measure a little bit more easily than the overall task. So you can make another kind of metric that's like, okay, how good are the variable names? Well, all right, how similar are they? You can break that down. You can kind of keep breaking it down until you get to a point where it's like, okay, I mean, you know, a regular language model or even just a person looking at this, like there's an objective answer. One of the reasons why we work on code is that there are objective answers to a lot of these questions, either the test pass or they don't. Either the
Starting point is 00:18:31 function is correct or it isn't. Those kind of things are much easier to evaluate. And so we're starting a lot more of our tasks are in that zone as we sort of build up eventually to the ones that are a little bit more qualitative because the evaluation is so much harder there. But I think the whole, the strategy of breaking these things down, like basically the strategy is we take the output or the answer and we like ask a bunch of questions about the output and then we evaluate those questions. And we also evaluate the output. And the interesting thing about that is it scales pretty well to like non-code tasks. So for like our recruiting tasks, we can also do a very similar process. I think part of why a lot of teams try to work
Starting point is 00:19:06 on just math or code reasoning is because those are the easiest to evaluate and like the clearest answers. But but just relying on like is the output correct or not that loses a lot of information in the evaluation. Yeah. I think it's likely to be a pretty rich space. I'm curious just for your point of view, but we've looked at a lot of startups building, let's say, interesting AI, like development tools, right? And one of the things that we've spent a bunch of time thinking about is, like, what makes for a good scalable eval loop, right? And that could be objective and easy to test, right? Like, it doesn't compile to things that, as you said, might be richer in data. Like, how easy is it to check the functionality of something, right?
Starting point is 00:19:49 Do you have to do static analysis? Is the performance better? Are there examples? Are there examples if you want to focus on a particular problem, like, let's say, like Python 2 to 3 upgrades or something, right? I think one of the things that's most attractive about this domain is there are lots of ways to evaluate, right, even beyond the contributions to reasoning that you guys describe. And it's just going to be productive. I mean, maybe on that topic, like, do you, do you guys think of yourselves as a product company? Like, is it important to go get this functionality in front of users or just focus on research and sort of how do you think about that sequencing? Yeah, of course we're a product company. We're a company. But I think looking at the history
Starting point is 00:20:31 of computing, there is like a right time for technology. And today, I think what you see, what you both see is that it's pretty hard to make agents that work that like can be productionized and used all the time. And it's because like the technology is just not there yet. And we use reasoning as like a bucket term. We've described a little bit all the nuances of what we're actually trying to do to get agents to work. And then we lump that all under the term reasoning because it's easier for people to conceptualize. But the reality is like what we're trying to do is to make kind of a system, a set of tools and maybe frameworks that actually makes it so that we can build reliable agents really fast, really easily. So today, like writing agents feels like writing code and assembly, and that really limits the types of agents we can build and also limits the number of people who can build them.
Starting point is 00:21:25 And kind of what we're going toward is like programming languages that are a little bit more ergonomic, where we can build agents much more easily, where they can work much better, and where a lot more people can build them. Whatever it is that we release, that's what we hope it's going to enable. And so that's why we kind of work on different parts of the stack. We work on the underlying models because there need to be like more specific underlying models that work for specific things. And that's what allows a lot of these capabilities and agents to be more reliable. We also work on other pieces of it as well. Maybe if we just project forward a little bit, like what are you guys most excited about? You want to be tools at different levels of the stack company.
Starting point is 00:22:13 What are you imagining people build, or what are you already seeing people build that you think is going to be, let's say, useful a year from now and useful five years from now? Yeah, I think a year from now, we're going to start to see some of these use cases actually work that today. You can write these. Like, we have the capabilities. You can make some kind of agent to triage your email or to do scheduling or many of these workflows. Like, why don't we have that today? That definitely can be done, right? Like, there's nothing stopping us.
Starting point is 00:22:39 And I think five years from now, we're going to have something where it's not just, you know, okay, we have a scheduling bot. We have this other thing. But we really have these more general, more robust systems where each of us can individually say, like, I want a thing that does this. I want to do this particular weird research workflow. And I want it to work like this and blah, blah, blah, and just specify it in language. I think one thing that our recruiter mentioned yesterday, and then I thought was kind of funny is you've been describing to Ken. And it's like, we're actually sort of a software dev tooling company. But the idea is that in the future, everyone is going, like, as we make these things easier and easier to program, really everyone's going to be at like sort of software engineer in that sense. Like we'll be able to make our own agents, right? Just by sort of working in natural language and like describing what we want to do and how we want it to be done and interacting at that level. And so since we're going to be working when these agents, we're kind of making, we're like trying to move towards that kind of tooling. And so I think the goal in five years is for people to be able to really specify some huge range of possible agents that, you know, that do exactly what they want. Like, they're kind of making, we're like, they're like,
Starting point is 00:23:39 can interact with their computer in whatever way they want. I think specifically what she said is we're a software dev tooling company, but in the future, everyone will be a software engineer, and so everyone will need dev tools. And we think of agents. Agents are a very technical term. That's like the specific memory architecture of the computer. But agents, what they enable is they're like a natural language, programming language. And so in the future, you know, it's, you know, computers, programming computers today,
Starting point is 00:24:06 a way to think about the problem is that it's really, not very intuitive to get our computers to do what we want them to do. And computers have been becoming more and more intuitive over time. And the best tools are very intuitive. And so one day, you know, language is very intuitive to us, like vision, kind of seeing, understanding things that way, very intuitive to us. And our computers will become much more intuitive so that people can make them do what they want. More people can.
Starting point is 00:24:31 One major milestone they had recently was you announced a $200 million fundraise from Astera, NVIDIA, and a variety of other folks. How do you think about what proportion of that will go to things like compute versus team and how in general should AI companies think about the capital they raise and how to deploy it relative to different potential objectives and outcomes? I mean, I think actually a significant fraction of that is going to go to compute. I think I can't speak to other companies how they should deploy it, but I think for us, given that our goal is to make agents,
Starting point is 00:25:03 what we really want actually as a company is not to become a huge company. We don't want tens of thousands of people. We want to make our product actually work so that we can make AI agents so we can have some huge impact and have a relatively small, close-knit team where the communication is much easier. It's really hard to communicate with 10,000 people. It's much easier to get 100 people in a room and know what the heck you want to do and agree on things.
Starting point is 00:25:23 And so I think we're trying to ideally leverage ourselves and we're already starting to do that today. And what that looks like is by spending a bunch on compute. Today, we don't have AI agents that are running off and doing all sorts of things on their own, but we do have the beginnings of those. We do have our internal hyperparameter optimizer, for example, which saves us a ton of time. Instead of our researchers manually deciding, like, oh, this learning rate, I should do this experiment. We just take go, we come back after the night.
Starting point is 00:25:46 And it's like, oh, great, everything is optimized. This is really nice. Right. But that used a lot of compute. Like, we're using a huge amount of compute relative to each person. Yeah. We're like training state of the art models with like 14, 13 people. Most of us are not working on training the models or the infrastructure even.
Starting point is 00:26:01 And most of us are not working on that. And so it's the total team. size is very small for what we're able to do because of the way we think about our infrastructure. It's like a very egentic kind of approached infrastructure. It's now sort of broadly viewed that there will not be a fully monolithic architecture for lots of useful models and people have like mixture of experts and such. Given what you want to do with agents, with like planning and reinforcement learning and more test time compute, like I think it's sort of a belief among the largest research labs that under 5,000 GPUs,
Starting point is 00:26:39 like under some reasonable level, like you cannot compete on state-of-the-art reasoning, at least as the core LLMs describe it today. Obviously, that bar keeps moving. Does that number apply to you? Do you think the architecture is just very different? I mean, we actually have a lot of GPUs. So the number may or may not apply,
Starting point is 00:27:01 but we do have a lot of GPUs. we have enough compute to be able to train models that are as large as the largest models have been trained today to date. So we have a ton of compute. We can train these really large models. And it may not be the best use of our time and resources, actually, because I think just as with, like, computers, things just get more efficient. And what we see is that things are getting more efficient in training.
Starting point is 00:27:24 So, like, learning how to use data more effectively so that the models get much better performance with less data, learning how to. like do training runs so that... So things don't, you know, diverge. Like, there's all sorts of things don't diverge. We're not having to like rerun the same thing again. There's like a bunch of, there's a bunch of like hyperpramers to set and like tooling to build around it and monitoring and stuff like that.
Starting point is 00:27:46 That just makes it more efficient to train these things. And then also, I think the data piece is just so big and so under explored. Like people don't really, we all know that data is the thing that matters. And I think like a lot of efficiency gains are going to come from better. data. And so that's actually quite a bit of what we work on. Could you tell us a little bit more about why you decided to focus on coding and what are the types of systems you're really focused on building? Yeah. So there's a bunch of different reasons for focusing on coding. One of them is that the evaluate, we talked about before,
Starting point is 00:28:17 the evaluations are much easier to do and subjective. Another one is that coding is part of reasoning. Another one is that coding really helps us accelerate both our own work and the agents that we end up building. So as we're making the tools for ourselves, we already are starting to see like this kind of leverage from the systems that we've built where like we can run this agent now you know I think probably within the next year we'll probably you know not be hiring as many recruiting coordinators because oh we're going to do some of the scheduling with the agent that we've built right but we also can do the same thing on the software engineering side we're writing unit tests literally right now automatically okay and that's just helping accelerate us helping you know
Starting point is 00:28:53 remove the bugs it's additive it's incremental it's like okay we get a 5% gain a 10% gain here but as we make more and more tools those things compound and I think time, it's going to be possible to make much more robust systems, much more quickly. And we're building, you know, we're using these coding agents to write the coding agents. And I think this is kind of the like, you know, sort of recursive self-improvement thing that people have always been sort of worried about or excited about in AI. But I think what it really looks like in practice is not this scary like, oh, you know, you leave your computer on overnight and all of a sudden it's a super, super God thing the next
Starting point is 00:29:25 day. Instead, it's like this slow grind of making things a little bit better every day. but a 1% improvement every day over a year is huge. And so I think that's the kind of thing that we're really excited about with code is that not only can we apply it to our own workflows, but also as we start to actually get coding agents that can really write code,
Starting point is 00:29:45 now we're in a very unlimited, very interesting space. Right now, the bottleneck for most companies is the ability to hire software engineers that can write really robust code, right? But if you can just turn compute into really good code, now this is a totally different world. Now there's none of this,
Starting point is 00:30:01 oh, yeah, we'll imbue is so much smaller in this other company, blah, blah, blah. It's like, no, no, we can write way more code than anyone else, right? So I think this is kind of a pretty interesting thing that over time I think we'd like to work towards, and so that's another reason for code as well. There's also code is really useful for action, so agents acting. And today, you know, even the models can like do really simple things like write code to write integrations, like API integrations. And so that saves us a lot of time writing API integrations, which is super annoying.
Starting point is 00:30:30 I also think like software is just dramatically underwritten because it's so hard to write code today. So, you know, as we said in the future, like computers will be able to be programmed by regular people. What that means is like we're going to write way, way, way more software all the time. And like people will write software, but maybe not by having to write code the agents write the code. I think it's not only just more software, but also better software, right? If like already we're having our agents kind of look at our pull request, you know, fix the type errors, okay. But we can extend this to adding new unit tests, to fixing the existing unit test, to looking for security flaws. I'm very excited about agents that can go out and help all sorts of
Starting point is 00:31:05 organizations improve the quality of their code base. How can we simplify this for a factor at fixed security flaws? I think there'll just be a huge flourishing of much higher quality, better software, as a result, not just more software, but just taking the existing software and making it so much better, which will make it so much nicer and more fun to interact with as programmers as well. Also, much more custom software. Like something that we do some of is generating interfaces. And it's pretty interesting. Like, if I can have it like a custom interface for whatever it is that I'm trying to do. It has exactly the right form fields. It's like kind of nice. And then I can cache that interface and like we use it. So, you know, pragmatic,
Starting point is 00:31:39 but pretty interesting. Yeah, I mean, I did this over the weekend actually for Mid Journey. I got really sick of typing out Mid Journey prompts on my phone in Discord. You can't keep iterating the prompt. So I just made like a little, a little thing that interacts, you know, with it via the API. Well, my version of the API. But yeah. And so, but I think everyone will be able to do this. Like it didn't actually take that much code when we have agents that can write code. someone else who wants to use it in a different way. Great. You can just like ask the agent to do that, come back five minutes later and you have like your own perfect way of interacting with this. I think that's just going to make our computers feel so much nicer to interact with.
Starting point is 00:32:11 That I think is an inspiring note to end on. We're going to have 25 person companies who can change the world. We're going to have more software, more custom software and higher quality software for us all to use. So thanks so much for doing this, Josh and Kenjin. Yeah, that's what you're trying to. Thank you, Sarah. Thank you. Find us on Twitter at NoPriarsPod. Subscribe to our YouTube channel if you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week.
Starting point is 00:32:41 And sign up for emails or find transcripts for every episode at no dash priors.com.
