No Priors: Artificial Intelligence | Technology | Startups - What’s Beyond GitHub Copilot? With Copilot's Chief Architect and founder of Minion.AI Alex Graveley

Episode Date: March 30, 2023

Everyone talks about the future impact of AI, but there's already an AI product that has revolutionized a profession. Alex Graveley was the principal engineer and Chief Architect behind GitHub Copilot, a sort of pair-programmer that auto-completes your code as you type. It has rapidly become a product that developers won't live without, and the most leaned-upon analogy for every new AI startup: Copilot for Finance, Sales, Marketing, Support, Writing, Decision-Making. Alex is a longtime hacker and tinkerer, open source contributor, repeat founder, and creator of products that millions of people use, such as Dropbox Paper. He has a new project in stealth, Minion AI. In this episode, we talk about the uncertain process of shipping Copilot, how code improves chain of thought for LLMs, how they improved product performance, how people are using it, AI agents that can do work for us, stress testing society's resilience to waves of new technology, and his new startup named Minion. No Priors is now on YouTube! Subscribe to the channel on YouTube and like this episode.

Show Links:
Alex Graveley on LinkedIn
Minion AI

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexgraveley | @ai_minion

Show Notes:
[1:50] - How Alex got started in technology
[2:28] - Alex's earlier projects with Hackpad and Dropbox Paper
[7:32] - Why Alex always wanted to make bots that did stuff for people
[11:56] - How Alex started working at GitHub and Copilot
[27:11] - What is Minion AI
[30:30] - What's possible on the horizon of AI

Transcript
Starting point is 00:00:00 The most fun stuff was like these people that would pop up and be like, I don't program, but I just learned how to write this hundred-line script that, like, does this thing that I need. It's like, oh, my God. This is the No Priors podcast. I'm Sarah Guo. I'm Elad Gil. We invest in, advise, and help start technology companies. In this podcast, we're talking with the leading founders and researchers in AI about the biggest questions.
Starting point is 00:00:30 Everyone talks about the future impact of AI. But this week on No Priors, we're talking with the architect of arguably the first AI product that has already revolutionized a profession: GitHub Copilot. Alex Graveley was the principal engineer and chief architect behind this product, a sort of pair programmer that auto-completes your code as you type. It's rapidly become a product that developers won't live without, and the most leaned-upon analogy for every new startup: Copilot for finance, for sales, marketing, support, writing, decision-making, everything.
Starting point is 00:01:13 Alex is a longtime hacker and tinkerer, open source contributor, repeat founder, and creator of products that millions of people use, such as Dropbox Paper. I have huge respect for the range of work he's done, ranging from hardware-level security and virtualization to real-time, collaborative, web-native, and AI-driven UI. He has a new project in stealth, Minion. Just as a heads-up, we recorded this podcast episode in person, and you'll notice we don't sound as crisp as we usually do. Still, I think you'll enjoy listening to this conversation.
Starting point is 00:01:41 Let's get into the interview. Welcome to the podcast, Alex. Hi. So let's start with some background. How did you end up working in tech or AI? Tech was earlier. I started, you know, really young, and I got really into Linux when I was 14 or so.
Starting point is 00:01:57 And, yeah, it was right around the time when the web was, like, a new thing and you had to work to kind of get on the web. The idea of helping in the open and making it freely available so other people could learn from things, like I was learning from things, seemed great. So I went and spent many years just working on open source stuff. Yeah, spent many hours compiling kernels and hacking on stuff. And yeah. And what was the thought process behind starting, like, Hackpad? Oh, Hackpad. Yeah. Yeah, so I had just finished like four or five years at VMware, and I wanted to get into
Starting point is 00:02:36 startups. I knew that. And then so I left VMware and I started working on an education startup, like many of us do. Many, many founders start with, like, the idea of an education startup. It's like a rite of passage. Yeah. It's like a rite of passage. Yeah.
Starting point is 00:02:51 So I spent, I don't know, nine months working on that. Turns out education is very hard. So after nine months, I was like, all right, this isn't going anywhere. I don't know if there's a value prop here. I mean, that was the value: I learned that, like, you have to make something that is both achievable and that people want to pay for or spend their time on. So, yeah, then I was just kind of fishing around.
Starting point is 00:03:13 I was living in, like, a warehouse in San Francisco with a bunch of Burning Man people. And we were having trouble organizing large-scale Burning Man projects. And so I forked Etherpad and started hacking on it, recruited a friend in the camp community to start working on it with me. And yeah, it just grew from there. Before we did YC, we had many of the large Burning Man camps using it to organize their builds. Can you describe the product experience? Oh, yeah. It was a real-time text editor, kind of like
Starting point is 00:03:42 Google Docs. Google Docs is the only other one that did that at that time. It was kind of nice because it would highlight who said what. So you could go track down, if somebody had a contribution, and be like, oh, what did you mean? Or, I heard you say something about this. So that's very useful in, like, large-scale anonymous, pseudo-anonymous groups where you don't really know who has ownership over what, like Burning Man camps. And let's see. And then we did YC, and we got a bunch of those people using it. My hack was to go do YC and try and get all the companies doing YC to use my product, which many of them did. I also took, like, very extensive notes of all the YC presentations using the product, that everyone would then, like, look at. And we were able
Starting point is 00:04:27 to get Stripe. And Stripe used us for many years, actually. I think we were their first knowledge base. They used it for a long time. A bunch of other big companies as well. And post Dropbox acquisition, you worked on what became Paper. How did you think about what you wanted to go work on next? I spent the next few years kind of poking around at stuff. I knew that I wanted to make a robot that does stuff for you. So there was a company called Magic. I went to this company called Magic. They were doing this, like, text-based personal assistant. You remember this one?
Starting point is 00:05:01 I think just like everybody starts an education startup, Magic is one of those names that keeps cycling, and there's a really cool AI company right now called Magic as well. So I feel like there are also these names that kind of persist from generation to generation, which I think is really cool. I'm sure you know it, it's code gen. Yeah, yeah. No, yeah.
Starting point is 00:05:16 I haven't seen a demo yet, but it sounds like they're doing the right thing. There's a few people doing that kind of whole-repository changes. So it seems like a great direction. It was all ops-based. It was super ops-heavy. So, you know, they had teams of people and it was 24 hours and they would cycle in and, you know, they'd lose context. And, like, they're all busy because they're trying to, like, deal with lots of people with lots of requests all the time. And so really, it was like a crash course in, like, human behavior, right? Like, what do people do under stress? How do they act? What do they say? What can you train? What can't
Starting point is 00:05:47 you train? Like, can you bucket stuff? And the answer is no. Like, humans are complicated, especially in text form. The beauty of the web and, like, traditional UIs, right, is that you fill them in. And, like, if it doesn't do what you want, then you make the decision to either go forward or not go forward, right? Text has this annoying property where you can be all the way at the 99% mark and then change the goal entirely. Yeah, so it's complicated. It's hard to understand what people want. It's hard to understand the complexity involved, especially when you're dealing with the real world: flights get delayed, passwords get lost. Do you think we're the last generation to deal with that?
Starting point is 00:06:26 In other words, it feels like we're about to hit a transition point in which agents can actually start doing some of these things for us for real, when before, I think, all these products really started with these operations-heavy approaches. I remember, you know, there was a really early sort of personalized search engine called Aardvark. And similarly, if you kind of looked under the hood, it was a lot of ops people and there was a little bit of algorithm, right? That was really where, like, you would sort of describe what you're good at or something,
Starting point is 00:06:50 and then they would try and send questions to you. Yeah, exactly. They'd kind of route things, but I think there's actually people doing some of the routing at least. I can't exactly remember, right? But I think it was, you know, I think a lot of people wanted to build
Starting point is 00:07:00 these really complex sort of bots or agents that were doing really rich things, and the technology just wasn't there. It feels like, for a period of time, there were also some startups. I remember one company that I got involved with that was trying to do, like, virtual scheduling and assistants, you know, six, seven, eight years ago.
Starting point is 00:07:16 And again, it felt like it was a little bit early, maybe. Yeah, it was a little bit early. Really cool. Clara Labs. Clara, that's the one. Yeah, yeah, yeah. Yeah, I remember the Sarah.
Starting point is 00:07:22 And then we had Operator. Do you remember Operator? I remember Operator, yeah. And then, like, on the question-answering stuff, we had Jelly. So there's a whole series of these. Yeah, what we were able to do at Magic, as, like, an aside there, you know, I started working there because I wanted to work on the AI part.
Starting point is 00:07:38 I think somewhere in there, Facebook M started as well. And it was just, it was a fun place to try and learn everything I could about solving those problems. So, yeah, before Transformers, there was sequence-to-sequence, which was kind of the previous iteration. You were able to take all the histories of the chats between assistants and people and train a little model on it, little by today's metrics, and run it. And it would show some gray text in the little text bar and people could edit it and hit enter. The operators? Yeah, yeah, the operators. And, yeah, we would measure how much time they spent typing with it and without
Starting point is 00:08:17 it. And so it would save like half an hour across 100 people per day, across an eight-hour shift. So yeah, that was like my first shipping AI product. And then how'd you end up going to Microsoft? Oh, yeah, there's a bunch of other stuff along with it. So after that, I got into crypto. My friend was doing hCaptcha, which was like sort of a CAPTCHA marketplace, which is now something like the number one or number two CAPTCHA service in the world, which is crazy. So I kind of launched that. That was fun, annoyed people the world over for many man-hours in aggregate, and then left that to work with Moxie on a cryptocurrency for Signal. So that was really fun, complicated, and it all worked
Starting point is 00:09:02 in a few seconds. We were shooting for Venmo quality. When you think about crypto in the context of AI, because people talk about it in a few different contexts, right? One is you have programmatic sort of money as code running, and so that could create all sorts of really interesting things from an agent-driven perspective, but then the other piece of it is identity. And some people think, I mean, Worldcoin would be one example, but there's other examples of effectively trying to secure identity cryptographically on the blockchain in an open way, and then using that identity in the future to differentiate between AI-driven agents and people. Do you think that's going to be important, or does that stuff not really matter in terms
Starting point is 00:09:36 of the identity portion of not only crypto, but just, like, how we think about the future of agents? It's a good question. The honest answer is I think we're going to go through a many-year period of extreme discomfort where AIs pretend to be things or confuse people or extract money from your grandparents or drain people's life savings in ways that are scary. And, you know, OpenAI is trying to do their best, but for some reason the focus has been on OpenAI doing everything, instead of, like: we should go build the systems that prevent that. We should go pass the legislation that drops the hammer on people doing that stuff. We should go do all this kind of
Starting point is 00:10:17 stuff. But unfortunately, it seems like we're going to need some really bad things to happen before we align correctly. I'm not really scared about AIs killing us, although I'm very grateful that there are people that are thinking about it. I'm more worried about bad people using new technology to hurt us. Yeah, Illia from NEAR has some really interesting thoughts on this, because he was one of the main authors, he was the last author on the Transformer paper, before he started NEAR. And he's brought up these concepts of, like, how do you stress test society relative to the coming wave of AI, which I think is an interesting concept. Yeah, it's a great way to look at it. Like, it's not as bad as it could be, right? If you think about it, most of the things that you would want to spam either have a
Starting point is 00:10:56 spam blocker or are somewhat difficult to create an account on. So doing a better job of sock-puppet account filtering is going to be really important going forward. You know, I like what Cloudflare is doing with their kind of fingerprinting instead of visual CAPTCHAs, which are not good enough anymore. One thing that is, like, kind of a saving grace here is that many of the things that you would want to do cost money. So calling everybody costs money. Texting everyone should hopefully be illegal soon, but also costs money. Maybe not enough money to prevent these things, but...
Starting point is 00:11:30 Probably not enough as agents can make money, right? They can just look at the tradeoffs of cost. Yeah, I think it's interesting. I guess I would say it's not, you know, it's somewhere in the middle. It's like, imagine there's a, you know, North Korea has been trying to do this to us for a long time, right? Now there's a North Korea that has more resources or is more distributed or whatever. We have some mitigations. We need more.
Starting point is 00:11:53 We need to be thinking about it a lot more. How did you end up at GitHub, and how did you end up working on Copilot? While I was working on MobileCoin, my dad's kidneys failed and I tried to donate a kidney. And they found a lump in my chest as part of the scans they do. And I had to have most of my right lung removed in 2018. And so that was a big deal. And so it took some time. It's weird.
Starting point is 00:12:21 Healing from internal injuries takes a lot longer than you think. Anyway, the happy story is that I haven't had cancer now for over four years. And my dad's kidneys, he got a transplant also. So things are good. And so after, I don't know, I guess I was recovering for quite a while. And then I went and begged my friend for a job. I figured I should start working again. That was the transition, I guess.
Starting point is 00:12:49 So I worked on some random stuff at first. I converted GitHub to using their own product to build GitHub, which is kind of fun. They weren't before, yeah. I think people still use Codespaces now to build GitHub, which is pretty cool. But yeah, then this kind of opportunity to work with OpenAI came up. And because I had been tracking AI in the past and was pretty aware of what was going on, I jumped on it. Was that proposed by OpenAI or by GitHub, or who kind of initiated it at all? So I don't know the exact beginnings.
Starting point is 00:13:17 I know that OpenAI and Microsoft were working on a deal for supercomputers. So they wanted to build a big cluster for training. And there was a big deal that was being worked out. And there were some software kind of provisions thrown in, I think Office and Bing probably. And GitHub was like, oh, okay, well, maybe there's something GitHub can do here. I think OpenAI threw a small fine-tune over and was like, here's a small model trained on some code, see if this is interesting, you know. So we played around with it.
Starting point is 00:13:44 What's small? I don't know. This was... I have to remember now. I think this was before I knew very much. So it was definitely not a Davinci-size model, that's for sure. I don't know what size it was. Yeah. And so, as I learned later, this was basically a training artifact.
Starting point is 00:13:59 So they had wanted to see what introducing code into their base models would do. I think it had positive effects on chain of thought reasoning. Code is kind of linear. So you can imagine that, you know, you kind of do stuff one after another and the things before have an impact. And, yeah, it was not that good. It was very bad. It was, I think, just, like I said, just an artifact and a small sample of GitHub data that they had crawled. And this was before, actually, I joined.
Starting point is 00:14:27 Me and this guy, Albert Ziegler, were the first two after Oege. Oege got a hold of this model and started playing with it. And he was able to say, like, well, you know, it doesn't work most of the time, but here it is doing something, and it was only Python at that time, here it is, you know, generating something useful. We didn't really understand anything. So that was enough to, like, okay, well, you know, go fetch a couple of people and start working, see if there's anything in there.
Starting point is 00:14:53 We didn't really know what we had. So the first, you know, task was to go test it out, see what it did. We crowdsourced a bunch of Python problems in-house, stuff that we knew wouldn't be in the training set. And then we started work on fetching repositories and finding the tests in them, so that we could basically generate functions that were being tested and see if the tests still passed. There had been a brand new pytest feature introduced recently that allowed you to see which functions were called by the pytest. So you'd go find that function, zero the body, ask the model to generate it, and then rerun the test and see if it passed. And I think it was less than, I don't know, 10%, something like that,
Starting point is 00:15:39 of those guys. And the dimensions are kind of like: how many chances do you give it to solve something, and then how do you test whether it's worked or not, right? So for the standalone tests, we had people write test functions and then we would try to generate the body, and if the test passed, then you know it works. And in the wild-test harness, we would download a repository, run all the tests, look at the ones that passed, find the functions that they call, make sure that they weren't trivial, generate the bodies for them, rerun the tests, see how many you could get to pass, and then you get your percentage. Yeah, I mean, it was something like some very, very low percentage up front, but we knew that there was kind of a lot more juice to squeeze.
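To make the harness he's describing concrete, here is a rough sketch of that in-the-wild evaluation loop. This is an illustrative reconstruction, not GitHub's actual code: model_complete is a hypothetical stand-in for whatever completion API was used, and the list of test-covered functions is assumed to come from coverage data rather than the specific pytest feature he mentions.

```python
# Illustrative sketch of the "in the wild" evaluation loop described above.
# Not the actual Copilot harness: model_complete() is a hypothetical stand-in
# for a code-completion API, and covered_functions is assumed to come from
# coverage data collected while running the repository's tests.
import ast
import subprocess
from pathlib import Path

def model_complete(stubbed_source: str) -> str:
    """Stand-in: ask a code model to return the file with the stubbed body filled in."""
    raise NotImplementedError

def blank_function_body(source: str, func_name: str) -> str:
    """Replace the named function's body with a placeholder for the model to regenerate."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            node.body = [ast.Expr(value=ast.Constant(value=...))]  # body becomes `...`
    return ast.unparse(tree)

def tests_pass(repo: Path) -> bool:
    """Run the repository's test suite and report whether it passes."""
    return subprocess.run(["pytest", "-q"], cwd=repo).returncode == 0

def evaluate(repo: Path, covered_functions: list[tuple[Path, str]]) -> float:
    """Blank each test-covered, non-trivial function, regenerate it, rerun the tests."""
    passed = 0
    for path, func_name in covered_functions:
        original = path.read_text()
        path.write_text(model_complete(blank_function_body(original, func_name)))
        if tests_pass(repo):
            passed += 1
        path.write_text(original)  # restore the file before trying the next function
    return passed / max(len(covered_functions), 1)  # the pass percentage he mentions
```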
Starting point is 00:16:14 So like getting all of GitHub's code into the model, and then a bunch of other tricks that we hadn't even thought of at that time. Yeah, and eventually, you know, it went from, you know, less than 10% in the wild test to over 60%. So that's like one in two
Starting point is 00:16:43 tests it can just generate code for, which is insane, right? Somewhere along the way, you know, there was like 10% to 20% to 35% to 45%, you know, these kinds of improvements along the way. Somewhere along the way, we did more prompting work so that the prompts got better. Somewhere along the way, they used, you know, all the versions of the code as opposed to just the most recent version. They used diffs so that it could understand small changes. Like, yeah, it got better. But when we first started, we didn't know what we had. We were just trying to figure it out. At the time, they were thinking in terms of, like, maybe you can replace Stack Overflow or something, you know, do a Stack Overflow?
Starting point is 00:17:12 Was that the first, like, product idea you guys had for it? I don't know that we had that idea. I think that was kind of like a... Yeah, yeah, yeah, yeah. It was more like, if you made something that competed with Stack Overflow, because we have all this code, wouldn't it be nice to leverage it. Yeah, and so we made some UIs, but, like, early on, it was bad. So it would be like, you'd watch it and it would run, and most of them would be bad. And there'd be, like, one success and be like, oh, sweet, I got a success, but I had to wait, you know, some number of seconds for it.
Starting point is 00:17:45 Was the test user group just, like, the six of you, or some larger group? The first iteration was just, like, an internal tool that helped people write these tests. And then we wanted to see if maybe we could turn that into some UI that people would use, where there was some way to cover up the fact that one in ten things passed. Right? So we tried a few UI things there. And then it was actually OpenAI. It was like, we're testing these model fine-tunes,
Starting point is 00:18:11 it'd be nice if we could, like, test them more quickly. What about doing, like, a VS Code extension, just do autocomplete? And, uh, all right, sure. Why not? And yeah, so we did autocomplete at first. And that was kind of, that was a big jump, you know, because they were still thinking in terms of, like, Stack Overflow, you know. But I didn't have any idea, basically. I don't know how to beat Stack Overflow with this thing. But we could play with some stuff in VS Code that was maybe closer to the code. First we did autocomplete and that was kind of
Starting point is 00:18:44 fun. It was useful. It would show this little pop-up box like Autocomplete does and you could pick some strings. And so that format actually, you know, the usage was, you know, it's fine. It wasn't the right metaphor exactly, right? But you've got like this code generated mixed in with the specific terms that are in the code, and it's a little, it's not exactly the same thing. We tried things like adding a little button over top of empty functions, so you would go generate them, or you could, like, hit a control key, and it would create a big list on the side that you can choose from, or there's a little pop-up thing. So basically tried every single UI we could think of in VS code.
Starting point is 00:19:22 And multiple generations, like the list, that didn't work. Yeah, none of them really worked. I think lists were like, you know, maybe you'd get one generation per person per day. And this was just a small sample. It's just, like, a few people that were interested at GitHub, language nerds or people that had written tests for us, and OpenAI people. Yeah, so very early on, I had this idea that it should work like Gmail's gray-text autocomplete,
Starting point is 00:19:51 which I was, like, enamored with as a product. It was, like, the first quote-unquote large language model deployment in the wild. It was fast. It was cool. The paper's great. They give you all sorts of details on how they do it, all the workarounds they had to do. So that was always in the back of my head, you know. It was bad also. It was like, you know, those completions are not good. But it seemed like the right thing. Anyway, somewhere along the way, after I tried all the UIs, I sort of came up with some idea for it. VS Code didn't support this, so I tried to hack it in. Finally came up with a way to hack it in, enough to make a demo, like a little
Starting point is 00:20:28 demo video. How did you, like, build real support for it within the organization? It's a little complicated. I guess, you know, we were pretty much a skunkworks project. No one knew about us. So we would go to, like, the VS Code people and be like, hey, we need you to go implement this very complex feature. And they're like, I don't even know who you are. Like, what are you talking about? There was definitely some politicking that happened to get the VS Code people to dedicate some resources to that on a short, short time frame. Like, we were moving really fast. You know, it was less than a year from beginning to shipping the
Starting point is 00:20:45 Like, what are you talking about? There was definitely some politicking that happened to get the VS code people to dedicate some resources to that on a short, short time frame. Like, we were moving really fast. You know, it was less than a year before from beginning to ship. public launch. Was there a certain metric where you're like, this is good enough?
Starting point is 00:21:03 Like we need to actually put it in the public product? Which specifically with the completions or like the UI? It was completions. It was. Yeah. There was, I mean, we had a long,
Starting point is 00:21:14 we had a nice long window of public access before GA. So where it was free and you could use it. And we did a bunch of optimizing for different groups of people that would be, you know, okay, well, you know, do we have more experienced people?
Starting point is 00:21:27 Do we want more new people? Do we want people from this area or this area? And that gave us a bunch of really good stats. So we were able to learn that, for instance, like, speed is the only thing that matters. Yeah, there's some crazy thing. Like, every 10 milliseconds is 1% fewer completions that people would do. That adds up. 10 milliseconds is pretty fast. We learned that because somewhere in our first few months of public release,
Starting point is 00:21:53 we noticed that Indian completions were really low. Like, for whatever reason, they were just significantly lower than Europe. So network latency to India? Yeah, and it turns out it was because OpenAI only had one data center, so it was all in Texas. And so, if you can imagine, you're, like, typing, and that goes from India through Europe, over the water, down to Texas and back, and back and back. And now if you've typed something that doesn't match the thing that you requested, then that's useless. So you don't get a completion.
Starting point is 00:22:22 By the way, I know it's obvious, but that's happened on every single product I've ever worked on. And, you know, like when I was at Google, I worked on a variety of mobile products. Same thing. You know, page load times. Obviously, search in general, a 100-millisecond difference, like, makes a big shift in market share. So, yeah, that's kind of nuts. That's how much speed matters. Yeah.
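For a rough sense of what that rule of thumb implies, here is my own back-of-the-envelope arithmetic, treating the quoted 1% per 10 ms as a simple linear approximation; the 150 ms figure is a hypothetical example, not a number from the episode.

```python
# Back-of-the-envelope math using the "every 10 ms costs about 1% of completions"
# figure quoted above, treated as linear. The extra latency value is hypothetical.
loss_per_10ms = 0.01
extra_round_trip_ms = 150  # e.g. routing to a distant data center
estimated_loss = (extra_round_trip_ms / 10) * loss_per_10ms
print(f"~{estimated_loss:.0%} fewer completions")  # prints "~15% fewer completions"
```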
Starting point is 00:22:38 And so once we figured that out, we knew that we had something awesome, right? Like, people that were close to Texas were like, this is freaking great. Like, they were like, you know, we had a Slack channel and people were posting in it all the time. And the most fun stuff was like, these people that would pop up and be like, I don't program, but I just learned how to write this 100-line script that does this thing that I need. It's like, oh, my God. I definitely feel like I now speak different languages
Starting point is 00:23:01 that I don't actually know the syntax of, which is very exciting. Yeah, it turns out these models are really great at finding patterns. So once we had the UI mechanism that worked, we knew we were onto something, and then it was just, like, squeezing out as much performance as we could get. We basically never found the bottom of it, you know; we made it as fast as we could,
Starting point is 00:23:21 and it was still improving completions. And I'm, like, okay, okay, I know there's this plan for, like, Azure to run OpenAI in six months. We need you to do that in the next month. So, like, let's figure out how to make this happen. Because we wanted to run a bunch of GPUs in Europe so we could hit Asia. You know, there was no other place that we could run them; we could run them in Europe, we could run them on the West Coast, we could run them in Texas at the time.
Starting point is 00:23:50 So that was, and Microsoft stepped up there. We got it running. And then pretty much after that, we launched. And, yeah. Were you surprised by the uptake at launch? No. No. I mean, our retention rate was 50%.
Starting point is 00:24:02 Like, it never went, like, months later, it was still above 50%. By, like, weekly cohort, which is, like, insane. Yeah. Right? And we didn't, we didn't know if people would pay for it. That was one thing. I lobbied pretty hard for going cheap and capturing the market. How did you guys think about inference?
Starting point is 00:24:22 cost for this thing at the beginning. Oh, yeah. Our estimates were wildly off, wildly off. Yeah. So we got estimates that were like, you know, it'll be 30 bucks a month per user on average, right? And then once Microsoft was able to, like, kind of do their Azure infrastructure, we were able to then fork off little bits so we could do more accurate projections.
Starting point is 00:24:48 And there's a bunch of like moments where like, how much is it going to cost? You'd like wait for these results. The first big one was 10 bucks a month. It'll cost 10 bucks a month. And I was like, oh, my God, it's so much cheaper than we expected. And then we could optimize it. And that was like we hadn't even optimized on price yet. Right.
Starting point is 00:25:05 And then we optimized on price a bunch. And yeah, now it's less than that. So, like, it was very fortuitous, right? Like we were thinking, okay, well, maybe it's enterprise-only, because those are the only people who are going to be willing to pay for this. Like, 30 bucks a month, it's a lot. And that's, like, with no
Starting point is 00:25:26 Yeah, that's the thing. We know that. We know that now. Where do you extrapolate all this like three to five years out? Are they basically going to be just like agents writing code for us in certain ways? Is it 95% of code is written by co-pilots and, you know, humans are kind of directing it? Like, what do you kind of view the world evolving into in the next couple years? And next couple, I mean like three to five, not 20. Yeah, it's hard. I don't know. I think it's hard for me to, you know, it's hard for me to imagine what that world looks like because it's such a shift from like I have my hands on something and I know that it's right to where we are now, which is I mostly know it's right or I have a sense that it's right,
Starting point is 00:26:07 but I have to test it and see it run to know that it's right, to then just write this and I'll trust that it's going to work. Like, those are pretty crazy transitions, right? Yeah. They exist. Like, you know, for sure. But you could also imagine, like, certain ways to do, like, a code review post some chunk, or some other sort of quality check. Yeah, I think,
Starting point is 00:26:27 if that's the goal we want to get to, as people, then I think every barrier to that is achievable. So we can code review only the dangerous parts or only the confusing parts. Or we can do things like train a model on functions before and after changes, to say, like, okay, a more polished version of this function would be this. Or, yeah, we can do things like, you know, just start with the very basic main loop and then add everything piece by piece with tests, so that you know what works and what doesn't, and then just have the AI keep generating what's the logical next feature. Like, all these things will get figured out. So if that's what we want to do, that's what's going to happen. So what's the idea behind Minion? Yeah, I think I mentioned
Starting point is 00:27:13 making bots that do stuff for you. It's a broad topic. And I think it's where we see things going. You know, the next few years are in AIs taking action, not just answering questions or writing copy, but actually helping us in our daily lives. Things like organizing my schedule or booking flights or finding a trip for me to take or doing my taxes or telling me which contacts I haven't talked to in a long time and should reach out to, you know. There's a lot of stuff that we can do by giving AIs access to information and letting them act on that information in a controlled way that checks to make sure that we're aligned. And yeah, I think that'll be a really fun future. Almost like you can imagine Copilot applied to everyday activities, right? Like, Copilot gives you a little bit of help. So I want Minion to give you a little bit of help outside of your code editor.
Starting point is 00:28:15 How do you decide to work on Minion specifically? Minion specifically. So basically, I stopped working at Magic. I quit because I couldn't figure out how to hook up the AI to data. I was like, in order to improve the quality, I need a PhD in math. Like, I don't know what to do now. And it just sort of, the models got better enough where that specific problem seems solvable. Yeah, so the tech got better.
Starting point is 00:28:40 And so that specific problem is where interacting with the real world broke down in the past. You know, like I said, flights get delayed, prices change, not just a little bit, all the time. You know, they might not have the seat you want at the concert that you want to go to. And so, you know, AIs are this kind of compression of everything that's in their training set, but they're not a real-time mechanism. Anyway, so that was the idea. It was like, okay, well, I think we can work on this old problem of how to make a bot do stuff for people. That's what we want. Let's go make it. I don't know.
Starting point is 00:29:18 It's like, maybe I can use the excitement from Copilot to launch into something which is incredibly hard, but which I believe the technology is around for, if we can figure it out. So yeah, that's it. That's the only reason I've ever gotten into startups, actually. It's like, okay, I want to do a startup so that I can do a harder startup. You know, or I want to do a project so I can do a harder project. Is this one sufficiently hard? This one's hard. Yeah.
Starting point is 00:29:40 It's good and hard, I think. Part of that is there are these fun things that you kind of learn along the way that keep it, like, engaging, right? Like, the thing with code, with Copilot, is that it turns out code is pretty special, right? Like, you can run it. So if an AI generates some code and it runs, then you know something about that code
Starting point is 00:29:57 that you wouldn't necessarily know from the text alone. And accomplishing things, like doing agent-like work on the internet or in apps, has a lot of these similar kinds of properties, where it's like, oh yeah, you can learn something here. We can learn from what people do, where we can see if it's a success or failure and then learn from that, or we can optimize what works based on what doesn't work, or figure out what's annoying and try to improve it.
Starting point is 00:30:23 So I think these things all compound in some really interesting way. You know, I think the goal, right, is, you know, straight out of sci-fi, right? You want to make a thing where you say, hey, computer, file my taxes, and it does the right thing, you know? And I think we can get there. It's certainly in the next few years. But it's also fun to think about how to break these things down and it turns out breaking down tasks in the same way that humans do. Take a complex task.
Starting point is 00:30:51 You figure out, you know, you write a list of things to go through it. Same kind of thing works for AIs as well. So you break down complex tasks. Figure out if there's any information you need. Maybe you have to write some code. Maybe you have to ask some questions. Maybe you have to query some data. Same thing you would do.
Starting point is 00:31:07 Humans don't usually write code to do their taxes. But sometimes it's kind of the same thing: going through a list of your pay stubs, you know, that's executing a for loop. Also, I like datasets that don't exist. So people clicking on stuff on the web is a gigantic dataset, which is currently unowned. And so I think that's pretty exciting. I think there's a lot of stuff to learn from that. So you're hiring.
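As a concrete illustration of the decomposition he's describing, here is a minimal sketch assuming a checklist-style agent loop. All of the names, steps, and helpers are hypothetical stand-ins, not Minion's actual design or API.

```python
# Minimal sketch of "break a task into a human-style checklist, then execute it."
# Everything here is a hypothetical stand-in, not Minion's actual design.
from dataclasses import dataclass

@dataclass
class Step:
    description: str   # e.g. "collect pay stubs", "total the income column"
    needs_info: bool   # does this step require fetching real-world data first?

def plan(task: str) -> list[Step]:
    """Stand-in for asking a model to write the checklist a person would write."""
    return [
        Step("gather the relevant documents", needs_info=True),
        Step("go through each item and pull out what matters", needs_info=True),
        Step("fill in the final result and show it for approval", needs_info=False),
    ]

def gather(step: Step) -> list[str]:
    """Stand-in for real-world lookups: a calendar query, a web page, a file."""
    return ["pay stub 1", "pay stub 2"]

def run(task: str) -> None:
    for step in plan(task):
        data = gather(step) if step.needs_info else []
        for item in data:  # "going through a list of your pay stubs
            pass           #  is executing a for loop"
        print(f"done: {step.description} (checked {len(data)} items)")

run("file my taxes")
```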
Starting point is 00:31:32 One thing we've talked about is like it's kind of a funny thing to try to hire for building products with this new set of technologies. that, like, working in machine learning for the last decade may not help you that much? How do you think about the people you need? Yeah, it's been strange. It seems very bimodal. Like, very senior people get it, and very young people get it. And so I don't know that the middle has really caught up yet or realizes that...
Starting point is 00:32:03 Are we on the other side? Are we in the middle? No, I just meant the middle in terms of, like... It's like... I'm very young. No, it's not an age thing. It's not an age thing. It's more like, you tell someone who's very naive and you tell someone who's very experienced, like, I can make magic, here's how I can do it. And then the very naive person is like, that's awesome. And then the very experienced person is like, ah, maybe. Yeah. But the response from almost everybody else is, that's not possible, or, I don't see it. Yeah. A number of companies I know are trying to build that intuition internally now. And often this is the first time in a really long time that I've seen certain founders come back and start coding again. You know, they'll have a multi-hundred-person company and they'll get so
Starting point is 00:32:45 enamored by what's happening that they'll just dive in. A number of them have told me that they feel like some of their team members just don't have that intuition for what this can actually do. And so they don't even know what to build or where to start or how interesting or important it is. And so it's almost like this founder mindset is needed to your point in terms of this is a really interesting new set of capabilities and how to actually learn what these are and then how do I apply them? And I think people lack natural intuition for what this does right now. It's definitely not intuitive. Yeah. I mean, I would say I have some intuition now and people that are, again, on that experience spectrum has some intuition. But it's
Starting point is 00:33:23 not intuitive. There's many kinds of different learners and there's many kinds of different programmers, right? It takes a certain kind of programmer to be comfortable with this idea of like, I don't know what's going to work. Let me try some stuff. It turns out that it's similar attributes to the web or product, product making, being able to test stuff, look at the right metrics, find the right metrics. Yeah, those are, those are useful skills, I think, that are reusable. But the intuition and the tenacity that it takes to, like, you know, this doesn't work at all. What do I do? That's still rare, I think, in that field, because there's so much uncertainty that it's easy to go, like, I tried 10 things and it didn't work. So, like, this is impossible. Like, okay.
Starting point is 00:33:58 I think the natural reaction of somebody looking at something that has 10% performance at the beginning is like, why even bother, we're not going to get there, right? That's a great point. The crazy thing is that these things work at all, right? And not only that, but they scale, and they improve with scale, right? Most things break
Starting point is 00:34:17 at scale. Almost everything breaks at scale. And that's what, you know, that's what you hire people to deal with. You know, every doubling, usually something breaks and you've got to go fix it. It's a pretty crazy thing to think of what kind of emergent properties might still be out there
Starting point is 00:34:32 if we can get these things 10 times bigger. So I'm always in favor of people pushing those limits, you know? Like, it doesn't make sense. Like, even the best people can't explain what's going to happen when you, you know, go bigger. We built the large Hadron Collider for that reason, too, you know? We didn't know. I don't know. We think maybe we'll find some stuff.
Starting point is 00:34:52 These endeavors are valuable. And, like, yeah, I'm grateful to be alive during this time. It's a great note to end on. Thank you for joining us for the podcast. Yeah, thanks so much. Sure. Thank you for listening to this week's episode of No Priors. Follow No Priors for a new guest each week, and let us know online what you think and who in AI you want to hear from.
Starting point is 00:35:12 You can keep in touch with me and Conviction by following @Saranormous. You can follow me on Twitter at @EladGil. Thanks for listening. No Priors is produced in partnership with Pod People. Special thanks to our team, Synthel Galdia and Pranav Reddy, and the production team at Pod People: Alex McManus, Matt Saab, Amy Machado, Ashton Carter, Danielle Roth, Carter Wogan, and Billy Libby. Also, our parents, our children, the Academy, and tyranny.m.L., just your average friendly AGI world government.
