No Priors: Artificial Intelligence | Technology | Startups - How the ARC Prize is democratizing the race to AGI with Mike Knoop from Zapier
Episode Date: June 11, 2024. The first step in achieving AGI is nailing down a concise definition, and Mike Knoop, the co-founder and Head of AI at Zapier, believes François Chollet got it right when he defined general intelligence as a system that can efficiently acquire new skills. This week on No Priors, Mike joins Elad to discuss ARC Prize, a multi-million dollar non-profit public challenge that is looking for someone to beat the Abstraction and Reasoning Corpus (ARC) evaluation. In this episode, they also get into why Mike thinks LLMs will not get us to AGI, how Zapier is incorporating AI into their products and the power of agents, and why it's dangerous to regulate AGI before discovering its full potential. Show Links: About the Abstraction and Reasoning Corpus Zapier Central ARC Prize Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mikeknoop Show Notes: (0:00) Introduction (1:10) Redefining AGI (2:16) Introducing ARC Prize (3:08) Definition of AGI (5:14) LLMs and AGI (8:20) Promising techniques for developing AGI (11:0) Sentience and intelligence (13:51) Prize model vs investing (16:28) Zapier AI innovations (19:08) Economic value of agents (21:48) Open source to achieve AGI (24:20) Regulating AI and AGI
Transcript
Discussion (0)
Hi, listeners, and welcome to No Priors.
Today we're talking with Mike Knoop, the co-founder and head of AI at Zapier.
Mike co-founded the company in 2011 and was an early adopter of the power of AI in the enterprise.
Recently, he's joined forces with François Chollet to launch a competition to accelerate progress towards AGI called the ARC Prize.
Mike, welcome to No Priors.
And maybe you can start off just by telling us a little bit more about what you're up to on the prize side.
That sounds really exciting.
Yeah, thanks for having me. I'm super excited. I've been a No Priors listener since literally episode one. So finally excited to get on and introduce myself. So I'm one of the co-founders of Zapier. I've run and advised all of our AI projects over the last two years or so. And my day job has been, you know, building AI at the application layer for Zapier. But my kind of nights and weekends have been more interested in this, like, AI research and progress. In fact, this kind of curiosity goes all the way back to kind of
my college days pre-Zapier.
I think actually this is one of the reasons why Zapier was so early into some of the AI stuff
was kind of this curiosity in like AGI.
The chain of thought paper that came out in Jan 2022 was what kind of like shook me loose.
I was running half the company actually at that point.
And I gave up my exec team role to go, like, kind of back to being an IC and answer for
myself, like, how close are we to AGI?
And as it turns out, we are not that close.
You know, my belief is that AGI progress
has really stalled out over the last four or five years.
And I think there's a kind of a handful of reasons for that.
I think the biggest one is that the kind of consensus definition of what AGI is,
the definition of it, is wrong.
I think we're measuring the wrong things.
And this leads people to think that we're closer to AGI than we actually are.
This causes, like, AI researchers and kind of generally the world to be over-invested
in exploiting this large language model, like, paradigm and regime,
as opposed to exploring, like, new ideas, which are desperately needed.
And, like, frontier AI research has also basically, like, completely stopped publishing.
You know, the GPT-4 paper had zero technical details. The Gemini paper had zero technical
details on the longer context stuff. And I just wanted to help fix this. I wanted to see
if there was something I could do to reaccelerate. And so yeah, I'm excited to share. We just
launched ARC Prize. It's a million-dollar-plus nonprofit public challenge to beat
François Chollet's ARC-AGI eval, and open source the solution to it, and open source the progress
towards it. ARC-AGI, to the best of my knowledge, is the only true AGI eval that actually exists
in the world and measures an actually good definition, a correct definition, of what AGI is, which
we can talk about. There's an AI lab called Lab 42 out of Switzerland that's been running a small
annual contest over the last four years to try and beat this eval. And state of the art today is
34%. State of the art four years ago when it was first introduced was 20%. So we've made very,
very little marginal progress towards it. And this was pre-LLM and pre-scale, right? So it's like,
it has successfully resisted the advent of scale in LLMs.
ARC-AGI actually looks like an IQ test if you go look at some of the puzzles.
Maybe we can overlay some of the puzzles and show some stuff.
Yeah, could we actually get into that?
I'd love to hear sort of what you view as the consensus definition of AGI today.
What's wrong about it?
And then what do you think is the right way to measure or calibrate against that?
Yeah, the sort of consensus definition that I think is most popular in sort of the AI industry now is that AGI is a system that can do,
you know, the majority of economically useful work that humans can do. I think Vinod gets credit
for coining this one. And, you know, I think it's a useful definition, actually. You know,
look, I spend my day job building application. There is legitimate economic value that is sort of
unlocked by the current regime with language models. However, I don't think it's a good AGI
definition, though. You know, I think it's a good definition of systems that are useful and economically
useful. But, you know, I kind of joke that, like, I think it says more about what many humans
do for work than it does about actual general intelligence.
And François's definition, which is the one that I think is the right one, is this definition
that general intelligence is a system that can effectively, efficiently acquire new skills.
That's it: efficiently acquiring new skills, being able to solve these open-ended problems
with that ability.
And here's sort of the simple, like, maybe argument in this line of thinking: you know,
we've had AI systems over the last 10, 15 years that can now, you know, win at
poker, fold proteins, drive cars, win at chess, and yet I can't take any system that was,
like, trained to beat, you know, poker and go teach it to drive a car. And yet, you know,
that's something that's incredibly easy for you to do, right? I could take you out into
the parking lot and probably teach you to drive a different car, and show you a variant of poker and
teach it to you. Your ability to, like, you know, very efficiently, sample efficiently, energy
efficiently, be able to acquire that new skill and learn it is really what makes you human
and shows the general intelligence ability that you have.
And that's what's missing from pretty much every AI eval that exists today.
And this ARC-AGI eval that François built back in 2018 is an actual measure of it,
formalized.
It's a definition and a measure of it that we can actually test against and see progress
towards.
Yeah.
You mentioned that you feel like LLMs aren't good progress in this direction, but I think one
of the arguments for LLMs, as something that's unlocking so much economic and other
value, is the fact that they're generalizable in different ways that didn't exist before.
And it does open up the aperture in terms of one system that's kind of trained broadly,
but then can do a lot of very specific subtasks.
So could you explain more about why you don't feel the just scalability of LLM sort of leads
in this direction eventually or scalability of some multimodal model?
You know, the sort of claim goes like this.
Effectively, what large language models do today is they are high-dimensional
memorization systems, right?
They are trained on lots of training data.
They're able to find and generalize patterns off of the training data that they're trained on
and then apply those in new contexts.
And memorization is a form of intelligence, I would claim.
But it's not a form of general intelligence, right?
We need something, there's something more that we need in order to be able to go discover
and invent alongside us.
You know, this is the things that I care about, like with AI.
This is why I want to build AI.
I think, like, if we want to pull forward the future and actually have AI systems,
that are able to, you know, discover new branches of physics or pull forward our understanding of the universe, pull forward, like, new therapeutics.
The answers to those don't show up in high-dimensional patterns from our existing training data because, like, the answer is literally unknown, right?
The pattern is unknown, in fact.
You might be able to find some sub-patterns that can apply in like similar reasoning chains, and that's actually how current sort of agent systems work, right?
If the reasoning chain that you need an agent to follow is simple enough, such that the reasoning chain shows up,
in an abstract way in the training data,
it can oftentimes pluck that and apply it.
And it works.
Like, this is how Zapier's AI bots actually work,
because they're able to, like, you know,
see enough sort of small chain reasoning examples
and apply that in a new context.
But for AGI systems that are going to go do,
like, completely new things for us
and solve open-ended problems,
where the sort of reasoning chain doesn't exist in the training data
anywhere, that's where LLMs
are just going to fall flat and be insufficient.
And, you know, at the end of the day,
I'm an empiricist, I think.
I think that's the only thing that really works in AI
is you have to just look at what works and what doesn't.
And just sort of objectively, language models do not work to beat ARC.
And people have tried.
But, I mean, I guess the kind of argument to that is, well, we just need more scale,
and then we need to focus on certain types of reasoning modules or other things
and some notion of memory.
Like, there are basic components that still feel like they're missing.
And maybe that's your point, you know, to some extent.
Scaling language models purely will not get there.
I think there is a, like, transformers maybe, right?
I think transformers might be a potential component of it.
Like, maybe the biggest thing I think transformers have shown
is, like, we now know how to build a really effective, robust perception stack, right, where we can take a deep learning network, show it multimodal data, and come up with, like, numerical representations of that data and do, like, operations over it, right?
And I think that likely is probably a solution path towards true AGI, but the language model version of it, where we're just sort of doing that token prediction and training on data,
like, that system alone is the one I would claim that no amount of scale will fix.
Like, that system, if you just put, you know, double the number of parameters,
10x the number of parameters into it, 10x the amount of data into it, you're never going to get to AGI.
Like, we do need, we need something more.
There's something in addition to that that we need.
Okay.
And then what ideas do you think are missing or what areas do you think people should be exploring further?
I have two thoughts here.
One that's working.
So one of the techniques that has shown some promise on the ARC challenge in past years
has been this technique of program synthesis.
In fact, it's actually been around even
longer than sort of code-gen models have been.
So this idea of like having a computer program
that like searches through program space
of possible programs and assembles them together
in order to do something.
You typically have, like, you know,
an input and an output.
And you're trying to discover a program
that can, like, map your input to your output.
And so it's a very relaxed universal search space, right?
You're not sort of following a back propagation gradient
of like a signal in order to figure out what the program
is you're actually like looping through all possible
programs. And because you're sampling from, like, the full sort of search space there,
it increases the likelihood you actually discover, like, a general-form solution to it. And so that
was what got some of the, like, mid-20% range progress towards ARC; it was in that direction.
And it's just, like, very, very orthogonal to sort of the language model transformer,
like, chain-of-thought stuff. But that's, I think, one very promising technique. And then I think the other one
is figuring out ways that you can have computers
do the architecture discovery itself.
This is not a new field or a new idea.
It's called neural architecture search.
It's been around for a long time,
I think maybe even 10 years now.
It's never really amounted to much.
Interestingly, you know, I think a lot of it
is mostly from the academic side of things.
And in neural architecture search, oftentimes
researchers don't have access to large-scale compute.
So they're, like, using a computer program
effectively to search through possible AI
architectures. And because academic researchers often don't have access to a lot of compute,
they take shortcuts in order to find results that they can publish. And I suspect now over the last
four years, we might now have enough compute that's come online at a cheap enough,
like, kind of cost per FLOP that some of those old neural architecture search methods,
we should revisit and relax the search: basically, try to take the learning from the bitter
lesson of, like, you know, not biasing these searches with human priors and human bias,
and try to relax the search and leverage a lot of the cheap compute that's come online towards
that.
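For readers who want a concrete picture of the program synthesis idea Mike describes above, here is a minimal, hypothetical sketch in Python. The tiny DSL, its three primitives, and the toy task are all assumptions invented for this illustration; they are not part of ARC-AGI or of any actual ARC submission.

```python
# Illustrative sketch only: a brute-force program-synthesis loop over a tiny,
# made-up DSL of grid operations. Primitives and the toy task are assumptions.
from itertools import product

def rotate90(grid):
    # Rotate a grid (list of lists) 90 degrees clockwise.
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    # Mirror a grid left-to-right.
    return [row[::-1] for row in grid]

def recolor_1_to_2(grid):
    # Replace every cell with value 1 by value 2.
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

PRIMITIVES = [rotate90, flip_h, recolor_1_to_2]

def synthesize(train_pairs, max_depth=3):
    """Enumerate compositions of primitives (a relaxed, exhaustive search) and
    return the first program that maps every training input to its output."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def run(grid, prog=program):
                for op in prog:
                    grid = op(grid)
                return grid
            if all(run(inp) == out for inp, out in train_pairs):
                return program  # candidate general-form solution
    return None

# Toy task whose hidden rule is a horizontal flip.
train_pairs = [([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]])]
solution = synthesize(train_pairs)
print([op.__name__ for op in solution])  # ['flip_h']
```

Real program-synthesis entries search far larger program spaces, but the core loop Mike describes is the same: enumerate candidate programs rather than follow a gradient, and test each one against the demonstration pairs.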
When you talk about AGI, you know, I think there are some books, like Blindsight, which
try to differentiate between intelligence and sentience, right?
Self-awareness versus actually being able to intelligently do things.
When you talk about AGI, is there an embedded concept of sentience, or is it
purely intelligence?
I'm not a philosopher.
So, like, I'm probably the worst person to ask about this question.
Look, I want to live in the future.
That's, like, kind of one of the things I've been really excited about. Like, you know, if I can help pull forward the future, I want to. And I think one of the best ways we could pull forward the future is to invent systems that can invent and discover alongside us. And I think in order to do that, we need this general form of intelligence, or a system that can demonstrate this general form of intelligence of being able to efficiently acquire those new skills and help us solve these open-ended problems. So I haven't thought deeply about,
like, okay, well, is that system sentient, conscious?
Yeah, the main reason I ask is more, depending on your viewpoint, that increases or decreases
the relative risk of AI as a threat to humanity.
And so there's sort of the doomer argument that sentience is kind of more of an issue than
maybe just intelligence.
Intelligence, to your point, is, hey, you're harnessing this machine tool to be more
efficient in different ways or help you in different ways, which is kind of your view
on the current state.
And you said, well, let's focus on AGI as something that we want to pull forward, because
the current approach is going to create a bunch of economic value,
but it's not going to create these intelligent things, right?
Or truly intelligent or generally intelligent things.
So that's kind of the basis for the question is the degree to which you view there being
increased risks of pulling this technology forward versus not.
And, you know, how you think about that more generally.
Yeah.
That's a good question.
You know, I think you sort of get close to, you know,
the ultimate alignment problem, which is a philosophical question,
probably more than an engineering question.
I think the only way that you really can approach this stuff is through an empirical lens.
I think you just have to look at what systems can do and make decisions based on that lens.
I think it's incredibly dangerous to try and make predictions about future capabilities,
about where the technology will go and make rules, legislations, laws like prohibiting or enforcing
or requiring certain research directions through a theoretical lens.
It just, like, hasn't empirically worked.
I don't think anyone could sit here today and say that they would have predicted this is where AI would even be five years ago.
So it feels just, like, incredibly short-sighted to say, well, okay, we're going to, like, enforce that the sort of language model regime is going to be the only one that we're going to allow to happen forever.
So I think that's where I end up starting: like, you've got to be empirical about this stuff.
And until I think we have some empirical evidence of what the systems can do, I think it is sort of dangerous, or at least harmful to progress, to try and sort of
limit the research direction or add a lot of overhead to exploring new ideas on that front.
It makes sense. And then, you know, you've now established this ARC Prize, which I think is super
exciting. It's a million-dollar prize towards, you know, an open source model that, you know,
meets certain criteria against your metrics of artificial general intelligence. Why do it as a prize
versus investing in companies, or, you know, taking a more traditional funding of
startups or efforts model versus a prize model?
I think outsiders are needed.
You know, there were 300 teams that actually competed in the, like, small version of the ARC
contest last year in 2023.
And if you go look at all the teams that competed, you know, they're like one or two person
teams.
They are outsiders to the industry.
They're not working in AI startups.
Many of them don't even live in like the Bay Area or Silicon Valley or California.
It's a very globally distributed set of people with new ideas that are working on this stuff.
I am more confident, actually, that, or I guess I would bet that, the solution to ARC probably comes from an outsider.
I think it's probably going to come from somebody who's sort of not indoctrinated in the current way of thinking about language models and scale.
Arguably, like, the solution to ARC doesn't even require that much scale.
You know, the cool thing about the puzzles, the ARC-AGI eval: it's, like, kind of a minimal reproduction of general intelligence.
It fits onto a 2D game board that's, like, at max,
like, 15 by 15 squares big.
Like, it's so small and reproducible.
The data fits into such a small set that it's quite likely, actually,
that the solution can be, like, written in, like, 10,000 lines of code or less.
And it's not going to require these, like, you know, gigantic, you know,
200-billion-parameter models in order to solve it.
And so I think it's within the reach of outsiders.
I think it's within the reach of people that, like, want to tinker on sort of their nights
and weekends. And really the goal of the prize is, like, my hope is that I sort of can encourage,
like, the would-be AI researcher, you know, who has, like, a choice of what they work on on their
nights and weekends, to instead of saying, like, well, maybe I could go build, like, another LLM
startup and maybe sell it, to instead say, ooh, maybe I could go try to beat this ARC-AGI
eval. And if I do it, now not only is there status attached to it, but there's money attached to it,
right? There's, like, I get upside. There's, like, there's an economic incentive
to, like, try and win.
And I'm trying to, like, use the prize as kind of a way to counterbalance some of the, like,
economic, you know, unlock that language models have on the startups and things.
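As an aside for readers, here is a hedged sketch of what an ARC-style task looks like on disk, to make the "small game board, small data set" point above concrete. The grids are invented for illustration; the field names follow the publicly available ARC task format, but check the ARC Prize materials for the authoritative spec.

```python
# Illustrative only: a toy task in roughly the JSON shape the public ARC dataset
# uses, with "train" demonstration pairs and a "test" input to solve.
# The tiny grids below are made up; real tasks use small grids of integers 0-9,
# so an entire task stays tiny.
import json

toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[0, 2], [2, 0]], "output": [[2, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[0, 5], [5, 0]]},  # a solver must produce the output grid
    ],
}

print(json.dumps(toy_task, indent=2))
```

The point Mike is making is that the whole benchmark is this compact: a handful of demonstration pairs per task, which is why a winning solution plausibly fits in a small codebase rather than a giant model.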
You mentioned that ARC in part was inspired by your engagement with AI as part of Zapier and that
strategy there.
Can you tell me a little bit more about what Zapier has built on the AI side and how you all
both got to it early and then how you ended up approaching what to actually focus on?
Because I feel like as people adopt this technology, there's almost like a multi-month phase
of just figuring out what it can even do.
So can you tell me about that journey and, yeah, how that all worked out?
The summer of 2022, both Bryan and I, actually, my co-founder and CTO Bryan, gave up our exec team
roles.
We went all in back to sort of being ICs, no direct reports.
And for about six months, all we did was like, build, like trying to figure out what
was possible.
So we built a version of, you know, chain of thought, tree of thought.
We built a version of ChatGPT, actually,
internally, before it came out.
And I think it gave us some confidence that
we'd, like, to the best of our abilities, fully explored the search space of what a GPT-3, at that point, you know, intelligence-style model could do.
And what it led us to see was probably the big gap was that sort of the models are frozen in time, right?
This was kind of pre-tool use.
And the most obvious thing to do was like, well, Zapier has a lot of tools, right?
We have 6,000 integrations.
Could we hook these language models up to use those tools?
And that's ultimately what led to Zapier being a launch partner for ChatGPT plugins,
which I think is one of the first moments that Zapier, like, kind of became known more
popularly in association with AI stuff.
Is there anything you can share in terms of adoption or metrics or usage by Zapier users
or customers of your AI products?
Yeah, at this point, over 50 million AI tasks have run on the platform
to date, over the last year and a half or so since we started tracking.
So this is, like, you know, think of a Zap, right, where it's like you've got a trigger and a set
of actions, where one of those actions is an AI
step. Dominantly, this is an OpenAI or ChatGPT step where, you know, users are doing content
generation or feature extraction or summarization. Using AI in the middle of a workflow is kind of
the dominant way people are adopting AI today. Over the last couple of months, we've introduced
other products in our ESFA. So we're using AI basically across the entire product. We've launched
a new product called Zapier Central, which is effectively these AI bots that you don't have to
build. You know, the classic way I think most people experience Zapier,
you have to build in the editor, right? You have to, you know, do lots of configuration and
click, click, click in order to get your Zap set up and tuned just the way you want.
And one of the cool things of these new AI bots is you're programming with natural language.
And we're not actually even doing natural language to structure mapping. It is a pure
inference-based engine interpreting the user's instructions of what they want the bot to do
and getting access to all the integrations and authentications that they equip it with.
And so we're seeing it's just, like, an order-of-magnitude easier-to-use
product. Yeah, that's really cool. I guess one potential future direction that really fits well
with what Zapier has provided in the past is sort of the agentic world, or really having some of
these tasks turn more and more into agents, right? You can imagine that you're setting up some
workflow automation or something else and eventually it does things a bit more on its own or
you can be a bit more directive and it just goes and does it for you. How far away from that world
do you think we are? That's happening today. I mean, we have people literally paying for Zapier's AI bots.
There's enough value that's unlocked that people are willing to pay, right?
I think that's been shown.
The way that I think about this is like concentric rings of use cases that get unlocked
as the consistency and reliability of the technology matures.
So today, the sort of consistency and reliability thresholds that we're able to meet,
that users are able to sort of get to, kind of require first adoption in, like, personal
use cases or team-based workflow use cases where the risk is relatively low if something
goes wrong. One interesting thing is, like, there are actual use cases, like bot templates, that
we've built and given to different users where one user takes the exact same template, say one of these
AI bots that can watch for a certain email hitting or landing in your inbox and send a message
to your team in Slack if it qualifies. Let's say, hey, you're looking out for a certain
payment notification email or a refund notification email and you want those like, you know,
routed into a certain channel in Slack. That exact use case might be completely acceptable for
like a startup, right, that maybe has three Slack channels.
and it's like just the founding set or the founding team.
And you take that exact same bot, same template,
same exact thing, and go give it to, you know,
a mid-market company that's got thousands of Slack channels,
partner channels, lots of production things happening.
And they might not be comfortable with that risk, right?
They might want to clamp down the possibility space of what the bot can do
in a tighter way, whereas, you know, the first one says,
hey, like, sure, have the bot just choose which Slack channels,
write the message however it wants.
You know, I kind of want it to just figure it all out.
And as you kind of move up and up the risk chain, you kind of want to install more and more clamps.
So that's been a big part of our product build thesis for AI bots: like, how do we allow end users to provide clamping behavior on what the bot can and can't do, in order to increase the size of the circles of use cases that sort of get unlocked?
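To make the "clamps" idea concrete for readers, here is a purely hypothetical sketch of how an allowlist-style constraint might wrap a bot action. It is not Zapier's actual product or API; every name in it is invented for illustration.

```python
# Hypothetical illustration of "clamping" an AI bot's action space: the bot
# proposes an action, and a user-configured policy decides whether to allow it.
# None of these names correspond to Zapier's real product or API.
from dataclasses import dataclass, field

@dataclass
class BotPolicy:
    # Channels the bot may post to; an empty set means "anything goes"
    # (the low-risk, founding-team-style configuration described above).
    allowed_channels: set = field(default_factory=set)
    require_human_approval: bool = False

def clamp_action(policy: BotPolicy, proposed_channel: str, message: str) -> str:
    # Block actions outside the allowlist, optionally queue the rest for review.
    if policy.allowed_channels and proposed_channel not in policy.allowed_channels:
        return f"blocked: bot may not post to #{proposed_channel}"
    if policy.require_human_approval:
        return f"queued for approval: #{proposed_channel}: {message}"
    return f"sent to #{proposed_channel}: {message}"

# A founding team might run with no clamps; a mid-market team tightens them.
startup_policy = BotPolicy()
midmarket_policy = BotPolicy(allowed_channels={"billing-alerts"},
                             require_human_approval=True)

print(clamp_action(startup_policy, "general", "Refund email received"))
print(clamp_action(midmarket_policy, "prod-incidents", "Refund email received"))
```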
So I think that's probably the march of technology we're going to see. I would expect there are things that we can still do, that we haven't done yet, in terms of making the product and bots more reliable and consistent, that we're
working on right now. And I think there are things where the underlying sort of technology and
models are going to improve as well and increase the reliability and consistency.
And as that goes forward, I think you'll just see more and more, like, the risk level of use cases will
go up. So open source software, as well as just open source ideas, papers, data sets, etc., have really
helped drive multiple areas of science and technology forward. How do you think about open source software
in the context of AI, in particular given some of the regulatory and other movements that have
been happening at both the California level, the national level, etc.
My beliefs here are formed through how much we've stalled out, I think, on AGI progress.
We still need fundamental research breakthroughs.
We still need fundamental new ideas.
And I think the Internet and open source has been one of the world's best inventions in order to generate new ideas.
And so I think if you care about actually discovering AGI in our lifetime, then I think it's sort of incumbent on us to try and promote things that
increase the likelihood that we're generating new ideas and having lots of AI
researcher brains or would be AI researcher brains sort of encountering this stuff and it's not locked
and closed behind, you know, a hiring process at a big lab. And so, you know, I'm very much in
favor of supporting open progress, open research sharing, especially at the like foundational scientific
level because we just need new ideas. And I think the best way to generate those ideas is through
open source and open sharing at this point. I mean, the proof here is, like, literally OpenAI,
right? Like, the sort of genesis of the company came out of a published research result
from Google. And sadly, I don't think that's likely to happen now as a result of kind of a lot
of the commercialization and market incentives causing a lot of frontier publishing
getting closed up, because now these companies sort of, they know the economic value of the
research. So they're kind of playing it closer to the chest. And it's just, like, kind of worrying
or upsetting. It's certainly at least stalling progress. And I'm hoping to play a small part
in trying to counterbalance a bit of that. You raise an interesting point, which is the internet
was basically driven by open protocols because there were a lot of closed proprietary protocols
in terms of how networks function and how machines talk to each other. And then open source,
right, in terms of Linux-based servers and other things that were really the workhorses of the early
internet. And relatedly, there were a lot of attempts to regulate cryptography in the 90s
for adjacent but overlapping reasons to why people are now trying to regulate AI:
well, they say it's a threat, or, you know, malicious actors could do malicious things,
and everything's been fine with cryptography, and it's been net positive for the world to have it in place.
So it's kind of interesting to see some of those analogs or parallels.
Yeah, I mean, I think my sort of underlying beliefs on AI are AI should likely get regulated
through the existing regulatory frameworks that exist.
I don't see a lot of new harm or use cases or damage caused by just the narrow form of AI systems that we have today,
that existing sort of regulatory frameworks or agencies don't have power already to sort of regulate and make decisions over.
That feels smart and the right way to sort of think about that stuff.
Then on the AGI front, I think it's just really, really dangerous to put in prescriptive legislation
ahead of seeing any empirical evidence of what the systems can or cannot do yet.
I would not trade personal, independent freedom for what it would take in order to, like,
prevent AGI from ever getting developed, just personally.
Like, that's kind of my philosophical framework on that.
You know, I'm open to us actually discovering, okay,
here's what the forms of AGI are going to look like,
and what they can and can't do.
And then making decisions about, okay, how do we want to release that?
What is it, you know, how are we going to control that,
making decisions at that point based on what we're seeing.
But I would be very, very strongly against trying to, like,
predict what those things are in a theoretical sense.
I think that just hasn't worked.
Great. Well, thank you so much for covering this wide diversity of topics,
telling us more about ARC. It sounds like a very exciting initiative. And so I'm sure there's more
to come there. And thank you so much for joining us today on No Priors. Thanks for having me.
Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces,
follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode
every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Thank you.