No Priors: Artificial Intelligence | Technology | Startups - How the ARC Prize is democratizing the race to AGI with Mike Knoop from Zapier
Episode Date: June 11, 2024. The first step in achieving AGI is nailing down a concise definition, and Mike Knoop, the co-founder and Head of AI at Zapier, believes François Chollet got it right when he defined general intelligence as a system that can efficiently acquire new skills. This week on No Priors, Mike joins Elad to discuss ARC Prize, a multi-million dollar non-profit public challenge that is looking for someone to beat the Abstraction and Reasoning Corpus (ARC) evaluation. In this episode, they also get into why Mike thinks LLMs will not get us to AGI, how Zapier is incorporating AI into their products and the power of agents, and why it's dangerous to regulate AGI before discovering its full potential. Show Links: About the Abstraction and Reasoning Corpus Zapier Central ARC Prize Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mikeknoop Show Notes: (0:00) Introduction (1:10) Redefining AGI (2:16) Introducing ARC Prize (3:08) Definition of AGI (5:14) LLMs and AGI (8:20) Promising techniques for developing AGI (11:0) Sentience and intelligence (13:51) Prize model vs investing (16:28) Zapier AI innovations (19:08) Economic value of agents (21:48) Open source to achieve AGI (24:20) Regulating AI and AGI
Transcript
Discussion (0)
Hi, listeners, and welcome to No Priors.
Today we're talking with Mike Knoop, the co-founder and head of AI at Zapier.
Mike co-founded the company in 2011 and was an early adopter of the power of AI in the enterprise.
Recently, he's joined forces with François Chollet to launch a competition to accelerate progress towards AGI called the ARC Prize.
Mike, welcome to No Priors.
And maybe you can start off just by telling us a little bit more about what you're up to on the prize side.
That sounds really exciting.
Yeah, thanks for having me. I'm super excited. I've been a No Priors listener since literally episode one. So finally excited to get on and introduce myself. So I'm one of the co-founders of Zapier. I've run and advised all of our AI projects over the last two years or so. And my day job has been, you know, building AI at the application layer for Zapier. But my kind of nights and weekends have been more interested in this, like, AI research and progress. In fact, this kind of curiosity goes all the way back to kind of
my college days pre-Zapier.
I think actually this is one of the reasons why Zapier was so early into some of the AI stuff
was kind of this curiosity in like AGI.
The chain of thought paper that came out in Jan 2022 was what kind of like shook me loose.
I was running half the company actually at that point.
And I gave up my exec team role to go, like, kind of back to being an IC and answer for
myself, like, how close are we to AGI?
And as it turns out, we are not that close.
You know, my belief is that AGI progress
has really stalled out over the last four or five years.
And I think there's a kind of a handful of reasons for that.
I think the biggest one is that the kind of consensus definition of what AGI is,
the definition of it, is wrong.
I think we're measuring the wrong things.
And this leads people to think that we're closer to AGI than we actually are.
This causes, like, AI researchers and kind of generally the world to be over-invested
in exploiting this large language model, like, paradigm and regime,
as opposed to exploring, like, new ideas, which are desperately needed.
And, like, frontier AI research has also basically, like, completely stopped publishing.
You know, the GPT-4 paper had zero technical details. The Gemini paper had zero technical
details on the longer context stuff. And I just wanted to help fix this. I wanted to see
if there was something I could do to reaccelerate. And so yeah, I'm excited to share. We just
launched ARC Prize. It's a million-dollar-plus nonprofit public challenge to beat
François Chollet's ARC-AGI eval, and open source the solution to it, and open source the progress
towards it. ARC-AGI, to the best of my knowledge, is the only true AGI eval that actually exists
in the world and measures an actually good definition, a correct definition, of what AGI is, which
we can talk about. There's an AI lab called Lab 42 out of Switzerland that's been running a small
annual contest over the last four years to try and beat this eval. And state of the art today is
34%. State of the art four years ago when it was first introduced was 20%. So we've made very,
very little marginal progress towards it. And this was pre-LLM and pre-scale, right? So it's like,
it has successfully resisted the advent of scale in LLMs.
ARC-AGI actually looks like an IQ test if you go look at some of the puzzles.
Maybe we can overlay some of the puzzles and show some stuff.
Yeah, could we actually get into that?
I'd love to hear sort of what you view as the consensus definition of AGI today.
What's wrong about it?
And then what do you think is the right way to measure or calibrate against that?
Yeah, the sort of consensus definition that I think is most popular in sort of the AI industry now is that AGI is a system that can do,
you know, the majority of economically useful work that humans can do. I think Vinod gets credit
for coining this one. And, you know, I think it's a useful definition, actually. You know,
look, I spend my day job building application. There is legitimate economic value that is sort of
unlocked by the current regime with language models. However, I don't think it's a good AGI
definition, though. You know, I think it's a good definition of systems that are useful and economically
useful. But, you know, I kind of joke that, like, I think it says more about what many humans
do for work than it does about actual general intelligence.
And François's definition, which is the one that I think is the right one, is this definition
that general intelligence is a system that can effectively, efficiently acquire new skills.
That's it: efficiently acquiring new skills, being able to solve these open-ended problems
with that ability.
And here's sort of the simple, like, maybe argument in this line of thinking: you know,
we've had AI systems over the last 10, 15 years that can now, you know, win at
poker, fold proteins, drive cars, win at chess, and yet I can't take any system that was,
like, trained to beat, you know, poker and go teach it to drive a car. And yet, you know,
that's something that's incredibly easy for you to do, right? I could take you out into
the parking lot and probably teach you to drive a different car, and show you a variant of poker and
teach it to you. Your ability to, like, you know, very efficiently, sample efficiently, energy
efficiently, be able to acquire that new skill and learn it is really what makes you human
and shows the general intelligence ability that you have.
And that's what's missing from pretty much every AI eval that exists today.
And this ARC-AGI eval that François built back in 2018 is an actual measure of it,
formalized.
It's a definition and a measure of it that we can actually test against and see progress
towards.
Yeah.
You mentioned that you feel like LLMs aren't good progress in this direction, but I think one
of the arguments for LLMs, as something that's unlocking so much economic and other
value, is the fact that they're generalizable in different ways that didn't exist before.
And it does open up the aperture in terms of one system that's kind of trained broadly,
but then can do a lot of very specific subtasks.
So could you explain more about why you don't feel the just scalability of LLM sort of leads
in this direction eventually or scalability of some multimodal model?
You know, the sort of claim goes like this.
Effectively, what large language models do today is they are high-dimensional
memorization systems, right?
They are trained on lots of training data.
They're able to find and generalize patterns off of the training data that they're trained on
and then apply those in new contexts.
And memorization is a form of intelligence, I would claim.
But it's not a form of general intelligence, right?
We need something, there's something more that we need in order to be able to go discover
and invent alongside us.
You know, this is the things that I care about, like with AI.
This is why I want to build AI.
I think, like, if we want to pull forward the future and actually have AI systems,
that are able to, you know, discover new branches of physics or pull forward our understanding of the universe, pull forward, like, new therapeutics.
The answers to those don't show up in high-dimensional patterns from our existing training data because, like, the answer is literally unknown, right?
The pattern is unknown, in fact.
You might be able to find some sub-patterns that can apply in like similar reasoning chains, and that's actually how current sort of agent systems work, right?
If the reasoning chain that you need an agent to follow is simple enough, such that the reasoning chain shows up,
in an abstract way in the training data,
it can oftentimes pluck that and apply it.
And it works.
Like, this is how Zapier's AI bots actually work,
because they're able to, like, you know,
see enough sort of small chain reasoning examples
and apply that in a new context.
But for AGI systems that are going to go do,
like, completely new things for us
and solve open-ended problems,
where the sort of reasoning chain doesn't exist in the training data
anywhere, that's where LLMs
are just going to fall flat and be insufficient.
And, you know, at the end of the day,
I'm an empiricist, I think.
I think that's the only thing that really works in AI
is you have to just look at what works and what doesn't.
And just sort of objectively, language models do not work to beat ARC.
And people have tried.
But, I mean, I guess the kind of argument to that is, well, we just need more scale,
and then we need to focus on certain types of reasoning modules or other things
and some notion of memory.
Like, there are basic components that still feel like they're missing.
And maybe that's your point, you know, to some extent.
Scaling language models purely will not get there.
I think there is a, like, transformers maybe, right?
I think transformers might be a potential component of it.
Like, maybe the biggest thing I think transformers have shown
is, like, we now know how to build a really effective, robust perception stack, right, where we can take a deep learning network, show it multimodal data, and come up with, like, numerical representations of that data and do, like, operations over it, right?
And I think that likely is probably a solution path towards true AGI, but the language model version of it, where we're just sort of doing that token prediction and training on data,
like, that system alone is the one I would claim that no amount of scale will fix.
Like, that system, if you just put, you know, double the number of parameters,
10x the number of parameters into it, 10x the amount of data into it, you're never going to get to AGI.
Like, we do need, we need something more.
There's something in addition to that that we need.
Okay.
And then what ideas do you think are missing or what areas do you think people should be exploring further?
I have two thoughts here.
One that's working.
So one of the techniques that has shown some promise on the ARC challenge in past years
has been this technique of program synthesis.
In fact, it's actually been around even
longer than sort of code-gen models have been.
So this idea of like having a computer program
that like searches through program space
of possible programs and assembles them together
in order to do something.
You typically have, like, you know,
an input and an output.
And you're trying to discover a program
that can, like, map your input to your output.
And so it's a very relaxed universal search space, right?
You're not sort of following a back propagation gradient
of like a signal in order to figure out what the program
is you're actually like looping through all possible
programs. And because you're sampling from, like, the full sort of search space there,
it increases the likelihood you actually discover, like, a general-form solution to it. And so that
was what got some of the, like, mid-20% range progress towards ARC; it was in that direction.
And it's just, like, very, very orthogonal to sort of the language model transformer,
like, chain-of-thought stuff. But that's, I think, one very promising technique. And then I think the other one
is figuring out ways that you can have computers
do the architecture discovery itself.
This is not a new field or a new idea.
It's called neural architecture search.
It's been around for a long time,
I think maybe even 10 years now.
It's never really amounted to much.
Interestingly, you know, I think a lot of it
is mostly from the academic side of things.
And in neural architecture search, oftentimes
researchers don't have access to large-scale compute.
So they're, like, using a computer program
effectively to search through possible AI
architectures. And because academic researchers often don't have access to a lot of compute,
they take shortcuts in order to find results that they can publish. And I suspect now over the last
four years, we might now have enough compute that's come online at a cheap enough,
like, kind of cost per FLOP that some of those old neural architecture search methods,
we should revisit and relax the search: basically, try to take the learning from the bitter
lesson of, like, you know, not biasing these searches with human priors and human bias,
and try to relax the search and leverage a lot of the cheap compute that's come online towards
that.
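For readers who want a concrete picture of the program synthesis idea Mike describes above, here is a minimal, hypothetical sketch in Python. The tiny DSL, its three primitives, and the toy task are all assumptions invented for this illustration; they are not part of ARC-AGI or of any actual ARC submission.

```python
# Illustrative sketch only: a brute-force program-synthesis loop over a tiny,
# made-up DSL of grid operations. Primitives and the toy task are assumptions.
from itertools import product

def rotate90(grid):
    # Rotate a grid (list of lists) 90 degrees clockwise.
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    # Mirror a grid left-to-right.
    return [row[::-1] for row in grid]

def recolor_1_to_2(grid):
    # Replace every cell with value 1 by value 2.
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

PRIMITIVES = [rotate90, flip_h, recolor_1_to_2]

def synthesize(train_pairs, max_depth=3):
    """Enumerate compositions of primitives (a relaxed, exhaustive search) and
    return the first program that maps every training input to its output."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def run(grid, prog=program):
                for op in prog:
                    grid = op(grid)
                return grid
            if all(run(inp) == out for inp, out in train_pairs):
                return program  # candidate general-form solution
    return None

# Toy task whose hidden rule is a horizontal flip.
train_pairs = [([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]])]
solution = synthesize(train_pairs)
print([op.__name__ for op in solution])  # ['flip_h']
```

Real program-synthesis entries search far larger program spaces, but the core loop Mike describes is the same: enumerate candidate programs rather than follow a gradient, and test each one against the demonstration pairs.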
When you talk about AGI, you know, I think there are some books, like Blindsight, which
try to differentiate between intelligence and sentience, right?
Self-awareness versus actually being able to intelligently do things.
When you talk about AGI, is there an embedded concept of sentience, or is it
purely intelligence?
I'm not a philosopher.
So, like, I'm probably the worst person to ask about this question.
Look, I want to live in the future.
That's, like, kind of one of the things I've been really excited about. Like, you know, if I can help pull forward the future, I want to. And I think one of the best ways we could pull forward the future is to invent systems that can invent and discover alongside us. And I think in order to do that, we need this general form of intelligence, or a system that can demonstrate this general form of intelligence of being able to efficiently acquire those new skills and help us solve these open-ended problems. So I haven't thought deeply about,
like, okay, well, is that system sentient, conscious?
Yeah, the main reason I ask is more, depending on your viewpoint, that increases or decreases
the relative risk of AI as a threat to humanity.
And so there's sort of the doomer argument that sentience is kind of more of an issue than
maybe just intelligence.
Intelligence, to your point, is, hey, you're harnessing this machine tool to be more
efficient in different ways or help you in different ways, which is kind of your view
on the current state.
And you said, well, let's focus on AGI as something that we want to pull forward, because
the current approach is going to create a bunch of economic value,
but it's not going to create these intelligent things, right?
Or truly intelligent or generally intelligent things.
So that's kind of the basis for the question is the degree to which you view there being
increased risks of pulling this technology forward versus not.
And, you know, how you think about that more generally.
Yeah.
That's a good question.
You know, I think you sort of get close to, you know,
the ultimate alignment problem, which is a philosophical question,
probably more than an engineering question.
I think the only way that you really can approach this stuff is through an empirical lens.
I think you just have to look at what systems can do and make decisions based on that lens.
I think it's incredibly dangerous to try and make predictions about future capabilities,
about where the technology will go and make rules, legislations, laws like prohibiting or enforcing
or requiring certain research directions through a theoretical lens.
It just, like, hasn't empirically worked.
I don't think anyone could sit here today and say that they would have predicted this is where AI would even be five years ago.
So it feels just, like, incredibly short-sighted to say, well, okay, we're going to, like, enforce that the sort of language model regime is going to be the only one that we're going to allow to happen forever.
So I think that's where I end up starting: like, you've got to be empirical about this stuff.
And until I think we have some empirical evidence of what the systems can do, I think it is sort of dangerous, or at least harmful to progress, to try and sort of
limit the research direction or add a lot of overhead to exploring new ideas on that front.
It makes sense. And then, you know, you've now established this ARC Prize, which I think is super
exciting. It's a million-dollar prize towards, you know, an open source model that, you know,
meets certain criteria against your metrics of artificial general intelligence. Why do it as a prize
versus investing in companies, or, you know, taking a more traditional funding of
startups or efforts model versus a prize model?
I think outsiders are needed.
You know, there were 300 teams that actually competed in the, like, small version of the ARC
contest last year in 2023.
And if you go look at all the teams that competed, you know, they're like one or two person
teams.
They are outsiders to the industry.
They're not working in AI startups.
Many of them don't even live in like the Bay Area or Silicon Valley or California.
It's a very globally distributed set of people with new ideas that are working on this stuff.
I am more confident, actually, that, or I guess I would bet that, the solution to ARC probably comes from an outsider.
I think it's probably going to come from somebody who's sort of not indoctrinated in the current way of thinking about language models and scale.
Arguably, like, the solution to ARC doesn't even require that much scale.
You know, the cool thing about the puzzles, the ARC-AGI eval: it's, like, kind of a minimal reproduction of general intelligence.
It fits onto a 2D game board that's, like, at max,
like, 15 by 15 squares big.
Like, it's so small and reproducible.
The data fits into such a small set that it's quite likely, actually,
that the solution can be, like, written in, like, 10,000 lines of code or less.
And it's not going to require these, like, you know, gigantic, you know,
200-billion-parameter models in order to solve it.
And so I think it's within the reach of outsiders.
I think it's within the reach of people that, like, want to tinker on sort of their nights
and weekends. And really the goal of the prize is, like, my hope is that I sort of can encourage,
like, the would-be AI researcher, you know, who has, like, a choice of what they work on on their
nights and weekends, to instead of saying, like, well, maybe I could go build, like, another LLM
startup and maybe sell it, to instead say, ooh, maybe I could go try to beat this ARC-AGI
eval. And if I do it, now not only is there status attached to it, but there's money attached to it,
right? There's, like, I get upside. There's, like, there's an economic incentive
to, like, try and win.
And I'm trying to, like, use the prize as kind of a way to counterbalance some of the, like,
economic, you know, unlock that language models have on the startups and things.
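As an aside for readers, here is a hedged sketch of what an ARC-style task looks like on disk, to make the "small game board, small data set" point above concrete. The grids are invented for illustration; the field names follow the publicly available ARC task format, but check the ARC Prize materials for the authoritative spec.

```python
# Illustrative only: a toy task in roughly the JSON shape the public ARC dataset
# uses, with "train" demonstration pairs and a "test" input to solve.
# The tiny grids below are made up; real tasks use small grids of integers 0-9,
# so an entire task stays tiny.
import json

toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[0, 2], [2, 0]], "output": [[2, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[0, 5], [5, 0]]},  # a solver must produce the output grid
    ],
}

print(json.dumps(toy_task, indent=2))
```

The point Mike is making is that the whole benchmark is this compact: a handful of demonstration pairs per task, which is why a winning solution plausibly fits in a small codebase rather than a giant model.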
You mentioned that ARC in part was inspired by your engagement with AI as part of Zapier and that
strategy there.
Can you tell me a little bit more about what Zapier has built on the AI side and how you all
both got to it early and then how you ended up approaching what to actually focus on?
Because I feel like as people adopt this technology, there's almost like a multi-month phase
of just figuring out what it can even do.
So can you tell me about that journey and, yeah, how that all worked out?
The summer of 2022, both Bryan and I, actually, my co-founder and CTO Bryan, gave up our exec team
roles.
We went all in back to sort of being ICs, no direct reports.
And for about six months, all we did was like, build, like trying to figure out what
was possible.
So we built a version of, you know, chain of thought, tree of thought.
We built a version of ChatGPT, actually,
internally, before it came out.
And I think it gave us some confidence that
we'd, like, to the best of our abilities, fully explored the search space of what a GPT-3, at that point, you know, intelligence-style model could do.
And what it led us to see was probably the big gap was that sort of the models are frozen in time, right?
This was kind of pre-tool use.
And the most obvious thing to do was like, well, Zapier has a lot of tools, right?
We have 6,000 integrations.
Could we hook these language models up to use those tools?
And that's ultimately what led to Zapier being a launch partner for ChatGPT plugins,
which I think is one of the first moments that Zapier, like, kind of became known more
popularly in association with AI stuff.
Is there anything you can share in terms of adoption or metrics or usage by Zapier users
or customers of your AI products?
Yeah, at this point, over 50 million AI tasks have run on the platform
to date, over the last year and a half or so since we started tracking.
So this is, like, you know, think of a Zap, right, where it's like you've got a trigger and a set
of actions, where one of those actions is an AI
step. Dominantly, this is an OpenAI or ChatGPT step where, you know, users are doing content
generation or feature extraction or summarization. Using AI in the middle of a workflow is kind of
the dominant way people are adopting AI today. Over the last couple of months, we've introduced
other products in our ESFA. So we're using AI basically across the entire product. We've launched
a new product called Zapier Central, which is effectively these AI bots that you don't have to
build. You know, the classic way I think most people experience Zapier,
you have to build in the editor, right? You have to, you know, do lots of configuration and
click, click, click in order to get your Zap set up and tuned just the way you want.
And one of the cool things of these new AI bots is you're programming with natural language.
And we're not actually even doing natural language to structure mapping. It is a pure
inference-based engine interpreting the user's instructions of what they want the bot to do
and getting access to all the integrations and authentications that they equip it with.
And so we're seeing it's just, like, an order-of-magnitude easier-to-use
product. Yeah, that's really cool. I guess one potential future direction that really fits well
with what Zapier has provided in the past is sort of the agentic world, or really having some of
these tasks turn more and more into agents, right? You can imagine that you're setting up some
workflow automation or something else and eventually it does things a bit more on its own or
you can be a bit more directive and it just goes and does it for you. How far away from that world
do you think we are? That's happening today. I mean, we have people literally paying for Zapier's AI bots.
There's enough value that's unlocked that people are willing to pay, right?
I think that's been shown.
The way that I think about this is like concentric rings of use cases that get unlocked
as the consistency and reliability of the technology matures.
So today, the sort of consistency and reliability thresholds that we're able to meet,
that users are able to sort of get to, kind of require first adoption in, like, personal
use cases or team-based workflow use cases where the risk is relatively low if something
goes wrong. One interesting thing is, like, there are actual use cases, like bot templates, that
we've built and given to different users where one user takes the exact same template, say one of these
AI bots that can watch for a certain email hitting or landing in your inbox and send a message
to your team in Slack if it qualifies. Let's say, hey, you're looking out for a certain
payment notification email or a refund notification email and you want those like, you know,
routed into a certain channel in Slack. That exact use case might be completely acceptable for
like a startup, right, that maybe has three Slack channels.
and it's like just the founding set or the founding team.
And you take that exact same bot, same template,
same exact thing, and go give it to, you know,
a mid-market company that's got thousands of Slack channels,
partner channels, lots of production things happening.
And they might not be comfortable with that risk, right?
They might want to clamp down the possibility space of what the bot can do
in a tighter way, whereas, you know, the first one says,
hey, like, sure, have the bot just choose which Slack channels,
write the message however it wants.
You know, I kind of want it to just figure it all out.
And as you kind of move up and up the risk chain, you kind of want to install more and more clamps.
So that's been a big part of our product build thesis for AI bots: like, how do we allow end users to provide clamping behavior on what the bot can and can't do, in order to increase the size of the circles of use cases that sort of get unlocked?
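To make the "clamps" idea concrete for readers, here is a purely hypothetical sketch of how an allowlist-style constraint might wrap a bot action. It is not Zapier's actual product or API; every name in it is invented for illustration.

```python
# Hypothetical illustration of "clamping" an AI bot's action space: the bot
# proposes an action, and a user-configured policy decides whether to allow it.
# None of these names correspond to Zapier's real product or API.
from dataclasses import dataclass, field

@dataclass
class BotPolicy:
    # Channels the bot may post to; an empty set means "anything goes"
    # (the low-risk, founding-team-style configuration described above).
    allowed_channels: set = field(default_factory=set)
    require_human_approval: bool = False

def clamp_action(policy: BotPolicy, proposed_channel: str, message: str) -> str:
    # Block actions outside the allowlist, optionally queue the rest for review.
    if policy.allowed_channels and proposed_channel not in policy.allowed_channels:
        return f"blocked: bot may not post to #{proposed_channel}"
    if policy.require_human_approval:
        return f"queued for approval: #{proposed_channel}: {message}"
    return f"sent to #{proposed_channel}: {message}"

# A founding team might run with no clamps; a mid-market team tightens them.
startup_policy = BotPolicy()
midmarket_policy = BotPolicy(allowed_channels={"billing-alerts"},
                             require_human_approval=True)

print(clamp_action(startup_policy, "general", "Refund email received"))
print(clamp_action(midmarket_policy, "prod-incidents", "Refund email received"))
```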
So I think that's probably the march of technology we're going to see. I would expect there are things that we can still do, that we haven't done yet, in terms of making the product and bots more reliable and consistent, that we're
working on right now. And I think there are things where the underlying sort of technology and
models are going to improve as well and increase the reliability and consistency.
And as that goes forward, I think you'll just see more and more, like, the risk level of use cases will
go up. So open source software, as well as just open source ideas, papers, data sets, etc., have really
helped drive multiple areas of science and technology forward. How do you think about open source software
in the context of AI, in particular given some of the regulatory and other movements that have
been happening at both the California level, the national level, etc.
My beliefs here are formed through how much we've stalled out, I think, on AGI progress.
We still need fundamental research breakthroughs.
We still need fundamental new ideas.
And I think the Internet and open source has been one of the world's best inventions in order to generate new ideas.
And so I think if you care about actually discovering AGI in our lifetime, then I think it's sort of incumbent on us to try and promote things that
increase the likelihood that we're generating new ideas and having lots of AI
researcher brains or would be AI researcher brains sort of encountering this stuff and it's not locked
and closed behind, you know, a hiring process at a big lab. And so, you know, I'm very much in
favor of supporting open progress, open research sharing, especially at the like foundational scientific
level because we just need new ideas. And I think the best way to generate those ideas is through
open source and open sharing at this point. I mean, the proof here is, like, literally OpenAI,
right? Like, the sort of genesis of the company came out of a published research result
from Google. And sadly, I don't think that's likely to happen now as a result of kind of a lot
of the commercialization and market incentives causing a lot of frontier publishing
getting closed up, because now these companies sort of, they know the economic value of the
research. So they're kind of playing it closer to the chest. And it's just, like, kind of worrying
or upsetting. It's certainly at least stalling progress. And I'm hoping to play a small part
in trying to counterbalance a bit of that. You raise an interesting point, which is the internet
was basically driven by open protocols because there were a lot of closed proprietary protocols
in terms of how networks function and how machines talk to each other. And then open source,
right, in terms of Linux-based servers and other things that were really the workhorses of the early
internet. And relatedly, there were a lot of attempts to regulate cryptography in the 90s
for adjacent but overlapping reasons to why people are now trying to regulate AI:
well, they say it's a threat, or, you know, malicious actors could do malicious things,
and everything's been fine with cryptography, and it's been net positive for the world to have it in place.
So it's kind of interesting to see some of those analogs or parallels.
Yeah, I mean, I think my sort of underlying beliefs on AI are AI should likely get regulated
through the existing regulatory frameworks that exist.
I don't see a lot of new harm or use cases or damage caused by just the narrow form of AI systems that we have today,
that existing sort of regulatory frameworks or agencies don't have power already to sort of regulate and make decisions over.
That feels smart and the right way to sort of think about that stuff.
Then on the AGI front, I think it's just really, really dangerous to put in prescriptive legislation
ahead of seeing any empirical evidence of what the systems can or cannot do yet.
I would not trade personal, independent freedom for what it would take in order to, like,
prevent AGI from ever getting developed, just personally.
Like, that's kind of my philosophical framework on that.
You know, I'm open to us actually discovering, okay,
here's what the forms of AGI are going to look like,
and what they can and can't do.
And then making decisions about, okay, how do we want to release that?
What is it, you know, how are we going to control that,
making decisions at that point based on what we're seeing.
But I would be very, very strongly against trying to, like,
predict what those things are in a theoretical sense.
I think that just hasn't worked.
Great. Well, thank you so much for covering this wide diversity of topics,
telling us more about ARC. It sounds like a very exciting initiative. And so I'm sure there's more
to come there. And thank you so much for joining us today on No Priors. Thanks for having me.
Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces,
follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode
every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Thank you.