No Priors: Artificial Intelligence | Technology | Startups - Model Plateaus and Enterprise AI Adoption with Cohere's Aidan Gomez
Episode Date: November 21, 2024

In this episode of No Priors, Sarah is joined by Aidan Gomez, co-founder and CEO of Cohere. Aidan reflects on his journey to co-authoring the groundbreaking 2017 paper, "Attention Is All You Need," during his internship, and shares his motivations for building Cohere, which delivers AI-powered language models and solutions for businesses. The discussion explores the current state of enterprise AI adoption and Aidan's advice for companies navigating the build vs. buy decision for AI tools. They also examine the drivers behind the flattening of model improvements and discuss where large language models (LLMs) fall short for predictive tasks. The conversation explores what the market has yet to account for in the rapidly evolving AI ecosystem, as well as Aidan's personal perspectives on AGI: what it might look like and when it could arrive.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @AidanGomez

Show Notes:
0:00 Introduction
0:36 Co-authoring "Attention Is All You Need"
2:27 Leaving Google and founding Cohere
4:04 Cohere's mission and models
6:15 Pitfalls of current AI
8:14 How enterprises are deploying AI today
10:58 Build vs. buy strategy for AI tools
14:37 Barriers to enterprise adoption
20:04 Which types of companies should pretrain models?
24:25 Addressing flaws in open-source models
25:12 Current and expected progress in scaling laws
29:54 Advances in multi-step problem solving and reasoning
32:29 Key drivers behind the flattening curve of model improvements
36:25 Exploring AGI
39:59 Limitations of LLMs
42:10 What the market has mispriced
Transcript
Hi listeners, and welcome to No Priors.
Today we're hanging out with Aidan Gomez, co-founder and CEO of Cohere, a company valued at more than
$5 billion in 2024, which provides AI-powered language models and solutions for businesses.
Aidan founded Cohere in 2019, but before that, during his time as an intern at Google Brain,
he was a co-author on the landmark 2017 paper, Attention is All You Need.
Aidan, thanks for coming on today.
Yeah, thank you for having me.
Excited to be here.
Maybe we can start just a little bit with the personal background.
How do you go from growing up in the woods in Canada to, you know, working on the most important technical paper in the world?
A lot of luck and chance.
But yeah, I happened to go to school at the place where Geoff Hinton taught.
And so obviously Geoff recently won
the Nobel Prize. He's kind of credited with being the godfather of deep learning. At U of
T, the school where I went, he was a legend, and pretty much everyone who was in computer science
studying at the school wanted to get into AI. And so in some sense, I feel like I was raised
into AI. Like as soon as I stepped out of high school, I was steeped in an environment that really
saw the future and wanted to build it.
And then from there, it was a bunch of happy accidents.
So I somehow managed to get an internship with Lukasz Kaiser at Google Brain.
And I found out at the end of that internship, I wasn't supposed to have gotten that internship.
It was supposed to have been for PhD students.
And so they were like throwing a goodbye party for me, the intern.
And Lukasz was like, okay, so Aidan, you're going back.
How many years have you got left in your PhD?
And I was like, oh, I'm going back into third year undergrad.
And he was like, we don't do undergrad internships.
So I think it was a bunch of like really lucky mistakes that led me, led me to that team.
Working on really interesting, important things at Google, what convinced you that you should start Cohere?
Yeah, so I bounced around. Like, when I was working with Lukasz and Noam and the Transformer guys, I was in Mountain View.
And then I went back to U of T, started working with Hinton and my co-founder, Nick, in Toronto,
at Brain there. And then I started my PhD and I went to England. And I was working with Jakob, who's another
Transformer paper author, in Berlin, and collaborating with Jeff.
We had Jakob on the podcast.
Oh, nice. Yeah, yeah. Okay. Fan of the pod. Good. Good. So yeah, I was working with
Jakob in Berlin. And then I was also collaborating remotely with Jeff Dean and Sanjay on Pathways,
which was, like, their, you know, bigger-than-a-supercomputer training program. The idea was like
wiring together supercomputers to create a new, larger unit of compute
that you could train models on.
And at that stage, GPT-2 had just come out, and it was pretty clear the trajectory of the technology.
Like we were on a very interesting path.
And these models that were ostensibly models of the internet, models of the web, were
going to yield some pretty interesting things.
So I called up Nick, I called up Ivan, my co-founders.
And I said, you know, maybe we should figure out how to build these things.
I think they're going to be useful.
For anyone who doesn't know yet, can you just describe at the high level, like, what cohere's mission is and then what the models and products are?
Yeah, so our mission, the way that we want to create value in the world is by enabling other organizations to adopt this technology and make their workforce more productive or transform their product and the services that they offer.
So we're very focused on the enterprise.
We're not going to build a ChatGPT competitor.
What we want to build is a platform and a series of products to enable enterprises to adopt this technology and make it valuable.
And in terms of like your North Star of how you organize the team and invest, you obviously come from a research background yourself.
Like how much do you think, you know, Cohere's success is dependent on core models versus
other, you know, platform and go-to-market support investments you make?
It's all of the above. Like, the models are the foundation. And if you're building on a
foundation that doesn't meet the customer's needs, then there's no hope. And so the models are
crucial, and it's like the heart of the company. But in the enterprise world, things like
customer support, reliability, security, these are all key. And so we've,
heavily invested on both sides. We're not just a modeling organization. We're a modeling and
go-to-market organization. And increasingly, product is becoming a priority for Cohere. And so
figuring out ways to shorten time to value for our customers. Yeah, over the past like 18 months
since the enterprise world sort of woke up to the technology, we've watched
folks build with our models, seeing what they're trying to accomplish, seeing the common
mistakes that they make. That's been helpful. It's been sometimes frustrating, right,
watching the same mistake again and again. But we think there's a huge opportunity to be able
to help enterprises avoid those mistakes and implement things right the first time. And so that's
really where we're pushing towards. Yeah. Can we make that a little bit more real? Like what is the
mistake that frustrates you most and how can product go meet that? Yeah. Well, I think all language
models are quite sensitive to prompts to the way that you present data. They all have their
own individual quirks. The way that you talk to one might not work for the way that you talk to
another. And so when you're building a system like a RAG system where there's an external
database, it really matters how you present the retrieved results to the model. It matters how
the data is actually stored in those databases. The formatting
counts. And these small details are often lost on people. They overestimate the models. They
think they're like humans. And that has led to a lot of repeat failures. People try to implement
a rag system. They don't know about these like idiosyncratic elements of implementing one
properly. And then it fails. And so in 2023, there are a lot of these POCs, a lot of people trying
to get familiar with the technology, wrap their heads around it, and a lot of those POCs fail
because of unfamiliarity, because of, yeah, these common errors that we've seen.
And so moving forward, we have two approaches.
One is making the models more robust.
So the model should be robust to a lot of different ways that you present data.
And the second piece is being more structured about the product that we expose to the user.
So instead of just handing a model and saying, you know, prompt it, good luck.
Actually putting more structure around it.
So creating APIs that more rigorously define how you're supposed to use the model.
These sorts of pieces, I think, just reduce the chances of failure and make these systems much more usable for the user.
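To make the "formatting counts" point concrete, here is a minimal sketch of the kind of structure an API can put around a RAG call instead of handing users a raw prompt. The `Doc`, `retrieve`, and `generate` names are illustrative stand-ins, not Cohere's actual API.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    title: str
    text: str

def build_grounded_prompt(question: str, docs: list[Doc]) -> str:
    """Render retrieved chunks in one fixed, numbered format every time."""
    lines = ["Answer using only the documents below, and cite document ids.", ""]
    for i, d in enumerate(docs, start=1):
        # Consistent delimiters and field order: the "formatting counts" point above.
        lines.append(f"[doc {i}] id={d.doc_id} title={d.title}")
        lines.append(d.text.strip())
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

def answer(question: str, retrieve, generate) -> str:
    """retrieve() and generate() are stand-ins for a vector store and an LLM client."""
    docs = retrieve(question, top_k=4)
    return generate(build_grounded_prompt(question, docs))
```

Pinning the retrieval format down in one place is the kind of rigor a more structured API can enforce so users don't rediscover these idiosyncrasies on their own.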
What are people trying to do?
Can you give us a flavor of some of like the biggest use cases you see in the enterprise?
It's super broad.
So it spans pretty much every vertical.
I mean, the common things are like Q&A, so speaking to a corpus of documents.
For instance, if you're a manufacturing company, you might want to build a Q&A bot for your engineers or your workers who are on the assembly line and plug in all of the manuals of the different tools and diagnostic manuals for common errors and parts and then let the user chat to that.
instead of having to open up a thousand-page book and try to find what they need.
Similarly, Q&A bots for the average enterprise worker.
So plugging in your IT, FAQ, your HR docs, all the things about your company,
and having a centralized chat interface onto the knowledge of your organization
so that they can get their questions answered.
Those are some of the common ones.
Beyond that, there are kind of specific functions that we power.
A good example might be for a health care company, they have these longitudinal health records of patients.
And that consists of every interaction that that patient has with the health care system, from visits to a pharmacy, to the different labs or tests that they're getting, to doctors' visits.
And it can span decades.
And so it's a huge, huge record of someone's medical history.
And typically what happens is that patient will call in and they'll ring up the receptionist
and be like, my knee hurts.
I need an appointment.
And the doctor then needs to kind of comb through the past few entries.
See, has this come up before?
And maybe they missed something that was two years ago because they only have 15 minutes
before an appointment.
But what we can do is we can feed that entire history in alongside the reason they're
coming in. So contextually relevant, right, to what they said they're coming in for and surface
a briefing for the doctor. And so this tends to be one dramatically faster for the doctor to
review, but also often it catches things that a doctor couldn't possibly review before every
patient meeting. They're not going through 20 years of medical history. It's just not possible.
But the model can do that. It can do that in under a second. So those are the sorts of functions
that we're seeing: summarization, Q&A bots. A lot of these, you might think of them as mundane,
but the impact is immense.
We see tons of startups working on problems such as, let's say, enterprise search overall,
specialized applications to, let's say, like, technical support for a particular vertical,
even looking at health records and reasoning against them and retrieving from them.
How do you think about, like, what the end
state, and there's no end state, but what some stable equilibrium state is for how enterprises
consume from, let's say, specialist AI-powered application providers versus custom applications
built in-house with AI platforms and model APIs?
I think it's going to be a hybrid.
I think it's probably, you can imagine like a pyramid where the bottom of that pyramid,
every organization needs this stuff.
And it's like co-pilot, like a generalist
chatbot in the hands of every single employee to answer their questions.
And then as you head up the pyramid, it's more specific to the company itself or the
specific domain or product that they operate in or offer.
And as you push up that pyramid, it's much less likely you're going to find an off-the-shelf
solution to address it.
And so you're going to have to build it yourself.
What we've pushed organizations to do is have a strategy
that encompasses that full pyramid.
Yes, you need the generalist standard stuff.
Maybe there's some industry-specific tools that you can go out and buy.
But then if you're building, don't build those things that you could buy.
Instead, focus on the stuff that no one's going to sell to you.
And that gives you uniquely a competitive advantage.
So we worked with this insurance company and they insure large industrial development projects.
It turns out, I know nothing about this space.
Turns out what they do is there's like an RFP put out by a mine or something, like whatever the project is, for insurance.
And they have actuaries, jump on that RFP, do tons of research about, you know, the land that it's on, the potential risks, et cetera.
And then it's essentially a race to whoever responds first usually gets it.
And so it's a time-based thing.
How quickly can these actuaries put forward a good researched proposal?
And what we built with them was like a research assistant.
So we plugged in all the sources of knowledge that these actuaries go to to do their research via RAG.
And we gave them a chat bot.
And it dramatically sped up their ability to respond to RFPs.
And so it grew their business because they were just winning many more of them.
And so it's tough for, like, you know, we built horizontal technology, and an LLM is kind of like a CPU.
I don't know all the applications of an LLM, right?
It's so broad and really the deep insight or the competitive advantage,
the thing that puts you ahead is listening to the customer
and letting them tell you what would put them ahead.
And so that's a lot of what we've been doing is just being a thought partner
and helping brainstorm these projects and ideas that are strategic to them.
I'd wager that, you know, this company is winning because the vast majority of their competitors haven't been able to move so quickly to adopting, you know, and building, like, this research assistant product that is helping them.
Like, what is the biggest barrier you see to generally enterprise adoption?
I think the big one is trust.
So security is a big one,
in particular in regulated industries like finance, health care. Data is often not in a cloud,
or if it is in a cloud, it can't leave their VPC.
And so it's very lockdown.
It's very sensitive.
And so that's a unique differentiator of Cohere.
The fact that we haven't locked ourselves into one ecosystem and we're flexible to deploy
on-prem if you want us, in VPC,
outside of VPC, literally whatever the customer wants, we're able to touch more data,
even the most sensitive data, and provide something that's more useful. So I would say security
and privacy is probably the biggest one. Beyond that, there's knowledge, right? Like the knowledge
to know how to build these systems. They're new. It's unfamiliar to folks. You know,
the people with the most experience have a few years of experience.
And so that's the other major piece.
That bit, I think it's honestly just a time game.
Like, eventually developers will become more familiar with building with this technology.
But I think it's going to take another two or three years before it really permeates.
Do you think, in like a traditional hype cycle for enterprise technologies, probably for most technologies but in particular enterprise, you know, there's this trough of disillusionment concept where people
get very excited about something and it ends up being harder to apply or more expensive than they thought.
Do we see that in AI?
I'm sure we see some of it for sure.
But I think honestly, like the core technology is still improving at a steady clip and new applications are getting unlocked every few months.
So I don't think we're in that trough of disillusionment yet.
Yeah, it feels like we're super early.
It feels like we're really, really early.
And if you look at the market, this technology just unlocks an entire new set of things that you couldn't build.
You just fundamentally couldn't build them before, and now you can.
And so there's a resurfacing of technology, products, systems that's underway.
Even if we didn't train a single new language model, like, okay, all the data centers blow up.
We can't improve the LLM.
We only have what we have today.
There's a half decade of work to go integrate this into the economy, to build all these things, to build the, you know, insurance RFP response bot, to build the health care record summarizer.
Like there's a half decade of just resurfacing to go do.
So there's a lot of work ahead of us.
I think we're kind of past that point.
There was a question of, oh, is there too much hype?
Is this technology actually going to be useful?
but it's in the hands of 100 million people now,
hundreds of millions of people now.
It's in production.
There's very clear value.
The project is now putting it to work and delivering it to the world.
In this question of like integration into the real world,
some piece of it is, of course, like interfaces and change management
and like figuring out how users are going to understand the model outputs
and guardrails and all of that.
Specifically, when we think about the model and specialization, like, do you have some framework you offer customers or that you use internally around what version of it they should invest in, right?
So we have pre-training, post-training, fine-tuning, retrieval, like in the sort of traditional sense, like prompting, especially as we get longer context.
Like, how do you tell customers to make sense of how to specialize?
It really depends on the application.
Like there's some stuff, for instance, we partnered with Fujitsu, who's like the largest
SI in Japan, to build a Japanese language model.
There's just no way you can do that without intervening on pre-training.
You can't like fine-tune or post-train Japanese into a model effectively.
And so you have to start from scratch.
On the other side, there's more narrow things.
Like if you want to change the tone of the model, or you
want to, I don't know, change how it formats certain things,
I think you can just do fine-tuning.
You can take the end state.
And so there is this gradient.
What we usually recommend to customers is start from the cheapest, easiest thing,
which is fine-tuning, and then work backwards.
And so start with fine-tuning, then go back into post-training, right?
Like SFT, RLHF, then if you need to, and, you know, it's kind of a journey, right?
Like as you're talking about a production system and the constraints are getting higher and higher,
you potentially will need to touch pre-training.
Hopefully not all of pre-training.
Hopefully it's like 10% of pre-training at the very end or maybe 20% of pre-training.
But yeah, that's usually how we think about it.
It's like this journey from the simplest cheapest thing to the most sophisticated but most performant.
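A rough sketch of that escalation, with the ordering taken from the conversation; the function and its flags are illustrative, not Cohere tooling or official guidance.

```python
# "Cheapest lever first": fine-tuning, then post-training, then (partial) pre-training.

def specialization_plan(needs_tone_or_format_changes: bool,
                        needs_deep_domain_behavior: bool,
                        needs_new_language_or_corpus: bool) -> list[str]:
    plan: list[str] = []
    if needs_tone_or_format_changes:
        plan.append("fine-tuning")                      # cheapest: tone, formatting, style
    if needs_deep_domain_behavior:
        plan.append("post-training (SFT, then RLHF)")   # next lever if fine-tuning falls short
    if needs_new_language_or_corpus:
        # e.g. the Fujitsu Japanese model: you can't post-train a new language in,
        # so intervene on pre-training, ideally only a continuation run
        # (roughly the last 10-20% of pre-training, per the discussion above).
        plan.append("continued pre-training")
    return plan or ["prompting / RAG only"]

# Tone tweak only -> ['fine-tuning']; adding a new language escalates to pre-training.
print(specialization_plan(True, False, False))
print(specialization_plan(True, True, True))
```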
Moving along the gradient from the cheapest thing makes sense to me.
the idea that any enterprise customer will invest in pre-training is, I think, a bit more controversial.
I believe some of the lab leaders would say, like, nobody should be touching this.
And it doesn't make any sense for people, from the scale of compute and data, the data curation effort required, and just sort of the talent required to do pre-training in any sort of competitive way.
Like, how would you react to that?
I think if you're a big enterprise and you're
sitting on a ton of data, like hundreds of billions of tokens of data, pre-training is a real
lever that you're able to pull. I think for most SMBs and certainly startups, it makes no sense.
You should not be pre-training a model. But if you're a large enterprise, I think it should
be a serious consideration. The question is how much pre-training? It's not like you have to start
from scratch and do a, you know, $50 million training run, but you can do a $5 million training run.
That's what we've seen succeed, these sort of continuation pre-training efforts.
So, yeah, that's one of the offerings that we have.
But of course, we don't jump straight into that.
You don't need to spend massively if you don't want to.
And usually the enterprise buying cycle or technology adoption cycle is quite slow.
And so you have time to move back into it.
I would say it's totally at the customer's discretion.
But to the folks who say that no one should be pre-training.
No one outside of, let's say, AGI labs should be pre-training.
That's empirically wrong.
Maybe that's a good jumping-off point into just, like, talking a little bit more about what's going on in the technical landscape and also what that
means for Cohere. Like, what is the bar you set internally for Cohere? You said the
models are the foundation. And I believe you've also said, like, there's no market for last
year's models. Like, how do you square that with the capital expense of competition
and the rise of open-source models now? Well, I think you have to spend; there's some, like,
minimum threshold that you need to be spending at in order to build a model that's useful. Things
get cheaper. The compute to train the model gets cheaper. The sources of data, well, in some
directions they get cheaper and others not. With synthetic data, it's gotten dramatically cheaper,
but with expert data, it's getting harder and harder and more expensive. And so what we've seen
is today you can build a model that's as good as GPT4 in all the things that enterprises might
care about for $10 million, $20 million,
like just orders of magnitude less than what was spent to develop that model.
And so if you're willing to wait six months or a year to build the technology,
you can build it at a fraction of what those frontier labs have paid to develop it.
And so that's been a key part of cohere's strategy is we don't need to build that thing first.
What we'll do is we'll figure out how to do it dramatically cheaper and
focus on the parts of it that matter to our customers. So we'll focus on the capabilities
that our customers really depend on. Now, at the same time, we still have to spend, like relative
to a regular startup. We have to pay for a supercomputer. And those things cost hundreds of millions
of dollars a year. So it is capital hungry, but it's not capital inefficient. It's very clear
that we'll be able to build a very profitable business off of what we're building.
So that's the strategy, is don't lead, don't burn, you know, three, five, seven billion dollars a year
to be at the front, be six months behind, and offer something to market to enterprises that
actually fits their needs at a price point that makes sense for them.
Why spend on the supercomputer and the training yourself at all if you have increasingly
the open source options?
Well, you don't.
Not really.
Say more.
So for Llama, yeah, you get like the base model at the end when it's cooled down and it has
zero gradient.
You get the post-trained model at the end when it's cooled down and has zero gradient.
Taking those models and trying to fine-tune them, it's just not as effective as building
it yourself, and you have many fewer levers
to pull than if you actually have access to the data and you can change the data that goes
into that process.
And so we feel that by being vertically integrated and by building these models ourselves,
we just have dramatically more leverage to offer our customers.
Maybe if we go to projections, and we'll hit on a few things that you've mentioned as well,
where are we in scaling laws?
How much capability improvement do you expect over the next few years?
We're pretty far along, I would say.
Like, we're starting to enter into a sort of flat part of the curve.
And we're certainly past the point where if you just interact with a model, you can know how smart it is.
Like the vibe checks, they're losing utility.
And so instead, what you need to do is you need to get experts to measure within very specific domains like physics, math, chemistry, biology.
You need to get experts to actually assess the quality of these models, because the average person can't tell the difference at this stage between generations.
Yes, like, there's still much more to go do, but those gains are going to be felt in very specialized areas and have impacts on more researchy domains.
I think for enterprises and the general sorts of tasks that they want to automate or tools that they want to build,
the technology is already good enough, or close enough, that a little bit of customization
will get them there. So that's sort of the stage that we're at. There is a new unlock
in terms of the category of problems that you can solve, and that's reasoning. So online
reasoning is something that has been missing from these models. They previously
didn't have an internal monologue, right? Like, they didn't really think to themselves. You would just
ask them a question and then expect them to immediately answer that question. They couldn't
reason through it. They couldn't fail, right? Like make a mistake, catch that mistake, fix it,
and try again. And so the fact that we now have reasoning models coming online, of course,
OpenAI was the first to put it into production, but Cohere's been working on it for about a year
now. This category of tech, I think, is really interesting. There's a new set of problems
that you can go solve. And it also changes the
economics. So before, if I had a customer come to me and say, Aidan, I want your model to be better at X or I want a smarter model, I would say, okay, you know, give us six to 12 months. We need to go spin up a new training run, train it for longer, train a bigger model, et cetera, et cetera. That was kind of the only lever we had to pull to improve the performance of our product. Now there's a second lever,
which is you can charge the customer more.
You can say, okay, let's spend twice as many, you know, tokens or let's spend twice as much time at inference time.
And you'll get a smarter model.
So there's a much nicer product experience.
Okay, you want a smarter model?
You can have it today.
You just need to pay this.
And so they have that option.
They don't need to wait six months.
And similarly, for model builders, I don't need to go double the size of my supercomputer to hit a requisite
intelligence threshold, I can just double the amount of inference time compute that my customers
pay for. So I think that's a really interesting structural change in how we can go to market
and what products we can build and what we can offer to the customer. I agree. I think it's
perhaps undervalued in the ecosystem right now, how much more appealing it should be to all
types of customers that you can move from, like, a CapEx model of improvement to a consumption
model of improvement, right? And it's not like, you know, these are apples and oranges
things, but I think you'll see people invest a lot more in, you know, solving problems when
they don't have to pony up for a training run and have this delay, as you described.
Yeah, it hasn't been clocked. Like, people haven't really priced in the impact of inference-time
compute delivering intelligence. There's loads of consequences, even at, like, the chip
layer, right, like what sort of chips you want to build, what you should prioritize for
data center construction. If we have a new avenue, which is inference-time compute,
that doesn't require this densely interconnected supercomputer, it's fine to have nodes.
You can do a lot more locally and less distributed. I think it has loads of impact up and down
this chain. And it's a new paradigm
of what these models can do and how they do it.
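One common way to turn inference-time compute into quality is best-of-n sampling with a majority vote (self-consistency). A minimal sketch, with `generate` as a stand-in LLM call; this illustrates the trade-off being discussed, not how any particular reasoning model works internally.

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(prompt: str,
                           generate: Callable[[str, float], str],
                           n_samples: int = 8,
                           temperature: float = 0.7) -> str:
    """Sample the model n times and majority-vote the answers.

    Doubling n_samples roughly doubles inference cost; on multi-step problems
    accuracy typically improves, which is the "second lever" described above:
    buy more quality at inference time instead of waiting for a bigger model.
    """
    votes = Counter(generate(prompt, temperature) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```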
You were dancing around this, but because your average person doesn't spend that much
time thinking about, like, what is reasoning, right?
Do you have any intuition you can offer people for, like, what are the types of problems
this allows us to tackle better?
Yeah, I think any sort of multi-step problem.
There's some multi-step problems you can just memorize, which is what we've been asking
models to do so far.
like solving a polynomial, right?
Like, really, that should be approached multi-step.
That's how humans solve it.
We don't just get given a polynomial and then, boom.
There's a few that maybe we've memorized, right?
But by and large, you have to work through those problems, break them down,
solve the smaller parts, and then compose it into the overall solution.
And that's what we've been lacking.
We've really lacked.
And we've had stuff like chain of thought, which has enabled
that, but it's sort of like a retrofitting. It's sort of like we train these models to just
memorize input output pairs, and we found a nice little hack to elicit the behavior that mimics
reasoning. I think what's coming now is the next generation of models that is being
built and delivered will have that reasoning capability burnt into it from scratch. And it's
not surprising that it wasn't there to begin with, because we've been trying to
train these models off of the internet.
And the internet is like a set of documents, which are the output of a reasoning process
with the reasoning all hidden.
It's like a human wrote an article and, you know, spent weeks thinking about this thing
and deleting stuff and blah, blah, blah, but then posted the final product.
And that's what you get to see.
Everything else is implicit, hidden, unobservable.
And so it makes a lot of sense why the first generation of large
language models lacked this inner monologue. But now what we're doing is, with human data
and with synthetic data, we're explicitly collecting people's inner thoughts. So we're asking them
to verbalize it and we're transcribing that and we're going to train on that and model that part
of the problem solving process. And so I'm really excited for that. I think right now it's extremely
inefficient and it's quite brittle, similar to the early versions of language models. But over the next
two or three years, it's going to become incredibly robust and unlock just a whole new set of problems.
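For readers who want the chain-of-thought "retrofit" made concrete, here is a minimal sketch contrasting a direct prompt with one that asks the model to work through the polynomial example step by step. The prompts and the `generate` stand-in are illustrative, not any vendor's API.

```python
# Direct answer vs. elicited step-by-step reasoning for a multi-step problem.

QUESTION = "Find the roots of x^2 - 5x + 6 = 0."

DIRECT_PROMPT = f"{QUESTION}\nAnswer:"

COT_PROMPT = (
    f"{QUESTION}\n"
    "Work through this step by step: factor the polynomial, solve each factor, "
    "then state the roots. Show your reasoning before the final answer.\n"
    "Reasoning:"
)

def solve_with_reasoning(generate) -> str:
    # The elicited intermediate steps mimic the hidden reasoning that internet
    # training data leaves out; newer models aim to learn this behavior natively.
    return generate(COT_PROMPT)
```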
What is the basic driver of the slowdown, you know, reaching the flat part of the
curve that you describe with scaling? Is it the cost of, you know, increasingly expert data
collection, as you said, like reasoning traces, which are harder and more expensive than
just taking the data on the internet? Is it the difficulty of having evals for, you know,
increasingly complex problems? Is it just overall cost of compute? Like, why do you think that
flattening is happening? When someone's making an oil painting, they do a backcoat and just
cover the whole canvas, and then they sort of paint in the shapes
of the mountains and the trees.
And as you get more and more detailed,
you're bringing out very fine brush strokes,
there's a lot more of them that you need to make.
Before you could just take a big wedge
and just throw paint across the canvas
and accomplish the thing that you wanted to accomplish.
But as you start to get more and more targeted
or more and more detailed,
in what you're trying to accomplish,
it requires a much finer instrument.
And so that's what we've seen with language models.
We're able to do a lot of the common, simple, easy tasks quite quickly,
but as we've approached much more specific, sensitive domains like science, math,
that's where we've started to see resistance to improvement.
And in some places, we've gotten around that by using synthetic data, like in code and math.
These are places where the answer is very verifiable.
You know when you're right or you're wrong.
And so you can generate tons of synthetic data and just verify whether it's correct or not.
You know it's correct.
Okay, let's train on it.
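A toy sketch of that generate-then-verify loop for a verifiable domain; `propose_answer` is a hypothetical stand-in for a model producing candidate solutions, and the task is deliberately trivial.

```python
import random

def make_problem() -> tuple[str, int]:
    """Toy verifiable task: multiplication with an exact ground truth."""
    a, b = random.randint(2, 99), random.randint(2, 99)
    return f"What is {a} * {b}?", a * b

def build_synthetic_dataset(propose_answer, n: int = 1000) -> list[dict]:
    """propose_answer is a stand-in for a model generating candidate solutions."""
    kept = []
    for _ in range(n):
        question, truth = make_problem()
        candidate = propose_answer(question)
        if candidate == truth:  # the verifier: cheap and exact in math and code
            kept.append({"prompt": question, "completion": str(truth)})
    return kept  # only verified pairs go into the training mix
```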
In other areas that require testing and knowledge in the real world, like in biology, like in chemistry,
there's a bigger bottleneck to creating that sort of data and you have to go to experts
who know the field, who have experienced it for decades, and basically distill their knowledge.
But eventually you run out of experts and you run out of that data and you're at the frontier
of what humans know about X, Y, or Z.
There's just increasing friction to fill in these much finer details of this portrait.
I think that's a fundamental problem. I don't think that there's any shortcuts around
that. At some stage, we're going to have to give these models the ability to run their own
experiments to fill in areas of their knowledge that they're curious about. But I think that's
quite a ways away. And it's going to be tough to scale that. It will take many, many years to do.
We will do it.
We're going to get there 100%.
But for the stuff that I care about today with Cohere,
I think there are many applications for which this technology is ready for production.
And so the primary focus is getting it to production and ensuring that our economy adopts this technology and integrates it as quickly as possible, gets that productivity uplift.
And so while that technical question is super interesting about, you know, why is progress slowing down?
I think it should be kind of obvious, right?
It's like the models are getting so good.
They're running into the thresholds of human knowledge, which is really where they're getting their capability from.
You are so grounded in, you know, getting the capabilities we have, and that will continue to progress even if the curve is flattening, into production.
I think I know this answer, but how much does Cohere think about, like, AGI and takeoff?
And does that matter to you?
Well, AGI means a lot of things to a lot of different people.
I think I believe in us building generally intelligent machines, like completely.
It's like, of course we're going to do that.
But AGI has been conflated.
How soon?
We're already there.
It's not a, you know, it's not a binary.
It's not discrete.
It's continuous, and we're, like, well on our way.
We're pretty far down that road.
There's some definition elsewhere in industry that, like, you can put a break point at,
even if you have this continuous function, you can put a break point in, like, there's
intelligence that replaces, like, an educated adult professional in any digital role.
Your view is there's no really important break point that's happening.
That's sort of, like, an objective checklist
thing. Like, when you've checked all these boxes, then you've got it. I think you can always
find, like, a counter-example. You're like, oh, well, it hasn't actually beaten this one human over
here who's doing this, like, random thing. No, I think it's pretty continuous
and we're like quite far, quite far along. But the AGI that I really don't subscribe to is
the superintelligence takeoff, self-improvement, just leading to the Terminator that exterminates
us all. Or creates abundance, unclear. Yeah, or creates abundance. Right, right. Yeah.
No, I think we'll be the one to create abundance.
We don't need to wait for this God to emerge and do it for us.
Let's go do it with the tech that we're building.
We don't need to depend on that.
We can go do it ourselves.
We will build AGI if what you mean is very useful, generally capable technology that can
do a lot of the stuff that humans can do and flex into a lot of different domains.
If what you mean is, you know, are we going to build God?
No.
What do you think is the driver in that difference of opinion?
I don't know.
I think maybe I'm a little bit more in the weeds of the practical frustrations of the technology,
where it breaks, where it's slow, where we start to see things plateau or slow down.
And perhaps others are more, maybe they're more optimistic.
Maybe they see a curve increasing and they just think it goes on forever.
Like that will just continue arbitrarily, which I disagree with.
I think there's friction points.
Like there is genuinely friction that enters in.
Like, maybe even if, in theory, you know, a neural net is a universal approximator and it can learn anything, to universally
approximate you would need to build a neural net the size of the universe. And so, like, there's
some fundamental barriers to reaching limits that people extrapolate out to that I think will
bound the practically realizable forms of this technology. Are there domains where you just believe
LLMs, as we have them today, are, like, not a good fit for prediction, right?
And so an example might be, like, are we going to get to physics simulation from sequence
to sequence models?
I mean, probably, yeah.
Like, physics is just, like, a series of states and transition probabilities.
So I think it's probably quite well modeled by sequence modeling.
But are there areas where it's poorly suited?
I'm sure.
I'm sure that there are better models for certain things, more efficient models.
Like you can take it, if you zoom into a specific domain, you can take advantage of structure
in that domain to carve off some of the unnecessary generalities of the transformer or of
this category of architectures and get a more efficient model.
That's definitely true when you, when you zoom in.
And it doesn't sound like you think it's, like, at its core, like,
a representation issue or it's just not going to work?
There's irreducible uncertainty in the world.
There's things that you genuinely cannot know.
And, like, building a bigger model will not help you know this genuinely random or unobservable thing.
And so those things we'll never be able to model effectively until we learn how to observe them or, you know.
I think the Transformer and this category of model can do much more than people give it credit for.
It's a very general architecture; many, many things can be phrased as a sequence. And these
models are just sequence models. And so if you can phrase it as a sequence, the transformer can do a
fairly good job at picking up any regularity in it. But I'm certain that there are examples that I'm
just not able to think of right now where sequence modeling is super inefficient. Like you can do it with
sequences. You can phrase a graph as a sequence. But it's just like the wrong model. And you would
pay dramatically less compute if you approached it from a different angle.
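As a small illustration of "phrasing a graph as a sequence," here is one naive serialization; it works, but it pushes onto the model structure that a graph-specific architecture would get for free, which is the inefficiency being described. The helper name is illustrative.

```python
# Flatten an edge list into tokens a sequence model could consume.

def graph_to_sequence(edges: list[tuple[str, str]]) -> str:
    # Sort for a canonical ordering; a plain sequence model must otherwise learn
    # that every permutation of the same edge list describes the same graph.
    return " ".join(f"{u}->{v}" for u, v in sorted(edges))

print(graph_to_sequence([("b", "c"), ("a", "b")]))  # a->b b->c
```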
Okay, one last question for you. So you concluded earlier that scaling compute at inference time, like, oh, people have
noticed, but it's not really priced in, like, how big of a change this is. Is there anything else
you think is not priced in by the market right now that, like, Cohere thinks about?
Yeah, I think there's this idea of, like, commoditization of models. I don't really think
that's true. I don't think that models are actually getting commoditized. I think what you see is
you see price dumping. And so you see people giving it out for free, giving it out at a loss,
giving it at zero margin. And so they see the prices coming down and they assume prices coming down
means commoditization. I think in reality, the state of the world is there's a total technological
refactor that's going on right now. And it'll last the next 10 to 15 years. And it's kind of like
we have to repave every road on the planet. And there's like four or five companies that know
how to make concrete. Okay. And, like, maybe today some of them give their concrete away for free.
But over time, there's a very small number of parties that know how to do this thing and a huge job in front of us.
And there are pressures to drive growth, to show return on investment; it's an unstable present state to be operating at a loss or giving away very expensive technology for free.
So growth pressures of the market will push things in a certain direction.
And, yeah, you know, the price of Haiku 4Xed two weeks ago.
Aidan, this has been super fun.
Thank you so much for doing this with us.
Yeah, my pleasure.
My pleasure.
It was super fun.
Great seeing you.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel
if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.
Thank you.