No Priors: Artificial Intelligence | Technology | Startups - Model Plateaus and Enterprise AI Adoption with Cohere's Aidan Gomez
Episode Date: November 21, 2024

In this episode of No Priors, Sarah is joined by Aidan Gomez, co-founder and CEO of Cohere. Aidan reflects on his journey to co-authoring the groundbreaking 2017 paper, "Attention Is All You Need," during his internship, and shares his motivations for building Cohere, which delivers AI-powered language models and solutions for businesses. The discussion explores the current state of enterprise AI adoption and Aidan's advice for companies navigating the build vs. buy decision for AI tools. They also examine the drivers behind the flattening of model improvements and discuss where large language models (LLMs) fall short for predictive tasks. The conversation explores what the market has yet to account for in the rapidly evolving AI ecosystem, as well as Aidan's personal perspectives on AGI: what it might look like and when it could arrive.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @AidanGomez

Show Notes:
0:00 Introduction
0:36 Co-authoring "Attention Is All You Need"
2:27 Leaving Google and founding Cohere
4:04 Cohere's mission and models
6:15 Pitfalls of current AI
8:14 How enterprises are deploying AI today
10:58 Build vs. buy strategy for AI tools
14:37 Barriers to enterprise adoption
20:04 Which types of companies should pretrain models?
24:25 Addressing flaws in open-source models
25:12 Current and expected progress in scaling laws
29:54 Advances in multi-step problem solving and reasoning
32:29 Key drivers behind the flattening curve of model improvements
36:25 Exploring AGI
39:59 Limitations of LLMs
42:10 What the market has mispriced
Transcript
Hi listeners, and welcome to No Priors.
Today we're hanging out with Aidan Gomez, co-founder and CEO of Cohere, a company valued at more than
$5 billion in 2024, which provides AI-powered language models and solutions for businesses.
Aidan founded Cohere in 2019, but before that, during his time as an intern at Google Brain,
he was a co-author on the landmark 2017 paper, Attention is All You Need.
Aidan, thanks for coming on today.
Yeah, thank you for having me.
Excited to be here.
Maybe we can start just a little bit with the personal background.
How do you go from growing up in the woods in Canada to, you know, working on the most important technical paper in the world?
A lot of luck and chance.
But yeah, I happened to go to school at the place where Geoff Hinton taught.
And so obviously Geoff recently won
the Nobel Prize. He's kind of credited with being the godfather of deep learning. At U of
T, the school where I went, he was a legend, and pretty much everyone who was in computer science
studying at the school wanted to get into AI. And so in some sense, I feel like I was raised
into AI. Like as soon as I stepped out of high school, I was steeped in an environment that really
saw the future and wanted to build it.
And then from there, it was a bunch of happy accidents.
So I somehow managed to get an internship with Lukasz Kaiser at Google Brain.
And I found out at the end of that internship, I wasn't supposed to have gotten that internship.
It was supposed to have been for PhD students.
And so they were like throwing a goodbye party for me, the intern.
And Lukasz was like, okay, so Aidan, you're going back.
How many years have you got left in your PhD?
And I was like, oh, I'm going back into third year undergrad.
And he was like, we don't do undergrad internships.
So I think it was a bunch of like really lucky mistakes that led me, led me to that team.
Working on really interesting, important things at Google, what convinced you that you should start Cohere?
Yeah, so I bounced around. Like, when I was working with Lukasz and Noam and the Transformer guys, I was in Mountain View.
And then I went back to U of T, started working with Hinton and my co-founder, Nick, in Toronto,
at Brain there. And then I started my PhD and I went to England. And I was working with Jakob, who's another
Transformer paper author, in Berlin, and collaborating with Jeff.
We had Jakob on the podcast.
Oh, nice. Yeah, yeah. Okay. Fan of the pod. Good. Good. So yeah, I was working with
Jakob in Berlin. And then I was also collaborating remotely with Jeff Dean and Sanjay on Pathways,
which was, like, their, you know, bigger-than-a-supercomputer training program. The idea was like
wiring together supercomputers to create a new, larger unit of compute
that you could train models on.
And at that stage, GPT-2 had just come out, and it was pretty clear the trajectory of the technology.
Like we were on a very interesting path.
And these models that were ostensibly models of the internet, models of the web, were
going to yield some pretty interesting things.
So I called up Nick, I called up Ivan, my co-founders.
And I said, you know, maybe we should figure out how to build these things.
I think they're going to be useful.
For anyone who doesn't know yet, can you just describe at the high level, like, what cohere's mission is and then what the models and products are?
Yeah, so our mission, the way that we want to create value in the world is by enabling other organizations to adopt this technology and make their workforce more productive or transform their product and the services that they offer.
So we're very focused on the enterprise.
We're not going to build a ChatGPT competitor.
What we want to build is a platform and a series of products to enable enterprises to adopt this technology and make it valuable.
And in terms of like your North Star of how you organize the team and invest, you obviously come from a research background yourself.
Like how much do you think, you know, Cohere's success is dependent on core models versus
other, you know, platform and go-to-market support investments you make?
It's all of the above. Like, the models are the foundation. And if you're building on a
foundation that doesn't meet the customer's needs, then there's no hope. And so the models are
crucial, and it's like the heart of the company. But in the enterprise world, things like
customer support, reliability, security, these are all key. And so we've,
heavily invested on both sides. We're not just a modeling organization. We're a modeling and
go-to-market organization. And increasingly, product is becoming a priority for Cohere. And so
figuring out ways to shorten time to value for our customers. Yeah, over the past like 18 months
since the enterprise world sort of woke up to the technology, we've watched
folks build with our models, seeing what they're trying to accomplish, seeing the common
mistakes that they make. That's been helpful. It's been sometimes frustrating, right,
watching the same mistake again and again. But we think there's a huge opportunity to be able
to help enterprises avoid those mistakes and implement things right the first time. And so that's
really where we're pushing towards. Yeah. Can we make that a little bit more real? Like what is the
mistake that frustrates you most and how can product go meet that? Yeah. Well, I think all language
models are quite sensitive to prompts to the way that you present data. They all have their
own individual quirks. The way that you talk to one might not work for the way that you talk to
another. And so when you're building a system like a RAG system where there's an external
database, it really matters how you present the retrieved results to the model. It matters how
the data is actually stored in those databases. The formatting
counts. And these small details are often lost on people. They overestimate the models. They
think they're like humans. And that has led to a lot of repeat failures. People try to implement
a rag system. They don't know about these like idiosyncratic elements of implementing one
properly. And then it fails. And so in 2023, there are a lot of these POCs, a lot of people trying
to get familiar with the technology, wrap their heads around it, and a lot of those POCs fail
because of unfamiliarity, because of, yeah, these common errors that we've seen.
And so moving forward, we have two approaches.
One is making the models more robust.
So the model should be robust to a lot of different ways that you present data.
And the second piece is being more structured about the product that we expose to the user.
So instead of just handing a model and saying, you know, prompt it, good luck.
Actually putting more structure around it.
So creating APIs that more rigorously define how you're supposed to use the model.
These sorts of pieces, I think, just reduce the chances of failure and make these systems much more usable for the user.
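To make the "formatting counts" point concrete, here is a minimal sketch of the kind of structure an API can put around a RAG call instead of handing users a raw prompt. The `Doc`, `retrieve`, and `generate` names are illustrative stand-ins, not Cohere's actual API.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    title: str
    text: str

def build_grounded_prompt(question: str, docs: list[Doc]) -> str:
    """Render retrieved chunks in one fixed, numbered format every time."""
    lines = ["Answer using only the documents below, and cite document ids.", ""]
    for i, d in enumerate(docs, start=1):
        # Consistent delimiters and field order: the "formatting counts" point above.
        lines.append(f"[doc {i}] id={d.doc_id} title={d.title}")
        lines.append(d.text.strip())
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

def answer(question: str, retrieve, generate) -> str:
    """retrieve() and generate() are stand-ins for a vector store and an LLM client."""
    docs = retrieve(question, top_k=4)
    return generate(build_grounded_prompt(question, docs))
```

Pinning the retrieval format down in one place is the kind of rigor a more structured API can enforce so users don't rediscover these idiosyncrasies on their own.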
What are people trying to do?
Can you give us a flavor of some of like the biggest use cases you see in the enterprise?
It's super broad.
So it spans pretty much every vertical.
I mean, the common things are like Q&A, so speaking to a corpus of documents.
For instance, if you're a manufacturing company, you might want to build a Q&A bot for your engineers or your workers who are on the assembly line and plug in all of the manuals of the different tools and diagnostic manuals for common errors and parts and then let the user chat to that.
instead of having to open up a thousand-page book and try to find what they need.
Similarly, Q&A bots for the average enterprise worker.
So plugging in your IT, FAQ, your HR docs, all the things about your company,
and having a centralized chat interface onto the knowledge of your organization
so that they can get their questions answered.
Those are some of the common ones.
Beyond that, there are kind of specific functions that we power.
A good example might be for a health care company, they have these longitudinal health records of patients.
And that consists of every interaction that that patient has with the health care system, from visits to a pharmacy, to the different labs or tests that they're getting, to doctors' visits.
And it can span decades.
And so it's a huge, huge record of someone's medical history.
And typically what happens is that patient will call in and they'll ring up the receptionist
and be like, my knee hurts.
I need an appointment.
And the doctor then needs to kind of comb through the past few entries.
See, has this come up before?
And maybe they missed something that was two years ago because they only have 15 minutes
before an appointment.
But what we can do is we can feed that entire history in alongside the reason they're
coming in. So contextually relevant, right, to what they said they're coming in for and surface
a briefing for the doctor. And so this tends to be one dramatically faster for the doctor to
review, but also often it catches things that a doctor couldn't possibly review before every
patient meeting. They're not going through 20 years of medical history. It's just not possible.
But the model can do that. It can do that in under a second. So those are the sorts of functions
that we're seeing: summarization, Q&A bots. A lot of these, you might think of them as mundane,
but the impact is immense.
We see tons of startups working on problems such as, let's say, enterprise search overall,
specialized applications to, let's say, like, technical support for a particular vertical,
even looking at health records and reasoning against them and retrieving from them.
How do you think about, like, what the end
state, and there's no end state, but what some stable equilibrium state is for how enterprises
consume from, let's say, specialist AI-powered application providers versus custom applications
built in-house with AI platforms and model APIs?
I think it's going to be a hybrid.
I think it's probably, you can imagine like a pyramid where the bottom of that pyramid,
every organization needs this stuff.
And it's like co-pilot, like a generalist
chatbot in the hands of every single employee to answer their questions.
And then as you head up the pyramid, it's more specific to the company itself or the
specific domain or product that they operate in or offer.
And as you push up that pyramid, it's much less likely you're going to find an off-the-shelf
solution to address it.
And so you're going to have to build it yourself.
What we've pushed organizations to do is have a strategy
that encompasses that full pyramid.
Yes, you need the generalist standard stuff.
Maybe there's some industry-specific tools that you can go out and buy.
But then if you're building, don't build those things that you could buy.
Instead, focus on the stuff that no one's going to sell to you.
And that gives you uniquely a competitive advantage.
So we worked with this insurance company and they insure large industrial development projects.
It turns out, I know nothing about this space.
Turns out what they do is there's like an RFP put out by a mine or something, like whatever the project is, for insurance.
And they have actuaries, jump on that RFP, do tons of research about, you know, the land that it's on, the potential risks, et cetera.
And then it's essentially a race to whoever responds first usually gets it.
And so it's a time-based thing.
How quickly can these actuaries put forward a good researched proposal?
And what we built with them was like a research assistant.
So we plugged in all the sources of knowledge that these actuaries go to to do their research via RAG.
And we gave them a chat bot.
And it dramatically sped up their ability to respond to RFPs.
And so it grew their business because they were just winning many more of them.
And so it's tough for, like, you know, we built horizontal technology, and an LLM is kind of like a CPU.
I don't know all the applications of an LLM, right?
It's so broad and really the deep insight or the competitive advantage,
the thing that puts you ahead is listening to the customer
and letting them tell you what would put them ahead.
And so that's a lot of what we've been doing is just being a thought partner
and helping brainstorm these projects and ideas that are strategic to them.
I'd wager that, you know, this company is winning because the vast majority of their competitors haven't been able to move so quickly to adopting, you know, and building, like, this research assistant product that is helping them.
Like, what is the biggest barrier you see to generally enterprise adoption?
I think the big one is trust.
So security is a big one,
in particular in regulated industries like finance, health care. Data is often not in a cloud,
or if it is in a cloud, it can't leave their VPC.
And so it's very lockdown.
It's very sensitive.
And so that's a unique differentiator of Cohere.
The fact that we haven't locked ourselves into one ecosystem and we're flexible to deploy
on-prem if you want us, in VPC,
outside of VPC, literally whatever the customer wants, we're able to touch more data,
even the most sensitive data, and provide something that's more useful. So I would say security
and privacy is probably the biggest one. Beyond that, there's knowledge, right? Like the knowledge
to know how to build these systems. They're new. It's unfamiliar to folks. You know,
the people with the most experience have a few years of experience.
And so that's the other major piece.
That bit, I think it's honestly just a time game.
Like, eventually developers will become more familiar with building with this technology.
But I think it's going to take another two or three years before it really permeates.
Do you think, in like a traditional hype cycle for enterprise technologies, probably for most technologies but in particular enterprise, you know, there's this trough of disillusionment concept where people
get very excited about something and it ends up being harder to apply or more expensive than they thought.
Do we see that in AI?
I'm sure we see some of it for sure.
But I think honestly, like the core technology is still improving at a steady clip and new applications are getting unlocked every few months.
So I don't think we're in that trough of disillusionment yet.
Yeah, it feels like we're super early.
It feels like we're really, really early.
And if you look at the market, this technology just unlocks an entire new set of things that you couldn't build.
You just fundamentally couldn't build them before, and now you can.
And so there's a resurfacing of technology, products, systems that's underway.
Even if we didn't train a single new language model, like, okay, all the data centers blow up.
We can't improve the LLM.
We only have what we have today.
There's a half decade of work to go integrate this into the economy, to build all these things, to build the, you know, insurance RFP response bot, to build the health care record summarizer.
Like there's a half decade of just resurfacing to go do.
So there's a lot of work ahead of us.
I think we're kind of past that point.
There was a question of, oh, is there too much hype?
Is this technology actually going to be useful?
but it's in the hands of 100 million people now,
hundreds of millions of people now.
It's in production.
There's very clear value.
The project is now putting it to work and delivering it to the world.
In this question of like integration into the real world,
some piece of it is, of course, like interfaces and change management
and like figuring out how users are going to understand the model outputs
and guardrails and all of that.
Specifically, when we think about the model and specialization, like, do you have some framework you offer customers or that you use internally around what version of it they should invest in, right?
So we have pre-training, post-training, fine-tuning, retrieval, like in the sort of traditional sense, like prompting, especially as we get longer context.
Like, how do you tell customers to make sense of how to specialize?
It really depends on the application.
Like there's some stuff, for instance, we partnered with Fujitsu, who's like the largest
SI in Japan, to build a Japanese language model.
There's just no way you can do that without intervening on pre-training.
You can't like fine-tune or post-train Japanese into a model effectively.
And so you have to start from scratch.
On the other side, there's more narrow things.
Like if you want to change the tone of the model, or you
want to, I don't know, change how it formats certain things,
I think you can just do fine-tuning.
You can take the end state.
And so there is this gradient.
What we usually recommend to customers is start from the cheapest, easiest thing,
which is fine-tuning, and then work backwards.
And so start with fine-tuning, then go back into post-training, right?
Like SFT, RLHF, then if you need to, and, you know, it's kind of a journey, right?
Like as you're talking about a production system and the constraints are getting higher and higher,
you potentially will need to touch pre-training.
Hopefully not all of pre-training.
Hopefully it's like 10% of pre-training at the very end or maybe 20% of pre-training.
But yeah, that's usually how we think about it.
It's like this journey from the simplest cheapest thing to the most sophisticated but most performant.
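A rough sketch of that escalation, with the ordering taken from the conversation; the function and its flags are illustrative, not Cohere tooling or official guidance.

```python
# "Cheapest lever first": fine-tuning, then post-training, then (partial) pre-training.

def specialization_plan(needs_tone_or_format_changes: bool,
                        needs_deep_domain_behavior: bool,
                        needs_new_language_or_corpus: bool) -> list[str]:
    plan: list[str] = []
    if needs_tone_or_format_changes:
        plan.append("fine-tuning")                      # cheapest: tone, formatting, style
    if needs_deep_domain_behavior:
        plan.append("post-training (SFT, then RLHF)")   # next lever if fine-tuning falls short
    if needs_new_language_or_corpus:
        # e.g. the Fujitsu Japanese model: you can't post-train a new language in,
        # so intervene on pre-training, ideally only a continuation run
        # (roughly the last 10-20% of pre-training, per the discussion above).
        plan.append("continued pre-training")
    return plan or ["prompting / RAG only"]

# Tone tweak only -> ['fine-tuning']; adding a new language escalates to pre-training.
print(specialization_plan(True, False, False))
print(specialization_plan(True, True, True))
```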
Moving along the gradient from the cheapest thing makes sense to me.
the idea that any enterprise customer will invest in pre-training is, I think, a bit more controversial.
I believe some of the lab leaders would say, like, nobody should be touching this.
And it doesn't make any sense for people, from the scale of compute and data, the data curation effort required, and just sort of the talent required to do pre-training in any sort of competitive way.
Like, how would you react to that?
I think if you're a big enterprise and you're
sitting on a ton of data, like hundreds of billions of tokens of data, pre-training is a real
lever that you're able to pull. I think for most SMBs and certainly startups, it makes no sense.
You should not be pre-training a model. But if you're a large enterprise, I think it should
be a serious consideration. The question is how much pre-training? It's not like you have to start
from scratch and do a, you know, $50 million training run, but you can do a $5 million training run.
That's what we've seen succeed, these sort of continuation pre-training efforts.
So, yeah, that's one of the offerings that we have.
But of course, we don't jump straight into that.
You don't need to spend massively if you don't want to.
And usually the enterprise buying cycle or technology adoption cycle is quite slow.
And so you have time to move back into it.
I would say it's totally at the customer's discretion.
But to the folks who say that no one should be pre-training.
No one outside of, let's say, AGI labs should be pre-training.
That's empirically wrong.
Maybe that's a good jumping-off point into just, like, talking a little bit more about what's going on in the technical landscape and also what that
means for Cohere. Like, what is the bar you set internally for Cohere? You said the
models are the foundation. And I believe you've also said, like, there's no market for last
year's models. Like, how do you square that with the capital expense of competition
and the rise of open-source models now? Well, I think you have to spend; there's some, like,
minimum threshold that you need to be spending at in order to build a model that's useful. Things
get cheaper. The compute to train the model gets cheaper. The sources of data, well, in some
directions they get cheaper and others not. With synthetic data, it's gotten dramatically cheaper,
but with expert data, it's getting harder and harder and more expensive. And so what we've seen
is today you can build a model that's as good as GPT4 in all the things that enterprises might
care about for $10 million, $20 million,
like just orders of magnitude less than what was spent to develop that model.
And so if you're willing to wait six months or a year to build the technology,
you can build it at a fraction of what those frontier labs have paid to develop it.
And so that's been a key part of cohere's strategy is we don't need to build that thing first.
What we'll do is we'll figure out how to do it dramatically cheaper and
focus on the parts of it that matter to our customers. So we'll focus on the capabilities
that our customers really depend on. Now, at the same time, we still have to spend, like relative
to a regular startup. We have to pay for a supercomputer. And those things cost hundreds of millions
of dollars a year. So it is capital hungry, but it's not capital inefficient. It's very clear
that we'll be able to build a very profitable business off of what we're building.
So that's the strategy, is don't lead, don't burn, you know, three, five, seven billion dollars a year
to be at the front, be six months behind, and offer something to market to enterprises that
actually fits their needs at a price point that makes sense for them.
Why spend on the supercomputer and the training yourself at all if you have increasingly
the open source options?
Well, you don't.
Not really.
Say more.
So for Llama, yeah, you get like the base model at the end when it's cooled down and it has
zero gradient.
You get the post-trained model at the end when it's cooled down and has zero gradient.
Taking those models and trying to fine-tune them, it's just not as effective as building
it yourself, and you have many fewer levers
to pull than if you actually have access to the data and you can change the data that goes
into that process.
And so we feel that by being vertically integrated and by building these models ourselves,
we just have dramatically more leverage to offer our customers.
Maybe if we go to projections, and we'll hit on a few things that you've mentioned as well,
where are we in scaling laws?
How much capability improvement do you expect over the next few years?
We're pretty far along, I would say.
Like, we're starting to enter into a sort of flat part of the curve.
And we're certainly past the point where if you just interact with a model, you can know how smart it is.
Like the vibe checks, they're losing utility.
And so instead, what you need to do is you need to get experts to measure within very specific domains like physics, math, chemistry, biology.
You need to get experts to actually assess the quality of these models, because the average person can't tell the difference at this stage between generations.
Yes, like, there's still much more to go do, but those gains are going to be felt in very specialized areas and have impacts on more researchy domains.
I think for enterprises and the general sorts of tasks that they want to automate or tools that they want to build,
the technology is already good enough, or close enough, that a little bit of customization
will get them there. So that's sort of the stage that we're at. There is a new unlock
in terms of the category of problems that you can solve, and that's reasoning. So online
reasoning is something that has been missing from these models. They previously
didn't have an internal monologue, right? Like, they didn't really think to themselves. You would just
ask them a question and then expect them to immediately answer that question. They couldn't
reason through it. They couldn't fail, right? Like make a mistake, catch that mistake, fix it,
and try again. And so the fact that we now have reasoning models coming online, of course,
OpenAI was the first to put it into production, but Cohere's been working on it for about a year
now. This category of tech, I think, is really interesting. There's a new set of problems
that you can go solve. And it also changes the
economics. So before, if I had a customer come to me and say, Aidan, I want your model to be better at X or I want a smarter model, I would say, okay, you know, give us six to 12 months. We need to go spin up a new training run, train it for longer, train a bigger model, et cetera, et cetera. That was kind of the only lever we had to pull to improve the performance of our product. Now there's a second lever,
which is you can charge the customer more.
You can say, okay, let's spend twice as many, you know, tokens or let's spend twice as much time at inference time.
And you'll get a smarter model.
So there's a much nicer product experience.
Okay, you want a smarter model?
You can have it today.
You just need to pay this.
And so they have that option.
They don't need to wait six months.
And similarly, for model builders, I don't need to go double the size of my supercomputer to hit a requisite
intelligence threshold, I can just double the amount of inference time compute that my customers
pay for. So I think that's a really interesting structural change in how we can go to market
and what products we can build and what we can offer to the customer. I agree. I think it's
perhaps undervalued in the ecosystem right now, how much more appealing it should be to all
types of customers that you can move from, like, a CapEx model of improvement to a consumption
model of improvement, right? And it's not like, you know, these are apples and oranges
things, but I think you'll see people invest a lot more in, you know, solving problems when
they don't have to pony up for a training run and have this delay, as you described.
Yeah, it hasn't been clocked. Like, people haven't really priced in the impact of inference-time
compute delivering intelligence. There's loads of consequences, even at, like, the chip
layer, right, like what sort of chips you want to build, what you should prioritize for
data center construction. If we have a new avenue, which is inference-time compute,
that doesn't require this densely interconnected supercomputer, it's fine to have nodes.
You can do a lot more locally and less distributed. I think it has loads of impact up and down
this chain. And it's a new paradigm
of what these models can do and how they do it.
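One common way to turn inference-time compute into quality is best-of-n sampling with a majority vote (self-consistency). A minimal sketch, with `generate` as a stand-in LLM call; this illustrates the trade-off being discussed, not how any particular reasoning model works internally.

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(prompt: str,
                           generate: Callable[[str, float], str],
                           n_samples: int = 8,
                           temperature: float = 0.7) -> str:
    """Sample the model n times and majority-vote the answers.

    Doubling n_samples roughly doubles inference cost; on multi-step problems
    accuracy typically improves, which is the "second lever" described above:
    buy more quality at inference time instead of waiting for a bigger model.
    """
    votes = Counter(generate(prompt, temperature) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```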
You were dancing around this, but because your average person doesn't spend that much
time thinking about, like, what is reasoning, right?
Do you have any intuition you can offer people for, like, what are the types of problems
this allows us to tackle better?
Yeah, I think any sort of multi-step problem.
There's some multi-step problems you can just memorize, which is what we've been asking
models to do so far.
like solving a polynomial, right?
Like, really, that should be approached multi-step.
That's how humans solve it.
We don't just get given a polynomial and then, boom.
There's a few that maybe we've memorized, right?
But by and large, you have to work through those problems, break them down,
solve the smaller parts, and then compose it into the overall solution.
And that's what we've been lacking.
We've really lacked.
And we've had stuff like chain of thought, which has enabled
that, but it's sort of like a retrofitting. It's sort of like we train these models to just
memorize input output pairs, and we found a nice little hack to elicit the behavior that mimics
reasoning. I think what's coming now is the next generation of models that is being
built and delivered will have that reasoning capability burnt into it from scratch. And it's
not surprising that it wasn't there to begin with, because we've been trying to
train these models off of the internet.
And the internet is like a set of documents, which are the output of a reasoning process
with the reasoning all hidden.
It's like a human wrote an article and, you know, spent weeks thinking about this thing
and deleting stuff and blah, blah, blah, but then posted the final product.
And that's what you get to see.
Everything else is implicit, hidden, unobservable.
And so it makes a lot of sense why the first generation of large
language models lacked this inner monologue. But now what we're doing is, with human data
and with synthetic data, we're explicitly collecting people's inner thoughts. So we're asking them
to verbalize it and we're transcribing that and we're going to train on that and model that part
of the problem solving process. And so I'm really excited for that. I think right now it's extremely
inefficient and it's quite brittle, similar to the early versions of language models. But over the next
two or three years, it's going to become incredibly robust and unlock just a whole new set of problems.
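For readers who want the chain-of-thought "retrofit" made concrete, here is a minimal sketch contrasting a direct prompt with one that asks the model to work through the polynomial example step by step. The prompts and the `generate` stand-in are illustrative, not any vendor's API.

```python
# Direct answer vs. elicited step-by-step reasoning for a multi-step problem.

QUESTION = "Find the roots of x^2 - 5x + 6 = 0."

DIRECT_PROMPT = f"{QUESTION}\nAnswer:"

COT_PROMPT = (
    f"{QUESTION}\n"
    "Work through this step by step: factor the polynomial, solve each factor, "
    "then state the roots. Show your reasoning before the final answer.\n"
    "Reasoning:"
)

def solve_with_reasoning(generate) -> str:
    # The elicited intermediate steps mimic the hidden reasoning that internet
    # training data leaves out; newer models aim to learn this behavior natively.
    return generate(COT_PROMPT)
```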
What is the basic driver of the slowdown, you know, reaching the flat part of the
curve that you describe with scaling? Is it the cost of, you know, increasingly expert data
collection, as you said, like reasoning traces, which are harder and more expensive than
just taking the data on the internet? Is it the difficulty of having evals for, you know,
increasingly complex problems? Is it just overall cost of compute? Like, why do you think that
flattening is happening? When someone's making an oil painting, they do a backcoat and just
cover the whole canvas, and then they sort of paint in the shapes
of the mountains and the trees.
And as you get more and more detailed,
you're bringing out very fine brush strokes,
there's a lot more of them that you need to make.
Before you could just take a big wedge
and just throw paint across the canvas
and accomplish the thing that you wanted to accomplish.
But as you start to get more and more targeted
or more and more detailed,
in what you're trying to accomplish,
it requires a much finer instrument.
And so that's what we've seen with language models.
We're able to do a lot of the common, simple, easy tasks quite quickly,
but as we've approached much more specific, sensitive domains like science, math,
that's where we've started to see resistance to improvement.
And in some places, we've gotten around that by using synthetic data, like in code and math.
These are places where the answer is very verifiable.
You know when you're right or you're wrong.
And so you can generate tons of synthetic data and just verify whether it's correct or not.
You know it's correct.
Okay, let's train on it.
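A toy sketch of that generate-then-verify loop for a verifiable domain; `propose_answer` is a hypothetical stand-in for a model producing candidate solutions, and the task is deliberately trivial.

```python
import random

def make_problem() -> tuple[str, int]:
    """Toy verifiable task: multiplication with an exact ground truth."""
    a, b = random.randint(2, 99), random.randint(2, 99)
    return f"What is {a} * {b}?", a * b

def build_synthetic_dataset(propose_answer, n: int = 1000) -> list[dict]:
    """propose_answer is a stand-in for a model generating candidate solutions."""
    kept = []
    for _ in range(n):
        question, truth = make_problem()
        candidate = propose_answer(question)
        if candidate == truth:  # the verifier: cheap and exact in math and code
            kept.append({"prompt": question, "completion": str(truth)})
    return kept  # only verified pairs go into the training mix
```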
In other areas that require testing and knowledge in the real world, like in biology, like in chemistry,
there's a bigger bottleneck to creating that sort of data and you have to go to experts
who know the field, who have experienced it for decades, and basically distill their knowledge.
But eventually you run out of experts and you run out of that data and you're at the frontier
of what humans know about X, Y, or Z.
There's just increasing friction to fill in these much finer details of this portrait.
I think that's a fundamental problem. I don't think that there's any shortcuts around
that. At some stage, we're going to have to give these models the ability to run their own
experiments to fill in areas of their knowledge that they're curious about. But I think that's
quite a ways away. And it's going to be tough to scale that. It will take many, many years to do.
We will do it.
We're going to get there 100%.
But for the stuff that I care about today with Cohere,
I think there are many applications for which this technology is ready for production.
And so the primary focus is getting it to production and ensuring that our economy adopts this technology and integrates it as quickly as possible, gets that productivity uplift.
And so while that technical question is super interesting about, you know, why is progress slowing down?
I think it should be kind of obvious, right?
It's like the models are getting so good.
They're running into the thresholds of human knowledge, which is really where they're getting their capability from.
You are so grounded in, you know, getting the capabilities we have, and that will continue to progress even if the curve is flattening, into production.
I think I know this answer, but how much does Cohere think about, like, AGI and takeoff?
And does that matter to you?
Well, AGI means a lot of things to a lot of different people.
I think I believe in us building generally intelligent machines, like completely.
It's like, of course we're going to do that.
But AGI has been conflated.
How soon?
We're already there.
It's not a, you know, it's not a binary.
It's not discrete.
It's continuous, and we're, like, well on our way.
We're pretty far down that road.
There's some definition elsewhere in industry that, like, you can put a break point at,
even if you have this continuous function, you can put a break point in, like, there's
intelligence that replaces, like, an educated adult professional in any digital role.
Your view is there's no really important break point that's happening.
That's sort of, like, an objective checklist
thing. Like, when you've checked all these boxes, then you've got it. I think you can always
find, like, a counter-example. You're like, oh, well, it hasn't actually beaten this one human over
here who's doing this, like, random thing. No, I think it's pretty continuous
and we're like quite far, quite far along. But the AGI that I really don't subscribe to is
the superintelligence takeoff, self-improvement, just leading to the Terminator that exterminates
us all. Or creates abundance, unclear. Yeah, or creates abundance. Right, right. Yeah.
No, I think we'll be the one to create abundance.
We don't need to wait for this God to emerge and do it for us.
Let's go do it with the tech that we're building.
We don't need to depend on that.
We can go do it ourselves.
We will build AGI if what you mean is very useful, generally capable technology that can
do a lot of the stuff that humans can do and flex into a lot of different domains.
If what you mean is, you know, are we going to build God?
No.
What do you think is the driver in that difference of opinion?
I don't know.
I think maybe I'm a little bit more in the weeds of the practical frustrations of the technology,
where it breaks, where it's slow, where we start to see things plateau or slow down.
And perhaps others are more, maybe they're more optimistic.
Maybe they see a curve increasing and they just think it goes on forever.
Like that will just continue arbitrarily, which I disagree with.
I think there's friction points.
Like there is genuinely friction that enters in.
Like, maybe even if, in theory, you know, a neural net is a universal approximator and it can learn anything, to universally
approximate you would need to build a neural net the size of the universe. And so, like, there's
some fundamental barriers to reaching limits that people extrapolate out to that I think will
bound the practically realizable forms of this technology. Are there domains where you just believe
LLMs, as we have them today, are, like, not a good fit for prediction, right?
And so an example might be, like, are we going to get to physics simulation from sequence
to sequence models?
I mean, probably, yeah.
Like, physics is just, like, a series of states and transition probabilities.
So I think it's probably quite well modeled by sequence modeling.
But are there areas where it's poorly suited?
I'm sure.
I'm sure that there are better models for certain things, more efficient models.
Like you can take it, if you zoom into a specific domain, you can take advantage of structure
in that domain to carve off some of the unnecessary generalities of the transformer or of
this category of architectures and get a more efficient model.
That's definitely true when you, when you zoom in.
And it doesn't sound like you think it's, like, at its core, like,
a representation issue or it's just not going to work?
There's irreducible uncertainty in the world.
There's things that you genuinely cannot know.
And, like, building a bigger model will not help you know this genuinely random or unobservable thing.
And so those things we'll never be able to model effectively until we learn how to observe them or, you know.
I think the Transformer and this category of model can do much more than people give it credit for.
It's a very general architecture; many, many things can be phrased as a sequence. And these
models are just sequence models. And so if you can phrase it as a sequence, the transformer can do a
fairly good job at picking up any regularity in it. But I'm certain that there are examples that I'm
just not able to think of right now where sequence modeling is super inefficient. Like you can do it with
sequences. You can phrase a graph as a sequence. But it's just like the wrong model. And you would
pay dramatically less compute if you approached it from a different angle.
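As a small illustration of "phrasing a graph as a sequence," here is one naive serialization; it works, but it pushes onto the model structure that a graph-specific architecture would get for free, which is the inefficiency being described. The helper name is illustrative.

```python
# Flatten an edge list into tokens a sequence model could consume.

def graph_to_sequence(edges: list[tuple[str, str]]) -> str:
    # Sort for a canonical ordering; a plain sequence model must otherwise learn
    # that every permutation of the same edge list describes the same graph.
    return " ".join(f"{u}->{v}" for u, v in sorted(edges))

print(graph_to_sequence([("b", "c"), ("a", "b")]))  # a->b b->c
```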
Okay, one last question for you. So you concluded earlier that scaling compute at inference time, like, oh, people have
noticed, but it's not really priced in, like, how big of a change this is. Is there anything else
you think is not priced in by the market right now that, like, Cohere thinks about?
Yeah, I think there's this idea of, like, commoditization of models. I don't really think
that's true. I don't think that models are actually getting commoditized. I think what you see is
you see price dumping. And so you see people giving it out for free, giving it out at a loss,
giving it at zero margin. And so they see the prices coming down and they assume prices coming down
means commoditization. I think in reality, the state of the world is there's a total technological
refactor that's going on right now. And it'll last the next 10 to 15 years. And it's kind of like
we have to repave every road on the planet. And there's like four or five companies that know
how to make concrete. Okay. And, like, maybe today some of them give their concrete away for free.
But over time, there's a very small number of parties that know how to do this thing and a huge job in front of us.
And there are pressures to drive growth, to show return on investment; it's an unstable present state to be operating at a loss or giving away very expensive technology for free.
So growth pressures of the market will push things in a certain direction.
And, yeah, you know, the price of Haiku 4Xed two weeks ago.
Aidan, this has been super fun.
Thank you so much for doing this with us.
Yeah, my pleasure.
My pleasure.
It was super fun.
Great seeing you.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel
if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.
Thank you.