Big Technology Podcast - Anthropic Product Head: AI Model Development Is Accelerating — With Mike Krieger

Episode Date: October 8, 2025

Mike Krieger is the chief product officer at Anthropic and co-founder of Instagram. Krieger joins Big Technology Podcast to discuss Anthropic's Sonnet 4.5 launch and how the company's been able to speed up AI model development. Tune in to hear how Anthropic is using internal tools to move fast, what the next generations of model improvements will look like, and whether model orchestration will be the core differentiator between labs. We also cover how AI development compares to social media, whether AI content will ever take off, and enterprise AI's path ahead. --- Want a discount for Big Technology on Substack + Discord? Here's 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 Anthropic product head Mike Krieger joins us to talk about how AI model development is accelerating and what we should look out for as things continue to move faster. That's coming up right after this. Capital One's tech team isn't just talking about multi-agentic AI. They already deployed one. It's called Chat Concierge, and it's simplifying car shopping. Using self-reflection and layered reasoning with live API checks, it doesn't just help buyers find a car they love.
Starting point is 00:00:30 It helps schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced, intuitive, and deployed. That's how they stack. That's technology at Capital One. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. Well, Anthropic has a new model out, Sonnet 4.5, just months after the series of Claude 4 models came out. So things are moving fast, and we're going to figure out why they're moving much faster
Starting point is 00:01:03 and what the implications are for the AI industry and businesses as a whole. And we're joined today by the perfect guest to do it. Anthropic product head, Mike Krieger, is here with us. Mike, it's good to see you again. Welcome to the show. It's good to be here. Thanks, Alex. So I remember sitting in the audience for Anthropic's first developer day, and it's
Starting point is 00:01:24 funny because in the AI world, you sort of, you go in, what is it, cat years or dog years? I don't even know. Every month feels like a year. And this was in May, May 2025. And I remember yourself and Dario were on stage saying, yes, we're releasing Claude 4, but, you know, we're going to release the next iterations much faster than we ever have previously. And we're already at 4.5. How is it happening? I think there's a couple of things that we're seeing. I mean, even just thinking about, I mean, May, again, feels like a year ago. I think dog years is about right. I think there's a couple of things. One is we've been working much more with sort of end users or customers of, for example, our platform. And with that, we can hear, like, a much faster feedback loop of,
Starting point is 00:02:13 hey, Sonnet 4 is great in these ways. We wish it was better in these ways. And you're starting to get customers that really push the models in really interesting ways. And that ends up being very helpful for us on the research side, because then we can say, all right, these are problems to be tackled in the next version of Claude. So for example, one of them was, you know, Claude, you know, Sonnet 4 and even Opus 4, Opus is our biggest model, is good at writing code,
Starting point is 00:02:37 but, you know, tends to get sidetracked or lost if it's working over a longer time horizon. So that was a real emphasis of Sonnet 4.5. Or, you know, we've, you know, put a lot of data into the context, basically how the model is, what it's thinking about at a given point, but at some point that gets filled up, and how do you then manage, you know, to keep working on those things. So having that feedback loop really helps, and it also gives us a lot of urgency, because it means that there's sort of almost, like, bugs in some ways out there, you know,
Starting point is 00:03:04 that you want to go fix or at least like feature requests that you want to go fix. So that's, that's one piece. The other one is we've just streamlined a lot more of our model release story. So I think having now seen, you know, I joined shortly before Sonnet 3.5, which was back in like May of last year. So really long time ago in AI years. From then to now, just the sort of operational up-leveling that I think we've seen in terms of, you know, how do we get early access feedback from customers? How do we give, like, the remainder of customers, like, a good heads-up so they can co-launch on launch day? What does even that morning look like on rollout? I was talking to a customer and he's like, I've seen a lot of lab rollouts of models and this was like the smoothest I've seen,
Starting point is 00:03:47 which I, like, took as a big endorsement of how much we've streamlined that model release process. That just makes it so that, like, every release doesn't feel like, you know, this very, you know, bespoke, very difficult process. It can be much more, like, great, we know what we're doing. Here's the day. To the extent that research can be predictable, which it can't be.
Starting point is 00:04:05 But within that domain, how do we actually make that as smooth as possible? Right. And maybe I'm looking at this from a dumb outsider's perspective. But the one thing that I didn't hear you mention was scale. And, you know, hearing so much about the scaling laws, especially from Anthropic, you know, part of me believes that, like, okay, Claude 4 is X number of GPUs, and 4.5 is Y number of GPUs, and 5 will be Z number of GPUs. So do the numbers in your model release, you know, rubric correlate at all with the scale of the data centers that you're
Starting point is 00:04:46 training on and the scale of the data? I think what has been interesting is at different points. And if you talk to Jared Kaplan, our chief scientist, he'll, I think, tell you much the same: the scaling laws, I think, paint a picture of what is possible, but it's not predetermined. Like, to actually get there, there's a lot of actually really difficult, both machine learning and engineering work. So I think one thing that's been notable, to your question about scale over the last, you know,
Starting point is 00:05:12 six months, is how much has been really engineering. Like, if you're going to do both pre-training and post-training on an increasingly large number of accelerators, how do you make that reliable? How do you keep that, you know, how do you keep that run, as we call it, like, going, even if, you know, some portion of it has an issue. So a lot of the, I think to your question, a lot of the improvement in our ability to deliver these models really has come from our ability to run these large training runs at scale, which, you know, again, is fundamentally an engineering and machine learning problem. I think both have improved. I think if I pointed at something between Sonnet 4 and 4.5, a lot of it really has been on the engineering side to just be able to scale up, especially a lot of the post-training work. If I'm reading you right, it's not necessarily gains that Anthropic is seeing from scaling up the data centers.
Starting point is 00:05:57 It is algorithmic work that is being done by your teams to make the models better. They really come together. I think it's the algorithmic work and then the ability to maximize the amount of compute that we can use on those algorithmic improvements. So they really kind of go hand in hand, sometimes directly hand in hand in that, you know, either an idea that works at small scale when you scale it up doesn't work as well. And then other times an idea only works when you get enough data and scale in there as well. So it really becomes, you know, when I think about our team, we actually just brought in a new CTO.
Starting point is 00:06:29 And a lot of, I think, his remit will be, how do you really partner research and our, like, kind of core engineering teams together to achieve that kind of scale? Okay. And another thing I was expecting you to say, which I'm not sure if I've heard yet, is that teams within Anthropic have used the coding capabilities of your AI models to be able to ship faster. Is that a sort of supporting character here, or is it the star? That's a good question, I'll have to think about that for a second. I think it's a little bit of both. I would actually say there's a thing that is emergent even beyond the coding capabilities, which is the ability
Starting point is 00:07:08 of Claude to be a really active participant in the process. And here's what I mean by that. You know, I think about the way Claude was being used around even Sonnet 4 was, you know, help write code, you know, to launch these models, help write the product code for sure, contribute really strongly to Claude Code. You can imagine Claude Code itself is, like, a very sort of, we use Claude Code to develop Claude Code, very much in a loop. I think that the biggest delta between 4 and 4.5 is that now we have much more of Claude as an agent, or almost like a coworker, in, for example, our Slack channels. So for example, we have something we built that's Claude on call. So if you've been an engineer, one of the things
Starting point is 00:07:51 that you have to do is you take the metaphorical pager, which is basically you're on call for a week or two to manage a system. And if you get paged, you'll show up and say, like, all right, there's a certain number of things that could be wrong. I got to go check these graphs. I got to maybe try this out. And one thing that we've built using the Claude Agent SDK,
Starting point is 00:08:11 which we also released alongside Sonnet 4.5 publicly, but we've been using internally for a while, is the ability for Claude to basically show up first in those incident channels and already have a sense of what might be going on, and be able to answer really quickly, hey, can you do some data diving while I work on something else? And so we've increasingly had Claude play these sort of, yeah, these really collaborative roles within our company,
Starting point is 00:08:35 even beyond the ability to code. And it's, again, using the same technology as Claude Code under the hood, but it's accelerating the company in being more efficient, or better able to scale up, or better able to understand things. So I think the answer to your question is it's a supporting role on the sort of building side, but it's playing a much more fundamental role in terms of the actual operational side.
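To make the Claude-on-call idea concrete, here is a minimal sketch of what an incident-responder agent built on the Claude Agent SDK might look like in Python. The claude-agent-sdk package and its query() entry point exist, but the system prompt, tool list, and sample alert below are hypothetical assumptions, and Anthropic's internal setup is certainly more involved.

```python
# Minimal sketch of a "Claude on call" style incident responder built on the
# Claude Agent SDK (pip install claude-agent-sdk). The prompt, allowed tools,
# and sample alert are illustrative assumptions, not Anthropic's actual setup.
import anyio
from claude_agent_sdk import ClaudeAgentOptions, query

async def triage(incident_text: str) -> None:
    options = ClaudeAgentOptions(
        system_prompt=(
            "You are the first responder in an incident channel. Form "
            "hypotheses about the likely cause, check logs with the tools "
            "you have, and post a short, actionable summary."
        ),
        allowed_tools=["Bash", "Read", "Grep"],  # let the agent dig through logs
        max_turns=10,  # bound how long it can run unattended
    )
    async for message in query(prompt=f"New page: {incident_text}", options=options):
        print(message)  # in practice, this would be posted back to Slack

anyio.run(triage, "p99 latency alert on the inference gateway")
```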
Starting point is 00:08:56 So instead of basically being auto-complete for coding, this is actually going out and being proactive, examining things and then coming back with insights. Exactly. And we have similar sort of, you know, agents is the, I guess the industry term of art now. But I feel like agents can mean so many things to different people. people right now. What does agents mean to you? If you're going to start talking about agents, I need a definition of this word because I'm struggling to figure it out. I think the purest definition,
Starting point is 00:09:23 and this is not so pure because I'll probably use like 20 words to do it, so maybe we can edit it down together. No, go in full. Yeah, AI systems that can plan and sort of run actions over long time horizons, using a variety of tools, where the kind of steps are not predetermined. They're able to solve problems dynamically based on what information emerges along the way. So there's, you know, I end up having this sort of agent kind of scorecard that I've been using internally as we think about our own products. And there's a bunch of characteristics that I look at. This is way more than 20 words, Alex.
Starting point is 00:10:02 So attributes I look at are things like autonomy. So how long can the agent run unconstrained? Sonnet 4.5 is a big leap there. Proactivity, like, is the agent able to not just reply to questions, but actually sort of suggest either ideas or interject. Ability to use tools, and often a variety of tools. Some of them might be research tools. Some of them might be, you know, be able to write to a database.
Starting point is 00:10:27 Memory. So can the agent sort of learn over time and improve its ability to perform a task? I would say, like, the 100th task with an agent should be much better than the first, because that should be the case for human employees as well. And then communication. Is it showing up in all the right places? And so for us, we think, you know, these entities, these agents are going to start showing up in all the places where you do work, whether that's your Slack or your Teams, for example. We launched a research preview of Claude in Chrome. We think Claude needs to be in all of these places where you're doing work, so that you can actually bring it to work rather than having to bring work to it. So I even have this, like, spider chart of attributes. So for any given agent that we're building internally, we sort of, like, grade it on all these different attributes. And we can say, all right, great, for the next quarter, our investment is going to be on autonomy, or it's going to be on memory, and we can kind of pick the attributes that we're working on.
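As an illustration, a scorecard like the one Krieger describes might be modeled roughly like this. The five attributes come straight from his list; the 0-to-5 scale, the structure, and the example scores are purely hypothetical.

```python
from dataclasses import dataclass

# The five attributes are the ones Krieger names; the 0-5 scale and the
# example scores are made up for illustration.
@dataclass
class AgentScorecard:
    autonomy: int       # how long it can run unconstrained
    proactivity: int    # does it suggest ideas and interject, not just reply
    tool_use: int       # breadth and reliability of its tool calls
    memory: int         # does task #100 go better than task #1
    communication: int  # does it show up where the work happens (Slack, etc.)

    def weakest(self) -> str:
        """Pick the attribute to invest in next quarter."""
        scores = vars(self)
        return min(scores, key=scores.get)

oncall_bot = AgentScorecard(autonomy=4, proactivity=3, tool_use=4,
                            memory=2, communication=4)
print(oncall_bot.weakest())  # -> "memory"
```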
Starting point is 00:11:13 That was a good definition of agents. Actually, I think that's the most complete definition I've heard. So here's, like, an overriding question that's coming up as we talk. Is most of the improvement that we're going to see, at least in the near term in AI, just going to come on the back of the orchestration of these models, getting them to be able to take multiple steps, as opposed to, I think, what was sort of the defining characteristic of the earlier days of LLMs, which was basically just make it bigger, make it generally smarter, maybe get some PhDs to feed some information to it in post-training, and then you'll just, you know, see what happens as you go.
Starting point is 00:11:59 I think that there's going to be some fields or disciplines where that sort of extremely precise depth in a particular task or domain will continue to be important. But I think I'm much more excited, and overall, I think we're spending a lot more of our time, even from the product side, around that. I think it's actually two pieces. One is that orchestration. And then two is, how do you take the work that Claude is doing from, like, pretty good to great? And so, you know, we launched an ability for Claude to create Word and PowerPoint and Excel files that you can then download and bring into those apps. And if you get to, like,
Starting point is 00:12:47 speed you up. And in fact, it's like, I don't know, I could have just done this myself, and now, at least I would have known what it's done. When you start clearing this sort of like 75% to 80% threshold, of course, it's not scientific, but it's kind of like a little bit of a vibes based thing. Then it starts actually being able to really accelerate work. And so that's the other emphasis, too. And it's interesting that some of that is post-trading. Some of that's actually also giving a lot of really good examples to Claude and really working closely with how the model is producing outputs that are what we think of as like professional quality. Right. And look, I know we're 15 minutes in. So I think we should probably take a minute to
Starting point is 00:13:25 talk about the concrete things that you've improved in Claude between 4 and 4.5. Do you want to just give us briefly a little bit of a list of the things that get better with the new model? Yeah. I think the ones that I think are highlights, maybe I'll bucket into three. One is from a price-performance perspective. So Sonnet 4.5 basically outdoes Opus, our largest model, in effectively every category, but does so while running faster and at a fifth the cost. So if you think about where we were in May, you know, Code with Claude, we were announcing Opus 4. We now have a model that is better than that and even its successor, Opus 4.1, but does so at a fifth the cost, which, you know,
Starting point is 00:14:11 opens up a whole new set of use cases for that kind of intelligence. That's one, on the price-performance piece. The second one is its ability not just to code for longer, but just to execute agentically for longer. We talked a little bit about agents, but what we saw was, actually I put a fun video of this on my X account, which is we asked every Claude, from Claude 1 to Claude 4.5, to recreate claude.ai, so, like, our flagship AI product. And 4.5 was really the first one that was able to do it end to end and actually produce something of, you know, quality. It actually works. You can log in. You can use an API key, all of those things as well. And so that ability to, like, execute agentically, work over long time horizons. We had one customer who had it
Starting point is 00:14:56 work for 30 hours. Of course, that's not going to be every task, but, like, that's the kind of upper bound that we're starting to see, is another big improvement. And then the third one is moving some of those post-training wins beyond just code to other domains we think are really important. So for example, financial analysis is an area that we've been really interested in. We launched Claude for Financial Services
Starting point is 00:15:17 a couple of months ago. And we incorporated that into the model training in Sonnet 4.5 as well. So when you look at things like benchmarks like Finance Agent, different domains like the legal domain as well, the model is improving not just on code, which is obviously important, but also these other domains that are, they might actually use
Starting point is 00:15:34 code to solve these challenges, but the point is not to write code. The point is to solve a financial analysis, for example. Okay, and I definitely want to get into these various agents in a moment, but let me ask you this. You mentioned that the new Sonnet 4.5 model is more performant than Opus, the big model in the last release, the Claude 4 release, and it's cheaper. So how do you do that? I think it's, I mean, we talked a little bit about scale. That's one piece, which is, you know, just really training Sonnet 4.5 at, like, significant scale. Another one is improvements in the post-training work that we've done as well. And the third one is really sort of closing the loop on what we hear from customers
Starting point is 00:16:19 around what are the things that they wished either Opus or Sonnet were better at, and then getting that right. So one we hear all the time is instruction following. Like, if I tell Sonnet to do this thing, I need it to do the thing very reliably, even if the AI tries to be creative. Like, there's times where you really want it to be more prescriptive. And we put a bunch of work into instruction following for this Sonnet, too. So I want to talk about these agents.
Starting point is 00:16:42 So I've got a list of four different types that you highlighted upon release: finance, personal assistant agents, customer support, and deep research. And I just want to talk about who they're for. So the finance agents are interesting. So you say you could build agents that can understand your portfolio and goals, as well as help you evaluate investments by accessing external APIs. Personal assistant agents: build agents that can help you book travel and manage your calendar, as well as schedule appointments, put together briefs, and more, by connecting your internal data sources and tracking context across applications.
Starting point is 00:17:18 I think to set these up, it looks like it's a decent amount of work. Like you'd have to, for instance, with the finance agent, understand what an API is. So it's not going to be something that I think most people would take off the shelf. So who is this set of agents for? And do you have plans to make this technology more accessible? So let's say, you know, I'm a finance, not even a finance professional. Let's just say I'm someone that wants to have AI run through my portfolio. Can I eventually be able to easily set that up and run it without having to know any of this fancy tech stuff?
Starting point is 00:17:52 Yeah, I mean, that's absolutely the goal. So there's agents that we'll build ourselves and kind of deploy end to end. And I'll talk a little bit on the personal assistant side next. But I think by and large, these will be agents that we can help power for, you know, companies that have that particular domain expertise that they're bringing to bear. One of the first companies I ever worked with at Anthropic was Intuit. We were powering their sort of tax advisory service. And, you know, Anthropic, we're never going to build a tax product.
Starting point is 00:18:20 But Intuit has the largest one. And so being able to power their sort of tax Q&A was really powerful. And you can imagine all these other places, too. We've been working more closely with Microsoft, even for some agents within their office suite. So being able to take the financial analysis capability and the financial planning capability and bring it closer to an Excel user, for example. I think that's the way you unlock the maximal value of some of these as well. And I think you'll see us sort of demonstrate these capabilities.
Starting point is 00:18:49 But in terms of the first-party products we build, we're pretty thoughtful about which ones we end up going deep on. Because to your point, to reach the scale that I think these products deserve to reach, you want somebody who is really thinking through the whole end-to-end user experience and probably has some of the pre-existing connectors already kind of set up as well. But I think it's important also to build some of these ourselves. So we talked about the personal assistant case. One of the things that we've had a lot of fun with on our mobile apps is using on-device capabilities as well. And so I actually just saw that Apple featured us today, our, you know, like, new
Starting point is 00:19:24 features for Sonnet 4.5, and one of the things that they were featuring was that on iOS and Android now, Claude can sort of read your calendar, read your reminders, like, compose text messages, without really any setup at all. So that's ideal, right? Which is, like, if you've got those pre-existing connectors, you're not sort of spending a lot of time initializing, just getting it set up to even get any work done, as well. But I guess to be more succinct to answer your question, there's some that we'll build ourselves, and in those we'll try to, you know, do our best to sort of simplify the setup process. But I'm also very excited for embedding these agents in existing products that are out there that then have all that data built in.
Starting point is 00:20:03 And so as I read through your blog post, I also started to think a little bit about Dario's prediction about the white-collar bloodbath, it's, like, impossible not to, where he says, you know, within a few years, you might see 50% of white-collar work automated by these AI bots. Looking at it being able to do these finance tasks or customer support tasks, or even be a personal assistant, I'm just curious, from your perspective as the person running product here, is this something that you're, like, merrily running towards, trying to automate human work? Or, like, how do you think about it in your role?
Starting point is 00:20:38 We have, you know, kind of, like, product principles we try to work towards. And it's actually interesting. Like, I think we had very, or different, not entirely different, but kind of a different set of product principles even at Instagram. I think it's important to sort of, like, figure out who you're building for and how you go about it. And one of the, like, principles that we operate with is, if you can build things that are complementary or augmentative, like, bias towards those first.
Starting point is 00:21:09 And it's not to say that in the long run, like, overall these products, like, might not, or probably will be doing more sort of automation or even replacement of work. But we think that two things happen if you can build more augmentative products. So it's like not a finance agent that like takes all the work, you know, and does it all for you, but it becomes more of a back and forth. One is I think it helps people develop an intuition of what the AI is good at today and not good at. So that kind of helps people position even their own sort of skills against that. So I think there's the intuition building. And then the second part is it, I think, extends the timeline by which people are making that adaptation.
Starting point is 00:21:47 So I think if you see Dario out there talking about the, you know, likely labor impacts, it's not to sort of try to accelerate towards those, but more around, like, hey, we think this is coming. Let's start this conversation now. And I think in the products that we build, can we sort of show that this is likely to come, but still build a bridge between here and there by building more augmentative products? It's definitely, like, there's art and science here. So I think we debate a lot within the product team as well. Like, I had a great conversation with our head of design where he's like, if we had a product where you hit a button and
Starting point is 00:22:24 it did all your work for you that day, would that be a good product? And would that be, like, an Anthropic-y product? And we both came to the conclusion, like, no. Like, one of our kind of core brand tenets that we've, like, come out with is, like, keep thinking. And, like, we want it to be much more of this collaborative sort of accelerator of human thought rather than replacement for human thought. And we'd like to keep that the case for as long as possible. Yeah, I'm still trying to figure out how I feel about this stuff. But I do think that the conversation around augmentation versus automation is still, like, so elementary, and honestly, like, it's a fairly dumb way to look at it. I'm not saying what you're saying is, I'm just saying this,
Starting point is 00:23:04 the industry's, you know, perspective on this. Like, are you automating or augmenting tasks? Because let me give you an example. If you automate, you know, some, if you automate a job within your company, you've automated a job. The question is what happens next. And if you put that person who was doing that job on something, leading a new project, for instance, or something higher value, you've now augmented it in a way that the word augment doesn't even come close to describing. So it's really tough, I think, to measure this stuff. And I don't know, I just sort of feel conflicted about the way that the conversation has gone so far. What do you think? I think that there's a lot to what you're saying, which is there's the
Starting point is 00:23:50 point in time task, right? Like, oh, you know, managing my calendar or, you know, doing some research out about something that I'm talking about. And then there's the broader context of what is the sort of role that that person even has within the company. And, you know, a lot of the things that we think about is people end up, I think people will end up feeling more like managers of AI than just users of AI. Then we think a lot of it about this. Even with, it's happening in engineering, right? where our best engineers are managing three or four clod code instances running at once. And all of a sudden, you've had to think about a higher level, like, right, what is the unit of tasks that I want each of these sort of subclod codes to be doing?
Starting point is 00:24:29 I think the same will be the case for how we interact with AI systems. And there's going to be some blend of automation and augmentation there as well. The way I think about the sort of bull case here is twofold. One, can you bring to bear sort of world-expert-level thinking in a particular discipline into companies that might not have had that before, right? Either because that talent isn't present in that local market, or because the company is just getting off the ground and they can't afford a, like, world-class CPO somewhere, you know, or CTO. Can you, like, elevate the kind of baseline there? So I think that's one piece. And the second one is having companies that will, I think,
Starting point is 00:25:13 emerge and be able to scale and maintain that sort of small-team cohesion. I think we did this really well with Instagram without having to, like, you know, build a huge workforce from day one. And I think the kinds of companies that get built will change. But I still think there's, like, a tremendous amount of economic opportunity throughout. It just might be, you know, more smaller companies rather than fewer bigger monolithic companies. Interesting. I mean, coming from a guy, well, you were at Instagram, which was what, 16 people when you sold it for a billion dollars? And people said that was crazy. Exactly. So I got a question once. It was like, when do you think the first single-person billion-dollar company will emerge? I was like, well, we had 13. It was like, you know,
Starting point is 00:25:52 we were, we were getting close. We were 13 at sale and 16 at close. So basically it was, yeah, just around then. So yeah, I mean, we got a lot done with a little. And I think a lot of that came from focus. And, you know, there's probably work that we could have done even more efficiently. Right. And I mean, I think if you'd waited a couple of years to sell, it might have been worth double or triple that. So folks, by the way, if you don't know about Mike's previous work, he's the co-founder of Instagram. So we are going to get to some of the social media elements of this, or the comparisons
Starting point is 00:26:25 to social media building, in the second half. But two more questions for you as we round out our first half. You mentioned memory. I think it's one of the most interesting parts of this work that's sort of, I think, underrated and underappreciated in the common conversation. Can you talk about building better memory within these bots, how important that is, and how that's actually happening? I think the biggest sort of breakthrough, or really key piece of what we've done on memory, is rather than treat it as a sort of substitute for how the model might otherwise access information, or sort of a system built on top of the model, we actually have trained it deeply into the model. And so the model knows about the concept of memory, which I know sounds kind
Starting point is 00:27:10 of funny, but you can really see it as you talk to it and you can even see. Wait, wait, you have to, what does that mean? You have to talk about what that means, model no idea of the concept of memory. Yeah, so basically, in training, we give the model effectively a series of tools to let it both read from, update, you know, write memory. And what that means is it understands the concept that it, like, is capable of managing its own memory. And then in our platform, we actually now have that as a sort of, you know, basic building
Starting point is 00:27:38 block that you can use. And what that means is, as you're talking to Claude with access to the memory tool, you can say, hey, Claude, can you update your memory about this? And it knows what that means. It'll say, great, I'm going to update the memory. Or when it's performing an action, if it thinks there's a good chance that it has some memory related to that action, it will retrieve that memory before doing the action. In previous systems, you would have to either build that yourself on top of it, or Claude or any of these systems wouldn't be as good at using it. And so effectively, in the same way that we might have the thought, hey, I think,
Starting point is 00:28:09 I think I did this before, like, I think this happened before, I'm going to go, you know, either, like, think about it for a sec or maybe even search my email. How do you basically give Claude that same ability? And that can be sort of memory that's very, like, fact-based. Like, who are you interacting with? What should you do? But it can also be more task-based. Like, whenever I'm doing X, make sure I remember to do Y.
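As an illustration of memory as a building block on the platform side, here is a rough sketch of requesting the memory tool over the Anthropic Messages API in Python. The feature was in beta around the time of this episode, so the tool type string and beta flag below are assumptions to verify against the current API reference; the conversation content is invented for the example.

```python
# Rough sketch of giving Claude the memory tool over the Anthropic API.
# The tool type and beta flag are assumptions from the beta docs of the
# time; verify them before relying on this. The memory tool is client-side:
# when Claude calls it, your code performs the actual reads and writes
# (for example, files under a /memories directory).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    betas=["context-management-2025-06-27"],                 # assumed beta flag
    tools=[{"type": "memory_20250818", "name": "memory"}],   # assumed tool type
    messages=[{
        "role": "user",
        "content": "Hey Claude, update your memory: podcast descriptions "
                   "should always lead with the guest's name and role.",
    }],
)

# If Claude decides to touch its memory, the response contains tool_use
# blocks (e.g., a "create" or "str_replace" command) that your code executes,
# returning the result in a follow-up message.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```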
Starting point is 00:28:30 Like, whenever I'm doing X, make sure I remember to do Y. That's pretty amazing. And so what will the memory get? get you when you're using this like better memory will it start to remember many more aspects or like that so i'll give you one example and this is so rudimentary but like if i ever use claude to do a podcast description i have a format prompt that i drop in first sentence should be this second sentence should be this and every single time i you know write that prompt uh i i have every time i ask for a description i have to use that exact
Starting point is 00:29:03 prompt or else it will do whatever it wants and freelances when are we going to get to the point where these bots are going to be smart enough where when i tell it remember this is the way that we do things here it knows and i'm sure that my problem is something that people have all across the board when they're trying to get these bots to work on the same things for them yeah um very soon so we have a launch coming up in the next like week or so that's going to really like uh there's both memory and then also the idea of you know what are the repeatable ways in which you want work to get done um and so we'll have something really exciting there very soon but from the memory perspective beyond the sort of like very sort of basic fact-based things like I'm Alex I run a podcast and a newsletter and a site and that's somewhat helpful but I think not sufficient like getting to the point of hey have I interacted with this person before like what happens last time I chatted with my can I like search over my memories there or it can be hey whenever you generate these summaries like make sure that you always you know cite this piece or lead with the punchier sort of thing
Starting point is 00:30:07 and it's able to sort of update and learn over time. So that's the goal. Again, like, if Claude is like a very competent new hire, we want to get to the point where, as you use it over time, either on our platform or using our kind of first-party products, it is improving, and it just feels like a companion that you've actually helped train to your preferences. Where on the list of priorities is that capability? It sounds like it's probably very high for you.
Starting point is 00:30:33 I know that it's high for OpenAI. Yeah, I think it's very, it's really high for us. I think for us, it's both, it's high on the first-party side, but it's also very high on the platform piece as well. Okay. All right. Let's go to break. I want to ask you afterwards about what the moment building AI has in common with, and differs from, building social media, which, of course, we just mentioned you were right at the center of. So let's do that right after this.
Starting point is 00:30:59 The holidays sneak up fast, but it's not too early to get your shopping done and actually have fun with it. Uncommon goods makes holiday shopping stress-free and joyful, with thousands of one-of-a-kind gifts you can't find anywhere else. I'm already in. I grabbed a cool Smokey the Bear sweatshirt and a Yosemite ski hat, so I'm fully prepared for a long, cozy winter season. Both items look great and definitely don't have the mass-produced feel you see everywhere else. And there's plenty of other good stuff on the site.
Starting point is 00:31:30 From moms and dads to kids and teens, from book lovers, history buffs, and die-hard football fans to foodies, mixologists, and avid gardeners, you'll find thousands of new gift ideas that you won't find elsewhere. So shop early, have fun, and cross some names off your list today. To get 15% off your next gift,
Starting point is 00:31:49 go to UncommonGoods.com slash big tech. That's UncommonGoods.com slash big tech for 15% off. Don't miss out on this limited time offer. Uncommon Goods. We're all out of the ordinary. Capital One's tech team isn't just talking about multi-agentic AI. They already deployed one.
Starting point is 00:32:09 It's called Chat Concierge, and it's simplifying car shopping. Using self-reflection and layered reasoning with live API checks, it doesn't just help buyers find a car they love. It helps schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced, intuitive, and deployed. That's how they stack. That's technology at Capital One.
Starting point is 00:32:34 And we're back here on Big Technology Podcast with Mike Krieger. He is the head of product at Anthropic, and the co-founder of Instagram. All right, let's talk about social media and AI. Very interesting. I mean, when we look around the AI industry, we see so many folks who've come from places like Facebook and Twitter now running large parts of these AI companies. Of course, yourself, co-founder of Instagram, head of product at Anthropic. Kevin Weil, a former head of Instagram product as well, I think, is running product at OpenAI. Fiji Simo, who came from Facebook, is running consumer
Starting point is 00:33:11 applications at OpenAI. I mean, I could go on. What does building these products have in common with building social media, and how does it differ? I think there's, there's maybe the, you know, abstracted from the actual product itself, like, what does it take to build good product? And I think that, I think it's less that there's a lot of social media sort of oriented folks that have now moved into AI. It's more that I think a lot of the best product people were focused on that, you know, even four years ago, you know, pre-ChatGPT, you know, pre the emergence of a lot of these LLMs. So I think it's sort of like the most recent place that talent concentrated.
Starting point is 00:33:50 I find that that often happens, like, the concentration of talent in a particular discipline. And I think that was social media beforehand. So that's partly one of them. And there, it's, you know, all of the pieces around understanding what your data is telling you, but also having the intuition around, like, what bets you want to place in terms of where you want to move next, how you assemble a great product team, how do product, engineering, design, and marketing work well together, all these different, you know, sort of aspects of that. So there's that one.
Starting point is 00:34:17 And then I think there's the separate question of, you know, within social media, like, what are the similarities and differences? With Claude, it feels quite different in that, you know, we have more of a business audience. Like, plenty of people use it for their individual pieces, but it has less of that sort of, you know, social component right now. It's definitely more word of mouth. Like, the most social thing that we've experienced is how people got excited about all the merch and the pop-up we just did in New York, where that was, like, a real attractor moment where there was, like, more of that. But in general, less of the sort of mechanics of, like, capital-G Growth, right? The, you know, how many, you know, people did you bring in?
Starting point is 00:34:55 Who did they invite? All of those different pieces. So maybe a little bit different there, at least for the pieces that we're tackling with Claude. But of course, as a lot of these non-Claude tools move into more of this generation of images and videos, like, there is much more of an overall, a strong overlap with what folks were doing on the social media front. How important is engagement to you? I mean, I think the thing that really drove Facebook decisions was engagement. And of course, growth.
Starting point is 00:35:24 And maybe the two go hand in hand. And we always wonder about AI products. Like, of course, you want people to use them, but you don't want engagement for engagement's sake, because it's pretty expensive to serve these use cases. So where does engagement sit for you in terms of the metrics that you're optimizing toward? We don't really look at engagement, at least not in the typical sense. Like, at Instagram, we spent a lot of time looking at things like time spent, right?
Starting point is 00:35:46 We do look at things like daily visitors as a proxy for utility. So I think that's one piece that we look at as well. But it's interesting, like, I was talking to our mobile team yesterday. Like, I think in the future, people's interactions with something like Claude Code might be much more mobile-oriented. And ideally, like, our office is right by Salesforce Park. Like, I would love to be able to kick off some coding task, go for a walk in Salesforce Park, maybe with a coworker, maybe it pings me halfway through and has some clarification question, and I get back to my desk and it's done. It's a very different discipline than being hands-on keyboard.
Starting point is 00:36:20 I also love that, but that feels like a different discipline than what coding has evolved to primarily being nowadays. And now it's more about, like, what are the creative ideas that I have that I want to see manifested? But in that world, the time spent was quite low, right? It was maybe, like, kicking off the task and resolving some questions, but the value of what was produced was much higher. And so I think the interaction paradigm is just really, really different in terms of what we end up looking at. And so I think much more about the sort of value of work getting done than the sort of, like, interaction and, yeah, the long sessions that you might see at social media. I legitimately just had a founder that I interviewed tell me
Starting point is 00:37:02 that her favorite use case is just using AI to get away from her computer, which is something you've never really heard of before in technology. So I got to ask you, what do you think Mark Zuckerberg is trying to do with his very unique AI strategy? I think there's folks in there that I've known for a long time, like Nat, who I really respect. So I think what I suspect you'll see is sort of more experimentation around what, like, AI means for this kind of portfolio of companies. I think the sort of initial wave of, well, you know, we've got some chatbot-type stuff in the search bars was, like, not particularly transformative, and I think the teams there likely know it. And so, yeah, I think, or maybe what I hope we'll see, is more experimentation
Starting point is 00:37:54 that can kind of live outside of those surfaces. Like, in the same way that with Instagram, you know, there were some ideas we had that didn't really belong in the app, like Hyperlapse, or, nobody even remembers Bolt, but Bolt was, like, a very, very fast messenger. You know, I think that experimentation, once you get a service as widespread as Instagram, or as widespread as Facebook or WhatsApp, it's hard to introduce a new behavior there. You know, we did it with Stories.
Starting point is 00:38:20 I think they've since done it with Reels. But it's almost like, one, you get one per generation. And I think you want to have more of an experimentation kind of test bed beyond that. And I suspect, just given what I know about how those folks think, that there'll be more of that sort of experimentation. Interesting. So as the co-founder of Instagram, I'm sure you've watched with interest as AI-generated images and videos have filled social media feeds and even propelled, like, Sora, the Sora app, to the number one spot on the App Store. Do you think AI-generated content and video, maybe in this cameo version where you can put yourself in the videos,
Starting point is 00:38:57 do you think it threatens or makes a run at replacing the human-generated content that we have today? Or do you think that the human stuff is going to stay on top and this is a flash in the pan? Yeah. I mean, I think there's, here's what I'm not sure of yet. We saw this with Instagram, that there were creative tools that would emerge. Of course, these were at a much more sort of basic level than the kind of capabilities that you're seeing with Veo and with Sora and even some of these other models.
Starting point is 00:39:28 But you would see an emergence of a creative tool, and whether they were able to sort of transcend that to being a network that you came back to was often not the case. And I think that was for a couple of reasons. One is, at least in that generation of products, like, the creative content, or the created content, started getting a little bit samey over time, right? Especially if it was, like, a very highly stylized tool. And the second one, it's like the dynamics that make Instagram Instagram, like the people you already know on there, the people that you follow, the creators that you know. And of course, this has shifted
Starting point is 00:39:59 down to more of, like, a pure algorithmic, Reels-oriented piece. Or maybe I'm talking more about, like, the previous Instagram that still had a heavy kind of follow component. It ends up being a thing that feels like, oh, I know who I'm interacting with here. And of course, TikTok has a very different take on things. So to the extent that it's replacing, I think the things that would have to be true is, one, that the content feels varied over time,
Starting point is 00:40:26 and not just sort of, like, okay, I've kind of seen this before. It's really interesting, but I've seen it before. And then two, is there value to being in that network over time? And do you find yourself opening it because there's, like, not just content that you're interested in, but maybe people that you care about, or there's sort of communities that form within it. Because I think that's actually what Instagram got right, is that you started seeing these emergent communities that maybe were just around photography. Maybe they were oriented around living in a particular city.
Starting point is 00:40:57 And they were very self-organizing. The only tool we gave people was hashtags. And that was enough to sort of spur these communities. So I think that's, like, the fundamental question to be answered still. That's a great point. And I think the cameo aspect, where you can put yourself in the video, may go some degree towards making that happen in these apps. But I also, I'll tell you, on Friday, I couldn't put Sora down, and we're at Wednesday.
Starting point is 00:41:23 And I don't really feel compelled to open it right now. I think you're right that maybe AI content creation can have that level of sameness to it, where you watch one video and you feel like you've watched them all. And then maybe people come up with creative prompts and, you know, you see a new trend. But I think that's a spot-on point there, that that's the challenge. Yeah. Well, I mean, I think it's all happening very quickly in terms of the experimentation. And so I think there's also this, like, ability for these tools to adapt as well. And whether it'll sort of open the door to sort of a new Cambrian explosion of social products is going to be another
Starting point is 00:42:09 thing that I'm tracking really closely. It feels like it's been very quiet on the social front for the last couple of years. You know, we've sort of, like, stabilized among, you know, a couple of really big players. Not a lot of new experimentation. And yeah, I miss the, you know, 2010s of, you know, what if social products were like this? And what if we took this differentiated take on things? And not all, or even most, of them are going to work. But at least, like, there is that value of, like, hell yeah, I want to try that. Like, that is a different experience. Like, you know, even if it's, again, things that feel like maybe novelties, like,
Starting point is 00:42:39 oh, it's a photo app that takes the front and the back camera together. Like, is that a lasting network? No, but it paved the way towards something. Yeah, that was fun. And I miss that too. By the way, that happened, I think, in the 2020s. But I definitely miss the 2010s, when I was, you know, doing social media reporting at BuzzFeed. And there would be a new app every week. And it was like, all right, well, what's Peach? Let's try this out, and then it'd be gone,
Starting point is 00:42:57 And it was like, all right, well, what's peach? Let's try this out and then be gone. but there would be something new. Oh, my God. Peach was classic, all-time, all-time classic. It was. So I want to talk to you about community briefly. Where do you look to find, I guess, a community of users and get your feedback?
Starting point is 00:43:17 And how important is Reddit for you? Because I've seen so much of the activity in the AI space move on to Reddit. And I'm curious, are you reading, like, r/singularity, or how deep into it are you? That's a great question. It's interesting being, I would say that there's, like, sort of somewhat overlapping but distinct communities that we look at. One is, like, being a platform, we have, like, a strong kind of customer base that often has a very sort of clear-eyed perspective about where the models could continue to improve or what we could be doing better as a platform. And so this is very different than my time at Instagram, where, you know, there were people that we talked to a bunch about how they're using Instagram, but we didn't have this, like, more permanent notion of, like, an advisory board that we have here at Anthropic. And we just brought in, a couple of months ago, Paul Smith as our new chief commercial officer.
Starting point is 00:44:03 And he's brought also the sort of community of more enterprise folks as well that we've been talking to. So that's one kind of big delta, which is, like, a more stable sort of set of people and community that we're talking to. So that's one. We actually have a phenomenal user experience research team. And that's a place where we end up being able to stay connected to how more of the power users, that I think of as, like, our core demographic for something like Claude,
Starting point is 00:44:35 is basically the UXR team doing like a voice of the user piece. And it's surprising sometimes because, you know, it's not necessarily who you might expect being like the software engineer archetype who for sure are using our products. But there's also, hey, I'm, you know, a marketing manager. I need to produce 20 decks a week. And now I finally found a tool that I'm like using to cut it out.
Starting point is 00:44:56 But here's my feeling about it. my fears about AI. Here's the promise of AI. It's just a very humanizing kind of aspect of it. And then for sure, you know, I think still today, I think, like, Twitter slash X and Reddit have a, like, strong pocket of that AI community. And we, you know, I think we've gotten better at engaging in that community than before. I think there was a time period where we were like, well, like, there's a lot of volume. How do we react? And then, like, you don't want to be showing up only when there's, like, something that you want to, like, correct or something, because then it feels very, like, corpo and not, like, authentic.
Starting point is 00:45:32 And so I think we've found a better, like, ability to participate in some of those communities. And, you know, it's good. They're often the, like, power users, extreme users, that are telling you something about the edge of what's possible. And then you can kind of try to generalize it more broadly, too. But less, like, r/singularity, and maybe more, like, r/ClaudeAI. Sorry, more mundane, but it's where a lot of folks are hanging out.
Starting point is 00:45:55 All right, cool. We spoke this whole conversation, we haven't brought up the fact that, well, I haven't, and maybe this is an oversight on my end, that OpenAI basically has seen how well Anthropic has done on coding and said that this is, you know, a number one priority for the company. And, I mean, every day you can look at OpenAI leaders on X, speaking of X, trumpeting their Codex product and talking about how advanced they are on coding skills. So can you talk about how you assess OpenAI's
Starting point is 00:46:30 anthropics bread and butter yeah i mean i think it's definitely there was a maybe a window in the summer where it was surprising to me i guess in general how um uh sort of alone we were in and sort of paying attention and having a product out there it's definitely gotten more uh sort of interesting and competitive which i love i think that that my favorite times at instagram were also like when we had interesting competitors that we i think it pushes you forward in terms of like, what is the product we want to build? What are the capabilities that we're going to need to have? So, you know, it's kind of like a game on and an interesting moment as well. For us, the coding piece, beyond just the fact that coding is a really high
Starting point is 00:47:11 value economic activity, I really see the model's ability to plan, write code, solve problems, is not just being useful for software engineering, but being really critical path to the kind of like agentic behavior we want to build long term. There's no way that would never be anything but like one of our, you know, top two or three priorities. And then it's a matter of how do we make sure that we're showing up with the right products that, like, deeply solve the right problems for people, right? And maybe this ties all the way back to your question about, like, good product design and how I think about products.
Starting point is 00:47:41 It's one thing to score well on SWE-bench. That's important as, like, a benchmark, but it's way more important, I think, to get the feedback from people. Like, great, there was a really hard task that I was doing with Rust, and Sonnet 4 couldn't do it, Opus 4 could barely do it, Sonnet 4.5 can do it. Like, that I get very excited about, because it means we're actually having
Starting point is 00:48:00 like, real-world impact. So I think you'll see us, if we're doing our jobs right on coding, you know, even in the presence of other players entering the space, try to stay really focused on listening to how people are using these products in the wild, and then ensuring that future model versions
Starting point is 00:48:18 are sort of meeting people where they are in that, you know, high-utility space. All right, last one for you, Mike. Enterprises. They are all interested in generative AI. They're not great at implementing it. They'll admit it.
Starting point is 00:48:30 The studies show it. Are they going to get it together? I think they will. Actually, you know, right after our conversation, I'm having an offsite with our product team. And a lot of the focus for next year is to continue to go into the enterprise side of things. And I think there's a few things. There's the, and we could probably do another whole, like, hour on this.
Starting point is 00:48:50 I get very excited about this as well. There's a whole range, from how do you take a product like Claude for Enterprise that, you know, enterprises are already adopting, but make it really, really useful. And we talked a little bit before about, like, output quality and just, like, how much it's actually helping you. And, like, there's, I think, part of the valley of disillusionment, or the trough of disillusionment, that you might be seeing around enterprise AI adoption is that the promise of
Starting point is 00:49:14 these tools, around, they're going to save you time, they're going to, like, make your work better, just wasn't fulfilled by the previous generation of products. And they need to be if we're going to actually get, like, sticky adoption in the enterprise, right? And so that's a lot of what we're pushing on, which is, it's not, like, AI-produced document slop. It's AI-produced quality stuff that you can then iterate on and use and feel proud that you created. Just like I think people can feel proud about, like, here, I built this thing using Claude on the coding side. And then there's all the way to, you know, beyond the Claude for Enterprise piece, like, deeper integrations, like, internal transformation. What we're learning there, to your question about how enterprises are thinking about and adopting,
Starting point is 00:49:55 is, at least for the foreseeable future, we need to lean in much more in terms of helping the enterprises get there. And so we're doing much more of a model now where, either with our own engineers embedded in enterprises or in partnership with Deloitte, which we just announced this week, can we actually take our technology, meet companies where they are, like, what their highest needs are, and then just co-develop and just, you know, lock ourselves in the building until we've solved their problem, and then, like, learn from that experience and move on to the next enterprise. But I think it's very different than sort of the lean-back sort of, like, we're just going
Starting point is 00:50:27 to have enterprise products and hope that enterprises figure it out. I don't think that that's the reality. We just need to lean in way harder on both ends of that spectrum. Claude.ai is the website. Mike Krieger is the chief product officer at Anthropic. Mike, always great to speak with you. Thanks for coming on the show. Thanks for having me, Alex.
Starting point is 00:50:43 All right, everybody. Thank you so much for listening and watching. We will see you on Friday to break down the week's news, and we will see you then on Big Technology Podcast.
