Big Technology Podcast - Anthropic's Labs Lead On Fable's Capabilities + Building AI-Native Products — With Mike Krieger

Starting point is 00:00:00 Anthropic is not just a massive model builder, it's a massive product builder as well, with products like Claude and Cowork that have taken off like crazy over the past few months. And Claude Code came out of a place that few of us know much about called Anthropic Labs. And Anthropic Labs is an organization within Anthropic that is working on building the next level of frontier products with AI at the center. And so today we are lucky to hear from the person running that lab, Mike Krieger, who is the co-founder of Instagram and now the lead of Anthropic Labs at Anthropic. We're going to welcome him on stage along with Lauren Goode of Wired, who will join me as a co-interviewer. Mike and Lauren, let's hear it for both of you guys.

Starting point is 00:00:49 All right, Mike, so chill times in Anthropic land. Nothing going on. Slow week.

Starting point is 00:01:30 Slow week. You want to start, Lauren? Yeah, first of all, I mean, we want to talk about what you're working on at Labs and explain your role to folks. But I want to ask you first, how close are you right now to the situation with the White House? Less than in my CPO role. So I transitioned about, like, five months ago into this Labs role. I think in the CPO role, I would have been deep in it. Now, obviously, you know, we want to restore it.

Starting point is 00:01:57 And as a, like, product person, I want to make sure that that gets access, but less close to it than, you know, in that sort of C-level role that I had before. Okay. Alex, you have a follow-up? Oh, yeah. I have like eight follow-ups. Definitely. Well, there was a guy named Ben on X who said,

Starting point is 00:02:13 I will mail Anthropic an original copy of my long-form birth certificate if they will enable Fable for me again. I sound like those lunatics who are obsessed with 4o now. Will you take Ben's long form? I don't know that we'll take his long form. But it has been interesting. I mean, Fable was only available for a few days, but every time I've tweeted since then, they have not read whatever I was tweeting and they've mostly been like

Starting point is 00:02:35 bring back Fable. Which, like, in Instagram we got rid of Gotham. Do you remember Gotham, the filter? And then for, like, the next eight years all I heard was bring back Gotham. So it struck a nerve, but Fable will, uh, will come back before Gotham did. But yeah, it's clearly the folks that have gotten to use it and started incorporating it. It's actually really interesting. I've learned to not really trust day-of or even week-of model reactions. You don't really know until you've put it through its paces. And so I almost just completely block out the noise in the first couple of days of any new model release.

Starting point is 00:03:07 Because I don't know, everybody has maybe their toy example thing that they like to do with the new model. But it's hard to actually put it through its paces until you've actually had real work done with it. And I think people were just starting to do that. And then, you know, we had to sort of pull back Fable. But I remember in December when we put out Opus 4.6, it was like this interesting time

Starting point is 00:03:28 where everybody went home for the holidays and a lot of people had that week off between Christmas and New Year's. And then they came back and were like, oh, I spent a lot of time and I really get why Opus is good and I'm going to do it. So I don't think Fable has had that opportunity yet. But despite that, though, I mean, this was a pretty big reaction from the Trump administration. And I think everyone here, especially if you were listening to the Alex Stamos session, understands what's going on. But this happened within a few days of the model release. If I can give Wired some credit, Wired just reported last night that it was due to Anthropic's

Starting point is 00:03:59 relationship with, or having given access to the model to, SK Telecom, which could have raised flags within the administration. How surprised were you by how immediate that backlash was? Yeah, I think the sort of reaction decision was surprising and we were sort of immediately engaged with them to, you know, restore access as well. And so at the same time, you know, the one thing that we think a lot about internally is, you know, there used to be a poster on the Facebook wall when we were still there that was like, every day feels like a week. And I think that's becoming true in AI. And I think a good thing to remind ourselves of in general in the industry is we're dealing

Starting point is 00:04:37 with unprecedented times. We're dealing with new situations and they can develop really quickly as well. And so I think also developing the capabilities and the connections to make sure those conversations can happen quickly is really, really important. I have a new motto suggestion for you, for the wall. Move fast and jailbreak things. I don't think they're going to use that, Lauren. Okay.

Starting point is 00:05:00 You know, we just heard from Alex that some of these capabilities have been available on, you know, previous models. Or, you know, you can find bugs with other models that are available today. So why do you think Anthropic got singled out on this front? I don't know why, like, Fable specifically was singled out, you know, again, Fable being the non-cyber intention model as well. I think the thing that does change over time is there's, you know, the capabilities if you knew what to prompt and knew what you're looking for, and then there's the capabilities, I think, you know, I'm not sure I resonate with the, like, juicing high school athletes metaphor

Starting point is 00:05:37 from Alex, but like, you know, the uplift that you get, like uplift is a thing that we think about a lot when we think about model safety. So if you look at our model cards, for example, one of the ways we look at risk in the bio domain is comparing the uplift from sort of a layperson using the model versus, you know, an expert or a layperson just using the internet and seeing what the comparison there is. And so that is one trend that has been, you know, progressing as the models get more capable. And so maybe less why Fable is singled out and maybe more like what the overall trajectory is. Interesting. So we just had Ara Kharazian, the lead economist at Ramp, on the podcast. And one of the interesting things that he said was, you know,

Starting point is 00:06:18 we had during the Pentagon situation, there were these headlines, okay, this company will not use Anthropic anymore, but actually the data from Ramp shows that spending on Anthropic models actually increased. It was apparently a good publicity moment for the company. So I think it sort of plays into a debate that we have here on the show about, like, how much of this is real concern for the issues and how much of it, you know, from Anthropic is marketing. And, like, we have somebody from Anthropic here who can actually shed some light on that. We just had Stamos talking a little bit about it from his perspective on the security side, but we're lucky to have you here today. So is it material or is it marketing or some combination? I mean, I think it's one of

Starting point is 00:06:57 the hardest things, to, like, really deeply believe that something is true and not marketing. And then, not even just Anthropic, I think people are generally right to be skeptical of any company saying anything, and you should, like, put it through your own filters as well. But for me personally, I was like, no, but it's real. And, like, we both deeply care about safety, and to the extent that we are being vocal about anything, it is to either sort of help paint the picture of what very likely is coming, or we believe is coming, or what we've already seen and spotted. You know, for example, in the Mythos case, really just looking at, like, vulnerability scanning and bug finding and doing it in partnership with companies that

Starting point is 00:07:37 were in that kind of Project Glasswing initial announcement. And so the technology is also, you know, it is doing really incredible things, and therefore even calling out what we see as what is happening, I think, can seem hypey. I wish I could press a button and make everybody believe that we are not being hypey. I realize that's not the reality that we operate in, but at least from my perspective, we try to call it like it is. My understanding is that within Labs in particular, in research and development at Anthropic,

Starting point is 00:08:10 that you're using the best AI models to actually build new products, to prototype new products and sort of test out your thesis. So is this ban essentially now limiting your ability to do that within Labs? Yeah, I mean, definitely Fable is the best model I've ever used. And it's not to say that work has stopped, but it's definitely, like, less good with the other models. Or, we are using models that are less good than that as well. And it is also, I mean, maybe the reverse of the distrusting the first week response is what happens when you don't have

Starting point is 00:08:50 Fable. And, like, obviously the Twitter reaction from the people that had already kind of gotten into the model and were using it was strong. But I'd say even in my personal use, I'm like, oh, like, I'm on Opus 4.8 and it's good. Like, I'm still productive. I'm doing work. But, and we can go into sort of, like, how my work changed with sort of these, like, Fable or, you know, the models of that sort of family. But it is noticeable for sure. Yeah, I think we'd like to know that. I mean, you know, we, the public, had access to Fable for like a half a minute. Groups in this, like, Project Glasswing have had access to Mythos. We don't really know what the difference is between using a model that you can use today and using one of these super Anthropic models.

Starting point is 00:09:28 So what actually could you do differently with a Fable or a Mythos? I think for me it's the sort of scope and scale of delegation. And all these things are really imperfect. Like, people say, oh, is this now a level five software engineer or level six? But anybody who's used these models extensively knows that they're still spiky in capabilities, right? In some ways, in many ways, they're a better engineer than me. And in other ways, I was complaining today that it had missed a descender, like the G, that bottom part is called the descender. I'm like, how did you put that in the UI? And it's clipped. And of course,

Starting point is 00:10:03 like there's vision capabilities that need to improve and there's debugging capabilities and there's sometimes even just sort of human common sense that, you know, we're way better at than the models are. But overall, I think the big shift for me working, and it was really interesting because it sort of coincided with me going back into a builder role. So I really got to see going from using these models as an executive, you're trying to make the most of them,

Starting point is 00:10:28 but you're not going to have it write all your email. And I think strategy still needs to come from you, and then you can use the models to sort of pressure test it. But going back into a builder role and going from, okay, I am delegating chunks like, please fix this bug, or I'm thinking of implementing this feature, let's go back and forth, to something that ends up being much more sort of, all right, like, I got this bug report from one of our users, or I have this notion of something that I want to build, like, can you sketch out two or three ways in which we could do it?

Starting point is 00:10:58 Right, that seems plausible. Often I find, actually, sometimes it'll give me the sort of explanation or proposal and I'll be like, okay, that actually is over my head. Like, you are clearly way smarter than me, like, explain it to me like I'm maybe not five, but at least, you know, not you. And it will sometimes explain it that way, but then go build it and get it right, like, you know, at a very, very, very high rate. And I think that starts really changing how you operate. Like, I moved much more to, before going to bed, making sure that I had, like, queued up for

Starting point is 00:11:28 Fable, like, enough chunky work to last, I would call it, the whole night. And I would check in later and it got it done in an hour and it was, like, just hanging out for the next seven hours. But, like, really delegating much more of a goal than just a... So one task, for example. Yeah. Like, give us an example of one task you would hand to it. I mean, here's a kind of crazy one, which is for the programmers in the audience.

Starting point is 00:11:48 Like, I had written one of our Labs projects in Python. That's like the language I know. Well, Instagram was all Python. And for, like, some not super exciting reasons, we actually needed it to be in TypeScript to deploy it. And I was like, all right, that's going to be like, you know, in Instagram, we for years talked about moving from Python to PHP or Hack, the Facebook language, after the acquisition. And at least when I was there, we never did. But basically, we have a feature called dynamic workflows where you can have it, like, also break down the task into, like, a lot of sub-tasks. And I trusted it to sort of not just do the individual action, but here's, like, a whole language conversion of hundreds of thousands of lines of code at that point.

Starting point is 00:12:28 Go off and do it. Go plan it. Go execute it. Go verify the work. Double verify the work. And then I came back to the work being complete. So that level of, like, this is a big sort of chunky task. So you're basically saying it was faster.

Starting point is 00:12:41 It did it in an hour, you're guessing, with Fable, compared to what it would have been before. I think the main difference is in the past, it would be like, great, I did it. And you'd be like, you kind of took a shortcut here, or this is not quite right, or I need to go verify it, or, like, oh, you cut this corner.

Starting point is 00:12:58 It's like the managing interns thing that everyone's been saying for the past year. Which is very offensive to interns, by the way, but yes. Yeah, exactly. I don't know. Have you managed an intern? Yeah, that's true.

Starting point is 00:13:07 And you were saying it was more correct. So it's faster, it's more accurate, more reliable, and then, according to the U.S. administration, dangerous. I think the other piece is that it has, like, a greater, theory of mind is the wrong word, but sort of like theory of project, so that it's less, you know,

Starting point is 00:13:25 oh, I'm going to make this change, and it'll say, great, I'll make this change. But really, if you've done, like, software engineering at scale, the best engineers kind of keep in mind all the disparate parts of how this thinks, and they also see around the corners, like, I can make this change, but if I don't do it in this way,

Starting point is 00:13:39 then the next change is going to be incrementally harder. And I think that's, like, been a significant difference I've seen in that kind of class of model. So I think when we talk about Anthropic Labs, right, people think of Claude Code because it is really a breakout product. And it sounds like you've been tasked with basically figuring out what the next Claude Code is. Would you say that's an accurate description of what you're doing at Labs? And also why does Anthropic need Labs? Labs. Yeah. It's also maybe worth thinking about why we needed Labs in 2024 when I arrived and why we need Labs today, because I think that the answer kind of shifts. I started the Labs, the original Labs team, with Ben Mann, who's one of

Starting point is 00:14:18 the co-founders of Anthropic, in my third week at Anthropic, and it had been something that had been bubbling under. And at the time, the reason was really different. It was all of our product engineering, the team was 25 people, and we didn't have the models, really. Like, we had Claude, when I joined it was Sonnet 3, like, even Opus 3. Like, those were for their time good models, but you weren't going to, they were not even interns, right? They weren't even IC3 engineers. So if you have a team of only 25 or 30 engineers, they are working on, like, the next incremental thing.

Starting point is 00:14:48 And we were feeling like the models are starting to get better, but we don't have any products that sort of show that off. Like a good litmus test for me is when we get ready to release a model, do we have either a product or a demo or some other illustration of something that is very different? And it gets harder over time. Like with Fable, you know, even, like, illustrating that we can task it for this longer amount of work.

Starting point is 00:15:10 So really Labs at the time was, let's make sure our products don't fall behind the model exponential that's happening. And yeah, so Claude Code came out of that initial one because nobody in the rest of the product, people were thinking about coding, but nobody sort of had the, like, space to go and think about, well, what if we totally change the form factor and we embrace the fact that the models were going to evolve in this way.

Starting point is 00:15:32 And the two most useful thought exercises we do in Labs. One is, like, visualize the gap between what the models can do today and how most people use it. Can we close that gap? That's one. And the other one is, imagine what the models are bad at now that they're actually going to be really good at in six months. And let's make sure we have a product ready for that by then. I think those are, like, the two guiding questions for Labs. And then also out of that first incarnation came computer use. Computer use was different, though, because when we built it, it was really bad. Like, we tried a bunch of products with it. And this was around, you know, Sonnet 3.5, and we'd be like, Claude, can you help me, you know, clean up my desktop?

Starting point is 00:16:07 And it would, like, click the thing, it would delete the file. You're like, this is not safe for release. We're definitely not going to go and build this or ship this. But we had that product so that every new model that we'd release, we'd first check it internally and say, did computer use get better? And we'd tell the research team how it'd gotten better or worse until the moment where we said, it's good enough. We're actually going to put a product out around this.

Starting point is 00:16:28 It also gives you this sort of beacon into the future that then you can kind of measure your future products against. But then compare it to now. So we have a thriving product team. There's Cowork. There's, you know, Claude Code has grown a lot. We have our platform. And now I think it's actually much less about none of these product teams are doing this sort of thinking. And I think it's much more that the models are advancing really quickly. And even our capability to interact with them needs to evolve. So one of the things we collaborated on with Labs and Claude Code that we shipped today is Claude Code artifacts. Having Claude Code not just be able to type back to you, but also sort of draw a picture

Starting point is 00:17:06 or give you an illustration. And that partially came from spending a lot of time in Labs saying just a text box and a big text response is not going to cut it anymore. Like when I mentioned that the models feel like they're way smarter than me when they talk to me. Sometimes I'm like, can you draw me a picture? Because this is what I actually need to fully understand this. But it's really what we've been thinking about is, you know, yes, we have a lot more

Starting point is 00:17:26 products. You know, we actually have a lot of consolidation to do in our products. That's another initiative that we have. But within that, we still have an opportunity to make things much more accessible to a person that does not spend all of their time thinking about prompting and the exponential and the difference between high, low, and medium effort. Like, there's a lot we can still do there. But Mike, so it puts people using Anthropic models in an interesting place, right? You know, Cursor, I think, just sold for $60 billion to SpaceX. And someone put this meme on Twitter that, like, you know, Cursor would have sold for $300 billion if it

Starting point is 00:18:00 wasn't for this guy. And it's a picture of Boris Cherny, the person who created Claude Code. And so for companies that are going to build on top of Anthropic technology, you know, they're going to wonder, do I want to partner with Anthropic or is Anthropic going to go ahead and build the product that I'm going to want to build, potentially even after partnering with them. Yeah. I mean, we'll take the, like, agentic coding side, then I think the broader sort of aspect of, you know, being both a platform and a product, I think is really interesting. When we take on projects, the goal is often to sort of push that area of the industry forward. So, you know, there were AI coding editors, and some of them were really good.

Starting point is 00:18:42 But nobody was quite thinking about it in as sort of free-form a way as we got to think about it with Claude Code. And now a lot more products have that flavor than I think would have otherwise. And so I think, you can call me out on this, Alex, if we're ever entering an industry where we're like, all you're doing is the same thing everybody else is doing, but, like, you've got the Anthropic brand. I feel like that's a bad use of our time and a bad use of our either Labs or product team time. Like if we're going in somewhere,

Starting point is 00:19:05 it should hopefully be to say, all right, we think that the direction of travel is this way. We can build a product of that. And then by the way, there's no world, nor should there be a world where like all the products are anthropic products.

Starting point is 00:19:16 That'll be a bad world, right? So like, that is hopefully either creating new space for companies or sort of showing the way where other products can incorporate. Yeah. It would almost be like working for a tech company that has like social, messaging, video of the

Starting point is 00:19:28 right, Mike? Yeah. Okay. Well, there was some question, for example, when, you know, Anthropic launched a product that was seen as competitive to Figma, and you had been on the Figma board

Starting point is 00:19:41 prior to that, and I think you stepped down. Is that correct? Yeah. And so it's a good question that Alex has brought up, I think, where Silicon Valley is known for this really healthy, vibrant, risk-tolerant startup ecosystem, and when the big companies start coming in with tons of venture capital and, you know, a lot of resources,

Starting point is 00:19:57 people say, well, wait, are they essentially just going to steal my idea? Yeah. No, I think our dual existence, and it's something that other companies have to navigate, we talked about Amazon a lot in the previous panel, like, they have to navigate this role where they are both the infrastructure provider, they obviously have a very large e-commerce. They do video, but they also serve video. And then, you know, by and large, customers can live in that dual world of, like,

Starting point is 00:20:21 okay, I'm using their infrastructure, also knowing that they are also using their infrastructure to do that. And I think, you can talk to our customers and see how well we're doing it. The thing I always try to do is, like, at least approach it with a lot of transparency. So the Cursor example is an interesting one where, like, Michael and I talked a lot over the, you know, time around here's where things were heading. And, you know, similarly with the other products that we think about, like, can we, I think it's a couple of things. It's transparency. And then it's shared building blocks. Like, yeah, I think in general, and I actually don't think there's any cases where this is even true. Like, we're trying to build on top of the same capabilities that are available

Starting point is 00:20:52 Like we're trying to build on top of the same capabilities that are available. elsewhere. The last time I was here in the Commonwealth Club on the stage was our healthcare day at the beginning of the year and we didn't ship like Claude healthcare only we have it like nobody else has it we shipped a bunch of like plugins and skills and MCPs and like complementary abilities so that's how I'm not claiming it's easy or that it's a straightforward thing but it is how we're trying to navigate what is like admittedly a complicated sort of situation. Speaking of startups, Anthropic is still technically a startup but you're worth a lot of money. I mean what's

Starting point is 00:21:26 the latest valuation? Is it? 965. 965 billion dollars or something like that. I sold Instagram for a billion, right? Startup. In 2010, but I thought of us, yeah. Right. Financials have changed quite a bit since then. And yet Anthropic has positioned itself. It is, you know, a PBC, and it's positioned itself as sort of a more ethical company around building AI. And I'm wondering if you could talk a little bit about how you see that positioning, and Anthropic's role in particular, changing the culture of the Valley. I think back to how Google, in the beginning of the 2000s, really changed the culture of Silicon Valley in so many ways.

Starting point is 00:22:05 And how do you see Anthropic's culture now dictating this next era? Yeah, that's a really interesting question. Maybe I'll start, like, inside, and I think there's an external component, too. I think the reason I joined in the first place, so I was winding down my second startup, and knew I wanted to go work at a frontier lab because I'd started to use these models for coding, and they were bad at coding,

Starting point is 00:22:23 but I could see that they were as bad as they were ever going to be at coding. They were going to improve. And I had started building on top of these APIs. So the startup I was doing was called Artifact, and we did sort of AI-powered news recommendations. And I read a lot of Big Technology via Artifact back in the day. It was a lot of things we added. So, but not Wired. You know, you guys had a really hard paywall, to be honest. Fair enough. We didn't do great on my subscription. Because I can get one for you. Okay. It's actually really funny. Like, making deals. Yeah, making deals. It's the login cookies. It was, like, really hard to keep people.

Starting point is 00:22:57 I know, I know. Please escalate this to Condé Nast. I know. And email login is very hard to do in an hour. But I was building on top of the APIs and being like, wow, okay, they're able to do really interesting things. But what ultimately made me go to Anthropic was, like, they walk the walk and they really, like, deeply believe in trying to make AI go well for humanity. And that is, like, in the water internally, and I think has been why the company has remained as cohesive as it has even as we've grown.

Starting point is 00:23:29 And I think that it's, like, a testament also to the co-founders there on how often they are talking about this as well. It was a surprise for me coming from a world where at Instagram we did a weekly all hands, and we talked about product 95% of the time. And maybe 5% of the time we talked about something else that was going on in the world or around the company. Probably maybe underselling, or, like, go-to-market. Maybe it was like 80-20, but it was definitely very, very heavy on product. And I remember at Anthropic about six months in, myself and Kate Jensen, who's one of the leaders in the sales organization, did a joint all hands where we talked about our, like, you know, how we're doing product and go-to-market together. And people were like, this is so great.

Starting point is 00:24:04 I finally understand our product strategy and, like, what we have been doing. It's like, oh, right, this is not, quote, unquote, a product company. You know, it is a very mission-driven AI company with, like, a very strong sense of why it exists in the world. I think in terms of the overall impact on the Valley, it remains to be seen. I think positive signs that I've seen, or interesting signs that I've seen, are a renewed interest in philanthropy across the board. And I think that's something that has been written about. And I think it will be an interesting sort of outflow.

Starting point is 00:24:36 Again, who knows how all of this goes. But depending on how it goes, it could mean a lot of interesting new sort of philanthropic deployment. And then I think the other piece is, you know, the conversation around how AI could or should go is one that is happening in real time with the technology versus retrospectively, which I think has been the case for other technology waves. And I think that is a good thing.

Starting point is 00:26:03 Mike, you know, you talked a little bit about Anthropic has this gap that it sees between the capabilities of the models and where everybody is building products. And with labs, what you try to do is get ahead of that so you can show people what AI might be able to do now and six months from now. So please tell us. Please tell us what you're building, where you see the potential, and what people should be on the lookout for.

Starting point is 00:26:44 Yeah, throw a roadmap. Oh, if I can throw an ant. What you're building now, but also if you have a pie in the sky, like, Elon Musk data centers in space type ambition, I want to hear about that too. Tell us everything. Great. We have 13 minutes. Go. Exactly. The rest of the monologue in my product. I think maybe two themes I'm really excited about that we've been exploring a lot. The first one is giving Claude an environment where it has more agency and it also has more self-knowledge. And I'm going to unpack that, because that's like a lot of AI words. But I'll give you an example of where we are currently doing a bad job of this. Like if you are in a Claude project and you make a file with Claude, you're like, that's great.

Starting point is 00:27:27 Can you add it to our project? Claude will be like, no, you have to go download the file and go drag and drop it into this thing. And you're like, what? Until yesterday, I would have said the same thing about Claude Design and Claude Code, where if you're in Claude Code, you're like, cool, like I need a design for this thing that we're building. Or you're in Claude Design and you make a mockup and you want to go build it. It'd be like, cool, here's a zip file.

Starting point is 00:27:47 And you're like, what? And I think, so that's a little bit of interoperability. But in general, this theme of giving, if you give Claude a lot of notion of its environment, I was talking to actually a customer, like an API customer, and one of the things that they were experimenting with was actually even giving Claude, like, a secure version of their source code

Starting point is 00:28:05 while it's running in the agent loop in their product so that if it hits an issue, it doesn't go like, I don't know, I hit an issue. It can be like, well, it's probably this thing, you know, at least when it's talking to one of the sort of maintainers of the software. So that overall theme, and of course you have to do it with safeguards and be really careful about what you unlock with it. It sounds kind of obvious, but it's actually night and day in terms of how expressive these products end up being able to be.

Starting point is 00:28:28 And you can even see it going from maybe like core chat or classic chat in Claude.ai to something like Co-work, where it's got a little bit more agency and it has a runtime and it's able to sort of understand a little bit of its environment. But I think we are at like 10% of the journey about where we could go. Actually, one of the reasons I think people got excited about things like OpenClaw is seeing a harness that is modifiable and that you can talk to about things. And you don't ever get the sense of like, oh, sorry, I can't do that. You're going to have to go to this settings screen and turn it on. It's just a thing it has access to, and hopefully with like the right guardrails and permissions. So that's like theme one that I'm like extremely excited about. And I think if we do it right, it should actually like transform all of our products like from head to toe.

Starting point is 00:29:11 So the other piece is, and I'll maybe like share like the, the, not the internal product we're working on, but like the phrase I got as feedback was, like, I think closing the gap, I mean, I talked about closing the gap between capabilities and reality. I think it's also closing the gap between how people understand their own work and then how the actual day-to-day is to do that work. I was talking to somebody internally who's on our privacy team. And to move a ticket from like one queue through another one, via the task tracker into another one was like eight different steps of copying and pasting, of like manually moving a pretty, you know, like kind of annoying to have to do, probably error prone. I have to like keep spot checking it.

Starting point is 00:29:55 And we helped her with one of our labs projects to basically like make that not a pain. And she's like, ah, this is the first time in my career. And she's like been working for 30 years. Where, like, what's in my head and what I am using is like now this. Like it is now closed. And I want to like bring that feeling to everybody. Like, you know, of course Claude unlocked a lot of, you know, non-technical people being able to code. But like, we're still asking people to understand way too many concepts of, like,

Starting point is 00:30:19 what is, you know, the difference between like my sandbox environment and production, or like connecting MCP as myself or others, or how should I store data? And of course you can't abstract everything, but like if you combine both of those themes, if you give Claude a lot of self-knowledge and you're creating an environment where it can work and actually solve complex problems for people in like repeatable ways, I get like very, very excited about that. And your moonshot? Not letting you off the hook, what's your moonshot? Moonshot? Yeah.

Starting point is 00:30:45 Nothing in space, although I guess we're, you know, we're talking to SpaceX about spacey things. But you're talking to SpaceX? I mean, those are, right, right, right. For compute, yeah. It was exploring extra orbital. What was the phrase? Something about exploring like post-orbital world things.

Starting point is 00:31:04 Definitely not my department. But yeah, there's stuff in. Are you? So Labs isn't working specifically with the team on compute? Right, exactly. Or chips. Separate.

Starting point is 00:31:12 Totally separate. Okay. Okay. So your moonshot. Yeah. Do you personally believe in data centers in space? I had a conversation. I'm far from a data center expert, but I talked to somebody who is a person who sends things

Starting point is 00:31:24 to space who is not Elon Musk. And that's what you would say. Yeah. And they were really bullish and I was like trying to talk about why, and it was basically like effectively infinite power if you convert it well and, you know, infinite land. And I was like, okay, I can buy that. I mean, I think they feel good about the shielding you have to do.

Starting point is 00:31:48 Again, clearly not my area of expertise. But after that talk, I was like, okay, I see it, you know, even if it's going to take a few years. At first, I admittedly thought it was a crazy idea in general. But now I'm like, oh, I actually really can understand why this might make sense. When you were talking earlier about the ways that the work in Claude is going to get compressed in all those steps, I couldn't help but think of tokens and how, you know, maybe it's good for your business model in the short term.

Starting point is 00:32:12 If people have to take so many steps and use so many tokens, but tokens have become this unit of economics that we're using to describe the industry now, and people are token maxing, and now they're tokenizing, and one, I want to hear where you sit on that spectrum, if you're a token maxer. And two, is there a near future in which the industry is not actually measured by tokens? You know, it goes the way of MIPS or dialogue or some other, you know, there's some other unit of measurement that actually defines the economics of this era. Yeah, I think both of those are really interesting questions.

Starting point is 00:32:46 It was interesting earlier this year when you started hearing about, like, companies that have like dashboards showing, like, who used it the most. And we, of course, have, like, internal metrics as well. And we found that there's not a lot of correlation between, like, the person who's using the most tokens and, like, the person that I, like, it was an interesting thought actually to do at your companies, like, write down the 10 people that you think are most productive, and then, like, get your top 10 token users and see how closely

Starting point is 00:33:08 they correlate, at least for us it wasn't that correlated. It seemed dangerous to sort of like purely glorify the like maximum usage. Obviously it's like very gameable. But even beyond that, I think it's, you know, yes, you can ask Claude to do 10 different variants on something. But if you thought about it deeply, maybe you would choose two that you thought were most promising. And the third one if you then had like some iteration on that as well. So I would not say I'm like a token maxer. Actually, the tokeniest thing was that conversion thing I did, which was just like a couple million tokens. There's like a lot of tokens that it took to convert the thing from Python to TypeScript.

Starting point is 00:33:41 But I think people are being more thoughtful about these different pieces. And one of the things we look at, whenever we look at a model launch, is not just model intelligence, but we're also really thinking about model intelligence and effort and token efficiency as that combination. And I think that's a big lever we have to improve, is how do we continue to be more and more token efficient for a given task so that you can also, hopefully you don't have to think very hard about this.

Starting point is 00:34:04 We can do this automatically. But we're able to tune the solution to the problem a little bit more. And then to your second question, yeah, I, you know, when I was still in the CPO seat, I was thinking a lot about sort of outcome-based pricing as something that would be really interesting to do. If you could do it, of course, if you talk to like the Sierras and Fins of the world that have like a really clear, like we kept this, you know, we were able to solve this customer request and not have it get escalated. Like, that's really clear. It gets so much fuzzier on these, like, tasks that we actually ask Claude these days.

Starting point is 00:34:33 Like, I had a strategy document. I used Claude to critique my strategy document. Like, what was the outcome? It's like, well, I don't know. It's like, tell me how the strategy goes six months from now. It feels like it's going to be very hard to capture that as well. But I would like to see some more experimentation around, can you better capture what it's worth to the individual, or the company,

Starting point is 00:34:53 and then can we find the best way to do that as well? And I guess the most concrete thing we've moved towards on that is we have a product called Claude Vantage agents where we'll run all of the infrastructure for you in terms of doing all of the, you know, agentic harness and calling the tools, et cetera. And you can either do it in sort of the normal mode, which is you give it tasks, it will go through tokens, it'll tell you when it's done, or we have an outcome-based mode where you can say,

Starting point is 00:35:15 here's what good looks like, here's a rubric, go and do it and it'll go off and make it more outcome. So like if everybody had moved on to that API, then I think maybe we could have a different sort of output-based pricing, but we'll see how that gets adopted. John or the guys in the back, do we have the random image? Can we show the random image? If we can, great.

Starting point is 00:35:34 I'm excited, the random image. Oh, here it is. Okay, it's just because we didn't have a good label for it. So we just called it the random image because it might come up at any point. But this is a chart from the Financial Times. Speaking of utility, it shows the amount of app releases that have come out, which are skyrocketing, and then apps with significant usage, which seem to be going down, and app reviews, which seem to be going down. So Mike, I'd love to hear you respond to what we're seeing in the image here. Is it possible that like everybody's coding

Starting point is 00:36:03 and releasing, but we're not really seeing a big boom in productivity? That's really, I mean, I think there's definitely a power law in app usage in general. It'd be interesting to see if any of those app releases became one of the apps with significant usage. We could take it. We could take it down. Yep, go ahead. I think it ties into something I've been thinking a lot about, which obviously my background is in consumer, and I've been wondering what the consumer AI breakouts will end up being. And I don't know that we've seen a lot of them yet. And I think part of it is, you know, I don't know how far back that chart goes, but when we were releasing Instagram, it still felt a little bit wild west in terms of the apps.

Starting point is 00:36:38 People were excited about apps and like two kind of random people released an app where we were able to get to like number one in photos and video within three months, right? I think that is much harder now when you think about how consolidated the top 10 is and how much time is spent on like the TikToks and Reels of the world. It's a lot, right? And so I think getting that breakthrough consumer experience I think is really, really hard. So I think that is as much a story about how sort of consolidated consumer products are these days, number one. Number two, how entrenched or how powerful it is to have

Starting point is 00:37:13 that sort of data, data gravity, like the data gravity of something like your Google Docs or in your Google Doc. So even if somebody has a like 2X better AI-powered, you know, doc editor, you're going to move all your stuff? Maybe, probably not. So I think that it speaks to, you know, the things that are sticky. What I think about a lot is like the hard stuff is still hard, like making something people want is still really hard. We have amazing models internally and not all of our products work, right? And so I think that's a bullish sign for product people like me because it means that I think we hopefully still add value.

Starting point is 00:37:46 But I think that chart is maybe another place. It's harder in many ways than ever to break through, even if you can code more quickly. And could we have done Instagram in a month instead of three or four? Probably. But we got there after like a long, winding, twists-and-turns process. You had, I think, 18 people at Instagram when you sold it? 13. With these tools, do you think you would have, how many people do you think you would have had?

Starting point is 00:38:14 It's really interesting because of those 13. Just give us a number. Everyone's like, oh, one billion dollar, one person startup. How close could you guys have gotten? I think we could have gotten there with like four to six, you know? Okay. Yeah. Or the thing that we would have done, if we'd grown it,

Starting point is 00:38:31 we'd be able to do things in more than a single track. Like, Instagram was, if you ever watched my five-year-old play soccer now, by which I mean like the ball is there and every single person runs to the ball? Like that was our product team. It was like, video, go. And everybody like goes and works on the one thing. And like we'd be able to like play positions. Like Android we built in about a month for Instagram.

Starting point is 00:38:50 We could have done it probably in a week with the models. And to build Android, we took everybody off iOS. And we all like relearned to code for Android. And then we went off and did that, and like for that whole month we were barely shipping updates on iOS. So I think you can be a lot more, actually a really good example. There's a labs project I have internally that helps accelerate how Anthropic engineers like code and do code review.

Starting point is 00:39:12 And that project, I am maintaining an iOS and an Android version of, and I basically have the Claude that works on the iOS one basically like ping the Android one and be like, hey, I implemented this. Sorry, Android users, it's still the second one, even in the AI world, sorry. And then the Android version is like, okay, I'm going to do this. Oh, that doesn't count because that feature doesn't make sense here. I'm going to drop it. And of course, we wouldn't have been able to delegate all of that on Instagram,

Starting point is 00:39:39 but we sure could have done a lot by having sort of platform parity. Like this dream of platform close to parity is now actually quite doable. You're probably going to get calls now from the remaining six or seven people on your Instagram team going, was I, did I make the cut in the new era? Also, it sounds like you probably could bring Gotham back now if you really wanted to. I forget if we eventually, I mean, I think for April Fool's maybe we brought it back for one day. Yeah.

Starting point is 00:40:02 Do we have time for one more question? Yeah, yeah. My last question for you is you worked on a product that now as it has evolved is in many ways ethically fraught because of some of the harms that people are concerned about with children. And when you talk about the fact that there hasn't really been a big breakout consumer app for AI, I think there has, and it's chatbots, right? And chatbots have also led to some real dangers and harms for young people. And so when you are building in labs, how are you thinking about the, you know, the potential

Starting point is 00:40:33 harms and the risks that come with just making this technology that much better? Yeah. I mean, I think there are certainly products that we have either prototyped or conceptualized and been like, this product, this sounds so hypey. I hate this. But like this product, if shipped, would be bad for the world or like would nudge people in the wrong direction. Or even if we did it right, the like wrong or like more morally fraught version of this would

Starting point is 00:40:55 be actively, we think, bad. And so I think asking that question a lot internally makes a difference. And it's a luxury to have core products and models that are doing really well. So that's in some ways an easy decision, even if we think it could get a lot of use. But yeah, I think going back to an earlier conversation, I think front loading it is really valuable and really thinking through, like it is now more normalized to have people at a company, and definitely Anthropic does, like economists thinking about the impact of the thing that you're building on the world, and that just was not the case in the early years of most of social media, I think. Mike, it's always great to speak.

Starting point is 00:41:33 Thank you again for bringing your insight today. And let's do it again soon. Let's hear it for Mike and Lauren. Thank you. Great job. Thank you.
