Limitless: An AI Podcast - The Good and Bad of AI Coding: Amazon Shuts Down, Autoresearch, Claude Code Review, Lovable

Starting point is 00:00:00 Last week, Amazon's entire platform crashed for six hours. No one could shop or buy anything. They couldn't even see prices. The reason was that a junior developer had submitted an AI-generated piece of code, which crashed the entire platform, and it cost them millions and millions of dollars. Now, Anthropic, the creators of Claude Code, which is what Amazon was using to create AI-generated code, also had a similar issue where their entire platform has been suffering from outages this entire week. They actually also released a new product called Code Review, which uses AI to help fix the code problems that their own model is creating.

Starting point is 00:00:39 It's all getting incredibly complex right now. And Amazon tried to hide the entire thing from a Financial Times reporter. It's all pretty crazy. And it forces us to answer the new narrative, which is AI generated code isn't going anywhere. Demand is at an all-time high. But have we hit a wall? Has it become too dangerous to use AI to code? Yeah, and there's this interesting phenomenon happening as these AI coding abilities in general become better, where currently they account for what, about 4% of total GitHub commits.

Starting point is 00:01:06 The expectation is that by the end of the year, they will account for 20% of total GitHub commits. And there is this increasing reliance on these AI tools, but that creates these key choke points and points of failure that have a significant effect. I mean, this Amazon outage was a huge deal. That was pushed by a single person. And then with Claude Code, they've been having a lot of downtime. And with Anthropic, I mean, we were trying to use the programs today and they were down. The services weren't working quite right. So there are a lot of these growing pains happening. And it seems like we're running into more issues faster than people can mitigate them.

Starting point is 00:01:38 So that's where I suspect this Claude Code checking agent comes in that we're going to cover later. But this Amazon story was pretty fascinating. Like, this cost Amazon billions of dollars. Yeah. So let me walk you through the timeline for this, because this actually isn't the first outage Amazon's experienced because of an AI-generated piece of code. So back in early December, actually November of 2025, they took a really hardline approach and a new policy was invoked, which was, I want 80% of all Amazon's code to be AI generated. And this was their goal to be achieved by the end of '26. Now, this completely flips from a company that I think employs like hundreds of thousands of engineers and wants them all to

Starting point is 00:02:24 kind of like hand write or hand type the code. So this is a pretty aggressive flip. Amazon's been laying off tens of thousands of people. So this kind of is in trend with what they wanted to do. But in the middle of December 2025, they experienced their first outage. It was 13 hours back then. So this is all adding up, by the way.

Starting point is 00:02:41 We're talking about tens of millions of dollars. Then in late 2025, so at the end of December, there was another outage. And then we have the outages that we're speaking about today. And the issue that this flags is, although AI is like a really useful tool to generate code and it finds bugs, it actually might create more problems than you'd expected because the issue that they're seeing is

Starting point is 00:03:01 junior developers that come in that don't understand Amazon's code base just kind of use AI, like pull up Claude Code to run autonomously, figure out what they want to create, and then they just submit it without actually reviewing and understanding it. And if this goes unguarded, it results in issues like this. Yeah, and there's the problem that's starting to happen now where agents are creating code far faster than humans can keep up and check it. So it becomes this impossibility where if you want to move at the velocity that AI enables, you are simply unable to keep up with the change log of what's happening. So you have to defer some sort of trust level to this AI, to its ability to check itself, to run tests to verify that it actually works. And in

Starting point is 00:03:39 some cases, it doesn't. I mean, I'm sure some people listening to this experienced the outage. And it wasn't just an AWS outage. This is Amazon, the actual storefront, where my dad actually texted me, he was like, am I doing something wrong? Can you place this order for me? Because it's not going through. And then I went to check myself, and the whole service was down. I went on Twitter. I saw Casey Neistat was posting a bunch of photos about this. It was this big deal because for anyone who was trying to order anything from Amazon, the entire storefront was just totally offline. So AWS, the web services, runs a significant percentage of the internet. Amazon, the storefront, runs a significant percentage of e-commerce. And these things have been going down

Starting point is 00:04:12 at an increasing rate due to these tools like Claude Code. So they're moving much faster, but they're breaking things. This is like the early Zuck Facebook mantra. It's like move fast and break things. They're now taking it to the extreme. And as we become increasingly reliant on these tools, we might see this start to, I mean, permeate even further than just Amazon. Well, the weirdest part about this is that AI kind of has moved from this assistive tool to now being the foundational bedrock for a lot of these different, like, services that you just mentioned. Like, AWS runs the entire internet. Every single business that you interact with online probably uses AWS on the back end.

Starting point is 00:04:46 So if they go down, then your business goes down as well. That's where we get all these outages. Amazon had some pretty severe repercussions. They have now done a complete 180 on their policy of generating 80% of their code via AI. And they've now said, if you're a junior developer or engineer working at Amazon, you now need to get your manager's permission to submit code. So this is so tough because you've now gone from trying to automate the entire thing to now making it incredibly worse. So you're using AI to kind of generate the code, but then you still

Starting point is 00:05:18 have to rely on a human constraint. And there are fewer managers than there are junior developers. So it all just becomes really bogged down as a pipeline for shipping changes. So I think this is going to slow down Amazon massively. But if it's for the result of, you know, saving your company tens of millions of dollars, fair play. But yeah, it's a pretty hard-line approach. And this all seems kind of connected to the trends that we've been seeing, right? Particularly around jobs and distribution.

Starting point is 00:05:41 It's like a lot of people are cutting jobs, which means they're increasing the reliance on these AI tools. But if the result is that the actual output of these AI tools is causing a lot of damage, it creates this tension. And it seems like the AI labs themselves, Anthropic, OpenAI, they cannot hire talented people fast enough. They are looking to hire every engineer under the sun who is confident and capable of building great products.

Starting point is 00:06:05 But a lot of other companies are deferring that workload to these AI labs, which is starting to create this interesting dynamic where the job market is suffering, but the actual productive output of these companies is not, to an extent. And that's operating under the assumption that these code tools will get better. But if Amazon and a lot of other companies

Starting point is 00:06:21 start putting in these thresholds that, again, bottleneck and throttle that amount of AI production capability, like that probably doesn't help the cause a whole lot. There's also the angle that AWS, or just data centers in general, are becoming like a really valuable or national security threat level asset, right? So if we look at the current Iran conflict that's being waged between the U.S. and Iran, Iran actually targeted strikes specifically at Amazon's AWS centers out in the UAE and Bahrain.

Starting point is 00:06:55 And this caused a bunch of the outages as well. So it's not even just like AI coding level threats. This is becoming a geopolitical weapon at this point. It makes sense. Every single story we tell, like this Game of Thrones narrative that's kind of existed throughout the history of Limitless. It's just elevating itself higher and higher to the global stage, where now it's impossible to have any sort of conflict or large decision without AI being in the middle of it. Like when you think about this conflict, what are the bigger things? It's this like Anthropic deal with the Pentagon. They're now

Starting point is 00:07:24 striking the AWS data centers. It's coming for the infrastructure that's building these tools that are so powerful. But this isn't just happening on the worldwide stage. This is happening on the consumer and commercial level too. You were mentioning something about Lovable and how much they've grown recently. Could you just explain Lovable for the people who aren't familiar and how big they have grown recently? It's crazy. Okay. So typically the people who code, this might shock some of you, are people who have learned to code. It's very technically focused work.

Starting point is 00:07:50 But there are people like you and me and a bunch of our friends that don't necessarily know how to code but still want to do that. And that's what resulted in vibe coding, right? Now, Claude Code and products like Cursor are still very oriented toward technical folks. A platform like Lovable is for people who have zero experience coding,

Starting point is 00:08:09 but still want to code things. So the Lovable platform is actually a really cool platform where you just type whatever you want. Like, I want to create an app that tracks my fitness reps or whatever that might be, or suggests cooking recipes for my type of cuisine. And it does so in a couple of minutes. And it's really intuitive. It makes it super easy. And they have the additional perk of being able to deploy it live as a website or an app that you can

Starting point is 00:08:31 publish on the app store super easily. So they have this end-to-end experience. Now, Lovable, when it started, had like a rocket-type trajectory. Within the first 12 months, I think they hit $200 million ARR, which is just like insane growth. All of these AI companies are blowing up. Now, in the last month alone, they've added an extra $100 million of ARR. They were at $300 million. Now they're at $400 million. So that's like roughly just over 30% growth. Just an insane amount of demand.
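As a quick check on the "just over 30%" figure, here is a minimal Python sketch of the arithmetic. The $300 million and $400 million ARR values are the ones quoted in the episode; the function name `growth_pct` is just an illustrative choice.

```python
def growth_pct(before: float, after: float) -> float:
    """Percentage growth from `before` to `after`."""
    return (after - before) / before * 100


# ARR figures as quoted in the episode, in millions of dollars.
arr_before = 300
arr_after = 400

print(f"{growth_pct(arr_before, arr_after):.1f}% month-over-month growth")
```

Going from $300 million to $400 million works out to about 33%, which matches the "just over 30%" estimate.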

Starting point is 00:09:01 And that's the point I want to make. Like when I look at these Amazon outages and Amazon restricting their developers from using vibe coding to improve their product, I don't think this is necessarily a wave that they can stop. So they should stop trying to implement policies that are restricting people from using it and instead try to figure out what is a better way to implement AI-generated code, because this thing's not going away. Lovable, Cursor, Claude Code are all skyrocketing and we're not going to stop that. And I think a lot of the, I mean, aside from Claude Code, right, Lovable and Cursor, they're kind of model agnostic. They'll use whatever model you put in. So I was looking

Starting point is 00:09:37 at Polymarket earlier today because they have markets for which one of these models is actually going to be the best at which time. And this is interesting because if you looked at these charts towards the middle of January, you noticed that Anthropic was kind of the favorite. Anthropic was looking like they were going to have the best model. But the reality now is that by March 31st, there's a very clear divide that has happened over the last six to eight weeks, with OpenAI having an 85% chance of having the best model. Does that mean there's going to be something new or is that just the current model? We don't know. Currently they're at 5.4, which is fantastic. I think everyone has kind of unanimously decided that it's the best for coding. But then there's also a fun

Starting point is 00:10:12 Polymarket here about Claude going down because it appears as if Claude is really just not doing as well as it needs to be. It's not as stable. We were trying to use it this morning. It wasn't working. They're clearly experiencing growing pains because they have all these new users and they're having to throttle reasoning. They're having to limit the amount of searches that people are able to use. I'm sure the researchers who use these to train the models are not happy. And it looks like, based on Polymarket, we'll be seeing somewhere between five to nine more outages

Starting point is 00:10:41 in the month of March. Yeah, basically every single day this month. Yeah, like it seems like they're having problems and this is proof. I mean, you can actually bet on the problems that they're having and make some money on it, right? This is what, $26,000 of volume? So thank you, Polymarket for sponsoring this segment.

Starting point is 00:10:56 And I guess now we can get onto the solution, which is what Anthropic is starting to roll out. But they're doing it in a way that's a little less than desirable. Yeah, so the main instigator of this entire AI coding debacle is Anthropic. They created Claude Code. Amazon owns 21% of Anthropic, which is just crazy to say out loud. And so they use Anthropic's model, Claude, to do all their AI code generation.

Starting point is 00:11:23 But Anthropic decided, hmm, instead of trying to make our AI coding models better, let's just release a new product called Code Review, which will review your AI generated code. So you now have Claude generating the AI code and then also reviewing the code as a separate product that you pay for. And it's all for the beautiful price of, I think it's like 15 to 20 bucks. Where is it? Yeah, 15 to 20 bucks per review, which some might say is quite expensive. And they scale based on the pull request complexity. So that number can get higher.

Starting point is 00:11:53 And what I find funny is they're selling the problem, where they're selling you a model. You use it to code. It's going to create problems. And then they're selling you a separate package for the solution where, oh, you have problems in our code. Well, here for another $25, $50, whatever it may be, we'll actually do a code review and we'll fix the problems that we've created. And that creates this uniquely disincentivized incentive where there's no real reason

Starting point is 00:12:18 for Claude to want to fix problems when they could just sell this package on top to remove the problem. And this comes in the face of ChatGPT and OpenAI doing the polar opposite, where they're actually offering these for free. And here's Rohan from OpenAI. If you want AI code review but don't want to pay $25 per review, check out Codex review. It leverages frontier Codex models, finds complex issues, and is 100% usage-based. Most runs should cost $1 or less.

Starting point is 00:12:44 So here we have another head-to-head collision happening where ChatGPT, OpenAI, is going head-to-head with Claude Code. And it seems like Claude Code is the favorite for a lot of people, but I've noticed a lot of the more technical people on my timeline who are building pretty hardcore things have been really relying on 5.4 and using Codex much more. Yeah, so to set the scene, OpenAI and Anthropic now are neck and neck at building the best coding models. And Codex 5.4 from OpenAI has taken the lead. A lot of engineers are now saying it's way better than Opus 4.6, as you just mentioned. In terms of these security code review products that each company offers, OpenAI's is instinctively

Starting point is 00:13:23 cheaper. If you have a $20 a month subscription, you now get access to this code review tool for no extra cost. So it's automatically a better tool to use. Now, in terms of quality, in terms of how good it is at code review, we don't know because they haven't made any of their data sets public. Anthropic has, and it's pretty damn good. Also, if you compare Anthropic's product to traditional SaaS vendors that have these kind of appsec security tools, it is still way cheaper. A lot of people were in my comments, actually, because I tweeted about this saying it was cheaper. They were like, no, it's not. Like, I could use the SaaS tool. And I'm like, yeah, but you're not factoring in the engineering

Starting point is 00:13:58 hours. They spend like 30 to 60 minutes per bug. That adds up to about $100. So it's still technically cheaper, but I would rather use OpenAI's tool, which is just way better. Now, Anthropic itself is facing so many outages for Claude. You mentioned earlier that, you know, Claude was out for you this morning. It was also out for me this morning. And I think this is because OpenAI basically has all the money and compute to subsidize the cost of a tool like this, whereas Anthropic is struggling. They're adding a million users to their general user base every single day and they're not able to subsidize people's compute anymore. So they're just shutting people's access off to it. And it's kind of frustrating to see, to be honest. Yeah,

Starting point is 00:14:37 and the way you have to think about this is each one of these companies has a limited amount of compute. And that compute takes care of everything. It has to serve the customers. It has to serve the developers, the researchers. And a lot of times when you're developing these new technologies or you're training new models, the training run consumes a tremendous amount of GPU power. The researchers want a lot of GPUs to run tests and run trials on. But you still have to find extra resources to serve all of the users who are querying this model every single day. And I was recently listening to Dario on, I think, the Dwarkesh podcast where he was talking about how they think about how many GPUs to order into the future.

Starting point is 00:15:11 Because they go to Jensen, they go to Nvidia. They say, hi, Jensen, we would like to place this amount of GPU purchases for this year. And what he was mentioning that I found interesting is that they can't over-order because if they're off by just a small percentage, the incremental cost of those GPUs will far outweigh the revenue that they get from growth. And what it sounded like he did is he was just taking the current growth and mapping it to the future to hedge themselves against this collapse from debt that they would have from over-ordering. But the reality is that between when that episode was recorded and now, they have gone fully vertical. Like that curve has steepened

Starting point is 00:15:48 significantly more. And they're going to have to figure out a way to solve this because this is not an easy thing. These orders get placed years in advance. They're projecting for a certain curve and they're getting a different one. So is this a short-term thing? Is this going to be durable? I don't know, but it is noteworthy that Anthropic's having some growing pains. Okay, so we have two problems here.

Starting point is 00:16:07 We have an amazing AI model that can code, and that is inevitably going to be the future. Lovable's demand proves that. But then we're also running into the issue where we have all these security flaws and we could lose tens to hundreds of millions of dollars. Amazon demonstrated that, right? So we're at this conundrum.

Starting point is 00:16:22 How on earth do we solve this? Do you know what the solution might be, Josh? It might be AI. It might be AI itself. That checks out. You have to use AI to solve both of these individual AI problems, I'm afraid. So Andrej Karpathy, the godfather of AI, as I like to call him, released this really cool experiment that has just completely blown up,

Starting point is 00:16:43 which hints at what the future solution might be to solving this problem that we just explained on this episode, which is called autoresearch. Now, the best way to think about this is he created an AI model that acts as an AI researcher, but the coolest part about it is it autonomously does research. Now, typically, when you use an LLM, you need to prompt it. And then it says, I don't really know what this means. Can you tell me what it means? And you have to keep working back and forth. This thing runs completely on its own overnight, and in some cases, for days at a time. He's done this overnight when he slept for like nine to 12 hours, and then he's done it again for two to three

Starting point is 00:17:20 days. And how the AI model works is he sets an objective. He says, I want you to try and improve this part of your model. So he's talking to the AI model and saying, you need to improve yourself in this particular way. Go away and figure it out. The model then runs an experiment every five minutes. If the experiment comes out with an improvement, it cashes it in. It improves its own model weight. It improves its own internal insights, right? If it doesn't, it discards it and it tries again, and runs these experiments on and on and on and on. In this experiment, I think he ran around 150 experiments overnight, and it improved itself pretty marginally,

Starting point is 00:17:56 as is demonstrated by this graph, the loss just went crashingly down. So it suggests that maybe Anthropic and OpenAI, instead of having to rely on releasing new tools every day constrained by humans, they could just let the AI improve itself and everyone's hunky-dory, maybe. This seems like an early form of this take-off situation in which AIs become self-improving. Because I haven't really seen

Starting point is 00:18:24 this architecture before in such an open and easily accessible way. I mean, this is all open source. This, I mean, Andrej says here it's 630 lines of code. And traditionally, it's on the human to iterate on the prompt, to improve the prompt, to improve the training code and to work together with the AI. But this actually does it completely and entirely on its own. It goes off, it runs its own experiments. It ingests those experiments, updates the values that it has, updates its view of the world, and then creates a new set of experiments and runs this iterative process. And what we saw in that chart is that there were these incremental improvements that Andrej just didn't see. And again, it's hard to overstate how much of a powerhouse Andrej is in terms of AI engineering and understanding.
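The loop described above (propose a change, run a short experiment, keep the change only if the result improves, repeat) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the actual autoresearch code: `run_experiment` is a hypothetical stand-in for a short training run and just evaluates a made-up loss function, and `propose_change` perturbs a setting at random, where the real project reportedly has an AI agent deciding what to try next. The config keys `lr` and `width` are likewise invented for the example.

```python
import random


def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for a short training run; returns a loss.

    Toy objective (an assumption): lowest loss at lr=0.3, width=64.
    """
    return (config["lr"] - 0.3) ** 2 + ((config["width"] - 64) / 64) ** 2


def propose_change(config: dict, rng: random.Random) -> dict:
    """Hypothetical proposer: randomly perturb one setting.

    In the setup described in the episode, an AI agent would choose the edit.
    """
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    if key == "lr":
        candidate["lr"] = max(1e-4, candidate["lr"] + rng.uniform(-0.05, 0.05))
    else:
        candidate["width"] = max(1, candidate["width"] + rng.choice([-8, 8]))
    return candidate


def auto_research(initial: dict, n_experiments: int, seed: int = 0):
    """Greedy keep-if-better loop: only improvements are retained."""
    rng = random.Random(seed)
    best = dict(initial)
    best_loss = run_experiment(best)
    history = [best_loss]  # best loss seen after each experiment
    for _ in range(n_experiments):
        candidate = propose_change(best, rng)
        loss = run_experiment(candidate)
        if loss < best_loss:
            # Improvement: keep the change and build on it.
            best, best_loss = candidate, loss
        # Otherwise the candidate is discarded and the loop tries again.
        history.append(best_loss)
    return best, best_loss, history


if __name__ == "__main__":
    start = {"lr": 0.05, "width": 16}
    best, best_loss, history = auto_research(start, n_experiments=150)
    print("starting loss:", round(history[0], 4))
    print("final loss:   ", round(best_loss, 4))
    print("best config:  ", best)
```

Because a candidate is kept only when its loss is lower, the best-so-far loss can never go up, which is why a chart of it looks like a staircase heading down. The 150 iterations mirror the number of overnight experiments mentioned in the episode.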

Starting point is 00:19:02 Dude, he has two decades of experience in this. And like, AI, what? Like, wasn't even around two decades ago. Like, he was the guy who was building the damn thing. Yeah. So for him to come along and for the first time ever experience this novel breakthrough in which an AI is able to improve his code in ways that he didn't initially see. It's like when AI played Go for the first time or when AI is playing chess and it makes

Starting point is 00:19:23 these erratic moves, you would not have known that it was optimal, but the model is so precise, has so much context and so much intelligence that it's actually able to make these optimizations that humans otherwise wouldn't have seen. And this is one of the earlier instances where even Andrej got humbled. And the code that he was running was significantly improved overnight by this self-recursive thing. And the most interesting thing about this is it's open source. It's available to run on your laptop, on your PC, on a computer right now. And I have to imagine that AI labs are going to take this and run with it at scale, where if you're training a model, why would you not? And this model using this autoresearch can do better than Andrej. I'm sure it could

Starting point is 00:20:03 do better than the average mid-tier, like, prompt engineer at Anthropic or ChatGPT or OpenAI, and actually improve the entirety of the company. So this feels like something that is starting from the bottom, in that it's openly accessible, it's open source, and it'll move its way up through everything and become this like autoregressive self-improving loop. It's incredible. I want to try it on something. We need a problem that's hard enough to try this on where you can truly let it run for a series of days and then come back with an answer that's better than anything you could have ever imagined. Well, maybe research on how to make Limitless the best and number one podcast in the world. Maybe I'd be down to run this.

Starting point is 00:20:37 That's probably, we should try that. We should try that. We should make some assumptions on what it would say, right? It would say, well, we probably need to convince all of our listeners to go subscribe to the podcast on their favorite podcast player, give it five stars, share it with all of their friends, and then make sure that they comment the things they either like or don't like about it to keep the engagement rate high. Those are things that'll help out the show. That actually sounds pretty great. And if you're listening to this, remember, you are no more than an organic LLM. So if you want to process that prompt that Josh just specified, please go do it. It helps us out a massive amount. But just to round up and wrap up this episode, it's this weird conundrum.

Starting point is 00:21:14 AI is either zero or 100 at this point. It's either deteriorating your product and causing massive outages, or it's doing something like this and helping the world's leading expert of AI research actually do a better job, which is just – there's no middle ground. It's either zero or 100, and it's just super exciting to see where all of this is going. I tweeted this out the other day. My mind – I don't know about you, Josh, is just in a fog at this moment because I can't keep up with everything that's going on.

Starting point is 00:21:42 Something's switched over the last 30 days where the exponential curve has just like ratcheted up. I don't know whether this is because I'm in my own echo chamber or not, but it just feels like we are getting new model releases every single week at this point. There are so many new frontier AI labs. Yann LeCun, who is the former head of AI research at Meta, has now started his own thing and he's focusing on world models. And I'm like, what is going on here?

Starting point is 00:22:05 It's just, it's so much. But you're going to hear all about it on Limitless. Josh and I are working 24-7 doing this, and releasing four new episodes every week, up from three. The viewership has been crazy. The comments on our last episode where we talk about uploading a fruit fly's brain into a laptop, a banger of an episode,

Starting point is 00:22:23 go check that out. Comments are like four times what we normally get. We're responding to every single one. We are the most dialed show on AI that you can potentially watch. Please like, subscribe, leave us comments, give us feedback.

Starting point is 00:22:35 And Josh? Yeah, an important note. Like the four episodes is just a testament to how fast we're going. And just the need to cover more things. So, again, you will not miss anything if you watch this show. And we will continue to scale to match the necessary demand to like sufficiently cover all of this. And that's what's happening. I mean, it's just downstream of the craziness. We are very much in this singularity event, in the vertical part of the curve. So

Starting point is 00:22:57 enjoy it. Like I'm just trying to take it all in. Take it for what it is. Get excited about it. Share with everyone. So yeah, thank you guys all for watching as always. And we'll see you in the roundup tomorrow.
