Limitless Podcast - The Good and Bad of AI Coding: Amazon Shuts Down, Autoresearch, Claude Code Review, Lovable

Episode Date: March 12, 2026

A recent six-hour Amazon outage was triggered by AI-generated code, prompting changes in their coding policies. We explore Anthropic's challenges with Claude Code, the rise of "Lovable," an intuitive AI app platform, and Andrej Karpathy's project on self-improving AI. Join us as we navigate the evolving landscape of AI and coding.

------

🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

POLYMARKET | #1 PREDICTION MARKET 🔮
https://bankless.cc/polymarket-podcast

------

TIMESTAMPS
0:00 The Amazon Outage
1:06 Growing Pains
5:27 The Job Market
7:01 Geopolitics
9:28 Rise of Lovable
10:59 Anthropic's Code Review
14:06 OpenAI vs. Anthropic
15:55 Self-Improving AI
21:08 Conundrums

------

RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 Last week, Amazon's entire platform crashed for six hours. No one could shop, buy anything. They couldn't even see prices. The reason was that a junior developer had submitted an AI-generated piece of code, which crashed the entire platform, and it cost them millions and millions of dollars. Now, Anthropic, the creators of Claude Code, which is what Amazon was using to create AI-generated code, also had a similar issue where their entire platform has been suffering from outages this entire week. They actually also released a new product called Code Review, which uses AI to help fix the problems in the code that their own model is generating.
Starting point is 00:00:39 It's all getting incredibly complex right now, and Amazon tried to hide the entire thing from a Financial Times reporter. It's all pretty crazy. And it forces us to answer the new narrative, which is AI-generated code isn't going anywhere. Demand is at an all-time high. But have we hit a wall? Has it become too dangerous to use AI to code? Yeah, and there's this interesting phenomenon happening as these AI coding abilities in general become better, where currently they account for, what, about 4% of total GitHub commits. The expectation is that by the end of the year, they will account for 20% of total GitHub commits. And there is this increasing reliance on these AI tools, but that creates these key choke points and points of failure that have a significant effect. I mean, this Amazon
Starting point is 00:01:21 Alge was a huge deal. That was pushed by a single person. And then within Claude Code, they've been having a lot of downtime. And the anthropic, I mean, we were trying to use the programs today. And it was down. The service weren't working quite right. So there's a lot of these growing pains that are happening. And it seems like we're running into more issues faster than people are trying to mitigate it. So that's where I suspect this Claude Code checking agent comes in that we're going to cover later. But this Amazon story was pretty fascinating. Like this cost Amazon billions of dollars. Yeah. So let me walk you through the timeline for this, because this actually isn't the first outage Amazon's experience because of an AI generated piece of code. So back in early,
Starting point is 00:02:00 early December, actually November of 2025, they took a really hardline approach and a new policy was invoked, which was, I want 80% of all Amazon's code generated to be AI generated. And this was their goal to be achieved by the end of 26. Now, this flips, completely flips from a company that I think employs like hundreds of thousands of engineers and wants them all to kind of like hand-write or hand type the code. So this is a pretty aggressive flip. Amazon's been laying off tens of thousands of people. So this kind of is in trend with what they wanted to do.
Starting point is 00:02:33 But in the middle of December 2025, they experienced their first outage. It was 13 hours back then. So this is all adding up, by the way. We're talking about tens of millions of dollars. Then in late 2025,
Starting point is 00:02:44 so at the end of December, there was another outage. And then we have the outages that we're speaking about today. And the issue that this flags is, although AI is like a really useful tool to generate code and it finds bugs, it actually might create more problems
Starting point is 00:02:58 than you'd expect it, because the issue that they're seeing is junior developers that come in, that don't understand Amazon's code base, just kind of use AI to run, like pull a code, to run autonomously, figure out what they want to create, and then they just submit it without actually reviewing and understanding it. And if this goes unguarded, it creates and results in issues like this. Yeah, and there's the problem that's starting to happen now, where agents are creating code far faster than humans can keep up and check it. So it becomes this impossibility where if you want to move at the velocity that AI enables, you are simply unable to keep up with the change log of what's happening.
Starting point is 00:03:32 So you have to defer some sort of trust level to this AI, to its ability to check itself, to run tests, to verify that it actually works. And in some cases, it doesn't. I mean, I'm sure some people listening to this experienced the outage. And it wasn't just an AWS outage. This was Amazon, the actual storefront, where my dad actually texted me. He was like, am I doing something wrong?
Starting point is 00:03:50 Can you place this order for me? Because it's not going through. And then I went to check myself, and the whole service was down. I went on Twitter. I saw Casey Neistat was posting a bunch of photos about this. It was this big deal because anyone who was trying to order anything from Amazon, the entire storefront was just totally offline. So AWS, the web services, runs a significant percentage of the internet. Amazon, the storefront, runs a significant percentage of e-commerce. And these things have been going
Starting point is 00:04:12 down at an increasing rate due to these tools like Claude Code. So they're moving much faster, but they're breaking things. This is like the early Zuck Facebook mantra. It's like, move fast and break things. They're now taking it to the extreme. And as we become increasingly reliant on these tools, we might see this start to, I mean, permeate even further than just Amazon. Well, the weirdest part about this is that AI has kind of moved from being an assistive tool to now being the foundational bedrock for a lot of these different services that you just mentioned. Like, AWS runs the entire internet. Every single business that you interact with online probably uses AWS on the back end. So if they go down, then your business goes down
Starting point is 00:04:49 as well. That's where we get all these outages. Amazon had some pretty severe repercussions. They have now done a complete 180 on their policy of generating 80% of their code via AI. And they've now said, if you're a junior developer or engineer working at Amazon, you now need to get your manager's permission to submit code. So this is so tough because you've now gone from trying to automate the entire thing to making the process far slower. So you're using AI to kind of generate the code, but then you still have to rely on a human constraint.
Starting point is 00:05:20 And there are fewer managers than there are junior developers. So the whole pipeline for shipping changes just becomes really bogged down. So I think this is going to slow down Amazon massively. But if it's for the result of, you know, saving your company tens of millions of dollars, fair play. But yeah, it's a pretty hard-line approach. And this all seems kind of connected to the trends that we've been seeing, right, particularly around jobs and distribution. It's like a lot of companies are cutting jobs, which means they're increasing their reliance on these AI tools. But if the result is that the actual output of these AI tools is causing a lot of damage, it creates this tension.
Starting point is 00:05:54 And it seems like the AI labs themselves, anthropic, open AI, they cannot hire talented people fast enough. They are looking to hire every engineer under the sun who is confident and capable of building great products. But a lot of other companies are deferring that workload to these AI labs, which is starting to create this interesting dynamic
Starting point is 00:06:12 where the job market is suffering, but the actual productive output of these companies is not, to an extent. And that's operating under the assumption that these code tools will get better. But if Amazon and a lot of other companies start putting in these thresholds that, again, bottleneck and throttle that amount of AI production capability, that probably doesn't help the cause a whole lot.
Starting point is 00:06:30 There's also the angle that AWS, or just data centers in general, are becoming like a really valuable, national-security-level asset, right? So if we look at the current Iran conflict that's being waged between the U.S. and Iran, Iran actually targeted strikes specifically at Amazon's AWS data centers out in the UAE and Bahrain, and this caused a bunch of the outages as well. So it's not even just AI-coding-level threats. This is becoming a geopolitical weapon at this point.
Starting point is 00:07:04 It makes sense. Every single story we tell, like this Game of Thrones narrative that's kind of existed throughout the history of Limitless, it's just elevating itself higher and higher to the global stage, where now it's impossible to have any sort of conflict or large decision without AI being in the middle of it. Like, when you think about this conflict, what are the bigger things? It's this Anthropic deal with the Pentagon. They're now striking the AWS data centers. They're coming for the infrastructure that's building these tools that are so powerful. But this isn't just happening on the worldwide stage. This is happening on the consumer and commercial level too. You were mentioning something about Lovable and how much they've grown recently. Could you just explain Lovable for the people who aren't familiar, and how big they've grown recently? It's crazy.
Starting point is 00:07:42 Okay, so typically the people who code, this might shock some of you, are people who have learned to code. It's very technically focused work. But there are people like you and I and a bunch of our friends that don't necessarily know how to code but still want to do that. And that's what resulted in vibe coding, right? Now, Claude Code and products like Cursor are still very oriented toward technical folks. A platform like Lovable is for people who have zero experience coding,
Starting point is 00:08:09 but still want to code things. So the Lovable platform is actually a really cool platform where you just type whatever you want. Like, I want to create an app that tracks my fitness reps, or whatever that might be, or one that suggests cooking recipes for my type of cuisine. And it does so in a couple of minutes. And it's really intuitive. It makes it super easy. And they have the additional perk of being able to deploy it live as a website or an app that you can publish on the App Store super easily. So they have this end-to-end experience. Now, Lovable, when it started, had like a rocket-type trajectory. Within the first 12 months, I think they hit $200
Starting point is 00:08:42 million ARR, which is just insane growth. All of these AI companies are blowing up. Now, in the last month alone, they've added an extra hundred million dollars of ARR. They were at $300 million. Now they're at $400 million. So that's roughly just over 30% growth. Just an insane amount of demand. And that's the point I want to make. Like, when I look at these Amazon outages and Amazon restricting their developers from using vibe coding to improve their product, I don't think this is necessarily a wave that they can stop. So they should stop trying to implement policies that restrict people from using it and instead try to figure out what is a better way to implement AI-generated code
Starting point is 00:09:23 because this thing's not going away. Lovable, Cursor, Claude Code are all skyrocketing, and we're not going to stop that. And I think, I mean, aside from Claude Code, right, Lovable and Cursor are kind of model-agnostic. They'll use whatever model you want. So I was looking at Polymarket earlier today because they have markets for which one of these models is actually going to be the best at any given time. And this is interesting because if you looked at these charts towards the middle of January, you noticed that Anthropic was kind of the favorite.
Starting point is 00:09:48 Anthropic was looking like they were going to have the best model. But the reality now is that by March 31st, there's a very clear divide that has happened over the last six to eight weeks, with OpenAI having an 85% chance of having the best model. Does that mean there's going to be something new, or is that just the current model? We don't know. Currently, they're at 5.4, which is fantastic. I think everyone has kind of unanimously decided that it's the best for coding. But then there's also a fun Polymarket here about Claude going down, because it appears as if Claude is really just not doing as well as it needs to be. It's not as stable. We were trying to use it this morning. It wasn't working. They're clearly experiencing growing pains because they have all these new users and they're
Starting point is 00:10:25 having to throttle reasoning. They're having to limit the amount of searches that people are able to use. I'm sure the researchers who use these to train the models are not happy. And it looks like, based on Polymarket, we'll be seeing somewhere between five and nine more outages in the month of March. Yeah, basically every single day this month. Yeah, it seems like they're having problems, and this is proof. I mean, you can actually bet on the problems that they're having and make some money on it, right? This is, what, $26,000 of volume. So thank you, Polymarket, for sponsoring this segment.
Starting point is 00:10:56 And I guess now we can get on to the solution, which is what Anthropic is starting to roll out. But they're doing it in a way that's a little less than desirable. Yeah. So the main instigator of this entire AI coding debacle is Anthropic. They created Claude Code. Amazon owns 21% of Anthropic, which is just crazy to say out loud. And so they use Anthropic's model, Claude Code, to do all their AI code generation. But Anthropic decided, hmm, instead of trying to make our AI coding models better,
Starting point is 00:11:27 let's just release a new product called Code Review, which will review your AI-generated code. So you now have Claude generating the AI code and then also reviewing the code as a separate product that you pay for. And it's all for the beautiful price of, I think it's like 15 to 20 bucks. Where is it? Yeah, 15 to 20 bucks per review, which some might say is quite expensive. And they scale based on the pull request complexity. So that number can get higher. And what I find funny is they're selling the problem, where they're selling you a model.
Starting point is 00:11:58 You use it to code. It's going to create problems. And then they're selling you a separate package for the solution, where, oh, you have problems in our code? Well, here, for another $25, $50, whatever it may be, we'll actually do a code review and we'll fix the problems that we've created. And that creates this uniquely misaligned incentive where there's no real reason for Claude to want to fix problems when they can just sell this package on top to remove
Starting point is 00:12:23 the problem. And this comes in the face of ChatGPT and OpenAI doing the polar opposite, where they're actually offering these for free. And here's Rohan from OpenAI: If you want AI code review but don't want to pay $25 per review, check out Codex review. It leverages frontier Codex models, finds complex issues, and is 100% usage-based. Most runs should cost $1 or less. So here we have another head-to-head collision happening, where ChatGPT, OpenAI, is going head-to-head with Claude Code. And it seems like Claude Code is the favorite for a lot of people, but I've noticed a lot of the more technical people on my timeline who are building pretty hardcore things have been really relying on
Starting point is 00:12:51 And it seems like CloudCode is the favorite for a lot of people, but I've noticed a lot of the more technical people on my timeline who are building pretty hardcore things have been really relying on. 5.4 and using Codex much more. Yeah. So to set the precedence, OpenAI and Antarctic now are neck and neck at building the best coding models. And Codex 5.4 from OpenAI has taken the lead. A lot of engineers are now saying it's way better than Opus 4.6, as you just mentioned. In terms of these security code review products that each company offers, Open AIs is instinctively cheaper. If you have a $20 a month subscription, you now get access to this code review tool for no extra cost. So it's automatically a better tool to use.
Starting point is 00:13:32 Now, in terms of quality, in terms of how good it is at code review, we don't know, because they haven't made any of their datasets public. Anthropic has, and it's pretty damn good. Also, if you compare Anthropic's product to traditional SaaS vendors that have these kinds of AppSec security tools, it is still way cheaper. A lot of people were in my comments, actually, because I tweeted about this saying it was cheaper. They were like, no, it's not.
Starting point is 00:13:54 Like, I could use the SaaS tool. And I'm like, yeah, but you're not factoring in the engineering hours. They spend like 30 to 60 minutes per bug, and that adds up to about $100. So it's still technically cheaper, but I would rather use OpenAI's tool, which is just way better. Now, Anthropic itself is facing so many outages for Claude. You mentioned earlier that, you know, Claude was out for you this morning. It's also out for me this morning. And I think this is because OpenAI basically has all the money and compute to subsidize the cost for a tool like this,
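To make that back-of-the-envelope math explicit, here is a small sketch of the comparison. The hourly rate is an illustrative assumption, not a figure quoted in the episode:

```python
# Back-of-the-envelope comparison of manual bug review vs. an AI review pass.
# ENGINEER_HOURLY is an assumed loaded cost of an engineer, not a quoted figure.
ENGINEER_HOURLY = 150.0  # $/hour, illustrative

def manual_review_cost(minutes_per_bug: float, bugs: int = 1) -> float:
    """Cost of an engineer spending `minutes_per_bug` on each of `bugs` bugs."""
    return bugs * (minutes_per_bug / 60.0) * ENGINEER_HOURLY

low = manual_review_cost(30)   # 30 min of engineer time -> $75.0
high = manual_review_cost(60)  # 60 min of engineer time -> $150.0
# Midpoint is roughly the "about $100" per bug mentioned above, versus
# $15-25 for an Anthropic review (or ~$1 for a usage-based Codex run).
```

Under those assumptions, even the priciest per-review fee undercuts a single manually triaged bug, which is the point being made about factoring in engineering hours.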
Starting point is 00:14:22 whereas Anthropic is struggling. They're adding a million users to their general user base every single day, and they're not able to subsidize people's compute anymore. So they're just shutting people's access off to it. And it's kind of frustrating to see, to be honest. Yeah. And the way you have to think about this is each one of these companies has a limited amount of compute. And that compute takes care of everything. It has to serve the customers. It has to serve the developers, the researchers. And a lot of times when you're developing these new technologies or you're training new models, the training run consumes a
Starting point is 00:14:52 tremendous amount of GPU power. The researchers want a lot of GPUs to run tests and trials on, but you still have to find extra resources to serve all of the users who are querying this model every single day. And I was recently listening to Dario on, I think, the Dwarkesh podcast, where he was talking about how they think about how many GPUs to order into the future. Because they go to Jensen, they go to Nvidia, they say, hi, Jensen, we would like to place this amount of GPU purchases for this year. And what he was mentioning that I found interesting is that they can't over-order, because
Starting point is 00:15:22 if they're off by just a small percentage, the incremental cost of those GPUs will far outweigh the revenue that they get from growth. And what it sounded like he did is he was just taking the current growth and mapping it into the future, to hedge themselves against the collapse from debt that they would have from over-ordering. But the reality is that between when that episode was recorded and now, they have gone fully vertical. Like, that curve has steepened significantly. And they're going to have to figure out a way to solve this, because this is not an easy thing. These orders get placed years in advance. They're projecting for a certain curve and they're getting a different one. So is this a short-term thing? Is this going to
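The ordering logic described there, take the observed growth rate and map it into the future, can be sketched as a simple compounding projection. Every number below is made up for illustration; these are not Anthropic's actual figures:

```python
# Sketch of "take the current growth rate and map it into the future" when
# sizing a GPU order. All inputs here are hypothetical.
def project_gpu_demand(current_gpus: float, monthly_growth: float, months: int) -> float:
    """Project demand forward by compounding the observed monthly growth rate."""
    return current_gpus * (1 + monthly_growth) ** months

# If you serve 10,000 GPUs' worth of demand today and observe 10% monthly
# growth, a 12-month-out order sized to that projection is ~31,384 GPUs.
order = project_gpu_demand(10_000, 0.10, 12)

# The risk described above: if growth actually comes in at 5%/month, you only
# needed ~17,959 GPUs, and the excess order is sunk cost.
needed = project_gpu_demand(10_000, 0.05, 12)
```

This is why being "off by just a small percentage" compounds: a few points of monthly growth rate difference nearly doubles the 12-month demand estimate.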
Starting point is 00:16:01 be durable? I don't know. But it is noteworthy that Anthropic's having some growing pains. Okay. So we have two problems here. We have an amazing AI model that can code. And that is inevitably going to be the future. Lovable's demand proves that. But then we're also running into the issue where we have all these security flaws and we could lose tens to hundreds of millions of dollars. Amazon demonstrated that, right? So we're at this conundrum. How on earth do we solve this? Do you know what the solution might be, Josh? It might be AI. It might be AI itself.
Starting point is 00:16:30 That checks out. You have to use AI to solve both of these individual AI problems, I'm afraid. So, Andrej Karpathy, the godfather of AI, as I like to call him, released this really cool experiment that has just completely blown up, which hints at what the future solution might be to the problem that we just explained on this episode. It's called auto-research. Now, the best way to think about this is he created an AI model
Starting point is 00:16:57 But the coolest part about it is it autonomously does research. Now, typically, when you use an LLM, you need to prompt it. And then it says, I don't really know what this means. Can you tell me what it means? And you have to keep working back and forth.
Starting point is 00:17:10 This thing runs completely on its own overnight, and in some cases, for days at a time. He's done this overnight while he slept for like nine to 12 hours, and then he's done it again for two to three days. And how the AI model works is he sets an objective. He says, I want you to try and improve this part of your model.
Starting point is 00:17:28 So he's talking to the AI model and saying, you need to improve yourself in this particular way. Go away and figure it out. The model then runs an experiment every five minutes. If the experiment comes out with an improvement, it caches it in. It improves its own model weights. It improves its own internal insights, right? If it doesn't, it discards it and tries again. And it runs these experiments on and on and on.
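The loop being described is essentially propose, evaluate, keep-or-discard. Here is a minimal sketch of that pattern on a toy objective; the function names and numbers are hypothetical stand-ins for illustration, not Karpathy's actual code:

```python
import random

def auto_research_loop(evaluate, propose, start, n_experiments=150, seed=0):
    """Toy version of the loop described above: propose a change, run the
    experiment, cache the change only if the loss improved, else discard it."""
    rng = random.Random(seed)
    best, best_loss = start, evaluate(start)
    for _ in range(n_experiments):
        candidate = propose(best, rng)  # run one experiment
        loss = evaluate(candidate)
        if loss < best_loss:            # improvement: cache it in
            best, best_loss = candidate, loss
        # otherwise: discard it and try again
    return best, best_loss

# Toy stand-in for "the model": minimize (x - 3)^2, starting far from optimal.
evaluate = lambda x: (x - 3.0) ** 2
propose = lambda x, rng: x + rng.uniform(-1.0, 1.0)  # small random tweak
best, loss = auto_research_loop(evaluate, propose, start=10.0)
```

Run for 150 iterations, the kept-only-if-better rule means the loss can only go down, which is the steadily descending loss curve the hosts describe next.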
Starting point is 00:17:51 In this experiment, I think he ran around 150 experiments overnight. And it improved itself incrementally, as is demonstrated by this graph; the loss curve just went crashing down. So it suggests that maybe Anthropic and OpenAI, instead of having to rely on releasing new tools every day, constrained by humans, could just let the AI improve itself, and everyone's hunky-dory, maybe. This seems like an early form of the takeoff situation in which AI becomes self-improving. Because I haven't really seen this architecture before in such an open and easily accessible way.
Starting point is 00:18:23 I mean, this is all open source. This, I mean, Andre says here, it's 630 lines of code. And traditionally, it's on the human to iterate on the prompt, to improve the prompt, to improve the training code and to work together with the AI. But this actually does it completely and entirely on its own. It goes off, it runs its own experiments. It ingest those experiments, updates its values that it has, updates the view on the world, and then creates a new set of experiments and runs this iterative process.
Starting point is 00:18:48 And what we saw in that chart is that there were these incremental improvements that Andrej just didn't see. And again, it's hard to overstate how much of a powerhouse Andrej is in terms of AI engineering and understanding. Dude, he has two decades of experience in this. And like, AI, what? It wasn't even around two decades ago. Like, he was the guy who was building the damn thing. Yeah.
Starting point is 00:19:10 So for him to come along and for the first time ever experience this novel breakthrough, in which an AI is able to improve his code in ways that he didn't initially see. It's like when AI played Go for the first time, or when AI is playing chess and it makes these erratic moves: you would not have known that it was optimal, but the model is so precise, has so much context and so much intelligence, that it's actually able to make these optimizations that humans otherwise wouldn't have seen. And this is one of the earlier instances where even Andrej got humbled. And the code that he was running was significantly improved overnight by this self-recursive thing. And the most interesting thing about this is it's open source.
Starting point is 00:19:46 It's available to run on your laptop, on your PC, on a computer right now. And I have to imagine that AI labs are going to take this and run with it at scale. Where, if you're training a model, why would you not? And if this model using auto-research can do better than Andrej, I'm sure it could do better than the average mid-tier prompt engineer at Anthropic or OpenAI, and actually improve the entirety of the company. So this feels like something that is starting from the bottom, in terms of it's openly accessible, it's open source, and then it'll move its way up through everything and become this recursive self-improving loop. It's incredible. I want to
Starting point is 00:20:23 try it on something. We need a problem that's hard enough to try this on, where you can truly let it run for a series of days and then come back with an answer that's better than anything you could have ever imagined. Well, maybe research on how to make Limitless the best and number one podcast in the world. Maybe. I'd be down to run this. We should try that. We should make some assumptions on what it would say, right? It would say, well, we probably need to convince all of our listeners to go subscribe to the podcast on your favorite podcast player, give it five stars, share it with all of their friends, and then make sure that they comment the things they either like or don't like about it
Starting point is 00:20:55 to keep the engagement rate high. Those are the things that'll help out the show. That actually sounds pretty great. And if you're listening to this, remember, you are no more than an organic LLM. So if you want to process that prompt that Josh just specified, please go do it. It helps us out a massive amount.
Starting point is 00:21:09 But just to round up and wrap up this episode, it's this weird conundrum. AI is either zero or 100 at this point. It's either deteriorating your product and causing massive outages, or it's doing something like this and helping the world's leading expert in AI research actually do a better job. There's just no middle ground. It's either zero or 100, and it's just super exciting to see where all of this is going. I tweeted this out the other day. My mind, I don't know about you, Josh, is just in a fog at this moment, because I can't keep up with everything that's going on. Something's switched over the last 30 days where the exponential curve has just, like, ratcheted up.
Starting point is 00:21:48 I don't know whether this is because I'm in my own echo chamber or not, but it just feels like we are getting new model releases every single week at this point. There are so many new frontier AI labs. Yann LeCun, the former head of AI research at Meta, has now started his own thing, and he's focusing on world models. And I'm like, what is going on here? It's so much, but you're going to hear all about it on Limitless. Josh and I are working 24-7 doing this.
Starting point is 00:22:10 We're releasing four new episodes every week, up from three. The viewership has been crazy. The comments on our last episode, where we talk about uploading a fruit fly's brain onto a laptop, banger of an episode, go check that out. Comments are like four times what we normally get. We're responding to every single one. We are the most dialed show on AI that you can potentially watch. Please like, subscribe, leave us comments, give us feedback.
Starting point is 00:22:35 And Josh, yeah, anything important to note? The four episodes is just a testament to how fast we're going, and just the need to cover more things. So, again, you will not miss anything if you watch this show. And we will continue to scale to match the necessary demand to, like, sufficiently cover all of this. And that's what's happening. I mean, it's just downstream of the craziness. We are very much in this singularity event, in the vertical part of the curve.
Starting point is 00:22:57 So enjoy it. Like, I'm just trying to take it all in. Take it for what it is. Get excited about it. Share it with everyone. So yeah, thank you guys all for watching, as always. And we'll see you in the Roundup tomorrow.
