The Good Tech Companies - Cardano’s 14-Hour Stress Test: How the Network Took a Hit and Healed Itself

Episode Date: December 2, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/cardanos-14-hour-stress-test-how-the-network-took-a-hit-and-healed-itself. This incident was... an exception and highlighted areas where Cardano can improve while also demonstrating its strengths. Check more stories related to web3 at: https://hackernoon.com/c/web3. You can also check exclusive content about #cardano, #blockchain, #web3, #cardano-serialization-bug, #cardano-news, #cardano-chain-fork-bug, #good-company, #hackernoon-top-story, and more. This story was written by: @sundaeswap. Learn more about this writer by checking @sundaeswap's about page, and for more stories, please visit hackernoon.com. Cardano suffered a 14-hour, self-repairing chain fork on November 21st, 2025. This is the largest degradation of service for Cardano in its 8 years of operation. A serialization bug caused a unidirectional soft-fork.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Cardano's 14-Hour Stress Test: How the Network Took a Hit and Healed Itself, by SundaeSwap. On November 21st, 2025, Cardano suffered a 14-hour, self-repairing chain fork. This is the largest degradation of service for Cardano in its eight years of operation, and as a key developer within the Cardano ecosystem, I felt it was a good opportunity to reflect on what went well, and what we can learn to improve Cardano's robustness even further. Whether you're a maximalist or a hater, I think there's something to be learned from the objective facts. I've chosen to build a career and a company on Cardano. When something like this
Starting point is 00:00:41 happens, I don't have the luxury of beating my chest on Twitter or engaging in collective dunking. I need to engage in serious soul searching to determine if my bet is still sound. The answer I came to was yes, absolutely, with some homework. What happened? A serialization bug caused a unidirectional soft fork: one portion of the nodes rejected a transaction that the rest didn't. This was initially triggered on testnet, likely by accident, and a fix was identified and released quickly. Unfortunately, someone with deep familiarity with Cardano was able to reverse engineer how the transaction was constructed and submitted it to mainnet. You may see claims this was vibe-coded; that appears to refer to using AI to set firewall rules in an attempt to quarantine the
Starting point is 00:01:24 transaction, not the attack itself. Unfortunately, this was before the fix had achieved widespread adoption, and so a majority of the nodes, those on versions with the bug, accepted it, while key infrastructure like wallets, chain explorers, and exchanges rejected it. As node operators upgraded to the fixed version, the chain that rejected the transaction began to grow more quickly than the one that had accepted it, and ultimately overtook it, leading to a reorg that repaired the chain. As a small point of pride, the diagnostic tools built quickly to triage the issue used code from Amaru, an alternative node being written in Rust that the Sundae Labs team is a contributor to. This was a good validation of our plan to
Starting point is 00:02:06 bring implementation diversity to Cardano. Real impact: in practice, the impact of this chain fork was severe, but not as severe as you might have assumed. The chain continued to produce blocks, and a majority of transactions made it into the surviving fork, though delayed. The monitoring infrastructure run by the CF detected a spike in transaction delays of up to five minutes, but other users may have seen delays as long as 16 to 30 minutes, the longest gap between blocks. Some subset of users may also have been unable to submit transactions entirely, though this was due to faulty third-party infrastructure that was unable to follow either fork. A small percentage of transactions, 3.3%, or 479 out of 14,401, made it into the faulty chain and did not make it into the surviving
Starting point is 00:02:52 chain. These transactions are still being analyzed, but might represent missed economic opportunities or risks of double spend. How I think about blockchain outages I have developed a personal taxonomy for categorizing large outages, from most serious to least. One. Sovereignty violations, where the core promises and integrity, such as cryptographic signatures of a blockchain get violated too. Ledger bugs, where the economic principles, such as monetary policy, of a blockchain or broken three. Unrecoverable consensus violation, where a network permanently forks four. Recoverable consensus violation, where a network has a long-lived fork but recovers five. Severe smart contract exploit, where user funds are lost due to a bug in the contract six.
Starting point is 00:03:39 Full consensus halt, where the chain must be stopped and restarted, coordinated through a central authority seven. Degradation of service, where transactions are delayed or the wrong information is displayed to users the incident cardana faced qualifies as four serious but recoverable in my full blog post i give examples of each what went well this incident put cardana's oroboros consensus through its spaces long forks like this are supposed to be exceedingly rare black swan events but the design of the consensus protocol and networking stack anticipate in account for this for example the fact that it was able to self-heal is built into the protocol and the way time is handled has a self-regulating Lamport clock that gave the stake pool operators time to upgrade their nodes.
Starting point is 00:04:23 Additionally, the reporting and communication infrastructure maintained by the founding entities really shone, as we were able to quickly get eyes on the problem and communicate it widely. Finally, it was great validation for Cardano's choice of language. The particular error was related to some faulty bounds checking on a buffer of untrusted input. In languages like C, this type of bug, if not this one specifically, could very easily have led to a sovereignty violation through remote code execution or similar. Haskell's strong memory safety guarantees meant that this kind of bug was never on the table. What broke down: it became clear from the incident that we need better infrastructure around some wallets,
Starting point is 00:05:01 DAPPs, and chain explorers. Many were unable to follow either fork and introduced extra delays in user transactions. In some cases this may have been a safety consideration, but in others it was just a lack of defensive programming that anticipated this scenario. Similarly, especially as Cardana enters an era of client diversity, its clear way need to improve our already rigorous testing criteria. A single bug might lead us all to a bit of survivorship Biasse the level of testing across the current node implementation is phenomenal, but that same rigor needs to be improved and standardized across all implementations of the node. Conclusion, blockchains are not immune to the same kinds of bugs that are rampant across most software. It's usually safe to assume that all software is one network packet away from catastrophic
Starting point is 00:05:48 meltdown, assuming you can but find the right incantation. Luckily, most, but not all of these are found by conscientious security researchers and fixed before they can cause widespread impact. This incident was an exception in highlighted areas where Cardana can improve while also demonstrating its strengths. By Pye Lanningham, Chief Technology Officer at Sunday Swap Labs, thank you for listening to this Hackernoon story, read by artificial intelligence. Visit hackernoon.com to read, write, learn, and publish.
