The Infra Pod - Building a bug-free vibe coding world (Chat with Akshay from Antithesis)
Episode Date: January 12, 2026

In this episode of the Infra Pod, hosts Ian Livingston (Keycard) and Tim Chen (Essence VC) interview Akshay Shah, Field CTO of Antithesis, diving deep into the world of distributed systems, reliability, and the future of software testing. The conversation covers the challenges of building bug-free distributed systems, the story behind Antithesis, lessons from major outages, and the evolving landscape of infrastructure and AI-driven operations.

Timeline with Timestamps:
00:00 – Introduction & guest background
02:00 – What Antithesis does and why it matters
06:00 – Real-world impact: Testing distributed systems (etcd, Kubernetes)
09:00 – Major outages & lessons learned (AWS, Knight Capital)
12:00 – The origins and philosophy behind Antithesis
16:00 – The future of reliability, testing, and AI in infrastructure
28:00 – Closing thoughts & where to learn more

Links:
Learn more about Antithesis: https://antithesis.com
Antithesis on YouTube: @AntithesisHQ
Transcript
Welcome to the InfraPod, Tim at Essence VC and Ian let's go.
Hey, Tim. This is Ian Livingston. Super excited. Builder of trusted agent software,
making identity cool again at Keycard. Today we are joined by the Field CTO of Antithesis,
Akshay Shah. Akshay, tell us about what in the world Antithesis is, what they do,
and maybe you can tell us a little bit about what a field CTO is, for all of us trying to figure it out at home.
Hey, Tim and Ian, thanks for having me on.
At Antithesis, we just built something very simple.
We build the best way to ship bug-free distributed systems.
So if you're building some multi-node system and you're storing some data,
you probably don't want to lose it.
And we build software that helps you make sure that your system does what it says it's going to do.
And my job here is to be field CTO, which is kind of an ever-evolving mix of sales and
marketing and Devrel and product.
And I think it's a nice title for someone who has a technical background, but wants to
pitch in on the business side of things.
And wear a lot of hats.
Amazing.
And what got you to want to join Antithesis?
What was it about it,
what Antithesis is doing,
that made you want to jump on their journey and join the wave that they're building?
Well, if you've ever had to build a new distributed system from scratch,
you face this very awkward moment very early in the project where you're writing a lot of code
that is extremely difficult to get under test. And usually that code is in some error handling block.
It's like, oh, if I try and contact the other node and I can't, what do I do? Or if I get a block of
data back from another node in my system and the checksum doesn't match, what do I do now?
And those things tend to be very, very hard to test because they're not on the happy path.
And in order to get there, you need to introduce some error into your system.
Packet loss or a network partition or bad hard drive or bad CPU scheduling or a node that has way too many noisy neighbors, something like that.
Up until Antithesis launched, the best way for you to test that stuff was with Jepsen.
Jepsen's a really effective framework, but it's a lot. It's a lot of Lisp. It's non-deterministic.
It's kind of painful to work with.
And so the last time I was doing this,
I was starting to build a new distributed thing from scratch.
I had dusted off my parentheses and my, you know,
Paredit config.
And I was starting to write my Jepson tests.
And that's when Antithesis came out of stealth.
And so I called them right away on day one.
And I think the account exec who picked up my phone was a little taken aback
because my basic message to him was,
I don't believe that any of this works.
This sounds like nonsense.
Nonetheless, I'm desperate, and I have my credit card.
I don't really want to hear the pitch.
I just want to buy this thing, and I want a three-month cancellation clause,
and then let's just get going.
And it turned out to be amazing.
The product was super effective, and it was so incredible to me
that I wanted to join this company's journey.
Can you help us at home understand,
like, what are the examples or use cases Antithesis helps you solve,
like, the distributed systems or bugs or complex things?
And why, in that specific situation,
is it so fundamentally different than what came before,
and how does it empower people to build trusted distributed systems?
Absolutely.
I think the easiest place to start
is with a concrete piece of software that we're testing today.
So Antithesis helps the CNCF test etcd.
And if you don't know what that is,
etcd is a key-value database that is distributed
and strongly consistent.
And it is the heart of storing state in a Kubernetes cluster.
So that means every time you do anything in Kubernetes, or a new machine comes up, or a new deployment goes through,
the critical path of that is going through etcd.
And the etcd project has a long history.
It started many years ago at CoreOS and then became part of Kubernetes.
And so over the years, it has gotten bigger and more complicated.
And it's now in this linchpin position in the infrastructure world
that it wasn't in when they first built it and designed it.
And over the years,
etcd started to get the kind of bug reports
that every distributed system gets,
that every engineer hates to see,
where someone pipes up on GitHub and they just say,
hey, look, I don't really know what's going on,
but I'm running this thing,
and I'm getting these weird error messages,
and then all of a sudden some of my data seems to just disappear.
Or I've got a client connected to etcd,
and it's just not receiving some of the data
that it's supposed to.
And I can't really reproduce it,
but it happens every so often.
I'm sure it's happening.
And as a maintainer, you come and you look at that,
and you're like, what am I supposed to do with that?
It happens sometimes in my cluster once a month,
which is pretty often,
but I can't run your cluster for a month
just to try and reproduce this bug.
And so that stuff just piles up and it sits there.
Google invested quite a lot of time and money and effort
in improving etcd's testing.
They did an amazing job at it.
They squashed a whole bunch of bugs,
but there were still some bugs outstanding.
And that is stuff like I described,
like clients are just not getting the data that they're supposed to.
And what we did is we came in and we helped etcd,
and one of the etcd maintainers,
take the tests that they had and run them in the Antithesis environment,
which is special because it is
perfectly deterministic.
So anytime you find a test failure,
it immediately becomes perfectly reproducible every single time.
And we pair that reproducibility with a really powerful exploration engine.
You can think of it a little bit like a fuzzer.
And what that does is it finds the deepest, gnarliest corners of your code
and makes sure that they work as they're expected to.
When you pair those together, what you end up with is
a small number of integration tests in your codebase that punch way above their weight.
With a couple of tests and with this fancy exploration engine and deterministic environment,
you're able to really thoroughly test extremely complicated systems.
We can talk about what that exploration engine does or what it means to explore a program
or some kind of analogs for this sort of thing.
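To make the pairing of determinism and exploration concrete, here is a minimal Python sketch. It is a toy user-space analogue, not the Antithesis SDK or hypervisor: a seeded explorer drives a deliberately buggy key-value store, and because every choice flows from the seed, any failing run replays exactly.

```python
import random

class ToyKVStore:
    """A deliberately buggy in-memory key-value store."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Injected bug: reads occasionally return nothing, mimicking the
        # "client doesn't see its data" class of bug report.
        if random.random() < 0.001:
            return None
        return self.data.get(key)

def explore(seed, steps=10_000):
    """Drive random operations from a seed; return the failing step, if any."""
    random.seed(seed)                    # all randomness flows from the seed
    store, model = ToyKVStore(), {}
    for step in range(steps):
        key = f"k{random.randrange(16)}"
        if random.random() < 0.5:
            value = random.randrange(1_000)
            store.put(key, value)
            model[key] = value
        else:
            # Property: a read must match a simple in-memory model.
            if store.get(key) != model.get(key):
                return step
    return None

# Search seeds; any failure is perfectly reproducible from its seed.
for seed in range(200):
    failed_at = explore(seed)
    if failed_at is not None:
        print(f"seed={seed} violates the property at step {failed_at}")
        assert explore(seed) == failed_at  # deterministic replay of the failure
        break
```

Antithesis gets the same effect without the toy constraints: the determinism comes from the hypervisor, so ordinary, unmodified software gets this kind of reproducibility for free.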
Where do you want to go from here?
I mean, I think it would be really useful to also walk through, like, why?
What's the cost of these kinds of bugs to infrastructure, to the applications we build,
how does it impact reliability, what types of use cases can we tackle, right?
Because I think to many, maybe not the listeners of this podcast, let's say less technical people,
things like ACID-compliant transactions and, you know, seven-nines or five-nines or four-nines
or three-nines availability.
These don't mean anything.
They don't understand the impact of what it takes
to make systems of that reliability
and the consequences of a system
that doesn't have those levels of durability,
reliability, and high availability.
So could you help everybody understand:
why does this matter?
Right?
Like what's the fundamental dollar sign reason
for the need for these types of things
in a world that's increasingly run by computers?
That is a great question.
Let me ask you a question in return.
Are we totally tired and done
talking about the AWS outage
on this podcast?
We haven't even talked about it yet.
Oh, my God.
I mean, Tim might be tired because he's talked about it enough, but I'm not ready.
So, yeah, let's go for it.
People never even log on to Amazon.
That's the joke I have.
Gen Z people have never even seen the console.
We're just getting started, man.
Like, people looking like us probably got tired of it, but you might even have to introduce what Amazon
is for some people.
Okay, well, you know, if you haven't heard of it, Amazon.com is a bookshop on the internet.
Amazing stuff.
And they spun off this cloud computing division, and so it turns out tons of the stuff you do every day, from your smart light bulbs to credit card swipes at your favorite store, to booking a hotel room, whatever.
It all runs on Amazon's cloud computing thing.
And for a bunch of reasons that, at least in my opinion, are actually pretty reasonable,
all of AWS still has a dependency on this one system deployed in one small set of buildings in one
little region in Virginia.
And there was a bug in that system.
It was a very complicated bug.
It was very deep.
A bunch of things had to go wrong in just the right way.
And that system broke.
What was it?
A couple of weeks ago.
And when that system broke,
it got all of AWS
into what's called a metastable failure.
And that just means that it's broken,
but it's broken in a way
where it wants to stay broken.
It doesn't want to heal.
And that just took down
this enormous swath of the internet. Credit card payments were down, checkout on a million stores,
like online websites were down, websites were down, all sorts of stuff that you wouldn't expect
to even have an internet dependence. Turns out, like, you can't open your smart garage door,
your alarm system doesn't work, all kinds of things are down. This has this enormous cascading
economic impact. That is a gigantic example, but there are tons of small examples too.
Knight Capital years ago had a bug in one of their high-frequency trading systems.
So again, a large and complicated piece of software that encountered a particular situation
that the people testing that software hadn't thought to test, and it just went awry and blew
up the whole company in less than an hour.
This stuff happens all the time.
That's the outage side of things, where obviously if you have a sufficiently large outage,
it costs money.
I think the less recognized side of reliability is that even if you don't need a system that is incredibly high reliability, when people talk about three-nines or five-nines, they're really talking about how many seconds or minutes or hours per year can your system be down.
And five-nines is a very, very high bar. It means you have seconds of downtime a year. Most people are not building systems like that.
So maybe you can happily tolerate hours or even a day of downtime per year.
But what you're expecting to get in return for that relaxed uptime requirement is going faster.
You want to say, I want to ship more features, I want to deliver more value to my users,
because I don't have to be quite so painstaking about every little bit of code.
Well, having a more effective way to test actually lets you deliver on that,
because you can have a handful of tests and be totally sure that you are going to hit your uptime guarantees,
even with the most outlandishly aggressive plans, the biggest refactors, the most gigantic new features that you want to ship,
you can actually put your pedal to the metal and go as fast as your team can without worrying that you're going to tank your quality.
So I think testing and reliability gets you both.
It avoids big public outages where you're paying out for SLA breaches,
and it buys you more revenue, more value delivered, and more product velocity.
So I think when we all think of your company, we all think of the very first blog post
that announced the company, which says, we've been in stealth for five years, right?
I mean, it kind of just stuck in my head and stuck in everyone's head for the biggest reason,
because it's not common to see that.
So I know you only joined, like, not that long ago, but I'm sure you heard and know the story.
So tell us, like, what has the birth of this company been like?
Because it was the FoundationDB team, right?
That's right.
Coming together to build this sort of, like, pretty interesting low-level hypervisor
that can do deterministic executions and stuff like that.
You know, do you recommend everybody else to do this?
Yeah.
Well, the short answer is, no, I certainly do not recommend that everybody go down this road.
Tim, you of course know that
paying a team's salary for five years
requires a certain amount of fundraising chutzpah,
which not everybody has.
But you're right to say that the origins of Antithesis
are back in FoundationDB.
And FoundationDB was a while ago now,
so a lot of people haven't heard of it.
It was one of the first
strongly consistent distributed databases.
And all that means is that
it's a massively scalable multi-machine database
that feels like a regular single node database.
The answers you get back are always correct
from an application developer's perspective,
which is a useful property for a database.
It's very hard to work with situations
when that's not the case.
And they proved that you could do this,
be fully correct,
and remain incredibly highly available.
And they're kind of famous for only shipping one bug
to users ever before the acquisition.
And commercially, the story of FoundationDB, I think is pretty amazing.
They built this software, they launched it to the world, and their two big lighthouse customers were Snowflake and Apple.
And at least from the outside perspective, it seems very clear that Apple bought this software, looked at it and said, oh my God, we're going to end up building our whole business on this database.
We can't possibly buy this from a vendor.
And so shortly after, they just rolled in
and bought FoundationDB lock, stock, and barrel.
The key to FoundationDB's feature velocity and reliability
was their really idiosyncratic approach to testing.
They built the entire database to be completely deterministic
and to be explored by this kind of software fuzzer
very similar to the way Antithesis works,
but they did it in their application code.
And that meant that FoundationDB could have no external dependencies.
No Postgres, no ZooKeeper, no etcd, no S3, none of that,
because it would break all of their testing.
So after they succeeded, all the people there went on to careers at Apple and Google and
Meta and all the places you'd expect high-flying distributed systems people to go.
Then they got back together and they said, you know, we have now been on our walkabout through the industry,
and nobody tests the way FoundationDB did.
They're all doing it the wrong way,
and they're paying the cost in slow feature delivery
and bug-ridden software.
There must be a way to take this approach from FDB
and package it up so that it's usable by everyone
without completely changing your application code to accommodate it.
And they said, of course, the answer is obvious.
We must build our own virtual machine
that is perfectly deterministic.
And then you can take any software you like
and run it in our VM
and it will get all of these magical properties for free.
And that, it turns out, is a very big undertaking.
We use the word hypervisor a lot.
Virtual machine is another good word for it.
But if you're an application developer
and you're writing your apps and Python or Java or Go,
you have to go down several layers
to get to where antithesis really begins.
So we're not writing a Go program.
We're not writing in the Linux user space.
We're not plugging in at the Linux kernel.
We're going one layer below that.
And we're writing, we're emulating the layer where the kernel interacts with the hardware.
And we're intercepting those calls.
Some of them get passed through to the real hardware because they are deterministic by nature.
Others, we sort of intercept and fake out.
And that's stuff like, what's the current time?
How do I schedule this thread?
Give me some randomness that normally is seeded and provided by the hardware.
We intercept all of that and make it perfectly deterministic.
And that's the basis of really powerful exploration.
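Antithesis does that interception below the kernel, but a rough user-space analogue in Python, purely illustrative and with made-up names, is to route every source of nondeterminism through one seeded, controllable object:

```python
import random
from dataclasses import dataclass, field

@dataclass
class FakeEnvironment:
    """Toy stand-in for the nondeterministic inputs a hypervisor would intercept."""
    seed: int
    now: float = 0.0
    rng: random.Random = field(init=False)

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def time(self) -> float:
        # Deterministic clock: advances by a seeded but reproducible amount.
        self.now += self.rng.uniform(0.001, 0.050)
        return self.now

    def random_bytes(self, n: int) -> bytes:
        # "Hardware" randomness derived from the seed instead of the OS (Python 3.9+).
        return self.rng.randbytes(n)

    def schedule(self, tasks):
        # Deterministic "thread scheduler": a seeded shuffle instead of the OS's whims.
        order = list(tasks)
        self.rng.shuffle(order)
        return order

# Two environments built from the same seed behave identically,
# which is what makes a failing run replayable.
a, b = FakeEnvironment(seed=42), FakeEnvironment(seed=42)
assert a.random_bytes(8) == b.random_bytes(8)
assert a.schedule(["t1", "t2", "t3"]) == b.schedule(["t1", "t2", "t3"])
assert a.time() == b.time()
```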
You can think of this like playing a Nintendo game.
If you want to build a program that can kind of blindly beat Super Mario Brothers,
if you're of roughly our age, you've had this experience.
You're sitting down, you're playing a video game, you really want to beat it.
But every time you die, you go all the way back to the beginning.
It's a pain in the butt.
And so it's just really, really hard to get to the end of the game.
And back in the day, they actually built a product to fix this.
They called it the game genie.
And it literally was like a physical piece of hardware that would sit between the game cartridge and the console.
And it did exactly what Antithesis does.
It took that cartridge and it started monkeying with the bits of
the system to make it easier to play the game.
And one of the things that, if I remember right, it offered was it allowed you to save the game
in games that didn't normally support saves.
And that makes it dramatically easier to beat the game, to find all the secrets, to get
to the hidden levels.
That's what determinism gives us for etcd.
It lets us save the game wherever we want.
So when we find one interesting fault and we think we're on the trail of a cool bug,
we save the game
and then we can always restart from there
instead of going back to the beginning.
And I'm curious, because
I worked on Kafka,
you know, I worked on Mesos,
and they're both
pretty much different types of statefulness,
but they're definitely distributed systems.
We run into all kinds of
cluster cascading failures
and replication bugs and it's just
hard to do
those kind of testing, right?
And also, we were smaller teams.
You really aren't able to test everything.
So I still remember the days we had to basically do a bunch of, like,
not super granular, but, like, I guess more like integration tests,
but, like, different scenarios ourselves to try to, like, catch these weird problems.
I guess my question here is, like, I feel like etcd, Kafka, right, ZooKeeper,
all these systems, they definitely power a lot of things.
You had to be much more careful there.
But I don't think there's that many systems like that.
Right? So you went through all this to build a pretty beefy, you know, hypervisor that can basically, like, mock up any system interaction and, you know, save the world state, like the game state, and just go back anytime. I think that's great. But where do you think this applies beyond the Kafkas and the etcds and the FoundationDBs? Because I think those things, they typically are very low level, very scalable, very critical. But I feel
like that criticalness and scalability is also a spectrum, right?
Absolutely.
Do you want to focus on that set of things, or do you find your folks, or your users or your
customers, are actually not just building Kafkas or not just building ZooKeepers?
What are they building where they actually see this as seriously crucial for them?
That's a great question.
So I think people build all kinds of things where they have a clear idea of correctness in their
minds, but either they're unable to meet their goals or they're not sure how to get the
reliability they want with lower time and engineering investment. So I can give you some examples
of both of those. One really common place where you're actually, from an infra perspective
at least, pretty high up in the stack, but you have really serious correctness guarantees,
is anything involving money. Like if you're building Stripe or Column or
Ramp's, like, internal ledger, that's a place where losing money is a big deal. You can't
have debits without a corresponding credit somewhere. Similar systems that take that to the next level
are places like Knight Capital. Anywhere where you have software that's moving around and investing
money, your correctness guarantees get pretty extreme. Those are places that already tend to be
on board with spending money and spending time to improve reliability.
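As a sketch of what those correctness guarantees can look like as a single test, here is a minimal property-based check of the debits-match-credits invariant using the Hypothesis library. It is a generic illustration with made-up account names, not how any particular ledger (or Antithesis) actually tests:

```python
from hypothesis import given, strategies as st

def post_transfer(ledger: dict, src: str, dst: str, amount: int) -> None:
    """Record a transfer as a balanced pair of entries: a debit and a credit."""
    ledger[src] = ledger.get(src, 0) - amount
    ledger[dst] = ledger.get(dst, 0) + amount

accounts = st.sampled_from(["checking", "savings", "fees", "suspense"])
transfers = st.tuples(accounts, accounts, st.integers(min_value=1, max_value=10_000))

@given(st.lists(transfers, max_size=200))
def test_every_debit_has_a_matching_credit(history):
    ledger: dict = {}
    for src, dst, amount in history:
        post_transfer(ledger, src, dst, amount)
    # The invariant: no matter what sequence of transfers happened,
    # the ledger as a whole must net to zero.
    assert sum(ledger.values()) == 0

test_every_debit_has_a_matching_credit()
```

The point is less the particular assertion than the shape of it: one generic property stands in for a pile of hand-written example cases, which is the same shift in testing style that comes up later in the conversation.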
The bulk of software, which I think is probably what you're hinting at,
it's like business software, right, where you say, yeah, like, you know, I open the Uber
app, I click the get-me-a-ride button, and, like, I would love it if a car showed up and picked
me up.
Like, if it's not working, just like swipe that app away, open up Lyft, open up Waymo, and
like, surely somebody will get me a car.
Or I'll hop on the bus.
It'll be fine. Well, I can tell you when I was at Uber that our perception of that,
right, was that this is a crisis. Like, if you are not taking Uber, you probably are taking a Lyft.
And that is a big problem for us as a business. And we have some clear guarantees in our minds
about what it looks like to get a trip. And no, it's not a database, but it is sort of one business
transaction. And the idea is that if we dispatch you a car, that trip has to end in one of
a handful of states eventually, very, very reliably. And that's where something like
Temporal gets born, right? And any system you build on top of Temporal needs to obey those same
guarantees. The trip starts, either it gets canceled, or eventually the trip ends and you are billed
and we collect the payment. That's a guarantee that took years and untold numbers of millions of
dollars of investment to actually get right at Uber. I assume that
the same thing is true of Lyft and of every company like this.
The same thing is true of every sufficiently complicated web application.
Even just on the front-end side.
It's like a video game.
There's so much state.
There's so many little user journeys you can take.
And it's so common now to find places in the app that are just completely broken.
Like, well, I click these six links.
I'm on this page and my cart disappeared.
I just can't get to it.
Let me hard-refresh this,
see if I can make it work now. If it's cheap and easy to fix that, I think the market for this
is enormous. Everybody wants their software to be best in class to delight their users and to work
reliably. It's just that not every problem is worth spending $50 million over 10 years to make
completely bulletproof. You know, I spent three-ish years at Salesforce in 2012 through 2015.
And I, at the time, spent a lot of time inside the organization.
And the thing about Salesforce is it's described by most people as a giant database in the sky
that has, like, a drag-and-drop user interface.
It's very extensible.
But basically someone slapped a UI over an ACID-compliant database and then made it extensible.
And that's created a $350 billion company.
And the core thing about Salesforce that it actually sold to people was ACID compliance:
that if you did something in the UI, it actually happened.
And there were guarantees around that.
And there were guarantees around different pieces of code
or different things or rules or whatever
that could be run prior to that transaction occurring.
And at the time, this is what they called Apex and Salesforce apps.
And that was revolutionary for that type of customer
because it put programming in the hands of less sophisticated people,
but enabled them to basically become programmers.
And they have this whole community, the Salesforce
Admins community, which is hundreds of thousands of people whose entire identity is basically,
well, I'm a Salesforce admin, which is a type of developer.
But at the core of it, it always comes down to this sort of property, which is one
of the reasons it's very difficult for Salesforce to change anything about what it does:
it must guarantee that certain types of things occur in an ACID-compliant way inside
the context of the Salesforce instance.
And it strikes me to think, you know, we're entering a world with agents.
We have things like Temporal and durable workflows,
we have the rise of things like sandboxes.
We're getting to a world where it's not just, like, a dream
where we can spin up many ephemeral instances of a piece of software
and we can test it and figure out which is the right one
and do tons of mutations in the code and figure out, you know,
like, we're getting there.
And so it would be really great to hear from you
how you think about how these durable frameworks
like Temporal or others
and things like Antithesis fit into this new world, right?
Where we can basically reason and guesstimate and describe intent to a model,
and it can spit out a really good suggestion, and what the future of agentic coding looks
like, but also what the future of crafting systems looks like, knowing that we have this thing
that's pretty amazing, it's very non-deterministic, but ultimately what we actually want
from software most of the time is very deterministic behavior.
How did you all approach this at Salesforce?
Because in theory, underneath Salesforce is a
database that's ACID-compliant.
So, like, why is it so hard to make Salesforce just do what its infrastructure does?
I mean, at the core of it, it comes down to scale, complexity, and the number of use cases, right?
And so, you know, a lot of people sit back and think, well, the truth of what made Salesforce so valuable is, yeah, it's a database in the sky.
But they made it an incredibly extensible database in the sky where you could bring your own data.
You could enforce that data model.
You could write hooks around that data in pre-processing steps,
and that was consumable and made possible by, you know,
less sophisticated engineers.
You didn't have to be an Oracle developer.
You didn't have to, you know, have a PhD to do it
or understand ACID compliance.
It just worked that way.
So ultimately, if you listen to Benioff,
what Salesforce really sells to its customers,
first and foremost, is trust: we're going to
put something in your hands,
and then we're not going to let you do something that would violate trust,
and we're not going to let your data get leaked,
and we're not going to let you lose your data
because this is your most important data
because it's your business data
and it's how you run your business,
and that's basically what Salesforce sells.
And at the core of it,
I mean, Salesforce is a giant Oracle instance,
but there have been millions and millions
and millions and millions of hours put into scaling it
and architecting it and building systems around it
that make it possible to deliver the guarantees
across the vast number of use cases that Salesforce supports for the biggest customers in the world
running vast quantities of data.
All of those dimensions just mentioned are actually the thing that makes Salesforce valuable
in the same way that Twitter was never just a MySQL database.
It was actually a real-time feed, and so you could not replicate Twitter at scale with just
a Postgres instance or MySQL.
You actually had to go build a very complicated distributed system dealing with the different
shapes and formulation of the data based on how you wanted to query it.
And Salesforce is very much the same thing.
That makes a ton of sense to me.
And I think that you're right, that that leads us very directly to things like temporal
and now the many other durable workflow frameworks that are out there.
And then even more so to LLM authored kind of ephemeral code.
One of the things that I would imagine makes Salesforce somewhat tractable as a product
is that you're not letting users write arbitrary database operations.
They still have the constraints of the Salesforce user interface.
They're allowed to plug in in certain places using kind of a visual or a programming language that has some constraints around it.
And then you test really carefully to make sure that given that structure, that the platform guarantees still hold true.
The same kind of thing is true of Temporal.
A Temporal workflow, right, is divided into two separate types of things.
There's glue code that can be non-deterministic,
but then there's the core of the business logic,
which you're just required to make deterministic.
There's not a ton of help.
There are not really any safety rails there.
You just have to do it properly.
And if you don't do it properly,
the guarantees of Temporal, of the whole system, just fall apart.
There are a ton of problems that come up in engineering at scale
and over time,
where your temporal workflows might be perfect at any given commit.
But when you deploy the next commit, you break everything because you made a change that's not compatible with the old workflows.
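For readers who haven't used Temporal, a minimal sketch of that split in Temporal's Python SDK might look like the following, with `TripWorkflow` and `charge_card` as hypothetical names. The non-deterministic work (a network call to some payment provider) lives in an activity, while the workflow function stays deterministic so Temporal can replay it:

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def charge_card(trip_id: str) -> str:
    # Activities may do non-deterministic work: network calls, clocks, randomness.
    # (The payment call itself is omitted; this is a placeholder.)
    return f"receipt-for-{trip_id}"

@workflow.defn
class TripWorkflow:
    @workflow.run
    async def run(self, trip_id: str) -> str:
        # Workflow code must stay deterministic so it can be replayed:
        # no direct I/O, wall-clock reads, or unseeded randomness here.
        return await workflow.execute_activity(
            charge_card,
            trip_id,
            start_to_close_timeout=timedelta(minutes=5),
        )
```

The framework can only make the "every trip eventually ends in a known state" style of guarantee because everything non-deterministic has been pushed out into activities, which is the discipline being described above.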
This is even harder in what you're describing as an agentic coding environment, where now you're trying to provide guarantees around arbitrary code that's doing who knows what.
I think it's always helpful with those kind of systems to have a clear sense of, hey, for the platform I'm building, what are the bedrock guarantees I provide?
If you are Salesforce, one of your bedrock guarantees might be,
no matter what code, our new LLM integration is writing and plugging into Salesforce,
agents are not able to access data that the user who's invoking the agent cannot access.
Like period, the end, that is fundamental to the trust, as you say, that Salesforce is selling.
Well, that's a property that, if you look at it the right way, is a lot like ACID compliance
and would benefit from being really exhaustively tested
without having to write 800 million flaky integration tests.
They're like, what about this bag of code?
What about this one?
What about this one?
What about this table of data within Salesforce?
What about this other one?
You want this to be a generic safety property
that you kind of smear across everything in your platform.
I think that's the way that we see this sort of testing evolving,
that no matter who
is writing the code or where it's coming from, the faster the code is changing and the more of
it you have, the more you need to up-level your testing strategy to just be higher octane to keep up.
LLMs are the latest step change in how quickly we're producing code and producing business logic.
And to keep up with that, we just need tests that speak more loudly, that are more expressive,
that provide stronger safety guarantees
with lower investment of human time.
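A minimal sketch of one such bedrock guarantee expressed as a single generic property might look like this; every name here (`run_agent_action`, `user_can_read`, the toy ACL) is a hypothetical placeholder, not anything from Salesforce or Antithesis:

```python
import random

# Hypothetical platform model: records, per-user ACLs, and an "agent" that
# executes some action on behalf of a user.
RECORDS = {f"rec-{i}": f"secret-{i}" for i in range(100)}
ACL = {
    "alice": {f"rec-{i}" for i in range(0, 50)},
    "bob": {f"rec-{i}" for i in range(25, 100)},
}

def user_can_read(user: str, record_id: str) -> bool:
    return record_id in ACL.get(user, set())

def run_agent_action(user: str, rng: random.Random) -> dict:
    """Stand-in for LLM-generated code: touches a random pile of records."""
    wanted = rng.sample(sorted(RECORDS), k=10)
    # This toy version filters correctly; the property below is what would
    # catch a version that forgot to.
    return {r: RECORDS[r] for r in wanted if user_can_read(user, r)}

def check_no_data_leaks(seed: int, runs: int = 1_000) -> None:
    rng = random.Random(seed)
    for _ in range(runs):
        user = rng.choice(["alice", "bob"])
        returned = run_agent_action(user, rng)
        # The one generic property, "smeared" across every action:
        # nothing comes back that the invoking user couldn't read directly.
        assert all(user_can_read(user, r) for r in returned), (user, returned)

check_no_data_leaks(seed=0)
```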
So I think back to my question originally,
I talked about how the workloads of the FoundationDBs
and etcds are so specialized, right?
What usually comes with it is the team is quite specialized, right?
Not every engineer actually understands
everything that's working beneath it.
That's right.
Like I said earlier, your Gen Zs never even log into Amazon anymore.
That's right.
What's happening?
You know, we're talking hypervisors.
Now we have, like, so many layers of things,
people are completely abstracted away,
and LLMs are making it even worse, right?
They may not even know what code looks like,
which is something that I'm seeing happening.
And so I'm just curious, for you, working with your users or customers,
how does it actually really work to adopt Antithesis in the first place?
Because you're such a low-level framework at some level,
and you do such low-level system interaction mocking and determinism,
that might still be very daunting.
Like, okay, for me to figure out,
okay, how do I mock my syscalls
and that kind of stuff?
It's just too much.
But the abstraction layers on top are quite enormous,
potentially, when it comes to, like, ease of use
or programmability or something like that.
So how do you find a path into a company
that makes it not feel like I'm buying a new,
crazy, you know, RISC architecture and CPU where I have to reinvent everything, you know?
Well, that's not what our users actually have to do, right?
Like, we've done all that work for you
in a way that's completely transparent.
You're just writing a Python integration test
the same way you would have otherwise.
It just acquires superpowers
when you run it in our environment.
You don't have to do anything
that looks totally crazy to you.
But Tim, there is something kind of lying
underneath what you're saying
that we should talk about directly.
I think everybody
said all the same stuff
when the kids were writing
C instead of assembly.
And they said all the same stuff
when the kids started writing Java instead of C++, like, oh my God, the horror,
they don't even know what malloc is.
And we said the same thing when everyone was writing Python.
You're like, oh, my God, these kids, there are no types.
What is this?
And one of the things that I love about startups and technology in Silicon Valley, that I think
we have lost a bit of over the last 10 years, maybe 15 years,
is that it's a place where young people get to come in with
new, dangerous ideas and throw away all the stuff that old people like me thought was really good
and important and critical and see what shakes out of that. And I'm really excited for it.
The joke about Python is that it was executable pseudocode.
Let's push further. Like, what's next? I don't think the fundamentals of engineering
change. Whether you're writing Python or driving an LLM
to write whatever, it turns out you have to have some way to know whether it worked or not.
And if you want to build a product around it, you need to know whether it's working over time.
And yeah, we used to write manual test plans and have a QA team.
And then we added in automated testing.
And that was a whole movement.
You know, we were writing books about continuous integration.
There was sort of a cult that formed around test-driven development and unit testing.
All of those were technological changes and cultural changes.
And I think we're looking at another big technological change
that will have to come with a big cultural change in engineering practices.
And I think just like an LLM allows you to express your product,
like what your code should do, in something that is very close to kind of a stilted form of natural language,
the sorts of tests that Antithesis encourages you to write are also closer to kind of a stilted form of natural language.
Instead of saying, for a database at least, like, hey, if I run, you know, select star from this table with this, like, WHERE clause, I'm going to get exactly these rows back,
an Antithesis test might say, hey, if I run any random SQL query, just make one up for me, test framework,
and I'm running them all concurrently,
none of them should see the other one's effects.
Period.
Computer, please fill in the blanks for me.
And that now feels good to me.
That feels like my tests are operating
at kind of the same level of abstraction
that my code is operating at.
And that's where we all feel really comfortable, right?
We're kind of, we have the same expressive power
on the coding side and on the verification side.
That's part of why I'm really bullish
about this approach to testing
and about what Antithesis is doing.
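A toy illustration of that style of test, in plain Python and sqlite3 rather than the Antithesis framework, and checking a simpler invariant (money is conserved under random commits and rollbacks) instead of full isolation:

```python
import random
import sqlite3

def run_property_check(seed: int, steps: int = 500) -> None:
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    names = ["a", "b", "c", "d"]
    db.executemany("INSERT INTO accounts VALUES (?, 250)", [(n,) for n in names])
    db.commit()
    expected_total = 4 * 250

    for _ in range(steps):
        src, dst = rng.sample(names, 2)
        amount = rng.randrange(1, 100)
        # A random transfer inside a transaction...
        db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # ...which we randomly commit or abort, like a client dying mid-flight.
        if rng.random() < 0.2:
            db.rollback()
        else:
            db.commit()
        # The property: no matter what happened, money is never created or destroyed.
        (observed,) = db.execute("SELECT SUM(balance) FROM accounts").fetchone()
        assert observed == expected_total, f"conservation violated with seed={seed}"

run_property_check(seed=7)
```

The test states the invariant and lets randomness supply the cases; Antithesis-style testing makes the same move at the scale of a whole system, with the deterministic environment and the exploration engine supplying the "make one up for me" part.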
Very cool.
Well, you know what's coming, sir.
We want to bring our favorite section called a spicy future.
Spicy future.
Tell us, what do you believe that most people don't believe yet in your world?
So we're all kind of infra people here, but we're living through this renaissance of cloud infrastructure,
everything on S3, agentic ops.
and I don't know how spicy it is,
but I certainly think that there are many companies
and products and people coming up
who are going to run into,
I think, the hard, bitter lesson of infrastructure,
which is that ops really matters.
And you can have the best product,
you can have the most sophisticated database.
But if you are not truly excellent at operating it,
it doesn't matter.
The user experience is terrible.
The reliability is terrible.
And people will flock
to a worse piece of technology
operated by more conscientious engineers.
And I think that will remain true
through this next generation of infrastructure.
If you look at every Postgres wire-compatible
but under-the-hood extremely fancy database,
most of their uptime and reliability guarantees
get completely smoked by companies like PlanetScale,
who basically say,
we're going to run Postgres for you,
we're going to shard it,
we're going to do a bunch of fancy stuff.
But the bedrock of what they offer is really world-class operations.
And to double down on that, what do you see as, like, missing from the younger or the less experienced,
LLM-fancy people doing fancy stuff?
What kind of operational patterns or practices have they just completely lost that maybe you and I both know well, or any specifics, I guess?
Yeah.
I mean, I don't think that
any of these things are specific to them.
I think this is true of all new infrastructure.
The core of good ops is almost always just painstaking thoroughness.
The observability is excellent.
The metrics are excellent.
The log output is understandable and manageably sized.
You've got runbooks for everything.
When you're building, you're thinking up front
about how the software is going to fail
and building a plan to make that failure palatable to you.
And that way of building and developing an operating software
comes from experience in painful and large outages
where people are yelling at you and millions of dollars
are disappearing down the toilet every couple of seconds.
And you're covered in that gross panic sweat.
I think every company gets that eventually,
either through their own experience or by hiring people who have it.
I think, though, that it's underappreciated as a selling point of good software.
I think we all focus a lot on really interesting distributed systems
or really interesting kind of white papers that underlie fancy new systems.
And sometimes we discount the value of the accumulated years or decades of operational experience
with the old system.
That was certainly my experience
the very first time
I had to run a Cassandra cluster.
I was like,
oh, this is going to be great.
It's designed for high availability.
It'll just stay up.
And that, of course, was completely wrong.
It was down all the time.
And it's because I had no idea what I was doing.
I read the book.
I read the giant O'Reilly book.
I read the Man Pages.
But it just didn't help that much.
And given how much AI coding has changed
everyone's daily, you know,
practice of coding,
do you think there's also a possibility for AI to kind of change the operations side as well,
like a Cursor for ops type of thing, or an LLM for ops?
I don't think 100% replacement is possible.
But I think this is a world where, like, we don't even know yet, right?
I haven't really seen, like, an ops copilot or something like that truly pervasive yet.
Like, what do you think is possible?
I don't know.
I think that's a really good question, you know.
And I am kind of hoping that this is finally the moment, at least from my perspective,
that somebody finds a really good use case for distributed tracing.
I think part of what makes LLMs so effective at writing code,
like way more effective than I would have expected five or seven years ago,
is that as an industry, we have been very diligently producing
a lot of really excellent open source code for it to train on.
I don't think we've been doing that for post-mortems,
for root cause analysis, for all the bug reports,
and pull requests that we made to fix various
bugs. We just haven't really done that. And so I'm not sure where the corpus of data to learn from
would come from there. I'm by no means a machine learning expert. I'm a little unsure why I would
expect a large language model to be the right approach to that problem instead of some older
machine learning approaches that are not like purely ingesting post-mortems, I guess. It feels like a very
differently shaped problem that is not really about token prediction. I don't know. Tim,
you probably know more about this than I do. What do you think? Well, I know enough that I don't
know how to predict the future anymore, so that's really... Wait, so why are we doing this segment?
To get your take, because you're supposed to be the expert on the hot seat, not us. But, you know, like,
funny enough, I feel like, like you said, there is no way to be 100%. I don't think there's a way to
solve the problem completely. Even
AI coding agents haven't solved every problem,
but they definitely changed the practice.
So I think there is certainly
a very high interest for us to even
figure out, like, what the next future is.
Because tracing, like you said, was
almost like an unusable
data source. Like, it's very usable
only in certain cases and for certain types of teams.
But if LLMs are getting so good,
there is
the data, plus maybe
even, like, a UX problem
to it as well. And, like, the
model may or may not have to solve everything at once. Because I think there has to be a human
in the loop in a very real way. But, like, today it's either an AI SRE that's going to solve everything
for you, or you're back to the caves, right? That's fair. In the middle, there's nothing,
right? Yeah. Tim, I do want to walk back what I said before a little bit. I'm reflecting
on my own operational career. And actually, I will say there was one stretch of my career where I was
rolling out this new production-critical system,
and it was me and one other engineer,
mostly working on it.
The other engineer was just one of the most brilliant, competent people
that I've ever worked with.
And my on-call routine, actually,
was this was long enough ago
that you accessed production through a set of individually named jump boxes.
Oh, yeah, yeah.
And my on-call unlock was that I figured out
which jump box was his favorite.
And so whenever I was on call,
I would go to that box
and I would use the Unix
su command to switch to his user,
and then I would Ctrl-R
through his shell history.
What did he do last time he was on call?
Probably whatever I need to do is in here somewhere.
Let me think hard about that.
That feels like something that LLMs could do.
Maybe Warp is doing this already.
I think the other thing also is that
if you talk to a lot of really seasoned SREs,
they will kind of tell you, I think,
that almost universally
the right answer to a
production outage is to roll back one of the things that recently changed.
And ideally to just roll back a lot of it.
Like, whatever has changed since the last time everything was working, just roll it all
back and see if that fixes it.
Bringing things back to antithesis, like, one of the things that we often test for our
customers is the safety of cluster upgrades and rollbacks.
And like, what does the world look like in that mixed version state?
And when you roll back, does everything go back to operating properly?
If you had that as a system guarantee, what an agent or what a human needs to do is very straightforward.
Every outage, you just roll back everything that has changed in the last hour and then wait and see.
And that's not even really AI. That's more of a shell script.
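A hedged sketch of that "more of a shell script" playbook in Python; `list_recent_deploys` and `rollback` are hypothetical placeholders for whatever your deploy tooling actually exposes, and the whole thing only makes sense if rollbacks have already been tested to be safe:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical deploy-tooling hooks; substitute whatever your CD system exposes.
def list_recent_deploys(since: datetime) -> list:
    """Return deploys newer than `since`, newest first,
    e.g. [{"service": "api", "version": "v42"}, ...]."""
    raise NotImplementedError

def rollback(deploy: dict) -> None:
    """Revert one deploy to its previous version."""
    raise NotImplementedError

def mitigate_outage(window: timedelta = timedelta(hours=1)) -> None:
    """The blunt playbook: revert everything that changed recently, then wait and watch."""
    since = datetime.now(timezone.utc) - window
    for deploy in list_recent_deploys(since):   # newest change rolled back first
        print(f"rolling back {deploy['service']} from {deploy['version']}")
        rollback(deploy)
```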
But that means running your stuff, not just in testing anymore, right?
No, it just means that all of the software that you're deploying has been tested to roll back properly.
Got it, got it.
So that you don't have to stop it.
Exactly. You don't have to stop and think like, hey, is this rollback safe?
Just make sure that you actually have rollback, which is also not easy.
That's true.
You're making it sound like, since you all have the right tools and the right people,
I'm just going to make sure they all run.
But realistically, how many products actually have safe rollback everywhere, and even global
rollbacks anywhere? That ain't easy at all.
Yeah, no, that is not easy at all.
I mean, it's not easy.
Yeah, yeah, because it's not just safe
state. Sometimes it's just, like, flakiness. You know, I'm running on Amazon. Amazon doesn't stay up all
the time. Are we at the right time, right mood, right vibes? I feel like we always have
vibes. Like, forever since I've been an engineer, it's like, today, I think, November 2019,
is EBS still running okay right now? It's like a seasonal thing, you know? Like, do we have COVID
or flu right now? Like, it is weird, though, because you have no control of the hardware we're running on
anymore. And even if we have the hardware boxes, you don't even feel like you have full control
either. And so we've been really trying to kind of, like, play with vibes even more and more
over time. And so I don't think we always trust that the rollbacks will always work, right?
That's true too, for sure. So it's always been just so interesting to watch how the industry
just hides behind one more abstraction when we haven't felt like we fixed anything down at the lower level,
but it works good enough, right?
It works good enough, you know?
I think it works until it doesn't, you know?
Part of building things and, like, improving reliability
and improving correctness for this stuff,
at least some of that investment has to go to fixing the bottom layer
and working your way up.
Yeah, yeah, which is really, I think,
where what you guys are doing is really amazing in this layer.
Cool.
Well, we have so many things we could talk about,
but based on time, the last question is,
where can people find you and Antithesis?
Where is a place they can maybe learn more
or reach out if they're very interested in using
what you guys are doing? I think the best place
to go is the Antithesis website.
So it's just antithesis.com
and we've given
a bunch of
fun talks, often
featuring old school Nintendo games
and you can find those on YouTube.
Our handle is @AntithesisHQ.
Amazing. Well, thank you so much for
being on the pod, and I hope you had fun.
Absolutely. It was great to see you.
Thanks, Tim. Thanks, Ian.
