The Infra Pod - Building a bug-free vibe coding world (Chat with Akshay from Antithesis)
Episode Date: January 12, 2026

In this episode of the Infra Pod, hosts Ian Livingston (Keycard) and Tim Chen (Essence VC) interview Akshay Shah, Field CTO of Antithesis, diving deep into the world of distributed systems, reliability, and the future of software testing. The conversation covers the challenges of building bug-free distributed systems, the story behind Antithesis, lessons from major outages, and the evolving landscape of infrastructure and AI-driven operations.

Timeline with Timestamps:
00:00 – Introduction & guest background
02:00 – What Antithesis does and why it matters
06:00 – Real-world impact: Testing distributed systems (etcd, Kubernetes)
09:00 – Major outages & lessons learned (AWS, Knight Capital)
12:00 – The origins and philosophy behind Antithesis
16:00 – The future of reliability, testing, and AI in infrastructure
28:00 – Closing thoughts & where to learn more

Links:
Learn more about Antithesis: https://antithesis.com
Antithesis on YouTube: @AntithesisHQ
Transcript
Welcome to the InfraPod, Tim at Essence VC and Ian let's go.
Hey, Tim. This is Ian Livingston. Super excited. Builder of trusted agent software,
making identity cool again at Keycard. Today we are joined by the Field CTO of Antithesis,
Akshay Shah. Akshay, tell us about what in the world Antithesis is, what they do,
and maybe you can tell us a little bit about what a field CTO is, for all of us trying to figure it out at home.
Hey, Tim and Ian, thanks for having me on.
At Antithesis, we just built something very simple.
We build the best way to ship bug-free distributed systems.
So if you're building some multi-node system and you're storing some data,
you probably don't want to lose it.
And we build software that helps you make sure that your system does what it says it's going to do.
And my job here is to be field CTO, which is kind of an ever-evolving mix of sales and
marketing and Devrel and product.
And I think it's a nice title for someone who has a technical background, but wants to
pitch in on the business side of things.
And wear a lot of hats.
Amazing.
And what got you to want to join Antithesis?
What was it about it,
what Antithesis is doing,
that made you want to jump on their journey and join the wave that they're building?
Well, if you've ever had to build a new distributed system from scratch,
you face this very awkward moment very early in the project where you're writing a lot of code
that is extremely difficult to get under test. And usually that code is in some error handling block.
It's like, oh, if I try and contact the other node and I can't, what do I do? Or if I get a block of
data back from another node in my system and the checksum doesn't match, what do I do now?
And those things tend to be very, very hard to test because they're not on the happy path.
And in order to get there, you need to introduce some error into your system.
Packet loss or a network partition or bad hard drive or bad CPU scheduling or a node that has way too many noisy neighbors, something like that.
Up until Antithesis launched, the best way for you to test that stuff was with Jepsen.
Jepsen's a really effective framework, but it's a lot. It's a lot of Lisp. It's non-deterministic.
It's kind of painful to work with.
And so the last time I was doing this,
I was starting to build a new distributed thing from scratch.
I had dusted off my parentheses and my, you know,
Paredit config.
And I was starting to write my Jepson tests.
And that's when Antithesis came out of stealth.
And so I called them right away on day one.
And I think the account exec who picked up my phone was a little taken aback
because my basic message to him was,
I don't believe that any of this works.
This sounds like nonsense.
Nonetheless, I'm desperate, and I have my credit card.
I don't really want to hear the pitch.
I just want to buy this thing, and I want a three-month cancellation clause,
and then let's just get going.
And it turned out to be amazing.
The product was super effective, and it was so incredible to me
that I wanted to join this company's journey.
Can you help us at home understand,
like, what are the examples or use cases Antithesis helps you solve,
like, the distributed systems or bugs or complex things?
And why, in that specific situation,
is it so fundamentally different than what came before,
and how does it empower people to build trusted distributed systems?
Absolutely.
I think the easiest place to start
is with a concrete piece of software that we're testing today.
So Antithesis helps the CNCF test etcd.
And if you don't know what that is,
etcd is a key-value database that is distributed
and strongly consistent.
And it is the heart of storing state in a Kubernetes cluster.
So that means every time you do anything in Kubernetes, or a new machine comes up, or a new deployment goes through,
the critical path of that is going through etcd.
And the etcd project has a long history.
It started many years ago at CoreOS and then became part of Kubernetes.
And so over the years, it has gotten bigger and more complicated.
And it's now in this linchpin position in the infrastructure world
that it wasn't in when they first built it and designed it.
And over the years,
etcd started to get the kind of bug reports
that every distributed system gets,
that every engineer hates to see,
where someone pipes up on GitHub and they just say,
hey, look, I don't really know what's going on,
but I'm running this thing,
and I'm getting these weird error messages,
and then all of a sudden some of my data seems to just disappear.
Or I've got a client connected to etcd,
and it's just not receiving some of the data
that it's supposed to.
And I can't really reproduce it,
but it happens every so often.
I'm sure it's happening.
And as a maintainer, you come and you look at that,
and you're like, what am I supposed to do with that?
It happens sometimes in my cluster once a month,
which is pretty often,
but I can't run your cluster for a month
just to try and reproduce this bug.
And so that stuff just piles up and it sits there.
Google invested quite a lot of time and money and effort
in improving etcd's testing.
They did an amazing job at it.
They squashed a whole bunch of bugs,
but there were still some bugs outstanding.
And that is stuff like I described,
like clients are just not getting the data that they're supposed to.
And what we did is we came in and we helped etcd,
and one of the etcd maintainers,
take the tests that they had and run them in the Antithesis environment,
which is special because it is
perfectly deterministic.
So anytime you find a test failure,
it immediately becomes perfectly reproducible every single time.
And we pair that reproducibility with a really powerful exploration engine.
You can think of it a little bit like a fuzzer.
And what that does is it finds the deepest, gnarliest corners of your code
and makes sure that they work as they're expected to.
When you pair those together, what you end up with is
a small number of integration tests in your codebase that punch way above their weight.
With a couple of tests and with this fancy exploration engine and deterministic environment,
you're able to really thoroughly test extremely complicated systems.
We can talk about what that exploration engine does or what it means to explore a program
or some kind of analogs for this sort of thing.
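To make the pairing of determinism and exploration concrete, here is a minimal Python sketch. It is a toy user-space analogue, not the Antithesis SDK or hypervisor: a seeded explorer drives a deliberately buggy key-value store, and because every choice flows from the seed, any failing run replays exactly.

```python
import random

class ToyKVStore:
    """A deliberately buggy in-memory key-value store."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Injected bug: reads occasionally return nothing, mimicking the
        # "client doesn't see its data" class of bug report.
        if random.random() < 0.001:
            return None
        return self.data.get(key)

def explore(seed, steps=10_000):
    """Drive random operations from a seed; return the failing step, if any."""
    random.seed(seed)                    # all randomness flows from the seed
    store, model = ToyKVStore(), {}
    for step in range(steps):
        key = f"k{random.randrange(16)}"
        if random.random() < 0.5:
            value = random.randrange(1_000)
            store.put(key, value)
            model[key] = value
        else:
            # Property: a read must match a simple in-memory model.
            if store.get(key) != model.get(key):
                return step
    return None

# Search seeds; any failure is perfectly reproducible from its seed.
for seed in range(200):
    failed_at = explore(seed)
    if failed_at is not None:
        print(f"seed={seed} violates the property at step {failed_at}")
        assert explore(seed) == failed_at  # deterministic replay of the failure
        break
```

Antithesis gets the same effect without the toy constraints: the determinism comes from the hypervisor, so ordinary, unmodified software gets this kind of reproducibility for free.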
Where do you want to go from here?
I mean, I think it would be really useful to also walk through, like, why?
What's the cost of these kinds of bugs to infrastructure, to the applications we build,
how does it impact reliability, what types of use cases can we tackle, right?
Because I think to many, maybe not the listeners of this podcast, let's say less technical people,
things like ACID-compliant transactions and, you know, seven-nines or five-nines or four-nines
or three-nines availability.
These don't mean anything.
They don't understand the impact of what it takes
to make systems of that reliability
and the consequences of a system
that doesn't have those levels of durability,
reliability, and high availability.
So could you help everybody understand:
why does this matter?
Right?
Like what's the fundamental dollar sign reason
for the need for these types of things
in a world that's increasingly run by computers?
That is a great question.
Let me ask you a question in return.
Are we totally tired and done
talking about the AWS outage
on this podcast?
We haven't even talked about it yet.
Oh, my God.
I mean, Tim might be tired because he's talked about it enough, but I'm not ready.
So, yeah, let's go for it.
People never even log on to Amazon.
That's the joke I have.
Gen Z people have never even seen the console.
We're just getting started, man.
Like, people looking like us probably got tired of it, but you might even have to introduce what Amazon
is for some people.
Okay, well, you know, if you haven't heard of it, Amazon.com is a bookshop on the internet.
Amazing stuff.
And they spun off this cloud computing division, and so it turns out tons of the stuff you do every day, from your smart light bulbs to credit card swipes at your favorite store, to booking a hotel room, whatever.
It all runs on Amazon's cloud computing thing.
And for a bunch of reasons that, at least in my opinion, are actually pretty reasonable,
all of AWS still has a dependency on this one system deployed in one small set of buildings in one
little region in Virginia.
And there was a bug in that system.
It was a very complicated bug.
It was very deep.
A bunch of things had to go wrong in just the right way.
And that system broke.
What was it?
A couple of weeks ago.
And when that system broke,
it got all of AWS
into what's called a metastable failure.
And that just means that it's broken,
but it's broken in a way
where it wants to stay broken.
It doesn't want to heal.
And that just took down
this enormous swath of the internet. Credit card payments were down, checkout on a million stores,
like online websites were down, websites were down, all sorts of stuff that you wouldn't expect
to even have an internet dependence. Turns out, like, you can't open your smart garage door,
your alarm system doesn't work, all kinds of things are down. This has this enormous cascading
economic impact. That is a gigantic example, but there are tons of small examples too.
Knight Capital years ago had a bug in one of their high-frequency trading systems.
So again, a large and complicated piece of software that encountered a particular situation
that the people testing that software hadn't thought to test, and it just went awry and blew
up the whole company in less than an hour.
This stuff happens all the time.
That's the outage side of things, where obviously if you have a sufficiently large outage,
it costs money.
I think the less recognized side of reliability is that even if you don't need a system that is incredibly high reliability, when people talk about three-nines or five-nines, they're really talking about how many seconds or minutes or hours per year can your system be down.
And five-nines is a very, very high bar. It means you have seconds of downtime a year. Most people are not building systems like that.
So maybe you can happily tolerate hours or even a day of downtime per year.
But what you're expecting to get in return for that relaxed uptime requirement is going faster.
You want to say, I want to ship more features, I want to deliver more value to my users,
because I don't have to be quite so painstaking about every little bit of code.
Well, having a more effective way to test actually lets you deliver on that,
because you can have a handful of tests and be totally sure that you are going to hit your uptime guarantees,
even with the most outlandishly aggressive plans, the biggest refactors, the most gigantic new features that you want to ship,
you can actually put your pedal to the metal and go as fast as your team can without worrying that you're going to tank your quality.
So I think testing and reliability gets you both.
It avoids big public outages where you're paying out for SLA breaches,
and it buys you more revenue, more value delivered, and more product velocity.
So I think when we all think of your company, we all think of the very first blog post
that announced the company, which says, we've been in stealth for five years, right?
I mean, it kind of just stuck in my head and stuck in everyone's head for the biggest reason,
because it's not common to see that.
So I know you only joined, like, not that long ago, but I'm sure you heard and know the story.
So tell us, like, what has the birth of this company been like?
Because it was the FoundationDB team, right?
That's right.
Coming together to build this sort of, like, pretty interesting low-level hypervisor
that can do deterministic executions and stuff like that.
You know, do you recommend everybody else to do this?
Yeah.
Well, the short answer is, no, I certainly do not recommend that everybody go down this road.
Tim, you of course know that
paying a team's salary for five years
requires a certain amount of fundraising chutzpah,
which not everybody has.
But you're right to say that the origins of Antithesis
are back in FoundationDB.
And FoundationDB was a while ago now,
so a lot of people haven't heard of it.
It was one of the first
strongly consistent distributed databases.
And all that means is that
it's a massively scalable multi-machine database
that feels like a regular single node database.
The answers you get back are always correct
from an application developer's perspective,
which is a useful property for a database.
It's very hard to work with situations
when that's not the case.
And they proved that you could do this,
be fully correct,
and remain incredibly highly available.
And they're kind of famous for only shipping one bug
to users ever before the acquisition.
And commercially, the story of FoundationDB, I think is pretty amazing.
They built this software, they launched it to the world, and their two big lighthouse customers were Snowflake and Apple.
And at least from the outside perspective, it seems very clear that Apple bought this software, looked at it and said, oh my God, we're going to end up building our whole business on this database.
We can't possibly buy this from a vendor.
And so shortly after, they just rolled in
and bought FoundationDB lock, stock, and barrel.
The key to FoundationDB's feature velocity and reliability
was their really idiosyncratic approach to testing.
They built the entire database to be completely deterministic
and to be explored by this kind of software fuzzer
very similar to the way Antithesis works,
but they did it in their application code.
And that meant that FoundationDB could have no external dependencies.
No Postgres, no ZooKeeper, no etcd, no S3, none of that,
because it would break all of their testing.
So after they succeeded, all the people there went on to careers at Apple and Google and
Meta and all the places you'd expect high-flying distributed systems people to go.
Then they got back together and they said, you know, we have now been on our walkabout through the industry,
and nobody tests the way FoundationDB did.
They're all doing it the wrong way,
and they're paying the cost in slow feature delivery
and bug-ridden software.
There must be a way to take this approach from FDB
and package it up so that it's usable by everyone
without completely changing your application code to accommodate it.
And they said, of course, the answer is obvious.
We must build our own virtual machine
that is perfectly deterministic.
And then you can take any software you like
and run it in our VM
and it will get all of these magical properties for free.
And that, it turns out, is a very big undertaking.
We use the word hypervisor a lot.
Virtual machine is another good word for it.
But if you're an application developer
and you're writing your apps and Python or Java or Go,
you have to go down several layers
to get to where antithesis really begins.
So we're not writing a Go program.
We're not writing in the Linux user space.
We're not plugging in at the Linux kernel.
We're going one layer below that.
And we're writing, we're emulating the layer where the kernel interacts with the hardware.
And we're intercepting those calls.
Some of them get passed through to the real hardware because they are deterministic by nature.
Others, we sort of intercept and fake out.
And that's stuff like, what's the current time?
How do I schedule this thread?
Give me some randomness that normally is seeded and provided by the hardware.
We intercept all of that and make it perfectly deterministic.
And that's the basis of really powerful exploration.
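Antithesis does that interception below the kernel, but a rough user-space analogue in Python, purely illustrative and with made-up names, is to route every source of nondeterminism through one seeded, controllable object:

```python
import random
from dataclasses import dataclass, field

@dataclass
class FakeEnvironment:
    """Toy stand-in for the nondeterministic inputs a hypervisor would intercept."""
    seed: int
    now: float = 0.0
    rng: random.Random = field(init=False)

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def time(self) -> float:
        # Deterministic clock: advances by a seeded but reproducible amount.
        self.now += self.rng.uniform(0.001, 0.050)
        return self.now

    def random_bytes(self, n: int) -> bytes:
        # "Hardware" randomness derived from the seed instead of the OS (Python 3.9+).
        return self.rng.randbytes(n)

    def schedule(self, tasks):
        # Deterministic "thread scheduler": a seeded shuffle instead of the OS's whims.
        order = list(tasks)
        self.rng.shuffle(order)
        return order

# Two environments built from the same seed behave identically,
# which is what makes a failing run replayable.
a, b = FakeEnvironment(seed=42), FakeEnvironment(seed=42)
assert a.random_bytes(8) == b.random_bytes(8)
assert a.schedule(["t1", "t2", "t3"]) == b.schedule(["t1", "t2", "t3"])
assert a.time() == b.time()
```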
You can think of this like playing a Nintendo game.
If you want to build a program that can kind of blindly beat Super Mario Brothers,
if you're of roughly our age, you've had this experience.
You're sitting down, you're playing a video game, you really want to beat it.
But every time you die, you go all the way back to the beginning.
It's a pain in the butt.
And so it's just really, really hard to get to the end of the game.
And back in the day, they actually built a product to fix this.
They called it the game genie.
And it literally was like a physical piece of hardware that would sit between the game cartridge and the console.
And it did exactly what Antithesis does.
It took that cartridge and it started monkeying with the bits of
the system to make it easier to play the game.
And one of the things that, if I remember right, it offered was it allowed you to save the game
in games that didn't normally support saves.
And that makes it dramatically easier to beat the game, to find all the secrets, to get
to the hidden levels.
That's what determinism gives us for etcd.
It lets us save the game wherever we want.
So when we find one interesting fault and we think we're on the trail of a cool bug,
we save the game
and then we can always restart from there
instead of going back to the beginning.
And I'm curious, because
I worked on Kafka,
you know, I worked on Mesos,
and they're both
pretty much different types of statefulness,
but they're definitely distributed systems.
We run into all kinds of
cluster cascading failures
and replication bugs and it's just
hard to do
those kind of testing, right?
And also, we were smaller teams.
You really aren't able to test everything.
So I still remember the days we had to basically do a bunch of, like,
not super granular, but, like, I guess more like integration tests,
but, like, different scenarios ourselves to try to, like, catch these weird problems.
I guess my question here is, like, I feel like etcd, Kafka, right, ZooKeeper,
all these systems, they definitely power a lot of things.
You had to be much more careful there.
But I don't think there's that many systems like that.
Right? So you went through all this to build a pretty beefy, you know, hypervisor that can basically, like, mock up any system interaction and, you know, save the world state, like the game state, and just go back anytime. I think that's great. But where do you think this applies beyond the Kafkas and the etcds and the FoundationDBs? Because I think those things, they typically are very low level, very scalable, very critical. But I feel
like that criticalness and scalability is also a spectrum, right?
Absolutely.
Do you want to focus on that set of things, or do you find your folks, or your users or your
customers, are actually not just building Kafkas or not just building ZooKeepers?
What are they building where they actually see this as seriously crucial for them?
That's a great question.
So I think people build all kinds of things where they have a clear idea of correctness in their
minds, but either they're unable to meet their goals or they're not sure how to get the
reliability they want with lower time and engineering investment. So I can give you some examples
of both of those. One really common place where you're actually, from an infra perspective
at least, pretty high up in the stack, but you have really serious correctness guarantees,
is anything involving money. Like if you're building Stripe or Column or
Ramp's, like, internal ledger, that's a place where losing money is a big deal. You can't
have debits without a corresponding credit somewhere. Similar systems that take that to the next level
are places like Knight Capital. Anywhere where you have software that's moving around and investing
money, your correctness guarantees get pretty extreme. Those are places that already tend to be
on board with spending money and spending time to improve reliability.
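As a sketch of what those correctness guarantees can look like as a single test, here is a minimal property-based check of the debits-match-credits invariant using the Hypothesis library. It is a generic illustration with made-up account names, not how any particular ledger (or Antithesis) actually tests:

```python
from hypothesis import given, strategies as st

def post_transfer(ledger: dict, src: str, dst: str, amount: int) -> None:
    """Record a transfer as a balanced pair of entries: a debit and a credit."""
    ledger[src] = ledger.get(src, 0) - amount
    ledger[dst] = ledger.get(dst, 0) + amount

accounts = st.sampled_from(["checking", "savings", "fees", "suspense"])
transfers = st.tuples(accounts, accounts, st.integers(min_value=1, max_value=10_000))

@given(st.lists(transfers, max_size=200))
def test_every_debit_has_a_matching_credit(history):
    ledger: dict = {}
    for src, dst, amount in history:
        post_transfer(ledger, src, dst, amount)
    # The invariant: no matter what sequence of transfers happened,
    # the ledger as a whole must net to zero.
    assert sum(ledger.values()) == 0

test_every_debit_has_a_matching_credit()
```

The point is less the particular assertion than the shape of it: one generic property stands in for a pile of hand-written example cases, which is the same shift in testing style that comes up later in the conversation.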
The bulk of software, which I think is probably what you're hinting at,
it's like business software, right, where you say, yeah, like, you know, I open the Uber
app, I click the get-me-a-ride button, and, like, I would love it if a car showed up and picked
me up.
Like, if it's not working, just like swipe that app away, open up Lyft, open up Waymo, and
like, surely somebody will get me a car.
Or I'll hop on the bus.
It'll be fine. Well, I can tell you when I was at Uber that our perception of that,
right, was that this is a crisis. Like, if you are not taking Uber, you probably are taking a Lyft.
And that is a big problem for us as a business. And we have some clear guarantees in our minds
about what it looks like to get a trip. And no, it's not a database, but it is sort of one business
transaction. And the idea is that if we dispatch you a car, that trip has to end in one of
a handful of states eventually, very, very reliably. And that's where something like
Temporal gets born, right? And any system you build on top of Temporal needs to obey those same
guarantees. The trip starts, either it gets canceled, or eventually the trip ends and you are billed
and we collect the payment. That's a guarantee that took years and untold numbers of millions of
dollars of investment to actually get right at Uber. I assume that
the same thing is true of Lyft and of every company like this.
The same thing is true of every sufficiently complicated web application.
Even just on the front-end side.
It's like a video game.
There's so much state.
There's so many little user journeys you can take.
And it's so common now to find places in the app that are just completely broken.
Like, well, I click these six links.
I'm on this page and my cart disappeared.
I just can't get to it.
Let me hard-refresh this,
see if I can make it work now. If it's cheap and easy to fix that, I think the market for this
is enormous. Everybody wants their software to be best in class to delight their users and to work
reliably. It's just that not every problem is worth spending $50 million over 10 years to make
completely bulletproof. You know, I spent three-ish years at Salesforce in 2012 through 2015.
And I, at the time, spent a lot of time inside the organization.
And the thing about Salesforce is it's described by most people as a giant database in the sky
that has, like, a drag-and-drop user interface.
It's very extensible.
But basically someone slapped a UI over an ACID-compliant database and then made it extensible.
And that's created a $350 billion company.
And the core thing about Salesforce that it actually sold to people was ACID compliance:
that if you did something in the UI, it actually happened.
And there were guarantees around that.
And there were guarantees around different pieces of code
or different things or rules or whatever
that could be run prior to that transaction occurring.
And at the time, this is what they called Apex and Salesforce apps.
And that was revolutionary for that type of customer
because it put programming in the hands of less sophisticated people,
but enabled them to basically become programmers.
And they have this whole community, the Salesforce
Admins community, which is hundreds of thousands of people whose entire identity is basically,
well, I'm a Salesforce admin, which is a type of developer.
But at the core of it, it always comes down to this sort of property, which is one
of the reasons it's very difficult for Salesforce to change anything about what it does:
it must guarantee that certain types of things occur in an ACID-compliant way inside
the context of the Salesforce instance.
And it strikes me to think, you know, we're entering a world with agents.
We have things like Temporal and durable workflows,
we have the rise of things like sandboxes.
We're getting to a world where it's not just, like, a dream
where we can spin up many ephemeral instances of a piece of software
and we can test it and figure out which is the right one
and do tons of mutations in the code and figure out, you know,
like, we're getting there.
And so it would be really great to hear from you
how you think about how these durable frameworks
like Temporal or others
and things like Antithesis fit into this new world, right?
Where we can basically reason and guesstimate and describe intent to a model,
and it can spit out a really good suggestion, and what the future of agentic coding looks
like, but also what the future of crafting systems looks like, knowing that we have this thing
that's pretty amazing, it's very non-deterministic, but ultimately what we actually want
from software most of the time is very deterministic behavior.
How did you all approach this at Salesforce?
Because in theory, underneath Salesforce is a
database that's ACID-compliant.
So, like, why is it so hard to make Salesforce just do what its infrastructure does?
I mean, at the core of it, it comes down to scale, complexity, and the number of use cases, right?
And so, you know, a lot of people sit back and think, well, the truth of what made Salesforce so valuable is, yeah, it's a database in the sky.
But they made it an incredibly extensible database in the sky where you could bring your own data.
You could enforce that data model.
You could write hooks around that data in pre-processing steps,
and that was consumable and made possible by, you know,
less sophisticated engineers.
You didn't have to be an Oracle developer.
You didn't have to, you know, have a PhD to do it
or understand ACID compliance.
It just worked that way.
So ultimately, if you listen to Benioff,
what Salesforce really sells to its customers,
first and foremost, is trust: we're going to
put something in your hands,
and then we're not going to let you do something that would violate trust,
and we're not going to let your data get leaked,
and we're not going to let you lose your data
because this is your most important data
because it's your business data
and it's how you run your business,
and that's basically what Salesforce sells.
And at the core of it,
I mean, Salesforce is a giant Oracle instance,
but there have been millions and millions
and millions and millions of hours put into scaling it
and architecting it and building systems around it
that make it possible to deliver the guarantees
across the vast number of use cases that Salesforce supports for the biggest customers in the world
running vast quantities of data.
All of those dimensions just mentioned are actually the thing that makes Salesforce valuable
in the same way that Twitter was never just a MySQL database.
It was actually a real-time feed, and so you could not replicate Twitter at scale with just
a Postgres instance or MySQL.
You actually had to go build a very complicated distributed system dealing with the different
shapes and formulation of the data based on how you wanted to query it.
And Salesforce is very much the same thing.
That makes a ton of sense to me.
And I think that you're right, that that leads us very directly to things like temporal
and now the many other durable workflow frameworks that are out there.
And then even more so to LLM authored kind of ephemeral code.
One of the things that I would imagine makes Salesforce somewhat tractable as a product
is that you're not letting users write arbitrary database operations.
They still have the constraints of the Salesforce user interface.
They're allowed to plug in in certain places using kind of a visual or a programming language that has some constraints around it.
And then you test really carefully to make sure that given that structure, that the platform guarantees still hold true.
The same kind of thing is true of Temporal.
A Temporal workflow, right, is divided into two separate types of things.
There's glue code that can be non-deterministic,
but then there's the core of the business logic,
which you're just required to make deterministic.
There's not a ton of help.
There are not really any safety rails there.
You just have to do it properly.
And if you don't do it properly,
the guarantees of Temporal, of the whole system, just fall apart.
There are a ton of problems that come up in engineering at scale
and over time,
where your temporal workflows might be perfect at any given commit.
But when you deploy the next commit, you break everything because you made a change that's not compatible with the old workflows.
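For readers who haven't used Temporal, a minimal sketch of that split in Temporal's Python SDK might look like the following, with `TripWorkflow` and `charge_card` as hypothetical names. The non-deterministic work (a network call to some payment provider) lives in an activity, while the workflow function stays deterministic so Temporal can replay it:

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def charge_card(trip_id: str) -> str:
    # Activities may do non-deterministic work: network calls, clocks, randomness.
    # (The payment call itself is omitted; this is a placeholder.)
    return f"receipt-for-{trip_id}"

@workflow.defn
class TripWorkflow:
    @workflow.run
    async def run(self, trip_id: str) -> str:
        # Workflow code must stay deterministic so it can be replayed:
        # no direct I/O, wall-clock reads, or unseeded randomness here.
        return await workflow.execute_activity(
            charge_card,
            trip_id,
            start_to_close_timeout=timedelta(minutes=5),
        )
```

The framework can only make the "every trip eventually ends in a known state" style of guarantee because everything non-deterministic has been pushed out into activities, which is the discipline being described above.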
This is even harder in what you're describing as an agentic coding environment, where now you're trying to provide guarantees around arbitrary code that's doing who knows what.
I think it's always helpful with those kind of systems to have a clear sense of, hey, for the platform I'm building, what are the bedrock guarantees I provide?
If you are Salesforce, one of your bedrock guarantees might be,
no matter what code, our new LLM integration is writing and plugging into Salesforce,
agents are not able to access data that the user who's invoking the agent cannot access.
Like period, the end, that is fundamental to the trust, as you say, that Salesforce is selling.
Well, that's a property that, if you look at it the right way, is a lot like ACID compliance
and would benefit from being really exhaustively tested
without having to write 800 million flaky integration tests.
They're like, what about this bag of code?
What about this one?
What about this one?
What about this table of data within Salesforce?
What about this other one?
You want this to be a generic safety property
that you kind of smear across everything in your platform.
I think that's the way that we see this sort of testing evolving,
that no matter who
is writing the code or where it's coming from, the faster the code is changing and the more of
it you have, the more you need to up-level your testing strategy to just be higher octane to keep up.
LLMs are the latest step change in how quickly we're producing code and producing business logic.
And to keep up with that, we just need tests that speak more loudly, that are more expressive,
that provide stronger safety guarantees
with lower investment of human time.
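A minimal sketch of one such bedrock guarantee expressed as a single generic property might look like this; every name here (`run_agent_action`, `user_can_read`, the toy ACL) is a hypothetical placeholder, not anything from Salesforce or Antithesis:

```python
import random

# Hypothetical platform model: records, per-user ACLs, and an "agent" that
# executes some action on behalf of a user.
RECORDS = {f"rec-{i}": f"secret-{i}" for i in range(100)}
ACL = {
    "alice": {f"rec-{i}" for i in range(0, 50)},
    "bob": {f"rec-{i}" for i in range(25, 100)},
}

def user_can_read(user: str, record_id: str) -> bool:
    return record_id in ACL.get(user, set())

def run_agent_action(user: str, rng: random.Random) -> dict:
    """Stand-in for LLM-generated code: touches a random pile of records."""
    wanted = rng.sample(sorted(RECORDS), k=10)
    # This toy version filters correctly; the property below is what would
    # catch a version that forgot to.
    return {r: RECORDS[r] for r in wanted if user_can_read(user, r)}

def check_no_data_leaks(seed: int, runs: int = 1_000) -> None:
    rng = random.Random(seed)
    for _ in range(runs):
        user = rng.choice(["alice", "bob"])
        returned = run_agent_action(user, rng)
        # The one generic property, "smeared" across every action:
        # nothing comes back that the invoking user couldn't read directly.
        assert all(user_can_read(user, r) for r in returned), (user, returned)

check_no_data_leaks(seed=0)
```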
So I think back to my question originally,
I talked about how the workloads of the FoundationDBs
and etcds are so specialized, right?
What usually comes with it is the team is quite specialized, right?
Not every engineer actually understands
everything that's working beneath it.
That's right.
Like I said earlier, your Gen Zs never even log into Amazon anymore.
That's right.
What's happening?
You know, we're talking hypervisors.
Now we have, like, so many layers of things,
people are completely abstracted away,
and LLMs are making it even worse, right?
They may not even know what code looks like,
which is something that I'm seeing happening.
And so I'm just curious, for you, working with your users or customers,
how does it actually really work to adopt Antithesis in the first place?
Because you're such a low-level framework at some level,
and you do such low-level system interaction mocking and determinism,
that might still be very daunting.
Like, okay, for me to figure out,
okay, how do I mock my syscalls
and that kind of stuff?
It's just too much.
But the abstraction layers on top are quite enormous,
potentially, when it comes to, like, ease of use
or programmability or something like that.
So how do you find a path into a company
that makes it not feel like I'm buying a new,
crazy, you know, RISC architecture and CPU where I have to reinvent everything, you know?
Well, that's not what our users actually have to do, right?
Like, we've done all that work for you
in a way that's completely transparent.
You're just writing a Python integration test
the same way you would have otherwise.
It just acquires superpowers
when you run it in our environment.
You don't have to do anything
that looks totally crazy to you.
But Tim, there is something kind of lying
underneath what you're saying
that we should talk about directly.
I think everybody
said all the same stuff
when the kids were writing
C instead of assembly.
And they said all the same stuff
when the kids started writing Java instead of C++, like, oh my God, the horror,
they don't even know what malloc is.
And we said the same thing when everyone was writing Python.
You're like, oh, my God, these kids, there are no types.
What is this?
And one of the things that I love about startups and technology in Silicon Valley, that I think
we have lost a bit of over the last 10 years, maybe 15 years,
is that it's a place where young people get to come in with
new, dangerous ideas and throw away all the stuff that old people like me thought was really good
and important and critical and see what shakes out of that. And I'm really excited for it.
The joke about Python is that it was executable pseudocode.
Let's push further. Like, what's next? I don't think the fundamentals of engineering
change. Whether you're writing Python or driving an LLM
to write whatever, it turns out you have to have some way to know whether it worked or not.
And if you want to build a product around it, you need to know whether it's working over time.
And yeah, we used to write manual test plans and have a QA team.
And then we added in automated testing.
And that was a whole movement.
You know, we were writing books about continuous integration.
There was sort of a cult that formed around test-driven development and unit testing.
All of those were technological changes and cultural changes.
And I think we're looking at another big technological change
that will have to come with a big cultural change in engineering practices.
And I think just like an LLM allows you to express your product,
like what your code should do, in something that is very close to kind of a stilted form of natural language,
the sorts of tests that Antithesis encourages you to write are also closer to kind of a stilted form of natural language.
Instead of saying, for a database at least, like, hey, if I run, you know, select star from this table with this, like, WHERE clause, I'm going to get exactly these rows back,
an Antithesis test might say, hey, if I run any random SQL query, just make one up for me, test framework,
and I'm running them all concurrently,
none of them should see the other one's effects.
Period.
Computer, please fill in the blanks for me.
And that now feels good to me.
That feels like my tests are operating
at kind of the same level of abstraction
that my code is operating at.
And that's where we all feel really comfortable, right?
We're kind of, we have the same expressive power
on the coding side and on the verification side.
That's part of why I'm really bullish
about this approach to testing
and about what Antithesis is doing.
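A toy illustration of that style of test, in plain Python and sqlite3 rather than the Antithesis framework, and checking a simpler invariant (money is conserved under random commits and rollbacks) instead of full isolation:

```python
import random
import sqlite3

def run_property_check(seed: int, steps: int = 500) -> None:
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    names = ["a", "b", "c", "d"]
    db.executemany("INSERT INTO accounts VALUES (?, 250)", [(n,) for n in names])
    db.commit()
    expected_total = 4 * 250

    for _ in range(steps):
        src, dst = rng.sample(names, 2)
        amount = rng.randrange(1, 100)
        # A random transfer inside a transaction...
        db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # ...which we randomly commit or abort, like a client dying mid-flight.
        if rng.random() < 0.2:
            db.rollback()
        else:
            db.commit()
        # The property: no matter what happened, money is never created or destroyed.
        (observed,) = db.execute("SELECT SUM(balance) FROM accounts").fetchone()
        assert observed == expected_total, f"conservation violated with seed={seed}"

run_property_check(seed=7)
```

The test states the invariant and lets randomness supply the cases; Antithesis-style testing makes the same move at the scale of a whole system, with the deterministic environment and the exploration engine supplying the "make one up for me" part.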
Very cool.
Well, you know what's coming, sir.
We want to bring our favorite section called a spicy future.
Spicy future.
Tell us, what do you believe that most people don't believe yet in your world?
So we're all kind of infra people here, but we're living through this renaissance of cloud infrastructure,
everything on S3, agentic ops.
and I don't know how spicy it is,
but I certainly think that there are many companies
and products and people coming up
who are going to run into,
I think, the hard, bitter lesson of infrastructure,
which is that ops really matters.
And you can have the best product,
you can have the most sophisticated database.
But if you are not truly excellent at operating it,
it doesn't matter.
The user experience is terrible.
The reliability is terrible.
And people will flock
to a worse piece of technology
operated by more conscientious engineers.
And I think that will remain true
through this next generation of infrastructure.
If you look at every Postgres wire-compatible
but under-the-hood extremely fancy database,
most of their uptime and reliability guarantees
get completely smoked by companies like PlanetScale,
who basically say,
we're going to run Postgres for you,
we're going to shard it,
we're going to do a bunch of fancy stuff.
But the bedrock of what they offer is really world-class operations.
And to double down on that, what do you see as, like, missing from the younger or the less experienced,
LLM-fancy people doing fancy stuff?
What kind of operational patterns or practices have they just completely lost that maybe you and I both know well, or any specifics, I guess?
Yeah.
I mean, I don't think that
any of these things are specific to them.
I think this is true of all new infrastructure.
The core of good ops is almost always just painstaking thoroughness.
The observability is excellent.
The metrics are excellent.
The log output is understandable and manageably sized.
You've got runbooks for everything.
When you're building, you're thinking up front
about how the software is going to fail
and building a plan to make that failure palatable to you.
And that way of building and developing an operating software
comes from experience in painful and large outages
where people are yelling at you and millions of dollars
are disappearing down the toilet every couple of seconds.
And you're covered in that gross panic sweat.
I think every company gets that eventually,
either through their own experience or by hiring people who have it.
I think, though, that it's underappreciated as a selling point of good software.
I think we all focus a lot on really interesting distributed systems
or really interesting kind of white papers that underlie fancy new systems.
And sometimes we discount the value of the accumulated years or decades of operational experience
with the old system.
That was certainly my experience
the very first time
I had to run a Cassandra cluster.
I was like,
oh, this is going to be great.
It's designed for high availability.
It'll just stay up.
And that, of course, was completely wrong.
It was down all the time.
And it's because I had no idea what I was doing.
I read the book.
I read the giant O'Reilly book.
I read the Man Pages.
But it just didn't help that much.
And given how much AI coding has changed
everyone's daily, you know,
practice of coding,
do you think there's also a possibility for AI to kind of change the operations side as well,
like a Cursor for ops type of thing, or an LLM for ops?
I don't think 100% replacement is possible.
But I think this is a world where, like, we don't even know yet, right?
I haven't really seen, like, an ops copilot or something like that truly pervasive yet.
Like, what do you think is possible?
I don't know.
I think that's a really good question, you know.
And I am kind of hoping that this is finally the moment, at least from my perspective,
that somebody finds a really good use case for distributed tracing.
I think part of what makes LLMs so effective at writing code,
like way more effective than I would have expected five or seven years ago,
is that as an industry, we have been very diligently producing
a lot of really excellent open source code for it to train on.
I don't think we've been doing that for post-mortems,
for root cause analysis, for all the bug reports,
and pull requests that we made to fix various
bugs. We just haven't really done that. And so I'm not sure where the corpus of data to learn from
would come from there. I'm by no means a machine learning expert. I'm a little unsure why I would
expect a large language model to be the right approach to that problem instead of some older
machine learning approaches that are not like purely ingesting post-mortems, I guess. It feels like a very
differently shaped problem that is not really about token prediction. I don't know. Tim,
you probably know more about this than I do. What do you think? Well, I know enough that I don't
know how to predict the future anymore, so that's really... Wait, so why are we doing this segment?
To get your take, because you're supposed to be the expert on the hot seat, not us. But, you know, like,
funny enough, I feel like, like you said, there is no way to be 100%. I don't think there's a way to
solve the problem completely. Even
AI coding agents haven't solved every problem,
but they definitely changed the practice.
So I think there is certainly
a very high interest for us to even
figure out, like, what the next future is.
Because tracing, like you said, was
almost like an unusable
data source. Like, it's very usable
only in certain cases and for certain types of teams.
But if LLMs are getting so good,
there is
the data, plus maybe
even, like, a UX problem
to it as well. And, like, the
model may or may not have to solve everything at once. Because I think there has to be a human
in the loop in a very real way. But, like, today it's either an AI SRE that's going to solve everything
for you, or you're back to the caves, right? That's fair. In the middle, there's nothing,
right? Yeah. Tim, I do want to walk back what I said before a little bit. I'm reflecting
on my own operational career. And actually, I will say there was one stretch of my career where I was
rolling out this new production-critical system,
and it was me and one other engineer,
mostly working on it.
The other engineer was just one of the most brilliant, competent people
that I've ever worked with.
And my on-call routine, actually,
was this was long enough ago
that you accessed production through a set of individually named jump boxes.
Oh, yeah, yeah.
And my on-call unlock was that I figured out
which jump box was his favorite.
And so whenever I was on call,
I would go to that box
and I would use the Unix
su command to switch to his user,
and then I would Ctrl-R
through his shell history.
What did he do last time he was on call?
Probably whatever I need to do is in here somewhere.
Let me think hard about that.
That feels like something that LLMs could do.
Maybe Warp is doing this already.
I think the other thing also is that
if you talk to a lot of really seasoned SREs,
they will kind of tell you, I think,
that almost universally
the right answer to a
production outage is to roll back one of the things that recently changed.
And ideally to just roll back a lot of it.
Like, whatever has changed since the last time everything was working, just roll it all
back and see if that fixes it.
Bringing things back to antithesis, like, one of the things that we often test for our
customers is the safety of cluster upgrades and rollbacks.
And like, what does the world look like in that mixed version state?
And when you roll back, does everything go back to operating properly?
If you had that as a system guarantee, what an agent or what a human needs to do is very straightforward.
Every outage, you just roll back everything that has changed in the last hour and then wait and see.
And that's not even really AI. That's more of a shell script.
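A hedged sketch of that "more of a shell script" playbook in Python; `list_recent_deploys` and `rollback` are hypothetical placeholders for whatever your deploy tooling actually exposes, and the whole thing only makes sense if rollbacks have already been tested to be safe:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical deploy-tooling hooks; substitute whatever your CD system exposes.
def list_recent_deploys(since: datetime) -> list:
    """Return deploys newer than `since`, newest first,
    e.g. [{"service": "api", "version": "v42"}, ...]."""
    raise NotImplementedError

def rollback(deploy: dict) -> None:
    """Revert one deploy to its previous version."""
    raise NotImplementedError

def mitigate_outage(window: timedelta = timedelta(hours=1)) -> None:
    """The blunt playbook: revert everything that changed recently, then wait and watch."""
    since = datetime.now(timezone.utc) - window
    for deploy in list_recent_deploys(since):   # newest change rolled back first
        print(f"rolling back {deploy['service']} from {deploy['version']}")
        rollback(deploy)
```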
But that means running your stuff, not just in testing anymore, right?
No, it just means that all of the software that you're deploying has been tested to roll back properly.
Got it, got it.
So that you don't have to stop it.
Exactly. You don't have to stop and think like, hey, is this rollback safe?
Just make sure that you actually have rollback, which is also not easy.
That's true.
You're making it sound like, since you all have the right tools and the right people,
I'm just going to make sure they all run.
But realistically, how many products actually have safe rollback everywhere, and even global
rollbacks anywhere? That ain't easy at all.
Yeah, no, that is not easy at all.
I mean, it's not easy.
Yeah, yeah, because it's not just safe
state. Sometimes it's just, like, flakiness. You know, I'm running on Amazon. Amazon doesn't stay up all
the time. Are we at the right time, right mood, right vibes? I feel like we always have
vibes. Like, forever since I've been an engineer, it's like, today, I think, November 2019,
is EBS still running okay right now? It's like a seasonal thing, you know? Like, do we have COVID
or flu right now? Like, it is weird, though, because you have no control of the hardware we're running on
anymore. And even if we have the hardware boxes, you don't even feel like you have full control
either. And so we've been really trying to kind of, like, play with vibes even more and more
over time. And so I don't think we always trust that the rollbacks will always work, right?
That's true too, for sure. So it's always been just so interesting to watch how the industry
just hides behind one more abstraction when we haven't felt like we fixed anything down at the lower level,
but it works good enough, right?
It works good enough, you know?
I think it works until it doesn't, you know?
Part of building things and, like, improving reliability
and improving correctness for this stuff,
at least some of that investment has to go to fixing the bottom layer
and working your way up.
Yeah, yeah, which is really, I think,
where what you guys are doing is really amazing in this layer.
Cool.
Well, we have so many things we could talk about,
but based on time, the last question is,
where can people find you and Antithesis?
Where is a place they can maybe learn more
or reach out if they're very interested in using
what you guys are doing? I think the best place
to go is the Antithesis website.
So it's just antithesis.com
and we've given
a bunch of
fun talks, often
featuring old school Nintendo games
and you can find those on YouTube.
Our handle is @AntithesisHQ.
Amazing. Well, thank you so much for
being on the pod, and I hope you had fun.
Absolutely. It was great to see you.
Thanks, Tim. Thanks, Ian.
