The Changelog: Software Development, Open Source - The era of durable execution (Interview)
Episode Date: April 10, 2025
Stephan Ewen, Founder and CEO of Restate.dev joins the show to talk about the coming era of resilient apps, the meaning of idempotency and what it takes to achieve it, this world of stateful durable execution functions, and when it makes sense to reach for this tech.
Transcript
Okay, friends, it's time for your favorite podcast. Welcome to the Changelog, where we feature
the hackers, the leaders, and those who are building durable execution functions. Today,
Jared and I are joined by Stefan Ewen, the founder and CEO of Restate, talking about the coming era of resilient applications, the meaning of,
and what it takes to achieve, idempotency,
this world of stateful, durable execution functions,
and when it makes sense to reach for this tech.
A massive thank you to our friends
and our partners over at fly.io.
That is the home of changelog.com. Learn more at fly.io. Okay, let's get resilient.
Well friends, before the show, I'm here with good friend David Hsu over at Retool.
Now David, I've known about Retool for a very long time. You've been working with us for
many, many years. And speaking of many, many years, Brex is one of your oldest customers.
You've been in business almost seven years. I think they've been a customer of yours for
almost all those seven years, to my knowledge. But share the story: what do you do for Brex,
how does Brex leverage Retool, and why have they stayed with you all these years?
So what's really interesting about Brex is that they are an extremely
operationally heavy company.
And so for them, the quality of the internal tools is so important
because you can imagine they have to deal with fraud.
They have to deal with underwriting.
They have to deal with so many problems.
They have a giant team internally, basically just using internal tools day in and day out.
And so they have a very high bar for internal tools.
And when they first started, we were in the same YC batch, actually.
We were both at Winter 17.
And they were, yeah, I think maybe customer number five or something like that for us.
I think DoorDash was a little bit before them, but they were pretty early.
And the problem they had was they had so many internal tools they needed to go and build, but not enough time
or engineers to go build all of them. And even if they did have
the time or engineers, they wanted their engineers focused
on building external-facing software, because that is what
would drive the business forward. Brex mobile app, for
example, is awesome. The Brex website, for example, is
awesome. The Brex expense flow, all really, you know, really
great external software.
So they wanted their engineers focused on that as opposed to building internal CRUD
UIs.
And so that's why they came to us.
And it was honestly a wonderful partnership.
It has been for seven, eight years now.
Today, I think Brex has probably around 1,000 Retool apps they use in production, I want
to say every week, which is awesome.
And their whole business effectively runs now on Retool.
And we are so, so privileged to be a part of their journey.
And to me, I think what's really cool about all this
is that we've managed to allow them to move so fast.
So whether it's launching new product lines,
whether it's responding to customers faster,
whatever it is, if they need an app for that,
they can get an app for it in a day,
which is a lot better than, you know, in six months, or, for example,
having to schlep through spreadsheets, et cetera.
So I'm really, really proud of our partnership with Brex.
OK, Retool is the best way to build, maintain, and deploy internal software.
Seamlessly connect to databases, build with elegant components,
and customize with code, accelerate mundane tasks and free up time for the work that really matters for
you and your team. Learn more at retool.com. Start for free. Book a demo.
Again, retool.com. We are joined today by Stefan Ewen from restate.dev.
Stefan, welcome to the Changelog.
Hey, thanks for having me.
It's a pleasure.
Pleasure to have you as well.
Adam, how you doing, man?
So good, Jared, how about you?
I'm doing well.
Always excited
at the beginning of a conversation
to dig into something new, something different,
and something called Restate.
This is supposed to be the simplest way
to build resilient applications.
This is a requested show, Stefan,
so we do take episode requests.
This listener would like to remain anonymous.
However, they say that Restate is a super exciting approach
to managing distributed systems.
And they say that we should get you on the show.
And so we just take orders around here
and our listeners often get what they want.
And so that's how we found you.
It was a listener request.
Awesome, that's very cool to hear.
That is open source communities at work and all that.
That's right.
So Restate, let's not get into Restate itself at first.
Let's talk about resilient apps first,
because that's in your tagline:
The simplest way to build resilient applications.
Let's talk about that.
What is exactly a resilient application in your estimation?
Okay, yeah, so in the way we think of it
in the context of Restate,
we're talking mostly about the backends of applications,
the sort of coordination and orchestration logic. A resilient application would be an
application that doesn't accidentally drop your order, that
doesn't accidentally place it twice if you hit F5 at the wrong
point when you're in the browser, that doesn't, you know,
accidentally book your Uber for two people instead of for one,
that doesn't, you know, just disconnect you from your chat bot, lose the history, make
you start over, all these kind of things.
This is what we're talking about when we mean resilient apps in the context
of Restate.
So basically apps that tolerate all sorts of hiccups, errors in the infrastructure, unavailable
endpoints, you know, network failures, process failures,
temporary outages, but also, you know, types of
programming glitches that cause requests to fall through
and having to be retried in order to, you know, be processed
reliably, but then not get duplicated,
the system understanding how to idempotently
treat them as a retry and not as a second request.
This is sort of the bigger picture of what we mean here
with resilient applications.
Can you take a moment to demystify that term
that you just said, idempotently?
So everyone's on the same page.
What does that mean, idempotently?
Yeah, I think idempotency, you can think of it
as just understanding that a repeated request
is actually not a new request,
but it's the same request again.
You're just sending it again because you were
maybe disconnected from the original request
or you got an error back.
It's the thing that, if you don't do it correctly,
is actually accidentally placing an order twice
when you only tried to place it once.
It's the thing that many applications don't get right
and that's why you still see too many websites saying,
don't hit F5 while this is showing.
We just don't want-
Do not reload while we finish this transaction.
Exactly, because it doesn't understand
that you submit the thing again,
and this is actually supposed to be the same thing.
It's just like another submission of the same request.
It has no way of identifying that,
so it might accidentally treat it as a second one
or it has just like a very rough way of doing this.
It's a surprisingly complicated problem
and there's like lots of applications
that don't get it correct
and a lot of weird ways applications work around this.
As a fun fact, I think the first bank where I was a customer,
they only allowed you to wire a certain amount
to a certain recipient once per day. If you were trying to wire
the same amount to the same person a second time that day, they just wouldn't allow it, because
they didn't know if that was a retry from your browser or something stuck
in a queue at some point in time. That's just how they did deduplication. And yeah, so generally,
idempotency is deduplication
in a meaningful way, understanding retries versus new requests.
Yeah, that's called overkill, I think, when they did that.
They're like, how can we just use a blunt tool
to solve this tiny little problem?
Let's just not let you do more than one per day.
Not ideal, surely, for lots of uses.
Okay, so same request twice, operates the first time,
won't operate the second time.
Generally speaking, how do you achieve this?
I mean, you could just limit to one request per day,
but if you're not gonna do that,
how do people usually implement
or ensure idempotency in their applications?
Yeah, so in a way, you basically have to find a way
to anchor the identity of requests all the way through.
There's different standards for doing that.
The HTTP standard has actually
defined a header where you can put in
an idempotency key, and services, when they support this,
are supposed to understand that if that key is set
and a previous request with the same key
and the same parameters has come in, this is a duplicate request
for the same operation.
But then down the road, you basically just try
to anchor requests and the processing
of different steps in each other.
When you do message queues, you place correlation IDs;
when you're working with databases,
maybe you try to use primary keys or transaction IDs
or leases or tokens.
There's tons and tons of tricks people do,
but it's ultimately still a very hard problem
if you wanna do it end to end.
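To make the idempotency-key pattern concrete, here is a minimal sketch in TypeScript. It is an illustration, not any particular service's implementation: the in-memory store and the placeOrder handler are hypothetical, and a real service would persist keys durably (say, a table with a unique constraint and a TTL) and also handle two concurrent requests racing on the same key.

```typescript
// Minimal sketch of idempotency-key deduplication (hypothetical handler).
// The Map stands in for a durable store keyed by the Idempotency-Key header.

type StoredResponse = { status: number; body: unknown };

const processed = new Map<string, StoredResponse>();

// Hypothetical business operation: place the order exactly once.
async function placeOrder(order: { item: string; qty: number }): Promise<StoredResponse> {
  // ...charge the card, insert the order row, etc.
  return { status: 201, body: { ok: true, item: order.item } };
}

async function handleCreateOrder(
  idempotencyKey: string | undefined,
  order: { item: string; qty: number },
): Promise<StoredResponse> {
  if (!idempotencyKey) {
    return placeOrder(order); // caller opted out of deduplication
  }
  const prior = processed.get(idempotencyKey);
  if (prior) {
    return prior; // a retry: replay the recorded response, do no new work
  }
  const result = await placeOrder(order);
  processed.set(idempotencyKey, result); // record before acknowledging
  return result;
}
```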
It's kind of a mindset.
You have to set out to do it, don't you?
Otherwise.
I think you have to design for it from the start.
Like, it's one of these things where the weakest link
in the chain breaks it, yes.
Or you can just say, do not reload your page
while you're hitting this API.
Exactly.
That's the simplest way to do it.
Just like make it the user's problem,
don't handle it in your infrastructure, yeah.
Also, as an engineer, when I've achieved idempotency,
I know it feels very good when you're like,
okay, I'm for sure not gonna do double execution
in this particular code path.
That feels good.
And as an end user, I'm also happy
when I know that I'm not gonna get charged twice,
for instance.
That's right. So you can actually see good APIs
design this in from the start.
I think the first API I came across
where I thought this was really, really well done
was Stripe's payment API.
You can really see, I think that's also why
it got so insanely popular so quick.
The way they made that handling just seamless
for folks that embedded this into their code,
to really understand how do I make sure
I accept this request once and deal with all these things.
That was stellar.
All right, so here's a harder question.
Where does the word idempotency come from
and why do we use that to describe this thing?
It seems unnecessarily verbose and jargony.
Do you know?
Just spell it first, Jared.
I-D-E-M-P-O-T-E-N-T would be the adjective.
Do you know?
If you don't know, you can just say I have no idea.
But if you know, it'd be awesome.
I don't really know, but my guess is it's a Latin word.
It comes from Latin.
It does sound like that.
It's not like, yeah.
I don't know.
It's all Greek to me.
Maybe we can get a real time look up for that
and follow up on it from some sort of LLM.
Just prompting Adam behind the scenes to prompt
his favorite LLM.
I'm on Wikipedia as we speak.
Okay, good. I was stalling for you.
You know, I don't really have the details here for you, Jared.
I'm sorry I can't LLM quickly enough for you,
but it says idempotence is the property
of certain operations in mathematics.
All right, I went straight to the LLM
and I got the answer to my question.
So the term idempotent comes from Latin roots.
So Stefan, excellent call there.
Idem meaning "the same" and potent meaning "having power"
or "being able to."
Put together, idempotent roughly means
having the power to remain the same.
So there's the actual word.
And then yes, mathematics, blah, blah, blah.
I've stopped reading now,
so hopefully that wasn't a hallucination
and we can all move on.
Yeah, that sounds about right.
Like if it's a hallucination, it's a close one.
I've learned that.
Yeah, idem plus potence, same plus power.
There you go, it's the same power.
Very cool.
Well, thank you for scratching my itch,
my curious itch there.
Both of you and ChatGPT, I suppose, pitched in on that one.
What else?
So we're talking resiliency.
I'm curious, obviously, having a resilient app
is just good, right?
Like, who wouldn't want these things?
And just taking one of those things,
like idempotency and realizing it's hard to achieve
on your own throughout especially
more complicated applications.
Why did you all set out to solve this problem for folks
to build Restate and why did you feel like
I'm the guy for the job?
Yeah, that has a long answer and a short answer.
I can start with the short answer.
The short answer is, I would say the state of the art for building backends
that do any non-trivial state management and coordination
is completely unsustainable, the way we're building this today.
Just to give you an example. Let's
actually stay with LLMs, because we just
talked about it, right? So let's say you're building a chatbot, you're submitting something like a
message there. This thing in the end has to reach the
LLM, but it has to look up the context
in which that chat happened before.
It has to make the call, has to go back,
store the context.
You don't want it to just lose everything
if you lose your connection in the middle.
If you, let's go with the F5 thing again,
you don't want it to actually trigger the same request twice
or lose the entire session, make you start over.
So you're probably just putting this as an asynchronous request that runs in the
background, that you're sending from your chat session, from your
browser. But it's a separate, asynchronous request that talks to the LLM. You
want it to actually retry in case something fails or is overloaded and it's throttling you.
And then you want to be able to reconnect to that task or request in case
something goes wrong in your browser or you accidentally hit the back button or whatever.
Just implementing this is a surprisingly complicated thing where you start to stitch together
like probably a queue, a database, and a bunch of tasks to manage that. To give you another example,
We just talked about Stripe, right? So let's say you're sending a request there for a payment, and sometimes they tell you,
look, this is good or bad, like we accepted it or didn't. Sometimes they tell you, I don't
really know, the fraud detector is still running, or we have some weird thing in the
background that we're still asking and it hasn't told us. So I'm going to send you a
webhook in a moment to tell you whether this went through or not. And now you have like
a synchronous request there
and then somewhere else an asynchronous request coming in.
You just want to make those two reliably meet.
Even if this one fails, you want it to sort of like
recover somewhere, understand where to, you know,
reconnect with that web hook that you're awaiting.
And this little piece, it's really
just one case of handling in the backend
where Stripe says, okay, I'm processing,
instead of yes or no.
There's actually many days of work to make that work reliable.
And it's lots and lots and lots of things like this that just get in the way, with
so many moving pieces, so many APIs to talk to, so much more work happening
asynchronously in separate requests than just in the synchronous user interaction.
Just gluing those all together has become such a complicated thing that we felt this
does need a better solution.
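To picture the "make the synchronous request and the webhook reliably meet" problem, here's a small TypeScript sketch. It is deliberately naive: all names are hypothetical, and the pending map lives in memory, so a crash loses the awaited future, which is exactly the gap durable execution systems close by persisting it.

```typescript
// Sketch: a synchronous checkout path awaiting an asynchronous webhook.

type Outcome = "succeeded" | "failed";

const pending = new Map<string, (outcome: Outcome) => void>();

// Synchronous path: the provider answered "still processing", so park a
// future that the webhook will later resolve.
function awaitWebhook(paymentId: string): Promise<Outcome> {
  return new Promise((resolve) => pending.set(paymentId, resolve));
}

// Webhook endpoint: the provider's asynchronous answer arrives.
function onWebhook(paymentId: string, outcome: Outcome): void {
  const resolve = pending.get(paymentId);
  if (resolve) {
    pending.delete(paymentId);
    resolve(outcome); // the two requests "meet" here
  }
  // No entry means this process restarted while waiting; a durable runtime
  // would recover the persisted future instead of silently dropping it.
}

// Usage on the checkout path:
async function checkout(paymentId: string) {
  const outcome = await awaitWebhook(paymentId); // may take minutes
  console.log(`payment ${paymentId}: ${outcome}`);
}
```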
This is the motivation more from, let's say, the use case side.
I can give you a motivation more from why we actually
ended up doing this. I think this is a motivation that
probably lots of folks stumbled across, namely that
this needs a better solution. I think there's
different projects approaching that
problem. Why we're approaching it the way we are has to do with where we come from, like
before we worked on Restate, we were building Apache Flink,
which is a different system, a stream processing framework, basically
events and analytics. So you know, you have these events
coming in, often through a message queue, and you want to,
you know, aggregate them, join them.
Just a few examples of where this is used: fraud detection in banks, where some payment events
go in, you aggregate feature vectors, and you throw them at a fraud model. Or I think, you know,
things like the TikTok recommender use Flink to actually join information from users and interactions
together in real time and understand how to update the features that will
go into the recommendation model. I think companies like Uber use it to determine
pricing and traffic models and ETAs. So it's whenever you have events and you want to analyze
them in a way that you aggregate them into some sort of typically
statistical value or a materialized view. This is what
we were building before. So it's an analytical framework. What
did actually happen then is at some point in time, we saw folks
were using that thing to solve distributed transactional processing, the types of things
where you would say, hey, let's assume an order
processing service that, you know, takes the event that a user
checked out an order. And it has to do a bunch
of steps, let's say update the inventory, trigger payment, call
the service to prepare logistics, maybe call another service to put this in the user's history,
maybe more steps and so on.
And we started to see folks using Flink for that
because it had this interesting property
that it had sort of this baked in way
of reliable communication and state management.
It was all built for analytical use cases, but they found this such an interesting property
that they started to apply this to the transactional use cases as well, like order processing.
Just because they found that this is otherwise way too complicated to build and way too easy
to build it in such a way that it is brittle, not scalable, that it has corner cases where
it violates a lot of these properties that we just said. When this started happening
repeatedly we thought, okay, but apparently there isn't really a good tool out there yet.
Apparently this property of correct stateful coordination is something people really appreciate.
They feel like these types of frameworks make their life easier. And then we set out to build a solution for
that. And that became Restate. In many ways, actually, from
the way it approaches things, from its architecture, it's
inspired by our work on Apache Flink. But it's
almost a complete mirror image implementation of it. It
takes almost the opposite design choice in most aspects, because
it's really optimized for low-latency transactional processing rather than high-throughput analytical
processing, which was Flink. But what we retained from this idea is, yes, stateful
orchestration and an event-driven foundation and so on.
This is something we should build
and we should be working on and that became Restate.
What's the timing on all this?
When did Flink start?
How mature was it or is it?
And when was Restate born out of the idea?
Like, give us a context in time.
Yeah, so this context is measured in decades, I think.
Okay, that's useful.
So yeah, I mean, Flink was officially founded in 2014,
but the work that became Flink goes back to 2010,
when I was still in university.
So it was like four years of academic work. And then
Flink started in 2014. I worked on it until 2022, so eight years after it became
an open source project. And then I left Flink because I needed to work on something else,
like, for a change. And then we started working on Restate, like end of '22, early '23. So Restate
is a bit over two years old now. Flink had its 10th anniversary last year. Flink is super
mature. I think it's used by thousands and thousands of companies at absolutely insane scale.
The probably largest installation of Flink that I know must be Alibaba, who run tens of thousands of cores for a single processing pipeline to live-
compute their e-commerce search and recommender continuously. Restate is not quite as old and not quite as mature
as that.
Yeah, two years.
It's been two years. But we released our stable 1.0 last summer, and we recently released
our first distributed, replicated, highly available architecture. And we have a bunch of folks
that actually productively use that.
So I would say we're on course to get there.
Nice.
Was leaving Flink difficult, sad, joyous?
Like what was that like when you left?
Cause that's a long time to work on one thing
and then to move on to that thing.
I mean, you become pretty attached, don't you?
Yeah, yeah, absolutely.
So it is definitely a difficult thing.
And it was not just leaving Flink.
So we created Flink as an open source project,
but we also built a company around that.
And the company went through an acquisition,
but we actually still stayed there,
started building and growing the team.
So I was simultaneously leaving the open source
project and the company I built and everything. And absolutely, it's a difficult thing because
that becomes like your babies, actually two babies in a way, right? The project and the
company. But yes, I feel that at some point I felt like, I don't know how
to say this in English, but I felt I was getting this tunnel
vision on problems. I'd been working on this for so long, and I'd seen so many sort of
repeated things, that whenever I heard a problem, I was just putting it
into this category that I knew. And it started happening that I did that with
things and then later realized, oh no, that wasn't actually the right thing for that
problem. That might have been right for the last nine,
but this one should actually have been different.
And so you get this kind of,
if you work too long on the same thing,
you're starting to not see the forest for all the trees.
And I felt I was reaching that point.
So it sounded like, yeah,
I should probably start doing something new.
It's kind of like familiar grooves in your brain, you know?
If you just go to the same thing
over and over again, it's like your brain gets new grooves
and then different problems fall into those familiar grooves
and they just slide in there.
Sometimes when they don't even fit
or they aren't a good fit.
I certainly understand that.
I think when you've focused on one thing
for a very long time,
it is hard to think outside that groove.
And it's interesting that you're building a similar system,
spiritually similar,
but I guess you could say radically different architecture.
Is that the way you described it?
Like it's inspired by Flink,
but it seems like it takes
the opposite approach.
Yes, I think you could call it like that. It is, under the hood, still an event-driven system,
as Flink is, but it's just built for completely different trade-offs. So just to give
you an example, the core of Flink is this exactly-once stateful processing.
So it basically has the data streams that keep moving, operators that do
stateful stuff with events, count, join and so on.
And then there's this asynchronous process running in the background that
takes these consistent snapshots.
So if something goes down, you can restore the state from the snapshot and sort of
just start the flow from there. And it has this kind of clever way to do this in a way
that maintains consistency across all the parallel machines, and it does that efficiently,
incrementally, frequently and so on.
But it's a very throughput optimized thing.
So it stays off the critical path, if you wish; it runs in the background.
So yeah, it's really good for throughput, but,
you know, it does this persistence operation once every couple of seconds if, you know,
you tune it to be very, very frequent. Most people actually run it more in the order of
minutes, right? So when something goes down in a pipeline, you just replay the
last minute of data, which typically doesn't quite take a minute, because replay is faster than the rate at which the events are produced.
But still, it takes you back a certain amount of time, which is usually okay for analytics.
The worst thing that happens is this feature in that vector that goes into that traffic
model or that recommendation model is maybe
a few seconds older than it would have otherwise been.
It's not such a big deal typically.
On the transactional processing side, imagine you have this multi-step process.
You want a really fast checkout process and you say, I want to just start the next step
after I know the payment has gone through.
Before that, I'm not updating inventory.
I'm not kicking off any of the
other processes, then you really need actually, like a persistent
step to be recorded ideally in milliseconds. Maybe it's not
that critical for the processing, but we're building
this for like even more low latency use case like payment
processing and settlement and so on. And there you really just
want to kick off the next step after you know, like the previous
step is persisted
like possibly in a multi-data center replication way.
And only then do I start the next thing.
So you have to really design this completely differently.
It's completely optimized for low latency,
transactional durability rather than analytical throughput.
So it's, yeah, it's a completely different design,
even though both ultimately are event-driven architectures.
Right.
But your atomic unit now is like, is compute, right?
Like it's logic.
And perhaps data that comes from that.
It's a transactional step.
And so you can't just skip a transactional step
because there's a workflow here
and certain things rely on other things.
And so the way that you think about durability
as opposed to analytical data is, like I said earlier,
radically different.
That makes sense to me.
Yeah, exactly.
The atomic step in Restate is extremely fine-grained, right?
Like we're really building this in such a
way that you should feel comfortable, in a program, using Restate to persist
fine-grained steps, state updates. It actually uses internally this durable mechanism
for a leader election to understand that it can lock and fence off different retries. And with
such a fine-grained
nature, what's really important is that recording a durable step
has the lowest possible latency. Whereas in Flink, the atomic step
is like a couple of million events being aggregated
together in some state spread over ten machines. So that's
one atomic step. It's completely different, yes.
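The "lock and fence off retries" idea deserves a tiny illustration. The sketch below is not Restate's actual mechanism, just the generic fencing pattern it alludes to: every new retry or leader acquires a strictly higher epoch, and writes carrying a stale epoch are rejected, so a paused "zombie" execution cannot overwrite newer work.

```typescript
// Generic fencing sketch (illustrative; not Restate's implementation).

let nextEpoch = 0;
const record = { value: "", epoch: -1 };

// Each retry or newly elected leader gets a strictly higher epoch.
function acquireLease(): number {
  nextEpoch += 1;
  return nextEpoch;
}

// Writes must carry their epoch; stale epochs are fenced off.
function fencedWrite(epoch: number, value: string): boolean {
  if (epoch < record.epoch) {
    return false; // a zombie from an older lease: rejected
  }
  record.value = value;
  record.epoch = epoch;
  return true;
}

// A paused "zombie" resumes after a newer retry already wrote:
const zombie = acquireLease(); // epoch 1
const retry = acquireLease();  // epoch 2
console.log(fencedWrite(retry, "step done"));   // true: accepted
console.log(fencedWrite(zombie, "stale step")); // false: fenced off
```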
Well friends, I'm here with a good old friend of mine, Terrence Lee, cloud native architect
at Heroku. So Terrence, the next gen of Heroku, called
Fir, is coming soon. What can you say about the next generation of Heroku?
Fir represents the next decade of Heroku. You know, Cedar lasted for 14 years and more.
Still going. And Heroku has this history of using trees to represent ushering in
new technology stacks and foundations for the platform.
And so like Cedar before, which we've had for over a decade,
we're thinking about Fir in the same way.
So if you're familiar with fir trees at all, Douglas firs,
they're known for their stability and resilience.
And that's what you want for the foundation of a platform that you're going to trust your business on top of.
We've used stacks to kind of usher in this new technology.
And what that means for Fir is we're re-platforming
on top of open standards.
A lot has changed over the last decade.
Things like container images and OCI and Kubernetes
and Cloud Native, all these things have happened in this space.
And instead of being on our own island,
we're embracing those technologies and standards
that we help popularize
and pulling them into our technology stack.
And so that means you as a customer
don't have to kind of pick or choose.
So as an example, on Cedar today,
we produce a proprietary tarball called Slugs.
That's how you run your apps.
That's how we package them.
On Fir, we're just gonna use OCI images, right?
So that means that tools like Docker
are part of this ecosystem that you get to use.
So with our Cloud Native Build Packs,
you can build your app
locally with the tool called Pack and then run it inside Docker. And that's the same kind of
basic technology stack we're going to be running in Fir. So you can run them in your platform as
well. So we're providing this access to tools and things that developers are already using
and extensibility on the platform that you haven't had before. But this sounds like a lot of change,
right? And so what isn't changing? And what isn't changing is the Heroku you know and love.
That's about focusing on apps and on infrastructure
and focusing on developer productivity.
And so you're still gonna have that
get push Heroku main experience.
You're still gonna be able to connect your applications
and pipelines up to GitHub, have that Heroku flow.
We're still about abstracting out the infrastructure
from underneath you and allowing you as an app developer
to focus on developer productivity.
Well, the next generation of Heroku is coming soon.
I hope you're excited because I know a lot of us,
me included, have a massive love
and place in our heart for Heroku.
And this next generation of Heroku sounds very promising.
To learn more, go to heroku.com slash changelog podcast
and get excited about what's to come for Heroku.
Once again, heroku.com slash changelog podcast.
This word durability is being used a lot.
Durable execution, durability.
What exactly is durability?
Doesn't fall down, doesn't break?
Always good?
Yeah, something like that.
I think durability is probably the same as persistence,
maybe with a bit of a stronger emphasis on
it really doesn't get lost after it happens.
So durability is the D in ACID
when it comes to databases, right?
Databases say we're giving you atomicity, consistency,
isolation and durability. Once you do an update, we're not
going to lose it, no matter what crashes. The database
has a mechanism to bring that change to the database
back. If I told you I've recorded that row, I've recorded
the change, it will be there no matter what. And in
the context of Restate, what that means,
for example: the core building block of Restate is a stateful durable function,
you can think of it like that. And with a stateful durable function, when you schedule an invocation
for it, as you go through the code of that stateful durable function, it has multiple steps,
and it's recording each step. Whenever you go beyond a step that you
asked Restate to treat as durable, you know that no matter
what happens, you will never re-execute that step, you'll never
come up with a different value. If your machine goes down,
the Restate server goes down,
if you deploy it across availability zones and the data center goes down, the network gets
partitioned, whatever, you'll never ever go back and re-execute that step. If it once
told you that it's done, that's the meaning of durability. Once it says it's there, it's always going to be there. And I think this
is, in a way, I'd say almost one of the magic ingredients, the way Restate looks
at making distributed application development simple. I'd say there's two core pieces that you need to think about. One of
them is the durability. Make durability extremely fine-grained and extremely cheap. Because
if you can apply durability in fine-grained steps, you always have to worry about very
little after a failure. Let's say your durability is coarse-grained. Let's say the order workflow is
one durable step, right? And it crashes in the middle. It gets retried. It's up to you to figure
out, well, did I actually process the payment already or not? Maybe there is a way to just
assume, okay, it's idempotent, I can send it again, or I might not even be able to ask the service,
did I do that or not? Did I actually decrement the available count of product already or not? Maybe I have a
way to again make this durable or not. I don't know. These
things tend to be harder than one thinks because sometimes the
API gives you, you know, it might have given you an error
back the first time and you thought I didn't do this and
followed some control flow path. And then the next time you
actually get not an error, but the real result and then you
follow different paths. So people mess this up all the time. It's really hard to reconcile if you have these multiple steps
as a coarse atomic unit. What did I do? How did I do it the last time? How do I recover
from this? But if you have extremely fine-grained durability, if you're recording every individual
step as durable in the system, and when it comes back, it can tell you exactly like this
was the last step that you recorded, then you just have a very small amount of uncertainty. Okay, here's this one thing that I might have
tried already. I have to just worry about that bit instead of the whole history and
possible control flow and all the choices how I might have ended up here that I need
to reconstruct in order to proceed consistently from there. So just like very fine-grained
durability is extremely powerful and simplifying. I'd say the second magic ingredient is then how do you anchor this in the
whole retrying and resolving potentially inconsistent situations with partitions, with timeouts,
with zombie processes and so on, so that there's always a very consistent view of what the last
durable step was. I think that's the second sort of ingredient of Restate.
It's not just durability, it's actually durability and consensus.
And I'm giving you a very, very crystal clear view on where you left off, where you need
to continue from.
I think if you take those two things in conceptually, you've simplified the problem massively and
the rest is almost API sugar
that you build on top of that. That's the magic that happens in the Restate runtime.
It's a very low-latency, durable consensus log that fuses queuing, state management,
locking, fencing,
creating futures, resolving futures, like all these kind of operations that tend to be part
of a distributed coordination process.
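A toy model of this "journal of fine-grained steps" idea might look like the following TypeScript sketch. Everything here is hypothetical and in-memory: a real runtime persists the journal with consensus. The replay logic is the essence: on a retry, already-recorded steps return their journaled results instead of re-executing their side effects.

```typescript
// Toy journal of durable steps (in-memory; illustration only).
const journal: unknown[] = []; // results of completed steps, in order
let cursor = 0;                // position within the current execution

async function durableStep<T>(fn: () => Promise<T>): Promise<T> {
  if (cursor < journal.length) {
    // Replay after a crash: the step already ran; return its recorded
    // result rather than re-executing the side effect.
    return journal[cursor++] as T;
  }
  const result = await fn(); // first execution of this step
  journal.push(result);      // a real runtime persists this before continuing
  cursor++;
  return result;
}

// An order workflow written as plain code over durable steps. On a retry,
// reset the cursor and re-run: completed steps come straight from the journal.
async function processOrder(orderId: string) {
  const paymentId = await durableStep(() => chargeCard(orderId));
  await durableStep(() => decrementInventory(orderId));
  await durableStep(() => scheduleShipping(orderId, paymentId));
}

// Hypothetical side effects, stubbed for the sketch:
async function chargeCard(id: string) { return `pay-${id}`; }
async function decrementInventory(id: string) { return true; }
async function scheduleShipping(id: string, pay: string) { return `ship-${id}`; }
```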
When you say the restate runtime,
what can you liken that to,
for those of us who don't know what a restate runtime is?
Is it like a Node.js thing?
Is it like a database?
What is that?
Yes.
So using Restate is a bit like, I would say, somewhere in between using a database and
using a message broker.
So you write your program pretty much as code, how you would write it before. But you're using the Restate SDK.
Think of it a bit like your database driver,
in order to sort of
wrap certain operations as, okay,
this operation here should be recorded as a durable step,
or attach this state to the invocation transactionally,
or, you know, create
this future, complete this future and so on. So you do these operations through
the Restate SDK.
Restate itself is then, maybe a message broker is the best comparison. It's on the level
of the message broker. So when you invoke your code, you're not calling it directly,
you're actually calling Restate, which makes the invocation of your
function on behalf of you. The programming model that we try to provide is you're writing a service
that looks like an RPC service, like you're writing RPC handlers, and then Restate almost looks like a
reverse proxy for you. So the other services, instead of calling the code directly, they call
it indirectly through Restate; Restate proxies the call. And it puts itself in the middle
with its durable consensus log. And when it forwards the request to the service, it
isn't forwarded naively as an HTTP request; it actually uses an invocation protocol,
it uses HTTP/2 or another type of streaming connection, holds onto that connection, allows
the service to sort of synchronize fine-grained steps. It will, when it forwards an invocation, for example, tell it
exactly what the supposed state of the world should be, as in, here's the steps I know you
should treat as completed, here's where you should continue. And it will then allow the application to use that connection, that sort of lifeline,
you know, to create durable actions.
So yeah, it's on the level of a broker or database, looks like a reverse proxy to the invoker,
looks like, maybe almost like a database, to the service that uses it.
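For a feel of what that looks like from the application side, here is roughly the shape of a handler written with Restate's TypeScript SDK. The names follow my memory of the SDK's documented surface and may differ by version, and the downstream calls are hypothetical stubs; treat it as a sketch, not a reference.

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical downstream calls, stubbed for the sketch:
async function chargeCard(orderId: string) { return `charge-${orderId}`; }
async function recordLedgerEntry(orderId: string, chargeId: string) {}

// A service whose handlers Restate invokes through its proxy and log.
// Each ctx.run() wraps an action as a journaled durable step: on a retry,
// completed steps are replayed from the journal, not re-executed.
const payments = restate.service({
  name: "payments",
  handlers: {
    process: async (ctx: restate.Context, req: { orderId: string }) => {
      const chargeId = await ctx.run("charge", () => chargeCard(req.orderId));
      await ctx.run("ledger", () => recordLedgerEntry(req.orderId, chargeId));
      return { chargeId };
    },
  },
});

// Expose the service; Restate calls in over a streaming HTTP/2 connection.
restate.endpoint().bind(payments).listen(9080);
```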
And when would somebody reach for this? Now you said distributed systems,
but some people think every network attached system
is a distributed system.
So I mean, if I'm building a web application,
let's call it a monolith that answers HTTP requests
and has a database backend, a Ruby on Rails or a Phoenix
or insert your Django,
insert your backend framework here.
Are those folks pulling in Restate
and using it for certain aspects of their workflows
or is that not necessary for them
because they are kind of a monolith?
Like do I have to be building
a services-based architecture?
Like where does it fit in?
I would very much be with you on like
almost any system we build is a distributed system.
Yeah, completely.
So I think it becomes useful very, very quickly. Maybe one way to think about this is
your backend is where you maintain state and update it and run operations and changes.
It usually has a database that has the sort of core business state; some operations go
purely straight to the database. That's all they do. That's fine.
But anytime you have to do something that's not straight
against this one core database, but something
that goes against a different API, something that runs in the
background, yeah, something that is asynchronous work that goes beyond just touching the database, I think
you already are at the point where it's starting to become
useful. Then, you know, if the only thing you're doing is maybe
forwarding one call, yes, maybe it's overkill. But I
think the usefulness starts much, much sooner than lots of folks realize.
I would say every time you think about pulling in a message queue, you should probably start
to think about pulling in something like restate because it gives you a way to do the things you're probably trying to do with a message queue,
but in a more high level, in a more well-defined concrete way. You're not treating events,
but you're dealing with stateful, durable invocations, stateful, durable functions all
of a sudden, which is very often what you really want. If you're putting something off the synchronous path with a queue, you very often want to say, okay, here's
something where I really care about that this happens. It shouldn't get lost, right? That's
why I'm putting it in a queue. And then you probably care about this thing happening once, having reliable retries, to quickly reach the state where
the processing of this operation is actually multiple steps. And then you're again in the
okay, how do I do reconciliation of multiple steps if it failed somewhere in the middle?
And I don't know what I already completed or not. So I would say the moment you start
to pull in a message queue, you probably should think about something like this. Like we said, the point comes very
quickly. Yeah. But that's the
simplest use case. I would say the most complicated
ones that we see people build with us right now is using this
to replace a complex choreography of multiple
Kafka topics and RabbitMQ queues and session servers
and workers and so on.
Or even like a distributed sort of payment ledger keeping system.
So it's really a very broad spectrum.
I would say, in a way, you could think all the type of work you do in the backend that's not the central database that keeps your business state
is ultimately where Restate comes in. Yeah, what about a scenario where
it's publishing? I'm thinking like TikTok or YouTube, for example. As a creator, we will upload videos to YouTube, there's a process that happens,
there's a certain orchestration that happens,
it has to be compressed,
it has to go through certain filters,
maybe there's even a content filter
it has to go through, a copyright filter.
Is that an example of where you would use
something like Restate where you want it to go,
you want the user to be able to upload properly
and your server capture the data
and all the good things, but you got to run it through a process of saying okay this is now content
that can be seen by what we call the world because it's been blessed by the copyright
filter etc etc. Is that a scenario where it makes sense?
Yeah absolutely. This is basically a workflow again if you think about it right? You're
uploading the video, let's say, maybe the
upload first puts it into some cloud storage.
But then as you said, you first pass it to the content filter, then you have maybe a
few steps that even run in parallel, like recoding it for different resolutions, optimizing
it to be served through the CDN and so on.
Then you're, I don't know, running it through a system that tries to figure out what's the
best sort of title frame to display
and like all these different steps that you do and they take potentially a long time. So it's a
long running process. There's a fair chance that the container goes down in the middle or wants to
be migrated and when it comes back up, you really want this to understand where did I leave off?
Like what are the processes I should reconnect to that are doing the encoding or the analysis?
Like this is exactly the orchestration of that process
is where we said would come in.
You wouldn't feed the video frames to the system.
That's like overkill.
You don't need to feed the video frames
to a transactional log.
Like it's just that you put them
in whatever cloud storage or so,
but the orchestration of the process,
of the workflow, of the pipeline that does that,
that's a very good Restate use case, actually, yes.
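Sketching Adam's upload pipeline as a durable workflow makes the shape clear. The helpers below are hypothetical stubs and the journal is an in-memory map; steps are keyed by name rather than position so the parallel branch replays correctly after a crash.

```typescript
// The video pipeline as named durable steps (illustration only).
const stepJournal = new Map<string, unknown>();

async function step<T>(name: string, fn: () => Promise<T>): Promise<T> {
  if (stepJournal.has(name)) return stepJournal.get(name) as T; // replayed
  const result = await fn();
  stepJournal.set(name, result); // a real runtime persists this durably
  return result;
}

async function publishVideo(videoId: string) {
  await step("filter", () => runContentFilter(videoId));
  // Independent steps run in parallel, each journaled under its own key:
  await Promise.all([
    step("1080p", () => transcode(videoId, "1080p")),
    step("720p", () => transcode(videoId, "720p")),
    step("cdn", () => optimizeForCdn(videoId)),
  ]);
  await step("thumbnail", () => pickThumbnail(videoId));
  await step("publish", () => markPublic(videoId)); // only after every gate
}

// Hypothetical stubs for the pipeline's long-running work:
async function runContentFilter(id: string) { return true; }
async function transcode(id: string, res: string) { return `${id}-${res}`; }
async function optimizeForCdn(id: string) {}
async function pickThumbnail(id: string) { return `${id}.jpg`; }
async function markPublic(id: string) {}
```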
That's a great example, Adam,
because it definitely makes it easy to think through.
I guess as people who upload to YouTube,
we are intimately familiar with all the different steps.
That's why I enumerated it so well.
And it's asynchronous,
because you can go about doing the other things
while it's working on the long-running tasks, for instance.
And somebody coded up some nice orchestration
behind that sucker to keep that thing running.
Yeah, and Google has the engineers
to code up reliable orchestration flows,
even in a way that they're nicely observable.
You can reconnect to them.
They're efficient, they know how to parallelize steps
and synchronize steps and so on. It's a much harder thing to do for many companies who don't hire the same
type of engineers as Google does. And I think, for those reasons, it actually makes
these types of things much more achievable than if you try to embark on that
journey without it.
Even when you were sharing how it worked early on,
you were saying that it seemed,
at least from my perspective,
it seemed like the burden was on the user every time.
Every new application, every new scenario, every new job,
every new what have you,
maybe even in your ordering scenario
that you sort of focused on for a bit there,
you keep recreating this durable invocation world over and over and over again.
And why not turn it into like you have done here with a server and a client and SDKs for
different languages and a flow that every developer can grab.
Is that kind of where what landed you to this point here was that frustration of the repetition
and repeating and rebuilding every single time you build an application?
I think that's a great way of putting it. Yes. I would say the number one alternative to
Restate that people use is roll-your-own. Absolutely. And it's a very repetitive process.
And most of the time, I would say folks don't really realize all the edge cases
that exist in what they do; they just maybe don't even solve them. So this is basically
half-baked roll-your-own. And it's, yeah, every time, again and again. And it's very
often very similar problems that you're solving.
Like, let's say you're taking a message queue to say, an action that I triggered should run
asynchronously, and it should, you know, have reliability, retries. And then I'm
pulling in another store like Redis or another key-value store to record different steps.
Then I might be pulling in something like ZooKeeper or etcd to place a lock on certain operations, so they don't happen concurrently; like, a retry shouldn't work on the same payment ID
if the lock is still being held by the original processor. And then you're trying,
going back to idempotency, to make an update to another system and understand how do you actually anchor
the ID of that processing forward into the update to that other system. You're
exactly recreating that type of pattern over and over and over again. And I think this
is where in a way workflow engines were originally born, if you wish, like enterprise workflow engines, which try to say, okay, let's try to define a flow where
we can have steps following a certain predefined control flow graph.
And we have the workflow engine giving you the guarantees that step B that follows step
A really only starts after step A is done, and step A is transactionally persisted before B starts, and so on. They tend to be extremely heavyweight and inflexible.
Yeah, they just break all the tools and everything when you want to interact with them.
And what Restate does, and what the durable execution space is trying to do in general, is kind of bring this level of guarantees in a very, very lightweight way into almost arbitrary programs, because
it's just such a useful power to have, to kind of define these durable steps, especially
if you don't have to branch out into a different domain-specific language or graphical
way to define them. But if you could just write your regular code,
but have it executed
with the same sort of guarantees
as if it was an enterprise workflow.
It seems like you might have a large education challenge
in front of you because there's so much thought
that has to go into this kind of architecture.
I think the fact that your largest competitor
is Roll Your Own means most people don't know.
Like, we kind of all discover this pattern slowly over time
inside of our own daily work,
and so I'm just curious how you think you can attack that,
or are there other people, like,
is there a common thread or movement
that you can attach to or create
in which people are like, yeah, here's this new style?
I thought of this because you mentioned workflow.
And I think that's sort of in the wheelhouse
or in the ballpark of what restate is, message queue.
I mean, is there like a simple idea or concept, pattern,
of which restate could be one,
or maybe restate is the brand?
But have you thought through this,
because you have a marketing problem here,
or a challenge, I should call it.
Yeah, so I think that is very true in many ways.
Okay, I think there's like lots of layers of answers to that.
I would say you can actually explain it
reasonably simply if you just start from:
it's stateful durable functions,
which have guarantees that they execute,
they run to the end, they are able to record steps,
they're able to basically do these sort of asynchronous building blocks that you have in your usual programs, calling other functions,
creating promises, resolving them, updating state, making calls and so on, just in a fine
grained persistent way that knows how to recover. This is sort of the basic building block of
state for durable function. Now, the harder thing is actually in a way making people realize that
they should be using something like this rather than roll your
own. It's not uncommon, mostly on the junior
side of engineers, that you talk to them and it's like, I don't
get it. Like, I know how to write a retry loop, what
is this? And then, you know, it's a journey from
there. Interestingly, I think the most enthusiastic audience
is often the engineers that have been burned before,
that know, okay, I know how to build distributed systems,
but holy cow, I know how hard it is,
and like even though I'm really good at this,
I tend to overlook still two out of 10 corner cases
and I get paged Sunday night or so.
Those are the ones that really, that often go like,
yes, I know why I wanna use this
because I know how much time I would otherwise spend
on solving all these things if I have to do it myself.
So I guess that's right.
There's definitely an education challenge there.
And I would say a very sort of like in your face example
of that is if we look at the AI space
and agents right now, I think every AI company is reinventing workflows in the context
of agents and agentic workflows.
Everybody's building and, I would say, slowly rediscovering all those things,
when there's been an entire industry that has been working on this for, like, wait,
I mean, we've been working on this for
two years. But if you ask IBM, they've been working on this
for probably 30 years or something like that. I mean, in
a very different way, right? But still, I feel like
the AI companies are kind of bit by bit rediscovering
this. And when you start talking to them, I
think some of them understand, okay, if you're building agents,
if you deploy them, they ultimately end up having to solve these problems again.
Like, imagine you have a chatbot that does your flight booking. There's something
you have to do to make it not rebook your flight twice if it just crashes at the wrong
point. They're ultimately going to hit the same problems. They actually
have a perfect foundation to build on with these systems being built today. But yes,
I think they're not aware yet that this is something that they'll eventually run into. So
yeah, I think you can see this in many places, that the industry is rediscovering work in
different sort of subfields
that other fields have already done, just because
information flow isn't perfect.
Kind of an off-topic rant:
Why are all of the AI agent, like Hello World examples,
why are they all booking flights for us?
It's like, do you want some non-deterministic,
half-baked language model,
booking your flight, that's like a very difficult thing
to roll back, you know?
Like, I just don't... that's like one of my last human-
out-of-the-loop AI agent moves.
Like, can we start with something a little bit less critical?
I don't know about you, Adam, but I get like serious
heart palpitations thinking that someone's gonna
book a flight for me.
And you don't get heart palpitations, Jared,
you're a pretty chill dude.
I am, I'm pretty chill, but I just feel like, gosh,
you know how hard it is to roll back a flight?
I mean, come on.
Oh yeah.
Well, I think it depends.
I mean, I don't mind.
I think it's the human dream
to have somebody or something take that kind of action.
Right?
That specific action.
Like, book me a flight.
Let's simplify it.
How about you just like, give me a, yeah, exactly.
Restaurant reservation, you know,
because worst case scenario, I ghost it and feel bad.
But if I don't show up for my flight,
I lose my 400 bucks or whatever, you know?
Yeah.
Maybe this is so much accumulated pain
from people waiting in the call centers for airlines
that, you know, all these companies see, like,
oh, that's a perfect example for a chatbot.
People will wanna use it because they will not want
a single other minute to spend on the phone
with these call centers.
Yeah, perhaps. Well, friends, I am here with a new friend of mine, Scott Dietzen, CEO of Augment Code.
I'm excited about this.
Augment taps into your team's collective knowledge, your code base, your documentation, your dependencies.
It is the most context aware developer AI, so you won't just code faster, you also build smarter.
It's an ask-me-anything for your code. It's your deep-thinking buddy. It's your Stack Overflow antidote.
Okay, Scott. So for the foreseeable future, AI assisted is here to stay. It's just a matter
of getting the AI to be a better assistant. And in particular, I want help on the thinking part,
not necessarily the coding part.
Can you speak to the thinking problem
versus the coding problem
and the potential false dichotomy there?
A couple of different points to make.
AIs have gotten good at making incremental changes,
at least when they understand customer software.
So first and the biggest limitation
that these AIs have today, they really don't understand anything about your code base. If you
take GitHub Copilot for example, it's like a fresh college graduate, understands
some programming languages and algorithms, but doesn't understand what
you're trying to do. And as a result of that, something like two-thirds of the
community on average drops off of the product, especially the expert developers.
Augment is different.
We use retrieval augmented generation
to deeply mine the knowledge that's inherent
inside your code base.
So we are a co-pilot that is an expert
and they can help you navigate the code base,
help you find issues and fix them and resolve them over time
much more quickly than you can trying to tutor up a novice
on your software.
So you're often compared to GitHub Copilot.
I can imagine that you have a hot take.
What's your hot take on GitHub Copilot?
I think it was a great 1.0 product, and I think they've done a huge service in promoting
AI.
But I think the game has changed.
We have moved from AIs that are new college graduates
to in effect AIs that are now among the best developers
in your code base.
And that difference is a profound one
for software engineering in particular.
If you're writing a new application from scratch,
you want a webpage that'll play tic-tac-toe,
piece of cake to crank that out.
But if you're looking at a tens of millions of line code base,
like many of our customers, Lemonade is one of them.
I mean, 10 million line mono repo,
as they move engineers inside and around that code base
and hire new engineers,
just the workload on senior developers
to mentor people into areas of the code base
they're not familiar with is hugely painful.
An AI that knows the answer and is available seven by 24,
you don't have to interrupt anybody
and can help coach you through
whatever you're trying to work on
is hugely empowering to an engineer
working on unfamiliar code.
Very cool.
Well, friends, Augment Code is developer AI
that uses deep understanding of your large code base
and how you build software to deliver personalized
code suggestions and insights.
A good next step is to go to augmentcode.com.
That's A-U-G-M-E-N-T-C-O-D-E.com.
Request a free trial, contact sales, or if you're an open source project, Augment is
free to you to use.
Learn more at augmentcode.com.
That's A-U-G-M-E-N-T-C-O-D-E.com.
Augmentcode.com.
I'm gonna go out on a limb to bring us back
to somewhere left of center, but basically center.
Please do.
And I'm gonna say that this is the year, 2025 is the year
where durable execution of things is more important
than it ever has been.
Oh really?
It's always been important, but more and more people
are leveraging APIs, they're building out this agentic world
we keep hearing about.
Right.
And I think you keep having more and more people
program against brittle APIs, brittle latency of networks, databases, etc.
And you need that promise. I'm gonna say that this is the year
where the marketing problem that you have, that Jared alluded to, is still there,
I'm sorry, but it's less. And I'll tell you why it's less: because
I just talked to Anurag, CEO and founder
of Render, and this is on their radar.
So they're building an application platform for developers.
We did a whole show on this.
And during that conversation, he mentioned a brand, at least I think I did actually,
I mentioned a brand that sponsors us, not this show, but has been, and I think still
is, a sponsor into Q2 and maybe Q3. And that brand is Temporal.
So I'm going to ask you to sort of help me understand the difference between Temporal,
NATS, Synadia, Restate, your open source flavors and your cloud, what Render may be doing for
application developers.
It seems like this durable execution retry model doesn't live in the language itself.
It's something you have to build every single time.
That sucks.
And it seems like more and more people are trying
to solve it.
So break down all those for me: Temporal, NATS,
Synadia, yourself, what Render's doing,
and anything else that may be doing it. I mean, Flink,
but you know, that's a different world.
Yeah.
There's another one called Resonate.
You know that one?
Stefan, do you know it?
Yeah, I know Resonate.
I know the guy behind it.
It's pretty new, but anyways, there is definitely,
like you said, there's other people
trying to solve this problem.
Yes, exactly.
I think this starts from the same observation:
the state of how things are built, if
you don't rely on one of those tools, is almost unsustainable. It's hard to build.
It's hard to hand over to another person. There are often so many implicit and brittle
assumptions in how this works. So folks have been trying to come up with solutions. From the ones you mentioned, Temporal is absolutely the closest, maybe,
yeah, between Temporal and Resonate, I would say those are the closest to Restate. So I
would actually focus on those. I would say NATS goes more in the direction of flexible,
persistent messaging together with some state management blended
in and so on.
But you can already see folks are trying to figure out what the different
aspects are that we need when building applications, and sort of make them work together
with each other in a tighter way.
And if you wish, for me there's a couple of
things that make Restate unique, but I would say two things stand
out. First, I'd say the model goes a bit further than
every other system. If you look at Temporal,
Temporal is workflows, that's really what they implement:
workflows and activities. So it's like durable
steps, and then with sleeps in there and signals and
so on. The full-fledged workflow model is actually very flexible if you're a power user and know
how to use it. Restate goes beyond that by saying we're not just looking at a workflow,
at one durable execution of multiple persistent steps. What Temporal does for a workflow,
we're sort of generalizing: we're trying to do this for a distributed service
architecture consisting of multiple stateful services
that interact with each other.
And you can see this from the fact that Restate has
persistent messaging and RPC built in;
it has state built in that lives beyond
a single durable execution.
So again, in Temporal terms, the workflow is
sort of a self-contained unit: within the workflow,
across the durable steps, it remembers context,
but once the workflow is done, it's done.
Restate is a stateful model where you could almost think of the activities as decoupled from the workflow. The activities can be stateful
services and entities that live for a very long time, and then you have durable functions
that interact with them. It's a much more flexible and powerful model to build things
like distributed state machines. We have folks that actually start to ditch certain elements
of databases and put their state in Restate, because that is then transactionally integrated with the durable steps and consistent out of the box.
So that's number one: think of the Temporal model generalized into distributed services,
to include long-lived state, to include communication between microservices. It makes for
a more powerful, more flexible toolbox.
That's the one thing.
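(A minimal sketch of that long-lived state idea, using Restate's TypeScript SDK. The account service, its state keys, and the port are invented for illustration, and the exact API surface can differ between SDK versions.)

```typescript
import * as restate from "@restatedev/restate-sdk";

// A keyed, stateful service: each account key gets its own durable state,
// which outlives any single invocation or durable execution.
const account = restate.object({
  name: "account",
  handlers: {
    deposit: async (ctx: restate.ObjectContext, amount: number) => {
      const balance = (await ctx.get<number>("balance")) ?? 0;
      ctx.set("balance", balance + amount); // state update recorded with the execution
      return balance + amount;
    },
    balance: async (ctx: restate.ObjectContext) =>
      (await ctx.get<number>("balance")) ?? 0,
  },
});

// Serve it so the Restate server can discover and invoke it.
restate.endpoint().bind(account).listen(9080);
```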
The second thing goes a bit back to what I said earlier.
When we started this project,
we set out with the following.
You can implement durable execution. I think it's not terribly complicated
to implement a durable execution API on top of a database,
if you make it very simple: have a step, write it to a database, and on replay just query the database for which steps are already in there. It has a lot of holes,
but it gets you started.
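(To make that concrete, here is a rough sketch of the naive approach in TypeScript; the step store interface is hypothetical, purely for illustration.)

```typescript
// Hypothetical storage interface: one row per (runId, stepId) holding the
// serialized step result.
interface StepStore {
  get(runId: string, stepId: string): Promise<string | undefined>;
  put(runId: string, stepId: string, result: string): Promise<void>;
}

// Run fn once; on replay, return the recorded result instead of re-executing.
async function durableStep<T>(
  store: StepStore,
  runId: string,
  stepId: string,
  fn: () => Promise<T>,
): Promise<T> {
  const recorded = await store.get(runId, stepId);
  if (recorded !== undefined) {
    return JSON.parse(recorded) as T; // replay path: skip the side effect
  }
  const result = await fn(); // first execution: perform the side effect
  await store.put(runId, stepId, JSON.stringify(result)); // persist before moving on
  return result;
  // Holes, as discussed next: a crash between fn() and put() re-runs the step,
  // and nothing stops two zombie executors from racing on the same run.
}
```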
But then, okay, let's talk about the holes. All of a sudden, you have a problem with long-running processes that suspend for a long time, with scaling this to zero.
You have a problem that in your library you have to implement your own distributed locking and mutexes,
in case you have timeouts and zombie processes and so on.
And so when you try to make it a really good experience, you quickly come to the point of,
okay, we actually have to go a lot further than building a library on top of a database. Then you start thinking, maybe we're building a big
orchestration server that still uses a database in the background. And then you really come to the
point of, if you want to make durable execution so lightweight that you can use it almost pervasively,
how low latency do you have to make these steps under load? What is the best you can actually do if you deploy this across multiple data centers,
if you deploy this across multiple regions?
Then you come to the point that with a distributed database across multiple data centers and
regions, there's a lot of coordination back and forth, because the database model needs
to guarantee integrity.
It does a lot of transaction timestamping back and forth
and round trips. On the other hand, if you build this on a log, an optimized transaction
log, you can get as good as making one flexible quorum write across your different data centers,
and you have the step persisted and you can continue. So it's kind of getting to the point of saying, if
we want to make this extremely fast, so low latency that you can actually start to use
it in places where you didn't think you could use durable execution before, because it becomes
so cheap, so low latency. How would you have to build a system to do that? And that's where
we went. You'd have to build it from first principles, starting with a low-latency replicated log. On
top of that, build it end-to-end event-driven. So you don't
do batch queries on a database; you do the lowest-latency
thing you can do: fine-grained messaging and event
pipelines. And you basically layer from there.
And then the other thing is, okay, let's not just make it really low latency; at
the same time, it also has to be an extremely lightweight thing.
Because, you know, we just said, what's the simplest use case?
Should you actually only start looking at Restate when you have a distributed
ledger to build?
Or do you want to do this if the only thing you want to do is put your asynchronous
email sending in the background, but reliably? So the next thing is, how do
we actually make this extremely lightweight? What's the most lightweight package we can
give that thing? And the most lightweight package is a single binary, zero dependencies.
Just download that thing. It has its log built in, its orchestration layer, its metadata consensus
module, everything in a single binary. Just download it, one command, it starts in a second, and you're done. Literally nothing else to do.
And then you can actually take this thing and start scaling out just by adding more nodes.
If you want to migrate it, let it take a snapshot to an object store, start deploying it elsewhere,
let it resume, and go from there. So, what's really the experience that durable execution needs,
if you want to be able to take it from the point where it's so lightweight you almost want to embed it with any application,
all the way to this thing powering distributed multi-regional payment processing? What's the architecture
you need for that? So that's what we started building in Restate. So, that was
a very long way of saying: the second thing is Restate is really sort of a
durable execution stack built from first principles for low
latency, serverless operations, high throughput, and just
really nice operations from the small to the
large scale. Rather than saying, let's start with whatever
database we have. In Temporal's case, when they came out
of Uber, they started with Cassandra and said, let's build
a server that sort of sits on top of Cassandra and
stores all the state it needs for coordination in there.
And then, you know, you have different pieces that you
need to scale; you have a database that is actually a lot
more than you really need for durable execution, but that on the
way also sacrifices
the potential for optimizing. Those are the differences, I would say.
Gotcha. Built for speed basically, that's what you're saying.
Built to be lightweight.
Scale down.
Yeah, scale down, scale up. And yeah, lightweight, simple to operate. I usually don't like to do this.
I usually like to talk more about like
what makes Restate great than what makes other systems
not great.
Uh oh, Adam puts you on the spot.
Well, you know, I think it's important.
Well, if I'm gonna say, if I'm gonna go out on a limb
and say this is the year, then you have to follow me, okay?
Yeah.
You have to follow me and you have to answer my question
because I'm reducing your marketing churn for you.
Yeah.
Just by nature.
So I'd just say, look at the way Restate is built and how it allows you to get started
and scale from there. If you say, okay, I care about self-hosting this because what
I pipe through this is critical data,
it's not something I trust to some managed
cloud, it really has to run in my account, then I think the experience you get out of Restate is
vastly different from what you get from many other systems. And that's because
it's just been this very thoughtfully crafted stack from the very beginning, and not sort of
incrementally evolved from this database and
that server.
Right. If you're directly comparing to Temporal, which is an incumbent, which was spun
out of Uber as you mentioned and was built on different principles, you
went back to first principles and said, okay, if you want to get to the point where you
can put this almost everywhere you want, you have to be low latency, you have to be fast. These first principles you built on have to be there.
Yeah. And you can't have the requirement to first install a distributed database before
you get started. What are the requirements? That's where I was going to go, too. So it
seems like there's a client, which is an SDK essentially, inside your code base, making
calls to a server. What is the architecture,
the infrastructure required?
So the Restate server, which is where the low-latency consensus log lives, is the thing
that basically becomes the reverse proxy for your services. That thing has no real requirements if you want to get
started. It's a self-contained binary. It embeds its own distributed log, a RocksDB storage engine,
its own consensus engine. The only thing, if you want to run it as a single node, is you need to
give it a persistent disk. It's a little bit like running SQLite or Postgres:
let's go back to the good old days where you download one binary, it just starts and
it's actually running. There's nothing else you need to do.
But at the same time, it's also able to go from that single process, from starting with
the single binary, to actually clustering up
and building a distributed cluster. And there's a very interesting architecture in there.
We built it basically for the cloud age,
where you would say any system that you run at scale
should not really store its own data;
it should just make use of object stores
as much as it can, because
S3 and these systems are these bottomless, insanely durable, and insanely cheap storage
systems. So make use of that as much as you can and put a large chunk of your data there.
So that means while you may be working with the data on your individual nodes, you're not really required to safeguard it on the nodes,
because you can recover it from S3 or an object store.
So what Restate then does is implement its log
in such a way that it only uses the local disks
to give you the very low latencies
for the durable steps.
And then in the background, it incrementally moves data to S3, which makes the individual
nodes fairly lightweight to operate.
So to go back to your question, what are really the requirements when you want to run it?
If you want to run it on a single node: none, or a persistent volume if you actually want
to run it in production. If you want to run it in a distributed setup, give it an S3 bucket. Those are the requirements. If you want to
use it from your code, the requirement in your code is to use the SDK and to basically
create a Restate entry point that Restate can connect to, where it can use its
durable invocation protocol and understands how to decode that. This entry point
mimics the popular frameworks: it's relatively close to Express.js
if you're talking the JavaScript world, and in the Java world it looks more like Spring
Boot and so on. And then within the individual durable function or service handler, you need
to use the Restate context to say, okay, I want to run this step and record it as durable,
or I want to create this as a durable promise for a persistent callback or so. But otherwise,
the structure of your code is very much the same as it used to be. It's supposed to be as little invasive
as it can, to get as little in the way
of how you used to do things as it can,
just changing the paradigm in that,
because it has this built-in recoverability
for these operations,
you can get rid of a
lot of this sort of unhappy-path code.
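(A minimal sketch of that handler shape with the TypeScript SDK; the signup service and its helpers are invented for illustration, and the API details may vary slightly by SDK version.)

```typescript
import * as restate from "@restatedev/restate-sdk";

// Invented side-effecting helpers, stubbed so the sketch is self-contained.
const createUser = async (name: string): Promise<string> => `user-${name}`;
const sendWelcomeEmail = async (userId: string): Promise<void> => {
  console.log(`welcome email to ${userId}`);
};

const signup = restate.service({
  name: "signup",
  handlers: {
    run: async (ctx: restate.Context, name: string) => {
      // Each ctx.run records its result durably; on a retry, completed steps
      // are replayed from the journal instead of being re-executed.
      const userId = await ctx.run("create user", () => createUser(name));
      await ctx.run("welcome email", () => sendWelcomeEmail(userId));
      return userId;
    },
  },
});

// The Express.js-like entry point that the Restate server connects to.
restate.endpoint().bind(signup).listen(9080);
```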
There are still cases you need to treat, but mostly those are
persistent errors that come from the application, where you say, okay, you're making
a call to an API where you're not authorized.
There's not really a way to recover from that;
it's trying to do something it's not supposed to. So handle this, but don't worry about handling process failures, network failures,
rate limits that bounce you back. Don't worry about many classes of race conditions,
you know, like the state being maintained in the database versus the logic that interacts
with it in a function, where you don't really know, did this go through
or not.
If you put the state in the Restate handler,
it's just gonna be consistent for you.
All of those things.
By keeping the structure of the code
like close to what you used to write.
So I made you talk about things you don't like to talk about
except for maybe the architecture.
That seems kind of fun to you.
What is it that you do like to talk about
when it comes to defining and describing Restate
and why developers should consider it?
So, what I don't like to talk about is competitors, in the sense that I don't want to
say, okay, I don't like this about one competitor,
I don't like that about another competitor. Because, number one, I'm not an expert in those
systems.
I try to be honest.
I look at them to the extent I need to, but
actually no deeper than I need to, because I found it very liberating to not have my
judgment clouded or pre-biased by having looked at something.
I feel if we looked in depth at, for example, how Temporal built their
API and so on, there's a very good chance that,
oh yeah, I get it.
This is why they did it and this makes sense and so on.
Then there's a good chance you'll probably do it
the same way, just because you've seen this example,
coded it, understood it, and you're preconditioned.
If you don't do this.
Yeah.
It's like Schrödinger's cat.
Is the cat alive or dead in the box?
We won't know until we look.
Maybe, yeah.
But if you don't do that,
It's both dead and alive.
You actually have a chance to do something, to come up with
your own creativity, possibly do something better, right?
So that's one of the reasons why I don't like to talk about them so much because I'm absolutely
not an expert.
I look as much as I need to, but I usually don't try to go super deep into these systems.
And the second thing is that, I don't know, I'd rather talk about good
things than bad things.
Yeah. It's more fun to say nice things than bad things.
I understand your discomfort, then and now.
Definitely, it can be tumultuous talking about competitors and what they do
and what they don't do.
I think the reason the question is pertinent is because,
to Jared's point, you have a marketing challenge ahead of you.
And I think it's because the idea of durability and idempotency is mostly well known, not
always easily implemented, and there are options out there.
And so when you sort of look at that challenge, you think, well, what could someone reach
for?
When would they reach for it?
When does it make the most sense to reach for it?
And does it actually fit whenever they do try to implement it at scale, you know, across different boundaries and whatnot?
And so I think when you compare that, you look at, like, well, NATS is a whole different scenario,
but they do similar things. It's kind of funny, because when you mentioned Flink, you're like,
well, it does this in a different way, and then you got to Restate because of your experience there and whatnot.
The thing with NATS is, NATS does a lot of similar things: you're
brokering messages, there's a lot of retries, there's a lot of key-value storing in there.
There's a lot of those same principles, but it's not about durability, it's not about retries.
And then you obviously have Temporal, you have Render, who's trying to or going to do something
like that in the same platform, which I just had that conversation with
Anurag about, and then you obviously have Restate and how you went back to first principles
versus being spun out of something or
what have you. So I think you're the best-suited guide in this conversation to explore those,
because Jared and I can't do that for us.
Yeah, absolutely. So if you want a quick summary, I'm very biased, but I think there's almost no reason to not
reach for Restate.
I think it really is this solution from first principles, with amazing developer experience, with a very powerful abstraction
that allows you to build what you can build with workflows and signals, but also so much
more. And yeah, just the journey from the beginning, downloading the binary, then migrating, scaling out,
it's a great experience.
And I mean, the project is newer than other projects,
so it will have a rough edge here or there,
but it's also moving very quickly.
It's very good at reacting to community feedback fast.
So I think it's a good choice.
It has made a lot of users happy so far.
Could we maybe use, Jared, an example from our own application
to consider how we would pick up Restate?
I know we publish episodes, right?
We publish episodes.
We often will have scenarios where the slug isn't right.
We've had different scenarios where we had to do things
in prod to fix something.
You know, it could be metadata,
and you've got different checks before the publish process.
Is there a way, knowing what you know now about Restate,
you would consider implementing something like that to safeguard publishing
episodes in a durable way?
I've never really used one of these tools before,
so it's difficult for me to say. I do know, just at a technical level, that I do not believe Restate has an Elixir SDK, so we might be out of luck.
An Elixir one, that's a good ask.
Okay, maybe I can help you come up with an example here.
Let's say you're recording the episodes, and every time an episode is done,
let's do an AI thing here or so. So you're
building your chat where you can chat with an episode,
like, okay, tell me, when did they
talk about this? Or tell me what episodes talked about these
topics, and so on. So what you're doing is, whenever an
episode is done, you're feeding it first through a model
that transcribes the audio, then you're chunking it up, feeding it through embeddings models, storing that
maybe in a vector database, and then you have kind of a RAG-style way of, you know, when
a query comes: create the embedding, look up the similarity search in your vector database,
feed it to the model to get the answer.
For something like this, let's say you started just building the flow in, say, a Node.js
application, in a simpler way. You just said, okay, here's the episode,
it gets uploaded.
Let's say you're uploading it to an S3 bucket, and whenever something
gets uploaded to this bucket,
you have an event that represents this,
and then it starts a Node.js script
or something like this.
And this script is of the type that, you know,
if it fails, somebody would have to restart it.
And now let's say you're trying to implement that
with Restate.
I would say approach it the following way.
The first thing is get a handle on Restate itself.
There's a cloud service that you can use on our site,
which has a free tier.
Either go there, or just use one of these ways
to run it yourself on, say, a
single machine with an EBS volume.
Then you have the server there.
Then take your Node.js script;
maybe you can actually put it on something like Lambda or ECS,
just use a serverless option to host this.
And then use the Restate SDK
to define the entry point, and tell Restate,
okay, here's the service that you should now
durably manage. So Restate will then go there and discover
this and understand, okay, hey, there's this, what do we call it, video
transcriber or video embedder service. And then Restate knows about this. And then you
would go to your Amazon console and say, okay, for this type of event,
I wanna create a webhook to Restate,
so that it makes an invocation to Restate that says,
okay, this thing has been uploaded.
The kind of event
that would previously call your Node.js process
or script directly, you actually make it an HTTP call
to Restate, and Restate will with that call your process.
You've already gained one thing right away:
you now basically have a reliable queue in front of it,
just like that, even if you don't do anything special.
So when the webhook call comes, it's gonna be acknowledged back,
and Restate has it; if your process crashes, it will retry it.
It will actually give you nice observability,
much more than you would get from your average message queue,
about individual retries, configuration about timers and back
off and timelines and so on.
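(Concretely, such a webhook target is just an HTTP call to Restate's ingress, roughly like the sketch below. The service and handler names are invented, the default ingress port of 8080 is assumed, and the /send suffix for one-way calls should be checked against the docs for your version.)

```typescript
// Fire-and-forget invocation through Restate's ingress: the call is durably
// enqueued and acknowledged, and Restate drives the handler with retries.
await fetch("http://localhost:8080/episodeIndexer/index/send", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ bucket: "episodes", key: "episode-42.mp3" }),
});
```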
As the next step, you would then go into your script and say, okay, let's
identify the steps where, if something fails in that step or after it, I don't want
it to go back. Let's say forking the process
that does the transcribing,
or calling the LLM to create the embeddings.
You then introduce the Restate context that you get
by using the Restate SDK, and just say,
okay, let me wrap these API calls with Restate's run.
That will capture the results durably,
and you've now basically turned it into a workflow.
Let's say you want to do something like
parallelize the different steps.
You know, maybe piping this one by one
through this embeddings model is a little tricky.
So you want to fan out.
You could then go and say,
let me do the exact same thing I'd do
in a regular Node process:
just make a bunch of function calls,
remember the promises,
do sort of a Promise.all for those in the end,
join the results, put those in the database.
You can do exactly that in your code,
just, again, anchored in the Restate context.
So you get this durable parallelization, durable
scatter-gather, and so on.
And so you would then incrementally rewrite
your code to say, okay, let's make this step durable,
let's make that step durable, and that step durable.
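(Roughly sketched in code, with invented transcribe/embed/store helpers. The deterministic Promise.all-style combinator is assumed to be exposed as RestatePromise.all, which may differ by SDK version.)

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical helpers standing in for the real transcription/embedding calls.
const transcribe = async (key: string): Promise<string> => `transcript of ${key}`;
const chunk = (text: string): string[] => text.split("\n\n");
const embed = async (text: string): Promise<number[]> => [text.length];
const storeVectors = async (key: string, vs: number[][]): Promise<void> => {};

const episodeIndexer = restate.service({
  name: "episodeIndexer",
  handlers: {
    index: async (ctx: restate.Context, req: { bucket: string; key: string }) => {
      // One durable step around the expensive transcription call.
      const transcript = await ctx.run("transcribe", () => transcribe(req.key));

      // Durable scatter/gather: each chunk's embedding is its own recorded
      // step, and the join is replayed deterministically on retry.
      const chunks = chunk(transcript);
      const embeddings = await restate.RestatePromise.all(
        chunks.map((c, i) => ctx.run(`embed-${i}`, () => embed(c))),
      );

      // Persist the results as a final durable step.
      await ctx.run("store", () => storeVectors(req.key, embeddings));
    },
  },
});

restate.endpoint().bind(episodeIndexer).listen(9080);
```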
Say, as a next thing, maybe one of your folks wants to approve
it before it really goes
out.
Then let's do that in the simplest way possible:
we create an awakeable, a durable promise, in Restate, and say, okay, somebody needs to
complete this actively.
Send an event, make an HTTP call to complete this and say, okay, this is approved, go through,
or no, this is not approved, abort.
You could, for example, put the result of the transcription just in Restate,
so somebody could look at it from the UI and then say, okay, yeah, I'm making an API
call here to approve this and continue.
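(Inside a handler like the one sketched above, that approval step could look roughly like this. ctx.awakeable and TerminalError come from the TypeScript SDK, while notifyApprover and the exact resolve URL shape are illustrative assumptions.)

```typescript
// Illustrative stub: deliver the awakeable id to whoever approves episodes.
const notifyApprover = async (id: string) => console.log(`approval handle: ${id}`);

// Create a durable promise; its id is the handle used to complete it later.
const approval = ctx.awakeable<boolean>();
await ctx.run("notify approver", () => notifyApprover(approval.id));

// The handler suspends here at no cost while it waits; an HTTP call to the
// ingress (roughly POST /restate/awakeables/{id}/resolve with a JSON body)
// wakes it up, and replay restores execution to exactly this point.
const approved = await approval.promise;
if (!approved) {
  throw new restate.TerminalError("publication rejected by reviewer");
}
```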
And so you can then incrementally
rebuild your process into durable steps.
As the next thing, you could then, for example,
take it and migrate it from a long-running process
to a Lambda function.
Because one of the nice things you have
with durable execution is that when it's waiting
for something else to happen, it can actually just make this thing go away, because it knows how to recover it
back to the place where it was by replaying the history of durable steps.
So you could then say, if you're on vacation and you approve it a week later, you don't have
some process running and waiting for it. It's just going to go away. And when
the approval finally comes, it's going to come back,
use the durable steps to replay back to the point, and then do the remaining
steps. And so typically folks would incrementally rework their
non-durable services: first connect them to Restate to basically get the
equivalent of a durable queue, and then incrementally rework it and say,
okay, I want to add durable steps here, maybe parallelization, maybe a signal.
And I think that's typically how you'd approach it.
That makes a lot of sense.
I do see also you have some guides on the website
about how to implement certain things.
I'm curious about the observability bit.
Is that a part of your hosted offering?
Is that a part of the open source project?
How does the business end fit in
and is observability part of that open core sort of thing?
Yeah.
So at the moment, what you get in the open source
is very broad.
You get in the open source, compared to the hosted offering, pretty much everything except
the fact that you would self-host it, and the whole authentication
and API tokens and so on that exist only in the managed offering.
But other than that, we've started with an open source first approach.
So the open source has pretty much
the full suite at the moment.
On observability, there's two things
about observability in Restate.
Number one, it can actually give you
an amazing amount of observability itself,
out of the box, because it funnels all these durable steps
through its consensus log.
It has all the information about what happened,
and not just at the function-call level,
but to the granularity of: here is a step that happened,
or it actually failed, this is the last step that completed before this failure, and since then I've retried so many
times, and this is the last exception I've seen.
It has all that information available because it's also connected to the service and understands
what type of errors are happening, is this a retryable error or not.
And it gives you access to all that observability data in its own UI.
It's actually a fascinating way that this is implemented.
I do want like one or two technical details.
Sure. So there's this durable log that records all the actions.
Then everything is indexed into RocksDB instances to retain it in a scalable way. We've built a SQL query engine around this, using the DataFusion project, which allows you to basically do SQL queries
against all of that invocation and transaction journal state and so on. What the UI actually does
is basically issue SQL queries. It's almost like back to the good old days
when all your state was in a single Postgres database.
And we kind of lost that
because we went into distributed microservices.
And if you want to find out what happened,
you now have to do a murder mystery with 20 services.
Yeah, and you're bringing it back.
And we're kind of bringing it back,
like, yeah, SQL query for the win,
for your distributed application state.
So this is one of the things:
you get an amazing amount of insight right out of
just the Restate journal.
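(For a flavor of that, a hand-issued query might look like the sketch below. It assumes the admin API exposes a query endpoint on port 9070, which the restate CLI's sql command talks to, and that the invocation table is named sys_invocation; both are assumptions to verify against your version's docs.)

```typescript
// Ask Restate which invocations are not yet completed, via plain SQL.
// Endpoint path, port, and schema are assumptions based on the CLI's behavior.
const res = await fetch("http://localhost:9070/query", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    query:
      "SELECT id, target, status, last_failure FROM sys_invocation WHERE status <> 'completed'",
  }),
});
console.log(await res.json());
```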
The second thing is, because all the operations
go through there, Restate can also just
out of the box generate OpenTelemetry traces
and spans for you.
So if you give it an OTel endpoint
that it should push those to,
it will just give you the traces right away,
without you needing to configure anything.
You can then extend it and augment it with your own traces.
But yeah, those are the two things you can do.
And so the business end is basically cloud hosting
for Restate.
The business end is gonna be a lot more than that.
Okay.
What is it going to look like?
That, I don't think we can go into yet.
Interview us again in six months.
Interview us again in six months.
It's not ready to be announced, but what's available
right now, most of what we have built, is also in the open source.
So yes, on the business side,
we do currently only hosting.
For the next six months or so.
Maybe.
Fair enough, cool.
Well, I think it sounds like a really cool system.
I'm excited about this new world
of durable execution functions
and some way to slap a name on that
that brings all of the junior engineers to the yard,
along with us seasoned engineers
who have felt these pains for all these years.
You know, like the serverless folk did.
You know, they just said it's serverless,
and they're like, oh, okay, cool, serverless.
Maybe I should try it.
I feel like restate and friends need
some sort of a marketing term to just simplify
the overall concept of what you all are building.
But I do think it's very interesting tech
and very promising.
I do like the term resilient apps,
so I think maybe we need something with resiliency involved.
But that's all for me, Adam.
Any other questions from you before we let him go?
I actually like durable, personally.
I don't know if you want to wordsmith that a little bit here,
but I like durable.
I think that seems to have-
So if you're interested,
we've actually gone through a few iterations.
We started with something we just called durable
async await, because in many ways that's what it is underneath the hood.
It's like, yes, durable asynchronous operations: a function invocation is an asynchronous operation made durable, a step is sort of
an asynchronous API call with a durable result.
And there were some sort of expert programmers
that were like, oh, that's really cool, I get it.
Like, you know, it's like distributed durable event loops.
It's very cool.
But then 90% of the folks did not get that.
And we just went with durable execution.
And then it turns out that's
a term that's maybe increasingly more recognized,
but it also undersells a little bit what we do there,
because folks actually think, oh yeah,
so it's just the same thing as Temporal,
but it actually does a bit more.
So yeah, we're still on the wordsmithing side.
Like, yes, distributed durability, resilient apps,
resilient distributed state management,
there are so many things on the table.
At the moment, and I used this earlier in the talk,
stateful durable functions is something we've used.
I think this is maybe increasingly getting recognized
because of a lot of the efforts that, let's say,
Cloudflare does with its Workers or durable objects.
There's a construct in Restate called a virtual object that has a surprising
amount of similarity with durable objects, and we would
have called it a durable object if that term hadn't been
trademarked by Cloudflare. And Azure Durable Functions
is probably even closer to Restate than Temporal. So I
think you can actually think: Temporal,
then Azure Durable Functions,
and Restate is a bit more than that.
I think it combines a bit more orchestration
and stateful logic, in an even more flexible way
than Durable Functions does.
Yeah.
But yeah, so stateful durable functions
is where we've currently landed at.
But look, it's a journey.
I think honestly, even Temporal
hasn't figured that out after five years.
I think it's still-
Well, that's why I said it's a challenge
that maybe Restate shouldn't solve alone;
I feel like everybody who's in this category needs it. There's a missing category.
It's almost like a style of application or an architecture.
Where it's like, well, what architecture is this?
Well, it's model view controller.
Okay, it's MVC, I can build an MVC-style app.
Whereas this is like,
I don't know what to call it.
I'm missing a word.
But it's almost like, you know, it's Restate-style
or something. Maybe you have to
term it after yourself if you want to really own the market.
That's durable function style or it's, yeah.
Durable function doesn't speak to me personally at all.
Sounds really boring, but that's just me.
And maybe it's working on other folks.
Adam likes durable.
Durable to me just sounds like cool.
It's not gonna break.
Stateful, durable functions, that's what you said.
Is that right?
That's what I said earlier, yeah.
Stateful, durable functions.
SDF. SDF style.
I'm gonna make an acronym or something like that.
I don't know, TDD, SDF.
Right. MVC, yes.
Yeah, I mean, model view controller,
it doesn't have any sort of appeal to it either on its face.
So, I think that wasn't a bad example.
Anyways, we could continue to workshop it
till we're blue in the face.
But obviously you've been working on it longer than that.
Hey listen, this is the year.
This is the year of it.
I'm just saying. The year of the what?
The year of whatever this is.
The durable function.
Whatever this is, it's the year.
Yeah, I feel there's something about durable in itself
that's not recognized by lots of folks.
I think you actually asked about that earlier as well.
Like what does durable really mean?
Like maybe a stronger emphasis on persistence or so.
I think there's something to be said about resilience. I think resilience is a much more attractive word,
generally speaking, and one that to me calls and says,
is your app resilient?
I'm like, ooh, I don't know if it is.
I want resilience.
Because durability can still show wear somewhere,
whereas resilience is like, you know what?
No matter what happens, I'm going to succeed.
I'm going to try until I bounce back.
Yeah, and I think you're right in the sense that durability
is a means to an end.
It's very much an implementation detail, if you wish.
Restate achieves resilience by doing a lot of fine-grained
durable operations, which make it easy to bring
things back to a consistent state, and that drives resilience.
There you go.
Now you're getting your messaging down.
Love it.
Now we're deep.
Listen, hey, you know we have a fun place to hang.
It's called Zulip.
Oh, that's true.
Go to changelog.com slash community.
Join us in there, and then if you have some ideas about this name,
or want to wordsmith this with us, this world that Stefan is creating, then,
you know, pile on, share your thoughts, all the good stuff.
Well, what's left, anything left unsaid about this durable, resilient world we're going to live in?
What's unsaid about the durable world that we live in? I think it's inevitably coming.
The question is mostly in what shape it's coming.
I think it's actually been worked on from multiple dimensions.
There are folks like us that work on this from the Restate side, like, here's the lightweight
durable log that is easy to integrate with your functions.
I think the serverless folks
and the Wasm folks are working on that
from a different side, saying,
okay, hey, let's compile everything to Wasm
and let the system use the Wasm interpreter
to snapshot things.
And I think there's folks that kind of use container
engines to implement this. So the thing they all share is just the understanding that it's completely
unsustainable to not have anything like this. It's hard. It gets increasingly
more important the more moving parts and the more asynchronous processes you have.
And if we all believe what the AI people tell us, that like 80% of all this is
going to be some agentic stuff in two years anyway, then you've just created an even bigger problem and an
even bigger need for this type of system. So I think this is coming in one shape or the
other. This is our sort of...
It's inevitable.
Our bet on how to best achieve it.
And it's gonna be fun to see.
It's gonna be fun to see what happens.
There you go.
All right, Stefan, well, thank you so much
for sharing the journey, sharing the love,
sharing the things.
Appreciate you.
Thanks for having me.
Cheers.
Okay, very fun conversation today with Stefan.
Very big idea of resilient applications.
Love the idea of Restate.
The idea of stateful, durable execution functions
is awesome, but it's just four words too long for me.
I do agree with Jared on the marketing challenge ahead, but resilient applications, I'm
down with that. I think you are too. If you haven't yet, check them out: restate.dev. Okay,
big thank you to our friends over at Augment Code, our friends over at Retool, and of course,
our friends over at Heroku, Heroku.com. Actually, I think it's Heroku.com slash changelog podcast,
if you wanna use the URL they gave us, there you go.
But the next-gen platform is coming
and I heard it's awesome.
Okay, BMC, thank you so much for those beats.
You are awesome and we'll see you soon. Thanks for watching!