The Changelog: Software Development, Open Source - Kaizen! Tip of the Pipely (Friends)

Episode Date: May 9, 2025

Kaizen 19 has arrived! Gerhard has been laser-focused on making Jerod's pipe dream a reality by putting all of his efforts into Pipely. Has it been a big waste of time or has this epic side quest morphed into a main quest?!

Transcript
Starting point is 00:00:00 Welcome to Changelog & Friends, a weekly talk show about scaling Fly Machines. Speaking of Fly, thanks to our awesome partners, the public cloud built for developers who ship. Learn all about it at fly.io. Okay, that's Kaizen. Well friends, it's all about faster builds. Teams with faster builds ship faster and win over the competition. It's just science. And I'm here with Kyle Galbraith, co-founder and
Starting point is 00:00:53 CEO of Depot. Okay, so Kyle, based on the premise that most teams want faster builds, that's probably a truth. If they're using a CI provider with their stock configuration, or GitHub Actions, are they wrong? Are they not getting the fastest builds possible? I would take it a step further and say if you're using any CI provider with just the basic things that they give you, which is if you think about a CI provider, it is in essence a lowest common denominator generic VM. And then you're left to your own devices to essentially configure that VM and
Starting point is 00:01:25 configure your build pipeline. Effectively pushing down to you, the developer, the responsibility of optimizing and making those builds fast. Making them fast, making them secure, making them cost effective, like all pushed down to you. The problem with modern-day CI providers is there's still a set of features and a set of capabilities that a CI provider could give a developer that makes their builds more performant out of the box, makes the builds more cost effective out of the box and more secure out of the box.
Starting point is 00:01:58 I think a lot of folks adopt GitHub Actions for its ease of implementation and being close to where their source code already lives inside of GitHub. And they do care about build performance and they do put in the work to optimize those builds. But fundamentally, CI providers today don't prioritize performance. Performance is not a top level entity inside of generic CI providers. Yes.
Starting point is 00:02:21 Okay, friends, save your time, get faster builds with Depot: Docker builds, faster GitHub Actions runners, and distributed remote caching for Bazel, Go, Gradle, Turborepo, and more. Depot is on a mission to give you back your dev time and help you get faster build times with a one line code change. Learn more at depot.dev. Get started with a seven day free trial. No credit card required. Again, depot.dev. Well, today is a very good day, because we are Kaizening, and Gerhard is here, and Adam's here, and I am here. Hey guys. Hey, it's good to be here. We almost weren't all here.
Starting point is 00:03:04 But we're all here happen rain and thunder and lightning Internet outages. Mm-hmm. So what happened to my internet? Yeah to your internet. I don't know just went down and Stayed down about 1230 on Monday, maybe one 130 And I called him and told my internet was down and then they said about 12.30 on Monday, maybe 1.30. And I called them and told them my internet was down and then they said, we'll fix it. And then they didn't fix it. And then they did fix it.
Starting point is 00:03:35 But it's a little bit too late for us. We actually were gonna record at 9 a.m. I think on Tuesday. And it came back up around 11 a.m. on Tuesday. So not even a 24 hour outage yet. Still way too long. Way too long for my liking. And it was just my house. I don't know what happened. They said they had to rebuild the modem. Which was apparently a remote rebuild. I think they just flashed it with a new something or another. You only have one. Let me guess. You only have one internet. I think they just flashed it with a new something or another.
Starting point is 00:04:05 You only have one, let me guess. You only have one internet. This is correct. Well, I do have my phone. I said, I told you guys I could just tether to my phone and you know, play hot and loose? No, not the same. Loose and fast.
Starting point is 00:04:20 Fast and loose, thank you. I was thinking fly close to the sun and I was thinking fast and loose and I said hot and loose. But Gerhard, you said you had like a multimedia presentation. I'm gonna have to like have really good internet and so we just called it off. So now we're here, the internet's back.
Starting point is 00:04:37 The rain is over. I'm assuming it's done raining there, Adam. You're all clear? Yeah, I think so. Oh, no, yes, we're good. Okay. And Gerhard're good. Okay. And Gerhard brought us some goodies. You got a story to tell?
Starting point is 00:04:49 I do, yes. Tell us what you have to say. I was thinking about this for some time, actually. Wow. And I was thinking is, when we get close to launching the pipe dream, to launching Pipe Lee, how do I want to do this?
Starting point is 00:05:03 And that's the story. The story is that you were thinking about it, or that you've thought about it and you're going to tell us more? Well, the story is that I will tell you more. A lot of stuff has happened. I decided to double down on the pipe dream on Pipe Lee. I decided to like all my time went there. And all that means is that we have something. We have something. We have something.
Starting point is 00:05:25 Is this the launch story? I think it's good. Let's just like, you know, let me set the expectations at the right level. Okay. And let's see if Adam approves of the toy that we want to wrap for him. Okay. That's the bar. That is the bar.
Starting point is 00:05:40 Because Adam enjoyed his toy that is being built in the factory. And let's see what he thinks of it. I love toys. Don't we all? Don't we all? Well and let's see what he thinks of it. I love toys. Don't we all. Don't we all. Let's get right to it. Show us the toy Gerhard. All right, so I'm going to share my entire screen.
Starting point is 00:05:52 That's why I want to mention that this is a presentation style. It is a presentation style. I could even share the slides. Now they're about 80 megabytes big. Wow. Because there are some recordings as well, right? If something will not work live, that's okay. I already did it. It's The screen behind me will be out of sync with the right will be doing but but but the thing itself
Starting point is 00:06:17 Has been captured so that I'm can't tell a good story. All right, so the thing which I would like us to do now is click around a few versions of the changelog site and talk about how responsive the different versions of the changelog site feels to us. And I think this is why Jared's internet was important so that he experienced it as close as he would normally know tethering nothing like that. So we will start with the origin.
Starting point is 00:06:50 The origin as our listeners will know runs on fly and we always capture a date when this was created. This particular origin was created in January of 2024, 12th of January, which means that the URL to go to the origin, by the way, most users will not do that. This is for the CDN to do that, right? It is changelog-2024-01-12.fly.dev. And I want you to be logged out. That is important. Or just use simply a private window, whichever is whichever is easier. We don't want any cookies. We don't want you to be logged in because the experience will differ if you are. And this is going to be our baseline.
Starting point is 00:07:36 So it's important that the reading is accurate. And this is as slow as it gets. This is if we were to hit the website. This is running in Ashburn, Virginia. And it basically comes down to your network latency to Ashburn. Okay. So let's open up the website. I'm going to open it up as well. And I'm going to click around and see how it feels.
Starting point is 00:08:01 How responsive does it feel? That's what we're aiming for. Remember to be signed out. That's the important part. I'm clicking around, I'm signed out, I'm in a private window. Perfect. So how does it feel in terms of responsiveness, the website? Average. Average. What about when you click on news? Do you see any delays, anything like that? I would say there's a slight delay. It doesn't feel a snappy atom feels like what about you I can see it rendering do images play a game into this like oh yes okay because that's what I'm noticing most it's laggy is
Starting point is 00:08:35 like the the viewport kind of gets painted and then it moves around because the images catch up and yeah it doesn't feel it feels like it's like I'm on tethered internet basically. Right. There you go. See, imagine if Jared had a tethered internet, how slow that would feel. Double tether. Cool. Yeah, exactly. Something like that. Yeah. So now the interesting thing is that even though the images do serve from
Starting point is 00:09:00 the CDN, everything else around them, the JavaScript, the CSS, all of that. I don't think it does. Let me just double check that. Oh, it should. It should. Yes, actually does. So it is just a request to the website. You're right. Actually, yes. Everything, all the static assets are served from the CDN. It's just a request to the website, which makes it feel slow. And I don't think we're biased. I don't think we are imagining this. I have been looking at this for quite a while, and it all comes down to that initial request.
Starting point is 00:09:35 Anything that hits the website for me takes about 360 milliseconds, and this is constant. So I'm showing here the httpstat output, a tool we talked about. We may drop a link in the show notes. And that's what it comes down to, right? Like, the origin itself is slow; the further away you are from it, the slower it will get. But for you in the US, I would have expected this to be snappier. So interesting that it isn't. It's borderline snappy. I can feel it a little bit, but it's not bad. Right. And I think that is because you have the changelog.com experience. So now, if you go to changelog.com and do exactly the same stuff that you did before, changelog.com, and you click around, how does it feel now? Instant. Yeah, I mean, it's snappy. Versions of instant. Almost instant. Like some pages feel... I think the news is the one where you notice that the paint just takes a little bit longer, right?
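For the curious, httpstat-style output is derived from curl's cumulative timing variables. A rough sketch of how the per-phase breakdown falls out of them; the variable names are curl's real `--write-out` timers, but the values here are invented, loosely matching the ~360 ms total mentioned above:

```python
# Sketch of an httpstat-style phase breakdown. The keys are curl's real
# --write-out timing variables (cumulative seconds since the request began);
# the values are made up for illustration, not a real measurement.
timings = {
    "time_namelookup": 0.020,     # DNS done
    "time_connect": 0.060,        # TCP handshake done
    "time_appconnect": 0.140,     # TLS handshake done
    "time_starttransfer": 0.340,  # first byte received
    "time_total": 0.360,          # response complete
}

# Each phase is the difference between consecutive cumulative timestamps.
phases_ms = {
    "dns": timings["time_namelookup"] * 1000,
    "tcp": (timings["time_connect"] - timings["time_namelookup"]) * 1000,
    "tls": (timings["time_appconnect"] - timings["time_connect"]) * 1000,
    "server": (timings["time_starttransfer"] - timings["time_appconnect"]) * 1000,
    "transfer": (timings["time_total"] - timings["time_starttransfer"]) * 1000,
}
total_ms = round(timings["time_total"] * 1000)
print(total_ms)  # 360
```

The "server" phase is the one that dominates when you are far from a single origin in Ashburn, which is exactly the latency a CDN in front of the app is meant to hide.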
Starting point is 00:10:28 It's not instant, doesn't load instantly, but it's significantly better than if you were to go to the origin. Agreed. And this will be consistent for everyone. I think that is the advantage of changelog.com actually running through the CDN. All the requests run through the CDN, even the ones to the website. So the thing is that if it's not in the cache, if it's a cache miss, for me, it loads the homepage loads in about 300 milliseconds, which is slightly better than when I go to the origin, but it's not great. Now, obviously, if this is a cache hit, in my case, it loads in under 20
Starting point is 00:11:08 milliseconds or around 20 milliseconds. And 15 times quicker is a noticeable difference. Sure. So as soon as these things get cached, it's really, really fast. So we would expect this from a CDN all the time, why they should consistently behave like this. And by the way, title proposal, 15x quicker, maybe. We'll see. We'll see, we'll see, right? We're getting there. Note taken.
Starting point is 00:11:35 So the problem is that with a current CDN, 75% of homepage requests are cash misses. So 75% of all requests. Which is to me, insane. It is insane, right? That sounds pretty bad. So some would say, present company included. Okay.
Starting point is 00:11:58 It defeats the purpose of a CDN, right? I would agree. Yeah, but there's more. But there's more. There's more. Tell us, tell us. So here, this is a question for you, both Adam and Jared. What do you think is the percentage of all GET application requests that our cache hits? How many of all the requests that go to the app, to the origin, do you think are being
Starting point is 00:12:28 searched from the cache? And the options are 15%, 20%, 25%, or 30%. What is your guess? Well, I buzzed in thinking it was a game show. My bad. I buzzed myself in even. Go ahead. You get to go first.
Starting point is 00:12:43 20%, please. 20%. Okay, Jared. I buzzed myself in even. Go ahead, you get to go first. 20% please. 20%, okay, Jared. So you just told us that 75% are misses. Yep. And that's every type of request. Now you're asking- No, no, no, sorry, 70, just the homepage.
Starting point is 00:12:58 Oh, the homepage. Just the homepage is 75% miss, which means the homepage is a 25% hit. Now I'm asking about all the requests to the application origin. Remember, we have a few origins. Okay. Just going to be application or going with the highest possible choice. 30%.
Starting point is 00:13:16 So yeah, 17.93%. So yes, Adam is closer. 15% would have been accurate. 20. I think 20 is more accurate because eight 17.93 is closer to 20. Sure. So yeah, I think you were too optimistic. Because if 30% were cash hits, that would be good.
Starting point is 00:13:34 It's actually 17%. 18%. 18% are cash hits. Everything else is a miss. And the window is the last seven days. The last seven days. The last seven days. So in the last seven days, only 18% of requests were searched from a cache. How does this make any sense?
Starting point is 00:13:53 Right. So October 2023. This is what we started right on this journey. When this was issue 486 in our repo, what is the problem? Well, after October 8th, 2023, CDN cache miss is increased by 7x. It just happened. We looked into it, we tried to understand it, and we could not. And it's been ever since then? Or it's like this systematic problem ever since then? Well, it has been low ever since. So the cache hits have been low to the application specifically ever since,
Starting point is 00:14:29 which is why even when you go through the CDN and you think, right, things are snappier and they are to some extent, many requests, they are just cache misses, especially going to the application. So here we are today. It's only been three weeks. So it's only been three weeks. So let me explain what it means. So it depends how you count.
Starting point is 00:14:54 OK? So the thing is that roughly that's how much time I had to spend on this, like about three weeks in total. Got you. Got you. Right? Spread over about three weeks in total. Gotcha. Right, spread over like a long period of time. Right.
Starting point is 00:15:08 So we are just about to unleash our clicks on pipedream.changelog.com. Okay. Bring your mice out and let's do this. Let's unleash. Let's see how this feels. Our clicks. Pipedream. And by the way, anyone can reproduce the same experiment. Let's see how this feels. Our clicks. Pipe dream.
Starting point is 00:15:25 And by the way, anyone can reproduce the same experiment. Remember to be logged out, that part is important. Or a private window, because if you have any cookies, it will bypass the CDN. That's the rule. When should I do this? Right now? Right now.
Starting point is 00:15:40 Yeah, right now. Just click around and tell me how it feels. I mean, I've tested it myself, but I don't have your experience. So how does it behave on your side of the world? So one thing in particular that I noticed between the two of them right away, because I clicked into news,
Starting point is 00:15:56 and it seems like there's this paint delay on the right-hand side. So we split that viewport news, side is subscribe right side is the newsletter Very very cool, but that right side newsletter side the the background color seems to like delay paint I'm not sure if that's it's happening here as well as the past an iframe. So that's a secondary request. I got you Okay, so I'm not gonna judge that then I think that's important He's like the whole thing like how does one compare to the other? Where's the iframing from though?
Starting point is 00:16:28 Iframing from? From the same site. The same site, yeah. It should be, yeah. So for me, again, for me, when I click on news, I can see that the iframing, right, there's a little bit of a delay. But when it paints, for me, it paints instantly on Pipetree.
Starting point is 00:16:44 On change.com, there's like a little delay between the whole thing. That's at least like how I experience it. But when it paints for me it paints instantly on pipe dream on changeable calm There's like a little delay between the whole thing that that's at least like how I experience it Now anyone can reproduce this and we wonder or I wonder how do you perceive the two? Wherever you are in the world if you click around these are live links by the way Changeable calm and pipe dream the changeable calm they should both behave, sorry, they will both have the same content. And what I'm wondering is, how do you perceive them? Is there a significant difference? Is it the same? Right. What you notice? What about you, Jared? Do you notice anything different? My experience, specifically on the episode page, which I think is a good
Starting point is 00:17:24 one because it has a lot of, I just call it first party content, not even CDN content. Cause I do, I mean the CDN is a CDN, right? So I do see the images lazy loading in slightly, just like they would on the previous one. However, the first party content, for instance, I'm on making DN simple, podcast 637,
Starting point is 00:17:44 which has all the podcast information, all the chapters, and then the entire transcript, which is lengthy, and it loaded in very quickly. Obviously my browser's not rendering that text that's off the screen, but it has to at least download it in the HTML. So that was very fast. Other than that, it feels similar to changeall.com, has to at least download it in the HTML. So that was very fast.
Starting point is 00:18:05 Other than that, it feels similar to change.com. And it's the images that I do notice load in, because they're lazy loaded. They load in a split second later. Other than that, but yeah, I think the episode page is a good test and it's significantly faster. Okay, so pipedream.change.com,
Starting point is 00:18:21 if you look at the requests, to see the network requests in your developer tools, you will see that all the static assets they load from CDN2 dot change dot com, which is the pipe dream too. So everything that we serve, all the origins, whether it's the assets, whether it's the feeds or the website, it all goes through the pipe dream.
Starting point is 00:18:46 And the application was changed, that's what we were talking about earlier, we may unpack that. The change is to every public URL that we serve, now we have an alternative, which is all running through the pipe dream. I'm using an HTTP stat here and I'm going to https pipe dream dot changelog dot com. If it's a cache hit it loads for me in 25 milliseconds which is slower than changel. However, if it's stale, it should also return within 25 milliseconds, which is what's happening here. Our content should always be served
Starting point is 00:19:39 from the CDN, regardless if it's fresh or not. And in this case, what we see, if it's already been served once, it will stay in the cache until there's pressure on the cache and we control when that is. We just basically size the cache accordingly. We give it more memory and then more objects we'll store will remain in memory. And what we want to do is to always serve content from the CDN whether it's stale or not. So this was a cache hit,
Starting point is 00:20:10 right? You can see there's a cache status header, it was served from the edge, we see what what region it was served from. By the way, if you were to do a curl request, you'd see the headers, you would see like all this information, even in your browser developer tools, open any endpoint and you get this information for every single response. We see what was the origin that the request, the CDN had to go through to fulfill the request. The TTL, that is the important, and there's the important flag, which is, sorry, the important value, which is how long was that object stored in the cache. In this case,
Starting point is 00:20:47 it's minus four. It's a negative number, which means that it's considered stale. The default value, the default ETL is set to 60 seconds. Anything that was requested within 60 seconds is considered fresh. But then we have this other period, this other value which is grace, which says for for 24 hours continue serving this object from the CDN, but try and fetch it from the background. And also we see that this has been served from the CDN 26 times already. As I read these headers, these are important, every single request now has them. We can see which was a region, which is an edge region. We don't have an origin yet, but we should by the way. The closer you are to the origin, it just says the origin.
Starting point is 00:21:33 Shield, all that we can configure now. What a Shield origin does basically, the CDN instances, which aren't close to the origin, they will go to the CDN instance which is closest to the origin. And that's so that we place as little load on the origin as possible. I don't think that will be a problem for us, but we can do it if you want to. And the question is, after all these years, are we holding fly.io right? What does that mean? Well, changelog the application has only
Starting point is 00:22:12 been deployed in two regions, right? Actually one region, and we have two instances. But we always wanted to have it spread across the world. The problem with that is how do we connect to the database? Then you're introducing latency of the database layer. But now these CDN instances, they can be spread around the world. So that means that finally we're doing this right. Right. We just put it in front of our app instead of making our app be distributed. Now we're distributing in front of it. I think so. Yeah. So, shall we see where these instances are running?
Starting point is 00:22:49 Yeah, we'll see it, man. I'm curious. Are we curious about anything else before we move on to that? I'm curious about the rollout of this thing because I've noticed a few things this week and I'm wondering if maybe things are pointing at different directions and if that, uh, if that explains some stuff that I've been seeing, but we can maybe hold that for later. I think we can talk about that now. Just, just like, so that's where we go. We go through this. I, we never had this situation before, by the way, where we have two application instances completely separate that are pointing to the same database, right? So the data is always the same,
Starting point is 00:23:24 but one is going to become the new production and it's configured in a certain way with a new CDN and the existing application, the one that's behind changelog.com is still consumed by our production CDN. I mean, we have two CDNs, that's a situation. Right. And we can't change the production application
Starting point is 00:23:45 because if we do that, then we have rolled out the new CDN and we don't know whether we are ready yet. I think that's what we need to determine today. What else is left? How do things look so far? And just like assess the readiness of the new CDN. Yeah. Of the pipe dream.
Starting point is 00:24:05 So what things have you noticed, Jared, that are off? So I shipped ChangeLog News Monday afternoon and that particular episode has dramatically lower downloads. So low, in fact, that it has to be a bug somewhere in the system that's not real. Like it's not a real number or and I'm wondering if maybe a bunch of podcast apps got pointed to the new CDN and we're not capturing those logs which is how we get this stats.
Starting point is 00:24:35 So that that was the first thing I was like there's no way that this is actually only been downloaded 700 times or whatever it was. Yeah. In the first day. That was the first thing I noticed there. And you're nodding along, so you're thinking probably that's the case. Yeah, I think so. I think that that's what happened.
Starting point is 00:24:49 If, so depending on which instance picked up the job, right, like this is all like background jobs, it must have pushed a different URL than the live one. So then all those podcasting platforms, like how would you call them? All the podcasting? The clients, I mean. Okay, so all the podcasting platforms, like how would you call them? All the podcasting? The clients. I mean, okay, so all the podcasting clients, some of them, maybe all of them may have picked,
Starting point is 00:25:12 but I think if it would have been all of them, we would have seen zero downloads. Yeah, it wasn't all of them. It was just some of them. Maybe eventually the other app caught up and started doing things because we sent out a bunch of notifications, you of notifications in the background. Not because we have multiple instances. And I think this must be a job queue, right? Whichever instance picks up the job,
Starting point is 00:25:34 basically it puts its own URL and then ships it to the actual sub-eins that we are in production without wanting. Damn. Yeah. OK. So I mean, assuming that all those clients got their podcast without wanting. Damn! Yeah. Okay. I know that. So, I mean, assuming that all those clients got their podcast episode, then it works. But we have no way of knowing. So, if our listener here didn't get Monday's news episode for some reason,
Starting point is 00:25:55 let us know. Oh, no. They did. Well, they might have. I mean, the URL is correct, but they are going to the new application instance, which we're not tracking which goes to the new Which has the CDM same data. Yeah same data. Just different applications. The data will be the same. Okay Let me tell you the other thing I've noticed. Okay, go on. So that's the one let's debug live debugging Love you already know about which is that and this is probably the exact same issue Yeah, is that when we posted our auto posts to,
Starting point is 00:26:26 I think Slack in this case, posted the app instance URL, not the tangell.com URL. It might have been both. Actually it was both. Yeah, it was both. And so there was a URL mismatch, which I think is the exact same issue. And then the third one is that I subscribe
Starting point is 00:26:46 to all of our feeds, because I want to make sure they all work. And so whenever we ship an episode, I get like five versions, you know, just padding our stats, getting five downloads for the price I want. And specifically the slash interviews. So yesterday's show with Nathan Sobo,
Starting point is 00:27:04 two days back as far as we shipped this, but yesterday when we record, it went out and I downloaded on the changelog feed and I downloaded on the plus plus feed and I didn't download it on my interviews only feed because you can just get the interviews if you want. And that feed did not have that episode until this morning when I logged in
Starting point is 00:27:23 and said, refresh the feed. And I forced it to refresh that feed and then I got it. And so there's, and again, that's probably, those are background jobs. So somehow that did not get refreshed. So that's the third thing. Okay. The fourth one.
Starting point is 00:27:38 Okay. There's four? Yesterday, I disabled Slack notifications entirely. And this is our last step to cut entirely over to Zulip. And I have a blog post which is going out announcing that we're no longer on Slack. Don't go there. However, after Adam shipped that episode,
Starting point is 00:27:58 it posted the new notification into Slack, even though that code doesn't exist anymore. And I deployed it. And so I'm guessing it still exists on your disk. Your experimental one is not keeping up with code changes. Yeah. Okay. That's correct. So all my bugs are related to this very exciting
Starting point is 00:28:14 new deployment that I didn't know about. We broke it. I don't think we kisanted. I think we broke it. I think so. Yeah, yeah, yeah. I think we broke it. So those are the four things I've noticed.
Starting point is 00:28:23 No, sorry. I broke it. Let me take responsibility for this one. Yeah, that's much more fair. it. So those are the four things I've done. No, sorry. I broke it. Let me take responsibility for this. Yeah, that's much more fair. I had nothing to do with that. Well, friends, I'm here with Terrence Lee talking about what's coming for the next generation of Heroku. They're calling this next gen FUR. Terrence, one of the biggest moves for FUR in this next generation of Heroku. It's being built on open standards and cloud native. What can you share about this journey? If you look at the last half a decade or so, like there's been a lot that's changed in the industry. It's being built on open standards and cloud native. What can you share about this journey?
Starting point is 00:28:45 If you look at the last half a decade or so, like there's been a lot that has changed in the industry. A lot of the 12 factorisms that have been popularized and are well accepted even outside the Ruby community are things that are think table stakes for building modern applications, right? And so being able to take all those things from kind of 10, 14 years ago, being able
Starting point is 00:29:05 to revisit and be like, okay, we helped popularize a lot of these things. We now don't need to be our own island of this stuff. And it's just better to be part of the broader ecosystem. Like you said, since Heroku's existence, there's been people who've been trying to rebuild Heroku. I feel like there's a good Kelsey quote, when are we going to stop trying to rebuild Heroku? It's like people keep trying to like build their own version of Heroku internally at their own company, let alone the public offerings out there. I feel like Heroku's been the gold standard.
Starting point is 00:29:32 Yeah, I think it's the gold standard because there's a thing that Heroku's hit, this piece of magic around developer experience, but giving you enough flexibility and power to do what you need to do. Okay, so part of Fur and this next generation of Roku is adding support for.NET. What can you share about that? Why.NET and why now? I think if you look at.NET over the last decade, it's changed a lot..NET is known for being this Windows-only platform. You have WinForms, use it to build Windows stuff, double-IS, platform, you have WinForms, use it to build Windows stuff, double-I-S, and it's moved well beyond that over the last decade. You can build.NET on Linux, on Mac, there's this whole cross-platform open source ecosystem and it's become this juggernaut of an ecosystem around it and we've gotten this
Starting point is 00:30:19 ask to support.NET for a long time and it isn't a new ask and regardless of our support of it, like people have been running.net on Heroku in production today. There's been a mono build pack since the early days when you couldn't run.net on Linux and now with.net core, the fact that it's cross platform, there's.net core build pack that people are using to run their apps on Heroku.
Starting point is 00:30:39 The kind of shift now is to take it from that to a first class citizen. And so what that means for Heroku is we have this languages team, we're now staffing someone to basically live, breathe, and eat being a.NET person, right? Someone from the community that we've plucked to be this person to provide that day zero support for the language and runtimes that you expect in, like we have for all of our languages, right? To answer your support and deal with all those things when you open support tickets on Heroku and kind of all the documentation that you expect for having quality language support in the platform. In addition to that, one of the things that it means to be first class is that
Starting point is 00:31:15 when we are building out new features and things, it is now one of the languages, as part of this ecosystem, that we're going to test and make sure runs smoothly, right? So you can get this kind of end-to-end experience. You can go to Dev Center, there's a .NET icon to find all the .NET documentation, take your app, create a new Heroku app, run git push heroku main, and you're off to the races. So with the coming release of Fir and this next generation of Heroku,
Starting point is 00:31:39 .NET is officially a first-class language on the platform, dedicated support, dedicated documentation, all the things. If you haven't yet, go to heroku.com slash changelog podcast and get excited about what's to come for Heroku. Once again, heroku.com slash changelog podcast. Okay, so let's talk through this in terms of what a potential fix would look like. We have a new application instance which behaves as production for all purposes, right? Like the content is exactly as production, it connects to the same database instance, it has all the same data.
Starting point is 00:32:24 What isn't happening is the code updates aren't going out automatically. That has not been wired because my assumption was, I will only deploy this one instance. I'm going to change a couple of properties so it has the new CDN configured. And I'll see how it behaves the whole stack in isolation. What happened, obviously, is the new instance is consuming the same jobs,
Starting point is 00:32:46 the same background jobs as the existing production. So very helpfully it has sent the new links, which are all temporary, especially like the application links, the ones that you've seen in Zulip and a couple of other places, which are just for the application origin, and they are only meant to be there for the CDN. Everything should go through the CDN, but the CDN hasn't been configured yet through everything, because that's like where the test comes in.
Starting point is 00:33:13 How does the application behave? So some links need to be application links. How does the CDN behave? So on and so forth. So in this case, we need to somehow fix those links, the ones that went out and are incorrect. I'm not sure whether we know what they are, and if not, then we need to basically make this experimental application instance not consume jobs, not process any background jobs. We just need to disable
Starting point is 00:33:41 Oban in that one. Perfect. And then it would never get invoked unless you manually go to the website, right? Yeah. And then we want to make sure that nothing crawls it. Yes. That's a good one. Because then they'll start sending traffic to its endpoints instead of our main website. So let's do that, tout de suite. So I think, yeah, I think we're finished with the recording.
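For anyone following along at home, the shape of that first mitigation might look something like this. This is a hypothetical sketch only: the app name and the OBAN_QUEUES variable are assumptions, since the actual switch depends on how the changelog app reads its configuration.

```shell
# Hypothetical sketch -- not the actual commands used on the show.
# Assumption: the app exposes an env var that tells Oban to run no queues.
APP=changelog-2025-05-05   # placeholder name for the experimental instance

# Stop the instance from consuming the shared background-job queues.
# (fly secrets changes trigger a redeploy, so the setting takes effect.)
fly secrets set OBAN_QUEUES=none --app "$APP"
```

The second knob, keeping crawlers away from the instance's endpoints, would be a disallow-all robots.txt on that deployment.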
Starting point is 00:34:04 Let's go and do it. No, no, we haven't. Don't worry. This is still going. Okay. So yeah. But that- Those two changes I think will mitigate the current issues.
Starting point is 00:34:14 Yes. Yeah. Sounds about right. Okay. So that makes me happy as long as we get those rolled out here. We figured it out. We figured what the issues are.
Starting point is 00:34:24 All right. So what do I want to do now? I think I would like to see how many Pipely instances we're running. All over the world. Okay. And for this, I'm going to use a new terminal utility which I found and I like. I was like, yes, this is exactly what I was missing. It's called fly-radar. Fly-radar. This is what it looks like. It's all ncurses-based, it's all happening in my terminal. It's beautiful.
Starting point is 00:34:57 Oh, fly-radar 0 to 1. I can see all the changelog applications. The one we're going to look at is the CDN one. So by the way, the two applications, do you see this one? The changelog-2025-05-05 is a new application instance that was deployed three days ago, while the one above, the 2024, that is the current production. And that was updated one hour ago.
Starting point is 00:35:21 So the code will differ. And the Slack notifications, if this application instance picks up a job, it will do whatever it's configured to do, which will be the wrong thing. Another thing we can do briefly before we figure that out is we could just redeploy that one, so at least it's current. Yes.
Starting point is 00:35:39 And it won't do any Slack notifications. Because I definitely don't want to say we're no longer doing Slack notifications, and have another one come in, and have egg on my face. As soon as we stop recording, I'll go and do that. Not a problem. Okay. So let's have a look at the cdn-2025-02-25 one, which is the
Starting point is 00:35:55 instance when it was deployed and it has had a few updates. What do we see? We see 10 instances. You see the region and you see it's been updated one day ago. I see Sydney. Is that right? I see Chicago. Yes.
Starting point is 00:36:11 LHR. Is that the Virginia one? Heathrow. Oh, London Heathrow. Of course. Yes. These are the airports, by the way. JNB.
Starting point is 00:36:18 That's... Johannesburg. Johannesburg. Very good. San Jose. Yes. Correct very good. San Jose? Yes, correct. Okay, IAD, that one's, that's the Virginia one. That's the one.
Starting point is 00:36:31 Okay. That's the one. SIN. Adam, you wanna guess these? SIN? I know DFW. What is DFW? Well, that's where you live. Dallas, Fort Worth, and I think France, the FRA is probably France, is my assumption.
Starting point is 00:36:43 France, what's SIN? No, that's actually Frankfurt. Oh, okay Germany Singapore Singapore Yeah, man, I keep seeing this. Okay, and SCL. I haven't I don't know what SCL is. Come on I keep seeing better. Okay. I don't know. All right, let's do fly CTL Uh platform, I think, regions, I think, and the regions list, there we go. There we see what they are.
Starting point is 00:37:07 SCL, Santiago, Chile. Yeah, that's the one, SCL, yeah, that's the one, Santiago. That was it, SCL. That's how we see what the regions are. Cool. That's cool, man. Yeah, go over there in Australia or New Zealand or something.
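As an aside for listeners, those codes are IATA airport codes, which is how Fly names its regions. A small sketch of the lookup game just played, covering only the regions mentioned in this episode:

```shell
# Map the Fly.io region codes mentioned above to their cities.
# Fly names regions after nearby airports; this table covers only
# the ones called out in this episode.
region_city() {
  case "$1" in
    ord) echo "Chicago" ;;
    lhr) echo "London Heathrow" ;;
    jnb) echo "Johannesburg" ;;
    sjc) echo "San Jose" ;;
    iad) echo "Northern Virginia (Dulles)" ;;
    sin) echo "Singapore" ;;
    dfw) echo "Dallas-Fort Worth" ;;
    fra) echo "Frankfurt" ;;
    scl) echo "Santiago" ;;
    syd) echo "Sydney" ;;
    *)   echo "unknown" ;;
  esac
}

region_city lhr   # -> London Heathrow
region_city scl   # -> Santiago
```

For the authoritative list, `flyctl platform regions` prints every region code with its location, which is exactly what gets run a moment later.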
Starting point is 00:37:20 Well, we do have Sydney. So, yeah. Oh, that's true. We can add more. I mean, we had 10, but we can add more. No, no, no, Sydney covered that. I just forgot about Sydney. Yeah, so you've seen all the machines. And in terms of other uses, like it has logs, alpha logs. So this is something that's really cool. So these are the logs for the new.
Starting point is 00:37:38 Let's see what logs we have, what requests we have flowing to the new changelog instance. This is a cool TUI. Congrats to the fly-radar author. This is cool. It reminds me of k9s. Yeah, exactly. That's exactly it. Oh, look, we have some requests. Robots got some requests and the homepage got some requests. And this is IAD. So we can see what instances were requested. So now let's go to the... I'm not liking these requests, Gerhard.
Starting point is 00:38:08 How can we get requests? Yeah, well, we will be getting, because we have the CDN, we have monitors set up, we have a bunch of things. Now these are the requests going to the existing. You can see there's a lot more traffic going to the existing application. If you ask me, there's too much traffic. The CDN is not doing its job.
Starting point is 00:38:26 That's what we're trying to fix. There's way too many requests hitting it. And you can see that the regions, right? We have two regions. EWR. Adam, what does EWR stand for? Do you know? Ooh, why?
Starting point is 00:38:41 Write-on? Write-on. OK, yeah, perfect. That's exactly what it is. That's right on, man. Yeah. So so so we can focus only on like like specific instances to see the log. So I think this is really cool. So we've seen this. Let's move on. For Cancly fly radar, for Cancly.
Starting point is 00:39:03 Whoa. That's quite the day for Cankan Klee. Yeah, so he built this. I think it's a really cool tool. You can go and check it out on GitHub. It's all written in Rust, so it's really, really fast. It's a terminal UI and it was inspired by k9s. Oh, look at that. Yeah, yeah, that's it. So issue five, March 22nd. That's when I just stumbled across it. So I captured it, and go and check it out. It was really cool. When I saw fly-radar, I thought, wow, this is exactly what I
Starting point is 00:39:32 wanted anyway, anyway, back to the pipe tree. So which backend do you think serves the most requested URL? Now the question we have three backends or three origins. Right. You have the application origin, the question, we have three backends or three origins. Right. You have the application origin, the one that we've been focusing on. There's a feeds backend and the assets backend.
Starting point is 00:39:51 So in the last seven days, which backend served the most requested URL? Like the one top URL. The one top URL, exactly. Which one serves that particular one? That's the question. Yes. Okay, there's only three possible answers. Yeah, I'm gonna go with feeds. Same, feeds. Feeds. So apparently we're serving this podcast original image about 10,000 times per day
Starting point is 00:40:19 Or once every 10 seconds. That's the assets endpoint. I had to check what it was. Yeah, it is assets. Yeah, it's our album art. The changelog. The changelog. So the answer was assets, actually. I guess that makes some sense, because everyone has to download that into their podcast app all the time. Yeah. Cache it, sucker.
Starting point is 00:40:35 Come on. I know, right? Do a better thing. Yeah, do a better job with cashing it. That would be a good thing. So, but honestly, it was the second one. I guess feed. So we were almost correct if it wasn't for that one image. I'm wondering, how does the new CDN behave for our most
Starting point is 00:40:55 requested URL, which is not a static asset? So how does it behave for the podcast feed? I'm going to run three commands, actually a few more than three. The recording has been done, so if anything doesn't work as it should, we'll switch back to the recording; that's going to be a backup. All right, so let's go back into the terminal and we'll experience this firsthand, just to see what it feels like. So I'm in the Pipely repository and the first command which I'm going to run is just debug. And by the way, anyone should be able to clone the repository and do exactly what I do.
Starting point is 00:41:29 What's happening here is, behind the scenes, it is building everything that we need for the CDN, including the debug tooling, and it will run it locally. And the TUI that you see here, because it is a TUI, it has a couple of shortcuts, is Dagger. So all this is wrapped into Dagger. So I have a terminal opened in Pipely, all running locally. All right. So the first thing which I'm going to do is I'm going to benchmark the current CDN, changelog.com.
Starting point is 00:42:08 So I'll do just bench cdn. All this is wired together, sending a thousand requests to the feed endpoint. And this is what we see. So the current CDN serves about 300 requests per second. And it's the size that is the interesting one. The size is about 220 megabytes per second. So I think that the CDN is faster, but the bottleneck here is my 2-gigabit home connection.
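As a back-of-the-envelope check on those numbers (the figures are as quoted on the show, so treat them as approximate): 300 requests per second at 220 MB/s implies a roughly 750 KB feed response, and 220 MB/s is 1760 Mbit/s, right up against a 2-gigabit link.

```shell
# Sanity-check the quoted benchmark figures.
req_per_sec=300   # requests per second, as quoted
mb_per_sec=220    # megabytes per second, as quoted

# Implied size of one feed response, in KB
echo "$((mb_per_sec * 1024 / req_per_sec)) KB per response"   # ~750 KB

# Throughput in megabits per second
echo "$((mb_per_sec * 8)) Mbit/s"                             # 1760 Mbit/s
```

Which is why the home connection, not Fastly, is the limit in this first run.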
Starting point is 00:42:40 And this is as much as I can benchmark it. So that's the limit. So if we were to benchmark, using the same connection, CDN 2, this will go to the Pipedream feed. This is how that behaves, and by the way, this is live, real traffic that's happening here. So 177 requests and 132 megabytes per second. So what do you think is happening here?
Starting point is 00:43:07 If you had to guess. Well, my guess would be that it's not as much bandwidth as Fastly has. That is correct. Yes. So I'm looking at fly here, right? And this is the CDN instance. We have the different instances. Do you see here like London Heathrow?
Starting point is 00:43:23 That is the one that lit up in response to me sending it a lot of traffic, and you can even see it here, right? If I do London Heathrow, you can see that's the one that was serving the most bandwidth. And actually what I've hit is the 1.25-gigabit limit of this one instance. And that's just a constraint of the actual instance on Fly.
Starting point is 00:43:43 Like that particular Fly VM, or whatever they're called. That is correct. Yeah, exactly. So if I do flyctl machines list, you'll see that, and let me just do an rg on LHR. You'll see that we have a single instance in Heathrow; we could run more. And that's what we're going to do here, to see if running more instances will increase the bandwidth. So I'm going to do, let's do flyctl scale count 3. And I'm saying we're basically going to run three instances in the Heathrow region. The reason why we don't do this is we'll just add cost. When we are in production, we may need to do this, because some areas may be running hotter than others. So we may need to scale it accordingly. But right now, every single region has one instance only. So let me do machines list. So what I want to see is they are all started and they're all running the health check. There's one. Yep. These are all good. Yep. Everything is nice and healthy. So now let's go back and let's run the same benchmark. And you'll see it live. OK. So it still has the same 1,000 requests to the feed endpoint.
Starting point is 00:44:52 And 180, so just about the same. Not much has changed. It takes a while for everything to warm up and the requests to be spread correctly. We've seen there a blip. So let's see, how does it behave now? OK, so we're 150 megabytes per second. If we run this a few more times so that everything is nice and spread... Requests per second, right? You said megabytes per second. That's requests per second. So this is 171 megabytes per second, which is almost 1.7 gigabits. And for requests, we have 228. So these three instances, that's what we see. And if we run this enough times... when I tested this last time, I was able to get to about two gigabits, but it's not an exact result every single time, based on network conditions, based on a bunch of things, you know, based on where those
Starting point is 00:45:44 instances are placed within the Fly network. But three instances, and even if one added more, I've seen there was this limit, like obviously the two gigabits. Well, you max out eventually, right? Exactly. I max out eventually. I'm still not maxed out currently. And the reason why I know that is because if I bench CDN 2, I can see that that brings me close to that 2 gigabits, right, 220. CDN 1, this is Fastly. CDN 1, yeah, this is changelog.com, this is Fastly,
Starting point is 00:46:11 that's correct, and it's those 300-and-something requests per second. So Fastly is still faster, because we haven't added enough instances in your region in order to get our bandwidth up on Fly to max out Gerhard's personal bandwidth. Exactly. Exactly. So adding instances doesn't really move the needle very much, but it does move it
Starting point is 00:46:32 eventually if you really want it to. Exactly. So this is, this is maybe even a question to the fly team. So when it comes to the instances, if I look at what instances we provisioned, you can see that we are running shared CPU 2x and they get two gigabytes of RAM. The question is, and I think we kind of like touched upon this last time, even the performance instances we don't seem to be getting more bandwidth. There is a point at which an instance doesn't get more traffic.
Starting point is 00:47:02 And depending on maybe the region's capacity, maybe there is some sort of limit that we're hitting. Now, do you remember Bunny? Yeah. Yeah, okay. That's super fast. We can bench Bunny, which is still alive. Bench Bunny? Bench Bunny. We can bench Bunny, and this is how that behaves. bunny.changelog. Bunny doesn't let you, right? Exactly. So it rate limits me. So I can't benchmark Bunny. You think that's because they don't want to be benchmarked, or you think it's because they're just fighting it off? I think it's throttling, yeah. They are throttling. So bunny.changelog.com, and I have been benchmarking them quite a bit in preparation for this. My IP might be blacklisted somewhere on the Bunny side. Yeah. But that's the reality. Cool. You should be able to get some sort of pass, like,
Starting point is 00:47:51 Hey, I'm a developer and I'm testing things, because benchmarking. Of course. Yeah, I think so. I think so. Cool. Okay. So I'm wondering, if I had a 100-gigabit internet connection, and one day, and this is a fact, one day, I will have that internet connection. And Fly did too, right?
Starting point is 00:48:13 Because remember, Fly, I mean, in this case, Fly is the bottleneck. Correct. What could we expect from Pipedream? just up runs the whole of Pipedream locally. Okay, so now you got no network. No network, exactly. It's just like everything is running on the same host
Starting point is 00:48:32 and you can see that this is actually forwarding traffic to the feeds endpoint, to the static endpoint, to even the application origin. This is like all of our features. So it's all here, right? It's all here. It's all here, so let's do bench feed and let's see what we get. Oh, we're getting massive amounts of that's 200,000 requests.
Starting point is 00:48:52 That is 200,000 requests. Yes, it's more. What do you see in data? Can you read that out for us? 85 gigabytes. Yes, it's every 10 seconds. So now it's switched, because we had so many requests; the scale switched from one second to every 10 seconds.
Starting point is 00:49:10 And this is what we see. We are pushing 11,000 requests per second. And we're transferring eight gigabytes, not gigabits, gigabytes per second. So if we had a really fast network, we could saturate close to 100 gigabit. That's insane. So the software works.
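The same back-of-the-envelope math for the local run: 8 GB/s is 64 Gbit/s (hence "close to 100 gigabit"), and 8 GB spread over 11,000 requests is again a roughly 760 KB response, consistent with the earlier benchmark.

```shell
# Sanity-check the local benchmark figures, as quoted on the show.
req_per_sec=11000   # requests per second
gb_per_sec=8        # gigabytes per second

# Throughput in gigabits per second
echo "$((gb_per_sec * 8)) Gbit/s"                                   # 64 Gbit/s

# Implied size of one response, in KB (1 GB = 1024*1024 KB)
echo "$((gb_per_sec * 1024 * 1024 / req_per_sec)) KB per response"  # ~762 KB
```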
Starting point is 00:49:33 And that's just a credit to Varnish, right? Pretty much. It really, really works. When you hold it right. When you hold it right. And you don't have a network. Exactly. Well, you need to have a hundred-gigabit connection. So that's, I think, the hard part. And Fly, or whatever provider we run on, needs to have more network capacity, because right now my internet is faster than what the Fly instance does. Yeah. And I can't saturate it. And we've seen that I can saturate Fastly. Cool. And I think the interesting thing, which I haven't shown you, and I could, because it's behind me, but anyway, it's not very visible: what I would like to show is basically I'm hitting the limit of my CPU, right? Where I'm running this benchmark is a 16-core machine, and I'm running both Varnish and the benchmark client, oha, in this case. And between the two of them, they're saturating 16 cores. And that's what we see here. So the bottleneck really is the CPU. It could go faster, because again, networking is all in the kernel. So Pipedream and Pipely is an iceberg, and we explored just the tip of it. So most of it is underwater. Are you talking about lines of code? No, I'm talking about many things, but let's go. I'm wondering how many lines my 20 lines have ballooned to at this point. It's there. It's there. That thing is coming up. So yeah, stay tuned.
Starting point is 00:51:08 Stay tuned. So VTC stands for Varnish Test Case. And Pontus Algren, one of our Kaizen listeners, mentioned this in a Zulip message back in December 2024. So he said, regarding the testing of VCL, did you consider the built-in test tool VTC? So you were doing something else previously. I can't remember what you were doing. We are still doing that, but I'm also doing this.
Starting point is 00:51:39 So I'm just going to play the recording. Okay. It's just easier. So just test vtc is going to run, in three seconds, all the tests for the different Varnish configuration that we have for Pipely. Cool. This is really, really fast. This is the equivalent of your unit tests, if you wish. Weren't you running the tests against
Starting point is 00:52:01 like production instances last time? I was. And I still am. Now you have to do that. Are you still are? I'm still there. Yes. Why wouldn't you replace it? Hang on.
Starting point is 00:52:09 Just give it a minute. So we're getting there. So this is what the VTC looks like. And basically, you can control it at a very low level in terms of the requests, the responses, the logic branching. So think of it, when you're trying to come up with a final VCL, right? You make little experiments to see how the different pieces of configuration would work.
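For a flavor of what one of those little experiments looks like, here is a minimal, illustrative VTC sketch; the stub backend, the URL, and the hit check via doubled X-Varnish transaction IDs are assumptions for illustration, not one of Pipely's actual tests:

```
varnishtest "second request for the feed is served from cache"

# Stub origin: one cacheable response.
server s1 {
    rxreq
    txresp -hdr "Cache-Control: public, max-age=60" -body "<rss/>"
} -start

# Boot a Varnish with a trivial VCL pointing at that stub.
varnish v1 -vcl+backend { } -start

client c1 {
    txreq -url "/feed"
    rxresp
    expect resp.status == 200

    # On a cache hit, X-Varnish carries two transaction IDs.
    txreq -url "/feed"
    rxresp
    expect resp.http.X-Varnish ~ "[0-9]+ [0-9]+"
} -run
```

A file like this runs in milliseconds via varnishtest, which is what makes the suite finish in seconds.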
Starting point is 00:52:33 And that's what VTC enables you to do. You can write a subset of your VCL. You can configure clients, you can configure servers, and you can make them do things in an isolated way, in a very quick way. You can basically model what the thing is going to look like, and you're going to check if what you thought would happen does happen. And that's what makes it really, really fast. And it's all built into the language. So it's there, and we have it,
Starting point is 00:52:59 and it gives me a nice tool to figure out what is the minimal set of Varnish config I have to write for this. And I think this is where that number of lines of code and number of lines of config comes in. But we all know that we want acceptance tests. We want to see what users will experience. And remember, this is what you were asking for, Jerod. You were saying, how do we know that this new thing is going to behave exactly the same way as the existing thing behaves? So what we now have is, you see the test acceptance targets. These are all the various things that we can run in the context of Pipely. We can do test acceptance CDN, test acceptance CDN 2, or test acceptance local. And this is using Hurl,
Starting point is 00:53:49 and we're describing the different scenarios that you want to test for real, testing these real endpoints. Which one would you like us to try out? Local. Local, great. So what I've heard is change log. That's exactly what I said. Change log, okay. Okay, so what I've heard is change log. That's exactly right.
Starting point is 00:54:06 Changelog, okay. I don't know why you even ask. Well, you have to have a bit of fun. So test acceptance CDN is going to run the same tests against the CDN. It's going to test the correctness of our CDN. Not using VTC though. Say again?
Starting point is 00:54:26 Not using the VTC stuff. No, this is Hurl. This is Hurl stuff. So different tests. Exactly. This is like a different level. The VTC stuff is just for the Varnish config. Hurl, in this case, the acceptance tests, are doing real requests and checking the behavior
Starting point is 00:54:41 of the real endpoints. Like for example, am I getting the correct headers back? Am I being redirected? Is this returning within a certain amount of time? What happens if I do this request twice? How does it behave? Is it a miss versus a hit? What happens?
Starting point is 00:54:59 So we have 30 requests that we fire against the existing CDN and we see how it behaves. And then what we're going to do, we're going to run the same requests against the new CDN and it's slow. Why do you think it's slow? Well, I don't know what these tests are doing. So I can't answer that question. So these tests are checking the behavior of the various endpoints, for example, the feed
Starting point is 00:55:27 endpoints or the admin endpoint or the static assets endpoint. In this case, you can see that we are waiting for the feed endpoint. So if you go back and you think about the various delays and the stale versus miss, we're checking how the stale properties of feed responses behave. So if I'm going to hit this endpoint within 60 seconds, will it show up as stale? So we're literally checking, and we have to wait to see: will it expire, will it refresh? So you're delaying on purpose to see. Exactly, I'm delaying it on purpose, and it takes about 70 seconds because we need to wait
Starting point is 00:56:15 that long, right, to test the staleness. And by the way, that's something which I'm going to do next. So we're going to check the staleness of something, and the staleness is currently set to 60 seconds. And you can see we can do the variable delay. So this is the real CDN. We're going to Pipedream. We're not testing the local one. We're testing the Pipedream one. And this is the existing configuration, which we consider to be production. Now you said local, and now we can do the same tests. We're going to run them against local, and we're going to change a couple of properties, because locally we want slightly different behavior, and what we care about is that speed, right?
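For a flavor of what one of those Hurl scenarios might look like, here is an illustrative sketch; the hostname, the x-cache header name, and the thresholds are assumptions, not Pipely's actual test file:

```
# feed.hurl -- illustrative only; host and header names are assumptions.
# Cold request: should succeed and, on a cold cache, be a miss.
GET https://pipedream.changelog.com/feed
HTTP 200
[Asserts]
header "x-cache" contains "MISS"

# Repeat within the TTL: should be a fast cache hit.
GET https://pipedream.changelog.com/feed
HTTP 200
[Asserts]
header "x-cache" contains "HIT"
duration < 1000
```

A file like this runs with hurl --test, and the delays described above are what make the staleness scenarios take around 70 seconds against production.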
Starting point is 00:56:53 We want these tests to be much, much quicker. And in this case, you can see like the actual requests going through, you can see the responses, you can see the headers. We still are testing delays, but the delays are much shorter, which means that the test will complete much, much quicker. So we control these variables. The production is just like,
Starting point is 00:57:11 you know, as it is. This is how it behaves and that's what we're testing. So it will be slightly slower. Shall we do it for real? Would you like me to try to run another test and see how it behaves if I do the acceptance local or shall we move on to something else? What is the conclusion from that? Like conclude some things for me. Well, the conclusion is that we are able to run the CDN locally and poke it in product and make sure that the CDN in this case is behaving exactly as we expect it to.
Starting point is 00:57:40 We have a controlled way of configuring everything. What I mean by that, I mean the backends, the various backends that we use. We have properties to control things like TTL, staleness, freshness, and see how different configurations change the behavior of the system. We also have it deployed, and we can check if the existing CDN behaves the same as the new CDN. I haven't written all the tests, only the big ones: does the feed endpoint behave correctly? Do the static assets behave correctly?
Starting point is 00:58:12 What about the admin endpoints or those that shouldn't be cached, do they behave correctly? So I'm starting to build a set of endpoints and set of tests that check how those endpoints behave. And there's certain differences, right? Like one CDN behaves slightly differently. We know like the existing one, right, that we're trying to improve on. So we can see where does it fall short. There's a couple of interesting things that we can look at. For example,
Starting point is 00:58:39 I've seen that we, for example, don't cache the JSON variant of the feed, of the RSS. Maybe we'd want to do that. I don't know. But going through this, testing the correctness of the system, made me look into parts that I wouldn't normally look at. The best part is that we can run this locally.
Starting point is 00:59:02 We are in full control of everything that happens in our CDN. It's a lot of responsibility and it takes a certain level of understanding to know what the tools are and how they fit together, but we have it. Yeah, that's awesome because now we don't have to just poke at a VCL in the sky and hope that it does what it does and just only test in production. You know? You can actually make changes with confidence. Is that a state of the art for any of the CDNs out there?
Starting point is 00:59:37 Like, can you do this level of acceptance test between... I guess you probably can't, right? We can't run Fastly locally, we can't even run Bunny locally. We can only run our own thing locally. So you can't really test the way you'd develop it locally and then deploy it in production, but you can test, you know, XYZ CDN versus Pipely or Pipedream, right? You can test that. That's what you're doing right now. I think the first step is being able to run it locally, and running anything of that magnitude locally is hard. Let me rephrase that.
Starting point is 01:00:11 I would say if you are unhappy with your CDN provider, thus far has there been a way to say, what the original question was, can we trust moving to something else? In this case, the something else is something we've built, not a different public provider. And so we're scrutinizing a little bit more, but if you were unhappy with, you know, one CDN and you were thinking, man, I want to move to a different one.
Starting point is 01:00:38 Has there been a state of the art to test the, I guess, the efficacy between different CDNs. Has this tooling been there before? I'm not aware if it has. If someone from our listeners is aware of such tooling existing, I'd love to learn about that. I think it pretty much comes down to DIY, as in how much of the correctness of the system
Starting point is 01:00:59 are you testing for? And in this case, even though it is a CDN, it is part of our system, right? Because it determines how the changelog website and the application and all the origins behave ultimately. How do users perceive them? And the best thing that we have, honestly, are the logs. Because based on the logs, you can see what users experience. But is that good enough? I mean, these systems are really big, right? Like global scale big. It's really hard, for example, even for me, I mean, sure, I could
Starting point is 01:01:29 force and test every single endpoint. But on like when I'm running these these tests, right? When I'm, for example, testing changelog.com, I'm testing whatever wherever I'm closest to, based on the network conditions based on whatever's happening. And I need to encode certain properties I care about to check that they are behaving correctly. The same tooling could be used for any other CDN. So once we encode the things that we care about in terms of the correctness of the system, let's say that one day we migrate to Cloudflare.
Starting point is 01:02:02 If we did that, we would run the same set of acceptance tests against Cloudflare or whatever we're building there and see does this thing behave the same as the thing that we're migrating from. So there are like these harnesses that we are required to have to make sure that the systems behave correctly because they're big complicated systems and most of them are beyond our control as we've learned over the years. Does that answer your question, Adam? Kind of. I mean, I think it does.
Starting point is 01:02:28 Kind of. I mean, I think it does. I think what I was pointing at, or potentially trying to uncover, is, you know, we're all allergic to vendor lock-in essentially. I wonder if there's a level of vendor lock-in because you don't know unless you make the move, and it's hard as a developer, an IC, or even a VP, to say, we've got to make this change. We've got to move to a different platform because of X, Y, and Z, whatever their data is, whatever their reasons are. And I wonder how many people or how many teams
Starting point is 01:02:58 are staying where they're at because they have fear of the unknown. The unknown is that they can't test to this degree, this acceptance level. I mean, yeah, that is real. I mean, just think about the journey that we had to take to get to the point where we are today. It took a lot of effort, took a lot of time.
Starting point is 01:03:15 It took a lot of understanding what even are the components. And we could have picked something else. We didn't have to pick Varnish, but at least I didn't want to change too much at once. One day we may replace Varnish. It is possible. The real value is in understanding what the pieces are and how they fit together, whatever those pieces are, whether it's Kubernetes, whether it's a PaaS, whether it's a database, it doesn't really matter. Take your pick. Each context is different. So then how do you go about understanding what the pieces are? How do they interact?
Starting point is 01:03:50 And how do you ensure, I think this is coming back to where we started, how do you ensure that what we do does genuinely improve things? And that is the hard part. Being able to measure correctly, being able to understand what improvement even means in the first place is really hard. And what trade-offs are you okay to make? We take a lot of responsibility by running this ourselves. And I'm very aware of that.
Starting point is 01:04:15 I think that is really the hard part: being confident that you can pull this off, having the experience that you can pull it off, and knowing you can learn anything that you're missing. And if you apply those principles to whichever context you operate in, you'll be good. It won't be easy, but you'll have learned so much. Well, friends, I'm here with a good friend of mine, David Hsu, the founder and CEO of Retool. So David, I know so many developers who use Retool to solve problems, but I'm curious, help me understand the specific user, the particular developer who is just loving Retool. Who's your ideal user? Yeah, so for us, the ideal user of Retool is someone whose goal first and foremost is to either deliver value to the business or to be effective.
Starting point is 01:05:12 Where we candidly have a little bit less success is with people that are extremely opinionated about their tools. If, for example, you're like, hey, I need to go use WebAssembly, and if I'm not using WebAssembly, I'm quitting my job, you're probably not the best Retool user, honestly. However, if you're like, hey, I see problems in the business and I want to have an impact and I want to solve those problems, Retool is right up your alley. And the reason for that is Retool allows you to have an impact so quickly.
Starting point is 01:05:36 You could go from an idea, you go from a meeting, like, hey, you know, this is an app that we need, to literally having the app built in 30 minutes, which is super, super impactful on the business. So I think that's the kind of partnership, or that's the kind of impact, that we'd like to see with our customers. You know, from my perspective, my thought is that, well, Retool is well known. Retool is somewhat even saturated.
Starting point is 01:05:58 I know a lot of people who know Retool, but you've said this before. What makes you think that Retool is not that well known? Retool today is really quite well known amongst a certain crowd. Like, I think if you polled engineers in San Francisco or engineers in Silicon Valley, I think it'd probably get like a 50, 60, 70% recognition of Retool. I think where you're less likely to have heard of Retool is if you're a random developer at a random company in a random location, like the Midwest, for example, or like a developer in Argentina; you're probably
Starting point is 01:06:30 less likely. And the reason is, I think we have a lot of really strong word of mouth from a lot of Silicon Valley companies, like the Brexes, Coinbases, DoorDashes, Stripes, etc. of the world. Airbnb is another customer, Nvidia is another customer, so there's a lot of chatter about Retool in the Valley. But outside of the Valley, I think we're not as well known. And that's one goal of ours, to go change that.
Starting point is 01:06:51 Well, friends, now you know what Retool is, you know who they are. You're aware that Retool exists. And if you're trying to solve problems for your company, you're in a meeting, as David mentioned, and someone mentions something where a problem exists and you can easily go and solve that problem in 30 minutes, an hour or some margin of time that is basically a nominal amount of time. And you go and use Retool to solve that problem. That's amazing.
Starting point is 01:07:19 Go to Retool.com and get started for free or book a demo. It is too easy to use Retool and now you know, so go and try it. Once again, retool.com. Because we're able to do this whole, you know, multi application, multi CDN scenario, is there a way to say, test 75% of our traffic goes to existing CDN,
Starting point is 01:07:49 25% of our traffic goes to the new CDN over a course of time, like as this confidence, you know, gets to a higher level? Is that, like, what's the proper way? You don't just switch it off, right? Like, we're testing and confirming it and things like that, like, how does it work in different scenarios? But is that the prudent way to roll it out, or am I jumping the gun on you? No, no, no, no. I think this is good. This is exactly, I mean,
Starting point is 01:08:13 these are like the big questions, because honestly there is no right answer. So a progressive rollout is the most cautious one, especially if you don't know how the new system is going to behave. In our case, we're spending a lot of time to double-check the correctness of the system, and that the system behaves correctly when it comes to all the other... So it's one component, the CDN, but it integrates with a bunch of other things. It integrates with S3 for stats, right? It integrates with Honeycomb for all the telemetry, for all the traces, for all the events.
Starting point is 01:08:55 It integrates with R2, the different R2 backends, for the actual storage of certain components. So we're just basically replacing a central piece; everything around it still remains, and the integration has to be right. So yes, we could do a gradual rollout, in that maybe from a DNS perspective we say 25% of queries return this backend, or this origin, sorry, in this case, let me just not compound the word origin: 25% of the requests go to Pipedream, and 75% go to Fastly. And how do they behave? But at that point we are maintaining two systems, which is okay, but it cannot be a long-term solution.
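The 25/75 DNS split being described can be modeled as weighted selection. A rough sketch in Python, with placeholder hostnames, and with the caveat that real weighted routing would be configured at the DNS provider, not in application code:

```python
import random

# Illustrative weights: 25% of lookups resolve to the new CDN, 75% to Fastly.
# Hostnames are placeholders, not the real records.
WEIGHTS = {"pipedream.example.com": 25, "fastly.example.com": 75}

def pick_origin(rng=random):
    """Pick an origin with probability proportional to its weight."""
    total = sum(WEIGHTS.values())
    r = rng.uniform(0, total)
    for origin, weight in WEIGHTS.items():
        if r < weight:
            return origin
        r -= weight
    return origin  # floating-point edge case: fall back to the last origin
```

Shortening the window in which both systems run then amounts to walking the new weight up toward 100 and retiring the old record.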
Starting point is 01:09:43 Right? So we want to shorten the window in which we run both systems at once and that both are active. Because we could very easily switch, for example, to pipe dream, right? Make sure that everything runs correctly. Let's say that we detect that,
Starting point is 01:09:58 hey, for some reason something isn't behaving correctly. We still have the old system, we just point the DNS back and everything continues as it was, which is why two of everything. That's another principle that we have. So at this point, we have two CDNs, we have two applications, which are completely isolated. Now, they are running on Fly, like the runtime is the same. But if one was to go down, the other one wouldn't know about it. So we've designed this in a way that is very cheap to fail. The new stuff, if it fails, will have impacted maybe a few minutes' worth of traffic, even if it fails catastrophically, which is why
Starting point is 01:10:33 running all these benchmarks, running all these correctness checks, to make sure that the chances of that happening are low. No guarantees in anything, but they're low. And going back and forth is super easy, because we run both things at the same time. The problem of running both Fastly and the new one is that we may see inconsistent data that gets written out. I mean the logs, I mean events. I'll go to great lengths to ensure that's not the case. But if there are little discrepancies, we may end up with different data, and it may take a while to find that out, especially on the metrics side. What kind of data would be different?
Starting point is 01:11:12 Like a different image or, you know? The stats that we write. It's like all the requests that come in, the stats that we write to S3, for example. And when Jared processes them, right, when the background jobs kick off, they just can't reconcile the two different ways of saving the same data. Because there's a lot of config in Varnish, sorry, there's a lot of config in Fastly that configures how we write out the logs to S3. And that will be accurate.
Starting point is 01:11:41 The problem is that certain properties that Fastly has, Pipely may not have. Again, let's remember Fastly is a version of enterprise Varnish, which is completely different. Like, only they have certain Varnish properties. We don't have certain methods. We don't have table lookups.
Starting point is 01:11:59 There's so many features that we don't have in the open source Varnish. So there might be differences in what we may be able to do. For example, the GeoIP stuff, I don't know how that's going to work, or if it's going to work at all. And maybe it's fine. But that's an example of something where, running these two systems at the same time, we'll need to reconcile the differences.
Starting point is 01:12:20 As opposed to switching everything across and then, oh, you are missing these properties that you care about. But that is the risk of going from one thing to another thing. Well, I found the answer to my question. It looks like it's about 308 lines of code at this point. Great, you're getting there. But that's okay, you preempted it. Oh good, cool. Yeah, I know. Yes, so it changed a bit, and we'll go over that in a minute. So.
Starting point is 01:12:50 One more question for you before we go on. Yes. You said the phrase Enterprise Varnish. Is there such a thing? Do they have like a different fork of it they're developing? Yes, absolutely. Okay, open-core style.
Starting point is 01:13:00 So obviously there's Varnish, and there's Enterprise Varnish. Enterprise Varnish is a paid product. As far as I know, and this is from going through their blog and the various public information which is out there, Fastly started with Varnish, but they've been changing it a lot over the years. That was their starting point. I don't know how similar it is to Enterprise Varnish, but at this point we can assume it is a custom platform, a customized Varnish. I don't even know if it is Varnish.
Starting point is 01:13:29 There's certainly VCL, but I don't know how that maps to what they actually run, because that's all their proprietary software. Who's in control of this Enterprise Varnish? The Varnish people. Um, I searched it on Google and I couldn't... I mean, I'm still using Google. Yes, if you go Varnish Enterprise, yeah, there is even like a company and consultancy behind it.
Starting point is 01:13:51 Varnish-software.com. They sell Varnish Enterprise. They have the open source Varnish community version. Alrighty. I didn't think I'd land on the right page. It seemed like not the right place, but... Yeah. Varnish Enterprise and Varnish Software is the commercial side. Never been here before.
Starting point is 01:14:07 Okay, brand new. Yeah. Okay, so Varnish Cache is the open source community version, Varnish Enterprise... these are things I'm not familiar with. I just never paid attention to this detail. So you got Varnish Cache, open source, Varnish Pro, Varnish Enterprise, Varnish Controller, Traffic Router. Okay, so you got like different layers. So we're using, obviously, the one available to every developer out there, Varnish Cache.
Starting point is 01:14:35 They are likely, highly likely, using Varnish Enterprise. Yes, because, and the reason why we know this is from the documentation, they have certain phrases, like one behavior that we had to work around, as you can see here, right? We have different instances running. So we have Pipely running, which is Varnish, right? That's Varnish 7.7.0. But we have feeds, and feeds is the TLS proxy. We talked about it in the last episode; the TLS proxy terminates TLS to backends, in this case HTTPS traffic. Varnish itself cannot go to TLS backends. It doesn't terminate SSL. Varnish Enterprise does. And the reason why I know that is because that's what we use in the
Starting point is 01:15:19 Fastly VCL config. So Varnish in that case does terminate TLS, and that is a Varnish Enterprise-only feature. So that was like another thing that we had to solve somehow. And Abhil, again, thank you very much for helping out with that: writing this very simple Go proxy, which uses little memory, highly performant, that is able to terminate SSL, which Pipely then connects to, and it's all running locally. So feeds, assets, and app are separate processes, and we can see this by,
Starting point is 01:15:53 let's just do this: ps. Look at that. This is the whole process tree of what's running in Pipely. So we have tmux, obviously, that's the session which I have opened here; uh, bash, just up, right? It's just a wrapper. It basically invokes Goreman, so it runs all the various processes. And we have tls-exterminator, local port 5000, proxies to changelog.fly.dev.
Starting point is 01:16:19 We can see the process, we can see the memory usage, all of that. It's currently using, what is it, 8 megabytes of memory. And that was after the benchmarking, right? We ran a benchmark here. tls-exterminator, we're going to feeds, and we're going to... which was the other one?
Starting point is 01:16:37 There should be one more there. Changelog assets, the static assets. And then eventually we have Varnish. So you have quite a few things running here just to get the experience that, you know, in Fastly's case is just all part of Varnish. So we are bringing different components together, building what we're missing, so that we get something similar.
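The division of labor described here, where Varnish speaks plain HTTP to a local port and a small proxy (tls-exterminator, in this case) upgrades the connection to HTTPS toward the real backend, boils down to a port-to-backend routing table. A minimal sketch of just that routing logic; the ports and hostnames are assumptions, not the real config:

```python
from urllib.parse import urlunsplit

# Assumed mapping: local plaintext port -> TLS backend it fronts.
ROUTES = {
    5000: "changelog.fly.dev",              # app/feeds (assumed hostname)
    5001: "changelog-assets.example.com",   # static assets (placeholder)
}

def upstream_url(local_port: int, path: str) -> str:
    """Rewrite a plain-HTTP request received on a local port into the
    HTTPS URL the proxy should fetch from the real backend."""
    backend = ROUTES[local_port]
    return urlunsplit(("https", backend, path, "", ""))
```

Varnish then only ever talks cleartext HTTP to 127.0.0.1, and the proxy owns all TLS concerns.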
Starting point is 01:17:00 And ultimately what we care about is how the system behaves from the outside. Do the users get the experience that we want them to have, or that we expect for them? All right, so I could do this live, but I think it's easier, I can focus a bit better. So the tests, right, we can run them locally. Now, I did mention that we're using Dagger. So if I do dagger login changelog, what that means
Starting point is 01:17:26 is I'm going to authenticate to Dagger Cloud. And then everything that runs locally will send the whole telemetry, like the behavior of the various commands, how they change. In this case, I'm running the acceptance tests locally. And by connecting Dagger to Dagger Cloud, I'm able to see all the different things
Starting point is 01:17:47 that run for those acceptance tests. All the tools that get installed, all the commands that run. And in this case, I can even see the actual requests that go to the local instance of Varnish in great, great detail. It's all real time, it's all wasm goodness, and the tests are hooked up too. So when I run something locally, or even in CI, it all goes to the same place, and I can understand how these various components behave. How long do they take? That's what we see here, like a trace of the various steps. So when something is slow or misbehaves,
Starting point is 01:18:28 I know where to look. So the acceptance tests, they run locally in one minute and 26 seconds. And that's pretty good. So what else is left? We're nearing the end. What else is left before we can deliver this toy to Adam? That's what we are working towards.
Starting point is 01:18:48 So the first thing is the memory headroom. What does that mean? Varnish we are configuring it to use a certain amount of memory so that it can serve as many things as it can from memory. So it's really, really fast. And I went through a couple of iterations basically, and we'll see that in a minute. The value which I set initially was not the right one.
Starting point is 01:19:09 Varnish kept crashing, and I had to find out what the right value is, because it's not very obvious. Forwarding logs, that is the part which I think is an important one, but it's a smaller component compared to everything else. So we will have one more process running. In this case it'll be Vector, and Vector is going to consume all the Varnish logs and deliver them to different sinks; that's what they're called internally. So one will go to Honeycomb, and we'll be able to compare: is the data the same format as we get from Fastly?
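The fan-out Vector is being asked to do, one stream of Varnish log records feeding differently-shaped sinks, can be sketched as two formatters over the same record. The field names here are illustrative; the real S3 line has to match whatever the existing stats jobs already parse:

```python
import json

def to_honeycomb_event(rec: dict) -> str:
    # Honeycomb ingests structured JSON events as-is.
    return json.dumps(rec, sort_keys=True)

def to_s3_log_line(rec: dict) -> str:
    # Flat, Fastly-style line; field order matters to the downstream parser.
    fields = ("time", "status", "cache", "url")
    return "|".join(str(rec[f]) for f in fields)
```

The point of keeping both outputs derived from one record is exactly the concern raised here: if the two systems format independently, the dashboards and the metrics jobs can silently drift apart.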
Starting point is 01:19:48 Because all the dashboards and all the alerting and everything else should work the same. The SLOs and all of that. And are we able to send the same logs, with the same format, to S3, so that Jared is able to process the metrics? That is the important part, right? When you mentioned that the numbers went down: well, we're not getting those metrics from the new instance. Yeah. And the last one is the edge redirects.
Starting point is 01:20:13 And that's just basically writing more VCL, which is fairly straightforward at this point; and by the way, LLMs are very helpful. So I was using agents for this, and you know, they really go through it. That was a very nice episode, by the way; I enjoyed that. So stuff like that, you know, which makes this super, super simple: it's literally copying config from one file to another file and just reformatting it. But we have most of it. A couple of things are different, because again, our Varnish doesn't have all the properties that the Fastly Varnish has, like table lookups, and specifically there's
Starting point is 01:20:41 more if-else clauses and a couple of other things, but nothing crazy, mostly straightforward. And this is also going to clean up a lot of redirect rules, because they're all over the place; there's jobs, there's go-tos, there's quite a few things in our existing Varnish config. And then the last one is the content purge, so we'll talk about that in a minute. But the memory, this is what it looks like, the memory. So basically, you see, we are looking at the memory usage of an instance of Pipely slash Pipedream. And you can see that the limit is two gigabytes.
Starting point is 01:21:20 And we want to be just under it. But then sometimes what happens, there's some requests coming in all of a sudden. This is like one instance that was hit particularly badly. I don't know what was happening with it, but there's lots of traffic going to this instance. And by the way, it was more like bot traffic. It felt like agents are trying to scrape you. That's exactly how it felt.
Starting point is 01:21:38 They tried different things. So it was all just garbage. And when we see these drops, it's Varnish crashing because it was running out of memory. It was getting OOM-killed. So I had to adjust that headroom a couple of times, and now it's been stable. If we look at the actual... let's see if we find it here. It's this one, right?
Starting point is 01:21:57 So I'll look at the last six hours, right? You can see all the various varnish instances, the memory, we never had those big drops. There's smaller drops based on data being replenished and how it changes. We still need to understand those metrics by the way, but that's coming. So things have been stable from that perspective. Cool. And 800 megabytes, that's how much headroom we had to leave for Varnish. This was version
Starting point is 01:22:25 005. It was the last one we pushed, and things have been stable ever since. So we need to leave 800 megabytes free so that things don't get killed. That seems to be the number; 400 was not enough. And pull request 12 is up there, which is where we're going to send logs to Honeycomb. That is the first one. There is not much else other than just a placeholder for it, but that's the next big thing. And we need content purge. And for this, I need to tango with Jared on this one. It takes two to tango.
Starting point is 01:22:57 Yeah, pretty much. Pretty much. So this is where we talk about how do you imagine us integrating Oban with Fly in this case, to understand what the various Pipedream instances are, because we need to send requests to every single one of them when you want to purge content. There is no orchestrator, which is what was happening in Fastly, right? You would send a purge request, and then Fastly would distribute it to all the instances, or not, because things weren't cached that well. Anyway,
Starting point is 01:23:31 the point is, we need now to orchestrate that purging across all the instances. So how do you think we may approach this, Jared? Well, we need some sort of an index or a list of available instances. Perhaps we could get it from Fly directly. Yeah, there's DNS. We can send a DNS query and it will give us all the instances. So as long as we have some sort of standardized naming around these instances, so they're not our app instances or whatever; it's like our Pipely instances. Yeah. And we just create an Oban worker that just says, you know,
Starting point is 01:24:03 you tell it what to purge. It wakes up, says, all right, give me all my instances. Gets that from Fly, and then just loops over them and sends whatever we decide a purge request looks like to that instance. Yeah. I'd really like to do this, maybe before the next Kaizen. Sure, that's a big one. Because if you think about it, really, it's like these two big things.
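The worker Jared sketches out loud could look roughly like this. The resolver and the HTTP call are injected as functions so only the orchestration is shown; the `pipely.internal` name mirrors Fly's internal-DNS convention, but the app name and the exact shape of a purge request are assumptions:

```python
from typing import Callable, Dict, Iterable

def purge_everywhere(path: str,
                     resolve: Callable[[str], Iterable[str]],
                     send_purge: Callable[[str, str], int],
                     app: str = "pipely") -> Dict[str, int]:
    """Resolve every running instance of the app and send each one a
    purge for `path`. Returns per-instance status codes."""
    results = {}
    for addr in resolve(f"{app}.internal"):  # Fly lists all machines here
        results[addr] = send_purge(addr, path)
    return results
```

In the real Oban worker, `resolve` would be a DNS query against Fly's internal resolver, and `send_purge` an HTTP request (e.g. with the PURGE method) against that one instance.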
Starting point is 01:24:25 It's sending the logs and the events to Honeycomb and to S3, and content purging. And that's the piece where we need to work together. And then the edge redirects are really simple. It's literally just copy-pasting a bunch of config, cleaning it up. And that's it.
Starting point is 01:24:43 That's it. That's how close we are. It's not even Christmas. So close you can almost play with that toy. Yeah. Well, Gerhard's been playing with it. Yeah, I have. I mean, benchmarking it. I mean, anyone can try it. You've been trying it. We go to feeds, we serve assets. Now we just have to do some of the, I think, tooling around it. Like some extra stuff that is not user-facing.
Starting point is 01:25:08 Because the content purge, I mean, if you think about it, do we need to do the content purge? 60 seconds, that's how long things will be stale, because they get refreshed ultimately every 60 seconds. The problem with that is maybe that is too aggressive for static assets, right? We would like to cache them maybe for a week, maybe for a month, I don't know, like stuff like the image that we've seen, right? The Changelog image that doesn't change. Yeah. So that could be cached for a year, right? Right. Unless it gets effectively content purged. Is there a way to like classify assets as like
Starting point is 01:25:41 this will never change kind of thing? Like give things buckets: A bucket is on that every-minute cycle, B bucket is like this almost never changes, so let's just go ahead and cache that almost forever. Absolutely. And then C is like these things will never, ever change, and when they do, it's a manual purge. Yeah, I mean, all that is possible. The question is, what is the simplest thing that we could do that would ensure a better behavior than we've seen so far from a CDN, and something that maybe doesn't require a lot of maintenance? So as I was thinking about content purging, I was wondering, well, what if we expire everything across the board? Like if we say feeds, refresh them every minute.
Starting point is 01:26:32 Static assets, refresh them every hour. The application, refresh maybe every five minutes, maybe every minute. I'm not sure. Maybe we don't need content purge. When you say refresh, does it literally delete it from the CDN and pull it over from wherever? Or does it just check freshness? So, when a request comes in, it will check freshness. Which means that, let's say a request
Starting point is 01:27:04 arrived an hour ago, and the TTL is 60 seconds. When the second request arrives, it checks: is this considered stale or fresh? If it's considered stale, if the TTL has passed, it will still serve the stale content, which means it could be an hour old, whichever the duration is between requests. And then it will go in the background to the origin to fetch a fresh copy. So subsequent requests will get the fresh content, but never the one that triggers the freshness check, if that makes sense.
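The behavior described here, where the request that notices staleness is still served the old copy and only the refresh benefits the next caller, is essentially stale-while-revalidate. A toy model in Python (the refresh is synchronous here for clarity; a real CDN does the origin fetch in the background):

```python
import time

class SwrCache:
    """Toy stale-while-revalidate cache: staleness is only noticed on
    access, and the noticing request is still served the stale body."""

    def __init__(self, ttl, fetch, clock=time.monotonic):
        self.ttl, self.fetch, self.clock = ttl, fetch, clock
        self.store = {}  # key -> (body, stored_at)

    def get(self, key):
        now = self.clock()
        if key not in self.store:
            body = self.fetch(key)            # cold miss: hit the origin
            self.store[key] = (body, now)
            return body
        body, stored_at = self.store[key]
        if now - stored_at > self.ttl:
            # Stale: refresh for the *next* caller, serve the old body now.
            self.store[key] = (self.fetch(key), now)
        return body
```

This is why, however long the gap between requests, the first request after the TTL expires can see arbitrarily old content.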
Starting point is 01:27:36 There's a pull, even if it's the same? Uh, yeah. Because we are configuring the TTL. We're saying only keep it for 60 seconds. We're not doing any comparisons, we're not doing any ETag comparisons, we're not doing anything like that. Is that too CPU-intensive, to do comparisons,
Starting point is 01:27:50 like checksums and stuff like that? Is it not that kind of thing? Because I'm thinking like rsync, for example; whenever I do things, this is not the same, but it's similar. It's like, hey, I want to go and push something there, but you can also do --checksum, which is like, let me do a computation
Starting point is 01:28:04 between the two things and confirm, like, even though certain things may have changed, like updated-on or whatever, it's still the same data, so it doesn't actually update it, you know? I'm just wondering if that's a thing in CDN world. Yeah, it is. I mean, that's where, for example, the ETags come in, right? In the ETag header you can put basically the checksum
Starting point is 01:28:30 than what I have in my cache? And if it's not, then this is up to date. So it's not time-based, it's just header-based. And all it does is just goes and check the resource on request, but it still means that the first request that comes after that object has been cached may serve a stale content. Actually it will return a stale content. The first one will always return a stale content because that's when the check happens.
Starting point is 01:29:00 There's no background anything, right, running in the background to compare all the objects which I have in memory, are they fresh or not? And this is where the content purge comes in. When you know that something has changed, you're explicitly invalidating these objects in the CDN's memory. So let's say you've published a new feed, right? You know you've updated it in the origin, then you send the request to the CDN, which I believe is what we have today, to say purge this, because there is a new copy. And then the first request is going to be a miss. It will not be a stale.
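The A/B/C bucket idea from a moment ago, combined with explicit purging, could be sketched like this. The path prefixes and TTLs are invented for illustration; the real rules would live in the VCL:

```python
# Illustrative TTL tiers; real prefixes and durations would come from VCL.
TTL_BUCKETS = [
    ("/feed", 60),                    # A: refresh every minute
    ("/assets/", 7 * 24 * 3600),      # B: almost never changes, cache a week
    ("/images/", 365 * 24 * 3600),    # C: immutable until manually purged
]

def ttl_for(path: str, default: int = 60) -> int:
    """Classify a request path into a cache-lifetime tier."""
    for prefix, ttl in TTL_BUCKETS:
        if path.startswith(prefix):
            return ttl
    return default

def purge(store: dict, path: str) -> None:
    # Dropping the object outright means the next request is a MISS
    # (a fresh origin fetch), never a stale HIT.
    store.pop(path, None)
```

Long TTLs plus explicit purge is what lets the C tier be cached "for a year" without ever serving an outdated copy after a deliberate change.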
Starting point is 01:29:35 It will be a miss, because the CDN doesn't have it. It has to go to the origin. What is this? Can you go back? Would it hurt your presentation to go back to the what's-left-to-do slide? Yeah, I kind of want to see that list again. Yeah, of course. What does it take? Like, what do we reasonably think is required to get there? I love your indexing of this list, by the way. Although your font doesn't let it be very straight.
Starting point is 01:30:01 I'm pedantic now, as a designer, looking at it. That's okay. What's required to get all this done? Like, how difficult of a lift are the remaining steps to put a bow on it? Let's talk about unknowns, because I think that's... because the question is, how long is a piece of string? And I don't know, like, what is a string? Show me the string, I'll tell you how long it is.
Starting point is 01:30:19 And I don't know, like, what is a string? Show me the string, I'll tell you how long it is. And what this means is that I don't know all the properties that we need to write out in the logs to see if we have them. And again, I know that the gip we don't have, I mean, that that's just not a thing. We don't have that. And adding that will be more difficult than if we are okay to not have it, for example. will be more difficult than if we are okay to not have it for example. So maybe we do that or maybe we just add wherever the request is coming like whichever instance is serving the request we just use the instances location not the clients location. So maybe that's one way of working around it. So forwarding logs
Starting point is 01:31:02 it's fairly simple in terms of the implementation. What we don't know is all the little things that need to be in those logs for the logs to be useful, or as useful as they are today. And this is the dance between you and Jared. Actually, no, that's the edge redirects. So this is forwarding logs: we have to send them to Honeycomb and to S3. Honeycomb so that we understand how the service behaves. What are the hits? Like, remember all those graphs that I was able to produce?
Starting point is 01:31:31 We need to be able to see which requests were hits, which were misses. So all that stuff. I think in a day I could get the Honeycomb stuff done, I think. Right? I mean, there's nothing crazy about it. Some things will not be present, but most of it is fairly straightforward. S3 is a little bit more interesting, because I haven't seen that yet, and I'm not familiar with the format. But I know it's a derivative of what we get in the request. So it's just a matter of crafting a string that has everything we care about. And I'm going to flag any items if they're problematic.
Starting point is 01:32:04 So honestly, I would say a few days' worth of work and I can get the forwarding logs sorted. Then moving to the edge redirects. The question is, how far do you want to go with them? Are you okay with the current behavior, where everything expires in 60 seconds and we can be serving stale content, or do you want to implement what Jared suggested? I want purgeability. Content purge, not edge redirects, sorry; content purge is what I meant.
Starting point is 01:32:32 Sorry. I want purgeability. I just like to have the control. I don't think it's gonna be very hard to do. On the logs front, I don't think we wanna lose GEO IP information. I think we could relatively easily, since we're running a background process.
Starting point is 01:32:46 I'm not sure if Vector has that kind of stuff built in, or if you just have a script that pulls the IP, you know, checks it against the MaxMind database, and then puts it back in there. There is some integration with MaxMind; it exists. I know there is the lite version, which is free. Yeah, which is all we would need. And if that's okay, I haven't done it myself, but having looked at the config,
Starting point is 01:33:09 as long as the file is in the right place, which won't be a problem, it's pretty much baked into the software. Yeah, so if we do that, then we have pretty much everything else. But I do think we should keep that, because it is nice to know where people are listening to us. So that will make it slightly more difficult, the lite version. If we had to go for the paid
Starting point is 01:33:29 version, that would be a different story, because I don't even know what it takes to get a MaxMind paid database and get it refreshed and all that. We'll have to look at the details. So my goal is, by the next Kaizen, for all this to be done. Yes. That is my goal. Like, honestly, one of my title proposals was 90% Done.
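The GeoIP enrichment discussed a moment ago, pulling the client IP out of each log record, looking it up, and writing the geo fields back, is a small pure transform once the database lookup is injected. A sketch with assumed field names; in practice `lookup` would wrap a MaxMind GeoLite2 reader (e.g. via the `maxminddb` library):

```python
def enrich_with_geo(record: dict, lookup) -> dict:
    """Return a copy of `record` with a country field added, falling back
    to 'unknown' when the IP is missing or not in the database."""
    geo = lookup(record.get("client_ip", "")) or {}
    return {**record, "country": geo.get("country", "unknown")}
```

Running this in the log-forwarding process (rather than at the edge) is what makes it possible to keep listener geography even though open source Varnish has no built-in GeoIP support.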
Starting point is 01:33:54 I feel that we are 90% done, or 10% left, right? Whichever. Like, all the heavy stuff has been taken care of. That's exciting. My title proposal is "Tip of the Iceberg." Tip of... oh yeah, I love that. Tip of the mountain, or the... oh yes, I love that. Tip of the iceberg. CDNLC: CDN like Changelog.
Starting point is 01:34:18 Or, you know, what would Jesus do? Now, what would change log do? They build a CDN. Yeah. Or bottlenecks. That's also a thing. There's like so many bottlenecks in different parts of the system. Right.
Starting point is 01:34:30 Including me. I'm a bottleneck, by the way. My time is a bottleneck. But honestly, I'm very happy with where we are with this. I mean, I've learned so much, and it feels like we own such an important piece of our infrastructure. We were never able to do this before. And all because we were patient and diligent and we had good friends is why
Starting point is 01:34:51 we are where we are today. And that makes me so happy. So many people joined this journey. So, yes, I love that. Three of my favorite things, patience, diligence and friends. You know, get you far. I think so too. Thanks, Gerhard. You're leaving us on the cliffhanger here. diligence and friends, you know? Yep. Get you far. I think so too.
Starting point is 01:35:05 Thanks, Gerhard. You're leaving us on the cliffhanger here. Kaizen 19. Ah. Kaizen 19, this is it. This is the last one. For sure? What am I sure of today? Dang, man, it's in the wash.
Starting point is 01:35:18 Well, I'm excited about this. I think, let's say by next Kaizen, this is production-worthy. What changes, you know, once that's true? Once that's true, it's in production, it's humming along perfectly fine. What changes for us specifically? Our content is available more, or is available full stop, when our application is down. That was never the case.
Starting point is 01:35:50 When the application goes down, we are down. We've seen it: when we had that four-hour Fly.io outage in one region, that's what prompted us to go to two regions. And with a CDN that caches things properly, that would not be the case. And by the way, that's something that I wanted to test. I don't think we have time for that now,
Starting point is 01:36:15 but in the next time, we'll take the application down and make sure that we're still up. Now that's still going to be, so users which are logged in, I think we'll need to do maybe something clever. And again, it's within our control. We can say, even if you have a cookie, if the backend is down, we will serve you stale content, which is public content so that we look like we are up, but none of the dynamic stuff is getting to work. So that's, that's one thing. Um,
Starting point is 01:36:42 I think this gives us a lot more control over things our application used to do. You remember all those redirects that we still have all over the application, that we couldn't put in the CDN? Because it would have meant working with this weird VCL language that wasn't ours, like pie in the sky, as Jerod used to call it, that we don't know how it's going to behave.
Starting point is 01:37:07 So we chose to put more logic in the application that we wanted to, because the relationship with the CDN was always like this awkward one. And I think we had a great story to tell. I mean, just think about how many episodes we talked about this thing. And now it's finally here. And it feels like, like, like, was it worth it? What was the first time we started this? Was it a year ago?
Starting point is 01:37:30 A year ish, year and a half of like two years. I can't remember. Well, I remember October. That's that's October 20, 23. Yeah, that was like quite, quite a while ago, which is when we were seriously thinking like, hey, is this is this experience experience that we can expect that was like the right important milestone in this journey that kind of like started all of this so next Kaizen is roughly July something it's May now right if it's a two-month Kaizen 20
Starting point is 01:38:00 it's a nice 20 we have to do it we have to do it so it's it's a nice 20. Look at that. Nice. We have to do it. We have to do it. So it's it's a quarter off of October. So October isn't quite something September would have been the next guys and after July, right. So it's a little bit before October, but I feel like it's like almost two years, a year and three quarters basically. Yeah, I think if we go to, I'll just very quickly go to change log and the change log repository in the discussions. And I think we had even a question, should we build a CDN? And when was that? January 12, 2024. That was the first one when we asked the question, like, should we build a CDN? I was like, that's that that started out in my mind this journey. So January 2024. It will be one year and seven months, six, seven months.
Starting point is 01:38:54 Yeah, call 1818 19 months. Look at that. If it was 20 months, I'll be crazy. Seven episodes. Delay or an S kaizen by a couple of months. But let's remember there's like all these other things that used to happen and they were happening around. It wasn't just just this. I mean, this was one of the things that was like kicking the background. But again, just just look through all the things that we went through
Starting point is 01:39:16 to get here today. But definitely, between Kaizen 18 and 19, this has been my only focus, because I wanted to get to a point where we can say, you know, 90% done. Let's do it. Let's do the last 10% for Kaizen 20. That's what I'm thinking too. We'll celebrate.
Starting point is 01:39:34 Oh, I just had a good idea. Go on, go on. And we can cut this if we don't do it. But let's all go somewhere together for Kaizen 20. Let's be together. Okay. Oh, I'm intrigued. I like this.
Starting point is 01:39:47 London or Denver or Texas or something. Let's get together. Let's have a little launch party. Oh wow. You like this? I like where this is going. That's good. Okay, we'll iron out the details,
Starting point is 01:40:02 but we're all into the idea. Yeah, I like the Denver. Denver would be great. Okay. All right. Maybe we'll invite some friends. Oh, wait, Dripping Springs. I'm just kidding.
Starting point is 01:40:15 That's where I live, Gerhard, is Dripping Springs. To our listener, let us know in Zulip, if you would go to a change log, Kaizen 20, Pipe Lee launch party in Denver sometime this summer. Let us know. Oh wow. Throw a little party. That's quite a cliffhanger. Alright, we'll leave it right there.
Starting point is 01:40:36 We'll leave it right there. Okay, perfect. Alright. See you in Denver. See you in Denver. Kaizen. Always. Kaizen. See you in Denver. Kaizen. Always.
Starting point is 01:40:45 Kaizen. See y'all. So a live Kaizen recording slash Pipe Lee launch party in Denver in July. Would you be there? Why or why not? Please do let us know in the comments. We are serious about this. Are you? Comment in Zulip please. Let's thank our sponsors one more time. Fly.io of course, depo.dev,
Starting point is 01:41:14 heroku.com, and retool.com. Do us a solid and check out what these orgs are up to and tell them changelog sent ya. We love it when that happens. Next week on the pod. News on Monday. Derek Hollison from Senadia talks Nats vs the CNCF on Wednesday. And we are playing Pound Define once again but this time with some new faces and a mysterious one who just so happens to produce our beats. Who just so happens to produce our beats. Oh, I want to do that. I so badly want to do that. Have a great weekend.
Starting point is 01:41:48 Drop a comment in ZULIP if you listen all the way to the end. And let's talk again real soon.
