The Changelog: Software Development, Open Source - Kaizen! Tip of the Pipely (Friends)
Episode Date: May 9, 2025
Kaizen 19 has arrived! Gerhard has been laser-focused on making Jerod's pipe dream a reality by putting all of his efforts into Pipely. Has it been a big waste of time or has this epic side quest morphed into a main quest?!
Welcome to Changelog & Friends, a weekly talk show about scaling Fly Machines.
Speaking of Fly, thanks to our awesome partners,
the public cloud built for developers who ship.
Learn all about it at fly.io.
Okay, let's Kaizen.
Well friends, it's all about faster builds.
Teams with faster builds ship faster and win
over the competition. It's just science. And I'm here with Kyle Galbraith, co-founder and
CEO of Depot. Okay, so Kyle, based on the premise that most teams want faster builds,
that's probably a truth. If they're using a CI provider with its stock configuration, or
GitHub Actions, are they wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say if you're using any CI provider with just the
basic things that they give you, which is if you think about a CI provider, it is in
essence a lowest common denominator generic VM.
And then you're left to your own devices to essentially configure that VM and
configure your build pipeline. Effectively pushing down to you, the developer, the responsibility of
optimizing and making those builds fast. Making them fast, making them secure, making them cost
effective, like all pushed down to you. The problem with modern-day CI providers is there's
still a set of features and a set of capabilities that a CI provider
could give a developer that makes their builds
more performant out of the box,
makes the builds more cost effective out of the box
and more secure out of the box.
I think a lot of folks adopt GitHub Actions
for its ease of implementation and being close
to where their source code already lives inside of GitHub.
And they do care about build performance and they do put in the work to optimize those
builds.
But fundamentally, CI providers today don't prioritize performance.
Performance is not a top level entity inside of generic CI providers.
Yes.
Okay, friends, save your time, get faster builds with Depot: Docker builds, faster GitHub
Actions runners, and distributed remote caching for Bazel, Go, Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time and help you get faster build times with a
one-line code change. Learn more at depot.dev. Get started with a seven-day free trial.
No credit card required. Again, depot.dev.
Well, today is a very good day because we are Kaizen-ing, and
Gerhard is here.
And Adam's here, and I am here. Hey, guys. Hey. It's good to be here. We almost weren't all here,
but we're all here. Rain and thunder and lightning happened.
Internet outages. Mm-hmm. So what happened to my internet? Yeah, to your internet. I don't know, it just went down and
stayed down, at about 12:30 on Monday, maybe 1:30.
And I called them and told them my internet was down and then they said, we'll fix it.
And then they didn't fix it.
And then they did fix it.
But it's a little bit too late for us.
We actually were gonna record at 9 a.m. I think on Tuesday.
And it came back up around 11 a.m. on Tuesday.
So not even a 24 hour outage yet.
Still way too long. Way too long for my liking. And it was just my house. I don't know what
happened. They said they had to rebuild the modem. Which was apparently a remote rebuild.
I think they just flashed it with a new something or another.
You only have one, let me guess.
You only have one internet.
This is correct.
Well, I do have my phone.
I said, I told you guys I could just tether to my phone
and you know, play hot and loose?
No, not the same.
Loose and fast.
Fast and loose, thank you.
I was thinking fly close to the sun
and I was thinking fast and loose
and I said hot and loose.
But Gerhard, you said you had like a multimedia presentation.
I'm gonna have to like have really good internet
and so we just called it off.
So now we're here, the internet's back.
The rain is over.
I'm assuming it's done raining there, Adam.
You're all clear?
Yeah, I think so.
Oh, no, yes, we're good.
Okay.
And Gerhard brought us some goodies.
You got a story to tell?
I do, yes.
Tell us what you have to say.
I was thinking about this for some time, actually.
Wow.
And what I was thinking is,
when we get close to launching the pipe dream,
to launching Pipely,
how do I want to do this?
And that's the story.
The story is that you were thinking about it, or that you've thought about it and you're going to tell us more?
Well, the story is that I will tell you more.
A lot of stuff has happened.
I decided to double down on the pipe dream, on Pipely.
All my time went there.
And all that means is that we have something.
We have something. We have something.
Is this the launch story?
I think it's good.
Let's just like, you know, let me set the expectations at the right level.
Okay.
And let's see if Adam approves of the toy that we want to wrap for him.
Okay.
That's the bar.
That is the bar.
Because Adam enjoyed his toy that is being built in the factory.
And let's see what he thinks of it.
I love toys.
Don't we all.
Don't we all.
Let's get right to it.
Show us the toy Gerhard.
All right, so I'm going to share my entire screen.
That's why I want to mention that this is a presentation style.
It is a presentation style.
I could even share the slides.
Now they're about 80 megabytes big.
Wow.
Because there are some recordings as well, right?
If something will not work live, that's okay. I already did it.
The screen behind me will be out of sync with what I will be doing, but the thing itself
has been captured so that I can tell a good story. All right, so
the thing which I would like us to do now is click around
a few versions of the changelog site and talk about how responsive
the different versions of the changelog site feel to us.
And I think this is why Jerod's internet was important,
so that he experiences it as close to normal as possible. No tethering,
nothing like that.
So we will start with the origin.
The origin as our listeners will know runs on fly and we always capture a date when this
was created.
This particular origin was created in January of 2024, 12th of January, which means that the URL to go to the origin,
by the way, most users will not do that. This is for the CDN to do that, right? It is changelog-2024-01-12.fly.dev.
And I want you to be logged out. That is important. Or simply use a private window, whichever is easier.
We don't want any cookies.
We don't want you to be logged in because the experience will differ if you are.
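The date-stamped origin naming described here can be sketched in a few lines of Python. This is only an illustration, assuming the pattern is the app name, the creation date in ISO format, and the fly.dev domain; the helper name `origin_hostname` is hypothetical and not part of the Changelog codebase.

```python
from datetime import date


def origin_hostname(created: date, app: str = "changelog", domain: str = "fly.dev") -> str:
    """Build a date-stamped origin hostname such as changelog-2024-01-12.fly.dev."""
    # date.isoformat() yields YYYY-MM-DD, which matches the naming heard in the episode.
    return f"{app}-{created.isoformat()}.{domain}"


# The origin discussed in the episode was created on 12 January 2024.
print(origin_hostname(date(2024, 1, 12)))
```

Keeping the creation date in the hostname makes it easy to tell at a glance which origin an application instance belongs to.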
And this is going to be our baseline.
So it's important that the reading is accurate.
And this is as slow as it gets.
This is if we were to hit the website.
This is running in Ashburn, Virginia.
And it basically comes down to your network latency to Ashburn.
Okay.
So let's open up the website. I'm going to open it up as well.
And I'm going to click around and see how it feels.
How responsive does it feel? That's what we're aiming for.
Remember to be signed out.
That's the important part. I'm clicking around, I'm signed out, I'm in a private
window. Perfect. So how does it feel in terms of responsiveness, the website? Average. Average.
What about when you click on news? Do you see any delays, anything like that? I would say there's a
slight delay. It doesn't feel as snappy.
Adam, what about you? I can see it rendering. Do images play into
this? Oh, yes. Okay, because that's what I'm noticing most. It's laggy.
The viewport kind of gets painted and then it moves around because
the images catch up. And yeah, it feels like I'm on tethered internet,
basically.
Right. There you go. See, imagine if Jerod had tethered
internet, how slow that would feel.
Double tether.
Cool. Yeah, exactly. Something like that. Yeah. So now the
interesting thing is that even though the images do serve from
the CDN, everything else around them, the JavaScript, the CSS, all of that.
I don't think it does. Let me just double check that.
Oh, it should.
It should. Yes, it actually does. So it is just the request to the website. You're right. Actually,
yes. Everything, all the static assets, are served from the CDN. It's just the request to the website
which makes it feel slow. And I don't think we're biased.
I don't think we are imagining this.
I have been looking at this for quite a while, and it all comes down to that initial request.
Anything that hits the website for me takes about 360 milliseconds, and this is constant.
So I'm showing here the httpstat output, a tool.
We talked about it. We may drop a link in the show notes. And that's what it comes down to, right? The origin itself is slow, and the further away you are from it, the slower it will get. But for you in the US, I would have expected this to be snappier. So it's interesting that it isn't.
It's borderline snappy. I can feel it a little bit, but it's not bad. Right. And I think that is because you have the changelog.com experience. So now if you go to
changelog.com and do exactly the same stuff that you did before, changelog.com, and you click around,
how does it feel now? Instant. Yeah, I mean it's snappy. Versions of instant. Almost instant. Like
some pages feel... I think the news is the one that you notice that like that paint
just takes like a little bit longer, right?
It's not instant, doesn't load instantly, but it's significantly better than if you
were to go to the origin.
Agreed.
And this will be consistent for everyone.
I think that is the advantage of changelog.com actually running through the CDN.
All the requests run through the CDN, even the ones to the website. So the thing is that if it's not in the cache, if it's a cache
miss, for me, it loads the homepage loads in about 300 milliseconds, which is slightly
better than when I go to the origin, but it's not great. Now, obviously, if this is a cache hit, in my case, it loads in under 20
milliseconds or around 20 milliseconds. And 15 times quicker is a noticeable difference.
Sure. So as soon as these things get cached, it's really, really fast. So we would expect this from
a CDN all the time, right? It should consistently behave like this.
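The "15 times quicker" figure follows directly from the two timings quoted above. A minimal Python sketch, assuming roughly 300 ms for a cache miss and 20 ms for a cache hit as measured from Gerhard's location (the helper name `speedup` is hypothetical):

```python
def speedup(miss_ms: float, hit_ms: float) -> float:
    """Return how many times faster a cache hit is than a cache miss."""
    if hit_ms <= 0:
        raise ValueError("hit_ms must be positive")
    return miss_ms / hit_ms


# Timings quoted in the episode; they will differ from other locations.
print(speedup(miss_ms=300, hit_ms=20))  # 15.0
```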
And by the way, title proposal, 15x quicker, maybe.
We'll see.
We'll see, we'll see, right?
We're getting there.
Note taken.
So the problem is that with our current CDN,
75% of homepage requests are cache misses.
So 75% of all requests.
Which is to me, insane.
It is insane, right?
That sounds pretty bad.
So some would say, present company included.
Okay.
It defeats the purpose of a CDN, right?
I would agree.
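To see why a 75% miss rate hurts, the average homepage latency can be estimated as a weighted mean of hit and miss timings. This is a rough sketch, assuming the 20 ms hit and 300 ms miss timings quoted earlier hold for every request, which real traffic will not match exactly; `expected_latency_ms` is a hypothetical helper.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Weighted average latency for a given cache hit ratio (0.0 to 1.0)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# 25% hits (75% misses) on the homepage, using the timings from the episode.
print(expected_latency_ms(0.25, hit_ms=20, miss_ms=300))  # 230.0
```

Under these assumptions the average sits much closer to the miss latency than to the hit latency, which is the sense in which a low hit ratio defeats the purpose of a CDN.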
Yeah, but there's more.
But there's more. There's more.
Tell us, tell us.
So here, this is a question for you, both Adam and Jerod.
What do you think is the percentage of all GET application requests that are cache hits?
How many of all the requests that go to the app, to the origin, do you think are being
served from the cache?
And the options are 15%, 20%, 25%, or 30%.
What is your guess?
Well, I buzzed in thinking it was a game show.
My bad.
I buzzed myself in even.
Go ahead.
You get to go first.
20%, please.
20%, okay, Jerod.
So you just told us that 75% are misses.
Yep.
And that's every type of request.
Now you're asking-
No, no, no, sorry, 70, just the homepage.
Oh, the homepage.
Just the homepage is 75% miss,
which means the homepage is a 25% hit.
Now I'm asking about all the requests to the application origin.
Remember, we have a few origins.
Okay.
Just the application origin? I'm going with the highest possible choice.
30%.
So yeah, 17.93%.
So yes, Adam is closer.
15% would have been accurate.
20.
I think 20 is more accurate because 17.93 is closer to 20.
Sure.
So yeah, I think you were too optimistic.
Because if 30% were cache hits, that would be good.
It's actually 17%.
18%.
18% are cache hits.
Everything else is a miss.
And the window is the last seven days.
The last seven days.
So in the last seven days, only 18% of requests were served from the cache.
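The guessing game above can be checked with a short Python sketch. The request counts below are made up purely to reproduce the 17.93% figure; only the percentage and the four answer options come from the episode, and both helper names are hypothetical.

```python
def hit_ratio_percent(hits: int, total: int) -> float:
    """Cache hit ratio as a percentage of all requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * hits / total


def closest_guess(actual: float, options: list) -> float:
    """Return the option with the smallest absolute distance to the actual value."""
    return min(options, key=lambda option: abs(option - actual))


# Hypothetical counts chosen so the ratio matches the 17.93% quoted in the episode.
actual = hit_ratio_percent(hits=1_793, total=10_000)
print(round(actual, 2), closest_guess(actual, [15, 20, 25, 30]))
```

17.93 is 2.07 away from 20 and 2.93 away from 15, so 20% is the closest of the four options.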
How does this make any sense?
Right.
So October 2023.
This is when we started on this journey.
This was issue 486 in our repo. What is the problem?
Well, after October 8th, 2023, CDN cache misses increased by 7x.
It just happened. We looked into it, we tried to understand it, and we could not.
And it's been ever since then? Or it's like this systematic problem ever since then?
Well, it has been low ever since. So the cache hits have been low to the application specifically ever since,
which is why even when you go through the CDN and you think, right,
things are snappier and they are to some extent, many requests,
they are just cache misses, especially going to the application.
So here we are today.
It's only been three weeks.
So it's only been three weeks.
So let me explain what it means.
So it depends how you count.
OK?
So the thing is that roughly that's
how much time I had to spend on this,
like about three weeks in total.
Gotcha.
Right, spread over like a long period of time.
Right.
So we are just about to unleash our clicks
on pipedream.changelog.com.
Okay.
Bring your mice out and let's do this.
Let's unleash.
Let's see how this feels.
Our clicks.
Pipe dream.
And by the way, anyone can reproduce the same experiment.
Remember to be logged out, that part is important.
Or a private window, because if you have any cookies,
it will bypass the CDN.
That's the rule.
When should I do this?
Right now?
Right now.
Yeah, right now.
Just click around and tell me how it feels.
I mean, I've tested it myself,
but I don't have your experience.
So how does it behave on your side of the world?
So one thing in particular that I noticed
between the two of them right away,
because I clicked into news,
and it seems like there's this paint delay
on the right-hand side.
So we split that viewport on news: the left side is subscribe, the right side is the newsletter.
Very, very cool, but on that right side, the newsletter side, the background color seems to delay paint.
I'm not sure if it's happening here as well as on the last one. That's an iframe, so that's a secondary request. Gotcha.
Okay, so I'm not gonna judge that then. I think what's important
is the whole thing: how does one compare to the other?
Where's the iframing from though?
Iframing from?
From the same site.
The same site, yeah.
It should be, yeah.
So for me, again, for me, when I click on news,
I can see that with the iframe, right,
there's a little bit of a delay.
But when it paints, for me, it paints instantly on pipe dream.
On changelog.com, there's like a little delay across the whole thing. That's at least how I experience it.
Now anyone can reproduce this, and I wonder, how do you perceive the two,
wherever you are in the world? If you click around, these are live links, by the way:
changelog.com and pipedream.changelog.com. They will both
have the same content. And what I'm wondering is, how do you perceive them? Is there a significant
difference? Is it the same? What do you notice? What about you, Jerod? Do you notice
anything different? My experience, specifically on the episode page, which I think is a good
one because it has a lot of,
I just call it first party content,
not even CDN content.
Cause I do, I mean the CDN is a CDN, right?
So I do see the images lazy loading in slightly,
just like they would on the previous one.
However, the first party content, for instance,
I'm on Making DNSimple, podcast 637,
which has all the podcast information, all the chapters,
and then the entire transcript, which is lengthy,
and it loaded in very quickly.
Obviously my browser's not rendering that text
that's off the screen,
but it has to at least download it in the HTML.
So that was very fast.
Other than that, it feels similar to changelog.com.
And it's the images that I do notice load in,
because they're lazy loaded.
They load in a split second later.
Other than that, but yeah,
I think the episode page is a good test
and it's significantly faster.
Okay, so on pipedream.changelog.com,
if you look at the network requests
in your developer tools, you will see that all the static
assets load from cdn2.changelog.com,
which is the pipe dream too.
So everything that we serve, all the origins, whether it's the
assets, whether it's the feeds or the website, it all goes through
the pipe dream.
And the application was changed, that's what we were talking about earlier, we may unpack
that.
The change is to every public URL that we serve, now we have an alternative, which is
all running through the pipe dream.
I'm using httpstat here and
I'm going to https://pipedream.changelog.com. If it's a cache hit, it loads for
me in 25 milliseconds, which is slower than changelog.com. However, if it's stale, it should also return within
25 milliseconds, which is what's happening here. Our content should always be served
from the CDN, regardless if it's fresh or not. And in this case, what we see, if it's already been served once,
it will stay in the cache until there's pressure
on the cache and we control when that is.
We just basically size the cache accordingly.
We give it more memory, and then more of the objects we store
will remain in memory.
And what we want to do is to always serve content
from the CDN whether it's stale or not. So this was a cache hit,
right? You can see there's a cache status header, it was
served from the edge, we see what region it was served
from. By the way, if you were to do a curl request, you'd see
the headers, you would see like all this information, even in
your browser developer tools, open any endpoint and you get this information for every single response.
We see which origin the CDN had to go to in order to fulfill the request.
The TTL, and that's the important
value, reflects how long that object has been stored in the cache. In this case,
it's minus four. It's a negative number, which means that it's considered stale. The default
TTL is set to 60 seconds. Anything that was requested within 60 seconds
is considered fresh. But then we have this other period, this other
value, which is grace, which says: for 24 hours, continue serving this object from the
CDN, but try and fetch it in the background. And also we see that this has been served
from the CDN 26 times already. As I read these headers, these are important, every single
request now has them.
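The TTL and grace behaviour described above can be sketched as a small Python classifier. This is an illustration under stated assumptions, not the CDN's actual configuration: the 60-second TTL and 24-hour grace come from the episode, while the names (`remaining_ttl`, `classify`, `Freshness`) and the choice to fetch from the origin once grace has run out are assumptions made for the example.

```python
from enum import Enum

DEFAULT_TTL_S = 60          # default TTL mentioned in the episode
GRACE_S = 24 * 60 * 60      # 24-hour grace window mentioned in the episode


class Freshness(Enum):
    FRESH = "serve from cache"
    STALE = "serve from cache, refresh in the background"
    EXPIRED = "fetch from origin before responding"  # assumed behaviour


def remaining_ttl(age_s: float, ttl_s: float = DEFAULT_TTL_S) -> float:
    """Seconds of freshness left; negative once the object is past its TTL."""
    return ttl_s - age_s


def classify(age_s: float, ttl_s: float = DEFAULT_TTL_S, grace_s: float = GRACE_S) -> Freshness:
    """Classify a cached object by its age against TTL and grace."""
    remaining = remaining_ttl(age_s, ttl_s)
    if remaining > 0:
        return Freshness.FRESH
    if -remaining <= grace_s:
        return Freshness.STALE
    return Freshness.EXPIRED


# An object cached 64 seconds ago has a TTL of -4, as seen in the response headers.
print(remaining_ttl(64), classify(64).name)
```

An object that is four seconds past its TTL is still well inside the grace window, so it is served immediately while a refresh happens in the background. This is the same idea as stale-while-revalidate in HTTP caching.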
We can see which was a region, which is an edge region. We don't have an origin yet, but we should by the way. The closer you are to the origin, it just says the origin.
Shield, all that we can configure now. What a Shield origin does basically, the CDN instances,
which aren't close to the origin, they will go to the CDN instance which is
closest to the origin. And that's so that we place as little load on the origin as
possible. I don't think that will be a problem for us, but we can do it if you
want to. And the question is, after all these years,
are we holding fly.io right?
What does that mean?
Well, changelog the application has only
been deployed in two regions, right?
Actually one region, and we have two instances.
But we always wanted to have it spread across the world.
The problem with that is how do we connect to the database?
Then you're introducing latency of the database layer. But now these CDN instances, they can
be spread around the world. So that means that finally we're doing this right.
Right. We just put it in front of our app instead of making our app be distributed.
Now we're distributing in front of it. I think so. Yeah. So, shall we see where these instances are running?
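The shield idea mentioned a moment ago, where CDN instances far from the origin go through the CDN instance closest to the origin, can be sketched as follows. The region codes and latency numbers are hypothetical examples, and the helper names `pick_shield` and `miss_path` are made up for illustration.

```python
def pick_shield(latency_to_origin_ms: dict) -> str:
    """Pick the CDN region with the lowest latency to the origin as the shield."""
    return min(latency_to_origin_ms, key=latency_to_origin_ms.get)


def miss_path(region: str, shield: str) -> list:
    """Return the hops a cache miss takes from an edge region to the origin."""
    if region == shield:
        return [region, "origin"]
    # Other regions ask the shield first, so the origin sees fewer requests.
    return [region, shield, "origin"]


# Hypothetical latencies from each CDN region to an origin in Ashburn, Virginia.
latencies = {"iad": 2.0, "lhr": 80.0, "syd": 200.0}
shield = pick_shield(latencies)
print(shield, miss_path("lhr", shield), miss_path("iad", shield))
```

The trade-off is an extra hop for distant regions in exchange for less load on the origin, which matches the remark that this is available if wanted but probably not a problem here.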
Yeah, we'll see it, man.
I'm curious. Are we curious about anything else before we move on to that?
I'm curious about the rollout of this thing because I've noticed a few things this week and I'm wondering if maybe things are pointing at different directions and if that, uh, if that explains some stuff that I've been seeing,
but we can maybe hold that for later.
I think we can talk about that now, just so that it's clear
as we go through this. We never had this situation before, by the way,
where we have two application instances completely separate that are
pointing to the same database, right? So the data is always the same,
but one is going to become the new production
and it's configured in a certain way with a new CDN
and the existing application,
the one that's behind changelog.com
is still consumed by our production CDN.
I mean, we have two CDNs, that's a situation.
Right.
And we can't change the production application
because if we do that, then we have rolled out the new CDN
and we don't know whether we are ready yet.
I think that's what we need to determine today.
What else is left?
How do things look so far?
And just like assess the readiness of the new CDN.
Yeah.
Of the pipe dream.
So what things have you noticed, Jerod, that are off?
So I shipped ChangeLog News Monday afternoon
and that particular episode
has dramatically lower downloads.
So low, in fact, that it has to be a bug
somewhere in the system that's not real.
Like it's not a real number. And I'm wondering if maybe a bunch of podcast apps got pointed
to the new CDN and we're not capturing those logs, which is how we get our stats.
So that was the first thing. I was like, there's no way that this has actually only
been downloaded 700 times or whatever it was.
Yeah.
In the first day.
That was the first thing I noticed there. And you're nodding along,
so you're thinking probably that's the case.
Yeah, I think so.
I think that that's what happened.
If, so depending on which instance picked up the job,
right, like this is all like background jobs,
it must have pushed a different URL than the live one.
So then all those podcasting platforms,
like how would you call them? All the podcasting?
The clients.
I mean, okay, so all the podcasting clients, some of them, maybe all of them, may have picked it up,
but I think if it had been all of them, we would have seen zero downloads.
Yeah, it wasn't all of them.
It was just some of them.
Maybe eventually the other app caught up and started doing things, because we sent out a
bunch of notifications in the background.
Right, because we have multiple instances.
And I think this must be a job queue, right?
Whichever instance picks up the job,
basically it puts its own URL and then ships it.
So it means that we are in production
without wanting to be.
Damn.
Yeah.
OK.
So, I mean, assuming that all those clients got their podcast episode, then it works.
But we have no way of knowing. So, if our listener here didn't get Monday's news episode for some reason,
let us know.
Oh, no. They did.
Well, they might have.
I mean, the URL is correct, but they are going to the new application instance, which we're not tracking, and which goes through the new
CDN. Same data. Yeah, same data. Just different applications. The data will be the same. Okay.
Let me tell you the other thing I've noticed. Okay, go on. Let's debug. Live debugging. So that's the one
you already know about, which is that, and this is probably the exact same issue,
yeah, when we posted our auto posts to,
I think Slack in this case,
it posted the app instance URL, not the changelog.com URL.
It might have been both.
Actually it was both.
Yeah, it was both.
And so there was a URL mismatch,
which I think is the exact same issue.
And then the third one is that I subscribe
to all of our feeds,
because I want to make sure they all work.
And so whenever we ship an episode,
I get like five versions, you know,
just padding our stats,
getting five downloads for the price of one.
And specifically the /interviews feed.
So yesterday's show with Nathan Sobo,
two days back by the time we ship this,
but yesterday as we record,
it went out and I downloaded on the changelog feed
and I downloaded on the plus plus feed
and I didn't download it on my interviews only feed
because you can just get the interviews if you want.
And that feed did not have that episode
until this morning when I logged in
and said, refresh the feed.
And I forced it to refresh that feed and then I got it.
And so there's, and again, that's probably,
those are background jobs.
So somehow that did not get refreshed.
So that's the third thing.
Okay.
The fourth one.
Okay.
There's four?
Yesterday, I disabled Slack notifications entirely.
And this is our last step to cut entirely over to Zulip.
And I have a blog post which is going out
announcing that we're no longer on Slack.
Don't go there.
However, after Adam shipped that episode,
it posted the new notification into Slack,
even though that code doesn't exist anymore.
And I deployed it.
And so I'm guessing it still exists on your disk.
Your experimental one is not keeping up with code changes.
Yeah.
Okay. That's correct.
So all my bugs are related to this very exciting
new deployment that I didn't know about.
We broke it.
I don't think we Kaizen-ed.
I think we broke it.
I think so.
Yeah, yeah, yeah.
I think we broke it.
So those are the four things I've noticed.
No, sorry. I broke it.
Let me take responsibility for this one.
Yeah, that's much more fair. I had nothing to do with that.
Well, friends, I'm here with Terrence Lee talking about what's coming for the next
generation of Heroku. They're calling this next gen Fir.
Terrence, one of the biggest moves for Fir in this next generation of Heroku
is that it's being built on open standards and cloud native. What can you share about this journey?
If you look at the last half a decade or so,
like there's been a lot that has changed in the industry.
A lot of the 12 factorisms that have been popularized
and are well accepted even outside the Ruby community
are things that are, I think, table stakes
for building modern applications, right?
And so being able to take all those things
from kind of 10, 14 years ago, being able
to revisit and be like, okay, we helped popularize a lot of these things.
We now don't need to be our own island of this stuff.
And it's just better to be part of the broader ecosystem.
Like you said, since Heroku's existence, there's been people who've been trying to rebuild
Heroku.
I feel like there's a good Kelsey quote, when are we going to stop trying to rebuild Heroku?
It's like people keep trying to like build their own version of Heroku internally at their own company,
let alone the public offerings out there. I feel like Heroku's been the gold standard.
Yeah, I think it's the gold standard because there's a thing that Heroku's hit, this piece
of magic around developer experience, but giving you enough flexibility and power to do what you need to do. Okay, so part of Fir and this next generation of Heroku is adding support for .NET.
What can you share about that? Why .NET and why now?
I think if you look at .NET over the last decade, it's changed a lot. .NET is known for being this
Windows-only platform. You have WinForms, you use it to build Windows stuff, IIS,
and it's moved well beyond that over the last decade. You can build .NET on Linux, on
Mac. There's this whole cross-platform open source ecosystem, and it's
become this juggernaut of an ecosystem around it. We've gotten this
ask to support .NET for a long time. It isn't a new ask, and regardless of our
support of it, people have been running .NET on Heroku
in production today.
There's been a Mono buildpack since the early days,
when you couldn't run .NET on Linux,
and now with .NET Core, the fact that it's cross-platform,
there's a .NET Core buildpack that people are using
to run their apps on Heroku.
The kind of shift now is to take it from that
to a first class citizen.
And so what that means for Heroku is
we have this languages team, and we're now staffing someone to basically live, breathe, and eat being
a .NET person, right? Someone from the community that we've plucked to be this person to provide
that day-zero support for the language and runtimes that you expect, like we have for all of our
languages, right? To answer your support and deal with all those things when you open support tickets on Heroku, and all the documentation that you expect for having quality language support
in the platform. In addition to that, one of the things that it means to be first class is that
when we are building out new features and things, it is now one of the languages as part of this
ecosystem that we're going to test and make sure runs smoothly, right? So you can get this kind of end-to-end experience.
You can go to Dev Center, there's a .NET icon
to find all the .NET documentation,
take your app, create a new Heroku app,
run git push heroku main, and you're off to the races.
So with the coming release of Fir
and this next generation of Heroku,
.NET is officially a first-class language on the platform:
dedicated support, dedicated documentation, all the things.
If you haven't yet, go to heroku.com/changelogpodcast and get excited about what's
to come for Heroku.
Once again, heroku.com/changelogpodcast.
Okay, so let's talk through this in terms of what a potential fix would look like.
We have a new application instance which behaves as production for all purposes, right?
The content is exactly as in production, it connects to the same database instance,
it has all the same data.
What isn't happening is the code updates aren't going out
automatically.
That has not been wired because my assumption was,
I will only deploy this one instance.
I'm going to change a couple of properties
so it has the new CDN configured.
And I'll see how it behaves the whole stack in isolation.
What happened, obviously, is the new instance is consuming the same jobs,
the same background jobs as the existing production.
So very helpfully it has sent the new links, which are all temporary, especially
the application links, the ones that you've seen in Zulip and a couple of
other places, which are just for the application origin, and they are only
meant to be there for the CDN.
Everything should go through the CDN, but the CDN hasn't been configured yet for
everything, because that's where the test comes in.
How does the application behave?
So some links need to be application links.
How does the CDN behave?
So on and so forth.
So in this case, we need to somehow fix those links, the ones that went out and are
incorrect. I'm not sure whether we know what they are, and if not, then we need to
make this new experimental application instance not
consume jobs, not process any background jobs. We just need to disable
Oban in that one.
Perfect. And then it would never get invoked unless you manually go to the website, right?
Yeah.
And then we want to make sure that nothing crawls it.
Yes.
That's a good one.
Because then they'll start sending traffic to its endpoints instead of our main website.
So let's do that, tout de suite.
So I think, yeah, I think we're finished with the recording.
Let's go and do it.
No, no, we haven't.
Don't worry.
This is still going.
Okay.
So yeah.
But that-
Those two changes I think will mitigate the current issues.
Yes.
Yeah.
Sounds about right.
Okay.
So that makes me happy
as long as we get those rolled out here.
We figured it out.
We figured what the issues are.
All right.
So what do I want to do now?
I think I would like to see how many Pipely instances we're running.
All over the world.
Okay.
And for this, I'm going to use a new terminal utility, which I found, and I was like, yes, this is exactly what I was missing. It's
called fly radar. This is what it looks like.
It's all ncurses-based, it's all happening in my terminal, it's beautiful.
Oh, fly radar 0 to 1. I can see all the changelog applications. The one that
we're going to look at is the CDN. So by the way, the two applications,
do you see this one?
The changelog 2025-05-05 is a new application instance
that was deployed three days ago,
while the one above, the 2024,
that is the current production.
And that was updated one hour ago.
So the code will differ.
And the Slack notifications, if this application instance
picks up a job, it will do whatever it's configured to do,
which will be the wrong thing.
Another thing we can do briefly before we figure that out
is we could just redeploy that one,
so at least it's current.
Yes.
And it won't do any Slack notifications.
Because I definitely don't want to say,
we're no longer doing Slack notifications,
and have another one come in, and I'll have egg on my face.
As soon as we stop recording, I'll go and do that.
Not a problem.
Okay.
So let's have a look at the CDN 2025-02-25, which is the
instance when it was deployed and it has had a few updates.
What do we see?
We see 10 instances.
You see the region and you see it's been updated one day ago.
I see Sydney.
Is that right?
I see Chicago.
Yes.
LHR.
Is that the Virginia one?
Heathrow.
Oh, London Heathrow.
Of course.
Yes.
These are the airports, by the way.
JNB.
That's...
Johannesburg.
Johannesburg.
Very good.
San Jose?
Yes, correct.
Okay, IAD, that's the Virginia one.
That's the one.
SIN. Adam, you wanna guess these?
SIN?
I know DFW.
What is DFW?
Well, that's where you live.
Dallas, Fort Worth, and I think France,
the FRA is probably France, is my assumption.
France, what's SIN?
No, that's actually Frankfurt. Oh, okay, Germany.
Singapore. Singapore.
Yeah, man, I keep seeing this. Okay, and SCL. I don't know what SCL is. Come on.
I don't know. All right, let's do flyctl
platform, I think, regions, I think,
and the regions list, there we go.
There we see what they are.
SCL, Santiago, Chile.
Yeah, that's the one, SCL, yeah, that's the one, Santiago.
That was it, SCL.
That's how we see what the regions are.
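For reference, the ten region codes called out in this stretch of the conversation can be collected into a small lookup table. This is only a sketch: the codes "syd", "ord" and "sjc" are assumed from the airport-style naming, since only the city names are spoken, and the helper name is made up for illustration.

```python
# Region codes mentioned in this segment, keyed by the airport-style code.
# "syd", "ord" and "sjc" are assumptions: only the city names are spoken.
CDN_REGIONS = {
    "syd": "Sydney",
    "ord": "Chicago",
    "lhr": "London (Heathrow)",
    "jnb": "Johannesburg",
    "sjc": "San Jose",
    "iad": "Virginia",
    "sin": "Singapore",
    "dfw": "Dallas (Fort Worth)",
    "fra": "Frankfurt",
    "scl": "Santiago",
}


def describe(code: str) -> str:
    """Return a readable name for a region code, or a placeholder if unknown."""
    return CDN_REGIONS.get(code.lower(), f"unknown region {code!r}")


print(len(CDN_REGIONS), "regions")
print("FRA ->", describe("FRA"))
```

The lookup is case-insensitive, so codes can be pasted in as they appear in a region listing.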
Cool.
That's cool, man.
Yeah, go over there in Australia or New Zealand
or something.
Well, we do have Sydney.
So, yeah.
Oh, that's true.
We can add more. I mean, we have 10, but we can add more.
No, no, no, Sydney covered that. I just forgot about Sydney.
Yeah, so you've seen all the machines.
And in terms of other uses, like it has logs, alpha logs.
So this is something that's really cool.
So these are the logs for the new.
Let's see what logs we have, what requests we have flowing to the new changelog instance.
This is a cool TUI. Congrats to the fly radar
coder author person. This is cool. It reminds me of K9s.
Yeah, exactly. That's exactly it. Yeah, that's exactly it. Oh, look, we have some requests.
Robots. Robots got some requests and the homepage got some requests.
And this is IAD, IAD. So we can see what instances were requested.
So now let's go to the...
I'm not liking these requests, Gerhard.
How can we get requests?
Yeah, well, we will be getting, because we have the CDN,
we have monitors set up, we have a bunch of things.
Now these are the requests going to the existing.
You can see there's a lot more traffic
going to the existing application.
If you ask me, there's too much traffic.
The CDN is not doing its job.
That's what we're trying to fix.
There's way too many requests hitting it.
And you can see that the regions, right?
We have two regions.
EWR.
Adam, what does EWR stand for?
Do you know?
Ooh, why?
Right on?
Right on.
OK, yeah, perfect. That's exactly what it is.
That's right on, man.
Yeah. So we can focus only on specific instances to see the logs.
So I think this is really cool.
So we've seen this.
Let's move on. Furkankly, fly radar, Furkankly.
Whoa. That's quite the day for Furkankly. Yeah, so he built this.
I think it's a really cool tool. You can go and check it out on GitHub. It's all written in Rust.
So it's really, really fast. It's a terminal UI and it was inspired by K9s.
Oh, look at that.
Yeah, yeah, that's it. So issue five. March 22nd. That's when I, I just stumbled across it.
So I captured it and go and check it out.
But it was really cool.
Like when I saw fly radar, I thought, wow, this is exactly what I
wanted. Anyway, back to the Pipedream.
So which backend do you think serves the most requested URL?
Now the question: we have three backends or three origins.
Right.
You have the application origin,
There's a feeds backend and the assets backend.
So in the last seven days,
which backend served the most requested URL?
Like the one top URL.
The one top URL exactly.
Which one serves that particular.
That's the question. Yes. Okay
There's only three possible answers.
Yeah, I'm gonna go with feeds.
Same, feeds.
So apparently we're serving this podcast original image about 10,000 times per day,
or roughly once every 10 seconds. That's the assets endpoint. I had to check what it was.
Yeah, it is assets. Yeah, it's our album art.
The change log.
The change log.
So the answer was assets actually.
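The "once every 10 seconds" figure is a rounding of simple arithmetic on the 10,000 requests per day quoted above. A minimal sketch (the function name is made up for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_between_requests(requests_per_day: float) -> float:
    """Average gap between requests for a given daily request count."""
    return SECONDS_PER_DAY / requests_per_day


gap = seconds_between_requests(10_000)
print(f"{gap:.2f} seconds between requests")
```

At 10,000 requests per day the average gap works out to 8.64 seconds.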
I guess that makes some sense because everyone has to download that into their podcast app
all the time.
Yeah.
Cache that sucker.
Come on.
I know, right?
Do a better thing.
Yeah, do a better job with caching it.
That would be a good thing.
So, but honestly, it was the second one.
I guess feed. So we were almost correct if it wasn't for that one image.
I'm wondering, how does the new CDN behave for our most
requested URL, which is not a static asset? So how does it
behave for podcast feed? I'm going to run three commands,
actually, a few more than three.
The recording has been done, so if anything doesn't work as it should, we'll
switch back to the recording, but that's going to be a backup. Alright, so let's go
back into the terminal and we'll experience this firsthand just to see
what it feels like. So I'm in the Pipely repository and the first
command which I'm going to run is just debug. And by the way, anyone should be able to clone the repository and do exactly what I do.
What's happening here is behind the scenes, it is building everything that we need for the CDN, including the debug tooling,
and it will run it locally. And the TUI that you see here, because it is a TUI,
it has a couple of shortcuts, is Dagger.
So all this is wrapped into Dagger.
So I have a terminal opened in Pipely, all running locally.
All right.
So the first thing which I'm going to do
is I'm going to benchmark the current CDN, changelog.com.
So I'll do just bench CDN.
All this is wired together, sending a thousand requests to the feed endpoint.
And this is what we see.
So the current CDN serves about 300 requests per second.
And it's the size that is the interesting one.
The size is about 220 megabytes per second.
So I think that the CDN is faster.
But the bottleneck here is my 2 gigabit home connection.
And this is as much as I can benchmark it.
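To see why the home connection is the bottleneck here, it helps to convert the benchmark's megabytes per second into gigabits per second. The sketch below assumes decimal units (1 megabyte = 1,000,000 bytes); if the benchmark tool reports mebibytes, the result would be a few percent higher.

```python
def megabytes_to_gigabits(mb_per_second: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_second * 8 / 1000


print(f"{megabytes_to_gigabits(220):.2f} Gbit/s")
```

Roughly 220 megabytes per second is about 1.76 gigabits per second, which is close to the ceiling of a 2 gigabit link.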
So that's the limit. So if we were to benchmark using the same connection, bench CDN 2, this will go to the Pipedream feed.
This is how that behaves, and by the way, this is live, real traffic that's happening here.
So 177, and 132 megabytes per second.
So what do you think is happening here?
If you had to guess.
Well, my guess would be that it's not as much bandwidth as Fastly has.
That is correct.
Yes.
So I'm looking at fly here, right?
And this is the CDN instance.
We have the different instances.
Do you see here like London Heathrow?
That is the one that lit up, lit up in response to me sending it a lot of traffic
and you can even see it here, right?
If I do London Heathrow,
you can see that's the one that was serving
the most bandwidth.
And actually what I've hit is the 1.25 gigabit limit
of this one instance.
And that's just a constraint of the actual instance on fly.
Like that particular fly VM or whatever they're
called.
That is correct. Yeah, exactly. So if I do flyctl machines list, you'll
see that, and let me just do an rg on LHR. You'll see that we have like a single instance
in Heathrow, we could run more. And that's what we're going to do here to see if running
more instances will increase the bandwidth. So I'm going to do, let's do flyctl scale count three. And I'm saying
we're just basically going to run three instances in the
Heathrow region. The reason why we don't do this is we'll just
add cost. When we are in production, we may need to do
this because some areas may be running hotter than
others. So we may need to scale it accordingly. But right now,
every single region has one instance only. So let me do machines list. So what I want to see is they
are all started and they're all running the health check. There's one. Yep. These are
all good. Yep. Everything is nice and healthy. So now let's go back and let's run the same benchmark. And you'll see it live.
OK.
So it still has the same 1,000 requests to the feed endpoint.
And 180, so just about the same.
Not much has changed.
It takes a while for everything to warm up
and the request to be spread correctly.
We've seen there a blip.
So let's see.
How does it behave now?
OK, so we're at 150 megabytes per second. If we run this
a few more times so that everything is nice and spread.
Requests per second, right? You said megabytes per second.
That's requests per second. So this is 171
megabytes per second, which is almost like 1.7 gigabits.
And for requests we have 228. So with these three instances, that's what we see. And if we run this enough times, when I tested this last time
I was able to get to about two gigabits, but it's not
an exact result every single time. It's based on network conditions, based on a bunch of things, based on where those instances
are placed within the Fly network. But with three instances, and even if I added more, I've
seen there was this limit, obviously, the two gigabits.
Well you max out eventually, right?
Exactly. I max out eventually. I'm still not maxed out currently. And the reason why I know
that is because if I bench CDN 2, I can see that that brings me close to that 2 gigabits, right, 220.
CDN 1, this is Fastly.
CDN 1, yeah, this is changelog.com, this is Fastly,
that's correct, and it's those 300 and something
requests per second.
So Fastly is still faster because we haven't added
enough instances in your region in order to get our bandwidth
up on Fly to max out Gerhard's personal bandwidth.
Exactly.
Exactly.
So adding instances doesn't really move the needle very much, but it does move it
eventually if you really want it to.
Exactly.
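One way to reason about the exchange above is as a simple ceiling: the best case is the smaller of the combined instance limits and the client's own link. The sketch below is an idealized model, not a description of how Fly actually distributes traffic; the 1.25 gigabit per-instance figure and the 2 gigabit home connection are the numbers quoted in the conversation, and the function name is made up for illustration.

```python
def throughput_ceiling(instances: int, per_instance_gbit: float, client_gbit: float) -> float:
    """Upper bound on throughput: limited by the instances or the client link."""
    return min(instances * per_instance_gbit, client_gbit)


for count in (1, 2, 3):
    ceiling = throughput_ceiling(count, per_instance_gbit=1.25, client_gbit=2.0)
    print(f"{count} instance(s): at most {ceiling:.2f} Gbit/s")
```

In this model a second instance already moves the ceiling to the client's 2 gigabits, so further instances cannot help a single benchmarking client. Measured numbers will sit below the ceiling, as the runs above show.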
So this is, this is maybe even a question to the fly team.
So when it comes to the instances, if I look at what instances we provisioned, you
can see that we are running shared CPU 2x and they get two gigabytes of RAM.
The question is, and I think we kind of touched upon this last time, even with the performance
instances we don't seem to be getting more bandwidth.
There is a point at which an instance doesn't get more traffic.
And depending on maybe the region's capacity,
maybe there is some sort of limit that we're hitting.
Now, do you remember bunny?
Yeah.
Yeah, okay.
That's super fast.
We can bench Bunny, which is still alive.
Bunch Bunny or bench Bunny?
Bench Bunny, we can bench Bunny and Bunny will go,
and this is how that behaves.
Bunny changelog.
Bunny doesn't let you, right?
Exactly.
So they rate limit me.
So I can't benchmark Bunny.
You think that's because they don't want to be benchmarked
or you think it's because they're just fighting off?
I think it's throttling, yeah.
They are throttling.
So bunny.changelog.com,
and I have been benchmarking them quite a bit
in preparation for this.
My IP might be blacklisted somewhere on the Bunny side. Yeah. But that's the reality. Cool.
It should be able to get like some sort of like pass like,
Hey, I'm a developer and I'm testing things because
benchmarking.
Of course. Yeah, I think, I think so. I think so. Cool.
Okay. So I'm wondering, if I had a 100 gigabit internet connection,
and one day, and this is a fact, one day,
I will have that internet connection.
And Fly did too, right?
Because remember Fly, I mean, in this case,
Fly is the bottleneck.
Correct.
What could we expect from Pipedream?
Just up runs the whole of Pipedream locally.
Okay, so now you got no network.
No network, exactly.
It's just like everything is running on the same host
and you can see that this is actually forwarding traffic
to the feeds endpoint, to the static endpoint,
to even the application origin.
This is like all of our features.
So it's all here, right?
It's all here.
It's all here, so let's do bench feed and let's see what we get.
Oh, we're getting massive amounts of that's 200,000 requests.
That is 200,000 requests.
Yes, it's more.
What do you see in data?
Can you read that out for us?
85 gigabytes.
That was a bit silly, but yes, it's every 10 seconds.
So now it's switched because we had so many requests.
The scale switched from one second to every 10 seconds.
And this is what we see.
We are pushing 11,000 requests per second.
And we're transferring eight gigabytes.
Not gigabits, gigabytes per second.
So if we had a really fast network, we could saturate
close to 100 gigabit.
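The request rates and byte rates quoted in the two benchmarks can be cross-checked against each other, because throughput is just requests per second times the size of each response. The sketch below uses the rounded spoken figures (about 300 requests per second at 220 megabytes per second against the existing CDN, and 11,000 requests per second at 8 gigabytes per second locally), so the results are approximate; the function name is made up for illustration.

```python
def implied_response_kb(megabytes_per_second: float, requests_per_second: float) -> float:
    """Average response size in kilobytes implied by a byte rate and a request rate."""
    return megabytes_per_second * 1000 / requests_per_second


existing_cdn_kb = implied_response_kb(220, 300)
local_kb = implied_response_kb(8_000, 11_000)
print(f"existing CDN: ~{existing_cdn_kb:.0f} KB per response")
print(f"local run:    ~{local_kb:.0f} KB per response")
```

Both runs imply a feed response of roughly 730 kilobytes, which suggests the two benchmarks are moving the same payload and differ only in how fast it can be delivered.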
That's insane.
So the software works.
And that's just a credit to Varnish, right?
Pretty much. It really, really works.
When you hold it right.
When you hold it right.
And you don't have a network.
Exactly. Well, you have a hundred,
you need to have a hundred gigabit connection. So that's, I think that's the hard part. And Fly
needs to, Fly needs to have, or whatever provider we run, it needs to have more network capacity
because right now my internet is faster than what the Fly instance does, and I can't saturate it. And we've seen that, because I can saturate Fastly.
Cool.
So, and I think the interesting thing, which I haven't shown you, and
I could, because it's behind me, but anyway, that's not very
visible, what I would like to show is basically I'm hitting the limit of my
CPU, right?
Like where I'm running this benchmark is a 16 core machine and I'm running both
Varnish and the benchmark client, oha in this case. And between the two of them,
they're saturating 16 cores. And that's what we see here. So the bottleneck really is the
CPU. It could go faster because again, networking is just all in the kernel.
So Pipedream and Pipely is an iceberg, and we explored just
the tip of it. So most of it is underwater.
Are you talking about lines of code?
No, I'm talking about many things, but let's go.
I'm wondering how many lines my 20 lines have ballooned to at this point.
It's there.
It's there.
That thing is coming up.
So yeah, stay tuned.
Stay tuned.
So VTC stands for Varnish Test Case.
And Pontus Algren, one of our Kaizen listeners, mentioned this in a Zulip message back in
December 2024.
So he said regarding the testing of VCL, did you consider the built-in test tool VTC?
So you were doing something else previously.
I can't remember what you were doing.
We are still doing that, but I'm also doing this.
So I'm just going to play the recording.
Okay. It's just easier.
So just test VTC is going to run, in three seconds,
all the tests for the different Varnish configuration
that we have for the Pipedream.
Cool.
This is really, really fast.
This is the equivalent to your unit tests, if you wish.
Weren't you running the tests against
like production instances last time?
I was.
And I still am. Now you have to do that.
You still are?
I still am.
Yes.
Why wouldn't you replace it?
Hang on.
Just give it a minute.
So we're getting there.
So this is, so this is what the VTC looks like.
And basically you can, you can control it at a very low level in terms of the
requests, the responses, the little branching.
So think of it when you're trying to come up with a final varnish, right?
You make like little experiments to see how the different pieces of configuration
would work.
And that's what VTC enables you to do.
You can write a subset of your VCL.
You can configure clients, you can configure servers, and you can make them
do things in an isolated way, in a very quick way.
You can basically model what the thing is going to look like,
and you're going to check if what you thought would happen does happen.
And that's what makes it really, really fast. And it's all built into the language.
So it's there, and we have it,
and it gives me a nice tool to figure out what is the minimal set of Varnish config I have to write for this.
And I think this is where like that number of lines of code and number of lines of config
comes in. But we all know that we want acceptance tests. We want to see what users will experience.
And remember, this is what you were asking for, Jerod. You were saying, how do we know that this new thing is going to behave exactly the same way as the existing thing behaves?
So what we now have is, you see, the test acceptance.
These are all the various things that we can run in the context of Pipely. We can do test acceptance CDN,
test acceptance CDN 2, or test acceptance local.
And this is using Hurl,
and we're describing the different scenarios
that you want to test for real,
testing these real endpoints.
Which one would you like us to try out?
Local.
Local, great.
So what I've heard is changelog.
That's exactly what I said.
Changelog, okay.
So what I've heard is...
I don't know why you even ask.
Well, you have to have a bit of fun.
So test acceptance CDN and test acceptance CDN is going to run the same tests against the CDN.
It's going to test the correctness of our CDN.
Not using VTC though.
Say again?
Not using the VTC stuff.
No, this is Hurl.
This is Hurl stuff.
So different tests.
Exactly.
This is like a different level.
The VTC stuff is just for the Varnish config.
Hurl, in this case, the acceptance tests, are doing real requests and checking the behavior
of the real endpoints.
Like for example, am I getting the correct headers back?
Am I being redirected?
Is this returning within a certain amount of time?
What happens if I do this request twice?
How does it behave?
Is it a miss versus a hit?
What happens?
So we have 30 requests that we fire
against the existing CDN and we see how it behaves.
And then what we're going to do, we're going to run the same requests against the new CDN
and it's slow.
Why do you think it's slow?
Well, I don't know what these tests are doing.
So I can't answer that question.
So these tests are checking the behavior of the various endpoints, for example, the feed
endpoints or the admin endpoint or the static assets endpoint. In this case, you can see
that we are waiting for the feed endpoint. So if you go back and you think about the
various delays and the stale versus miss, we are checking how the stale
properties of the
feed responses behave. So if I'm going to hit this endpoint
within 60 seconds, will it show up as stale? So we're
literally checking, and we have to wait to see: will it expire, will it refresh?
So you're delaying on purpose to see.
Exactly, I'm delaying it on purpose, and it takes about 70 seconds because we need to wait
that long to test the staleness. And by the way, that's something which I'm going to do next.
So we're going to check the staleness of something, and the staleness is currently set to 60 seconds. And you can see we can do the variable delay. So this is the
real CDN. We're going to pipe dream. We're not testing the local one. We're testing the
pipe dream one. And this is the existing configuration, which we consider to be production. Now you
said local, and now we can do the same test. We're going to run them against local,
and we're going to change a couple of properties
because locally we want slightly different behavior,
and what we care about is that speed, right?
We want these tests to be much, much quicker.
And in this case, you can see like the actual requests
going through, you can see the responses,
you can see the headers.
We still are testing delays,
but the delays are much shorter,
which means that the test
will complete much, much quicker. So we control these variables. The production is just like,
you know, as it is. This is how it behaves and that's what we're testing. So it will
be slightly slower. Shall we do it for real? Would you like me to try to run another test
and see how it behaves if I do the acceptance local or shall we move on to something else?
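The reason the local run can be so much quicker is that the staleness window is a variable: the checks are the same, only the wait changes. The toy model below illustrates that idea with a fake clock and a tiny in-memory cache. It is not Pipely's or Varnish's implementation, and the class names, the "/feed" path and the MISS/HIT/STALE labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClock:
    """A clock the test controls, so no real waiting is needed."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class TinyCache:
    """Toy cache: remembers when each URL was stored and reports its state."""
    ttl: float
    clock: FakeClock
    _stored_at: dict = field(default_factory=dict)

    def request(self, url: str) -> str:
        stored = self._stored_at.get(url)
        if stored is None:
            self._stored_at[url] = self.clock.now
            return "MISS"
        if self.clock.now - stored <= self.ttl:
            return "HIT"
        # Past the TTL: report the old copy as stale and refresh it for the next caller.
        self._stored_at[url] = self.clock.now
        return "STALE"


def check_feed_staleness(ttl: float) -> list[str]:
    """Request twice, wait just past the TTL, then request twice more."""
    clock = FakeClock()
    cache = TinyCache(ttl=ttl, clock=clock)
    results = [cache.request("/feed"), cache.request("/feed")]
    clock.advance(ttl + 1)
    results += [cache.request("/feed"), cache.request("/feed")]
    return results


print("production-like ttl=60:", check_feed_staleness(60))
print("local ttl=5:           ", check_feed_staleness(5))
```

Both runs produce the same sequence of states. Against a real endpoint the first would need a wait of over a minute, while the second needs only a few seconds, which is the trade the local configuration makes.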
What is the conclusion from that?
Like conclude some things for me.
Well, the conclusion is that we are able to run the CDN
locally and poke and prod it, and make sure that the CDN
in this case is behaving exactly as we expect it to.
We have a controlled way of configuring everything. What I mean by that, I mean
the backends, the various backends that we use. We have properties to control
like TTL, staleness, freshness, and see how different configurations change the
behavior of the system. We also have it deployed and we can check if the
existing CDN behaves the same as the new CDN.
I haven't written all the tests, only like the big ones,
does the feed endpoint behave correctly?
Do the static assets behave correctly?
What about the admin endpoints or those that shouldn't be
cached, do they behave correctly?
So I'm starting to build a set of endpoints
and set of tests that check how those endpoints behave.
And there's certain differences, right? Like one CDN behaves slightly differently.
We know like the existing one, right, that we're trying to improve on.
So we can see where does it fall short.
There's a couple of interesting things that we can look at. For example,
I've seen that we, for example, don't cache the JSON variant
of the feed of the RSS.
Maybe we'd want to do that.
I don't know.
But going through this, like testing
the correctness of the system, made me look into parts
that I wouldn't normally look.
The best part is that we can run this locally.
We are in full control of everything that happens in our CDN.
It's a lot of responsibility and it takes a certain level of understanding to know what the
tools are and how they fit together, but we have it.
Yeah, that's awesome because now we don't have to
just poke at a VCL in the sky and hope that it does
what it does and just only test in production.
You know?
You can actually make changes with confidence.
Is that a state of the art for any of the CDNs out there?
Like can you do this level of acceptance test between,
I guess you probably can't, right?
We can't run Fastly locally, we can't run even Bunny locally, we can only run our own thing locally. So you can't
really test the way you'd develop it locally and then develop it in
production, but you can test, you know, XYZ CDN versus Pipely or Pipedream,
right? You can test that. That's what you're doing right now.
I think the first step is being able to run it locally,
and running anything of that magnitude locally is hard.
Let me rephrase that.
I would say if you are unhappy with your CDN provider,
thus far has there been a way to say,
what the original question was,
can we trust moving to something else?
In this case, the something else is something we've built, not a different
public provider.
And so we're scrutinizing a little bit more, but if you were unhappy with, you
know, one CDN and you were thinking, man, I want to move to a different one.
Has there been a state of the art to test the, I guess, the efficacy
between different CDNs.
Has this tooling been there before?
I'm not aware if it has.
If someone from our listeners is aware
of such tooling existing, I'd love to learn about that.
I think it pretty much comes down to DIY,
as in how much of the correctness of the system
are you testing for?
And in this case, even though it is a CDN,
it is part of our system, right?
Because it determines how the changelog website and the application and all the origins behave
ultimately. How do users perceive them? And the best thing that we have, honestly, are
the logs. Because based on the logs, you can see what users experience. But is that good
enough? I mean, these systems are really big, right? Like global scale big. It's
really hard, for example, even for me, I mean, sure, I could
force and test every single endpoint. But when I'm
running these tests, right, when I'm, for example,
testing changelog.com, I'm testing whichever location I'm
closest to, based on the network conditions, based on whatever's
happening. And I need to encode certain properties I care about to check that they are behaving correctly.
The same tooling could be used for any other CDN.
So once we encode the things that we care about in terms of the correctness of the system,
let's say that one day we migrate to Cloudflare.
If we did that, we would run the same set of acceptance tests against Cloudflare or whatever we're building there and see does this
thing behave the same as the thing that we're migrating from. So there are like
these harnesses that we are required to have to make sure that the systems
behave correctly because they're big complicated systems and most of them are
beyond our control as we've learned over the years.
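The harness idea described here, one set of checks that can be pointed at any CDN, can be sketched in a few lines. This is an illustration of the pattern only, not the project's Hurl suite: the `Response` shape, the `x-cache` header name, the paths and both fake targets are hypothetical stand-ins so the example runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


Fetch = Callable[[str], Response]                     # turns a path into a response
Check = Tuple[str, str, Callable[[Response], bool]]   # (name, path, predicate)

# The properties we care about, written once and reused for every target.
CHECKS: List[Check] = [
    ("feed responds OK", "/feed", lambda r: r.status == 200),
    ("admin is never a cache hit", "/admin",
     lambda r: r.headers.get("x-cache", "").upper() != "HIT"),
]


def run_acceptance(fetch: Fetch, checks: List[Check] = CHECKS) -> List[str]:
    """Run every check against one target and return the names of the failures."""
    return [name for name, path, ok in checks if not ok(fetch(path))]


def fake_existing_cdn(path: str) -> Response:
    return Response(200, {"x-cache": "MISS" if path == "/admin" else "HIT"})


def fake_new_cdn(path: str) -> Response:
    # Deliberately caches /admin so the harness has something to catch.
    return Response(200, {"x-cache": "HIT"})


for label, fetch in [("existing", fake_existing_cdn), ("new", fake_new_cdn)]:
    print(label, "failures:", run_acceptance(fetch))
```

Swapping in a different provider means writing a new `fetch` function; the list of checks stays untouched, which is what makes the comparison between the old and new system meaningful.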
Does that answer your question, Adam?
Kind of.
I mean, I think it does.
I think what I was pointing at or potentially trying to uncover is the potential of, you
know, we're all allergic to vendor lock-in essentially. I wonder
if there's a level of vendor lock-in because you don't know unless you make the move,
and it's hard as a developer, an IC, or even a VP,
to say, we've got to make this change.
We've got to move to a different platform because of X, Y,
and Z, and whatever their data is, whatever their reasons are.
And I wonder how many people or how many teams
are staying where they're at because they
have fear of the unknown.
The unknown is that they can't test to this degree,
this acceptance level.
I mean, yeah, that is real.
I mean, just think about the journey that we had to take
to get to the point where we are today.
It took a lot of effort, took a lot of time.
It took a lot of understanding what even are the components.
And we could have picked something else.
We didn't have to pick varnish, but we didn't want to at least I didn't want to change too much at once. One day we may
replace Varnish. It is possible. The real value is in understanding what the pieces are
and how they fit together, whatever those pieces are, whether it's Kubernetes, whether
it's a PaaS, whether it doesn't really matter. It's a database. Take your pick. Each context
is different. So then how do you go about understanding what the pieces are?
How do they interact?
And how do you ensure, I think this is coming back to where we started,
how do you ensure that what we do does genuinely improve things?
And that is the hard part.
Being able to measure correctly,
being able to understand what improvement even means in the first place is really hard.
And what trade-offs are you okay to make?
We take a lot of responsibility by running this ourselves.
And I'm very aware of that.
I think that is really like the hard part.
Being confident that you can pull this off, having the experience that you can pull it off, and you can learn anything that you're missing. And if you apply those principles to whichever context you operate in, you'll be
good. It won't be easy, but you'll have learned so much.
Well, friends, I'm here with a good friend
of mine, David Hsu, the founder and CEO of Retool.
So David, I know so many developers who use Retool to solve problems, but I'm curious, help me to understand the
specific user, the particular developer who is just loving Retool. Who's your ideal user?
Yeah, so for us, the ideal user of Retool is someone whose goal first and foremost is to either deliver value to the business or to be effective.
Where we candidly have a little bit less success is with people that are extremely opinionated
about their tools.
If, for example, you're like, hey, I need to go use WebAssembly, and if I'm not using
WebAssembly, I'm quitting my job, you're probably not the best Retool user, honestly.
However, if you're like, hey, I see problems in the business
and I want to have an impact and I want to solve those problems.
Retool is right up your alley.
And the reason for that is Retool allows you to have an impact so quickly.
You could go from an idea, you go from a meeting like, hey, you know,
this is an app that we need, to literally having the app built in 30 minutes,
which is super, super impactful
on the business.
So I think that's the kind of partnership or that's the kind of impact that we'd like
to see with our customers.
You know, from my perspective, my thought is that, well, Retool is well known.
Retool is somewhat even saturated.
I know a lot of people who know Retool, but you've said this before.
What makes you think that Retool is not that well known?
Retool today is really quite well known amongst a certain crowd.
Like I think if you had a poll like engineers in San Francisco or engineers in Silicon
Valley, even I think it'd probably get like a 50, 60, 70% recognition of retool.
I think where you're less likely to have heard of retool is if you're a random
developer at a random company in a random location like
the Midwest, for example, or like a developer in Argentina, for example, you're probably
less likely.
And the reason is, I think we have a lot of really strong word of mouth from a lot of
Silicon Valley companies like the Brexes, Coinbases, DoorDashes, Stripes, etc. of the world.
There's a lot of chat, Airbnb is another customer, Nvidia is another customer, so there's a lot
of chatter about Retool in the Valley.
But I think outside of the valley,
I think we're not as well known.
And that's one goal of ours to go change that.
Well, friends, now you know what Retool is, you know who they are.
You're aware that Retool exists.
And if you're trying to solve problems for your company, you're in a meeting, as David mentioned,
and someone mentions something where a problem exists and you can easily
go and solve that problem in 30 minutes, an hour or some margin of time that is basically
a nominal amount of time.
And you go and use Retool to solve that problem.
That's amazing.
Go to Retool.com and get started for free or book a demo.
It is too easy to use Retool and now you know,
so go and try it.
Once again, retool.com.
Because we're able to do this whole, you know,
multi application, multi CDN scenario,
is there a way to say,
test 75% of our traffic goes to existing CDN,
25% of our traffic goes to new CDN over a course of time,
like as this confidence, you know, gets to a higher level,
is that, like what's the proper way,
you don't just like switch it off, right?
Like we're testing and confirming it and things like that,
like how does it work in different scenarios?
But is that the prudent way to roll it out or am I jumping the gun on your
No, no, no, no. I think this is good. This is exactly, I mean,
these are like the big questions because honestly there is no right answer.
So a progressive rollout is the most
cautious one,
especially if you don't know how the new system is going to behave.
In our case, we're spending a lot of time to double check that the correctness of the system is right
and that the system behaves correctly when it comes to all the other...
So it's one component, the CDN, but it integrates with S3. It integrates with a bunch of other, it integrates with S3 for stats, right?
It integrates with Honeycomb for all the telemetry, for all the traces, for all the old events.
It integrates with R2, the different R2 backends for the actual storage of certain components.
So there's like a lot of... we're just basically replacing a central piece, and everything around it still remains and the integration has to be right.
So yes, we could do a gradual rollout in that maybe from a DNS perspective, we say 25% of
queries return this backend or this origin, sorry, in this case, let me just not
compound the word origin, 25% of the requests go to Pipedream, and 75% go to Fastly.
And how do they behave?
But at that point, we are maintaining two systems, which is okay, but it cannot be a
long term solution.
Right?
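A gradual rollout like the 25/75 split described here can be sketched in a few lines. This is only an illustration under stated assumptions: the conversation talks about splitting at the DNS level, whereas this sketch hashes a per-request key to pick a side, and the function name `pick_cdn` and the labels "pipedream" and "fastly" are made up for the example.

```python
import hashlib


def pick_cdn(request_key: str, new_cdn_percent: int = 25) -> str:
    """Deterministically send roughly new_cdn_percent% of keys to the new CDN.

    request_key is any stable identifier (hypothetical here, e.g. a client ID).
    The same key always lands on the same side, which keeps comparisons stable.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # a value in 0..99
    return "pipedream" if bucket < new_cdn_percent else "fastly"


if __name__ == "__main__":
    counts = {"pipedream": 0, "fastly": 0}
    for i in range(10_000):
        counts[pick_cdn(f"client-{i}")] += 1
    print(counts)  # roughly a quarter of the keys go to the new CDN
```

Raising `new_cdn_percent` step by step is the "as confidence grows" part; setting it back to 0 is the equivalent of pointing DNS back at the old system.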
So we want to shorten the window
in which we run both systems at once
and that both are active.
Because we could very easily switch,
for example, to Pipedream, right?
Make sure that everything runs correctly.
Let's say that we detect that,
hey, for some reason something isn't behaving correctly.
We still have the old system,
we just point the DNS back
and everything continues as it was, which is why two of everything. That's another principle that we have. So at this point, we have
two CDNs, we have two applications, which are completely isolated. Now, they are running on
Fly, like the runtime is the same. But if one was to go down, the other one wouldn't know about it.
So we've designed this in a way that is very cheap to fail. The new stuff, if it fails,
will have impacted maybe a few minutes' worth of traffic, even if it fails catastrophically, which is why we're
running all these benchmarks, running all these correctness checks: to make sure that the chances of that
happening are low. No guarantee in anything, but they're low. And going back and forth is super easy because we run both things at the same time.
The problem of running both Fastly and the new one is that we may see inconsistent data that gets written out.
I'll go to great lengths. I mean the logs, I mean events. I'll go to great lengths to ensure that's not the case.
But if there are little discrepancies, we may end up with different data
and it may take a while to find that out,
especially on the metric side.
What kind of data would be different?
Like a different image or, you know?
The stats that we write.
It's like all the requests that come in,
the stats that we write to S3, for example.
And when Jerod processes them, right,
when the background jobs kick off,
they just can't reconcile the two different
ways of saving the same data. Because there's a lot of config in Varnish, sorry, there's a lot of config in Fastly that configures how we write out the logs to S3. And that will be accurate.
The problem is that certain properties that Fastly has,
Pipedream may not have.
Again, let's remember Fastly is a version
of enterprise Varnish, which is completely different.
Like, only they have certain properties
in Varnish.
We don't have certain methods.
We don't have table lookups.
There's so many features that we don't have
in the open source Varnish.
So there might be differences in what we could, what we may be able to do.
For example, the geo IP stuff, I don't know how that's going to work or if it's going
to work at all.
And maybe it's fine.
But that's an example of something where, running these two systems at the same time,
we'll need to reconcile the differences.
I suppose it's no different from switching everything across and then finding out, oh, you are missing these properties that you care about.
But that is the risk of going from one thing to another thing
Well, I found the answer to my question. It looks like it's about
308 lines of code at this point
Great you're getting there, but that's okay. You preempted it. Oh good. Oh good. Oh good cool. I care about yeah
I know I know yes, so it quite, yeah, it changed a bit
and we'll go over that in a minute.
So.
One more question for you before we go on.
Yes.
You said the phrase enterprise varnish.
Is there such a thing?
Do they have like a different fork of it
they're developing?
Yes, absolutely.
Okay, open core style.
So there's obviously there's varnish
and there's enterprise varnish.
Enterprise varnish is a paid product.
As far as I know, and this is from going through their blog and the various public information which is out there,
Fastly started with Varnish, but they've been changing it a lot over the years. That was their starting point.
I don't know how similar it is to Varnish Enterprise, but at this point we can assume it is a custom platform,
a customized Varnish.
I don't even know if it is Varnish.
There is certainly VCL, but I don't know how that maps to what they actually write
because that's like all their like proprietary software.
Who's in control of this enterprise varnish?
They are the varnish people.
Um, I searched it on Google and I couldn't, I mean I'm still using Google.
Yes.
If you go Varnish Enterprise, yeah there is even like a company and consultancy behind
it.
Varnish-software.com.
They sell Varnish Enterprise.
They have the open source Varnish community version.
Alrighty.
I didn't think I'd land on the right page.
It seemed like a not the right place but.
Yeah.
Varnish Enterprise and Varnish Software is the commercial... Never been here before.
Okay, brand new. Yeah.
Okay, so Varnish Cache is the open source community version, Varnish Enterprise... these are things
I'm not familiar with. I just never paid attention to this detail.
You got Varnish Cache, Open Source, Varnish Pro,
Varnish Enterprise, Varnish Controller, Traffic Router. Okay, so you got like different layers.
So we're using, obviously, the only one available
to every developer out there, Varnish Cache.
They are using likely, highly likely, Varnish Enterprise.
Yes, because, and the reason why we know this
is from the documentation, they have certain phrases, like one behavior that we had to work around as you can see here, right?
We have different instances running. So we have Pipely running, which is Varnish, right? That's what it's like, Varnish 7.7.0.
But we have feeds and feeds is the TLS proxy
We talked about it in the last episode the TLS proxy terminates TLS to backends,
in this case, HTTPS traffic. Varnish itself cannot go to TLS backends. It doesn't terminate SSL.
Varnish Enterprise does. And the reason why I know that is because that's what we use in the
Fastly VCL config. So Varnish in that case does terminate TLS.
And that is a Varnish Enterprise feature only.
So that was like another thing that we had to solve somehow.
And Nabeel, again, thank you very much for helping out with that.
Writing this, like, very simple Go proxy, which uses little memory, is highly performant,
and is able to terminate SSL, which then, in this case,
Pipely connects to, and it's all running locally.
So feeds assets and app, there are separate processes and we can see this by,
let's just do this PS.
Look at that.
This is like the whole process tree of what's running in Pipely.
So we have tmux, obviously, that's like the session which I have
opened here. Bash, just up, right? It's just a wrapper.
It basically invokes Goreman.
So it runs like all the various processes.
And we have TLS Exterminator, local port 5000, proxies to the changelog fly.dev backend.
We can see the process.
We can see the memory usage, all of that.
It's using currently, what is it?
8 megabytes of memory.
And that was asked of benchmarking, right?
We ran a benchmark here.
TLS exterminator, we're going to feeds,
and we're going to, which was the other one.
There should be one more there.
Change of place, the static assets.
And then eventually we have varnish.
So you have quite a few things running here
just to get that experience that, you know,
in Fastly's case, it's just all part of varnish.
So these, we are bringing different components together, building what we're missing so that
we get something similar.
And ultimately what we care about is how the system behaves from the outside.
Do the users get the experience that we want them to have
or that we expect for them?
All right, so I could do this live,
but I think it's easier, I can focus a bit better.
So the tests, right, we can run them locally.
Now, I did mention that we're using Dagger.
So if I do dagger login changelog, what that means
is I'm going to authenticate to Dagger Cloud.
And then for everything that runs locally,
the whole telemetry will be sent:
how the various commands behave,
how they change.
In this case, I'm running the acceptance test locally.
And by connecting Dagger to Dagger Cloud,
I'm able to see all the different things
that run for those acceptance tests. All the commands that get installed, all the tools that
get installed, all the commands that run. And in this case, I can even see the actual requests
that go to the local instance of Varnish in great, great detail.
It's all real time, it's all wasm goodness, and the tests are hooked up too. So when I run something locally,
or even in CI, it all goes to the same place and I can understand how
these various components behave. How long do they take? That's what we see here
like a trace of the various steps.
So when something is slow or misbehaves,
I know where to look.
So the acceptance tests,
they run locally in one minute and 26 seconds.
And that's pretty good.
So what else is left?
We're nearing the end.
What else is left before we can deliver this toy to Adam?
That's what we are working towards.
So the first thing is the memory headroom.
What does that mean?
Varnish we are configuring it to use a certain amount of memory so that it can serve as many
things as it can from memory.
So it's really, really fast.
And I went through a couple of iterations basically,
and we'll see that in a minute.
The value which I set initially was not the right one.
Varnish kept crashing and I had to find out
what the right value is because it's not very obvious.
Forwarding logs, that is the part which I think
it's an important one, but not as like,
it's a smaller component compared to everything else.
So we will have one more process running. In this case, it'll be vector and vector is going to
consume all the varnish logs and it's going to deliver them to different sinks. That's what
they're called internally. So one will go to Honeycomb and we'll be able to compare: is the data in the same format as we get from Fastly?
Because all the dashboards and all the alerting and everything else should work the same.
The SLOs and all of that.
And are we able to send the same logs with the same format to S3 so that Jerod is able to process the metrics?
That is the important part, right?
When you mentioned that the numbers went, went, went down.
Well, we're not getting those metrics from the new instance.
Yeah.
And the last one is the edge redirects.
And that's just basically writing more VCL, which is fairly straightforward
at this point, and by the way, LLMs are very helpful.
So I was, I was using agents for this and you know, they, they really go through
it like they just said was very good.
I was very nice episode. I enjoy that by the way. So stuff like that you know which makes this
super super simple is literally copying config from one file to another file and just reformatting
it. But we have most of it. A couple of things are different because again our varnish doesn't
have all the properties that the Fastly varnish has like table lookups and specifically there's
more like if else clauses and a couple of other things, but nothing crazy, mostly straightforward. And
this is also going to clean up a lot of redirect rules because they're all over
the place. There's jobs, there's go-to's, there's quite a few things in our
existing Varnish config. And then the last one is the content purge. So we'll
talk about that in a minute. But the memory, this is what it looks like, the memory.
So basically, you see, we are looking at the memory usage of an instance of Pipely
slash Pipedream.
And you can see that the limit is two gigabytes.
And we want to be just under it.
But then sometimes what happens, there's some requests coming in all of a sudden.
This is like one instance that was hit particularly badly.
I don't know what was happening with it,
but there's lots of traffic going to this instance.
And by the way, it was more like bot traffic.
It felt like agents are trying to scrape you.
That's exactly how it felt.
They tried different things.
So it was all just garbage.
And where we see these drops,
that's Varnish crashing because it was running out of memory.
It was getting OOM killed.
So I had to adjust that headroom a couple of times and now it's been stable.
If we look at the actual, let's see if we find it here.
It's this one, right?
So I'll look at the last six hours, right?
You can see all the various varnish instances, the memory, we never had those big drops.
There's smaller drops based on data being replenished and how it changes.
We still need to understand those metrics by the way, but that's coming.
So things have been stable from that perspective.
Cool.
And 800 megabytes, that's how much headroom we had to leave for Varnish.
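The headroom arithmetic is simple enough to write down. This is a minimal sketch, assuming the 2 gigabyte limit means 2048 MB and that the cache size is handed to Varnish as a `-s malloc,<size>` storage argument; whether Pipely configures it exactly this way is an assumption, and the function names are made up for the example.

```python
def varnish_storage_mb(limit_mb: int, headroom_mb: int) -> int:
    """Memory left for the Varnish cache after reserving headroom."""
    if headroom_mb >= limit_mb:
        raise ValueError("headroom must be smaller than the memory limit")
    return limit_mb - headroom_mb


def malloc_storage_arg(limit_mb: int, headroom_mb: int) -> str:
    """Format the size as a varnishd malloc storage argument."""
    return f"malloc,{varnish_storage_mb(limit_mb, headroom_mb)}M"


if __name__ == "__main__":
    # 2 GB instance limit with 800 MB kept free for everything else.
    print(malloc_storage_arg(2048, 800))  # malloc,1248M
```

The point of the headroom is that the cache is not the only thing using memory on the instance, so the cache size has to sit well below the limit at which the process gets killed.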
This was version
005. It was the last one we pushed and things have been stable ever since. So we need to
leave 800 megabytes free so that things don't get killed. That seems to be the golden number.
400 was not enough. And pull request 12 is up there, which is going to send logs
to Honeycomb. That is the first one. There is not much else other than just like a placeholder for it, but that's the next
big thing.
And we need content purge.
And for this, I need to tango with Jerod.
It takes two to Tango.
Yeah, pretty much.
Pretty much.
So this is where we talk about how do you imagine us integrating Oban with Fly
in this case to understand what the various pipe dream instances are, because we need
to send requests to every single one of them when you want to purge content. There is no
orchestrator, which was what was happening in Fastly, right? You would send a purge request
and then Fastly would distribute it to all the instances or not,
because things weren't cached that well. Anyway,
the point is we need now to orchestrate that purging across all the instances.
So how do you think we may approach this, Jerod?
Well, we need some sort of an index or a list of available instances.
Perhaps we could get it from fly directly. Yeah, there's DNS.
We can send a DNS query and it will give us all the instances.
So as long as we know some sort of standardized naming around these instances,
so they're not our app instances or whatever. It's like our Pipely instances.
Yeah. And we just create an Oban worker that just says, you know,
you tell it what to purge.
It wakes up, says, all right, give me all my instances.
Gets that from Fly and then just loops over them and sends whatever we decide a purge
request looks like to that instance.
Yeah.
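The worker described here (look up every instance, then loop and send a purge to each) can be sketched as follows. The real thing would be an Oban worker in the Elixir app; this is a Python illustration only. It assumes the instances are discoverable by resolving one hostname that returns an address per instance, and that each instance accepts an HTTP `PURGE` request, which depends on how the VCL is written. The function names and the hostname in the comment are hypothetical.

```python
from __future__ import annotations

import http.client
import socket
from typing import Callable, Iterable, Optional


def resolve_instances(hostname: str, port: int = 80) -> list[str]:
    """Return every address the hostname resolves to (one per instance)."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def send_purge(address: str, path: str, port: int = 80, timeout: float = 5.0) -> int:
    """Send a PURGE request for path to a single instance, return the status."""
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        conn.request("PURGE", path)
        return conn.getresponse().status
    finally:
        conn.close()


def purge_everywhere(
    path: str,
    instances: Iterable[str],
    send: Callable[[str, str], int] = send_purge,
) -> dict[str, Optional[int]]:
    """Purge path on every instance; None marks an instance we couldn't reach."""
    results: dict[str, Optional[int]] = {}
    for address in instances:
        try:
            results[address] = send(address, path)
        except (OSError, http.client.HTTPException):
            results[address] = None
    return results


# Hypothetical usage, with a made-up internal hostname:
#   purge_everywhere("/feed", resolve_instances("pipely-example.internal"))
```

Recording a result per instance matters because there is no orchestrator: an instance that was unreachable keeps its old copy until the TTL expires, so the caller may want to retry it.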
I'd really like to do this maybe before the next kaizen.
Sure, that's a big one.
Because if you think about it, really, it's like these two big things.
It's sending the logs and the events to Honeycomb and to S3,
content purging.
And that's like this piece where we need to work together on this.
And then the edge redirects are really simple.
It's literally just like copy pasting a bunch of config,
clearing it up.
And that's it.
That's it.
That's it.
That's how close we are.
It's not even Christmas. So close you can almost play with that toy. Yeah. Well,
Gerhard's been playing with it. Yeah, I have. I mean benchmarking it. I mean anyone can try it.
You've been trying it. We go to feeds. We serve assets. Now we just have to do like some of the,
I think, tooling around it.
Like some extra stuff that is not user-facing.
Because the content purge, I mean, if you think about it,
do we need to do the content purge?
60 seconds, that's how long things will be stale,
because they get refreshed ultimately every 60 seconds.
The problem with that is maybe that is too aggressive for static assets, right?
We would like to cache them
maybe for a week, maybe for a month, I don't know, like stuff like the image that we've seen, right? The Changelog image that doesn't change. Yeah. So that could be cached for a year, right?
Right. Unless it gets effectively content purged. Is there a way to classify assets as, like,
this will never change, kind of thing? Like, give things buckets. Bucket A is on that every-whatever-minute cycle. Absolutely. Bucket B is, like, this almost never changes,
so let's just go ahead and cache that almost forever.
Absolutely. And then C is, like, these things will never ever ever change, and when they do, it's a manual purge.
Yeah, I mean, all of that is possible. The question is, what is the simplest thing that we could do that would ensure a better
behavior than we've seen so far from a CDN and something that maybe doesn't require a
lot of maintenance?
So as I was thinking about content purging, I was wondering, well, if we expire everything, let's say within the board, like if we say feeds, refresh them
every minute.
Static assets, refresh them every hour.
The application refresh maybe every five minutes, maybe every minute.
I'm not sure.
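The per-class refresh intervals floated here (feeds every minute, static assets every hour, the application every five minutes or so) amount to a small lookup table. A minimal sketch follows; the URL prefixes are hypothetical placeholders, not the site's real routes, and the numbers are just the ones mentioned in the conversation.

```python
# (prefix, TTL in seconds); the first matching prefix wins.
# The prefixes below are illustrative assumptions only.
TTL_RULES = [
    ("/feed", 60),         # feeds: refresh every minute
    ("/static/", 3600),    # static assets: refresh every hour
]
DEFAULT_TTL = 300          # everything else (the application): five minutes


def ttl_for(path: str) -> int:
    """Return how long, in seconds, a response for this path may be cached."""
    for prefix, ttl in TTL_RULES:
        if path.startswith(prefix):
            return ttl
    return DEFAULT_TTL


if __name__ == "__main__":
    for path in ("/feed", "/static/logo.png", "/podcast/1"):
        print(path, ttl_for(path))
```

With short enough TTLs in each class, the worst-case staleness is bounded by the table itself, which is the argument for possibly skipping purging altogether.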
Maybe we don't need content purge. When you say refresh, does it literally delete from the CDN
and pull it over from wherever? Or does it just check
freshness?
So when a request comes in, it will check freshness at that point.
Which means that, let's say a request
arrived an hour ago, and
the TTL is 60 seconds. When the second request arrives, it checks, is that considered stale
or fresh? If it's considered stale, because its age is longer than the TTL, it will still serve the stale
content, which means it could be an hour old, whatever the duration is between requests.
And then it will go in the background to the origin
to fetch a fresh copy.
So subsequent requests will get the fresh content, but never the one that checks
the freshness, if that makes sense.
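The behavior described here (the request that finds an expired object still gets the old copy, and only later requests see the refreshed one) can be simulated with a tiny cache. This is a sketch under stated assumptions, not Varnish itself: the class and method names are made up, the background fetch is modeled as an explicit method call, and stale objects are kept indefinitely, whereas a real cache bounds that with a grace period.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    body: str
    stored_at: float


class StaleServingCache:
    """Serve expired objects immediately and refresh them afterwards."""

    def __init__(self, ttl: float, fetch: Callable[[str], str],
                 clock: Callable[[], float]) -> None:
        self.ttl = ttl
        self.fetch = fetch          # stands in for a request to the origin
        self.clock = clock          # injected so the example is deterministic
        self.entries: Dict[str, Entry] = {}
        self.pending: List[str] = []  # keys waiting for a background refresh

    def get(self, key: str) -> Tuple[str, str]:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is None:
            body = self.fetch(key)  # nothing cached: go to the origin now
            self.entries[key] = Entry(body, now)
            return body, "miss"
        if now - entry.stored_at <= self.ttl:
            return entry.body, "fresh"
        if key not in self.pending:  # expired: answer with the old copy
            self.pending.append(key)
        return entry.body, "stale"

    def run_background_refresh(self) -> None:
        now = self.clock()
        for key in self.pending:
            self.entries[key] = Entry(self.fetch(key), now)
        self.pending.clear()


if __name__ == "__main__":
    origin = {"/feed": "v1"}
    now = [0.0]
    cache = StaleServingCache(60, lambda k: origin[k], lambda: now[0])
    print(cache.get("/feed"))   # ('v1', 'miss')
    origin["/feed"] = "v2"
    now[0] = 3600.0             # an hour later, far past the 60 second TTL
    print(cache.get("/feed"))   # ('v1', 'stale')
    cache.run_background_refresh()
    print(cache.get("/feed"))   # ('v2', 'fresh')
```

The second request is the one that pays the price: it arrives an hour later, gets the hour-old copy, and its only effect is to trigger the refresh for whoever comes next.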
It re-pulls it, even if it's the same?
Uh, yeah.
Because we are configuring the TTL.
We're saying only keep it for 60 seconds.
We're not doing any comparisons,
we're not doing any ETag comparisons,
we're not doing anything like that.
That's too CPU intensive to do comparisons,
like checksums and stuff like that?
It's not that kind of thing.
Because I'm thinking like rsync, for example,
whenever I do things, this is not the same,
but it's similar.
It's like, hey, I want to go and push something there,
but you can also do --checksum,
which is like, let me do a computation
between the two things and confirm
like even though certain things may have changed like
Updated on or whatever, but it's still the same data. It doesn't actually update it, you know
I just wonder if that's a thing in CDN world.
Yeah, it is. I mean that's where, for example, the ETags come in, right? In the ETag header
you can put basically the checksum
of the actual resource, and then it will check it first.
Like say, is this ETAG different
than what I have in my cache?
And if it's not, then this is up to date.
So it's not time-based, it's just header-based.
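The ETag check just described can be sketched as a comparison between the validator the cache already holds and one computed from the current resource. This is a minimal illustration, assuming the ETag is a SHA-256 checksum of the body (servers are free to generate ETags differently); the function names are made up, and the request is reduced to its `If-None-Match` value.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(body: bytes) -> str:
    """Build a quoted ETag from a checksum of the body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_get(if_none_match: Optional[str], current_body: bytes) -> Tuple[int, bytes]:
    """Answer 304 with no body if the caller's copy is current, else 200."""
    if if_none_match is not None and if_none_match == etag_for(current_body):
        return 304, b""           # unchanged: nothing needs to be re-sent
    return 200, current_body      # changed (or no validator): send it all


if __name__ == "__main__":
    cached = etag_for(b"episode list v1")
    print(conditional_get(cached, b"episode list v1")[0])  # 304
    print(conditional_get(cached, b"episode list v2")[0])  # 200
```

This is the same idea as the rsync checksum comparison: the data is only transferred again when its fingerprint differs, regardless of timestamps.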
And all it does is just goes and checks the resource
on request, but it still means that the first request
that comes after that object has been cached may serve stale content.
Actually it will return stale content.
The first one will always return stale content because that's when the check happens.
There's no background, anything, right, to run in the background to compare all the objects which I have in memory, are they fresh or not? And this is
where the content purge comes in. When you know that something has changed,
you're explicitly invalidating these objects in the CDN's memory. So let's say
you've published a new feed, right? You know you've updated it in the origin,
then you send the request to the CDN, which I believe that's what we have today.
To say purge this because there is a new copy.
And then the first request is going to be a miss.
It will not be a stale.
It will be a miss because the CDN doesn't have it.
It has to go to the origin.
What is this? Can you go back?
It would hurt your presentation to go back to the what's left to do slide.
Yeah, I kind of want to see that list again. Yeah, of course. What's left. What does it take? Like, what do we reasonably think is required to get to... I
love this, your indexing too, by the way,
of this list.
Although your font doesn't let it be very straight. I'm pedantic now as a designer looking at it.
That's okay.
What's required to get all this done?
Like how difficult of a lift is the remaining steps
to put a bow on it?
Let's talk about unknowns because I think that's,
because it's like, the question is,
how long is a piece of string?
And I don't know, like, what is a string?
Show me the string, I'll tell you how long it is.
And what this means is that I don't know all the properties that we need to write out in the logs to see if we have them.
And again, I know that the GeoIP we don't have, I mean, that's just not a thing. We don't
have that. And adding that will be more difficult than if we are okay to not have it, for example.
So maybe we do that or maybe we just add wherever the request is coming like whichever
instance is serving the request we just use the instances location not the
clients location. So maybe that's one way of working around it. So forwarding logs
it's fairly simple in terms of the implementation.
What we don't know is what are all the little things that need to be in those logs for the
logs to be useful or as useful as they are today.
And this is the dance between you and Jerod.
Actually, this, no, this, that's the edge redirects.
So this is forward logs, forwarding logs is we have to send them to Honeycomb and to S3
Honeycomb so that we understand how the service behaves
What are the hits like remember all those graphs that I was able to produce?
We need to be able to see which requests were a hit and which were a miss.
So all that stuff, I think in a day I could get the Honeycomb stuff done. I think, right? I mean,
there's nothing crazy about it. Some things will not be present, but most of it is fairly straightforward.
S3 is a little bit more interesting because I haven't seen that yet,
and I'm not familiar with the format.
But I know it's a derivative of what we get in the request.
So just a matter of crafting a string that has everything we care about.
And I'm going to flag if any items, if they're problematic.
So honestly, I would
say a few days worth of work, I can get the forwarding logs sorted. Then moving to the
edge redirects. The question is, how far do you want to go with them? Are you okay with
the current behavior, in which everything expires in 60 seconds and we can be serving stale
content, or do you want to implement what Jerod suggested?
I want purgeability.
Content purge, not edge redirect, sorry, content purge.
That's what I meant.
Sorry.
I want purgeability.
I just like to have the control.
I don't think it's gonna be very hard to do.
On the logs front,
I don't think we wanna lose GeoIP information.
I think we could relatively easily,
since we're running a background process.
I'm not sure if Vector has that kind of stuff built in or if you just have a script that
does two things that pulls the IP, you know, checks it against the MaxMind database and
then puts it back in there.
There is some integration with the MaxMind and it exists.
I know there is like the lite version which is free.
Yeah, which is all we would need.
And if that's okay, I haven't done it myself,
but having looked at the config,
as long as the file is in the right place,
which won't be a problem,
it's pretty much like baked into the software.
Yeah, so if we do that,
then we're pretty much everything else we have.
But I do think we should keep that
because it is nice to know where people are listening to us.
So that will make it slightly more difficult, the lite version. If we had to go for the paid
version that would be a different story because I don't even know what it takes to get a MaxMind
paid database and get it refreshed and all that. We'll have to look at the details.
So my goal is by the next Kaizen all this to be done.
Yes.
That is my goal.
That is my goal.
Like we are honestly, like one of my title proposals
was 90% done.
I feel that we are 90% done or 10% left, right?
Whichever, like all the heavy stuff has been taken care of.
That's exciting.
My title proposal is tip of the iceberg.
Tip of, oh yeah, I love that.
Tip of the mountain or the, oh yes, I love that.
Tip of the iceberg.
CDNLC, CDN like change log.
Or, you know, what would Jesus do?
Now, what would change log do?
They build a CDN.
Yeah.
Or bottlenecks.
That's also a thing.
There's like so many bottlenecks in different parts of the system.
Right.
Including me.
I'm a bottleneck by the way.
My time is a bottleneck.
But honestly, I'm very happy with where we are with this.
I mean, I've learned so much and it feels like we own such an important
piece of our infrastructure.
We were never, never able to do this.
And all because we were patient and diligent and we had good friends is why
we are where we are today.
And that makes me so happy.
So many people joined this journey.
So, yes, I love that.
Three of my favorite things, patience, diligence and friends.
You know, get you far.
I think so too.
Thanks, Gerhard.
You're leaving us on the cliffhanger here.
Kaizen 19.
Ah.
Kaizen 19, this is it.
This is the last one.
Shirt, where's my shirt today?
Dang, man, it's in the wash.
Well, I'm excited about this.
I think, let's say next Kaizen,
this is production worthy.
What changes, you know, once that's true, once that's true, it's in production, it's
humming along perfectly fine.
What changes for us specifically?
Our content is available more or is available full stop when our application is down.
That was never the case.
When the application goes down, we are down.
We've seen it when we had that four-hour Fly.io outage in one region, and that's
why we went to two regions.
That's what prompted us to go to two regions.
And with a CDN that caches things properly,
that would not be the case.
And by the way, that's something that I wanted to test.
I don't think we have time for that now,
but in the next time, we'll take the application down
and make sure that we're still up.
Now that's still going to be, so users which are logged in,
I think we'll need to do maybe something clever. And again, it's within our control.
We can say, even if you have a cookie, if the backend is down,
we will serve you stale content, which is public content so that we look like we
are up, but none of the dynamic stuff is going to work. So that's,
that's one thing. Um,
I think this gives us a lot more control of what other things
our application used to do.
You remember all those redirects that we still
have all over the application that we couldn't put in the CDN?
Because it would have been working
with this weird VCL language that wasn't ours,
like pie in the sky, as Jerod used to call it, that we
don't know how it's going to behave.
So we chose to put more logic in the application than we wanted to, because the relationship
with the CDN was always like this awkward one.
And I think we had a great story to tell.
I mean, just think about how many episodes we talked about this thing.
And now it's finally here.
And it feels like, like, like, was it worth it?
What was the first time we started this?
Was it a year ago?
A year ish, year and a half of like two years.
I can't remember.
Well, I remember October.
That's October 2023.
Yeah, that was quite a while ago, which is when we were seriously
thinking like, hey, is this the experience that we can expect? That was like the important
milestone in this journey that kind of started all of this. So next Kaizen
is roughly July something. It's May now, right? If it's a two-month... Kaizen 20.
It's a nice 20. Look at that. Nice. We have to do it. We have to do it. So
it's a quarter off of October. So October isn't quite... September would have been the
next Kaizen after July, right? So it's a little bit before October, but I feel like it's like
almost two years, a year and three quarters basically. Yeah, I think if we go to, I'll just very quickly go to Changelog and the changelog repository in the discussions.
And I think we had even a question, should we build a CDN? And when was that?
January 12, 2024. That was the first one when we asked the question, like, should we build a CDN? I was like, that's that that started
out in my mind this journey. So January 2024. It will be one
year and seven months, six, seven months.
Yeah, call it 18, 19 months. Look at that. If it was 20 months,
it'd be crazy.
Seven episodes.
Delay our next Kaizen by a couple of months.
But let's remember there's like all these other things that used to happen and
they were happening around. It wasn't just just this.
I mean, this was one of the things that was like kicking the background.
But again, just just look through all the things that we went through
to get here today.
But definitely, between Kaizen 18 and 19,
this has been my only focus because I wanted to get to a point where we can,
you know, 90% done.
Let's do it.
Let's do the last 10% for Kaizen 20.
That's what I'm thinking too.
We'll celebrate.
Oh, I just had a good idea.
Go on, go on.
And we can cut this if we don't do it.
But let's all go somewhere together for Kaizen 20.
Let's be together.
Okay.
Oh, I'm intrigued.
I like this.
London or Denver or Texas or something.
Let's get together.
Let's have a little launch party.
Oh wow.
You like this?
I like where this is going.
That's good.
Okay, we'll iron out the details,
but we're all into the idea.
Yeah, I like the Denver.
Denver would be great.
Okay.
All right.
Maybe we'll invite some friends.
Oh, wait, Dripping Springs.
I'm just kidding.
That's where I live, Gerhard, is Dripping Springs.
To our listener: let us know in Zulip if you would go to a Changelog Kaizen 20
Pipely launch party in Denver sometime this summer.
Let us know.
Oh wow.
Throw a little party.
That's quite a cliffhanger.
Alright, we'll leave it right there.
We'll leave it right there.
Okay, perfect.
Alright.
See you in Denver.
See you in Denver.
Kaizen.
Always.
Kaizen.
See y'all.
So, a live Kaizen recording slash Pipely launch party in Denver in July.
Would you be there?
Why or why not?
Please do let us know in the comments.
We are serious about this. Are you? Comment
in Zulip please. Let's thank our sponsors one more time. Fly.io of course, depot.dev,
heroku.com, and retool.com. Do us a solid and check out what these orgs are up to
and tell them changelog sent ya. We love it when that happens.
Next week on the pod: news on Monday, Derek Collison from Synadia talks NATS vs the CNCF
on Wednesday. And we are playing Pound Define once again but this time with some new faces
and a mysterious one who just so happens to produce our beats.
Oh, I want to do that.
I so badly want to do that.
Have a great weekend.
Drop a comment in Zulip if you listen all the way to the end.
And let's talk again real soon.