The Changelog: Software Development, Open Source - Gatsby's long road to incremental builds (Interview)
Episode Date: May 6, 2020
Gatsby creator Kyle Mathews joins Jerod fresh off the launch of incremental builds to tell the story of this feature that's 3 years in the making. We talk about Kyle's vision for Gatsby, why incremental builds took so long, why it's not part of the open source tool, how he makes decisions between Cloud and open source features, and more.
Transcript
It's pretty clear we're at a very pivotal moment in how we build stuff on the web at all.
The web started with static files, and then the web app architecture developed with databases and web servers,
and the security layers and so forth that Netscape and others did a lot of work in the 90s to develop.
And that's just been it for the last 25 years.
And what we've seen, though, is that there's been a huge shift from kind of the LAMP stack and like single server, you know, monolithic architectures to sort of more serverless
cloud managed services and functions, etc, that you're more like stitching together stuff from
like a bunch of different like hosted services.
Bandwidth for Changelog is provided by Fastly. Learn more at Fastly.com.
We move fast and fix things here at Changelog because of Rollbar.
Check them out at Rollbar.com.
And we're hosted on Linode cloud servers.
Head to Linode.com slash Changelog.
Linode makes cloud computing simple, affordable, and accessible.
Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale you need to take your ideas to the next
level. We trust Linode because they keep it fast and they keep it simple. Check them out at
linode.com slash changelog.
All right, welcome back, everyone. This is the Changelog, a podcast featuring the hackers, the leaders, and the innovators in the world of software.
I'm Adam Stacoviak, Editor-in-Chief here at Changelog.
On today's show, Gatsby creator Kyle Mathews joins the show.
He's talking to Jerod about the launch of incremental builds,
telling the story of this feature that's three years in the making.
We cover Kyle's vision for Gatsby, why incremental builds took so long, why it's not part of the open source tooling, and how he makes decisions between cloud
and open source features.
So I'm joined by Kyle Mathews from Gatsby. Kyle was on Founders Talk a couple years ago now,
back in 2018. Of course, we covered Gatsby as well with Jason Lengstorf, also back in 2018.
And we've had Gatsby on JS Party just recently.
But we're here with Kyle, creator of Gatsby, to talk about what's been going on in this brand new thing, incremental builds, which you just launched today, right?
As we record this, the 22nd of April.
Congrats, big launch day.
Yeah, thank you so much.
Yeah, thank you for having me.
You bet.
So before we get into all that, let's talk about launching stuff because you just did it.
What goes into a launch in your business and in your stress life?
And how did it go today?
Share me your feels.
Yeah, yeah.
I mean, it's intense.
I mean, you know, there's just any big release,
there's just a ton of work that goes into, you know, getting it to this point.
And incremental builds is by far the biggest thing we've launched in our company history.
I mean, we've been working towards it for like three and a half years, you know,
since I started designing Gatsby V1 back in 2016.
Yeah, and like there's been four or five engineers
working full-time on it the last six, eight months.
And we were doing a lot of preparatory projects before that.
And just like the marketing team, the DevRel team, etc.,
the sales team have been working on stuff
for weeks and months preparing for it.
So it's a lot of work.
There's a lot of: how are people going to respond? Is there something glaringly wrong
that we're gonna have to be scrambling to fix after it gets out? So you prepare as much as you
can and you hope for the best, and you just kind of cross your fingers and close your
eyes and step across the threshold and see what happens.
That's right. Hit publish.
Yeah.
So were there any fires in the last 24, 48 hours?
Was there any blowback yet?
It's just been a few hours out right now, so early reactions.
It seemed positive from what I'm reading on Twitter.
But what about fires?
Was there anything that went wrong or a marketing copy that needs to be changed?
The little stuff that you forget about.
There was two typos, I think, in the blog post that somehow made it through the dozens of reviews we did.
Funny how that works.
Yeah, I know, yeah.
I'm like, every time I'm like, wow, it's really good.
And then I review it again the next day and I'm like, oh man, this is terrible still.
But yeah, it was actually really kind of charming and nice that like the first
OSS maintainer that we hired after we got funding, who's also like the OSS,
you know, engineering manager for a while.
He left recently to, I don't know, relax. He's a pretty casual contractor
engineer, so he's like, whoa, two years straight at a company. That's a long time.
Anyways, he found the typo and did a PR, so all of us were super excited to see that from him.
Generally speaking, it's been pretty smooth. We've been testing it for a few months
with customers and a lot of test sites.
We were pretty confident with things.
And yeah, so far it's been smooth, which is great.
Very cool.
Let's back out, zoom out, talk about Gatsby writ large,
what it is, why it's interesting,
why you founded a business around it,
as well as an open source project.
Of course, this has to be executive summary.
You can do the full founders talk
if you're interested listeners.
We'll link that in the show notes.
But tell us, first of all, the Gatsby elevator pitch, and then we'll get into the executive summary of Gatsby Inc. and where we are.
Sure. Yeah. Basically, it's pretty
clear we're at a very pivotal moment in how we build stuff on the web at all. It's like, you know,
the web started with static files, and then kind of the web app architecture developed
with databases and web servers and the security layers
and so forth that Netscape and others did a lot of work
in the 90s to develop.
And that's just kind of been it for the last 25 years.
And what we've seen though is that there's been a huge shift
from kind of a LAMP stack and single-server, monolithic architectures,
to more serverless cloud with managed services and functions, etc.,
that you're more stitching together stuff from a bunch of different hosted services.
If you look at a lot of other applications, a lot of other areas of software,
a lot of things have moved that direction.
But websites largely are still on CMSs.
If you look at WordPress, WordPress is growing faster
right now than it ever has.
It's just like eating through the web.
It's growing 2-3% a year or something like that.
It's something like a third of sites now, isn't it?
Or somewhere in that range.
Yeah, and it's adding another percentage or two
or something every year these days.
It's just interesting to see that the web is still dominated
by this old monolithic, LAMP-stack-era kind of architecture.
Dynamic server-rendered web pages, right?
Yeah, with caching and so forth.
Everyone else has moved more towards hosted cloud type stuff.
So Gatsby is, we believe that's where websites should go too
because there's a ton of advantages towards using managed services
versus kind of download and maintain it yourself.
It's just way easier.
It's way lower effort to scaffold and evolve and run software like this.
And so Gatsby is an effort to take the web that direction.
We like to say,
what would WordPress look like if it was invented in 2020?
And for sure it wouldn't look like it does now.
Well, we see WordPress even trying to morph to a certain degree with the headless style and WordPress API.
So even WordPress itself,
which was around the turn of the century, 2000 era,
is morphing in certain ways.
Of course, it's going to be a hybrid, but it's going towards that and saying, well,
you can use us this way because people have seen the advantages that I'm sure you're about
to enumerate.
And so we see older, established players like WordPress offering these things.
But if WordPress was invented today or even back a few years ago when you started Gatsby,
surely Matt would have made it different.
He would have architected it quite different
is what you're saying.
Yeah, yeah.
Like CMSs,
because like traditional CMSs,
they have like multiple concerns.
They have sort of the content piece
and the content management pieces
and they have like the presentation layer.
You know, the templates and database queries
and the caching layers
to actually serve the website.
And so the world that Gatsby lives in, those two layers are split from each other.
And so you still have the CMS.
Gatsby works really great with WordPress.
We hired two WordPress veterans, including the founder of WPGraphQL, to continue working on that and build a really tight integration between Gatsby and WordPress.
And so we're going to have incremental build support for WordPress
too, pretty soon. But yeah, WordPress in the Gatsby world
is just another potential backend source of data
for driving your website.
So the advantages of Gatsby is all in the, well not all, but the main
win for static sites in general and for Jamstack and this style architecture sites
is they're user-oriented. They live nearer to the user.
They've already been compiled, so you're not asking your user to wait for that render.
It's all out there. That's really where Gatsby began to shine
back when we had Jason on the show. I had an aha moment on that show
because I had been reading Gatsby's marketing pages
and I'm coming from somebody who's been using Jekyll
and looking at all these things for years.
And when I saw that Gatsby was like this blazing fast thing
built with React and GraphQL,
so I saw the React and the GraphQL,
but also Gatsby was blazing fast,
which by the way, that's what everybody says their things are today.
They're all blazing fast, so we need new marketing terms.
Yeah, we were there early, but now it's gotten a little saturated.
Oh yeah, I even wrote a skeptic's guide
to developer marketing terms on changelog.com
because there's so many that are just repeated ad nauseam.
Anyways, it was supposed to be blazing fast,
and I was talking to Jason. He kept
talking about the Lighthouse scores and all these things, and I was like, well, there's another
one that's fast. It's Hugo. Hugo's really fast, because you hit compile or build and
my Jekyll blog of a thousand-ish posts takes a minute and a half, and Hugo on that same content
takes four seconds. And so I was like, how fast is Gatsby? And Jason was like, no, no, no, no, no, no.
It's not fast there. It's fast in the produced thing,
which arguably (and they're both important)
is the most important place, is where it's the fastest.
And even back then it was like, yeah,
but what about my builds? Because I'm using it as a developer.
And he was like, well, we're working on
that, we're working on that. And so it's interesting that even back then my aha moment was, okay, Gatsby
isn't necessarily faster on the build than any other players in the static site generator
space. It's faster on the produced assets, and that's great. But when it comes to building,
we still have this. It's not really a catch-22.
It's something of a hurdle.
We have a hurdle, which you guys have been working on.
So explain why that's there.
Why build times for these types of tools grow as your site grows.
Yeah, so as you kind of alluded to,
it's like pre-rendering or pre-building your site
is an awesome kind of way to run a website
because you don't have to worry
about operating the website.
It's like a bunch of files sitting on a CDN
can't go down because you don't have running code.
Right, you're de-risking it.
There's nothing that can break
in the request response lifecycle
because there's no logic there.
It's just like I'm serving assets.
Yeah, and you can't hack it because there's no code to hack into.
All of the actual running code is either on a CI server
or behind a firewall.
The CMSs are well protected.
Yeah, and you can handle any sort of traffic spike
because you're just on a shared CDN
and they're seeing a million times your traffic,
whatever your site gets.
And the website is pushed all around the world,
like the CDNs take care of it.
So there's a lot of cool stuff about the model.
But yeah, the traditional kind of downside is the build time.
And in the blog post, the launch blog post today,
we kind of went into the history of these two pathways,
where you have database-driven websites
versus pre-built, statically rendered websites.
And this has been sort of the traditional dichotomy on the web.
And it's interesting, too, because the early days of the web,
everyone thought that pre-building was going to win.
And if you look at Movable Type versus WordPress,
that was a big battle in the early, mid, late 2000s.
People were like, yeah, Movable Type,
that just makes so much sense.
It's easy and scalable to host, it's faster,
the websites are faster.
But yeah, the problem that everyone ran into
is that the build times just sort of grew.
So you could start out a project and it was fine,
but then at some point your build times would be untenable
and you'd have to switch off to WordPress or something like that.
And WordPress made the bet that they're like,
hey, with Moore's law and other stuff like that,
WordPress is going to be fast enough that we can just rely
on this sort of dynamic rendering model with caching.
As history showed, it kind of proved out.
So what we've been working towards is,
when I went into thinking about working on Gatsby full-time
and making it a thing and raising money, etc.,
we were very aware of this sort of dichotomy
in that database-driven websites had won
because they're much faster and easier to update.
But we saw that there was just another option there,
that it wasn't just those two options.
Because before Gatsby, I worked a lot
on streaming architectures,
like stream-processing-type stuff,
using Kafka and other tools like that.
And these are kind of like new models
that kind of like in the big data world
have replaced the batch processing model.
Instead of running a process nightly or every hour
to regenerate a bunch of data,
architectures like Kafka allow you to process data as it comes in.
Data is streaming through, and through some clever programming techniques
you can update things very cheaply on the fly.
Your derived data is updated in real time.
So my thought back then was like, okay, that could apply really well to websites. Because what is a website other than
the most recent view of a long series of events? So it's like code changes, data changes, et cetera, that get processed into something
that's served out to the users.
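To make the batch-versus-streaming contrast concrete, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not Gatsby's implementation: `ChangeEvent`, `Site`, `renderPage`, `rebuildAll`, and `applyEvent` are hypothetical names, and `renderCount` exists only to show how much work each model does.

```typescript
// Hypothetical change event: a page's content was created/edited, or removed.
type ChangeEvent =
  | { kind: "upsert"; slug: string; title: string }
  | { kind: "delete"; slug: string };

// The derived view: page slug -> rendered HTML.
type Site = Record<string, string>;

// Counts renders so the cost of each model is visible.
let renderCount = 0;

// Stand-in for an expensive page build (an assumption, not a real renderer).
function renderPage(slug: string, title: string): string {
  renderCount += 1;
  return "<h1>" + title + "</h1><!-- /" + slug + " -->";
}

// Batch model: replay the full event log, then render every surviving page.
function rebuildAll(log: ChangeEvent[]): Site {
  const titles: Record<string, string> = {};
  log.forEach((event) => {
    if (event.kind === "upsert") {
      titles[event.slug] = event.title;
    } else {
      delete titles[event.slug];
    }
  });
  const site: Site = {};
  Object.keys(titles).forEach((slug) => {
    site[slug] = renderPage(slug, titles[slug]);
  });
  return site;
}

// Streaming model: fold a single event into the existing view, in place.
function applyEvent(site: Site, event: ChangeEvent): void {
  if (event.kind === "upsert") {
    site[event.slug] = renderPage(event.slug, event.title);
  } else {
    delete site[event.slug];
  }
}
```

With three pages already built, a single edit costs one render through `applyEvent` and three through `rebuildAll`, yet both end with the same view. That gap grows with the size of the site, which is the point being made about batch rebuilds.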
So static sites are kind of like the batch processing model,
and then database-driven sites are sort of the
generate-on-the-fly, cache-the-expensive-things model,
so that you're not overwhelming things.
So those are kind of like the two models on the web.
I was like, well, why isn't there also an opening for sort of a stream processing model
where, instead of rebuilding the whole thing every so often, you develop a model
to very lightly update stuff on the fly? Because then you get the benefits of caching:
the view is up to date and cached, and it's cheap to generate.
But you also get the benefits of real-time.
You're not delayed by an hour or a day or whatever the arbitrary...
You're not batching.
Yeah, the arbitrary batch update time is.
That's what incremental builds is.
The idea is that you can blend the models of dynamically generated websites
that are database-driven with the pre-rendering, pre-build, static delivery model. But
you have a build process, which is kind of patterned after these stream processing ideas,
that can very quickly, very cheaply update the entire website every time a new change
event comes in. So it's like, you know, we get a webhook from a CMS saying, hey, something changed.
We take that data and we figure out using Gatsby's data engine which pages need to be
rebuilt.
And we rebuild those five pages or whatever it may be and then push those out to the CDN.
And that process takes seconds with the new incremental builds we launched.
Firebase Admin Panel that lets you monitor key KPIs. Maybe even the tool your data science team
hacked together so they could provide custom ad spend analytics. Now, these are tools you need,
so you build them. And that makes sense. But the question is, could you have built them in less
time, with less effort, and less overhead and maintenance required? And the answer to that
question is, yes. That's where Retool comes in. Rohan Chopra, Engineering Director at
DoorDash has this to say about Retool, quote, the tools we've been able to quickly build with Retool
have allowed us to empower and scale our local operators, all while reducing the dependency
on engineering, end quote. Now, the internal tooling process at DoorDash was bogged down
with manual data entry, missed handoffs, and long turnaround times. And after integrating Retool, DoorDash was able to cut the engineering time required
to build tools by a factor of 10x and eliminate the error-prone manual processes that plague
their workflows.
They were able to empower backend engineers who wouldn't otherwise be able to build frontends
from scratch.
And these engineers were able to build fully functional apps in Retool in hours, not days
or weeks.
Your next step is to try it free at retool.com
slash changelog. Again, retool.com slash changelog.
So incremental builds was really part of your vision all the way back when you started it. You mentioned to The New Stack that this has kind of been the vision. Perhaps even
your early pitches to investors were like, look, I need to build this engine. The obvious question
for me, somebody who's not having to build it but who's just been waiting for it, is what's taking so long?
Why is it so complicated?
That's a great question.
Yeah, so this is definitely part of the vision from the get-go.
What's taking so long?
Yeah, it's a super fair question.
I mean, we certainly didn't anticipate it taking this long.
I'd say it's like a couple things.
It is like a super hard technical problem, which as we dove in deeper and deeper to the problem, it's just
remarkable how many things can trip up fast builds.
It's like we just kind of approach the problem with different types of source data and different
types of pages, et cetera, et cetera.
And it's just been one problem after another.
Can you give us a
peek into that world? Like, I've solved technical problems, never anything that's
taken multiple years. As I've dove into technical problems, I've seen the layers unpack, and
you turn a corner and here's a brand new thing you never considered, and actually it
has implications for these seven other things you thought you had in the can.
Yeah.
Can you give an example? I mean, I imagine the most straightforward problematic case of an incremental build, in my
opinion as a person who hasn't had to solve this problem, is: I have all these pages, they have
dependencies on certain data, right? I update a footer and it has to update these
75,000 pages or something. And so knowing that all these pages share this footer data,
but if I change this one topic tag,
actually that's only seven pages.
The dependency graph, I can see where that's complicated,
but what else is there?
Are there other dragons that you have to defeat
as you go down this pit?
Yeah, the dependency graph is the largest part
because it's just making sure that you can track
the implication of every change
and then identify the shortest amount of stuff
that needs to be done.
So last summer, for example,
an engineer spent two months refactoring
how we associate data with pages.
Because before that, there was some data that was just affecting most pages or all pages,
which meant that any data change made a trivial change to all pages, which, of course, breaks incremental builds,
because you need to be able to say,
only change these seven files.
But that was just sort of baked into a whole bunch of different places.
And so kind of reversing that and kind of getting the DAG clean
took a while.
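The footer-versus-topic-tag case above can be sketched as a lookup from data nodes to the pages that read them. This is a simplified illustration, not Gatsby's internal data structure: `DependencyGraph`, `recordDependency`, `pagesToRebuild`, and the node ids are all hypothetical.

```typescript
// Maps each data node id to the page paths that read it (hypothetical shape).
type DependencyGraph = Record<string, string[]>;

// Record that a page depends on a data node, ignoring duplicates.
function recordDependency(graph: DependencyGraph, nodeId: string, pagePath: string): void {
  const pages = graph[nodeId] || (graph[nodeId] = []);
  if (pages.indexOf(pagePath) === -1) {
    pages.push(pagePath);
  }
}

// Given the ids of changed nodes, return the deduplicated, sorted list of
// pages that must be rebuilt. Pages that read none of the nodes are skipped.
function pagesToRebuild(graph: DependencyGraph, changedNodeIds: string[]): string[] {
  const dirty: Record<string, boolean> = {};
  changedNodeIds.forEach((nodeId) => {
    (graph[nodeId] || []).forEach((pagePath) => {
      dirty[pagePath] = true;
    });
  });
  return Object.keys(dirty).sort();
}
```

If every page records a dependency on a `"footer"` node, changing it marks every page dirty, while changing a `"tag:gatsby"` node marks only the pages that read that tag. The refactoring described above amounts to making sure data that only a few pages use is not recorded as a dependency of all of them.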
Also, it's like Gatsby builds do a ton of stuff.
We source data from n number of
places, and it goes into this data transformation pipeline. And so Markdown's
getting converted to HTML, and images are getting processed, and a variety of
other possible transformations happen. And so we're also taking the data that's
generated from all these different different sourcing and transformation steps
and then generating a GraphQL schema.
And then with that GraphQL schema, we then run queries.
And so there's just a lot of moving pieces.
One of the biggest challenges that the team has faced
this last while is helping all these pieces coordinate with each other so that it's a smooth
deterministic process for like, boink, boink, boink, hits this, and so forth.
One of the biggest ones we've done is we've been steadily moving
more and more Gatsby internals to XState, which,
as a state machine, gives us a lot more predictability
around coordinating all these different moving parts.
And so that's proved really helpful,
both to just visualize what's going on.
Because if you have seven different subsystems
all talking to each other and interacting with each other,
the XState Visualizer is very helpful for that.
Then also just lock down stuff.
And so we had a bunch of kind of event emitter type things
that were going on before. And so
something would finish and be like,
I'm done. And then, you know, three other
things might do it. And so you kind of get these
loops going where it would do it multiple
times. Anyways, it's just
a lot of inefficiencies and
bugs and race conditions that would pop
out of that. And a lot of that, you know, didn't
show up with regular builds.
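As a rough illustration of why an explicit state machine is more predictable than loosely coupled event emitters, here is a hand-rolled sketch in TypeScript. It deliberately does not use XState's API, and the stage and event names (`sourcing`, `SOURCING_DONE`, and so on) are assumptions loosely based on the build steps mentioned in the conversation.

```typescript
// Hypothetical build stages and the events that move between them.
type BuildState = "idle" | "sourcing" | "buildingSchema" | "runningQueries" | "writingPages";
type BuildEvent =
  | "WEBHOOK_RECEIVED"
  | "SOURCING_DONE"
  | "SCHEMA_DONE"
  | "QUERIES_DONE"
  | "PAGES_WRITTEN";

// Every legal transition is listed once; anything not listed is ignored.
const transitions: Record<BuildState, Partial<Record<BuildEvent, BuildState>>> = {
  idle: { WEBHOOK_RECEIVED: "sourcing" },
  sourcing: { SOURCING_DONE: "buildingSchema" },
  buildingSchema: { SCHEMA_DONE: "runningQueries" },
  runningQueries: { QUERIES_DONE: "writingPages" },
  writingPages: { PAGES_WRITTEN: "idle" },
};

// Pure transition function: returns the next state, or the same state when
// the event is not valid in the current state.
function transition(state: BuildState, event: BuildEvent): BuildState {
  const next = transitions[state][event];
  return next === undefined ? state : next;
}
```

With plain event emitters, a stray or repeated "I'm done" signal can kick off the same work several times. Here a duplicate `SOURCING_DONE` arriving while the machine is already in `buildingSchema` leaves the state unchanged, so the schema step is not triggered twice.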
That was part of the problem, too,
is that all this stuff in a regular build,
it's much more just like, vroom, go through.
But then with incremental builds,
you're now keeping the cache around.
And so there's a lot of things that are already there.
And so it's like all the random kind of side effects
that might be popping around that add extra time.
They don't mean as much in the context of a five-minute build.
But when you're trying to get incremental build times
like under 10 seconds or under five seconds.
Was that your threshold?
Did you set out with a threshold of time?
Yeah.
And said, we've got to maintain this threshold?
Yeah, 10 seconds from content person pressing save in the CMS
to it being live on the website
is sort of our kind of gold standard for what we're doing.
So I'll ask you to zoom out again, because I think you and I might share some context that the
listener might not, with regards to Gatsby's data engine, the GraphQL-based data engine, and
how it allows you to slurp in data from all these different sources.
I understand that.
You certainly understand that.
Maybe explain that so that everybody understands why some of these things that you're explaining
might make more sense.
Yeah, yeah, definitely.
Yeah, that's a great question.
So kind of going back to, or as I mentioned earlier, that websites are moving from a monolithic
CMS to sort of like a distributed CMS, or we like to call it like a
content mesh that, you know, previously, you know, your front end was kind of like directly tied to
your backend. So if you're writing PHP, or you're writing, sorry, if you're working in like WordPress
or Drupal, you know, you have sort of like a built in data access layer, you know, you don't have to
like do anything special. The data just shows up in your templates.
But once you go distributed, the question is,
well, how do you get the data from wherever it is
to your website in the right format you want
into your templates and then off to the user?
And do that consistently and quickly
and it's not hard to set up.
So anyway, the solution that we've arrived at
is this idea of source plugins.
And so anything that has an API,
you can write a source plugin for.
And so I think there's like 400 plus source plugins now
for Gatsby, which includes all the normal CMS suspects,
but a lot of things that you wouldn't even really think of
as a CMS or a source for data,
but actually in practice turn out to be quite useful.
Like Airtable, Google Spreadsheets.
Greenhouse for application tracking systems
is one of our most popular source plugins, actually.
Because a lot of companies build their websites with Gatsby
and then they have a careers page.
And then whenever they add a new job to it,
it just automatically shows up on the careers page.
And when they close the job, it disappears.
So it normalizes all these different data sources.
And they could be a file system, it could be a database,
it could be an API, it could be Airtable.
As long as they have a source plugin,
or you can write your own source plugin for whatever data source,
Gatsby can slurp that up and use it on the other side.
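The normalization idea can be sketched like this in TypeScript. It is a simplified stand-in and not Gatsby's actual plugin API: `ContentNode`, `SourcePlugin`, `jobsPlugin`, `sheetPlugin`, `collectNodes`, and `nodesOfType` are hypothetical, and real sources would be fetched asynchronously over an API rather than passed in as arrays.

```typescript
// A normalized piece of data. Real Gatsby nodes carry more fields than this.
interface ContentNode {
  id: string;
  type: string;
  fields: Record<string, string>;
}

// Hypothetical plugin contract: each source turns its own records into nodes.
interface SourcePlugin {
  name: string;
  sourceNodes(): ContentNode[];
}

// Stand-in for a job-board source such as an applicant tracking system.
function jobsPlugin(openings: { id: number; title: string }[]): SourcePlugin {
  return {
    name: "hypothetical-jobs-source",
    sourceNodes: () =>
      openings.map((job) => ({
        id: "job-" + job.id,
        type: "JobOpening",
        fields: { title: job.title },
      })),
  };
}

// Stand-in for a spreadsheet source: each row is a [name, value] pair.
function sheetPlugin(rows: string[][]): SourcePlugin {
  return {
    name: "hypothetical-sheet-source",
    sourceNodes: () =>
      rows.map((row, index) => ({
        id: "row-" + index,
        type: "SheetRow",
        fields: { name: row[0], value: row[1] },
      })),
  };
}

// Gather nodes from every plugin into one uniform collection.
function collectNodes(plugins: SourcePlugin[]): ContentNode[] {
  let nodes: ContentNode[] = [];
  plugins.forEach((plugin) => {
    nodes = nodes.concat(plugin.sourceNodes());
  });
  return nodes;
}

// Templates can then ask for data by type without knowing where it came from.
function nodesOfType(nodes: ContentNode[], type: string): ContentNode[] {
  return nodes.filter((node) => node.type === type);
}
```

A careers page would call something like `nodesOfType(nodes, "JobOpening")`. When a job is closed in the upstream system, the next sourcing pass simply returns no node for it, which is the "it disappears" behavior described above.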
Yeah, exactly, yeah.
We've seen larger sites that have a million-plus nodes, which is what we call each individual piece of data that comes in.
And so tracking all that data and tracking the dependency graph from the data to pages gets to be pretty complicated. And that's where a lot of the engineering work has gone into,
is making that all efficient and scalable and deterministic
and very, very fast.
So that does help understand why it has taken
a significant amount of time.
Probably your investors want to know,
when is this going to be out there?
How much do we have to spend to get this thing built?
The other question that your users probably are wondering,
and I've seen a few of them wonder on Twitter
in reaction to this,
and the question that I think plagues most businesses
that have an open source project at their core
or as a major component is,
this is part of your cloud offering, this
is a paid thing, this is not part of the open source project, what's up with that?
And so were there technical reasons?
I mean, the explanation to me, it seems like there probably are, but are there technical
reasons why incremental builds is cloud-based, or are there only business reasons?
It's primarily technical.
It's like, you know, to make this work right, it requires tight coordination
between the CI service and distributed workers
because we've parallelized a lot of work across functions,
plus deploying to the CDNs.
To get the demo video that I put into the launch blog post
and the tweet, it had me clicking in Contentful.
It had a three-second build,
and then it's immediately live on the internet.
And that sort of extreme real-time build speed
is only possible with very tight coordination
and purpose-built infrastructure
that is deliberately created around the Gatsby build process.
Because the infrastructure is running the build.
Yeah.
And so that infrastructure is part and parcel, I guess,
of our offering.
It's purpose-built for that, and it's super valuable.
But all the other kind of pieces,
it's like, to do incremental builds,
Gatsby has actually been, like, has been incremental
in the sense that we, like, cache stuff between builds, you know?
So, like, if you have the cache there, it doesn't redo a lot of the work for, you know,
since V1, you know, so for, like, the last, like, three years.
But the problem is that, in most places that people do Gatsby builds (there's
hundreds of thousands of Gatsby sites out there, running
in all sorts of different places),
CI services can retain cache between builds,
but it's often an extra step. And then they
don't do it very efficiently.
Most of them tar up your cache and throw it off to S3
and then restore it. And so it's fine.
It works, and you can easily achieve Gatsby build speeds of 45 seconds to two
minutes for many sites. But to get to five seconds requires something deliberately
built for Gatsby.
Gotcha.
And, you know, when we raised money, we knew that this was possible
and we knew that no one else was going to build it.
Most people that do CI stuff, it's kind of generic services
that are meant to work with a wide range of software.
And we knew that would be really valuable for the web,
that there's something like this that exists,
that balances the dynamic nature of database
driven sites with the ease of operation, security, and speed of static sites. And yeah, so
we thought that this approach made a lot of sense and would make a lot of people happy.
So along the way you have published some experimental optimizations in the open source package.
It looks like this is a flag you have to turn on.
Is this what you're talking about when you say it does do incremental builds?
Maybe you call it smart builds.
It caches things it can in the CLI or in the build tool.
Is that what you're referring to,
the experimental page build optimizations for incremental data changes?
Yeah, so there was a PR that a community member did recently
and our OSS team worked with.
It's a company that's doing a larger Gatsby site
and so they were like, hey, let's increase the build speed.
And so they added a PR to cache HTML builds.
So that was an example of, yeah,
of all the things that Gatsby does in the build process,
that was one of the things that wasn't being cached. And then with that, you know, it reduces
the amount of work. And that was awesome. And that's, that's experimental right now behind a
flag. We'll probably make that, you know, the default in V3. But even before that, like we've
cached a lot of the other parts. So for example, you know, transformation of data. So like Markdown
to HTML, for example, is fairly computationally expensive.
And image transformation is very computationally expensive.
Running queries is only kind of expensive, but at scale it gets expensive.
And so those three things have always been cached. When we launched Gatsby builds in January, we launched and we had benchmarks that show that Gatsby builds was faster.
This was like pre-incremental builds,
but it was already faster by like 10, 20x over other services.
And it was largely just because we kept the cache around on disk
so that you would get a container and it would run your build.
And the next time there's a build, it would just run it again, right into the
same container.
So there was no waiting for the cache to come from somewhere.
It would just start up.
The cache was fully there.
And because of that, yeah, it was just dramatically faster.
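The caching of expensive transformations mentioned above can be sketched as a lookup keyed by the transformer name plus its input. This is a toy illustration in TypeScript, not Gatsby's cache: `TransformCache`, `createCache`, `cachedTransform`, and `toyMarkdownToHtml` are hypothetical, and a real system would likely key on a content digest and persist entries to disk so they survive between builds.

```typescript
// In-memory cache with counters so hits and misses can be observed.
interface TransformCache {
  entries: Record<string, string>;
  hits: number;
  misses: number;
}

function createCache(): TransformCache {
  return { entries: {}, hits: 0, misses: 0 };
}

// Run `transform` only when this exact input has not been seen before.
// Assumption: the key is the raw input; a real cache would use a digest.
function cachedTransform(
  cache: TransformCache,
  transformerName: string,
  input: string,
  transform: (input: string) => string
): string {
  const key = transformerName + ":" + input;
  if (Object.prototype.hasOwnProperty.call(cache.entries, key)) {
    cache.hits += 1;
    return cache.entries[key];
  }
  cache.misses += 1;
  const output = transform(input);
  cache.entries[key] = output;
  return output;
}

// Toy stand-in for an expensive Markdown-to-HTML step.
// It only understands "# " headings and plain paragraphs.
function toyMarkdownToHtml(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) =>
      line.indexOf("# ") === 0 ? "<h1>" + line.slice(2) + "</h1>" : "<p>" + line + "</p>"
    )
    .join("\n");
}
```

The speed difference described here comes less from the lookup itself than from where the cache lives. If the entries are already on local disk when the next build starts, nothing has to be downloaded and unpacked first.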
Which is just the difference between a purpose built tool and a generic tool, right?
Yeah.
Because on a generic CI, who's going to do your build for you, you get a container.
It's going to do all the same things,
only that container is probably going to go away between builds
or it doesn't know the Gatsby bits
or it's not aware of the Gatsby bits.
Exactly, yeah.
And because we maintain Gatsby and Gatsby Cloud,
we can continue to do optimizations on both sides
to kind of get to that point where people consistently see
sub-10 second build times.
So we had Frank Karlitschek on the show a couple of months back.
He's from Nextcloud.
I'm not sure if you've heard of Nextcloud.
But they started off as ownCloud, which is personal cloud software.
And they were open core.
ownCloud was open core.
And they ran into nonstop issues of determining
what goes into the
open source edition and what goes into the enterprise
edition or whatever it was called.
And they butted up against their community in many
places because there was such an enthusiastic
community, which I think Gatsby has as well,
that their community would implement things
and it would actually go against their own business
model. He actually started a brand new
company called NextCloud,
which he calls 100% open source, not open core.
It's interesting with tools like these where you kind of have an open source project.
It's not open core, but it kind of is in a sense
because you have a SaaS that's basically providing features
or a build tool, which is kind of a SaaS.
And so I'm wondering how often you run into these similar problems.
Here's a situation where your community members
built something cool, and it's like,
hey, let's slurp that up and make that part of Gatsby.
But have you struggled with decision making
around where do we monetize,
where do we just give things away
and how do I strike that balance
without alienating people or making my investors mad?
This is a classic problem in open source
for open source commercial companies.
Open source is not cheap to create.
We're spending three plus million dollars a year
paying engineers, designers, etc.
to work on Gatsby.
It's a very complicated piece of software already
and it's just getting ever more complicated.
We have investors who expect us to make money.
We want to turn this into a sustainable business that
can keep investing more and more in Gatsby and the overall experience
for building web applications and sites.
The model we've arrived at is
called Open Cloud.
It's different than OpenCore, because OpenCore came from more the era where
the expectation was you download the software
and run it on your own servers. And with kind of the
rise of more kind of like cloud services type things,
a lot of open source companies are kind of turning to a model where
they kind of develop the capabilities of being the best in the
world at running the open source software and kind of providing
various sort of cloud services around it that
people need. And kind of a really great example of this that not a lot of people
think about is GitHub,
because GitHub is a commercial open source company. It took Git, which is a phenomenal
open source project, and then they added hosting. They're super good at hosting very large Git
repositories. You never even think about it. You just like Git push and pull and just, well,
except for the outage yesterday and last week. Anyways, it's okay. I was going to say flawlessly, but well, maybe not.
Anyways, they're pretty good, though, all the same.
And they offered the hosting, the infrastructure,
because it's sort of annoying to set up a VM or something
and pop Git on it and keep it secure and do backups
and all the other crap that you have to do to run Git.
And they added collaboration.
They invented the whole pull request review cycle,
which revolutionized how we do software.
And I think that model of providing infrastructure
for open source and providing collaboration
around the open source is the model
that balances the freedom
and hackability aspects of open source that people love
with the need to create a sustainable business model.
Because a cloud service is just not something that any user...
Because the problem with open source is when the commercial entity
has similar ambitions and goals and capabilities
as the user. Because then there's that conflict.
Because, well, why would I buy your silly little add-on
when I can just write it myself?
It's kind of silly.
But in GitHub's case, do that many people really want
to run their own Git server and learn how to do backups
and security and everything else?
There's some, but it's an infinitesimally small percentage.
And the same thing, does an agency that's
building Gatsby sites, do they want to set up
a Kubernetes cluster to run
Gatsby builds and monitor it and
all that jazz? And there's a handful,
but again, it's an infinitesimally small
percentage of that.
And also, it's just like by combining
our deep expertise in Gatsby
and the deep expertise and capabilities we're gaining in running cloud services,
it's a very powerful combination that we can continue to optimize both together
and provide a really, really superior service to people that are using Gatsby.
Yeah, and then the collaboration bits,
there's all sorts of stuff that we can do
that once we're tied into your kind of development and content workflows, you know, there's a ton of things that
we can do to make the experience phenomenal, you know, not just for the developers, but the designers
and content people and marketers, etc. And, you know, that's the sort of thing that we think is
a great balance for everyone. Because Gatsby, the open source bits are 100% open source.
We're not holding anything out of it.
But how we do the builds is kind of intimately tied into our cloud service and optimized
so that everything is just humming along.
Like the connections to,
like when you set up a CMS,
we manage that for you so that it's sending
the right webhooks with the right, you know, because
a lot of these services, they'll let you embed information into the webhooks. So we do everything to optimize what they send and how we pull the latest data, so it's just this big
complicated chain of events that has to happen,
you know, to have really fast builds. And we optimize all that for you. And we develop a lot
of, you know, again, sophistication and capability around that,
which no individual developer
working on a web project with Gatsby would ever do.
And anyway, it's just sort of like,
it's like we develop this capability
and we can offer it at scale
to anybody who wants to use Gatsby
for a very reasonable price.
And then anyone using Gatsby
can hack at it all they want
and do whatever they want with it.
We think that's a really great balance for people.
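To make the webhook part of that chain concrete, here is a minimal sketch under stated assumptions. The `ContentWebhook` payload shape, the `DependencyMap`, and `pagesToRebuild` are hypothetical and do not describe Gatsby Cloud's actual implementation or any particular CMS. The sketch only shows why a webhook that names the changed entry is useful: the build can limit itself to the pages that read that entry.

```typescript
// Hypothetical shapes: real CMS webhooks differ per vendor.
interface ContentWebhook {
  entryId: string;
  action: "update" | "delete";
}

// Which pages read which CMS entries, as recorded during a previous build.
type DependencyMap = Map<string, Set<string>>; // entryId -> page paths

function pagesToRebuild(hook: ContentWebhook, deps: DependencyMap): string[] {
  const pages = deps.get(hook.entryId);
  // Unknown entries touch no existing pages in this simplified model.
  return pages ? Array.from(pages).sort() : [];
}

const deps: DependencyMap = new Map<string, Set<string>>([
  ["post-1", new Set(["/blog/post-1/", "/blog/"])],
  ["post-2", new Set(["/blog/post-2/", "/blog/"])],
]);

// Only the post page and the blog index are marked dirty; /blog/post-2/ is left alone.
const dirty = pagesToRebuild({ entryId: "post-1", action: "update" }, deps);
```

A webhook that only says "something changed" would force the build to re-fetch and re-check everything, which is part of why the contents of the webhook matter for build speed.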
I'm Jerod Santo, JS Party's producer
and one of nine regular voices you'll hear on the show.
We are a party-themed podcast
so fun is at the heart of every episode.
One way we keep things fun is by mixing it up and trying new things. We play games like JS Jeopardy. This
gives you access to an outer function scope from inside an inner function. Oh, I think that,
never mind. Global scope? Incorrect. Yeah, I didn't think so. Debate hot topics like,
should websites work without JS?
I'm going to appeal to authority and read some quotes at this time.
Okay.
I've lost complete control of this panel.
Go ahead, Ross.
The first quote, no code is faster than code.
Discuss and analyze the news.
Yeah, this reminds me of when you're playing Pokemon
and you have like, you know, an electric Pokemon versus a water Pokemon and you try like an attack.
Share wisdom we've collected over the years.
To be honest, a lot of what we rely on is pretty garbage.
And like, I mean, I wrote some of it, so it's OK.
Like, I can say this.
Interview amazing devs like John Resig and
Amelia Wattenberger and a whole lot more. Oh, and did I mention we record the show live? We do.
You can be part of the hijinks each and every Thursday at changelog.com slash live.
This is JS party. Please listen to a recent episode that piques your interest and subscribe
today. We'd love to have you with us.
So I like the GitHub comparison.
I think what I'm learning over time as I talk to all these people
who are in open source and in software
and trying to figure out the balance
and what works for them is
it just is not a one-size-fits-all solution
to this problem, because there's so much nuance. When you say open source, even, that doesn't
even mean we all share the same license. That doesn't really mean much at all. And so you have
to drill down and you have to say, well, this circumstance is actually slightly different
because of the nature of their customers, and this one's different because of that. So I think that's
just fascinating to talk to people like yourself
who've figured out what's working
for your open source project and your business
and where to optimize for your customers and your users.
GitHub's an interesting point.
You mentioned they took Git and they added hosting.
What's interesting about Gatsby
is you've taken Gatsby and you've added building,
but y'all aren't hosting.
Right.
Is that just for now?
Is that because you don't want to run a CDN?
I'm just curious.
It seems like an obvious next step.
Yeah, we're definitely going to continue
to tighten our integrations with CDNs.
But what's interesting is CDNs are phenomenally complicated
to build and run.
So most people who say they do hosting
are actually outsourcing it to somebody else.
Like Firebase even doesn't do their own hosting.
As far as I know, Fastly is running their hosting.
And the company formerly known as Zeit is using Cloudflare.
Netlify is, I think, the only one that actually has their own CDN.
But even then, they're relying on an open source project like Traffic Server, I believe,
out of Yahoo.
Sure, but from a customer's perspective, the buck stops with Netlify, right?
Right, absolutely, yeah.
You're not going to turn to their partners and say, hey, you're going to say, my website's down.
That's their problem.
So they might be white labeling.
I mean, Heroku was built on AWS, right?
Yeah.
So a fair point.
So you could do that.
You could say, well, we've got a great partner.
We can be on top of Netlify,
or we can be on top of AWS and be Gatsby hosting
without the complication.
Yeah, yeah.
So our plans roughly are,
we think that owning your own infrastructure and having control there
is something that a lot of people want.
So our goal is to have, just like we have,
you can use any CMS and we treat them all kind of the same.
You can also deploy from Gatsby Cloud to any CDN.
So right now we deploy to five different kind of deploy targets,
what we call them.
And there's like three more, I think,
that's on our short list to add.
And we think that's really important
because for a lot of teams,
they already have something set up.
They have contracts with the CDNs,
they have infrastructure in AWS or Azure or Google Cloud.
And the website is an important part of that,
but just part of that.
So we want them to choose Gatsby,
and we also want them to choose Gatsby Cloud,
and we also want them to be able to continue running
their infrastructure as they're used to.
And so we can push to an S3 bucket
or whatever it is that they want to do.
But for kind of the more,
there is a large percentage of people
that just want us to handle it.
So that is something that we're planning on doing
in the future.
But we just barely launched last November
with Gatsby Cloud at all.
And then Gatsby builds in January and now this.
And so it's kind of one thing at a time.
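The "deploy targets" idea, where the build output can be pushed to whichever CDN or bucket a team already runs, maps naturally onto a small interface. The sketch below is an assumption-laden illustration, not Gatsby Cloud's code: `DeployTarget`, `InMemoryBucket`, and `deployEverywhere` are hypothetical names, and an in-memory map stands in for something like an S3 bucket.

```typescript
// Hypothetical interface; the source does not describe Gatsby Cloud's real integration code.
interface DeployTarget {
  name: string;
  upload(files: Map<string, string>): string[]; // returns the uploaded paths
}

// In-memory stand-in for something like an S3 bucket.
class InMemoryBucket implements DeployTarget {
  name: string;
  objects = new Map<string, string>();

  constructor(name: string) {
    this.name = name;
  }

  upload(files: Map<string, string>): string[] {
    const uploaded: string[] = [];
    files.forEach((content, path) => {
      this.objects.set(path, content);
      uploaded.push(path);
    });
    return uploaded;
  }
}

// The build output is the same regardless of where it ends up.
function deployEverywhere(
  files: Map<string, string>,
  targets: DeployTarget[]
): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const target of targets) {
    summary[target.name] = target.upload(files).length;
  }
  return summary;
}

const buildOutput = new Map<string, string>([
  ["/index.html", "<h1>Home</h1>"],
  ["/about/index.html", "<h1>About</h1>"],
]);
const bucketA = new InMemoryBucket("bucket-a");
const bucketB = new InMemoryBucket("bucket-b");
const summary = deployEverywhere(buildOutput, [bucketA, bucketB]);
```

Treating every target behind one interface mirrors what is said about CMSs earlier: any of them can be plugged in and they are all treated the same way.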
Yeah, that makes sense as a not a priority,
but I don't think it makes sense as a never do.
Yeah, exactly.
What about the CMS side?
The other integration side would be,
well, what about Gatsby CMS?
Because now maybe you can integrate better,
you can write your data sources better,
or whatever, maybe you have great UX,
and you could offer that as an option
alongside Contentful and the other CMSs out there.
That question comes up a ton.
I guess my most common response to that is,
which CMS would we build if we built a CMS?
Because there isn't a one-size-fits-all CMS.
It's like if you look at the CMS market,
it's insanely fragmented.
And the reason for that is that there's just like,
the idea of managing content and data
can go so many different directions.
You know, there's everything from a Google spreadsheet
to Greenhouse for job applications.
Because like content, it's not just the content,
but it's the workflows and permissions
and everything around it that has to fit
within the context of the team
that's maintaining that content and data.
So yeah, so the question for us is always like,
well, we don't have any particular expertise in that,
and anything that we built would only fit
for some subset of the market. And so our thought is, instead of building a quote-unquote CMS, we develop
the capacity to kind of package together Gatsby plus CMSs into something that
feels like a coherent thing.
The analogy I like to use is it's like CMSs were like mainframes, you know, where back in the 70s,
60s, 70s, you know, early 80s, you'd have computer companies that would like build a computer from
scratch, you know, they would like develop a lot of the hardware, they'd write all the system
software, they'd write all the user land software. It was just like this massive operation. And then they'd like ship you like,
here's your, here's your, you know, one ton computer. And they were super expensive,
you know, like hundreds of thousands or millions of dollars, even in those years' dollars. And then
the, you know, the PC revolution happened, which was that IBM developed a spec for how
these like different pieces could fit together.
And then from that kind of common spec for how the different modular pieces talk to each other,
there were just hundreds of different PC SKUs that came out of that,
aiming all the way from high-end business uses to poor college student
to every other little niche in between.
The idea of going from monoliths to modular units
that are packaged together is really appealing to us
because it's like, why develop a solution
that just fits in some slice of the market
where we can just take Gatsby plus whatever subset of CMSs
makes sense for the use case of, you know,
the company organization that needs a website. And then work with our partners, you know, at the CMSs
to do kind of package things together really nicely. So that's like really streamlined to
start new projects to use it and manage it and so forth. And so we've already done quite a bit
around kind of our preview workflows, where, you know, you have a CMS, and then like they click preview, and like pops open a preview,
like kind of a live preview of the Gatsby site on Gatsby Cloud. And then when they click save,
you know, it sends off the webhook and does the build. And so functionally, from like someone
living in the CMS, like they don't need to know that Gatsby exists.
It's kind of like if you use WordPress today.
Does a WordPress user really know how the site is packaged and shipped?
Do they really get what's going on?
They largely don't.
They live inside the CMS,
and then how the bits of information they're entering
gets on the internet and delivered to somebody
is not of interest to them.
And so like Gatsby, it's like what we're doing with WordPress, Drupal,
all the headless CMSs, we're kind of acting in that same sort of capacity
that we're fitting in so seamlessly that you don't even know what's happening.
So like with WordPress, we have experimental support.
You click the preview button in WordPress
and it just pops open a preview,
but it's now Gatsby Cloud Preview
instead of the WordPress preview.
And you click Save and we can just show a little spinner
or something until Gatsby Cloud reports back
that now it's live on the internet.
So somebody could swap out PHP,
the PHP layer in WordPress,
and just have Gatsby.
And everyone living in WordPress would never know the difference.
Yeah, that's interesting.
So the marketing person,
they think they use WordPress
for all intents and purposes.
Yeah, and they do, yeah.
But then when they click preview,
they put their site,
they think it's on WordPress,
but it's actually on Gatsby.
It's on both.
Yeah, exactly.
It's just a different tool to deliver it.
Well, you're definitely right that there's no one-size-fits-all CMS because publishing is such a personal
or a localized workflow.
We've built our own platform for podcasting for years
because we were on Tumblr
and then back when Tumblr was where you had to be
and then we were on WordPress
and yes, you can use WordPress for publishing podcasts
and yet it didn't fit into our workflow.
It didn't work the way we wanted to work
and so as developers we're like,
well, we'll build our own platform
and now we can tweak and customize it
to work the way that we work
because even that, even amongst podcasts,
like our site's open source,
people use it all the time to read
and check out what an open source Elixir or Phoenix app looks like,
but they don't use it to build their own podcasting platform
because it's for changelog.
It's open source because of course,
but it's not a generic platform.
It's a customized platform because we wanted customization.
And so even amongst podcasters,
you wouldn't find our CMS probably all that useful
unless you're doing podcasts the way that we do podcasts.
And so even amongst, you could say,
well, we would do a podcast CMS.
Yeah, you can hit the 80%,
but it would be hard to hit the 80%
in all these different areas, right?
News, podcasting, e-commerce, etc.
I mean, blogging or generic.
A WordPress competitor would be the most obvious choice
if you had to say what kind of CMS would we build.
It would be like, well, pages and posts and images.
But what fun is that?
Yeah.
But WordPress and Ghost already exist, so, like...
Right.
And what fun is it to have another one of those?
Yeah, yeah. I mean, there's decades, literally decades of effort going into making them awesome. So we're like, let's just
keep using WordPress. Don't stop using WordPress. Just swap out the front end
so you don't have to worry about hosting or security or updating plugins, etc.
Because WordPress, everyone agrees that CMS is amazing,
but the operation of the website and the development experience
and the ease of pulling in other services and the ease of building application-type stuff,
everyone quite agrees that that's not so pleasant.
I'd actually argue that there's not enough CMSs.
Kind of like your experience, there's a lot of value in building
something specific to your needs.
One of the coolest conversations I've had with a Gatsby user is
this little two-person agency in the Netherlands.
They started using Gatsby with a really rudimentary CMS
that they built for political activist websites.
So it's a lot of political activism,
a lot of campaign websites,
just everything in that genre.
And they built this CMS that's super specific
to that kind of world.
And then they built a Gatsby theme that kind of encompasses
the maybe five different faces or something
that these different websites can go.
And he was just almost cackling on the website.
He's like, I can build a site in minutes
because he just goes to Netlify and tweaks some environment variables and
points at an instance of the
CMS that he built and voila,
off it goes.
It wasn't hard at all for him to build that
because
when you decouple
all the different aspects of the traditional CMS,
all of a sudden building
the admin part and the workflows in there
becomes a much more approachable task
because you're not worrying about,
how do we actually build the website
and how do we run it and deploy and all that?
Because those are super complicated.
But Gatsby can handle that
and then you can just plug in your data
from whatever it is.
Yeah, you can have a Cambrian explosion
of small, focused CMSs
because all you're building out is the workflows
that are custom to that particular vertical
and you can make it super for them
and you don't have to worry about the presentation layer,
like all that stuff.
Make a Gatsby plugin, source plugin,
and you could have a separate team
even working on the actual website and all that.
That's interesting possibilities.
Yeah, so I see kind of a world where the average number
of content sources in a Gatsby site just keeps going up and up.
Because each team that's responsible for managing
some part of the website, they can have something
very specific to their particular ways of managing the data.
So it's like you have a store locator page on the site.
And there's a team that's maintaining that
and updating information about location
and a new one's opening up.
And so they're like, new store just opened
and geotracking, whatever.
I mean, all the information that they need
can be locked down in some CMS instance
and they don't have to have access to anything else.
They don't have to get distracted or weirded out
by the hundred other content types that other people need
for managing other parts of the website.
And it can just be super simple and straightforward
to do their jobs.
Yeah, and I think that's really nice,
because if you talk to anybody who works on websites,
most people do not enjoy it.
And I think there's just a lot of incidental complexity
that comes from mashing everything together in one system.
That doesn't need to be that way.
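The store locator example can be sketched as a tiny source plugin. Gatsby source plugins implement a `sourceNodes` function that receives helpers such as `actions.createNode`, `createNodeId`, and `createContentDigest`; the version below declares simplified local stand-ins for those helpers rather than importing Gatsby's own types, so treat the interfaces as approximations. The `stores` data and the `StoreLocation` type name are hypothetical.

```typescript
// Simplified local stand-ins for the helpers Gatsby passes to `sourceNodes`.
interface NodeInput {
  id: string;
  internal: { type: string; contentDigest: string };
  [field: string]: unknown;
}

interface SourceNodesArgs {
  actions: { createNode: (node: NodeInput) => void };
  createNodeId: (input: string) => string;
  createContentDigest: (input: unknown) => string;
}

interface Store {
  slug: string;
  city: string;
}

// Hypothetical data that a store locator team might manage in its own small CMS.
const stores: Store[] = [
  { slug: "amsterdam", city: "Amsterdam" },
  { slug: "utrecht", city: "Utrecht" },
];

// One node per store; the rest of the site can then query these nodes.
function sourceNodes({ actions, createNodeId, createContentDigest }: SourceNodesArgs): void {
  for (const store of stores) {
    actions.createNode({
      ...store,
      id: createNodeId(`store-${store.slug}`),
      internal: {
        type: "StoreLocation",
        contentDigest: createContentDigest(store),
      },
    });
  }
}

// Exercise the function with fake helpers instead of a real Gatsby build.
const created: NodeInput[] = [];
sourceNodes({
  actions: { createNode: (node) => { created.push(node); } },
  createNodeId: (input) => `id-${input}`,
  createContentDigest: (input) => JSON.stringify(input),
});
```

Because the team's CMS only has to expose store data, the admin side stays small, and the website team consumes the resulting nodes without needing access to that CMS.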
We're getting short on time, but I did want to ask you about Gatsby Recipes
before we call it a show.
Speaking of launches, man, you've been launching.
You launched this one a couple days ago, it seems like.
Yeah, just last Thursday, yeah.
A passion project of yours, something you're interested in.
Tell us about Gatsby Recipes.
Yeah, so Gatsby Recipes is sort of like
a new automation framework that we're adding to Gatsby.
Basically, it can kind of take a lot of routine tasks
that you need to do with a website
and you can pull those things into a recipe.
And a recipe is like an MDX file, which maybe at first blush sounds really weird,
but it's like Markdown and React components.
Yeah, it does.
Yeah. We wanted, though, to follow kind of a literate
programming style, where you could describe what you're doing and then tell a story.
The steps.
Yeah, so tell the story through it, because we think that's really helpful for
kind of scaffolding code. Because scaffolding is all about, like, why
is this website the way it is? It's like you chose different pieces for specific reasons.
And so if you can say, well, this is the recipe that sets up the CMS
and the components that use the data from that CMS,
then you can look at a site, like a site directory
with all these different files and config things and whatever,
and then you can pull it out and extract it into recipes
that there's some sort of coherent story around that.
So you can be like, here's my dev tools recipe
and install some NPM packages,
you know, add some dot files.
Maybe it hooks up Cypress, you know, Cypress Cloud,
you know, maybe like using Storybook.
So it configures that and it like hooks up to,
you know, kind of like cloud service for running that.
Anyways, and so it just kind of combines all those things.
But it's not just telling the story,
but it's also when you're starting a new project,
you can reuse these recipes.
And so you can just like, okay, for this project,
we want WordPress, we want Shopify,
we need a normal developer setup with TypeScript and Cypress and Jest.
And we're going to set up CircleCI.
We're going to use Gatsby Cloud, of course, to do the CICD.
And then we're going to set up Fastly for hosting.
And Azure for its functions.
I don't know.
You just throw them together, all these different things.
And then you can have recipes for each of these.
And at the start of the project, instead of going to each one
and tediously configuring it and clicking new
and grabbing API tokens, et cetera, et cetera,
you just run the stack of recipes.
And when it needs to know something,
it'll just ask you information.
And then five minutes later, you have a website.
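The "run the stack of recipes and it asks when it needs something" flow can be modeled in a few lines. This is a hypothetical model for illustration only: real Gatsby Recipes are MDX files, and the `Recipe` shape, `runStack`, and the input names below are assumptions rather than the tool's API.

```typescript
// Hypothetical model: real Gatsby Recipes are MDX files, not TypeScript objects.
interface Recipe {
  name: string;
  needs: string[]; // inputs such as API tokens
  steps: string[]; // human-readable descriptions of what gets set up
}

type Ask = (question: string) => string;

function runStack(recipes: Recipe[], ask: Ask): string[] {
  const answers = new Map<string, string>();
  const log: string[] = [];
  for (const recipe of recipes) {
    for (const input of recipe.needs) {
      // Ask only once, even if several recipes need the same value.
      if (!answers.has(input)) answers.set(input, ask(input));
    }
    for (const step of recipe.steps) log.push(`${recipe.name}: ${step}`);
  }
  return log;
}

// A fake prompt that records what was asked instead of reading from a terminal.
const asked: string[] = [];
const fakeAsk: Ask = (question) => {
  asked.push(question);
  return "example-value";
};

const log = runStack(
  [
    { name: "cms", needs: ["CMS_TOKEN"], steps: ["install source plugin", "add config"] },
    { name: "hosting", needs: ["CMS_TOKEN", "CDN_KEY"], steps: ["configure deploy target"] },
  ],
  fakeAsk
);
```

The log doubles as the "story" of the project setup: each line says which recipe did what, in order.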
That sounds cool.
So I'm reading through some of the code as you talk
and even your selected recipe to run in the blog post,
which will be in the show notes as well. And it looks almost like a Dockerfile.
It's like a add this, add that, do this.
Script is another word that maybe I would use, but scripting has more connotations.
Maybe this is more declarative than a script would be.
Pretty cool.
And so you run the recipe, set up, you can tweak it, and then you can just kick off a new Gatsby site.
Yeah, it takes a lot of inspiration from infrastructure-as-code projects. Like, Terraform is kind
of a prominent example of that.
Because all these tools started developing 10-ish, 15 years ago.
And the reason is because as developers started moving
towards cloud-centric models,
all of a sudden managing all your infrastructure
got a lot more complicated.
And so they developed programmatic ways of maintaining
and evolving that infrastructure.
And so with websites moving to the cloud as well, the same problem is emerging.
It's like, wow, three CMSs, a form handling API, and this and that and that, and functions.
And anyways, it just gets a lot more difficult to scaffold out a new project and to kind
of evolve it.
You're like, okay, we launched, cool.
Now we need to do this new feature.
It's like, back to the setup stuff.
And this is meant to kind of reduce that burden a lot.
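The declarative, infrastructure-as-code style mentioned here usually comes down to describing the desired state and letting a tool compute what has to change. The sketch below illustrates that general pattern only: the `Resource` union, `keyOf`, and `plan` are hypothetical and are not the API of Gatsby Recipes or Terraform.

```typescript
// Hypothetical resource model, loosely in the spirit of infrastructure-as-code tools.
type Resource =
  | { kind: "npmPackage"; name: string }
  | { kind: "file"; path: string; content: string };

const keyOf = (r: Resource): string =>
  r.kind === "npmPackage" ? `npm:${r.name}` : `file:${r.path}`;

interface Plan {
  create: string[];
  update: string[];
  unchanged: string[];
}

// Compare the desired state with the current state and report what would change.
function plan(desired: Resource[], current: Resource[]): Plan {
  const existing = new Map<string, Resource>();
  for (const r of current) existing.set(keyOf(r), r);

  const result: Plan = { create: [], update: [], unchanged: [] };
  for (const want of desired) {
    const key = keyOf(want);
    const have = existing.get(key);
    if (have === undefined) {
      result.create.push(key);
    } else if (JSON.stringify(have) !== JSON.stringify(want)) {
      // Simplification: relies on both objects listing their fields in the same order.
      result.update.push(key);
    } else {
      result.unchanged.push(key);
    }
  }
  return result;
}

const current: Resource[] = [
  { kind: "npmPackage", name: "jest" },
  { kind: "file", path: ".eslintrc", content: "{}" },
];
const desired: Resource[] = [
  { kind: "npmPackage", name: "jest" },
  { kind: "npmPackage", name: "cypress" },
  { kind: "file", path: ".eslintrc", content: '{"root":true}' },
];
const changes = plan(desired, current);
```

Running the same description again after the changes are applied would report everything as unchanged, which is what makes going "back to the setup stuff" for a new feature less of a burden.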
It's also another way, kind of another perspective is,
you know, kind of like the monolithic to distributed CMS
and kind of the mainframe to PC.
It's like when you go from a monolithic
to modular architecture, you know, the initial stages,
there's just like a lot of energy that's unlocked because previously people who are kind of locked
into the monolith and like frustrated by the limitations, all of a sudden, like this kind of
sophisticated early adopters, all of a sudden they can like start building out whatever they want.
And they're just like super excited. It's like, you know, a kid in a toy store sort of thing.
It's like, well, I can do this and this and this. And they're just like piecing things together.
And it's awesome. And that's
like the initial stages. And that's like, if you look at like kind of the homebrew computers in the
late 70s, like they were buying like processors and, you know, wiring things up. And like, there's
all these people that are super excited about it. But things don't go mainstream until you kind of
like standardize stuff a bit. And then you can like package it together into sort of a consumer-friendly thing.
Because like, you know, the average user of these things
isn't as excited about like how things work under the hood.
They just want something that, you know,
you press a button, it turns on,
and it's, you know, generally speaking,
reliable and so forth.
Right.
Yeah, so it's like headless CMSs,
you know, headless WordPress, headless Drupal.
A lot of people are really excited about this.
A ton of people over the last few years have built a lot of things.
But if you look at the percentage of their adoption in the marketplace,
it's like single digit, if that, of websites.
It's like a very, very tiny.
So there's a lot of noise.
There's a lot of excitement and noise around it.
But the actual usage is much smaller than that.
So to ask the question, of course, is,
well, how do we mainstream this?
Because we think it's awesome.
It is amazing.
It's like once you get there and you're building a Gatsby site,
it's incredible.
You're super productive.
The websites are super fast.
They're really cheap and easy to run.
That's awesome.
But it's like, how do we get lots and lots of people in that same world?
And to get that, it's packaging things up.
People can just like, cool, this is what I need to build a website.
It's sort of like the famous five-minute install for WordPress,
but for this modular world.
So Gatsby Recipe, this is part of the open source project, right?
It's not a Gatsby Cloud thing, it's right there in the CLI.
And available now.
And there's some recipes out.
Is there a place for sharing recipes?
Because that's the next step then.
Get a recipe hub out there.
Yeah, that's something that's very much on our mind.
Right now, we have some official recipes that you can kind of get direct access to.
And then people are just sharing them on, like, gists or something like that.
But a kind of a recipe hub, like you're saying, we think would be super fun.
Because, yeah, I mean, our hope is that there's like tens of thousands of these recipes.
Like anything you can imagine doing, you can just like search and like find a recipe for that.
And you're like, oh, cool.
And you can like look at like five different variants.
It's like, oh, they set up this way, they set up this way, they set up this way.
And then you can remix it
to meet the specific needs of your project.
And then agencies,
companies can develop their own
set of recipes that they use
to run stuff.
Very cool, Kyle. Well, congratulations
on finishing your long road
to incremental builds.
You can go and relax now for a little while.
Then you have to get back to work,
I'm sure, as the buzz around
the new stuff is out there.
Of course, links to everything we mentioned are in the show notes
for easy clickings. Definitely check out
Gatsby Recipes. Seems like a very cool
new thing. Get out there, try it,
share your recipes, and
definitely give Gatsby a go
if you haven't yet. It's got lots of interesting
ideas at play: React, GraphQL, all the buzzwords.
And Jamstack.
It's the year of Jamstack.
Yeah, it's coming.
So thanks, Kyle, and we'll talk to you again soon.
Yeah, thanks so much, Jerod.
All right, sound off in the comments and share what you think about the Jamstack.
Is it the future?
Is it coming?
Is it here? Are you using it? What do you think about incremental builds? What about the story Kyle
shared? Is it inspiring to you? What are you building? Hop in the comments and let us know.
Of course, you can comment on all our podcasts at changelog.com. Open your show notes and click
discuss on changelog news. We'd love to hear from you. One of the easiest ways you can support us
is by telling your friends about this show. The easiest way for podcasts to gain new listeners is
word of mouth. Not marketing, not anything else. So tell your friends, and we appreciate it. Special
thanks to the beat freak Breakmaster Cylinder for all of our beats. And of course, thank you to
our awesome sponsors and partners who get it: Fastly, Linode, and Rollbar. And as you know, we have a master feed.
This master feed brings you all of our podcasts in one single feed.
It is by far the easiest way to listen to everything we ship.
Head to changelog.com slash master to subscribe
or search for ChangeLog Master in your podcast app.
You'll find us.
Thanks for listening this week.
We'll see you next week.