The Changelog: Software Development, Open Source - Open source, not open contribution (Interview)
Episode Date: March 26, 2021
This week we're talking with Ben Johnson. Ben is known for his work on BoltDB, his work in open source, and as a freelance Go developer. In late January, when Ben open sourced his newest project, Litestream, he shared in the readme how the project was open source, but not open for contribution. His reason was to protect his mental health and the long term viability of the project. On this episode we talk with Ben about what that means, his thoughts on mental health and burnout in open source, choosing a license, and the details behind Litestream - a standalone streaming replication tool for SQLite.
Transcript
Today on The Changelog, we're talking with Ben Johnson.
Ben is known for his work on BoltDB and his work in open source as a freelance Go developer.
Late in January when Ben open sourced his newest project, Litestream, in the readme
he shared how the project was open source, but not open for contribution.
His reason was to protect his mental health and the long-term viability of the project.
On this episode, we talk with Ben about what that means, his thoughts on mental health and burnout in open source,
choosing a license, and the details behind Litestream,
a standalone streaming replication tool for SQLite.
Big thanks to our partners, Linode, Fastly, and LaunchDarkly.
We love Linode. They keep it fast and simple.
Check them out at linode.com slash changelog.
Our bandwidth is provided by Fastly.
Learn more at fastly.com.
And get your feature flags powered by LaunchDarkly.
Get a demo at LaunchDarkly.com.
Linode is simple, affordable, and accessible cloud computing that developers trust.
Linode is our cloud of choice.
We trust them, and we think you should build anything you're working on,
a fun side project or that next big
infra move at work, with Linode. The best part: you can get started on Linode with a hundred dollars
in free credit. Get all the details at linode.com slash changelog, or text changelog to 474747
and get instant access to that hundred dollars in free credit. Again, linode.com slash changelog.
Ben, you're back.
It's good to have you back on this show.
You've been on GoTime.
You've been on the Changelog, like, forever ago, basically.
But, you know, we don't want to bury the lede.
There's a new open source database out there,
Litestream, but the reason why we reached out to you was because of
I suppose the
anti-normal of open
source but closed to contributions aspect
of what you wrote there.
Let's open it up there.
Like you mentioned, it's a database.
It's actually a database tool that wraps around SQLite
to let you stream your data into the cloud, basically.
Just run, so you can run SQLite in production
and have it safely persisted.
And yeah, it got a lot of notoriety early on,
not for the actual code that I wrote,
but for the kind of the code that I'm not allowing.
And then that it's really, yeah,
there's a closed contribution policy
on the repo. And it came from just kind of some other projects I've done over the past,
like BoltDB I wrote quite a while ago. And a lot of them just kind of became,
just being a lot of maintenance and not even just like checking code and doing all that,
but just like responding and just really,
you're trying to take a lot of people's desires for what they want in the project,
and 90% of the time you have to say no,
that's not really where we're going with this,
and just trying to figure out that overhead
and trying to mitigate that.
Yeah.
What you said in the readme was,
you've said some more than this,
but I'll give the TLDR version of it.
I've made the decision to keep this project closed to contributions for my own mental health and long-term viability of the project.
Which I think will go into the deeper parts of it because you've done other open source before and you've got some scars and some history to you.
And some aspects to, I guess, what motivates you.
But what kind of feedback did you get initially from this?
Was it a lot of high fives?
Or was it a lot of like, whoa, hey, Ben, that's kind of wrong?
Or what happened?
What's the fallout so much from that line there?
Sure.
I mean, I was fully expecting people to just rag on that.
Actually, I expected people to not really even notice it
because it was buried at the bottom of a long readme at first.
And it somehow made it to Hacker News. And honestly, I would say it was probably 95% supportive,
just other people kind of saying, oh yeah, I've totally been there too. It's just a lot to take on,
taking in changes and trying to manage that. And really, I guess I really tried to distill it down
to, like, what is my goal for this project? And I think, you know, I tend to make tools that are minimal. Like I have a fixed
idea of what I want to build. So for Litestream, I want to run SQLite in production,
and anything that doesn't really support that, you know, any extra use cases are just not that
important to me. And I want to make it as simple as possible. So I didn't really necessarily want
to make the biggest project or the fanciest project. I wanted to make something that just
kind of works and works well for what I'm doing. So I didn't see external contributions really
moving the needle, you know, for that kind of thing. And actually, that being
said, I feel like there's a fascination in our industry around code and, like, contributing
code, but honestly, I feel like the code piece is such a small part of it all.
If anything, I would love to have people try it out,
test it out, submit feedback, bugs, that kind of stuff.
I feel like that is like, or even like docs changes.
I feel like that's like 90% of the project.
And then the little bits and bobs of the code are, you know, a smaller piece.
Well, that's why we wanted to talk to you really about this, because I feel like there's a lot of nuance here.
And, you know, prior to that, you said you're grateful for community involvement, bug reports. You did say those things.
But the highlight really was that you wanted to keep it closed to contributions for whatever reasons,
whether it's mental health or long-term viability, as you mentioned,
but just for whatever reason you wanted to kind of keep the code base itself
limited to your input because you had a specific scope. And I think that's where you kind of have
to have a podcast like this go into those details where it's literally Ben Johnson sharing with us
the details of why that makes sense for your project and how you can see community involvement
still taking place, but just not so much in the contribution to the code base itself.
Yeah, for sure. And actually, I can give you some good examples too recently
where some people, you know, one thing that people wanted to see was
Windows support and the code changes to make Windows support
happen were relatively small. I would say it was probably a dozen, two dozen lines
of code. But actually
I haven't run Windows in 10 years,
10, 15 years. So actually
getting in and reading the docs on how Windows services
worked and getting it up and running
and getting a VPS started that runs on Windows
doing RDP over
to that and logging in, setting stuff up
and all the packaging stuff around
MSI installers, getting
a code signing certificate
like there's just like a million things to do to actually make this really run and like have a good
developer experience that aren't just those you know 20 lines of code that were you know pushed in
So I feel like those are kind of the underappreciated parts that you really just never
see, but that's really kind of what makes the project, rather than the actual code itself. Actually, another one that came up right after that was S3-compatible stores. So
right now it pushes to S3, Amazon S3, and there was, I don't know, maybe like a four-line change to make
it work with, like, MinIO or GCS, yeah, Google Cloud Storage, a couple other cloud stores. And those
little bits aren't that hard just to kind of put a little tweak in there,
but I wanted to make sure that the experience
of getting on and trying it with those pieces
and how they integrate into the docs
and changing the getting started
to make it simpler for people to actually try it out
and going through and testing,
all those things, it's just crazy.
Everything beyond the code
that really doesn't get talked about.
But it's just hugely important.
Yeah.
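To make the "four-line change" for S3-compatible stores a little more concrete: one common difference between Amazon S3 and stores like MinIO is that the client needs a custom endpoint and path-style addressing (bucket in the URL path instead of in the hostname). The Go sketch below illustrates only that idea. It is not Litestream's code, and `StoreConfig`, `ObjectURL`, the bucket name, and the object key are all hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// StoreConfig is a hypothetical description of an S3-style object store.
// It is an illustration only, not a type from Litestream or any SDK.
type StoreConfig struct {
	Endpoint  string // base URL of the store, e.g. a self-hosted MinIO address
	Bucket    string // bucket name
	PathStyle bool   // true: bucket goes in the path; false: bucket is a subdomain
}

// ObjectURL builds the URL an object key would be addressed at under cfg.
func ObjectURL(cfg StoreConfig, key string) (string, error) {
	u, err := url.Parse(cfg.Endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("endpoint %q must include scheme and host", cfg.Endpoint)
	}
	if cfg.PathStyle {
		// Path-style: http://host/bucket/key (typical for self-hosted stores).
		u.Path = path.Join("/", cfg.Bucket, key)
	} else {
		// Virtual-hosted style: https://bucket.host/key.
		u.Host = cfg.Bucket + "." + u.Host
		u.Path = path.Join("/", key)
	}
	return u.String(), nil
}

func main() {
	// Hypothetical bucket and key names, chosen only for the example.
	configs := []StoreConfig{
		{Endpoint: "https://s3.us-east-1.amazonaws.com", Bucket: "backups"},
		{Endpoint: "http://localhost:9000", Bucket: "backups", PathStyle: true},
	}
	for _, cfg := range configs {
		objURL, err := ObjectURL(cfg, "db/wal/0001")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(objURL)
	}
}
```

The code difference really is a couple of fields, which matches the point being made here: the docs, testing, and getting-started experience around each new store are where the real work sits.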
I may be splitting hairs here, but you say
it's not open to contribution, and it sounds like
those are all contributions.
That's a really good point, actually. That's probably
some copy editing
I need to change. Those are huge
contributions. I guess code contribution is the
thing it's closed to.
Only Ben can write code, but everybody else can be. Because the question is, if I don't want
contributions, then it's like, well, why did you open source it? But it's clear why you open sourced it, because you do want
participation or key to community involvement and all these things.
It's just specifically, you're writing the code for this project.
Yeah, and I think there's a lot around the actual direction of
usability,
how you want it to feel,
how everything integrates together
that I think is easy to miss
if you're an outside contributor
just bringing an initial PR into the project.
And I think I could certainly get people up and running
and explain to them why certain things go together
or certain things work the way they do.
But again, that's just a lot of overhead that I'm not necessarily
opposed to, but is that time better spent building the product and
making it kind of gel a little better together?
And I guess from my side, I haven't gotten to that point where I need a second person to
come on and really commit code in that kind of way.
So break it down for us then, a suggestion or a contribution non-code-wise.
How does that happen?
Does it simply happen in issues or, hey, Ben, by the way, I want to have not just simply
S3, I want to support Minio or XYZ store.
How does that permeate into the actual code base?
Does it just simply come through you or how does it work?
I mean, yeah, issues have been a great way.
It's been pretty active on there so far.
People just, if they have bugs, they tell me,
or if they have issues running it or whatnot,
that's a great way to do it.
The GitHub discussions I've actually really liked,
where you kind of have some,
like a threaded discussion board,
which I feel like they haven't really announced enough
because I don't know a lot of projects using that.
But I find that's a great way to kind of get people on and talking about stuff
when they don't feel like they necessarily have an issue. They just more have a question or like, you know,
what's the best usage for this kind of thing. So I think those are great ways to do it. As far as
the, like the documentation, that's actually all MIT licensed. So if someone wants to come in and
make changes, suggestions, or, you know, fix typos, that kind of thing, or whatever
contributions from a docs side, that's all open source and
open contribution.
Open, I guess, not really
code contribution, but
contribution.
So you didn't make this decision
in a vacuum, and your previous experiences
obviously informed
this decision, at least some.
So you're talking about mental health and really your enjoyment overall
and the success of the project
are kind of informing this decision.
What have you been through that brought you here?
Have you been through burnout?
Have you been through terrible pull requests
or low-value code contributions?
What's the kind of stuff that you've been dealing with
over the years?
Sure, yeah.
I mean, I think, so previously I'd written BoltDB,
which is a database in Go.
It's a key value store.
And that project, you know,
there are definitely valuable contributions.
I don't want to, like, diminish that.
But I feel like a lot of contributions,
either they can fall in two buckets, I'd say that,
or a few buckets, I guess.
So you can have, you know,
very small kind of trivial contributions,
which, you know, I don't have anything against small contributions at all.
But then you also have kind of like mid-sized to large contributions,
which can really either skew the scope of your project very much.
And a lot of times you just can't accept those,
or you have to do a lot of changes to really accept those.
And the other side to that is that if you do get some great feature added,
the person that added that feature that sent the pull request, they're not probably going to be around six months from now when people are asking you to support that.
And something's broken on it, you've got to debug that.
So I guess I kind of come from a database background over the last decade or so. And I find that people in other kind of realms of the industry, I feel like they really focus on like, hey, look at this new feature, this great feature we have, blah, blah, blah.
Like they really tout that. Whereas in my position, I really see features as like a liability.
Like every little feature I add is something that could possibly corrupt a database. Like
it's really pretty serious. I mean, not Litestream. Litestream doesn't actually write your database.
But there are huge liabilities
if you just add some small pieces of code.
I feel like there's a utility side
and a liability side to every feature.
I feel like the utility side needs to vastly outweigh
the liability side.
So that's why I feel like I tend to reject a lot of features.
I've heard it said often
that code is a liability and features
are assets, but I've never heard anybody say features
are liabilities. I definitely see
where you're coming from. That being said, you're probably talking
about the code of the features, right? The maintenance
of the features, which are
liabilities. I think you interact, like
features interact with each other, and the more that
you have, I think those interactions
really kind of grow exponentially.
Whatever, geometrically.
Some kind of math equation.
They can really grow as they interact with each other.
There's just going to be unexpected ways that they do that.
I think features really very much do,
even from a documentation standpoint,
usability standpoint, are liabilities.
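The "some kind of math equation" can be made concrete. As a rough sketch, assuming every pair (or every larger group) of features could in principle interact, which is an upper bound and not a measurement of any real project, the counts for $n$ features are:

```latex
\begin{align*}
\text{pairwise interactions:} \quad & \binom{n}{2} = \frac{n(n-1)}{2} \\
\text{groups of two or more features:} \quad & 2^{n} - n - 1
\end{align*}
```

For $n = 10$ that is 45 pairs and 1,013 possible groups; for $n = 20$ it is 190 pairs and 1,048,555 groups. Pair counts grow quadratically; only the count of arbitrary feature combinations grows exponentially.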
Yeah.
And they're kind of one-way streets as well
because it's easy to add,
but it's very difficult to remove,
especially if you have empathy for your users.
I can't just take away this thing
that you're relying upon,
but it's really screwing up this part of the code base.
And so in that way it's a liability. Even
though that person sees it as the value, as the maintainer all you see is how it's slowing you
down or causing you headaches, because taking it away is selfish lots of times as a maintainer.
Yeah, you know, I thought about different ways you can kind of run
Litestream and whatnot, and I've really been trying to figure out how to run
SQLite in a serverless platform,
which is a weird idea
but I feel like it'd be an easy way
to get people to deploy their applications
and it runs just simply with no
configuration outside of your
serverless platform and
for that one idea I had
was running kind of a service for people
where they can replicate to the server that's outside their serverless platform.
Sorry, this is a long story.
But in thinking about that, it kind of stresses me out a bit to think,
hey, if someone does choose to use a service that I put out there,
I can't just give up on it.
They might be really relying on it for their business or their organization.
And I feel like this is a commitment you really need to consider a lot, for sure, in the long-term effects.
Features are kind of like hiring people, in a way, right?
If you want to have an analogy, I think of that with business even.
I can recall back in the day when I worked for a non-profit, and I was very green behind the ears.
Or what's the word? Wet behind the ears.
Wet behind the ears.
Is that a bad term to say these days?
I don't know.
I think it means you're fresh out of the water, isn't it?
Yeah.
I was inexperienced, let's just say, in the realm of business.
A lot of ambition, but my boss, the founder of the company, the nonprofit company, I was keen on getting the help because we needed support in the design front.
And I wanted to hire somebody like ASAP because I was the one feeling the burden of the need essentially, so the feature.
And he's like, we got to be slow to hire.
And he taught me this lesson essentially about being slow to hire, and it seems similar with slow to feature essentially.
Because for every feature or for every hire, you may have to eventually deprecate it or fire them or circumstances change.
And so just be very wise and very calculated with your hires or very calculated with your feature adoption.
I agree 100% for sure.
And it's easy to get in over your head because when that feature comes in for free overnight while you were sleeping and all you had to do
was hit this button, it's exciting that somebody
likes your software enough to work on it.
I haven't had a successful project like you have been
with BoltDB or any of the other stuff.
I've had things where maybe it's my open source deal
or I would love the contributions,
I never quite got there.
But I've gotten features, little ones and medium sized ones on a few projects.
And for me, it's always been like, I'm all giddy about it, but that's kind of like you get a dopamine hit and it's enjoyable.
And so you just do something quickly 'cause you're like, yeah, somebody cares.
But then six months down the road, you're like, why did I do that?
Yeah, for sure.
I still feel the same way
when people submit issues on the project
or submit feature requests
and things like that or want to discuss it.
I love talking about this stuff
and working on it with people.
But yeah, I totally agree.
There can be some regret later on.
So when you made this choice,
have you read Nadia Eghbal's Working in Public?
Or have you been thinking about these things?
Because the reason why I made the connection,
I thought maybe he was inspired by that
because she said in that book and on our podcast afterwards,
like one of the things she's realized over this time,
you know, researching and being part of open source
is like she realized that open source
doesn't mean open participation.
And it doesn't have to mean that. And that resonated with me. And when I saw your post, I thought, I wonder if
maybe you had been inspired by that concept, or maybe you came up with this completely in a silo.
I mean, I haven't read the book. I may have seen other posts of hers. I'm sure, yeah, she definitely
influences thoughts around this. I think that there's definitely a crowd of Twitter OSS maintainers
that we commiserate a bit to each other
when we see a project gets closed down
because someone gets burned out.
You see these large public things that happen like that
where it happens, someone has a hard time,
closes a project or it shuts down, or something goes sideways,
and a bunch of people all kind of know that feeling
and kind of share that feeling.
And I think, I'm passionate about open source sustainability,
but I think it's just a hard problem.
I don't know as far as how do you get people
working on open source, which is free,
and I don't feel like people have really found
great ways of making
money off of it to, like, sustain them in that way financially. So I think there's people out there
trying to figure this stuff out, like her, and I don't think there's really an answer quite yet.
But I think that trying to lower that burden, at least in some way, I think can
maybe help in some incremental small way.
Yeah. Well, I think that your choice here, and I think
probably her findings and her statement
which she said on this show and elsewhere
that it doesn't have to be participatory
because many times it is the
situation where you have the one maintainer
serving the many users
and the contribution does not
scale alongside the user growth.
She calls that a
stadium. I don't know, it's like you're
kind of a rock star in a stadium.
You know, there's one person on the lead microphone and there's a hundred thousand in the stands.
Pre-COVID. Now there's, like, ten percent of a hundred thousand people on the Zoom. There's
cutouts. There's a hundred thousand cutouts out there, you know. So, like, that circumstance happens
a lot, where the growth of the project happens but the growth of contribution doesn't scale
or doesn't match.
And that's okay.
And it's okay to actually even say
that's what I'm going to do.
I'm the only person on this
and that's the way I want it to be.
And I think it's fitting for scoped things like this,
like the tools you like to build
where it's not a thing that you're going to work on
into infinity necessarily
unless it grows outside of the scope that you initially defined.
But I think what that comment from her did,
and probably what yours has done with this,
putting your stake in the ground,
and then having it on Hacker News and 95% positive,
it's probably given a lot of other people permission to do that.
To feel like, oh, Ben can do it, I can do the same thing.
Because a lot of us
like to put stuff out into
the world just for that reason.
And don't necessarily want it to be
community projects.
Open source does not have to be
community maintained.
It doesn't have to be.
I was thinking about this today.
I feel like there's two kinds of projects out there.
You have frameworks and you have libraries.
This is kind of the debate out there.
Do you build this big scope thing,
say like a React or a Kubernetes?
I'm not going to build that by myself.
It's meant to have this huge overarching scope
that your application runs on top of
versus say you have a library or a tool
that is kind of an incremental small scope piece.
And I don't like writing frameworks.
That's just not my passion.
I don't like those never-ending scope projects.
I like saying, hey, here's a problem and here's a solution and build a tool for it.
So I feel like my favorite projects are those kind of projects.
And within those tools, I feel like the best projects of those tend to have kind of that benevolent dictator for life kind of management around it.
So why do you do open source? Like, what's your intrinsic reason?
I would say it's twofold. I like the reach that open source has, where, like, you know, when I wrote
BoltDB, I had people say that I could try to monetize it. I don't have any idea how you monetize
like an embedded database like that, but I'm sure it could make more money than I did off of it, obviously.
And to take that and then say, Hey, this is free for anybody to use. And it gets picked up by
another project. And then like it got picked up by etcd. And then etcd got put into Kubernetes.
And it's just kind of, it's crazy to think of the reach that, you know, BoltDB, while it's a small,
small piece, is, you know, deployed in
some of the largest companies in the world, you know, helping to, you know, persist state in their
etcd cluster. So like little things like that, just seeing that I can make some small incremental
change in the world that has large reach. So I'd say that's why the biggest reason I do open source.
And then kind of a secondary reason is like, you know, a lot of things that you do at a day job are, you know, you're doing CRUD apps or you're doing things that, you know, move the business forward.
But they're never going to be like this kind of edgy, researchy, kind of like down in the weeds, fixing some really deep, interesting problem a lot of times.
A lot of times it's just kind of a day-to-day kind of work you do. So I feel like the open source stuff that I do tends to be kind of that more esoteric, unusual stuff. So like this,
for example, like Litestream, it's one of those problems that I've always had where I don't want
to have like a complicated application deployment. I just want to use SQLite. How do I make that
happen? Like what is the thing that's stopping me from doing that in a production app? And, you know,
I could never write Litestream for a company because that just sounds ridiculous.
There are other solutions out there that people could run Postgres,
or you could run on RDS on Amazon.
There's not a justifiable reason to build Litestream in a company.
So that kind of thing.
I worked on a project before that where I ported over a tool called KLEE, K-L-E-E.
It's this crazy code execution tool where it basically like it'll analyze your code and go through kind of all the paths of the code.
And you can like do things where you can generate test cases from code.
And it has like a solver in the back end.
Anyway, it does all these kind of crazy things.
I ported that over to use with Go.
But like, you know, I spent a ton of time on that.
I released it, but it was never really finished.
I just kind of wanted to try these new things and kind of experiment and, you know, push my brain in different ways, you know?
So it's really like an intellectual kind of interest.
That was a long answer.
This episode is brought to you by our friends at Retool.
Retool helps you build internal tools fast and easy.
From startups to Fortune 500s, the world's best teams use Retool to power their internal apps.
Assemble your app in just a few minutes by dragging and dropping from pre-built components.
Connect to most databases or anything with a REST, GraphQL, or gRPC API.
Retool empowers you to work with all your data sources seamlessly in one single app.
Retool is highly hackable, so you're never limited by what's available out of the box.
If you can write it with JavaScript and an API, you can build it in Retool.
You can use their cloud service or host it on-prem for yourself.
Learn more and try it free at retool.com slash changelog.
Again, retool.com slash changelog.
So one of the reasons that you say you do open source is because of the potential impact of your code.
I think it's a great reason.
I think it's one reason why lots of people do open source.
And it's pretty cool to see, like you said, your little database, your little key value
store, you know, like inside Kubernetes, powering all these deployments.
It has to be satisfying.
Was there any fear or trepidation or concern that maybe this decision around no code contributions
would limit Litestream's impact?
I was probably a little naive with it. I thought nobody would notice, to be quite honest.
Nobody would notice Litestream or nobody would notice this policy?
The contribution policy. I mean, I thought some people might when they try to open a PR,
but I didn't think it would become a big topic.
So actually the thing that I worry more that would limit people, this is actually the first project I've ever used a GPL for. Um, and yeah, I'm still not sure about that decision. I
mean, I think I haven't had any blowback. I was surprised I haven't had any blowback about that,
but yeah, I think not being able to embed it or just, I don't know, people get weird about
copyleft licenses.
What drove that decision?
It's weird, the little things that kind of change your mind.
Like I've always written libraries.
So like libraries, especially in Go,
like you pretty much have to have a very open license
like MIT or Apache.
And this is kind of the first command line tool
that I ever wrote that kind of runs separate
from the application.
And Mike Perham, I think that's how you say his name,
Sidekiq, he had a tweet like
years ago that just always stuck in my head. And it was basically, I think it was kind of trolling
a little bit, but he was basically saying like, you know, if you don't license as GPL, you just
don't care about your code or like don't care about, I mean, he was being trolly, I think it's
a little bit in jest, but that kind of sat with me and just like, you know, if you don't really
control the, you know, what happens to your code and where it goes and what people do with it, you know, you kind of limit the ways you can grow that project.
And, you know, I think there are, again, around like sustainability.
I think that I guess my biggest thing with sustainability is that it feels like, I know GitHub's recently added, like, corporate sponsorships, but a lot of it's always focused around, like, individuals
contributing to other individuals doing open source.
Whereas like really the people that benefit the most are,
you know,
these bigger companies that could easily spend a thousand dollars a year,
whatever paying for some library that really supports their business.
And I think having more control around the actual license and what people can
end up doing with it, um, I think can really kind of shape, you know, that
conversation more, if that makes sense.
I'm not trying to sound too greedy or anything, but I'm really just,
I find that to be kind of a fascinating direction that I've never really taken
before.
Have you read this license end to end, Ben? Just curious.
The GPL?
Yeah.
I don't know if I've read it end-to-end.
I think I've read most parts of it at one point, but I probably should.
It's a long license.
Over the years of having so many more conversations about licenses, I find myself actually reading
more and more.
Now, I haven't.
I have to admit, I haven't read the GPL end to end, but I don't have
any code out there that has it adopted as its license, so at least I'm clear there and you're
not. But yeah, I'm just curious. That's totally a good point. When you choose this license, and you
know, you mentioned Mike Perham and that tweet, and, you know, whether it was in jest or not, like,
what specifically about this license, like what clauses, made you think, okay, this is suited for Litestream?
Sure, that's a good point. And I will say, I tend to defer to people that probably
know more than I do. And I'll read summaries on a license
more than I will go word for word into a license and pick it apart because I'm no lawyer.
I think the ideas around
if you're going to use this code, or not even just use it, if you're going to take the code and change it around, like, you know, I think that that stuff should be put back in the world for the benefit of everybody.
Like, and I think that, you know, the one thing I don't like about it with libraries is like linking this tool into your code suddenly means your code needs to be GPL as well.
And that seems ridiculous to me.
Whereas Litestream is pretty isolated.
It's a single binary, runs next to your application.
And any changes to that should, I would assume,
probably be helpful ultimately
to the wider community out there.
And yeah, I would welcome,
if not even the code contributions from that,
just simply the ideas
around what people are changing about it
and putting that back out there.
So you want to make sure that whatever
value is there currently
or could be derived in the future, whether
it's you changing it or someone
else changing it, you want to make sure that future
public use, the open source spirit
remains with the software.
Yeah, that's basically the idea.
Yeah, so if I adopt it at my company
and then we invest
labor hours into making
Litestream 10x faster
or I don't know what sort of metric
you would improve it, right?
I'm a 10x engineer, so I 10x it.
As soon as you touch the code
it just goes up.
That's right.
I actually just put a comment in there that says 10x
and then I...
You'd want that to be out in the world,
right? Even if that code's not
going back into Litestream, you may look at that
and be like, oh, that's a clever
thing Jared did. I can't believe he's
such a good 10x-er. And then you might just pull that in.
But if I didn't have to do that, we could just
keep it for ourselves
and Litestream wouldn't benefit
and the world wouldn't benefit.
Exactly, yeah.
Let's be honest real quick, Jared.
You're not a 10Xer.
No, I'm not a 10Xer.
You're an 11Xer.
Oh, thank you.
You called me out.
Insofar as I can multiply things by 11.
That's right.
Yeah.
But that's as far as I'll go.
So you were concerned that GPL would limit it.
You were also concerned that the non-open code contribution
would be a limiter.
But it sounds like, at least in terms of an open source project,
it's off to a great start, wouldn't you say?
Lots of attention, lots of people looking at it.
So far so good.
I haven't had anybody push back on the GPL.
I think one person asked about it, and then I told them why,
and they just said, okay, and they moved on.
So that's been good to see, for sure.
You mentioned that you made this choice
in a somewhat naive way.
And I'm curious if there were alternative options.
Because I suppose you could not put it out there
so explicitly,
it's open source but no contributions.
You could just simply just not accept pull requests,
which is kind of what GitHub forces you to do now, right?
Because even though you've made this declaration
and you're reading it,
it doesn't mean that the tooling has supported
your desires for keeping your pull requests closed.
You're still sort of stuck with that.
Yeah, for sure.
And I've had folks from GitHub reach out over DM on Twitter
asking what they can do to help support the project.
And honestly, I just asked for being able to check off the,
to hide the pull requests, just not allow those.
I think that'll go a long way.
And it sounds like it's something they've definitely debated
and they've talked about.
And I'm sure there are nuanced reasons why they can or can't do that.
And maybe it's coming in the future.
But I'm not privy to those decisions.
But I think it would unload a huge burden on people
if they just didn't have to think about that.
And closing PRs after the fact is
just the most soul-sucking thing to do, where someone might have put so much time into
a PR.
Yeah.
And then you have to just close it and be like, I'm so sorry,
I can't take this because it's not what I'm looking for. So I'm trying to
be as explicit as I can before someone really puts all that time into it.
Yeah.
It's really difficult, I suppose,
in the world of open source
to not waste someone's time
without some sort of explicit visual cue.
I would imagine if you have a repository
that does not have pull requests,
which is sort of,
I can remember back in 2008 when GitHub first launched. That was the cool thing.
PRs is the DNA of GitHub, essentially.
So if you take that away, to me,
you'd need to be visually far more clear on a repository that that's not an option.
Maybe a red banner or something.
I don't know, just something very different,
starkly different than every other repository.
Or that skull and crossbones emoji.
There you go.
None shall pass.
I totally agree.
It's almost like how the license shows up on there.
It'd be nice to have some kind of
I guess it's community guidelines.
I don't know. There's something I feel like they could
do to basically say
we're welcoming but not that welcoming.
Right.
I was going to say, what's the most polite way to say PRs are not welcome?
Yeah, I don't know, man.
You may have done it.
It's definitely been an interesting thing to tiptoe around.
How do you convey to somebody that I value your input, just not the code that you're giving me?
That's a nuanced thing that I don't think I accomplished very well.
Right.
And you're touching on it.
That's exactly why I thought that it would make sense to talk through this with you,
because we've had, you know, I would say the luxury of knowing you for many years now.
Not like buddies hanging out on the weekends, but we've known you for many years,
and I know where your heart is at, or at least we have a direction of where your heart's at
with open source in the community.
And, you know, a passerby, a brand new person to Ben Johnson and your code
and who you are may not understand the nuanced reason
of why you would make this choice.
And I think that's good luck to GitHub and the interface designers there
to encapsulate what this podcast may convey well or not so well
in a button or some sort of visual element, it's going to be difficult.
No, yeah, I think the podcast is a great medium to convey that.
Even a blog post is not going to – people are going to read that in different ways.
Hopefully I don't sound like a d*** on here, but it's easy to come off that way with just written text.
Yeah.
I'm just enjoying the thought of GitHub
putting our podcast in a button somewhere.
When you click the button,
you just have to listen to this conversation.
It's like, here, this is why he's doing it, all right?
There you go.
Going back to the GitHub features end of this,
you could use an issue template,
but is there a PR template?
Is there anything in between the person
and their pull request besides your readme
where you could inject
a thing that says,
hey, don't do that.
Like, don't waste your time.
There's a PR template,
which I have the same
kind of paragraph
about why I don't
take pull requests.
And, but again,
like you have to get
to the point of
finishing the code
and pull requesting it
to actually see that.
So, I mean,
in that sense,
like, you know,
the person still
has wasted their time. I don't see that pull request that I have to then close, which probably makes it easier for me, but
it still hurts that somebody may have put time into that.
Yeah, you almost want it on
the fork button, you know? Like when you click fork, it might tell you at that point that, yeah, you can
fork this project, but know that...
Yeah, that'd be nice.
Yeah, because that's usually the first step
that I would do, is fork it, right?
Yeah, unless you're doing an edit, like a typo edit on the readme inside the
GitHub web interface.
Yeah, something like a full-screen banner that comes across and just says,
you're not taking this code back.
Right. So what would happen if somebody came up
to you and you just misspelled something in your readme, and they just did it anyways? Are you going to close it and be like, I'm going to commit the same
change with my own signature?
It's something I definitely struggled with. I hate the argument
of slippery slopes, but it is one of those things where I don't have a problem with
small pull requests, those little tiny minutiae. But then there's going to be somebody
who, instead of changing a word, changes the whole sentence. And maybe it just reads in a weird way.
It's just not what I'm trying to do.
Or then maybe it becomes a small code change, but then that still grows.
And I don't have a perfect answer for this.
This is really an experiment.
So I don't want to come across saying, I know that this is the best way to do open source out there.
And it definitely has its flaws.
And this is a perfect example of one.
Yeah, I mean, in that sense, it's definitely hard.
It's tough.
Yeah, because it seems so petty to be like,
actually, I'm not going to accept this,
because I don't accept them.
But once you accept one,
now your list of contributors is two people,
and you can't go back on that,
and now it's like, I don't accept contributions.
What about that person? Why'd you accept that one?
Now I have to have this conversation
every couple of weeks or whenever it happens.
Or even back to the license.
You mentioned GPL being
good now. If for some reason
you change your mind,
every contributor is a liability,
a veto to that change.
Yeah, you've got to do like a CLA,
a contributor license agreement.
I think that's what it stands for.
And then that becomes a whole thing.
And I don't know, it's just,
I really appreciate people pointing out the typos and whatnot,
but just the amount of overhead just seems weird
compared to the actual value of the change.
This all comes back to the scope of Litestream, though.
The scope is limited,
and that's why you want to maintain control.
It's also influenced by past interactions
in open source and your work.
It's a culmination of many things
that isn't just simply,
I prefer my code, not yours.
That's not what you're saying.
You're saying, I want to be the contributor to it.
I have the best code.
And you never make a typo.
That's how you solve that problem. No typos.
Well, you're not saying it condescendingly
is what I mean. You may be saying that, but you're
not saying it as like, you suck, I'm
better. It's more like, I just have a preference
here. It's your prerogative to
feel that way, Ben.
I think everyone generally prefers their own
code, but I think there's definitely something around continuity of code.
Like, you know, if I contribute to somebody else's repo, my code,
even if I really try to follow their code style,
it's going to be a different approach.
It's going to be a different just way of doing things,
which, you know, it's going to be that one section of code
and their whole code base that just works a little bit differently,
and they've got to kind of keep that in the back of their head.
Or they can come in with your PR and change it around
to the way they would do it, that kind of refactor.
Let's be super explicit then.
If someone on the GitHub team is listening,
what exactly is your request to make this operate the way you want to run it?
Is it simply turn off PRs and hide the button or hide the fork
or do some of the things that Jared mentioned?
What's a good suggestion?
I think hide the PRs is probably number one.
I think some notification when someone tries to fork,
I think it would be awesome as well.
I think that's a great idea.
But beyond that, I think the discussions are a great direction
to move the conversation away from code and actually the use of the tool and how people use it.
That's really the thing that I miss out on for a lot of things is use cases and how you use it and the workflows and stuff like that.
The code in Litestream, you're not going to be blown away by it.
I'm not doing crazy SIMD coding or whatever in
there. It's just, you know, ifs and for loops and whatever you do in code. So the real value
of it, I feel like, is when you actually apply that stuff and how that looks in the real world.
So I think discussions go a long way in that. And then I would say my other request to GitHub,
and we talked about this briefly, is that they do corporate sponsorships now.
I actually, I really wish they would allow you to only take corporate sponsorships.
I feel really weird about taking money from other individual developers.
And I actually, I don't do sponsorships for that reason. And actually, if you really want a wish list beyond that, I would say, I think that there is this idea that corporations should come along and benevolently support projects.
It's in their own self-interest, for sure, but it's definitely a charity.
And I don't think that's the right way to frame it.
I don't think you're ever going to convince a large swath of companies to support open source without really giving them something direct and tangible in value.
So I know it's a contentious idea,
but some idea of giving priority support to some corporate sponsorships or giving some additional benefits that you can really give to a company
and say, hey, if you sponsor this thing for, I don't know,
$100 a month, then you get these benefits.
You can do that outside of GitHub.
There are ways of doing that.
But I think to streamline it inside of GitHub would be really powerful.
I think that would really motivate a lot of open source contributors.
I think the framing of the sponsorship is really where it gets, as you said, weird.
Even I would say at a company level, I would personally much prefer it if you just offered a product, and that one product was just simply support, and it was only open to corporations or businesses, LLC, corporation, whatever you want to be, just not an individual.
So an individual software maintainer like yourself doing business with corporations. And I might personally prefer to just do the business with
that business rather than leverage GitHub. But I think if GitHub could, you know,
produce tooling... the framing of it being sponsors or GitHub Sponsors, that's where it gets,
in my mind, weird. Even for us, we as a podcast network and a podcast business, a media company, we sell sponsorships.
But once we pass that threshold of relationship, we begin to call them and treat them much like partners, because we're not looking for sponsors and transactions.
We're looking for people who care about us as a business and the community we serve, which is software developers. And I think, you know, they get in the
door with the word sponsorship, but we soon
after help them understand
our own lexicon, which
is, you know, we treat
you like a partner
and not so much like a sponsor, and at that point
we prefer you to not be transactional
and prefer to lean on relationship. But
I'm kind of going in the weeds on our own business.
I think the word sponsorship gets, yeah, a little murky, in my opinion.
No, I think that's
totally fair. Yeah, whatever you want to call it. I think that the biggest hurdle that
GitHub can help with is that, you know, companies tend to have these painful procurement
processes where you have to invoice them and it has to be whatever. To be able to streamline that piece, I think, would help. The idea of most developers going through
procurement processes for every company seems overwhelming. I would be happy to pay GitHub
30% or whatever typical app store fees are to manage that kind of stuff, to provide tooling
around that.
I would have no problem giving that money away to them rather than having to kind of side channel all that stuff
through a website I have to build
or some tool I have to use outside of GitHub.
That makes sense because if they can knock down all that red tape,
all that minutiae in the process, the bureaucracy of that buying
process, PO numbers and accounts payable.
It can be a nightmare if you have no patience or you don't want to spend your time there,
which I would imagine you would just much rather write code or handle non-existent pull
requests or hang in discussions or whatever.
I'm just kidding.
But that would be a better use of your time.
And if GitHub could level the playing field globally at a corporation level and remove that red tape
and make it as easy as just a relationship thing rather than saying,
let me ask my accounts payable department, let me talk to my boss.
We've already, GitHub's already sort of leveled the playing field and made corporations who do want to pour back into
or buy these kinds of would-be products from open source developers like yourself.
That would be pretty cool.
And I don't have all the answers, so I'm sure there are reasons that is a terrible idea.
But I think normalizing companies paying for some kind of product
on top of open source, especially support or other things of that ilk, I think are a good direction for sure.
I'm curious, Ben, if you've been to the SQLite website much or read much of their documentation lately.
I have read all their documentation.
Did you read their copyright?
Is it the public domain?
The reason why I ask is because they say something very similar to you.
They say open source, comma, space, not open contribution.
So even a lot of language is very similar on that front.
And they say a lot of what you've said.
So similar stance at least.
Yeah, yeah.
I think I pulled in some of that from the readme as well.
I think I tried to reference that.
Okay.
But it definitely did influence some of my thoughts around it.
So I don't mean to discount anything that you brought into the conversation. I'm not trying to. I'm just trying to
draw similarities.
Yeah, yeah. And I think that they do it mainly to kind of keep the copyright clean.
Exactly.
And that's definitely part of what I'm doing. I think my main focus is more mental health
and just, you know, keeping a really tight
scope, which I don't think necessarily applies for SQLite. I think they can broaden their scope
quite a bit.
Have you gotten the call yet? The call from Mr. Hipp himself?
That's right.
No, yeah. They actually reached out pretty early on. We did a conference call with them.
They were super nice. Yeah, they got on and we kind of walked through how it all works.
And I was fully expecting them to think that I have done some unspeakable, terrible things to their database.
But they were quite supportive of it.
So I really appreciate that.
Yeah.
Well, you know, Richard, even when he was on the show, he talked about, you know,
essentially what you said in why you built Litestream, that SQLite is kind of touted as this toy database and not taken super seriously.
And obviously when Jared and I had him on that show, I forget what episode that was, but it was 201.
Yeah, episode 201.
Great episode.
And just a whole different side having had that conversation with Richard about SQLite and how it's used and even the business model behind it and how they run it.
And I just drew some similarities, I suppose, to the challenges you have.
And they had some pretty expensive prices on their pro support page,
which is they've been able to make money from events.
I'm hopeful for you, at least.
Yeah, I appreciate it.
You may be able to be in their stream, so to speak.
Yeah.
Pun intended.
And I've actually, I won't say who this was,
but I had a conversation with somebody
who was a CTO of a VC-backed company,
a database company.
And he had talked to Richard Hipp before.
And essentially, they run their group.
They make money through, I think, memberships, and, you know, I have no idea how much they make.
But the guy was talking to him, like, you know, how are you guys
doing all that stuff, and kind of asking about his approach to doing it that way instead of going
the VC route and raising a bunch of money and doing a big exit.
And my understanding, this is again secondhand,
but that Dr. Hipp basically said,
how much time do you spend coding at your company?
And for the guy at this VC-backed company, it's basically zero.
He's just kind of management and talking to VCs and talking to investors and whatnot.
And Dr. Hipp basically said, you know, he gets to spend every day, he gets to
code. And that's kind of my end goal. I would
love to be able to get to a place where I can just work on open source. I don't have any
interest in raising VC money. You know, if there's something that it would really help with, sure.
But at the end of the day, and I've even thought about this, if I made $100 million,
I don't see my
life changing significantly other than I would just spend my time working on open source.
All the time.
I don't love like yachts or like fancy cars or anything like that. I just like,
you know, solving problems that I find interesting. So I think my long-term goal would be
somehow to make it a sustainable thing that I could just work on in that sense.
So that'd be my goal.
This episode of The Changelog is brought to you by Render.
Render is a unified platform to build and run all your apps and websites
with free SSL, a global CDN,
private networks, and auto-deploys from Git.
They handle everything from simple static sites to complex applications with dozens of microservices.
If you're a developer or a founder that's frustrated with AWS's complexity or Heroku's high costs,
you owe it to yourself to use the $100 in free credits they're giving our listeners to give Render a try. Render is built for modern applications and offers everything you need out of the box.
One-click scaling, zero downtime deploys, built-in SSL, private networking, managed databases,
secrets and configuration management, persistent block storage, and infrastructure as code.
Heroku customers running production and staging workloads typically see cost reductions of over 50% after switching to Render. Here's the best part. We work closely with the team at Render
to ensure you have zero risk by giving you $100 in free credits. Plus, they're going to assign a
world-class engineer to your account to offer guidance and answer any questions you have.
When you're ready to transition your infrastructure, they'll be there to help you with that
too. Automate your cloud hosting with Render at render.com slash changelog.
Get $100 in free credits to try the Render platform,
plus a world-class engineer assigned to your account to guide you along the way
to send an email to our special email changelog at render.com to get access to those free credits.
All that begins at render.com slash changelog.
So we've been talking for a while about open source,
but let's talk about the software, shall we?
Yeah, sure.
So the project is Litestream. So it's L-I-T-E, stream, as, you know, S, whatever, however you spell that. Stream. Yeah, S-T-R-E-A-M.
There we go. Yeah, say it together. There you go. It's a way to basically, if you have a SQLite database,
you know, you want to deploy your application on a little tiny $5 a month VPS, and you want that to run.
It doesn't need to be the biggest scale platform in the world, but most apps can probably run on a $5 VPS running SQLite.
But the problem is that if that VPS dies suddenly, then all your data is gone too.
So the idea with Litestream is you could do backups every hour, every day,
but then you're losing an hour or a day of data if that happens,
if you lose that VPS.
So what Litestream does is it basically runs separately
outside of your application in a little process
and continuously pulls in changes from your database
and streams those out to S3, like an Amazon S3, like an object store,
so that you're never losing more than a couple seconds of data
if your VPS just dies catastrophically.
That's the idea with it.
So that's kind of where it started,
and that's largely the use case I'm looking at.
But there's been a lot of really interesting use cases
coming from other people where they're like,
hey, can I run this thing?
But I actually want to have a bunch of read replicas too.
So it's really a way that you could scale out SQLite, which is kind of a weird idea.
Yeah, that's kind of a weird idea.
Yeah, and that's not in there yet, but that's definitely on the roadmap right now.
And I've had other people that are interested where, actually, there's been a lot of interest around this whole idea of the JAMstack.
I've never really gotten into the JAMstack, so please correct me if I'm totally wrong on this.
But a lot of people, they'll take the data that they have and they basically generate out the pages
and post those on a CDN, and then everyone in the world
gets kind of a local copy of that page, and it's super fast and super responsive. But then if you
take that idea and you instead of generating all your pages,
you just have read replicas around the world
on these tiny $5 a month VPSs,
you could have a global application
where you have 100 millisecond or less latency
between you and the server for everyone in the world
because you're replicating it out,
which is kind of a weird idea.
There's actually a service,
I haven't used this yet,
so I guess I'm plugging them,
but I cannot vouch for them,
called Fly.io.
It's kind of like a Heroku.
They have persistent disks available as well.
But you can run those things
for a couple bucks.
And I think they have like 20 different regions where you can deploy out to.
So really, you could run this kind of like as a serverless platform, basically.
But you can run the serverless platform for 40 bucks a month, and you're running globally
around the world, and your users get these super fast latencies.
So there's a lot of potential for where Litestream can go.
Sorry, that was a really expounded answer.
But the idea is really, in a nutshell,
Litestream is meant to let you run SQLite in production.
Right.
And kind of whatever way you want to look at that.
Well, let's loop back around to the JAMStack bit,
because that is interesting and a conversation that's been somewhat ongoing on the show.
Maybe even more so on JS Party, but I want to loop back around to that.
Let's just start with SQLite in production.
First of all, I'm a fan, a SQLite fan.
But I do tend to reach for Postgres when it comes to production.
I don't know if I do that because I just feel like
SQLite's just not made for production.
We do use it, I guess, in one production capacity
for Changelog Nightly.
It's what backs Changelog Nightly,
but that's basically a batch process that runs nightly and, you know,
does some processing, sends out emails, and persists its state in SQLite. But it's not like a
web server that's getting hit by hundreds of requests a second and all that. And I always
thought SQLite was cool and all, and for specific things, like in your phone, it makes sense,
but would you run it on a VPS
with a web server front end?
Aren't there concurrency issues with SQLite
or anything like that that you wouldn't want to do it?
It does run multi-threaded.
So I write Go.
That's my language of choice.
And I've written projects in SQLite.
And I will say, I guess, a few things on that topic.
It does well multi-threaded.
I can run thousands of requests at this VPS at a time.
And the fact that you can actually run a request.
And I've done testing where I've had several queries run on an HTTP request.
And the total time, and this includes rendering out HTML as well,
the total time to connect to the queries, pull that back,
render out the front end was about 50 microseconds.
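The multi-threaded point above can be tried out in a few lines. What follows is only a minimal sketch in Python rather than Go, and it does not attempt to reproduce the timing figures mentioned; the table name `posts`, the file name `app.db`, and the helper functions are all hypothetical. Each thread opens its own connection to the same SQLite file and runs a read query.

```python
import os
import sqlite3
import tempfile
import threading


def build_db(path):
    """Create a small example database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # WAL lets readers run alongside a writer
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO posts (title) VALUES (?)",
        [(f"post {i}",) for i in range(100)],
    )
    conn.commit()
    conn.close()


def count_posts(path, results, index):
    """Open a per-thread connection and store the row count."""
    conn = sqlite3.connect(path)
    try:
        results[index] = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    finally:
        conn.close()


def run_concurrent_reads(num_threads=8):
    """Run the same read query from several threads at once."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        build_db(path)
        results = [None] * num_threads
        threads = [
            threading.Thread(target=count_posts, args=(path, results, i))
            for i in range(num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

Calling `run_concurrent_reads()` returns one count per thread. Using one connection per thread is a conservative choice for this sketch, not a statement about how any particular application is built.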
The way that you develop, I find, with embedded databases
tends to kind of change your mindset a bit.
I have this theory that all databases are actually the same.
The only real difference that you have among databases is latency. So once
you have a client-server situation, you have issues like N+1 queries, so
really you want to optimize to get as much of your data back in a single query as possible.
And you have to do joins, you have to do all kinds of... there's a lot of stuff around ORM tools,
where they kind of like try to batch together requests.
And it's always a pain in the ass.
And, you know, that query language is what kind of really makes the difference.
So, you know, if you have graph data, you want to have a graph language.
If you have document data, you want to have a document language.
SQL, you know, works on relational tables.
But once you actually move all the storage locally
into the same process as your code,
you really don't even need those separate languages.
I mean, they can kind of help from a usability standpoint,
but from a performance standpoint,
you could just as easily look up your individual
traversed graph nodes locally using your own language
versus the actual query language itself.
Does that make sense?
That's a bit esoteric.
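To make the N+1 point concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `authors` and `posts` tables and both function names are made up for illustration. The first function issues one query per author (the N+1 pattern), the second fetches everything with a single JOIN. Against a networked database each extra query costs a round trip; with an embedded database both run in-process, which is the latency argument being made here.

```python
import sqlite3


def setup():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL,
            title TEXT NOT NULL
        );
        INSERT INTO authors (id, name) VALUES (1, 'ana'), (2, 'bo');
        INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
        """
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    out = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        out[name] = [title for (title,) in rows]
    return out


def titles_single_join(conn):
    """The same data fetched with a single JOIN query."""
    out = {}
    query = """
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """
    for name, title in conn.execute(query):
        out.setdefault(name, []).append(title)
    return out
```

Both functions return the same mapping of author name to post titles; only the number of queries differs.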
To a certain degree.
Yeah, so I mean, underlying pretty much all databases, you know, there's some exceptions, but I would say most use a B-tree. And that's kind of, you know,
you have a thing that you store according to a primary key. And that's true in a document store
and a graph database. Pretty much all databases use that kind of underlying format. So it's not that I'm particularly in love with SQLite.
I think it's a good database,
but at the end of the day, it's a B-tree
that has some nice little SQL on top of it
that makes it a little more usable.
In that sense, it's a bit of a rant,
but I think once you move the data locally,
then it really changes how you approach the database.
So what makes SQLite different than BoltDB, for example?
I mean, they're both similar foundations, but is it the query language?
Yeah, I mean, query language, I think the...
I've built applications on top of BoltDB,
and there are a lot of things I really like about it.
I would say the biggest thing that you miss
that's really nice about having something like SQLite
is that you're separating out your code,
kind of almost like your code schema from your data schema,
where you might change your, say, for example,
you change your application, you add a new type,
or maybe you split off some type in your code
into two separate tables.
Does that make sense?
And then you go to deploy that,
but if your code is very much tied,
or your underlying data in your database
is tied to the structure of your code in your application,
then it makes it really tough to transition
between versions of code.
Because when you deploy it,
your data is still in that old format.
So having that declarative schema
and being able to change that kind of separately from your code,
actually I found to be super nice.
And just little things like indexes and foreign key constraints.
So really pretty simple things.
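As a rough illustration of the declarative schema, index, and foreign key constraint mentioned above, here is a sketch using Python's built-in `sqlite3` module. The `users` and `orders` tables and the helper names are hypothetical. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

# Hypothetical schema: the data layout is declared here, separately
# from whatever types the application code happens to use.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total   INTEGER NOT NULL
);
CREATE INDEX orders_user_id_idx ON orders (user_id);
"""


def open_db():
    """Open an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def try_insert_order(conn, user_id, total):
    """Insert an order; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (user_id, total) VALUES (?, ?)",
                (user_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for an existing user is accepted, while an order pointing at a missing user is rejected by the database itself, without any check in application code.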
Right.
I don't use any crazy features.
I mean, there are use cases for, you know, Postgres has all kinds of crazy features you can use.
It does.
But at the end of the day, 99% of my code
is just some select statements and some DDL.
Gotcha.
So you can go concurrent with it via threading.
And because it's embedded, I guess you don't have the network connection
setup and teardown, so you're not worried so much about batching
or pooling connections, right?
Because it's not connections.
It's just like
the same process in memory.
The only problem is that
it's just sitting right there inside of your
binary and you don't
have it backed up, but now you've got that solved
with Litestream.
That's kind of the idea.
When I thought about what was the thing that was keeping me from
running SQLite in production,
replication and disaster recovery was really kind of the main thing.
And I actually spent a long time trying to figure this problem out.
The code itself isn't even huge.
You can open up the code.
It's not going to blow your hair back or anything.
It's not that fancy.
But trying to figure out how to actually make it happen was
like a long journey, where I originally actually ported SQLite to Go. It's kind of a thing I
do, where I don't understand code until I really work with it and kind of move it around.
And the idea wasn't necessarily to release that code, but just really to try to understand
what was going on underneath. And, you know, I did that, and then I actually moved on. I
tried to do... Do you know FUSE? A FUSE file system?
Is it like a network mount thing?
Sort of. You can build your own file systems in Linux, basically, with FUSE. And
it's this weird... It's like, if you wanted to make a file system of all your GitHub issues, you
could have this intermediate binary that kind of interacts between your Unix commands, like ls and whatnot, and then your binary
can translate those commands into, okay, you know, GitHub calls or something. People do all kinds
of weird things with it. So I tried doing an intermediate FUSE file system
that would kind of intercept the writes to SQLite and replicate those. That was kind of overly complicated.
And then
the actual trick with
Litestream, the thing that actually makes it work
is that, so there's
a write-ahead log in SQLite. I don't know
if this gets too much in the weeds here.
But every time you write to the database, it doesn't
write to your data file. It writes to this
write-ahead log. And those
writes kind of, they're just append only.
So they keep getting tacked onto the end of your write-ahead log.
And then eventually, you know, that write-ahead log gets too big.
And it has to do a thing called checkpointing,
where it essentially moves all those pages from your write-ahead log back into your database.
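A small sketch of the write-ahead log behavior described here, using Python's built-in sqlite3 module. The file name and table are hypothetical. It turns on WAL mode, does some writes, and compares the size of the `-wal` file before and after a manual checkpoint. It only illustrates the SQLite mechanism and is not taken from Litestream.

```python
import os
import sqlite3
import tempfile

def wal_demo():
    # WAL mode needs a real file; it does not apply to in-memory databases.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    db = sqlite3.connect(path, isolation_level=None)  # autocommit: one transaction per statement

    mode = db.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

    # Each committed write appends frames to demo.db-wal instead of touching demo.db.
    for i in range(100):
        db.execute("INSERT INTO t (v) VALUES (?)", (f"row {i}",))
    wal_before = os.path.getsize(path + "-wal")

    # Checkpointing copies those pages back into the main database file.
    # TRUNCATE mode also shrinks the WAL file to zero bytes when it succeeds.
    db.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    wal_after = os.path.getsize(path + "-wal")

    db.close()
    return mode, wal_before, wal_after
```

Left alone, SQLite runs this checkpoint itself once the log reaches a threshold (1000 pages by default), which is the loss of control being described.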
And the issue that I had originally is that I didn't have any control over when SQLite would checkpoint
and move that stuff back over.
And that's kind of the key.
You don't want your underlying data in your WAL file to disappear because that's what you're replicating from.
But SQLite has this little caveat where it actually can't checkpoint if there's an open read connection on the database.
So Litestream actually keeps a persistent read connection on your, or like transaction on your database at all times
and has some tricks around when to release that and checkpoint back, and it kind of takes over
that checkpointing process. So Litestream essentially controls that whole process and
is able to capture every WAL frame, like WAL write, that goes in, and then can ship those off
to S3. So when you take the kind of sum total of all those writes and you replay them,
then you basically get your database, your end state of your database.
Does that make sense? I know that's...
Yeah, so it's kind of like, I wouldn't call it hijacking
that write-ahead log, but it's kind of like forcing it to be there long enough that it can piggyback
the data over and then it flushes?
Yeah, and the WAL basically acts like a circular buffer.
So it kind of goes to the end and starts back at the beginning.
So it essentially just keeps track of the end of that
and tails it, more or less.
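The effect of a long-lived read transaction on checkpointing can be observed directly. This is a hedged sketch with Python's built-in sqlite3 module, not Litestream's implementation; the table and file names are made up. `PRAGMA wal_checkpoint` returns three numbers: a busy flag, the number of frames in the log, and the number of frames checkpointed so far.

```python
import os
import sqlite3
import tempfile

def checkpoint_with_reader():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode = WAL")
    writer.execute("CREATE TABLE t (v TEXT)")
    writer.execute("INSERT INTO t VALUES ('before')")

    # A second connection opens a read transaction and keeps it open.
    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT count(*) FROM t").fetchall()

    # This write lands in the WAL after the reader's snapshot.
    writer.execute("INSERT INTO t VALUES ('after')")

    # While the reader is open, the checkpoint cannot move past its snapshot.
    pinned = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

    # Once the read transaction ends, the rest of the log can be checkpointed.
    reader.execute("COMMIT")
    released = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

    reader.close()
    writer.close()
    return pinned, released
```

In the first result the checkpointed count is lower than the log count, and in the second the two match. A replication tool that holds the read transaction therefore decides when frames are allowed to leave the log.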
And that doesn't degrade the performance of the production database at all?
No. And actually, when you're running it,
Litestream uses almost no CPU at all or anything.
It's pretty low overhead.
Most of the stuff
is in the OS page cache
anyway,
like the data itself.
So you're not really
even doing much disk access.
And yeah,
there's definitely
some optimization
still to be done,
but you generally
shouldn't see,
you shouldn't really notice
Litestream running.
Have you tested it against larger
databases, like
megabyte, gigabyte sized
SQLite files?
I have a VPS running
at all times.
This is the one thing that actually gives me confidence
around Litestream.
There's two different kinds of replication. You can do logical
replication, which is where, say,
someone submits an update, you know, for all your records,
and you're storing kind of that command of how to make the change. And then there's physical
replication, which is what Litestream does, where every page that gets written, we actually
replicate that whole page. And then we can replay those pages to build the database.
So what Litestream's able to do
is that it can actually checksum the database.
So you can do basically an MD5 hash on the database
at a point in time, and then it'll replay
the replica from S3, and those two
should match byte for byte.
So there's a VPS I run that actually constantly pulls
from the GitHub archive.
So it's just pulling in events from there,
and building, pushing them into a database,
and then every hour or so
it actually pulls down the replica,
replays it all,
ensures that they're byte for byte
matching exactly.
And yeah, it does great.
I haven't had issues with multi-gigabyte databases at all.
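As a toy model of physical replication and the byte-for-byte check described above: the sketch below treats a database as a list of fixed-size pages, logs every page write as a frame, and rebuilds a replica by replaying frames onto a snapshot. Everything here is an assumption made for illustration, including the tiny page size and the function names, and SHA-256 is swapped in for the MD5 hash mentioned in the conversation. It is not how Litestream stores or verifies data.

```python
import hashlib

PAGE_SIZE = 16  # toy page size; real SQLite pages default to 4096 bytes

def checksum(pages):
    """Hash the whole 'database' so two copies can be compared byte for byte."""
    digest = hashlib.sha256()
    for page in pages:
        digest.update(page)
    return digest.hexdigest()

def _store(pages, page_no, page):
    # Grow the file with zeroed pages if the write lands past the current end.
    while len(pages) <= page_no:
        pages.append(bytes(PAGE_SIZE))
    pages[page_no] = page

def write_page(primary, frames, page_no, data):
    """Apply a write to the primary and log the full page as a frame."""
    page = data.ljust(PAGE_SIZE, b"\x00")[:PAGE_SIZE]
    _store(primary, page_no, page)
    frames.append((page_no, page))

def restore(snapshot, frames):
    """Rebuild a replica by replaying logged pages onto a snapshot, in order."""
    replica = list(snapshot)
    for page_no, page in frames:
        _store(replica, page_no, page)
    return replica
```

Because whole pages are shipped rather than the commands that produced them, a correct replay has to yield exactly the same bytes, which is what makes the continuous checksum comparison possible.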
Cool, and it just keeps growing.
Yeah, it just keeps growing.
Growing, growing, growing.
It's kind of like what Changelog Nightly does, only we're not storing the actual events.
Another little interesting bit is,
like, S3 is super cheap. You get billed for a couple different things. You get billed for
the number of files you push up there, like the actual request itself, but you don't actually get
billed for the bytes that you push up. You can send up a 10 gigabyte file, but you
only get charged for a single PUT request. It's only when you download the data that you
really incur much charges. So the PUT requests, I think, cost like five thousandths of a
penny or something like that for each request. So you can essentially run Litestream
where it's, you know, pushing up about every 10 seconds,
and it costs you about $1.30 a month.
And because you don't have the overhead,
like you don't get a cost incurred per byte sent up,
you really have minimal costs in that realm.
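The monthly figure can be checked with a little arithmetic. The sketch below assumes a 30-day month and a request price of $0.005 per 1,000 PUT requests; that price is an assumption chosen because it reproduces the roughly $1.30 quoted above, and real pricing varies by provider and region.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

def monthly_put_requests(interval_seconds):
    """Number of uploads in a month if one PUT is sent every interval."""
    return SECONDS_PER_MONTH / interval_seconds

def monthly_put_cost(interval_seconds, price_per_1000_puts):
    """Request charges only; storage and download charges are not included."""
    return monthly_put_requests(interval_seconds) * price_per_1000_puts / 1000

# One PUT every 10 seconds at an assumed $0.005 per 1,000 requests.
requests = monthly_put_requests(10)
cost = monthly_put_cost(10, 0.005)
print(f"{requests:,.0f} requests, ${cost:.2f} per month")
```

Since the charge is per request rather than per byte uploaded, the result depends only on how often you push, not on how much each push contains.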
So it's a weird, like super cheap backup strategy.
That doesn't seem like it should work,
but the actual economics work pretty well.
Although the VPS that I run to continuously verify it does actually cost a little chunk of change, because
it's constantly downloading gigabytes of data.
Right. So you're just replacing the same file over and
over again versus proliferating files, right? Is that why it's a single PUT?
No, it's actually, so it's doing a new PUT for every new chunk of WAL writes that gets pushed up.
It'll snapshot it periodically as well.
You generally have about a fixed size of data that you're pushing up.
SQLite files tend to compress really well.
B-trees in general do.
They tend to have a lot of empty space.
The actual monthly cost of the gigabytes tends to be pretty trivial too.
It's also a weird thing too,
where I've had people ask me
if I'm going to start a business around this thing,
and I've had interest from VCs and whatnot.
But it has this
thing where it almost shoots you in the foot, where it's so
cheap and so easy to run that
I can't think of a service that would actually make it
easier or cheaper or better, necessarily, that I could sell.
So it's
worked out great so far, but not from a
money-making standpoint.
That's not really...
Scale so well that you can't sell it.
Or make a service that makes it better.
It is one of those things, yeah.
In that blog post, though, where you talked about why you...
I think you said it's titled
Why I Built Litestream. You mentioned about scaling.
Can you talk about scaling a little bit there?
Because I'm sure that once you've proved it's stable and usable
and you can actually use it,
at some point you're going to rely upon it
more so than just simply a Greenfield application.
You'll need to scale to more CPUs, more RAM, more servers.
Sure, yeah.
Talk about that.
Yeah, so I think scaling is an interesting topic in our field.
I feel like there's been an obsession over scaling and uptime, I think, that has kind of gone
off the rails over the last 10, 20 years, where we have this idea of everyone tries
to build their application to be the next Twitter or whatnot, or people worry about,
what if I have to scale
like crazy in whatever amount of time?
And generally, that's not the case, first of all.
But given Moore's Law, where we are seeing exponential increases in the compute that we have
available in a single box, for some weird reason we keep having this exponential scaling
of the number of nodes we actually need to run applications. That seems backwards to me. We have nodes on Amazon where you can spin up a 96
core box for, you know, however much money a month, but that's a lot of cores. Each one's doing,
you know, 3 billion operations per second. We should be able to run, you know, a couple
hundred HTTP requests to that. So as far as the scaling piece, I find that most people, if you're running a local SQLite database, you're not going to hit those scaling concerns.
Actually, one scaling concern I find people actually hit is things like Postgres tend to have a high overhead for connections.
So you end up having to put in something like pgBouncer in between that can actually start to pool those connections to not overload Postgres. Whereas you just, you don't
get that when you have an in-process database. So, you know, from that standpoint, it's great.
I would say that, you know, if you're running an application, you know, again, I write in Go,
it's a super fast language, and running locally, I can push through thousands and thousands of requests per second
on pretty modest hardware. And I think that really covers probably 90% of applications out there that
people are going to write. And even if you don't use SQLite for your main company's application,
there's probably a ton of applications in your company that are on the side or periphery that
don't need to be in, you know, some huge Kubernetes cluster. So I'll say that on the scaling side. And
then on the uptime side, I feel like people have this obsession around uptime, but I feel like the
more tools that people add... And I don't really mean to rag on Kubernetes all the time. I do, but
I think it's a tool that has an appropriate use case,
but it's not the vast majority of people's use cases.
I think that from an uptime perspective,
I think you're getting many more layers of complexity in there
that are going to cause you to have more downtime
than simply running a single node that may go down
because of a network connection once a year,
or a couple times a year for a couple minutes.
I don't think people are really taking the cost
of downtime when they think about
the trade-off they're making
to make these complex systems
that give them the illusion of uptime.
Hope that makes sense.
In your blog post you mentioned solutions
such as Kubernetes tout the benefits of zero downtime deployments,
but ignore that their inherent complexity causes availability issues.
Then you link out to this other thing, which I had no clue of before,
which is a public postmortem website for Kubernetes.
And there's just like a lot.
List of postmortems for Kubernetes.
It's k8s.af.
That's a compiled list of links to public failure stories
related to Kubernetes.
Most recent publications are on top.
But it's, I mean, it's a few,
it's several scrolls.
So there's a lot.
I don't want to like, you know,
people have put in good effort into Kubernetes.
I don't think it's a bad piece of software.
I feel like core Kubernetes
is generally good. I feel like the ecosystem around it is overly complex for most people. And, you know, I feel like Kubernetes
is the future, but I don't think it's the present right now. I feel like people really need to
have a great use case for why they're going to use Kubernetes before they jump on there. You know,
I've worked with companies before that are trying to evaluate their Kubernetes strategy before they actually have customers.
And that seems insane to me.
Yeah, it does.
I generally have a rule of thumb that the cost of going to Kubernetes is probably, say,
a million dollars.
And it's not meant to be like a hard and fast rule, like it's going to cost that much for
everyone.
But you need to have a million dollar problem that you're solving with Kubernetes.
And if the idea, if the number one million dollars
sounds like a lot of money, you shouldn't be using Kubernetes.
It's probably well beyond your problem space. So that's my personal
view on where we're going with technology and the complexity around it.
I don't think people should take on those tools lightly.
What would you consider the best use case then for,
I'm going to say it like Richard Hipp says,
which is SQLite.
I'm sorry to correct you guys on that
because that's what he said.
SQLite.
He's not here right now.
So what's the best use case for SQLite
and then using Litestream?
If someone's using Postgres
or they're chasing uptime,
they're chasing scaling,
why would a team or an individual developer
that's building an application choose SQLite or Litestream?
Sure, yeah.
I mean, I guess I kind of think of it in the opposite direction.
I kind of start from a default of, hey, SQLite, as they say it.
It's supposed to be like a stalagmite or stalactite.
Like a meteorite.
It's SQLite.
You know, actually, so this is a bit of an aside.
I cannot for the life of me pick up that pronunciation.
But whenever I'm writing, there's always a distinction of,
like, if you pronounce it "sequel-lite", then you would say a SQLite database.
Whereas if you spell it out, "S-Q-L-ite", you'd call it an SQLite database.
I'm always torn around the grammatical side.
Anyway, I think of the actual deployment from a different side
where I feel like most applications would probably work fine on SQLite.
I think you really need a good reason
to move off of that.
If you're going to start introducing additional tools,
you're doing multi-node deployments,
I think that you really should have a good reason for that.
There's an inherent complexity in that,
in that once you move away from a single node,
there's a lot of things you can't do anymore.
You might have a Postgres cluster
and it's connected to from multiple nodes, but that becomes slow because of
latency to the database. So you may want to add some kind of in-memory cache, but you can't add
an in-memory cache on the, you know, the web nodes because those are all connected to the database
and they don't have a full view of, you know, if changes came through a different web node.
So then you have to use something like memcached
or maybe a Redis node.
So you really, you know, this phrase like complexity
begets complexity.
Like you're going to, anytime you add more complex systems,
those complex systems are going to probably rely
on more complexity later on.
Like you're not going to have a full view
of the complexity you're adding initially.
So to answer your question, I think most people should run SQLite databases,
especially now that you can run them safely.
And I think you should really have a good reason not to if you're not going to.
Well, I said we would loop back around to the Jamstack.
So I want to do that before we forget.
This idea of read-only replicas and basically shipping them off to points of presence
around the world so that not only is your static
assets CDN'd but your data
store is CDN'd effectively. And so you could run, we talk about
edge computing and you have these functions on the edge
and Jamstack proponents are big on that.
But I always say, well the function's running on the edge, but anytime it needs
to interact with my database, it has to come all the way back to whatever
centralized server the actual backend is running on.
It has to incur that cost. Of course you can cache and stuff,
so there are advantages of doing that,
but ultimately your database is still in one place
or a few places.
And so the goal would be to get your database
just everywhere.
And not have to worry about how that works.
That does sound pretty awesome.
And so I've kind of just been saying,
and then there's FaunaDB's kind of doing that,
and I think Cockroach has some sort of angle into that.
There's people working on this.
Mostly what people say is like,
well, it's being worked on.
And so everybody kind of wants that,
because once your database servers are just CDN,
then of course your application servers
can just be that way as well,
if you have a separate app and DB.
But in the case of an embedded database, well, your application's already out there and your
database is embedded and Litestream's just managing that. So it sounds really rad.
Yeah, that's... But they're read-only replicas, so when it comes to writes, you'd still have like a
centralized thing. But writes are usually less often than reads, so it's like, not the panacea,
but it's pretty stinking close if it could work well.
Yeah, exactly.
I think people generally, at least a lot of the web apps that I've worked on over the years,
tend to be 90% reads, 10% writes.
You go onto a website, like an e-commerce website,
you're probably browsing around a bunch,
clicking on at least nine different pages before you actually check out.
And I think that idea of the read mostly apps
really benefit from this kind of thing.
I think most people are pretty okay with,
they get on a website and by the time they have to go check out,
they're used to waiting a couple seconds at worst
for a credit card to go through, that kind of thing.
I think the expectations around that are pretty okay.
But to be able to actually get, to snap through a website,
to browse around an e-commerce website,
and every page loads in sub-100 milliseconds,
I think it would be awesome, no matter where you are in the world.
I think that's a pretty compelling case.
So what would it take to get SQLite...
What would it take to get SQLite?
We're going to spend most of the rest of this podcast on that.
I'm going to call it the way I've been doing it the whole time.
We'll make it real hard for our transcriber.
To just throw back to the episode we had with Richard Hipp, though, Jerod,
you did say that you were going to try hard to say it the way he said it.
I did try hard, and that was like 10 years ago.
And I had given up.
I also told Gregory Kurtzer that I would pronounce it his way
because he was right here on the show.
And once he leaves, I'm going to go back to my own way.
That's true.
So that's what I'm doing with Richard Hipp.
Okay, you've got an out.
Go ahead.
Sorry, Richard.
Go your own way.
Sequelite.
I go back and forth.
I call it "sequel", then I call it "S-Q-L".
I have no consistency.
Not internally consistent.
And now I've lost my train of thought.
Thanks, Adam.
What was I talking about?
What would it take to get SQLite
plus Litestream deployed in such a fashion?
You mentioned there's some serverless platforms.
Maybe they would have to use
Litestream somehow.
Can I just go to DigitalOcean
or to Linode and just pick
VPSs around the world
and then just do my own thing?
How would it actually play out?
Sure, yeah.
I think the biggest issue you really have
around
these read replicas,
especially serverless,
is you really need all your writes to go to a single node.
It doesn't really make as much sense if they're going everywhere
because most of them are going to be read replicas.
So I think solving that issue is probably the biggest one.
You can certainly do it in your own code.
It would be nice to make it more automatic.
I'm not quite sure how that would work.
But once you redirect your writes,
say you're pushing all your POSTs and PUTs
and PATCHes, those HTTP methods, over to one single node, you know, I think that makes it a lot easier.
And then from that, read replicas are coming into Litestream in the next version, and that
basically streams out those changes to all the different serverless nodes.
So that one system, Fly.io, they have persistent
disks, which solves a lot of the issue. You can do it without persistent disks too, but
you get some issues around, you essentially need to download the database on startup of that
serverless function when it's cold and actually bring it into the local file system.
So that kind of negates some of the benefit
of a fast serverless platform.
So those are kind of the two main issues.
So the persistent disks, I would say,
you can solve that, but otherwise it's redirecting writes.
Yeah, as you redirect writes,
you're kind of turning SQLite
into a client-server database, though,
because you're pushing all your writes
to one particular instance.
And so aren't those other instances
having to basically become clients of that instance?
I wouldn't go that far.
I think you can do a lot just simply with rerouting
or doing a proxy through an HTTP server.
I think you could probably make a lot of that invisible.
I see what you're saying.
So the proxy can handle it.
If you can guarantee that all your GET methods are going to be read-only,
then I think you could probably easily do that.
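The routing idea from this exchange can be sketched in a few lines: reads are answered from the local replica, and anything that might write is forwarded to the single primary. This is only an illustration under stated assumptions. The primary URL is hypothetical, the set of read-only methods is a guess that holds only if the application never writes on GET, and nothing here is part of Litestream.

```python
# Hypothetical address of the single node that accepts writes.
PRIMARY_URL = "http://primary.internal:8080"

# Methods assumed to be read-only. This only holds if the app never writes on GET.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}

def route(method, path, primary_url=PRIMARY_URL):
    """Decide where a request should be handled.

    Returns ("local", path) for reads served from the local replica,
    or ("primary", url) for writes that must be forwarded.
    """
    if method.upper() in READ_METHODS:
        return ("local", path)
    return ("primary", primary_url.rstrip("/") + path)
```

For example, `route("GET", "/products/1")` stays local, while `route("POST", "/checkout")` is sent to the primary. A reverse proxy in front of each node could apply the same rule so the application code never sees the difference.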
Yeah, fair enough. That's a good point.
Read-only replicas are coming soon.
And then is Litestream done,
or is there a future beyond that
for the tool?
Or do you feel like that's your scope
and you're sticking to it?
I would say that's largely the scope
that I'm looking for. I really want to make it just hardened and just work as easily as
possible. I think that's where a lot of work really comes in, is just getting
every single little edge case that comes up and making sure that it flows smoothly and that you
can use whatever, you know, S3 store you want to use.
Making it work well with NFS disks is another thing.
There's some different configurations you can do with it.
I don't really have any big plans for anything crazy beyond that.
Honestly, if I can get a globally distributed SQLite database,
I'm pretty happy.
Well, Ben, thank you so much for, I suppose, being bold to say no to contributions. Bold not to ruffle some feathers, but I mean, that's can kind of see some details there. But hearing a full-length episode like this, I think,
does provide some pathways to understand what a maintainer is truly trying to do with their software.
So I appreciate you sharing your time and your wisdom here today.
Thank you, Ben.
Yeah, thanks for having me on.
I really appreciate it.
That's it for this episode.
Thanks so much for tuning in.
I want to give a plug for Ben.
He's got an awesome blog out there called Go Beyond.
It's at gobeyond.dev.
We're huge fans of Ben, so make sure you check that out.
If you haven't heard yet, we have a membership.
It's called ChangeLog++ because, hey, why not increment things?
It is better, as they say.
You can subscribe at changelog.com slash plus plus.
Get closer to the metal.
Make the ads disappear,
and of course,
support all of our podcasts.
Again, changelog.com slash plus plus.
And of course,
huge thanks to our partners,
Linode, Fastly, and LaunchDarkly.
Also, thanks to Breakmaster Cylinder
for making all of our awesome beats.
And of course,
thanks to you for listening.
We appreciate your attention.
We appreciate you listening.
And one more step you can take is to join the community.
changelog.com slash community.
It's free to join.
Come hang with us in Slack.
Call this place your home.
changelog.com slash community.
That's it for this week.
We'll see you next week. Game on!