The Changelog: Software Development, Open Source - Open source, not open contribution (Interview)

Starting point is 00:00:00 Today on The Changelog, we're talking with Ben Johnson. Ben is known for his work on BoltDB and his work in open source as a freelance Go developer. Late in January, when Ben open sourced his newest project, Litestream, he shared in the readme how the project was open source, but not open for contribution. His reason was to protect his mental health and the long-term viability of the project. On this episode, we talk with Ben about what that means, his thoughts on mental health and burnout in open source, choosing a license, and the details behind Litestream, a standalone streaming replication tool for SQLite.

Starting point is 00:00:32 Big thanks to our partners, Linode, Fastly, and LaunchDarkly. We love Linode. They keep it fast and simple. Check them out at linode.com slash changelog. Our bandwidth is provided by Fastly. Learn more at fastly.com. And get your feature flags powered by LaunchDarkly. Get a demo at LaunchDarkly.com. Linode is simple, affordable, and accessible cloud computing that developers trust.

Starting point is 00:00:58 Linode is our cloud of choice. We trust them, and we think you should build anything you're working on, a fun side project or that next big infra move at work, with Linode. The best part: you can get started on Linode with $100 in free credit. Get all the details at linode.com slash changelog or text changelog to 474747 and get instant access to that $100 in free credit. Again, linode.com slash changelog. Ben, you're back. It's good to have you back on this show.

Starting point is 00:01:40 You've been on Go Time. You've been on The Changelog, like, forever ago, basically. But, you know, we don't want to bury the lede. There's a new open source database out there, Litestream, but the reason why we reached out to you was because of, I suppose, the anti-normal of open source but closed to contributions aspect

Starting point is 00:01:58 of what you wrote there. Let's open it up there. Like you mentioned, it's a database. It's actually a database tool that wraps around SQLite to let you stream your data into the cloud, basically. Just run, so you can run SQLite in production and have it safely persisted. And yeah, it got a lot of notoriety early on,

Starting point is 00:02:17 not for the actual code that I wrote, but for the kind of the code that I'm not allowing. And then that it's really, yeah, there's a closed contribution policy on the repo. And it came from just kind of some other projects I've done over the past, like BoltDB I wrote quite a while ago. And a lot of them just kind of became, just being a lot of maintenance and not even just like checking code and doing all that, but just like responding and just really,

Starting point is 00:02:47 you're trying to take a lot of people's desires for what they want in the project, and 90% of the time you have to say no, that's not really where we're going with this, and just trying to figure out that overhead and trying to mitigate that. Yeah. What you said in the readme was, you've said some more than this,

Starting point is 00:03:02 but I'll give the TLDR version of it. I've made the decision to keep this project closed to contributions for my own mental health and long-term viability of the project. Which I think will go into the deeper parts of it because you've done other open source before and you've got some scars and some history to you. And some aspects to, I guess, what motivates you. But what kind of feedback did you get initially from this? Was it a lot of high fives? Or was it a lot of like, whoa, hey, Ben, that's kind of wrong? Or what happened?

Starting point is 00:03:32 What's the fallout so much from that line there? Sure. I mean, I was fully expecting people to just rag on that. Actually, I expected people to not really even notice it because it was buried at the bottom of a long readme at first. And it somehow made it to Hacker News. And honestly, I would say it was probably 95% supportive, just other people kind of saying, oh yeah, I've totally been there too. It's just a lot to take on and take in changes

Starting point is 00:03:57 and, uh, try to manage that thing. And really, I guess I try to distill it down to, like, what is my goal for this project? And I think, you know, I tend to make tools that are minimal. Like, I have a fixed idea of what I want to build. So for Litestream, I want to run SQLite in production, and anything that doesn't really support that, you know, any extra use cases are just not that important to me. And I want to make it as simple as possible. So I didn't really necessarily want to make the biggest project or the fanciest project. I wanted to make something that just kind of works and works well for what I'm doing. So I didn't see external contributions really moving the needle, you know, for that kind of thing. And actually that being

Starting point is 00:04:38 said, I feel like there's a fascination in our industry around code and contributing code, but honestly, I feel like the code piece is such a small part of it all. If anything, I would love to have people try it out, test it out, submit feedback, bugs, that kind of stuff, or even docs changes. I feel like that's like 90% of the project. And then the little bits and bobs of the code are, you know, a smaller piece. Well,

Starting point is 00:05:06 that's why we wanted to talk to you really about this, because I feel like there's a lot of nuance here. And, and, you know, prior to that, you said you're grateful for community involvement, bug reports.

Starting point is 00:05:13 You did say those things. And, you know, but the highlight really was that you wanted to keep it closed to contributions for whatever reasons, whether it's mental health or long-term viability, as you mentioned, but just for whatever reason you wanted to kind of keep the code base itself

Starting point is 00:05:27 limited to your input because you had a specific scope. And I think that's where you kind of have to have a podcast like this go into those details where it's literally Ben Johnson sharing with us the details of why that makes sense for your project and how you can see community involvement still taking place, but just not so much in the contribution to the code base itself. Yeah, for sure. And actually, I can give you some good examples too recently where some people, you know, one thing that people wanted to see was Windows support and the code changes to make Windows support happen were relatively small. I would say it was probably a dozen, two dozen lines

Starting point is 00:06:04 of code. But actually I haven't run Windows in 10 years, 10, 15 years. So actually getting in and reading the docs on how Windows services worked and getting it up and running and getting a VPS started that runs on Windows doing RDP over to that and logging in, setting stuff up

Starting point is 00:06:19 and all the packaging stuff around MSI installers, getting a code signing certificate. Like, there's just like a million things to do to actually make this really run and have a good developer experience that aren't just those, you know, 20 lines of code that were, you know, pushed in. So I feel like those are kind of the underappreciated parts that you really just never see, but that's really kind of what makes the project, rather than the actual code itself. Actually, another one that came up right after that was S3-compatible stores. So right now it pushes to S3, Amazon S3, and there was, I don't know, maybe like a four-line change to make

Starting point is 00:06:56 it work with, like, MinIO or GCS, yeah, Google Cloud Storage, a couple other cloud stores. And those little bits aren't that hard, just to kind of put a little tweak in there, but I wanted to make sure that the experience of getting on and trying it with those pieces and how they integrate into the docs and changing the getting started to make it simpler for people to actually try it out and going through and testing,

Starting point is 00:07:19 all those things, it's just crazy. Everything beyond the code that really doesn't get talked about. But it's just hugely important. Yeah. I may be splitting hairs here, but you say it's not open to contribution, and it sounds like those are all contributions. That's a really good point, actually. That's probably

Starting point is 00:07:35 some copy editing I need to change. Those are huge contributions. I guess code contribution is the thing it's closed to. Only Ben can write code, but everybody else can be. Because the question is, if I don't want contributions, then it's like, well, why did you open source it? But it's clear why you open sourced it, because you do want participation or key to community involvement and all these things. It's just specifically, you're writing the code for this project.

Starting point is 00:07:58 Yeah, and I think there's a lot around the actual direction of usability, how you want it to feel, how everything integrates together that I think is easy to miss if you're an outside contributor just bringing an initial PR into the project. And I think I could certainly get people up and running

Starting point is 00:08:18 and explain to them why certain things go together or certain things work the way they do. But again, that's just a lot of overhead that I'm not necessarily opposed to, but is that time better spent building the product and making it kind of gel a little better together? And I guess from my side, I haven't gotten to that point where I need a second person to come on and really commit code in that kind of way. So break it down for us then, a suggestion or a contribution non-code-wise.

Starting point is 00:08:46 How does that happen? Does it simply happen in issues or, hey, Ben, by the way, I want to have not just simply S3, I want to support Minio or XYZ store. How does that permeate into the actual code base? Does it just simply come through you or how does it work? I mean, yeah, issues have been a great way. It's been pretty active on there so far. People just, if they have bugs, they tell me,

Starting point is 00:09:07 or if they have issues running it or whatnot, that's a great way to do it. The GitHub discussions I've actually really liked, where you kind of have some, like a threaded discussion board, which I feel like they haven't really announced enough because I don't know a lot of projects using that. But I find that's a great way to kind of get people on

Starting point is 00:09:23 and talking about stuff that don't feel like they necessarily have an issue. They just more have a question or like, you know, what's the best usage for this kind of thing. So I think those are great ways to do it. As far as the documentation, that's actually all MIT licensed. So if someone wants to come in and make changes, suggestions, or, you know, fix typos, that kind of thing, or whatever contributions from a doc side, that's all open source and open contribution.

Starting point is 00:09:48 Open, I guess, not really code contribution, but contribution. So you didn't make this decision in a vacuum, and your previous experiences obviously informed this decision, at least some. So you're talking about mental health and

Starting point is 00:10:03 really your enjoyment overall and the success of the project are kind of informing this decision. What have you been through that brought you here? Have you been through burnout? Have you been through terrible pull requests or low-value code contributions? What's the kind of stuff that you've been dealing with

Starting point is 00:10:21 over the years? Sure, yeah. I mean, I think, so previously I'd written BoltDB, which is a database in Go. It's a key value store. And that project, you know, there are definitely valuable contributions. I don't want to, like, diminish that.

Starting point is 00:10:34 But I feel like a lot of contributions, either they can fall in two buckets, I'd say that, or a few buckets, I guess. So you can have, you know, very small kind of trivial contributions, which, you know, I don't have anything against small contributions at all. But then you also have kind of like mid-sized to large contributions, which can really either skew the scope of your project very much.

Starting point is 00:10:55 And a lot of times you just can't accept those, or you have to do a lot of changes to really accept those. And the other side to that is that if you do get some great feature added, the person that added that feature that sent the pull request, they're not probably going to be around six months from now when people are asking you to support that. And something's broken on it, you've got to debug that. So I guess I kind of come from a database background over the last decade or so. And I find that people in other kind of realms of the industry, I feel like they really focus on like, hey, look at this new feature, this great feature we have, blah, blah, blah. Like they really tout that. Whereas in my position, I really see features as like a liability. Like every little feature I add is something that could possibly corrupt a database. Like

Starting point is 00:11:40 it's really pretty serious. I mean, not Litestream. Litestream doesn't actually write your database. But there are huge liabilities if you just add some small pieces of code. I feel like there's a utility side and a liability side to every feature. I feel like the utility side needs to vastly outweigh the liability side. So that's why I feel like I tend to reject a lot of features.

Starting point is 00:12:05 I've heard it said often that code is a liability and features are assets, but I've never heard anybody say features are liabilities. I definitely see where you're coming from. That being said, you're probably talking about the code of the features, right? The maintenance of the features, which are liabilities. I think you interact, like

Starting point is 00:12:20 features interact with each other, and the more that you have, I think those interactions really grow kind grow exponentially. Whatever, geometrically. Some kind of math equation. They can really grow as they interact with each other. There's just going to be unexpected ways that they do that. I think features really very much do,

Starting point is 00:12:43 even from a documentation standpoint, usability standpoint, are liabilities. Yeah. And they're kind of one-way streets as well because it's easy to add, but it's very difficult to remove, especially if you have empathy for your users. I can't just take away this thing

Starting point is 00:12:58 that you're relying upon, but it's really screwing up this part of the code base. And so in that way it's a liability. Even though that person sees it as the value, as the maintainer all you see is how it's slowing you down or causing you headaches, because taking it away is selfish lots of times as a maintainer. Yeah. You know, I thought about, um, different ways you can kind of run Litestream and whatnot, and I've really been trying to figure out how to run SQLite on a serverless platform,

Starting point is 00:13:28 which is a weird idea but I feel like it'd be an easy way to get people to deploy their applications and it runs just simply with no configuration outside of your serverless platform and for that one idea I had was running kind of a service for people

Starting point is 00:13:44 where they can replicate to the server that's outside their serverless platform. Sorry, this is a long story. But in thinking about that, it kind of stresses me out a bit to think, hey, if someone does choose to use a service that I put out there, I can't just give up on it. They might be really relying on it for their business or their organization. And I feel like this is a commitment you really need to consider a lot, for sure, in the long-term effects. Features are kind of like hiring people, in a way, right?

Starting point is 00:14:14 If you want to have an analogy, I think of that with business even. I can recall back in the day when I worked for a non-profit, and I was very green behind the ears. Or what's the word? Wet behind the ears. Wet behind the ears. Is that a bad term to say these? I don't know. I think it means you're fresh out of the water, isn't it? Yeah.

Starting point is 00:14:31 I was inexperienced, let's just say, in the realm of business. A lot of ambition, but my boss, the founder of the company, the nonprofit company, I was keen on getting the help because we needed support in the design front. And I wanted to hire somebody like ASAP because I was the one feeling the burden of the need essentially, so the feature. And he's like, we got to be slow to hire. And he taught me this lesson essentially about being slow to hire, and it seems similar with slow to feature essentially. Because for every feature or for every hire, you may have to eventually deprecate it or fire them or circumstances change. And so just be very wise and very calculated with your hires or very calculated with your feature adoption. I agree 100% for sure.

Starting point is 00:15:18 And it's easy to get in over your head because when that feature comes in for free overnight while you were sleeping and all you had to do was hit this button, it's exciting that somebody likes your software enough to work on it. I haven't had a successful project like you have been with BoltDB or any of the other stuff. I've had things where maybe it's my open source deal or I would love the contributions, I never quite got there.

Starting point is 00:15:42 But I've gotten features, little ones and medium-sized ones, on a few projects. And for me, it's always been like, I'm all giddy about it, but that's kind of like you get a dopamine hit and it's enjoyable. And so you just do something quickly because you're like, yeah, somebody cares. But then six months down the road, you're like, why did I do that? Yeah, for sure. I still feel the same way when people submit issues on the project or submit feature requests

Starting point is 00:16:13 and things like that or want to discuss it. I love talking about this stuff and working on it with people. But yeah, I totally agree. There can be some regret later on. So when you made this choice, have you read Nadia Eghbal's Working in Public? Or have you been thinking about these things?

Starting point is 00:16:29 Because the reason why I made the connection, I thought maybe he was inspired by that because she said in that book and on our podcast afterwards, like one of the things she's realized over this time, you know, researching and being part of open source is like she realized that open source doesn't mean open participation. And it

Starting point is 00:16:45 doesn't have to mean that. And that resonated with me. And when I saw your post, I thought, I wonder if maybe you had been inspired by that concept, or maybe you came up with this completely in a silo. I mean, I haven't read the book. I may have seen other posts of hers, I'm sure. Yeah, she definitely influences thoughts around this. I think that there's definitely a crowd of Twitter OSS maintainers that we commiserate a bit to each other when we see a project gets closed down because someone gets burned out. You see these large public things that happen like that

Starting point is 00:17:19 where it happens, someone has a hard time, closes a project or it shuts down, or something goes sideways, and a bunch of people all kind of know that feeling and kind of share that feeling. And I think, I'm passionate about open source sustainability, but I think it's just a hard problem. I don't know as far as how do you get people working on open source, which is free,

Starting point is 00:17:41 and I don't feel like people have really found great ways of making money off of it to, like, sustain them in that way financially. So I think there's people out there trying to figure this stuff out, like her, and I don't think there's really an answer quite yet. But I think that trying to lower that burden, at least in some way, I think can maybe help in some incremental small way. Yeah. Well, I think that your choice here, and I think probably her findings and her statement, which she said on this show and elsewhere,

Starting point is 00:18:07 that it doesn't have to be participatory because many times it is the situation where you have the one maintainer serving the many users and the contribution does not scale alongside the user growth. She calls that a stadium. I don't know it's like you're

Starting point is 00:18:23 kind of a rock star in a stadium. You know, there's one person on the lead microphone and there's 100,000 in the stands, pre-COVID. Now there's, like, uh, 10% of 100,000 people on the Zoom. There's cutouts. There's 100,000 cutouts out there, you know. So, like, that circumstance happens a lot, where the growth of the project happens but the growth of contribution doesn't scale or doesn't match. And that's okay. And it's okay to actually even say

Starting point is 00:18:49 that's what I'm going to do. I'm the only person on this and that's the way I want it to be. And I think it's fitting for scoped things like this, like the tools you like to build where it's not a thing that you're going to work on into infinity necessarily unless it grows outside of the scope that you initially defined.

Starting point is 00:19:07 But I think what that comment from her did, and probably what yours has done with this, putting your stake in the ground, and then having it on Hacker News and 95% positive, it's probably given a lot of other people permission to do that. To feel like, oh, Ben can do it, I can do the same thing. Because a lot of us like to put stuff out into

Starting point is 00:19:28 the world just for that reason. And don't necessarily want it to be community projects. Open source does not have to be community maintained. It doesn't have to be. I was thinking about this today. I feel like there's two kinds of projects out there.

Starting point is 00:19:42 You have frameworks and you have libraries. This is kind of the debate out there. Do you build this big scope thing, say like a React or a Kubernetes? I'm not going to build that by myself. It's meant to have this huge overarching scope that your application runs on top of versus say you have a library or a tool

Starting point is 00:20:01 that is kind of an incremental small scope piece. And I don't like writing frameworks. That's just not my passion. I don't like those never-ending scope projects. I like saying, hey, here's a problem and here's a solution and build a tool for it. So I feel like my favorite projects are those kind of projects. And within those tools, I feel like the best projects of those tend to have kind of that benevolent dictator for life kind of management around it. So why do you do open source? Like, what's your intrinsic reason? I would say it's twofold. I like the reach that open source has, where, like, you know, when I wrote

Starting point is 00:20:37 BoltDB, I had people say that I could try to monetize it. I don't have any idea how you monetize, like, an embedded database like that, but I'm sure it could make more money than I did off of it, obviously. And to take that and then say, hey, this is free for anybody to use. And it gets picked up by another project. And then, like, it got picked up by etcd. And then etcd got put into Kubernetes. And it's crazy to think of the reach that, you know, BoltDB, while it's a small, small piece, is, you know, deployed in some of the largest companies in the world, you know, helping to persist state in their etcd cluster. So like little things like that, just seeing that I can make some small incremental

Starting point is 00:21:14 change in the world that has large reach. So I'd say that's the biggest reason I do open source. And then kind of a secondary reason is like, you know, a lot of things that you do at a day job are, you know, you're doing CRUD apps or you're doing things that, you know, move the business forward. But they're never going to be like this kind of edgy, researchy, kind of like down in the weeds, fixing some really deep, interesting problem a lot of times. A lot of times it's just kind of a day-to-day kind of work you do. So I feel like the open source stuff that I do tends to be kind of that more esoteric, unusual stuff. So like this, for example, like Litestream, it's one of those problems that I've always had where I don't want to have like a complicated application deployment. I just want to use SQLite. How do I make that happen? Like what is the thing that's stopping me from doing that in a production app? And, you know, I could never write Litestream for a company because that just sounds ridiculous.

Starting point is 00:22:08 There are other solutions out there, that people could run Postgres, or you could run on RDS on Amazon. There's not a justifiable reason to build Litestream in a company. So that kind of thing. I worked on a project before that where I ported over a tool called KLEE, K-L-E-E. It's this crazy code execution tool where it'll basically analyze your code and go through kind of all the paths of the code. And you can do things where you can generate test cases from code. And it has like a solver in the back end.

Starting point is 00:22:41 Anyway, it does all these kind of crazy things. I ported that over to use with Go. But like, you know, I spent a ton of time on that. I released it, but it was never really finished. I just kind of wanted to try these new things and kind of experiment and,

Starting point is 00:22:52 you know, push my brain in different ways, you know? So it's, it's really like an intellectual kind of interest. That was a long answer. This episode is brought to you by our friends at Retool. Retool helps you build internal tools fast and easy.

Starting point is 00:23:29 From startups to Fortune 500s, the world's best teams use Retool to power their internal apps. Assemble your app in just a few minutes by dragging and dropping from pre-built components. Connect to most databases or anything with a REST, GraphQL, or gRPC API. Retool empowers you to work with all your data sources seamlessly in one single app. Retool is highly hackable, so you're never limited by what's available out of the box. If you can write it in JavaScript and an API, you can build it in Retool. You can use their cloud service or host it on-prem for yourself. Learn more and try it free at retool.com slash changelog.

Starting point is 00:24:02 Again, retool.com slash changelog. So one of the reasons that you say you do open source is because of the potential impact of your code. I think it's a great reason. I think it's one reason why lots of people do open source. And it's pretty cool to see, like you said, your little database, your little key value store, you know, like inside Kubernetes, powering all these deployments. It has to be satisfying. Was there any fear or trepidation or concern that maybe this decision around no code contributions

Starting point is 00:24:43 would limit Litestream's impact? I was probably a little naive with it. I thought nobody would notice, to be quite honest. Nobody would notice Litestream or nobody would notice this policy? The contribution policy. I mean, I thought some people might when they try to open a PR, but I didn't think it would become a big topic. So actually the thing that I worry more would limit people, this is actually the first project I've ever used a GPL for. Um, and yeah, I'm still not sure about that decision. I mean, I was surprised I haven't had any blowback about that, but yeah, I think not being able to embed it or just, I don't know, people get weird about

Starting point is 00:25:19 copyleft licenses. What drove that decision? It's weird. The little things that kind of change your mind. Like I've always written libraries. So like libraries, especially in Go, like you pretty much have to have a very open license like MIT or Apache. And this is kind of the first command line tool that I ever wrote that kind of runs separate from the application.

Starting point is 00:25:41 And Mike Perham, I think that's how you say his name, of Sidekiq, he had a tweet like years ago that just always stuck in my head. And it was basically, I think it was kind of trolling a little bit, but he was basically saying like, you know, if you don't license as GPL, you just don't care about your code. I mean, he was being trolly, I think it's a little bit in jest, but that kind of sat with me, and just like, you know, if you don't really control, you know, what happens to your code and where it goes and what people do with it, you know, you kind of limit the ways you can grow that project. And, you know, I think there are, again, around like sustainability.

Starting point is 00:26:14 I guess my biggest thing with sustainability is that, I know GitHub's recently added corporate sponsorships, but a lot of it's always focused around individuals contributing to other individuals doing open source. Whereas really the people that benefit the most are, you know, these bigger companies that could easily spend a thousand dollars a year, whatever, paying for some library that really supports their business. And I think having more control around the actual license and what people can end up doing with it, um, I think can really kind of shape, you know, that

Starting point is 00:26:51 conversation more, if that makes sense. I'm not trying to sound too greedy or anything, but I'm really just, I find that to be kind of a fascinating direction that I've never really taken before. Have you read this license end to end, Ben? Just curious. The GPL? Yeah.

Starting point is 00:27:06 I don't know if I've read it end-to-end. I think I've read most parts of it at one point, but I probably should. It's a long license. Over the years of having so many more conversations about licenses, I find myself actually reading more and more. Now, I have to admit, I haven't read the GPL end to end, but I don't have any code out there that has it adopted as its license. So at least I'm clear there, and you're

Starting point is 00:27:29 not. But yeah, I'm just curious. That's totally a good point. When you choose this license, and, you know, you mentioned Mike Perham and that tweet, and, you know, whether it was in jest or not, like, what specifically about this license, like what clauses made you think, okay, this is suited for Litestream? Sure, that's a good point. And I will say, I tend to defer to people that probably know more than I do. And I'll read summaries on a license more than I will go word for word into a license and pick it apart, because I'm no lawyer. I think the ideas around if you're going to use this code, or not even just use it, if you're going to take the code and change it around, like, you know, I think that that stuff should be put back in the world for the benefit of everybody.

Starting point is 00:28:15 Like, and I think that, you know, the one thing I don't like about it with libraries is like linking this tool into your code suddenly means your code needs to be GPL as well. And that seems ridiculous to me. Whereas Lightstream is pretty isolated. It's a single binary, runs next to your application. And any changes to that should, I would assume, probably be helpful ultimately to the wider community out there. And yeah, I would welcome,

Starting point is 00:28:43 if not even the code contributions from that, just simply the ideas around what people are changing about it and putting that back out there. So you want to make sure that whatever value is there currently or could be derived from the future, whether it's you changing it or someone

Starting point is 00:29:00 else changing it, you want to make sure that future public use, the open source spirit remains with the software. Yeah, that's basically the idea. Yeah, so if I adopt it at my company and then we invest labor hours into making Lightstream 10x faster

Starting point is 00:29:15 or I don't know what sort of metric you would improve it, right? I'm a 10x engineer, so I 10x it. As soon as you touch the code, it just goes up. That's right. I actually just put a comment in there that says 10x and then I...

Starting point is 00:29:29 You'd want that to be out in the world, right? Even if that code's not going back into Lightstream, you may look at that and be like, oh, that's a clever thing Jared did. I can't believe he's such a good 10x-er. And then you might just pull that in. But if I didn't have to do that, we could just keep it for ourselves

Starting point is 00:29:45 and Lightstream wouldn't benefit and the world wouldn't benefit. Exactly, yeah. Let's be honest real quick, Jared. You're not a 10Xer. No, I'm not a 10Xer. You're a 11Xer. Oh, thank you.

Starting point is 00:29:57 You called me out. Insofar as I can multiply things by 11. That's right. Yeah. But that's as far as I'll go. So you were concerned that GPL would limit it. You were also concerned that the non-open code contribution would be a limiter.

Starting point is 00:30:16 But it sounds like, at least in terms of an open source project, it's off to a great start, wouldn't you say? Lots of attention, lots of people looking at it. So far so good. I haven't had anybody push back on the GPL. I think one person asked about it, and then I told them why, and they just said, okay, and they moved on. So that's been good to see, for sure.

Starting point is 00:30:34 You mentioned that you made this choice somewhat naive, in a naive way. And I'm curious if there were alternative options. Because I suppose you could not put it out there so explicitly, it's open source but no contributions. You could just simply just not accept pull requests, which is kind of what GitHub forces you to do now, right?

Starting point is 00:30:55 Because even though you've made this declaration and you're reading it, it doesn't mean that the tooling has supported your desires for keeping your pull requests closed. You're still sort of stuck with that. Yeah, for sure. And I've had folks from GitHub reach out over DM on Twitter asking what they can do to help support the project.

Starting point is 00:31:15 And honestly, I just asked for being able to check off the, to hide the pull requests, just not allow those. I think that'll go a long way. And it sounds like it's something they've definitely debated and they've talked about. And I'm sure there are nuanced reasons why they can or can't do that. And maybe it's coming in the future. But I'm not privy to those decisions.

Starting point is 00:31:37 But I think it would unload a huge burden on people if they just didn't have to think about that. And closing PRs after the fact is just the most soul-sucking thing to do, where someone might have put so much time into a PR. Yeah. And then you have to just close the issue, or close it, and be like, I'm so sorry, I can't take this because it's not what I'm looking for. So I'm trying to be as explicit as I can before someone really puts all that time into it. Yeah.

Starting point is 00:32:07 It's really difficult, I suppose, in the world of open source to not waste someone's time without some sort of explicit visual cue. I would imagine if you have a repository that does not have pull requests, which is sort of, I can remember when, you know,

Starting point is 00:32:21 back in 2008 when GitHub first launched, that was the cool thing. PRs are the DNA of GitHub, essentially. So if you take that away, to me, you'd need to be visually far more clear on a repository that that's not an option. Maybe a red banner or something. I don't know, just something very different,

Starting point is 00:32:40 starkly different than every other repository. Or that skull and crossbones emoji. There you go. None shall pass. I totally agree. It's almost like how the license shows up on there. It'd be nice to have some kind of, I guess it's community guidelines.

Starting point is 00:32:58 I don't know. There's something I feel like they could do to basically say we're welcoming, but not that welcoming. Right. I was going to say, what's the most polite way to say PRs not welcome? Yeah, I don't know, man. You may have done it. It's definitely been an interesting thing to tiptoe around.

Starting point is 00:33:13 How do you convey to somebody that I value your input, just not the code that you're giving me? That's a nuanced thing that I don't think I accomplished very well. Right. And you're touching on it. That's exactly why I thought that it would make sense to talk through this with you, because we've had, you know, I would say the luxury of knowing you for many years now. Not like buddies hanging out on the weekends, but we've known you for many years, and I know where your heart is at, or at least we have a direction of where your heart's at

Starting point is 00:33:41 with open source in the community. And, you know, a passerby, a brand new person to Ben Johnson and your code and who you are may not understand the nuanced reason of why you would make this choice. And I think, good luck to GitHub and the interface designers there to encapsulate what this podcast may convey well or not so well in a button or some sort of visual element. It's going to be difficult. No, yeah, I think the podcast is a great medium to convey that.

Starting point is 00:34:13 Even a blog post is not going to – people are going to read that in different ways. Hopefully I don't sound like a d*** on here, but it's easy to come off that way with just written text. Yeah. I'm just enjoying the thought of GitHub putting our podcast in a button somewhere. When you click the button, you just have to listen to this conversation. It's like, here, this is why he's doing it, all right?

Starting point is 00:34:33 There you go. Going back to the GitHub features end of this, you could use an issue template, but is there a PR template? Is there anything in between the person and their pull request besides your readme where you could inject a thing that says,

Starting point is 00:34:47 hey, don't do that. Like, don't waste your time. There's a PR template, which I have the same kind of paragraph about why I don't take pull requests. And, but again,

Starting point is 00:34:56 like you have to get to the point of finishing the code and pull requesting it to actually see that. So, I mean, in that sense, like, you know,

Starting point is 00:35:03 the person still has wasted their time. I don't see that pull request that I have to then close, which probably makes it easier for me, but it still hurts that somebody may have put time into that. Yeah, you almost want it on the fork button, you know? Like when you click fork, it might tell you at that point that, yeah, you can fork this project, but know that... Yeah, that'd be nice. Yeah, because that's usually the first step that I would do, is fork it, right? Yeah, unless you're doing an edit, like a typo edit on the readme inside the GitHub web interface. Yeah, something like a full-screen banner that comes across and just says

Starting point is 00:35:36 you're not taking this code back. Right. So what would happen if somebody came up to you, and you just misspelled something in your readme, and they just did it anyways? Are you going to close it and be like, I'm going to commit the same change with my own signature? It's something I definitely struggled with. I hate the argument of slippery slopes, but it is one of those things where I don't have a problem with small pull requests, like those little tiny minutiae. But then there's going to be somebody who, instead of changing a word, they change the whole sentence. And maybe it just reads in a weird way. It's just not what I'm trying to do. Or then maybe it becomes a small code change, but then that still grows.

Starting point is 00:36:13 And I don't have a perfect answer for this. This is really an experiment. So I don't want to come across saying, I know that this is the best way to do open source out there. And it definitely has its flaws. And this is a perfect example of one. Yeah, I mean, in that sense, it's definitely hard. It's tough. Yeah, because it seems so petty to be like,

Starting point is 00:36:31 actually, I'm not going to accept this, because I don't accept them. But once you accept one, now your list of contributors is two people, and you can't go back on that, and now it's like, I don't accept contributions. What about that person? Why'd you accept that one?

Starting point is 00:36:46 Now I have to have this conversation every couple of weeks or whenever it happens. Or even back to the license. You mentioned GPL being good now. If for some reason you change your mind, every contributor is a liability, a potential veto to that change.

Starting point is 00:37:03 Yeah, you've got to do like a CLA, a contributor license agreement. I think that's what it stands for. And then that becomes a whole thing. And I don't know, it's just, I really appreciate people pointing out the typos and whatnot, but just the amount of overhead just seems weird compared to the actual value of the change.

Starting point is 00:37:21 This all comes back to the scope of Lightstream, though. The scope is limited, and that's why you want to maintain control. It's also influenced by past interactions in open source and your work. It's a culmination of many things that isn't just simply, I prefer my code, not yours.

Starting point is 00:37:40 That's not what you're saying. You're saying, I want to be the contributor to it. I have the best code. And you never make a typo. That's how you solve that problem. No typos. Well, you're not saying it condescendingly is what I mean. You may be saying that, but you're not saying it as like, you suck, I'm

Starting point is 00:37:56 better. It's more like, I just have a preference here. It's your prerogative to feel that way, Ben. I think everyone generally prefers their own code, but I think there's definitely something around continuity of code. Like, you know, if I contribute to somebody else's repo, my code, even if I really try to follow their code style, it's going to be a different approach.

Starting point is 00:38:16 It's going to be a different just way of doing things, which, you know, it's going to be that one section of code in their whole code base that just works a little bit differently, and they've got to kind of keep that in the back of their head. Or they can come in with your PR and change it around to the way they would do it, that kind of refactor. Let's be super explicit then. If someone on the GitHub team is listening,

Starting point is 00:38:38 what exactly is your request to make the way you want to run this operate? Is it simply turn off PRs and hide the button or hide the fork or do some of the things that Jared mentioned? What's a good suggestion? I think hide the PRs is probably number one. I think some notification when someone tries to fork, I think it would be awesome as well. I think that's a great idea.

Starting point is 00:38:58 But beyond that, I think the discussions are a great direction to move the conversation away from code and toward the actual use of the tool and how people use it. That's really the thing that I miss out on for a lot of things: use cases and how you use it and the workflows and stuff like that. The code in Lightstream, you're not going to be blown away by it. I'm not doing SIMD, crazy, whatever, coding in there. It's just, you know, ifs and for loops and whatever you do in code. So the real value of it, I feel like, is when you actually apply that stuff and how that looks in the real world. So I think discussions go a long way in that. And then I would say my other request to GitHub,

Starting point is 00:39:41 and we talked about this briefly, but they do corporate sponsorships now. I actually, I really wish they would allow you to only take corporate sponsorships. I feel really weird about taking money from other individual developers. And I actually, I don't do sponsorships for that reason. And actually, if you really want a wish list beyond that, I would say, I think that there is this idea that corporations should come along and benevolently support projects. It's in their own self-interest, for sure, but it's definitely a charity. And I don't think that's the right way to frame it. I don't think you're ever going to convince a large swath of companies to support open source without really giving them something direct and tangible in value. So I know it's a contentious idea,

Starting point is 00:40:31 but some idea of giving priority support to some corporate sponsorships or giving some additional benefits that you can really give to a company and say, hey, if you sponsor this thing for, I don't know, $100 a month, then you get these benefits. You can do that outside of GitHub. There are ways of doing that. But I think to streamline it inside of GitHub would be really powerful. I think that would really motivate a lot of open source contributors. I think the framing of the sponsorship is really where it gets, as you said, weird.

Starting point is 00:41:00 Even I would say at a company level, I would personally much prefer it if you just offered a product, and that one product was just simply support, and it was only open to corporations or businesses, LLC, corporation, whatever you want to be, just not an individual. So an individual software maintainer like yourself doing business with corporations, and I might personally prefer to just do the business with that business personally rather than leverage GitHub. But I think if GitHub could, you know, produce tooling... The framing of it being sponsors, or GitHub Sponsors, that's where it gets, in my mind, weird. Like even for us, we as a podcast network and a podcast business, a media company, we sell sponsorships. But once we pass that threshold of like relationship, we begin to call them and treat them much like partners because we're not looking for sponsors and transactions. We're looking for people who care about us as a business, the community we serve, which is software developers, and that's I think, you know, they get in the door with the word sponsorship, but we soon

Starting point is 00:42:07 after help them understand our own lexicon, which is, you know, we treat you like a partner and not so much like a sponsor, and at that point we prefer you to not be transactional and prefer to lean on relationship. But I'm kind of going in the weeds on our own business, but

Starting point is 00:42:23 that's, I think, where the word sponsorship gets, yeah, a little murky, in my opinion. No, I think that's totally fair. Yeah, whatever you want to call it, I think the biggest hurdle that GitHub can help with is that, you know, companies tend to have these painful procurement processes where you have to invoice them and it has to be whatever. I think to be able to streamline that piece would help. Like, yeah, the idea of most developers going through procurement processes for every company, I think, seems overwhelming. I would be happy to pay GitHub 30% or whatever typical app store fees are to manage that kind of stuff, to provide tooling around that. I would have no problem giving that money away to them rather than having to kind of side channel all that stuff

Starting point is 00:43:10 through a website I have to build or some tool I have to use outside of GitHub. That makes sense because if they can knock down all that red tape, all that minutiae in the process, the bureaucracy of that buying process, PO numbers and accounts payable. It can be a nightmare if you have no patience or you don't want to spend your time there, which I would imagine you would just much rather write code or handle non-existent pull requests or hang in discussions or whatever.

Starting point is 00:43:44 I'm just kidding. But that would be a better use of your time. And if GitHub could level the playing field globally at a corporation level and remove that red tape and make it as easy as just a relationship thing rather than saying, let me ask my accounts payable department, let me talk to my boss. We've already, GitHub's already sort of leveled the playing field and made corporations who do want to pour back into or buy these kinds of would-be products from open source developers like yourself. That would be pretty cool.

Starting point is 00:44:14 And I don't have all the answers, so I'm sure there are reasons that is a terrible idea. But I think normalizing companies paying for some kind of product on top of open source, especially support or other things of that ilk, I think are a good direction for sure. I'm curious, Ben, if you've been to the SQLite website much or read much of their documentation lately. I have read all their documentation. Did you read their copyright? Is it the public domain? The reason why I ask is because they say something very similar to you.

Starting point is 00:44:46 They say open source, comma, space, not open contribution. So even a lot of language is very similar on that front. And they say a lot of what you've said. So similar stance at least. Yeah, yeah. I think I pulled in some of that from the readme as well. I think I tried to reference that. Okay. But it definitely did influence some of my thoughts around it. So I don't mean to, like,

Starting point is 00:45:08 discount anything that you, you know, brought into the conversation. I'm not trying to. I'm just trying to draw similarities. Yeah, yeah. And I think that they do it mainly to kind of keep the copyright clean. Exactly. And that's definitely part of what I'm doing. I think my main focus is more mental health and just, you know, keeping a really tight scope, which I don't think necessarily applies for SQLite. I think they can broaden their scope quite a bit. Have you gotten the call yet? The call from Mr. Hipp himself? That's right. No, yeah. They actually reached out pretty early on. We did a conference call with them. They were super nice. Yeah, they got on and we kind of walked through how it all works.

Starting point is 00:45:45 And I was fully expecting them to think that I have done some unspeakable, terrible things to their database. But they were quite supportive of it. So I really appreciate that. Yeah. Well, you know, Richard, even when he was on the show, he talked about, you know, essentially what you said in why you built Lightstream was SQLite is kind of touted as this toy database and not taken super seriously. And obviously when Jared and I had him on that show, I forget what episode that was, but it was 201. Yeah, episode 201.

Starting point is 00:46:15 Great episode. And just a whole different side having had that conversation with Richard about SQLite and how it's used and even the business model behind it and how they run it. And I just drew some similarities, I suppose, to the challenges you have. And they had some pretty expensive prices on their pro support page, which is they've been able to make money from events. I'm hopeful for you, at least. Yeah, I appreciate it. You may be able to be in their stream, so to speak.

Starting point is 00:46:47 Yeah. Pun intended. And I've actually, I won't say who this was, but I had a conversation with somebody who was a CTO of a VC-backed company, a database company. And he had talked to Richard Hipp before. And essentially, they run their group.

Starting point is 00:47:06 They make money through, I think, memberships, and you know, I have no idea how much they make. But the guy was talking to him, like, you know, how are you guys doing all that stuff, and kind of asking about his approach to doing it that way instead of going the VC route and raising a bunch of money and doing a big exit. And my understanding, this is again secondhand, is that Dr. Hipp basically said, how much time do you spend coding at your company? And for the guy at this VC-backed company, it's basically zero.

Starting point is 00:47:38 He's just kind of management and talking to VCs and talking to investors and whatnot. And Dr. Hipp basically said, you know, he gets to spend every day, he gets to code. And that's kind of my end goal: I would love to be able to get in a place where I can just work in open source. And like, I don't have any interest in raising VC money. You know, if there's something that it would really help with, sure. But like, at the end of the day, like, even if I thought about this, like, if I made $100 million, like, I don't see my

Starting point is 00:48:05 life changing significantly other than I would just spend my time working on open source. All the time. I don't love like yachts or like fancy cars or anything like that. I just like, you know, solving problems that I find interesting. So I think my long-term goal would be somehow to make it a sustainable thing that I could just work on in that sense. So that'd be my goal. This episode of The Changelog is brought to you by Render. Render is a unified platform to build and run all your apps and websites

Starting point is 00:48:41 with free SSL, a global CDN, private networks, and auto-deploys from Git. They handle everything from simple static sites to complex applications with dozens of microservices. If you're a developer or a founder that's frustrated with AWS's complexity or Heroku's high costs, you owe it to yourself to use the $100 in free credits they're giving our listeners to give Render a try. Render is built for modern applications and offers everything you need out of the box. One-click scaling, zero downtime deploys, built-in SSL, private networking, managed databases, secrets and configuration management, persistent block storage, and infrastructure as code. Heroku customers running production and staging workloads typically see cost reductions of over 50% after switching to Render. Here's the best part. We work closely with the team at Render

Starting point is 00:49:30 to ensure you have zero risk by giving you $100 in free credits. Plus, they're going to assign a world-class engineer to your account to offer guidance and answer any questions you have. When you're ready to transition your infrastructure, they'll be there to help you with that too. Automate your cloud hosting with Render at render.com slash changelog. Get $100 in free credits to try the Render platform, plus a world-class engineer assigned to your account to guide you along the way. Send an email to our special email, changelog at render.com, to get access to those free credits. All that begins at render.com slash changelog.

Starting point is 00:50:20 So we've been talking for a while about open source, but let's talk about the software, shall we? Yeah, sure. So the project is Lightstream. So it's L-I-T-E, stream, as in, you know, S, whatever, however you spell that. Stream. Yeah, S-T-R-E-A-M. There we go. Yeah, say it together. There you go. It's a way to basically, if you have a SQLite database, you know, you want to deploy your application on a little tiny $5 a month VPS, and you want that to run. It doesn't need to be the biggest scale platform in the world, but most apps can probably run on a $5 VPS running SQLite. But the problem is that if that VPS dies suddenly, then all your data is gone too.

Starting point is 00:51:01 So the idea with Lightstream is you could do backups every hour, every day, but then you're losing an hour or a day of data if that happens, if you lose that VPS. So what Lightstream does is it basically runs separately outside of your application in a little process and continuously pulls in changes from your database and streams those out to S3, like an Amazon S3, like an object store, so that you're never losing more than a couple seconds of data

Starting point is 00:51:30 if your VPS just dies catastrophically. That's the idea with it. So that's kind of where it started, and that's largely the use case I'm looking at. But there's been a lot of really interesting use cases coming from other people where they're like, hey, can I run this thing? But I actually want to have a bunch of read replicas too.
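
[Editor's sketch, not from the conversation: a much-simplified illustration of the "copy the database somewhere else while the app keeps running" idea, using only Python's standard sqlite3 module. It takes whole-database snapshots with SQLite's online backup API rather than continuously streaming changes, and it writes to a local file standing in for an object store such as S3. It is not how Lightstream itself is implemented. The file names, the events table, and the snapshot and demo helpers are all hypothetical.]

```python
import os
import sqlite3
import tempfile


def snapshot(db_path: str, replica_path: str) -> None:
    """Copy a consistent snapshot of db_path into replica_path."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(replica_path)
    try:
        # Connection.backup wraps SQLite's online backup API, so the copy is
        # consistent even if another connection writes to the source.
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def demo() -> int:
    """Write one row, snapshot the database, and count rows in the replica."""
    workdir = tempfile.mkdtemp()
    db_path = os.path.join(workdir, "app.db")          # hypothetical app database
    replica_path = os.path.join(workdir, "replica.db")  # stand-in for S3

    app = sqlite3.connect(db_path)
    app.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
    app.execute("INSERT INTO events (body) VALUES ('signup')")
    app.commit()
    app.close()

    snapshot(db_path, replica_path)

    replica = sqlite3.connect(replica_path)
    (count,) = replica.execute("SELECT COUNT(*) FROM events").fetchone()
    replica.close()
    return count
```

[Calling snapshot on a short timer from a separate process would bound data loss to roughly the timer interval, at the cost of copying the whole file each time; streaming only the changes, as described above, avoids that cost.]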

Starting point is 00:51:48 So it's really a way that you could scale out SQLite, which is kind of a weird idea. Yeah, that's kind of a weird idea. Yeah, and that's not in there yet, but that's definitely on the roadmap right now. And I've had other people that are interested where, actually, there's been a lot of interest around this whole idea of the JAMstack. I've never really gotten into the JAMstack, so please correct me if I'm totally wrong in this, but a lot of people, they'll take the data that they have and they basically generate out the pages and post those on a CDN, so that everyone in the world gets kind of a local copy of that page, and it's super fast and super responsive. But then if you

Starting point is 00:52:23 take that idea and you instead of generating all your pages, you just have read replicas around the world on these tiny $5 a month VPSs, you could have a global application where you have 100 millisecond or less latency between you and the server for everyone in the world because you're replicating it out, which is kind of a weird idea.

Starting point is 00:52:46 There's actually a service, I haven't used this yet, so I guess I'm plugging them, but I cannot vouch for them, called Fly.io. It's kind of like a Heroku. They have persistent disks available as well. But you can run those things

Starting point is 00:53:00 for a couple bucks. And I think they have like 20 different regions where you can deploy out to. So really, you could run this kind of like as a serverless platform, basically. But you can run the serverless platform for 40 bucks a month, and you're running globally around the world, and your users get these super fast latencies. So there's a lot of potential for where Lightstream can go. Sorry, that was a really expounded answer.

Starting point is 00:53:27 But the idea is really, in a nutshell, Lightstream is meant to let you run SQLite in production. Right. And kind of whatever way you want to look at that. Well, let's loop back around to the JAMStack bit, because that is interesting and a conversation that's been somewhat ongoing on the show. Maybe even more so on JS Party, but I want to loop back around to that. Let's just start with SQLite in production.

Starting point is 00:53:49 First of all, I'm a fan, a SQLite fan. But I do tend to reach for Postgres when it comes to production. I don't know if I do that because I just feel like SQLite's just not made for production. We do use it, I guess, in one production capacity for ChangeLog Nightly. It's what backs ChangeLog Nightly, but that's basically a batch process that runs nightly and, you know,

Starting point is 00:54:08 does some processing, sends out emails, and persists its state in SQLite. But it's not like a web server that's getting hit by hundreds of requests a second and all that. And I always thought SQLite was cool and all, and for specific things, like in your phone, it makes sense. But would you run it on a VPS with a web server front end? Aren't there concurrency issues with SQLite or anything like that, that you wouldn't want to do it?

Starting point is 00:54:39 So I write Go. That's my language of choice. And I've written projects in SQLite. And I will say, I guess, a few things on that topic. It does well multi-threaded. I can run thousands of requests at this VPS at a time. And the fact that you can actually run a request. And I've done testing where I've had several queries run on an HTTP request.
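
[Editor's sketch, not from the conversation: Ben's projects are in Go, so this Python version is only a stand-in showing several threads reading one SQLite file at once. It uses the standard sqlite3 and threading modules, gives each thread its own connection, and turns on write-ahead logging, which is our assumption rather than something stated here. The hits table and the helper names are hypothetical.]

```python
import os
import sqlite3
import tempfile
import threading


def make_db() -> str:
    """Create a small on-disk database in WAL mode and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    db = sqlite3.connect(path)
    # WAL mode lets readers proceed while a writer is active.
    db.execute("PRAGMA journal_mode = WAL")
    db.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, path TEXT)")
    db.executemany("INSERT INTO hits (path) VALUES (?)", [("/",)] * 100)
    db.commit()
    db.close()
    return path


def count_hits(path: str, results: list, slot: int) -> None:
    """Open a private connection, run one query, store the answer."""
    db = sqlite3.connect(path)
    row = db.execute("SELECT COUNT(*) FROM hits").fetchone()
    results[slot] = row[0]
    db.close()


def run_readers(path: str, n: int = 8) -> list:
    """Run n reader threads against the same database file."""
    results = [None] * n
    threads = [
        threading.Thread(target=count_hits, args=(path, results, i))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

[One connection per thread keeps the sketch simple; a real web server would more likely reuse connections across requests.]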

Starting point is 00:55:03 And the total time, and this includes rendering out HTML as well, the total time to connect to the queries, pull that back, render out the front end was about 50 microseconds. The way that you develop, I find, with embedded databases tends to kind of change your mindset a bit. I have this theory that all databases are actually the same. The only real difference that you have among databases is latency. So once you have a client-server situation, you know, you have issues like N+1 queries, so

Starting point is 00:55:35 really you want to optimize to get as much of your data back in a single query as possible and you have to do joins you have to do all kinds of, there's a lot of stuff around ORM tools, where they kind of like try to batch together requests. And it's always a pain in the ass. And, you know, that query language is what kind of really makes the difference. So, you know, if you have graph data, you want to have a graph language. If you have document data, you want to have a document language. SQL, you know, works on relational tables.

Starting point is 00:56:02 But once you actually move all the storage locally into the same process as your code, you really don't even need those separate languages. I mean, they can kind of help from a usability standpoint, but from a performance standpoint, you could just as easily look up your individual traversed graph nodes locally using your own language versus the actual query language itself.
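
[Editor's sketch, not from the conversation: a small illustration of the N+1 pattern discussed above, using Python's standard sqlite3 module with an in-memory database. The first function issues one lookup per row from application code; the second pushes the same work into a single JOIN. Both return the same rows. Over a network each extra query costs a round trip; in-process it is a local function call, which is the point being made. The authors and posts tables and the function names are hypothetical.]

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with authors and posts."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        INSERT INTO authors (id, name) VALUES (1, 'ada'), (2, 'ben');
        INSERT INTO posts (id, author_id, title)
        VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
        """
    )
    return db


def titles_n_plus_one(db: sqlite3.Connection) -> list:
    """One query for the posts, then one more query per post (N+1)."""
    posts = db.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    out = []
    for author_id, title in posts:
        (name,) = db.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        out.append((title, name))
    return out


def titles_joined(db: sqlite3.Connection) -> list:
    """The same result from a single JOIN."""
    return db.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
```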

Starting point is 00:56:22 Does that make sense? That's a bit esoteric. To a certain degree. Yeah, so I mean, like underlying pretty much all databases, you know, there's some exceptions, but I would say most use a B-tree and that's kind of, you know, you have a thing that you store according to a primary key. And that's true in a document store and a graph database, pretty much all databases use that kind of underlying format. So it's not that I'm particularly in love with SQLite. I think it's a good database, but at the end of the day, it's a B-tree that has some nice little SQL on top of it

Starting point is 00:56:53 that make it a little more usable. In that sense, it's a bit of a rant, but I think once you move the data locally, then it really changes how you approach the database. So what makes SQLite different than BoltDB, for example? I mean, they're both similar foundations, but is it the query language? Yeah, I mean, query language, I think the... I've built applications on top of BoltDB,

Starting point is 00:57:15 and there are a lot of things I really like about it. I would say the biggest thing that you miss that's really nice about having something like SQLite is that you're separating out your code, kind of almost like your code schema from your data schema, where you might change your, say, for example, you change your application, you add a new type, or maybe you split off some type in your code

Starting point is 00:57:37 into two separate tables. Does that make sense? And then you go to deploy that, but if your code is very much tied, or your underlying data in your database is tied to the structure of your code in your application, then it makes it really tough to transition between versions of code.

Starting point is 00:57:55 Because when you deploy it, your data is still in that old format. So having that declarative schema and being able to change that kind of separately from your code, actually I found to be super nice. And just little things like indexes and foreign key constraints. So really pretty simple things. Right.
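As a rough illustration of changing the data schema separately from the application code, the sketch below splits one table into two with plain DDL, then adds an index and a foreign key constraint. It uses Python's built-in sqlite3 module; the `users` and `addresses` tables are hypothetical, and the old `city` column is simply left in place to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Version 1 of the schema: the city lives inline on the user row.
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
    INSERT INTO users (name, city) VALUES ('ada', 'London'), ('grace', 'New York');
""")

# Migration to version 2: move the city into its own table. The existing rows
# are carried over by SQL, independent of whatever types the application uses.
conn.executescript("""
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        city TEXT NOT NULL
    );
    CREATE INDEX addresses_user_id ON addresses (user_id);
    INSERT INTO addresses (user_id, city)
        SELECT id, city FROM users WHERE city IS NOT NULL;
""")

rows = conn.execute("""
    SELECT u.name, a.city
    FROM users u JOIN addresses a ON a.user_id = u.id
    ORDER BY u.id
""").fetchall()
print(rows)
```

With a key-value store such as BoltDB, the equivalent change would typically mean writing application code that reads every record in the old encoding and rewrites it in the new one.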

Starting point is 00:58:14 I don't use any crazy features. I mean, there are use cases for, you know, Postgres has all kinds of crazy features you can use. It does. But at the end of the day, 99% of my code is just some SELECT statements and some DDL. Gotcha. So you can go concurrent with it via threading. And because it's embedded, I guess you don't have the network connection

Starting point is 00:58:39 set up and torn down, so you're not worried so much about batching or pooling connections, right? Because it's not connections. It's just like the same process in memory. The only problem is that it's just sitting right there inside of your binary and you don't

Starting point is 00:58:56 have it backed up, but now you've got that solved with Lightstream. That's kind of the idea. When I thought about what was the thing that was keeping me from running SQLite in production, replication and disaster recovery was really kind of the main thing. And I actually spent a long time trying to figure this problem out. The code itself isn't even huge.

Starting point is 00:59:16 You can open up the code. It's not going to blow your hair back or anything. It's not that fancy. But trying to figure out how to actually make it happen was like a long journey, where I originally actually ported SQLite to Go. It's kind of a thing I do, where I don't understand code until I really work with it and kind of move it around. And the idea wasn't necessarily to release that code, but just really to try to understand what was going on underneath. And, you know, I did that, and then I actually moved on. I

Starting point is 00:59:45 tried to do, uh, do you know FUSE? A FUSE file system? Is it like a network mount thing? Uh, sort of. You can build your own file systems in Linux, basically, with FUSE. It's like, if you wanted to make a file system of all your GitHub issues, you could have this intermediate binary that kind of interacts between your Unix commands, like ls and whatnot, and then your binary can translate those commands into, okay, you know, GitHub calls or something. People do all kinds of weird things with it. So I tried doing an intermediate FUSE file system that would kind of intercept the writes to SQLite and replicate those. That was kind of overly complicated. And then

Starting point is 01:00:27 the actual trick with Lightstream, the thing that actually makes it work is that, so there's a write-ahead log in SQLite. I don't know if this gets too much in the weeds here. But every time you write to the database, it doesn't write to your data file. It writes to this write-ahead log. And those

Starting point is 01:00:44 writes kind of, they're just append only. So they keep getting tacked onto the end of your write-ahead log. And then eventually, you know, that write-ahead log gets too big. And it has to do a thing called checkpointing, where it essentially moves all those pages from your write-ahead log back into your database. And the issue that I had originally is that I didn't have any control over when SQLite would checkpoint and move that stuff back over. And that's kind of the key.

Starting point is 01:01:07 You don't want the underlying data in your WAL file to disappear, because that's what you're replicating from. But SQLite has this little caveat where it actually can't checkpoint if there's an open read connection on the database. So Lightstream actually keeps a persistent read connection, or like a read transaction, on your database at all times, and has some tricks around when to release that and checkpoint back, and it kind of takes over that checkpointing process. So Lightstream essentially controls that whole process and is able to capture every WAL frame, like every WAL write that goes in, and then can ship those off to S3. So when you take the sum total of all those writes and you replay them, then you basically get the end state of your database.
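The SQLite behavior described here can be observed with a small sketch using Python's built-in sqlite3 module. This is not Lightstream's code, only a demonstration of the underlying mechanism: while a second connection holds a read transaction open, a checkpoint cannot move frames written after that transaction started out of the write-ahead log. The file name and the `checkpoint` and `demo` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile


def checkpoint(conn):
    """Run a passive checkpoint; return (frames_in_wal, frames_checkpointed)."""
    _busy, log, ckpt = conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
    return log, ckpt


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        # isolation_level=None puts the connection in autocommit mode, so the
        # BEGIN and COMMIT below are issued explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("PRAGMA wal_autocheckpoint=0")  # no automatic checkpoints
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.execute("INSERT INTO t VALUES (1)")

        # A second connection opens a read transaction and keeps it open.
        reader = sqlite3.connect(path, isolation_level=None)
        reader.execute("BEGIN")
        reader.execute("SELECT count(*) FROM t").fetchall()  # read lock taken here

        # This write lands in the WAL after the reader's snapshot.
        writer.execute("INSERT INTO t VALUES (2)")
        held = checkpoint(writer)  # cannot pass the reader's snapshot

        reader.execute("COMMIT")  # release the read transaction
        released = checkpoint(writer)  # now the whole WAL can be copied back

        reader.close()
        writer.close()
        return held, released


held, released = demo()
print("while reader is open:", held)
print("after reader commits:", released)
```

While the reader is open, the second number (frames checkpointed) stays below the first (frames in the WAL); once the reader commits, the two match. Holding that read transaction is what keeps newly written frames available to be copied before they are folded back into the main database file.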

Starting point is 01:01:52 Does that make sense? I know that's... Yeah, so it's kind of like, I wouldn't call it hijacking that write-ahead log, or it's kind of like forcing it to be there long enough that it can piggyback the data over and then it flushes? Yeah, and the WAL basically acts like a circular buffer. So it kind of goes to the end and starts back at the beginning. So it essentially just keeps track of the end of that and tails it, more or less. And that doesn't degrade the performance of the production database at all?

Starting point is 01:02:20 No. And actually, when you're running it, Lightstream uses almost no CPU at all or anything. It's pretty low overhead. Most of the stuff is in the OS page cache anyway,

Starting point is 01:02:30 like the data itself. So you're not really even doing much disk access. And yeah, there's definitely some optimization still to be done, but you generally

Starting point is 01:02:39 shouldn't see, you shouldn't really notice Lightstream running. Have you tested it against larger databases, like megabyte-, gigabyte-sized SQLite files?

Starting point is 01:02:51 I have a VPS running at all times. This is the one thing that actually gives me confidence around Lightstream. There's two different kinds of replication. You can do logical replication, which is where, say, someone submits an update, you know, for all your records, and you're storing kind of that command of how to make the change. And then there's physical

Starting point is 01:03:13 replication, which is what Lightstream does, where every page that gets written, we actually replicate that whole page. And then we can replay those pages to build the database. So what Lightstream is able to do is that it can actually checksum the database. So you can do basically an MD5 hash on the database at a point in time, and then it'll replay the replica from S3, and those two should match byte for byte.
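The idea of physical replication plus a checksum comparison can be sketched as a toy model in Python. Nothing here reads a real SQLite file or talks to S3: the `Primary` class, `restore`, `checksum`, and the 4096-byte page size are all assumptions made for illustration. The primary applies each page write locally and records the same page as a shipped frame; the replica rebuilds the database by replaying those frames in order, and the two MD5 hashes are compared.

```python
import hashlib

PAGE_SIZE = 4096  # illustrative page size, not read from a real database


class Primary:
    """Toy primary: applies page writes and records each one as a shipped frame."""

    def __init__(self):
        self.pages = []    # current database contents, one entry per page
        self.shipped = []  # (page_number, page_bytes) frames in write order

    def write(self, page_no, data):
        page = data.ljust(PAGE_SIZE, b"\0")
        while len(self.pages) < page_no:
            self.pages.append(bytes(PAGE_SIZE))  # grow with zeroed pages
        self.pages[page_no - 1] = page           # page numbers are 1-based
        self.shipped.append((page_no, page))


def restore(frames):
    """Rebuild a database image by replaying whole-page frames in order."""
    pages = []
    for page_no, page in frames:
        while len(pages) < page_no:
            pages.append(bytes(PAGE_SIZE))
        pages[page_no - 1] = page
    return pages


def checksum(pages):
    """MD5 over the concatenated pages, standing in for hashing the file."""
    return hashlib.md5(b"".join(pages)).hexdigest()


primary = Primary()
primary.write(1, b"header v1")
primary.write(2, b"row: alice")
primary.write(1, b"header v2")  # a later write to the same page wins

replica = restore(primary.shipped)
print(checksum(primary.pages) == checksum(replica))
```

Because whole pages are replayed rather than logical commands, a faithful replica is expected to be identical byte for byte, which is what makes a simple hash comparison a meaningful end-to-end check.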

Starting point is 01:03:36 So there's a VPS I run that actually constantly pulls from the GitHub archive. So it's just pulling in events from there, and building, pushing them into a database, and then every hour or so it actually pulls down the replica, replays it all, ensures that they're byte for byte

Starting point is 01:03:52 matching exactly. And yeah, it does great. I haven't had issues with multi-gigabyte databases at all. Cool, and it just keeps growing. Yeah, it just keeps growing. Growing, growing, growing. It's kind of like what Changelog Nightly does, only we're not storing the actual events. Another little interesting bit is, like, S3 is super cheap. You get billed for a couple different things. You get billed for

Starting point is 01:04:15 the number of files you push up there, like the actual request itself, but you don't actually get billed for the bytes that you push up. You can send up a 10 gigabyte file, but you only get charged for a single PUT request. It's only when you download the data that you really incur much in charges. So I think the PUT requests cost like five thousandths of a penny or something like that for each request. So you can run Lightstream where it's, you know, pushing up about every 10 seconds, and it costs you about $1.30 a month. And because you don't have the overhead,

Starting point is 01:04:50 like you don't get a cost incurred per byte sent up, you really have minimal costs in that realm. So it's a weird, like super cheap backup strategy. That doesn't seem like it should work, but the actual economics work pretty well. Although the VPS that I run to continuously verify it does actually cost a little chunk of change, because it's constantly downloading gigabytes of data. Right. So you're just replacing the same file over and

Starting point is 01:05:16 over again versus proliferating files, right? Is that why it's a single PUT? No, it's actually doing a new PUT for every new chunk of WAL writes that gets pushed up. It'll snapshot it periodically as well. You generally have about a fixed size of data that you're pushing up. SQLite files tend to compress really well. B-trees in general do. They tend to have a lot of empty space. The actual monthly cost of the gigabytes tends to be pretty trivial too.
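The request-cost arithmetic can be checked with a few lines of Python. The price used below, $0.005 per 1,000 PUT requests, is an assumption for illustration and should be checked against current S3 pricing; the function name is hypothetical. Storage and download charges are left out, since the point here is only the cost of the uploads themselves.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_put_cost(interval_seconds, price_per_1000_puts, days=30):
    """Cost in dollars of one PUT request every `interval_seconds` for `days` days."""
    puts = days * SECONDS_PER_DAY / interval_seconds
    return puts * price_per_1000_puts / 1000


# One upload every 10 seconds at an assumed $0.005 per 1,000 PUT requests.
print(f"${monthly_put_cost(10, 0.005):.2f}")
```

That is 259,200 requests in a 30-day month, which at the assumed price comes to roughly $1.30, in line with the figure quoted above. The cost depends on how often data is pushed, not on how many bytes each push carries.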

Starting point is 01:05:45 It's also a weird thing too, where I've had people ask me if I'm going to start a business around this thing, and I've had interest from VCs and whatnot, but it has this thing where it almost shoots you in the foot, where it's so cheap and so easy to run that, like,

Starting point is 01:05:58 I can't think of a service that would actually make it easier or cheaper or better, necessarily, that I could sell. So it's worked out great so far, but not from a money-making standpoint. That's not really... Scales so well that you can't sell it. Or make a service that makes it better.

Starting point is 01:06:15 It is one of those things, yeah. In that blog post, though, where you talked about why you... I think you said it's titled Why I Built Lightstream. You mentioned about scaling. Can you talk about scaling a little bit there? Because I'm sure that once you've proved it's stable and usable and you can actually use it, at some point you're going to rely upon it

Starting point is 01:06:33 more so than just simply a greenfield application. You'll need to scale to more CPUs, more RAM, more servers. Sure, yeah. Talk about that. Yeah, so I think scaling is an interesting topic in our field. I feel like there's been an obsession over scaling and uptime, I think, that has kind of gone off the rails over the last 10, 20 years, where we have this idea of everyone tries to build their application to be the next Twitter or whatnot, or people worry about,

Starting point is 01:07:03 what if I have to scale like crazy in whatever amount of time? And generally, that's not the case, first of all. But given Moore's Law, where we are seeing exponential increases in compute that we have available in a single box, for some weird reason we keep having this exponential scaling of the number of nodes we actually need to run applications. That seems backwards to me. We have nodes on Amazon where you can spin up a 96-core box for, you know, however much money a month. But that's a lot of cores. Each one's doing, you know, 3 billion operations per second. We should be able to run, you know, a couple

Starting point is 01:07:42 hundred HTTP requests to that. So as far as the scaling piece, I find that most people, if you're running a local SQLite database, you're not going to hit those scaling concerns. Actually, one scaling concern I find people actually hit is that things like Postgres tend to have a high overhead for connections. So you end up having to put in something like PgBouncer in between that can actually start to pool those connections to not overload Postgres. Whereas you just don't get that when you have an in-process database. So, you know, from that standpoint, it's great. I would say that, you know, if you're running an application, you know, again, I write in Go, it's a super fast language, and running locally, I can push through thousands and thousands of requests per second on pretty modest hardware. And I think that really covers probably 90% of applications out there that people are going to write. And even if you don't use SQLite for your main company's application,

Starting point is 01:08:40 there's probably a ton of applications in your company that are on the side or periphery that don't need to be in, you know, some huge Kubernetes cluster. So I'll say that on the scaling side. And then on the uptime side, I feel like people have this obsession around uptime, but I feel like the more tools that people add... And I don't really mean to rag on Kubernetes all the time. I do, but I think it's a tool that has an appropriate use case, but it's not the vast majority of people's use cases. I think that from an uptime perspective, I think you're getting many more layers of complexity in there

Starting point is 01:09:16 that are going to cause you to have more downtime than simply running a single node that may go down because of a network connection once a year, or a couple times a year for a couple minutes. I don't think people are really taking the cost of downtime when they think about the trade-off they're making to make these complex systems

Starting point is 01:09:37 that give them the illusion of uptime. Hope that makes sense. In your blog post you mentioned solutions such as Kubernetes tout the benefits of zero downtime deployments, but ignore that their inherent complexity causes availability issues. Then you link out to this other thing, which I had no clue of before, which is a public postmortem website for Kubernetes. And there's just like a lot.

Starting point is 01:09:59 List of postmortems for Kubernetes. It's k8s.af. That's a compiled list of links to public failure stories related to Kubernetes. Most recent publications are on top. But it's, I mean, it's a few, it's several scrolls. So there's a lot.

Starting point is 01:10:16 I don't want to, like, you know, people have put in good effort into Kubernetes. I don't think it's a bad piece of software. I feel like core Kubernetes is generally good. I feel like the ecosystem around it is overly complex for most people. And, you know, I feel like Kubernetes is the future, but I don't think it's the present right now. I feel like people really need to have a great use case for why they're going to use Kubernetes before they jump on there. You know, I've worked with companies before that are trying to evaluate their Kubernetes strategy before they actually have customers.

Starting point is 01:10:46 And that seems insane to me. Yeah, it does. I generally have a rule of thumb that the cost of going to Kubernetes is probably, say, a million dollars. And it's not meant to be like a hard and fast rule, like it's going to cost that much for everyone. But you need to have a million dollar problem that you're solving with Kubernetes. And if the idea, if the number one million dollars

Starting point is 01:11:08 sounds like a lot of money, you shouldn't be using Kubernetes. It's probably well beyond your problem space. So that's my personal view on where we're going with technology and the complexity around it. I don't think people should take on those tools lightly. What would you consider the best use case then for, I'm going to say it like Richard Hipp says, which is SQLite. I'm sorry to correct you guys on that

Starting point is 01:11:32 because that's what he said. SQLite. He's not here right now. So what's the best use case for SQLite and then using Lightstream? If someone's using Postgres or they're chasing uptime, they're chasing scaling,

Starting point is 01:11:46 why would a team or an individual developer that's building an application choose SQLite or Lightstream? Sure, yeah. I mean, I guess I kind of think of it in the opposite direction. I kind of start from a default of, hey, SQLite, as they say it. It's supposed to be like a stalagmite or stalactite. Like a meteorite. It's SQLite.

Starting point is 01:12:10 You know, actually, so this is a bit of an aside. I cannot for the life of me pick up that pronunciation. But whenever I'm writing, there's always a distinction of, like, if you call it "sequel-lite", then you would say a SQLite database. Whereas if you call it "S-Q-L-ite", you'd call it an SQLite database. I'm always torn on the grammatical side. Anyway, I think of the actual deployment from a different side, where I feel like most applications would probably work fine on SQLite.

Starting point is 01:12:42 I think you really need a good reason to move off of that. If you're going to start introducing additional tools, you're doing multi-node deployments, I think that you really should have a good reason for that. There's an inherent complexity in that, in that once you move away from a single node, there's a lot of things you can't do anymore.

Starting point is 01:13:03 You might have a Postgres cluster and it's connected to from multiple nodes, but that becomes slow because of latency to the database. So you may want to add some kind of in-memory cache, but you can't add an in-memory cache on the, you know, the web nodes because those are all connected to the database and they don't have a full view of, you know, if changes came through a different web node. So then you have to use something like memcached or maybe a Redis node. So you really, you know, this phrase like complexity

Starting point is 01:13:31 begets complexity. Like you're going to, anytime you add more complex systems, those complex systems are going to probably rely on more complexity later on. Like you're not going to have a full view of the complexity you're adding initially. So to answer your question, I think most people should run SQLite databases, especially now that you can run them safely.

Starting point is 01:13:51 And I think you should really have a good reason not to if you're not going to. Well, I'd say we would loop back around to the JAMstack. So I want to do that before we forget. This idea of read-only replicas and basically shipping them off to points of presence around the world so that not only is your static assets CDN'd but your data store is CDN'd effectively. And so you could run, we talk about edge computing and you have these functions on the edge

Starting point is 01:14:23 and Jamstack proponents are big on that. But I always say, well the function's running on the edge, but anytime it needs to interact with my database, it has to come all the way back to whatever centralized server the actual backend is running on. It has to incur that cost. Of course you can cache and stuff, so there are advantages of doing that, but ultimately your database is still in one place or a few places.

Starting point is 01:14:49 And so the goal would be to get your database just everywhere. And not have to worry about how that works. That does sound pretty awesome. And so I've kind of just been saying, and then there's FaunaDB's kind of doing that, and I think Cockroach has some sort of angle into that. There's people working on this.

Starting point is 01:15:10 Mostly what people say is like, well, it's being worked on. And so everybody kind of wants that, because once your database servers are just CDN, then of course your application servers can just be that way as well, if you have a separate app and DB. But in the case of an embedded database well your application's already out there and your

Starting point is 01:15:28 database is embedded, and Lightstream's just managing that. So it sounds really rad. Yeah, that's... But they're read-only replicas, so when it comes to writes, you'd still have like a centralized thing. But writes are usually less often than reads, so it's like not the panacea, but it's pretty stinking close if it could work well. Yeah, exactly. I think people generally, at least a lot of the web apps that I've worked on over the years, tend to be 90% reads, 10% writes. You go onto a website, like an e-commerce website,

Starting point is 01:16:00 you're probably browsing around a bunch, clicking on at least nine different pages before you actually check out. And I think that idea of the read mostly apps really benefit from this kind of thing. I think most people are pretty okay with, they get on a website and by the time they have to go check out, they're used to waiting a couple seconds at worst for a credit card to go through, that kind of thing.

Starting point is 01:16:23 I think the expectations around that are pretty okay. But to be able to actually get, to snap through a website, to browse around an e-commerce website, and every page loads in sub-100 milliseconds, I think it would be awesome, no matter where you are in the world. I think that's a pretty compelling case. So what would it take to get SQLite... What would it take to get SQLite?

Starting point is 01:16:48 We're going to spend most of the rest of this podcast on that. I'm going to call it the way I've been doing it the whole time. We'll make it real hard for our transcriber. To just throw back to the episode we had with Richard Hipp, though, Jared, you did say that you were going to try hard to say it the way he said it. I did try hard, and that was like 10 years ago. And I had given up. I also told Gregory Kurtzer that I would pronounce it his way

Starting point is 01:17:13 because he was right here on the show. And once he leaves, I'm going to go back to my own way. That's true. So that's what I'm doing with Richard Hipp. Okay, you've got an out. Go ahead. Sorry, Richard. Go your own way.

Starting point is 01:17:21 Sequelite. I go back and forth. I call it "sequel", then I call it "S-Q-L". I have no consistency. Not internally consistent. And now I've lost my train of thought. Thanks, Adam. What was I talking about?

Starting point is 01:17:34 What would it take to get SQLite plus Lightstream deployed in such a fashion? You mentioned there's some serverless platforms. Maybe they would have to use Lightstream somehow. Can I just go to DigitalOcean or to Linode and just pick VPSs around the world

Starting point is 01:17:50 and then just do my own thing? How would it actually play out? Sure, yeah. I think the biggest issue you really have around these read replicas, especially serverless, is you really need all your writes to go to a single node.

Starting point is 01:18:07 It doesn't really make as much sense if they're going everywhere because most of them are going to be read replicas. So I think solving that issue is probably the biggest one. You can certainly do it in your own code. It would be nice to make it more automatic. I'm not quite sure how that would work. But once you redirect your writes, say you're pushing all your POSTs and PUTs

Starting point is 01:18:25 and PATCHes, HTTP methods, over there to one single node, you know, I think that makes it a lot easier. And then from that, read replicas are coming into Lightstream in the next version, and that basically streams out those changes to all the different serverless nodes. So there's one system, Fly.io, they have persistent disks, which solves a lot of the issue. You can do it without persistent disks too, but you get some issues around, you essentially need to download the database on startup of that serverless function when it's cold and actually bring it into the local file system. So that kind of negates some of the benefit

Starting point is 01:19:05 of a fast serverless platform. So those are kind of the two main issues. So the persistent disks, I would say, you can solve that, but otherwise it's redirecting writes. Yeah, as you redirect writes, you're kind of turning SQLite into a client-server database, though, because you're pushing all your writes

Starting point is 01:19:23 to one particular instance. And so aren't those other instances having to basically become clients of that instance? I wouldn't go that far. I think you can do a lot just simply with rerouting or doing a proxy through an HTTP server. I think you could probably make a lot of that invisible. I see what you're saying.
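The routing idea discussed here, sending writes to one node and serving reads locally, can be sketched as a small Python function. This is a hypothetical illustration rather than anything Lightstream provides: the `pick_backend` name and the example URLs are invented, and it assumes the application keeps its GET, HEAD, and OPTIONS handlers free of writes.

```python
# HTTP methods assumed to be read-only in this application (an assumption
# the application itself has to uphold).
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def pick_backend(method, primary, local_replica):
    """Return the node that should handle a request with this HTTP method."""
    if method.upper() in READ_METHODS:
        return local_replica  # served from the nearby read-only copy
    return primary            # POST, PUT, PATCH, DELETE go to the single writer


# Hypothetical addresses, used only for the example.
PRIMARY = "https://primary.example.com"
REPLICA = "http://localhost:8080"

print(pick_backend("GET", PRIMARY, REPLICA))
print(pick_backend("POST", PRIMARY, REPLICA))
```

A reverse proxy in front of each replica could apply the same rule, so the application code never needs to know which node it is running on.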

Starting point is 01:19:41 So the proxy can handle it. If you can guarantee that all your GET methods are going to be read-only, then I think you could probably easily do that. Yeah, fair enough. That's a good point. Read-only replicas are coming soon, and then is Lightstream done, or is there a future beyond that for the tool?

Starting point is 01:19:59 Or do you feel like that's your scope and you're sticking to it? I would say that's largely the scope that I'm looking for. I really want to make it just hardened and just work as easily as possible. I think that's where a lot of work really comes in, is just getting every single little edge case that comes up and making sure that it flows smoothly and that you can use whatever, you know, S3 store you want to use. Making it work well with NFS disks is another thing.

Starting point is 01:20:31 There's some different configurations you can do with it. I don't really have any big plans for anything crazy beyond that. Honestly, if I can get a globally distributed SQLite database, I'm pretty happy. Well, Ben, thank you so much for, I suppose, being bold to say no to contributions. Bold not to ruffle some feathers, but I mean, that's can kind of see some details there. But hearing a full-length episode like this, I think, does provide some pathways to understand what a maintainer is truly trying to do with their software. So I appreciate you sharing your time and your wisdom here today. Thank you, Ben.

Starting point is 01:21:16 Yeah, thanks for having me on. I really appreciate it. That's it for this episode. Thanks so much for tuning in. I want to give a plug for Ben. He's got an awesome blog out there called Go Beyond. It's at gobeyond.dev. We're huge fans of Ben, so make sure you check that out.

Starting point is 01:21:30 If you haven't heard yet, we have a membership. It's called ChangeLog++ because, hey, why not increment things? It is better, as they say. You can subscribe at changelog.com slash plus plus. Get closer to the metal. Make the ads disappear, and of course, support all of our podcasts.

Starting point is 01:21:49 Again, changelog.com slash plus plus. And of course, huge thanks to our partners, Linode, Fastly, and LaunchDarkly. Also, thanks to Breakmaster Cylinder for making all of our awesome beats. And of course, thanks to you for listening.

Starting point is 01:22:02 We appreciate your attention. We appreciate you listening. And one more step you can take is to join the community. changelog.com slash community. It's free to join. Come hang with us in Slack. Call this place your home. changelog.com slash community.

Starting point is 01:22:15 That's it for this week. We'll see you next week. Game on!
