The Changelog: Software Development, Open Source - BoltDB, InfluxDB, Key-Value Databases (Interview)

Starting point is 00:00:00 Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stacoviak. This is episode 170, and today we're joined by Ben Johnson. And when I say we, I mean Jared, because Jared went solo on this show, and he was joined by Ben to be schooled on BoltDB, InfluxDB, and several other key-value store databases out there. Ben also shared why he's so passionate about developing open source software. We have three awesome sponsors for the show: Codeship, a longtime supporter, and two brand new sponsors,

Starting point is 00:00:45 imgix and Casper. So huge thanks to each of them for supporting the show. Our first sponsor is Codeship. They've launched a brand new feature called Organizations. You've heard me talk to you about it before, but now you can create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflows. You can maintain centralized control over your organization's projects as well as teams with this brand new feature. And you can save 20% off any plan you choose for three months by using our code, TheChangelogPodcast. Once again, that code is TheChangelogPodcast. So get 20% off any plan you choose from

Starting point is 00:01:26 Codeship for three months. Head to codeship.com/thechangelog to get started. And now, on to the show. Welcome back, everyone. Jared here. I am joined today by Ben Johnson. Ben is a Denverite. I think that's what they're called. Is it a Denverite? Is that the term for someone from Denver? Okay. Yeah, we're Denverites. Yeah, so Ben's a Denverite. We met Ben out at GopherCon back about a month ago now. He describes himself as an open source software developer who specializes in customer behavior analytics and data visualization. He's also big into distributed systems and data stores. Ben, welcome to the show.

Starting point is 00:02:14 Thanks for having me. So, out at GopherCon, it seemed like your name kept coming up all over the place. You're very active in the Go community, very active in kind of the new wave data store community. Why don't you tell me a little bit about yourself, how you got here, and what you do? Yeah, sure. So I've been writing software for about 15 years now.

Starting point is 00:02:38 I started out as an Oracle DBA back in the day and kind of moved into web development. And, yeah, just kind of jumped around, started getting into open source, I don't know, four or five years ago, and somehow just kind of landed inside the database world.

Starting point is 00:02:54 There's just a lot of turns and whatnot. So, yeah, I just love writing

Starting point is 00:03:00 open source. So I do a lot of it in my free time. Well, yeah. What initially drew you to open source as a thing? What excited you? I think the idea that you could put something out there and that it not only helps people, but also, you know, people give you feedback and you

Starting point is 00:03:17 get a kind of, you just learn so much. As soon as you put out software, everyone will either love what you have or tear you apart. There's all this learning, even if it is kind of hard. So I tried to do a little bit of exploration just to find out more about you online. @benbjohnson on Twitter, if anybody's interested out there. I noticed you don't seem to have a website. You apparently are a consultant of some kind. You do have a LinkedIn, I found that.

Starting point is 00:03:48 But you seem to be focused on your GitHub and your Twitter, and that's about it. Can you tell me about your business side and your consultancy and how that all works? Sure. Yeah, I work with InfluxDB. I work with them and write a lot of the storage layer and the distributed systems parts of that database. That's who I work for during the day. And I've consulted in the past, and I used to work at Shopify for a while. So I just kind of hopped around here and there.

Starting point is 00:04:18 Right on. So you're focused on Influx at the moment. Well, we have you on the show because we want to talk about databases. And it seems like for a long time there was kind of your set of relational databases. And then there were these niche things out there that you heard of, these document stores or column stores. I remember in the area that I worked, there was actually a database called Caché. I don't know that one. You don't know that one. So yeah, it's big in kind of the medical world and also in financial transactions and whatnot.

Starting point is 00:04:54 But it was very niche. So you kind of have these pockets, and it has its thing going on. And then you had kind of the NoSQL explosion a few years back, where you had your Mongos and your Riaks and interest around those. And we seem to be having another wave of new things, and maybe they're not new. I just think they're new because I don't really come across them as much. Some of which are things that

Starting point is 00:05:17 you're involved in: Influx, Bolt. You emailed me a list of things. I had never heard of half of these, and I try to keep up with open source. It's hard. It's hard, for sure. But there's LMDB, LevelDB, Parquet, which I had never heard of, Cassandra, which, coming out of Twitter and Facebook, was big a little while ago.

Starting point is 00:05:39 Has there been an explosion in data stores, or am I just noticing it? Oh, no. Yeah, there's definitely an explosion. I think people are starting to realize that once you get to a certain scale or a certain use case, you can optimize at these really low levels. And what used to just be an application, you're going further down and lower down, and you start to just build your own database at that point. And I think that people find a lot of either operational simplicity

Starting point is 00:06:05 from having a very specific target or use case, or they just get a lot of performance out of it. There's all kinds of different reasons to get down that far. I think when the NoSQL movement, we might just call it that, first hit, and I can't remember the timing, but maybe it was 10 years, I can't remember how long ago that was. And MongoDB became kind of a thing that was winning the hearts and minds of developers. There was this whole throw-out-your-relational-database mindset. It seems like now that's kind of shifted, and it's not like throw it away. It's like here's something you can use in addition.

Starting point is 00:06:45 Is that how you feel about these types of databases? Yeah, I think a lot of it has to do with, you know, your requirements, kind of what you know already, where you're coming from. I think SQL came out of, you know, back in the 80s, it came out of, you know, you had business people that would go up to a terminal and they wanted to be able to write their own queries or, you know, some level of that. And then we started trying to fit it into this object model too. And I think a lot of people have gotten tired of trying to fit the relational and object model together. And business users, they'll use a web UI now. It's made by a developer. So we don't have that direct SQL requirement anymore.

Starting point is 00:07:18 But I think that the NoSQL movement doesn't have enough of a structure. We make all these databases, but we don't tell people how to actually use them or what best practices are. So I think we developed these 20 or 30 years of SQL best practices that I feel like people are starting to fall back on. And they're saying, well, I know how to do that. I'll go back to doing that. This object thing or document database is confusing.

Starting point is 00:07:44 So I think if we can actually do a lot of education around that, I think there's some great use cases for, you know, document databases or key-value stores. Yeah, I think key-value stores is one where we've definitely seen a lot of activity, a lot of options. And maybe it's because a key-value store conceptually is pretty simple. I don't know. I'm not going to go out and say that implementing it is simple, because you would know a lot better than I do that I'm sure there's tons of nitty-gritty details and bumps in that road.

Starting point is 00:08:17 But man, there sure are a lot of options. And it seems like a lot of those options are written in Go. Yes, there's been a huge influx. I think part of it is just the simplicity of you get something written in Go, you can compile it onto a bunch of operating systems and just distribute it out. A lot of the uptake we got at Influx

Starting point is 00:08:38 has just been people saying, this is really easy to set up compared to a lot of other alternatives that have been around for longer. It's just people can get up and running, people don't want to spend their whole day trying to learn one tool. They just want to run a command and have it there.

Starting point is 00:08:54 So I think Go does that really well. So you have one of these fancy new data stores. I'm just going to act like an old man and talk about everything as if it's shiny new and foreign to me. At least for the first part of the call until you kind of school me on how all these things all work. But yours is called Bolt.

Starting point is 00:09:18 boltdb/bolt on GitHub. And it seems to be production ready. Why don't you go ahead and just give us the elevator pitch for BoltDB? Sure. BoltDB is a read-optimized store, a key-value store, and its goal, beyond anything else, is just to be operationally simple, to have a clean API, and to have strong transactional support. So there's a lot of key-value stores out there that will give you,

Starting point is 00:09:48 maybe it's really fast write performance, but the reads are really slow. Maybe you'll get certain other benefits where it might have a crazy API, but it might be fast. Actually, a lot of key-value stores seem to be centered around just being fast. And as computers get faster, you know, most websites out there aren't getting thousands of hits per second. They're getting a hit per second or, you know, somewhere in the tens of hits per second. So I think that a lot of people look for the fastest thing out there because they want everything to be blazing fast.

Starting point is 00:10:25 And they just forget about all these other operational side considerations. Yeah, it's kind of the thought that became a meme with web scale back in the day to find out what is and what is not web scale. And the fact of many people's businesses and websites is you can count the Twitters and the Facebooks on one hand. Sure, there are other large sites out there. There's the Reddits and the top 100 Alexas. But most of us make our living and live on the web in smaller, less populated areas. Yeah, for sure.

Starting point is 00:11:03 It's interesting to see all the databases that come out from, you know, the Facebooks and the Twitters, because they have such different requirements than 99% of people out there. So I think it's interesting to see where the databases are coming from and, right, how those requirements line up. So it looks like Bolt was 1.0 in November of 2014. And it's a bit of a remix, because you say it's inspired by Howard Chu's LMDB project. Can you tell us about LMDB and how it inspired Bolt? I really like what Howard did with that. And what it is, is it's basically a B-tree. So your data's structured in this B-tree where you can access your data, you can write to it. And whenever you change a leaf inside of your tree, it'll copy all the parents as well and kind of make this new version of the tree. So every change will make this new version, this incremental version, so that as you're going along, everything that's reading from that tree will get a kind of a snapshot of

Starting point is 00:12:12 it and work in a transactional way. And then as things update along, other readers get their own snapshot of the world. And it's really good as far as having great transactional support. You can do great things operationally, where you can essentially just copy the file as a backup. And if you're setting up a website or setting up an application, you don't want to have to worry about setting up MySQL and having a replica and doing all this other crazy stuff. You can attach on just a web handler, like an HTTP handler, and stream down your database if you wanted to have that option. Like, it's three lines of code to do a backup, basically. So certain things like that. It has this very simplistic design, as opposed to, well, there's another type

Starting point is 00:12:55 of database called an LSM tree. It's a log-structured merge tree. And what that is, is it takes these different levels. It'll kind of create keys and values in these sorted blocks, and each one will be a different file. And then as you get these files large enough, they get compacted and written into a new block that's larger. So it takes a bunch of them and makes them into these larger ones that are kind of at different levels. And those are really good for writes. But operationally, it can be a huge pain, because you can end up with hundreds or thousands of files

Starting point is 00:13:27 where it's kind of tough to snapshot, like just copy a file. It's much more involved than that, and how you try to stream that out and stream it in. So operationally, Bolt is simpler, although it doesn't have the benefits of write optimization,

Starting point is 00:13:43 like something like an LSM does. So again, it's kind of right tool for the right job. If you have a read-heavy situation, Bolt might be a little bit better fit. If it's write-heavy, then something that uses LSM might be a little better fit. Yeah, for sure. And actually, a lot of people will

Starting point is 00:13:59 ping me on Twitter or on GitHub and say, Bolt is slow, or, you know, whatever. Just like that? Like, hey, Bolt is slow? Pretty much. They'll just say, like, Bolt sucks. People are so nice.

Starting point is 00:14:11 Yeah. And I mean, I will just paste them a link to a different database that will probably fit their use case better. If you come across a project where they try to be everything to everyone, it's just, I feel like, as engineers, we should know that there's trade-offs on all this stuff. And, you know, Bolt is not the right tool for probably many projects out there, but it might be for yours, you know?

Starting point is 00:14:31 You know what's funny is, we just had Thomas Reynolds on the show last week talking about Middleman, which is a static site generator, completely different situation. But, you know, he's been writing Ruby and JavaScript for years, and we started talking about programming languages and stuff. And I just asked him very pointedly if he's still bullish on Ruby.

Starting point is 00:14:49 And his answer was very similar to what you just kind of said here, where it's like, well, there's probably a better tool for a different job that you may have. And it's like, well, that was an extremely level-headed answer. And it's funny because we all kind of live, work, and have our interactions on the internet. And I don't know if it's the written form versus here we are, you and I talking on Skype. It's like people are very level-headed about these types of things in real life. But when we get on the web, it's just like, you know, Bolt sucks.

Starting point is 00:15:19 Yeah, exactly. We lose all sense of like right tool for the right job, and we're all trying to just build good software. And it's like we get into religious wars over these things. I wonder if it's just that degree of separation or what it is about the internet that makes us like that. I think it is just the anonymity. Because if I go to a conference, no one ever comes up to me and tells me how much it sucks. They say, I had this difficulty with performance where I did this.

Starting point is 00:15:43 And then we actually have a conversation about it. And then everyone walks away kind of being a little more knowledgeable. Right. Yeah, it's just hard to be nasty. Exactly. So let's go back to not Bolt sucking, but things that it's good at. And it must have a lot of use cases, because you do have a lot of adoption. You have a lot of other projects that kind of use Bolt behind the scenes. I think perhaps the reason is that it's embedded, as opposed to a server type of setup. Can you talk about the embedded

Starting point is 00:16:17 aspect of the database? Sure, yeah. So it's just a library that you bring into your Go program, and then you point it at a file, and you're basically ready to go. There's almost no configuration options. Even if you wanted them, you can't really configure the database. And it's a single file. There's an OS lock on it, so you can only have one process attached to that file at a time, as opposed to like a MySQL or Postgres, where you have like this gigantic configuration file

Starting point is 00:16:46 where you have to find some tweaks to make. But yeah, so I mean, from that side, operationally, it's easy to just get it up and running. With a lot of projects, especially when they're products or like an open source project, you can't make that requirement to say, okay, first you guys got to set up these four services and then configure it here and do all this stuff. Because no one wants to go through all that. They don't want to add one

Starting point is 00:17:09 more thing to their stack. So I think it's had a lot of success from that. And, you know, another important thing that people don't think about is that in a lot of projects, the data store is not your bottleneck. You might have a lot of other processing going on, and you're just storing some metadata maybe, or you're transferring across a network, or you're doing all kinds of other things. So from that perspective, when performance isn't an issue, usually the next most important thing is operational simplicity.

Starting point is 00:17:41 And just to be able to say, this is a file, I can just deploy it, I don't need to do anything beyond just starting up the program. So I think that goes a long way. Yeah, I agree. Operational simplicity absolutely does go a long way. You also kind of focus on API simplicity. And the fact that this is just a key value store. And that's not a bad thing, that's a good thing, right? You're keeping it simple on purpose. As far as the API is concerned, is it just a matter of I put data in with a key and then I get it back out by that key?

Starting point is 00:18:16 Is it as simple as that, or is there more to it? It's essentially that. You have things called buckets as well. They're basically like a key space. So a key can only appear once in a bucket, but you can have buckets inside buckets, and you can do some interesting things around that. But honestly, I mean, most of the time when I use Bolt for an application, I'll treat it almost like the way that I structure tables inside of a relational database. Okay. I kind of have top-level buckets where I might have a customers bucket or a

Starting point is 00:18:43 you know, whatnot. And for my primary key, you can create sequential integers inside Bolt per bucket. So I might have an ID for a user. I can generate it off the users bucket, and then that becomes the key for them. So user one is pointing at this encoded data structure for the user. So, I mean, really, it's not actually that far of a departure from relational databases in that sense. Like, you want to find a user by ID, you just find their ID and look them up. You don't get the benefits of things like indexes. You don't get a fancy query language. Yeah. So if you want to find that user by their first name, now you're in trouble. Yeah. I mean, you need to save that separately as a,

Starting point is 00:19:24 you know, kind of create your own index. So there's definitely a bit of a hurdle in that sense. But if the indexing isn't a huge piece for you, or, a lot of times people index using something like Elasticsearch, you know, some full-text search engine. Actually, there's another one called Bleve that'll use Bolt underneath.
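The "buckets as tables, create your own index" pattern Ben describes here can be sketched in a few lines. This is a minimal illustration in Python rather than Go, and it is not Bolt's API: the `Bucket` class, `create_user`, and `find_by_first_name` are hypothetical names invented for the sketch, and the in-memory dict stands in for a bucket on disk. The assumed idea is that a second bucket maps `<first name>/<id>` to the primary key, so a lookup by name becomes a prefix scan over sorted keys.

```python
import json


class Bucket:
    """Toy stand-in for a key-value bucket: unique keys plus a per-bucket sequence."""

    def __init__(self):
        self.data = {}      # key (bytes) -> value (bytes)
        self.sequence = 0   # last auto-generated integer ID

    def next_sequence(self):
        # Sequential integer per bucket, used here as the primary key.
        self.sequence += 1
        return self.sequence

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def create_user(users, by_first_name, first_name, email):
    """Store the encoded user under a generated ID and maintain the index by hand."""
    user_id = users.next_sequence()
    key = str(user_id).encode()
    record = {"id": user_id, "first_name": first_name, "email": email}
    users.put(key, json.dumps(record).encode())
    # Index key is "<first name>/<id>" so several users can share a first name.
    by_first_name.put(f"{first_name.lower()}/{user_id}".encode(), key)
    return user_id


def find_by_first_name(users, by_first_name, first_name):
    """Prefix-scan the index bucket, then fetch each user by primary key."""
    prefix = f"{first_name.lower()}/".encode()
    matches = []
    for index_key in sorted(by_first_name.data):
        if index_key.startswith(prefix):
            primary_key = by_first_name.data[index_key]
            matches.append(json.loads(users.get(primary_key)))
    return matches
```

The cost of this design is that every write has to update both buckets together, which is one reason the transactional guarantees discussed later in the conversation matter.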

Starting point is 00:19:42 People are going to attach onto that and do their searches through that. So it depends on the use case. Again, right tool for the right job. But I treat them kind of similarly to a relational database. And when you think about relational databases too, they store their rows that they have in there

Starting point is 00:19:58 are just an encoded data structure that has a row ID that points to it. So they're almost, what's the word, key-value stores underneath. They've set kind of a relational layer on top. Yeah, that makes a lot of sense. And then when it comes back to the operational simplicity side of things, you're just storing all this in a file on disk, right? It's very SQLite in that sense.

Starting point is 00:20:22 Yeah, single file. Yeah, it's pretty straightforward. Pretty straightforward. So for backups, moving things around, copying data, you just use your Linux or your Unix tools, right? Oh yeah, pretty much.

Starting point is 00:20:38 I mean, there's some locking stuff, so you have to go through the database. It'll actually be a transactional copy inside the database. So you start a transaction, and you can stream it out. But it'll go as fast as your operating system and your SSD can read the data off. So it goes pretty snappy. And it sounds like you've gotten that into a scenario where, it says, Bolt is currently in high-load production environments serving databases as large as one terabyte.

Starting point is 00:21:02 So even in that case, you just have one terabyte file sitting there? Sometimes we'll split off into multiple partitions. That's more of a load balancing thing. It was actually at Shopify, we created an analytics database that was clustered and we had

Starting point is 00:21:20 multiple Bolt partitions running on each one. And then we'd copy them around and redistribute the load as we needed to. We used consistent hashing inside of there to be able to redirect requests to the correct partition. Very cool. Well, this sounds like a good spot to stop and hear from one of our awesome sponsors.
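For readers unfamiliar with the consistent hashing Ben mentions for routing requests to partitions, here is a minimal sketch of a hash ring. It is an illustration under assumptions, not the Shopify implementation: the `HashRing` class, the choice of SHA-256, and the 64 virtual points per partition are all invented for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual points per partition."""

    def __init__(self, partitions, replicas=64):
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> partition name
        for partition in partitions:
            self.add(partition)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, partition):
        for i in range(self.replicas):
            point = self._hash(f"{partition}#{i}")
            self._owner[point] = partition
            bisect.insort(self._points, point)

    def remove(self, partition):
        self._points = [p for p in self._points if self._owner[p] != partition]
        self._owner = {p: o for p, o in self._owner.items() if o != partition}

    def lookup(self, key):
        """Return the partition owning the first ring point clockwise of the key."""
        if not self._points:
            raise LookupError("ring has no partitions")
        i = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[i % len(self._points)]]
```

The property that makes this useful for redistributing load is that adding a partition only moves the keys that now belong to the new partition; every other key keeps its old owner, so only a fraction of the data has to be copied.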

Starting point is 00:21:39 When we get back, I'm going to talk to you about some more use cases, maybe compare it to a few other key-value stores: LevelDB, and others people might be familiar with, Memcached, Redis, such things. So stick around, and we'll be right back. imgix is a real-time image processing proxy and CDN, and let me tell you, this is way more than ImageMagick running on EC2. This is way better. It's everything your

Starting point is 00:22:06 front-end developers have dreamt of. Output to PNG, JPEG, GIF, JPEG 2000, and several other formats. And if you're like me and you've ever argued with your boss or a teammate about serving retina images to non-retina devices, you'll appreciate their open-source, dependency-free JavaScript library that allows you to easily use the imgix API to make your images responsive to any device. Now, all of this takes a platform, and the imgix platform is built on three core values: flexibility and quality, performance, and affordability. When it comes to flexibility and quality, imgix has over 90 URL parameters that you can mix and match to provide an unlimited amount of transformations that you

Starting point is 00:22:53 need for your images. And they take quality very seriously, and because of their commitment to high quality, The Guardian, Eventbrite, Kickstarter, QuizUp, and many more trust them to serve their images. Now, when it comes to performance, imgix operates out of data centers filled with top-of-the-line Mac Pros and Mac Minis, and they're set up for a completely streaming solution. This means your images never hit the disk. Images are served by the best SSD-based CDN for delivery around the world, anywhere, extremely fast. And while we're talking about speed, almost all the image processing happens on GPUs. This means transformations are super fast when compared to competing virtualized environments. And lastly, it's all about affordability. Everyone wants to save a buck. That's how the world works. Because imgix processes close to a billion, with a B, images per day, they're able to make certain

Starting point is 00:23:50 optimizations at scale and pass those savings on to you. To learn more about imgix and what they're all about, head to imgix.com. Once again, imgix.com, and tell them Adam from The Changelog sent you. All right, we are back with Ben Johnson, talking all things open source databases, specifically at this moment BoltDB, which is Ben's popular key-value store in the Go ecosystem. Ben, we were talking about use cases. Can you give us kind of how it's being used in the wild and maybe some projects that are built on top of Bolt? Yeah, sure.

Starting point is 00:24:35 I think it's largely used by projects that need a data store inside of there, but where that's not the main focus of the application. It's not like a web app where some giant database is sitting behind it and people are using it. So I think it's getting towards that. But I think a lot of cases tend to be storing metadata or, like, smaller sets of data currently. There are definitely some exceptions to that. There's a guy named Tv. He wrote Bazil, which is like a distributed file system, like a personal file system, kind of like Dropbox.

Starting point is 00:25:18 But he's using Bolt for that. He's actually been around for a long time. When I first wrote Bolt, I put it out there, or not even put it out there, I just had it as a repo. He just came along one day and was just going line by line through the code and being, this is wrong, this is wrong.

Starting point is 00:25:34 What? I mean, in the most friendly way. He'd tell me how to fix it, give me links to low-level Unix documentation. So he definitely helped to stabilize Bolt. So huge shout out to him. I know that at Heroku, they have some log stuff that runs through Bolt

Starting point is 00:25:53 or uses Bolt in some capacity. But yeah, there's definitely some cool projects out there that people are using it for. So in addition to that, it seems like whenever you talk about key-value stores, there are a few common use cases, specifically thinking about web apps. That's kind of where my mind goes, as I'm a, you know, web developer by trade. So caching is a big one. Background jobs, it seems

Starting point is 00:26:18 like those queues are pretty good scenarios. There are tools out there that do such things. I mentioned them before the break, Memcached and also Redis. Can you kind of compare and contrast to those, if you're familiar with them? Sure, yeah. How good would Bolt be at those particular jobs? Well, so Memcached is meant to be, if I understand it correctly, an in-memory cache. So I don't think there's a backing store on it. Yeah, it's not persistent.

Starting point is 00:26:47 Yeah, it's been a long time since I've used that. But yeah, so you can store data in there all day, but it's meant to just be a layer to hit quickly, but you can always fall back to the underlying data store. So Bolt, in contrast, it writes all the data to disk safely. Even in the event of a crash, it'll come back up. And if you've committed a transaction, that transaction will be there. If you look at something like Redis, on the other hand, it has, I think, two different persistence layers.

Starting point is 00:27:13 They have like a write-ahead log and a snapshot, I think. I could totally be butchering this. But yeah, I mean, Redis, it stands at kind of a higher layer. They have a key value piece in there. I know they have a whole bunch of other data structures they do as well. Yeah, I mean, as far as complexity goes, Redis has lists and sets and different objects and stuff. Yeah, which is really cool. They don't have a sense of a transaction, though.

Starting point is 00:27:42 So I think if you really want strong transactions, which I think a lot of people don't realize how important that is. Like we get these kind of weird inconsistent states when we're trying to write 10 keys, but we only write eight of them. And, you know, what happened to those last two? And we try to resolve that by, you know, writing jobs to kind of fix it later on or check for it. But if you can actually get strong serialization or serializable transactions, I think that goes a long way. So Bolt has transactions. Yeah, they're actually full ACID serializable transactions.

Starting point is 00:28:14 Can you teach me that like I'm five? Can you just go through a transaction and tell us what that all implies? Sure. So you start a transaction. You can do read transactions or write transactions. Write transactions, you can only have one at a time, so they all go sequentially. They're serialized. Read transactions can start on and off whenever they want.

Starting point is 00:28:35 You can have multiple at the same time, and they'll all work off that point in time when the transaction started. So the actual write transactions, they will kind of give you a space to work in, and you can change data and rewrite those keys and values or create buckets. And then when it goes to commit it, it'll take those pages it wrote and it'll write all the pages out

Starting point is 00:28:58 and it'll write a new meta page. And it kind of has this almost like, if you've ever done graphics stuff, basically a double buffer for your meta page. So it has to write all the data first, and then it writes a new meta page to point to that new data. And the transaction is not committed until it writes that single last meta page. So it has this interesting piece to it where there's no recovery like you get in a lot of databases. Like, if it crashes, it'll just start back up with whatever data is committed. It doesn't have to re-read a log to, you know, reapply changes. It's just wherever it was,

Starting point is 00:29:35 it has this unique safety property, which is really nice. So, I don't know if that's in depth enough or you want some more. No, that was pretty good. Okay. So, I mean, you implemented all that yourself, and that sounds like something that is a nice thing to have, especially for something that you're going to be building on top of. Sounds like a feature that is definitely not unique to Bolt, but as far as key-value stores go,

Starting point is 00:30:07 I think that's nice to have, right? Or that's kind of even a, you got to have that, right? Well, you think you got to have it, but like serializable transactions, they're not even the default on a lot of relational databases. I think they're actually read committed transactions. There's all kinds of different isolation levels.
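The commit sequence Ben walked through (write new pages first, then flip a double-buffered meta page, with readers pinned to the snapshot they started on) can be modeled in a short sketch. This is a deliberately simplified toy in Python, not Bolt's on-disk format: `ToyStore` and `ReadTx` are hypothetical names, each "page" here is a whole dictionary snapshot rather than a tree node, and choosing the meta slot by transaction ID parity is an assumption of the sketch.

```python
class ReadTx:
    """A read transaction pinned to the root page that was current when it began."""

    def __init__(self, store, root):
        self._page = store.pages[root]

    def get(self, key):
        return self._page.get(key)


class ToyStore:
    """In-memory model: append-only pages and two alternating meta slots."""

    def __init__(self):
        self.pages = [{}]  # page 0 is the empty initial snapshot
        self.metas = [{"txid": 0, "root": 0}, {"txid": -1, "root": 0}]

    def _current_meta(self):
        # The valid meta with the highest transaction ID wins.
        return max(self.metas, key=lambda m: m["txid"])

    def begin_read(self):
        return ReadTx(self, self._current_meta()["root"])

    def commit(self, changes, crash_before_meta=False):
        meta = self._current_meta()
        # Step 1: copy-on-write. Build a new page; never modify the old one.
        new_page = dict(self.pages[meta["root"]])
        new_page.update(changes)
        self.pages.append(new_page)
        if crash_before_meta:
            return  # simulated crash: data was written but nothing points at it
        # Step 2: overwrite the older meta slot. This one write is the commit point.
        new_txid = meta["txid"] + 1
        self.metas[new_txid % 2] = {"txid": new_txid, "root": len(self.pages) - 1}
```

In this model a crash between step 1 and step 2 leaves the previous meta page as the newest valid one, so reopening the store simply sees the last committed state with no log replay. A real copy-on-write tree would copy only the path from the changed leaf up to the root instead of the whole snapshot.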

Starting point is 00:30:23 And it's honestly hard to remember all of the little nuances. But serializable transactions means you can't read anything that's uncommitted, or anything in another transaction that's been committed already but didn't get committed before the transaction started. You kind of get this whole view of the database. And it's basically how you think of transactions in your head normally. It's like, I have this safe world where everything is, you know, how I expect it to be. That's a serializable transaction. There's a lot of other ones that try to

Starting point is 00:30:56 make trade-offs for performance or speed. Yeah. Yeah, where you can kind of, like, read things that have been committed in another transaction after this one started, but before it stopped. It's confusing, honestly. But if you think of transactions, it's probably what you'd expect. But yeah, it's really useful to have that safety. And I tried to pare down Bolt to really be the core things I needed. LMDB had a lot of other features around performance where you could write stuff directly into the database instead of going through some other safety measures. And they had some other tradeoffs they made.
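
The "whole view of the database" idea can be sketched as a read transaction that pins whatever version was latest when it began. This is a minimal Go illustration with made-up names (`store`, `readTx`, `begin`, `put`), not Bolt's API. It models only the snapshot-read half of the story; writes are simply applied one at a time here, and a real store would not copy the whole data set on every write.

```go
package main

import "fmt"

// store keeps every committed version of the data as an immutable map.
// The last element is the latest committed version.
type store struct {
	versions []map[string]string
}

func newStore() *store {
	return &store{versions: []map[string]string{{}}}
}

// readTx sees only the version that was latest when it began.
type readTx struct {
	snapshot map[string]string
}

// begin starts a read transaction pinned to the latest committed version.
func (s *store) begin() readTx {
	return readTx{snapshot: s.versions[len(s.versions)-1]}
}

// get reads a key from the transaction's pinned snapshot.
func (t readTx) get(key string) (string, bool) {
	v, ok := t.snapshot[key]
	return v, ok
}

// put commits a write by copying the latest version, changing the copy, and
// appending it. Older versions are never modified, so open readers are safe.
func (s *store) put(key, value string) {
	latest := s.versions[len(s.versions)-1]
	next := make(map[string]string, len(latest)+1)
	for k, v := range latest {
		next[k] = v
	}
	next[key] = value
	s.versions = append(s.versions, next)
}

func main() {
	s := newStore()
	s.put("balance", "100")

	tx := s.begin()        // snapshot taken here
	s.put("balance", "50") // committed after tx started

	old, _ := tx.get("balance")
	fresh, _ := s.begin().get("balance")
	fmt.Println("long-running reader sees:", old)
	fmt.Println("new reader sees:", fresh)
}
```

Under a weaker level such as read committed, the long-running reader would be allowed to see the second write partway through its work.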

Starting point is 00:31:35 But I tried to cut out all those extra pieces. So it ended up being 2,000 lines of code, which I don't know if that sounds like a lot or not. For a database, it's tiny. Yeah, I was going to say, it sounds like a lot if I was just going to sit down and code that day, but for a database, it doesn't sound like too much. Yeah, so I mean, LMDB, I think is about 8,000 lines. If you look at like LevelDB, I think it's around 20,000 lines. LevelDB is very similar to Bolt. It's out of Google. It seems like there are some differences.

Starting point is 00:32:03 Yeah, so that's an LSM tree. So that'll be write optimized, so you could write stuff into LevelDB much faster than you can in Bolt. But if you're looking to do range scans where you have a set of data in order that you're trying to go across, Bolt will be much faster than

Starting point is 00:32:19 LevelDB. Awesome. So that's Bolt in a nutshell. Great readme, by the way. Gotta give you respect for going into great detail there. GitHub.com slash BoltDB slash Bolt. Check out the readme. Ben goes through not just usage and

Starting point is 00:32:35 backups and stuff, but he actually goes through comparison with other databases. He'll talk about the LSM tree versus the B tree, when you should use which one. There's even caveats and limitations. Lots to be had there. Check out Bolt, a low-level key value store that's simple on purpose and sounds like it's a rock. It's been production ready since November of last year and people are picking it up. So check that out. I think we should switch gears a little bit and talk about the next one.
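
The range-scan point comes down to keys being kept in sorted order: a scan is one seek to the start key followed by a sequential walk. Here is a rough Go sketch of that access pattern, using a sorted slice as a stand-in for the leaf pages of a B+tree. The `rangeScan` helper and the date-style keys are assumptions for illustration, not anything from Bolt.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeScan returns every key k with lo <= k < hi.
// keys must already be sorted, the way a B+tree keeps its leaves.
func rangeScan(keys []string, lo, hi string) []string {
	start := sort.SearchStrings(keys, lo) // seek: first key >= lo
	end := sort.SearchStrings(keys, hi)   // first key >= hi
	if start > end {
		return nil // empty range when lo sorts after hi
	}
	return keys[start:end] // contiguous run, read in order
}

func main() {
	keys := []string{"2015-08-01", "2015-08-02", "2015-08-03", "2015-08-04"}
	for _, k := range rangeScan(keys, "2015-08-02", "2015-08-04") {
		fmt.Println(k)
	}
}
```

An LSM tree spreads the same keys across several sorted runs, so a range scan has to merge results from more than one place, which is one reason it tends to favor writes over ordered reads.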

Starting point is 00:33:07 I know we had a list of a ton of databases. We're just going to pick a couple because we don't have too much time. The next is the one that you seem to be working with either in a consulting capacity or full-time, but InfluxDB, which is open source as well but also has a business built around it. Can you tell us about Influx? Sure, yeah. It's a time series database, and we really center around being easy to get up and running. We have clustering in there.

Starting point is 00:33:34 We can actually spread it across a lot of machines. And then we're building out a lot of new functionality now for doing a lot of write-ahead log stuff for write optimization and doing compression in there to shrink down the size of the database. So it's coming along. People have really been interested in it as far as just, again, it's one of those simple databases that we use Bolt underneath, so there's no other service to get up and running.

Starting point is 00:34:01 I know some other things have relied on Redis or some other data stores in the past. Actually, a lot of them rely on Cassandra in the background and kind of push that off to there. But for us, it's really just one binary you just download and just start up. So yes, it's been great in that sense. People have been really interested. So as far as time series databases go, I don't have much of a context besides speaking with Julius Volz about Prometheus, which at its core has a time series database.

Starting point is 00:34:33 And I know Prometheus uses Bolt here or there. I'm curious about if it uses Influx or not. But are there other time series databases out there that people can pick up and use, or is this a brand new thing? There's been time series databases out there. But the funny thing with time series databases, especially some of the older ones, is that they're just notoriously difficult to get up and running. And a lot of people will actually pick up Influx more or less, I mean, initially out of frustration, just

Starting point is 00:35:06 they spent three hours trying to get Graphite running or something like that. Uh-huh. And they just gave up. So, I mean, it's interesting, the technical decisions you make along the way about what dependencies you might need and how those dependencies change over time and how that makes a project hard to get up and running. So, yeah, it's not a new thing by any means. I think there's a lot of ease of use stuff. We have a query language in there. We do a lot around the way people can retain data long term and how they roll it up and how they can move it around.
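
Rolling data up for long-term retention usually means replacing raw points with one summary point per time window. Below is a minimal Go sketch of that idea, assuming non-negative integer timestamps in seconds and a mean as the summary. The `point` type and `rollup` function are hypothetical and are not InfluxDB's implementation or query language.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a single timestamped value. ts is assumed to be non-negative.
type point struct {
	ts    int64
	value float64
}

// rollup groups points into fixed windows of `window` seconds and returns one
// point per window holding the mean, ordered by window start time.
func rollup(points []point, window int64) []point {
	sums := map[int64]float64{}
	counts := map[int64]int{}
	for _, p := range points {
		bucket := p.ts - p.ts%window // start of the window containing p
		sums[bucket] += p.value
		counts[bucket]++
	}

	// Map iteration order is random in Go, so sort the window starts.
	starts := make([]int64, 0, len(sums))
	for b := range sums {
		starts = append(starts, b)
	}
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })

	out := make([]point, 0, len(starts))
	for _, b := range starts {
		out = append(out, point{ts: b, value: sums[b] / float64(counts[b])})
	}
	return out
}

func main() {
	raw := []point{{0, 1}, {30, 3}, {60, 10}, {90, 20}}
	for _, p := range rollup(raw, 60) {
		fmt.Printf("window starting at %d: mean %.1f\n", p.ts, p.value)
	}
}
```

Once the rolled-up points are stored, the raw points for that period can be dropped, which is how a retention policy keeps long-term storage small.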

Starting point is 00:35:38 So there's a lot of thought that's put in that too. So let's maybe zoom out a second and talk about time series as a thing. When would I reach for this type of a data store? The one I can think of off the top of my head is analytics. But are there other use cases for time series data stores? Sure, yeah. I mean, analytics is a big one. Monitoring has been another big one as well.

Starting point is 00:36:00 A lot of people have sensor data. That's actually been a big growing one. And there's some weird use cases with sensor data as well. There's one where there's a company that has sensors, but they don't send data continuously. They store it up and then every like four hours, they send off the data. And for some reason, some databases, they expect kind of a stream of data coming in and stuff will get dropped off if it's too late or out of order or certain things like that. So sensor data has been a big one as well.
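
One way to accept a late batch instead of dropping it is to place each point by its own timestamp rather than by arrival time. The Go sketch below shows that with a sorted in-memory slice. The `sample` type and `insert` function are made up for illustration and say nothing about how InfluxDB stores data internally.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one sensor reading with the time it was taken.
type sample struct {
	ts    int64
	value float64
}

// insert places s into series, which must be sorted by ts, and keeps it
// sorted. Late or out-of-order samples land in their correct position.
func insert(series []sample, s sample) []sample {
	// Find the first existing sample that is newer than s.
	i := sort.Search(len(series), func(i int) bool { return series[i].ts > s.ts })
	series = append(series, sample{}) // grow by one
	copy(series[i+1:], series[i:])    // shift newer samples right
	series[i] = s
	return series
}

func main() {
	// Points that arrived as a live stream.
	series := []sample{{100, 1.0}, {200, 2.0}}

	// A batch uploaded hours later, containing older readings.
	batch := []sample{{150, 1.5}, {50, 0.5}}
	for _, s := range batch {
		series = insert(series, s)
	}
	for _, s := range series {
		fmt.Println(s.ts, s.value)
	}
}
```

The cost is that a late insert has to shift existing data, which is the kind of trade-off a store tuned purely for in-order streams avoids by rejecting late points.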

Starting point is 00:36:32 So I think between those three, those are probably the main ones. I can see it also with streaming financial transactions and market stuff. Yeah, that's another one too. Yeah. I mean, anything that's going to have a real-time stream of data and you're going to be either capturing it or aggregating points in time to use later. Seems like that's kind of where these things play. Yeah, it's one of those things too. It's one of those use cases that's grown large enough where people have started writing databases specifically for it. And when you have, if you try to put it into something like MySQL,

Starting point is 00:37:06 I mean, MySQL has a ton of features on there for relational access and indexes and all kinds of stuff. But if you really just have a timestamp and a value or a set of values, and that's the data that you have going in, there's much better ways you can optimize that in a specific store. So Influx is both an open source project and a company. I'm not sure what the product is, if it's services, if it's a pro plan. How does the business side break apart from the open source side with Influx?

Starting point is 00:37:38 Yeah, so the business side, we have a managed hosted product over there as well. And we do a lot of, we have some SLA stuff as well for more enterprise customers. Those have been the big pushes too. We have some stuff coming down the line as well. But I think that's more hush hush. So, yeah. So the people have been pretty excited too about having kind of a roadmap of where Influx is going and what we're doing with that. I think a lot of times some businesses have been hesitant with other open source projects that they don't know the long term.

Starting point is 00:38:14 Like if they want to build a product on top of Influx, they want to know that there's a company there and that they have funding. They can't disappear. Exactly, yeah. Because sometimes projects do kind of go into the ether, right? Yeah. And I guess that, you know, whenever we have a business divide and an open source divide, we start to wonder about licenses. By the way, Changelog listeners are always asking us to talk more about licenses. Bolt is MIT licensed. How does Influx's license break out? Yeah, Influx is also, I believe, either MIT or BSD. And honestly, one of the reasons I came on originally with them is that both the founders are just awesome, laid-back, cool guys

Starting point is 00:39:00 that love open source. Paul Dix has been involved on the Ruby side for a long time. And then, you know, they're very focused on putting out stuff and being in the community and talking to people on Twitter or on GitHub and getting people involved. So I really like that about them. But they don't have a restriction around like a GPL license or anything like that. Okay. So they've been pretty open about it all. I know there's been some contention about whether you should do a dual license

Starting point is 00:39:31 or how all that lays out. I'm kind of anti-GPL personally, but I'm sure that's going to start a flame war right there. Why? Tell us why. You know, I think the thing is, I guess I shouldn't say anti-GPL. If it works for you, that's great. For me personally, I like to make things, and I like to be able to just put them out there in the world.

Starting point is 00:40:03 And people can kind of riff off that and do something with it if they want to, or they could go build a company out of it. If I can do something that will somehow make value in the world, I think that's awesome. But whenever I see something that's GPL, I don't know if I'm ever going to want to do something in that realm again. I don't want to worry about some derivative work issue coming along later on. So if I see GPL, I honestly just close down the project. I don't even look at the project because I don't know. Just like that.

Starting point is 00:40:25 Yeah, just like that. I mean, I hate to even say that because I think people have done great work that is GPL. But, you know, there are a lot of businesses that just simply can't use it. Right. And some people may want to use it in a business capacity. I know there's all kinds of hoops that just kind of make me skittish personally. Yeah, I'm of two minds, as I am on many things. I can

Starting point is 00:40:47 see both sides, and I personally MIT license almost everything I do. That being said, I'm mostly putting out small things that are, I think, you know, trivial. I mean, even your BoltDB is more ambitious, and because it's infrastructure, it's more likely to be included in commercial products than anything that I've built open source. So if I had a more substantial, bigger thing, I might put more thought into it personally. But yeah, I can see how the GPL limits adoption, absolutely, and how there's a lot of noise in open source. Especially now more than ever, it's hard to let the cream rise to the top, so to speak. That's, you know, one of our missions with the Changelog, to shine the light on open source, like the little guys who are doing cool things but, you know, their voice gets drowned out in the crowd. We like to shine a light on that because we realize that there's a lot of noise.

Starting point is 00:41:47 And so putting a GPL license on your thing makes it harder for it to take off like it would with a more liberal license. That being said, I also understand the side where it's like companies are just profiting off of my work. I get that. I get that too. So it's tough. Yeah, I had a discussion with Mike, or just a small Twitter discussion with Mike Perham. Is that how you say his last name? Perham? Yep. Yeah. And yeah, I mean, it seems like the GPL,

Starting point is 00:42:18 the dual license is working for him with Sidekiq. So, I mean, I certainly don't want to knock it by any means. I think there's definitely a use case out there for it. We've had Mike on the show a few times. He's been unique in the ability to turn a popular open source project into a business, a lifestyle business, not a VC-funded larger thing. He has a lot of opinions on

Starting point is 00:42:44 not just licensing, but also the sustainability of open source and how to make it work for you. And so I'll just submit that for the listeners. If you're interested in that topic, I don't have episode numbers on me, but go to changelog.com slash podcast and just search in page for the word Mike or Perham. You'll find some interesting episodes on that. Yeah, I mean, when it comes to licensing, it's something that we all have to wrestle with

Starting point is 00:43:11 as we put our software out there, is what are our priorities and what do we feel comfortable with. So everybody's got to make their own decision on that front. Yes, it's a minefield, though. It really is, yeah. So back to Influx just a little bit. It's at 0.9.2, so, you know, not quite at 1.0, but it seems like it's out there and gaining steam. Anything else about InfluxDB that you want to hit on before we move on?

Starting point is 00:43:49 Um, you know, we're just, yeah, we just keep working at it. I mean, I know that there are, you know, I think, yeah, I think it's, it's just a product that's continually evolving and improving. So I think that if people have tried it in the past, you know, we've done a lot to, to improve upon it. So I hope people try it again. Certainly. Awesome. All right.

Starting point is 00:44:04 We'll take our second break. When we get back, I want to talk to you about something a little bit different, which is, I'll just leave it as the Secret Lives of Data. Let's just leave it right there and we'll peel that apart when we get back. Guess what, everyone? We've partnered with Casper, the online retailer of premium mattresses, to give you $50 towards your new mattress. The mattress industry has inherently forced consumers, myself included, into paying notoriously high markups. And Casper has revolutionized the mattress industry by cutting the cost of dealing with resellers and showrooms. And they pass those savings directly on to you.

Starting point is 00:44:44 Their mattress is a one-of-a-kind. It's a new hybrid mattress that combines premium latex foam with memory foam. And the Casper Experience was designed with you in mind and optimized for sleep. And this is my favorite part. It's backed by a 100-night no-hassle return policy with full refund and a 10 year warranty. And what's even cooler is how they ship this mattress to you. It comes in a box that couldn't possibly fit a mattress. And when you open it, the mattress unravels for you to lay down and catch some Z's. Head to casper.com slash changelog and use the code changelog when you check out to get $50 towards your new mattress. Enjoy.

Starting point is 00:45:27 All right, we are back with Ben Johnson talking open source databases, and perhaps somewhat related is this really cool thing called the Secret Lives of Data. That's thesecretlivesofdata.com. We'll link it up in the show notes, where he explains a thing called Raft in a cool visual way. Ben, can you tell us about this? Yeah, sure. So the Secret Lives of Data is just meant to be a project where I feel like there's a lot of distributed systems and database topics and computer science topics that, like, I honestly feel like you can explain any of those topics with circles and lines and motion. Like, that's kind of whenever you go on a whiteboard, you're like, this is the server here and over here and does that. But we don't kind of have that.

Starting point is 00:46:11 We have books, static images. And I feel like there's just kind of a piece that's lacking, especially with so many new distributed databases and all these kind of systems design things that people need to learn about. But it's like research papers and it's these books that come out that are kind of tough to sink in. So I wanted to find something in some way that was easily digestible to explain complicated topics. Like distributed consensus, for example, is not the easiest topic to explain to someone. But if you can step through it piece by piece and kind of show some motion with it, I think people tend to pick it up.

Starting point is 00:46:51 I've had a lot of people actually mention that they read through the paper a couple of times, but it didn't click until they saw this visualization of it. So to explain what it actually is, it's kind of a data visual. It's almost like a motion graphic of how Raft and distributed consensus work. So this protocol called Raft implements distributed consensus, where you have a set of nodes, like a cluster of computers, and they need to agree on some value, and how that happens and how it changes over time. And if you get like a split in your network, you know, what happens to the different sets of nodes, and how does it avoid situations where some nodes think that they might have one value and another might think it has another value.
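
The network-split case rests on one rule that is easy to show in code: a value only counts as agreed once a majority of the whole cluster has it, and two sides of a split cannot both hold a majority. Here is a small Go sketch of just that rule. The `quorum` and `canCommit` helpers are illustrative names, and everything else in Raft (terms, leader election, log matching) is left out.

```go
package main

import "fmt"

// quorum returns the smallest number of nodes that forms a majority.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

// canCommit reports whether a leader that can reach `reachable` nodes,
// counting itself, is able to commit a new value.
func canCommit(clusterSize, reachable int) bool {
	return reachable >= quorum(clusterSize)
}

func main() {
	const clusterSize = 5
	// A network split leaves three nodes on one side and two on the other.
	for _, side := range []int{3, 2} {
		fmt.Printf("side with %d of %d nodes can commit: %v\n",
			side, clusterSize, canCommit(clusterSize, side))
	}
}
```

The minority side can still receive requests, but nothing it accepts is ever committed, so when the split heals there is only one committed history to catch up to.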

Starting point is 00:47:30 So there's all these edge cases that you don't think about and that are kind of hard to wrap your head around, but I try to explain that visually. So it does a great job, by the way. This is incredibly impressive. And I actually came across this, I don't know when you launched this, but I think it hit my feeds then. I didn't know who did it at the time,

Starting point is 00:47:49 but then when I started doing some research into it, I was like, oh, man, he did this. That's pretty cool. So where's the motivation behind sinking the time into this? Do you have an educational background, or what made you want to do this? I know that some people have put out so many great resources that I've learned from. I know you've had Ilya Grigorik on the show a bunch.

Starting point is 00:48:18 When I first got into writing databases, every time I'd find some concept I wanted to learn about, I'd type it into Google and he'd have the first page on there with his blog about some obscure topic. Right, like bloom filters or whatever. Exactly. And like, I always see these, you know, and he sunk so much time into his blog to explain these great things that I learned from. I feel like, you know, what's some way I can give back? And I knew Raft really well. I've written implementations of it. And yeah, I was just trying to think of a great way to kind of visualize that and show it. And at first I thought it was going to be like a week, you know, and I'd be done. How long? I think it was like a month and a half. Wow. And I ended up writing it in D3,

Starting point is 00:49:01 but D3 doesn't have a way to stop motion partway through. Okay. So I actually had to write my own kind of timers and framework on how I was going to structure stuff. And I essentially wrote a Raft implementation in JavaScript that I could run inside, because if you play the visualization twice, it'll actually be different the second time. Okay. In, you know, the way the nodes shoot off and, yeah. So it was a huge pain. It's been sitting idle for a little while, but I ended up taking a while off and learning this program called After Effects from Adobe,

Starting point is 00:49:37 which actually does motion graphics. Because I thought there had to be an easier way, and there is. People do this for a living. It's almost like Flash, but you can generate video and do all kinds of stuff. People use it for special effects in movies a lot. So yeah, I want to start doing stuff. Originally, I was going to do five-minute videos for things like Apache Kafka or Cassandra or all these more complicated databases and how they work.

Starting point is 00:50:03 Yeah. And, yeah, so I spent like three or four months learning After Effects and reading books and watching videos on it. And then I ran out of time to actually make the

Starting point is 00:50:16 visualizations. But I realized also, originally I was going to do these five-minute videos, but then I kind of realized later on, snippets seem to be much more easily digestible. I'm thinking about doing a smaller format of 20-second animated GIFs that I can post up to Twitter. It seems like it'd be able to spread a little better.

Starting point is 00:50:38 You just click on it and learn about how Apache Kafka works in 30 seconds. We'll see if that works, but that's my goal right now. I love it. I mean, I would say that, you know, just to exhort you to continue in these efforts, because I think it is a powerful way of teaching and, um, you know, maybe not to give up completely on your, your, um, the work you put in to build this one. I don't know if maybe it's just too crazy,

Starting point is 00:51:09 but if you could get some sort of a framework in place to where you could do other things more easily, then you could start to have an infrastructure for other people building out these types of things on the web. That being said, animated GIFs, people love those. People love them, although it's strange to find one that's useful. It is, yeah. I think it would probably be the first one.

Starting point is 00:51:31 They're usually just displaying some sort of emotion or surprise. But yeah, the first useful animated GIF. Maybe you could get it on Wikipedia for that. I hope so. Awesome. We'll link that up in the show notes.

Starting point is 00:51:46 Ben, I think it's time to go to our awesome closing questions. And we will ask the first one, which has become somewhat compulsory these days, which is, who is your programming hero? I would have to say Ilya Grigorik. I just learned so much from that guy, from his blog. I would totally just be his groupie. Totally, if he was at a conference, I'd just follow him around the whole time.

Starting point is 00:52:13 I have to give my amen on that one. He's influenced me quite a bit in my development. I don't get too nervous for these shows, but with Ilya for the first time, I was kind of like, you know, had that like, oh, man, this guy is so smart.

Starting point is 00:52:27 I hope I don't sound like a dope interviewing him. Yeah, he's awesome. Shout out to Ilya out there. Very cool. Next one is open source radar. So if you had a weekend and you were just going to hack on some stuff, you weren't working on your After Effects things, but some new project, something that's interesting to you, what is it? It's not

Starting point is 00:52:52 even necessarily a new project, but the new stuff going into, like, the Go standard library and the Go toolchain, I think it's just been fascinating. There's been a lot of stuff around the garbage collection. And then, this isn't actually standard library, but go-fuzz is another one that came out recently, which is kind of like fuzz testing and just making really solid libraries that are, you know, well tested against all kinds of, you know, crazy incoming data. So I'd say those two. Very good. Very good. Okay, last one for you is, if you weren't an awesome open source developer working on these database tools and whatnot, if you weren't doing this, what else would you be doing? Oh man, that's a tough question right there. You know, actually, this is going to be kind of a cop-out answer, but I started doing the startup thing for several years.

Starting point is 00:53:43 I was going to make a company and do all this stuff. Right. And I eventually stopped doing that because I came to this realization that if I made a bunch of money, I'd just go write open source all day for my free time. So I can't think of what else I'd be doing, honestly, with my free time. I think I'd just go on hikes. We've got some awesome stuff around here in Colorado, so I think I'd just hike. I'd like to be a tour guide, maybe.

Starting point is 00:54:05 That works. That works. I love that. So you're like, well, if I can make a bunch of money, then I can go do open source for the rest of my days. You're like, wait a second. Wait, I can do that right now. I can just do open source right now. Yeah, exactly. Awesome.

Starting point is 00:54:17 Well, one thing to mention before we say goodbye is that we've been doing a film series, ChangeLog Films, at all of the, not all of the, but many developer conferences. So we call it Beyond Code. We ask similar questions to the ones that we ask for our closing questions. In fact, Programming Hero is featured in that series as well.

Starting point is 00:54:38 To different developers of all shapes and sizes at the after parties of different conferences. It's really cool. I want you to check it out. We just finally launched the website, because we've had the videos forever, but, you know, the cobbler's kids have no shoes, so making a website for ourselves, you know, was a lot of work. But we're pretty proud of it. We want you to check it out. It's at beyondcode.tv. Right now we have season one up. That was at Keep Ruby Weird last fall. We have seasons two, three, and four also in the can, so those videos will be

Starting point is 00:55:13 showing up there shortly. Check it out, beyondcode.tv. Let us know what you think. And I just want to say thanks to you, Ben, for joining us. It's a really good conversation. I'm excited about BoltDB and all these cool new things coming out of the Go ecosystem. I want to give a shout out to our Changelog members and our awesome sponsors for this show, helping make it happen. Don't forget to tune in next week when Carin Meier joins us to talk about Clojure. Check that out. And until then, we'll see you. Thanks for having me, Jared. We'll see you next time.

Ben Johnson joined the show to talk about BoltDB, InfluxDB, and several other key-value store databases out there and why he's so passionate about developing open source software....