The Changelog: Software Development, Open Source - BoltDB, InfluxDB, Key-Value Databases (Interview)
Episode Date: August 22, 2015
Ben Johnson joined the show to talk about BoltDB, InfluxDB, and several other key-value store databases out there and why he's so passionate about developing open source software....
Transcript
Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stacoviak. This is episode 170, and today we're joined by Ben Johnson. And when I say we, I mean Jared, because Jared went solo on this show, and he was joined by Ben to be schooled on BoltDB, InfluxDB, and several other key value store databases out there. Ben also shared why he's so passionate about developing open source software.
We have three awesome sponsors for the show: CodeShip, a longtime supporter, and two brand new sponsors, imgix and Casper. So huge thanks to each of them for supporting the show.
Our first sponsor is CodeShip. They've launched a brand new feature called Organizations.
You've heard me talk to you about it before, but now you can create teams,
set permissions for specific team members, and improve collaboration in your continuous delivery workflows.
You can maintain centralized control over your organization's projects as well as teams with this brand new feature.
And you can save 20% off any plan you choose for three months by using our code, TheChangelogPodcast.
Once again, that code is TheChangelogPodcast.
So to get 20% off any plan you choose from CodeShip for three months, head to codeship.com/thechangelog to get started. And now, on to the show.
Welcome back, everyone. Jared here. I am joined today by Ben Johnson. Ben is a Denverite. I think that's what they're called.
Is it a Denverite? Is that the term for someone from Denver? Okay.
Yeah, we're Denverites.
Yeah, so Ben's a Denverite. We met Ben out at GopherCon back about a month ago now.
He describes himself as an open source software developer who specializes in customer behavior analytics and data visualization.
He's also big into distributed systems and data stores.
Ben, welcome to the show.
Thanks for having me.
So, out at GopherCon, it seemed like your name kept coming up all over the place.
Very active in the Go community,
very active in kind of the New Wave data store community.
Why don't you tell me a little bit about yourself,
how you got here, and what you do?
Yeah, sure.
So I've been writing software for about 15 years now. I started out as an Oracle DBA back in the day and kind of moved into web development, and just kind of jumped around. I started getting into open source, I don't know, four or five years ago, and somehow just kind of landed inside the database world. There's just a lot of turns and whatnot. So, yeah, I love writing open source, so I do a lot of it in my free time.
Well, yeah. What initially drew you to open source as a thing? What excited you?
I think the idea that you could put something out there and that it not only helps people, but people also give you feedback, and you just learn so much. As soon as you put out software, everyone will either love what you have or tear you apart. There's all this learning, even if it is kind of hard.
So I tried to do a little bit of exploration just to find out more about you online.
He's @benbjohnson on Twitter, if anybody's interested out there.
I noticed you don't seem to have a website.
You apparently are a consultant of some kind.
You do have a LinkedIn, I found that.
But you seem to be focused on your GitHub and your Twitter, and that's about it.
Can you tell me about your business side and your consultancy and how that all works?
Sure.
Yeah, I work with InfluxDB.
I work with them and just write a lot of the storage layer and the distributed systems parts of that database.
That's who I work for during the day.
And I've consulted in the past, and I used to work at Shopify for a while.
So I just kind of hopped around here and there.
Right on. So you're focused on Influx at the moment.
Well, we have you on the show because we want to talk about databases.
And it seems like for a long time there was kind of your set of relational databases. And then there was these niche things out there that you heard of, these document stores or column stores.
I remember in the area that I worked, there was actually a database called Caché.
I don't know that one.
You don't know that one.
So yeah, it's big in kind of the medical world
and also in financial transactions and whatnot.
But it was very niche.
So you kind of have these pockets,
and it has its thing going on.
And then you had kind of the NoSQL explosion a few years back, where you had your Mongos and your Riaks and interest around those. And we seem to be having another wave of new things, and maybe they're not new. I just think they're new because I don't really come across them as much. Some of them are things that you're involved in: Influx, Bolt. You emailed me a list of things. I had never heard of half of these, and I try to keep up with open source.
It's hard.
It's hard, for sure.
But there's LMDB, LevelDB, Parquet,
which I had never heard of, Cassandra,
which is coming out of Twitter and Facebook
was big a little while ago.
Has there been an explosion in data stores,
or am I just noticing it?
Oh, no. Yeah, there's definitely an explosion. I think people are starting to realize that once you get to a certain scale or a certain use case, you can optimize at these really low levels. What used to just be an application goes further and further down, and at that point you start to just build your own database. And I think that people find either a lot of operational simplicity from having a very specific target or use case, or they just get a lot of performance out of it. There's all kinds of different reasons to get down that far.
I think when the NoSQL movement, we might just call it that, first hit, and I can't remember the timing, but maybe it was 10 years ago, I can't remember how long ago that was.
And MongoDB became kind of a thing that was winning the hearts and minds of developers.
There was this whole throw out your relational database mindset.
It seems like now that's kind of shifted, and it's not like throw it away.
It's like here's something you can use in addition.
Is that how you feel about these types of databases?
Yeah, I think a lot of it has to do with, you know, your requirements, kind of what you know already, where you're coming from.
I think SQL came out of, you know, back in the 80s, it came out of, you know, you had business people that would go up to a terminal and they wanted to be able to write their own queries or, you know, some level of that.
And then we started trying to fit it into this object model too.
And I think a lot of people have gotten tired of trying to fit the relational and object model together.
And business users, they'll use a web UI now.
It's made by a developer.
So we don't have that direct SQL requirement anymore.
But I think that the NoSQL movement doesn't have enough of a structure.
We make all these databases, but we don't tell people how to actually use them
or what best practices are.
So I think we developed these 20 or 30 years of SQL best practices
that I feel like people are starting to fall back on.
And they're saying, well, I know how to do that.
I'll go back to doing that.
This object thing or document database is confusing.
So I think if we can actually do a lot of education around that,
I think there's some great use cases for, you know, document databases or key value stores.
Yeah, I think key value stores is one where we've definitely seen a lot of activity, a lot of options. And maybe it's because a key value store conceptually is pretty simple.
I don't know.
I'm not going to go out and say to implement it is simple because you would know a lot better than I do
that I'm sure there's tons of nitty-gritty details
and bumps in that road.
But man, there sure are a lot of options.
And it seems like a lot of those options are written in Go.
Yes, there's been a huge influx.
I think part of it is just the simplicity of
you get something written in Go,
you can compile it onto a bunch of operating systems
and just distribute it out.
A lot of the uptake we got at Influx
has just been people saying,
this is really easy to set up
compared to a lot of other alternatives
that have been around for longer.
It's just people can get up and running,
people don't want
to spend their whole day trying to learn one tool.
They just want to run a command and have it there.
So I think Go does
that really well.
So you have
one of these fancy
new data stores.
I'm just going to act like an old man and talk about everything as if it's shiny new and foreign to me.
At least for the first part of the call until you kind of school me on how all these things all work.
But yours is called Bolt.
It's boltdb/bolt on GitHub.
And it seems to be production ready.
Why don't you go ahead and just give us the elevator pitch for BoltDB?
Sure.
BoltDB is a read-optimized key value store, and its goals beyond anything else
are to be operationally simple, to have a clean API, and to have strong transactional
support.
So there's a lot of key value stores out there that will give you,
maybe it's really fast write performance, but the reads are really slow.
Maybe you'll get certain other benefits where it might have a crazy API,
but it might be fast.
Actually, a lot of key value stores seem to be centered around just being fast. But even as computers get faster, most websites out there aren't getting thousands of hits per second. They're getting a hit per second, or somewhere in the tens of hits per second. So I think a lot of people look for the fastest thing out there because they want everything to be blazing fast.
And they just forget about all these other operational side considerations.
Yeah, it's kind of the thought that became a meme with web scale back in the day to find out what is and what is not web scale.
And the fact of many people's businesses and websites is you can count the Twitters and the Facebooks on one hand.
Sure, there are other large sites out there.
There's the Reddits and the top 100 Alexas.
But most of us make our living and live on the web
in smaller, less populated areas.
Yeah, for sure.
It's interesting to see all the databases that come out from the Facebooks and the Twitters, because they have such different requirements than 99% of people out there. So I think it's interesting to see where the databases are coming from and how those requirements line up.
Right. So it looks like Bolt was 1.0 in November of 2014. And it's a bit of a remix, because you say it's inspired by Howard Chu's LMDB project. Can you tell us about LMDB and how it inspired Bolt?
I really like what Howard did with that. What it is, is basically a B-tree. Your data's structured in this B-tree that you can read from and write to. Whenever you change a leaf inside of your tree, it'll copy all the parents as well and make a new version of the tree. So every change makes this new, incremental version, so that as you're going along, everything that's reading from that tree gets a snapshot of it and works in a transactional way. And then as things update, other readers get their own snapshot of the world. And it's really good as far as having great transactional support.
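The copy-on-write idea described here can be sketched in a few lines. This is a minimal illustration only, assuming a plain binary search tree in place of LMDB's B-tree; the names `node`, `put`, and `get` are made up for the example and are not LMDB's or Bolt's API.

```go
package main

import "fmt"

// node is one immutable tree node. Once built, it is never modified.
type node struct {
	key, value  string
	left, right *node
}

// put returns the root of a NEW tree version containing key=value.
// Only the nodes on the path from the root to the changed key are copied;
// every untouched subtree is shared with the previous version.
func put(n *node, key, value string) *node {
	if n == nil {
		return &node{key: key, value: value}
	}
	cp := *n // copy this node; the original stays visible to old readers
	switch {
	case key < n.key:
		cp.left = put(n.left, key, value)
	case key > n.key:
		cp.right = put(n.right, key, value)
	default:
		cp.value = value
	}
	return &cp
}

// get looks a key up in one particular version of the tree.
func get(n *node, key string) (string, bool) {
	for n != nil {
		switch {
		case key < n.key:
			n = n.left
		case key > n.key:
			n = n.right
		default:
			return n.value, true
		}
	}
	return "", false
}

func main() {
	v1 := put(put(put(nil, "m", "1"), "c", "1"), "x", "1")
	snapshot := v1          // a reader holds on to version 1
	v2 := put(v1, "c", "2") // a writer produces version 2

	old, _ := get(snapshot, "c")
	cur, _ := get(v2, "c")
	fmt.Println("reader sees:", old, "writer sees:", cur)
	fmt.Println("untouched subtree shared:", v1.right == v2.right)
}
```

A reader that keeps the old root keeps a consistent view for as long as it wants, which is the snapshot behavior described above; the cost is that every write copies the path from the changed leaf up to the root.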
You can do great things operationally, where you can essentially just copy the file as a backup. If you're setting up a website or an application, you don't want to have to worry about setting up MySQL and having a replica and doing all this other crazy stuff. You can attach a web handler, like an HTTP handler, and stream down your database if you want to have that option. It's three lines of code to do a backup, basically. So with certain things like that, it has this very simple design.
As opposed to that, there's another type of database called an LSM tree, a log-structured merge tree. What that does is write keys and values in sorted blocks at different levels, and each block is a different file. As you get enough of these files, they get compacted and written into a new block that's larger. So it takes a bunch of them and makes them into larger ones at different levels. Those are really good for writes. But operationally, it can be a huge pain, because you can end up with hundreds or thousands of files, and it's kind of tough to snapshot. You can't just copy a file. It's much more involved than that, in how you try to stream that out and stream it in.
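The compaction step of an LSM tree is essentially a merge of sorted runs. The sketch below is a simplified assumption of how that looks with two in-memory runs; real LSM stores such as LevelDB work on files, levels, and deletion markers, none of which are modeled here. The names `entry` and `compact` are illustrative.

```go
package main

import "fmt"

// entry is one key/value pair inside a sorted run (one "file" in the LSM tree).
type entry struct{ key, value string }

// compact merges two runs that are each sorted by key into one larger sorted
// run. When both runs contain the same key, the entry from newer wins.
func compact(older, newer []entry) []entry {
	out := make([]entry, 0, len(older)+len(newer))
	i, j := 0, 0
	for i < len(older) && j < len(newer) {
		switch {
		case older[i].key < newer[j].key:
			out = append(out, older[i])
			i++
		case older[i].key > newer[j].key:
			out = append(out, newer[j])
			j++
		default: // same key: keep the newer value, drop the stale one
			out = append(out, newer[j])
			i++
			j++
		}
	}
	out = append(out, older[i:]...)
	out = append(out, newer[j:]...)
	return out
}

func main() {
	older := []entry{{"a", "1"}, {"c", "1"}, {"e", "1"}}
	newer := []entry{{"b", "2"}, {"c", "2"}}
	for _, e := range compact(older, newer) {
		fmt.Println(e.key, e.value)
	}
}
```

Writes are cheap because new data only ever lands in a fresh sorted run; the price is paid later by compaction and by reads that may have to consult several runs.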
So operationally, Bolt is simpler,
although it doesn't have the benefits
of write optimization,
like something like an LSM does.
So again, it's kind of right tool for the
right job. If you have a
read-heavy situation,
Bolt might be a little bit better fit. If it's
write-heavy, then something that uses LSM
might be a little better fit.
Yeah, for sure. And actually, a lot of people will ping me on Twitter or on GitHub and say, Bolt is slow, or whatever.
Just like that? Like, hey, Bolt is slow?
Pretty much. They'll just say like, Bolt sucks.
People are so nice.
Yeah. And I mean, I will just paste them a link to a different database that will probably fit their use case better. If you come across a project where they try to be everything to everyone, I feel like, as engineers, we should know that there's trade-offs on all this stuff.
And, you know, Bolt is not the right tool
for probably many projects out there,
but it might be for yours, you know?
You know what's funny is,
we just had Thomas Reynolds on the show last week
talking about Middleman,
which is a static site generator,
completely different situation.
But, you know, he's been writing Ruby and JavaScript for years, and we started talking about programming languages and stuff.
And I just asked him very pointedly if he's still bullish on Ruby.
And his answer was very similar to what you just said here, where it's like, well, there's probably a better tool for a different job that you may have.
And it's like, well, that was an extremely level-headed answer.
And it's funny because we all kind of live, work, and have our interactions on the internet.
And I don't know if it's the written form
versus here we are, you and I talking on Skype.
It's like people are very level-headed
about these types of things in real life.
But when we get on the web, it's just like, you know, Bolt sucks.
Yeah, exactly.
We lose all sense of like right tool for the right job,
and we're all trying to just build good software.
And it's like we get into religious wars over these things.
I wonder if it's just that degree of separation or what it is about the internet that makes us like that.
I think it is just the anonymity.
Because if I go to a conference, no one ever comes up to me and tells me how much it sucks.
They say, I had this difficulty with performance where I did this.
And then we actually have a conversation about it.
And then everyone walks away kind of being a little more knowledgeable.
Right. Yeah, it's just hard to be nasty.
Exactly.
So let's go back to not Bolt sucking, but things that it's good at. And it must have a lot of use cases because you do have a lot of adoption. You have a lot of other projects that use Bolt behind the scenes. I think perhaps the reason is that it's embedded, as opposed to a server type of setup. Can you talk about the embedded aspect of the database?
Sure, yeah. So it's just a library that you bring into your Go program, and then you point it at a file, and you're basically ready to go. There's almost no configuration options. Even if you wanted them, you can't really configure the database. And it's a single file. There's an OS lock on it, so you can only have one process attached to that file at a time. That's as opposed to a MySQL or Postgres, where you have this gigantic configuration file where you have to find some tweaks to make.
But yeah, so I mean, from that side,
operationally, it's easy to just get it up and running.
A lot of projects, especially when they're products
or like an open source project,
you can't make that requirement to say,
okay, first you guys got to set up these four services
and then configure it here and do all this stuff. Cause no one, no one wants to go through all that. They don't want to add one
more thing to their stack. So I think it's, it's been a lot of success from that. And you know,
I think a lot of times too, another important thing too, that people don't think about is that
in a lot of projects, the data store is not your bottleneck. You might have a lot of other processing going on,
and you're just storing some metadata maybe,
or you're transferring across a network,
or you're doing all kinds of other things.
So from that perspective, when performance isn't an issue,
usually the next most important thing is operational simplicity.
And just to be able to say, this is a file, I can just deploy it,
I don't need to
do anything beyond just starting up the program. So I think that goes a long way.
Yeah, I agree. Operational simplicity absolutely does go a long way. You also kind of focus on
API simplicity. And the fact that this is just a key value store. And that's not a bad thing,
that's a good thing, right?
You're keeping it simple on purpose.
As far as the API is concerned, is it just a matter of I put data in with a key and then I get it back out by that key?
Is it as simple as that, or is there more to it?
It's essentially that.
You have things called buckets as well.
They're basically like a key space.
So you can only have one unique key in a bucket, but you can have buckets inside buckets, and you can do some interesting things around that. But honestly, most of the time when I use Bolt for an application, I'll treat it almost like the way that I structure tables inside of a relational database.
Okay.
I kind of have top-level buckets, where I might have a customers bucket or whatnot. And for my primary key, you can create sequential integers inside Bolt per bucket. So where I might have an ID for a user, I can generate it off the users bucket, and then that becomes the key for them. So user one is pointing at this encoded data structure for the user.
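The bucket-as-table pattern can be sketched with an in-memory stand-in. This is an assumption-laden toy, not Bolt's implementation or API: the `bucket` type, `nextSequence`, `idKey`, `createUser`, and `userByID` are hypothetical names, JSON is just one possible encoding, and nothing here touches disk.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// bucket is a toy in-memory stand-in for a key space: unique keys, plus a
// counter for handing out sequential IDs. It is not Bolt's implementation.
type bucket struct {
	seq  uint64
	data map[string][]byte
}

func newBucket() *bucket { return &bucket{data: map[string][]byte{}} }

// nextSequence returns the next auto-incrementing ID for this bucket.
func (b *bucket) nextSequence() uint64 {
	b.seq++
	return b.seq
}

// idKey encodes an ID as 8 big-endian bytes so that byte order matches
// numeric order when keys are kept sorted.
func idKey(id uint64) []byte {
	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, id)
	return k
}

type user struct {
	ID   uint64 `json:"id"`
	Name string `json:"name"`
}

// createUser assigns the next ID and stores the encoded user under that key.
func createUser(users *bucket, name string) (uint64, error) {
	u := user{ID: users.nextSequence(), Name: name}
	buf, err := json.Marshal(u)
	if err != nil {
		return 0, err
	}
	users.data[string(idKey(u.ID))] = buf
	return u.ID, nil
}

// userByID is the "primary key" lookup: a single get on the users bucket.
func userByID(users *bucket, id uint64) (user, bool) {
	buf, ok := users.data[string(idKey(id))]
	if !ok {
		return user{}, false
	}
	var u user
	if err := json.Unmarshal(buf, &u); err != nil {
		return user{}, false
	}
	return u, true
}

func main() {
	users := newBucket() // plays the role of a top-level "users" bucket
	id, _ := createUser(users, "Ada")
	u, ok := userByID(users, id)
	fmt.Println(u.ID, u.Name, ok)
}
```

The value stored under each ID is opaque bytes as far as the store is concerned, which is why the encoding choice is left entirely to the application.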
So, I mean, really, it's not actually that far of a departure from relational databases in that sense. If you want to find a user by ID, you just take their ID and look them up.
You don't get the benefits of things like indexes. You don't get a fancy query language. So if you want to find that user by their first name, now you're in trouble.
Yeah. I mean, you need to save that separately and kind of create your own index. So there's definitely a bit of a hurdle in that sense.
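Creating your own index means maintaining a second mapping by hand alongside the primary one. A minimal sketch, assuming two in-memory maps stand in for a users bucket and an index bucket; the `store` type and its methods are hypothetical, and in a real key value store both writes would belong in the same write transaction.

```go
package main

import (
	"fmt"
	"sort"
)

// store keeps users by ID plus a hand-maintained index from first name to IDs.
// Both maps stand in for buckets; in a real store they would be updated in
// the same write transaction so they cannot drift apart.
type store struct {
	usersByID map[uint64]string   // id -> first name (the "primary" bucket)
	idsByName map[string][]uint64 // first name -> ids (the index bucket)
	nextID    uint64
}

func newStore() *store {
	return &store{usersByID: map[uint64]string{}, idsByName: map[string][]uint64{}}
}

// addUser writes the record and its index entry together.
func (s *store) addUser(firstName string) uint64 {
	s.nextID++
	id := s.nextID
	s.usersByID[id] = firstName
	s.idsByName[firstName] = append(s.idsByName[firstName], id)
	return id
}

// rename has to fix the index as well as the record; forgetting the index
// leaves a stale entry behind.
func (s *store) rename(id uint64, newName string) {
	old, ok := s.usersByID[id]
	if !ok {
		return
	}
	ids := s.idsByName[old]
	for i, v := range ids {
		if v == id {
			s.idsByName[old] = append(ids[:i], ids[i+1:]...)
			break
		}
	}
	if len(s.idsByName[old]) == 0 {
		delete(s.idsByName, old)
	}
	s.usersByID[id] = newName
	s.idsByName[newName] = append(s.idsByName[newName], id)
}

// findByFirstName answers the query with an index lookup instead of a scan.
func (s *store) findByFirstName(name string) []uint64 {
	ids := append([]uint64(nil), s.idsByName[name]...)
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	s := newStore()
	s.addUser("Ben")
	jared := s.addUser("Jared")
	s.addUser("Ben")
	fmt.Println(s.findByFirstName("Ben"))
	s.rename(jared, "Ben")
	fmt.Println(s.findByFirstName("Ben"), s.findByFirstName("Jared"))
}
```

The hurdle mentioned above shows up in `rename`: every code path that changes an indexed field has to remember to update the index too, which a relational database would do for you.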
But if the indexing isn't a huge piece for you,
or if you're, a lot of times people index
using something like Elasticsearch,
you know, some full-text search engine.
Actually, there's another one called Bleve
that'll use Bolt underneath.
People are going to attach onto that
and do their searches through that.
So it depends on the use case.
Again, right tool for the right job.
But I treat them kind of similarly
to a relational database.
And when you think about relational databases too,
the rows that they store in there
are just an encoded data structure
that has a row ID that points to it.
So they're almost, what's the word, key value stores underneath.
They've just set kind of a relational layer on top.
Yeah, that makes a lot of sense.
And then when it comes back to the operational simplicity side of things,
you're just storing all this in a file on disk, right?
It's very SQLite in that sense.
Yeah, single file.
Yeah, it's pretty straightforward.
Pretty straightforward. So for backups, moving things around, copying data, you just use your Linux or your Unix tools, right?
Oh yeah, pretty much. I mean, there's some locking stuff, so you have to go through the database. It'll actually be a transactional copy inside the database. So you start a transaction, and you can stream it out. But it'll go as fast as your operating system and your SSD can read the data off. So it goes pretty snappy.
And it sounds like you've gotten that into some demanding scenarios: it says Bolt is currently in high-load production environments serving databases as large as one terabyte. So even in that case, you just have a one terabyte file sitting there?
Sometimes we'll split off into multiple partitions. That's more of a load balancing thing. Actually, at Shopify, we created an analytics database that was clustered, and we had multiple Bolt partitions running on each one. And then we'd copy them around and redistribute the load as we needed to. We used consistent hashing inside of there to be able to redirect requests to the correct partition.
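Consistent hashing, as a way to route requests to partitions, can be sketched as follows. This is a generic textbook version under stated assumptions (CRC32 as the hash, 64 virtual points per partition, made-up partition names); the transcript does not say how the Shopify system implemented it.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring maps keys to partitions by placing several points per partition on a
// hash circle and walking clockwise from the key's hash.
type ring struct {
	points []uint32          // sorted hash positions
	owner  map[uint32]string // position -> partition name
}

// newRing places `replicas` virtual points on the circle for each partition.
func newRing(partitions []string, replicas int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, p := range partitions {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(p + "#" + strconv.Itoa(i)))
			r.points = append(r.points, h)
			r.owner[h] = p
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// locate returns the partition that owns key: the first point at or after the
// key's hash, wrapping around to the start of the circle.
func (r *ring) locate(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"p1", "p2", "p3"}, 64)
	for _, key := range []string{"customer-1", "customer-2", "customer-3"} {
		fmt.Println(key, "->", r.locate(key))
	}
}
```

The useful property for redistributing load is that adding a partition only pulls keys onto the new partition; keys never shuffle between the partitions that were already there.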
Very cool.
Well, this sounds like a good spot to stop
and hear from one of our awesome sponsors.
When we get back,
I'm going to talk to you about some more use cases,
maybe compare it to a few other key value stores:
LevelDB, and others people might be familiar with, like Memcached and Redis.
So stick around, and we'll be right back.
imgix is a real-time image processing proxy and CDN, and let me tell you, this is way more than ImageMagick running on EC2. This is way better. It's everything your front-end developers have dreamt of. Output to PNG, JPEG, GIF, JPEG 2000, and several other formats. And if you're like me and you've ever argued with your boss or a teammate about serving retina images to non-retina devices, you'll appreciate their open-source, dependency-free JavaScript library that allows you to easily use the imgix API to make your images responsive to any device.
Now, all of this takes a platform, and the imgix platform is built on three core values: flexibility and quality, performance, and affordability.
When it comes to flexibility and quality, imgix has over 90 URL parameters that you can mix and match to provide an unlimited amount of transformations that you need for your images. They take quality very seriously, and because of their commitment to high quality, The Guardian, Eventbrite, Kickstarter, QuizUp, and many more trust them to serve their images.
Now, when it comes to performance, imgix operates out of data centers filled with top-of-the-line Mac Pros and Mac Minis, and they're set up for a completely streaming solution. This means your images never hit the disk. Images are served by the best SSD-based CDN for delivery around the world, anywhere, extremely fast. And while we're talking about speed, almost all the image processing happens on GPUs. This means transformations are super fast when compared to competing virtualized environments.
And lastly, it's all about affordability. Everyone wants to save a buck. That's how the world works. Because imgix processes close to a billion, with a B, images per day, they're able to make certain optimizations at scale and pass those savings on to you. To learn more about imgix and what they're all about, head to imgix.com. Once again, imgix.com, and tell them Adam from The Changelog sent you.
All right, we are back with Ben Johnson, talking all things open source databases,
specifically at this moment BoltDB, which is Ben's popular key value store
in the Go ecosystem. Ben, we were talking about use cases.
Can you give us kind of how it's being used in the wild and maybe some projects that are
built on top of Bolt?
Yeah, sure.
I think it's largely used by projects that need a data store inside of them, but where that's not the main focus of the application. It's not like a web app where some giant database is sitting behind it and people are using it. I think it's getting towards that, but a lot of cases tend to be storing metadata or smaller sets of data currently.
There are definitely some exceptions to that.
There's a guy named TV.
He wrote Bazil, which is like a distributed file system, a personal file system, kind of Dropbox-esque.
But he's using Bolt for that.
He's actually been around for a long time.
When I first wrote Bolt, I put it out there,
or not even put it out there,
I just had it as a repo.
He just came along one day
and was just going line by line through the code
and being like, this is wrong, this is wrong.
What?
I mean, in the most friendly way.
He'd tell me how to fix it,
give me links to low-level Unix documentation.
So he definitely helped to stabilize Bolt.
So huge shout out to him.
I know that at Heroku,
they have some log stuff that runs through Bolt
or uses Bolt in some capacity.
But yeah, there's definitely some cool projects out there
that people are using it for.
So in addition to that,
it seems like whenever you talk about key value stores, there are a few common use cases, specifically thinking about web apps. That's kind of where my mind goes, as I'm a web developer by trade. Caching is a big one. Background jobs, too; it seems like those queues are pretty good scenarios. There are tools out there that do such things. I mentioned them before the break, Memcached and also Redis.
Can you kind of compare and contrast to those if you're familiar with them?
Sure, yeah.
How good would Bolt be at those particular jobs?
Well, so Memcached is meant to be, if I understand it correctly,
it's an in-memory cache.
So I don't think there's a backing store on it.
Yeah, it's not persistent.
Yeah, it's been a long time since I've used that.
But yeah, so you can store data in there all day,
but it's meant to just be a layer to hit quickly,
but you can always fall back to the underlying data store.
So Bolt, in contrast, it writes all the data to disk safely.
Even in the event of a crash, it'll come back up.
And if you've committed a transaction, that transaction will be there.
If you look at something like Redis, on the other hand, it has, I think, two different persistence layers.
They have like a write-ahead log and a snapshot, I think.
I could totally be butchering this.
But yeah, I mean, Redis, it stands at kind of a higher layer.
They have a key value piece in there.
I know they have a whole bunch of other data structures they do as well.
Yeah, I mean, as far as complexity goes, Redis has lists and sets and different objects and stuff.
Yeah, which is really cool.
They don't have a sense of a transaction, though.
So I think if you really want strong transactions, which I think a lot of people don't realize how important that is.
Like we get these kind of weird inconsistent states when we're trying to write 10 keys, but we only write eight of them.
And, you know, what happened to those last two?
And we try to resolve that by, you know, writing jobs to kind of fix it later on or check for it.
But if you can actually get strong serialization or serializable transactions, I think that
goes a long way.
So Bolt has transactions.
Yeah, they're actually full ACID serializable transactions.
Can you teach me that like I'm five?
Can you just go through a transaction and tell us what that all implies?
Sure.
So you start a transaction.
You can do read transactions or write transactions.
Write transactions, you can only have one at a time, so they all go sequentially.
They're serialized.
Read transactions can start and stop whenever they want.
You can have multiple at the same time, and each one sees the database as of the
point in time when its transaction started.
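The single-writer, many-readers model can be sketched with an immutable "current version" that writers replace wholesale. This is a deliberately naive illustration: it copies the whole map per write transaction instead of copying tree pages, and `db`, `open`, `view`, `update`, and `errAbort` are names invented for the example, not Bolt's API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAbort is returned by a write transaction that wants to roll back.
var errAbort = errors.New("transaction aborted")

// db holds the current committed version of the data. A version is never
// mutated after commit, so readers can keep using it without holding locks.
type db struct {
	writeMu sync.Mutex        // only one write transaction at a time
	mu      sync.RWMutex      // guards the current pointer
	current map[string]string // latest committed version
}

func open() *db { return &db{current: map[string]string{}} }

// view runs fn against the version that was current when view was called.
// fn must treat the snapshot as read-only.
func (d *db) view(fn func(snap map[string]string)) {
	d.mu.RLock()
	snap := d.current
	d.mu.RUnlock()
	fn(snap)
}

// update runs fn against a private copy. If fn returns an error, nothing is
// published (rollback); otherwise the whole copy becomes the new version.
func (d *db) update(fn func(tx map[string]string) error) error {
	d.writeMu.Lock()
	defer d.writeMu.Unlock()

	d.mu.RLock()
	base := d.current
	d.mu.RUnlock()

	tx := make(map[string]string, len(base))
	for k, v := range base {
		tx[k] = v
	}
	if err := fn(tx); err != nil {
		return err // none of this transaction's writes become visible
	}
	d.mu.Lock()
	d.current = tx
	d.mu.Unlock()
	return nil
}

func main() {
	d := open()
	err := d.update(func(tx map[string]string) error {
		for i := 0; i < 10; i++ {
			if i == 8 {
				return errAbort // fails after writing 8 of 10 keys
			}
			tx[fmt.Sprintf("key%d", i)] = "v"
		}
		return nil
	})
	d.view(func(snap map[string]string) {
		fmt.Println("error:", err, "visible keys:", len(snap))
	})
}
```

The example in `main` is the "wrote eight of ten keys" situation mentioned earlier: because the partial copy is simply dropped, readers never observe the half-finished state.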
So the actual write transactions will kind of give you a space to work in, and you can change data and rewrite those keys and values or create buckets. And then when it goes to commit, it'll take those pages it wrote, and it'll write all the pages out, and then it'll write a new meta page. If you've ever done graphics stuff, it has basically a double buffer for your meta page. So it has to write all the data first, and then it writes a new meta page to point to that new data. The transaction is not committed until it writes that single last meta page. So it has this interesting piece to it where there's no recovery step like you get in a lot of databases. If it crashes, it'll just start back up with whatever data is committed. It doesn't have to re-read a log to reapply changes. It's just wherever it was. It has this unique safety property, which is really nice. So I don't know if that's in depth enough, or if you want some more.
No, that was pretty good.
Okay.
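The double-buffered meta page commit can be modeled in memory. The sketch below is an illustration under assumptions: a slice of strings stands in for data pages, the checksum is FNV-1a over two fields, and `file`, `meta`, `writeData`, `writeMeta`, and `current` are hypothetical names rather than Bolt's actual on-disk layout.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// meta is a toy meta page: which transaction wrote it, where that
// transaction's data lives, and a checksum to detect a torn write.
type meta struct {
	txid     uint64
	root     int // index into file.pages of this version's data
	checksum uint64
}

// sum computes the checksum over the fields of a meta page.
func (m meta) sum() uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%d", m.txid, m.root)
	return h.Sum64()
}

func (m meta) valid() bool { return m.checksum == m.sum() }

// file models the database file: append-only data pages plus two meta slots.
type file struct {
	pages []string
	metas [2]meta
}

func newFile() *file {
	f := &file{pages: []string{""}}
	for i := range f.metas {
		f.metas[i] = meta{txid: uint64(i), root: 0}
		f.metas[i].checksum = f.metas[i].sum()
	}
	return f
}

// current picks the valid meta page with the highest transaction ID. That is
// all the "recovery" there is: no log replay. (This toy assumes at least one
// slot is always valid.)
func (f *file) current() meta {
	a, b := f.metas[0], f.metas[1]
	if !b.valid() || (a.valid() && a.txid > b.txid) {
		return a
	}
	return b
}

// writeData is step one of a commit: new data goes to a fresh page, so the
// page the current meta points at is never overwritten.
func (f *file) writeData(data string) int {
	f.pages = append(f.pages, data)
	return len(f.pages) - 1
}

// writeMeta is step two, the commit point. It overwrites the slot that does
// not hold the current meta, so an interrupted write cannot damage it.
func (f *file) writeMeta(root int) {
	next := meta{txid: f.current().txid + 1, root: root}
	next.checksum = next.sum()
	f.metas[next.txid%2] = next
}

// read returns the data visible after "opening" the file.
func (f *file) read() string { return f.pages[f.current().root] }

func main() {
	f := newFile()
	f.writeMeta(f.writeData("v1")) // a complete commit
	fmt.Println(f.read())

	f.writeData("v2") // simulated crash: data written, meta never updated
	fmt.Println(f.read())
}
```

Because the old meta slot and the pages it references are left untouched until the new meta is fully written, a crash at any point leaves the file readable at the last committed version.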
So that sounds like, I mean, and you implemented all that yourself,
so that sounds like something that is a nice thing to have,
especially for something that you're going to be building on top of.
Sounds like a feature that is definitely not unique to Bolt,
but as far as key values go,
I think that's nice to have, right?
Or that's kind of even a,
you got to have that, right?
Well, you think you got to have it, but serializable transactions are not even the default on a lot of relational databases. I think the default is actually read committed transactions. There's all kinds of different isolation levels, and it's honestly hard to remember all of the little nuances. But serializable transactions means you can't read anything from another transaction that didn't get committed before your transaction started. You kind of get this whole view of the database. It's basically how you think of transactions in your head normally. It's like, I have this safe world where everything is how I expect it to be. That's a serializable transaction. There's a lot of other ones that try to make trade-offs for performance or speed, where you can read things that have been committed in another transaction after this one started, but before it stopped.
It's confusing, honestly.
But if you think of transactions, it's probably what you'd expect.
But yeah, it's really useful to have that safety.
And I tried to pare down Bolt to really be the core things I needed.
LMDB had a lot of other features around performance where you could write stuff directly into the database instead of going through some other safety measures.
And they had some other tradeoffs they made.
But I tried to cut out all those extra pieces.
So it ended up being 2,000 lines of code, which I don't know if that sounds like a lot or not.
For a database, it's tiny.
Yeah, I was going to say, it sounds like a lot if I was just going to sit down and code that day,
but for a database, it doesn't sound like too much.
Yeah, so I mean, LMDB, I think is about 8,000 lines. If you look at like LevelDB,
I think it's around 20,000 lines.
LevelDB is very similar to Bolt. It's out of Google. It seems like there are some differences.
Yeah, so that's an LSM tree, so it's write-optimized. You could write stuff into LevelDB
much faster than you can in Bolt.
But if you're looking to do range scans,
where you have a set of data
in order that you're trying to go across,
Bolt will be much faster than LevelDB.
Awesome.
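A rough sketch of why ordered storage helps range scans, in Go. A sorted slice stands in for a B-tree's ordered keys here, which is an assumption made for illustration and not how Bolt stores data; `rangeScan` is a hypothetical helper. The scan is one seek to the first matching key followed by a sequential walk.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeScan returns the keys k with start <= k < end from a sorted slice.
// It binary-searches for the first match and then walks forward in order.
func rangeScan(sortedKeys []string, start, end string) []string {
	i := sort.SearchStrings(sortedKeys, start)
	var out []string
	for ; i < len(sortedKeys) && sortedKeys[i] < end; i++ {
		out = append(out, sortedKeys[i])
	}
	return out
}

func main() {
	keys := []string{"2015-08-01", "2015-08-15", "2015-08-22", "2015-09-03"}
	fmt.Println(rangeScan(keys, "2015-08-10", "2015-09-01"))
}
```

An LSM tree generally keeps several sorted runs and has to merge them while scanning, which is part of the trade-off for faster writes.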
So that's Bolt in a nutshell.
Great readme, by the way. Gotta give you
respect for going into great detail
there. GitHub.com
slash BoltDB slash Bolt.
Check out the readme. Ben goes
through not just usage and
backups and stuff, but he actually
goes through comparison with other
databases. He'll
talk about the LSM tree versus the
B tree, when you should use which one. There's
even caveats and limitations. Lots to be had there. Check out Bolt, a low-level key value store that's
simple on purpose and sounds like it's a rock. It's been production ready since November of last
year and people are picking it up. So check that out. I think we should switch gears a little bit and talk about the next one.
I know we had a list of a ton of databases.
We're just going to pick a couple because we don't have too much time.
The next is the one that you seem to be working with either in a consulting capacity or full-time,
but InfluxDB, which is open source as well but also has a business built around it.
Can you tell us about Influx?
Sure, yeah. It's a time series database,
and we really center around being easy to get up and running.
We have clustering in there.
We can actually spread it across a lot of machines.
And then we're building out a lot of new functionality now
for doing a lot of write-ahead log stuff for write optimization
and doing compression in there to shrink down the size of the database.
So it's coming along.
People have really been interested in it as far as just, again,
it's one of those simple databases that we use Bolt underneath,
so there's no other service to get up and running.
I know some other things have relied on Redis
or some other data stores in the
past. Actually, a lot of them rely on Cassandra in the background and kind of push
that off to there. But ours is really just one binary; you just download it and start it up. So
yes, it's been great in that sense. People have been really interested.
So as far as time series databases go, I don't have much context
besides speaking with Julius Volz about Prometheus,
which at its core has a time series database.
And I know there's some Prometheus uses Bolt here or there.
I'm curious about if it uses Influx or not.
But are there other time series databases out there
that people can pick up and use, or is
this a brand new thing?
There have been time series databases out there. But the funny thing
with time series databases, especially some of the older ones, is that they're just
notoriously difficult to get up and running. And a lot of people will actually pick up
Influx more or less, I mean, initially out of frustration. They just
spent three hours trying to get Graphite running or something like that,
Uh-huh.
and they gave up. So it's interesting, the technical decisions you
make along the way about what dependencies you might need, how those dependencies change over
time, and how that makes a project hard to get up and running.
So, yeah, it's not a new thing by any means.
I think there's a lot of ease of use stuff.
We have a query language in there.
We do a lot around the way people can retain data long term and how they roll it up and how they can move it around.
So there's a lot of thought that's put in that too.
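To make "rolling up" concrete, here is a small Go sketch that downsamples raw points into fixed windows and keeps the mean of each window. The `Point` type and `rollup` function are hypothetical names for this example, timestamps are assumed to be non-negative Unix seconds, and this is not a description of how InfluxDB implements it.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a single time series sample (hypothetical type for this sketch).
type Point struct {
	Unix  int64   // sample time in Unix seconds, assumed non-negative
	Value float64 // measured value
}

// rollup groups points into fixed-width windows and returns one point per
// window holding the mean value, stamped with the window's start time.
func rollup(points []Point, windowSecs int64) []Point {
	sums := map[int64]float64{}
	counts := map[int64]int{}
	for _, p := range points {
		bucket := p.Unix - p.Unix%windowSecs // start of the window containing p
		sums[bucket] += p.Value
		counts[bucket]++
	}
	out := make([]Point, 0, len(sums))
	for bucket, sum := range sums {
		out = append(out, Point{Unix: bucket, Value: sum / float64(counts[bucket])})
	}
	// Map iteration order is random, so sort the windows by start time.
	sort.Slice(out, func(i, j int) bool { return out[i].Unix < out[j].Unix })
	return out
}

func main() {
	raw := []Point{
		{Unix: 1440201610, Value: 1},
		{Unix: 1440201650, Value: 3},
		{Unix: 1440201680, Value: 10},
	}
	// One-minute windows: the first two samples share a window.
	for _, p := range rollup(raw, 60) {
		fmt.Println(p.Unix, p.Value)
	}
}
```

A retention scheme could then keep the rolled-up points for a long time and drop the raw ones sooner.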
So let's maybe zoom out a second and talk about time series as a thing.
When would I reach for this type of a data store?
The one I can think of off the top of my head is analytics.
But are there other use cases for time series data stores?
Sure, yeah.
I mean, analytics is a big one.
Monitoring has been another big one as well.
A lot of people have sensor data.
That's actually been a big growing one with Indeed. And there's some weird use cases with sensor data as well. A lot of people have sensor data. That's actually been a big growing one where they need.
And there's some weird use cases with sensor data as well. There's one where there's a company that
has sensors, but they don't send data continuously. They store it up and then every like four hours,
they send off the data. And for some reason, some databases, they expect kind of a stream
of data coming in and stuff will get dropped off if it's too late or out of order
or certain things like that.
So sensor data has been a big one as well.
So I think between those three, those are probably the main ones.
I can see it also with streaming financial transactions and market stuff.
Yeah, that's another one too.
Yeah.
I mean, anything that's going to have a real-time stream of data and you're going to be either capturing it or aggregating points in time to use later.
Seems like that's kind of where these things play.
Yeah, it's one of those things too.
It's one of those use cases that's grown large enough where people have started writing databases specifically for it. And when you have, if you try to put it into something like MySQL,
I mean, MySQL has a ton of features on there for relational access
and indexes and all kinds of stuff.
But if you really just have a timestamp and a value or a set of values,
and that's the data that you have going in,
there's much better ways you can optimize that in a specific store.
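One commonly used optimization for timestamp-and-value data is delta encoding, sketched below in Go. This is a general technique offered as an assumption about what such a store might do, not a claim about InfluxDB's storage format; `deltaEncode` and `deltaDecode` are hypothetical helpers.

```go
package main

import "fmt"

// deltaEncode stores the first timestamp as-is and every later one as the
// difference from its predecessor. Regularly spaced samples become small,
// repetitive numbers that compress well.
func deltaEncode(ts []int64) []int64 {
	out := make([]int64, len(ts))
	var prev int64
	for i, t := range ts {
		out[i] = t - prev
		prev = t
	}
	return out
}

// deltaDecode reverses deltaEncode with a running sum.
func deltaDecode(deltas []int64) []int64 {
	out := make([]int64, len(deltas))
	var sum int64
	for i, d := range deltas {
		sum += d
		out[i] = sum
	}
	return out
}

func main() {
	// Unix-second timestamps sampled every ten seconds.
	ts := []int64{1440201600, 1440201610, 1440201620, 1440201630}
	enc := deltaEncode(ts)
	fmt.Println(enc)              // first value, then the gaps between samples
	fmt.Println(deltaDecode(enc)) // round-trips to the original timestamps
}
```

A general-purpose relational table would typically store each full timestamp alongside row and index overhead, which is the kind of cost a specialized store can avoid.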
So Influx is both an open source project and a company.
I'm not sure what the product is, if it's services, if it's a pro plan.
How does the business side break apart from the open source side with Influx?
Yeah, so the business side, we have a managed hosted product over there as well.
And we do a lot of, we have some SLA stuff as well for more enterprise customers.
Those have been the big pushes too.
We have some stuff coming down the line as well.
But I think that's more hush hush.
So, yeah.
So the people have been pretty excited too about having kind of a roadmap of where Influx is going and what we're doing with that.
I think a lot of times some businesses have been hesitant with other open source projects that they don't know the long term.
Like if they want to build a product on top of Influx, they want to know that there's a company there and that they have funding.
They can't disappear.
Exactly, yeah.
Because sometimes projects do kind of go into the ether.
Right, yeah. And I guess, you know, whenever we have
a business divide and an open source divide, we start to wonder about licenses. By the way,
Changelog listeners who are always asking us to talk more about licenses: Bolt is MIT licensed. How does Influx's license break out?
Yeah, Influx is
also, I believe, either MIT or BSD. And honestly, one of the reasons I came on
originally with them is that both the founders are just awesome, laid-back, cool guys
that love open source. Paul Dix has been involved on the Ruby side for a long
time. And then, you know, they're very focused on putting out stuff and being in the
community and talking to people on Twitter or on GitHub and getting people involved.
So I really like that about them. But they don't have a restriction around, like, a GPL
license or anything like that.
Okay.
So they've been pretty open about it all.
I know there's been some contention about whether you should do a dual license
or how all that lays out.
I'm kind of anti-GPL personally,
but I'm sure that's going to start a flame war right there.
Why? Tell us why.
You know, I think the thing is,
I guess I shouldn't say anti-GPL.
If it works for you, that's great.
For me personally, I like to make things, and I like to be able to just put them out there in the world.
And people can kind of riff off that and do something with it if they want to, or they could go build a company out of it. If I can do something that will somehow make value in the world,
I think that's awesome.
But whenever I see something that's GPL,
I don't know if I'm ever going to want to do something in that realm again.
I don't want to worry about some derivative work issue coming along later on.
So if I see GPL, I honestly just close down the project.
I don't even look at the project because I don't know.
Just like that.
Yeah, just like that.
I mean, I hate to even say that because I think people have done great work
that is GPL.
But, you know, there are a lot of businesses that just simply can't use it.
Right.
And some people may want to use it in a business capacity.
I know there's all kinds of hoops that just kind of make me skittish personally.
Yeah, I'm of two minds, as I am on many things. I can
see both sides, and I personally MIT license almost everything I do. That being said, I'm
mostly putting out small things that are, I think, you know, trivial. I mean, even your
BoltDB is more ambitious and, because it's infrastructure, is more likely to be included in commercial products than anything that I've built open source.
So if I had a more substantial, bigger thing, I might put more thought into it personally.
But yeah, I can see how the GPL limits adoption, absolutely, and how there's a lot of noise in open source, especially now more than ever.
It's hard to let the cream rise to the top, so to speak. That's, you know, one of our missions with
the Changelog: to shine the light on open source, like the little guys who are doing cool things, but
their voice gets drowned out in the crowd. We like to shine a light on that because we realize that there's a lot of noise.
And so putting a GPL license on your thing
makes it harder for it to take off
like it would with a more liberal license.
That being said, I also understand the side
where it's like companies are just profiting off of my work.
I get that. I get that too. So it's
tough.
Yeah, I had a discussion, or just a small Twitter discussion, with Mike Perham.
Is that how you say his last name, Perham?
Yep.
Yeah. And, I mean, it seems like
the dual license is working for him with Sidekiq. So I certainly don't want to knock
it by any means.
I think there's definitely a use case out there for it.
We've had Mike on the show a few times.
He's been unique in the ability to turn a popular
open source project into a business,
a lifestyle business, not a VC-funded larger thing.
He has a lot of opinions on
not just licensing, but also the
sustainability of open source and how to make it work for you.
And so I'll just submit that for the listeners.
If you're interested in that topic, I don't have episode numbers on me, but go to changelog.com
slash podcast and just search in page for the word Mike or Perham.
You'll find some interesting episodes on that.
Yeah, I mean, when it comes to licensing,
it's something that we all have to wrestle with
as we put our software out there,
is what are our priorities
and what do we feel comfortable with.
So everybody's got to make their own decision on that front.
Yes, it's a minefield, though.
It really is, yeah. So back
to Influx just a little bit. It's at 0.9.2, so,
you know, not quite at 1.0, but it seems like it's
out there and gaining steam. Anything else about InfluxDB that you want to hit on before we move on?
Um, you know, we're just, yeah, we just keep working at it.
I mean, I know that there are, you know, I think, yeah, I think it's, it's just a product that's continually evolving and improving.
So I think that if people have tried it in the past, you know, we've done a lot to, to
improve upon it.
So I hope people try it again.
Certainly.
Awesome.
All right.
We'll take our second break.
When we get back, I want to talk to you about something a little bit different, which is,
I'll just leave it as the Secret Lives of Data. Let's just leave it right there,
and we'll peel that apart when we get back.
Guess what, everyone? We've partnered with Casper,
the online retailer of premium mattresses, to give you $50 towards your new mattress.
The mattress industry has inherently forced consumers, myself included, into paying notoriously high markups.
And Casper has revolutionized the mattress industry by cutting the cost of dealing with resellers and showrooms.
And they pass those savings directly on to you.
Their mattress is a one-of-a-kind.
It's a new hybrid mattress that combines premium latex foam with memory foam.
And the Casper Experience was designed with you in mind and optimized for sleep.
And this is my favorite part.
It's backed by a 100-night no-hassle return policy with full refund and a 10 year warranty. And what's even cooler
is how they ship this mattress to you. It comes in a box that couldn't possibly fit a mattress.
And when you open it, the mattress unravels for you to lay down and catch some Z's.
Head to casper.com slash changelog and use the code changelog when you check out to get $50 towards your new mattress. Enjoy.
All right, we are back with Ben Johnson talking open source databases and perhaps somewhat related
is this really cool thing called the Secret Lives of Data. That's thesecretlivesofdata.com.
We'll link it up in the show notes, where he explains a thing called Raft in a cool visual way. Ben, can you tell us about this?
Yeah, sure. So the Secret Lives of Data is just meant to be a project where, I feel like there's
a lot of distributed systems and database topics and computer science topics that I
honestly feel like you can explain with circles and lines and motion.
That's kind of what happens whenever you go on a whiteboard; you're like, this is the server here and over here and it does that.
But we don't kind of have that.
We have books, static images.
And I feel like there's just kind of this there's a piece that's lacking, especially with so many new distributed databases and all these kind of systems design things that people need to learn about.
But it's like research papers and it's these books that come out that are kind of tough
to sink in.
So I wanted to find something in some way that was easily digestible to explain complicated
topics like distributed consensus, for example, is not like the easiest topic to explain to
someone.
But if you can step through it piece by piece and kind of show some motion with it, I think people tend to pick it up.
I've had a lot of people actually mention that they read through the paper a couple of times, but it didn't click until they saw this visualization of it.
So to explain what it actually is, it's kind of a data visual.
It's almost like a motion graphic of how Raft and distributed
consensus work. So this protocol called Raft implements distributed consensus, where you have
a set of nodes, like a cluster of computers, and they need to agree on some value,
and how that happens and how it changes over time. And if you get a split in your network, you
know, what happens to the different sets of nodes, and how does it avoid situations where some nodes think that they have one value
and others think they have another value.
So there's all these edge cases that you don't think about
and that are kind of hard to wrap your head around,
but I try to explain that visually.
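The network-split case can be sketched with simple arithmetic. Raft only lets a value be committed once a majority of the cluster has accepted it, so at most one side of a split can make progress. The Go functions below (`quorum`, `canCommit`) are hypothetical helpers for illustration and leave out everything else the protocol does, such as leader election and log replication.

```go
package main

import "fmt"

// quorum is the smallest number of nodes that forms a majority of the cluster.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

// canCommit reports whether a group of reachable nodes is large enough to
// agree on a new value on behalf of the whole cluster.
func canCommit(reachable, clusterSize int) bool {
	return reachable >= quorum(clusterSize)
}

func main() {
	const clusterSize = 5
	// A network split leaves three nodes on one side and two on the other.
	for _, side := range []int{3, 2} {
		fmt.Printf("side with %d nodes can commit: %v\n", side, canCommit(side, clusterSize))
	}
}
```

Because two disjoint groups cannot both hold a majority, the two sides of the split can never commit conflicting values.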
So it does a great job, by the way.
This is incredibly impressive.
And I actually came across this, I don't know when you launched this,
but I think it hit my feeds then.
I didn't know who did it at the time,
but then when I started doing some research into it,
I was like, oh, man, he did this.
That's pretty cool.
So where's the motivation behind sinking the time into this?
Do you have an educational background,
or what made you want to do this?
I know that some people have put out so many great resources that I've learned from.
I know you've had Ilya Grigorik on the show a bunch.
When I first got into writing databases, every time I'd find some concept I wanted to learn about,
I'd type it into Google and he'd have the first page on there with his blog about some obscure topic.
Right, like Bloom filters or whatever.
Exactly. And he sunk so much time into his
blog to explain these great things that I learned from. I feel like, you know, what's some way I can
give back? And I knew Raft really well. I've written implementations of it.
And yeah, I was just trying to think of a great way to visualize that and show it.
And at first I thought it was going to be like a week, you know, and I'd be done.
How long?
I think it was like a month and a half.
Wow.
And I ended up writing it in D3,
but D3 doesn't have a way to stop motion part way through.
Okay.
So I actually had to write my own kind of timers and framework on how I was going to structure stuff.
And I essentially wrote a Raft implementation in JavaScript that I could run inside, because if
you play the visualization twice, it'll actually be different the second time.
Okay.
And, you know,
the way the nodes shoot off and, yeah. So it was a huge pain.
It's been sitting idle for a little while,
but I ended up taking a while off
and learning this program called After Effects from Adobe,
which actually does motion graphics.
Because I thought there had to be an easier way, and there is.
People do this for a living.
It's almost like Flash, but you can generate video and do all kinds of stuff.
People use it for special effects in movies a lot.
So yeah, I want to start doing stuff.
Originally, I was going to do five-minute videos for things like Apache Kafka or Cassandra
or all these more complicated databases and how they work.
Yeah.
And, yeah, so I spent like three or four months learning After Effects and reading
books and watching videos on it.
And then I ran out of time to actually make the visualizations.
But I realized also, originally I was going to do these five-minute videos,
but then I kind of realized later on,
like people like snippets seem to be much more easily digestible.
I'm thinking about doing a smaller format of 20-second animated GIFs that I can post up to Twitter.
It seems like that'd be able to spread a little better.
You just click on it and learn about how Apache Kafka works in 30 seconds.
We'll see if that works, but that's my goal right now.
I love it. I mean, I would say that, you know,
just to exhort you to continue in these efforts,
because I think it is a powerful way of teaching and, um,
you know, maybe not to give up completely on your, your, um,
the work you put in to build this one.
I don't know if maybe it's just too crazy,
but if you could get some sort of a framework in place
to where you could do other things more easily,
then you could start to have an infrastructure
for other people building out these types of things on the web.
That being said, animated GIFs, people love those.
People love them,
although it's strange to find one that's useful.
It is, yeah. I think it would probably be the first one.
They're usually just
displaying some sort of emotion or surprise.
But yeah, the first
useful animated GIF.
Maybe you could get it on Wikipedia for that.
I hope so.
Awesome.
we'll link that up in the show notes.
Ben, I think it's time to go to our awesome closing questions.
And we will ask the first one, which has become somewhat compulsory these days,
which is, who is your programming hero?
I would have to say Ilya Grigorik.
I just learned so much from that guy, from his blog.
I would totally just
be his groupie. Totally, if he was at a conference,
I'd just follow him around the whole time.
I have to
give my amen on that one.
He's influenced me quite a bit in my
development.
I don't get too nervous for these
shows, but with Ilya for the first time, I was kind of like, you know,
had that like, oh, man, this guy is so smart.
I hope I don't sound like a dope interviewing him.
Yeah, he's awesome.
Shout out to Ilya out there.
Very cool.
Next one is open source radar.
So if you had a weekend and you were just going to hack on some stuff,
you weren't working on your After Effects things,
but some new project, something that's interesting to you, what is it?
It's not
even necessarily a new project, but the new stuff going into, like, the Go standard library and
the Go toolchain, I think, has just been fascinating. There's been a lot of
stuff around the garbage collection. And then, this isn't actually standard library, but go-fuzz is another one that came out recently, which is kind of like fuzz testing
and just making really solid libraries that are, you know, well tested against all kinds of
crazy incoming data. So I'd say those two.
Very good, very good. Okay, last one for you
is if you weren't uh an awesome open source developer working on these database tools and whatnot, if you weren't doing this, what else would you be doing?
Oh man, that's a tough question right there.
You know, actually, this is going to be kind of a cop-out answer, but I started doing the startup thing for several years.
I was going to make a company and do all this stuff.
Right.
And I eventually stopped doing that because I came to this realization that if I made a bunch of money, I'd just go write open source all day for my free time.
So I can't think of what else I'd be doing, honestly, with my free time.
I think I'd just go on hikes.
We've got some awesome stuff around here in Colorado, so I think I'd just hike.
I'd like to be a tour guide, maybe.
That works. That works.
I love that.
So you're like, well, if I can make a bunch of money, then I can go do open source for the rest of my days.
You're like, wait a second.
Wait, I can do that right now. I can just do open source right now.
Yeah, exactly.
Awesome.
Well, one thing to mention before we say goodbye is that we've been doing a film series,
ChangeLog Films, at all of the, not all of the,
but many developer conferences.
So we call it Beyond Code.
We ask similar questions to the ones
that we ask for our closing questions
(in fact, Programming Hero is featured
in that series as well)
to different developers of all shapes and sizes
at the after parties of different conferences.
It's really cool.
I want you to check it out. We just finally launched the website, because
we've had the videos forever, but, you know, the
cobbler's kids have no shoes, so making a
website for ourselves was a lot of work. But we're pretty proud of it. We want you to
check it out. It's at beyondcode.tv. Right now we have season one up; that was at Keep
Ruby Weird last fall. We have seasons two, three, and four also in the can, so those videos will be
showing up there shortly. Check it out, beyondcode.tv. Let us know what you think. And I just want to say
thanks to you, Ben, for joining us. It was a really good conversation. I'm excited about BoltDB and
all these cool new things coming out of the Go ecosystem. I want to give a shout out to our
Changelog members and our awesome sponsors for this show, helping make it happen. Don't forget
to tune in next week when Karen Meyer joins us to talk about Clojure. Check that out. And until then,
we'll see you. Thanks for having me, Jared. We'll see you next time.