The Changelog: Software Development, Open Source - Redis In-Memory Data Store (Interview)
Episode Date: January 17, 2011
Wynn caught up with Salvatore Sanfilippo to talk about Redis, the super hot key value store....
Transcript
Welcome to the changelog episode 0.4.5. I'm Adam Stachowiak.
And I'm Wynn Netherland. This is the changelog where we cover what's fresh and new in open source.
If you found us on iTunes, we're also on the web at thechangelog.com.
We're also up on GitHub.
Head to github.com slash explore.
You'll find some trending repos, some featured repos from the blog, as well as the audio podcasts.
If you're on Twitter, follow ChangeLogShow, ChangeLogJobs, and me, adamstac.
And I'm pengwynn, P-E-N-G-W-Y-N-N.
This week's episode is sponsored by GitHub Jobs.
Head to thechangelog.com
slash jobs to get started.
If you'd like us to feature your job on the show,
select advertise on the changelog when posting your job
and we'll take care of the rest.
Just like our buddy Chris Eppstein over at caring.com did.
He needs a really senior Rails engineer
who's also wicked smart.
Interested in this job?
It's lg.gd slash 5s
for the short link.
Erlang in the back and Python in the front.
UK-based Smarkets is merging internet and financial tech in real time.
If you're game, check out shortlink lg.gd.gd.gd.
AppSpark is looking for a lead iOS developer who can take charge of the UI and UX decisions over there.
If you sling the Cocoa, the Objective-C, and you want to know more, it's lg.gd.com.
Fun episode this week. Talked to Salvatore Sanfilippo from Redis.
I actually love this interview a lot. It was a lot of fun to edit, but the fun thing I think
we'll take from this really is how he mentions his liking or disliking of the term of NoSQL.
Yeah, it's nice to get another take on someone that's created a wildly popular NoSQL solution
and what they think of that term.
It's kind of like HTML5.
It's one of those things you ask 10 people, you get a dozen definitions.
And not to mention the fact that he didn't even look at the other solutions the entire time
he started the development of Redis.
And the cool thing of how this all even got popularized was a pretty cool story, I think, he told.
You know, it seems to be a common thread in really popular applications is to build something that you want to consume yourself and just put the blinders on.
And, you know, darn the torpedoes, I'm going to build something that I want to use.
Yeah, a couple months later, it was a damn near full-time job for him.
Been kind of disappointed we had to sit on this one for a week.
We recorded this a week ago and finally getting to release it. Should we get to it? Let's do it.
We're chatting today with Salvatore Sanfilippo, the creator of Redis. So Salvatore, why don't
you take a moment and introduce yourself and a little bit about the project.
Hi all, I am Salvatore. I'm now working for VMware,
which is supporting the development of Redis,
and not just my development of Redis: VMware is also paying for the development
time of Pieter for Redis. So my usual day is just hacking on Redis the whole time. And this was a very interesting change compared to the
past. Redis started as something like a hobby. Well, not completely without something to
gain, because I used it for my startups, for web startups, but the project itself was not
funded by any means.
For those that don't know, Redis, would you call it a key value store?
Well, it is pretty hard to find the right position for Redis in a database field that gets more complex every day,
because in some way Redis is for sure a key value database.
This is clear from the fact that you can mostly access Redis data just by the primary unique index, that is, the key itself.
So in some way, it is for sure a key value database.
But from another point of view, most of the key value databases that came before Redis were, from a mathematical standpoint,
just a string-to-string map.
While in Redis, values can be much more complex,
and every value itself is something like a small database.
Just to provide an example,
the sorted set is something like a balanced tree itself. So it's like
there is an outer shell that works like a key value database, but the single values
have specific data models. So this is the sum of a key value database with a number of data models that are conceived in order to address, model different kinds of problems.
From everyone that I've spoken to that's used Redis, the very first thing that they talk about is speed.
Does that come from it being written in C, or what's the internals of Redis?
Redis started to be fast in order to solve a specific problem I had with...
I needed to write an analytic program, web analytics, that was really, really fast. And I needed to track real-time user interaction with the site
to show these interactions in a web interface,
Ajax web interface.
I tried to model this problem with MySQL,
and it worked for a few months.
But when we started to get more and more users,
we realized that this was not the way to do things
because the cost for every user was impossible to handle.
Later, this project was more or less aborted.
But the idea was to create a freemium business model.
So free users had to cost very little to us.
Otherwise, it was impossible to go forward with the project.
So I started to write Redis with the goal of making it fast.
What is, I think, interesting is that actually Redis started as a fast database not because the internals, the C internals, were very optimized,
like a 3D game or something like that.
It was fast because it was in C,
it used an event model, event-driven programming,
and the data model itself was designed to be fast.
So it's not the fruit of optimization, of micro-optimizations,
but the API that Redis exports is designed in order to take little time
when dealing with the internal data structures
exported by Redis.
Let's talk a moment about some of the features of Redis.
So how does it compare in replication
to other NoSQL options?
To be honest, when I started to write Redis,
I started without any kind of idea
about the other NoSQL solutions.
And, well, what is interesting is that
after almost two years, I'm more or less continuing to
never look at other solutions.
I for sure played a bit with the most interesting solutions
of the NoSQL environment,
but I never focused on the implementation of other systems in a very
specific way.
I think that Redis replication, by the way, is implemented in a completely different way.
Because of the design of Redis, it needed to be very different,
because I wanted non-blocking replication from the point of view of the master.
I wanted automatic resynchronization when there was a problem in the link
connecting the master to the slave.
And I wanted to have such features
with very, very little, simple code.
So I needed to take all these compromises together
and to try to model something that could work.
And the final solution for replication
was to use the persistence code we had. So in order to create a replica,
what happens is that the slave asks the master to synchronize with a SYNC command, a command that is not meant to be used by clients. It's just
for the slave.
When a master receives the SYNC command, it starts to produce just a dump,
a dump exactly like when you call BGSAVE. So it's like the usual persistence.
So we obtain a single file, an RDB file, that is transmitted back to the slave as a bulk file.
It's just a file transfer thing. But when we started to produce this dump file,
we also started to log every
write query we received from clients
and accumulate these writes in a buffer.
So when the slave will finally receive the dump,
it will load this dump,
and the master will start to transmit the accumulated buffer of changes.
And this buffer of changes is just exactly like the Redis protocol itself.
So it's not something like a binary log and so forth.
At this point, the master will continue forever to write a stream of commands
received from clients to the slaves, to all the slaves.
And so the slave will continuously be updated.
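The flow described here (dump, buffer of writes accumulated during the dump, then a continuous stream of commands) can be sketched as a small in-memory simulation. This is a toy under stated assumptions, not the Redis source: `ToyMaster`, `ToySlave`, and their methods are hypothetical names, a dictionary copy stands in for the RDB file, and only `SET` is modeled. The `encode_command` helper follows the Redis multi-bulk request format (`*<argc>`, then `$<length>` and the bytes for each argument).

```python
def encode_command(*args):
    """Encode a command as a Redis multi-bulk request."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def decode_command(payload):
    """Toy decoder: assumes arguments never contain CRLF themselves."""
    lines = payload.split(b"\r\n")
    count = int(lines[0][1:])
    return [lines[2 + 2 * i].decode() for i in range(count)]


class ToySlave:
    def __init__(self):
        self.data = {}

    def load(self, snapshot):
        self.data = dict(snapshot)

    def apply(self, wire):
        name, key, value = decode_command(wire)
        if name == "SET":
            self.data[key] = value


class ToyMaster:
    def __init__(self):
        self.data = {}
        self.slaves = []      # slaves that are already in sync
        self.pending = None   # writes buffered while a dump is in flight

    def write(self, key, value):
        self.data[key] = value
        wire = encode_command("SET", key, value)
        if self.pending is not None:
            self.pending.append(wire)   # accumulate for the syncing slave
        for slave in self.slaves:
            slave.apply(wire)           # stream to slaves already attached

    def start_sync(self):
        """SYNC received: start buffering writes and produce the dump."""
        self.pending = []
        return dict(self.data)          # stands in for the RDB file

    def finish_sync(self, slave, snapshot):
        slave.load(snapshot)            # slave loads the dump...
        for wire in self.pending:       # ...then replays the buffered writes
            slave.apply(wire)
        self.pending = None
        self.slaves.append(slave)


master, slave = ToyMaster(), ToySlave()
master.write("a", "1")
snapshot = master.start_sync()
master.write("b", "2")                  # arrives while the dump is transferred
master.finish_sync(slave, snapshot)
master.write("c", "3")                  # normal streaming from here on
```

Because the buffer holds ordinary protocol-encoded commands, the slave can apply it with the same code path it uses for the ongoing stream.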
What is important is that Redis replication is not synchronous replication,
but is async. So while a command is processed in the master, the client will get the OK
from the master, and later the command is put in a queue and sent to the slave.
So if you get the OK code from the master, it doesn't mean that the slave is updated as well.
But what is interesting is that since we have a very efficient implementation, it's very good. So actually the
delay between the master and the other replicas is usually in the order of less than one millisecond.
And it's working very well.
The replication is also a very important piece of the Redis cluster.
This is our next big project, which I'm developing currently.
You mentioned the evented model in the internals.
When you build personal applications using Redis,
what sort of application server model do you follow?
Is it also evented?
Or what's your tool set of choice?
When I write what kind of applications?
When you yourself are building web applications, are you putting something evented in front of
it, like Twisted or EventMachine or Node.js, or just your personal flavor of application?
Okay. Well, I love to use Ruby with Sinatra. This is my pick. I love to have very, very small frameworks, because I think that with the more complex frameworks, for sure it's true that you can do a lot of things with very little code. But in the end, I think that when you want to create something more complex,
you have in some way to learn more and more how the framework itself works.
And sometimes this learning activity may take as much time as, or more than, building it on your own.
So what I do is to take Ruby with Sinatra
and then I use a set of libraries I developed for myself.
For example, I have a library that's called mysql.rb
that is something like ActiveRecord
but much simpler, that I use to talk with MySQL.
And then I use the Redis gem to talk with Redis.
And a library I wrote myself for HTML generation,
programmatic generation,
that is designed in order to be very fast.
All these libraries are the kind of code you take around,
you get from your old project and put in your new project
and then hack on it.
So I don't have really repositories for this code.
I don't release this code as open source, but it's a few years at this point
that I'm using this kind of framework
composed of my libraries and Ruby and Sinatra.
I'm looking at the client list on Redis.io,
and there's wide support for a lot of languages for Redis.
Where do you see growth,
and what communities are growing in using Redis?
The reason there are so many languages listed in the Redis.io site is because the Redis
protocol was so simple that everybody had the fun of writing a client.
But actually, there are a few of these clients that have very good support,
with a lot of users, with a big user base,
and others are a bit like hacks.
The big users of Redis are for sure
in Ruby, Python,
and possibly more and more in Java.
There is also a good amount of people
using the C client directly. Also, I think the Perl module is used enough.
What's interesting is that, well, there is something special about
the C client. It is the only client we support directly.
Pieter and I, as the Redis project, wrote this client and
we support this client in a direct way. And there are a lot of people using Redis in very
high performance environments that don't want to use an intermediate layer to talk with Redis. So they use programs written directly in C
to write queries to Redis.
And I think this is a bit strange,
as I expected the C client to be very little used,
because currently dynamic languages
are much more interesting for the fast development.
Earlier you said that you didn't look too much at the other NoSQL solutions out there.
What do you feel about that term NoSQL and what does it mean to you?
The term?
Yes, is that an adequate label for apps like this?
Yeah, I think I have mixed feelings about it
because I don't like the NoSQL word itself.
But after all,
as even the Design Patterns book
demonstrated clearly,
if there is no word for something, even a very bad word, it
is very hard to communicate to the programming community that we are onto something, that we
are trying to do something after a lot of years of database monoculture. So while the NoSQL term may not be the best term,
for sure it was something like
an incredible marketing thing
to have such a term.
It's like Web 2.0.
It's not exactly a very cool term, but it was a very interesting way to communicate to all the web developers that something was changing.
But it is also embarrassing in some way, because the NoSQL solutions
are so different that the term is really making less and less sense. For instance,
I can see in the NoSQL arena databases that are actually more
or less evolutions of the SQL paradigm, with a new implementation,
maybe much more concerned with performance than with consistency,
maybe with new protocols to talk to the database, but this is
after all the same data model.
You have objects,
these objects can have complex indexes,
and you can run complex queries against these objects.
And this is a very worthwhile evolution of former databases.
Then there are databases in the NoSQL world
that are completely different from this paradigm, yet we
use the same term to address the whole space of these alternative
solutions. It's working currently, but
I guess we are starting to see
the term used a little less than it was used before.
I see that the news on Hacker News is more and more not about NoSQL, but about Redis, MongoDB, Cassandra, Riak, and so forth.
So I think things are evolving.
Talk to us a bit about Redis PubSub.
Yes, Redis PubSub was a bit of a strange addition to Redis
because, well, after all, you can think that it's not a fit for Redis because Redis is a database and PubSub is clearly a messaging primitive.
So why did we add it? Because, to start, Redis itself, in its internal core,
is very suited for this kind of message passing activities.
And then we already had something that looked more like a messaging data structure
than a database data structure: that was the list. The list is
very useful as a database kind of value, but
because it supports push and pop operations in
constant time, it was also very interesting as a primitive
to create messaging solutions.
And actually, GitHub started using it for Resque.
And then we started to get more and more requests
about providing more powerful lists, because the list can be used with just one producer and
one receiver, or, if there are multiple receivers of these messages,
all the consumers can't pick the same message.
One consumer will get the first message,
the second consumer will get the next message, and so forth.
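A minimal sketch of that list-as-queue behavior, using a Python `deque` as a stand-in for a Redis list. The worker names and job strings are made up for illustration; `appendleft` and `pop` play the role of a push on one end and a pop on the other, both constant time.

```python
from collections import deque

queue = deque()
for job in ("job-1", "job-2", "job-3"):
    queue.appendleft(job)        # producer pushes on the left end

workers = ["worker-a", "worker-b"]
received = {name: [] for name in workers}

turn = 0
while queue:
    worker = workers[turn % len(workers)]
    # Popping removes the message, so no other consumer can ever see it.
    received[worker].append(queue.pop())
    turn += 1
```

Each job ends up with exactly one worker, which is what you want for a work queue and exactly what you do not want when every listener should see every message.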
So instead of trying to evolve the list to create something
it was not designed for,
we tried to add something different.
That is a simple fire-and-forget PubSub functionality.
Another important concern
was about the need for users to communicate to other clients that something was different
in the data space. For instance, you have a key, and when this key is modified, you want to inform another client that this modification took place.
So what do you do?
There was the possibility of providing a generic way to communicate state changes in the key space. So I could create a feature where
you can listen to a key,
and when this key gets some kind of read,
write operation and so forth,
the listening client will get a message.
But if you think about it,
this is a lot of different use cases.
Do you want to listen for the deletion of this key? If it's a list, it can be changed in many ways,
it can be popped, pushed: what kind of operations are you interested in? So it's easy to realize how involved such an API could start to be.
So PubSub was also able to solve this kind of problem.
If you want to communicate to some client that there was a state change in a key, what you do is use a Redis transaction, that is the sum of the MULTI and EXEC commands,
and you, inside the transaction,
put two commands,
one to actually change your data
and one to publish in a given channel
the fact that this key was changed.
So basically, we provided a more general form
of communication
between clients that can be used to communicate
the changes in the keys, in the key space,
but is also more generic than this.
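The pattern can be sketched with a toy in-memory model: a write and a publish grouped together, so subscribers hear about the change. `ToyRedis`, its methods, the channel name, and the key are all hypothetical; the `transaction` method only imitates the "run these commands back to back" aspect of MULTI ... EXEC and says nothing about how the real server does it.

```python
class ToyRedis:
    """Toy key space plus fire-and-forget channels (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.subscribers = {}   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def set(self, key, value):
        self.data[key] = value

    def publish(self, channel, message):
        # Fire and forget: delivered only to whoever is listening right now.
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(channel, message)
        return len(receivers)

    def transaction(self, commands):
        """Run queued (method_name, args) pairs back to back."""
        return [getattr(self, name)(*args) for name, args in commands]


server = ToyRedis()

# Nobody is subscribed yet, so this message is simply lost.
lost = server.publish("changes", "nobody hears this")

seen = []
server.subscribe("changes", lambda channel, message: seen.append((channel, message)))

# Change the data and announce the change in one unit.
results = server.transaction([
    ("set", ("user:1:name", "salvatore")),
    ("publish", ("changes", "user:1:name")),
])
```

The application decides what to announce and on which channel, which is why this is more general than a built-in "watch this key" feature.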
What was interesting is that after we provided this new feature, we saw more
and more people switching from messaging solutions to Redis, because Redis was much more simple to
start with. The performance was very, very good. And so people started to use Redis
as a messaging system
and at this point
we have really three kinds of users,
with big overlaps in these three sets of users:
Redis used as a database,
Redis used for messaging,
that is, the sum of the list commands, Resque
and so forth, and PubSub. And finally, Redis
used as a cache.
These are three businesses
that are going in parallel.
Now that services like
Redis To Go are offering hosted
Redis and even add-ons for sites like Heroku, what has that done for the adoption of Redis?
I'm not sure these services are currently very, very, very useful for users.
The reason is I think there is a lot of value in theory in managing instances of some kind of software.
But Redis is so simple to run for the final user, and these services are usually a bit expensive,
that I'm not sure it makes sense for many users to
use this kind of service. So I don't think they are doing a lot to make
Redis more popular. What I think these companies should focus on is providing
more value in these solutions. More value
is backups,
making sure that these instances are easy to scale,
making sure that upgrades are very simple
to perform and without downtime
from the point of view of users.
There are clear ways to do this.
For instance, if you have a spare box,
a fresh box you can use,
and you want to upgrade Redis,
you start a new instance with the new release,
and then you start the replication process, and then you instantly switch the
IP address to the new box, the one that was the slave, and you issue a command to the slave
to turn it into the master. So you upgraded your Redis instance without any service interruption.
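Written out as an ordered plan, the upgrade looks roughly like this. The sketch only builds a list of steps; it does not talk to any server. The host names and the `upgrade_plan` helper are hypothetical, the "operator" steps are manual actions outside Redis, and `SLAVEOF host port` / `SLAVEOF NO ONE` are the Redis commands for attaching a slave to a master and for promoting a slave.

```python
def upgrade_plan(old_host, old_port, new_host, new_port):
    """Return (who, action) steps for a no-downtime upgrade via replication."""
    new_instance = f"{new_host}:{new_port}"
    return [
        # 1. The new release, started on the spare box, replicates the old one.
        (new_instance, f"SLAVEOF {old_host} {old_port}"),
        # 2. Wait for the initial synchronization to complete.
        ("operator", "wait until the new instance has loaded the dump"),
        # 3. Point clients at the new box.
        ("operator", f"move the service IP address from {old_host} to {new_host}"),
        # 4. Promote the slave so it accepts writes as a master.
        (new_instance, "SLAVEOF NO ONE"),
    ]


plan = upgrade_plan("old-box", 6379, "new-box", 6379)
for who, action in plan:
    print(f"{who}: {action}")
```

A hosted service would presumably automate steps 2 and 3, which is the added value being argued for here.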
I think the value is in this kind of services.
So you can say as a user, okay, I will get this hosted Redis solution
because I will stop thinking about it.
If I want a bigger instance, I will just pay more
and they will do the upgrade needed to do this
without any kind of interruption of
my services. I'm sure they will make the backups.
I'm sure they will be able to
rotate my append-only file,
if I use this kind of persistence model, without
problems in the cron service,
without problems with the
additional memory used by the background
process and so forth. But my impression is that the current solutions in the market are not providing
all this interesting added value.
What's the largest Redis installation that you've come
across as far as memory and other resources?
Well, I'm not sure,
but one of the biggest I remember seeing recently
was at Blizzard.
Blizzard, the guys from World of Warcraft,
are using Redis to power the front end
of the web interface,
the mobile interface of the game,
where there are the avatars,
and you can check your avatar,
and it's used to power this part of the site
and to create the 3D renders
that there are in this page,
and they are using just for this 8 nodes of Redis
with, if I remember correctly, 16 gigabytes of RAM per instance. I think there
is also an advertising company, I don't remember very
well what the name is, that is using a much larger installation of Redis, with 64 gigabyte instances of Redis and 10 nodes of Redis.
So I think that currently
the biggest installations I can report
are about real servers
in the range of 10 hosts, with many gigabytes of RAM for every host.
Also, there is Digg, and I'm not sure exactly how many Redis servers
and how much memory are used for this kind of installation.
I've got a question from Twitter.
Justin Campbell wants to know if VMware plans to include Redis in a future product release.
Well, I think that there is the idea to use Redis to provide services inside VMware, some kind of services.
So I really hope we will see soon something interesting about Redis and VMware.
But for sure, there is a lot of interest inside VMware about Redis.
And there are people working on solutions that will use Redis both internally and as an exposed service.
So you're in Italy, and for those that think that you have to be in San Francisco
or somewhere with a larger tech scene
to create a popular open source project,
how did you go about spreading the word about Redis?
I think that it was very, very strange.
The curve of popularity of Redis,
it was something I learned from.
Because when I released the first version of Redis,
it was very, very simple.
A few lines of code demonstrating the first ideas.
There was get, set, and a few operations about lists.
I had this prototype.
This prototype was already working inside my production system,
and I put up a homepage for it and posted it on Hacker News.
When I posted it on Hacker News, there was a very good response from people.
And especially Ezra Zygmuntowicz, who is now working for VMware as well. Well, I think Ezra made a really huge difference
in the popularity of Redis
in the first months of the life of Redis.
Because after all,
you can't expect all the people out here
to be so brave to use a new solution
without any kind of guru
that is somewhat popularizing this solution.
And then there was GitHub.
GitHub started to use Redis in interesting ways
and to make users aware of this kind of uses. I think you really need
in your initial user base a few brave users. But users are not brave just because they are
hazardous with their production systems. The reason is, I think, that when a hacker is very good,
he starts to be confident
that he can pick the good solutions
for modeling his problems.
Then when you start to have a few interesting users
in your user base, they
will start to be like
a green light for all the other users.
So Redis started to get adoption every day,
more and more and more for the first two or three months.
Then there was a stop in the adoption rate of Redis.
Okay, there were a few users using it,
a few new users,
but I clearly was seeing that there was a stop
in the adoption rate of new users.
So what I did was to reconsider it.
I started to think,
so maybe this is, after all, not really interesting for most users.
But then I realized that actually I really trusted the project and continued the development,
even if it was completely a free effort at the time.
And it was a lot of work.
It started to be almost a full-time job after a few months.
And I pushed more and more features, more work, created a better implementation, and so forth.
And users started to actually acknowledge all this work
and started to adopt it more and more.
So I think there are like two different stages.
One stage is the wow stage.
When you put this project on the Hacker News front page
and people say, oh, but this is cool.
I could have uses for this project and so forth.
Then the hype will stop for a bit.
Then you need to grow this small child
into something bigger, more supported,
more real-world usable.
And this is really hard work.
And during this time, you should try not to give up.
You should try to add more and more value to your project.
And eventually, users will recognize that this works
and will start to trust this solution more and more.
When you're not busy hacking on Redis,
what tools in the open source world do you want to play with?
I like programming languages a lot.
One of my biggest interests before
Redis was for sure programming languages.
So what I like is to download some new language and to try
what is different from all the other languages I know,
which of these ideas I can somehow use
in my code written for languages
not explicitly supporting these new ideas.
Many times you can adapt these concepts
even if your language is not completely intended for this kind of thing.
So I really enjoy programming languages in general.
Well, thanks for taking the time. I know it's in the evening over in Italy and surely appreciate
finally getting to catch up with you. And this is one of those episodes that
is going to be difficult to sit on for a week before we publish, but thanks again.
Thank you.
Thank you for listening to The Change Log.
This episode is sponsored by LessConf.
LessConf is a conference for people who do amazing things, and that means you.
Take advantage of early bird pricing right now until February 14th.
Head to lessconf.com to learn more and register.
Thanks for listening.