The Changelog: Software Development, Open Source - Redis In-Memory Data Store (Interview)

Episode Date: January 17, 2011

Wynn caught up with Salvatore Sanfilippo to talk about Redis, the super hot key value store....

Transcript
Starting point is 00:00:00 Welcome to The Changelog, episode 0.4.5. I'm Adam Stachowiak. And I'm Wynn Netherland. This is The Changelog, where we cover what's fresh and new in open source. If you found us on iTunes, we're also on the web at thechangelog.com. We're also up on GitHub. Head to github.com slash explore. You'll find some trending repos, some featured repos from the blog, as well as the audio podcasts. If you're on Twitter, follow ChangeLogShow, ChangeLogJobs, and me, adamstac. And I'm pengwynn, P-E-N-G-W-Y-N-N.
Starting point is 00:00:41 This week's episode is sponsored by GitHub Jobs. Head to thechangelog.com slash jobs to get started. If you'd like us to feature your job on the show, select "advertise on The Changelog" when posting your job and we'll take care of the rest. Just like our buddy Chris Eppstein over at caring.com did. He needs a really senior Rails engineer
Starting point is 00:00:57 who's also wicked smart. Interested in this job? It's lg.gd slash 5s for the short link. Erlang in the back and Python in the front. UK-based Smarkets is merging internet and financial tech in real time. If you're game, check out shortlink lg.gd.gd.gd. AppSpark is looking for a lead iOS developer who can take charge of the UI and UX decisions over there.
Starting point is 00:01:21 If you sling the Cocoa, the Objective-C, and you want to know more, it's lg.gd.com. Fun episode this week. Talked to Salvatore Sanfilippo from Redis. I actually love this interview a lot. It was a lot of fun to edit, but the fun thing I think we'll take from this really is how he mentions his liking or disliking of the term NoSQL. Yeah, it's nice to get another take from someone that's created a wildly popular NoSQL solution and what they think of that term. It's kind of like HTML5. It's one of those things where you ask 10 people, you get a dozen definitions.
Starting point is 00:01:54 And not to mention the fact that he didn't even look at the other solutions the entire time he started the development of Redis. And the cool thing of how this all even got popularized was a pretty cool story, I think, he told. You know, it seems to be a common thread in really popular applications is to build something that you want to consume yourself and just put the blinders on. And, you know, darn the torpedoes, I'm going to build something that I want to use. Yeah, a couple months later, it was a damn near full-time job for him. Been kind of disappointed we had to sit on this one for a week. We recorded this a week ago and finally getting to release it. Should we get to it? Let's do it.
Starting point is 00:02:36 We're chatting today with Salvatore Sanfilippo, the creator of Redis. So Salvatore, why don't you take a moment and introduce yourself and a little bit about the project. Hi all, I am Salvatore. I'm now working for VMware, which is supporting the development of Redis, and not just my development of Redis: VMware is also paying for Pieter's development time on Redis. So my usual day is just hacking on Redis the whole time. And this was a very interesting change compared to the past. Redis started as something like a hobby. Well, not completely without something to gain, because I used it for my startups, for web startups, but the project itself was not
Starting point is 00:04:07 funded by any means. For those that don't know Redis, would you call it a key-value store? Well, it is pretty hard to find the right position for Redis in a database field that gets more complex every day, because in some way Redis is for sure a key-value database. This is clear from the fact that you can mostly access Redis data just by the primary unique index, that is, the key itself. So in some way, it is for sure a key-value database. But from another point of view, most of the key-value databases that existed before Redis were, from a mathematical standpoint, just a string-to-string map.
Starting point is 00:05:10 While in Redis, values can be much more complex, and every value itself is something like a small database. Just to provide an example, the sorted set is something like a balanced tree itself. So it's like there is an outer shell that works like a key value database, but the single values have specific data models. So this is the sum of a key value database with a number of data models that are conceived in order to address, model different kinds of problems. From everyone that I've spoken to that's used Redis, the very first thing that they talk about is speed. Does that come from it being written in C, or what's the internals of Redis? Redis started to be fast in order to solve a specific problem I had with...
Starting point is 00:06:13 I needed to write an analytic program, web analytics, that was really, really fast. And I needed to track real-time user interaction with the site to show these interactions in a web interface, Ajax web interface. I tried to model this problem with MySQL, and it worked for a few months. But when we started to get more and more users, we realized that this was not the way to do things because the cost for every user was impossible to handle.
Starting point is 00:06:59 Later, this project was more or less aborted. But the idea was to create a freemium business model. So free users had to cost very little to us. Otherwise, it was impossible to go forward with the project. So I started to write Redis with the goal of making it fast. What is, I think, interesting is that actually Redis started as a fast database, not because the internals, the C internals, were very optimized, like a 3D game or something like that. It was fast because it was in C.
Starting point is 00:07:46 It used an event model, event-driven programming, and the data model itself was designed to be fast. So it's not the fruit of an optimization, of micro-optimizations, but the API that Redis exports is designed in order to take little time when dealing with the internal data structures exported by Redis. Let's talk a moment about some of the features of Redis. So how does it compare in replication
Starting point is 00:08:25 to other NoSQL options? To be honest, when I started to write Redis, I started without any kind of idea about the other NoSQL solutions. And, well, what is interesting is that after almost two years, I more or less, I'm continuing to
Starting point is 00:08:49 never look at other solutions. I for sure played a bit with the most interesting solutions of the NoSQL environment, but I never focused on the implementation of other systems in a very
Starting point is 00:09:07 specific way. I think that Redis replication, by the way, is implemented in a completely different way. Because of the design of Redis, it
Starting point is 00:09:24 needed to be very different because I wanted non-blocking replication from the point of view of the master. I wanted automatic resynchronization when there was a problem in the link connecting the master to the slave. And I wanted to have such features with very, very little, simple code. So I needed to take all these compromises together and to try to model something that could work.
Starting point is 00:09:58 And the final solution for replication was to use the persistence code we had. So in order to create a replica, what happens is that the slave sends the master a SYNC command, which is not meant to be used by clients. It's just for the slave. When a master receives the SYNC command, it starts to produce just a dump.
Starting point is 00:10:36 A dump exactly like when you call BGSAVE. So it's like the usual persistence. So we obtain a single file, an RDB file, that is transmitted back to the slave as a bulk file. It's just a file transfer thing. But when we started to produce this dump file, we also started to log every
Starting point is 00:11:11 write query we received from clients and accumulate these writes in a buffer. So when the slave will finally receive the dump, it will load this dump, and the master will start to transmit the accumulated buffer of change. And this buffer of change, it's just exactly like the Redis protocol itself. So it's not something like a binary log and so forth. At this point, the master will continue forever to write a stream of commands received from clients to the slaves, to all the slaves.
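[Editor's sketch, not from the interview: the sync flow described above can be modeled in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not the Redis implementation. The names `Node`, `Master`, `start_sync`, and `finish_sync` are hypothetical, commands are plain tuples rather than the Redis protocol, and the live stream is applied immediately instead of being sent over a socket.]

```python
class Node:
    """Toy key-value node that understands only SET and DEL (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, command):
        op, key, *args = command
        if op == "SET":
            self.data[key] = args[0]
        elif op == "DEL":
            self.data.pop(key, None)
        else:
            raise ValueError("unsupported command: %r" % (op,))


class Master(Node):
    def __init__(self):
        super().__init__()
        self.slaves = []    # slaves that already receive the live stream
        self.pending = {}   # id(slave) -> (dump, buffered writes)

    def start_sync(self, slave):
        # Stand-in for the dump file: a point-in-time copy of the data.
        self.pending[id(slave)] = (dict(self.data), [])

    def write(self, command):
        self.apply(command)
        # Writes that arrive while a dump is in flight are accumulated.
        for _dump, buffer in self.pending.values():
            buffer.append(command)
        # Synced slaves get the same command stream the clients sent.
        for slave in self.slaves:
            slave.apply(command)
        return "OK"

    def finish_sync(self, slave):
        dump, buffer = self.pending.pop(id(slave))
        slave.data = dict(dump)      # the slave loads the dump
        for command in buffer:       # then replays the accumulated writes
            slave.apply(command)
        self.slaves.append(slave)    # from now on it follows the stream
```

[A slave created with `Node()` stays empty between `start_sync` and `finish_sync`, and afterwards mirrors every `write` on the master. In this toy the master updates slaves inside `write`, whereas the replication described in the interview is asynchronous.]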
Starting point is 00:11:55 And so the slave will continuously be updated. What is important is that Redis replication is not a synchronous replication, but is async. So while a command is processed in the master, the client will get the OK from the master, and later the command is put in a queue and sent to the slave. So if you get the OK code from the master, it doesn't mean that the slave is updated as well. But what is interesting is that since we have a very efficient implementation, it's very good. So actually the delay is in the order of less than one
Starting point is 00:12:50 millisecond usually, between the master and the replicas. And it's working very well. The replication is also a very important piece of the Redis cluster. This is our next big project
Starting point is 00:13:06 I'm developing currently. You mentioned the evented model in the internals. When you build personal applications using Redis, what sort of application server model do you follow? Is it also evented? Or what's your tool set of choice? When I write, what kind of applications? When you yourself are building web applications, are you putting something evented in front of it, like Twisted or EventMachine or Node.js, or just your personal flavor of application? Okay. Well, I love to use Ruby with Sinatra. This is my pick. I love to have very, very small frameworks because I think that with the more complex frameworks, for sure it's true that you can do a lot of things with very little code. But in the end, I think that when you want to create something more complex,
Starting point is 00:14:08 you have in some way to learn more and more how the framework itself works. And sometimes this learning activity may take more time, or the same time, than building it on your own. So what I do is take Ruby with Sinatra, and then I use a set of libraries I developed for myself. For example, I have a library called mysql.rb that is something like ActiveRecord but much simpler, which I use to talk with MySQL. And then I use the Redis gem to talk with Redis.
Starting point is 00:14:52 And a library I wrote myself for HTML generation, programmatic generation, that is designed in order to be very fast. All these libraries are the kind of code you take around, you get from your old project and put in your new project and then hack on it. So I don't have really repositories for this code. I don't release this code as open source, but it's a few years at this point
Starting point is 00:15:27 that I'm using this kind of framework composed of my libraries and Ruby and Sinatra. I'm looking at the client list on Redis.io, and there's wide support for a lot of languages for Redis. Where do you see growth, and what communities are growing in using Redis? The reason there are so many languages listed in the Redis.io site is because the Redis protocol was so simple that everybody had the fun of writing a client.
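[Editor's sketch, not from the interview: to give a sense of how small the protocol is, here is a minimal Python encoder for the request format, in which a command is sent as an argument count followed by length-prefixed arguments. The function name `encode_command` is hypothetical, and reply parsing and socket handling are left out.]

```python
def encode_command(*args):
    """Encode one command as a multi-bulk request.

    Layout: "*<number of arguments>\r\n", then for each argument
    "$<length in bytes>\r\n<argument bytes>\r\n".
    """
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        # Accept bytes as-is; turn anything else into UTF-8 text.
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    # The bytes a client would write to the server socket for SET key value.
    print(encode_command("SET", "key", "value"))
```

[Because every argument carries its own byte length, values can contain spaces, newlines, or binary data without any escaping, which is a large part of why a basic client fits in a page of code.]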
Starting point is 00:16:11 But actually, a few of these clients are very well supported, with a big user base, and others are a bit like hacks. The big users of Redis are for sure in the Ruby, Python, and possibly more and more in the Java languages. There is also a good amount of people using the C client, even directly. Also, I think the
Starting point is 00:16:45 Perl module is used enough. What's interesting is that the C client, well, there is something special about the C client. It is the only client we support directly.
Starting point is 00:17:02 Pieter and I, as the Redis project, wrote this client and we support this client in a direct way. And there are a lot of people using Redis in very high performance environments that don't want to use an intermediate layer to talk with Redis. So they directly use programs written in C to send queries to Redis. And I think this is a bit strange, as I expected the C client to be very little used, because currently dynamic languages
Starting point is 00:17:44 are much more interesting for the fast development. Earlier you said that you didn't look too much at the other NoSQL solutions out there. What do you feel about that term NoSQL and what does it mean to you? The term? Yes, is that an adequate label for apps like this? Yeah, I think I have mixed feelings about it because I don't like the NoSQL word itself. But after all,
Starting point is 00:18:17 as even the Design Patterns book demonstrated clearly, if there is no word for something, even a very bad word, it is very hard to communicate to the programming community that we are onto something, that we are trying to do something after a lot of years of database monoculture. So while the NoSQL term may not be the best term, for sure it was something like an incredible marketing thing to have such a term.
Starting point is 00:18:59 It's like Web 2.0. It's not exactly a very cool term, but it was a very interesting way to communicate to all the web developers that something was changing. But it is embarrassing in some way, because the NoSQL solutions are so different that the term is really making less and less sense. For instance, I can see in the NoSQL arena
Starting point is 00:19:36 databases that are actually more or less evolutions of the SQL paradigm with a new implementation maybe much more concerned with performance than with consistency
Starting point is 00:19:54 maybe with new protocols to talk to the database, but this is after all the same data model: you have objects, these objects can have complex indexes, and you can run complex queries against these objects. And this is a very worthwhile evolution of former databases.
Starting point is 00:20:19 Then there are databases in the NoSQL world that are completely different than this paradigm, yet we use the same term to address the whole space of these alternative solutions. It's working currently, but
Starting point is 00:20:40 I guess we are starting to see the term used a little less than it was before. I see that the news on Hacker News is more and more not about NoSQL, but about Redis, MongoDB, Cassandra, Riak, and so forth. So I think things are evolving. Talk to us a bit about Redis PubSub. Yes, Redis PubSub was a bit of a strange addition to Redis because, well, after all, you can think that it's not a fit for Redis because Redis is a database and PubSub is clearly a messaging primitive.
Starting point is 00:21:38 So why did we add it? Because, to start, Redis itself, in its internal core, is very suited for this kind of message passing activity. And then we already had something that looked more like a messaging data structure than a database data structure, that was the list. The list is very useful as a kind of database value, but it's
Starting point is 00:22:13 also, because it supports push and pop operations in constant time, very interesting as a primitive to create messaging solutions. And actually, GitHub started using it for Resque. And then we started to get more and more requests
Starting point is 00:22:37 about providing more powerful lists, because the list can be used with just one producer and one receiver, or, if there are multiple receivers of these messages, the consumers can't all pick the same message. One consumer will get the first message, the second consumer will get the next message, and so forth. So instead of trying to evolve the list to create something it was not designed for, we tried to add something different.
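[Editor's sketch, not from the interview: the difference between a list used as a queue and publish/subscribe can be shown with two toy Python classes. These model the behavior in memory only; `ListQueue` and `Channel` are hypothetical names, not Redis client APIs.]

```python
from collections import deque


class ListQueue:
    """A list used as a queue: each message goes to exactly one consumer."""

    def __init__(self):
        self.items = deque()

    def push(self, message):
        # Constant-time push on one end of the list.
        self.items.appendleft(message)

    def pop(self):
        # Constant-time pop on the other end; None when the list is empty.
        return self.items.pop() if self.items else None


class Channel:
    """Fire-and-forget publish/subscribe: every current subscriber gets a copy."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)
        # Nothing is stored: with no subscribers the message is simply lost.
        return len(self.subscribers)
```

[With two consumers popping from one `ListQueue`, the first pop returns the oldest message and the second pop returns the next one, so no message is seen twice. With two callbacks subscribed to one `Channel`, a single `publish` reaches both.]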
Starting point is 00:23:20 That is a simple fire-and-forget PubSub functionality. Another important concern was about the need for users to communicate to other clients that something was different in the data space. For instance, you have a key, and when this key is modified, you want to inform another client that this modification took place. So what do you do? There was the possibility of providing a generic way to communicate state change in the key space. So I can create a feature where you can listen to a key, and when this key gets some kind of read, write operation and so forth,
Starting point is 00:24:17 the listening client will get a message. But if you think about it, this is a lot of different use cases. Do you want to listen for the deletion of this key? If it's a list, it can be changed in many ways, it can be popped, pushed; what kind of operations are you interested in? So it's easy to realize how involved such an API could start to be. So PubSub was also able to solve this kind of problem. If you want to communicate to some client that there was a state change in a key, what you do is use a Redis transaction, that is, the MULTI and EXEC commands, and you, inside the transaction,
Starting point is 00:25:09 put two commands, one to actually change your data and one to publish in a given channel the fact that this key was changed. So basically, we provided a more general form of communication between clients that can be used to communicate the changes in the keys, in the key space,
Starting point is 00:25:38 but is also more generic than this. What was interesting is that after we provided this new feature, we saw more and more people switching from messaging solutions to Redis, because Redis was much more simple to start with. The performance was very, very good. And so people started to use Redis as a messaging system and at this point we have really three kind of users and with big
Starting point is 00:26:13 overlaps in these three sets of users, that is, Redis used as a database, Redis used for messaging, that is, the sum of the list commands, Resque
Starting point is 00:26:30 and so forth, and PubSub. And finally, Redis used as a cache. There are three businesses that are going in parallel. Now that services like Redis To Go are offering hosted Redis and even add-ons for sites like Heroku, what has that done for the adoption of Redis?
Starting point is 00:26:50 I'm not sure these services are currently very, very, very useful for users. The reason is I think there is a lot of value in theory in managing instances of some kind of software. But Redis is so simple to run for the final user, and these services are usually a bit expensive, that I'm not sure it makes sense for many users to adopt this kind of, to use this kind of services. So I don't think they are doing a lot to make Redis more popular. What I think these services should, these companies should focus on is in providing more value in these solutions. More value
Starting point is 00:27:50 is backups, making sure that these instances are easy to scale, making sure that upgrades are very simple to perform and without downtime
Starting point is 00:28:05 from the point of view of users. There are clear ways to do this. For instance, if you have a spare box, a fresh box you can use, and you want to upgrade Redis, you start a new instance with the new release, and then you start the replication process, and so you instantly switch the IP address to the new box, the one that was the slave, and you issue a command to the slave
Starting point is 00:28:38 to turn it into the master. So you upgraded your Redis instance without any service interruption. I think the value is in this kind of service. So you can say as a user, okay, I will get this hosted Redis solution because I will stop thinking about it. If I want a bigger instance, I will just pay more and they will do the upgrade needed to do this without any kind of interruption of my services. I'm
Starting point is 00:29:09 sure they will make the backups. I'm sure they will be able to rotate my append-only file, if I use this kind of persistence model, without problem in the cron service, without problem in the additional memory used by the background
Starting point is 00:29:26 process and so forth. But my impression is that the current solutions in the market are not providing all this interesting added value. What's the largest Redis installation that you've come across, as far as memory and other resources? Well, I'm not sure, but one of the biggest I remember seeing recently was at Blizzard. Blizzard, the guys from World of Warcraft, are using Redis to power the front end of the web interface,
Starting point is 00:30:09 the mobile interface of the game, where there are the avatars, and you can check your avatar, and it's used to power this part of site and to create the 3D renders that there are in this page and they are using just for this
Starting point is 00:30:30 8 nodes of Redis with if I remember correctly 16 gigabytes of RAM for instance. I think there is also an advertising company, I'm not remembering very
Starting point is 00:30:48 well what the name is, that is using a much larger installation of Redis with 64 gigabyte instances of Redis and 10 nodes of Redis. So I think that currently the biggest installations I can report are about real servers and 10 hosts, in the range of 10 hosts with many gigabytes of RAM for every host. Also, there is Digg, and I'm not sure exactly how many Redis servers and how much memory are used for this kind of installation.
Starting point is 00:31:52 I've got a question from Twitter. Justin Campbell wants to know if VMware plans to include Redis in a future product release. Well, I think that there is the idea to use Redis to provide services inside VMware, some kind of services. So I really hope we will see soon something interesting about Redis and VMware. But for sure, there is a lot of interest inside VMware about Redis. And there are people working on solutions that will use Redis both internally and as an exposed service. So you're in Italy, and for those that think that you have to be in San Francisco or somewhere with a larger tech scene to create a popular open source project,
Starting point is 00:32:54 how did you go about spreading the word about Redis? I think that it was very, very strange. The curve of popularity of Redis, it was something I learned about from. Because when I released the first version of Redis, it was very, very simple. A few lines of code demonstrating the first ideas. There was get, set, and a few operations about lists.
Starting point is 00:33:29 I had this prototype. This prototype was already working inside my production system, and I put up a homepage for it and posted it on Hacker News. When I posted it on Hacker News, there was a very good response from people. And especially Ezra Zygmuntowicz, who is now working for VMware as well. Well, I think Ezra made a really huge difference in the popularity of Redis in the first months of the life of Redis. Because after all,
Starting point is 00:34:15 you can't expect all the people out here to be so brave to use a new solution without any kind of guru that is somewhat popularizing this solution. And then there was GitHub. GitHub started to use Redis in interesting ways and to make users aware of this kind of uses. I think you really need, in your initial user base, a few brave users. But users are not brave just because they are
Starting point is 00:34:57 hazardous with their production systems. The reason is, I think, that when a hacker is very good, he starts to be confident that he can pick the good solutions for modeling his problems. Then when you start to have a few interesting users
Starting point is 00:35:20 in your user base, they will start to be like a green light for all the other users. So Redis started to get adoption every day, more and more and more for the first two or three months. Then there was a stop in the adoption rate of Redis. Okay, there was a few users using it, a few new users,
Starting point is 00:35:49 but I clearly was seeing that there was a stop in the adoption rate of new users. So what I did was to reconsider it. I started to think, so maybe this is, after all, not really interesting for most users. But then I realized that actually I really trusted the project and continued the development, even if it was completely a free effort at the time. And it was a lot of work.
Starting point is 00:36:27 It started to be almost a full-time job after a few months. And I pushed more and more features, more work, created a better implementation, and so forth. And users started to actually acknowledge all this work and started to adopt it more and more. So I think there are like two different stages. One stage is the wow stage. When you put this project on the Hacker News front page and people say, oh, but this is cool.
Starting point is 00:37:01 I could have uses for this project and so forth. Then the hype will stop for a bit. Then you need to carry this small child into something bigger, more supported, more real-world usable. And this is really hard work. And during this time, you should try not to give up. You should try to put more and more value into your project.
Starting point is 00:37:38 And eventually, users will recognize that this works and will start to trust this solution more and more. When you're not busy hacking on Redis, what tools in the open source world do you want to play with? I like programming languages a lot. One of my biggest interests before Redis was for sure programming languages. So what I like is to
Starting point is 00:38:11 download some new language and to try what is different from all the other languages I know. What of these ideas I can somewhat use in my code written for languages, not explicitly supporting these new ideas,
Starting point is 00:38:31 but many times you can adapt this concept, even if your language is not completely intended for these kind of things. So I really enjoy in general programming languages. Well, thanks for taking the time. I know it's in the evening over in Italy and surely appreciate finally getting to catch up with you. And this is one of those episodes that is going to be difficult to sit on for a week before we publish, but thanks again. Thank you. Thank you for listening to The Change Log.
Starting point is 00:39:05 This episode is sponsored by LessConf. LessConf is a conference for people who do amazing things, and that means you. Take advantage of early bird pricing right now until February 14th. Head to lessconf.com to learn more and register. Thanks for listening. So how could I forget when I found myself for the first time? Safe in your arms as if no passion shown. Who was mine alone? Outro Music
