Software Huddle - Redis but Faster With Roman Gershman
Episode Date: March 4, 2025. Redis is consistently one of the most beloved pieces of infrastructure for developers. And in the last few years, we've seen a number of new Redis-compatible projects that aim to improve on the core of Redis in some way. One of those projects is DragonflyDB, a multi-threaded version of Redis that allows for significantly higher throughput on a single instance. Roman Gershman is the co-founder and CTO at Dragonfly, and he has a fascinating background. Roman initially worked at Google and then was a frustrated user of Redis while working as an engineer at a fast-growing startup. He did a stint on the ElastiCache team at AWS but struck off on his own to make a new, faster version of Redis. In this episode, we talk through the improvements that Dragonfly makes to Redis and why it matters to high-scale users. We go through the different needs and requirements of high-scale cache applications and what Roman learned at AWS. We also go through the Redis licensing drama and how to attract developer attention in 2025.
Transcript
Do you see a lot of Node.js people using Dragonfly or is it like, hey, bigger systems that are
running in C++ or Java or something like that? We see everything. We see everything. They are
not shy in bringing us use cases that succeed in breaking Dragonfly from time to time.
How do you find developer attention or how do you get your name out there? What are you finding
effective in terms of the marketing aspect of just getting
you know, getting Dragonfly known?
I mean, you know how we understood that our project really brings value?
We are clueless in marketing and still people came to us.
We did this not because it's easy,
but because we thought it was gonna be easy.
Hey folks, this is Alex and we're back.
Got a new episode with Roman Gershman here.
And this is like one of my favorite types of episodes
because Roman is the co-founder and CTO of Dragonfly,
which is a, just like a faster version of Redis,
like basically API compatible with Redis,
but faster and just improved in a lot of different ways.
And we just go deep on all that sort of stuff, the threading improvements, the memory improvements,
sort of his path there, like what he saw and why he got interested in this problem.
He used AWS ElastiCache before that at a big startup that was using a ton of Redis cache
and just having issues there.
So really cool story there.
At the back end we talked about some business stuff,
so check it out.
As always, if you want some other guests,
if you have any questions for us,
feel free to reach out to me or Sean.
With that, let's get to the show.
Roman, welcome to the show.
Thanks for having me.
Yeah, absolutely.
Well, you're like one of my favorite types of guests to have
because you're just one of these people
that's like on a different level than me, like thinking about systems and performance and
all sorts of that stuff. So I'm super excited to dig into that today. You are the CTO and co-founder
at Dragonfly. Maybe for those that don't know you or don't know Dragonfly, can you give us some
background on what Dragonfly is and what sort of your background is? Yeah, sure. So DragonflyDB, Dragonfly is an open-source project
hosted on GitHub, which is a drop-in replacement
for Redis and Memcached.
It has compatible APIs.
We have more than 250 supported commands,
lots of functionality,
and basically our customers are those that used before
Memcached or Redis, and now they're switching to Dragonfly
due to its better performance.
Well, we'll talk about this, but yeah.
Yep, yep, okay, so the big calling card is,
hey, it's, not everything,
but for the most part compatible with Redis,
but with significantly better performance.
So API compatible, but better performance,
which is interesting because everyone thinks of Redis
as this super fast thing.
And of course, it is compared to a disk-based solution.
Then you also hear, you've heard over the years of just like,
hey, Redis does some things well,
but there are some performance limitations it has.
So I'm like, I'm really excited to like dive
into some of those ones.
So I'd say like, maybe let's just talk through
some of the improvements in Dragonfly,
how you improved on Redis.
And I don't know if you want to start in a specific way.
I thought maybe starting with threading,
like the thing I always hear about and think about is,
hey, Redis is single threaded
and that works well for some things.
It doesn't work well for other things.
So maybe talk about the threading model
that you all use at Dragonfly,
how that's different than Redis.
Yeah, sure.
So before that, I can just kind of talk
about my personal experience with Redis.
And that's what pushed me to think about multi-threading.
So in my past, I was working at Google. By the way, there I co-designed an in-memory system
that powered the Google Suggest API,
the thing that provides real-time queries when
you type in Google Search.
That's something that we launched globally at Google
in 2010.
And there, I kind of liked the whole thing
of performance, scale, efficiency, and so on.
And it was an in-memory service, which was also close to our domain.
After Google, I was a founding engineer at a small startup called Ubimo.
And one of the co-founders of that startup is my current co-founder, Oded,
who is the CEO of DragonflyDB.
And I was one of the first engineers there.
And we needed to kind of learn about modern infrastructure
that existed back then.
It was 2015 and Redis quickly came up on our radar.
And you start looking up Redis on the internet, and you see that it's the most performant,
the best in-memory store. That's how it was marketed. And we tried it out. Now, the startup was focused on ad tech, real-time bidding,
programmatic interfaces.
So it handled lots of traffic, programmatic APIs,
et cetera, et cetera.
And we needed really high throughput, something
that reached millions of QPS internally.
And we tried out Redis,
and then to my surprise, I saw it's single-threaded.
So, okay, they had this cluster mode.
We started using this.
It was a pain in the ass
because it was much more complicated.
And we hoped that we were gonna solve our scaling issues with cluster mode,
but then we just noticed that we couldn't take snapshots because of the
high write throughput that our system had; the snapshotting didn't work.
And I remember then thinking, well, it
is possible to do it better.
It is possible to create a system that
will be multi-threaded, that will be able to use more CPU
power, and then all those operations
that you would expect from a memory store,
like snapshotting, would just work better
and be more robust.
It was a thought that sat in the back of my head, but I went on with my life.
I was an engineer at Ubimo
and didn't do anything about that.
But we had developed internal systems
which were multi-threaded in C++
that kinda replaced some of the functionality of Redis.
Then the startup was bought and I...
And just to interrupt you for a second,
you said you built some stuff inside Ubimo
that was multi-threaded.
So did you replace your Redis usage altogether
in some cases or how are you sort of thinking about that?
At some point, I felt that we built too much custom stuff and it's not how folks outside operate. So we kind of maybe focused on the wrong things inside the company.
So what I did, I tried to replace the most critical parts that we just couldn't scale
up with Redis. We had a cookie management system that needed to contain or hold billions of device
IDs slash cookies.
And it was too much memory, and we
needed to replace it, like snapshotting and stuff.
And it just didn't work with Redis at all,
so at least not with the budget we had.
So I designed and implemented a system that was just
optimized for this use case.
And it was orders of magnitude cheaper and simpler to operate.
Again, not something that could replace Redis because it wasn't a
general database or data store, but it served our needs. Yeah, absolutely.
Yeah, everything that I kind of tried to build there was multi-threaded
because I wasn't a DevOps guy. And production or operational complexity
was a big pain point for our team.
We were engineers.
So when we designed multi-threaded systems,
we didn't need to operate multiple nodes,
distributed nodes, and it was just simpler
to run in production.
Yep, yep.
One question I wanted to follow up.
So you mentioned snapshotting,
and I read about that a lot in like sort of your docs
and things and the way you've improved snapshotting.
And I guess like, where do you see people using
snapshotting with caches?
Cause I often think of caches as ephemeral
data, you know, like a read-aside cache type thing where you can always go back to your source of truth.
I guess, like, were you using this almost like more like a primary data store in some case,
just because of the requirements of it and because that you're using snapshotting? Or how are you
sort of folding snapshotting into your strategy there? Yeah. So when you think about Redis, you often
think about it as a caching system.
But I would generalize it to something else.
It's not just a caching system, it's a data store that
has its own right to exist for business use cases where data loss is not a business-critical
issue. So basically if you have like a single source of truth, which you can use to fill up your in-memory store,
and maybe it also serves as a caching system, but if it is lost, yeah, maybe it's a pain point, but your business survives it, then maybe it's a good use case for Redis.
I wouldn't want to see a bank using Redis for its transactions, right?
Yeah.
But there are many, many use cases that are beyond caching,
that people have
Redis for them, operate Redis.
Yep.
But then when you're taking those snaps,
when you're snapshotting your Redis instance,
I guess what were you doing with the snapshot?
Is that if an instance failed, you'd
use it just to more quickly recover from that failure?
Or was that maybe not a permanent store,
but like the only source of that particular data
that was a little more ephemeral,
like cookies or something like that.
I guess like, help me understand where
and why you're using snapshots.
Yeah, so basically we personally used it
for disaster recovery.
And yeah, there are several solutions to that.
Sometimes you can just use high availability
and sometimes you can use high availability
with snapshotting.
But basically we had, let's say,
a frequency-capping system:
for some device, we don't want to show ads
more than K times.
That's something that we managed there in our Redis cluster.
And for that, yeah, if it crashes,
probably the last three, four hours are going to be lost,
and we'll show more ads for that device.
But it's not business critical.
And by the way, availability is more important, because when we did not have this system, we basically didn't show any ads, and that's a loss of revenue. So we wanted to load this as quickly as possible. And again, Redis didn't do a great job with that as well.
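The frequency-capping pattern Roman describes can be sketched roughly like this. It's a minimal in-memory stand-in for the usual Redis INCR-plus-expiry approach; the class name, parameters, and figures below are illustrative, not from Dragonfly or the episode.

```python
import time

# Count ad impressions per device and stop serving after `cap` impressions
# within a time window. The dict stands in for Redis INCR + EXPIRE semantics.
class FrequencyCap:
    def __init__(self, cap, window_s):
        self.cap = cap
        self.window_s = window_s
        self.counts = {}  # device_id -> (count, window_expiry)

    def should_show_ad(self, device_id, now=None):
        now = time.monotonic() if now is None else now
        count, expiry = self.counts.get(device_id, (0, now + self.window_s))
        if now >= expiry:  # window elapsed: reset, like an expired Redis key
            count, expiry = 0, now + self.window_s
        if count >= self.cap:
            return False
        self.counts[device_id] = (count + 1, expiry)
        return True

cap = FrequencyCap(cap=3, window_s=3600)
shown = [cap.should_show_ad("device-42", now=0.0) for _ in range(5)]
print(shown)  # → [True, True, True, False, False]
```

As Roman notes, losing this state isn't business-critical: after a crash the counters reset and some devices briefly see extra ads, which is why a few hours of lost snapshot data is acceptable.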
Gotcha, gotcha.
What does that look like?
Like I say, I have a big, you know,
one of these bigger high, very high traffic caches
that you're talking about, maybe tens, hundreds of gigs.
How long does it take to recover a snapshot
from a new node if you have one that fails over,
that's that size?
Well, it really depends on your production environment. I think we stored a snapshot locally on disk,
and then we are subject to bandwidth limits
on that instance. So basically we didn't have
a high-performance EBS disk, and also Redis by itself has its own limitations.
And then you have multi-gigabyte snapshot and then you need to wait for it to be loaded
into memory.
In addition, of course, we did some backups on cloud storage.
But usually we designed our system that in case of crashes or something,
it would load the snapshots from local disk.
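The recovery math above can be sketched as a back-of-the-envelope calculation. The snapshot size and disk bandwidth figures are illustrative assumptions, not numbers from the episode.

```python
# Rough lower bound on recovering a snapshot from local disk:
# you must at least stream the whole file at the disk's bandwidth.
def load_time_seconds(snapshot_gib, disk_mb_per_s):
    """Time to read the snapshot off disk; ignores parsing and
    rebuilding in-memory structures, which add more on top."""
    snapshot_mb = snapshot_gib * 1024
    return snapshot_mb / disk_mb_per_s

# e.g. a 50 GiB snapshot on a gp2-class EBS volume (~250 MB/s) takes
# at least ~205 seconds just to read, before any rehashing into memory.
print(round(load_time_seconds(50, 250)))  # → 205
```

This is why Roman distinguishes local-disk recovery from cloud-storage backups: the bandwidth term dominates, and a slower remote read multiplies the downtime.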
Gotcha.
Okay. And then you said after Ubimo, you went and worked actually at AWS ElastiCache
for two years as a principal engineer there.
Tell me, I guess, like what you worked on there.
Like, did you try to get them to do some of this stuff
or make some changes or make their own?
Or like, how did you sort of think about that role?
Yeah, to tell you the truth, I reached out to them.
They opened a new team in Tel Aviv, in Israel, where I live.
And when I heard about it, I remembered my wishes
about how to improve Redis.
And I thought, OK, that's probably the perfect place
to do such things. And I wrote an email to a manager of the team
and told him, listen, I can just improve Redis so much.
I didn't have the slightest idea, by the way,
how I would do it.
But that was my selling email.
I just wanted to try this.
So I told him, I can improve Redis.
I can rebuild or fix so much of its problematic design points.
And they accepted me to the team. Back then I had
lots of experience, not just around Redis, but also around designing
memory systems, so it looked like a very good fit. And yes, in ElastiCache, I had the privilege to see how other customers use Redis clusters at scale,
both small ones and very big ones, and I could also suggest improvements and discuss problematic design decisions
that were made in Redis. And I did all of that.
Yep. Yep. But you're probably still like somewhat limited, right? They probably didn't want
to like make their own fork of Redis at that point. You know, now a couple of years later, they've of course got Valkey
and all that stuff. But at that point, they were probably fairly locked into Redis and not...
They did their enhanced I/O type things and a few different things, but not wanting to do
probably the wholesale changes that you're doing at Dragonfly.
Yeah, I had these discussions. I mean, it's a classic innovator's dilemma, I guess. And I had very open discussions with senior engineers on the team, with the management of the team about what we can do,
whether it's the right place for ElastiCache team
to start doing something new.
And I felt that there are good reasons why,
maybe it's not a good idea for the ElastiCache,
but at the end of the day,
I thought it's a classic innovator's dilemma
where basically they have their reasons
maybe not to innovate there,
but maybe the world, the engineering community
would benefit from another system
which could possibly be much better than Redis.
Yep, yep.
Okay, so then you go off and you start Dragonfly.
And just from like some of the benchmarks I've seen,
we're seeing 20, 25x improvements on sort of queries
per second and things like that over Redis.
I know there's a couple different things here,
the threading, the memory improvements,
a lot of different things.
I guess like, I know this is
hard to quantify, but is one of those changes doing the bulk of them, bulk of the improvements? Is it
all those things working together? I guess where's a good place to start in terms of understanding
the improvements that Dragonfly is making? Yeah, so let's put something out straight.
When we talk about 25 times better throughput,
it's when we compare single-process Redis versus single-process Dragonfly.
I think historically the objection to that in the Redis community was that horizontal scale solves those issues and we can discuss
why it doesn't really solve the scaling issues. But if you compare single process Redis with
Dragonfly, yeah, you can get even higher than 25 times more throughput on bigger machines.
Today we have machines with 128 CPUs,
even higher with really good networking devices.
Usually the bottleneck is gonna be the networking throughput,
but Dragonfly is not the bottleneck there.
So multi-threading is the reason why we can scale vertically
to the maximum limits of any machine.
So on weak hardware,
we'll probably get much smaller improvements,
but on very strong hardware,
ironically on AWS, you can get to really high throughputs.
Yep, yep.
Okay, and so some of the threading stuff,
I was looking through it and it looks like
you're in the Seastar community,
which I think of like,
is this similar to what ScyllaDB does for Cassandra
or what Redpanda does for Kafka?
And I think of like those sorts of projects,
is that like a similar type of thing going on here
with Dragonfly and Redis?
Yep, actually I learned about shared-nothing
by reading blog posts from Seastar.
That's how I discovered this architecture. I learned from reading the blog posts of
Avi Kivity and Glauber Costa, I think.
I hope I pronounce his name right.
He's the CEO of Turso right now.
Turso, we've had him on. He's a great guy. Yeah, for sure.
Yeah. They were writing blog posts about how
shared-nothing architecture helps them to scale their use case to
multi-core servers. And I tried Seastar when I
learned about coroutines slash fibers slash
asynchronicity; it was before I joined AWS.
I am a C++ guy, so it was close to my heart
also, but I kind of found it not very convenient for me to learn.
And also, I kind of learn when I implement or try to code things myself.
So this is how I came up with another project, which is called Helio, and this became something that is equivalent to what Seastar is for ScyllaDB
and Redpanda. But yeah, they're kind of very similar in their design goals.
Mine uses fibers and allows me to write code that kind of resembles maybe
Node.js asynchronous function calls, where you just
write your code synchronously, but underneath,
everything is asynchronous.
And their code looks like continuations.
I don't know if you know; it's an actor-based design
where you need to write, do this, then,
and you have this lambda that you chain,
and it just looked a bit complicated to my taste,
but they succeeded a lot, so.
Yep, for sure.
So you mentioned like even some of the difficulty
of picking that up.
And I know Salvatore from the Redis project
had talked about just how single threaded is simpler
and that's sort of why he did it.
I guess like, is that still the case?
You know, a couple of years here into Dragonfly,
where it's like, hey, this multi-threading, it's harder.
We have to be more careful and all that stuff.
And there are these huge performance gains
we can get from it and it's worth that,
but it is like hard.
Is it hard to find developers for that? Like, what are you finding a couple of years
into the project?
Yeah.
So first of all, he's right.
Single threading is much simpler.
Python is single threaded, Node.js is single threaded.
But for them, they have an excuse. Developers use Python and Node for building stateless applications.
They rely on something like Redis to offload their state.
So it's kind of like a bubble in a way. Like Redis doesn't have anything to rely on.
It's like the final destination.
It is the thing.
And stateful systems must be reliable and performant.
They are responsible for managing the state
of all those applications that the full-stack engineers build.
And I agree with Salvatore Sanfilippo. It is complicated to build multi-threaded applications,
but life is hard and we need to do this improvement and innovation.
And I think it's not a good enough excuse why a piece of infrastructure
is not as performant as it could be.
Yep. Yep. And especially like you just consider the
leverage we have on it. You know, it's hard for you. It's hard for your team.
You have to put a lot into that. But then so many people can benefit from that.
It's not like everyone has to sort of learn this multi-threading stuff. You know, again, like me as a Node developer,
I can farm it out to Dragonfly and not have to worry about that as much.
Yeah, but again, the innovator's dilemma. At the end, it becomes
a symbiotic relationship where you already have your ecosystem of users and companies using you. Now to disrupt yourself when you start from scratch, it's very, very hard. It's really
impossible even if you want to. You need to say no to potentially great business. You need to start from scratch. Your current users will
be disappointed because instead of deploying new features, you're going to be focused on
some crazy idea of rewriting this from scratch. So this is what innovators dilemma is about. And for that to work, you need to be an outsider,
unfortunately.
And as an outsider, I was a bit arrogant.
I frankly thought it's gonna be simpler.
But at the end of the day,
I'm happy with how it worked out. The original design still works.
We constantly improve Dragonfly and it delivers.
Yeah, it reminds me of that meme where it's like,
we did this not because it's easy, but because we thought it was going to be easy.
Yeah, that's probably right.
Exactly.
Yeah.
OK, so a little more on threading,
and this is maybe a dumb question
because I don't know that much about system and stuff.
But you talk about shared nothing on this single node
sort of split across these threads.
And when I think of shared nothing,
I think of databases like Dynamo or Vitesse
or something like that where it's like, hey,
they're splitting the key space across these different nodes. Is that what's happening here? Like, hey, if I have this,
you know, Dragonfly key space, the data is segmented across these different threads.
Is that what's going on there? Yeah, exactly. Each thread in the Dragonfly process is responsible for its own keys. And basically there is no contention when you need to
access.
If, for example, a client connection needs to access a
key, it can't just go and read the memory where this key is
located. It needs to communicate the intent to the thread that is responsible.
So basically, it's very much like a distributed system across multiple machines,
but collapsed into a single process.
And everything is done with messages. And you can ask, how come it is more efficient than just launching multiple processes, for example?
And it's a great question to ask. The thing is that there are some critical components inside the process that are implemented using shared memory,
like atomics, memory atomics,
and some lock-free algorithms
that make the whole thing much more efficient.
So basically it's much more efficient
to do this share nothing architecture within the process
than launching multiple processes on the same machine
or just using multiple processes on different machines.
So that avoids another level of inefficiency.
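The per-thread key ownership described here can be sketched as below. The hash choice and shard count are illustrative; this is not Dragonfly's actual routing code, just the shared-nothing idea of every key having exactly one owning thread.

```python
import hashlib

# Partition the keyspace across threads (shards): each key is owned by
# exactly one shard, so no locks are needed on the data itself -- a
# request for a key is sent as a message to the owning thread.
def shard_for_key(key, n_shards):
    """Deterministically map a key to the shard (thread) that owns it."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Every access to the same key is routed to the same shard.
owners = {k: shard_for_key(k, 8) for k in ["user:1", "user:2", "cart:9"]}
assert shard_for_key("user:1", 8) == owners["user:1"]  # stable routing
```

It's the same routing idea as a distributed cluster, but the "nodes" are threads in one process, which is why cheap shared-memory primitives can be used for the few components that do cross shards.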
Yep, yep, okay, okay, that's super interesting.
Okay, so that's the threading stuff.
I know that you've also done some improvements
just in the memory management itself with Dashtable.
I guess maybe give me the high level on Dashtable,
on how that's sort of improving
on Redis and Redis dictionaries.
I mean, it was kind of a random thing.
Everything is random, actually, in Dragonfly.
But multi-threading was my first goal, something that I explored.
I read lots of state-of-the-art papers from the research community on how to implement
all those operations in Redis. It was a research and development effort on my side.
And Dashtable, well, I myself like hash tables. I read about them and their
implementations, just like this data structure. And I stumbled upon a paper that talks about a dash table,
but in a different context.
They're talking about persistent memory.
If you remember, several years ago,
Intel tried to push a persistent memory design,
but it didn't really work out.
So there were a bunch of papers talking
about this and this hash table was persistent memory friendly in terms of
its design. But I saw something else. I saw that the same design can be
used with regular memory and have some advantages over the hash table that is being used in the Redis.
By the way, in Redis they use a very well-engineered hash table implementation,
but it's very simple. It's something that we learn in a computer science course at the university.
So it's very well engineered, but nothing sophisticated there.
There are lots of hash table designs.
I can probably talk about this for an hour.
But yeah, what I like in that hash table is that its resize operation is kind of elastic. In
Redis, when you perform a resize, it always multiplies by two, and so for
really large workloads on big machines it becomes an issue, because you need to resize a huge array,
which is just by itself a very heavy operation.
And then you need to do rehashing and stuff.
It was designed for modern hardware.
It could use a vectorized instructions
and just liked it and they saw that it fits Dragonfly. It fits in memory data store and just
borrowed. Yep, yep. And are the big benefits for that? Are they mostly around like, you know,
using less memory, having smaller memory size or is it also just faster or I guess like, what are the-
It's everything, it's everything.
So it's much faster.
I think for regular use cases,
you won't feel it so much because other stuff
like networking management, I/O,
and parsing will be more significant on your profiling radar.
But it really affects the p99, because the p99 is about those slow things that Redis does, and
suddenly they're much faster in Dragonfly.
In addition, you mentioned memory.
Yeah, it has an open-addressing scheme,
and the Redis hash table is a chained design
where you have lots of pointers.
And with pointers, you have more metadata,
and so you have more overhead in terms of memory, especially in case you store small items. So
there we are much more efficient as well. And by the way, it is kind of
interesting: when Dragonfly launched in 2022,
a week afterwards, suddenly in the Redis repo,
they added an issue, improve the Redis hash table design.
I don't know if it is a coincidence or not,
but that's what they added after we launched Dragonfly.
In Valkey, they are actually going to release,
I think in Valkey 8,
a bit more efficient data structure
than the original Redis hash table.
Yeah. Okay, very cool.
I do want to talk about Valkey later and some of this other stuff,
so I want to come back to that.
Just one or two more things around,
like, some of, I guess, the improvements,
or just things around that.
One thing I wanna ask about is the SDKs.
Right now, you just say, hey, use the Redis SDKs,
those are gonna work, or Memcached,
those are all just gonna work.
Do you think you'll ever write your own SDKs?
Is there some low hanging fruit
just in terms of connections or something elsewhere
if you had more control over it, it would be beneficial?
Or is it just like, hey, you know,
the Redis ones are good enough,
we don't need to mess around with that layer.
First, definitely.
Definitely there are performance improvements we could get
if we wrote our own client library.
So basically, Dragonfly benefits from using more connections as well as using pipelining.
With Redis, it's all about pipelining.
So you can just pipeline your request and this reduces the pressure on the networking stack of the server.
With Dragonfly, it's both. Because it's multi-threaded and asynchronous,
you can use multiple connections to access it, and it will just respond.
So basically, if you just use a single connection, even with pipelining,
Dragonfly is obliged to process each request on that connection sequentially. So this will introduce latency, just because Dragonfly can't jump and skip, because maybe the next request is
kind of dependent on the previous one.
So it would break atomicity guarantees.
But if a kind of customer or user would use multiple connections, they already know what
can be parallelized, what kind of workloads are independent, and they can use multiple connections,
then Dragonfly will be utilized much better.
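The single-connection-versus-many tradeoff here can be sketched with a simple latency model. The per-request service time and counts are illustrative assumptions, not measured numbers.

```python
import math

# On one connection, Dragonfly must answer pipelined requests in order;
# independent requests spread over several connections can be served by
# different threads in parallel. Idealized model: each connection is
# processed sequentially, connections run fully in parallel.
def completion_time_ms(n_requests, n_connections, per_request_ms=0.05):
    """Time to finish n independent requests under this idealized model."""
    per_connection = math.ceil(n_requests / n_connections)
    return per_connection * per_request_ms

# 10,000 independent requests: one pipelined connection vs 16 connections.
print(completion_time_ms(10_000, 1))   # → 500.0 (ms)
print(completion_time_ms(10_000, 16))  # → 31.25 (ms)
```

This is why the caller's knowledge matters: only the application knows which requests are independent and can safely be spread across connections without breaking ordering assumptions.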
That is usually a problem around Node.js clients,
whose design is around a single connection.
I mean, yeah, it could be improved, but we just saw that Node.js users still get lots of improvements from using Dragonfly, even with the classical client libraries.
And I myself and our team are not big experts in that area, so right now we just stick to what we know.
Yep, yep.
Do you see a lot of like Node.js people using Dragonfly,
or is it, you know, like, hey, bigger systems
that are, you know, running in C++ or Java
or something like that?
We see everything.
Like we see everything.
And unfortunately, they are not shy in bringing us use cases that succeed in breaking Dragonfly from time to time.
But yeah, we see Ruby, we see Python, Node.js, .NET, Java, of course, like everything basically.
Rust. Actually, I don't know if I saw Rust user, but.
Really?
OK, yep.
On that same note of the SDK, do you like the Redis API?
Or are you like, oh, man, I wish it were different?
Yeah.
So Redis API is about kind of your question,
actually, whether I like Redis.
Because it's kind of equivalent.
Yeah, I like its simplicity.
I like Redis as a product.
It's a system with 1,000 faces, basically, because everyone can find their own use case
with Redis.
And I like its versatility.
So yeah, it can be sometimes overwhelming
and to learn all those things.
And it really kind of requires experience and knowledge
to translate your flows into the building blocks of
Redis. And really, community folks did a great job with all those frameworks. There are lots of them
around Redis of how to translate job management,
state offloading, everything like this into Redis.
Really good. Yep.
On the same note of the SDKs and things like that,
I know you're highly compatible with Redis,
you have all the commands listed where you are.
For the ones that you haven't implemented yet, are you aiming for eventually 100% or are you just like, hey, these things
sort of don't make sense in this sort of world? The type of customers we have don't use those
things, maybe shouldn't use those things, there's a lack of interest. I guess, how do you think about
trying to be complete versus focusing on the things that matter for your customers? Yeah.
So we went just from common sense that older APIs are used more frequently.
So we started with those.
And our first milestone was to reach Redis 6.x compatibility.
And we did that around a year ago. Now we are going to
And we did that around a year ago. Now we are going to
implement all the 7.2 APIs just to reach compatibility with
the Redis slash Valkey fork. And then, kind of, it's going to be a good place to rest in terms of API compatibility.
Yeah, yeah. So given all these changes that we've talked about, are there differences in just modeling advice
or the way people should do things in Dragonfly?
I imagine it's mostly similar to Redis,
but are there certain things where either you had to worry
about this in Redis and now you don't?
Or, hey, maybe you used to do this in Redis
and it's not as good an idea to do in Dragonfly.
Are there any differences in sort of modeling
and approaches there?
It's a great question. I'm trying to think about a good answer. So first of all,
I'll tell you about immediate benefits that you have with Dragonfly.
Sometimes people start with a single-node
Redis, and they need
all the entities in the store together, just because they sometimes fetch them in different combinations or
permutations, like using MGETs, or because they do some set computations, so they need to intersect multiple sets.
So they need them all together, and it's very hard for them
to scale horizontally, for example.
Dragonfly allows to scale vertically
and to just forget about this restriction.
When you scale horizontally, you need
to collocate your data within hashtags, and only then can you do cross-hashtag operations.
So Dragonfly has an emulated cluster mode where it can
just work with cluster clients, but it also lifts all those restrictions
about cross-slot operations.
And everything is still atomic.
So basically you're working with a single-node Redis,
but you can scale it up 64 or 100 times,
whatever the size of your server is.
That's one thing.
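To make the cross-slot restriction concrete: in Redis Cluster, a key's slot is CRC16 of the key, or of its `{...}` hash tag if one is present, taken mod 16384, and multi-key commands only work when all keys land in the same slot. Here is a minimal Python sketch of that rule; the function names are mine, while the hash-tag and CRC16 details follow the Redis Cluster specification:

```python
def hash_tag(key: str) -> str:
    """Per the Redis Cluster spec: if the key contains a non-empty
    {...} section, only that part is hashed to pick the slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty
            return key[start + 1:end]
    return key

def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    return crc16_xmodem(hash_tag(key).encode()) % 16384

# Keys sharing a {user:42} tag land in the same slot, so a multi-key
# MGET or SINTER over them is legal in cluster mode.
print(key_slot("{user:42}:profile") == key_slot("{user:42}:sessions"))  # True
```

Dragonfly's emulated cluster mode accepts cluster clients but, as Roman says, lifts the same-slot requirement, so this key-naming gymnastics becomes optional.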
Another thing we mentioned briefly: Dragonfly likes multiple connections.
You don't need to push millions of requests into a pipeline;
that just increases your latency. You can parallelize it instead.
And that's it, I guess. There's not much more to it for Dragonfly.
Hope that answers your question.
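A sketch of the "many connections instead of one deep pipeline" advice, using a hypothetical client stand-in (with a real client such as redis-py, each worker thread would check out its own connection from a pool):

```python
from concurrent.futures import ThreadPoolExecutor

class FakeClient:
    """Hypothetical stand-in for a Redis/Dragonfly client; a real
    client would issue one network GET per call."""
    def get(self, key):
        return f"value:{key}"

def fetch_all(keys, workers=8):
    # Fan the requests out over several worker threads (i.e. several
    # connections) rather than queueing huge batches into one pipeline;
    # Dragonfly's per-thread shards can then serve them in parallel.
    client = FakeClient()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(client.get, keys))

print(fetch_all(["a", "b", "c"]))  # ['value:a', 'value:b', 'value:c']
```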
Yep, yep.
And one thing you said earlier is Redis,
I can't remember if you called it the tool of a thousand faces or something like that,
but there's so many different use cases for Redis.
And I know that you all work with very high-need users,
either lots of data, lots of TPS, things like that.
I guess, how does their usage of cache differ from
your sort of more standard, everyday, smaller Redis users?
Are they using just the simpler things, just SETs and GETs,
or are they still using pretty advanced stuff? I guess,
what does that look like for those high-scale users?
It's a good question. I would say that there is no correlation.
You can find a small company being really sophisticated around Redis, and they don't have to be sophisticated.
Well, for example, if they start using, let's say, the Sidekiq framework, it uses quite a few APIs.
So just by using this framework,
and usually what you see is that many companies
do not use Redis directly,
they use it through those frameworks,
and then it's just a variety of APIs.
If we take a bird's-eye view on kind of API usage of our
systems in production, I think SET and GET are going to be the dominant APIs. But
again, that goes for smaller companies and bigger ones.
It doesn't mean that the bigger companies only use SET and GET.
They can also have job management frameworks, different use cases, but yeah, SET and GET are
probably the most predominant ones.
Yeah.
Yeah.
Okay.
Let's talk a little bit about, I know in the news
the last year or so with Redis,
there's been the Redis licensing drama
and sort of all that stuff.
I guess for people that don't know,
around 2018, 2019 or so, Redis Inc. started,
they first had their modules
that they were licensing separately
and people couldn't use them.
And then last year they announced that new versions
of Redis would have a different license.
I guess, what do you think of that,
how does that impact you?
How do you all think about that at Dragonfly?
We kind of stayed out of this drama,
but we saw a positive effect.
We saw new users, community users, and also customers coming to us; people started looking for alternatives.
And maybe they hadn't heard about Dragonfly before.
So just because of this motion, we sometimes met new users and new customers.
Yeah, but I generally think it's a good thing for the community.
Even if it fragments it, it kind of creates competition.
So, like I mentioned before, when Dragonfly was launched,
a week after, they added an issue about improving their dictionary.
So suddenly you see that when there is a competing framework,
people just try harder.
You need to try hard when you know that there is competition in this field,
and you can see the results.
Valkey now has lots of performance improvements
and Redis has their own stuff going.
So yeah, it's good for the community.
Yeah. Yeah.
Speaking on that, that competition aspect,
it's interesting because I feel like, you know,
there was a 10-year period where it was all Redis
and only Redis, and that was the only thing.
In, like, 2009 it comes out, and I think until 2019, you didn't really see much else out there. But now in
the last couple of years, you're starting to see, I guess, more competition in the space
between you and KeyDB, which was acquired by Snap, and Valkey and Momento. I guess,
what do you think in the last five years has sort of, I guess,
triggered some of that interest in the space?
That's kind of what I learned on the ElastiCache team: there is a tremendous appetite for such a system, an in-memory store, because there are more and more use cases that require
real-time; people want responses in real-time. And there is a tremendous gap between the current state of the art
and what people really want.
And there is no challenge, nobody challenges the status quo.
And I think I need to give big kudos to the KeyDB team that started challenging the project by starting their own thing.
I disagree with their design decisions, but it worked out for them.
They got acquired by Snap and I hope Snap benefits from their system.
But I wanted to go bigger, me and my partner, actually.
And that's why we chose a different architecture.
And we decided to go all in and rewrite the whole system
from scratch.
And the thing is kind of strategic. I can
ask you, I ask myself: how do you see an in-memory system in 10 years? Do you
really want to see hundreds of those tiny single-process Redises spread
around your infrastructure stack? Or do you want to see software that can accommodate itself
to any modern hardware that we have?
I didn't see that as a viable possibility.
I was sure that someone would do something better, so I
just decided to be the one that tries to do this.
Yeah.
What do you think the competition will look like?
You know, now that Redis has its own thing,
Valkey is forking off that, and everyone is
Redis-compatible right now.
Do you think there will be competition
on the API surface and different features?
Or is it like, hey, Redis has the features
that it needs,
that's mostly what people need from a cache, these data structures,
and the competition is going to be more under the hood:
performance and pricing and ease-of-use type stuff?
I think in terms of APIs, it's very hard
to add something that is going to be really disruptive.
By the way, that's true for any database system.
Can you imagine any new feature for PostgreSQL that is going to be super disruptive?
I can't.
There will be improvements, of course.
But I think the way people consume those services,
the complexity or lack of it is going to be the decisive factor,
because then it affects the whole experience
of using a system.
And that's what we see with relational databases as well,
like with Athena, for example, serverless and stuff.
It's the same technology, but a different consumption model,
and suddenly it's super popular.
Yeah, yep.
On that same note, do you see much usage of the various,
like the Redis modules,
the more exotic API-type stuff,
whether it's search or bloom filters or things like that?
Do you see much usage of that?
Or is it like, hey, most people are just using Redis core?
I have an inherent bias because what I saw
was on the ElastiCache platform.
And there, they only had the core features.
I believe on-prem users have more usage of those,
and that's what pushes them to buy from Redis Enterprise
or Redis Labs, their cloud product.
But I think the majority of users just use,
like I said before, SET and GET.
So maybe some percentage uses more sophisticated APIs,
but basically the first 100 APIs
are like 99% of the whole usage.
Yeah, yeah.
Okay, changing a little bit
to building an infrastructure company.
I guess like, what is it like when you're sort of competing
with one of your main providers, right?
If you're running on AWS or GCP, I guess,
what is that sort of interaction like?
I mean, to tell you the truth,
I don't feel any pressure there.
So basically we just operate to the best of our capacity.
Of course, maybe in terms of business motion, the sales teams at those cloud providers
prefer their internal products. That's probably true for any vendor that tries to challenge the big hyperscalers.
But that's kind of our life, and that's okay.
Yep, yep. What about, you know, one thing I,
when I talk to infra, you know, startups,
depending on their flavor,
I hear a lot of complaints about AWS networking costs.
I guess like, is that a big factor for you?
Is that, does that come up a lot for customers
or for you internally?
I guess, how do you think about those?
Yeah, yeah.
I mean, yeah, it's their durable advantage.
And they put up those barriers where we can't just, you know,
provide our service to a third-party customer
in a frictionless way without paying them a tax.
For example, if you want a VPC endpoint,
you need to set it up and then you need to pay for bandwidth.
And there are also customers
outside of the cloud, like on-prem, that want to consume Dragonfly, and then they need to pay for egress,
and sometimes those egress costs really become significant compared to the data store costs.
It's a problem. It's a problem. Yeah. Yeah. Yeah, for sure.
What about building an infra company?
I guess, how do you find developer attention?
Or how do you get your name out there?
What are you finding effective in terms of the marketing
aspect of just getting Dragonfly known?
I mean, you know how we understood
that our project really brings value?
We are clueless in marketing and still people came to us.
We really had zero knowledge, didn't know how to market ourselves, and suddenly, let's say in 48 hours, we had almost,
I don't know, 4,000 stars on GitHub once we launched the project,
and then it grew quite nicely.
And then we had production workloads the same year; several
months after we launched, we already had community users running Dragonfly in production.
So basically, I can't give any advice about this.
It's still a miracle for me.
I don't really understand how this happens.
Maybe just because we really created a project that
provides value for our users.
And that's the secret.
Yep, it sells itself.
And these are people experiencing some kind of pain,
so they're looking for options in this particular area.
In terms of Dragonfly, you mentioned the open source stuff.
And Dragonfly's license is sort of like a Business Source
License.
Is that right?
Like, I can run it myself, but I can't offer it as a service without working with you.
Is that the gist of it?
Yeah, I mean, we just tried to avoid all the past mistakes that other companies made. We wanted folks to feel safe with our license,
and actually some of the open source licenses
are too restrictive for companies to run these workloads.
So we chose BSL, which is our promise
that each version will be converted to the Apache license
four years in the future.
So we move this date with each significant version that we release,
but basically it just gives us four years of advantage over our competitors.
Yep, absolutely.
Okay, cool.
Okay, last question for you, maybe a little bit wacky, but you talked earlier
about some of the difficulties around writing multi-threaded code and learning that
stuff.
And I'm just thinking, in terms of the AI revolution that's happening
in code right now,
I write a lot of web apps,
and there's a million examples of React components
or simple backend APIs, so they can do that pretty easily.
I guess, are you and your team able to use
any of these AI tools to help
with your coding process at all,
or is it so specialized and narrow
that you have to be a little more careful
and write it the old way, by hand?
I use AI tools every day,
but usually it's for helper scripts,
stuff that would take me like two hours to write,
I just do in minutes.
But it can't write C++ code. Good luck with
that.
That's what I figured. I'm like, man, which is probably good for you. Good job security
and long-term prospects. That one's going to be a lot harder. But like me, I'm building
web apps. The robots are going to be building all those next year,
I feel like.
So we've got to figure out what that looks like.
I mean, I use Copilot.
And sometimes I have those wow moments
where it just fills up like 10 lines of C++ code.
And it looks really, really good.
But then I run it and yeah, okay.
I have a few more years as a C++ developer
without this system challenging me.
Yeah, yeah, for sure.
Okay, cool.
Well, Roman, thanks for coming on.
This is great.
I've been watching Dragonfly from afar and I love,
I just love people pushing the performance envelope on things and it's cool what you all are doing.
And not only what you're doing, but sharing it back.
You all have a great blog. You wrote some great blog posts and all that stuff.
So I learned a lot there.
If people want to learn more about Dragonfly or about you, I guess, where should they reach out?
Where should they go?
Discord, our site's Talk to Us, our GitHub.
I'm very responsive on all those platforms.
Yeah, and thanks for having me.
And it was a pleasure talking to you, Alex.
Thank you.
Awesome.
Thanks for coming on.
We'll put links in the show notes, but yeah, thanks for coming on and sharing this stuff,
and best of luck to you at Dragonfly going forward.
Thank you.
Thank you.
Bye.