Software Huddle - Redis but Faster With Roman Gershman
Episode Date: March 4, 2025. Redis is consistently one of the most beloved pieces of infrastructure for developers. And in the last few years, we've seen a number of new Redis-compatible projects that aim to improve on the core of Redis in some way. One of those projects is DragonflyDB, a multi-threaded version of Redis that allows for significantly higher throughput on a single instance. Roman Gershman is the co-founder and CTO at Dragonfly, and he has a fascinating background. Roman initially worked at Google and then was a frustrated user of Redis while working as an engineer at a fast-growing startup. He did a stint on the ElastiCache team at AWS but struck off on his own to make a new, faster version of Redis. In this episode, we talk through the improvements that Dragonfly makes to Redis and why it matters to high-scale users. We go through the different needs and requirements of high-scale cache applications and what Roman learned at AWS. We also go through the Redis licensing drama and how to attract developer attention in 2025.
Transcript
Do you see a lot of Node.js people using Dragonfly or is it like, hey, bigger systems that are
running in C++ or Java or something like that? We see everything. We see everything. They are
not shy in bringing us use cases that succeed in breaking Dragonfly from time to time.
How do you find developer attention or how do you get your name out there? What are you finding
effective in terms of the marketing aspect of just getting
you know, getting Dragonfly known?
I mean, you know how we understood that our project really brings value?
We are clueless in marketing and still people came to us.
We did this not because it's easy,
but because we thought it was gonna be easy.
Hey folks, this is Alex and we're back.
Got a new episode with Roman Gershman here.
And this is like one of my favorite types of episodes
because Roman is the co-founder and CTO of Dragonfly,
which is a, just like a faster version of Redis,
like basically API compatible with Redis,
but faster and just improved in a lot of different ways.
And we just go deep on all that sort of stuff, the threading improvements, the memory improvements,
sort of his path there, like what he saw and why he got interested in this problem.
He used AWS ElastiCache before that at a big startup that was using a ton of Redis cache
and just having issues there.
So really cool story there.
At the back end we talked about some business stuff,
so check it out.
As always, if you want some other guests,
if you have any questions for us,
feel free to reach out to me or Sean.
With that, let's get to the show.
Roman, welcome to the show.
Thanks for having me.
Yeah, absolutely.
Well, you're like one of my favorite types of guests to have
because you're just one of these people
that's like on a different level than me, like thinking about systems and performance and
all sorts of that stuff. So I'm super excited to dig into that today. You are the CTO and co-founder
at Dragonfly. Maybe for those that don't know you or don't know Dragonfly, can you give us some
background on what Dragonfly is and what sort of your background is? Yeah, sure. So DragonflyDB, Dragonfly is an open-source project
hosted on GitHub, which is a drop-in replacement
for Redis and Memcached.
It has compatible APIs.
We have more than 250 supported commands,
lots of functionality,
and basically our customers are those that used before
Memcached or Redis, and now they're switching to Dragonfly
due to its better performance.
Well, we'll talk about this, but yeah.
Yep, yep, okay, so the big calling card is,
hey, it's, not everything,
but for the most part compatible with Redis,
but with significantly better performance.
So API compatible, but better performance,
which is interesting because everyone thinks of Redis
as this super fast thing.
And of course, it is compared to a disk-based solution.
Then you also hear, you've heard over the years of just like,
hey, Redis does some things well,
but there are some performance limitations it has.
So I'm like, I'm really excited to like dive
into some of those ones.
So I'd say like, maybe let's just talk through
some of the improvements in Dragonfly,
how you improved on Redis.
And I don't know if you want to start in a specific way.
I thought maybe starting with threading,
like the thing I always hear about and think about is,
hey, Redis is single threaded
and that works well for some things.
It doesn't work well for other things.
So maybe talk about the threading model
that you all use at Dragonfly,
how that's different than Redis.
Yeah, sure.
So before that, I can just kind of talk
about my personal experience with Redis.
And that's what pushed me to think about multi-threading.
So in my past, I was working at Google. By the way, there I co-designed an in-memory system
that powered the Google Suggest API,
the thing that provides real-time queries when
you type in Google Search.
That's something that we launched globally at Google
in 2010.
And there, I kind of liked the whole thing
of performance, scale, efficiency, and so on.
And it was an in-memory service, which was also close to our domain.
After Google, I was a founding engineer at a small startup called Ubimo.
And one of the co-founders of that startup is my current co-founder, Oded,
who is the CEO of DragonflyDB.
And I was one of the first engineers there.
And we needed to kind of learn about modern infrastructure
that existed back then.
It was 2015 and Redis quickly came up on our radar.
And you start looking up Redis on the internet, and you see that it's the most performant,
the best in-memory store. That's how it was marketed. And we tried it out. Now, the startup was focused on ad tech, real-time bidding,
programmatic interfaces.
So it handled lots of traffic, programmatic APIs,
et cetera, et cetera.
And we needed really high throughput, something
that reached millions of QPS internally.
And we tried out Redis,
and then to my surprise, I saw it's single-threaded.
So, okay, they had this cluster mode.
We started using this.
It was a pain in the ass
because it was much more complicated.
And we hoped that we were gonna solve our scaling issues with cluster mode,
but then we just noticed that we couldn't take snapshots because of the
high write throughput that our system had; the snapshotting didn't work.
And I remember then thinking, well, it
is possible to do it better.
It is possible to create a system that
will be multi-threaded, that will be able to use more CPU
power, and then all those operations
that you would expect from a memory store,
like snapshotting, would just work better
and be more robust.
It was a thought that sat in the back of my head, but I went on with my life.
I was an engineer at Ubimo
and didn't do anything about that.
But we had developed internal systems
which were multi-threaded in C++
that kinda replaced some of the functionality of Redis.
Then the startup was bought and I...
And just to interrupt you for a second,
you said you built some stuff inside Ubimo
that was multi-threaded.
So did you replace your Redis usage altogether
in some cases or how are you sort of thinking about that?
At some point, I felt that we built too much custom stuff and it's not how folks outside operate. So we kind of maybe focused on the wrong things inside the company.
So what I did, I tried to replace the most critical parts that we just couldn't scale
up with Redis. We had a cookie management system that needed to contain or hold billions of device
IDs slash cookies.
And it was too much memory, and we
needed to replace it, like snapshotting and stuff.
And it just didn't work with Redis at all,
so at least not with the budget we had.
So I designed and implemented a system that was just
optimized for this use case.
And it was orders of magnitude cheaper and simpler to operate.
Again, not something that could replace Redis because it wasn't a
general database or data store, but it served our needs. Yeah, absolutely.
Yeah, everything that I kind of tried to build there was multi-threaded
because I wasn't a DevOps guy. And production or operational complexity
was a big pain point for our team.
We were engineers.
So when we designed multi-threaded systems,
we didn't need to operate multiple nodes,
distributed nodes, and it was just simpler
to run in production.
Yep, yep.
One question I wanted to follow up.
So you mentioned snapshotting,
and I read about that a lot in like sort of your docs
and things and the way you've improved snapshotting.
And I guess like, where do you see people using
snapshotting with caches?
Cause I often think of caches as ephemeral
data, you know, like a read-aside cache type thing where you can always go back to your source of truth.
I guess, like, were you using this almost like more like a primary data store in some case,
just because of the requirements of it and because that you're using snapshotting? Or how are you
sort of folding snapshotting into your strategy there? Yeah. So when you think about Redis, you often
think about it as a caching system.
But I would generalize it to something else.
It's not just a caching system, it's a data store that
has its own right to exist for business use cases where data loss is not a business-critical
issue. So basically if you have like a single source of truth, which you can use to fill up your in-memory store,
and maybe it also serves as a caching system, but if it is lost, yeah, maybe it's a pain point, but your business survives it, then maybe it's a good use case for Redis.
I wouldn't want to see a bank using Redis for its transactions, right?
Yeah.
But there are many, many use cases that are beyond caching,
that people have
Redis for them, operate Redis.
Yep.
But then when you're taking those snaps,
when you're snapshotting your Redis instance,
I guess what were you doing with the snapshot?
Is that if an instance failed, you'd
use it just to more quickly recover from that failure?
Or was that maybe not a permanent store,
but like the only source of that particular data
that was a little more ephemeral,
like cookies or something like that.
I guess like, help me understand where
and why you're using snapshots.
Yeah, so basically we personally used it
for disaster recovery.
And yeah, there are several solutions to that.
Sometimes you can just use high availability
and sometimes you can use high availability
with snapshotting.
But basically we had, let's say,
a frequency-capping system:
for some device, we don't want to show ads
more than K times.
That's something that we managed there in our Redis cluster.
And for that, yeah, if it crashes,
probably the last three, four hours are going to be lost,
and we'll show more ads for that device.
But it's not business critical.
And by the way, availability is more important, because when we did not have this system, we basically didn't show any ads, and that's a loss of revenue. So we wanted to load this as quickly as possible. And again, Redis didn't do a great job with that as well.
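The frequency-capping pattern Roman describes can be sketched roughly like this. It's a minimal in-memory stand-in for the usual Redis INCR-plus-expiry approach; the class name, parameters, and figures below are illustrative, not from Dragonfly or the episode.

```python
import time

# Count ad impressions per device and stop serving after `cap` impressions
# within a time window. The dict stands in for Redis INCR + EXPIRE semantics.
class FrequencyCap:
    def __init__(self, cap, window_s):
        self.cap = cap
        self.window_s = window_s
        self.counts = {}  # device_id -> (count, window_expiry)

    def should_show_ad(self, device_id, now=None):
        now = time.monotonic() if now is None else now
        count, expiry = self.counts.get(device_id, (0, now + self.window_s))
        if now >= expiry:  # window elapsed: reset, like an expired Redis key
            count, expiry = 0, now + self.window_s
        if count >= self.cap:
            return False
        self.counts[device_id] = (count + 1, expiry)
        return True

cap = FrequencyCap(cap=3, window_s=3600)
shown = [cap.should_show_ad("device-42", now=0.0) for _ in range(5)]
print(shown)  # → [True, True, True, False, False]
```

As Roman notes, losing this state isn't business-critical: after a crash the counters reset and some devices briefly see extra ads, which is why a few hours of lost snapshot data is acceptable.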
Gotcha, gotcha.
What does that look like?
Like I say, I have a big, you know,
one of these bigger high, very high traffic caches
that you're talking about, maybe tens, hundreds of gigs.
How long does it take to recover a snapshot
from a new node if you have one that fails over,
that's that size?
Well, it really depends on your production environment. I think we stored a snapshot locally on disk,
and then we are subject to bandwidth limits
on that instance. So basically we didn't have
a high-performance EBS disk, and also Redis by itself has its own limitations.
And then you have multi-gigabyte snapshot and then you need to wait for it to be loaded
into memory.
In addition, of course, we did some backups on cloud storage.
But usually we designed our system that in case of crashes or something,
it would load the snapshots from local disk.
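The recovery math above can be sketched as a back-of-the-envelope calculation. The snapshot size and disk bandwidth figures are illustrative assumptions, not numbers from the episode.

```python
# Rough lower bound on recovering a snapshot from local disk:
# you must at least stream the whole file at the disk's bandwidth.
def load_time_seconds(snapshot_gib, disk_mb_per_s):
    """Time to read the snapshot off disk; ignores parsing and
    rebuilding in-memory structures, which add more on top."""
    snapshot_mb = snapshot_gib * 1024
    return snapshot_mb / disk_mb_per_s

# e.g. a 50 GiB snapshot on a gp2-class EBS volume (~250 MB/s) takes
# at least ~205 seconds just to read, before any rehashing into memory.
print(round(load_time_seconds(50, 250)))  # → 205
```

This is why Roman distinguishes local-disk recovery from cloud-storage backups: the bandwidth term dominates, and a slower remote read multiplies the downtime.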
Gotcha.
Okay. And then you said after Ubimo, you went and worked actually at AWS ElastiCache
for two years as a principal engineer there.
Tell me, I guess, like what you worked on there.
Like, did you try to get them to do some of this stuff
or make some changes or make their own?
Or like, how did you sort of think about that role?
Yeah, to tell you the truth, I reached out to them.
They opened a new team in Tel Aviv, in Israel, where I live.
And when I heard about it, I remembered my wishes
about how to improve Redis.
And I thought, OK, that's probably the perfect place
to do such things. And I wrote an email to a manager of the team
and told him, listen, I can just improve Redis so much.
I didn't have the slightest idea, by the way,
how I would do it.
But that was my selling email.
I just wanted to try this.
So I told him, I can improve Redis.
I can rebuild or fix so much of its problematic design points.
And they accepted me to the team. Back then I had
lots of experience, not just around Redis, but also around designing
memory systems, so it looked like a very good fit. And yes, in ElastiCache, I had the privilege to see how other customers use Redis clusters at scale,
both small ones and very big ones, and I could also suggest improvements and discuss problematic design decisions
that were made in Redis. And I did all of that.
Yep. Yep. But you're probably still like somewhat limited, right? They probably didn't want
to like make their own fork of Redis at that point. You know, now a couple of years later, they've of course got Valkey
and all that stuff. But at that point, they were probably fairly locked into Redis and not...
They did their enhanced I/O type things and a few different things, but not wanting to do
probably the wholesale changes that you're doing at Dragonfly.
Yeah, I had these discussions. I mean, it's a classic innovator's dilemma, I guess. And I had very open discussions with senior engineers on the team, with the management of the team about what we can do,
whether it's the right place for ElastiCache team
to start doing something new.
And I felt that there are good reasons why,
maybe it's not a good idea for the ElastiCache,
but at the end of the day,
I thought it's a classic innovator's dilemma
where basically they have their reasons
maybe not to innovate there,
but maybe the world, the engineering community
would benefit from another system
which could possibly be much better than Redis.
Yep, yep.
Okay, so then you go off and you start Dragonfly.
And just from like some of the benchmarks I've seen,
we're seeing 20, 25x improvements on sort of queries
per second and things like that over Redis.
I know there's a couple different things here,
the threading, the memory improvements,
a lot of different things.
I guess like, I know this is
hard to quantify, but is one of those changes doing the bulk of them, bulk of the improvements? Is it
all those things working together? I guess where's a good place to start in terms of understanding
the improvements that Dragonfly is making? Yeah, so let's put something out straight.
When we talk about 25 times better throughput,
it's when we compare single-process Redis versus single-process Dragonfly.
I think historically the objection to that in the Redis community was that horizontal scale solves those issues and we can discuss
why it doesn't really solve the scaling issues. But if you compare single process Redis with
Dragonfly, yeah, you can get even higher than 25 times more throughput on bigger machines.
Today we have machines with 128 CPUs,
even higher with really good networking devices.
Usually the bottleneck is gonna be the networking throughput,
but Dragonfly is not the bottleneck there.
So multi-threading is the reason why we can scale vertically
to the maximum limits of any machine.
So on weak hardware,
we'll probably get much smaller improvements,
but on very strong hardware,
ironically on AWS, you can get to really high throughputs.
Yep, yep.
Okay, and so some of the threading stuff,
I was looking through it and it looks like
you're in the Seastar community,
which I think of like,
is this similar to what ScyllaDB does for Cassandra
or what Redpanda does for Kafka?
And I think of like those sorts of projects,
is that like a similar type of thing going on here
with Dragonfly and Redis?
Yep, actually I learned about shared-nothing
by reading blog posts from Seastar.
That's how I discovered this architecture. I learned from reading the blog posts of
Avi Kivity and Glauber Costa, I think.
I hope I pronounce his name right.
He's the CEO of Turso right now.
Turso, we've had him on. He's a great guy. Yeah, for sure.
Yeah. They were writing blog posts about how
shared-nothing architecture helps them to scale their use case to
multi-core servers. And I tried Seastar when I
learned about coroutines slash fibers slash
asynchronicity; it was before I joined AWS.
I am a C++ guy, so it was close to my heart
also, but I kind of found it not very convenient for me to learn.
And also, I kind of learn when I implement or try to code things myself.
So this is how I came up with another project, which is called Helio, and this became something that is equivalent to what Seastar is for ScyllaDB
and Redpanda. But yeah, they're kind of very similar in their design goals.
Mine uses fibers and allows me to write code that kind of resembles maybe
Node.js asynchronous function calls, where you just
write your code synchronously, but underneath,
everything is asynchronous.
And their code looks like continuations.
I don't know if you know; it's an actor-based design
where you need to write, do this, then,
and you have this lambda that you chain,
and it just looked a bit complicated to my taste,
but they succeeded a lot, so.
Yep, for sure.
So you mentioned like even some of the difficulty
of picking that up.
And I know Salvatore from the Redis project
had talked about just how single threaded is simpler
and that's sort of why he did it.
I guess like, is that still the case?
You know, a couple of years here into Dragonfly,
where it's like, hey, this multi-threading, it's harder.
We have to be more careful and all that stuff.
And there are these huge performance gains
we can get from it and it's worth that,
but it is like hard.
Is it hard to find developers for that? Like, what are you finding a couple of years
into the project?
Yeah.
So first of all, he's right.
Single threading is much simpler.
Python is single threaded, Node.js is single threaded.
But for them, they have an excuse. Developers use Python and Node for building stateless applications.
They rely on something like Redis to offload their state.
So it's kind of like a bubble in a way. Like Redis doesn't have anything to rely on.
It's like the final destination.
It is the thing.
And stateful systems must be reliable and performant.
They are responsible for managing the state
of all those applications that the full-stack engineers build.
And I agree with Salvatore Sanfilippo. It is complicated to build multi-threaded applications,
but life is hard and we need to do this improvement and innovation.
And I think it's not a good enough excuse why a piece of infrastructure
is not as performant as it could be.
Yep. Yep. And especially like you just consider the
leverage we have on it. You know, it's hard for you. It's hard for your team.
You have to put a lot into that. But then so many people can benefit from that.
It's not like everyone has to sort of learn this multi-threading stuff. You know, again, like me as a Node developer,
I can farm it out to Dragonfly and not have to worry about that as much.
Yeah, but again, the innovator's dilemma. At the end, it becomes
a symbiotic relationship where you already have your ecosystem of users and companies using you. Now to disrupt yourself when you start from scratch, it's very, very hard. It's really
impossible even if you want to. You need to say no to potentially great business. You need to start from scratch. Your current users will
be disappointed because instead of deploying new features, you're going to be focused on
some crazy idea of rewriting this from scratch. So this is what innovators dilemma is about. And for that to work, you need to be an outsider,
unfortunately.
And as an outsider, I was a bit arrogant.
I frankly thought it's gonna be simpler.
But at the end of the day,
I'm happy with how it worked out. The original design still works.
We constantly improve Dragonfly and it delivers.
Yeah, it reminds me of that meme where it's like,
we did this not because it's easy, but because we thought it was going to be easy.
Yeah, that's probably right.
Exactly.
Yeah.
OK, so a little more on threading,
and this is maybe a dumb question
because I don't know that much about system and stuff.
But you talk about shared nothing on this single node
sort of split across these threads.
And when I think of shared nothing,
I think of databases like Dynamo or Vitesse
or something like that where it's like, hey,
they're splitting the key space across these different nodes. Is that what's happening here? Like, hey, if I have this,
you know, Dragonfly key space, the data is segmented across these different threads.
Is that what's going on there? Yeah, exactly. Each thread in the Dragonfly process is responsible for its own keys. And basically there is no contention when you need to
access.
If, for example, a client connection needs to access a
key, it can't just go and read the memory where this key is
located. It needs to communicate the intent to the thread that is responsible.
So basically, it's very much like a distributed system across multiple machines,
but collapsed into a single process.
And everything is done with messages. And you can ask, how come it is more efficient than just launching multiple processes, for example?
And it's a great question to ask. The thing is that there are some critical components inside the process that are implemented using shared memory,
like atomics, memory atomics,
and some lock-free algorithms
that make the whole thing much more efficient.
So basically it's much more efficient
to do this share nothing architecture within the process
than launching multiple processes on the same machine
or just using multiple processes on different machines.
So that avoids another level of inefficiency.
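The per-thread key ownership described here can be sketched as below. The hash choice and shard count are illustrative; this is not Dragonfly's actual routing code, just the shared-nothing idea of every key having exactly one owning thread.

```python
import hashlib

# Partition the keyspace across threads (shards): each key is owned by
# exactly one shard, so no locks are needed on the data itself -- a
# request for a key is sent as a message to the owning thread.
def shard_for_key(key, n_shards):
    """Deterministically map a key to the shard (thread) that owns it."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Every access to the same key is routed to the same shard.
owners = {k: shard_for_key(k, 8) for k in ["user:1", "user:2", "cart:9"]}
assert shard_for_key("user:1", 8) == owners["user:1"]  # stable routing
```

It's the same routing idea as a distributed cluster, but the "nodes" are threads in one process, which is why cheap shared-memory primitives can be used for the few components that do cross shards.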
Yep, yep, okay, okay, that's super interesting.
Okay, so that's the threading stuff.
I know that you've also done some improvements
just in the memory management itself with Dashtable.
I guess maybe give me the high level on Dashtable,
on how that's sort of improving
on Redis and Redis dictionaries.
I mean, it was kind of a random thing.
Everything is random, actually, in Dragonfly.
But multi-threading was my first goal, something that I explored.
I read lots of state-of-the-art papers from the research community on how to implement
all those operations in Redis. It was a research and development effort on my side.
And Dashtable, well, I myself like hash tables. I read about them and their
implementations, just like this data structure. And I stumbled upon a paper that talks about a dash table,
but in a different context.
They're talking about persistent memory.
If you remember, several years ago,
Intel tried to push a persistent memory design,
but it didn't really work out.
So there were a bunch of papers talking
about this and this hash table was persistent memory friendly in terms of
its design. But I saw something else. I saw that the same design can be
used with regular memory and have some advantages over the hash table that is being used in the Redis.
By the way, in Redis they use a very well-engineered hash table implementation,
but it's very simple. It's something that we learn in a computer science course at the university.
So it's very well engineered, but nothing sophisticated there.
There are lots of hash table designs.
I can probably talk about this for an hour.
But yeah, what I like in that hash table is that its resize operation is kind of elastic. In
Redis, when you perform a resize, it always multiplies by two, and so for
really large workloads on big machines it becomes an issue, because you need to resize a huge array,
which is just by itself a very heavy operation.
And then you need to do rehashing and stuff.
It was designed for modern hardware.
It could use a vectorized instructions
and just liked it and they saw that it fits Dragonfly. It fits in memory data store and just
borrowed. Yep, yep. And are the big benefits for that? Are they mostly around like, you know,
using less memory, having smaller memory size or is it also just faster or I guess like, what are the-
It's everything, it's everything.
So it's much faster.
I think for regular use cases,
you won't feel it so much because other stuff
like networking management, I/O,
and parsing will be more significant on your profiling radar.
But it really affects the p99, because the p99 is about those slow things that Redis does, and
suddenly they're much faster in Dragonfly.
In addition, you mentioned memory.
Yeah, it has an open-addressing scheme,
and the Redis hash table is a chained design
where you have lots of pointers.
And with pointers, you have more metadata,
and so you have more overhead in terms of memory, especially in case you store small items. So
there we are much more efficient as well. And by the way, it is kind of
interesting: when Dragonfly launched in 2022,
a week afterwards, suddenly in the Redis repo,
they added an issue, improve the Redis hash table design.
I don't know if it is a coincidence or not,
but that's what they added after we launched Dragonfly.
In Valkey, they are actually going to release,
I think in Valkey 8,
a bit more efficient data structure
than the original Redis hash table.
Yeah. Okay, very cool.
I do want to talk about Valkey later and some of this other stuff,
so I want to come back to that.
Just one or two more things around,
like, some of, I guess, the improvements,
or just things around that.
One thing I wanna ask about is the SDKs.
Right now, you just say, hey, use the Redis SDKs,
those are gonna work, or Memcached,
those are all just gonna work.
Do you think you'll ever write your own SDKs?
Is there some low hanging fruit
just in terms of connections or something elsewhere
if you had more control over it, it would be beneficial?
Or is it just like, hey, you know,
the Redis ones are good enough,
we don't need to mess around with that layer.
First, definitely.
Definitely there are performance improvements we could get
if we wrote our own client library.
So basically, Dragonfly benefits from using more connections as well as using pipelining.
With Redis, it's all about pipelining.
So you can just pipeline your request and this reduces the pressure on the networking stack of the server.
With Dragonfly, it's both. Because it's multi-threaded and asynchronous,
you can use multiple connections to access it, and it will just respond.
So basically, if you just use a single connection, even with pipelining,
Dragonfly is obliged to process each request on that connection sequentially. So this will introduce latency, just because Dragonfly can't jump and skip, because maybe the next request is
kind of dependent on the previous one.
So it would break atomicity guarantees.
But if a kind of customer or user would use multiple connections, they already know what
can be parallelized, what kind of workloads are independent, and they can use multiple connections,
then Dragonfly will be utilized much better.
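The single-connection-versus-many tradeoff here can be sketched with a simple latency model. The per-request service time and counts are illustrative assumptions, not measured numbers.

```python
import math

# On one connection, Dragonfly must answer pipelined requests in order;
# independent requests spread over several connections can be served by
# different threads in parallel. Idealized model: each connection is
# processed sequentially, connections run fully in parallel.
def completion_time_ms(n_requests, n_connections, per_request_ms=0.05):
    """Time to finish n independent requests under this idealized model."""
    per_connection = math.ceil(n_requests / n_connections)
    return per_connection * per_request_ms

# 10,000 independent requests: one pipelined connection vs 16 connections.
print(completion_time_ms(10_000, 1))   # → 500.0 (ms)
print(completion_time_ms(10_000, 16))  # → 31.25 (ms)
```

This is why the caller's knowledge matters: only the application knows which requests are independent and can safely be spread across connections without breaking ordering assumptions.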
That is usually a problem around Node.js clients,
whose design is around a single connection.
I mean, yeah, it could be improved, but we just saw that Node.js users still get lots of improvements from using Dragonfly, even with the classical client libraries.
And I myself and our team are not big experts in that area, so right now we just stick to what we know.
Yep, yep.
Do you see a lot of like Node.js people using Dragonfly,
or is it, you know, like, hey, bigger systems
that are, you know, running in C++ or Java
or something like that?
We see everything.
Like we see everything.
And unfortunately, they are not shy in bringing us use cases that succeed in breaking Dragonfly from time to time.
But yeah, we see Ruby, we see Python, Node.js, .NET, Java, of course, like everything basically.
Rust. Actually, I don't know if I saw Rust user, but.
Really?
OK, yep.
On that same note of the SDK, do you like the Redis API?
Or are you like, oh, man, I wish it were different?
Yeah.
So Redis API is about kind of your question,
actually, whether I like Redis.
Because it's kind of equivalent.
Yeah, I like its simplicity.
I like Redis as a product.
It's a system with 1,000 faces, basically, because everyone can find their own use case
with Redis.
And I like its versatility.
So yeah, it can be sometimes overwhelming
and to learn all those things.
And it really kind of requires experience and knowledge
to translate your flows into the building blocks of
Redis. And really, community folks did a great job with all those frameworks. There are lots of them
around Redis of how to translate job management,
state offloading, everything like this into Redis.
Really good. Yep.
On the same note of the SDKs and things like that,
I know you're highly compatible with Redis,
you have all the commands listed where you are.
For the ones that you haven't implemented yet, are you aiming for eventually 100% or are you just like, hey, these things
sort of don't make sense in this sort of world? The type of customers we have don't use those
things, maybe shouldn't use those things, there's a lack of interest. I guess, how do you think about
trying to be complete versus focusing on the things that matter for your customers? Yeah.
So we went just from common sense that older APIs are used more frequently.
So we started with those.
And our first milestone was to reach Redis 6.x compatibility.
And we did that around a year ago. Now we are going to
And we did that around a year ago. Now we are going to
implement all the 7.2 APIs just to reach compatibility with
the Redis slash Valkey fork. And then, kind of, it's going to be a good place to rest in terms of API compatibility.
Yeah, yeah. So given all these changes that we've talked about, are there differences in just modeling advice
or the way people should do things in Dragonfly?
I imagine it's mostly similar to Redis,
but are there certain things where either you had to worry
about this in Redis and now you don't?
Or, hey, maybe you used to do this in Redis
and it's not as good an idea to do in Dragonfly.
Are there any differences in sort of modeling
and approaches there?
It's a great question. I'm trying to think about a good answer. So first of all,
I'll tell you about immediate benefits that you have with Dragonfly.
Sometimes people start with a single-node
Redis, and they need
all the entities in the store together, just because they sometimes fetch them in different combinations or
permutations, like using MGETs, or because they do some set computations, so they need to intersect multiple sets.
So they need them all together, and it's very hard for them
to scale horizontally, for example.
Dragonfly allows to scale vertically
and to just forget about this restriction.
When you scale horizontally, you need
to collocate your data within hashtags, and only then can you do cross-hashtag operations.
So Dragonfly has an emulated cluster mode where it can
just work with cluster clients, but it also lifts all those restrictions
about cross-slot operations.
And everything is still atomic.
So basically you're working with a single-node Redis,
but you can scale it up 64 or 100 times,
whatever the size of your server is.
That's one thing.
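To make the cross-slot restriction concrete: in Redis Cluster, a key's slot is CRC16 of the key, or of its `{...}` hash tag if one is present, taken mod 16384, and multi-key commands only work when all keys land in the same slot. Here is a minimal Python sketch of that rule; the function names are mine, while the hash-tag and CRC16 details follow the Redis Cluster specification:

```python
def hash_tag(key: str) -> str:
    """Per the Redis Cluster spec: if the key contains a non-empty
    {...} section, only that part is hashed to pick the slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty
            return key[start + 1:end]
    return key

def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    return crc16_xmodem(hash_tag(key).encode()) % 16384

# Keys sharing a {user:42} tag land in the same slot, so a multi-key
# MGET or SINTER over them is legal in cluster mode.
print(key_slot("{user:42}:profile") == key_slot("{user:42}:sessions"))  # True
```

Dragonfly's emulated cluster mode accepts cluster clients but, as Roman says, lifts the same-slot requirement, so this key-naming gymnastics becomes optional.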
Another thing we mentioned briefly: Dragonfly likes multiple connections.
You don't need to push millions of requests into a pipeline;
that just increases your latency. You can parallelize it instead.
And that's it, I guess. There's not much more to it for Dragonfly.
Hope that answers your question.
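A sketch of the "many connections instead of one deep pipeline" advice, using a hypothetical client stand-in (with a real client such as redis-py, each worker thread would check out its own connection from a pool):

```python
from concurrent.futures import ThreadPoolExecutor

class FakeClient:
    """Hypothetical stand-in for a Redis/Dragonfly client; a real
    client would issue one network GET per call."""
    def get(self, key):
        return f"value:{key}"

def fetch_all(keys, workers=8):
    # Fan the requests out over several worker threads (i.e. several
    # connections) rather than queueing huge batches into one pipeline;
    # Dragonfly's per-thread shards can then serve them in parallel.
    client = FakeClient()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(client.get, keys))

print(fetch_all(["a", "b", "c"]))  # ['value:a', 'value:b', 'value:c']
```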
Yep, yep.
And one thing you said earlier is Redis,
I can't remember if you called it the tool of a thousand faces or something like that,
but there's so many different use cases for Redis.
And I know that you all work with very high-need users,
either lots of data, lots of TPS, things like that.
I guess, how does their usage of cache differ from
your sort of more standard, everyday, smaller Redis users?
Are they using just the simpler things, just SETs and GETs,
or are they still using pretty advanced stuff? I guess,
what does that look like for those high-scale users?
It's a good question. I would say that there is no correlation.
You can find a small company being really sophisticated around Redis, and they don't have to be sophisticated.
Well, for example, if they start using, let's say, the Sidekiq framework, it uses quite a few APIs.
So just by using this framework,
and usually what you see is that many companies
do not use Redis directly,
they use it through those frameworks,
and then it's just a variety of APIs.
If we take a bird's-eye view on kind of API usage of our
systems in production, I think SET and GET are going to be the dominant APIs. But
again, that goes for smaller companies and bigger ones.
It doesn't mean that the bigger companies only use SET and GET.
They can also have job management frameworks, different use cases, but yeah, SET and GET are
probably the most predominant ones.
Yeah.
Yeah.
Okay.
Let's talk a little bit about, I know in the news
the last year or so with Redis,
there's been the Redis licensing drama
and sort of all that stuff.
I guess for people that don't know,
around 2018, 2019 or so, Redis Inc. started,
they first had their modules
that they were licensing separately
and people couldn't use them.
And then last year they announced that new versions
of Redis would have a different license.
I guess, what do you think of that,
how does that impact you?
How do you all think about that at Dragonfly?
We kind of stayed out of this drama,
but we saw a positive effect.
We saw new users, community users, and also customers coming to us; people started looking for alternatives.
And maybe they hadn't heard about Dragonfly before.
So just because of this motion, we sometimes met new users and new customers.
Yeah, but I generally think it's a good thing for the community.
Even if it fragments it, it kind of creates competition.
So, like I mentioned before, when Dragonfly was launched,
a week after, they added an issue about improving their dictionary.
So suddenly you see that when there is a competing framework,
people just try harder.
You need to try hard when you know that there is competition in this field,
and you can see the results.
Valkey now has lots of performance improvements
and Redis has their own stuff going.
So yeah, it's good for the community.
Yeah. Yeah.
Speaking on that, that competition aspect,
it's interesting because I feel like, you know,
there was a 10-year period where it was all Redis
and only Redis, and that was the only thing.
In, like, 2009 it comes out, and I think until 2019, you didn't really see much else out there. But now in
the last couple of years, you're starting to see, I guess, more competition in the space
between you and KeyDB, which was acquired by Snap, and Valkey and Momento. I guess,
what do you think in the last five years has sort of, I guess,
triggered some of that interest in the space?
That's kind of what I learned on the ElastiCache team: there is a tremendous appetite for such a system, an in-memory store, because there are more and more use cases that require
real-time; people want responses in real-time. And there is a tremendous gap between the current state of the art
and what people really want.
And there is no challenge, nobody challenges the status quo.
And I think I need to give big kudos to the KeyDB team that started challenging the project by starting their own thing.
I disagree with their design decisions, but it worked out for them.
They got acquired by Snap and I hope Snap benefits from their system.
But I wanted to go bigger, me and my partner, actually.
And that's why we chose a different architecture.
And we decided to go all in and rewrite the whole system
from scratch.
And the thing is kind of strategic. I can
ask you, I ask myself: how do you see an in-memory system in 10 years? Do you
really want to see hundreds of those tiny single-process Redises spread
around your infrastructure stack? Or do you want to see software that can accommodate itself
to any modern hardware that we have?
I didn't see that as a viable possibility.
I was sure that someone would do something better, so I
just decided to be the one that tries to do this.
Yeah.
What do you think the competition will look like?
You know, now that Redis has its own thing,
Valkey is forking off that, and everyone is
Redis-compatible right now.
Do you think there will be competition
on the API surface and different features?
Or is it like, hey, Redis has the features
that it needs,
that's mostly what people need from a cache, these data structures,
and the competition is going to be more under the hood:
performance and pricing and ease-of-use type stuff?
I think in terms of APIs, it's very hard
to add something that is going to be really disruptive.
By the way, that's true for any database system.
Can you imagine any new feature for PostgreSQL that is going to be super disruptive?
I can't.
There will be improvements, of course.
But I think the way people consume those services,
the complexity or lack of it is going to be the decisive factor,
because then it affects the whole experience
of using a system.
And that's what we see with relational databases as well,
like with Athena, for example, serverless and stuff.
It's the same technology, but a different consumption model,
and suddenly it's super popular.
Yeah, yep.
On that same note, do you see much usage of the various,
like the Redis modules,
the more exotic API-type stuff,
whether it's search or bloom filters or things like that?
Do you see much usage of that?
Or is it like, hey, most people are just using Redis core?
I have an inherent bias because what I saw
was on the ElastiCache platform.
And there, they only had the core features.
I believe on-prem users have more usage of those,
and that's what pushes them to buy from Redis Enterprise
or Redis Labs, their cloud product.
But I think the majority of users just use,
like I said before, SET and GET.
So maybe some percentage uses more sophisticated APIs,
but basically the first 100 APIs
are like 99% of the whole usage.
Yeah, yeah.
Okay, changing a little bit
to building an infrastructure company.
I guess like, what is it like when you're sort of competing
with one of your main providers, right?
If you're running on AWS or GCP, I guess,
what is that sort of interaction like?
I mean, to tell you the truth,
I don't feel any pressure there.
So basically we just operate to the best of our capacity.
Of course, maybe in terms of business motion, the sales teams at those cloud providers
prefer their internal products. That's probably true for any vendor that tries to challenge the big hyperscalers.
But that's kind of our life, and that's okay.
Yep, yep. What about, you know, one thing I,
when I talk to infra, you know, startups,
depending on their flavor,
I hear a lot of complaints about AWS networking costs.
I guess like, is that a big factor for you?
Is that, does that come up a lot for customers
or for you internally?
I guess, how do you think about those?
Yeah, yeah.
I mean, yeah, it's their durable advantage.
And they put up those barriers where we can't just, you know,
provide our service to a third-party customer
in a frictionless way without paying them a tax.
For example, if you want a VPC endpoint,
you need to set it up and then you need to pay for bandwidth.
And there are also customers
outside of the cloud, like on-prem, that want to consume Dragonfly, and then they need to pay for egress,
and sometimes those egress costs really become significant compared to the data store costs.
It's a problem. It's a problem. Yeah. Yeah. Yeah, for sure.
What about building an infra company?
I guess, how do you find developer attention?
Or how do you get your name out there?
What are you finding effective in terms of the marketing
aspect of just getting Dragonfly known?
I mean, you know how we understood
that our project really brings value?
We are clueless in marketing and still people came to us.
We really had zero knowledge, didn't know how to market ourselves, and suddenly, let's say in 48 hours, we had almost,
I don't know, 4,000 stars on GitHub once we launched the project,
and then it grew quite nicely.
And then we had production workloads the same year; several
months after we launched, we already had community users running Dragonfly in production.
So basically, I can't give any advice about this.
It's still a miracle for me.
I don't really understand how this happens.
Maybe just because we really created a project that
provides value for our users.
And that's the secret.
Yep, it sells itself.
And these are people experiencing some kind of pain,
so they're looking for options in this particular area.
In terms of Dragonfly, you mentioned the open source stuff.
And Dragonfly's license is sort of like a Business Source
License.
Is that right?
Like, I can run it myself, but I can't offer it as a service without working with you.
Is that the gist of it?
Yeah, I mean, we just tried to avoid all the past mistakes that other companies made. We wanted folks to feel safe with our license,
and actually some of the open source licenses
are too restrictive for companies to run these workloads.
So we chose BSL, which is our promise
that each version will be converted to the Apache license
four years in the future.
So we move this date with each significant version that we release,
but basically it just gives us four years of advantage over our competitors.
Yep, absolutely.
Okay, cool.
Okay, last question for you, maybe a little bit wacky, but you talked earlier
about some of the difficulties around writing multi-threaded code and learning that
stuff.
And I'm just thinking, in terms of the AI revolution that's happening
in code right now,
I write a lot of web apps,
and there's a million examples of React components
or simple backend APIs, so they can do that pretty easily.
I guess, are you and your team able to use
any of these AI tools to help
with your coding process at all,
or is it so specialized and narrow
that you have to be a little more careful
and write it the old way, by hand?
I use AI tools every day,
but usually it's for helper scripts,
stuff that would take me like two hours to write,
I just do in minutes.
But it can't write C++ code. Good luck with
that.
That's what I figured. I'm like, man, which is probably good for you. Good job security
and long-term prospects. That one's going to be a lot harder. But like me, I'm building
web apps. The robots are going to be building all those next year,
I feel like.
So we've got to figure out what that looks like.
I mean, I use Copilot.
And sometimes I have those wow moments
where it just fills up like 10 lines of C++ code.
And it looks really, really good.
But then I run it and yeah, okay.
I have a few more years as a C++ developer
without this system challenging me.
Yeah, yeah, for sure.
Okay, cool.
Well, Roman, thanks for coming on.
This is great.
I've been watching Dragonfly from afar and I love,
I just love people pushing the performance envelope on things and it's cool what you all are doing.
And not only what you're doing, but sharing it back.
You all have a great blog. You wrote some great blog posts and all that stuff.
So I learned a lot there.
If people want to learn more about Dragonfly or about you, I guess, where should they reach out?
Where should they go?
Discord, our site's Talk to Us, our GitHub.
I'm very responsive on all those platforms.
Yeah, and thanks for having me.
And it was a pleasure talking to you, Alex.
Thank you.
Awesome.
Thanks for coming on.
We'll put links in the show notes, but yeah, thanks for coming on and sharing this stuff,
and best of luck to you at Dragonfly going forward.
Thank you.
Thank you.
Bye.