Orchestrate all the Things - Speedb is a drop-in replacement for RocksDB that wants to take the embedded key-value store world by storm. Featuring CEO & Co-founder Adi Gelvan
Episode Date: November 18, 2021. RocksDB is the secret sauce underlying many data management systems. Speedb is a drop-in replacement for RocksDB that offers a significant boost in performance and now powers Redis on Flash. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
RocksDB is the secret sauce underlying many data management systems.
Speedb is a drop-in replacement for RocksDB that offers a significant boost in performance
and now powers Redis on Flash.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data
Orchestration on Twitter, LinkedIn and Facebook.
Good to meet you, Adi, and thanks for making the time for the conversation.
Since we haven't spoken before, and to be honest with you, it's also the first
time I hear about Speedb, and you were probably not exactly in stealth mode, but under
the radar, let's say, up to now. So I thought a good way to start would be,
well, from the beginning: if you can just say a few words about yourself and,
you know, the founder story, let's say, and what drove you to start this.
Cool, cool. So thanks for taking the time. It's a pleasure. The story of Speedb starts
actually about, I would say, 10 years ago when I met my co-founders. We worked together
in Infinidat, a very big storage company. I led the sales for
the company and my co-founders, they were the chief scientist and chief architect.
They actually worked there for about 10 years and one of their last projects in
the company was a new problem that arose due to the need to
support more protocols, and actually having to cope with a very interesting
phenomenon, which is metadata growth. Different protocols and the new data being created by the world
are actually creating a new phenomenon. I would call it the metadata sprawl.
Before, the metadata was a very, very insignificant part of the data. Now, sometimes it's bigger than the data itself. So one of the
projects they had to deal with is to find a way to handle the
metadata very, very effectively, without having to add more
hardware to the storage system. So one option was to rewrite the whole data stack, which was not really an option.
And the second was to go to the market and see what available solutions are there to
manage the metadata effectively.
And then they went to the market and looked around and they saw that the storage engine world was really interesting.
This is the part of the system that is managing effectively the metadata.
It's sitting between the application and the storage layer. And this market is an open source market, as you may well know,
and controlled by companies like Google and Facebook and Apple and Oracle, each one working
with its own storage engine. And then they saw that the most prevalent one was RocksDB,
which is used by a vast amount of customers and has a very large
community. It's open sourced by Facebook. It's actually a fork of what is called LevelDB from
Google. And they said, okay, let's embed it within the storage. And when they worked with it, they saw that it was working fine with very, very
small data, but it wasn't really effective with large data, and it really
couldn't scale. When it scaled to more than a hundred gigs, they saw some very weird performance issues: things like stalls, the I/O stopping,
and very, very high usage of memory and CPU.
And then they found out that there was nothing wrong
with the product itself, it was just not meant to scale.
So they reached out to the community
and then spoke to the vendors.
And they found out that this is how it's working.
And the solution in the market is very easy.
It's called sharding.
Right.
You break the data set into much smaller pieces
and then you assign each one to a node, and every node runs with its own storage engine,
and that was the solution.
And in an engineer's eyes, this was like a workaround,
not the real solution.
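To make the workaround Adi describes concrete: sharding usually means hash-partitioning keys across nodes, each node running its own independent storage engine. A minimal Python sketch of the idea follows; all names here are illustrative, and the dicts just stand in for each node's embedded engine, not any real Speedb or RocksDB API.

```python
# Minimal sketch of the sharding workaround: the data set is split by key
# hash across N "nodes", each running its own (toy) storage engine.

NUM_SHARDS = 4

# Each dict stands in for a node with its own embedded storage engine.
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key):
    """Pick the node responsible for this key by hashing it."""
    return shards[hash(key) % NUM_SHARDS]

def put(key, value):
    shard_for(key)[key] = value

def get(key):
    return shard_for(key).get(key)

put("user:42", b"alice")
assert get("user:42") == b"alice"
```

Any operation spanning shards (a range scan, a transaction) then needs inter-node coordination, which is exactly the overhead described later in the conversation as a menace at scale.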
And there came the thought: okay, why don't we create
a next-generation storage engine?
Why don't we create a storage engine that is built to scale
and that will be able to provide the customers the ability to run faster,
scale much higher, and do it with much less resources?
And there the idea of Speedb arose, and we started thinking about, okay, how can we do
it differently? Okay, excuse me for interrupting. We're going to get to the part of
what you did in a different way. But before we get to that, you've already mentioned a few things that kind of got my attention, let's say.
So I'm going to ask you a bit of a follow-up there.
So the first one, you said that your point of departure, let's say, was the need to handle metadata.
And I'm kind of wondering, how come you ended up choosing something like RocksDB for that?
I mean, RocksDB, it's very good at what it does.
It's a very good key value store.
But there are systems and ways of managing metadata
that are more specialized, let's say, more tailored to that need.
So what made you make that initial choice?
Yeah, so first of all, we needed to address the embedded storage engine market.
The need for an embedded solution is where we started
from.
Okay.
So your requirement was that you needed something that worked in embedded systems, right?
Yes.
Okay. Okay.
When you look at this market, the embedded key value store is probably the most vulnerable
because it's embedded in your system, it's running on the same resources,
and the efficiency is very, very much needed.
If it's lying outside the system, then it's someone else's problem
and you can actually get it as a service.
But when you're talking about an embedded library,
it has a direct impact on your application
and about the resource
utilization. So the need to save memory, CPU and other resources is very very critical.
Okay, I understand. If you have this requirement then it makes sense that you
that you ended up with something like RocksDB. It was kind of an obvious choice, in a way. And the other question is really kind of related, and I think to some extent you may have already
answered it. So I was wondering, what's wrong with sharding, basically? I mean, it's
a very standard thing that many people do, and at some point you do reach a limit in
performance and scalability and all that, and, well, you have to split your workload into more nodes, basically.
But again, I guess it goes back to that same requirement that you had
that you needed to have everything in the same node.
Yeah, so there are various reasons for sharding, right?
One of them is compute power.
Sometimes you simply
want to use more compute
power than you have in one system, then
you shard the application, and
that's fine, especially in
the cloud era when, you know, you
can spin up thousands of instances in
a second, and this
is very popular.
But sharding in the data space
is coming from
other reasons, right?
Sometimes you want a DR, you want mirroring
and other solutions, but there is no real reason
to shard a data set if you can actually
do it very efficiently on less hardware.
Now, from an architecture point of view,
sharding is a real menace. You need to manage a lot of nodes, you need to design
your code to support sharding, and more than that, when you scale, since you're
talking about data, inter-node communication starts to interfere,
and then you find yourself, excuse my French, in a nightmare of performance and
scalability. So, if you are looking from a data standpoint, if you could do more with less, if you could scale your data with less nodes,
then you will be able to shard as much as you need
and not, sorry, as much as you want
and not as much as you need.
We found out that by giving extra performance and scale
with less resources, we can allow our customers to run much more data
at a much higher speed with fewer resources. And they can shard as much as they want
from the application point of view, and not be really limited by the data point of view.
Okay. Yeah. I mean, my point of view on that, I guess, somehow converges with what you said.
Because basically, yes, absolutely, you should implement your solution to be as efficient as possible.
And I guess that's what you've gone and done. I was just saying that, well, basically, at some point, I think you will reach a point where you kind of have to start sharding.
So you're only going to push it forward a little bit.
And obviously, this is a good thing, but you can't avoid it in the end.
Yeah, sure, sure.
But we want to let you shard as much as you really want and not be forced by
the size of the data.
Okay. Okay. Sure. That makes sense. And just to
go back to the thread of the story you were telling.
So you said
it all started about 10 years ago and eventually
you started wondering about, you explored the options and then you started
wondering, well, how can we do this better? So when did
the idea of actually forming a company around that
shape up? Right, right. So to be more accurate, about 10 years ago I
met my partners at Infinidat. The project in which they had to actually embed
RocksDB was in late 2019, when I was no longer in the company. And at about the beginning of 2020,
actually when COVID hit,
the real issue with RocksDB
arose, and then we gathered together,
me and my partners,
they looked at the technology stack and said, okay,
we think we know how to redesign the LSM structure in a way
that we think is very unique and will enable us to do these technical things.
And I was looking at the size of the problem. When we realized after talking to
a lot of customers that the amount of customers out there using RocksDB is much bigger than we
thought and that RocksDB is de facto the standard embedded LSM storage engine, we realized that building a company would be something smart and
will enable us to solve many pain points of many customers. That was the point where we decided that
we should build something out of it. Okay. Okay. So it's a relatively recent
thing then.
And just to tie this all together with the news that you're about to announce,
I think it's twofold.
One part has to do with the fact that you're getting some seed funding and the other part has to do with the fact that you're also announcing
a collaboration with Redis. So let's address
them both in turn and I would suggest to start with the seed funding part because then by doing
that it will also enable you to share a few words about the company basics. So like, what is the team like?
Yeah, what's the funding you are getting?
What are you going to use it for?
And this type of thing.
Cool.
So we actually got our seed funding in November 2020.
It's actually one year ago.
And it's exactly one year ago
because today we are celebrating internally our first anniversary here in the office.
Okay.
Yeah, so this is by chance very, very good. We spoke to a couple of investors. There was a lot of interest because of the new technology stack
that we were building
and the understanding of this pain point
that was not really widely addressed.
So we got the privilege of choosing between VCs.
We chose a boutique VC here in Israel
called Hyperwise Ventures.
We felt more comfortable
with a local VC,
and then we
were ready to go.
We took
4 million dollars
as the seed money
and we started gathering our original team from Infinidat.
So now we are 15 people.
Most of us are engineers, deep R&D C++ people,
specialized in data structures and in algorithms, all these things that people don't like to talk about, but
this is our bread and butter. We joke
that our mother tongue is C++ here.
And we are three people addressing the business.
Myself, a product guy, and an SDR.
But the vast majority are engineers. Okay.
Well, I would say it makes sense for a company in your space, and also for a company
of your company's age.
Anything else would be kind of surprising, really.
Then it makes sense that you're focused on research
and development at this point.
So let's go to the...
What I want to mention is that we are 15 people,
but the age of engineers we have here
is spread from 24 years old to 60 years old. So
we are taking advantage of each decade. We have people here who designed
storage 30 years ago, but also very young engineers who know new data structures
and how modern systems are built.
We're very proud of this fusion, if I may say.
Yeah, yeah.
So the other, let's go to the second part of your announcement.
You're also announcing a partnership with Redis.
And from memory, if I remember having read the draft announcement previously, it has to do
with Redis substituting, if I'm not mistaken, RocksDB with Speedb in its Flash solution. So I was
wondering if you can say a little bit about the technical aspect of it, so whether it
is indeed a substitution, like a drop-in replacement, and also a little bit about the business aspect.
So how come you got connected with Redis, and what does this partnership entail exactly? Yeah, cool. So, from leading businesses for international companies, my take on a company is: if you can't solve a very big problem for many customers, then even if you have rocket science technology, it probably isn't good enough. So early on, from the first days,
we started talking to RocksDB customers and lucky for us,
it's a huge market.
And the biggest and closest one we found here in Israel was actually a couple
of floors from where I worked: Redis.
So I went to them and I asked them simple questions
about: are you using RocksDB? What are you suffering from? And not very surprisingly,
they were suffering from all the problems we were trying to solve. So
pretty early on we told them about what we were solving, and they immediately said:
if you're able to provide us with a drop-in replacement,
so we don't have to change a single line of code, because that's the biggest fear of companies,
that they need to change something on their side; if you're able to be RocksDB-compliant with a simple drop-in
replacement, we will give it a try. So we went to work. We actually were in direct contact with
them. They were pretty excited, because it was a big problem
that required attention from all their engineers: instead of doing
the database work, they had to deal with the RocksDB parameters and configuration and performance issues.
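The "drop-in replacement, not a single line of code changed" promise rests on one idea: the application is written against a fixed key-value interface, so only the engine behind it changes. A hedged Python sketch follows; the class and method names are hypothetical stand-ins, not the actual RocksDB or Speedb C++ API (where the swap happens at the linking level).

```python
# Sketch of what "drop-in replacement" means: application code targets
# one key-value interface, so swapping the engine behind it requires no
# application changes. All names here are illustrative.

class EngineA:
    """Stands in for the original storage engine."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class EngineB(EngineA):
    """A compatible replacement: identical interface, different internals.
    Here we simply inherit the interface to keep the sketch short."""
    pass

def application_workload(db):
    """Application code: never mentions which engine it runs on."""
    db.put("k1", "v1")
    return db.get("k1")

# The same unchanged application code runs on either engine.
assert application_workload(EngineA()) == "v1"
assert application_workload(EngineB()) == "v1"
```

This is why the switch Redis made could be benchmarked within minutes: only the library underneath changed, while every call site stayed as it was.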
So when we first got our GA product, it took us about six and a half months.
We came back to them and said, okay, this is a drop-in replacement.
Why don't you give it a try?
So without being too modest, it took them about half an hour to get back to us
after they did the drop-in replacement in 30 seconds.
And they came to us and said:
okay guys, we need to talk.
Let's talk about business, because we just,
in 30 seconds, ran our first benchmark.
We got twice the performance on half of the resources, which is 4x the
initial product, which is a game changer from our side.
And from there, it took us about one month to close a global deal with them. The thing with Redis is that Redis is a distributed, embedded, in-memory
key-value store. They are high performance when it pertains to in-memory,
but not so much when they have to extend the memory into flash. When
customers scale and want to use much bigger data sets, they either have to buy
lots of memory, which is very expensive, and Redis wants to allow their
customers to grow in a more cost-efficient way. Their challenge was with
using RocksDB: they had performance issues when they scaled.
It was performance, they needed stronger instances with more CPU, more memory.
And by replacing RocksDB with Speedb, we allow their customers to scale 100 times bigger.
They have linear performance when they scale
to the flash drives, not linear on the memory,
but linear on the capacity on the flash.
And they're able to scale very, very cost-effectively.
So that's a perfect win-win for Redis.
They're able to let their customers grow.
They can actually do it more cost-efficiently.
And for us, it was a huge win because it gave us the opportunity to help not only Redis,
but thousands of customers.
And from Redis' point of view, it was painless.
A drop-in replacement, very easy, not a single line of code changed.
So it was a perfect fit.
Okay. Well, it sounds like they are basically your customers, not really partners. I mean,
sure, you work with them, but it sounds like you have a paying customer.
Oh, yes, yes. They are definitely a paying customer and a big one. The reason I use the term partners is because we are an embedded library, we sit in the heart of the system,
right? They're really counting on us to run their most important thing, which is their customers and their systems, on our
technology.
So we really look at it as partnership.
It's a very close work and very close relationship.
They trust us and we look at it as a partnership.
But it's a business partnership where everyone is getting their value.
Okay. And I guess that's also related. I was kind of wondering why you deviate,
let's say, from the standard for databases these days, which is being open source.
But I guess what I was imagining is that, well, since if you have 100% compatibility with RocksDB,
and as you mentioned, RocksDB already has a pretty big market share,
then your strategy would be kind of different than in the case where you would be starting with a new offering.
You would basically address very specifically existing users of RocksDB
and just go to them with the offering like,
okay, what you basically just said to Redis: you can keep using your code in the same way
that you have, and you get a huge performance boost. Well, you have to pay us to do that.
And I guess many people will choose to do it. Yeah, you actually said it yourself.
What I could add to this is when you're in a business of solving problems,
you need to make sure that the problem you are solving is actually,
I would say, can easily be handed to the customer. I've seen technologies that, if I may say,
invented the wheel but needed customers
to change their whole software stack,
and there was no adoption because it was too complex.
So our choice of making Speedb 100% RocksDB-compliant came from knowing that we want to address as
big a market as we can and solve those problems out of the box without educating customers
how to work.
And we intend to continue to do it as long as we can.
Just out of curiosity, besides Redis, do you already have other paying customers?
We will announce it, I hope, pretty soon, but we are in the process of onboarding a few. We'll have separate announcements,
but all of them are big data providers.
Okay.
All right.
And then let's go back to the technical aspects a little bit.
Another thing that I was imagining before we got to actually talk
was that possibly this re-implementation
that you have done would be C++ based.
And you have just confirmed that earlier in our discussion.
And to be honest with you, your story reminds me a little bit
of another drop-in replacement, ScyllaDB,
which is a drop-in replacement for Cassandra.
Their story has some similarities.
So a very popular database with some performance issues.
They re-implemented it and made a number of optimizations.
And by doing that, they have achieved better performance.
And well, they're addressing an existing market.
And it sounds like you're doing something very similar for a different database.
So besides the choice of programming language for implementation, I think you have also introduced a number of optimizations, let's say. So if you can say a few words about those.
Yeah, so first of all I want to address the thing about the similarity to ScyllaDB,
which we know well, of course. ScyllaDB is a full-blown database compatible with Cassandra,
which is a database management system. It is on a different layer.
They are in the layer of Redis and other databases,
and their play is in the database management system.
This is not our play.
We are an embedded key value store,
which is on a lower layer where we serve the database,
the data management.
So we are much more internal.
But from a compatibility point of view, I believe it's similar.
To the technology stack, this is, I think, the most interesting part of what we do.
It's not about optimization. Academia and many other companies have done lots of work on optimizing
RocksDB or optimizing LSM trees. There is lots of extensive work done on RocksDB,
around RocksDB and LSM trees. And the thing they have in common is that it's all about
optimizing the LSM structure. We redesigned the whole LSM
structure. We redesigned it, we wrote it from scratch. It's still LSM, but
technology-wise it's totally different. We took very, very wide components of the LSM tree, and we rewrote them.
We're looking at the problem in a different way.
Things that we think had not been done before;
we have written some patents around it.
And if you look at the work done
on optimization of RocksDB, then you will see improvements of 5, 10, 20%, you know, depending on the workload.
Redesigning it from scratch gave us the ability to do things that, you know, today we see some hardware vendors that are trying to do, but we were able to do it in software.
So we actually rewrote the LSM tree from scratch and tried to look at it, if I may say, from
a different angle.
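For readers who haven't met the structure being redesigned here, a toy sketch of the classic LSM (log-structured merge) tree may help: writes are buffered in an in-memory memtable, flushed as immutable sorted runs, and reads check the memtable first, then runs newest-first. This is a sketch of the baseline design RocksDB and LevelDB share, not of Speedb's rewrite, whose internals are not public in this conversation.

```python
import bisect

class TinyLSM:
    """A toy log-structured merge store, sketching the classic LSM
    structure discussed above (not Speedb's redesign)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []      # immutable sorted runs; newest is last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Sort and freeze the memtable into an immutable on-"disk" run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer runs shadow older ones, so search newest-first.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # fills the memtable and triggers a flush
db.put("a", 3)      # newer value shadows the flushed one
assert db.get("a") == 3
assert db.get("b") == 2
```

The pain points from earlier in the interview live in this loop: reads that touch many runs, and background compaction that repeatedly rewrites runs (write amplification), are what degrade as the data grows.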
Okay.
Well, that makes it even more impressive, I would say.
I mean, and I'm thinking about the fact that based on what you said earlier,
it must have taken you a little bit over a year to deliver a working version of the software.
It took my two co-founders a year before we started the company, and another six months in. So
more than 18 months, you know, not including all the thoughts before.
But to make 10x out of RocksDB, it's not an optimization, right? It's trying to
look at the problem from a different angle. And if I could compare it:
what 3D XPoint did to flash, I think, is a similar analogy. We did the same to RocksDB.
We've redesigned an LSM tree that is built on multiple dimensions.
Okay, so that makes me wonder what's next in your trajectory.
I mean, it sounds like you already have a working product of good quality
since it's used by the likes of Redis.
And I don't know if you're going to be extending it significantly
on the software side.
And you have a specific go-to-market strategy.
You already have your seed funding.
So what's your plan for the future?
Where are you going next?
Yeah, so first of all, there is a huge market out there of companies
that use RocksDB and don't know about Speedb, and that we want to help.
So in the next years to come, we have a huge market to address.
Many, many companies, many users.
We find more use cases and more companies
embedding RocksDB and working with it.
So we think we can really help them
and give them the opportunity to look at the software stack
and not at the data stack, and help them scale
and get better performance and focus on their application.
And so we intend to pursue this go-to-market,
you know, for the foreseeable future.
We'll also be looking at raising more money to scale the company so we can
address many of these customers and give them
decent enterprise support, and many more features that we have in the stack. And one thing
you mentioned: open source. Since we were busy building our IP in the last year, and building a company,
we were not born as open source, but we are very much aware of the open source trend.
We are aware of the open source community.
And this is something we are definitely considering.
And I would say
stay tuned for the future.
Okay, interesting.
So you also mentioned, well,
you touched upon a number
of aspects, let's say.
First, and I was also wondering
and was going to ask you about it,
scaling up the company. So you mentioned
hiring people to be able to go to market.
And well, the current team that you have may have served you well so far,
but to go beyond, obviously, you need a different type of skill set.
So I guess you're going to be hiring for that.
And you also mentioned adding features.
So what kind of features are you looking to add?
I think the interesting part about embedded key value stores
is that because of the, I would say, glass ceiling
on the things you could do with a specific amount of data, you would limit
yourself to very, very little functionality in the storage engine. Our ability to support
a huge amount of data allows us to bring more functionality to the data layer without paying the performance
penalty. Think about the fact that we improved the write amplification factor
to about six or eight times better than RocksDB's. We have lots of room to put
other functionality that will actually help the application above
to look inside the data and do smarter things with the data. We are able to add more layers
inside. The fact that we are using a fraction of the CPU and memory allows us to put much more functionality
without getting to the limit of the hardware.
So we have plans to add more functionality that will allow applications to work smarter
with the data, but lowering it to the data management or the data layer and saving lots of CPU and
memory power from the application.
Does that also mean going beyond what RocksDB offers?
So deviating from its API? Right now, we are doing our best to stay with the RocksDB API,
to address all the RocksDB users,
but in the future we will add more functionality,
so we can support the RocksDB customers and maybe other use cases that will
be relevant.
Okay.
And what about the open source part that you mentioned?
Are you thinking something like maybe an open core offering
whereby you offer some basic functionality as open source
and some enterprise features in addition to that?
I would say that without getting into specific details,
okay,
you are pushing at an open door.
We have been looking at it very, very carefully.
We understand the value.
We understand the potential.
And as I said, we are deeply considering it.
And stay tuned.
Okay, fair enough.
All right.
So yeah, I think that's already a very, very good start for you.
So congratulations on all of that.
Well, the funding, even though, as you said,
it's already been a year, nobody else besides you knew.
So it's like news for everyone else.
And congratulations, excuse me, for the partnership,
which is really, in the way that you described it,
a partnership in the sense that it's something
that you offer to Redis that is very central to them, but still for you it's a paying customer.
So congratulations on that as well.
And good luck with all your future plans.
I think on my side, I'm pretty much covered.
So unless you have anything you'd like to add, I think we can wrap up here.
I think we're good.
Thank you for the time.
I really enjoyed the conversation.
Same here.
Thank you, and nice meeting you, and good luck with everything going
forward.
I hope you enjoyed the podcast.
If you like my work, you can follow
Linked Data Orchestration on Twitter, LinkedIn
and Facebook.