Grey Beards on Systems - 132: GreyBeards talk fast embedded k-v stores with Speedb's Co-Founder & CEO Adi Gelvan
Episode Date: May 6, 2022. We've been talking a lot about K8s of late, so we thought it was time to get back down to earth and spend some time with Adi Gelvan (@speedb_io), co-founder and CEO of Speedb, an embedded key-value store and drop-in replacement for RocksDB that significantly improves on its IO performance for large metadata databases.
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it is my pleasure to introduce Adi Gelvan, co-founder and CEO of Speedb.
So Adi, why don't you tell us a little bit about yourself and what Speedb is all about?
Great. Thanks for having me today.
So, I'm a former techie, a math and computer science guy.
Went through the IT route, including IT management, then gradually moved to business.
And at some point I started working for storage startups.
Made a career in storage, was a part of the founding XIV team, then the
Infinidat team and led the worldwide business of Infinidat, which is where I found my co-founders
of Speedb, by the way. And during our work at Infinidat, we came to the point where we
had to choose an effective engine to manage the metadata.
We've seen the sprawl of metadata, everyone's talking about it in the past decade. And
one of the issues that we had was how do we manage metadata in an effective way? And the
obvious choice was to go for one of the storage engines in the market. Storage engines are that
complicated software layer that actually manages the metadata within systems,
a layer no one has ever heard about unless you're a storage guy, a database guy,
or a data structures guy. And we went to the market and we looked at what we could find. And my co-founders, Chilik and Mike, said, okay, let's go for the most prevalent one, which was RocksDB.
RocksDB actually started as a brainchild of Google called LevelDB,
and then Facebook forked it and named it RocksDB.
And they said Facebook can't go wrong, right?
RocksDB is being used by thousands of clients worldwide,
large community or user base, if I may call it,
and being used everywhere
by the biggest companies in the world.
So they tried to embed it within the storage
and they saw that it was working really great
in small data sizes. But when the
data size grew, it had some issues like stalls and IO hangs and instability in performance,
excess usage of CPU and memory. High variability kind of things. Yeah.
Yeah. And it raised some questions. How come all
the data in Facebook is being managed by RocksDB,
and yet on a single storage machine it really can't scale?
And when they looked into the problem
and spoke to Facebook and to the community,
they realized that there's a workaround called sharding.
Storage engines in general,
and it doesn't matter if they're B-tree-based or LSM-based,
they were actually used to manage metadata.
So they were designed to manage very effectively small data sizes.
When it pertains to large data sizes,
the data structure was just not built to scale.
So what you're doing, you're sharding the problem:
you're getting to the maximum size the storage engine can manage, and then you split it across several nodes and you allegedly solve the problem, which is
cool, but not so cool, because sharding has its toll. You need to write your
software differently. You really need to take care of the sharding and the flexibility of the sharding,
how to scale up, how to scale down.
You really need to write code differently.
Yes.
Yeah, so you're talking about,
let's say the metadata within a storage system,
like defining blocks and deduplication things
and where volumes lie and stuff like that.
So we're talking pretty intense, real-time dependent metadata, right?
Yes. And we started with storage.
But when you look at it on a broader view,
you see that all the databases are actually dependent on the metadata and storage engines.
And applications today are actually managing their metadata by
themselves, so they're dependent too. So you realize that this thing called metadata, which was
really hidden because it was really small, is now growing pretty fast. And, you know, people outside
the storage, database, or data markets had never heard the term metadata, but now everybody knows it because it sort of became a problem.
Yeah, it's everywhere.
It's not hidden from the industry anymore.
And the software layers actually in charge of
managing the metadata are the storage engines, which were not built to support large data sizes.
So you either shard it or work around it.
And by the way, the worst problem of sharding is the CFO.
The CFO needs to pay the price of sharding, which is more development, more resources.
And of course, if you're on the cloud, then more instances.
No one wants to pay the toll of sharding.
And two, when you get into sharding as well, you see a lot of the eventual consistency
components, right?
I'm sure I see it on Facebook all the time, right?
Yeah.
Yeah.
So it's a nice, well, not nice, but rather easy workaround, better than changing the code of a storage engine
and its data structure. But then we realized that everyone was actually sharding the problem,
and when we spoke to the community and to Facebook, we got answers like, no, this is how it works. So
people are really treating a storage engine as an atom, as one component.
Whereas my co-founders said, no, no, let's look inside.
Let's explore the field of LSM trees.
Let's explore the field of B-trees.
And when they did that, they said, okay, this is a cool thing.
So there is a very important component here that everyone uses.
Now it needs to change, and no one really put thought into how to scale it. So let's try and do something of our own.
So let me try to unpack some of this,
Adi. You mentioned LSTM, so that's
Log Structured Tree Merge?
LSM. And what do those things look like?
Are they effectively...
All right, so Speedb and RocksDB
are essentially key-value stores.
So you're mapping some arbitrary key of arbitrary length
to arbitrary data.
It could be binary data.
It could be integers.
It could be floating point.
It could be the text of the Gettysburg Address or something. It doesn't matter, right?
It's just some key to some value.
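To make that contract concrete, here's a toy sketch of the key-value model just described: arbitrary byte-string keys mapping to arbitrary byte-string values. A std::map stands in for the engine; the names and data are purely illustrative, not RocksDB or Speedb code.

```cpp
// Toy model of the key-value contract described above: arbitrary byte-string
// keys map to arbitrary byte-string values (text, binary, anything).
// std::map stands in for the real embedded engine; all data is illustrative.
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::string> kv;  // stand-in for the storage engine

    kv["user:42"] = "Adi";                          // text value
    kv["block:0001"] = std::string("\x01\xff", 2);  // arbitrary binary value

    if (auto it = kv.find("user:42"); it != kv.end())
        std::cout << "user:42 -> " << it->second << "\n";
    return 0;
}
```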
Right, right. So how does LSTM fit in?
Okay, so it's LSM.
Oh, okay. And it's Log Structure Merge.
So I'm familiar with Log Structure, but not log structure merges.
Okay.
So the world of storage engines is divided, if I can, into two data structures.
One is the B-tree, the ones you see in engines like Oracle's InnoDB.
These are trees with many, many leaves,
and they have to be balanced.
So when you search data,
your search is gonna be consistent.
And B-trees are very, very effective in read operations,
but very, very bad in write and update
because you always have to rebalance the tree,
which takes CPU and memory and space.
And then you have the LSM trees,
log structured merge trees,
which are actually built
differently.
They're built in a form where you write data to the cache, to the RAM, and then you flush
the cache into SST files, which are immutable.
So the write operation is very, very fast.
But what you get is you get lots of immutable
files with duplicated data, which you have to merge.
That's why it's called log structured merge tree.
The merge operation, after you've written very, very fast, is very expensive because
now you have to really merge and order those.
And when you do it, you actually write them to the next place, which is the next level
and the next level. And then you end up with really fast writes. But when you come to read it,
then you have to search all those files. So there is this process called compaction,
which is merging those into a different level. I feel like a professor teaching a third-year class.
That's all right. Sorry. Sorry for that.
We're good at this stuff.
So it just takes us a while to get up to speed in the terminology.
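To pin down the terminology, here's a minimal, illustrative sketch of the LSM write path Adi just described: writes land in an in-memory buffer (the memtable), full buffers flush to immutable sorted files, and compaction merges those files. The class names and thresholds are hypothetical, not Speedb's or RocksDB's actual internals.

```cpp
// Illustrative LSM write path: writes go to an in-memory sorted buffer (the
// memtable); when it fills, it's flushed as an immutable sorted file; and
// compaction later merges files so reads don't search every one of them.
#include <map>
#include <string>
#include <vector>

struct SortedFile {                              // stands in for an SST file
    std::map<std::string, std::string> entries;  // immutable once flushed
};

class ToyLsm {
    std::map<std::string, std::string> memtable_;
    std::vector<SortedFile> files_;              // newest file last
    static constexpr size_t kFlushAt = 4;        // illustrative threshold

public:
    void Put(std::string key, std::string value) {
        memtable_[std::move(key)] = std::move(value);  // fast: RAM only
        if (memtable_.size() >= kFlushAt) {
            files_.push_back({std::move(memtable_)});  // flush sequentially
            memtable_.clear();
        }
    }

    // Reads check the memtable, then every file, newest first -- this is
    // the read cost being described here.
    bool Get(const std::string& key, std::string* value) const {
        if (auto it = memtable_.find(key); it != memtable_.end()) {
            *value = it->second;
            return true;
        }
        for (auto f = files_.rbegin(); f != files_.rend(); ++f) {
            if (auto it = f->entries.find(key); it != f->entries.end()) {
                *value = it->second;
                return true;
            }
        }
        return false;
    }

    // Compaction: merge all files into one, dropping stale duplicates.
    // Rewriting live data like this is a main source of write amplification.
    void Compact() {
        SortedFile merged;
        for (const auto& f : files_)   // later files overwrite earlier ones
            for (const auto& [k, v] : f.entries) merged.entries[k] = v;
        files_.clear();
        files_.push_back(std::move(merged));
    }
};
```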
So in the LSM trees, you get very, very fast writes,
but you pay the price of reads.
And the compaction is what goes through, let's say, the LSM segment and reduces the duplication and orders it and merges multiple segments or blocks together into a single larger block and that sort of stuff.
They're called SST files. The cache flushes to files that are ordered,
and the compaction actually combines and gathers them into larger files that are ordered,
so you'll be able to fetch the data faster. And what does this do to write amplification? So
in a storage SSD, for instance, it's possible that some data block could be written multiple times
on the storage as it goes through compaction and garbage collection and those sorts of things.
It would seem in this solution that there might be some write amplification.
Right. So the biggest challenge of storage engines is the write amplification. So let me say two words about write amplification.
Write amplification, or the write amplification factor,
is how many physical writes you have to do
to get one logical write. So when you write your name into a file or
a storage, if you have five letters, you may find yourself with a write
amplification factor of 30x. So you might need to do 150 physical writes, which includes writing, flushing, garbage collection, reassembling the
data, and doing it over and over so it will be written in the right place. And the write
amplification factor we see normally is close to 30x. 30x seems obscene.
I mean, I think write amplification for SSDs
tends to be maybe 1.7, maybe 2 or something like that.
Yeah.
30x seems to be...
I'm writing one byte and I'm actually writing 30.
Yeah.
So when you're talking about SSDs,
then the write amplification is internal,
which is maybe the garbage collection
operations. When you're talking about a database or an application, the write amplification can be
even larger. So I was going to say, is that amplification also part of, let's say,
the garbage collection that's going on within the solid state drive itself? Is that basically one
amplification factor? No, he's talking about physical writes to storage.
Right, right.
Yeah.
You have to multiply the 30x by the write amplification of the hard drive or the SSD,
which will bring it, you know, to 60 or 70.
We even see 100 sometimes.
So that's where you get that order of magnitude.
Right.
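As back-of-envelope arithmetic, the compounding works like this; the 30x engine figure and the roughly 2x device figure are the ones from the conversation, purely illustrative:

```cpp
// End-to-end write amplification compounds: the engine's WAF multiplies the
// device's internal WAF. The 30x and 2x figures come from the conversation
// above and are illustrative, not measurements.
#include <cstdio>

int main() {
    double logical_bytes = 5;   // "write your name": five letters
    double engine_waf    = 30;  // typical LSM engine WAF, per the episode
    double device_waf    = 2;   // SSD-internal GC, wear leveling, etc.

    double engine_writes = logical_bytes * engine_waf;   // 150 bytes
    double media_writes  = engine_writes * device_waf;   // 300 bytes

    std::printf("engine-level writes: %.0f bytes\n", engine_writes);
    std::printf("media-level writes:  %.0f bytes\n", media_writes);
    std::printf("end-to-end WAF: %.0fx\n", engine_waf * device_waf);  // 60x
    return 0;
}
```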
So why do we have this write amplification?
Because, for example, if you write, if you update a B-tree,
you want to put a letter into a B-tree.
Sometimes you really have to balance the B-tree so much
that you have to move many objects or many leaves to the other side
just so you can actually put in one letter.
All these writes, they are part of the write amplification.
In LSM trees, the write amplification is mainly the compaction, the combining of
those files together and writing them to the next level. And what we see in the
market is that 30x is a reasonable
number. And we said, okay, that's what you see. And actually, the larger your data set is,
the higher the write amplification factor will be. And what we saw in storage engines is that the write amplification
factor is not really bad at small sizes. And when it's small sizes and the data fits into
the RAM, then, you know, you can make many mistakes in the RAM. It, like, you know, suffers everything.
It becomes like a write buffer almost, you know?
Right. When you write onto slower media,
then the effect of the write amplification is much higher.
I was going to say, you know, as a serial entrepreneur,
when you see numbers like, oh, 30 times is a reasonable number,
I bet you saw market opportunity, right?
Right, right.
So what we did, we went to the best benchmarks in every workload, and we said, okay, can we improve it?
We've seen that there has been tons of academic research on LSM trees and B-trees and RocksDB.
These are very, very prevalent storage engines.
So academia has really done some good research. And with the current technology of a regular LSM tree,
you could really get it down to 15x, 14x.
So we said, can we do it better?
So my co-founders left.
They sat at home for about a year,
and they found a way to actually redesign the LSM tree into a different data structure and reduce the write amplification factor to 5x.
And that's for any arbitrary size metadata?
I mean, from megabytes to gigabytes to terabytes kind of thing?
In marketing and in PowerPoints, yes.
In real life.
I know, we're talking real world here, Adi.
Yeah, in the real world, the answer is mostly yes.
We are still struggling with some workloads,
but I think what we've done really, really well
is we've built a data structure
that can actually support terabytes of metadata.
On average, we reach 5x or 6x write amplification.
And when you compare that to what you have in the market, you realize that eventually there's an order of magnitude of performance and scalability
and efficiency that we can actually deliver. So we see today with clients that we
enable them to run 50-node clusters on five nodes. We allow them to increase
the data set from tens of gigabytes to hundreds of gigabytes and to terabytes of data with no performance degradation.
OK, so this is what we're looking for.
And yes, I can tell you that I have a saying: in the PowerPoint,
everything is great. When it goes to the lab, 80 percent is working great.
When you meet customers, nothing's working. So what we've done,
yeah. It's kind of a development life cycle, right? Runs great in PowerPoint and then
actually working. That's the next step. Yeah. There's a joke. I heard someone say that
God created the world in seven days because he didn't have a user base.
Somewhere around day six, I think he created the user base,
but that's different.
Yeah.
So, yeah, please.
So do you want to talk about what you did to LSM to make it faster? Or is it,
you know, obviously, Speedb IP and all that stuff?
Yeah.
But I can give you some nice hints and
give you the way we thought
about the problem.
When you look at an LSM
tree, you have two dimensions.
One is the number
of levels. Second
is the width or the size.
Right?
You can actually play with
the number of levels and the sizes of those levels.
You can decide to do it with fewer levels and write faster, or you can do it with more levels
and read better from the bottommost levels, but then you really have to pay the write amplification,
because you need to compact them. So there is a simple trade-off around the number of levels.
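For a rough feel for that trade-off, here's the classic textbook model of a leveled LSM (not Speedb's actual design): each level is about fanout times larger than the one above, so a bigger fanout means fewer levels but more rewriting per compaction. All sizes are illustrative:

```cpp
// Textbook leveled-LSM model of the two knobs described above (illustrative,
// not Speedb's design): each level is ~fanout times larger than the previous,
// so levels = ceil(log_fanout(data / memtable)), and each byte is rewritten
// roughly fanout times per level on the way down, giving WAF ~= fanout * levels.
#include <cmath>
#include <cstdio>

int main() {
    double data_gb = 256, memtable_gb = 0.25;  // illustrative sizes
    for (double fanout : {4.0, 10.0, 32.0}) {
        double levels = std::ceil(std::log(data_gb / memtable_gb) /
                                  std::log(fanout));
        std::printf("fanout %4.0f -> %2.0f levels, WAF ~ %3.0fx\n",
                    fanout, levels, fanout * levels);
    }
    return 0;
}
```

Neither knob removes the rewriting itself, which is the limit the extra dimensions discussed next are aimed at.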
And what we said was, okay, playing with two axes, X and Y, is pretty limited.
Why don't we think of a data structure that will actually add dimensions to the structure.
Why don't we look at multiple dimensions?
Think about what 3D XPoint did to the SSDs, to the NAND, right?
We added more dimensions.
And more endurance.
So what we did, we added more dimensions that enabled us actually to look at the data from different places and to actually merge them on different, I would say, verticals.
So we can actually look at this data structure.
Think about multi-dimensions, and now we can
merge between the z-axis and the y-axis, and then the x-axis and the y-axis, and we can create all
kinds of small data structures within this data structure that will allow us to control when
compaction is done to what data and really control and give us another level or additional
levels of granularity.
So that helps us to write very, very fast because we control the write. But it enables us to control where we read from and when.
And we can now control when we do the compaction.
Now, the good news is that it actually works.
This is great.
The bad news is that with such a complicated data structure,
if you're looking at very, very small sizes of data,
then we don't really have an advantage. We kind of complicate the
problem. So if you're really working on small data sizes, you may get some benefit, but not really
a meaningful one. But what we see is fewer and fewer clients with small data sizes. Data is just booming. Luckily, the data world is going big, not small.
Yeah, yeah, yeah.
So data is working with us.
Exactly.
You're playing the trend, which is always good for a startup.
Yeah, one of the nice things is that when we built our data structure,
we said, wow, this is cool. But then we said, okay, we've changed this. Now we need to change
some other components, like Bloom filters and write flow, things that are really intertwined
within this data structure. So from changing the compaction, we found ourselves changing
more and more structures within the LSM tree,
so we found ourselves
really writing the whole
thing from scratch.
So we thought it would be very, very easy,
but we still
keep on changing stuff
so we can actually enable the...
You know, there's no such thing as no trade-offs.
There are always trade-offs.
Let me just talk a little bit here.
So a Bloom filter is used to identify whether, I'll call it a specific key exists in the metadata.
And what you do is effectively, I don't know, you hash the key somehow.
You get, you know, a bit
pattern, and you OR this into a larger bit pattern. And
then when somebody comes back with that key, you can check,
you do an AND, to see if it exists and stuff like that. But the problem with the Bloom filter
is you have to clear it every once in a while, right?
For it to work properly.
Isn't that the way it works?
I mean, it's not like you go in and do the hash
and do an XOR to get rid of it out of the Bloom filter
because there are plenty of other keys
that could potentially map to some of those bits.
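Here's a minimal sketch of the Bloom filter behavior being described: k hash functions set bits on insert, a lookup effectively ANDs those same bits, false positives are possible, and there's no per-key delete because other keys may share bits. The double-hashing construction is a common trick, used here for illustration only:

```cpp
// Minimal Bloom filter sketch: k hashes set bits on Add; MightContain checks
// (ANDs) the same bits. "True" can be a false positive, and a single key
// can't be removed, since other keys may map to some of the same bits --
// exactly the clearing problem discussed above. Sizes are illustrative.
#include <bitset>
#include <functional>
#include <iostream>
#include <string>

class ToyBloom {
    static constexpr std::size_t kBits = 1024;  // illustrative filter size
    static constexpr int kHashes = 4;
    std::bitset<kBits> bits_;

    static std::size_t Hash(const std::string& key, int i) {
        // Derive k hashes from two: h1 + i*h2 (a common double-hashing trick).
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + "#salt");
        return (h1 + static_cast<std::size_t>(i) * h2) % kBits;
    }

public:
    void Add(const std::string& key) {
        for (int i = 0; i < kHashes; ++i) bits_.set(Hash(key, i));
    }
    bool MightContain(const std::string& key) const {
        for (int i = 0; i < kHashes; ++i)
            if (!bits_.test(Hash(key, i))) return false;  // definitely absent
        return true;  // present, or a false positive
    }
};

int main() {
    ToyBloom filter;
    filter.Add("block:42");
    std::cout << filter.MightContain("block:42") << " "    // 1
              << filter.MightContain("block:43") << "\n";  // probably 0
    return 0;
}
```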
Yeah, yeah.
So you touched the pain point here,
because, so, one thing:
the Bloom filter gives you the probability that a certain key will be within that SST file,
but it will not necessarily be there, right? So if you want to get better probability, you need to define a large Bloom filter, which is great,
but it takes your memory. And then when you want to clear it and update it, it will kill you.
Excuse my French. So we need to say, okay, how do we improve our search abilities without really growing the memory?
So it's consuming memory and all that stuff.
Yeah, yeah.
We needed to define new ways
to actually map SST files in a Bloom filter
while still keeping it small,
and find effective ways to clean and update them.
So this is also one thing we had
to, you know, take care of. It's pretty complicated, this stuff, and being the CEO actually
gives me the opportunity not to be able to tell you about these things, because I don't necessarily
understand them. So there's no risk of me telling you.
Since you're CEO, I'll shoot another question.
Is there a specific killer app or a market segment
that you really like to target and think that this technology
is just perfect for?
When we started, the obvious place to go was database market, right?
Database scale, the storage scales,
and the layer in between doesn't scale.
So naturally, this was our place.
Every database provider today
is moving into database as a service,
and their challenge is not the scalability,
but the price they pay for this scalability.
And with the current storage engines,
you're actually forced to use a huge amount of infrastructure
to support the data.
And with our technology,
we can actually enable you to grow much bigger on a single node.
So we can actually save you a lot of money.
So this was the obvious place.
If we could provide you more performance, more scalability, and more efficiency,
it sounds like a win-win-win.
But when we explored this market, we saw that it's not only databases. Today you have more and
more applications that are actually managing their own metadata. Data is the new oil. Every
application needs its data. The more you can do with your data, the more you control the data,
the more benefit you actually get. And you see more and more applications,
fintech, blockchain,
cybersecurity
that are managing
their metadata directly. And guess
what storage engine they use?
RocksDB, normally.
So, we
are
one and a half years old. Every month
we're finding more and more use cases.
Some of them are pretty straightforward.
Some of them are not.
But one thing is clear that metadata is growing.
It's not waiting for a new technology to be there.
And the challenges grow.
And we definitely see more and more use cases
from cybersecurity companies, banks, database providers, storage providers, cloud providers.
The whole ML stuff, the machine learning and the data and features and stuff like that.
It's all, to a large extent, metadata.
It's associated with the classification stuff.
It's impressive.
Yeah, exactly.
I was going to say, are you seeing any particular target?
I guess so if you think of like cloud data center edge, all of those components, do you see any of those particularly being a kind of a hot spot for your technology?
Interesting you mentioned edge. When we started, everyone was saying, go to the edge, go to the fog.
Tons of innovation there.
I mean, things are going there.
When we explored this market, we found that people are talking about it a lot, aiming there. We didn't see any real production
or real-life growth in these places yet.
I'm sure it's on the way,
but we didn't see really large data sizes
or companies really being able to harness
the edge and the fog on a large scale.
We know there's lots of innovation and cloud providers aiming there,
but we haven't really seen this market as mature enough.
But lucky for us, we have many other mature markets
that are struggling with this.
And I think that there are always trade-offs.
We are an embedded library, very techy embedded library,
which means that we can fit in the lowest places
with almost every application there.
On the other side, we are not on the front of the application.
So you really need to get to the heart of the application
and do your magic there.
So the good news is that we see tons of opportunity.
The bad news is that the technology is not really visible to the end user.
Right. So I would say it's also not easily adoptable, I would think.
Right. I think one of the smart things, or one of the smart decisions we made initially, which was probably more luck than sense, was to be compatible with RocksDB and to stay compatible.
So every RocksDB client can actually drop and replace.
It's as simple as that.
30 seconds, you drop and replace and it's working.
With other storage engines,
there are some things you need to do,
but right now we're aiming at RocksDB.
We're targeting the RocksDB market.
Lots of applications, lots of customers.
And when we're done with that,
we will extend our support to other key-value store APIs.
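For context, this is what drop-and-replace means in practice. The snippet below is ordinary application code against the standard RocksDB C++ API; per Adi's compatibility claim, relinking it against the Speedb library would be the only change. The database path is illustrative:

```cpp
// Ordinary RocksDB C++ API usage. A drop-in compatible engine means this
// exact code recompiles/relinks against the replacement library unchanged.
// The database path is illustrative.
#include <cassert>
#include <string>
#include "rocksdb/db.h"

int main() {
    rocksdb::Options options;
    options.create_if_missing = true;

    rocksdb::DB* db = nullptr;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/example_db", &db);
    assert(s.ok());

    s = db->Put(rocksdb::WriteOptions(), "key1", "value1");
    assert(s.ok());

    std::string value;
    s = db->Get(rocksdb::ReadOptions(), "key1", &value);
    assert(s.ok() && value == "value1");

    delete db;
    return 0;
}
```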
Somewhere I saw in your documentation that
you reduced or compacted the key size; rather than being some arbitrary key, let's say 256 bytes,
you've compressed it into something like 24 bits, three bytes, something. At least I believe I saw that.
So you could fit all this stuff in memory?
I'm not really sure what you're talking about.
Okay, that's all right.
We'll save that for another time.
Yeah, I'm not really sure,
but I can tell you that one of the challenges we're trying to solve is that when the data size does not really fit into memory, then you start paying high prices for the write amplification factor, right?
So we've done some significant technology improvements in putting some valuable structures in the cache, but the majority of the metadata at high scale is outside the cache.
So we're harnessing our technology to keep those components in the cache, but to support the massive data that is out of the cache.
Right, right. So massive levels of indexing
and that sort of stuff
is required to support all this stuff.
So how is something like SpeedDB
licensed or paid for?
Is it just you get access,
you pay one flat fee
and you get access to the functionality
and you're off and you go?
Or is it something like on a per terabyte basis?
No.
Okay.
So we have two pricing models.
And then I have some good news.
So right now we are selling closed source to OEMs and to end customers.
If you're an OEM, we have a
revenue share model, like we did with Redis on Flash. Redis on Flash is working
on Speedb instead of RocksDB. And then for end customers we have a price
per node per month. For every node that's using Speedb, you're paying a
license. But very soon we're going to go out with our open
source version, because we realized that if we want to really get the market of the storage
engines, if we really want to not go one by one to the big customers but get to the developers
who are actually embedding the storage engine into
the application, get them to use it, adopt it, and then use it in the enterprise in production, we
need to have our open source version. So we're right now working on our open source version.
It's going to be open core. We're going to reveal some of the secret sauce in the open source, and
then you're going to have an enterprise version, the one we're selling today
as closed source,
a closed source version that you will
be able to use in production. And that's coming
really soon.
Right, right, right. And then you offer
enterprise level support things
and that sort of stuff, right?
7 by 24 by
365 kind of thing.
Enterprise level support features, scalability, extra performance boost.
Some of our secret sauce is going to be there.
So, yeah, exciting times ahead of us.
How is something like an embedded solution like this sold?
I mean, what's your go-to-market kind of thing?
I mean, obviously a website and stuff like that.
Yeah, I think one of the challenges we had with our go-to-market is that the technology is really great.
When you test it, you see the magic happening. But one of the challenges is how do you get to those clients?
How do you speak to the people in those vendors
that understand what you guys are doing,
that understand the benefit
and that understand where this technology goes?
The challenge was that we are selling to developers.
The real C++ developers, the ones developing the storage
engines and those data structures, who can actually replace storage engines and
understand the value of it. And these guys, they're not the typical CXOs or the people in the organization who are buying.
Right, right.
Right.
So what we did, we went to the CXOs, told them about the business benefits,
and they would redirect us to the programmers that would test
and then we would close deals.
We realized that the path would be much shorter if we went out
with our open source version.
Every one of those programmers is looking for effective solutions and
technologies.
And by enabling them to drop and replace very easily and test it,
this would shorten the sales cycle.
I would say the other place you might consider is SNIA has a developer
conference every year.
And there are lots of, you know, I would say sizable application kinds of developers that go to these sorts of things.
The other thing might be the FAST conference.
Are you familiar with the USENIX FAST conference?
Yes.
Jason, you must know about that, right?
Oh, yeah.
And, you know, I think kind of another one on top of that
is getting involved with the CNCF as well, the Cloud Native Computing Foundation. It seems like a
good fit for it. Right, right. All those have, you know, fairly sizable
development audiences kind of associated with them and stuff like that.
Yeah, definitely.
The nice part about going open source or our decision to go open source is the realization that we need to go bottom up.
We need to talk to the community.
We need to develop a community.
And lucky for us, there is a huge user base of RocksDB without a real community out there. So thousands of customers are using a
software library that is not really
supported, so there are many
developers who are looking for a solution. We think that Speedb can be
a great help to them, and we really want to harness our technology to
build a community and to serve them what they need: bug fixes, support, and allowing them to contribute.
Yeah, better to have thousands of, you know, developers who know what they need helping you. And we think
that by getting to them and managing this community well, we can really make a change.
And the organizations you just mentioned, like SNIA and FAST and CNCF,
are the place to go.
I agree.
Yeah.
Yeah, exactly.
Exactly.
The other thing I was thinking that you might, you know,
given the advantages that you bring to the table,
you'd think somebody like Facebook and LinkedIn and these big,
huge RocksDB users would be chomping at the bit for something like this, right?
You would think.
I can't comment on that, but we haven't built Speedb to be sold to one of those giants.
We've built it because we really want to make a change.
You know, we could maybe, you know, sell this to a giant cloud provider and have them do something of their own.
But we really want to change this industry.
We really want to provide something to the market that will enable clients, different
customers and the developers to do something else.
And we want to build a large business.
So we think that the way there is not necessarily by, you know,
talking to the big guys.
We're not after the quick cash.
We want to change the market.
Yeah.
The hyperscalers and mega data centers,
they've got a real not-invented-here problem
with most of their technology.
No kidding.
That too.
Yeah.
Probably.
Yeah.
Considering RocksDB was a Facebook thing, right?
So yeah.
It was something I heard.
It is still a Facebook thing.
RocksDB is still a Facebook thing.
It's called open source.
But when you look at it, maybe 99 percent of the contributors are
from Facebook.
I had something else, but it slipped my mind.
There she goes.
Jason, any last questions for Adi before we close?
It's been very informative.
I think you've got some good market opportunity.
Like I said, having that big RocksDB community,
it's like, get the mailing list and start spamming them, right?
I wouldn't go that far. Don't do that, Jason.
Don't tell them that stuff.
That doesn't really work with developers.
Yeah, yeah, yeah.
You wouldn't think.
That's true.
They just flush you down the toilet.
Yeah, no.
I like the community.
That community involvement aspect is a really good call, I think.
A better way to go.
It's a good route to market.
Yeah, yeah, exactly.
So, Adi, I was talking about,
what about the response times and things like that?
How does Speedb compare to RocksDB?
And let's say, well, I guess the problem is the sharding.
It kind of depends on the sharding factor and stuff like that.
Can you point to some sort of response time benefit
from Speedb versus RocksDB?
The nice thing is that there is a very easy way to benchmark RocksDB.
There is a tool called db_bench, which is actually a benchmark developed by Facebook to benchmark RocksDB across various workloads.
When you run this against
Speedb in various workloads and data
sizes, you really see
the impact that we
make. So what you see is that
in small scales,
we may be faster,
but
you won't see a radical change.
In large scale, the larger your data set is,
and I'm talking about 30 gig plus,
then you really see the impact.
30 gig is not that large, Adi.
For metadata, 30 gig starts to be large.
And when you look at the metadata in those 30 gig,
you may have billions of objects.
Yeah.
When you have those billions of keys, it sounds small,
but when the data grows to 100 gig and 200 gig,
which is not considered to be large in storage, but in metadata,
when it doesn't fit into cache, it starts to be a problem.
And then you actually see IO hangs, stalls, slow response times.
The response time is usually linear in the write amplification factor.
So when we sometimes see a 10x improvement in the WAF, you'll see it in the response time.
Yeah.
All right.
And the ability to allow you to scale.
Yeah.
Just one final thing.
The ability to allow you to scale on a single node without sharding not only gives you the
benefit of utilizing your hardware much better, you get better performance,
better scalability, and less usage of memory and CPU, and allows you to reduce the amount
of hardware you're using overall. So this is a major financial benefit.
You had a benchmark, it might have been RocksDB against Speedb, with different sizes of Amazon compute instances.
The Speedb instance was actually smaller and was able to produce better throughput, better response times, and that sort of stuff than the RocksDB one.
With one of our first clients, we actually showed them that they could use a quarter-size instance in Amazon
and yet get twice the performance.
So this is a factor of eight.
One fourth the size of the instance and still get double the performance.
That is pretty impressive.
All right, Adi, anything you'd like to say to our listening audience before we close?
Thanks for having me.
It's really a pleasure to be talking to people who understand this problem.
And I'm looking forward to our next step going open source and building a huge community of people who like to use a very scalable storage engine.
Okay.
Well, this has been great, Adi.
Thanks for being on our show today.
Thank you.
Thanks for having me.
That's it for now.
Bye, Adi.
And bye, Jason.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to
ask, please let us know. And if you enjoy our podcast, tell your friends about it, and please
review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.