Storage Developer Conference - #73: Key Value SSD Explained – Concept, Device, System, and Standard
Episode Date: August 15, 2018...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference.
The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast Episode 73.
I'm Yang Seok-Gi from Samsung, specifically from the San Jose office. We have an R&D center at the San Jose office, so this work is actually a global collaboration across Korea and the U.S.
And today I'm going to introduce our experience with the key value SSD. The main purpose of this talk is not to show off what we have done. Rather, we want to share our learning from this experience, invite the industry to work together, and move forward together. That is the main reason.
So some of the data is quite recent, and we are still working on implementing features and evaluating the performance and features from different angles. So during this presentation you can interrupt me at any time and ask questions; I can share information as much as I can, as long as it does not break my confidentiality.
Okay.
So this talk is quite short compared to the FMS one, so I actually collapsed many slides into a small number of slides.
But I'm going to go over why we started this work, what the concept of the key value SSD is, and what kind of ecosystem we need to enable this kind of technology. Also what we are working on from a standards perspective; Bill Martin is sitting over there, so he can explain more, but I'll cover as far as I know, and he can fill in more details about the most recent progress.
And I'm gonna share some experiments with this device from the system perspective. We actually built a real system, plugged this prototype into it, and evaluated it from different angles.
So let me start with the background briefly.
So this is quite well known.
This says a lot of data is generated from different devices and applications: mobile devices and PCs, with users connected to the internet and the cloud, keep generating data. For example, for Netflix, within just one year the traffic amount increased 1.5x. And across different types of applications, a lot of data is generated and keeps increasing dramatically.
This is a quite popular graph as well, but I put slightly different names here. Around 2005, the cloud concept was introduced by Amazon: EC2 first came out, along with virtualization technology. (Okay, I'll hold this mic, so that's better, right?) Yeah, so around 2005 the cloud concept was introduced, and after that the iPhone was introduced and mobile devices became quite popular. Since then, a lot of data has been generated from mobile devices, and in the cloud, the infrastructure itself generates a lot of data. So, to me, a new era started around that time. I can say before the cloud and after it is sort of the era of data. So I would say the era of data: a new BC/AD for IT technology.
One of the interesting things is that since the cloud was introduced, a lot of unstructured data has been generated. Unstructured data is contrasted with traditional relational database data: video files, photo files, unstructured and semi-structured data, JSON files, VM images, compressed files, and so on. A lot of this data was generated in this period.
So what is the problem? The data we manage is already an object. It's in the form of files, and people perceive data as objects. But when it comes to how the data is actually stored, we are still relying on block storage,
meaning that even though you store some concept of data as an object, when the data is stored, you have to split it into fixed-size chunks and distribute the chunks across physical devices. But can you store such data directly to the device? That is the starting point of our thinking.
Is this concept new?
Not at all.
We heard about the OSD more than 10 years ago.
And the OSD was proposed to actually solve
this kind of problem, especially the metadata handling problem. But due to the complexity of the implementation, OSD actually didn't take off.
But to store an object in the device, there are, I think, two different ways. We can actually rely on the traditional OSD model. The traditional OSD model consists mainly of three components.
To identify an object, you need to specify an ID, and you can associate attributes with the object. For example, if you store a photo, you can specify where you took this photo, when you took it, and with whom. Such data is not directly part of your photo, but it's associated with the photo and specifies more information about your data.
And the actual data is the user data. So typically, in the OSD concept, an object is associated with three components.
But there's another way to store it: okay, why don't you just integrate the attributes into the identifier? That is the key, right? So in the key value concept, you specify the identity of an object using the key and store the actual data as the value. The key can be simple but powerful. Instead of having three different components, you can encode much more information about your data into the key. So by just handling keys, the upper-layer application can differentiate or identify objects from the key.
Take the photo example again: the place where you took the photo or the time you took it, you can actually encode that information into the key instead of storing it as separate metadata.
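To make that idea concrete, here is a minimal sketch in C of one possible key layout for the photo example. The field layout and the commented-out kv_put() call are purely illustrative assumptions, not part of any Samsung API or standard.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical example: pack application metadata (user, place, time)
 * directly into the key instead of keeping it as separate attributes.
 * A prefix lookup on "photo:alice:paris:" could then find Alice's Paris
 * photos without consulting a separate metadata store. */
static size_t build_photo_key(char *key, size_t cap,
                              const char *user, const char *place,
                              uint64_t unix_time)
{
    /* "photo:<user>:<place>:<time>": the field order decides which
     * prefix or range queries become cheap. */
    return (size_t)snprintf(key, cap, "photo:%s:%s:%020llu",
                            user, place, (unsigned long long)unix_time);
}

int main(void)
{
    char key[128];
    size_t len = build_photo_key(key, sizeof(key), "alice", "paris", 1534291200ULL);
    printf("key (%zu bytes): %s\n", len, key);
    /* kv_put(dev, key, len, photo_bytes, photo_len);   hypothetical device call */
    return 0;
}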
So then, why did we choose the key value path, not the OSD path? Because when I actually surveyed data center infrastructure, a lot of systems rely on the key value abstraction. This is not a comprehensive list of applications or domains, but it is a snapshot of applications that rely on the key value abstraction, and quite different applications are actually using it.
The most popular one is caching. Redis is an in-memory object cache which uses the key value abstraction. It's mostly for DRAM, but it can also be extended to storage as well.
On the storage side, Ceph is quite popular, and at the bottom of the Ceph stack it actually relies on RocksDB. Even though they introduced BlueStore, it essentially relies on RocksDB, which basically provides a key value abstraction. In the NetApp case, the SolidFire system: when you look at the SolidFire scale-out storage system, the bottom layer of the stack is basically key value, to provide more efficient deduplication and compression.
And for large-scale, hyperscale data centers like Amazon and Azure: when you look at the Azure storage architecture, it is basically a very huge key value store at the bottom.
On the database side, MongoDB is one of the NoSQL databases, and it provides a NoSQL document store. But an interesting thing about MongoDB is that it has a storage engine layer, so you can plug different types of storage engines into this infrastructure. At the bottom of the MongoDB stack, it basically has a key value abstraction. You can plug in WiredTiger, you can plug in RocksDB; different types can be plugged into the system.
Another interesting one is that Facebook actually introduced MyRocks. MySQL is basically a relational database, but they wanted to replace InnoDB with RocksDB to reduce the space the user database actually consumes. They reduced, I don't remember the exact number, around 50% of the user database space by replacing InnoDB with RocksDB.
And recently, Caffe2 basically replaced its storage model from NFS to key value, like LevelDB and Redis. And service providers like Airbnb and Rakuten build object stores, but at the bottom of their stack they're using a key value abstraction.
So okay, key value is quite popular in the data center. Can it help them? Then what is the problem they have right now?
All these applications basically interact with the storage system through an object abstraction. But at the bottom of the stack, the hardware just provides the block interface. So there is a gap between what the hardware can provide and what the application actually wants. To bridge that gap, many systems are actually using a software-based key value store. The most popular one is RocksDB.
RocksDB is a branch of LevelDB; LevelDB was introduced by Google, and RocksDB by Facebook. Another popular one is WiredTiger. WiredTiger was acquired by MongoDB, and it is the base storage backend for MongoDB. But WiredTiger is also used by Amazon, for Amazon DynamoDB, as well.
So there are several popular key value stores, and that key value store's main job is to translate the upper-layer object abstraction to the block abstraction at the bottom.
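To make that translation concrete, here is a heavily simplified sketch, loosely modeled on an LSM-tree store like LevelDB or RocksDB, of what one object put() turns into on top of a block device. All functions are illustrative stubs, not the real library code.

#include <stddef.h>

/* Illustrative stubs only; in a real store each call below turns into file
 * writes, which the file system then journals and maps to blocks, and the
 * SSD FTL maps and logs once more. */
static int wal_append(const void *k, size_t klen, const void *v, size_t vlen)
{ (void)k; (void)klen; (void)v; (void)vlen; return 0; }   /* write #1: write-ahead log */
static int memtable_insert(const void *k, size_t klen, const void *v, size_t vlen)
{ (void)k; (void)klen; (void)v; (void)vlen; return 0; }   /* in-memory index */
static int memtable_full(void) { return 0; }
static int flush_to_sst(void) { return 0; }               /* write #2: sorted file,
                                                             plus later compaction rewrites */

/* One object put() on a host LSM-style key value store. */
int kv_put(const void *k, size_t klen, const void *v, size_t vlen)
{
    if (wal_append(k, klen, v, vlen) != 0) return -1;
    if (memtable_insert(k, klen, v, vlen) != 0) return -1;
    if (memtable_full()) return flush_to_sst();
    return 0;
}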
So the basic idea of the key value SSD is to take the common functionality of the software key value store and move it into the device. You may think: well, if you just move a certain component into the device, won't you have more penalty, because the computing capability of the host is much better than the device? Why would you want to move certain components into the device? I will explain why this approach makes sense later.
But very simply put, let's get rid of that layer and put it into the device. I need to be more specific about this statement, because it does not mean we get rid of the stack completely; rather, we can reduce the overhead that the existing software key value store has. By doing that, we can improve the overall throughput of the system and also reduce the overhead the existing key value store has, like the write amplification problem, the read amplification problem, and the resulting throughput problem.
So to realize this concept, we prototyped the key value SSD using a new Samsung device. Last month, we introduced into the market a new small form factor SSD. At the bottom, this is the traditional M.2 form factor SSD, and the new form factor is called the next-generation small form factor SSD. It's a long, mouthful name, but there is a reason we use a different name; anyway. It is the same length as the traditional M.2, but it's a little bit wider, so you can actually put NAND chips on both sides, in two rows like this.
We also have a prototype for the U.2 form factor as well. In terms of capacity, it can be 1 to 16 terabytes, but we prototyped the key value concept using the 1 terabyte device.
So this is a summary of the benefits, but I will cover them one by one in later slides. Basically, the initial goal was to provide better performance from the system perspective and to provide better capability. Also, depending on the system, you can actually use more disk space by leveraging the key value SSD compared to block. The point is that the main focus of the key value SSD is to provide benefit from the system perspective, not from the device directly.
So if you want to compare block device performance and key value device performance directly, obviously the key value SSD may be slower than the block device for now, for several reasons. We don't have a standard yet, so we cannot optimize the operation efficiently. And, as I will explain a little bit more later, a key value SSD is more complex than a block SSD, so it will have more overhead from the device perspective. So it's going to be slower. But from the system-level perspective, you can get much more benefit by using this kind of device.
The first one is that when you build a storage system, the key value SSD can provide better scalability than the block device. I will show some data about what this means. Basically, you can add more devices into the box, and the key value SSD basically provides linear scalability in terms of capacity and performance. With a traditional block device, you can add more devices to increase capacity, but performance does not grow when you provide an object interface to the user. I will explain why that happens.
And what is the benefit to the user? With a key value SSD, you need just one core to saturate the device. With a traditional block device, if you want to have an object interface, a key value interface using a software key value store, you may need multiple cores. Typically, when you use RocksDB, you may need eight or nine cores to saturate one device. Basically, you consume more CPU power. Because of that, it's very hard to scale the box, to scale the system by adding more devices, because you will hit the CPU saturation point quickly. It doesn't matter how expensive the CPU is.
I will show a result using a very expensive CPU, $5,000 per unit, but it is quickly saturated when you try to implement a key value interface using a software-based solution. So you can actually reduce the number of servers by leveraging the key value SSD, because it scales linearly, so you can provide better capacity and better performance by leveraging this. So the overall TCO will drop. Depending on how you calculate it, the number can change, but based on our system it can reduce around 20% to 30% of the racks. And key value scales out quite easily: by hashing the key across multiple nodes, it's quite easy to scale out.
So with the key value SSD, within a box, you can add more devices, and performance and capacity go up. By adding more servers, capacity and performance also go up easily. That is the one-page summary of the main benefits of the key value SSD.
I will provide more details later.
Okay, so you have a new device. That's good, but you may have a concern, right? There is no ecosystem for it. How can you leverage it? We fully understand that, and it's quite challenging for us, because we actually have to overcome this kind of big obstacle. That's why, as I said at the beginning of this presentation, this is not to show off something to you. Actually, we want to invite you to work together from different aspects.
So I will go one by one.
So to enable this kind of technology, we believe three pieces go together. We should show the benefit by providing a product, so we prototyped the concept in our real product. But okay, you built something; how can you show the benefit to me? You may ask that. So we explored several applications, to see whether those applications can pick up this technology quickly. Applications do not have any infrastructure to pick up this technology, so we actually need to build infrastructure like a device driver, a library, an API, and a command set to support this kind of device. So we should build the core software, a prototype, to prove this concept works.
And that's not enough, right? When we talk to customers, they say: well, this is good, but you are the only one, and we don't want to be locked in. So to invite more device vendors, we actually opened up what we have done: basically, okay, this is the core requirement for this kind of device. We want to make a standard, open it to the community, and invite others who can actually contribute. So we are working on standardization in NVMe and also SNIA. We started a few months ago, and discussions are ongoing right now.
So these should go together, and if the industry sees the benefit, then we are open to working with you. Depending on your company's situation, you can contribute on the application side, the product side, the standards side, or the infrastructure side. So we are quite open to that.
So let me go quickly one by one. This slide is really busy, but in the previous slide I talked about: okay, you have a key value store, we can remove it and move it into the device. You can simply say that, but is it really good? Then we have to think about whether it's really good or not, and how much we want to move from the software to the hardware.
If you want to move the entire key value store into the device, it's not going to work, because it's too heavy; the device does not have that capability. So how much do you want to move? What is the core feature you want to move?
When you look at the traditional stack, at the bottom of the stack you have a block device, and you need a block device driver. Usually you have the operating system, and you may have a volume manager in there, but let's skip that part; you may put a file system on top of that, and typically the software key value store runs on top of the file system. And the application runs on top of this key value interface. This is a typical structure. The application can be MongoDB, can be DynamoDB, can be Ceph. And in the Ceph case, you manage multiple nodes across the cluster, but if you just look at one node in the cluster, it mostly looks like this.
Then what happens here? Let's start from the key value store. A key value store typically manages an index to identify objects within the store, and also does logging to provide transactions.
And what happens in the file system? We have file mapping, block mapping: to identify a file, we have to maintain all the block location information. Basically, a file system provides two things. One is to provide the namespace, and the other is to manage the storage. To manage storage, you need to maintain mapping information from your file to the devices. And the file system also does journaling.
So, for their own reasons, each layer maintains this information for its own purpose. Then what happens in the SSD? The SSD also has a mapping to translate the logical block address to the physical block address, and to provide transactions efficiently the SSD maintains log information. So as you can see, there are a lot of redundant operations across the stack. These actually add more overhead, drop the performance, and also reduce the available user space.
But the SSD already has this kind of functionality, and we collapse the redundant functionality into the device. Basically, the device already has it, so it can be efficiently implemented in the device. While we are doing that, we can provide a different interface that the application actually consumes. That is the main idea.
From the user's perspective, you can think: okay, you remove this software key value store, and the device provides this interface directly. What happens here is that you actually collapse the multiple loggings and multiple mappings into a single mapping and a single logging inside the device. So we try to minimize the capability we put into the device, but that capability should be enough to support applications. We don't want to cover all applications, because some things don't need to be implemented in the device and can be implemented in the software stack, like a library. That way we can cover more applications; the question is what core functionality should be pushed down to the device. So in the previous slide, I briefly explained the basic idea and whether it is actually a good idea to move that into the device.
And actually, I get many questions: people know about Kinetic and ask, oh, are you building something on top of a block layer inside the device? That is a common question.
But what I meant in the previous slide was: okay, you can collapse all the redundant functionality into the device, so the device itself actually manages things differently. A traditional SSD has a mapping table and translates LBA to PBA; we call that the flash translation layer. It maintains the mapping and also handles transactions, but we completely rewrote the FTL. So the FTL itself is aware of the key value abstraction, and we manage the NAND differently. Here, instead of a mapping table, we use an index; the index is similar to what a host-based key value store has. So this actually handles variable-size keys and variable-size objects by itself. It is not block-based mapping anymore.
So a new FTL is implemented, and it provides the key value interface directly from the device. And again, I'm using the term device.
What do I mean? This is actually a device; it cannot work alone. That does not exclude other possibilities. In the previous session, we talked about object storage, and whether this device can have an Ethernet interface or not is also completely open for us. But we actually prefer to have a device interface, not a service interface, right now. It can evolve, but the reason is that the market is quite different for now. It can converge, but we are targeting main storage, not cold storage. To provide high performance and enough capacity for that market segment, we need to provide a device interface.
So that's why, based on NVMe and that device protocol, we extend the protocol to support key value. We introduce several new commands into the NVMe spec, and through those commands the host can communicate with this device. Currently it is not standardized, so we are actually using vendor-unique commands.
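Until the command set is standardized, one plausible way to reach such vendor-specific commands from Linux user space is the standard NVMe passthrough ioctl. The opcode and dword layout below are made up purely for illustration; they are not Samsung's actual protocol.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define KV_STORE_OPCODE 0x81   /* made-up vendor-specific I/O opcode (0x80-0xFF range) */

static int kv_store(int fd, unsigned nsid,
                    const void *key, unsigned key_len,
                    const void *value, unsigned value_len)
{
    struct nvme_passthru_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = KV_STORE_OPCODE;
    cmd.nsid     = nsid;
    cmd.addr     = (uint64_t)(uintptr_t)value;   /* value buffer to transfer */
    cmd.data_len = value_len;
    cmd.cdw10    = key_len;
    /* Assume, purely for illustration, that a short key rides in the
     * command dwords; a real design would define this layout in the spec. */
    memcpy(&cmd.cdw12, key, key_len > 8 ? 8 : key_len);

    return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);   /* synchronous passthrough */
}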
So to use this kind of device, you need at least some software on the host side. Obviously this is a new device, so you need a new device driver. Currently, we have three different types of device driver. For Linux, we extended the community version of the NVMe device driver, adding the new commands into it, and also adding a new feature like asynchronous I/O, because the host can send commands to the device through ioctl for now, but ioctl has a problem: it is a synchronous operation. If you do synchronous operations, your performance is going to be very bad, so we added libaio-type infrastructure into this new device driver, so you can actually do asynchronous operations from the application.
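As a purely hypothetical sketch of what the application-facing side of such an asynchronous interface could look like (the function names below are my own illustration, not the actual driver symbols):

#include <stdio.h>
#include <string.h>

typedef void (*kv_cb)(int status, void *ctx);

/* Hypothetical async interface shape; the real driver's API may differ. */
int kv_put_async(int dev, const void *key, unsigned klen,
                 const void *val, unsigned vlen, kv_cb cb, void *ctx);
int kv_poll_completions(int dev, int max);   /* reap finished commands */

static void on_done(int status, void *ctx)
{
    printf("put of key %s finished, status=%d\n", (const char *)ctx, status);
}

static void write_many(int dev, char **keys, const char **vals, int n)
{
    /* Queue everything without blocking, then reap completions, instead of
     * paying one synchronous ioctl round trip per object. */
    for (int i = 0; i < n; i++)
        kv_put_async(dev, keys[i], (unsigned)strlen(keys[i]),
                     vals[i], (unsigned)strlen(vals[i]),
                     on_done, keys[i]);

    int done = 0;
    while (done < n)
        done += kv_poll_completions(dev, n - done);
}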
We also have a user-space driver; it's mostly an SPDK extension, and it's working well. There are some limitations due to the user-level device driver itself, but in terms of functionality, we provide that.
And we also implemented a Windows device driver. We plan to open this, actually all of them, but they are not open yet. Once the standards work progresses, we may be able to open them.
And on top of that, it doesn't matter what device driver you're using; what matters is the feature set the device should eventually provide. We call that the abstract device interface (ADI), and it basically provides abstract device functionality from the host perspective. So the host does not need to be aware of the actual command protocol on the wire. It provides mostly the functionality and semantics of using this device: basic functionality like namespaces, how we can actually see objects across multiple devices, the object itself, and the basic operations like put, get, delete, and exist.
This is actually part of the standardization, and it can be extended through discussion, but we try to minimize the set of operations at the beginning; if we reach consensus, we can extend it more and more. If we put too many things up front, then people are going to reject it, right? So we took the opposite approach: we put in a very minimal set, and whether we need to extend it we can decide after discussing it through the standards activities.
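As a rough illustration of what such a minimal abstract device interface could look like; the names and signatures below are my own sketch, not the SNIA-defined API:

#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: one namespace (container) per device or device
 * group, plus the minimal operations mentioned in the talk. */
typedef struct kv_device    kv_device_t;
typedef struct kv_namespace kv_namespace_t;

typedef struct {
    const void *key;
    uint16_t    length;            /* variable-size key, not an LBA */
} kv_key_t;

typedef struct {
    void    *value;
    uint32_t length;
} kv_value_t;

int kv_open_namespace(kv_device_t *dev, uint32_t nsid, kv_namespace_t **ns);
int kv_put   (kv_namespace_t *ns, const kv_key_t *key, const kv_value_t *val);
int kv_get   (kv_namespace_t *ns, const kv_key_t *key, kv_value_t *val);
int kv_delete(kv_namespace_t *ns, const kv_key_t *key);
int kv_exist (kv_namespace_t *ns, const kv_key_t *key);   /* 1, 0, or <0 on error */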
And on top of that, since we collapse the software stack, that is not always good; we lose something. For example, the file system has a page cache, and it improves performance a lot. But if you talk to the device directly, you lose the caching effect completely. And the kernel stack has different types of features, like asynchronous I/O; but as I mentioned earlier, going through ioctl in the Linux driver can be synchronous. Can we overcome those kinds of problems?
So we actually implemented a library to provide better management, especially for the user-level device driver. You have to allocate memory from huge pages, and if you don't use that, then you have to copy, which basically negates the benefit of using a user-level driver. Right. So there are several issues we actually have to address: what is the minimum functionality so that we can use this device without a significant penalty? Through this study, we actually introduced a memory manager, and this library can manage multiple devices and provide multiple queues.
And it also has a write-back / write-through cache to keep the status of objects consistent. So we implemented several key features in this library.
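For instance, one piece such a library needs with a user-level (SPDK-style) driver is an allocator that hands out I/O buffers from huge pages, so the device can DMA directly from application memory without an extra copy. A minimal sketch, assuming 2 MB huge pages are configured on the host:

#include <stddef.h>
#include <sys/mman.h>

#define KV_HUGE_PAGE (2UL * 1024 * 1024)   /* assumed 2 MB huge page size */

/* Illustrative only: a real memory manager would pool and recycle these
 * regions per device queue instead of calling mmap per buffer. */
static void *kv_buf_alloc(size_t len)
{
    size_t rounded = (len + KV_HUGE_PAGE - 1) & ~(KV_HUGE_PAGE - 1);
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void kv_buf_free(void *p, size_t len)
{
    size_t rounded = (len + KV_HUGE_PAGE - 1) & ~(KV_HUGE_PAGE - 1);
    munmap(p, rounded);
}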
So, we can provide this kind of infrastructure, but can applications actually pick up this technology? There are different types of integration points. One example: if you want to actually run RocksDB on top of the key value SSD, there's no easy way to do that. We actually have to cut out a major portion of RocksDB and plug our stack into systems that use RocksDB. But in terms of performance and efficiency, it's going to improve significantly.
Another model is to use a storage engine. MongoDB has an abstraction for storage engines, so you can plug in different types of storage engine as long as they are compatible with that abstraction. So we did this kind of work with MongoDB, and we also did some work for LevelDB, to plug the key value SSD into the system.
Another way is where a system has a much higher abstraction, like OSD: as long as you provide the OSD interface, you can plug any storage engine into the system. The point is that some systems already have an abstraction for their storage backend, so this new device can be easily plugged in. Some systems do not have that kind of abstraction, so we may have to change the software quite a bit. But the good thing is that there are many systems that actually do have that kind of abstraction, so we can plug our library, the key value SSD library, into the existing system.
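To illustrate that storage engine integration point, here is a sketch of the kind of backend operation table such systems expose; the struct and function names are hypothetical, not MongoDB's or Ceph's real interfaces:

#include <stddef.h>

/* Hypothetical pluggable storage engine operation table. */
struct engine_ops {
    int (*open)  (void **handle, const char *device_path);
    int (*insert)(void *handle, const void *key, size_t klen,
                  const void *doc, size_t dlen);
    int (*find)  (void *handle, const void *key, size_t klen,
                  void *doc, size_t *dlen);
    int (*remove)(void *handle, const void *key, size_t klen);
    int (*close) (void *handle);
};

/* A key value SSD backend can map these almost one-to-one onto device
 * put/get/delete commands instead of onto files in a file system. */
extern const struct engine_ops kvssd_engine_ops;   /* hypothetical backend */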
So, I briefly talked about what kind of software we need. Some of that software work Samsung can do, but a lot of the work, like storage engines or applications such as caches or storage systems, is not our work. So we need to work with the industry to enable this technology. And we have shown some proof points through our own development and study, by providing the performance numbers and the benefits, so we can work with you guys.
And regarding standards, from this slide I talk about the standard. We are working in NVMe and SNIA right now. In the NVMe case, we actually proposed a TPAR and are discussing it with the community right now. And in SNIA we define several APIs; I used the term ADI in the previous slide, and we define what kinds of operations we need. But one thing I'd like to highlight is that it's not about an object drive; what is different from an object drive is that this is about a key value interface, and I discussed the difference between object and key value in the previous slides.
So from the command perspective, we actually introduced four new commands: put, get, delete, and exist. And we extend existing commands to manage the device efficiently and to maintain it. And this effort is open to any kind of discussion regarding the key value SSD. Actually, I have had a lot of feedback:
Okay, can you add this, can you add this?
So you can actually join and participate in this activity in the NVMe community or the SNIA community. You can show your interest and reflect your needs in the standard.
So, regarding performance, we did some experiments by building a real system. I'm going to quickly touch on some data here. Basically, in the previous slides I talked about the benefits of using the key value SSD: from a system perspective it provides better scalability, it can scale up and scale out, and compared to the existing key value store it provides better performance.
To show that from the single-node perspective, we compare against RocksDB, because RocksDB is quite popular. You can say, well, is RocksDB the right one or not? That's debatable, but it is a very popular one, so we compare the performance of RocksDB on top of a block device versus the key value SSD using our software stack. And RocksDB is a well-known software stack from Facebook, reasonably tuned to deliver proper performance. Depending on the workload, you can say different things,
The reason is that read operation is heavily depending on the caching effect.
So if you have a large cache, RockCV cache, you read all the requests from the DRAM.
So it's very hard to compare and justify any number.
So what is the better way?
So we mostly focusing on the right operation
and there are two type, right?
You can use the random operation, the sequential operation.
For ROFS DB case, if you do the sequential operation,
basically it's overhead minimum.
So, but the real world the sequential operation, basically it's overhead minimum.
But the real world, sequential operation does not exist in the key value space.
Then most operation is random.
So we mostly focusing on the random operation here.
This graph you can take as one data point; it's actually not a strong claim. But when you use RocksDB, you have a write amplification factor (WAF) of around 13. What does that mean? In the RocksDB system, if you write one byte, RocksDB actually writes 13 bytes to the disk.
So with existing key value stores, RocksDB or WiredTiger, the main problem of the system is that they have very high write amplification and read amplification. Even though you do not write that much, the system actually writes a lot. So we can get the benefit of using the key value SSD by reducing that kind of overhead.
I also mentioned reducing the redundant parts. And write amplification actually comes back to hurt the SSD: because you write more data than you need to, the device ages very quickly. In terms of performance, the WAF actually eats up your bandwidth. The device provides a certain bandwidth, but from the application perspective you can use only a fraction of that bandwidth due to the WAF. So that is the actual performance benefit and the reliability benefit of using the key value SSD.
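As a rough illustrative calculation (these numbers are examples, not measurements from this talk): if a drive can sustain, say, 2 GB/s of internal write bandwidth and the software stack has a WAF of 13, the application only sees about 2 GB/s / 13, roughly 150 MB/s, of useful write throughput, and the NAND wears roughly 13 times faster than the user data volume alone would suggest.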
And this is another graph, for scalability. When you add more devices, in the key value SSD case you actually get more performance by adding more. In the RocksDB case, we actually tried to saturate the devices as much as we could by leveraging all the CPU power, but after some point the CPU is saturated. This system has 48 cores, but it saturates very quickly, after six devices. To scale more, you have to put more cores into the system, but this system costs around $5,000 per CPU, while we only used 18 cores in total for the key value SSD case. A key value SSD just uses one core, but the existing system actually uses a lot of cores, so you easily hit CPU saturation.
So that is the main benefit from the scale-up perspective. We also configured a system using NVMe over Fabrics. NVMe over Fabrics has very low latency, and you can easily disaggregate the system; by configuring multiple systems, you can get a similar performance benefit to the scale-up case. Since I'm mostly out of time, I'm going to stop here.
So the main point of the experiments is that the traditional approach of providing a key value interface using host software basically eats up a lot of CPU power, and because of that it's very hard to make systems scalable. But the key value SSD case offloads the core functionality from the host to the device, collapses the software stack, and removes redundant operations. By doing that, you can easily scale up by adding more devices, in terms of performance and capacity. So we showed the benefit from the scale-up, scale-out, and single-storage perspectives.
So again, we can discuss this more offline or by email, but basically we did some implementation and experiments, and there is great potential for this technology, so we want to work with you guys in the industry and also move forward together. So, thank you very much. I'm happy to take questions. Yeah.
Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.