Storage Developer Conference - #141: Unlocking the New Performance and QoS Capabilities of the Software-Enabled Flash API
Episode Date: March 2, 2021
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org
slash podcasts. You are listening to SDC Podcast, episode 141.
Hello, I'm Rory Bolt, one of the principal architects of Software-Enabled Flash, a newly announced technology from Kioxia.
Today I will be covering what Software-Enabled Flash is, why we created it, and the concepts and technologies that it contains.
I'll also show some demonstrations of Software-Enabled Flash running on a prototype FPGA, a few coding examples,
and I'll finish up by directing you to where you can find more information
on software-enabled Flash.
What is software-enabled Flash?
Software-enabled Flash is a media-based, host-managed hardware approach.
We'll talk more about this in the coming slides.
We redefine host interactions with Flash to allow applications to maximize
performance and provide functionality that would be difficult, if not impossible, to achieve with
existing interfaces. Giving the host control over the media enables the host to define both the
behavior and the performance characteristics for Flash-based solutions. This can enable new storage
functionality that maximizes performance and efficiencies.
Or stated simply, software-enabled Flash
gives you the ability to extract maximum value
from your Flash resources.
Coupled with this media-based hardware approach
is a software-enabling API.
This API is designed to make managing Flash media easy
while exposing all the capabilities of Flash.
The highlights of this API are,
first and foremost, it is an open-source Flash-native API.
Next, it's designed to make it easy to create
storage solutions with powerful capabilities.
It abstracts the low-level details that are vendor and flash generation specific
so that your code can take advantage of the latest flash technologies without modification.
Being an open project, any flash vendor can build software-enabled flash devices
that are optimized for their own flash media.
And finally, although just the API and documentation
are published now by Kioxia, we will
be releasing an open source software development
kit with reference source code for block drivers,
Flash translation layers, utilities, and other coding
examples.
As previously introduced, software-enabled Flash
consists of hardware and software components working together.
I've had the unique opportunity to meet with the storage developers of most of the world's hyperscalers and talk about their storage needs.
Taken together with the input of other engineers at Kioxia, this has allowed us to distill a list of basic requirements and features for hyperscale customers. I should mention that
although hyperscalers face similar problems, their individual priorities and approaches vary
significantly. On the requirement side, flash abstraction is not just about code reuse.
There can be major economic and performance advantages to transitioning quickly to newer flash generations at scale.
Scheduling is increasingly important to the hyperscalers, and it's going to be covered
in depth in the following slides. Access to all the media, or in other words, avoiding the RAID tax.
Most of the hyperscalers already ensure system integrity at the system level and view RAID within the device as actually being a capacity tax that they pay for and don't really need.
Host CPU offload.
For hyperscalers, the host CPU can actually be a sellable resource, and so minimizing the impact to the host CPU is very important to them.
Flexible DRAM configurations.
Flexibility in the DRAM architecture relieves the device from handling worst-case scenarios,
and we're going to be talking more about this too. On the functionality side, data placement
to minimize write amplification is extremely important to almost all the hyperscalers. Also of interest
is isolation. This can be isolation for security reasons or relief from noisy neighbors in
multi-tenant environments. Latency is extremely important to the hyperscalers, and we're going
to cover this in depth in the following slides too. Buffer management is really tied to the flexible DRAM configurations I mentioned earlier
and will be covered in the following slides.
Finally, adaptability to new workloads.
The hyperscale environment is dynamic, and it changes very quickly behind the scenes.
The preference is for standard configurations that can be provisioned,
configured, and deployed in real time.
In response to those requirements and required features, the software-enabled Flash API provides control over hardware structure, buffer management, programming of the media itself, and error management on the media.
Not listed separately, but touching many of these areas, is latency control.
In order to maximize the performance of Flash storage, it is necessary to be aware of the geometry of the flash resources. The software-enabled
flash API allows storage applications to query this geometry in a standardized manner. Some
of the characteristics exposed are listed here. The API also allows control over how
many flash blocks may be opened simultaneously and management of the associated write
buffers. When discussing the programming of Flash,
it is important to note that the optimal programming algorithms
vary from vendor to vendor,
and often between Flash generations of a given vendor.
Software-enabled Flash handles all these details
and lets the Flash vendor optimize for their own media.
The API was created with consideration for other Flash vendors,
as well as all the foreseeable needs
of Kioxia's own future Flash generations.
Finally, with respect to error management,
software-enabled Flash allows vendors
to optimize error reduction techniques
specifically for their own media characteristics.
And it also controls the time budget
allowed for error recovery attempts,
once again tying back to latency.
This is a high-level block diagram of our software-enabled flash controller.
This is one possible configuration.
Other vendors are free to implement different architectures as long as they comply at the API level.
For example, although we use the Toggle interface to connect to our flash chips, other vendors might want to implement the ONFI interconnect.
Note that the design also uses standardized interfaces wherever possible and is focused mainly on the management of the flash media itself, programming, lifetime management, and defect management.
An example of the utilization of standardized interfaces is the use of the PCIe interface to communicate with the host itself. The controller has advanced scheduling capabilities on a per-die basis and hardware
acceleration, if needed, for garbage collection or wear-leveling tasks within the device.
Another important call-out is the optional use of on-device DRAM. Software-enabled flash can be configured without any on-device DRAM and can use
host memory resources instead. As shown in the block diagram, the use of DRAM on our
software-enabled flash controller is optional. Why is that important?
The answer lies in the fact that hyperscale customers often require thousands of simultaneously open flash blocks on each flash drive. The actual numbers vary from hyperscaler to hyperscaler,
but requirements we have heard have ranged between 4,000 open blocks per device
to up to 16,000 open blocks per device. Since each open block requires a write buffer,
this can create a demand for a lot of memory, potentially tens of gigabytes per device. It's often the case that the actual number of open flash blocks is unknown
ahead of time and can vary significantly over time as a function of system load. So sizing the device
DRAM to be able to handle the worst case loads creates stranded DRAM resources on the device under normal circumstances.
Software-enabled Flash supports device-side DRAM configuration,
host-only DRAM configurations, or hybrid configurations
that allow drive DRAM to be sized for normal usage
and host DRAM to be used during periods exceeding the limits of the drive's DRAM resources.
Note that in systems that use host-side DRAM resources, there are system requirements to
protect against data loss in the event of unexpected power loss to the system.
Many hyperscale environments already have this in place with either non-volatile memory
resources in the hosts, system-level mirroring, or system-level erasure coding.
Now I'd like to introduce the software components of software-enabled Flash.
Kioxia has created and will release a software development kit.
This will provide open-source reference block drivers,
open-source reference Flash translation layers,
and open source device
management utilities. Bundled with the SDK will be an open source API library as well as an open
source device driver. This block diagram shows how the pieces of the SDK interface with each other.
Two notes on the software layering. First, as you can see, it is possible for a user application to interface directly to the software-enabled Flash API library, bypassing file systems and traditional device drivers within the system.
We've built a couple proof-of-concept applications, and we have them running today on our FPGA prototypes.
These include a software-enabled Flash engine for FIO, as well as versions of RocksDB and Firecracker that are software-enabled Flash native.
Although these applications are currently just proof-of-concepts, in the future, we plan to include open-source native applications as part of the SDK itself.
The second note is the SDS stack in the center of the diagram.
Most hyperscalers today are already running their own software-defined storage stacks in their environments. These software-defined storage stacks can be modified to interface to the software-enabled Flash API library and are not dependent upon the SDK reference code
for any purposes other than as an example of best practices for how to implement solutions using software-enabled Flash.
And now for a system-level view of one possible deployment of software-enabled Flash.
Note that the items outlined in red dashes are items that would likely be customized for a particular user or customer environment.
So customers would likely modify the reference flash translation layer,
possibly the reference block drivers,
and certainly they would have their own software-enabled flash native applications.
Now let's start at the top and work down.
Here we see unmodified applications using the POSIX API to
talk to a file system. The file system would issue a block device I/O call, transitioning
from the virtualized guest into the host system, and would interface with the software-enabled flash QEMU block driver.
The block driver makes use of the reference flash translation layer,
and then uses the SEF API to call into the software-enabled flash library.
From there, we are now working on an io_uring extension to allow us to bypass system calls
and interface directly with the software-enabled flash driver, and then the path transitions through the
software-enabled flash command set from the kernel to the software-enabled flash unit itself,
the actual hardware device. The software-enabled flash QEMU block driver is part of the
SDK itself and it's useful because it allows unmodified applications to take
advantage of many but not all of the features of software-enabled flash. Some
of the features that can be used in this type of configuration are isolation, latency control, and die time allocation.
Now I will introduce the concepts and features of software-enabled flash that will be necessary
to understand the later examples.
This diagram depicts one possible software-enabled flash unit.
There is the controller and then the die that make up the unit.
In this particular example, we have 32 die that are arranged as four banks across eight channels attached to the software-enabled flash controller.
The first concept I'd like to introduce is that of a virtual device.
And a virtual device is a set of one or more flash die that provide hardware-level isolation.
So an individual die can only ever be assigned to one virtual device at a time.
Die are never split or shared across virtual devices.
The next concept is that of a quality of service or QoS domain.
And this is a mechanism that we use to impose both capacity quotas as well as scheduling policy
and provide provisions
for software-based isolation. Note that it is possible for multiple QoS domains to share
a single virtual device. So in this example, two QoS domains are sharing one virtual device, while this other QoS domain is consuming an entire
virtual device by itself. And likewise, the amount of capacity that's allocated to a QoS domain
can be variable and can be allocated over time.
The final concept is that of a placement ID,
and this is a mechanism that allows applications to group data at the superblock level
within a quality of service domain.
This slide describes how the concepts introduced provide control over data isolation and placement.
Superblocks start out in a free pool within the virtual device.
As a QoS domain allocates a superblock, it is drawn from this free pool and assigned to the domain. The device is free to choose any free block in the pool so that it can
track block wear and block health to assign the optimal block to a domain to maximize device
endurance. Superblocks are never shared between domains. There is no mixing of data at the block
level. When a superblock is released, it is returned to the free pool so that over the
lifetime of the device, ownership of a superblock can transition between QoS domains if there are
multiple QoS domains defined in a virtual device. So to summarize data isolation, the two main
mechanisms are listed here with their benefits and restrictions.
Die-level isolation is hardware-based isolation. It is the most effective and the least scalable.
The reason it's the least scalable is that it's limited by the number of physical die
present. On most devices, there will be somewhere between 32 and possibly 128 die per device. So you would have a maximum of either 32 or 128
virtual devices possible. Isolation at the block level within QoS domains is the most scalable
solution. This scales to thousands of tenants, but it only provides software-based isolation
that's enforced by the scheduling capabilities of the software-enabled flash unit.
Closely related to data isolation is data placement.
Here we're going to introduce the concept of a nameless write mechanism to control data placement.
Why did we feel that a new write mechanism was necessary?
There are system benefits that can be realized with control over data placement.
However, if physical addressing is allowed for writes, the host
becomes responsible for device wear. Flash memory is a consumable. Poor choices for physical data
placement can wear out flash devices quickly. So how can the host have control over data placement
without needing to take responsibility over ensuring device health? The answer is Nameless Write.
When a new superblock is required for a placement ID,
or if a new superblock is manually allocated,
the device chooses the optimal superblock to use.
This is the framework for Nameless Write.
Now let's see how a Nameless Write works.
Nameless Write allows the device to choose where to physically write the data, but it
allows the host to bound the possible choices for the device.
As mentioned earlier, QoS domains are mapped to device nodes in the system, so that Nameless
Write operations must supply a QoS domain handle as well as either a placement ID for auto-allocation mode
or a superblock flash address returned by a previous manual superblock allocate command.
The QoS domain maps to a virtual device, which in turn specifies which dies can be used for the write.
The placement ID or flash address specifies which superblock, owned by the domain, should be used for the write.
If a placement ID is specified, a nameless write can span superblocks and additional superblocks will be allocated to the domain as needed.
In manual allocation mode, nameless writes cannot span superblocks.
The device is free to write data to any empty space within the bounds specified by the
host. And when the write is complete, the actual physical flash address is returned, enabling
direct physical reads with no address translation. The nameless write operation automatically handles
all media defects and mapping, and direct physical reads optimize performance
and minimize latency.
Similar to the nameless write operation,
software-enabled Flash has a nameless copy operation
that can be used to move data within the device
without host processing.
This is useful for implementing garbage collection, wear leveling, and other
administrative tasks. The nameless copy function takes as input a source superblock, a destination
superblock, and a copy instruction. These copy instructions are powerful primitives supporting
valid data bitmaps, lists of logical block addresses, or even filters to select copied
data based upon logical block address. A nameless copy can operate on entire superblocks with a
single command. This animation illustrates the difference in impact for the host for
implementing a garbage collect using standard read and write commands versus the nameless copy command.
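To make the copy-instruction options more concrete, here is a small illustrative sketch in C. The type and field names are hypothetical, not taken from the SEF SDK headers; they simply mirror the three instruction forms described above (a valid-data bitmap, a list of logical block addresses, or an LBA-based filter) together with the source and destination superblocks.

    /* Illustrative sketch only: these type and field names are hypothetical
     * and do not come from the SEF SDK headers; they mirror the copy
     * instruction options described in the talk. */
    #include <stdint.h>

    enum copy_instruction_kind {
        COPY_BY_BITMAP,     /* bitmap of valid ADUs in the source superblock */
        COPY_BY_LBA_LIST,   /* explicit list of logical block addresses      */
        COPY_BY_LBA_FILTER  /* keep only data whose LBA falls in a range     */
    };

    struct copy_instruction {
        enum copy_instruction_kind kind;
        union {
            struct { const uint64_t *words; uint32_t num_words; } bitmap;
            struct { const uint64_t *lbas;  uint32_t count;     } lba_list;
            struct { uint64_t lba_low, lba_high;                } lba_filter;
        } u;
    };

    struct nameless_copy_request {
        uint64_t source_superblock;       /* flash address of source superblock */
        uint64_t destination_superblock;  /* flash address of the destination   */
        struct copy_instruction instr;    /* which valid data to carry forward  */
    };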
And now for a more concrete example.
This movie is a demonstration of nameless copy running on our FPGA prototype device.
Both sides are running identical write workloads. In a short while, Garbage Collect will start and you can see the difference in impact to
the system.
As Garbage Collect starts, you can see CPU utilization rising rapidly on the manual copy
side.
This demonstration has been sped up for the purposes of this recording,
but at the end of the demonstration,
the nameless copy has issued 24 commands versus over 800,000 commands for a manual copy.
The nameless copy has issued or transferred 120 kilobytes of copy instructions
versus over 3 gigabytes of data that had to be copied using the read and write primitives.
Another key feature of software-enabled Flash is its advanced queuing and scheduling features, which we will spend a lot of time on over the next several slides.
Scheduling and queuing controls how the die time is spent.
We can control read time versus write time versus copy time, as well as the
prioritization of multiple workloads. Consider a multi-tenant environment. There may be business
reasons to enforce fairness or to give certain tenants priorities, and these business needs may
change over time. These tenants can share a device, and weighted fair queuing can support the performance goals of the business.
The host is allowed to prioritize and manage die time through the software-enabled Flash API,
and the device will enforce the scheduling policy.
This is the basic architecture of the software-enabled Flash scheduler.
First, I will go down the feature
list. Each virtual device has eight input queues, and since a virtual device can map to as small an
area as one die, this means that each individual die can potentially have eight FIFO input queues.
The device scheduler automatically handles suspend and resume for both program and
erase commands.
The host can specify a specific queue for each flash access command on a per QoS domain
basis. So one QoS domain might submit its reads to queue 0,
its writes to queue 1,
and its copy commands down to queue 7.
And another QoS domain, for priority reasons,
might want to have its reads go into queue 2,
its write operations, or program operations, go into queue 1, and
its copy operations go into queue 7.
Every queue can specify die time weightings for each individual operation: read, erase,
and program.
So each queue has its own weights for erase, program, and read operations.
And finally, the host can provide overrides for both the default queue assignments and the default weightings for individual commands to dynamically adjust to changes in the environment.
So now that I've gone over the features, let's talk a little bit about the functionality.
When all of the weights are set to zero for all of the queues, it works as an eight-level priority-based scheduler, with queue 0 being the highest priority queue and queue 7 being the lowest
priority queue. When all the program, erase, and read weights are set to the same non-zero value, it works as a round-robin
scheduler. And finally, when unique erase, program, and read weights are assigned on a per-queue basis,
it works as a die-time weighted scheduler. You should also note that even though there are eight
queues defined in the architecture, it is not necessary to use
all eight queues if your application does not need to.
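As a quick illustration of those three behaviors, here is a hypothetical sketch in C. The struct layout and the numeric weight values are assumptions for illustration only, not the SEF API's actual data structures; they simply show how the same per-queue erase, program, and read weights express priority, round-robin, and die-time weighted scheduling.

    /* Illustrative sketch only: hypothetical layout and values, not the SEF
     * API's real structures. */
    #define NUM_QUEUES 8

    struct die_time_weights { unsigned erase, program, read; };

    /* 1. All weights zero: strict 8-level priority, queue 0 highest. */
    static const struct die_time_weights priority_mode[NUM_QUEUES] = {
        {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
        {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
    };

    /* 2. All weights equal and non-zero: round-robin across the queues. */
    static const struct die_time_weights round_robin_mode[NUM_QUEUES] = {
        {100, 100, 100}, {100, 100, 100}, {100, 100, 100}, {100, 100, 100},
        {100, 100, 100}, {100, 100, 100}, {100, 100, 100}, {100, 100, 100},
    };

    /* 3. Distinct weights per queue: die-time weighted scheduling, e.g. a
     *    latency-sensitive read queue versus a background copy queue. */
    static const struct die_time_weights weighted_mode[NUM_QUEUES] = {
        {250, 250, 250}, {200, 200, 200}, {150, 150, 150}, {125, 125, 125},
        {100, 100, 100}, { 75,  75,  75}, { 50,  50,  50}, { 25,  25,  25},
    };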
And here is a demonstration of the software-enabled Flash scheduler running on our FPGA prototype. We're going to start the test and note that we
have two domains. We have set the weight to be slightly higher for one domain
than the other domain so that the two lines do not overlap each other. In a moment, we will alter the weight, and you can see that we have now reduced the weight
of QoS Domain 2 to 150 for read operations, and we have increased the weight of QoS Domain
1 to 250.
This is a graph of latency, and so we have lowered the latency for domain
2. And now we have just reversed it and made QoS domain 1 have a weight of 150 and QoS
domain 2 the weight of 250. And you can see that the priorities invert and now QoS domain 1 has the lower latency. We can take this to some
extremes and watch in real time as the scheduler reacts to the weighting overrides and adjusts the
latency response curves accordingly. This is the final demonstration of our FPGA prototype. In this demonstration, we defined
three separate virtual devices and are running three different workloads on three different
storage protocols. For this demonstration, we defined a virtual device that was running the ZNS protocol with a mixed read and write workload.
For this graph, blue represents a heat map of read operations.
Red represents a heat map of write operations.
You can see the channels labeled across the front, channel 0 through channel 7, and four banks.
And so here we have a second virtual device, spanning channels 0, 1, and 2 and banks 2 and 3, that was doing a read-dominated workload, running a custom hyperscale FTL from one of the hyperscalers. And finally, in the foreground, we had a third virtual device that was running a standard block mode driver with a write-dominated workload, hence the red bars.
None of these workloads were impacting any of the others.
They had full hardware isolation with no shared dies between them.
It's important to note that we don't think this is a very realistic use case.
We don't see people trying to run multiple storage protocols simultaneously on the same device. The purpose of showing this flexibility, though,
is to illustrate a capability that is important to hyperscale customers.
Hyperscale customers can deploy a single device at scale, and then they can provision and configure
the device in real time to match the dynamic needs of their storage environment.
And as new storage protocols and new storage applications are created, they can quickly
be implemented using the software-enabled flash primitives.
And now for some actual examples from the upcoming SDK.
SEF CLI is used to configure software-enabled flash devices.
It's a command line tool.
It's open source, and it's included in the software development kit.
Any of the functions of the SEF CLI command could actually be incorporated into an application if needed.
There's extensive built-in help, and this is the top-level help output.
In addition to allowing the configuration of the device, it also supports all of the API primitives, so you can actually read and write data using the sefcli command if you want. But probably the neatest
feature is this, the shell command.
Sefcli contains a built-in Python shell for interactive programming of the device.
This is extremely useful for examining the device
for diagnostic purposes during software development.
And you can even write Python scripts
and send them to the SEFCLI program to execute.
Once again, a really helpful capability
for debugging your software.
Now that I've introduced SEFCLI, let's go over a few examples of its use.
Note that in all of these examples, many of the possible settings are not illustrated,
and we're just using the default values.
This is an example of creating a virtual device.
When you create a virtual device, you can specify the default weights for
the erase, program, and read operations, as well as the copy program, copy erase, and
copy read operations. We haven't included that in this example for the sake of
brevity. So here we are invoking sefcli, telling it to create a virtual device. The -S 0 argument
specifies that we want to operate on software-enabled flash unit zero. We're now going to define
the layout of the virtual device, saying that it starts on channel zero and spans four channels.
It starts on bank zero and spans four banks. We're going to assign this virtual device a unique
identifier, and we're going to specify how many QoS domains we're going to allow to be created
within this virtual device. Once you execute this command, you have created the virtual device
and you can use the following command,
sefcli list virtual,
to list out all the virtual devices,
including the one you just created.
The next example is that of creating a QoS domain.
And when you're creating a QoS domain, you get to specify the queue assignments for each of the flash operations.
Note, same invocation line, but now we're saying instead of create virtual device, we're creating a QoS domain.
We're specifying it to operate on the SEF unit zero, the first unit present in the system.
We're going to give that virtual device that we used in the last example as input here.
We're saying create this QoS domain on virtual device zero.
And now we're going to assign an ID to the QoS domain.
In this case, we're making QoS domain ID 2.
We're going to put a capacity limit.
And in this case, we are saying that this QoS domain is going to have a maximum of 3 million ADUs or atomic data units. We're next specifying the size of the atomic data unit
as being 4k. And we're going to put in a couple interesting parameters here at the end.
The number of root pointers for this QoS domain and the number of placement IDs for this QoS domain. The number of placement IDs defines how many parallel auto-allocating streams
you can have within each QoS domain that will group data in the same superblock
as specified by the application.
The number of root pointers specifies the number of metadata
locations that you can store in the configuration of the QoS domain. And so if you're implementing
a lookup table or a key value store or some other storage construct on top of software-enabled flash,
you may want to store the metadata associated with a QoS domain.
And this construct, a root pointer, allows you to store the metadata for a QoS domain
within the actual QoS domain itself and then store the address at which you stored it in the root pointer,
and it gives you a bootstrapping mechanism for reinitializing a QoS domain at startup time.
And finally, just like with the virtual device example, after you've created this QoS domain,
you can use the list QoS domain command to see all the QoS domains that have been defined for that unit.
And now an example of nameless write.
This is sort of the smallest possible program to perform a write, and it contains three main functions.
The first function is called SEF get handle.
This fetches a handle to a particular SEF unit in the system.
So if you had multiple SEF devices in the system,
they would be enumerated zero through N. You specify the index of the unit that you
want to operate on, and it returns a handle to that SEF unit. Next, we're going to open a previously
created QoS domain. And so we're going to call SEF open QoS domain.
We're going to pass in the handle so it knows which unit we're operating on.
We're going to pass in the ID of the QoS domain we want to open.
We're going to pass in a notification function pointer
to receive asynchronous event notifications,
as well as a context, which is a piece of user-defined data that will be passed back and forth with all calls, helpful for implementing multiple contexts in your environment.
A key, which is the encryption key for the QoS domain.
And finally, we're going to return a handle to the QoS domain that we just opened.
This is all preparation work for the actual nameless write command down here,
which is called SEF write without physical address.
It should be noted that you only have to get a handle to the device and open the QoS
domain once, at the start of your application, and then you can issue as many write commands as you
want with the open handle and QoS domain. So when you want to write to a QoS domain, you call
write without physical address. You pass in the handle to the QoS domain.
In this case, we're setting the mode to be auto-allocate. We then have to supply a placement ID,
which selects which superblock, of the superblocks kept separate by placement ID, we want to write this data into.
We will pass in the user address, which is just user-defined metadata.
In the case of a block mode driver, this would be the LBA.
The number of atomic data units we want to write,
remember when we defined a QoS domain, we specified the size of the atomic data unit.
In the previous example, it was 4k.
This is saying how many 4k chunks we want to write into this QoS domain. We pass in the address
of an IOV, which defines the memory buffers associated with the data that we want to write. This is the number of entries in the IO vector.
Permanent address. Remember, the nameless write functionality returns the address at which the
data was written. So we specify we want the data to go into this QoS domain and be grouped with
this placement ID, but the actual unit determines the physical address at which the data is going to be placed and returns that here in the permanent address.
It also returns the distance to the end of the superblock that we're currently operating in after the write has completed.
This is useful when you're not using auto-allocate mode so that you can know when it's time to allocate the next block. And finally, I'd like to call attention to the override structure.
The overrides parameter is a pointer to a structure. It can be null if you don't want to
override anything, but if you wanted to override the default queue assignment or operation weight,
you would do it by supplying an override to the write function.
And once again, this is how we showed that previous example
of dynamically adjusting the latency response between two QoS domains.
That was by altering the overrides of the defaults for those two QoS domains.
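Putting the pieces of this walkthrough together, here is a minimal sketch in C. The function names follow the ones mentioned in the talk (SEF get handle, SEF open QoS domain, SEF write without physical address), but the exact signatures, type names, constants, and the header name are assumptions; the published API headers and documentation on GitHub are authoritative.

    /* Minimal sketch of the flow just described.  Signatures, struct types,
     * constants, and the header name are assumptions; consult the published
     * SEF API headers and documentation for the authoritative definitions. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>
    #include "SEFAPI.h"              /* hypothetical header name */

    static void on_event(void *context, struct SEFQoSNotification note)
    {
        /* Handle asynchronous events (address updates, block state changes,
         * capacity events); see the notification-handler example below. */
        (void)context; (void)note;
    }

    int main(void)
    {
        char buf[4096];                       /* one 4K ADU of payload        */
        memset(buf, 0xA5, sizeof(buf));
        struct iovec iov = { buf, sizeof(buf) };

        SEFHandle sef = SEFGetHandle(0);      /* SEF unit 0 in this system    */
        SEFQoSHandle qos =
            SEFOpenQoSDomain(sef, /*domain id*/ 2, on_event,
                             /*context*/ NULL, /*encryption key*/ NULL);

        struct SEFFlashAddress permanent;     /* filled in by the device      */
        uint32_t distance_to_end_of_superblock;

        /* Auto-allocate mode: we name a placement ID, the device picks the
         * physical location and returns it in 'permanent'. */
        SEFWriteWithoutPhysicalAddress(qos,
                                       SEF_AUTO_ALLOCATE,    /* mode          */
                                       /*placement id*/ 0,
                                       /*user address (e.g. LBA)*/ 1234,
                                       /*number of ADUs*/ 1,
                                       &iov, /*iov count*/ 1,
                                       &permanent,
                                       &distance_to_end_of_superblock,
                                       /*overrides*/ NULL);

        printf("wrote 1 ADU, %u ADUs left in this superblock\n",
               distance_to_end_of_superblock);
        return 0;
    }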
As I mentioned in the earlier example, when you open a QoS domain, you can supply a pointer to a
notification function. This is an example of an asynchronous event handler. Once again,
when opening the QoS domain, we passed in a pointer to the
notification handler. Here's the definition of the notification handler.
And this is typically implemented as just a giant switch statement handling the different types of notifications that can come from the device. There are several different types of asynchronous events that the device can issue:
address update notifications in the case of data
that's been moved on the device,
block state change events for superblocks
that have been closed or filled,
as well as capacity-related events.
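Here is a sketch of what such a switch-statement handler might look like in C. The enum constants and the notification struct are hypothetical stand-ins, not the SDK's real definitions; the three cases simply correspond to the event categories just described.

    /* Hypothetical enum and struct names; only the event categories come
     * from the talk. */
    enum sef_event_type {
        EVENT_ADDRESS_UPDATE,      /* data was moved on the device           */
        EVENT_BLOCK_STATE_CHANGE,  /* a superblock was closed or filled      */
        EVENT_CAPACITY             /* capacity-related event for the domain  */
    };

    struct sef_notification {
        enum sef_event_type type;
        unsigned long long old_flash_address;   /* for address updates       */
        unsigned long long new_flash_address;
        unsigned long long superblock_address;  /* for block state changes   */
    };

    static void notification_handler(void *context, struct sef_notification n)
    {
        switch (n.type) {
        case EVENT_ADDRESS_UPDATE:
            /* Update the FTL lookup table from old to new flash address. */
            break;
        case EVENT_BLOCK_STATE_CHANGE:
            /* Track closed or filled superblocks, e.g. as garbage-collect
             * candidates. */
            break;
        case EVENT_CAPACITY:
            /* React to quota or capacity changes for this QoS domain. */
            break;
        }
        (void)context;
    }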
This slide is illustrating a snippet of code from our Flash translation layer and an important concept of the software-enabled Flash
API itself. This is the routine that's used to update the lookup table for our reference flash translation layer.
And you will notice that when the FTL update command is called, it is called with the old flash address and the new flash address.
And this is an important design decision. When we update or move data,
we supply both the old address that the data was located at, as well as the new address where the
data now resides. This was done so that we can make updates to the lookup tables in the flash translation layer lockless. And so we can
handle race conditions between incoming data overwrites from users and offloaded copy operations
within the device by the use of atomic compare and exchange operations on the lookup table that allow us to handle
the race conditions without having to introduce locks into the FTL.
So a very important concept.
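To illustrate the idea, here is a self-contained sketch in C (not the SDK's actual FTL code): each lookup-table entry is updated with an atomic compare-and-exchange that only succeeds if the entry still holds the expected old flash address, so a stale update from an offloaded copy simply fails instead of clobbering a newer user write, and no lock is required.

    /* Self-contained illustration of the lockless update idea, not the
     * reference FTL's actual code. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_ENTRIES 1024

    static _Atomic uint64_t lookup_table[TABLE_ENTRIES];  /* LBA -> flash address */

    /* Returns true if the entry was updated, false if it no longer held the
     * expected old address. */
    static bool ftl_update(uint64_t lba, uint64_t old_addr, uint64_t new_addr)
    {
        uint64_t expected = old_addr;
        return atomic_compare_exchange_strong(&lookup_table[lba],
                                              &expected, new_addr);
    }

    int main(void)
    {
        atomic_store(&lookup_table[42], 0x1000);   /* LBA 42 lives at 0x1000 */

        /* A user overwrite lands first and installs a brand-new address.    */
        atomic_store(&lookup_table[42], 0x3000);

        /* The completion of an offloaded copy then tries to move the entry
         * from 0x1000 to 0x2000; the compare-exchange fails because the
         * entry no longer holds 0x1000, so the newer user data is kept.     */
        bool copy_applied = ftl_update(42, 0x1000, 0x2000);

        printf("copy applied: %d, entry now 0x%llx\n",
               copy_applied,
               (unsigned long long)atomic_load(&lookup_table[42]));
        return 0;
    }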
This next example is a direct access read.
Once again, this is sort of the minimal program where we're going through the operations of specifying which unit we want to operate on and getting the handle to the SEF unit.
We're opening the QoS domain.
Once again, these two steps don't need to be performed more than once in an application. But once you have specified a unit and opened up a QoS domain,
you can then issue read with physical address, using a flash address parameter, which would be
something that was returned in the previous example in the permanent address field. You can specify the starting address you want to read from,
the number of atomic data units.
Once again, in these examples, these are 4K blocks, if you will.
And again, an IOV specifying the layout of the memory
to put the return data in.
The IOV count says how many entries are in the IO vector itself.
IO vector offset is a field that allows you to do
multiple operations on a single IO vector at different offsets
so that you can handle very complex memory layout schemes spread across multiple operations.
Finally, you have the opportunity to pass in the user address.
And once again, this is metadata associated with the user data itself.
In the case of the block mode or the reference FTL,
the user address is in fact the LBA.
And when we perform the read of the physical flash address,
we will read the associated metadata and compare it to the expected LBA as a data integrity check
that the data being read from the flash itself is the data that was expected.
And finally, once again, we have a pointer to an override structure
that would allow you to override either the default queue assignments
or the operation weights.
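Continuing from the handle and permanent address in the nameless write sketch earlier, a direct physical read might look like the following. Again, the function name follows the talk (SEF read with physical address), but the signature and types shown are assumptions; the SDK headers are authoritative.

    /* Assumes the same hypothetical "SEFAPI.h" header, <sys/uio.h>, and the
     * qos handle and permanent address from the earlier write sketch. */
    static void read_back(SEFQoSHandle qos, struct SEFFlashAddress permanent)
    {
        char buf[4096];                       /* room for one 4K ADU          */
        struct iovec iov = { buf, sizeof(buf) };

        SEFReadWithPhysicalAddress(qos,
                                   permanent,   /* address returned by the
                                                   earlier nameless write     */
                                   /*number of ADUs*/ 1,
                                   &iov, /*iov count*/ 1, /*iov offset*/ 0,
                                   /*user address (expected LBA)*/ 1234,
                                   /*overrides*/ NULL);
    }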
There's a little note here in this example.
The error recovery mode for a QoS domain is set at the time of creation, and this determines
whether the software-enabled flash unit will perform automatic error correction
and automatic error recovery on the QoS domain. In manual mode, there's a function called SEF set read deadline
that determines essentially what the recovery time budget is
so that you can specify how much time the SEF unit will spend trying to recover data when an error has occurred before
aborting the operation with an error response. And this is important because, as mentioned earlier,
many times in hyperscale environments, they've triple mirrored or they have other copies of the
data. And it's faster often to go fetch the data from an alternate source than to try and do a heroic ECC recovery operation on the flash itself.
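As a hypothetical illustration of that manual mode, the call might look like the line below; the function name comes from the talk, but the parameter shown (a recovery time budget) is an assumed form of the call, not the SDK's actual signature.

    /* Cap how long the SEF unit spends on error recovery before returning an
     * error, so the host can fail over to a redundant copy instead of waiting
     * for heroic ECC recovery.  Hypothetical parameter form. */
    SEFSetReadDeadline(qos, /* recovery time budget, e.g. in microseconds */ 500);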
Now, all of the previous examples, mainly just to make them more easily understood, have been synchronous examples.
But it should go without saying that all the data path operations
for software-enabled Flash have asynchronous versions. This is an example of what read with
physical address would look like in its asynchronous form. We essentially have an
IO control block that has the bundle of parameters associated with the call, and then you just issue read with physical address async,
passing in once again the handle, which QoS domain on which unit we're operating on,
and a pointer to the IO control block itself.
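Here is a sketch of that asynchronous form, continuing from the earlier read sketch. The IOCB structure, its field names, and the completion-callback shape are assumptions that mirror the parameter bundle just described; the SDK headers define the real types.

    /* Hypothetical IOCB field names; only the overall shape (a bundle of the
     * read parameters plus a completion callback) comes from the talk. */
    static void on_read_done(struct SEFReadIOCB *iocb)
    {
        /* Called when the read completes; check status and consume the data. */
    }

    struct SEFReadIOCB iocb = {
        .flashAddress = permanent,     /* address from a prior nameless write */
        .numADU       = 1,
        .iov          = &iov,
        .iovCnt       = 1,
        .userAddress  = 1234,          /* expected LBA, checked as metadata   */
        .complete     = on_read_done,  /* completion callback                 */
    };

    SEFReadWithPhysicalAddressAsync(qos, &iocb);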
Well, we're coming to the end of the presentation. At this point, I want to do a little summary and tell you where you can go to learn more about software-enabled Flash.
So, as a wrap-up or a summary, software-enabled Flash fundamentally changes the relationship between the host and solid state storage. It consists of purpose-built hardware and a very powerful open source Flash native API to interface to software.
It leverages industry standard protocols wherever possible,
and it can be used as demonstrated as a building block to create different types of storage solutions. Once again, in our demonstrations, we have created standard block mode devices,
zoned namespace devices, as well as custom hyperscale FTL devices,
all on top of the software-enabled flash primitives.
And the most important note, to me anyway, is that we're combining full host control
with ease of use, taking away the burden of media management and the low-level details
of the Flash itself. So for more information on software enabled Flash,
my recommendation is to go to our microsite,
www.softwareenabledflash.com.
On the microsite, you can find our white paper.
And also, either through the link below
for the Kioxia America organization on github.com,
or through a link on the microsite,
you can actually go to the GitHub repository and download the latest version of the API, as well as the associated documentation.
I encourage you to check back on the microsite: as more of the software development kit becomes available,
it will be announced there, and we have some interesting demonstrations on latency control and garbage collection that will be hosted on the microsite shortly, too.
Thank you very much.
Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list
by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further
with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.