Storage Developer Conference - #142: ZNS: Enabling in-place Updates and Transparent High Queue-Depths

Episode Date: March 16, 2021

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast, episode 142. Hi, everybody, and welcome. In this session, we're going to be talking about some of the use cases of zoned namespaces. In particular, we're going to cover some of the new features that are coming to the NVMe standard for in-place updates and transparent high queue depths. So, Kanchan and
Starting point is 00:00:59 myself will be running the session. Kanchan, do you want to introduce yourself? Hi, everyone. My name is Kanchan, and I will try to cover the Linux IO stack part of the ZNS feature and how we go about plumbing it into the stack. Back to you, Javier. Thank you. So I'm Javier. I'm also a software engineer, and I'm going to be covering the introduction
Starting point is 00:01:20 to zoned namespaces and then touch a little bit on the user-space stack. So to get started, let's motivate why we need a new interface. SSDs are already mainstream. They offer great performance. They're easy to deploy as a plug-and-play replacement for hard drives. They're getting to an acceptable price in terms of dollars per gigabyte, right?
Starting point is 00:01:48 However, there are three main problems that have not yet been solved in current solid-state drives. The first one is the problem of the log-on-log. This basically means we have a lot of software stacks that put a big effort into doing append-only workloads. You can see this at the application level,
Starting point is 00:02:13 like log-structured merge trees. You can see this at the file system level, with file systems like F2FS or Btrfs that try to be log-structured and more flash friendly. But at the end of the day, the abstractions that we have in place for block devices mean that whatever the host tries to keep serialized ends up being moved around by the different layers.
Starting point is 00:02:42 So being able to have a direct data placement policy that allows us to leverage those data structures directly on the device would allow us to get better write amplification, less over-provisioning, and remove device-side data placement. And this effectively has an impact on the total cost of ownership of
Starting point is 00:03:05 solid-state drives. The second part is about multi-tenancy. We have still not solved the noisy neighbor problem. There are solutions out there, but most of them still follow an implicit model. Having an explicit policy where host software is able to place data on different LBA ranges and have control over the QoS is still something that we need to solve,
Starting point is 00:03:35 and we believe that ZNS, as we're going to present it today, can help on that path. The third part is cost. As I mentioned before, putting those two things together, we make it easier to adopt technologies like QLC or, later down the road,
Starting point is 00:03:54 higher bit counts. By removing the one-gigabyte-per-terabyte mapping table for the 4K mapping, we can reduce the DRAM on the device. And by reducing these log-on-log structures for data placement, we can also reduce the over-provisioning and write amplification. So, moving quickly through the use cases that are targeted for zoned namespaces: I believe one of the first ones that we're hearing about, and many people talk about it, is archival. This has a very specific need for adoption of QLC.
Starting point is 00:04:37 And coming back to the TCO: reduced write amplification, over-provisioning, and DRAM. This is great for cold storage, for trying to introduce solid-state drives into the cold storage tiering. We see this use case where we have very large zones that are able to be mapped to very large objects that are already immutable. This means that the host side can do all the work that it needs to do for updates using other technologies,
Starting point is 00:05:14 either another layer of SSDs or some persistent memory, NVDIMMs, whatever. And once those large objects are immutable, they are persisted directly onto the ZNS device. Another use case that we're starting to see, which is very interesting, is one that we're calling log IO. It has a lot of similarities with the archival use case. It is also about facilitating QLC adoption, probably with TLC coexistence. It also tries to leverage these existing flash-friendly, log-structured data structures, still reducing the TCO. But one of the main differences with the
Starting point is 00:06:01 archival use case is that we're not necessarily targeting very large objects. Instead, we are targeting a normal storage stack that is able to use ZNS, but not necessarily at an archival, cold-storage level. And for these use cases, we see the need for smaller zones. This is essentially because on workloads that are not targeted for archival, the objects that are being mapped are not that big. If you look, for example, at log-structured merge trees, we have different levels. The first levels are not on the order of gigabytes, which is what a large zone on a QLC device would be.
Starting point is 00:06:52 We see some megabytes, hundreds of megabytes maybe, depending on the use case. So having the possibility to map those to smaller zones, and then the ability to put these zones together to map to larger objects, is important for this use case; gathering the small objects on a single large zone would incur more garbage collection. The third use case, which is a little different from the two previous ones, is about IO predictability. In this case, we do not care much about the type of NAND that we're using. What we do care
Starting point is 00:07:35 about is that the data placement within the zones offers a level of QoS across different zones, meaning that we can put different tenants on different places of the SSD, in different zones, and we will have some guarantees in terms of isolation across these zones. In this example you can see that if application one is write heavy and application two is read heavy, we would not see the outliers in the read-heavy workload when both applications are running at the same time in this particular multi-tenant environment. So now, running quickly through the main features of ZNS: we have different
Starting point is 00:08:19 talks in this conference that presented ZNS, so we're not going to spend much time on it. I believe there are two things that we need to cover to understand the basics. The first one is that ZNS allows data placement at different LBA ranges. How these zones are configured depends on the vendor and on the product. We have different configurations here,
Starting point is 00:08:42 and I'm sure there are many others. The second important part is the state machine. It is important to understand that this state machine is managed by the host. So whenever we want to write to a zone, we need to open that zone, and we have different ways of doing that. And whenever we want to reuse
Starting point is 00:09:05 the zone, we need to reset the zone. This is an important concept, because it is what allows host garbage collection to take place. The host is in charge of moving valid data out of a zone that it needs to reuse by resetting it. This essentially removes all data movement within the controller. The third part, the write operation, I'm going to delay to a later slide, because we're going to talk about the different write models used in ZNS.
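To make the host-managed state machine concrete, here is a minimal sketch of inspecting a zone and resetting it from user space with the Linux zoned block device ioctls. The device path is an example and error handling is kept to a minimum.

```c
/* Minimal sketch: read one zone's state and reset it via the zoned block
 * device ioctls in <linux/blkzoned.h> (report since Linux 4.10, open/close/
 * finish since 5.5). Device path is an example; units are 512-byte sectors. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Ask for a report of the first zone: start, length, write pointer, condition. */
    size_t sz = sizeof(struct blk_zone_report) + sizeof(struct blk_zone);
    struct blk_zone_report *rep = calloc(1, sz);
    rep->sector = 0;
    rep->nr_zones = 1;
    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    struct blk_zone *z = &rep->zones[0];
    printf("zone: start=%llu len=%llu wp=%llu cond=%u\n",
           (unsigned long long)z->start, (unsigned long long)z->len,
           (unsigned long long)z->wp, z->cond);

    /* Host-driven reuse: once valid data has been moved elsewhere, reset the
     * zone so its write pointer returns to the zone start. */
    struct blk_zone_range range = { .sector = z->start, .nr_sectors = z->len };
    if (ioctl(fd, BLKRESETZONE, &range) < 0) { perror("BLKRESETZONE"); return 1; }

    /* BLKOPENZONE / BLKCLOSEZONE / BLKFINISHZONE drive the other transitions. */
    free(rep);
    close(fd);
    return 0;
}
```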
Starting point is 00:09:40 The specs are public, so you can go to that link and get the different TPs that affect ZNS: 56 for namespace types, 53 for the ZNS spec itself, and right now we have 65 for simple copy. So, the write model. Once you have this ZNS model, we see three different ways of writing to a ZNS device. The first one, the most simple and obvious, is writing at queue depth 1. Why is this necessary? It is because on a zoned device we have the concept of the write pointer. The write pointer sits at a specific LBA, and when we send write IOs
Starting point is 00:10:30 to that particular zone, we need to guarantee that those IOs are completed before we send a new IO to the zone. This is because of the non-ordering guarantee in NVMe. This is a limiting factor, because on large zones it essentially means
Starting point is 00:10:50 that we are not able to saturate the NAND underneath. So some use cases are okay with it, but it is a limiting factor. In order to overcome this, we have two ways of writing. One is what we're calling the zone append command, which has been standardized as part of the ZNS spec. It essentially means that we can send a nameless write to a particular zone without necessarily
Starting point is 00:11:21 pointing at the LBA where the write pointer is. So we can send several IOs of different sizes to a zone, and the controller will take those IOs, place them in the right order, and return the LBA value to the host. Then the host is responsible for remapping in the completion path.
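To make the "nameless write" idea concrete, here is a rough sketch of issuing a Zone Append through the Linux NVMe passthrough ioctl and picking the assigned LBA out of the completion. The opcode (0x7d) and dword layout follow the ZNS spec as I read it; the device path, namespace ID, zone start LBA and 4 KiB LBA size are illustrative assumptions.

```c
/* Rough sketch: Zone Append via NVMe passthrough (NVME_IOCTL_IO64_CMD,
 * Linux 5.4+, whose 64-bit result carries completion dwords 0-1). Opcode
 * 0x7d and the cdw10..12 layout are per the NVMe ZNS spec; device path,
 * nsid, zone start LBA and LBA size are illustrative assumptions. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

#define LBA_SIZE 4096ULL

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, LBA_SIZE)) return 1;
    memset(buf, 0xab, LBA_SIZE);

    uint64_t zslba = 0;                      /* start LBA of the target zone */

    struct nvme_passthru_cmd64 cmd;
    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = 0x7d;                     /* Zone Append */
    cmd.nsid     = 1;
    cmd.addr     = (uint64_t)(uintptr_t)buf;
    cmd.data_len = LBA_SIZE;
    cmd.cdw10    = (uint32_t)(zslba & 0xffffffff);   /* ZSLBA, low dword  */
    cmd.cdw11    = (uint32_t)(zslba >> 32);          /* ZSLBA, high dword */
    cmd.cdw12    = 0;                        /* 0-based number of logical blocks */

    int err = ioctl(fd, NVME_IOCTL_IO64_CMD, &cmd);
    if (err) { fprintf(stderr, "zone append failed: %d\n", err); return 1; }

    /* The controller chose the placement and reports the LBA it used in the
     * completion, so the host remaps its metadata here. */
    printf("data landed at LBA %llu\n", (unsigned long long)cmd.result);

    free(buf);
    close(fd);
    return 0;
}
```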
Starting point is 00:11:44 This has some implications for how a host stack needs to deal with it, but it is a good way to deal with larger zones where we want to increase the queue depth. The second way of writing, when we don't want to do queue depth 1, is zone striping. This essentially means we do queue depth 1, but across several smaller zones.
Starting point is 00:12:07 So essentially the host software is responsible for deciding which zones are being written in parallel. This striping gives the host the flexibility to choose where it wants to write and how it trades latency against bandwidth when it is designing how storage is used. It also has the advantage
Starting point is 00:12:39 that it uses traditional write operations, so changes to the host stack are really minimal.
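Here is a small host-side sketch of that striping idea with io_uring: one outstanding write per zone, several zones in flight at once, using only regular write operations. The device path, zone size and stripe unit are made-up example values; build with -luring.

```c
/* Illustrative zone striping with io_uring: queue depth NR_ZONES at the
 * device, but never more than one outstanding write inside any single zone.
 * Zone size, stripe unit, zone count and device path are example values. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NR_ZONES    4
#define STRIPE_UNIT (128 * 1024)

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    unsigned long long zone_size = 96ULL * 1024 * 1024;  /* example zone size */
    unsigned long long wp[NR_ZONES];                     /* host-tracked write pointers (bytes) */
    for (int i = 0; i < NR_ZONES; i++)
        wp[i] = (unsigned long long)i * zone_size;

    void *buf[NR_ZONES];
    for (int i = 0; i < NR_ZONES; i++) {
        if (posix_memalign(&buf[i], 4096, STRIPE_UNIT)) return 1;
        memset(buf[i], 0x5a, STRIPE_UNIT);
    }

    struct io_uring ring;
    if (io_uring_queue_init(NR_ZONES, &ring, 0)) return 1;

    for (int round = 0; round < 32; round++) {
        /* Submit one regular write per zone, each at that zone's pointer. */
        for (int z = 0; z < NR_ZONES; z++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_write(sqe, fd, buf[z], STRIPE_UNIT, wp[z]);
            sqe->user_data = z;
        }
        io_uring_submit(&ring);

        /* Reap completions; only then does each zone get its next write. */
        for (int i = 0; i < NR_ZONES; i++) {
            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe)) return 1;
            if (cqe->res != STRIPE_UNIT) { fprintf(stderr, "write failed\n"); return 1; }
            wp[cqe->user_data] += STRIPE_UNIT;   /* stays sequential within the zone */
            io_uring_cqe_seen(&ring, cqe);
        }
    }

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```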
Starting point is 00:13:08 And that is the basic concept of ZNS as we have it today. What is coming next is what we're calling the zone random write area, or ZRWA. This is still a new TP; it is under development in the NVMe task force, so we are going to talk about the main concept, but not about the specifics of the TP, because it is being finalized. The main motivation for the zone random write area is: what if we have a workload where we want to relax this constraint around the write pointer? There are cases where you want to be able to go a little bit back in time, because your workload is not fully sequential.
Starting point is 00:13:46 There are small in-place updates, primarily targeted at metadata. And instead of having to serialize them and then do extra garbage collection, which means more write amplification and is kind of counterintuitive for ZNS, we essentially have a mechanism that places a window in front of the
Starting point is 00:14:07 write pointer, and within that window we can do any in-place updates we want until the data is immutable. Once the data is immutable, we essentially flush that window to the media and advance the write pointer to where a new window begins. So this has two interesting properties. The first one, obviously, is that you can do updates on a zoned namespace. The second one is that, within the size of the window, you essentially don't need to respect the strict in-order policy, meaning that you can increase the queue depth. IOs can arrive in any order,
Starting point is 00:14:49 and within that window that will be okay; there will be no write errors due to the write pointer. We see this as beneficial for people that are having issues adopting the append philosophy, and we also see it as useful for people that have metadata built into their IO path and then need to do these updates: some file system checkpoints, some applications that put metadata either in the middle of the object or at the end of the object, things like that.
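Since the TP was still being finalized, the following is only a toy, host-side model of the semantics being described, not the NVMe interface: a small window ahead of the write pointer accepts out-of-order writes and in-place updates, and flushing the window is what makes the data part of the sequential zone.

```c
/* Toy model of zone random write area (ZRWA) semantics, host-side only.
 * Purely illustrative of the behavior described in the talk; it does not
 * reflect the actual NVMe commands, which were still being standardized. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ZRWA_BLOCKS 16               /* window size, in logical blocks */
#define BLOCK_SIZE  4096

struct zone_model {
    unsigned long long wp;                         /* write pointer (LBA) */
    unsigned char window[ZRWA_BLOCKS][BLOCK_SIZE]; /* the random write area */
    bool dirty[ZRWA_BLOCKS];
};

/* Writes and rewrites (in-place updates) are accepted anywhere inside
 * [wp, wp + ZRWA_BLOCKS), in any order, without touching the media model. */
static bool zrwa_write(struct zone_model *z, unsigned long long lba,
                       const void *data)
{
    if (lba < z->wp || lba >= z->wp + ZRWA_BLOCKS)
        return false;                              /* outside the window */
    memcpy(z->window[lba - z->wp], data, BLOCK_SIZE);
    z->dirty[lba - z->wp] = true;
    return true;
}

/* Once the data is immutable, the window is flushed: the blocks become part
 * of the sequential zone contents and the write pointer moves forward. */
static void zrwa_flush(struct zone_model *z, unsigned nblocks)
{
    for (unsigned i = 0; i < nblocks; i++)
        printf("persist LBA %llu\n", z->wp + i);   /* stand-in for a media write */
    z->wp += nblocks;
    memset(z->dirty, 0, sizeof(z->dirty));
}

int main(void)
{
    struct zone_model z = { .wp = 0 };
    unsigned char blk[BLOCK_SIZE] = { 0 };

    zrwa_write(&z, 3, blk);   /* out of order: ahead of LBAs 0..2 */
    zrwa_write(&z, 0, blk);
    zrwa_write(&z, 0, blk);   /* in-place update of LBA 0, no extra GC */
    zrwa_flush(&z, 4);        /* window flushed, wp now at LBA 4 */
    return 0;
}
```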
Starting point is 00:15:31 Those are the main use cases. How we plumb this, Kanchan will tell you in a little bit. The second new feature, which is actually already ratified and published, is simple copy. This is interesting for zoned namespaces because once you have to do host garbage collection, we have the problem that, if you imagine modern disaggregated storage, your storage might sit on the other side of some sort of fabric. When you need to do host garbage collection,
Starting point is 00:16:15 you basically need to move the data from your storage node all the way to your compute node through whatever fabric you have in the middle, go through your CPU root complex, and then come back. This consumes PCIe bandwidth, or whatever other fabric bandwidth (RDMA, TCP/IP) you have in the middle, and it consumes your CPU cycles. Simple copy is essentially a new command that allows us to choose a number of LBA ranges, which can be on several zones, as a scatter-gather list. You can choose a large number of
Starting point is 00:16:58 them. Then you select a single destination LBA range on a zone, and by sending that command, the data is moved within the controller. So you are sparing all that data movement and CPU utilization.
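To show the shape of the command, here is a rough sketch of a simple copy issued through the NVMe passthrough ioctl. The opcode (0x19) and the 32-byte source-range descriptor layout follow TP 4065 / NVMe 2.0 as I understand them; the namespace, LBAs and device path are illustrative, and a real implementation should also respect the controller's advertised copy limits.

```c
/* Rough sketch: NVMe Copy ("simple copy") via the passthrough ioctl.
 * Opcode 0x19 and the format-0 source range entry (SLBA at byte 8,
 * 0-based NLB at byte 16) follow TP 4065 / NVMe 2.0 as understood here;
 * nsid, LBAs and device path are example values. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

struct copy_range {                 /* Source Range Entry, descriptor format 0 */
    uint64_t rsvd0;
    uint64_t slba;                  /* source starting LBA */
    uint16_t nlb;                   /* number of logical blocks, 0-based */
    uint8_t  rsvd18[14];
} __attribute__((packed));

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Two valid extents, e.g. from different zones being reclaimed. */
    struct copy_range *ranges;
    if (posix_memalign((void **)&ranges, 4096, 2 * sizeof(*ranges))) return 1;
    memset(ranges, 0, 2 * sizeof(*ranges));
    ranges[0].slba = 1000; ranges[0].nlb = 7;    /* 8 blocks from LBA 1000  */
    ranges[1].slba = 5000; ranges[1].nlb = 15;   /* 16 blocks from LBA 5000 */

    uint64_t dest_slba = 200000;    /* e.g. the write pointer of a GC destination zone */

    struct nvme_passthru_cmd cmd;
    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = 0x19;                         /* Copy */
    cmd.nsid     = 1;
    cmd.addr     = (uint64_t)(uintptr_t)ranges;  /* list of source ranges */
    cmd.data_len = 2 * sizeof(*ranges);
    cmd.cdw10    = (uint32_t)(dest_slba & 0xffffffff);
    cmd.cdw11    = (uint32_t)(dest_slba >> 32);
    cmd.cdw12    = 2 - 1;                        /* number of ranges, 0-based */

    int err = ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
    if (err)
        fprintf(stderr, "copy failed: %d\n", err);
    else
        printf("both extents copied to LBA %llu inside the device\n",
               (unsigned long long)dest_slba);

    free(ranges);
    close(fd);
    return 0;
}
```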
Starting point is 00:17:30 This can actually be used for things other than ZNS, but it is primarily targeted at this. It is great when you combine it with something like the zone random write area, because then you can have different destinations for your incoming user data and your garbage-collected data, and at the same time place the metadata wherever it fits. We see different customers using this as a way to better adopt ZNS on an existing and mature software stack. And right now, Kanchan, you can take over
Starting point is 00:17:59 and guide us through how we do this in Linux. Thank you, Javier. So the picture here is the bird's-eye view of the Linux ecosystem for ZNS. At the bottom, we have the ZNS SSD, which implements the device side of the ZNS TP. Parallel to that, we have QEMU, which implements the same TP
Starting point is 00:18:23 and presents an emulated ZNS device, which helps in faster development and testing of the software stack. Moving upwards to the kernel: from 5.9 onwards, ZNS support is present in the NVMe driver. Coming to the block layer, a lot is inherited from the infrastructure that was added for SMR devices, and ZNS-specific new things like the zone append command are also present. On the FS side, F2FS and zonefs can support zoned devices already, while work is in progress for Btrfs and XFS.
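As a practical aside, the block layer exposes this zoned support through sysfs, so an application can check whether a namespace is zoned and read the zone geometry before choosing a write strategy. A minimal sketch, with the device name as an example:

```c
/* Minimal sketch: query the block layer's zoned-device attributes in sysfs.
 * "zoned" reports none/host-aware/host-managed, "chunk_sectors" is the zone
 * size in 512-byte sectors, "nr_zones" is the zone count (kernel 4.20+).
 * The device name is an example. */
#include <stdio.h>
#include <string.h>

static int read_attr(const char *dev, const char *attr, char *out, size_t len)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f))
        out[0] = '\0';
    fclose(f);
    out[strcspn(out, "\n")] = '\0';    /* strip trailing newline */
    return 0;
}

int main(void)
{
    const char *dev = "nvme0n1";
    char model[64], zone_sectors[64], nr_zones[64];

    if (read_attr(dev, "zoned", model, sizeof(model)) ||
        read_attr(dev, "chunk_sectors", zone_sectors, sizeof(zone_sectors)) ||
        read_attr(dev, "nr_zones", nr_zones, sizeof(nr_zones))) {
        fprintf(stderr, "zoned attributes not available for %s\n", dev);
        return 1;
    }

    printf("%s: zoned=%s, zone size=%s sectors, zones=%s\n",
           dev, model, zone_sectors, nr_zones);
    return 0;
}
```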
Starting point is 00:19:12 Now, if we talk about taking some of this goodness to user-space applications, syscall interfaces need to be built. A good amount of work is in progress towards enabling sync and async interfaces for the new features via io_uring, libaio, and the synchronous calls. Coming to user space, there's an array of software getting ZNS updates: management tools like blkzone and nvme-cli, performance and test tools such as fio and blktests, a couple of zone-aware file systems and DBs, and libraries such as xNVMe, which is going to help application
Starting point is 00:19:58 development by providing a single unified API for multiple backends. So I will talk about how we can go about plumbing some of the features that Javier mentioned earlier into the Linux IO stack. While the kernel would obviously get to use the new features first, one of the important goals that we are after is making these features reach user-space applications directly. For applications which like to do things by themselves, such as user-space file systems and DBs, this gives more control, flexibility,
Starting point is 00:20:34 and hopefully better efficiency. Towards that goal, we believe that the block device interface, coupled with async transports such as io_uring and AIO, can be handy. Starting with zone append: as I mentioned on the previous slide, the zone append command is already present in the kernel, and in-kernel users are able to make use of it.
Starting point is 00:21:02 But when it comes to exposing it to user space, there are a couple of challenges. The first one is how best to return where the append landed, because existing system calls don't really have a clear way to return the offset. Another issue is how to ask for zone append and not for a regular write. Essentially, append is a variant of write, so the application needs a way to tell when it wants append and not a regular write.
Starting point is 00:21:47 So, the way to solve these problems: first I will talk about AIO, because it is much more straightforward compared to io_uring. In the case of AIO, we could use an unused field, res2, to return the offset to user space. And append is triggered by combining two flags: the existing RWF_APPEND flag, and a new one which tells the kernel to report the offset directly. This will make more sense when we see the io_uring scheme, so we will get there. In the case of io_uring, we didn't really have any unused field like AIO, but it was possible for the application to pass a pointer along with the SQE.
Starting point is 00:22:30 So when the application passes a pointer to io_uring, io_uring can update it when it receives the completion from the lower layer. As far as the trigger for the append is concerned, it is the same as AIO, except that here the flag tells the kernel to report the offset indirectly, because it is expecting a pointer; so the flag should convey that. The last point about this whole interface is that it is not yet frozen, but a lot of debate has happened on the Linux kernel mailing list to arrive at this, so I believe the final interface won't differ much from what I described here.
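To make the AIO variant of that proposal tangible, here is a sketch of what a caller could look like. This is not a merged kernel interface: RWF_APPEND exists today, but RWF_REPORT_OFFSET below is a made-up stand-in name for the proposed flag, and returning the placed offset in io_event.res2 is the convention described in the talk, not current kernel behavior.

```c
/* Sketch of the PROPOSED AIO interface for zone append (not a merged API).
 * RWF_APPEND is real; RWF_REPORT_OFFSET is a hypothetical stand-in for the
 * proposed flag; res2 carrying the placed offset is the proposal's convention. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010            /* per include/uapi/linux/fs.h */
#endif
#define RWF_REPORT_OFFSET (1 << 7)       /* hypothetical flag, illustration only */

static long io_setup(unsigned nr, aio_context_t *ctx) { return syscall(SYS_io_setup, nr, ctx); }
static long io_submit(aio_context_t ctx, long n, struct iocb **iocbs) { return syscall(SYS_io_submit, ctx, n, iocbs); }
static long io_getevents(aio_context_t ctx, long min, long nr, struct io_event *ev) { return syscall(SYS_io_getevents, ctx, min, nr, ev, NULL); }

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, 4096)) return 1;
    memset(buf, 0xcd, 4096);

    aio_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0) { perror("io_setup"); return 1; }

    struct iocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes     = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf        = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes     = 4096;
    cb.aio_offset     = 0;                /* zone start; the device picks the slot */
    /* Existing RWF_APPEND plus the new flag = "zone append and tell me where
     * it landed" in the proposed scheme. */
    cb.aio_rw_flags   = RWF_APPEND | RWF_REPORT_OFFSET;

    struct iocb *list[1] = { &cb };
    if (io_submit(ctx, 1, list) != 1) { perror("io_submit"); return 1; }

    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev) != 1) { perror("io_getevents"); return 1; }

    /* res is the usual byte count; res2 is the otherwise unused field the
     * proposal reuses to return the offset where the append was placed. */
    printf("appended %lld bytes at offset %lld\n", (long long)ev.res, (long long)ev.res2);

    free(buf);
    close(fd);
    return 0;
}
```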
Starting point is 00:23:17 And before I move to ZRWA, let's talk a bit about the zone write lock. When multiple writes are issued simultaneously on a zone, either by multiple applications operating on a single zone or by one application sending writes at a higher QD to a single zone, in both cases it may lead to errors because of a violation of the sequential-writing-within-a-zone rule. So mq-deadline implements a per-zone write lock. Many writes can arrive at mq-deadline, but it makes sure that
Starting point is 00:24:16 at most one write is dispatched to the driver per zone. And that eventually leads to a QD1-per-zone result. In the case of zone append, the zone write lock is avoided. The reason is that the caller is fine with out-of-order placement as long as it happens within the zone. So multiple appends can be outstanding on a zone; they will get completed within the zone at some point in time, and the caller gets the location when it receives the completions for those append commands. That's how multi-QD is achieved in the case of append.
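As a purely illustrative model of that behavior (not the actual mq-deadline code): writes targeting a zone whose write lock is held get held back, so at most one write per zone is ever in flight, while appends bypass the lock.

```c
/* Illustrative model of the per-zone write lock idea: many writes may be
 * queued, but at most one write per zone is dispatched to the driver at a
 * time, which is what collapses writes to QD1 per zone. Not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define NR_ZONES 1024

struct sched_model {
    bool zone_write_locked[NR_ZONES];   /* one lock bit per zone */
};

/* Considered whenever a queued write could be dispatched. */
static bool try_dispatch_write(struct sched_model *s, unsigned zone)
{
    if (s->zone_write_locked[zone])
        return false;                   /* a write is already in flight: hold it back */
    s->zone_write_locked[zone] = true;  /* take the zone write lock */
    return true;                        /* hand the request to the driver */
}

/* Called when the dispatched write completes. */
static void write_completed(struct sched_model *s, unsigned zone)
{
    s->zone_write_locked[zone] = false; /* next queued write for this zone may go */
}

/* Zone append skips the lock entirely: out-of-order placement inside the
 * zone is acceptable to the caller, so any number can be outstanding. */
static bool try_dispatch_append(struct sched_model *s, unsigned zone)
{
    (void)s; (void)zone;
    return true;
}

int main(void)
{
    struct sched_model s = { { false } };
    printf("%d\n", try_dispatch_write(&s, 7));   /* 1: dispatched, lock taken   */
    printf("%d\n", try_dispatch_write(&s, 7));   /* 0: held back, zone locked   */
    printf("%d\n", try_dispatch_append(&s, 7));  /* 1: append bypasses the lock */
    write_completed(&s, 7);
    printf("%d\n", try_dispatch_write(&s, 7));   /* 1: dispatched again         */
    return 0;
}
```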
Starting point is 00:25:08 Now coming to ZRWA: it is pluggable into a zone. The ZRWA buffer is attached to or detached from a zone via an ioctl; that's the setup part of it. The nifty thing is that during write, nothing special is done, in the sense that a new opcode is not required; it travels like a regular write. And
Starting point is 00:25:23 the ZRWA-backed region of a zone can conceptually be treated like a conventional zone, as it can be written randomly, so a higher QD is possible and in-place updates are allowed; mq-deadline's write lock is skipped for ZRWA-enabled zones. Coming to copy: because simple copy is a new operation, we had to add a new opcode in the kernel, which is called REQ_OP_COPY. With that, one can prepare a bio, package a list of source LBAs in the payload, and send it down to NVMe for processing. This whole thing is presented to applications via an ioctl for synchronous processing, and via io_uring or AIO for asynchronous processing.
Starting point is 00:26:16 This is how we can go about leveraging ZRWA and simple copy in F2FS. F2FS is used as a reference here; it is possible to apply the same underlying idea to other log-structured software as well. A couple of things are required to set up F2FS on zoned devices. It has a maximum of six open zones: hot, warm, and cold, with two sets of each, one for data and the other for node.
Starting point is 00:26:44 The allocation and garbage collection unit is the section, which is a collection of 2 MB segments. For zoned devices, this section is mapped to the device zone size (so a 256 MB zone, for example, means a section of 128 segments). And it is made sure that F2FS disables adaptive logging and remains in strict LFS mode, so that writing into holes is skipped.
Starting point is 00:27:11 Now, there are a couple of new problems with respect to ZNS. Because F2FS metadata requires in-place updates and ZNS does not really have any conventional zones, a multi-device setup is required. And given the large zone size, it is possible that garbage collection takes more time, which can affect user applications during foreground GC. And obviously QD1 writes are speed dampeners,
Starting point is 00:27:46 be it during regular writes or during GC. So how can we go about solving some of these problems? Some of the metadata which gets updated frequently, such as the checkpoint, can be kept in the ZNS itself, given that ZRWA allows in-place updates. Higher QD can be obtained by using append or ZRWA. Simple copy is going to help in offloading the data movement, the GC data movement, to the device. And by using ZRWA in-place updates, some of the writes to the flash media can be avoided. So this example here
Starting point is 00:28:31 shows recurring updates happening on a file. A few blocks are getting updated more frequently than others: in this case, blocks 0 and 1 are getting updated twice, while blocks 2 and 3 are getting updated only once. When we run this update sequence on a ZRWA-disabled zone, which is shown on the left-hand side, it produces two invalid blocks,
Starting point is 00:28:59 the reason being that blocks 0 and 1 are updated twice. For the same sequence run on a ZRWA-enabled zone, shown on the right-hand side, it doesn't produce any invalid blocks, because the in-place updates happen in the ZRWA buffer and don't really reach the underlying flash. So here are some early performance numbers with ZRWA and simple copy. The right-hand-side plot shows the trend we see with 4 KB writes pumped at QD16, and the engine used here is io_uring. The blue line shows the bandwidth, which scales as we increase the number of zones. We are writing these zones in a striped fashion.
Starting point is 00:29:49 The reason for this rise is that as we increase the number of zones being written concurrently, the workload gets more parallel. The orange line shows the bandwidth with the same setup; the only difference is that ZRWA has been enabled on these zones, which gives about a 20 to 25 percent increase in bandwidth. The bottom-most two plots are for simple copy. In this experiment, 64 zones are being copied either via a combination of read and write, or via the simple copy command. Compared to read and write, simple copy saves about 30% of the time, and the whole CPU consumption also goes down, by about 500%. And one last note about all this data is that it's obtained from fio. fio already has the ability to run zone-compliant workloads.
Starting point is 00:31:00 And we have added a bunch of new features: simple copy, ZRWA, and striped writing across zones, to name a few. Zone append is not shown in the slide, but fio patches are already out implementing the user interface which I talked about in the previous slides. And at this point I will give control back to Javier for the rest of the session. Thanks, Kanchan. So now that we've covered the kernel, let's talk a little bit about a library that we're calling xNVMe.
Starting point is 00:31:40 I'm super excited about it, and I'm just going to talk through it quickly, because we do have a presentation by Simon, who is going to get into the details of how xNVMe works, what the performance numbers for conventional namespaces are, and what you can expect from it. But in essence, xNVMe is a library
Starting point is 00:32:03 that sits right underneath an application. It provides a single API for applications to be able to use any NVMe namespace type and run through a number of different IO backends. If we choose Linux, we have different possibilities when it comes to submitting IO. If we go through the kernel, we have the traditional libaio, the normal pread/pwrite synchronous path, and in the last year we've got io_uring. We also have the possibility to run SPDK. All of these are great. The problem is that you need to become an expert on each backend
Starting point is 00:32:48 to be able to really get the most out of it. We have experienced that people who are experts in one backend tend not to be experts in the other backends. So you can get 100% out of one thing, but you end up losing a lot of the performance or the features in another backend. And that's where xNVMe can help, because through a single API,
Starting point is 00:33:10 we stop really thinking about how the backend is doing things; you can submit an IO and change the backend at runtime, and you do not necessarily need to make changes to the application.
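A rough sketch of what that single API looks like in practice is below. It follows the xNVMe documentation approximately; exact function names and signatures have changed across xNVMe releases, so treat this as an assumption-laden illustration and check the libxnvme headers rather than relying on it verbatim. The device URIs are examples.

```c
/* Approximate sketch of the xNVMe model: one API, backend chosen at runtime
 * by the device URI. Function names/signatures follow the xNVMe docs roughly
 * and vary between releases -- verify against your libxnvme headers. */
#include <stdint.h>
#include <stdio.h>
#include <libxnvme.h>

int main(int argc, char **argv)
{
    /* e.g. "/dev/nvme0n1" (Linux block/ioctl paths) or a PCIe address such
     * as "0000:03:00.0" (SPDK); the application code stays the same. */
    const char *uri = argc > 1 ? argv[1] : "/dev/nvme0n1";

    struct xnvme_dev *dev = xnvme_dev_open(uri, NULL);
    if (!dev) { fprintf(stderr, "xnvme_dev_open(%s) failed\n", uri); return 1; }

    uint32_t nsid = xnvme_dev_get_nsid(dev);
    void *buf = xnvme_buf_alloc(dev, 4096);      /* backend-appropriate buffer */

    struct xnvme_cmd_ctx ctx = xnvme_cmd_ctx_from_dev(dev);
    if (xnvme_nvm_read(&ctx, nsid, 0x0, 0, buf, NULL))   /* one LBA from LBA 0 */
        fprintf(stderr, "read failed\n");
    else
        printf("read LBA 0 through the backend selected by '%s'\n", uri);

    xnvme_buf_free(dev, buf);
    xnvme_dev_close(dev);
    return 0;
}
```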
Starting point is 00:33:29 There are several things that we do in terms of abstracting the backend. We also try to abstract things as different as the threading model, if you compare SPDK to io_uring. And we even target changes of operating system. So right now, an application that is running on xNVMe can run on any of the Linux backends there, but it can also run on FreeBSD through SPDK. And we're at the moment
Starting point is 00:33:57 implementing support for the normal IO path in FreeBSD, and for Windows if you want to do that. But again, Simon will talk you through that, so I really encourage you to go see his session. The second part that I want to talk about is, well, what happens when you want to implement ZNS in your software stack,
Starting point is 00:34:26 but you either do not have the hardware yet, or there are new features that you want which are not in the samples that you have at the moment. Well, we've done a lot of work in QEMU to be able to support all the new namespace types. So again, Klaus will have a dedicated talk at the SDC where he's going to talk about the rewrite he's done of the QEMU NVMe device and about the features that he has implemented for ZNS. But a little bit of a spoiler: we have support for all of ZNS.
Starting point is 00:35:10 We have support for simple copy. All the patches are posted, and now we are working on how we put all the work that is being done from different sources together. And as soon as TPs like the zone random write area
Starting point is 00:35:25 are ratified, we will submit the patches for those too. One small comment that I think is also interesting: QEMU is mainly an emulated device, so you cannot really run performance numbers on it. But we're doing some work on what we're calling an NVMe null latency mode, which we expect would allow us to use QEMU sort of like a null block device in the kernel, but not stopping at
Starting point is 00:35:56 the block layer, stopping instead on the other side of the NVMe driver, so that we can really test what performance and latency we should expect when using different IO paths, but also different parts of the NVMe driver, or even different NVMe drivers for that sake. So I believe this is the last part of the content. And this is about, well, when you put this all together in an application,
Starting point is 00:36:30 what happens? We chose RocksDB. And this particular implementation of RocksDB is through a library that we call xZTL, which is also available on the OpenMPDK website. xZTL basically allows us to abstract the logic for an LSM tree. There are many databases like RocksDB; you can choose examples like Cassandra, or some other more proprietary LSM trees, that use the same logic. And we wanted to abstract that away
Starting point is 00:37:06 so that the actual backend that makes it into the application is very, very thin and easy to maintain. If you think of RocksDB, you can think of the HDFS backend, for example. Basically, the approach we followed is that RocksDB, through xZTL and then through xNVMe, chooses a pool of zones, and it takes a striping approach with smaller zones so that we can really leverage the small objects on the first levels of the tree and then grow the size as we go to the deeper levels. And while we stripe, we use the zone random write area as a way to update the metadata at the end of the SSTable. You can run RocksDB
Starting point is 00:37:55 with a fully serialized SSTable, but you will incur a little bit more write amplification. Using the zone random write area, you can allow these small updates that happen to the metadata at the end of the SSTable and reduce the garbage collection that is needed afterwards. We also use simple copy to do the garbage collection.
Starting point is 00:38:19 This is the same logic as Kanchan has mentioned before, but at the application level. And it's fairly easy to plumb the existing garbage collection in RocksDB to leverage simple copy. And at the moment, obviously, we're using
Starting point is 00:38:36 xNVMe. When we go through the kernel, we use the path that Kanchan has described. When we go through SPDK, we use the logic in xNVMe directly, through the passthrough bdev. And now some early numbers that we want to share. In the slides you have a screenshot of a Grafana interface, but I think we can go through a live demo, or, you know, as live as it gets in this situation. So here you see a RocksDB workload. On the left-hand side, you have the terminal.
Starting point is 00:39:16 In this particular example, we're running an overwrite workload, but the device is preconditioned with a full write workload. And what you see here is basically three numbers that I believe are of interest. The first one, in red, is the write amplification that this workload is generating on an ext4 file system without any form of zone support. The way we're measuring this is the number of bytes that come from user space and the number of bytes that are leaving the file system, only for the data part.
Starting point is 00:39:57 This is an important distinction, because we're not interested in measuring metadata at this moment. We just want to see how the way we're organizing data and mapping it, in this case to particular segments and their blocks, has an impact when compared to how we map data to zones in a zone-aware application or file system. The next step is what you see in yellow.
Starting point is 00:40:25 That is F2FS with zone support. In this case, we have two different block devices: one conventional for the metadata, one zoned for the data. And we do the exact same thing: we run the same workload with the same preconditioning. What you see is that there is a big improvement just from the way we're mapping the data, again in terms of the number of bytes that enter F2FS and the number of bytes that leave F2FS onto the data zoned namespace. And then in green, you can see the same workload,
Starting point is 00:41:09 but running on our RocksDB backend on xZTL and xNVMe, in this particular case also running through the kernel, so that this shouldn't have any impact, but just to be completely apples to apples. What is important to mention here is that the difference that we see over time, just at the software level, is very relevant, because this number, at the end of the day, you need to multiply by the write amplification that is being generated by your device. We can obviously not speak about the write amplification
Starting point is 00:41:54 generated on the SSD; this is vendor dependent, it depends on the workload, it depends on many other things. But if you do know something about the write amplification on your device, you can use these numbers as a way to factor in the real write amplification that you are generating, besides the write amplification that RocksDB itself layers on top. In this case, that's what you see, around 2x, on the bottom.
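To spell out that multiplication (a simplification, and each factor has to be measured consistently in bytes written at its boundary):

```latex
\mathrm{WAF}_{\text{end-to-end}} \;\approx\;
\underbrace{\frac{\text{bytes into the FS}}{\text{user bytes}}}_{\text{RocksDB},\ \sim 2\times}
\;\times\;
\underbrace{\frac{\text{bytes to the device}}{\text{bytes into the FS}}}_{\text{FS / zone mapping (plotted)}}
\;\times\;
\underbrace{\frac{\text{bytes to NAND}}{\text{bytes to the device}}}_{\text{SSD, vendor dependent}}
```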
Starting point is 00:42:35 So, real-life examples: this tells us that a very flash-friendly workload can gain in the realm of 3 to 5x. For workloads that are actually not that friendly, with the random preconditioning, we're talking between 15 and 20x. You know, I don't like putting out specific numbers because it really depends, but I hope this gives you an idea of the benefit that we are getting from a software perspective when you are running an application that already understands log-structured objects that can easily be laid on top of zones. We also have some early numbers on the zone random write area,
Starting point is 00:43:14 which show not only that we can get the advantage of the in-place update, but that, as we do striping, if we choose to use larger zones that span several NAND blocks, we also get the advantage of better bandwidth. So I think that's all. We've gone through the basic concepts of ZNS; I'm sure after this conference you've got that covered. What are the use cases
Starting point is 00:43:45 that we're targeting? That's important to understand, because as you move from an archival-only use case to more log IO or IO determinism use cases, the new features that we're bringing to NVMe
Starting point is 00:44:01 become more and more relevant, especially if you're thinking of adopting open-channel SSD use cases through ZNS. Kanchan has gone in detail through the ecosystem in Linux. And then we've covered
Starting point is 00:44:15 other tools: xNVMe, the work that we've done in QEMU, and some of the RocksDB part. So I hope you enjoyed it, and if you have any questions, you're very welcome to reach out to Kanchan and to me directly. And again, I encourage you to go and see Simon's talk, Klaus' talk,
Starting point is 00:44:32 and the rest of the Zoned Storage track, where we have folks from WDC, SUSE, and Catalyst.io covering interesting topics around QEMU, the kernel, and different applications. So we look forward to your questions in the Slack, and thank you very much. Thanks for listening. If you have
Starting point is 00:44:50 questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
