Storage Developer Conference - #178: Key per IO - Fine Grain Encryption for Storage

Episode Date: December 12, 2022

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, episode number 178. Okay, welcome everyone. This is our 3:35 talk. We'll be talking about Key Per IO, fine-grain encryption for storage. So my name is Frederick Knight. I'm from NetApp, and we have Festus from Solidigm. We'll be telling you about this topic this afternoon. Festus is going to start. He's going to share about some of the key operations and some of the information about the encryption part of it. And I'll be talking about the NVMe interfaces
Starting point is 00:01:13 to how this stuff is all going to work. So, Festus, he'll get started here. Thanks, Fred. So, as Fred touched on, we'll start with just an overview of the evolution of data-at-rest protection schemes. Then that should get us into the Key Per IO architectural components, and we'll look at the benefits. But more importantly, we'll look at the latest updates that the TCG Storage Workgroup has been making towards the standardization of Key Per IO. Once we get through that, Fred will introduce various Key Per IO use cases and then how the NVMe interface is being modified to support this. All right, so to start, most of you are already familiar with the general approaches to data-at-rest protection on storage devices. You generally have some media encryption:
Starting point is 00:02:27 media encryption keys generated by the device that are then used to encrypt or decrypt user data as it's being written to NAND or being read out of NAND. So that's the general building block. And on top of that, some of the recent technology, like Opal, has been built to add a layer of authentication, to try to tie those media encryption keys, which are generated by the device,
Starting point is 00:02:57 to some outside password, a user password. So that's generally how data-at-rest protection works in today's devices. That works well, especially for the use case where you have a contiguous range of LBAs on a storage device, you want to associate it with a particular key, and you tie that to some user-supplied password. But as you can see on the left, it creates challenges as the number of ranges increases, because it means you have to manage more keys on the device, and as the number of keys increases, you now have a more complicated scheme for how you protect
Starting point is 00:03:43 those keys on the storage device. That makes the storage device itself more of an appealing target for theft, fundamentally because your keys are there and the data stays there. So one of the things we've been trying to explore in TCG is figuring out ways we can improve on that architecture: maybe provide ways, especially for use cases that may be comfortable with dealing with the key management themselves, to provide some sort of an interface and method
Starting point is 00:04:17 so that they can inject keys into a storage device, and leverage those keys to perform on-device user data encryption. The idea of externally managing media encryption keys introduces some excellent benefits: instead of associating a range of LBAs on the device with one key, you can actually associate a higher-level object with keys. For instance, one set of keys can be associated with objects across different devices,
Starting point is 00:04:54 instead of the old architecture, where only a range of LBAs can be coupled with one key. But it also means, if you can manage media encryption keys externally, that you can crypto-erase at a higher level, not just at the LBA-range level. You can crypto-erase at an object level,
Starting point is 00:05:17 even spanning multiple storage devices or appliances. External management of keys also obviously simplifies the key management implementation on the device, since the device doesn't own the keys. So the audit process that Eric talked about earlier becomes a lot simpler, since you don't have the keys on the device. So these are some of the key benefits that Key Per IO provides at a high level. Some of this, obviously,
Starting point is 00:05:53 this flexibility exists if you do software encryption. But this tries to extend that layer down into the storage device. So at a high level, the way Key Per IO operates, you basically have a layer in the middle, let's call it a Key Per IO host management application, that relays the keys that may be owned by a particular application,
Starting point is 00:06:23 relays those keys to the storage device, and then selects those keys to perform user data encryption on the device. So in this example, you have your Key Per IO application talking to some key manager. In this instance, let's assume a key management service that may be hosted by some server. It has to generate some keys or retrieve some keys, and that key management service, external, obviously, to the SSD, may go ahead and create those keys and send them back to the application. The application will then inject those keys into the SSD, and within the SSD the keys would be populated in the SSD controller key cache
Starting point is 00:07:10 for subsequent IO usage. So that's kind of the high-level model of how Key Per IO operates. In the Storage Workgroup, we've basically been taking this model and trying to find the various architectural and protocol elements that are needed to support something like this. So there's also the NVMe component. We'll get into that.
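To make that flow concrete, here is a minimal Python sketch of the provisioning path just described. Every class and method name is an illustrative assumption, not the KMIP or TCG API; a real deployment would use a KMIP client library and reach the drive through the NVMe security commands discussed below.

```python
# A minimal sketch of the Key Per IO provisioning flow: application asks
# the key manager for a key, then injects it into the SSD's key cache.
import os

class KeyManagementService:
    """Stands in for an external KMIP server that owns the keys."""
    def __init__(self):
        self._keys = {}

    def create_key(self, key_id: str) -> bytes:
        # Generate (or retrieve) a 256-bit media encryption key.
        self._keys.setdefault(key_id, os.urandom(32))
        return self._keys[key_id]

class KeyPerIOHostApp:
    """The layer in the middle: relays keys from the KMS to the SSD."""
    def __init__(self, kms: KeyManagementService, ssd_key_cache: dict):
        self.kms = kms
        self.ssd_key_cache = ssd_key_cache   # models the controller key cache

    def inject(self, key_id: str, key_tag: int) -> None:
        key = self.kms.create_key(key_id)    # key comes from the KMS
        self.ssd_key_cache[key_tag] = key    # injected for subsequent IO usage

kms = KeyManagementService()
app = KeyPerIOHostApp(kms, ssd_key_cache={})
app.inject("tenant-green-mek", key_tag=1)
```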
Starting point is 00:07:37 We'll get into the details of the various architectural elements that are being added to the NVMe command set to support key selection per namespace. From a TCG perspective, the protocol layering that we envision today is: for key management, for version 1.0, we'll start with KMIP.
Starting point is 00:08:04 KMIP is a fairly common key management protocol that allows the exchange of keys between some host application and some key manager. But then, from a host-to-storage perspective, we're going to have a couple of different protocols. One is what most of you are already familiar with: using Security Send and Receive to do some basic activation of the Key Per IO SP, like we have for Opal today. So you have that.
Starting point is 00:08:38 That's not going to change in terms of setting up which authorities are allowed to perform different configuration options. The new thing we're adding is the interface to the KMIP protocol. So we're looking to have the KMIP request message protocol as a payload for a TCG ComPacket. This at least keeps the existing software stack that manages Opal the same, while allowing quick integration with KMIP
Starting point is 00:09:13 without introducing some of the older TCG problems like sessions. As you can see, it's a stateless protocol, basically strictly designed to inject keys into the storage device. You then have, on the NVMe side, extensions to the NVMe IO commands to support selecting keys. And then TCG will still use security protocol two to do things like a TCG-specific reset, but also to do things like clearing of keys in particular key slots. So that's what the protocol layering will look like once the standard comes out.
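As a rough illustration of that layering, the sketch below frames a KMIP request as the payload of a stand-in TCG ComPacket and builds the CDW10 for an NVMe Security Send (Admin opcode 81h per the NVMe base specification). The ComPacket framing and the security protocol and ComID values here are placeholders, not the values assigned by the published specs.

```python
# KMIP message riding inside a TCG ComPacket, carried by Security Send.
import struct

NVME_SECURITY_SEND = 0x81          # NVMe Admin opcode for Security Send

def security_send_cdw10(secp: int, spsp: int) -> int:
    # CDW10 carries the security protocol (SECP, bits 31:24) and the
    # protocol-specific field (SPSP, bits 23:08 -- the TCG ComID).
    return (secp << 24) | (spsp << 8)

def compacket_with_kmip(com_id: int, kmip_request: bytes) -> bytes:
    # Simplified stand-in for TCG ComPacket framing around the KMIP bytes.
    header = struct.pack(">HHI", com_id, 0, len(kmip_request))
    return header + kmip_request

kmip_ttlv = b"\x42\x00\x78" + b"\x00" * 5   # stub of a TTLV-encoded request
payload = compacket_with_kmip(com_id=0x0FFE, kmip_request=kmip_ttlv)
cdw10 = security_send_cdw10(secp=0x03, spsp=0x0FFE)  # placeholder values
```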
Starting point is 00:09:55 From a key injection perspective, we actually worked through an interesting set of problems here. I want to walk you through this a little bit. So one thing, when we started, one of the questions we had was: when you inject a key,
Starting point is 00:10:24 when you try to protect the keys, do you protect them from the key origination point, from the entity that owns the key, all the way to the storage device? Or do you trust the Key Per IO host management application and protect the keys from that application to the storage device? As you can see, one of the decisions we made is basically that the interactions between the key manager, the entity that holds the keys, and the host application that's managing various storage devices will be out of scope for this standard.
Starting point is 00:11:05 That's partly because there may be some use cases where that application wants to know, say, that the key with this key ID should go to a particular namespace, so it may want to get some metadata on the keys. Enforcing stricter traffic encryption all the way from the key manager to the namespace may therefore create some complications for those use cases. So for
Starting point is 00:11:34 version 1, the first thing we did was to focus on the interaction between the Key Per IO host management application and the actual SSD element. Since there are many technologies that deal with transport security, we built a couple of options. One was basically just to rely on those technologies, like SPDM secure sessions or PCIe IDE, since they give you full end-to-end link encryption
Starting point is 00:12:06 between your host and SSD traffic. Then Key Per IO traffic doesn't need to add any more complexity; you can just leverage those existing transports and send down the keys. So from the protocol perspective, at least for the first provisioning, the key encryption keys you're sending
Starting point is 00:12:26 are in plain text, but in practice they're not, since they're leveraging the protection provided by things like SPDM or PCIe IDE. For subsequent key updates, though, the standard provides ways to perform authenticated key updates using NIST AES-GCM, since there you
Starting point is 00:12:49 can add some integrity protection on the ciphertext in addition to confidentiality. So the first set of keys you inject is basically keys that you pre-share that will help authenticate the next set of keys. So if you have to have a specific bit strength
Starting point is 00:13:08 Do you have to have a specific bit strength on the wrapping keys? Don't they have to be higher than the strength of what they're wrapping? Oh, yeah. Thank you. So the question being asked is about the security strength of the wrapping keys:
Starting point is 00:13:24 does it have to be higher than that of the keys they're wrapping? Very good question. I believe we're considering 256 bits across the board for the wrapping keys and also the media encryption keys. So that should meet the security strength requirement for the wrapping keys and for the data encrypted by the keys wrapped by the wrapping keys. So, as I mentioned, you set up the first set of keys so they can be used to authenticate the next set of keys. The other option we have, and all of these are host configurable: we looked at use cases where you may not have these link protection technologies available.
Starting point is 00:14:27 You know, I'm not sure how many vendors have link encryption shipping today, or support for SPDM secure sessions. So for those use cases, we provide an option to use a public-key-based key transport, at least for provisioning of this first set of keys. But then for subsequent key updates, you can use symmetric AES-GCM. In this option, the assumption is that the storage device can be pre-provisioned with some sort of public key certificate. You can then use the certificate to set up the key transport algorithm. The way it will work, basically, is that
Starting point is 00:15:12 the host application pulls the certificate from the device, registers the certificate with the key manager, and then it can tell the key manager to use that certificate's public key to encrypt whatever key encryption key it wants to send down, and just relay that to the storage device. That's the setup of the first keys.
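Here is a sketch of that certificate-based transport, under the assumption of an RSA device certificate, with RSA-OAEP chosen purely for illustration; the algorithms and message formats actually allowed are defined by the TCG spec.

```python
# Certificate-based key transport: encrypt the KEK to the device's
# public key so only the device can unwrap it.
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

def transport_kek(device_cert_pem: bytes, kek: bytes) -> bytes:
    # The host pulls the certificate from the device and registers it
    # with the key manager; the key manager then encrypts the KEK to the
    # device's public key, and the host relays the blob to the device.
    cert = x509.load_pem_x509_certificate(device_cert_pem)
    return cert.public_key().encrypt(
        kek,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None,
        ),
    )
```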
Starting point is 00:15:59 One piece of future work we're considering, as I briefly touched on: our current options target the protection of keys from the host application to the storage device. But there are use cases where you may want to establish end-to-end protection of a key. For example, if the key is owned by a user who sits on top of that application, you want to protect the keys from that user all the way to the namespace. In addition to that, some of the feedback we've gotten is that not everyone wants those key encryption keys
Starting point is 00:16:21 to persist on a storage device. Some use cases will basically want the device to not have access to any of the keys when it loses power. So that's the concept of ephemeral KEKs. We've been looking to figure out exactly how it will work. The idea is that you still want to be able to establish the protection of the keys, while still allowing the intermediate layer to dictate which namespace will consume which keys. So this is not going to make it into
Starting point is 00:16:56 version 1.0, but into subsequent versions. So once you've established your first keys, the key encryption keys, we can view them as the authentication keys. You then go on to establishing the media encryption keys. These are the keys that are going to be used for user data protection. For these keys, the idea is that they can only be provisioned
Starting point is 00:17:22 by the entity that provisioned the authentication keys. So these will all be sent down encrypted using the previously provisioned key encryption keys, or authentication keys. Obviously, it's very important that you have integrity guarantees on the ciphertext for the MEKs. Since they are used for encryption of user data, you want to make sure that the keys you're using haven't been tampered with; at least, if they have been tampered with, we should be able to detect that before those keys are accepted. One of the main properties of Key Per IO is that the media encryption keys never persist on a storage device once it loses power. So on every boot the host will basically have to re-authenticate
Starting point is 00:18:07 by supplying the MEKs again, protected by the previously provisioned authentication keys. The other nice property that we baked into version 1 is the ability to support replay protection. As you can imagine, if an analyzer is sitting on the traffic and records the previously inserted keys, it could just replay them and get access to the data. So the replay protection is there to make sure that
Starting point is 00:18:36 every time a key is inserted, we have a quick challenge test to be able to tell that the keys being injected are fresh, not old copies that have been previously injected. So this will come as well in version 1.0. So I think, at a high level, this is what you can expect to see for the key provisioning schemes, at least in version 1.0.
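A toy model of that challenge idea: the device hands out a fresh nonce, and the host must bind it into the authenticated key injection, so a recorded injection cannot simply be replayed. Binding the challenge through GCM associated data is an illustrative choice, not the spec's exact scheme.

```python
# Replay protection via a single-use, device-issued challenge.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class Device:
    def __init__(self, kek: bytes):
        self.kek = kek
        self.challenge = None

    def get_challenge(self) -> bytes:
        self.challenge = os.urandom(16)     # fresh per injection attempt
        return self.challenge

    def inject(self, nonce: bytes, wrapped: bytes) -> bytes:
        # Raises InvalidTag if the host bound a stale or wrong challenge.
        key = AESGCM(self.kek).decrypt(nonce, wrapped, self.challenge)
        self.challenge = None               # single use: no replays
        return key

kek = os.urandom(32)
device = Device(kek)
mek = os.urandom(32)

challenge = device.get_challenge()
nonce = os.urandom(12)
wrapped = AESGCM(kek).encrypt(nonce, mek, challenge)
assert device.inject(nonce, wrapped) == mek   # fresh challenge: accepted
```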
Starting point is 00:19:07 Any questions before we jump into the interface interactions? In your presentation, you mentioned IDE. I'm assuming this will work over PCIe; I was wondering what that means for fabrics. I think it's a very good question,
Starting point is 00:19:48 and that's what inspired the second option. We need an option that doesn't rely on either SPDM secure sessions or PCIe IDE, something that's native to the protocol that can provide the link protection. And that's why we have that option today. Okay, thank you. So just to elaborate a little bit on Mike's question there about fabrics, we do have authentication and TLS types of protocols. We have FC-SP. So there are ways to secure the transports for some of those fabric environments.
Starting point is 00:21:00 So basically the use case for Key Per IO is to be able to do much more fine-grained control. The methods we have with self-encrypting drives today are either for the entire drive or for predefined LBA ranges. That makes it hard for the application to make sure that its data goes in a particular range of LBAs so that it gets encrypted with the appropriate key. So the idea here is that each IO operation, each read command, each write command, gets to select its own key that is unique for that data. So here's what that might look like. We have our green tenant, our yellow tenant, our blue tenant, and our purple tenant. And each of their data is being encrypted with their own key before it's stored out on the media, so that they don't have to worry about where it goes, as they would using the existing SED technologies that TCG has provided. So they can then all be mixed out on that volume,
Starting point is 00:22:07 and you can more easily control the coming and going of that data. You can erase the data and get rid of it, because just by getting rid of the key, it makes it virtually impossible to be able to get that data back in its original form. So a couple places where this might be implemented. If we have some specific machines, the tenants on the left, our green, yellow, and red tenant,
Starting point is 00:22:38 and they send some operations to an array controller across the fabric, the array controller could then have its own KMIP database that keeps track of each of those connections to ensure that each tenant has its own data encrypted and stored securely, so that if somebody wants to securely obliterate one of those tenants, they don't have to go find all of the data
Starting point is 00:23:00 that is associated with that tenant and scrub it somehow. They can just delete the key out of the KMIP database. They can delete the key out of each of those SSDs. And all of a sudden, all of the data for that tenant is gone, without disturbing any of the other data for any of the other tenants. But right now, to be able to do that, you have to do it with the whole drive, or the array has to have previously divided up those LBAs into some fixed LBA ranges.
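A sketch of that per-tenant crypto-erase; the Ssd class and method names are illustrative stand-ins, since in practice the per-slot clear is a TCG operation sent to each drive and the delete goes to the KMIP server.

```python
# Crypto-erase one tenant: delete the key everywhere it lives.
class Ssd:
    def __init__(self):
        self.loaded = {}        # key tag -> key_id currently in the key cache

def crypto_erase_tenant(tenant_key_id: str, kmip_db: dict, devices: list):
    kmip_db.pop(tenant_key_id, None)            # forget the key at the source
    for dev in devices:
        for tag, kid in list(dev.loaded.items()):
            if kid == tenant_key_id:
                del dev.loaded[tag]             # clear that key slot
    # The tenant's ciphertext still sits on the media, but with the key
    # gone it is computationally infeasible to recover the plaintext.

ssd = Ssd()
ssd.loaded[1] = "tenant-green-mek"
crypto_erase_tenant("tenant-green-mek", {"tenant-green-mek": b"\x00"}, [ssd])
assert not ssd.loaded
```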
Starting point is 00:23:46 You can do the same thing in virtual machines, where each virtual machine gets its own key associated with it. It goes through the hypervisor, and the data keeps its association with that key, and now it's stored out on whatever device it's being written to. And so, again, to get rid of any one of those individual virtual machines, you just have to get rid of the key that was being used by that machine. So there are several other ways that this can be put to use; these just happen to be two of the common use cases. So to operate with this, the system is going to have to do some discovery, just like it does with any feature of a device.
Starting point is 00:24:21 It's going to have to figure out how many key tags are available. The key tag is the thing that's used with the IO. You say, I want to use key number one, key number 10, key number 100. That's the key tag. So each IO doesn't have to pack an extra 512 bytes' worth of key as part of the IO; it's the key tag that references the key. The key gets stored separately in the device, and the key associated with that key tag is what gets used. So the host has to know how many of those a device can support. Sometimes different encryption algorithms have different requirements on the amount of data that they're encrypting.
Starting point is 00:25:04 And so there may be some granularity and alignment requirements associated with different algorithms, and the host is going to need to know that. So some of this is obtained through the NVMe Identify commands, and some of this is obtained through the TCG discovery commands: the Security Send to request the information, and the Security Receive to get the result back, to be able to determine what capabilities the device supports.
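A sketch of what the host might record from that discovery step; the two-field layout is invented for illustration, since the real fields and encodings come from the NVMe Identify data structures and the TCG discovery responses.

```python
# Host-side view of discovered Key Per IO capabilities.
import struct
from dataclasses import dataclass

@dataclass
class KeyPerIOCapabilities:
    max_key_tags: int           # how many keys can be resident at once
    granularity_blocks: int     # per-algorithm alignment requirement

def parse_capabilities(raw: bytes) -> KeyPerIOCapabilities:
    max_tags, granularity = struct.unpack_from("<HH", raw, 0)
    return KeyPerIOCapabilities(max_tags, granularity)

caps = parse_capabilities(struct.pack("<HH", 1024, 8))
assert caps.max_key_tags == 1024
```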
Starting point is 00:25:45 So the first step is establishing the key encryption keys. Festus talked about that. That's a negotiation that happens between the host and the device to get the key encryption keys there, and there are a couple of ways that can happen, which he talked about. Then the media encryption keys have to be inserted, and the key encryption keys are the things that are used to protect that. So we have a couple of different protocol-specific ways that the transports protect that information: through some of the PCIe mechanisms, through some of the fabric encryption mechanisms, or having a TLS channel established between the host and the storage device. So the media encryption keys get injected
Starting point is 00:26:26 using the TCG security send command so that the device can learn what the keys are. So here we have a very limited device. This device has, looks like, seven key tag slots in its key cache. Now, hopefully, nobody's going to build a device this small. We expect the smallest ones will probably be at least in the hundreds, maybe thousands of keys that will be supported by the device. But this key tag value is a 16-bit value, so it's possible that there can be 65,000 keys within the device at any given point in time.
Starting point is 00:27:04 And since that's all backed by a KMIP database, there can be millions of keys, hundreds of millions of keys, however large your KMIP database can grow to support all of those keys, and they can be staged through the cache on an as-needed basis. So in this case, we've got these seven key tags. There's their key tag number, and there's a 256-bit key that's been inserted into the key cache that's associated with that key tag. Now, you can see here, these media encryption keys aren't very creative. They're sort of sequential, just so it's kind of easy to see that they're different and that they're there. So we've got seven different key tags, seven different values that the host can reference as it's sending its IOs. So we've injected these keys
Starting point is 00:28:00 and we've started using some of them, and the host determines that it hasn't quite accurately predicted, or maybe the load has changed, and it needs to change what's in the cache. Just like any cache, the host is now managing this on a least-recently-used mechanism, and it's going to kick out the ones that it doesn't need. And in this case, you see it's kept the first two there, the EF and the E0, but it's had to change some of the ones after that. And it's injected some new keys into those key tags. So we've got a KMIP database
Starting point is 00:28:34 that has all of these extra keys in it that can't fit in the device at any one point in time. So we have to stage through them as we need them to do our different IOs.
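Here is a minimal host-side sketch of that least-recently-used slot management, with the device interaction stubbed out as a comment; in practice each load or evict is a key injection or key clear carried over Security Send, and the host must wait for it to complete.

```python
# Host-managed key cache with LRU eviction of key slots.
from collections import OrderedDict

class HostKeyCacheManager:
    def __init__(self, num_slots: int, kmip_db: dict):
        self.kmip_db = kmip_db              # key_id -> key bytes (KMIP-backed)
        self.slots = OrderedDict()          # key_id -> slot, in LRU order
        self.free = list(range(num_slots))

    def key_tag_for(self, key_id: str) -> int:
        if key_id in self.slots:
            self.slots.move_to_end(key_id)  # refresh its LRU position
            return self.slots[key_id]
        if self.free:
            slot = self.free.pop()
        else:
            _, slot = self.slots.popitem(last=False)  # evict the LRU key
        # ... here the host would inject self.kmip_db[key_id] into `slot`
        # on the device and wait for that command to complete ...
        self.slots[key_id] = slot
        return slot

mgr = HostKeyCacheManager(num_slots=7, kmip_db={"green": b"\xef" * 32})
tag = mgr.key_tag_for("green")   # loads the key; later IOs reuse the tag
```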
Starting point is 00:29:07 So in particular, the commands in NVMe that need to be aware of this are any command that's going to do IO. The Compare command has to be able to make sure that the data can get decrypted to be compared. The Copy command is going to have to decrypt the data, copy the data, and then re-encrypt it again. Then Verify, Read, Write, Write Zeroes. Why do you need an encryption key on Write Zeroes? Well, if you didn't, then everyone would be able to tell which data was zeroes, and it wouldn't be protected. So all of the written zeroes are going to look like something different depending on the encryption key. And the Zone Append command for ZNS devices,
Starting point is 00:29:33 because that's an additional way to get data written. So here we have our key tag database, which has the keys in it, and we're going to be using some of those key tags to write some data to our device. So here's an example sequence where we've issued a write command. We're writing out to LBA 100. We're going to write eight blocks, and we're going to use key tag number one. So what key tag number one means is that we have a media encryption key in the cache that ends in that little EF value. Then we go on to write to LBA 200, and we're going to write 16 blocks this time using key tag 100.
Starting point is 00:30:14 So that's the key that ends in the value E6. Then we go to read the data from LBA 100, and the host now has to know where that key is. It notes that the data was written with key tag number one, but it's possible that that key could now be in a different slot. It could have gotten unloaded by the time we wanted to do the read, and it might have had to get loaded back into a different slot because of the least-recently-used algorithm. In this case, that key happens to still be in slot number one, so key tag number one is used to do the read. That key is then used to do the decryption, and you get your data back. So in the next example, the host makes a mistake. I mean, hosts aren't perfect. They have bugs. Sometimes applications have bugs, or sometimes human error comes in.
Starting point is 00:31:06 In this case, they try to read LBA 200. They know they want to read 16 blocks, but they specify key tag number one. So what does the device do? It looks into slot number one, it finds an encryption key, and it does a decryption. And depending on the implementation and the device, there are a couple of things that can happen. If you decrypt data with the wrong key, you get back garbage. That's just the way it is: you give it the wrong key, that's what you get. So if the device in fact stored its ECC check values
Starting point is 00:31:40 prior to doing the encryption, then when you do the decryption, you're going to find that that ECC check value doesn't match what it should, and you're going to think you got a media error, because the device is going to see the mismatch on the ECC. So there are a number of bad things that can happen to a host that specifies the wrong media encryption key when it tries to do the read of the data. But the point is the data is protected, and if you don't specify the right key, you don't get the right data. So then the host notices its mistake. It reissues the read, this time with key tag 100. That happens to match what the data was written with, and so now the host gets their data back.
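To see why the wrong key tag just yields garbage, here is a toy drive model using AES-256-XTS as an illustrative data-at-rest cipher (the standard's algorithm list is broader):

```python
# Toy drive: each IO selects its key by tag; wrong tag, wrong plaintext.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 512

def xts(key: bytes, lba: int) -> Cipher:
    tweak = lba.to_bytes(16, "little")      # sector number as the XTS tweak
    return Cipher(algorithms.AES(key), modes.XTS(tweak))

class ToyDrive:
    def __init__(self):
        self.key_cache = {}                 # key tag -> 512-bit XTS key
        self.media = {}                     # lba -> ciphertext block

    def write(self, lba: int, data: bytes, key_tag: int) -> None:
        enc = xts(self.key_cache[key_tag], lba).encryptor()
        self.media[lba] = enc.update(data) + enc.finalize()

    def read(self, lba: int, key_tag: int) -> bytes:
        dec = xts(self.key_cache[key_tag], lba).decryptor()
        return dec.update(self.media[lba]) + dec.finalize()

drive = ToyDrive()
drive.key_cache[1] = os.urandom(64)         # AES-256-XTS takes a 512-bit key
drive.key_cache[100] = os.urandom(64)

data = os.urandom(BLOCK)
drive.write(lba=200, data=data, key_tag=100)
assert drive.read(lba=200, key_tag=100) == data   # right key: data back
assert drive.read(lba=200, key_tag=1) != data     # wrong key: garbage
```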
Starting point is 00:32:26 We had a question, Randy. Who manages which media keys are currently loaded into the key cache? So the management of the media encryption keys in that key cache, and their association with the key tags, is managed by the host. The host indicates which media keys get loaded into which slots. The host can have those keys removed from a slot. It can
Starting point is 00:32:53 replace the key that's in that slot. So the host is in total control of managing that. Are the slots associated with the host or with the namespace? The slots are associated with the namespace. It is per namespace. Each namespace has its own key cache that can be loaded by the host. The host is in control of which keys go into which slots for each namespace. So what if you have multiple hosts going into one namespace? If you have multiple hosts that are both going to access the same namespace, then the assumption is that those hosts are going to know about each other,
Starting point is 00:33:38 they're going to be coordinating their LBA accesses, and therefore they're also going to be coordinating the key management. Sort of like a cluster. When a cluster accesses the same namespace, the individual nodes have to have some level of coordination between them. So the same thing happens with this key slot management. Yes, if you want to restrict access to one at a time, then those reservations would apply here as well. You can say, you know, I want to... Reservation covers the whole namespace. Right.
Starting point is 00:34:22 So you can use reservations to lock out other hosts from being able to do things to that namespace. Right. So you can use reservations to lock out other hosts from being able to do things to that namespace. But that's not much of a shared environment when you do that. So we had another question. Are the commands
Starting point is 00:34:35 that manage the cache made in the NVMe layer or in the TCG layer? Are those commands at the TCG layer or at the NVMe layer? That's sort of both, because they are an NVMe Security Send command and an NVMe Security Receive command, which then contain TCG content within the data buffers that flow back and forth.
Starting point is 00:35:02 So it's sort of both. Yeah, yeah. It works the same way that all the other TCG commands work. Yes, question there. Yes, the key cache is limited. There are only so many slots. That's what the discovery process is for.
Starting point is 00:35:27 If the hosts want to, they could agree to divide up the key cache: you get slots 1 to 100, and I get slots 101 to 200. So there are negotiations that have to happen in any shared environment. Yes. Yeah. I'm sorry, this is the first time I've heard of this concept, so I have a very, very basic idea. [The rest of the question, contrasting this with encrypting the data on the host, was inaudible.] So the question is about the value of the feature.
Starting point is 00:36:18 So yes, what you described works perfectly fine. You can have a host software implementation that encrypts the data, burning host CPU cycles to do that encryption work, and sends the data out in encrypted form to the media, and it can keep track of all of that and manage all of that so that when it reads the data back in again,
Starting point is 00:36:35 it then additionally burns more host CPU cycles to do the decryption of that data. But if we can take all of that encryption and decryption work and shuffle it out to the device, then we've saved all of that. Yes, it could be thought of as a form of computational storage that's using unique TCG-style APIs.
Starting point is 00:37:00 Yeah, I mean, there are many other use cases for it. One of the ones that was mentioned at the beginning was the European GDPR requirements. I mean, you can imagine this going to a worst-case environment (well, actually, for a storage vendor, it might be a really good solution) where every person in the world gets their own key. Yeah, that starts to make my mind explode with scaling issues,
Starting point is 00:37:30 especially if each person wants to have their own unique key for each of their own types of data. I want to have a key so that, when I want to be forgotten on Facebook, I can say, just forget my Facebook key. That's a lot of keys to manage. Two people with one picture.
Starting point is 00:37:57 Yes, question? How does this compare to today's methods, and to what you can do with Opal? So with today's methods, you can either self-encrypt the whole drive, or you can self-encrypt predefined fixed LBA ranges. And that number of predefined fixed LBA ranges is not terribly large, but now the host is stuck managing those pre-allocated spaces, and it's much less dynamic. If the first one that you set up only stores a little bit of data, you've already pre-allocated it. If the second range starts filling up with data and you realize, oh my goodness, I don't have enough space, what do you do? There's not a whole lot you can do.
Starting point is 00:38:48 It's all preallocated. You're going to have to take a second range, which means you're now going to be using a separate key. You're going to have two keys for the same application, just because it stored too much data and overflowed its first range. That's the difference between the static nature of predefined ranges and each IO getting to define its own key. It's fully dynamic. Is this one-sided? Does the initiator encrypt and the target just store the encrypted data?
Starting point is 00:39:23 So do both ends have to implement it? So the question is, do both ends have to implement it? And the answer is no. The data is going over the wire just like it does with a self-encrypting device today. It goes in the clear. And then the device is doing the encryption. This is an extension of the SED, the self-encrypting drive, so that the data gets encrypted in the device when it receives it, and the data gets decrypted before it is sent back to the host. So what the host is doing is managing which key is being used for any given IO.
Starting point is 00:40:02 And the device is then doing all the work to encrypt and store the data. This is still about protecting data at rest. Yeah, if your drive falls off the back of the truck, and somebody walks away with that drive, what do they get? Well, in the case of an SED today with Opal, they get the drive and they get the key. The device is the only one that knows that key in today's self-encrypting drives. In this case, they get the drive with the data, but they don't get any of the keys. The keys are all back in the host, because when the power is lost, the keys are lost too. Question? Yes.
Starting point is 00:41:02 So the question is about managing the depth of the NVMe queues: 65,000 queues with 65,000 IOs on each queue. How do you coordinate that with the management-plane access for managing that cache? Right. Making sure that that happens correctly is the responsibility of the host. If you want to use a key, you have to send the admin command, the Security Send command, to get the key out there. And then you have to wait for that command to complete before you can use the key, because that's the only way you know that the key is there to be used. There have been some proofs of concept done with this,
Starting point is 00:41:46 and for the size and scale that was done in those proofs of concept, there were no issues. That obviously didn't scale all the way up to the 64K-by-64K configurations, but at the scale that it was tested, it worked well. Today's self-encrypting drives run these encryption algorithms at line speed. They can still go full bore in both directions, reading and writing. And so that's the expectation. Yeah. Well, the flash self-encrypting devices (they changed it from
Starting point is 00:42:22 self-encrypting disk to self-encrypting device), the CPUs that are doing all of that stuff are still doing it at line speed. Correct, the key management was much less complex in the self-encrypting drives of today versus what we have here with Key Per IO. Do you have a comment, Festus? So, yeah, I don't know if I can repeat all that, but some of the performance is dependent on those access patterns. The other thing is getting a device that matches the application: if you
Starting point is 00:43:01 buy a rotating device and expect to get SSD performance out of it, you bought the wrong device. If you buy a device that has a key cache of 100 elements and you expect to run applications that need 1,000 elements in their cache, then you bought the wrong device. You should have bought one that had room for 1,000, if that's what you really expect to run. So there will be some variability in the market in terms of the number of slots that are in them. And hopefully there won't be too much variability. I'm sure manufacturers don't want to have all the different SKUs to manage and stuff like that. So there will likely be a couple of points in the market where they'll have smaller or larger numbers of slots.
Starting point is 00:43:50 So we've talked about some of these things already: the Security Send and Receive commands, the new protocol ID, and we still have the TCG authentication and discovery process. And we have new commands for injecting keys, clearing keys, replacing keys in the cache, and purging the cache. You can pull the plug, but you can also send a command to just tell the device to get rid of all the keys.
Starting point is 00:44:20 And we've got a number of different encryption algorithms that have been included in the standard that can be used; you can specify the algorithm when you load the key. So the host is responsible for the key management: everything to do with the key, the creation of the key, the loading of it into the device. And the host will keep a copy of it around in its KMIP database. So just because you pull the plug on a device
Starting point is 00:44:42 doesn't mean the key went away. The key went away in the device, but the key still exists in the database. And if you really want to get rid of it, you have to get rid of it in both places. That's the host's responsibility. The host has to make sure the right keys are there at the right time, that all that gets coordinated, and the host, of course, has to deal with any mistakes it makes. So right now, there's a little bit of work left.
Starting point is 00:45:15 We're very close. Most of these documents are in their final review at this point, and people are making some comments. We may have one more spin of them, but if you're involved in the committees, take a look at them, because at this point they're just a few weeks to months away from public release. So when you load a key into the drive, it's loaded into a slot? Well, that's up to the host. The host could overwrite a key in a previous slot that hadn't been used in a long time.
Starting point is 00:46:01 I'm not sure what you mean. Will the drive reject a duplicate key tag? I don't know what a duplicate key tag is. Mm-hmm. Yep. The slot 10 in the cache. Mm-hmm. Yep.
Starting point is 00:46:23 And just changes the value of the key in that slot. So there is a slot number 10? Yes. The slots are identified, and you load the contents of that slot. As far as denial of service attacks, you have to have an application that's gone through the authentication process
Starting point is 00:46:42 and knows how to access the key encryption keys to be able to get this stuff out over the wire. There are an awful lot of other security layers that are going to make that pretty hard. So we're working on the NVMe protocol in the NVM Express group and the TCG protocol in the TCG group. Yes?
Starting point is 00:47:16 It's associated with a namespace. And so if you have a PCI card with one namespace, it has one cache. If you have an array that is presenting 10 namespaces, it's going to have 10 caches, one per namespace. So the last thing is that we are using the TCG architecture, with the Security Protocol In and Out commands in SCSI, Security Send and Receive in NVMe, and the equivalents on SATA devices. So this would all be possible to port to those protocols as well. Right now, there's not a lot of interest. There have been a couple of people in the SAS area that have started to ask the question about whether it would be appropriate. But so far, it's NVMe only.
Starting point is 00:47:52 So we'll have to see how that develops over time. Yes, Randy. Can you speak to the thinking on why the key cache is associated with the namespace as opposed to with the controller? The expectation, well, there were a couple of expectations
Starting point is 00:48:23 as we went through the use cases. Yeah, repeat the question. Why it was per namespace, why the key cache is per namespace instead of per controller. The assumption was that applications would be more per namespace than per controller. And there were arguments on both sides. And we just ended up deciding, yeah, it could have been a toss-up. Maybe, you know, I don't think it was that we just sort of let the loudest person win. But, yeah.
Starting point is 00:49:00 And we are officially out of time. So hopefully everybody's gotten their questions answered. And here's sort of the statements from TCG and NVMe about who they are. And we're out of time. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the Storage Developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
