Storage Developer Conference - #111: SMB3 Landscape and Directions

Episode Date: October 14, 2019

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast, Episode 111. I'm Matthew George, and this is Wen Xin, both of us from Microsoft.
Starting point is 00:00:46 He's been taking over a lot of the core SMB and SMB2 server development for the last couple of years now. One year?
Starting point is 00:01:02 So he's been experimenting quite a bit with a lot of the transport layer stuff and the lower layers of the SMB stack, and he'll be covering some of the performance work and some of the stuff which he has done in the second half of the talk. So I'm going to talk, give you a high-level summary of what's been going on and where things are moving and that kind of stuff. So is this an upgrade? Yeah. Alright. So where is SMB3 today? I know over the last 10 years or so, I think
Starting point is 00:01:43 me and David Kruse, for a lot of people who have been here for the last several years, we have been working on SMB3, trying to improve it, get it performing well in the data center, and so on. So where do we see the maximum uptake of SMB3 today? So the first and the most important one is the software-defined data center, or private cloud, or whatever way you want to define it. And we have seen a lot of the features that we have done in the last several years, multichannel and RDMA and better encryption and signing. Pretty much everything ties into data center workloads. And we have seen a lot of
Starting point is 00:02:24 adoption of SMB there, primarily in the scale-out file server for applications, so to run your applications like databases, hypervisor, VMs, that kind of stuff. And then, more recently, inside Microsoft, we have been using SMB as a transport for several other application workloads, and one of them is Storage Spaces Direct, which basically is a system designed to cluster together disks on individual machines in a cluster and then export them over the network so that other machines in the cluster can see these disks in a shared pool.
Starting point is 00:03:08 And S2D, or Storage Spaces Direct, uses SMB3 as the transport, and they rely on RDMA and multichannel and a lot of the... may not be so much of the failover capabilities, but the network resilience portion of SMB3 as well as the high performance we offer and the security we offer,
Starting point is 00:03:26 they use it extensively. And the second scenario which we have seen is to use SMB3 as a file access protocol from guest to host, especially in the container scenarios where you have hypervisor VM containers running on a Windows server, and you want to access files stored on the host machine. And they use SMB as a transport to access those files. And it's not a regular TCP-based transport. They use the VM bus transport, as they call it. It's the transport which VMs use to communicate to each other,
Starting point is 00:04:12 and they run SMB over VM bus. And in fact, we have even done what we call pseudo-RDMA over this VM bus transport. RDMA is just a way of registering memory across so that it's accessible across the network. So you could do the same thing in a virtual way over VM to host, too. And then the other interesting scenario is Azure Files, or cloud-scale storage, where we are seeing SMB3 being used to access files over the internet. And along these three directions, we have been prototyping a bunch of things. So none of these are actually official or published, and these days the Windows schedules make it kind of unpredictable to plan these things.
Starting point is 00:04:54 So we take it as we go, and we prototype it, and then things look good, we push it into a release. That's the kind of new release model. The first thing I want to kind of talk about is SMB over QUIC, and for those who are new to QUIC, QUIC is basically invented by Google as a transport protocol for web applications, and we think we can use the QUIC transport as a way of transporting SMB over the Internet, because it gives you a couple of nice properties.
Starting point is 00:05:27 And then Wen is pretty much going to talk about some of the changes to the transform logic in SMB to incorporate new signing algorithms, and we have prototyped compression and so on. And Wen is going to talk about it. Yes, question. Just quickly, I mean, one of the main features is multiple channels inside one connection. How are you going to use that?
Starting point is 00:05:50 The question is, QUIC has support for multiple channels. How are you going to use them? I'll talk about it in a while. All right, and these are a few minor updates to the protocol from the last year. So the first is there's a new normalized name query, which was added to the protocol document. I don't know if folks have noticed it, but this allows a client to query the full path name for a file in a normalized fashion, so that it basically, if you have short names in the path, it will
Starting point is 00:06:25 expand them to long names and so on. And this is something which is used by all these filters running on Windows, like antivirus folks and stuff, to match path names. So you get a unique path name for a file, essentially. And if you do not support this, the thing gets really chatty, and the system actually opens each folder, does a dir, and then builds up the path name, and it can get quite chatty. So that's why we went ahead and did this. And the second one is improvements to directory caching. This is not a wire-visible behavior. This is something that I did on the SMB client,
Starting point is 00:07:03 because the directory cache used to be able to support only small directories up to 64 kilobytes in size. So it's not more than a few, maybe 100 entries or so. Now we can support larger directories and we can enumerate directories in larger chunks to like one meg buffer. So things work much better with the directory cache now. And then the last thing, again, is not protocol visible. When we run workloads like Storage Spaces Direct, which is using SMB as a transport protocol for block storage, a lot of the file system complexity is kind of removed from the picture
Starting point is 00:07:40 because it's just reading and writing blocks. So we have put in some IO path improvements in the SMB client to shortcut some of these things. There are a whole bunch of complexities in the client systems like locks, synchronization, all that kind of stuff. So you don't need a lot of that stuff if you're just doing raw disk IO over SMB, so there were some good gains to be had there to reduce the path length and improve the latency of IOs. So what is QUIC for? So QUIC is basically a replacement, or I wouldn't call it a replacement for TCP/IP, but it's a replacement for SSL/TLS,
Starting point is 00:08:26 and Google has been working on it since, I think, 2012 or 2014, and they have it in their Chrome browser implemented as a library. And it was designed for transporting HTTP traffic, and a lot of the improvements here are kind of tied to HTTP sessions, which are kind of short-lived and things like that. So they have low-latency connection setup without, you can do, like, zero RTT connection setup, too, where TCP involves, like, you have to send a SYN,
Starting point is 00:08:57 then a response, and then the data transfer starts. So with QUIC, you can actually transfer data right away without waiting for an action from the peer. So zero RTT connection set up, and then it's encrypted by default. It uses the latest TLS 1.3, I believe, and later algorithms for encryption. And it's all certificate-based and stuff. And then there are a few things which it does above and beyond TCP, and one is multiple streams. So you can set up a quick connection and then open up multiple streams
Starting point is 00:09:31 underneath it. And this allows you to do two things. One is to avoid these long queue depths and head-of-line blocking, as you call it. When you queue up a large amount of data, TCP is an in-order transport, so things get stuck behind one another, and one packet loss somewhere in the middle means that everything gets backlogged. So multiple streams actually allow you to recover from losses better, recover from congestion better.
Starting point is 00:10:01 And the second thing is what they call application layer protocol negotiations. So you could use the same quick connection with the same four tuples. So you have the local and remote IP addresses and the port numbers. And then on top of that, you could multiplex the same connection for multiple applications. So you could run HTTPS traffic or SMB traffic all on the same underlying four-tuple. So you don't have to dedicate a port number, which sometimes can help, especially with SMB.
Starting point is 00:10:39 And even on the server side, at least on the Windows implementation, you could have both HTTP and SMB. For example, listen on QUIC 443, the SSL port, and then it could get traffic demultiplexed by the stack. And then there are a few other things like connection migration. If you have a wireless and a wired link, it can automatically switch between the two interfaces, cellular, wireless. And then error recovery,
Starting point is 00:11:11 I believe they have forward error correction. So intermittent packet losses, you don't need to retransmit. They can reconstruct the packet on the fly. And then the last thing is it's UDP-based library implementation, so you don't need big kernel changes or anything of that sort. You can have UDP sockets. This will work. It's just a library built on top of it. And it's with IETF now in draft stage, but still experimental.
Starting point is 00:11:46 And the good part about this is that Google's experiments over the last few years have shown that it's quite NAT friendly. You would not expect UDP-based protocols to be NAT friendly, but they found that it works well in about 93 percent of the cases. So home NAT boxes, NAT routers, allow it through. You can punch holes through NAT routers without any external configuration. So it works well. And right now it's software only. So a lot of the work which is done by TCP these days in hardware on the NICs, offloaded to the NICs, it's all done in software. So it is expensive. And I even heard talks about RDMA, people trying to implement RDMA over QUIC.
Starting point is 00:12:25 I don't think there's anything concrete there, but maybe. And there are interoperability concerns. I know that the TLS, some of the authentication and encryption stuff is still not there yet. At least I know the Windows implementation right now is not interoperable with the Google one, but eventually we hope it'll get there.
Starting point is 00:12:51 And how do we bind? So this is just talking about our prototype. How do we bind QUIC to SMB? How do we run SMB over QUIC? So I talked about the ALPN, the application layer protocol negotiation. So it's basically a string which identifies your application, and we just chose an application layer identifier for SMB. I believe it's called SMBQ or something of that sort. And then, so, you can share the same port with HTTPS traffic. And right now, SMB has an implementation of multichannel outside of QUIC. So we can do multiple TCP connections anyways. So I don't immediately see the value of using multiple QUIC channels under the same connection, because we can, at least at this point, we can do better because we can span interfaces and we can use different types of transports and stuff.
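To make the port-sharing idea concrete, here is a minimal sketch of ALPN-style demultiplexing on a single listener. The "smb" label is purely hypothetical, since the talk only says an SMB identifier was chosen and does not publish the string; "h3" is the registered label for HTTP over QUIC.

```python
# Hedged sketch: demultiplexing one QUIC/TLS listener by ALPN label.
# The "smb" label is hypothetical; the talk does not publish the real string.
def dispatch(negotiated_alpn: str) -> str:
    handlers = {
        "h3": "hand off to the HTTPS (HTTP/3) server",
        "smb": "hand off to the SMB3 server",  # hypothetical SMB identifier
    }
    return handlers.get(negotiated_alpn, "reject: unknown application protocol")

# Both protocols can share the same UDP 443 four-tuple; the stack routes by ALPN.
print(dispatch("smb"))   # -> hand off to the SMB3 server
print(dispatch("h3"))    # -> hand off to the HTTPS (HTTP/3) server
```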
Starting point is 00:13:50 So right now, in our prototype, QUIC is just used as a TCP stack replacement with a single channel. There's nothing else other than the fact that you get encryption. And the biggest scenario, I think, which might come out of this work is that we are seeing SMB deployment in the cloud
Starting point is 00:14:09 and TCP 445 has become, or at least it's been shown to be, a big pain with Comcast and, like, all the internet providers, ISPs; a lot of them block it. So if we can get SMB traffic to look like SSL traffic, that works well. So that's something which I think is a good motivation for why we are doing this. If we can get this working over the internet, a secure TLS stream, running SMB, that will work well.
Starting point is 00:14:47 Now, that's what we have for QUIC. We have a prototype running. Perf numbers didn't look good in the initial cut, so I'm not even going to bother talking about it. It's just an initial implementation, but we have it working between a Windows server and a Windows client. And hopefully, once the Microsoft QUIC library
Starting point is 00:15:06 is interoperable with the Google open source libraries, then we can actually set up something and then see how it works across implementations. The second thing, for which more of the detail is going to be given by Wen, but we talked in 2015, at SDC, about performance of GMAC versus, or AES-GCM versus AES-CCM. CMAC is the current signing algorithm in SMB 3.1.1, and we know that GCM performs significantly better in terms of CPU usage.
Starting point is 00:15:48 And we want to change the signing algorithm in SMB3 to use GMAC, the MAC variant of GCM, AES-GCM. And most CPUs have optimized instructions for GCM. Now, the other design question is, well, SMB signing, when it was done, it was actually embedded into the SMB portion of the protocol. It's actually in the SMB2 header, right? But when we went ahead and did the encryption,
Starting point is 00:16:23 it was done as a layer above SMB2. You have the transform header. So this has been something which we have been debating internally, whether we want to just use the transform header going forward for signing and encryption, because if we make new algorithms, if you want to add features and stuff,
Starting point is 00:16:42 it might be easier from a layering perspective to keep it at a separate layer rather than in the SMB2 header, but there's no evidence either way. There are reasons why you could do it. So one is better layering, and then the signing and encryption, they all use the same algorithms. One, they use AES-GCM to sign versus encrypt. It's pretty much the same kind of operations which happen under the cover, so unification helps. But there is an extra cost of usually buffering the data, because if you do it at a separate
Starting point is 00:17:13 layer, the transform layer will have to receive the entire data, then it verifies the signature and then hands it up over to the upper layer, while with the SMB2 built-in signing, we do signing, at least on the client, we do signing computations piecemeal. So as and when we receive the data, we compute the signature, copy it into the user's buffer, and so on. So it's a little bit, there's a copy, extra copy,
Starting point is 00:17:38 which might be involved in the transform path. So we haven't fully evaluated which way we'll go, but that's something that we're considering to unify encryption and signing going forward. The other big thing is RDMA and signing. So when initially people brought this up, we think, why would anybody spend like thousands of dollars on their RDMA hardware and then turn on signing just to ruin their performance. And it turns out that a lot of people just do this. And often it's beyond, it's their corporate mandated policies. People pull down security policies from their domain controllers or whatever
Starting point is 00:18:19 and they go and force signing everywhere, including their data centers where they have RDMA networks. So whether it's a good thing or not, people end up doing it. And when you turn on signing on RDMA, things which happen are not often understood well by the customer. So what happens when you turn on signing on RDMA? What people observe is that if you have just 10 gig Ethernet versus 10 gig RDMA, TCP over 10 gig Ethernet performs better when signing is enabled over RDMA. And that's kind of surprising to most people because when you use signing over RDMA,
Starting point is 00:19:02 direct placement is, or RDMA reads and writes, or RDMA direct placement is disabled. So essentially you end up going through the RDMA send-receive path. And RDMA send-receive buffers are usually small. I believe it's like four kilobytes or something on each side. And so the RDMA software stack, the SMB Direct layer, has to do fragmentation and reassembly of all that data. And TCP, in most of these cases, it's offloaded on the NICs these days. So it ends up that you lose RDMA writes, and then the additional cost of fragmentation and reassembly just kills your perf, and people end up complaining, basically saying, like, why is performance so bad when RDMA is
Starting point is 00:19:44 enabled? So we thought about it, and we said, oh, okay, people are not going to turn off signing, so what can we do better? We said, okay, if you want to sign, well, we've got to provide signing. Why can't we sign or encrypt the RDMA buffers separately? Or rather, why do we want to disable the direct placement?
Starting point is 00:20:15 Let's say that we allow direct placement, but we sign the direct placement buffer, or we modify the direct placement buffer with encrypted data and then still use signing. So that's something which Wen is going to talk about. And I think there will be some benefits. We haven't fully tested it out yet, but it's still a work in progress. So, well, that's kind of the high-level picture that I want to share. And then now, Wen will...
Starting point is 00:20:49 Hey, everyone. So Matthew gave a great overview of the new transforms that we're adding to SMB, we're planning to add. So why don't we delve more into the documentation and the detail side of the new transform. So let's talk about signing first, and if you're already familiar with how encryption algorithms are negotiated
Starting point is 00:21:22 in the current SMB 3.1.1 protocol. This is probably very familiar to you. So for those of you who are less familiar, this is essentially for the client and server to know which signing algorithms the other side supports. And so before this proposal, for signing, we didn't have any negotiation going on between the client and server. So basically AES-128-GMAC is the first algorithm that we are planning to make negotiable. So what you do is the client basically sends
Starting point is 00:21:57 the algorithm count, for now it's just always one, and the list of algorithm IDs, and the server will select one, just like it will select just one encryption algorithm when you negotiate, and send this negotiation context back to the client. And if you're into the details of the negotiation context, I think the encryption's negotiation context ID is 04, and so we used zero eight.
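As a rough sketch of what that exchange might look like on the wire, here is how a signing-capabilities negotiate context could be packed, reusing the generic MS-SMB2 negotiate-context layout (ContextType, DataLength, Reserved, Data). The 0x08 context type comes from the talk; the AES-128-GMAC algorithm ID below is an illustrative assumption, not a published constant.

```python
import struct

SMB2_SIGNING_CAPABILITIES = 0x0008   # "we used zero eight"
AES_128_GMAC = 0x0002                # assumed ID; the talk does not give the value

def build_signing_context(algorithm_ids):
    # Data portion: SigningAlgorithmCount followed by the algorithm ID list.
    data = struct.pack("<H", len(algorithm_ids))
    data += b"".join(struct.pack("<H", a) for a in algorithm_ids)
    # Generic negotiate-context prefix: ContextType, DataLength, Reserved.
    return struct.pack("<HHI", SMB2_SIGNING_CAPABILITIES, len(data), 0) + data

# The client offers its list (just one entry for now); the server picks one
# and echoes a context with a single algorithm back in its negotiate response.
print(build_signing_context([AES_128_GMAC]).hex())
```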
Starting point is 00:22:28 And yeah, so it would be better, this would be more useful when we adopt more algorithms than just one. So the reason, a little bit about the background, we're trying to switch to GMAC, which Matthew sort of already covered. Biggest reason, of course, is the performance gain. And GMAC inherently also enforces a nonce,
Starting point is 00:22:57 so it basically natively protects against replay attacks, but in the SMB protocol, we actually have the message ID number, which also serves as a nonce, so this is kind of duplicate, but with the algorithm's native support it gives you more flexibility in terms of designs. And as Matthew mentioned, we moved the signature verification logic from the SMB2 layer down to the SMB transform layer, which is maybe not benefiting everybody,
Starting point is 00:23:33 but if you're susceptible to DoS attacks, for example, faster signature rejection probably benefits you. And so, like Matthew also mentioned, since we are moving the logic down, including the signing and verification logic, all down to the SMB transform layer, it is beneficial to use the same transform header as the encryption algorithms.
Starting point is 00:24:01 So that, so I think here it doesn't quite conflict with each other, because our current encryption algorithms are authenticated encryption algorithms, which basically means your signing algorithms and encryption algorithms are mutually redundant. So in the MS-SMB2 doc, we basically have a Flags slash EncryptionAlgorithm field for this transform header. And the only, currently in 3.1.1,
Starting point is 00:24:35 the only defined value is 0x01, which is encrypted. And so now if you just sign your packet, you use 0x02. So I think Greg did... You should say that's proposed. That's not yet implemented. Well, we have a slide on that at the end. Everything can change.
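As a small illustration of that Flags/EncryptionAlgorithm idea (a sketch only; the signed-only value is proposed and could change):

```python
# 0x0001 is the value defined in SMB 3.1.1 today (payload encrypted);
# 0x0002 ("signed only") is the proposed value mentioned in the talk.
TRANSFORM_ENCRYPTED = 0x0001
TRANSFORM_SIGNED_ONLY = 0x0002   # proposed, not yet in MS-SMB2

def describe_transform(flags: int) -> str:
    if flags == TRANSFORM_ENCRYPTED:
        return "payload encrypted (AEAD, so also authenticated)"
    if flags == TRANSFORM_SIGNED_ONLY:
        return "payload signed only (proposed GMAC-under-transform-header path)"
    return "unknown transform value"

print(describe_transform(TRANSFORM_SIGNED_ONLY))
```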
Starting point is 00:24:58 So in 2015, I think Greg, an intern, implemented a prototype for the GMAC, so the performance is really irresistible, I think. So they used a 10 Gbps link, and in case it's too small to see, the blue bar on the left is the CMAC performance, yellow is GCM, and purple is the GMAC variant of GCM.
Starting point is 00:25:30 So basically around 4x the throughput, and around 2x the speedup for cycles per byte. So pretty promising, I would say. And that's, yeah. Sorry, both of them use. Yeah, they're... Oh, no. Are you asking whether they are... I think the question was,
Starting point is 00:26:20 do you use the same Intel AES-NI instruction set? Yeah, I believe they do. Because I think the new GMAC, the GCM instructions came in later, I think like 2015 or something, right?
Starting point is 00:26:37 So I know that they both use whatever instructions are available. So GCM performs significantly better. Yeah. So, and like Matthew mentioned, we have, we're implementing the overhaul for signing encryption over RDMA. And for a quick overview, so basically take SMB RDMA write as an example.
Starting point is 00:27:11 You would, so the client would want to register an RDMA buffer and send a regular SMB request write packet to the server with the RDMA descriptor it gets from the RDMA layer, and the server will do an RDMA pull. Which is, this is a very simple version, without transforms. So with transforms, for performance reasons of course, we want to keep using this RDMA pull mechanism
Starting point is 00:27:49 and on the RDMA read, that's RDMA push. So what we do is essentially add a transform layer between the client, or for write at least, between the client and the buffer registration. So what you do is encrypt or sign the buffer first, and then you actually register the encrypted or signed buffer, and you send a slightly modified
Starting point is 00:28:23 write request to the server to give it more information on the transforms, and then you just register it and let the server pull the buffer as if it's just part of the RDMA protocol. And then on the server side, because you added additional information to the SMB write request, you would be able to decrypt or verify the buffer.
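A minimal sketch of that client-side flow, assuming illustrative helper names (none of these are real SMB-client APIs): transform the payload first, register the transformed buffer so direct placement stays enabled, and carry the signature, nonce, and original size in the write request so the server can verify or decrypt after its pull.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class WriteRequest:
    rdma_descriptor: int      # token the server uses for its RDMA read (pull)
    signature: bytes
    nonce: bytes
    original_size: int

def register_with_rdma_layer(buffer: bytes) -> int:
    # Stand-in for memory registration; returns a fake steering token.
    return hash(buffer) & 0xFFFFFFFF

def fake_sign(plaintext: bytes):
    # Stand-in "sign only" transform: data unchanged, a 16-byte tag attached.
    nonce = b"\x00" * 12
    tag = hashlib.sha256(nonce + plaintext).digest()[:16]
    return plaintext, tag, nonce

def client_rdma_write(plaintext: bytes, transform=fake_sign) -> WriteRequest:
    protected, signature, nonce = transform(plaintext)   # encrypt or sign first
    descriptor = register_with_rdma_layer(protected)     # register the protected bytes
    return WriteRequest(descriptor, signature, nonce, len(plaintext))

req = client_rdma_write(b"raw block data")
print(req.original_size, req.signature.hex())
```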
Starting point is 00:28:51 And how will we do this in the SMB2 write request? So, if you are familiar with the channel types that are in the existing SMBD document, we have a V1, which is pretty well defined, and a slight variant of V1 with remote invalidation of the registered buffer. So with this change, we're introducing a new type, 0x03. So this is what the current request write packet looks like.
Starting point is 00:29:35 So what we do is essentially, with the transform, to support the transform, we need a signature and a nonce, at the least. We include the original message size in case any transforms would change the data size. And then, so, about the channel offset and channel length, the reason they're here is because,
Starting point is 00:29:57 if you recall the SMB2 request write, the structure, they actually have a channel offset and channel length that's describing the offset from the SMB2 header all the way to the beginning of the RDMA descriptor. So with this transform descriptor added in, we actually modify the semantics of the, you shouldn't call it channel offset and channel length,
Starting point is 00:30:23 the offset and length fields in the request write, to actually point to the beginning of the transform descriptor, and then you would get the actual offset from the transform descriptor to the RDMA descriptor in the channel offset field. And then, so, the last field, the channel, is basically,
Starting point is 00:30:45 that's a V1 or V1 invalidate. So that's basically your old channel types. So this new type, you can think of it as a pseudo type. It kind of wraps the existing two types into that field. So, which means your request write structure, you're going to have 0x03 as the channel field. And you basically have a little range for signature and nonce,
Starting point is 00:31:15 and all of the three offsets are calculated from the beginning of the transform descriptor in this design. So this is the write, and the read is very similar, except that since the client is responsible for sending over the RDMA descriptor in your read response,
Starting point is 00:31:40 there's just not that. So basically your channel offset, length, and channel should be reserved, as far as I believe. So, yeah, that's RDMA with transforms.
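To make that layout discussion concrete, here is a hedged packing sketch based only on the fields named in the talk: signature, nonce, original message size, then channel offset, channel length, and channel (carrying the old V1 or V1-invalidate value), with offsets measured from the start of the descriptor. Field widths and ordering are assumptions for illustration; the real layout is not published.

```python
import struct

SMB2_CHANNEL_RDMA_V1 = 0x00000001          # existing channel value
SMB2_CHANNEL_RDMA_TRANSFORM = 0x00000003   # proposed wrapper type used in the WRITE request

def pack_transform_descriptor(signature: bytes, nonce: bytes,
                              original_size: int, rdma_descriptor: bytes) -> bytes:
    fixed = struct.pack("<16s16sI", signature, nonce, original_size)
    channel_offset = len(fixed) + 12        # offsets are from the descriptor start
    fixed += struct.pack("<III", channel_offset, len(rdma_descriptor),
                         SMB2_CHANNEL_RDMA_V1)   # the "old" inner channel value
    return fixed + rdma_descriptor

desc = pack_transform_descriptor(b"\x11" * 16, b"\x22" * 12,
                                 4096, b"\xAA" * 16)
print(len(desc), desc[:4].hex())
```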
Starting point is 00:32:01 yeah so how does it interact with the hardware? So how do you, does the group need a buffer for it? Yeah, you sort of just encrypt or sign software, and then, oh yeah. So the question is, so where does the encryption signing happen? Are they in hardware,
Starting point is 00:32:27 or how do they interact with the hardware? So I think, so the idea is, since we don't have any hardware to offload these calculations to, so you would have to pay a little bit of overhead to encrypt or sign them first, and then you register the buffer, the memory buffer, and then the rest of it should have basically
Starting point is 00:32:49 the same performance as regular SMBD. So, and then onto compression. So, quick correction first. I think I mistakenly said that the negotiation context ID for encryption was 04, but that should actually not be 04. I'll double-check the document, but actually 04 is for compression.
Starting point is 00:33:26 And so as you can see, negotiation pretty much is the same with a tiny difference, which is, so for signing encryption, there's probably no good reason for you to sort of negotiate down to a set of algorithms. So if you want GMAC, use GMAC for all your traffic in that connection, but you don't kind of mix them. But with compression, it doesn't actually matter that much.
Starting point is 00:34:03 And it's especially beneficial if you have a set of negotiated algorithms, which I'll demonstrate later. Because if you have, say, compressible data, you probably want to use an algorithm that gives you a better compression ratio, and vice versa. So you can sort of dynamically,
Starting point is 00:34:22 the application can sort of dynamically adjust based on how compressible the data is, and I'll show that a little bit later. So right now we support three compression algorithms, and they're defined in MS-XCA. So, like I said, the server would respond with N algorithms instead of just one. And so, once negotiated, when you send packets,
Starting point is 00:34:53 we want you to be able to say, hey, this packet is compressed with Xpress, the next one is compressed with Xpress Huffman. How do we do that? You probably already guessed. We are using another transform header, but instead of using the 52-byte heavy version of the regular transform header,
Starting point is 00:35:13 we designed a compact one, because compression essentially doesn't need much, and we made it eight-byte aligned. So we'll have a protocol ID, just like your transform header and your SMB2 header, and we'll have the original message size, because compression, you know, will change your message size, right? So you want to keep the original.
Starting point is 00:35:35 And the algorithm, so that's how you'll be able to tell, you know, this packet is compressed with this algorithm, so the receiver will be able to decompress. And so, with this new transform header, there's basically an interop issue. Well, with compression and signing or encryption, you would introduce this issue anyway. So, how we do this is we kind of layer them out.
Starting point is 00:36:05 And you would compress first, because when you compress and encrypt, if you encrypt first, your compression won't give you much benefit. So which means, so you have your payload, including your SMB2 header,
Starting point is 00:36:29 and your compression header will always be the inner header. And then the outermost is your SMB transform header. And although it might not matter that much with signing plus compression, for conformity benefits, for now we designed it to be just this way. So basically sign and then, sorry, compress and then sign.
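A hedged sketch of that layering, assuming a 16-byte compact header holding the fields mentioned (protocol ID, original size, algorithm, padded for 8-byte alignment); the exact widths, the 0xFC-prefixed protocol ID, and the algorithm ID are illustrative assumptions, and zlib stands in for the Xpress family, which Python doesn't ship.

```python
import struct
import zlib

COMPRESSION_PROTOCOL_ID = b"\xfcSMB"   # assumed marker, distinct from the 0xFE/0xFD headers
ALG_PLACEHOLDER = 0x0001               # placeholder algorithm ID

def wrap_compressed(smb2_message: bytes) -> bytes:
    compressed = zlib.compress(smb2_message)          # Xpress stand-in
    header = struct.pack("<4sIH6x", COMPRESSION_PROTOCOL_ID,
                         len(smb2_message),           # original size, so the receiver can restore it
                         ALG_PLACEHOLDER)             # tells the receiver how to decompress
    return header + compressed

def protect(smb2_message: bytes, sign_or_encrypt) -> bytes:
    # Compress first; the compression header is always the inner header and
    # the signing/encryption transform header wraps the result on the outside.
    return sign_or_encrypt(wrap_compressed(smb2_message))

print(len(protect(b"\x00" * 4096, lambda buf: buf)))  # identity "sign" for the demo
```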
Starting point is 00:36:56 So this is actually, with software compression, well, you may or may not like it because, well, first of all, it consumes CPU. So if you're running heavy workloads, you probably don't want to spend most of your CPU cores doing compressing work for your networking. And when it's a software compression, well it would be ideal if you can offload it to some NIC that supports that.
Starting point is 00:37:34 But when it's just software compression, the performance usually won't keep up with too high bandwidth, so most use cases with our experimentation are, if you're on a Wi-Fi network or any lower-bandwidth network. So we tested the performance with basically two groups of data: pattern data, which means they're more compressible, and random data, which are less compressible. So the first test is using 100 megabit.
Starting point is 00:38:15 So left side, the gray ones are your basically theoretical max of your bandwidth. So with pattern data, well, the extreme form of pattern data would be all zeros. That's not all zeros. All zeros would be much faster than that. So that's sort of an outlier. So pattern data is like A, B, C, D,
Starting point is 00:38:35 and so on and so forth. So, which is probably extremely compressible as well. So we actually got 4x the throughput on fairly old hardware. It's probably from '09, the W3520. So which means your CPU could probably push more. And then random data, we got a 1.68x speedup.
Starting point is 00:39:04 So, and then we tested with 200 megabits just to sort of show another point, which is that you can see the trend that, if you can see it on the left side, pattern data, you have 100 as your baseline, 400 when compressed, I'm sorry. And then when you have, when you double the bandwidth, there's not much gain when you compress.
Starting point is 00:39:32 It's probably hitting the CPU or whatever bottleneck. It's the bottleneck. And same with on the right side, it's, I think with my personal experience in this setup, it probably tops out between 200 megabit and 300, which is still pretty good if you're on a Wi-Fi network, and you don't mind using your CPU for this. But for future, we might have to think about offloading to mitigate the CPU usage
Starting point is 00:40:11 and the sort of low bandwidth keep up. Yeah. Can you, I'd have to look back and think about that. Is it possible that this... You can make this adaptive because you don't want anyone to have to say, oh, I want to store network-friendly compression now.
Starting point is 00:40:35 You want it to automatically adapt to the test. So what about making that happen? So the question is, can we make the compression algorithms adaptive, am I correct? Not the algorithm, I'm saying turn on the compression. So whether to use the compression, or more, I would say, which algorithm to use, because essentially if you look at Xpress and Xpress Huffman, and actually forget about LZNT1, that's more legacy than that. So Xpress and Xpress Huffman is basically,
Starting point is 00:41:13 Xpress is best if your data is medium compressible, and Xpress Huffman gives you a better compression ratio, but slightly worse throughput. So I had a similar thought as you when I was looking at this, but I personally haven't experimented with whether, you know, if I have pattern data and I use Xpress Huffman, that would make more sense than if I use Xpress. What I'm trying to say is there's probably a pretty complex heuristic that we have to study here.
Starting point is 00:41:51 Yeah, I was just thinking about if you know these things. Yeah. Well, speed is like, speed is not doing complex things. The thing about the, you mentioned the speed of the, like the connection. Say what?
Starting point is 00:42:02 Say what? You can simply not compress. You just don't throw the transform header on, and send it and not compress. So you kind of, like, you have to negotiate the algorithms up front, but then you're doing that. That is that. Yeah.
Starting point is 00:42:14 Yeah, that's right. That's right. So you're basically toggling off on a packet by packet. Yeah, that's right. You can leave the transform header out. Yes. Yes. So actually...
Starting point is 00:42:30 Yeah, yeah. Yeah, I think... Actually, one point I forgot to mention is there's one current heuristic, which is if the compressed payload is bigger than the original payload, or actually plus the size of the transform header, which is a pretty conservative heuristic,
Starting point is 00:42:54 you would not apply the transform header. But I totally agree with you, there's a lot more play that we can, you know, experiment with. And I think the other thing is, we in fact initially thought about making it a share property, but then we thought, well, it doesn't make any sense to make it a share property, because it's totally dependent on the data that you have, right? So it's got to be some intelligence on the client, as far as I think, maybe based on the file type or the file extension or something of that sort.
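A tiny sketch of that conservative size check, under the same assumptions as the earlier compression sketch (a 16-byte compact header, zlib standing in for Xpress):

```python
import zlib

COMPRESSION_HEADER_SIZE = 16   # matches the assumed compact header above

def maybe_compress(payload: bytes):
    compressed = zlib.compress(payload)
    # Only worth it if the compressed form plus the added header is smaller.
    if len(compressed) + COMPRESSION_HEADER_SIZE >= len(payload):
        return payload, False    # send as-is, no compression transform header
    return compressed, True      # send compressed, prepend the header

body, used = maybe_compress(b"ABCD" * 1024)   # pattern data compresses well
print(used, len(body))
```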
Starting point is 00:43:32 And then maybe if you attempt on a particular file handle, you try to compress, it doesn't compress very well, then just turn off compression for that file. It's got to be something like that. And the other thing I wanted to call out is it's not just for files. In the data center, for example, we use SMB3 to
Starting point is 00:43:54 live migrate VMs from one node to another. The memory of a VM we copy over from one node to the other. Memory of VMs is mostly, like, zeros. There are a whole bunch of zeros in all of them, so those things do compress well, and in fact they have been asking us for
Starting point is 00:44:09 that kind of a feature. So in fact, I don't think it's called out here, we might even add a specific, like, a tag or something saying that it's just a buffer full of zeros, rather than just trying to compress it. So it's a possibility.
Starting point is 00:44:28 All right. So that's most of the content that we had. And I just want to emphasize the first point again. This is all work in progress. There is no official protocol document which Tom has signed off on or anything. So some of these things can change and will change. And when we are ready to ship it, obviously we'll put it into MS-SMB2. And this is just a look ahead into what we are thinking about doing and what we're going
Starting point is 00:44:59 to productize possibly. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers dash subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
