Storage Developer Conference - #111: SMB3 Landscape and Directions
Episode Date: October 14, 2019...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, Episode 111.
I'm Matthew George, and this is Wen Xin, both of us from Microsoft. He's been taking over a lot of the core SMB development for the last couple of years now.
One year?
So he's been experimenting quite a bit with a lot of the transport layer stuff and the lower layers of the SMB stack, and he'll be covering some of the performance work and some of the other stuff he has done in the second half of the talk.
So I'm going to give you a high-level summary of what's been going on and where things are moving and that kind of stuff.
So is this an upgrade?
Yeah.
All right. So where is SMB3 today? Over the last 10 years or so, David Kruse and I, and a lot of people who have been here for the last several years, have been working on SMB3, trying to improve it, get it performing well in the data center, and so on.
So where do we see the maximum uptake of SMB3 today?
So the first and the most important one is a software-defined data center, or private cloud, or whatever way you want to define it. And we have seen a lot of the features that we have done in the last several years, multichannel and RDMA and better encryption and signing, pretty much everything, tie into data center workloads. And we have seen a lot of
adoption of SMB there, primarily in the scale-out
file server for applications, so to run your applications like databases, hypervisor, VMs,
that kind of stuff. And then, more recently, inside Microsoft, we have been using SMB as a
transport for several other application
workloads, and one of them is Storage Spaces Direct, which basically is a system designed
to cluster together disks on individual machines in a cluster and then export them over the
network so that other machines in the cluster
can see these disks in a shared pool.
And S2D, or Storage Spaces Direct,
uses SMB3 as transport,
and they rely on RDMA and multichannel
and a lot of the...
may not be so much of the failover capabilities,
but the network resilience portion of SMB3
as well as the high performance we offer
and the security we offer,
they use it extensively. And the second scenario which we have seen is to use SMB3 as a file
access protocol from guest to host, especially in the container scenarios where you have hypervisor
VM containers running on a Windows server,
and you want to access files stored on the host machine.
And they use SMB as a transport to access those files.
And it's not a regular TCP-based transport.
They use the VM bus transport, as they call it.
It's the transport which VMs use to communicate to each other,
and they run SMB over VM bus. And in fact, we have even done what we call pseudo-RDMA over this VM bus transport. RDMA is just a way of registering memory across so that it's accessible across
the network. So you could do the same thing in a virtual way over VM to host, too.
And then the other interesting scenario is Azure files or cloud
scale storage where we are seeing SMB3 being used to access files over the internet.
And aligning on these three directions, we have been prototyping a bunch of things.
So none of these are actually official or published,
and these days the Windows schedules
make it kind of unpredictable to plan these things.
So we take it as we go, and we prototype it, and if things look good, we push it into a release.
That's the kind of new release model.
The first thing I want to kind of talk about
is SMB over QUIC,
and for those who are new to QUIC, QUIC was invented by Google as a transport protocol for web applications, and we think we can use the QUIC transport as a way of transporting SMB over the internet, because it gives you a couple of nice properties.
And then Wen is pretty much going to talk about some of the changes to the transform logic in SMB to incorporate new signing algorithms, and we have prototyped compression and so on. Wen is going to talk about it.
Yes, question.
Just quickly, I mean, one of the main features
is multiple channels inside one connection.
How are you going to use that?
The question is: QUIC has support for multiple channels; how are you going to use them?
I'll talk about it in a while.
All right, and these are a few minor updates to the protocol from the last year.
So the first is there's a new normalized name query, which was added to the protocol document.
I don't know if folks have noticed it, but this allows a client to query the full path name for a file in a normalized fashion,
so that it basically, if you have short names in the path, it will
expand them to long names and so on.
And this is something which is used by all these filters running on Windows, like antivirus
folks and stuff, to match path names.
So you get a unique path name for a file, essentially.
And if you do not support this, the system actually opens each folder, does a dir, and then builds up the path name, and it can get quite chatty. So that's why
we went ahead and did this. And the second one is improvements to directory caching.
This is not a wire-visible behavior. This is something that I did on the SMB client,
because the directory cache used to be able to support only small directories, up to 64 kilobytes in size, so not more than maybe a few hundred entries. Now we can support larger directories
and we can enumerate directories in larger chunks to like one meg buffer. So things work
much better with the directory cache now.
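The gain described above can be sketched as a toy model: with a small per-query buffer, a large directory costs many round trips; with a bigger buffer, far fewer. This is only an illustration of the idea, not the actual SMB client cache code.

```python
def enumerate_directory(entries, chunk_size):
    """Yield directory entries in fixed-size chunks, one server query per chunk.

    `entries` stands in for the server-side directory listing; `chunk_size`
    models how many entries fit in the query buffer.
    """
    for i in range(0, len(entries), chunk_size):
        yield entries[i:i + chunk_size]

entries = list(range(1000))
# Small (64 KB-style) buffer: many round trips.
small_chunks = list(enumerate_directory(entries, 100))
# Large (1 MB-style) buffer: far fewer round trips for the same listing.
large_chunks = list(enumerate_directory(entries, 500))
```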
And then the last thing, again, is not protocol visible.
When we run workloads like Storage Spaces Direct,
which is using SMB as a transport protocol for block storage,
a lot of the file system complexity is kind of removed from the picture
because it's just reading and writing blocks.
So we have put in some IO path improvements in the SMB client
to shortcut some of these things.
There are a whole bunch of complexities in the client systems
like locks, synchronization, all that kind of stuff.
So you don't need a lot of that stuff if you're just doing raw disk I/O over SMB, so there were some good gains to be had there to reduce the path length and improve the latency of I/Os.
So what is QUIC? QUIC is basically a replacement, or I wouldn't call it a replacement for TCP/IP, but it's a replacement for SSL/TLS,
and Google has been working on it since, I think, 2012 or 2014,
and they have it in their Chrome browser implemented as a library.
And it was designed for transporting HTTP traffic,
and a lot of the improvements here are kind of tied to HTTP sessions,
which are kind of short-lived and things like that.
So they have low-latency connection setup without,
you can do, like, zero RTT connection setup, too,
where TCP involves, like, you have to send a SYN,
then a response, and then the data transfer starts.
So with QUIC, you can actually transfer data right away
without waiting for an action from the peer.
So zero RTT connection set up, and then it's encrypted by default.
It uses the latest TLS 1.3, I believe, and later algorithms for encryption.
And it's all certificate-based and stuff.
And then there are a few things which it does above and beyond TCP,
and one is multiple streams. So you can set up a quick connection and then open up multiple streams
underneath it. And this allows you to do two things. One is to avoid these long queue depths
and head-of-line blocking, as you call it. When you queue up a large amount of data, TCP is an in-order transport,
so things get stuck behind one another,
and one packet loss somewhere in the middle
means that everything gets backlogged.
So multiple streams actually allow you
to recover from losses better,
recover from congestion better.
And the second thing is what they call
application layer protocol negotiations.
So you could use the same QUIC connection with the same four-tuple: the local and remote IP addresses and the port numbers. And then on top of that, you could
multiplex the same connection for multiple applications. So you could run HTTPS traffic or SMB traffic
all on the same underlying four-tuple.
So you don't have to dedicate a port number,
which sometimes can help, especially with SMB.
And even on the server side, at least in the Windows implementation, you could have both HTTP and SMB, for example, listening on QUIC port 443, the SSL port, and the traffic gets demultiplexed by the stack.
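The demultiplexing just described can be sketched as a tiny dispatcher keyed on the ALPN token carried in the QUIC/TLS handshake. The tokens here ("h3" for HTTP/3, "smb" for SMB over QUIC) are illustrative; the talk only says an SMB identifier was chosen, so treat the exact strings as assumptions.

```python
# Toy ALPN-based demultiplexer for a single QUIC listener on one port.
def handle_http(conn):
    return "http-handler"

def handle_smb(conn):
    return "smb-handler"

# Map ALPN token -> protocol handler. Both tokens are placeholders.
ALPN_HANDLERS = {
    "h3": handle_http,
    "smb": handle_smb,
}

def dispatch(alpn_token, conn=None):
    """Route a new QUIC connection to a protocol handler by its ALPN token."""
    handler = ALPN_HANDLERS.get(alpn_token)
    if handler is None:
        raise ValueError(f"no handler for ALPN token {alpn_token!r}")
    return handler(conn)
```

So HTTPS and SMB traffic can share the same UDP four-tuple, and the stack hands each connection to the right protocol based on the negotiated token.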
And then there are a few other things like connection migration.
If you have a wireless and a wired link,
it can automatically switch between the two interfaces, cellular, wireless. And then error recovery,
I believe they have forward error correction. So intermittent packet losses, you don't need
to retransmit. They can reconstruct the packet on the fly. And then the last thing is it's
UDP-based library implementation,
so you don't need big kernel changes or anything of that sort.
You can have UDP sockets.
This will work.
It's just a library built on top of it.
And it's with IETF now in draft stage, but still experimental.
And the good part about this is that Google's experiments over the last few years have shown that it's quite NAT-friendly. You would not expect UDP-based protocols to be NAT-friendly, but they found that it works well in about 93 percent of the cases. So home NAT boxes and NAT routers allow it through; you can punch holes through NAT routers without any external configuration. So it works well.
And right now it's software only.
So a lot of the work which is done by TCP these days in hardware, offloaded to the NICs, is all done in software.
So it is expensive.
And I even heard about talks about RDMA, people trying to implement RDMA over QUIC.
I don't think there's anything concrete there,
but maybe.
And there are interoperability concerns.
I know that the TLS,
some of the authentication and encryption stuff
is still not there yet.
At least I know the Windows implementation right now
is not interoperable with the Google one, but eventually we hope it'll get there.
And how do we bind? So this is just talking about our prototype. How do we bind QUIC to
SMB? How do we run SMB over QUIC? So I talked about the ALP and the application layer protocol negotiation.
So it's basically a string which identifies your application, and we just chose an application layer identifier for SMB. I believe it's called SMBQ or something of that sort.
And then you can share the same port with HTTPS traffic. And right now, SMB has an implementation of multichannel outside of QUIC, so we can do multiple TCP connections anyway. So I don't immediately see the value of using multiple QUIC streams under the same connection, because, at least at this point, we can do better: we can span interfaces and we can use different types of transports and such.
So right now, in our prototype, QUIC is just used as a TCP stack replacement with a single channel.
There's nothing else other than the fact
that you get encryption.
And the biggest scenario, I think, which might come out of this work: we are seeing SMB deployment in the cloud, and TCP 445 has become, or at least has been shown to be, a big pain. Comcast and a lot of the internet providers, the ISPs, block it. So if we can get SMB traffic to look like SSL traffic, that works well. That's something which I think is a good motivation for why we are doing this. If we can get this working over the internet, a secure TLS stream running SMB, that will work well.
Now, that's what we have for QUIC.
We have a prototype running.
Perf numbers didn't look good in the initial cut,
so I'm not even going to bother talking about it.
It's just an initial implementation,
but we have it working between a Windows server
and a Windows client.
And hopefully, once the Microsoft QUIC library is interoperable with the Google open source libraries, then we can actually set something up and then see how it works across implementations.
The second thing, for which more of the detail is going to be given by Wen: we talked at SDC in 2015 about the performance of GMAC versus CMAC, or AES-GCM versus AES-CCM. CMAC is the current signing algorithm in SMB 3.1.1, and we know that GCM performs significantly better in terms of CPU usage. And we want to change the signing algorithm in SMB3 to use GMAC, the MAC variant of AES-GCM. And most CPUs have optimized instructions for GCM.
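To illustrate why GMAC unifies so naturally with GCM encryption: GMAC is just AES-GCM run with the message supplied as associated data and an empty plaintext, so the output is only the 16-byte authentication tag and nothing is encrypted. This sketch uses the third-party Python `cryptography` package as a stand-in for the platform crypto; the key and nonce handling is illustrative, not the real SMB key derivation.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def gmac_sign(key, nonce, message):
    # Empty plaintext: AES-GCM degenerates to GMAC over the associated data,
    # and the returned bytes are just the 16-byte tag.
    return AESGCM(key).encrypt(nonce, b"", message)

def gmac_verify(key, nonce, message, tag):
    # Verification is a decrypt with the same (nonce, AAD); failure raises.
    try:
        AESGCM(key).decrypt(nonce, tag, message)
        return True
    except Exception:
        return False

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)   # in SMB, the message ID could play this nonce role
tag = gmac_sign(key, nonce, b"an SMB2 message")
```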
Now, the other design question is,
well, SMB signing, when it was done,
it was actually embedded into the SMB portion of the protocol.
It's actually in the SMB2 header, right?
But when we went ahead and did the encryption,
it was done as a layer above SMB2.
You have the transform header.
So this has been something
which we have been debating internally,
whether we want to just use the transform header
going forward for signing and encryption,
because if we make new algorithms,
if you want to add features and stuff,
it might be easier from a layering perspective
to keep it at
a separate layer rather than in the SMB2 header, but there's no evidence either way.
There are reasons why you could do it.
So one is better layering, and then signing and encryption all use the same algorithms: whether you use AES-GCM to sign or to encrypt, it's pretty much the same kind of operations which happen under the covers, so unification helps.
But there is an extra cost of usually buffering the data, because if you do it at a separate
layer, the transform layer will have to receive the entire data, then it verifies the signature
and then hands it up over to the upper layer, while with the SMB2 built-in signing,
we do signing, at least on the client,
we do signing computations piecemeal.
So as and when we receive the data,
we compute the signature,
copy it into the user's buffer, and so on.
So there's an extra copy which might be involved in the transform path.
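The tradeoff above can be sketched in a few lines: a transform-layer verifier must buffer the whole message before checking, while the in-header scheme can fold fragments into the MAC as they arrive. HMAC-SHA256 stands in for AES-CMAC/GMAC here, since the Python standard library has no AES-based MAC; the key is a placeholder.

```python
import hashlib
import hmac

KEY = b"session-signing-key"  # placeholder, not a real SMB session key

def sign_buffered(chunks):
    # Transform-layer style: reassemble the full message first (the extra
    # copy), then compute the MAC in one shot.
    return hmac.new(KEY, b"".join(chunks), hashlib.sha256).digest()

def sign_streaming(chunks):
    # SMB2-header style: update the MAC piecemeal as each fragment arrives,
    # so no full-message reassembly buffer is needed.
    mac = hmac.new(KEY, digestmod=hashlib.sha256)
    for fragment in chunks:
        mac.update(fragment)
    return mac.digest()

fragments = [b"SMB2 hdr", b"payload part 1", b"payload part 2"]
```

Both produce the same signature; the difference is purely where the buffering cost lands.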
So we haven't fully evaluated which way we'll go, but unifying encryption and signing is something we're considering going forward.
The other big thing is RDMA and signing.
So when people initially brought this up, we thought, why would anybody spend thousands of dollars on their RDMA hardware and then turn on signing just to ruin their performance? And it turns out that a lot of people just do this. And often it's their corporate-mandated policy.
People pull down security policies from their domain controllers or whatever
and they go and force signing everywhere, including their data centers where they have RDMA networks.
So whether it's a good thing or not, people end up doing it.
And when you turn on signing on RDMA,
things which happen are not often understood well by the customer.
So what happens when you turn on signing on RDMA?
What people observe is that if you have just 10 gig Ethernet versus 10 gig RDMA,
TCP over 10 gig Ethernet performs better when signing is enabled over RDMA.
And that's kind of surprising to most people, because when you use signing over RDMA, direct placement, that is, RDMA reads and writes, is disabled.
So essentially you end up going through the RDMA send-receive path.
And RDMA send-receive buffers are usually small.
I believe it's like four kilobytes or something on each side.
And so RDMA software stack, the SMB direct layer has to do fragmentation and reassembly of all those data.
And with TCP, most of this is offloaded to the NICs these days. So you end up losing RDMA writes, and then the additional cost of fragmentation and reassembly just kills your perf, and people end up complaining: why is performance so bad when RDMA is enabled? So we thought about it, and we said, okay, people are not going to turn off signing, so what can we do better? If people want to sign, well, we've got to provide signing.
Why can't we sign or encrypt the RDMA buffers separately?
Or rather, why do we want to disable direct placement? Let's say we allow direct placement, but we sign the direct placement buffer, or we fill the direct placement buffer with encrypted data, and then still use signing.
So that's something which Wen is going to talk about.
And I think there will be some benefits.
We haven't fully tested it out yet, but it's still a work in progress.
So, well, that's kind of the high-level picture that I want to share.
And then now, Wen will...
Hey, everyone.
So Matthew gave a great overview of the new transforms that we're adding, or planning to add, to SMB. So why don't we delve more into the documentation and the detail side of the new transforms.
So let's talk about signing first. If you're already familiar with how encryption algorithms are negotiated in the current SMB 3.1.1 protocol, this is probably very familiar to you. For those of you who are less familiar, this is essentially how the client and server learn which signing algorithms each other supports. Before this proposal, we don't have any signing negotiation going on between the client and server.
So basically AES-128-GMAC is the first algorithm
that we are planning to make negotiable.
So what you do is the client basically sends the algorithm count, for now it's just always one, and the list of algorithm IDs, and the server will select one, just like it selects just one encryption algorithm when you negotiate, and sends this negotiation context back to the client. And if you're into the details of the negotiation contexts, I think the encryption negotiation context ID is 0x04, and so we used 0x08. And this would be more useful when we adopt more algorithms than just one.
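The exchange just described can be sketched as wire packing, modeled on the existing SMB2 negotiate context layout (ContextType, DataLength, Reserved, then the context data). The context type 0x0008 is the value mentioned in the talk, and the AES-128-GMAC algorithm ID is a placeholder; none of this is in a published document yet, so treat every constant as provisional.

```python
import struct

SMB2_SIGNING_CAPABILITIES = 0x0008   # context ID per the talk; provisional
AES_128_GMAC = 0x0002                # hypothetical algorithm ID

def build_signing_context(algorithm_ids):
    """Pack a hypothetical signing-capabilities negotiate context."""
    # Context data: 16-bit algorithm count, then one 16-bit ID per algorithm.
    data = struct.pack("<H", len(algorithm_ids))
    data += b"".join(struct.pack("<H", a) for a in algorithm_ids)
    # Context header: ContextType (2 bytes), DataLength (2), Reserved (4).
    return struct.pack("<HHI", SMB2_SIGNING_CAPABILITIES, len(data), 0) + data

ctx = build_signing_context([AES_128_GMAC])
```

The server would parse the list, pick one ID, and echo a context of the same shape back with a count of one.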
A little bit about the background of why we're trying to switch to GMAC, which Matthew sort of already covered. The biggest reason, of course, is the performance gain. And GMAC inherently also enforces a nonce, so it natively protects against replay attacks. In the SMB protocol we actually have the message ID, which also serves as a nonce, so this is kind of duplicated, but having native algorithm support gives you more flexibility in terms of design.
And as Matthew mentioned, we moved the signature verification logic from the SMB2 layer down to the SMB transform layer. That's maybe not benefiting everybody, but if you're susceptible to DoS attacks, for example, faster signature rejection probably benefits you.
And so, as Matthew also mentioned, since we are moving the logic, including the signing and verification logic, all down to the SMB transform layer, it is beneficial to use the same transform header as the encryption algorithms. And I think there's no conflict here, because our current encryption algorithms are authenticated encryption algorithms, which basically means your signing and encryption algorithms are mutually redundant.
So in the MSSMB2 doc, we basically have a flag
slash encryption algorithm field for this transform header.
And currently in 3.1.1, the only defined value is 0x01, which is encryption. So now, if you just sign your packet, you use 0x02.
So I think Greg did...
You should say that's proposed. That's not yet implemented.
Well, we have a slide at the end. Everything can change.
So in 2015, I think Greg, an intern, implemented a prototype for the GMAC, and the performance is really irresistible, I think.
So they used a 10 Gbps link, and, if it's too small to see, the blue bar on the left is the CMAC performance, yellow is GCM, and purple is the GMAC variant of GCM.
So basically around 4x the throughput,
and around 2x the speedup for cycles per byte.
So pretty promising, I would say.
And that's, yeah.
Sorry, do both of them use... yeah, they're...
Are you asking whether they are...
I think the question was, do they use the same Intel AES-NI instruction set?
Yeah, I believe they do.
Because I think the new GMAC, the GCM instructions, came in later, I think around 2015 or something, right? So I know that they both use whatever instructions are available. So GCM performs significantly better. Yeah.
And, like Matthew mentioned, we're implementing the overhaul for signing and encryption over RDMA.
For a quick overview, take an SMB RDMA write as an example. The client would register an RDMA buffer and send a regular SMB write request packet to the server, with the RDMA descriptor it gets from the RDMA layer, and the server will do an RDMA pull. That's the very simple version, without transforms. With transforms, for performance reasons of course, we want to keep using this RDMA pull mechanism, and on the RDMA read side, that's an RDMA push.
So what we do is essentially add a transform layer, for writes at least, between the client and the buffer registration. You encrypt or sign the buffer first, then you register the encrypted or signed buffer, and you send a slightly modified write request to the server to give it more information about the transforms, and then you just let the server pull the buffer as if it's just part of the RDMA protocol.
And then on the server side, because you added additional information to the SMB write request, you will be able to decrypt or verify the buffer.
So how do we do this in the SMB2 write request? If you're familiar with the channel types in the existing SMBD document, we have a V1, which is pretty well defined, and a slight variant of V1 with remote invalidation of the registered buffer. With this change, we're introducing a new type, 0x03.
So this is what the current write request packet looks like. To support the transform, we need a signature and a nonce, at the least. We include the original message size in case any transform changes the data size.
And then, about the channel offset and channel length: the reason they're here is because, if you recall the SMB2 write request structure, it actually has a channel offset and channel length describing the offset from the SMB2 header to the beginning of the RDMA descriptor. With this transform descriptor added in, we actually modify the semantics of the offset and length fields in the write request to point to the beginning of the transform descriptor, and then you get the actual offset from the transform descriptor to the RDMA descriptor in the channel offset field.
And then the last field, the channel, is basically your old channel type, v1 or v1-invalidate. So you can think of this new type as a pseudo type; it kind of wraps the existing two types into that field. Which means your write request structure is going to have 0x03 as the channel field.
And you basically have a little range for the signature and nonce, and all three of the offsets are calculated from the beginning of the transform descriptor in this design.
So this is the write, and the read is very similar, except that since the client is responsible for sending over the RDMA descriptor, in your read response there's just not that. So basically your channel offset, channel length, and channel should be reserved, as far as I believe. So yeah, that's RDMA with transforms.
So how does it interact with the hardware? Do you need a buffer for it?
Yeah, you sort of just encrypt or sign in software, and then... oh yeah. So the question is, where does the encryption and signing happen? Is it in hardware, or how does it interact with the hardware?
So the idea is, since we don't have any hardware to offload these calculations to, you would have to pay a little bit of overhead to encrypt or sign them first, and then you register the memory buffer, and the rest of it should have basically the same performance as regular SMBD.
And then on to compression. A quick correction first: I think I mistakenly said that the negotiation context ID for encryption was 0x04, but that should actually not be 0x04. I'll double-check the document, but 0x04 is actually for compression.
And as you can see, negotiation is pretty much the same, with a tiny difference. For signing and encryption, there's probably no good reason for you to negotiate down to a set of algorithms: if you want GMAC, use GMAC for all your traffic on that connection; you don't mix them. But with compression, it doesn't actually matter that much, and it's especially beneficial to have a set of negotiated algorithms, which I'll demonstrate later. Because if you have, say, compressible data, you probably want to use an algorithm that gives you a better compression ratio, and vice versa.
So the application can dynamically adjust based on how compressible the data is, and I'll show that a little bit later.
So right now we support three compression algorithms, and they're defined in MS-XCA.
So, like I said, the server would respond with N algorithms instead of just one.
And so, once negotiated, when you send packets, we want you to be able to say, hey, this packet is compressed with XPRESS, the next one is compressed with XPRESS Huffman.
How do we do that?
You probably already guessed.
We are using another transform header,
but instead of using the 52-byte heavy version
of the regular transform header,
we designed a compact one
because compression essentially doesn't need much,
and we made it eight-byte aligned.
So we'll have a protocol ID, just like your transform header and your SMB2 header, and an original message size, because compression is going to change your message size, right? So you want to keep the original. And the algorithm, so that's how you'll be able to tell that this packet is compressed with this algorithm, so the receiver will be able to decompress.
And so, with this new transform header, there's basically an interop issue. Well, with compression plus signing or encryption, you would introduce this issue anyway. So how we do this is we kind of layer them: when you compress and encrypt, you compress first, because if you encrypt first, your compression won't give you much benefit.
Which means you have your payload, including your SMB2 header, and your compression header will always be the inner header, and the outermost is your SMB transform header. It might not matter that much with signing plus compression, but for conformity, for now we designed it to be just this way. So basically, sorry, compress and then sign.
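The layering above can be sketched as follows: compress first (with an inner header carrying the original size), then apply the outer transform, here a MAC, over the compressed message. zlib and HMAC-SHA256 stand in for the SMB compression algorithms and AES-GMAC, and the 4-byte inner header is illustrative, not the real compact compression transform header.

```python
import hashlib
import hmac
import struct
import zlib

KEY = b"signing-key"  # placeholder session key

def wrap(payload):
    # Inner layer: 4-byte original size, then the compressed payload.
    inner = struct.pack("<I", len(payload)) + zlib.compress(payload)
    # Outermost layer: the signing transform, computed over the
    # already-compressed message (compress first, then sign).
    tag = hmac.new(KEY, inner, hashlib.sha256).digest()
    return tag + inner

def unwrap(blob):
    tag, inner = blob[:32], blob[32:]
    # Verify the outer transform before touching the compressed data.
    if not hmac.compare_digest(tag, hmac.new(KEY, inner, hashlib.sha256).digest()):
        raise ValueError("bad signature")
    original_size, = struct.unpack_from("<I", inner)
    data = zlib.decompress(inner[4:])
    assert len(data) == original_size
    return data
```

Signing the compressed bytes (rather than the original) is what lets the receiver reject a tampered packet before it ever runs the decompressor.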
So this is software compression, and you may or may not like it, because, first of all, it consumes CPU. If you're running heavy workloads, you probably don't want to spend most of your CPU cores doing compression work for your networking. And with software compression, well, it would be ideal if you could offload it to some NIC that supports that. But when it's just software compression, the performance usually won't keep up with too high a bandwidth, so most use cases in our experimentation are on a Wi-Fi network or some lower-bandwidth network.
So we tested the performance with basically
two groups of data, pattern data, which means they're more compressible,
and random data, which are less compressible.
So the first test is using 100 megabits. On the left side, the gray bars are basically the theoretical max of your bandwidth. With pattern data, well, the extreme form of pattern data would be all zeros. This is not all zeros; all zeros would be much faster than that, so that's sort of an outlier. Pattern data here is like A, B, C, D, and so on and so forth, which is probably extremely compressible as well.
So we actually got 4x the throughput on fairly old hardware. It's probably from '09, the W3520. Which means your CPU could probably push more. And then with random data, we got a 1.68x speedup.
And then we tested with 200 megabits, just to show another point, which is that you can see the trend: on the left side, pattern data, you have 100 as your baseline and 400 when compressed. And then when you double the bandwidth, there's not much gain when you compress. It's probably hitting the CPU, or whatever the bottleneck is.
And same on the right side. In my personal experience with this setup, it probably tops out between 200 and 300 megabits, which is still pretty good if you're on a Wi-Fi network and you don't mind using your CPU for this. But for the future, we might have to think about offloading to mitigate the CPU usage and keep up with higher bandwidths.
Yeah.
I'd have to look back and think about that.
Is it possible that this... You can make this adaptive, because you don't want anyone to have to say, oh, I want to turn on network-friendly compression now. You want it to automatically adapt. So what about making that happen?
So the question is, can we make the compression algorithms adaptive, am I correct?
Not the algorithm; I'm saying turn on the compression.
So whether to use compression, or moreover, I would say, which algorithm to use. Because essentially, if you look at XPRESS and XPRESS Huffman, and forget about LZNT1, which is more legacy: XPRESS is best if your data is medium-compressible, and XPRESS Huffman gives you a better compression ratio but slightly worse throughput. So I had a similar thought to yours when I was looking at this, but I personally haven't experimented with whether, you know, if I have pattern data and I use XPRESS Huffman, that would make more sense than if I use XPRESS. What I'm trying to say is there's probably a pretty complex heuristic that we have to study here.
Yeah, I was just thinking about
if you know these things.
Yeah.
Well, the simple thing is to not do complex things. The thing about the speed of the connection...
Say what?
You can simply not compress. You just don't put the transform header on, and send it uncompressed. So you have to negotiate the algorithms up front, but then you decide as you go.
Yeah.
Yeah, that's right.
That's right.
So you're basically toggling it off on a packet-by-packet basis.
Yeah, that's right.
You can leave the transform header out.
Yes.
Yes.
So actually... yeah, I think... Actually, one point I forgot to mention is the current heuristic. There's one current heuristic, which is: if the compressed payload, plus the size of the transform header, is bigger than the original payload, which is a pretty conservative heuristic, you would not apply the transform header. But I totally agree with you, there's a lot more that we can, you know, experiment with.
And I think the other thing is, we in fact initially thought about making it a share property, but then we thought, well, it doesn't make any sense to make it a share property, because it's totally dependent on the data that you have, right? So it's got to be some intelligence on the client, I think, maybe based on the file type or the file extension or something of that sort. And then maybe if, on a particular file handle, you try to compress and it doesn't compress very well, you just turn off compression for that file. It's got to be something like that.
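The conservative heuristic described above can be sketched with zlib as a stand-in compressor: only send compressed if the compressed payload plus the compression transform header still beats the original; otherwise send the payload raw with no compression header at all. The header size constant is an assumption for illustration.

```python
import zlib

COMPRESSION_HEADER_SIZE = 16  # assumed size of the compact transform header

def maybe_compress(payload):
    """Return (compressed?, bytes to send) per the conservative heuristic."""
    compressed = zlib.compress(payload)
    if len(compressed) + COMPRESSION_HEADER_SIZE < len(payload):
        return True, compressed       # worth it: ship with the transform header
    return False, payload             # not worth it: ship raw, no header
```

On highly patterned data this compresses; on incompressible data it falls back to sending the original bytes untouched, so the receiver sees no transform header at all.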
And the other thing I wanted to call out is it's not just for files. In the data center, for example, we use SMB3 to live migrate VMs from one node to another; the memory of a VM we copy over from one node to the other. The memory of VMs mostly has a whole bunch of zeros in it, so those things do compress well, and in fact they have been asking us for that kind of a feature. In fact, I don't think it's called out here, but we might even add a specific tag or something saying that it's just a buffer full of zeros, rather than even trying to compress it. So it's a possibility.
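The all-zeros idea above can be sketched as tagging an all-zero segment (a VM memory page, say) with just its length, instead of running it through a compressor at all. This wire format is hypothetical; the talk only floats the idea.

```python
def encode_segment(buf):
    """Tag an all-zero buffer instead of compressing it."""
    if buf.count(0) == len(buf):          # cheap all-zero test on bytes
        return ("zero", len(buf))         # send only a tag plus the length
    return ("data", buf)                  # otherwise send (or compress) the data

def decode_segment(seg):
    """Reconstruct the original buffer from an encoded segment."""
    kind, value = seg
    return b"\x00" * value if kind == "zero" else value
```

For zero-heavy VM migration traffic this reduces a whole page to a few bytes, cheaper than even the fastest compressor.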
All right.
So that's most of the content that we had.
And I just want to emphasize the first point again.
This is all work in progress.
There is no official protocol document which Tom has signed off on or anything.
So some of these things can and will change.
And when we are ready to ship it, obviously we'll put it into MSSMB.
And this is just a look ahead into what we are thinking about doing and what we're going
to productize possibly.
Thanks for listening.
If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe@snia.org. There you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference,
visit www.storagedeveloper.org.