Storage Developer Conference - #10: Linux SMB3 and pNFS - Shaping the Future of Network File Systems
Episode Date: June 6, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 10.
Today we hear from Stephen French, Principal System Engineer with the Samba team and Primary Data,
as he presents Linux SMB3 and pNFS from the 2015 Storage Developer Conference.
Okay, so today I'd like to talk about SMB3 and the newest versions of NFS,
NFS version 4.2, and some of the new layouts that have been proposed for pNFS.
As I said, I've been involved in development of NAS stuff since early years at IBM on both
NFS and SMB. And I found this very interesting
meeting Dr. Barry Feigenbaum who in 1984 as a young
PhD student was inventing some of the stuff while others at Sun were
inventing the NFS stuff. So why are we here?
How many years later? 30 years later? This
is very interesting stuff that we've evolved and improved so much over these years. I maintain
the Linux CIFS client. One of my coworkers, Trond, maintains the Linux NFS client. The
CIFS client allows you to access SMB3 servers. Obviously you've seen this. This is a Mac, right?
In making this presentation,
I had to use my code to get stuff back and forth to the Mac.
But also, obviously,
I think all of you are aware of Windows
and the enormous amount of
very exciting things that Windows has done with SMB3.
But first we should talk a little bit about NAS.
Many of you guys were here at the presentation right before this.
The presentation right before this was about iSCSI, iSER, right?
This is not NAS, right?
So why do we care about NAS?
And why do we care about these old 31-year-old protocols?
So we have some questions we have to address first.
And to answer those, we kind of have to ask: why do we care about file systems?
Well, they've been around over 50 years, right?
Remember Multics?
I don't think any of you guys were around then, right?
The amount of unstructured data is staggering.
Now, I have kids. I have three kids. Different ages.
College down to 14 years old. The amount of unstructured data they create
each day is frightening. And I think any of you guys,
if you look at their laptops, will be sort of startled how much of that stuff is created.
So, this is just a staggering
amount. And NAS does very well. But all of this stuff, whether you're using NAS or not,
depends at some level on file systems. You know, Facebook, Google, whatever.
This stuff gets stored on file systems. When you go to conferences like Vault,
or I enjoyed going to the Linux File System Summit again
the last few years, it's really interesting to hear the kind of challenges
in file systems
that still 50-something years later we're struggling with.
But why do we care about NAS instead of SAN? I'm prejudiced. I've been doing this
for a long time. But the ownership information, the better security,
it's easier to set up. Most of us don't want to deal with LUNs.
Most of us don't want to deal with configuring some of the bizarre stuff you have to for
SAN. And then you get this application access info,
these patterns that you can optimize. In NFS, the metadata patterns you can use
to do really cool things and some of the things that NetApp and EMC and companies like mine,
Primary Data, are doing, trying to understand how to move data around.
This is a lot easier when you actually see the access patterns.
And of course, nice features like encryption and compression.
I'm sure the guys in the cloud want everything encrypted, right?
This is a lot easier on NAS than it is on SAN.
And policy setting is easier.
I borrowed this slide here (you can see the link below for more information), but it's just amazing how much unstructured, file-system-related stuff, even on the internet, is getting served by NAS backends.
But then the obvious question is why Linux, right? I've got a Mac here, I work on Linux, we've got Windows people out there, we have people from multiple NAS vendors. But why do we talk about Linux?
Well, 90,000 change sets
last year. 90,000.
These are not one-line changes, right? 90,000 sets of changes. That's huge.
Now, the development pace looks like
it's going up slightly, not going down for
Linux. So it's fun when I go to the Linux file system summit and meet a lot of these
developers, very bright guys, Ted Ts'o, Christoph Hellwig, others. It's a really talented community.
Now if you just look at the file system, over 5,000 changes in the last year.
This is just the file system.
Over 1,000 just in the last release, 4.2.
Over 1,000.
So we have almost a million lines
of file system code, which is very deceptive.
Linux is incredibly terse.
People send me patches every month.
Remove this, clean up this, use this helper function, they add some helper function, shrink
ten lines of code. It's very terse code, and yet still quite large. Over 1200 developers
per release. That's a big community. Lots of people. Lots of eyes. And good processes and tools. And of course, look at all those file systems.
And here's a picture. I'm buried in there somewhere.
But this is just the file system
experts and MM experts.
That's a really talented group. So, why Linux? Obviously we have a very talented group.
Now, what are the most active things we do in Linux?
Well, surprise, surprise.
NAS is really high up there.
So the majority of file system activity in Linux is driven by six file systems.
There they are.
And guess what?
NFS server, NFS client, CIFS are up there. Top six.
And, you know, we have a Btrfs talk. I think that just went on,
is actually going on right now. That is the most
active file system and has been for a number of years. Many people
diss Btrfs because it's still evolving. But Btrfs is getting a lot of
love, a lot of activity.
EXT4, as well known as it is, gets less activity than XFS.
So it's interesting to see where the actual activity has been going on year after year.
NFS server, NFS client, CIFS, Btrfs, XFS, and EXT4.
Interestingly, the NFS server activity, largely driven by Jeff Layton and others in my company,
but also a couple of others,
has skyrocketed over the last year and a half; it has increased a lot.
Bruce Fields, the Red Hat server lead for NFS,
has a good presentation on that from the Vault conference this year.
It covers some of the NFS server improvements.
So we've had more than 30 years of improvements.
Impressive protocols.
I think you all are running a variety of laptops and devices;
NFS or SMB is available on most or all of them.
The NAS servers will support both.
They're very well tested,
they're understood reasonably well.
And, of course, they're more common than all of the cluster
and other network file systems combined.
So really we're down to NFS v3 versus,
sorry, NFS v4.something versus SMB3.
So what do we have?
Our current kernel.
This is, well, the VMs here are running 4.3 RC1.
When I do my presentations and testing, I typically am running, this is two weeks
old, right? A week old. A 4.3 RC1 kernel. Thirteen months ago we had Shuffling Zombie Juror,
for those of you who like John Grisham novels. But that was 3.16.
You know, it's kind of cool to look at performance.
I get kind of excited looking at this stuff because it seems like even after many years of doing this,
you can still be surprised.
You know, SMB3 is quite fast.
Jeff Layton a few years ago,
and Pavel Shilovsky also two years ago and last year,
did some really good stuff for increasing SMB3 performance.
But in addition, there are some nice SMB3 features, multi-credit and the larger I/O
sizes, that really help Linux in certain very easy to understand, very common workloads
and large write performance. Here's a very simple example. So these are VMs on this box.
I just ran these last night.
SMB3 averaged more than 25% faster than CIFS.
I did a loopback to eliminate network effects
because obviously on a low-end laptop,
I'm not going to have a really fast network adapter.
For those of you at Mellanox and Chelsio and others, you can try their very, very fast network adapters
and replicate this stuff. But running loopback and running on VMs on the same box allowed
me to eliminate network hardware deficiencies on a laptop. But you can see that SMB3 was significantly faster, both reads and writes.
And Unix extensions help on CIFS because they allow us to do larger I/Os,
but even with Unix extensions, SMB3 is faster than CIFS with Unix extensions.
Unfortunately, the Linux client doesn't support RDMA yet,
so we don't get the benefit of that offload.
We do get some wonderful network effects and send file,
but we don't get the RDMA advantages.
And that's work in progress, and we'd love some patches,
and we have lots of good developers out here who want to submit patches.
We would very happily finish up the RDMA work.
Okay, so what's the status of SMB3 optional features? Security features look
pretty good. The downgrade attack stuff that was added in SMB2.1 is there. The SMB3.1 negotiate
contexts are there. We can mount with SMB3.1.1, which is the dialect Windows 10 ships, but
it's really not complete in the sense that there are some security features that aren't
finished for that. It only works in some use cases for SMB3.1.1. So we do recommend that you mount
SMB3 or SMB3.02.
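A minimal sketch of such a mount; the server name, share, and username here are placeholders:

    # Mount with the SMB3 dialect (vers=3.02 pins the 3.02 dialect instead)
    sudo mount -t cifs //server/share /mnt -o vers=3.0,username=testuser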
We don't do per-share encryption yet. That's something that I want to finish up this week if possible. There were also some good blog entries that Edgar on the Microsoft
support team did on some of these features, which we're going to leverage to finish that
up. Also there are some CIFS features that we have that have not been merged into SMB3;
when you mount with SMB3, they aren't enabled.
For example, krb5: being able to use the SPNEGO krb5 negotiation works for CIFS,
but for SMB3, it's not plumbed through.
And also...
You and I should have that finished this week.
Yes.
And that's Jim McDonough, by the way, the SUSE guy.
And I would love to finish that this week because these are not big changes.
But as all of you know, Linux is driven by a different model than everything else.
It's kind of the squeaky wheel gets the grease thing.
So if a vendor comes with something they need, we can review it, we do it.
But a lot of these things, like the krb5 stuff that Jim was mentioning, are not big issues to finish up.
But they are important for some workloads.
Now, claims-based ACLs, another good example.
Adding an ioctl to do the claims-based ACL set and get would be useful.
We have this for rich ACLs, the CIFS ACLs.
But most of the key security features are in.
On the data integrity side,
we don't have persistent handle support.
We do have durable handle support.
Recovery is pretty good.
We don't have all the clustering features, though.
So some of these things, like persistent handles,
there's been a prototype of the witness protocol
for the Samba user space client.
When that is done, it makes more sense to tie that together into a complete clustering story.
Performance actually looks pretty good.
We have two different mechanisms for fast server-side copy: copy chunk, as well as the new block ref-counting that ReFS supports. We have both of those in.
The duplicate extents thing is actually very interesting,
because the performance is just staggering on that to ReFS.
Very, very impressive. We don't have the T10 style of copy offload.
That could be added. We don't have multi-channel, although
it's close. We talked about RDMA.
There are some really neat little things that SMB gives you.
We can set compression on a file. Okay, you can do that
in some file systems on Linux, but it's still kind of neat to be able to do, you know, chattr
and set compression. You can set integrity, right? Windows has this concept
that you want to mark a file as needing extra integrity.
We've got an ioctl to set integrity across the wire.
You want to mark certain files as, you know, higher levels of RAID or whatever, go for it.
So it's kind of neat to see some of these rich sets of features and how to expose them in Linux.
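To make the compression example concrete, here is a rough sketch, assuming a CIFS/SMB3 mount at /mnt and a server-side file system that supports compression; the cifs client translates the flag into a set-compression request on the wire:

    # Set the compressed attribute on a remote file over SMB3
    chattr +c /mnt/somefile
    # Verify: the 'c' flag should appear
    lsattr /mnt/somefile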
Okay, so what about NFS v4.2? Now on this, I'm going to steal some stuff from Alex McDonald's pitch,
and also from Bruce Fields, the Red Hat server lead.
But Alex has a very good SNIA talk that you guys can Google and find
from a few months ago summarizing NFS v4.2.
So I do recommend: go to the SNIA site, find Alex's talk on NFS 4.2. But basically there
are three major layout types. File, block,
and object. The NFS client on Linux supports all of them.
The kernel server only has support for block as
of the 4.0 kernel.
Layout stats, available in 4.2.
Flex files is the kind of thing
that a lot of the guys in my company work on, Trond and Tom Haynes. It's the newest
layout type, a successor or follow-on to the files layout.
That was added in the 4.2 kernel, so this is in the last year.
Sorry, the 4.0 kernel.
Sparse files, support for sparse files was added in 3.18.
Space reservation in 3.19.
Labeled NFS was added, I think for the Red Hat guys, to allow SELinux on NFS.
Interestingly, those xattrs are really the only xattrs that you see supported, although there is an RFC that Marc Eshel
and some others at IBM Almaden
have proposed to add xattr support for NFS.
So you can go on the IETF site and see that.
But they do at least support the security label
as of the 3.11 kernel.
So there are other 4.2 features, though, that the Linux client does not support. IO_ADVISE, allowing
you to advertise your access patterns.
And copy offload. For copy offload, there are patches out there. There's been some good
discussion on this. Some of it's waiting on a true syscall interface. What we have
in SMB3 is, for the block ref counting, the
ReFS style, the cp command with --reflink can do it.
For the other style, the copy chunk, there's an ioctl for that.
For NFS, we obviously have patches for this, but they aren't merged yet.
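As a sketch of the reflink-style copy (paths are illustrative; both files must be on the same mount, and the server-side file system has to support block ref-counting, e.g. ReFS):

    # Server-side clone: no file data crosses the wire
    cp --reflink=always /mnt/big.img /mnt/big-copy.img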
Also application data holes are not in.
And Christoph Hellwig has proposed a new SCSI file layout.
And you can go read the RFC on that.
But there's a draft RFC for that, a new layout type that's been proposed.
As you can see, a good chunk of the 4.2 features are in.
And I do strongly recommend that nobody mount with NFS 4.0. At least mount with 4.1 if you're going to mount with NFS v4.
NFS v3 is fine, but otherwise 4.1 or later.
There's no real reason that I can see to ever mount with 4.0.
And obviously in Linux you can use 4.2.
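A minimal sketch of that recommendation (server and export names are placeholders):

    # Prefer 4.1 or later; avoid pinning 4.0
    sudo mount -t nfs -o vers=4.1 server:/export /mnt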
Okay, so what about these layout types?
File layout is obviously most common.
NetApp and others have supported it for a long time.
Now we have flex files. And if you want a really good presentation on this, from Connectathon,
Tom Haynes has one. So I have references at the end of the talk
if you guys want to go get more details on this. Object.
There are at least two implementations
of object out there that I'm aware of. And then block.
But in block, I'll be very curious
to see how Christoph's
new SCSI layout
is received.
With FlexFiles, we have some really
neat new features.
I'm kind of curious with the SCSI layout
versus the block layout, how that's going to play out.
Okay, so what is FlexFiles?
It's the successor to the files layout. It allows
you to integrate existing servers.
Now I think all of you guys have NFSv3 servers somewhere.
And the rollout of NFSv4.1 servers has been surprisingly slow.
So there's a ton of data out on NFSv3 servers, or even v4 servers.
There are a lot of OSs out there, a lot of Unixes, that didn't really support 4.1 very well. They didn't support pNFS at all. So how do you use all of these servers? Well,
one thing that's really nice about flex files is your data can sit on legacy servers. It's
kind of a nice feature. It'd be like if, in SMB3, we could have the data sitting on old CIFS
NAS boxes that we'd already paid for, but our metadata was stored on SMB3.
So it also allows you to...
For clustered file systems,
you know, Ceph and Gluster,
it's kind of interesting.
It allows you to export clustered data
a little bit better in some cases.
But I think there are some really interesting things
that we're going to understand better in another year or two,
how well this works,
but I think it's going to work really well.
One is, in some cases,
it's better to have the client doing the mirroring
than the server doing the mirroring.
This would allow you to stripe data differently,
and it allows the client...
the server gives the layout to the client
and tells the client which servers to write to.
That's an option.
Also, it lets you do SLAs, these service level agreements,
and set management policy.
Moving data more easily is one of the things,
a theme we're seeing a lot in storage now.
Making balancing and tiering decisions in your metadata server
and letting the client spray the data where you tell it
to. And then of course we have this concept of fencing where you can order the
client to basically give the layout back and then you move it
to somewhere where it's faster or better.
So it's a really interesting thing
separating metadata and data, allowing spraying I/O across a large number of servers from one client.
That's not a concept we have an exact match for in SMB.
Here's a picture of it from Alex's talk.
Once again, I do recommend you take a look at this to get more data.
It mentions a little bit about fencing. As you can see, the concept of a metadata server
and multiple data servers where you have basically the client
given a layout and then the client can report back to the server.
Some of the statistics and things like that.
So the server can make better decisions about where to move data.
So these layout stats and layout errors are very useful in allowing
you to very gracefully
move data where it belongs.
Anyway, the concept of
flex files I think is very, very good.
But there are other areas where SMB3 obviously is more terse and efficient and better.
And even with flex files, of course SMB3 is a much broader protocol than NFS.
But let's just look at some of the comparison points.
Current dialects, 4.2 versus 3.1.1.
Obviously NFS is more POSIX compatible.
Richard Sharpe and others have done some nice little prototype patches for Unix extensions
for SMB3, but advisory byte-range locking and unlink/delete behavior are more POSIX
compliant over NFS.
I mean, the vast majority of things are POSIX compliant, either on SMB3 or NFS, but there
are some glaring examples here where emulation only gets you so far. There is no equivalent of pNFS on
SMB3. Obviously, I don't want to
underestimate the difference of layered versus
non-layered. SMB3 talks directly to TCP.
NFS is encapsulated in RPC. This has
effects on what you can do. It makes it a little harder to tune NFS.
Also, there's no exact equivalent of labeled NFS,
the SELinux security labels.
We can do xattrs, but not the security type of xattrs.
We do the user-type xattrs in SMB.
And conversely, NFS doesn't do the user-type xattrs.
It does only the SELinux type.
Okay, so what about SMB3?
We have a global namespace.
Well, Chuck Lever and others at Oracle have made proposals for global namespace for NFS.
But what we have in SMB3.1.1, and actually all the way back into CIFS,
we have a nice global namespace, very usable, very commonly used as DFS.
Claims-based ACLs don't have an exact equivalent in NFS. Obviously the file replication, witness protocol, there's
no exact equivalent to these. There are other protocols you can use for some of these same
purposes. You can run rsync over NFS, right? But tightly coupled or better coupled to the
protocol are a large set of useful features.
All of the DCE/RPC management stuff, for example. I think it's fair to say, even with Tom Talpey here, that SMB3 RDMA
learned from NFS RDMA and did some useful things.
So we've improved, or SMB3 RDMA
has features that NFS RDMA doesn't have. And also
multi-channel is very useful.
Multi-channel allows you to move
your network I.O. on the fly.
It's not exactly...
pNFS, you can do some of that, but it's not
really the same thing.
Being able to change
the number of network adapters on the fly
and which ones you're using is a very powerful feature.
And I think you're going to find
that the multi-credit and
the I/O dispatching is a little cleaner in SMB3.
What about performance?
I did some tests on the plane.
I did some tests this morning even.
Actually, it kind of startled me.
I wanted to add this slide at the last minute,
and I expected to see a slight difference
in NFS and SMB3 performance.
I got a much bigger difference than I expected.
So, once again, this laptop, running Ubuntu and RHEL VMs, tried various combinations.
I tried the simplest example I could.
Let's just use the dd command.
Large block sizes.
Simulate a big file copy.
Nothing fancy. This is just
reads and writes. You know, open the file, read it and write it. Separate test cases.
dd input file from /dev/zero. In the other case, I'm outputting to /dev/null. So I'm taking
the disk mostly out of this, and I'm taking the network mostly out of it, because I'm running either loopback or running VMs on the same host. So I'm just trying to look at the protocol. And it was a much bigger difference than I expected.
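The tests were roughly this shape; a sketch, with the mount point as a placeholder and the sizes matching the 20 meg blocks and 400 meg file mentioned below:

    # Write test: stream zeros to the network mount
    dd if=/dev/zero of=/mnt/testfile bs=20M count=20
    # Read test: stream the same file back to /dev/null
    dd if=/mnt/testfile of=/dev/null bs=20M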
So for writes, and remember this laptop is running other things, but I have two mounts, mount and mount 1. One NFS, I tried V3, V4, V4.1, and tried SMB3.
The server is Samba 4.3.
Samba 4.3 has been out for a few weeks now.
Nice stable thing.
I've been running it for a while.
The server is either the NFS kernel server from 4.1,
or the NFS kernel server from 4.3 RC1,
so the most current kernel server,
versus user space Samba.
Interesting.
Look at this:
1.1 gigabytes per second versus 700 megabytes per second when writing.
Reading, the gap was smaller, but still, even reading, SMB3 was faster.
I used the defaults.
So these are defaults. So please try your own experiments, too, because, I mean, this was just quick and dirty.
I just wanted to use the defaults.
I used a default smb.conf from RHEL.
I used a default, I guess the server file system isn't actually going to matter much,
but it's going to be EXT4 in both of these.
The rsize is the default. So Linux CIFS
defaults to a one megabyte
rsize. NFS defaults
to 512K.
But even with 512K, it actually seems to
fall back to 256K.
So NFS rsize
and wsize are smaller by default.
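If you want to pin those sizes rather than take the defaults, a sketch (values are in bytes; server and export are placeholders):

    # Ask for 1 meg read and write sizes on an NFS mount
    sudo mount -t nfs -o vers=4.1,rsize=1048576,wsize=1048576 server:/export /mnt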
But there are other factors than just the rsize and wsize;
it's very interesting to look at.
So, something to play around with.
And I'm not saying that NFS is faster or slower in general,
because clearly there are workloads where NFS is much faster.
But this was easy to understand
when you're trying to look at a Wireshark trace
and talk to a whole bunch of people:
just large reads, large writes. dd. Now, if I took two VMs instead of loopback,
so here I'm writing 20 meg blocks, 400 meg file. And by the way, larger files, you can see the same kind of thing. SMB3 was more than three times faster.
Now, reads were closer.
But even reads, you got 150... or was it 50% faster for reads?
Now, why?
If you look at the Wireshark traces, there are two things that are really obvious.
The I/O size is larger for SMB. Interestingly though, increasing that to, let's say, 4 meg
to Samba, the reality is it didn't make much difference beyond about 2 meg, and even 1 meg was about the same.
It actually got a little bit worse when the rsize got larger for Samba.
For Windows, I don't think that would be the case.
So the default 1 meg rsize was actually pretty good for SMB3.
There are many, many fewer TCP requests, probably less fragmentation going on.
But what was also fascinating was,
Jeff Layton and others had done a fantastic job
improving parallelism in the NFS kernel server, but it doesn't help
this workload. So what you see is, because of the slot
count and various dispatch things, you see chunks of
NFS responses sent all at once, holding up
a whole bunch of stuff. So you sometimes see groups of 8 or 16
responses sent at one time.
So if you look at the Wireshark traces,
you'll see a more graceful dispatching of requests
in the SMB3 case.
Now realize that that's to the Linux kernel server.
There are plenty of you guys out there
with your own NAS servers
that might see slightly different access patterns.
But I think it was interesting to see sort of the client-generated stuff,
where you just see the fragmentation and these sorts of things.
And, you know, multi-credit helps.
Large I/O sizes help.
And obviously we should even be doing larger I/O than this.
You know, 8 megs should be the default.
Now what about other things?
Copy offload.
Copy offload is huge.
It's in SMB3. It's
not in the NFS client yet, although it's coming. You can see the patches out there. The lease
model allowing lease downgrades and upgrades I think is a big help for SMB3 and obviously
multi-channel. But many, many, many workloads are faster on NFS where they can take advantage
of pNFS and spraying I/O across a large number of servers.
So let's not forget that.
Flex files, for example.
Okay, so let's talk about
work in progress on SMB3. We have
a lot of testing focus.
We have this week, hopefully, not
just, as Jim was saying, the krb5 stuff, but
there are a couple of
punch-hole, fallocate bugs
that need to be fixed.
And obviously if we can hammer out, with Richard Sharpe and others, his prototype of the Unix
extensions, that would be great timing.
Okay, POSIX compatibility.
We have a big problem that we talked about at the Microsoft Plugfest in some detail.
SMB deprecation.
And with old SMB/CIFS deprecation,
we need to move to SMB3 POSIX extensions quickly,
because today we rely on,
heck, we'll just use the CIFS mount,
because we'll get POSIX compatibility then.
But full POSIX compatibility requires that
it's at least good enough,
and that SMB3 is better than CIFS,
or at least equal, for everything;
not a step backward.
POSIX API is complex.
There's a lot of stuff.
And we really need to be,
well, we need to be very aware
of how rich this API is.
And I put in bold here
some of the things that are very important.
The case-sensitive file mapping, the POSIX locking,
renaming and delete of open files.
We have emulation for some of these,
but it's very important. Being able to
return the POSIX info, the mode, and the UID/GID owner,
saves round trips,
because a lot of metadata-heavy operations
are easier on NFS, since you're not
having to go query the ACL to go pull out the UID or whatever.
Okay, the proposed POSIX capabilities that were in these patches are here.
Basically flags that allow you to negotiate these capabilities.
And just to give some details about POSIX compatibility: SMB3 can do hard links,
it can do the reserved characters and POSIX paths, it can do symlinks, mounting
with the Apple-style mfsymlinks, it can do FIFOs, pipes, character devices, and it can
partially do extended attributes. These are the Linux extended attributes,
so strictly speaking, not POSIX.
It can do POSIX stat and statfs,
but there are a couple of fields that are emulated,
and POSIX byte-range locks are emulated.
We can't do the SELinux labeled-NFS kind of stuff,
the security flavor of xattrs.
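For the user-type xattrs that do work, a quick sketch (the attribute name and mount point are illustrative):

    # Set and read back a user xattr across the wire on a CIFS/SMB3 mount
    setfattr -n user.comment -v "hello" /mnt/somefile
    getfattr -d /mnt/somefile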
We can't do the POSIX mode bits except by sticking them in an ACL.
That's very interesting, by the way, to look at Andreas's patches
over the last six months for rich ACLs in Linux
because you can obviously pull the mode bits out of a rich ACL.
But once again, it is cleaner to be able to return it in one operation.
We can do UID and GID ownership mapping today, but it's a mapping. We're not returning the actual Unix UID. Okay. So here are some steps that we're going to be working
on this week, for example. Here are some examples. Maybe we can look at this after the presentation.
You can see creating symlinks. You can see using
special... you can see some of the bizarre characters that are used. It's kind of fun
looking at what some of
these characters look like when you actually look at them locally
instead of over the network.
So like an asterisk: these are the Microsoft/Apple-style Unicode mappings for these reserved
characters. Basically, they're mapped above 0xF000. But if you look at these mapped
characters locally, they're a little odd, whereas over the network,
an Apple client or a Windows client will happily see them,
and, you know, happily store them.
But it's kind of odd to see what these actually look like
locally.
So these give you some information about
the actual mounts I was doing here, and about options.
And, you know, you can see here, again,
the stat command and the statfs command,
and the ls command,
and you can see the difference between a local and a network mount.
But we get very, very close.
The POSIX emulation isn't bad.
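A sketch of the emulated-symlink case (server name, share, and file names are placeholders):

    # Mount with Minshall-French symlink emulation, then create a symlink
    sudo mount -t cifs //server/share /mnt -o vers=3.0,mfsymlinks,username=testuser
    ln -s target.txt /mnt/link.txt
    ls -l /mnt/link.txt    # shows up as a symlink over the network mount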
So what other features are we looking at?
Obviously, ACL support, improved ACL support for SMB3.
One of the things we looked at is stream support, how to list streams.
We had patches for doing that over ioctls, for example,
or do we want to do it a different way?
Linux error handling is pretty good for CIFS,
but there are a lot of corner cases in error handling.
And so one of them that we've been looking at
is the persistent and durable handle cases,
recovering pending byte-range locks,
making sure we do that perfectly.
We talked about the xattr cases.
Okay, what about release by release?
So some of this is useful reference
maybe for later for you guys.
If you're curious, I've got a particular version
of Linux kernel, does it have a fix?
This summarizes sort of by release the changes.
But to give you an idea of some of the more recent ones,
the mfsymlinks support was added in 3.18.
This allows for emulated symlinks to a Windows server, for example, from a Linux client.
The POSIX reserved character stuff we talked about.
And then there were some bugs in the Mac client
with the way they handled the CIFS Unix extensions.
So when you mount with the default CIFS,
some of those were in 3.18.
In 3.19, there was much improved fallocate support.
Some of these features aren't even in NFS. Let's see.
4.2 added the 3.1.1 dialect negotiation, and added the duplicate extents support,
so you can do cp --reflink to ReFS,
and that's a 1,000-times-faster copy if the files are both on the same mount on the server.
It also added an ioctl for get and set integrity,
so if you want to make a particular file on that mount
get better data integrity,
that get/set integrity ioctl is there.
And in 4.3, we just made a very useful bug fix that I had to use yesterday.
It's kind of funny.
There's a time zone problem with the Mac server,
where if your time zone is off by two hours...
and for those of you who run VMs, it's very frequent that your time will drift more than two hours.
So in any case, there's a fix for working around the Mac OS mount problem when your
server and client's time clocks have drifted too far.
That was actually useful to me yesterday.
Wonderful things about Linux.
You benefit from other people's work.
And of course, the most important thing: we have four days, and not just Jim McDonough
but any of you guys here, please find fixes, help me, let's get these things in. We really
want to get these improvements and any bug fixes in. I know that there's a fairly large backlog of things that need to be closed out,
bug fixes included.
And we want to clean that up,
so we don't have any worries about any...
I don't know.
There are a lot of NAS servers.
One of the things that I...
After many years of doing this,
I think it's very hard to overstate
how many server bugs there are to work around,
or server features to work around.
And lots of client bugs, too.
I mean, we have lots of client bugs.
But it's fascinating.
There are probably 10 separate server implementations here,
at least.
You'd be amazed how many bugs there are to work around.
Then there's cifs-utils; there's some work going on right now on maybe adding some improved statistics gathering for these, and adding some extra new utilities.
This is an area that is very easy to make improvements in because, for example, just this last release we added an ioctl that allows you to query the information on a mount.
And we had to do some of this so we could detect: hey, is it ReFS?
Does it support block ref counting?
Does it support these features?
There's a lot of features that are returned about the file system and the device on the
server and some of these matter.
We now have kernel tools to do it but not the user space tools to make it pretty.
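Until then, the raw kernel interfaces can be read directly; for example, assuming the cifs module is loaded and at least one share is mounted:

    # Per-share request counters for mounted CIFS/SMB3 sessions
    cat /proc/fs/cifs/Stats
    # Session, dialect, and feature details for current mounts
    cat /proc/fs/cifs/DebugData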
Okay, some practical tips. Consider using the sfu and mfsymlinks mount options
if you're mounting with SMB3.
Use vers=2.1 or later.
Don't bother with SMB2.
Just go straight to 2.1 or later.
If the server, like these Macs that I have,
supports SMB3, mount with SMB3.
For higher performance,
especially to Windows, you might increase the rsize
and wsize beyond the default of 1 meg.
Obviously this won't
help you with CIFS, but with SMB2.1 and later,
try larger than 1 meg.
It didn't help in my test
to Samba, but I think it would to Windows.
And I don't know about to Mac. I should have experimented
with Mac.
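Putting those tips together in one sketch (server name, username, and the 4 meg rsize/wsize are just values to experiment with, mainly against Windows):

    sudo mount -t cifs //server/share /mnt \
        -o vers=3.0,mfsymlinks,sfu,rsize=4194304,wsize=4194304,username=testuser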
Obviously case sensitivity can cause problems.
One of the things that
drives me crazy is
if I want to do a kernel build
on CIFS,
it works fine to Samba, because we have case sensitivity.
The number one
personal pain point is I can't do
a build, because you have
files in the Linux kernel build that have the same name, different case.
So that's an example where it can be painful like doing a kernel build.
Okay, so SMB3 support is very solid.
Pretty well tested.
I'm not very worried.
It's missing some key features.
We've talked about those key features. The default will change to SMB3.
As a matter of fact, I was thinking about doing that in cifs-utils very soon. I'd love feedback
on that. We obviously this week are going to continue to run xfstests, continue to do testing
against various devices. I'd like to do more, especially against other NAS devices that I don't
have at home, that I don't have in the office.
We have a page out there with the xfstests status. In Linux, the file systems have moved to basically
one bucket of xfstests that includes everything.
There are hundreds of tests, and here are some
of the ones that cause problems. And many of these also have problems on NFS as well.
Usually because of timestamp issues, client and server, mtime stuff; that's really hard to do in a network file system, and really only makes sense in a local file system.
But there are some flock issues as well buried in there.
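A sketch of running that bucket against a mounted file system (paths are illustrative; the real setup, local.config and so on, is described in the xfstests README):

    # Build xfstests and run the quick group against the mount under test
    git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
    cd xfstests-dev && make
    ./check -g quick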
But the big picture I want to leave here, NFS 4.2 is actually very exciting. SMB 3.1.1 is very exciting.
These are very interesting protocols.
There are workloads, even Linux to Linux, where SMB 3.0 is better.
There are workloads probably even to SMB-centric NAS,
where NFS is a little bit better.
But generally, between the two of them,
there's so much breadth of workloads you can cover.
And the features that have been added, nobody implements all the features in one protocol, much less both.
But they're very interesting protocols.
I think you were just mentioning, right, that nobody really understands the implications of some of these cool features.
Async operations. Just async operations. The ability
to do cool things in your server while you're waiting for operations to be pending.
There is a lot of optimization still to be
done on both the server and client side in both protocols. But the big picture
is that these have real workloads.
It's not just casually storing PDF files from
your presentation at SNIA. These are things like Facebook. These are things like big NAS
boxes in the back end of cloud. I'm very excited as you see more of this integrated into private
cloud. But I do want to encourage you guys this week, do
more testing, more patches, more bugs. Let's make this much better, both for NFS as well
as for SMB3. I think we've got a couple minutes for questions, so I'd be happy to take some
questions.
That's a very good question. He asked, what's the interaction
between Microsoft and the Linux community?
Now, there are two parts of this, right? One is
the Samba side,
where we have both a server and a huge set of tools,
and Microsoft has been very active with the Samba team for many years
On the kernel side, though, they've contributed directly.
I mean, I got a patch.
There's a patch in 4.2 from Microsoft.
So Microsoft has, from time to time,
contributed patches.
They found some stuff doing network boot of Linux.
Kind of neat.
Found a bug.
Unfortunately, their server responded faster than...
The server responded before the request was processed.
Does that make sense?
So when you have a really fast server and it responds before the request was sent.
That was painful.
In any case, that was an example. Microsoft has contributed patches.
I think largely the interaction that I've had has been testing and
asking crazy questions to Tom Talpy and others about things I see.
On the other hand, clearly Microsoft is evolving,
and it's a great question to ask these guys
because Microsoft guys are here.
But in any case, I think it's been a very good relationship.
On the Samba side as well as on the Linux side,
you're seeing Microsoft testing things,
and we test with them regularly.
Many of us test two or three times a year with the Microsoft guys.
Go ahead.
At 5:30 this afternoon, the SMB3 Plugfest is open to everybody to come look.
You'll see a large number of people doing SMB3 implementations,
many of them on Linux; not all of them are Samba.
Some of them are available from third-party people for licensing, and a lot of people will talk to you down there.
There'll be food, everything, etc. 5:30 to 9:30.
So as SW noted, at 5:30 the Plugfest opens, and a very important thing he noted is that although in my talk
I focused obviously
on Samba server
and NFS kernel server
because they're common
and well understood
there are other servers as well
Ganesha, open source
user space server
and lots of closed source servers
even on Linux.
So we've spent a lot of time
talking about the open source ones.
In any case, I've enjoyed very much being able to interact with the Microsoft guys,
because seeing these two operating systems interact, one common thread I see is we both have some surprisingly similar problems to deal with.
Even with operating systems as diverse as Linux and Windows.
And as a client developer, this is just fun.
It's like sharing horror stories.
Okay, question back there.
Just an NFS-specific one.
You mentioned extended attributes.
Do you think we'll see extended attribute support in NFS?
Yes.
Okay.
So let me answer it three ways.
One is we already have it in some way because they do have this subset of the security ones.
And in another sense, there are named attributes in NFS already.
Linux client doesn't expose many.
In theory, you could expose some named attributes already.
The bigger picture is Marc Eshel
and another IBM colleague of his at IBM Almaden
have an RFC draft.
I really don't know how much I can say other than,
I don't know personally what the reaction in the NFS standards community is to that draft.
But that would add support for xattrs.
Personally, what I struggle with a little bit is that there are Linux-specific security attributes
that I'd like to put on the wire to Samba
at least that are a little tricky for me. So I have the reverse problem.
I can do user attributes but I'd like to be able to do others.
Okay, so Tom did you have a question?
I did. Did you mention Linux client support
for multi-channel and RDMA?
Do you have anything to add on those?
So his question was Linux support for multichannel and RDMA.
So RDMA support depends to some extent on multichannel.
Tom gave some fantastic suggestions a few months ago on how to stage that work.
So this is very important stuff to do for Linux.
Because right now, to get RDMA support in, which may be
useful, by the way, not just in multi-thousand dollar
adapters, but in some of the commodity workloads as well,
we need to finish multichannel.
And only the first couple steps of multichannel are done.
There were patches that I worked on back in June that got a little bit farther
but they got kind of dropped on the floor because of work stuff that I was doing
unrelated to the Linux client.
One of the things I would like to do if there's time this week is maybe make some progress
with that. But it's an area where we would
eagerly want to finish.
Finishing the multi-channel support would allow us
to better balance workload,
and allow the server to adjust much better
to changing network configurations.
But it also allows you to advertise RDMA.
And RDMA, there are a number of cases,
not just the high-end Mellanox cards,
where this is useful.
And I think all of you guys for the last three or four years
have probably seen Microsoft's spectacular demos
of what multichannel gives you.
So I would like to see that.
But no development progress really since June or July.
I don't know if, among all the previous ideas,
you specifically addressed
SMB3 encryption. Yes, I talked briefly about SMB3 encryption.
So the question was, did I address SMB3 encryption? So SMB3 encryption,
the
layering makes it easier than it sounds to do in the Linux client,
because the signing layer, and the way things are,
the way the protocol implementation for the
transport layer in cifs.ko works, makes it fairly easy to do encryption,
other than the technical obstacles of making sure you've got the
keys correct and all that stuff. We got
part way through that with Edgar
back in the summer,
but haven't worked on it since.
So one of the things I'd like to look at this week
to finish up if we can.
So SMB3 encryption is extremely useful
if you don't trust your network.
All of you guys love to hear NSA stories
and spy, whatever, spying stories, but it really
is helpful.
SMB encryption is really powerful because it allows you on a particular share, a particular
export to say this data matters, I want this encrypted, now this doesn't have to be encrypted.
That's kind of a neat feature.
Most other things force you to encrypt everything and obviously IPsec and other are very hard to set up.
So SMB3 encryption is relatively easy Windows to Windows to set up,
and I think even the Mac supports it, right?
So it is something that's fairly high on our priority list,
but it's not finished.
Okay, so I would welcome...
Let me make sure we're out of time here.
Yeah, we're about out of time.
I'll be outside and also be at the Plugfest and welcome additional questions. But I do want to encourage
you guys to...SW, you had a question?
I noticed a lot of people are taking photos of the screen, etc.
We welcome that, but the slides are posted. Steve may have forgotten
to update them to the as-presented version, but I'm sure he'll be doing that later.
This is a developer conference. We never actually come in and give you the slides
in time.
But we do expect you to post them afterwards.
They're available with the program, like, afterwards.
These are as of an hour ago.
So these have been updated.
The slides are up there, but about 20% have changed.
So I found some really fun stuff on the plane.
Thank you all.
Thanks for listening. If you have questions about the material
presented in this podcast, be sure and join our developers mailing list
by sending an email to
developers-subscribe@snia.org. Here you can ask questions and discuss this topic further with
your peers in the developer community. For additional information about the Storage
Developer Conference, visit storagedeveloper.org.