Storage Developer Conference - #10: Linux SMB3 and pNFS - Shaping the Future of Network File Systems

Episode Date: June 6, 2016

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast. You are listening to SDC Podcast Episode 10. Today we hear from Steve French, Principal System Engineer with the Samba team and Primary Data, as he presents Linux SMB3 and pNFS from the 2015 Storage Developer Conference. Okay, so today I'd like to talk about SMB3 and the newest versions of NFS,
Starting point is 00:00:55 NFS version 4.2, and some of the new layouts that have been proposed for pNFS. As I said, I've been involved in development of NAS stuff since early years at IBM, on both NFS and SMB. And I found it very interesting meeting Dr. Barry Feigenbaum, who in 1984 as a young PhD was inventing some of this stuff while others at Sun were inventing the NFS stuff. So why are we here, how many years later? 30 years later? This is very interesting stuff that we've evolved and improved so much over these years. I maintain
Starting point is 00:01:32 the Linux CIFS client. One of my coworkers, Trond, maintains the Linux NFS client. The CIFS client allows you to access SMB3 servers. Obviously you've seen this. This is a Mac, right? In making this presentation, I had to use my code to get stuff back and forth to the Mac. But also, obviously, I think all of you are aware of Windows and the enormous amount of very exciting things that Windows has done with SMB3.
Starting point is 00:02:07 But first we should talk a little bit about NAS. Many of you guys were here at the presentation right before this. The presentation right before this was about iSCSI and iSER, right? This is not NAS, right? So why do we care about NAS? And why do we care about these old 31-year-old protocols? So we have some questions we have to address first. And to answer those, we kind of have to ask: why do we care about file systems?
Starting point is 00:02:33 Well, it's over 50 years, right? Remember Multics? I don't think any of you guys were around then, right? The amount of unstructured data is staggering. Now, I have kids. I have three kids. Different ages. College down to 14 years old. The amount of unstructured data they create each day is frightening. And I think any of you guys, if you look at their laptops, will be sort of startled how much of that stuff is created.
Starting point is 00:03:00 So, this is just staggering amount. And NAS does very well. But all of this stuff, whether you're using NAS or not, depends at some level on file systems. You know, Facebook, Google, whatever. This stuff gets stored on file systems. When you go to conferences like Vault, or I enjoyed going to the Linux File System Summit again the last few years, it's really interesting to hear the kind of challenges in file systems that still 50-something years later we're struggling with.
Starting point is 00:03:28 But why do we care about NAS instead of SAN? I'm prejudiced. I've been doing this for a long time. But the ownership information, the better security, it's easier to set up. Most of us don't want to deal with LUNs. Most of us don't want to deal with configuring some of the bizarre stuff you have to for SAN. And then you get this application access info, these patterns that you can optimize. In NFS, the metadata patterns you can use to do really cool things, and some of the things that NetApp and EMC and companies like mine, Primary Data, are doing, trying to understand how to move data around.
Starting point is 00:04:05 This is a lot easier when you actually see the access patterns. And of course, nice features like encryption and compression. I'm sure the guys in the cloud want everything encrypted, right? This is a lot easier on NAS than it is on SAN. And policy setting is easier. I borrowed this slide here from, you can see the link below for more information, but it's just amazing how much unstructured file system related stuff, even in the internet, how much of that's getting sent in NAS backends. But then the obvious question is why Linux, right? I've got a Mac here, I work on Linux, we've got Windows people out there, we have people from multiple NAS vendors. But why do we talk about Linux? Well, 90,000 change sets
Starting point is 00:04:51 last year. 90,000. These are not one-line changes, right? 90,000 sets of changes. That's huge. Now, the development pace looks like it's going up slightly, not going down, for Linux. So it's fun when I go to the Linux file system summit and meet a lot of these developers, very bright guys, Ted Ts'o, Christoph Hellwig, others. It's a really talented community. Now if you just look at the file system, over 5,000 changes in the last year. This is just the file system.
Starting point is 00:05:27 Over 1,000 just in the last point release, 4.2. Over 1,000. So we have almost a million lines of file system code, which is very deceptive. Linux is incredibly terse. People send me patches every month. Remove this, clean up this, use this helper function, they add some helper function, shrink ten lines of code. It's very terse code, and yet still quite large. Over 1200 developers
Starting point is 00:06:00 per release. That's a big community. Lots of people. Lots of eyes. And good processes and tools. And of course, look at all those file systems. And here's a picture. I'm buried in there somewhere. But this is just the file system experts and MM experts. That's a really talented group. So, why Linux? Obviously we have a very talented group. Now, what are the most active things we do in Linux? Well, surprise, surprise.
Starting point is 00:06:34 NAS is really high up there. The majority of file system activity in Linux is driven by six file systems. There they are. And guess what? NFS server, NFS client, and CIFS are up there. Top six. And, you know, we have a Btrfs talk; I think it's actually going on right now. That is the most active file system and has been for a number of years. Many people
Starting point is 00:07:00 dis Btrfs because it's still evolving, but Btrfs is getting a lot of love, a lot of activity. ext4, as well known as it is, gets less than XFS. So it's interesting to see where the actual activity has been going on year after year: NFS server, NFS client, CIFS, Btrfs, XFS, and ext4. Interestingly, the NFS server activity, largely driven by Jeff Layton and others in my company, but also a couple of others, has skyrocketed over the last year and a half; it has increased a lot.
Starting point is 00:07:34 Bruce Fields, the Red Hat server lead for NFS, has a good presentation on that from the Vault conference this year. He talks about some of the NFS server improvements. So we've had more than 30 years of improvements. Impressive protocols. I think you all are running a variety of laptops and devices; NFS or SMB is available on most or all of them.
Starting point is 00:07:58 The NAS servers will support both. They're very well tested, they're understood reasonably well. And, of course, they're more common than all of the cluster and other network file systems combined. So really we're down to NFS v3, sorry, NFS v4.something, versus SMB3. So what do we have?
Starting point is 00:08:20 Our current kernel. This is, well, the VMs here are running 4.3-rc1. When I do my presentations and testing, I typically am running, this is two weeks old, right? A week old. A 4.3-rc1 kernel. Thirteen months ago we had Shuffling Zombie Juror, for those of you who like John Grisham novels. But that was 3.16. You know, it's kind of cool to look at performance. I get kind of excited looking at this stuff because it seems like even after many years of doing this, you can still be surprised.
Starting point is 00:08:54 You know, SMB3 is quite fast. Jeff Layton a few years ago, and Pavel Shilovsky two years ago and last year, did some really good stuff for increasing SMB3 performance. But in addition, there are some nice SMB3 features, multi-credit support and the larger I/O sizes, that really help Linux in certain very easy to understand, very common workloads, like large write performance. Here's a very simple example. So these are VMs on this box. I just ran these last night.
Starting point is 00:09:32 SMB3 averaged more than 25% faster than CIFS. I did a loopback to eliminate network effects, because obviously on a low-end laptop I'm not going to have a really fast network adapter. For those of you from Mellanox and Chelsio and others, you can try their very, very fast network adapters and replicate this stuff. But running loopback, and running on VMs on the same box, allowed me to eliminate network hardware deficiencies on a laptop. You can see that SMB3 was significantly faster, both reads and writes. And Unix extensions help on CIFS because they allow us to do larger I/Os,
Starting point is 00:10:13 but even with Unix extensions, SMB3 is faster than CIFS with Unix extensions. Unfortunately, the Linux client doesn't support RDMA yet, so we don't get the benefit of that offload. We do get some wonderful network effects and sendfile, but we don't get the RDMA advantages. That's work in progress, and we'd love some patches, and we have lots of good developers out here who want to submit patches. We would very happily finish up the RDMA work.
Starting point is 00:10:44 Okay, so what's the status of SMB3 optional features? Security features look pretty good. The downgrade-attack protection that was added in SMB2.1 is there. The SMB3.1.1 negotiate contexts are there. We can mount with SMB3.1.1, which is the dialect Windows 10 ships, but it's really not complete, in the sense that there are some security features that aren't finished for it; it only works in some use cases for SMB3.1.1. So we do recommend that you mount with SMB3 or SMB3.02. We don't do per-share encryption yet. That's something that I want to finish up this week if possible. There were also some good blog entries that Edgar on the Microsoft support team did on some of these features, which we're going to leverage to finish that up.
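To make the dialect recommendation concrete, here is a minimal sketch of the suggested mounts; the server name, share, mount point, and user are placeholders:

```
# Request the SMB3 dialect explicitly; 3.02 also works on recent kernels
mount -t cifs //server/share /mnt/smb3 -o vers=3.0,username=testuser
mount -t cifs //server/share /mnt/smb3 -o vers=3.02,username=testuser
# vers=3.1.1 will negotiate, but as noted it was not yet complete at the time of this talk
```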
Starting point is 00:11:23 Also there are some CIFS features that we have that have not been merged into the SMB3 code path; when you mount with SMB3, they aren't enabled. For example, Kerberos: being able to use the SPNEGO krb5 negotiation works for CIFS, but for SMB3 it's not tunneled through. And also... You and I should have that finished this week. Yes. And that's Jim McDonough, by the way, the SUSE guy.
Starting point is 00:11:48 And I would love to finish that this week, because these are not big changes. But as all of you know, Linux is driven by a different model than everything else. It's kind of this squeaky-wheel-gets-the-grease thing. So if a vendor comes with something they need, we can review it, we do it. But a lot of these things, the krb5 stuff that Jim was mentioning, are not big issues to finish up. But they are important for some workloads. Now, claims-based ACLs, another good example. Adding an ioctl to do the claims-based ACL set and get would be useful.
Starting point is 00:12:22 We have this for rich ACLs, the CIFS ACLs. But most of the key security features are in. On the data integrity side, we don't have persistent handle support. We do have durable handle support. Recovery is pretty good. We don't have all the clustering features, though. So some of these things, like persistent handles,
Starting point is 00:12:41 there's been a prototype of the witness protocol for a Samba user-space client. When that is done, it makes more sense to tie that together into a complete clustering story. Performance actually looks pretty good. We have two different mechanisms for fast server-side copy: the copy chunk as well as the new block ref-counting that ReFS supports. We have both of those in. The duplicate extents thing is actually very interesting, because the performance is just staggering on that to ReFS. Very, very impressive. We don't have the T10 style of copy offload.
Starting point is 00:13:20 That could be added. We don't have multi-channel, although it's close. We talked about RDMA. There are some really neat little things that SMB gives you. You can set integrity, right? Windows has this concept that you want to mark a file as needing extra integrity; you want to mark certain files as needing, you know, higher levels of RAID or whatever, go for it. We've got an ioctl to set integrity across the wire. And we can set compression on a file. Okay, you can do that in some file systems on Linux, but it's still kind of neat to be able to do, you know, chattr and set compression.
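As a sketch of how the compression piece looks from a shell on an SMB3 mount (the path is a placeholder, and the server and its filesystem have to support the compression attribute):

```
# chattr's 'c' flag is sent over the wire as the file's compression attribute
chattr +c /mnt/smb3/logs/build.log
lsattr /mnt/smb3/logs/build.log   # the 'c' flag shows up if the server accepted it
```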
Starting point is 00:13:48 So it's kind of neat to see some of these rich sets of features and how to expose them in Linux. Okay, so what about NFS v4.2? Now on this, I'm going to steal some stuff from Alex McDonald's pitch and also from Bruce Fields, the Red Hat server lead. Alex has a very good SNIA talk that you guys can Google and find from a few months ago summarizing NFS v4.2. So I do recommend you go to the SNIA site and find Alex's talk on NFS 4.2. But basically there are three major layout types: file, block,
Starting point is 00:14:32 and object. The NFS client on Linux supports all of them. The kernel server only has support for block, as of the 4.0 kernel. Layout stats are available in 4.2. Flex files, usable starting in, well, flex files is the kind of thing that a lot of the guys in my company work on, Trond and Tom Haynes. It's the newest layout type, the successor or follow-on to the files layout. That was added in the 4.2 kernel, so this is in the last year.
Starting point is 00:15:04 Sorry, the 4.0 kernel. Sparse files, support for sparse files was added in 3.18. Space reservation in 3.19. Labeled NFS was added, I think for the Red Hat guys, to allow SELinux on NFS. Interestingly, those xattrs are really the only xattrs that you see supported, although there is an RFC that Marc Eshel and some others at IBM Almaden have proposed to add xattr support for NFS. So you can go on the IETF site and see that.
Starting point is 00:15:37 But they do at least support the security labels, as of the 3.11 kernel. So there are other 4.2 features, though, that the Linux client does not support. IO_ADVISE, allowing you to notify the server of your access patterns. And copy offload. For copy offload, there are patches out there. There's been some good discussion on this. Some of it's waiting on a true syscall interface. What we have in SMB3 is, for the block ref counting, the ReFS style,
Starting point is 00:16:06 the cp --reflink command can do it. For the other style, the copy chunk, there's an ioctl for that. In NFS, we have patches obviously for this, but they aren't merged yet. Also, application data holes are not in. And Christoph Hellwig has proposed a new SCSI layout; you can go read the draft RFC for that new layout type. As you can see, a good chunk of the 4.2 features are in.
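A minimal sketch of the reflink-style copy on an SMB3 mount (paths are placeholders; both files must be on the same mount, and the server filesystem, ReFS for example, has to support block ref-counting):

```
# Server-side clone via duplicate extents: no file data crosses the wire
cp --reflink=always /mnt/share/big.vhdx /mnt/share/big-clone.vhdx
```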
Starting point is 00:16:39 And I do strongly recommend that nobody mount with NFS 4.0. At least mount with 4.1 if you're going to mount with v4. NFS v3 is fine, but use 4.1 or later. There's no real reason that I can see to ever mount with 4.0. And obviously in Linux you can use 4.2.
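In mount terms, that recommendation looks something like the following (server, export, and mount point are placeholders):

```
# Prefer NFSv4.1 or later; there is little reason to use vers=4.0
mount -t nfs -o vers=4.1 server:/export /mnt/nfs
mount -t nfs -o vers=4.2 server:/export /mnt/nfs   # on kernels that support 4.2
```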
Starting point is 00:17:08 Okay, so what about these layout types? The files layout is obviously most common. NetApp and others have supported it for a long time. Now we have flex files. And if you want a really good presentation on this, from Connectathon, Tom Haynes has one. I have references at the end of the talk if you guys want to go get more details. Object: there are at least two implementations of object out there that I'm aware of. And then block. But on block, I'll be very curious to see how Christoph's new SCSI layout is received.
Starting point is 00:17:32 With FlexFiles, we have some really neat new features. I'm kind of curious with the SCSI layout versus the block layout, how that's going to play out. Okay, so what is FlexFiles? It's the successor to the Files layout. It allows you to integrate real servers. Now I think all of you guys have NFSv3 servers somewhere.
Starting point is 00:17:49 And the rollout of NFSv4.1 servers has been surprisingly slow. So there's a ton of data out on NFSv3 servers, or even v4 servers. There are a lot of OSes out there, a lot of Unixes, that didn't really support 4.1 very well, didn't support pNFS at all. So how do you use all of these servers? Well, one thing that's really nice about flex files is your data can sit on legacy servers. It's kind of a nice feature. It'd be like if, in SMB3, we could have the data sitting on old CIFS NAS boxes that we'd already paid for, but our metadata stored on SMB3. So it also allows you to... For clustered file systems,
Starting point is 00:18:34 you know, Ceph and Gluster, it's kind of interesting. It allows you to export clustered data a little bit better in some cases. I think there are some really interesting things here; we're going to understand better in another year or two how well this works, but I think it's going to work really well.
Starting point is 00:18:50 One is, in some cases it's better to have the client doing the mirroring than the server doing the mirroring. This would allow you to stripe data differently: the server gives the layout to the client and tells the client which servers to write to. That's an option.
Starting point is 00:19:08 Also, it lets you do SLAs, these service level agreements, and set management policy. Moving data more easily is one of the things, a theme we're seeing a lot in storage now. Making balancing and tiering decisions in your metadata server and letting the client spray the data where you tell it to. And then of course we have this concept of fencing where you can order the client to basically give the layout back and then you move it
Starting point is 00:19:36 to somewhere where it's faster or better. So it's a really interesting thing, separating metadata and data, allowing spraying I/O across a large number of servers from one client. That's not a concept we have an exact match for in SMB. Here's a picture of it from Alex's talk. Once again, I do recommend you take a look at this to get more detail. It mentions a little bit about fencing. As you can see, there's the concept of a metadata server and multiple data servers, where you have basically the client
Starting point is 00:20:09 given a layout and then the client can report back to the server. Some of the statistics and things like that. So the server can make better decisions about where to move data. So these layout stats and layout errors are very useful in allowing you to very gracefully move data where it belongs. Anyway, the concept of flex files I think is very, very good.
Starting point is 00:20:43 But there are other areas where SMB3 obviously is more terse and efficient and better. And even with flex files, of course, SMB3 is a much broader protocol than NFS. But let's just look at some of the comparison points. Current dialects: 4.2 versus 3.1.1. Obviously NFS is more POSIX compatible. Richard Sharpe and others have done some nice little prototype patches for Unix extensions for SMB3, but advisory byte-range locking and unlink-of-open-file delete behavior are more POSIX compliant over NFS.
Starting point is 00:21:13 I mean, the vast majority of things are POSIX compliant, either on SMB3 or NFS, but there are some glaring examples here where emulation only gets you so far. There is no equivalent of pNFS on SMB3. Obviously, I don't want to underestimate the difference of layered versus non-layered: SMB3 talks directly to TCP, NFS is encapsulated in RPC. This has effects on what you can do. It makes it a little harder to tune NFS. Also, there's no exact equivalent of labeled NFS,
Starting point is 00:21:51 the SELinux security labels. We can do xattrs, but not the security type of xattrs. We do the user type of xattrs in SMB. And conversely, NFS doesn't do the user type; it does only the SELinux type. Okay, so what about SMB3? We have a global namespace. Well, Chuck Lever and others at Oracle have made proposals for a global namespace for NFS.
Starting point is 00:22:14 But what we have in SMB3.1.1, and actually all the way back in CIFS, is a nice global namespace, very usable, very commonly used: DFS. Claims-based ACLs don't have an exact equivalent in NFS. Obviously the file replication and witness protocols, there's no exact equivalent to these. There are other protocols you can use for some of these same purposes. You can run rsync over NFS, right? But tightly coupled, or better coupled, to the protocol is a large set of useful features. All of the DCE/RPC management stuff, for example. I think it's fair to say, even with Tom Talpey here, that SMB3 RDMA learned from NFS RDMA and did some useful things.
Starting point is 00:22:55 So we've improved, or SMB3 RDMA has, features that NFS RDMA doesn't have. And also multi-channel is very useful. Multi-channel allows you to move your network I/O on the fly. It's not exactly... pNFS, you can do some of that, but it's not really the same thing.
Starting point is 00:23:15 Being able to change the number of network adapters on the fly, and which ones you're using, is a very powerful feature. And I think you're going to find that the multi-credit support and the I/O dispatching are a little cleaner in SMB3. What about performance? I did some tests on the plane.
Starting point is 00:23:32 I did some tests this morning even. Actually, it kind of startled me. I wanted to add this slide at the last minute, and I expected to see a slight difference in NFS and SMB3 performance. I got a much bigger difference than I expected. So, once again, this laptop, running Ubuntu and RHEL VMs, tried various combinations. I tried the simplest example I could.
Starting point is 00:23:59 Let's just use the dd command. Large block sizes. Simulate a big file copy. Nothing fancy. This is just reads and writes. You know, open the file, read it and write it. Separate test cases: dd with the input file from /dev/zero, and in the other case, outputting to /dev/null. So I'm taking the disk mostly out of this, and I'm taking the network mostly out of it, because I'm running either loopback or VMs on the same host. So I'm just trying to look at the protocol. And it was a much bigger difference than I expected. So for writes, and remember this laptop is running other things, but I have two mounts. One NFS, where I tried v3, v4, v4.1, and one SMB3.
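A sketch of the kind of commands behind those numbers; the mount point, block size, and count are placeholders rather than the exact values from the run:

```
# Write test: stream zeros over the network mount (disk mostly out of the picture)
dd if=/dev/zero of=/mnt/ddtest bs=20M count=20
# Read test: stream the file back and discard the data
dd if=/mnt/ddtest of=/dev/null bs=20M
```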
Starting point is 00:24:49 The server is Samba 4.3. Samba 4.3 has been out for a few weeks now. Nice stable thing. I've been running it for a while. The NFS server is either the NFS kernel server from 4.1 or the NFS kernel server from 4.3-rc1,
Starting point is 00:25:10 so the most current kernel server, versus user-space Samba. Interesting: look at this, 1.1 gigabytes per second versus 700 megabytes per second when writing.
Starting point is 00:25:33 Reading, the gap was smaller, but still, even reading, SMB3 was faster. I used the defaults. So these are defaults. Please try your own experiments, too, because, I mean, this was just quick and dirty; I just wanted to use the defaults. I used a default smb.conf from RHEL. And I guess the server file system isn't actually going to matter much, but it's going to be ext4 in both of these.
Starting point is 00:26:08 The rsize is the default. So Linux CIFS defaults to a one-megabyte rsize. NFS defaults to 512K, but even with 512K it actually seems to fall back to 256K. So NFS rsize and wsize are smaller by default.
Starting point is 00:26:26 But there are other factors than just the rsize and wsize; it's very interesting to look at, so something to play around with. And I'm not saying that NFS is faster or slower in general, because clearly there are workloads where NFS is much faster. But this was easy to understand when you're trying to look at a Wireshark trace and talk to a whole bunch of people:
Starting point is 00:26:44 just large reads, large writes. dd. Now, if I took two VMs instead of loopback, so here I'm writing 20-meg blocks, a 400-meg file. And by the way, with larger files you can see the same kind of thing. SMB3 was more than three times faster. Now, reads were closer. But even reads were, what was it, 50% faster? Now, why? If you look at the Wireshark traces, there are two things that are really obvious. The I/O size is larger for SMB. Interestingly though, increasing that to, let's say, 4 meg to Samba, the reality is it didn't make much difference beyond about 2 meg, and even 1 meg was about the same.
Starting point is 00:27:51 It actually got a little bit worse when the rsize got larger for Samba. For Windows, I don't think that would be the case. So the default 1-meg rsize was actually pretty good for SMB3. There are many, many fewer TCP requests, probably less fragmentation going on. But what was also fascinating: Jeff Layton and others had done a fantastic job improving parallelism in the NFS kernel server, but it doesn't help this workload. So what you see is, because of the slot
Starting point is 00:28:15 count and various dispatch things, you see chunks of NFS responses sent all at once and holding up a whole bunch of stuff. So you see groups of sometimes 8 or 16 responses sent at one time. If you look at the Wireshark traces, you'll see a more graceful dispatching of requests in the SMB3 case. Now realize that that's to the Linux kernel server.
Starting point is 00:28:38 There are plenty of you guys out there with your own NAS servers that might see slightly different access patterns. But I think it was interesting to see the client-generated side, where you just see the fragmentation and these sorts of things. And, you know, multi-credit helps. Large I/O sizes help. And obviously we should be doing even larger I/O than this.
Starting point is 00:28:58 You know, 8 megs should be the default. Now what about other things? Copy offload. Copy offload is huge. It's in SMB3. It's not in the NFS client yet, although it's coming. You can see the patches out there. The lease model allowing lease downgrades and upgrades I think is a big help for SMB3 and obviously multi-channel. But many, many, many workloads are faster on NFS where they can take advantage
Starting point is 00:29:22 of pNFS and spraying I/O across a large number of servers. So let's not forget that. Flex files, for example. Okay, so let's talk about work in progress on SMB3. We have a lot of testing focus. We have this week, hopefully, not just, as Jim was saying, the krb5 stuff, but
Starting point is 00:29:40 there's a couple of punch-hole fallocate bugs that need to be fixed. And obviously, if we can hammer out with Richard Sharpe and others his prototype of the Unix extensions, that would be great timing. Okay, POSIX compatibility. We have a big problem that we talked about at the Microsoft Plugfest in some detail: SMB1 deprecation.
Starting point is 00:30:04 With old SMB1/CIFS deprecation, we need to move to SMB3 POSIX extensions quickly, because today we rely on: heck, we'll just use the CIFS mount, because we'll get POSIX compatibility then. But full POSIX compatibility requires that SMB3 is at least good enough, that SMB3 is better than CIFS,
Starting point is 00:30:25 or at least equal, for everything; not a step back. The POSIX API is complex. There's a lot of stuff. And we really need to be, well, we need to be very aware of how rich this API is. And I put in bold here
Starting point is 00:30:43 some of the things that are very important: the case-sensitive file mapping, the POSIX locking, renaming and delete of open files. We have emulation for some of these, but it's very important. Being able to return the POSIX info, the mode, and the UID/GID owner saves round trips,
Starting point is 00:30:59 because a lot of metadata-heavy operations are easier on NFS, where you're not having to go query the ACL to pull out the UID or whatever. Okay, the proposed POSIX capabilities that were in these patches are here: basically, flags that allow you to negotiate these capabilities. And just to give some details about POSIX compatibility: SMB3 can do hard links. It can do the reserved characters and POSIX paths. It can do symlinks, mounting with the Apple-style mfsymlinks. It can do FIFOs, pipes, character devices. It can
Starting point is 00:31:40 partially do extended attributes (these are the Linux extended attributes, so strictly speaking, not POSIX). It can do POSIX stat and statfs, but there are a couple of fields that are emulated, and POSIX byte-range locks are emulated. We can't do the SELinux labeled-NFS kind of stuff, the security flavor of xattrs. We can't do the POSIX mode bits, except by sticking them in an ACL.
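To ground the extended-attributes point, here is a sketch of what already works over an SMB3 mount; the path and attribute name are made up for illustration:

```
# user.* xattrs travel over the wire; security.* labels do not
setfattr -n user.origin -v "build-42" /mnt/smb3/somefile
getfattr -n user.origin /mnt/smb3/somefile
```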
Starting point is 00:32:11 It's very interesting, by the way, to look at Andreas's patches over the last six months for rich ACLs in Linux, because you can obviously pull the mode bits out of a rich ACL. But once again, it is cleaner to be able to return it in one operation. We can do UID and GID ownership mapping today, but it's a mapping; we're not returning the actual Unix UID. Okay. So here are some steps that we're going to be working on this week, for example. Here are some examples; maybe we can look at this after the presentation. You can see creating symlinks. You can see using some of the bizarre reserved characters. It's kind of fun
Starting point is 00:32:56 looking at what some of these characters look like when you actually look at them locally instead of over the network. So like an asterisk: these are the Microsoft and Apple style Unicode mappings for these reserved characters. Basically, they're remapped above 0xF000. But if you look at these mapped characters locally, they're a little odd. Over the network,
Starting point is 00:33:28 an Apple client or Windows Explorer will happily see them, and happily store them, but it's kind of odd to see what they actually look like locally. So these give you information about the actual mounts I was doing here and the mount options, and you can see here, again, the stat command and the statfs command,
Starting point is 00:33:50 and the ls command, and you can see the difference between a local and a network mount. But we get very, very close. The POSIX emulation isn't bad. So what other features are we looking at? Obviously ACL support, improved ACL support for SMB3. One of the things we looked at is stream support, how to list streams. We had patches for doing that over ioctls, for example,
Starting point is 00:34:15 or do we want to do it a different way? Linux error handling is pretty good for CIFS, but there are a lot of corner cases in error handling. And so one of the things we've been looking at is the persistent and durable handle cases, recovering pending byte-range locks, making sure we do that perfectly. We talked about the xattr cases.
Starting point is 00:34:36 Okay, what about release by release? Some of this is useful reference for later for you guys, if you're curious: I've got a particular version of the Linux kernel; does it have a fix? This summarizes the changes, sort of by release. But to give you an idea of some of the more recent ones: mfsymlinks support was added in 3.18.
Starting point is 00:34:55 This allows for emulated symlinks to a Windows server, for example, from a Linux client. The POSIX reserved character stuff we talked about. And then there were some workarounds for bugs in the Mac client with the way it handled the CIFS Unix extensions when you mount with the default CIFS; some of those were in 3.18. In 3.19, there was much improved fallocate support. Some of these features aren't even in NFS.
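A sketch of the fallocate operations in question, using the util-linux tool; the path and sizes are placeholders, and the server has to support the underlying operations:

```
# Space reservation, then punching a hole in the reserved file
fallocate -l 1G /mnt/smb3/prealloc.img
fallocate -p -o 0 -l 64M /mnt/smb3/prealloc.img
```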
Starting point is 00:35:21 Let's see. 4.2 added the 3.1.1 dialect negotiation, and added the duplicate extents support, so you can do cp --reflink to ReFS and get that 1,000-times-faster copy if the files are both on the same mount on the server. It also added an ioctl for get and set integrity, so if you want to make a particular file on that mount get better data integrity, that get and set integrity ioctl is there.
Starting point is 00:35:54 And in 4.3, we just made a very useful bug fix that I had to use yesterday. It's kind of funny. There's a time problem with the Mac server where your clock can be off by two hours, and for those of you who run VMs, it's very frequent that your time will drift more than two hours. So in any case, there's a fix for working around the Mac OS mount problem when your server's and client's clocks have drifted too far.
Starting point is 00:36:19 That was actually useful to me yesterday. One of the wonderful things about Linux: you benefit from other people's work. And of course, the most important thing: we have four days, and not just Jim McDonough but any of you guys here, please find fixes, help me, let's get these things in. We really want to get these improvements and any bug fixes in... I know that there's a fairly large backlog of things that need to be closed out, and bug fixes. And we want to clean that up
Starting point is 00:36:50 so we don't have any worries about any... I don't know. There are a lot of NAS servers. One of the things that I... After many years of doing this, I think it's very easy to underestimate how many server bugs there are to work around, or server features to work around.
Starting point is 00:37:09 And lots of client bugs, too. I mean, we have lots of client bugs. But it's fascinating. There are probably 10 separate server implementations here, at least. You'd be amazed how many bugs there are to work around. In cifs-utils, there's some work going on right now on maybe adding some improved statistics gathering for these, and adding some extra new utilities. This is an area where it is very easy to make improvements, because, for example, just this last release we added an ioctl that allows you to query the information on a mount.
Starting point is 00:37:46 And we had to do some of this so we could detect: hey, is it ReFS? Does it support block ref counting? Does it support these features? There are a lot of features that are returned about the file system and the device on the server, and some of these matter. We now have kernel tools to do it, but not the user-space tools to make it pretty. Okay, some practical tips. Consider using the sfu and mfsymlinks mount options if you're mounting with SMB3.
Starting point is 00:38:09 Use vers=2.1 or later. Don't bother with SMB2.0; just go straight to 2.1 or later. And if the server, like these Macs that I have, supports SMB3, you might as well mount with SMB3. For higher performance, especially to Windows, you might increase the rsize and wsize beyond the default of 1 meg.
Starting point is 00:38:31 Obviously this won't help you with CIFS, but with SMB2.1 and later, try larger than 1 meg. It didn't help in my test to Samba, but I think it would to Windows. And I don't know about to Mac; I should have experimented with Mac.
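Pulling those tips into one hedged example (server, share, mount point, and user are placeholders; whether rsize/wsize values above 1 meg are honored depends on the kernel and server):

```
mount -t cifs //server/share /mnt/smb3 \
    -o vers=3.0,sfu,mfsymlinks,username=testuser,rsize=4194304,wsize=4194304
# mfsymlinks makes this work even against servers without the old Unix extensions
ln -s sometarget /mnt/smb3/mylink
```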
Starting point is 00:38:48 One of the things that drives me crazy: if I want to do a kernel build on CIFS, it works fine to Samba, because we have case sensitivity. The number one personal pain point is that to a case-insensitive server I can't do a build, because you have
Starting point is 00:39:03 files in the Linux kernel tree that have the same name in different case. So that's an example where it can be painful, like doing a kernel build. Okay, so SMB3 support is very solid. Pretty well tested. I'm not very worried. It's missing some key features; we've talked about those key features. The default will change to SMB3. As a matter of fact, I was thinking about doing that in cifs-utils very soon. I'd love feedback
Starting point is 00:39:32 on that. We obviously this week are going to continue to run xfstests, continue to do testing against various devices. I'd like to do more, especially against other NAS devices that I don't have at home, that I don't have in the office. We have a page out there with the xfstests status. In Linux, the file systems have moved to basically one bucket of tests, xfstests, that includes everything. There are hundreds of tests, and here are some of the ones that cause problems. And many of these also have problems on NFS as well,
Starting point is 00:40:01 usually because of timestamp issues, client and server mtime stuff; that's really hard to do in a network file system and really only makes sense in a local file system. But there are some flock issues as well buried in there. But the big picture I want to leave here: NFS 4.2 is actually very exciting. SMB 3.1.1 is very exciting. These are very interesting protocols. There are workloads, even Linux to Linux, where SMB3 is better. There are workloads, probably even to SMB-centric NAS, where NFS is a little bit better. But generally, between the two of them,
Starting point is 00:40:42 there's so much breadth of workloads you can cover. And the features that have been added, nobody implements all the features in one protocol, much less both. But they're very interesting protocols. I think you were just mentioning, right, that nobody really understands the implications of some of these cool features. Async operations. Just async operations. The ability to do cool things in your server while you're waiting for operations to be pending. There is a lot of optimization still to be done on both the server and client side in both protocols. But the big picture
Starting point is 00:41:20 is that these have real workloads. It's not just casually storing PDF files from your presentation at SNIA. These are things like Facebook. These are things like big NAS boxes in the back end of the cloud. I'm very excited as you see more of this integrated into private cloud. But I do want to encourage you guys this week: do more testing, more patches, more bug fixes. Let's make this much better, both for NFS as well as for SMB3. I think we've got a couple of minutes for questions, so I'd be happy to take some questions.
Starting point is 00:42:11 That's a very good question. He asked: what's the interaction between Microsoft and the Linux community? Now, there's two parts of this, right? One is the Samba side, where we have both a server and a huge set of tools, and Microsoft has been very active with the Samba team for many years in using some of the tools to help debug our servers and vice versa.
Starting point is 00:42:35 On the kernel side, though, they've contributed directly. I mean, I got a patch. There's a patch in 4.2 from Microsoft. So Microsoft has, from time to time, contributed patches. They found some stuff doing network boot of Linux. Kind of neat. Found a bug.
Starting point is 00:42:55 Unfortunately, their server responded faster than... The server responded before the request was processed. Does that make sense? So when you have a really fast server, it can respond before the client has even finished sending the request. That was painful. In any case, that was an example. Microsoft has contributed patches. I think largely the interaction that I've had has been testing, and asking crazy questions to Tom Talpey and others about things I see.
Starting point is 00:43:24 On the other hand, clearly Microsoft is evolving, and it's a great question to ask these guys, because Microsoft guys are here. But in any case, I think it's been a very good relationship. On the Samba side as well as on the Linux side, you're seeing Microsoft testing things, and we test with them regularly.
Starting point is 00:43:45 Many of us test two or three times a year with Microsoft guys. Go ahead. At 5:30 this afternoon the SMB3 Plugfest is open to everybody to come look. You'll see a large number of people doing SMB3 implementations, many of them on Linux; not all of them are Samba. Some of them are available from third-party people for licensing, and there are a lot of people to talk to down there. There will be food, everything, etc. 5:30 to 9:30.
Starting point is 00:44:25 So as SW noted, at 5:30 the Plugfest is open, and a very important thing he noted is that although in my talk
Starting point is 00:44:37 and lots of closed source servers, even on Linux. So we've spent a lot of time talking about the open source ones. In any case, I've enjoyed very much being able to interact with the Microsoft guys, because seeing these two operating systems interact, one common thread I see is that we both have some surprisingly similar problems to deal with. Even with operating systems as diverse as Linux and Windows. And as a client developer, this is just fun.
Starting point is 00:45:11 It's like horror stories, sharing the horror. Okay, question back there. Just an NFS-specific one. You mentioned extended attributes. Do you think we'll see extended attribute support in NFS? Yes. Okay. So let me answer it three ways.
Starting point is 00:45:38 One is, we already have it in some ways, because we do have this subset of the security ones. And in another sense, there are named attributes in NFS already. The Linux client doesn't expose many; in theory, you could expose some named attributes already. The bigger picture is that Marc Eshel and another IBM colleague of his at IBM Almaden have an RFC draft. I really don't know how much I can say other than
Starting point is 00:46:07 I don't know personally what the reaction in the NFS standards community is to that draft. But that would add support for xattrs. Personally, what I struggle with a little bit is that there are Linux-specific security attributes that I'd like to put on the wire, to Samba at least, that are a little tricky for me. So I have the reverse problem: I can do user attributes, but I'd like to be able to do others. Okay, so Tom, did you have a question? I did. Did you mention Linux client support
Starting point is 00:46:43 for multi-channel and RDMA? Do you have anything to add on those? So his question was Linux support for multichannel and RDMA. So RDMA support depends to some extent on multichannel. Tom gave some fantastic suggestions a few months ago on how to stage that work. So this is very important stuff to do for Linux. Because right now, to get RDMA support in, which may be useful, by the way, not just in multi-thousand dollar
Starting point is 00:47:15 adapters, but in some of the commodity workloads as well, we need to finish multichannel. And only the first couple steps of multichannel are done. There were patches that I worked on back in June that got a little bit farther but they got kind of dropped on the floor because of work stuff that I was doing unrelated to the Linux client. One of the things I would like to do if there's time this week is maybe make some progress with that. But it's an area where we would
Starting point is 00:47:44 eagerly want to finish. Finishing the multi-channel support would allow us to better balance workload, and allow the server to adjust much better to changing network configurations. But it also allows you to advertise RDMA. And with RDMA, there are a number of cases, not just the high-end Mellanox cards,
Starting point is 00:48:07 where this is useful. And I think all of you guys, for the last three or four years, have probably seen Microsoft's spectacular demos of what multichannel gives you. So I would like to see that. But no development progress really since June or July. I don't know if you've previously specifically addressed
Starting point is 00:48:25 SMB3 encryption. Yes, I talked briefly about SMB3 encryption. So the question was, did I address SMB3 encryption? With SMB3 encryption, the layering makes it easier than it sounds to do in the Linux client, because the signing layer, and the way the protocol implementation for the transport layer in cifs.ko is structured, make it fairly easy to do encryption, other than the technical obstacles of making sure you've got the
Starting point is 00:49:00 keys correct and all that stuff. We got partway through that with Edgar back in the summer, but haven't worked on it since. So it's one of the things I'd like to look at this week, to finish up if we can. SMB3 encryption is extremely useful if you don't trust your network.
Starting point is 00:49:20 All of you guys love to hear NSA stories and spying stories, whatever, but it really is helpful. SMB encryption is really powerful because it allows you, on a particular share, a particular export, to say: this data matters, I want this encrypted; now, this doesn't have to be encrypted. That's kind of a neat feature. Most other things force you to encrypt everything, and obviously IPsec and others are very hard to set up.
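On the Samba server side, that per-share knob is a one-line config; a sketch, with the share name and path as placeholders:

```
[secure]
    path = /srv/secure
    smb encrypt = required   # encrypt traffic for this share only
```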
Starting point is 00:49:49 and I think even the Mac supports it, right? So it is something that's fairly high on our priority list, but it's not finished. Okay, so I would welcome... Let me make sure we're out of time here. Yeah, we're about out of time. I'll be outside and also be at the Plugfest and welcome additional questions. But I do want to encourage you guys to...SW, you had a question?
Starting point is 00:50:10 I noticed a lot of people are taking photos of the screen, etc. We welcome that, but the slides are posted. Steve may have forgotten to update them to the as-presented version, but I'm sure he'll be doing that later. This is a developer conference; we never actually come in and give you the slides in time, but we do expect you to post them afterwards. They're available with the program afterwards.
Starting point is 00:50:38 These are as of an hour ago. So these have been updated. The slides are up there, but about 20% have changed. I found some really fun stuff on the plane. Thank you all. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to
Starting point is 00:51:05 developers-subscribe@snia.org. Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storagedeveloper.org.
