Storage Developer Conference - #35: SMB3 and Linux - A Seamless File Sharing Protocol

Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast. You are listening to SDC Podcast Episode 35. Today we hear from Jeremy Allison, Samba Team Engineer at Google, as he presents SMB3 and Linux, a seamless file sharing protocol, from the 2016 Storage Developer Conference.

Starting point is 00:00:48 My name is Jeremy Allison, and I work for Google in their open source programs office. Having said that, my Google email address isn't there, and this is not a Google-sponsored talk. So please don't run away saying that these are Google's opinions on anything, because they really aren't. These are my opinions and ideas, at least those that I didn't steal from other people, of course. Right, if you have any questions or comments, or you think the presentation is just plain wrong, please stick your hand up and feel free to ask questions or make comments any time during the talk. I'm very happy to have a discussion with the audience, eventually a fist fight or something.

Starting point is 00:01:33 So the talk today is about SMB3 and Linux, creating seamless POSIX file serving. So what does that actually mean? And I've had some variant of this slide in my talks for about the past five or six years. I used to have a wonderful picture of a mushroom cloud blowing the world of storage up, because that's what people seem to think cloud storage is. But I've actually come to believe that it's more like this. Cloud storage, yes, it's the future; yes, it's coming; yes, it's going to change everything, just really slowly. It's more of an Ice Age, you know, a glacial pace, coming along grinding all the old storage vendors out of their cozy little homes in the rocks

Starting point is 00:02:25 around the valley that it's carving. So, you know, I first thought it was going to be this amazing flash of nuclear explosion: everything changes, and we're left with a flat new storage space. Now it seems to be a much slower thing. And the reason for that is, cloud storage is great, everybody loves it, but whenever you start talking to app vendors about cloud storage, the first thing you have to say is, yeah, it's great, just rewrite all your existing apps to use a cloud back-end. And this tends to limit some of its functionality. Mr. Blobby the Blobfish.

Starting point is 00:03:10 Cloud storage, of course, is a blob store. And for many existing apps... Oh, and here's a man to just prove me wrong. For many existing apps, blob stores don't map very well onto the semantics that most applications would like to use, which is the open, close, read, write, random access that most applications use.

Starting point is 00:03:31 Now, people are rewriting their apps to cope with blob stores, but as I said, this is really going to take a long time to get there. And in the meantime, we have some interesting file-type protocols that we can help adapt and help become very useful at large scale. So we still need file access protocols, and even if you have a set of services that are running in the cloud on different virtual machines, they still want to talk to some kind of, usually, file-based access store. If they haven't been rewritten,

Starting point is 00:04:10 if you want to move your existing app into the cloud, the easiest way to do that is just run it on a virtual machine and back-end it onto some cloud storage somewhere that's running a file access type protocol. So there are really only two options in that world. There's NFS v4, boo-hiss, and then there's SMB3, or 2+, as it used to be. I'll just call it SMB3 from now on. So why SMB3? Well, you know, it's the clients, stupid. NFS and SMB are supported by really the only clients that matter

Starting point is 00:04:48 if you're going to run large-scale apps, which are Windows, Mac OS, and Linux. But Windows is the odd OS out: it supports SMB3 way better than it supports NFS. I mean, you know, Windows does have NFS clients, but they're horrible. They've been horrible for a long time. They were horrible when I first started writing this stuff, which is why I started writing Samba in the first place. They're still horrible today. It's much nicer if you can make your servers speak SMB3, and then Windows clients are much happier, and everything seems to work better. So, I mean, most people in this room, I'm sure, already know this stuff, but

Starting point is 00:05:29 SMB3 and NFS v4, they're roughly comparable. SMB3 tends to have a few more features, and they copy from each other. I don't want to get into a shouting match of who invented what first, except for ACLs. I know Windows did the, SMB3 did the ACLs first.

Starting point is 00:05:48 So with NFS v4 you have delegations, which are like SMB2 leases; namespaces, like DFS. You've got long-lived handles that can recover if servers crash. NFS adopted the SMB ACL model, which, as I say, was a disaster. And you have parallel NFS, and you have an RDMA transport. And SMB3 has pretty much the same kinds of things. So you've got transparent failover, you've got clustering, you've got many active servers, which, of course, people have done with NFS too. You have the RDMA transport, SMB Direct. You have multiple links, multi-channel. Oh, NFS v4,

Starting point is 00:06:32 I didn't put it down, but it does have encryption as well. That's basically Kerberized NFS. SMB3 has transport level encryption. So SMB3 probably has more stuff in it as defined as the core part of the protocol without extensions. You've got snapshots, server-side copies. And, you know, this is kind of the advantage of having Microsoft, the 800-pound gorilla

Starting point is 00:06:58 in the file storage space, essentially revving the protocol whenever they need. And, you know, in many ways, this actually helps move the protocol on faster. Not that I'm saying standards-based stuff is wrong; it's just that, you know, the reality of it is Microsoft can add things quicker into SMB3 than a standards organization can with NFS. And as I said before, Windows clients really want SMB3. They, you know, don't work so well with NFS. So, NFS v4, where is it really great?

Starting point is 00:07:33 Linux to Linux environments. You get complete, well, as close as NFS grants you, it's very close to POSIX semantics. And the whole thing was designed around POSIX clients talking to POSIX servers. You have native advisory locking, didn't work in NFS v3, probably works in NFS v4. You can rename open files, you can unlink open files. Some of the things like extended attributes have been added, but Windows clients don't really interoperate very well

Starting point is 00:08:07 in that world. So if you're running Linux to Linux, probably NFS v4 is a good choice. But who just runs Linux to Linux? Everyone has to interoperate with other systems. So how are we going to make SMB3 compete in that space? Well, the answer is Unix extensions. Add some extensions into the SMB3 protocol that make it work better between POSIX clients and POSIX servers. And if we get that right, then you have the universal solvent. You have a file serving protocol that can serve Windows clients really well, that can serve POSIX clients, Mac and Linux clients, really well, and everybody's happy. And the great thing is, unlike SMB1, SMB3 is actually really close. It's much closer than the old SMB1 CIFS semantics were. So we add POSIX semantics to the Windows protocol. So, how do we do this? Hey, of course you use Samba. Why wouldn't you?

Starting point is 00:09:10 We have a history of making this kind of interaction work very badly, which I'm going to talk about: SMB1 Unix extensions. So if you remember the old SCO, before they turned into new insane SCO, they created what they called POSIX info levels, which they added to SMB1 in conjunction with HP. I think HP also had an SMB client

Starting point is 00:09:39 in HP-UX a long time ago. And so they added these info levels into SMB1 to make it work better, to get the POSIX style semantics that they required. And so we ended up adopting those. And, you know, in the best traditions of, hey, it's our protocol, we can do what we like, we just added a bunch of, hey, well, Microsoft do the same. We added a bunch of extensions. So we had POSIX path names, we were the pioneers for transport level encryption, we added POSIX ACLs in,

Starting point is 00:10:12 added symlinks and the behavior of the protocol became different when you'd negotiated Unix extensions. So you could rename an open file, you could delete an open file, it would disappear from the namespace. File locking was advisory, not mandatory. A bunch of interesting things that we did. So how did it work? Badly, and with horrible mutants, if anyone remembers that wonderful ad there. So how did it work? The client would basically create your standard SMB1 connection, just like talking to a Windows server.

Starting point is 00:10:49 And then the server would actually, in the negotiate protocol, would set a magic bit, which we had reserved from Microsoft, that said, I do Unix extensions. And then the horrors would start. So, because there was no way to basically say, well, we only want POSIX extensions on some things, not others. So, I added the setfsinfo request, which has nothing to do with the file system and I kind of justified it to myself as it's like, well, it's kind of like asking the file system to do something different, so FSInfo is as good a place as any.

Starting point is 00:11:29 I remember Tridge beating me about the head with a baseball bat for doing this. So you would send a list of bits that you would like, and then the server, you would only send that, of course, if you've got Unix capabilities. And if you sent that arbitrarily, the server would just reply an unknown info level to that request. And then in the setfsinfo reply, the server would reply, here are the bits that I actually do support. And of course, you know, theoretically, you could have some

Starting point is 00:11:56 nightmare mutant combination where the server would say, sure, I do POSIX locking, but not POSIX path names. And I will do POSIX rename but not POSIX delete, all sorts of horrors, right? None of which the client ever checked ever again, of course. And then in order to get a POSIX open, there was a special trans2 call, again, which I justified to myself because there was already a trans2 open call in SMB1, but then again there are like seven different flavors of open call in SMB1, so hey, what's one more? So we added a Unix-specific one and you would get back this Unix-specific

Starting point is 00:12:37 file handle. And it worked, but it was really ugly. So, you know, in the best tradition of the Unix-Haters Handbook, this thing was truly disgusting. It was an utter hack job. So, I mean, you know, protocol abuse isn't a crime, but if it had been, we would all have been in jail for this.
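The bit exchange described above can be sketched roughly as follows. This is a minimal illustration, not the real wire format: the flag names and values, the `Connection` class, and the `set_fs_info` method are all hypothetical. It shows how a server that supports only some bits produces a "mutant" combination, and how the agreed bits end up as state on the whole connection.

```python
from enum import IntFlag


class UnixCap(IntFlag):
    """Hypothetical capability bits; names and values are illustrative only."""
    POSIX_LOCKS = 0x01
    POSIX_PATHNAMES = 0x02
    POSIX_RENAME = 0x04
    POSIX_DELETE = 0x08


def negotiate(client_wants: UnixCap, server_supports: UnixCap) -> UnixCap:
    """Return the bits both sides agree on (a plain intersection)."""
    return client_wants & server_supports


class Connection:
    """Models the SMB1-style design: one capability set for the whole connection."""

    def __init__(self) -> None:
        self.caps = UnixCap(0)

    def set_fs_info(self, client_wants: UnixCap, server_supports: UnixCap) -> UnixCap:
        # The result is stored on the connection, not on any one file handle.
        self.caps = negotiate(client_wants, server_supports)
        return self.caps

    def delete_open_file_allowed(self) -> bool:
        # Every later request consults connection-wide state,
        # whichever open call produced the handle.
        return bool(self.caps & UnixCap.POSIX_DELETE)
```

Because the result lives on the connection, a handle opened with an ordinary Windows open still sees the changed behaviour, which is the global-state problem the talk goes on to describe.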

Starting point is 00:13:00 Using setfs... Yeah, that was the quote: "Using setfsinfo to set global state on the protocol connection makes me want to vomit." Yes. It was pretty ugly. And then of course Apple did the same. Following in wonderful footsteps there. By adding, that was Thursby I believe

Starting point is 00:13:19 who were doing the DAVE client for the Macs, the old SMB client before Apple. Back when Apple was only doing Netatalk. AFP, sorry, AFP. Netatalk was the implementation. So Apple added some Mac-specific info levels for get and set. But the real ugliness, the real sin, is the global state. So once you had negotiated these extensions using setfsinfo, existing operations change behavior globally. So even if you open the file with the Windows file open request, you would still get the state change on the server. This is just nasty. It's not really what you want. What else did we get wrong?

Starting point is 00:14:09 Security. Well, security is hard. Symlinks and security is a disaster zone. There's a reason, I think, that Windows only allows administrator or above to set symlinks. So this took us a great deal of time to get right. In fact, we had a CVE against this, I think

Starting point is 00:14:32 it was last year, so we're still trying to get this right. Windows clients really want to follow symlinks on the server. Windows clients sort of natively don't really care about symlinks. They don't want to see them unless they're... the reparse points were added for that,

Starting point is 00:14:48 but they're not as widely used. Symlinks are everywhere on POSIX file systems. But Unix clients really must not follow symlinks. So Windows likes to follow symlinks. Unix clients, you really shouldn't follow symlinks. So here you have disasters like the early days of Samba, with people saying, well, it would be really convenient to put a symlink on the file system and I'll just point it outside

Starting point is 00:15:10 the share, you know, leading to people... which was fine when it was only the administrator who could create them, because there was no way for clients to do it. But as soon as we added Unix extensions that could create symlinks, you could have the situation where

Starting point is 00:15:25 without proper access checks somebody could add a symlink onto a remote file share that pointed to the server's local /etc/shadow or /etc/passwd, and a client could then follow that and pull things down. We had some disasters in those areas. Other things that were horrible: transport level security was poorly designed. I bear complete blame for that. I'm not a crypto guy. I've learned my lesson from this: don't try doing crypto code, because you'll only mess it up. It works for what it is. The biggest thing I did get right on that was to just leverage GSSAPI sign and seal for the sealing. So at least I chose the standard mechanism for sealing.
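The symlink escape described above (a link inside the share pointing at a file outside it) comes down to a containment check on the resolved path. The following is only a sketch of that idea using the Python standard library; the function name `resolves_inside_share` is made up for illustration, and it is not how Samba implements the check. A real server also has to deal with races between the check and the open, which this sketch ignores.

```python
import os


def resolves_inside_share(share_root: str, relative_path: str) -> bool:
    """True if relative_path, after following symlinks, stays under share_root."""
    root = os.path.realpath(share_root)
    # realpath() resolves every symlink component and any ".." segments.
    target = os.path.realpath(os.path.join(root, relative_path))
    # If the deepest common directory is the share root, the target did not escape.
    return os.path.commonpath([root, target]) == root
```

With a share at `/srv/share` containing a link `escape -> /etc/shadow`, `resolves_inside_share("/srv/share", "escape")` returns `False`, so the server can refuse to follow it on behalf of a client.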

Starting point is 00:16:14 The problem was I was too ambitious due to being egged on by Steve, who I don't think is in the room, Steve French, basically said, oh, wouldn't it be nice if you could have multiple encryption contexts going at the same time? And so I sort of designed this in and then got it wrong. So it was never used that way. Differences in extended attributes were ignored in the Unix protocol. We kind of went, oh, thank you, Steve.

Starting point is 00:16:43 I was just... your ears must have been burning. So extended attributes: we ignored the differences. We kind of went, well, nobody uses extended attributes much on POSIX anyway, so the Windows ones are probably good enough, and they're not really. POSIX extended attributes are case-sensitive, Windows ones are not. The Windows ones use the OEM code page set.

Starting point is 00:17:09 Unix ones really want to use UTF-8, etc. And no one else but Samba implemented them, of course. That wasn't helpful. So, if we're going to do this again and do it right, we have a clean slate to work on because we haven't done it yet. And this is hopefully, we'll get the design, I won't say completely right, but hopefully a little better than we did last time. So the great thing about SMB, it's handle based.

Starting point is 00:17:39 The only operation you can use to turn a path name into a handle is create. So once you've got that, you have a convenient abstraction to handle all of the properties that you need to change for POSIX semantics hung off one thing, the handle. And handles are used to do deletes, renames, locking, EAs, everything. All of the places where POSIX actually differs from Windows, you have to do with handle-based operations. So it's really convenient. So obviously the thing is,

Starting point is 00:18:13 well, if you could hang the POSIX semantics off a file handle. So another nice thing about SMB3 is it's extensible. It has something called create contexts, for people who know the protocol. You can essentially hang arbitrary blobs of data off a create request and the server will ignore them if it doesn't understand them. So you can, at that point, extend the protocol to add things, whatever those things may be, into the operation that converts the path name into a handle.
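The create-context idea described above can be modelled in a few lines. This is a sketch under stated assumptions, not the SMB3 wire format: the `CreateRequest` and `Handle` classes, the `server_create` function, and the `POSIX_CONTEXT` tag are all hypothetical (the talk only says a GUID would probably be used as the name). It shows the two properties the talk relies on: unknown contexts are skipped rather than rejected, and a recognised context changes only the handle being created.

```python
from dataclasses import dataclass, field


@dataclass
class CreateRequest:
    path: str
    # Each create context is a (name, opaque payload) pair attached to the request.
    contexts: list = field(default_factory=list)


@dataclass
class Handle:
    path: str
    posix: bool = False


# Hypothetical tag, for illustration only.
POSIX_CONTEXT = b"hypothetical-posix-context"


def server_create(req: CreateRequest, known=frozenset({POSIX_CONTEXT})) -> Handle:
    """Turn a path into a handle, applying only the contexts this server knows."""
    handle = Handle(req.path)
    for name, _payload in req.contexts:
        if name not in known:
            continue  # unknown contexts are skipped, not treated as errors
        if name == POSIX_CONTEXT:
            handle.posix = True  # state hangs off this handle, not the connection
    return handle
```

Passing `known=frozenset()` models a server that has never heard of the extension: the same request still succeeds and simply returns an ordinary handle.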

Starting point is 00:18:53 Perfect, exactly what you need. So this is how the snapshot stuff was done, adding the time warp, snapshot opens, create app instance ID. I can't even remember what that is now, but it's weird. So, I mean, because they're all sort of like four-character ones. We were thinking about POSX, but to be honest, we should probably just create a GUID and use that. Just four character? I'm sorry?

Starting point is 00:19:27 So the question is, are they really just four-character? I think there are two kinds. There are four-character ones. There's like a 32-bit. But the other way of doing it is you actually specify a, I think it's a 128-bit

Starting point is 00:19:41 GUID. So there are actually two flavors of this. I think the first set were all four characters, but everything we've done recently is one. Yeah, so from David, the comment was the first ones were four-character. I think they were four-character limited, right? Because there wasn't

Starting point is 00:19:57 a length before those. It was just four characters. And the new ones have a length, so yeah, well, I'm assuming that we'll just use a GUID for that. It's probably the easiest kind of thing to do. So we will have a POSIX create GUID. So how do we negotiate this? As I say, SMB1 Unix extensions use this 32-bit user capability field in the initial response. And then I think we had a 64-bit exchange of what are the capabilities

Starting point is 00:20:26 you support and what are the capabilities the server will give you, which is horrible. So it did require coordination with Microsoft. We could do this again for SMB3, but I really don't want to. The more I think about this, the more I think that's a terrible idea. So, again, another place we could do it is in the... Oh, yes, this is in the negotiate, the negprot. You can actually add now arbitrary capabilities. It's a little bit like create contexts. It's not a GUID.

Starting point is 00:21:02 It's only a 16-bit field here. So, again, if we were to use that as a negotiate, I would like to do POSIX capabilities. Again, we'd have to request a number out of Microsoft. So I don't think we need to do that. I don't think we need to have a protocol-level modification, at least at the initial negprot or any capabilities. We should just use create context. I think this is far easier.

Starting point is 00:21:28 So let's look at what Apple did. Sad Mac, I'm afraid. It's kind of horrible. It has an AAPL create context, fair enough, and Ralph implemented this in Samba with vfs_fruit. But it's horrible. Horrible, horrible, horrible. It repeats all of the ugliness

Starting point is 00:21:51 and the disgusting design of the SMB1. It was just kind of like, it mutated into an SMB3 change, which is just hideous. Info levels, all of a sudden, when you negotiate this, they change their meaning. They come back and they do different things.

Starting point is 00:22:08 You have to do a negotiation step, which is done by doing a new Apple create context on a null name once you've opened the connection, which is just nasty. So it's basically back to the old global server state turned on by the client, which I really, really want to get away from. So I don't want to do what Apple did. Even though we do what Apple does, as it were, because we have to, I would rather that not be the method used for SMB3 Unix extensions. So here's what I would like it to look like. No protocol negotiation. You just open your regular SMB3 connection, and then if you want POSIX you start issuing the new create context in your create.

Starting point is 00:23:00 We kicked around for a while the idea of changing the path name parsing in the create call to allow raw UTF-8 going into there, and eventually decided not to do that. I think it's a bad idea. The problem is UCS-2 isn't quite UTF-8, but it's kind of close enough. So rather than saying, oh, if you have this special create context you need to parse the file name differently, nah, just screw that, let's do the standard UCS-2 path name parsing. No alternate data streams. You can't get alternate data streams from POSIX. If you want to do that, open a Windows handle. Go away.

Starting point is 00:23:53 No negotiation on the subtleties of do I do locking, do I do this, do I do that. It's an all or nothing thing. I want to get away from this negotiation idea. You open the POSIX handle. What you get back is a POSIX handle. And at that point, you will get the server's best effort to do all the things that POSIX does, which is locking, advisory locking, rename open files, delete open files, etc.

Starting point is 00:24:19 And the handle gets flagged as a POSIX or Unix handle internally, and so all operations on that handle become POSIX. Now, there is a great devil in the details here, which I will come to later on. If I don't come to it, remind me and I'll come back to this slide. So it's not quite as simple as that. So what are the POSIX type semantics that you need in a handle? Reads and writes ignore POSIX locks. So, I'm coming into this a little bit.

Starting point is 00:24:52 The real problems become what happens when you have multiple handles open on the same object at the same time. So, reads and writes need to ignore POSIX locks, but not Windows ones. Lock requests become advisory, not mandatory. How do they conflict with Windows locks? You can do unlinks and renames on open handles. And when you do a directory listing, you see a POSIX namespace. Should query directory change info level returns? I don't think so.
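One way to picture the read/write rule stated above (I/O ignores POSIX locks but still honours Windows ones) is the check below. It is a deliberately small sketch with hypothetical names (`ByteLock`, `io_allowed`): every lock is treated as exclusive, and the open question raised in the talk, how POSIX lock requests should conflict with Windows locks, is not modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteLock:
    owner: int    # id of the handle holding the lock
    start: int    # first byte covered
    length: int   # number of bytes covered
    posix: bool   # True if taken through a POSIX handle (advisory)


def overlaps(lock: ByteLock, start: int, length: int) -> bool:
    """True if [start, start+length) intersects the locked range."""
    return lock.start < start + length and start < lock.start + lock.length


def io_allowed(handle_id: int, start: int, length: int, locks) -> bool:
    """Reads and writes are refused only by overlapping Windows locks of other handles."""
    for lock in locks:
        if lock.posix:
            continue  # advisory lock: never blocks I/O
        if lock.owner != handle_id and overlaps(lock, start, length):
            return False  # mandatory Windows lock held by someone else
    return True
```

The point of keeping the `posix` flag on each lock is that the decision depends on how the lock was taken, which is exactly why multiple handles on the same file are where the hard cases live.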

Starting point is 00:25:24 Maybe, maybe. The problem is that there are some POSIX attributes, a little bit like the mode bits that you would quite want to have POSIX device. I'm trying to remember the name of it. When you have a POSIX device object. Maybe we should change info level returns. Not 100% sure. Get and set EAs. We should use the Unix namespace, not the Windows namespace, which means that EA names become case-sensitive.

Starting point is 00:26:14 And to be honest, I'm thinking that what we should do... Oh, question there, Steve. I didn't remember the user dot. Yes, that's what I was coming to. The other thing is, I think we should expose the full possible POSIX EA namespace, which at least on Linux means you have system. or user. So in other words, we should no longer implicitly imply user. We should expect the client to send the fully qualified namespace in those EA requests. I have an idea about the mode bits that we'll get to.
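The two naming rules for extended attributes described above can be contrasted in a short sketch. The function `find_ea` and the dictionary standing in for on-disk attributes are hypothetical; the rules themselves follow the talk: on a POSIX handle the client sends the fully qualified, case-sensitive name, while on a Windows handle the `user.` prefix is implied and matching is case-insensitive.

```python
def find_ea(stored: dict, requested: str, posix_handle: bool):
    """Look up an extended attribute under the two naming rules."""
    if posix_handle:
        # POSIX handle: exact, case-sensitive, fully qualified name
        # such as "user.x" or "system.x".
        return stored.get(requested)
    # Windows handle: the "user." prefix is implied and case is ignored.
    wanted = ("user." + requested).lower()
    for name, value in stored.items():
        if name.lower() == wanted:
            return value
    return None
```

Under these rules a Windows handle can never reach a `system.` attribute, while a POSIX handle can, which is the practical effect of exposing the full namespace.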

Starting point is 00:26:49 So, symlinks are still horrible. How do we create them? There are symlink operations within SMB3 for Windows. We can probably just leverage those. Problem is, what are the extended attributes and ACL operations that are permitted on a symlink? Now, this is where POSIX falls down. Solaris, for instance, completely disallows

Starting point is 00:27:19 extended attributes and ACLs on a symlink. Linux does not. FreeBSD, I think, will do extended attributes. I'm not sure about ACLs. So this is where we trip over our own feet. POSIX isn't a known, well-defined thing when you get down to this level of detail. I am very tempted to just basically say, if it's a symlink, you are not having extended attributes or ACLs on it over SMB3. The problem with that is I then run into the SE Linux people on Linux who will start

Starting point is 00:27:57 whining that this breaks all their code. Yes, Steve. Since Windows has these reparse points, the other relevant question is, what happens if you try to set an EA today for a reparse point if it is one of the recently published? So Steve's comment is, Windows has reparse points. What happens when you try to set EAs or ACLs

Starting point is 00:28:22 on a reparse point on a Windows file system? The answer to that is, you're going to write me the SMB torture test. And we'll find out. Yes, so we don't know. So this... On a reparse point. Oh, okay, so Uri mentions they've solved the problem in the same way that I want to do, which is to say, yeah, no EAs or ACLs on a reparse point.

Starting point is 00:28:49 I would love to do that and tell the SE Linux people to take a flying jump. I don't know whether I can get away with that. Maybe. Because they're wrong and horrible. But, so the current POSIX info levels are these two bytes, hex 200 to 2FF. I think we only use up to 0B of them. But right now, there's only a one-byte info level space for SMB3.

Starting point is 00:29:20 We could define info levels only on POSIX handles because you have to have a handle to do an info level. So if you were doing a query directory, you would open a directory handle as POSIX and then we did the find first, find next equivalents that would return POSIX. I'd rather not do that. If I can get away, and I'm still not 100% convinced,

Starting point is 00:29:42 but if I can get away without defining new info levels, and I think we're close to that, I would like to do so. Let's see. Windows lock ranges are unsigned, POSIX signed; there's lots of nasty little details. And the thing is, all these things need to be

Starting point is 00:30:02 very well specified because if you don't specify them, it becomes essentially like the Apple thing of the Samba to Samba private protocol that no one else could possibly implement, because it's not specified well enough to be clear on what the semantics really are. And I don't want to do that. I at least want to have something that people could have a chance to implement if they really wanted to.
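The lock-range detail mentioned above is a good example of something a specification has to pin down: an SMB lock carries an unsigned 64-bit offset and length, while a POSIX `off_t` on a 64-bit system is signed. The sketch below shows one possible policy (reject what cannot be represented, clamp what overruns); the function name `to_posix_range` and the policy are assumptions made for illustration, not what the talk or any implementation specifies.

```python
OFF_T_MAX = 2**63 - 1   # largest value of a signed 64-bit off_t
U64_MAX = 2**64 - 1     # largest unsigned 64-bit value


def to_posix_range(offset: int, length: int):
    """Map an unsigned 64-bit (offset, length) to a signed range, or None if it cannot fit."""
    if not (0 <= offset <= U64_MAX and 0 <= length <= U64_MAX):
        raise ValueError("not a valid unsigned 64-bit range")
    if length == 0:
        # POSIX fcntl() treats l_len == 0 as "to end of file", so a zero-length
        # range cannot be passed through unchanged; this sketch declines to map it.
        return None
    if offset > OFF_T_MAX:
        return None  # starts beyond anything a signed offset can express
    # Assumed policy: clamp the end so the last byte stays representable.
    last = min(offset + length - 1, OFF_T_MAX)
    return offset, last - offset + 1
```

Whether the right answer for the upper half of the range is "reject", "clamp", or "track it in the server without touching the file system" is exactly the kind of choice that needs to be written down so a second implementation behaves the same way.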

Starting point is 00:30:25 So, doing it in Samba, both Volker and Richard Sharpe, who's not here today, did some prototypes of it. Internal Samba issues, mostly meaning people not deciding exactly what that should look like, prevented it going into production, which I'm kind of glad about, really, because, like I say, I would like to get to something that we agree on by consensus before we start putting code in.

Starting point is 00:30:52 So it means we'll probably have some work-in-progress branches. So the big blocker for at least Samba implementation was the global state. And because it was my fault it went in in the first place, I was the person who had to go in and take it all out. So when I first did the SMB Unix, SMB1 Unix extensions, I was incredibly lazy and said, well, one server, one client, one server process, one client,

Starting point is 00:31:18 no one is ever going to do anything different. Let's just stick some global state in and have a function called to get it. And so I finally nailed that in March of this year. I got rid of it. And so now we actually have proper structures passed around for all path name operations which can have flag state internally which says this is a POSIX path, this is a Windows path, and we can differentiate correctly between them. So we've still got some cleanup to do, but it's mostly done. The fact that

Starting point is 00:31:53 we can now, at least internally to Samba, stick a POSIX flag on a handle and you start to get POSIX semantics, because this is how we're doing it internally for a lot of the SMB1 stuff that's being mapped onto it. So what about ACLs? Yes. SMB3's native ACL format is Windows ACLs, which are close to but not quite the same as NFS v4's. So we have to support the POSIX ACL draft spec, which we do

Starting point is 00:32:31 over SMB1. There's a lot of code out there that uses it, expects it. There's a new idea called rich ACLs that's having trouble getting into the Linux kernel. Maybe. We already have info levels for these POSIX ACLs. I would like to not use those info levels. I would like to ditch them completely, and I think we can do it,

Starting point is 00:32:51 and here's how. So here's what I think we should do. There are well-known mappings for POSIX mode bits and ACLs into Windows ACL representations. So what I'd like to do is have, when you do a get ACL or set ACL on a handle, you get the best effort mapping of what is on disk given to the client. So if you do that on a POSIX handle and it's POSIX ACLs on disk, what you will get is a Windows ACL that has been mapped as closely as possible directly from the POSIX ACL. And when you do a set on a handle that's been opened with POSIX semantics, with a POSIX create context, if you're on top of a file system that does POSIX draft ACLs, you do a best effort mapping. You don't go through the normal Windows ACL mapping layer.
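As a rough illustration of the simplest case above, plain mode bits with no extended ACL, the sketch below maps the three rwx triples onto three allow entries and back. It is a heavy simplification made for illustration: rights are shown as the strings "R", "W", "X" rather than real Windows access masks, the helper names are invented, only allow entries are produced (a faithful mapping of modes such as 0o047 would also need deny entries), and "other" is mapped to the well-known Everyone SID `S-1-1-0`.

```python
import stat

EVERYONE_SID = "S-1-1-0"  # the well-known Windows "Everyone" SID

# (read, write, execute) bit triples for owner, group, other, in that order.
_TRIPLES = (
    (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
)


def _rights(mode: int, triple) -> str:
    """Render one rwx triple as a simplified rights string."""
    r, w, x = triple
    return ("R" if mode & r else "") + ("W" if mode & w else "") + ("X" if mode & x else "")


def mode_to_acl(mode: int, owner_sid: str, group_sid: str) -> list:
    """Map POSIX mode bits to three allow entries: owner, group, everyone."""
    sids = (owner_sid, group_sid, EVERYONE_SID)
    return [(sid, _rights(mode, triple)) for sid, triple in zip(sids, _TRIPLES)]


def acl_to_mode(acl) -> int:
    """Inverse mapping, assuming exactly the three entries produced above, in order."""
    mode = 0
    for (_sid, granted), (r, w, x) in zip(acl, _TRIPLES):
        if "R" in granted:
            mode |= r
        if "W" in granted:
            mode |= w
        if "X" in granted:
            mode |= x
    return mode
```

When both ends apply the same well-specified mapping, `acl_to_mode(mode_to_acl(m, ...)) == m`, so a Linux client talking to a Linux server sees the mode bits it would have seen locally.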

Starting point is 00:33:50 You do a best effort mapping to put what you were given directly onto POSIX draft ACLs. That way, if you're running in a Linux-to-Linux environment, you essentially have an identity mapping between the POSIX ACLs at that point. But it means we don't have to have new info levels. It does put a little bit more work on the client,

Starting point is 00:34:11 because when you do a GET ACL on a POSIX handle, that means that the client has to do some figuring out what they've actually got, whether they've got a real Windows ACL, a rich ACL, a POSIX ACL, or just the mapping of mode bits. But if we specify each of those mappings really carefully, clients can make the decision on what they actually got back from the remote server. So, you know, we just need to make this very well known. And I think, I was thinking about it, I think, given a well-known SID for the ACL mask, we can

Starting point is 00:34:47 actually cover the ACL mask in the POSIX draft ACLs. We may need to create a well-known SID for that, I don't think Microsoft has one right now, but we already have a well-known SID namespace that Volker invented for POSIX permissions directly. So how are we... So let's presume we've got all the features worked out. How are we actually going to prototype it? Oh, yes, thank you.

Starting point is 00:35:15 I guess we could work along the development system. Just briefly go into what you do about UIDs and GIDs. So the comment is the UIDs and GIDs. So right now, at least in Samba, we have a S-1-5-21. I think it is an S-5-1-22. And the 21 is the UID space, followed by just a number, which is the UID.

Starting point is 00:35:41 And the 22 is the GID. Now, Microsoft have a different mapping that they use for their NFS ACLs, so we can pick one or the other, or we could specify that either of those mappings is allowable. Because, to be honest, the client's going to have to sort it out. So, you know, it's a matter of when the client gets an ACL back, the client is going to have to do some interpretation as to what kind of thing this really is. They're going to be straight Windows ACLs, nothing but Windows ACLs, right? They may look weird for a draft POSIX ACL mapped directly to Windows, but it will be on the wire a Windows ACL.
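The client-side interpretation described above, looking at a SID and deciding whether it is really a mapped UID or GID, might look like the sketch below. The SID numbers in the talk are quoted from memory; the prefixes used here are believed to be the ones Samba uses for its "Unix User" and "Unix Group" domains (`S-1-22-1` and `S-1-22-2`) and the ones Microsoft's NFS components use (`S-1-5-88-1` and `S-1-5-88-2`). They are assumptions in this sketch and should be checked against the Samba and Microsoft documentation. The function names are invented.

```python
# Assumed prefixes; verify against current Samba and Microsoft documentation.
SAMBA_UNIX_USER = "S-1-22-1-"
SAMBA_UNIX_GROUP = "S-1-22-2-"
MS_NFS_USER = "S-1-5-88-1-"
MS_NFS_GROUP = "S-1-5-88-2-"


def uid_to_sid(uid: int, prefix: str = SAMBA_UNIX_USER) -> str:
    """Build the SID string that stands for a raw Unix UID."""
    return f"{prefix}{uid}"


def classify_sid(sid: str):
    """Return ("uid" | "gid", number) for a Unix-mapped SID, else ("windows", None)."""
    families = (
        ("uid", (SAMBA_UNIX_USER, MS_NFS_USER)),
        ("gid", (SAMBA_UNIX_GROUP, MS_NFS_GROUP)),
    )
    for kind, prefixes in families:
        for prefix in prefixes:
            rest = sid[len(prefix):]
            if sid.startswith(prefix) and rest.isdigit():
                return kind, int(rest)
    return "windows", None
```

Accepting both families on the client is one way to allow "either of those mappings", at the cost of the client carrying a slightly longer table.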

Starting point is 00:36:26 No translation, no straight UIDs, GIDs. Everything goes from the get and set Windows. Doing it that way allows us not to have new info levels and to leverage the existing get security descriptor, set security descriptor code that everybody has to have anyway. So, does that answer the question? So how are we going to prototype this up? So actually we have a really nice client, like top level client library, that is already used inside the GNOME VFS and other things.

Starting point is 00:37:07 cli_xxx. It's not... We have an ABI that we don't break, which is libsmbclient. Internal to that, that then calls into these internal functions, which we don't guarantee an ABI on, which we can change all the time. And this kind of allows us to rapidly prototype. The great thing about having this mapping from external ABI to internal interface

Starting point is 00:37:28 is that it actually allowed us, without changing the external ABI whatsoever, to add SMB2 and 3 support into the external interface. So as Linux desktops upgraded and the new versions of the Samba libraries appeared, the GNOME that they were running, the GNOME VFS and hopefully the KDE one as well, gained SMB2 and 3 capabilities without any changes being needed in the desktop UI software, which was really nice because I hate that stuff. I wouldn't want to have to get them to change it because it would be horribly wrong. So, you know, obviously we have to agree with Steve and his collaborators.

Starting point is 00:38:13 We can rapidly prototype this in a work-in-progress branch using the cli_ calls. Eventually, it's going to have to go into real clients that sit in the Linux kernel. And, you know, we're not going to do the encryption support. Basically, I don't want to put anything in that doesn't... I want everything to actually have real code behind it. There were a bunch of things, at least in the SMB1 Unix extensions, that were kitchen sink. They were thrown in there as, okay, let's do it this way, and no one ever implemented it.

Starting point is 00:38:45 And because of that, it turned out that it was a bad design that was unimplementable, like the encryption context support. And then there's me being mean about the BSD of the Month Club, of course, and Solaris. Of course, there's only one BSD that matters.

Starting point is 00:39:01 FreeBSD, yeah. But, no, you didn't say that. Anyway, so, you know, it would be nice if they have clients. I don't think BSD has an internal kernel client, so they can use the libsmbclient or the smbclient stuff that we make available. Solaris, I know, does have an internal client. Oh, okay. I'm kind of finishing up anyway. So the devil is in the details. And I began to realise

Starting point is 00:39:33 this is one of the reasons why the effort kind of slowed down of late because of a bug that we recently had that showed how subtle and complex this stuff can be. All of a sudden Samba started crashing, which was a bit of a panic for certain customers, and it turned out that this was just something we'd never tested before. Someone was operating both Windows and Linux clients against the Samba server, the Windows clients had added named streams to files, and when the POSIX clients tried to delete them, they tripped into a code path that had never been tested and blew us up because we were attempting to treat the stream

Starting point is 00:40:12 name as a POSIX file name and it ended up basically recursing and blowing the stack up because it's like, oh, okay, here's a new file with stream name on the end, let's call delete on that, and then something split it off, and then put the stream back on again, so bad things happened. So how many of these do we need? Some of these are going to be sort of Samba

Starting point is 00:40:35 implementation notes, pretty much like the Windows implementation notes that come out in the MS protocol specs, and we don't yet know how many of those will be Samba implementation notes and how many of those will be actually critical protocol-specific details. Now, I know some of them are critical protocol-specific details, like what happens if you have a Windows handle

Starting point is 00:40:57 and a POSIX handle open at the same time and the POSIX handle tries to delete? What happens if you've got Windows and POSIX and they both have conflicting locks? Or one of them does a write against the locks on the other? All this kind of thing I know have to be protocol-relevant details. Some of the other things may end up just being Samba implementation notes. So how do we know when we've won and we're successful

Starting point is 00:41:21 and we have a happy POSIX to POSIX SMB3 world? When someone else implements it, and I'm picking Microsoft here, but it could be anyone else. It could be EMC, it could be NetApp, it could be Isilon, whoever. It would be really nice to have a second implementer. At that point, we know which things for sure are Samba implementation details and which things are actually critical details we need to document properly.

Starting point is 00:41:50 Hey, maybe Azure would be easiest. Because it's new and it doesn't have all the old crufty crap inside it like we do. So the main thing I want people to take away with is let's take our time and get this

Starting point is 00:42:05 right. And I know it's been many years already, and we don't seem to have progressed, but I think we actually have, because we've learned a lesson of don't screw it up. And that's a valuable lesson to learn. And we did screw it up last time, and let's not do that again. Now, I have great faith that we will screw it up, but we'll screw it up in a different way this time. Which is my definition of learning. I knew it. Interesting. Exactly. We'll screw it up in a way we didn't expect.

Starting point is 00:42:31 But hopefully, if we have more than one implementation, that will really boil down the things that we screw up to problems that more than just me and Steve missed. So, oh, ah, yes, TBD. Okay, I did give them to the SNIA people,

Starting point is 00:42:49 so wherever the SNIA people put the slides is where the slides will be. Oh, great. Wonderful, wonderful, sorry. Great. So are there any other questions, comments, anything you want to go back? Oh, go on, yeah.

Starting point is 00:43:03 Do you have a spec under development somewhere, for someone who wants to follow the development of this? So the question is, have I done anything yet? Have I written anything up yet? And the answer to that is unfortunately no, not yet. When I do

Starting point is 00:43:19 probably post it, the best way to do this actually might be, I think we have a mailing list. We could mail things around on a mailing list, change types of documents. We could also do it in a wiki. Somebody did,

Starting point is 00:43:42 I think James started to write something up because he wants some extensions to go in to do with being able to tell servers, I need an infinite timeout on this write and read, that he would like to make part of these changes as well. So, yeah, I mean, I'm not going to be doing anything for probably about a month because I've got the trip to Redmond next week. But I will, at some point, try and write

Starting point is 00:44:06 down and I will shamelessly steal the template from the Microsoft documents and try and make something that at least looks like it could have been put into those. So, you know, October, November time, hopefully we'll have something for people to look at and go, that's wrong!

Starting point is 00:44:22 Yeah, any other questions? Yeah? So for your class example, should you consider adding it as an info class? Because there's like four different info classes. If you added one, you'd have a full UCHAR or the 0, 0, and the above. Oh, that's a...

Starting point is 00:44:35 So David's comment is, you have an info class, you idiot, why don't you split the two bytes into two and do it that way? That's a really clever idea. We should do that. One more note. We investigated 3.1 with negotiate context

Starting point is 00:44:48 for encryption algorithm negotiation. Yes, I did mention that. Okay, yeah. Sorry. You want to flag POSIX, like, my server supports POSIX. Yeah, so the comment was, you know, I already did mention that we've got negotiate context. But to be honest, I'd rather not do that. I mean,

Starting point is 00:45:05 maybe, maybe, maybe we could do that. The way I would like to do that is for a client to just ask on the create context, and if the server just ignores it, then, you know, don't ask for POSIX on this server. And at that point you don't have to do any negotiation, because you already know at the open time. Yeah, Michael. Just commenting directly, I mean, not using the negotiate context makes it possibly easier to use SMB2

Starting point is 00:45:34 as well. Just using the negotiate context and that makes it easier. So the other thing would be restricting to SMB2. Oh, that's a good point. Yes. So Michael's comment was, if we don't depend on the SMB3

Starting point is 00:45:49 negotiate context features, then theoretically you could drop it into an existing SMB2-based implementation and still have everything except the encryption. But the encryption is part of the SMB3 protocol part anyway. I mean, the other risk I see would be for client owners, like Steve. The first operation he wants to do is create a new file. He might like to know if it's going to be with

Starting point is 00:46:09 or not before he sends it to the browser. So it forces him to. I don't think he cares. If it fails, he's got to cope with it anyway. The scenario to think about is, OK, so I'm trying to create...

Starting point is 00:46:44 Apple, as you know, and the Linux client, will default to, if it has one of these seven characters, it will be remapped. The server supports POSIX, we know the server supports POSIX, but not now. So what I was planning on doing was opening dot on the share. The first thing, right? Open dot on the share.

Starting point is 00:47:01 So the comment is, I might really need to know at least on this connection if you do do POSIX, but as Steve just pointed out, you can do POSIX create context open on dot, which essentially has the same effect. And if the server responds to that, then you know it can negotiate POSIX. So I'm trying to avoid having to coordinate with you as much as possible. It's not that I don't like talking to you.
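The probe discussed here, where the client opens dot with the POSIX create context and sees whether the server answers it, can be modelled with a toy server. This is only a sketch under stated assumptions: the GUID value, the FakeServer class and the function names are invented for illustration, and a real client would look for the context in the SMB2 CREATE response.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical identifier for the POSIX create context; not an assigned value.
POSIX_CONTEXT = uuid.UUID("00000000-0000-0000-0000-0000000000a1")


@dataclass
class FakeServer:
    """Toy server: it silently ignores any create context it does not know."""
    understood: set = field(default_factory=set)

    def create(self, path: str, contexts: list) -> list:
        # Reply only with the contexts this server understands.
        return [c for c in contexts if c in self.understood]


def server_does_posix(server: FakeServer) -> bool:
    """Open "." with the POSIX context and check whether it comes back."""
    reply = server.create(".", [POSIX_CONTEXT])
    return POSIX_CONTEXT in reply
```

Because an unknown context is simply ignored, the probe needs no separate negotiation step: the first open on the share already tells the client whether to ask for POSIX behaviour on that connection.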

Starting point is 00:47:28 It's just that these things are harder when you've got to sort of carve chunks of namespace out or something. The only other case I would leave, even if we never did it, if there's ever a V2, because we messed up something at V1, having some context of whether the server understands V2 versus V1, I don't know what the context is. Ah, but you change the GUID on the create context for that. You send the one you want, and if it ignores it, you send the lower one. Oh, good point.

Starting point is 00:48:00 Thank you, of course. Yes, that's smart. Sorry, this is getting into a discussion. It's hard to track with all that. So for people listening, we were discussing whether the best way to handle having to rev the protocol would be a negotiate context. And the conclusion that we came to from Tom there was that the best way to do that would be to rev a different GUID for the new protocol.

Starting point is 00:48:22 And then when you do the open, the first open, when you want to find out what protocol they support, you send both GUID for the new protocol and then when you do the open, the first open when you want to find out what protocol they support, you send both GUIDs, you send both create contexts and the server picks the highest level it understands. So, right, any other questions, comments? Wow, I actually
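The versioning scheme summarised here, one GUID per protocol revision with the server answering the highest one it understands, might look like the following. The two GUIDs and both function names are hypothetical placeholders, and the sketch models only the selection logic, not any wire format.

```python
import uuid

# Hypothetical per-revision identifiers; neither is an assigned value.
POSIX_V1 = uuid.UUID("00000000-0000-0000-0000-000000000001")
POSIX_V2 = uuid.UUID("00000000-0000-0000-0000-000000000002")

# Newest revision first, so the first match is the highest level.
PREFERENCE = [POSIX_V2, POSIX_V1]


def server_pick(offered: set, understood: set):
    """Server side: answer the newest offered context it understands."""
    for ctx in PREFERENCE:
        if ctx in offered and ctx in understood:
            return ctx
    return None  # every context was ignored


def negotiate(understood: set) -> str:
    """Client side: offer every revision on the first open, read the reply."""
    chosen = server_pick(set(PREFERENCE), understood)
    labels = {POSIX_V2: "v2", POSIX_V1: "v1", None: "no POSIX"}
    return labels[chosen]
```

An old server that only knows the first GUID keeps working unchanged, since it ignores the newer context in the same way it ignores any other context it does not recognise.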

Starting point is 00:48:40 filled up the time, amazing. Alright, well I think we're having a panel discussion later on where we can continue this. So what time is that Steve? Tomorrow. Ah, it's tomorrow. Okay. Yeah, so I mean anyone who would like to help edit documents or at least have input into how we think we should do this, that would be very helpful. But we do, it would be nice to get at least some preliminary level of documents done by no later than the

Starting point is 00:49:09 end of the year. So that's, I'll put that on my OKRs or something at Google. So one last question from Steve. The comment about the possible extension is very valid, because until we actually see the wire traffic, we won't see it. For example, you can get a readdir, you know, readdir has five times the traffic that NFS has on readdir. So the comment is we may want to change it, but to be honest what I would like to do is rather than sticking this out in production and finding we screwed it all up, I would at least like to have some private testing, even if it's put out there in production, at least for Samba releases, I would like to have the ability to say this is a draft implementation feature, we reserve the right to destroy this and turn it off at any point, or change it, this is not a stable solution. Until we get to the point where we think we can do POSIX to POSIX and everything seems to work,

Starting point is 00:50:12 and then we can say, right, we have a protocol that we can at least preliminarily have a supported implementation of. And I think with that, I have to let Metze take over. All right, thank you very much. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org.

Starting point is 00:50:38 Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storagedeveloper.org.
