Storage Developer Conference - #35: SMB3 and Linux - A Seamless File Sharing Protocol
Episode Date: March 6, 2017...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 35.
Today we hear from Jeremy Allison, Samba Team Engineer at Google,
as he presents SMB3 and Linux, a seamless file sharing protocol
from the 2016 Storage Developer Conference.
My name is Jeremy Allison, and I work for Google in their open source programs office.
Having said that, my Google email address isn't there, and this is not a Google-sponsored talk.
So please don't run away saying that these are Google's opinions on anything, because they really aren't.
These are my opinions and ideas, or at least those that I didn't steal from other people, of course.
Right, if you have any questions, comments, you think the presentation is just plain wrong,
please stick your hand up and feel free to ask questions or comments
any time during the talk. I'm very happy to have a discussion with the audience,
eventually a fist fight or something.
So the talk today is about SMB3 and Linux creating seamless POSIX file serving.
So what does that actually mean? I've had some variant of this slide in my talks for about the past five or six years. I used to have a wonderful picture of a mushroom cloud blowing the world of storage up, because that's what people seem to think cloud storage is. But I've actually come to believe that it's more like this. Cloud storage, yes, it's the future. Yes, it's coming. Yes, it's going to change everything, just really slowly. It's more of an Ice Age, you know, a glacial pace, coming along grinding all the old storage vendors out of their cozy little homes in the rocks around the valley that it's carving. So I first thought it was going to be this amazing flash of nuclear explosion, everything changes, and we're left with a flat new storage space. Now it seems to be a much slower thing.
And the reason for that is cloud storage is great, everybody loves it, and whenever you start talking to app vendors about cloud storage, the first thing you have to say is, yeah, it's great, just rewrite all your existing apps to use a cloud back-end. And this tends to limit some of its functionality.
Mr. Blobby the Blobfish. Cloud storage, of course, is a blob store. And for many existing apps... Oh, and here's a man to just prove me wrong. For many existing apps, blob stores don't map very well onto the semantics that most applications would like to use, which is the open, close, read, write, random access most applications use.
Now, people are rewriting their apps to cope with blob stores,
but as I said, this is really going to take a long time to get there.
And in the meantime, we have some interesting file-type protocols that we can help adapt and help become very useful at large scale. So we still need file access protocols. Even if you have a set of services that are running in the cloud on different virtual machines, they still want to talk to some kind of, usually, file-based access store. If they haven't been rewritten, if you want to move your existing app into the cloud, the easiest way to do that is just run it on a virtual machine and back-end it onto some cloud storage somewhere that's running a file access type protocol.
So there are really only two options in that world. There's NFS v4, boo-hiss,
and then there's SMB3, or 2+, as it used to be. I'll just call it SMB3 from now on.
So why SMB3? Well, you know, it's the clients, stupid. NFS and SMB are supported by really the only clients that matter if you're going to run large-scale apps, which are Windows, Mac OS, and Linux. But Windows is the odd OS out: it supports SMB3 way better than it supports NFS. I mean, you know, Windows does have NFS clients, but they're horrible. They've been horrible for a long time. They were horrible when I first started writing this stuff, which is why I started writing Samba in the first place. They're still horrible today. It's much nicer if you can make your servers speak SMB3, and then Windows clients are much happier, and everything seems to work better.
So, I mean, most people in this room, I'm sure, already know this stuff, but SMB3 and NFS v4, they're roughly comparable. SMB3 tends to have a few more features, but they copy from each other. I don't want to get into a shouting match of who invented what first, except for ACLs. I know Windows, SMB, did the ACLs first.
So you have delegations, like SMB2 leases; namespaces, like DFS. You've got long-lived handles that can recover if servers crash. NFS adopted the SMB ACL model, which, as I say, was a disaster. And you have parallel NFS, and you have an RDMA transport.
And SMB3 has pretty much the same kinds of things. So you've got transparent failover, you've got clustering, you've got many active servers, which, of course, people have done with NFS too. You have the RDMA transport, SMB Direct. You have multiple links, multi-channel. Oh, NFS v4, I didn't put it down, but it does have encryption as well. That's basically Kerberized NFS. SMB3 has transport level encryption. So SMB3 probably has more stuff in it defined as the core part of the protocol, without extensions.
You've got snapshots, server-side copies.
And, you know, this is kind of the advantage
of having Microsoft, the 800-pound gorilla
in the file storage space,
essentially revving the protocol whenever they need.
And, you know, in many ways, this actually helps move the protocol on faster. Not that I'm saying standards-based stuff is wrong, it's just that, you know, the reality of it is that Microsoft can add things quicker into SMB3 than a standards organization can with NFS. And as I said before, Windows clients really want SMB3. They don't work so well with NFS.
So, NFS v4, where is it really great?
Linux to Linux environments.
You get, well, as close as NFS grants you, very close to POSIX semantics.
And the whole thing was designed around POSIX clients talking to POSIX servers.
You have native advisory locking, didn't work in NFS v3, probably works in NFS v4. You can
rename open files, you can unlink open files. Some of the things like extended attributes
have been added, but Windows clients don't really interoperate very well
in that world. So if you're running Linux to Linux, probably NFS v4 is a good choice.
But who just runs Linux to Linux? Everyone has to interoperate with other systems. So how are we going to make SMB3 compete in that space?
Well, the answer is Unix extensions.
Add some extensions into the SMB3 protocol that make it work better between POSIX clients and POSIX servers.
And if we get that right, then you have the universal solvent.
You have a file serving protocol that can serve Windows clients really well, that can serve POSIX clients, Mac and Linux clients really well, and everybody's happy.
And the great thing is, unlike SMB1, SMB3 is actually really close. It's much closer than the old SMB1 CIFS semantics were. So we add POSIX semantics to the Windows protocol. So, how do we do this? Hey, of course you use Samba. Why wouldn't you?
We have a history of making this kind of interaction work very badly that I'm going to talk about.
SMB1 Unix extensions. So if you remember the old SCO, before they turned into new insane SCO, they created what they called POSIX info levels, which they added to SMB1 in conjunction with HP. I think HP also had an SMB client in HP-UX a long time ago. And so they added these info levels into SMB1 to make it work better, to get the POSIX-style semantics that they required. And so we ended up adopting those. And, you know, in the best traditions of, hey, it's our protocol, we can do what we like (hey, well, Microsoft do the same), we just added a bunch of extensions. So we had POSIX path names, we were the pioneers for transport level encryption, we added POSIX ACLs, we added symlinks, and the behavior of the protocol became different when you'd negotiated Unix extensions. So you could rename an open file, you could delete an open file and it would disappear from the namespace.
File locking was advisory, not mandatory.
A bunch of interesting things that we did.
So how did it work?
Badly, and with horrible mutants, if anyone remembers that wonderful ad there.
So how did it work?
The client would basically create your standard SMB1 connection, just like talking to a Windows server.
And then the server would actually, in the negotiate protocol, would set a magic bit, which we had reserved from Microsoft, that said, I do Unix extensions.
And then the horrors would start, because there was no way to basically say, well, we only want POSIX extensions on some things, not others. So I added the setfsinfo request, which has nothing to do with the file system, and I kind of justified it to myself as, well, it's kind of like asking the file system to do something different, so FSInfo is as good a place as any. I remember Tridge beating me about the head with a baseball bat for doing this.
So you would send a list of bits that you would like. You would only send that, of course, if you'd got Unix capabilities back from the server; if you sent it arbitrarily, the server would just reply with an unknown info level error to that request. And then in the setfsinfo reply, the server would say, here are the bits that I actually do support. And of course, you know, theoretically, you could have some nightmare mutant combination where the server would say, sure, I do POSIX locking, but not POSIX path names, and I will do POSIX rename but not POSIX delete, all sorts of horrors, right?
None of which the client ever checked ever again, of course.
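To make the bit exchange concrete, here is a minimal Python sketch of that style of negotiation. The capability names and values are hypothetical, picked only for illustration; they are not the real SMB1 Unix extension constants. It shows how a server can grant an awkward subset of what the client asked for, and what a careful client would have had to check afterwards.

```python
from enum import IntFlag


class UnixCap(IntFlag):
    """Hypothetical capability bits; names and values are illustrative only."""
    POSIX_LOCKS = 0x01
    POSIX_ACLS = 0x02
    POSIX_PATHNAMES = 0x04
    POSIX_RENAME = 0x08
    POSIX_DELETE = 0x10


def negotiate(client_wants: UnixCap, server_supports: UnixCap) -> UnixCap:
    """Return the bits the server grants: the requested bits it also supports."""
    return client_wants & server_supports


# The client asks for four behaviors; this server only does two of them.
wanted = (UnixCap.POSIX_LOCKS | UnixCap.POSIX_PATHNAMES
          | UnixCap.POSIX_RENAME | UnixCap.POSIX_DELETE)
server = UnixCap.POSIX_LOCKS | UnixCap.POSIX_RENAME

granted = negotiate(wanted, server)
# Bits the client asked for but did not get. A careful client would have to
# inspect these before relying on any POSIX behavior.
missing = wanted & ~granted

print("granted:", granted)
print("missing:", missing)
```

With five independent bits there are already 32 possible server behaviors, which is the "mutant combination" problem the all-or-nothing design later in the talk is meant to avoid.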
And then in order to get a POSIX open, there was a special trans2 call, again, which I justified to myself because there was already a trans2 open call in SMB1. But then again, there are like seven different flavors of open call in SMB1, so hey, what's one more? So we added a Unix-specific one, and you would get back this Unix-specific file handle. And it worked, but it was really ugly. So, you know, in the best tradition of the Unix-Haters Handbook, this thing was truly disgusting. It was an utter hack job.
So, I mean, you know, protocol abuse isn't a crime, but if it were, we would all have been in jail for this. Yeah, that was the quote: "Using setfsinfo to set global state on the protocol connection makes me want to vomit."
Yes.
It was pretty ugly.
And then of course Apple did the same, following in wonderful footsteps there. That was Thursby, I believe, who were doing the DAVE client for the Macs, the old SMB client before Apple, back when Apple was only doing Netatalk. AFP, sorry, AFP. Netatalk was the implementation. So Apple added some Mac-specific share levels for Get and Set. But the real ugliness, the real sin, is the global state. So once you had negotiated these extensions using setfsinfo, existing operations changed behavior globally. So even if you opened the file with the Windows file open request, you would still get the state change on the server. This is just nasty. It's not really what you want.
What else did we get wrong?
Security.
Well, security is hard.
Symlinks and security is a disaster zone. There's a reason, I think, that Windows only allows administrator or above to set symlinks. So this took us a great deal of time to get right. In fact, we had a CVE against this, I think it was last year, so we're still trying to get this right.
Windows clients really want to follow symlinks on the server. Windows clients sort of natively don't really care about symlinks. They don't want to see them. Reparse points were added for that, but they're not as widely used. Symlinks are everywhere on POSIX file systems. But Unix clients really must not follow symlinks. So Windows likes to follow symlinks. Unix clients, you really shouldn't follow symlinks.
So here you have disasters like the early days of Samba, with people saying, well, it would be really convenient to put a symlink on the file system and I'll just point it outside the share. Which was fine when it was only the administrator who could create them, because there was no way for clients to do it. But as soon as we added Unix extensions that could create symlinks, you could have the situation where, without proper access checks, somebody could add a symlink onto a remote file share that pointed to the server's local /etc/shadow or /etc/passwd, and a client could then follow that and pull things down. We had some disasters in those areas.
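The kind of access check that was missing can be sketched in a few lines of Python. This is only an illustration of the idea (resolve the path, then confirm it still lives under the share root); the function name is hypothetical, it assumes a POSIX host, and it is not how Samba actually implements the check.

```python
import os


def resolves_inside_share(share_root: str, relative_path: str) -> bool:
    """Return True if relative_path, after following symlinks, stays in the share."""
    root = os.path.realpath(share_root)
    # realpath follows every symlink component, so a link that points at
    # /etc/shadow resolves to /etc/shadow, not to a path under the share.
    target = os.path.realpath(os.path.join(root, relative_path))
    return os.path.commonpath([root, target]) == root
```

A check like this is racy on its own, since a link can be swapped between the check and the open, which is part of why getting symlink handling right on a server takes so long.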
Other things that were horrible: transport level security was poorly designed. I bear complete blame for that. I'm not a crypto guy. I've learned my lesson from this: don't try doing crypto code, because you'll only mess it up. It works for what it is. The biggest thing I did get right on that was to just leverage GSSAPI sign and seal for the sealing. So at least I chose the standard mechanism for sealing.
The problem was I was too ambitious, due to being egged on by Steve, who I don't think is in the room. Steve French basically said, oh, wouldn't it be nice if you could have multiple encryption contexts going at the same time? And so I sort of designed this in and then got it wrong. So it was never used that way.
Differences in extended attributes were ignored in the Unix protocol. We kind of went... Oh, thank you, Steve. I was just... your ears must have been burning.
So extended attributes: we ignored the differences. We kind of went, well, nobody uses extended attributes much on POSIX anyway, so the Windows ones are probably good enough. And they're not, really. POSIX extended attributes are case-sensitive, Windows ones are not. The Windows ones use the OEM code page set. Unix ones really want to use UTF-8, etc.
And no one else but Samba implemented them, of course.
That wasn't helpful.
So, if we're going to do this again and do it right,
we have a clean slate to work on
because we haven't done it yet.
And this is hopefully, we'll get the design, I won't say completely right, but hopefully
a little better than we did last time. So the great thing about SMB, it's handle based.
The only operation you can use to turn a path name into a handle is create.
So once you've got that, you have a convenient abstraction to handle all of the properties that you need to change
for POSIX semantics hung off one thing, the handle.
And handles are used to do deletes, renames, locking, EAs, everything.
All of the places where POSIX actually differs from Windows,
you have to do with handle-based operations.
So it's really convenient.
So obviously the thing to do is hang the POSIX semantics off a file handle.
So another nice thing about SMB3 is it's extensible.
It has something called create contexts, for people who know the protocol.
You can essentially hang arbitrary blobs of data off a create request and the server will
ignore them if it doesn't understand them.
So you can, at that point, extend the protocol
to add things, whatever those things may be,
into the operation that converts the path name into handle.
Perfect, exactly what you need.
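As a rough model of why this is such a convenient extension point, the following Python sketch shows a create call that walks a list of tagged blobs and silently skips any tag it does not recognize. Everything here is hypothetical: the `POSX` tag, the class names, and the handler table are assumptions for illustration, not the wire format or any real server's code.

```python
from dataclasses import dataclass, field


@dataclass
class CreateContext:
    name: bytes        # tag identifying the context
    data: bytes = b""  # opaque payload, meaningful only to a server that knows the tag


@dataclass
class Handle:
    path: str
    posix: bool = False
    applied: list = field(default_factory=list)


# Hypothetical tag for a POSIX create context.
POSIX_TAG = b"POSX"


def apply_posix(handle: Handle, ctx: CreateContext) -> None:
    """Flag the handle so later operations on it use POSIX semantics."""
    handle.posix = True


# Contexts this toy server understands.
KNOWN_CONTEXTS = {POSIX_TAG: apply_posix}


def create(path: str, contexts: list) -> Handle:
    """Turn a path into a handle, applying only the contexts we understand."""
    handle = Handle(path)
    for ctx in contexts:
        handler = KNOWN_CONTEXTS.get(ctx.name)
        if handler is None:
            continue  # unknown contexts are ignored, not treated as errors
        handler(handle, ctx)
        handle.applied.append(ctx.name)
    return handle
```

Because unknown tags are ignored, a client can send the POSIX context to any server and simply look at what comes back to learn whether it was honored.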
So this is how the snapshot stuff was done, adding the time warp, snapshot opens. Create app instance ID, I can't even remember what that is now, but it's weird. So, I mean, because they're all sort of like four-character ones, we were thinking about POSX, but to be honest, we should probably just create a GUID and use that.
Just four character?
I'm sorry?
So the question is, are they really just four-character? I think there are two kinds. There are four-character ones, that's like a 32-bit. But the other way of doing it is you actually specify a, I think it's a 128-bit GUID. So there are actually two flavors of this. I think the first set were all four characters, but everything we've done recently is one. Yeah, so from David, the comment was the first ones were four-character. I think they were four-character limited, right? Because there wasn't a length before those. It was just four characters. And the new ones have a length, so yeah, well, I'm assuming that we'll just use a GUID for that. It's probably the easiest kind of thing to do.
So we will have a POSIX create GUID.
So how do we negotiate this?
As I say, SMB1 Unix extensions used this 32-bit user capability field in the initial response.
And then I think we had a 64-bit exchange of what are the capabilities
you support and what are the capabilities the server will give you, which is horrible.
So it did require coordination with Microsoft.
We could do this again for SMB3, but I really don't want to.
The more I think about this, the more I think that's a terrible idea. So, again, another place we could do it is in the... Oh, yes, this is in the negotiate, the negprot.
You can actually add now arbitrary capabilities.
It's a little bit like create contexts.
It's not a GUID.
It's only a 16-bit field here.
So, again, if we were to use that to negotiate "I would like to do POSIX capabilities", we'd again have to request a number out of Microsoft.
So I don't think we need to do that.
I don't think we need to have a protocol-level modification,
at least at the initial negprot or any capabilities.
We should just use create contexts.
I think this is far easier.
So let's look at what Apple did.
Sad Mac, I'm afraid.
It's kind of horrible.
It has an AAPL create context, fair enough,
and Ralph implemented this in Samba with vfs_fruit.
But it's horrible.
Horrible, horrible, horrible.
It repeats all of the ugliness and the disgusting design of the SMB1 extensions. It's just kind of like it mutated into an SMB3 change, which is just hideous. Info levels, all of a sudden, when you negotiate this, change their meaning. They come back and they do different things.
You have to do a negotiation step, which is done by doing a new AAPL create context on a null name once you've opened the connection, which is just nasty.
So it's basically back to the old global server state turned on by the client, which I really, really want to get away from. So I don't want to do what Apple did. Even though we do what Apple does, as it were, because we have to, I would rather that not be the method used for SMB3 Unix extensions.
So here's what I would like it to look like. No protocol negotiation. You just open your regular SMB3 connection, and then if you want POSIX, you start issuing the new create context in your create calls.
We kicked around for a while the idea of changing the path name parsing in the create call to allow raw UTF-8 going in there, and eventually decided not to do that.
I think it's a bad idea.
The problem is UCS-2 isn't quite UTF-8, but it's kind of close enough. So rather than saying, oh, if you have this special create context you need to pass the file name differently, nah, just screw that, let's do the standard UCS-2 path name parsing. No alternate data streams.
You can't get alternate data streams from POSIX. If you want to do that, open a Windows handle.
Go away.
No negotiation on the subtleties of do I do locking, do I do this, do I do that.
It's an all or nothing thing.
I want to get away from this negotiation idea.
You open the POSIX handle.
What you get back is a POSIX handle.
And at that point, you will get the server's best effort to do all the things that POSIX does,
which is advisory locking,
renaming open files, deleting open files, etc.
And the handle gets flagged as a POSIX or Unix handle internally,
and so all operations on that handle become POSIX.
Now, there is a great devil in the details here, which I will come to later on.
If I don't come to it, remind me and I'll come back to this slide.
So it's not quite as simple as that.
So what are the POSIX type semantics that you need in a handle?
Reads and writes ignore POSIX locks.
So, I'm coming into this a little bit.
The real problems become what happens when you have multiple handles open on the same object at the same time.
So, reads and writes need to ignore POSIX locks, but not Windows ones.
Lock requests become advisory, not mandatory.
How do they conflict with Windows locks?
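The first two rules can be sketched as a small Python model: reads and writes are blocked only by other handles' mandatory (Windows-style) locks, while advisory (POSIX-style) locks are ignored by I/O. The class and function names are hypothetical, and the model is deliberately simplified: it ignores shared versus exclusive locks and does not try to answer the open question of how lock requests of the two kinds should conflict with each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    owner: int        # id of the handle that holds the lock
    start: int        # first byte covered
    length: int       # number of bytes covered
    mandatory: bool   # True for a Windows-style lock, False for POSIX advisory


def overlaps(lock: Lock, start: int, length: int) -> bool:
    """Return True if [start, start+length) intersects the locked range."""
    return lock.start < start + length and start < lock.start + lock.length


def io_allowed(handle_id: int, start: int, length: int, locks: list) -> bool:
    """Decide whether a read or write on handle_id may touch the byte range."""
    for lock in locks:
        if lock.owner == handle_id:
            continue  # a handle is never blocked by its own locks
        if lock.mandatory and overlaps(lock, start, length):
            return False  # Windows locks are enforced on I/O
        # Advisory locks fall through: I/O ignores them by design.
    return True
```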
You can do unlinks and renames on open handles.
And when you do a directory listing, you see a POSIX namespace.
Should query directory change info level returns?
I don't think so.
Maybe, maybe. The problem is that there are some POSIX attributes, a little bit like the mode bits, that you would quite want to have. POSIX device...
I'm trying to remember the name of it.
When you have a POSIX device object.
Maybe we should change info level returns.
Not 100% sure.
Get and set EAs.
We should use the Unix namespace, not the Windows namespace,
which means that EA names become case-sensitive.
And to be honest, I'm thinking that what we should do... Oh, question there, Steve.
I didn't remember the user dot.
Yes, that's what I was coming to.
The other thing is, I think we should expose the full possible POSIX EA namespace,
which at least on Linux means you have system. or user.
So in other words, we should no longer implicitly assume user.
We should expect the client to send the fully qualified namespace in those EA requests.
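A small Python sketch of what "fully qualified" could mean on the server side follows. The namespace list is the set of Linux extended attribute namespaces; the function name and the choice to reject unqualified names outright are assumptions for illustration, not something the talk specifies.

```python
# Linux extended attribute namespaces. Matching is case-sensitive.
KNOWN_NAMESPACES = ("user", "system", "trusted", "security")


def parse_ea_name(name: str) -> tuple:
    """Split 'namespace.attribute' and refuse names without a known namespace."""
    namespace, sep, attr = name.partition(".")
    if not sep or not attr or namespace not in KNOWN_NAMESPACES:
        # No implicit 'user.' prefix is added: the client must qualify the name.
        raise ValueError(f"EA name must be fully qualified: {name!r}")
    return namespace, attr
```

Under this rule `user.Comment` and `user.comment` are two different attributes, and a bare `comment` is rejected rather than silently placed in the user namespace.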
I have an idea about the mode bits that we'll get to.
So, symlinks are still horrible.
How do we create them?
There are symlink operations within SMB3 for Windows.
We can probably just leverage those.
Problem is, what are the extended attributes
and ACL operations that are permitted on a symlink?
Now, this is where POSIX falls down.
Solaris, for instance, completely disallows
extended attributes and ACLs on a symlink.
Linux does not. FreeBSD, I think, will do extended attributes. I'm not sure about ACLs.
So this is where we trip over our own feet. POSIX isn't a known, well-defined thing when
you get down to this level of detail. I am very tempted to just basically say,
if it's a symlink, you are not having
extended attributes or ACLs on it over SMB3.
The problem with that is I then run into the SELinux people on Linux, who will start whining that this breaks all their code.
Yes, Steve.
Since Windows has these reparse points, the other relevant question is, what happens if you try to set an EA today for a reparse point if it is one of the recently published?
So Steve's comment is, Windows has reparse points. What happens when you try to set EAs or ACLs on a reparse point on a Windows file system? The answer to that is, you're going to write me the SMB torture test. And we'll find out.
Yes, so we don't know.
So this...
On a reparse point.
Oh, okay, so Uri mentions they've solved the problem in the same way that I want to do, which is to say, yeah, no EAs or ACLs on a reparse point.
I would love to do that and tell the SELinux people to take a flying jump.
I don't know whether I can get away with that.
Maybe.
Because they're wrong and horrible.
So the current POSIX info levels are two bytes, hex 200 to 2FF. I think we only use up to 0B of them. But right now, there's only a one-byte info level space for SMB3. We could define info levels only on POSIX handles, because you have to have a handle to do an info level. So if you were doing a query directory, you would open a directory handle as POSIX, and then we'd do the find first, find next equivalents that would return POSIX.
I'd rather not do that. If I can get away without defining new info levels, and I'm still not 100% convinced, but I think we're close to that, I would like to do so.
let's see
Windows lock ranges are unsigned, POSIX ones are signed. There's lots of nasty little details. And the thing is, all these things need to be very well specified, because if you don't specify them, it becomes essentially like the Apple thing, or the Samba to Samba private protocol that no one else could possibly implement, because it's not specified well enough to be clear on what the semantics really are.
And I don't want to do that.
I at least want to have something that people could have a chance to implement
if they really wanted to.
So, doing it in Samba: both Volker and Richard Sharpe, who's not here today, did some prototypes of it. Internal Samba issues, mostly meaning people not deciding exactly what it should look like, prevented it going into production, which I'm kind of glad about, really, because, like I say, I would like to get to something that we agree on by consensus before we start putting code in.
So it means we'll probably have some work-in-progress branches.
So the big blocker for at least Samba implementation
was the global state.
And because it was my fault it went in in the first place,
I was the person who had to go in and take it all out.
So when I first did the SMB1 Unix extensions, I was incredibly lazy and said, well, one server process, one client, no one is ever going to do anything different. Let's just stick some global state in and have a function call to get it.
And so I finally nailed that in March of this year.
I got rid of it.
And so now we actually have proper structures passed around for all path name operations
which can have flag state internally which says this is a POSIX path,
this is a Windows path, and we can differentiate correctly between them.
So we've still got some cleanup to do, but it's mostly done. We can now, at least internally to Samba, stick a POSIX flag on a handle and you start to get POSIX semantics, because this is how we're doing it internally for a lot of the SMB1 stuff that's being mapped onto it.
So what about ACLs?
Yes.
SMB3's native ACL format is Windows ACLs, which are close to but not quite the same as NFS v4's. So we have to support the POSIX ACL draft spec, which we do over SMB1. There's a lot of code out there that uses it, expects it. There's a new idea called rich ACLs that's having trouble getting into the Linux kernel, maybe. We already have info levels for these POSIX ACLs. I would like to not use those info levels. I would like to ditch them completely, and I think we can do it, and here's how.
So here's what I think we should do. There are well-known mappings for POSIX mode bits and ACLs into Windows ACL representations. So what I'd like to do is, when you do a get ACL or set ACL on a handle, you get the best effort mapping of what is on disk given to the client.
So if you do that on a POSIX handle and it's POSIX ACLs on disk, what you will get is a Windows ACL that has been mapped as closely as possible directly from the POSIX ACL.
And when you do a set on a handle that's been opened with POSIX semantics, with a POSIX create context, if you're on top of a file system that does POSIX draft ACLs,
you do a best effort mapping.
You don't go through the normal Windows ACL mapping layer.
You do a best effort mapping
to put what you were given
directly onto POSIX draft ACLs.
That way, if you're running in a Linux-to-Linux environment,
you essentially have an identity mapping
between the POSIX ACLs at that point.
But it means we don't have to have new info levels.
It does put a little bit more work on the client,
because when you do a GET ACL on a POSIX handle,
that means that the client has to do some figuring out what they've actually got,
whether they've got a real Windows ACL, a rich ACL, a POSIX ACL,
or just the mapping of mode bits.
But if we specify each of those mappings really carefully,
clients can make the decision on what they actually got back from the remote server.
So, you know, we just need to make this very well known.
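To show the shape of such a mapping, here is a deliberately simplified Python sketch for the mode-bit case only. It is an assumption-laden toy: access rights are written as the letters r, w, x rather than real Windows access masks, each entry is treated as an allow entry, the owner and group SIDs are placeholder strings supplied by the caller, and only the Everyone SID (S-1-1-0) is a real well-known value. The point is the round trip: if both directions are specified precisely, a POSIX client can recover exactly the mode it would have seen locally.

```python
import stat

EVERYONE_SID = "S-1-1-0"  # the well-known "World" SID


def mode_to_aces(mode: int, owner_sid: str, group_sid: str) -> list:
    """Map POSIX mode bits to three (sid, rights) allow entries."""
    classes = [
        (owner_sid, stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        (group_sid, stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        (EVERYONE_SID, stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    ]
    aces = []
    for sid, r_bit, w_bit, x_bit in classes:
        rights = "".join(
            letter
            for bit, letter in ((r_bit, "r"), (w_bit, "w"), (x_bit, "x"))
            if mode & bit
        )
        aces.append((sid, rights))
    return aces


def aces_to_mode(aces: list, owner_sid: str, group_sid: str) -> int:
    """Invert mode_to_aces; refuse anything that is not a pure mode-bit mapping."""
    shift = {owner_sid: 6, group_sid: 3, EVERYONE_SID: 0}
    value = {"r": 4, "w": 2, "x": 1}
    mode = 0
    for sid, rights in aces:
        if sid not in shift:
            # An extra SID means this is a richer ACL, not plain mode bits.
            raise ValueError(f"unexpected SID in mode-bit mapping: {sid}")
        for letter in rights:
            mode |= value[letter] << shift[sid]
    return mode
```

The `ValueError` branch stands in for the client-side work described above: when the returned ACL does not fit the mode-bit pattern, the client has to decide whether it is looking at a mapped POSIX ACL or a genuine Windows ACL.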
And I was thinking about it, I think, given a well-known SID for the ACL mask, we can actually cover the ACL mask in the POSIX draft ACLs. We may need to create a well-known SID for that. I don't think Microsoft has one right now, but we already have a well-known SID namespace that Volker invented for POSIX permissions directly.
So how are we... So let's presume we've got all the
features worked out.
How are we actually going to prototype it?
Oh, yes, thank you.
I guess we could work along
the development system.
Just briefly go into what you do about
UIDs and GIDs.
So the comment is the UIDs and GIDs.
So right now, at least in Samba, we have an S-1-22-1 and, I think it is, an S-1-22-2.
And the 1 is the UID space, followed by just a number, which is the UID.
And the 2 is the GID. Now, Microsoft have a different mapping that they use for their NFS ACLs,
so we can pick one or the other,
or we could specify that either of those mappings is allowable.
Because, to be honest, the client's going to have to sort it out.
So, you know, it's a matter of when the client gets an ACL
back, the client is going to have to do some interpretation as to what kind of thing this
really is. It's going to be straight Windows ACLs, nothing but Windows ACLs, right? They may look weird for a draft POSIX ACL mapped directly to Windows, but it will be on the wire a Windows ACL.
No translation, no straight UIDs, GIDs.
Everything goes through the Windows get and set.
Doing it that way allows us not to have new info levels
and to leverage the existing get security descriptor,
set security descriptor code that everybody has to have anyway.
So, does that answer the question?
So how are we going to prototype this up? So actually we have a really nice client,
like top level client library that is already used inside the GNOME VFS and other things.
cli_XXX.
It's not... We have an ABI that we don't break,
which is libsmbclient.
Internal to that, it then calls into these internal functions,
which we don't guarantee an ABI on,
which we can change all the time.
And this kind of allows us to rapidly prototype.
The great thing about having this mapping from external ABI to internal interface
is that it actually allowed us, without changing the external ABI whatsoever,
to add SMB2 and 3 support into the external interface.
So as Linux desktops upgraded and the new versions of the Samba libraries appeared,
the GNOME that they were running, the GNOME VFS and hopefully the KDE one as well,
gained SMB2 and 3 capabilities without any changes being needed in the desktop UI software,
which was really nice because I hate that stuff.
I wouldn't want to have to get them to change it because it would be horribly wrong.
So, you know, obviously we have to agree with Steve and his collaborators.
We can rapidly prototype this in a work-in-progress branch
using the cli_ calls.
Eventually, it's going to have to go into real clients
that sit in the Linux kernel.
And, you know, we're not going to do the encryption support. Basically, I want everything to actually have real code behind it. There were a bunch of things, at least in the SMB1 Unix extensions, that were kitchen sink. They were thrown in there as, okay, let's do it this way, and no one ever implemented it.
And because of that,
it turned out that it was a bad design
that was unimplementable,
like the encryption context support.
And then there's me being mean about the BSD of the Month Club, of course, and Solaris.
Of course, there's only one BSD that matters.
FreeBSD, yeah.
But, no, you didn't say that.
Anyway, so, you know, it would be nice if they have clients.
I don't think BSD has an internal kernel client,
so they can use libsmbclient or the smbclient stuff that we make available.
Solaris, I know, does have an internal client.
Oh, okay.
I'm kind of finishing up anyway. So the devil is in the details. And I began to realise
this is one of the reasons why the effort kind of slowed down of late because of a bug
that we recently had that showed how subtle and complex this stuff can be. All of a sudden
Samba started crashing, which was a bit of a panic for certain customers, and it turned out that this
was just something we'd never tested before. Someone was operating both
Windows and Linux clients against the Samba server, the Windows clients had
added named streams to files, and when the POSIX clients tried to delete them,
they tripped into a code path
that had never been tested and blew us up, because we were attempting to treat the stream
name as a POSIX file name, and it ended up basically recursing and blowing the stack
up because it's like, oh, okay, here's a new file with a stream name on the end, let's call
delete on that, and then something split it off, and
then put the stream back on again, so
bad things happened.
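One way to picture the fix for that kind of bug is to split the stream name off exactly once and never glue it back on before calling delete again. The sketch below is a toy model, not Samba's code: it assumes the Windows `name:stream` convention, and `split_stream`, `delete` and the in-memory `store` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

def split_stream(name: str) -> Tuple[str, Optional[str]]:
    """Split 'file:stream' into (file, stream); stream is None if absent."""
    base, sep, stream = name.partition(":")
    return base, (stream if sep else None)

def delete(name: str, store: Dict[str, Set[str]]) -> None:
    """Delete a file (with all its streams) or a single named stream.

    `store` maps each base file name to the set of its stream names.
    The name is split once and there is no recursive call, so the
    stream suffix can never be re-appended and deleted again.
    """
    base, stream = split_stream(name)
    if stream is None:
        store.pop(base, None)  # removing the file removes its streams too
    else:
        store.get(base, set()).discard(stream)
```

For example, deleting `report.txt:Zone.Identifier` removes only that stream, while deleting `report.txt` removes the file and whatever streams are left on it.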
So
how many of these do we need?
Some of these are going to be sort of Samba
implementation notes, pretty much like
the Windows implementation notes that come out
in the
MS protocol specs, and
we don't yet know how many of those will be Samba implementation notes
and how many of those will be actually critical protocol-specific details.
Now, I know some of them are critical protocol-specific details,
like what happens if you have a Windows handle
and a POSIX handle open at the same time
and the POSIX handle tries to delete?
What happens if you've got Windows and POSIX
and they both have conflicting locks?
Or one of them does a write against the locks on the other?
All this kind of thing I know have to be protocol-relevant details.
Some of the other things may end up just being Samba implementation notes.
So how do we know when we've won and we're successful
and we have a happy POSIX to POSIX SMB3 world?
When someone else implements it,
and I'm picking Microsoft here, but it could be anyone else.
It could be EMC, it could be NetApp, it could be Isilon, whoever.
It would be really nice to have a second implementer.
At that point, we know for sure which things are Samba implementation details and which things are
actually critical details
we need to document properly.
Hey, maybe Azure would
be easiest.
Because it's new and it doesn't have all the old
crufty crap inside it like we do.
So the
main thing I want people to
take away with is
let's take our time and get this
right. And I know it's been many years already, and we don't seem to have progressed, but
I think we actually have, because we've learned a lesson of don't screw it up. And that's
a valuable lesson to learn. And we did screw it up last time, and let's not do that again.
Now, I have great faith that we will screw it up, but we'll screw it up in a different
way this time, which is my definition of learning.
I knew it. Interesting.
Exactly. We'll screw it up in a way we didn't
expect.
But hopefully, if we have more than
one implementation, that will really
boil down the things that we screw up to
problems that
more than just me and Steve
missed.
So, oh, ah, yes, TBD.
Okay, I did give them to the SNIA people,
so wherever the SNIA people put the slides
is where the slides will be.
Oh, great.
Wonderful, wonderful, sorry.
Great.
So are there any other questions, comments,
anything you want to go back to?
Oh, go on, yeah.
Do you have a spec under development somewhere
for someone who wants to follow
the development of this?
So the question is, have I done
anything yet? Have I written anything
up yet? And the answer to that is unfortunately
no, not yet.
When I do, I'll
probably post it.
the best way to do this actually
might be,
I think we have a mailing list.
We could mail things around on a mailing list,
change types of documents.
We could also do it in a wiki.
Somebody did,
I think James started to write something up because he wants some extensions to go in
to do with being able to tell servers,
I need an infinite timeout on this write and read,
that he would like to make part of this implement
and make part of these changes as well.
So, yeah, I mean, I'm not going to be doing anything for probably about a month
because I've got the trip to Redmond next week.
But I will, at some point, try and write
down and I will shamelessly steal the
template from the Microsoft documents
and try and make something that at least looks
like it could have been put into those.
So,
you know, October, November time,
hopefully we'll have something for people to look at and go,
that's wrong!
Yeah, any other questions?
Yeah?
So for your class example,
should you consider adding it as an info class?
Because there's like four different info classes.
If you added one, you'd have a full UCHAR
or the 0, 0, and the above.
Oh, that's a...
So David's comment is,
you have an info class, you idiot,
why don't you split the two bytes into two
and do it that way?
That's a really clever idea.
We should do that.
One more note.
We investigated 3.1 with negotiate context
for encryption algorithm negotiation.
Yes, I did mention that.
Okay, yeah.
Sorry.
You want to flag POSIX, like, my server supports POSIX.
Yeah, so the comment was, you know,
I already did mention that we've got negotiate context.
But to be honest, I'd rather not do that. I mean,
maybe we could do that, but the way I would like to do it is for a
client to just ask on the create context, and if the server just ignores it, then,
you know, don't ask for POSIX on this server. At that point you don't have
to do any negotiation because you already know
at the open time. Yeah, Michael.
Just commenting directly, I mean,
not using the negotiate context
makes it possibly easier to use SMB2
as well.
Just using the negotiate context
and that makes it easier.
So the other thing would be restricting to SMB2.
Oh, that's a good point.
Yes.
So Michael's comment was,
if we don't depend on the SMB3
negotiate context features,
then theoretically you could drop it
into an existing SMB2-based implementation
and still have everything except the encryption.
But the encryption is part of the SMB3 protocol part anyway.
I mean, the other risk I see would be for client owners, like Steve.
The first operation he wants to do is create a new file.
He might like to know if it's going to be with
or not before he sends it to the browser.
So it forces him to.
I don't think he cares.
If it fails, he's got to cope with it anyway.
The scenario to think about is, OK, so I'm trying to create...
Apple, as you know, and the Linux client, will default to, if it has one of these seven characters,
will be massive.
The service supports POSIX,
we know the service supports POSIX,
but not now. So what I was planning on doing
was opening dot
on the share.
The first thing, right? Open dot on the share.
So the
comment is, I might really need
to know at least on this connection if you do
do POSIX, but as Steve just pointed out, you can do POSIX create context open on dot, which
essentially has the same effect.
And if the server responds to that, then you know it can negotiate POSIX.
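The probe being described, open "." with the POSIX create context and see whether the server answers it or silently ignores it, might look something like this in outline. It is a simulation under stated assumptions: `POSIX_CTX` is a placeholder tag rather than a real create context GUID, and `server_create` stands in for a real SMB2 CREATE exchange.

```python
from typing import Dict, List, Set

POSIX_CTX = "posix-create-context"  # placeholder tag, not a real GUID

def server_create(path: str, contexts: List[str], understood: Set[str]) -> Dict:
    """Simulated server: echoes back only the create contexts it understands
    and silently ignores the rest, with no error."""
    return {"path": path, "contexts": [c for c in contexts if c in understood]}

def probe_posix(server_understands: Set[str]) -> bool:
    """Open '.' on the share with the POSIX context attached.

    If the reply carries the context back, the server does POSIX.
    If it was ignored, stop asking for POSIX on this connection.
    """
    reply = server_create(".", [POSIX_CTX], server_understands)
    return POSIX_CTX in reply["contexts"]
```

The point of doing it this way is that nothing new has to be negotiated up front: the answer falls out of the first open, and a server that knows nothing about the extension behaves exactly as it did before.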
So I'm trying to avoid having to coordinate with you as much as possible.
It's not that I don't like talking to you.
It's just that these things are harder when you've got to sort of carve chunks of namespace out or something.
The only other case I would leave, even if we never did it,
if there's ever a V2, because we messed up something at V1,
having some context of whether the server understands V2 versus V1,
I don't know what the context is. Ah, but you changed the GUID on the create context for that.
You send the one you want, and if it ignores it,
you send the lower one.
Oh, good point.
Thank you, of course.
Yes, that's smart.
Sorry, this is getting into a discussion.
It's hard to track with all that.
So for people listening, we were discussing the best way to handle
having to rev the protocol with a negotiate context would be like that.
And the conclusion that we came to from Tom there was that the best way to do that
would be to rev a different GUID for the new protocol.
And then when you do the open, the first open when you want to find out
what protocol they support, you send
both GUIDs, you send both create
contexts and the server
picks the highest level it understands.
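As a rough sketch of that versioning scheme: each protocol revision gets its own create context GUID, the client offers every revision it speaks on the first open, and the server answers with the highest one it understands. The identifiers below (`POSIX_V1`, `POSIX_V2`, `server_pick`, `negotiate`) are hypothetical placeholders, and no V2 exists; this only illustrates the selection logic.

```python
from typing import List, Optional, Set

POSIX_V1 = "posix-v1-guid"  # placeholder, not a real GUID
POSIX_V2 = "posix-v2-guid"  # placeholder for a hypothetical later revision

# Server-side preference, highest revision first.
PREFERENCE: List[str] = [POSIX_V2, POSIX_V1]

def server_pick(offered: List[str], understood: Set[str]) -> Optional[str]:
    """Return the highest revision both sides know, or None if the
    server ignores every offered context."""
    for ctx in PREFERENCE:
        if ctx in offered and ctx in understood:
            return ctx
    return None

def negotiate(server_understands: Set[str]) -> Optional[str]:
    """Client side: send both create contexts on the first open."""
    return server_pick([POSIX_V1, POSIX_V2], server_understands)
```

An old server that only knows the first GUID simply ignores the second one, so a newer client falls back without any extra round trip.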
So,
right, any other questions,
comments? Wow, I actually
filled up the time, amazing.
Alright,
well I think we're having a panel discussion later on where we can continue this.
So what time is that Steve?
Tomorrow.
Ah, it's tomorrow. Okay.
Yeah, so I mean anyone who would like to help edit documents or at least have input into how we think we should do this,
that would be very helpful. It would be nice to get at least some preliminary level of documents done by no later than the
end of the year. So that's, I'll put that on my OKRs or something at Google. So one
last question from Steve.
The comment about the possible extension is very valid because until we actually see the wire traffic, we won't see it. For example, you can get a readdir, you know, readdir has five times the traffic that NFS has on readdir.
So the comment is we may want to change it. But to be honest, rather than sticking this out in production and finding we screwed it all up, I would at least like to have some private testing. Even if it's put out there in production,
at least for Samba releases, I would like to have the ability to say this is a draft
implementation feature, we reserve the right to destroy this and turn it off at any point,
or change it, this is not a stable solution.
Until we get to the point where we think
we can do POSIX to POSIX and everything seems to work,
and then we can say, right, we have a protocol
that we can at least preliminarily have a supported implementation of.
And I think with that, I have to let Metze take over.
All right, thank you very much.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further
with your peers in the developer community.
For additional information about the Storage Developer Conference,
visit storagedeveloper.org.