Storage Developer Conference - #20: SMB3 Multi-Channel in Samba
Episode Date: August 31, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 20.
Today we hear from Michael Adam, Principal Software Engineer with Red Hat and the Samba team,
as he presents SMB3 Multi-Channel in Samba from the 2015 Storage Developer Conference.
My name is Michael Adam, or Michael Adam, which is much easier to pronounce for most.
I'm one of the Samba developers; I've been working on Samba for several years now.
I think we're at seven, eight years by now.
I'm currently working with Red Hat in the storage team.
So we are working on improving SMB experience
on top of distributed file systems.
Yes, please, comments.
And ever since Microsoft had started announcing
the SMB3 protocol version,
which was then still called SMB2.2,
we have tried to implement these various features.
Yesterday, Ira Cooper has given an overview of the status in Samba.
And so after a long time of designing things
and making little progress,
mostly due to, well,
we are just not very many developers
and it's difficult to get
dedicated resources.
I want to present to you today
the status of multi-channel,
which is arguably one of the most
advanced development projects
we have in the Samba SMB3 effort.
A brief recap of yesterday's overview of the status of SMB3 and Samba, which not all may have seen.
So SMB3 consists of many features.
Most of the features are centered around clustering,
all active and failover clustering of the protocol.
Let's take this one. Sorry.
Okay, so there are basically by now three dialect versions,
SMB 3.0, 3.02, which was released with Windows 8.1,
and SMB 3.1.1, which is very recent, released with Windows 10.
And so what do we have in Samba?
We have, let's say, the easy parts.
We have the negotiation of the protocol.
We have the new crypto algorithm, secure negotiation.
We have extended our durable handle support to include the new version of just requesting durable handles.
What I'm going to talk about is multi-channel, which is, well, let's say the most universally useful feature of it,
because it's not related to clustering.
It also, well,
brings benefit to non-clustered servers as well.
So the clustering features are a work in progress.
You will hear about them in a later session this afternoon by Günther Deschner and José Rivera.
Then persistent file handles, which is sort of like durable files with guarantees we don't
have yet.
We have the underpinnings, but we have to implement the guarantees, basically.
And clustering features, some clustering aspects are sort of a prerequisite for that.
So there are tracer patches, basically, but no finished things.
And that's, from my point of view, the most important aspect. So multi-channel and a little
bit of an outlook towards SMB Direct support, which is SMB3 over RDMA, will be part of this
talk. As you see, these things have been released since 2012, so very shortly
after SMB3 was released by Microsoft, and the new protocol versions have
just been released with Samba 4.3. But beware, these are just, let's say, the
basics of the protocol. The advanced features are all negotiable
capabilities, and multi-channel is the topic of this talk.
What I'm going to talk about is
not just my work,
it's a joint work of several
people, most notably I've been
working together with Stefan Metzmacher, so
he has done
much of the designing and
development.
I've recently been able to pick up our joint effort and drive it forward.
Okay, so what is multichannel?
Let me just ask, who doesn't know what Samba is?
I forgot to ask in the beginning.
Great.
So I will briefly explain about multichannel.
I won't explain a lot about Samba,
but just in the ways important for me to explain our design choices.
All right.
So what is it?
Basically, multi-channel is an SMB protocol means to bundle multiple transport connections into one authenticated SMB session.
The important aspect here is that the client can query the server for information about the interfaces,
and it can choose which interfaces to connect to and to bind into one session.
Usually, the server, you will see this in a little more detail further on,
the server reports interfaces and bandwidths
and characteristics of the interface, whether it's TCP- or RDMA-capable.
In this talk I'm only talking about TCP for now.
So usually the client chooses the fastest interface it can get, if offered multiple ones.
And the one important thing is that the session is really valid as long as one last channel is still active on the session.
So you can bind multiple channels.
You don't need to use them all, but you won't lose the authenticated user context on the server as long as there's at least one channel.
You can plug multiple cables, have multi-channel session, unplug certain cables,
a switch dies in between, as long as there's one path to the server, everything is intact.
So that explains that there are basically two purposes.
One is increased throughput:
I can bundle several channels of the same quality to increase throughput,
because the client will then spread out the load, the file IO, over the
various connections. And it also serves the purpose of improved reliability,
so it's not the case that when a single channel fails, the whole session fails. That's a general thing.
Okay.
It stopped
working.
Okay.
What can I do?
Okay. I already covered that.
Channels of the same quality will increase throughput
and fail back to a channel of lesser quality will keep the session intact.
Okay, how does this work from the protocol level, from the protocol point of view? So the client, the SMB client, first establishes one transport connection,
one TCP connection.
Then there is an IO control, the FSCTL_QUERY_NETWORK_INTERFACE_INFO ioctl,
that will be sent to the server on this first connection.
The server will reply with a list of interfaces and the qualities of these
interfaces. The client then chooses to bind additional TCP (or, with SMB Direct, RDMA) connections,
then called channels, to the already established SMB3 session. This is done with the so-called
session bind, which is just a special form of the session setup request.
In the session setup request there is a session ID field.
This is usually set to zero when establishing the initial connection, and the server replies
by providing the session ID it generated.
The session bind, apart from one flag that is passed,
the binding flag,
also gives the session ID that the client wishes to bind to.
So that's the session bind.
And as I said, the client usually uses only the connections of the best quality.
So if I have three gigabit interfaces and one 10 gigabit,
usually the client will choose the one 10 gigabit interface
with the option to fall back to the slower ones
if this faster interface fails.
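To illustrate the mechanics, here is a small sketch; the flag value and field names follow the MS-SMB2 descriptions, but the structure is simplified and is not Samba's actual parsing code:

    /* Sketch: a session bind is just a session setup that carries the
     * binding flag and, in the SMB2 header, the ID of an existing session.
     * Simplified for illustration; not Samba's real structures. */
    #include <stdbool.h>
    #include <stdint.h>

    #define SMB2_SESSION_FLAG_BINDING 0x01   /* per MS-SMB2 */

    struct smb2_session_setup_req {
        uint64_t hdr_session_id;   /* from the SMB2 header: 0 for a new session */
        uint8_t  flags;            /* contains SMB2_SESSION_FLAG_BINDING for a bind */
        /* ... security buffer and the rest omitted ... */
    };

    static bool is_session_bind(const struct smb2_session_setup_req *req)
    {
        return (req->flags & SMB2_SESSION_FLAG_BINDING) &&
               req->hdr_session_id != 0;
    }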
One important
thing for us: because
Samba has for many years supported
all-active
SMB clustering with the help of
the CTDB software,
one important question is,
with multi-channel,
does the client usually only connect to a single
node of the cluster, or does it connect to multiple
nodes? It usually just connects to a single
node. That's an important thing for us to remember.
And then, in order
to make this thing robust, there are
replay and retry mechanisms, so-called
epoch numbers, channel epoch numbers,
associated with the whole
concept. These things,
I won't go into the details at this point,
but they need consideration and implementation in Samba.
Yes, question?
So you said that Windows binds only to a single node,
but if the server is like a cluster thing,
does Windows make another connection using a specific IP address?
How can they prevent it from going to a different...
That's in the response.
So the server has the choice, when you ask for interfaces,
to respond with just interfaces from the same node.
And as far as I'm aware,
this is what happens.
You see?
But failing over to another node is a different thing.
So in that case, all
SMB requests, read, write,
and so on, arrive at
the same hardware,
at the same host, and then
you don't need to coordinate
between the nodes how it goes to disk.
That's, I think,
the reason.
Okay.
Now about multichannel in Samba.
What did we have to think about when implementing it?
One very important thing you have to know about Samba
is that Samba is not a multi-threaded daemon but a multi-process one.
So traditionally there is a one-to-one correspondence between TCP connections to Samba and forked child processes of the main smbd server process.
That was a design decision many, many years ago.
It has many advantages.
It's also arguably more resource intensive and so on.
But it's what we are living with today
and it has worked very well.
In this case, we needed to think about it
because look at this.
Assume the client has one session here,
and we want to do a multi-channel connection.
So, or I can use this one.
We establish a second connection,
which automatically creates a new process,
and then we are in this situation
that we have two processes
with connections bundled to a single session
that then have to coordinate how they access the actual file system.
And we wanted to avoid that.
So what can we do about it?
We were thinking, well, the idea is to just transfer the new TCP connection
to the process that's already serving this session.
So this is the established session.
It's already accessing the disk, possibly.
The new connection arrives.
Let me check whether this doesn't...
It's...
Well, devices.
Then we want to have a means of passing or transferring the connection over to the original process,
the first smbd, and then bind it as a channel into the session there, so that all the network traffic arrives at the same process here,
and then the disk access is automatically coordinated. So yeah, so how can we do it?
In Linux, there's a mechanism called FD passing. It's part of the sendmsg and recvmsg
calls, and that was the natural candidate to try and achieve something like that.
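For reference, a minimal sketch of the underlying Linux mechanism he is referring to: passing a file descriptor over a Unix domain socket with sendmsg() and SCM_RIGHTS. Samba wraps this in its own messaging layer, so this shows only the bare system-call pattern, not Samba's API:

    /* Minimal sketch of Linux fd passing over a Unix domain socket using
     * sendmsg() and SCM_RIGHTS; the receiving side uses recvmsg() with a
     * matching control buffer. Samba's messaging layer wraps this pattern. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int send_fd(int unix_sock, int fd_to_pass)
    {
        char data = 'F';                       /* at least one byte of payload */
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };
        union {                                /* properly aligned control buffer */
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } u;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;          /* "pass these descriptors" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(unix_sock, &msg, 0) < 0 ? -1 : 0;
    }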
So when do we do it?
There is a natural choice when session bind happens.
But we thought it might be better to do it as early as possible.
And so the idea was to do it based on the so-called client GUID. In the SMB packet, starting with SMB 2.1, so it's always available in SMB3,
the client provides a so-called client GUID, identifying the client, basically.
And so instead of waiting for the session setup, for the bind request, and then looking up which smbd process serves this
session ID, we can just proactively
move all connections
that belong to a certain client
GUID to one process and basically
establish a per-client-GUID
single-process model for it.
The reason for this idea
is that
if this is done later,
then on this TCP connection
other operations
may already have happened.
In theory,
the client could have
established a different session,
one which is not to be bound
to the original session
but is a second session.
And so what about everything
that has already happened in that process?
If we then pass the connection, we have to take care of more things.
So we wanted to keep it simple
and therefore came up with the idea
to pass it over by client GUID in the negotiate request.
So when the first request comes in,
the first SMB request ever basically,
the negotiate, provides the client GUID,
we say, aha, we pass it over,
and then the response for even the first
SMB packet is coming from
the original process.
This would look like
this. So this is the flow
of packets, basically.
I'm not showing all the TCP details
and so on. So here's the TCP connection.
This
initiates a fork, creates
child 1 of the main smbd process.
I didn't include the TCP ack and so on, right?
So there's the SMB2 negotiate request.
It enters here, comes back, session setup.
This is the initial, the first connection which establishes the session.
Then we have a second connect, TCP connect here.
This forks a new child process.
And then we receive the negotiate request.
We look up the client GUID that we can extract out of the SMB packet,
and we look it up in our internal tables.
Okay, there is child 1.
This is the process serving this client.
We pass it the TCP socket, which is an FD,
which is a file descriptor.
This can be passed with the FD passing.
And so the whole TCP connection is passed over to the original child 1,
and the negotiate reply is sent from this process here.
This is the basic idea.
This process doesn't have anything more to do and can just die.
It goes away.
The session setup bind request then already arrives at this original smbd child 1,
and everything is much easier then.
So that's a little more detail of the flow of things with this idea.
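In rough pseudo-C, the dispatch idea in the freshly forked child looks something like this; all helper names are invented for illustration, and Samba's real code goes through its internal server records and the messaging layer:

    /* Sketch of the per-client-GUID dispatch in the freshly forked child.
     * All helper names are invented for illustration; Samba's real
     * implementation uses its internal smbXsrv records and messaging. */
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct negotiate_req { unsigned char client_guid[16]; /* ... */ };

    /* Hypothetical helpers, declared only to show the shape of the logic. */
    extern pid_t lookup_smbd_for_client_guid(const unsigned char guid[16]);
    extern void pass_connection_to(pid_t owner, int tcp_fd,
                                   const struct negotiate_req *req);
    extern void register_client_guid_owner(const unsigned char guid[16], pid_t pid);
    extern void process_negotiate_locally(int tcp_fd, const struct negotiate_req *req);

    static void handle_first_negotiate(int tcp_fd, const struct negotiate_req *req)
    {
        pid_t owner = lookup_smbd_for_client_guid(req->client_guid);

        if (owner > 0 && owner != getpid()) {
            /* Another smbd already owns this client GUID: hand it the whole
             * TCP connection plus the pending negotiate via fd passing, then
             * exit. The owner sends the negotiate reply and later handles the
             * session bind arriving on this connection. */
            pass_connection_to(owner, tcp_fd, req);
            exit(0);
        }

        /* Otherwise this child becomes the owner for the client GUID. */
        register_client_guid_owner(req->client_guid, getpid());
        process_negotiate_locally(tcp_fd, req);
    }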
There's a question.
Is the design decision to use a single process
potentially going to bottleneck
you
if you have a client that would
be load balancing
over multiple connections
and you have to go through one child?
Wait a minute. What about performance?
Exactly. That was the question, right?
I mean, we had this
one SMBD per TCP connection, and this scales out quite well.
But also there, we've had performance problems.
And so Samba is not purely multi-process, single-threaded anymore.
In these processes, we do use short-lived worker threads, pthreads, for IO operations.
And so the most important things that could be blocking are
already fanning out over the CPUs and so
on. So this is
not proven. These are heuristics
that this will work. We still
need to do benchmarks and possibly some
tunings.
What we are doing really is
forking for
connections and then using threads
in order to scale better.
This is
a very good question and I already
anticipated the question of course in preparing
the answer because it's the obvious one.
So
as I said, still needs proof but
I think this will
work out quite well.
One of the next things on the agenda is really doing these tests, these kinds of tests.
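As a toy illustration of that approach (not Samba's actual pthreadpool/tevent API, just the general shape of pushing blocking IO onto a short-lived worker thread), consider:

    /* Toy illustration of pushing a blocking pread() onto a short-lived
     * worker thread so the main event loop is not stalled by disk IO.
     * Samba's real code uses its pthreadpool/tevent machinery, not bare
     * pthreads like this. */
    #include <pthread.h>
    #include <unistd.h>

    struct read_job {
        int fd;
        void *buf;
        size_t count;
        off_t offset;
        ssize_t result;
    };

    static void *read_worker(void *arg)
    {
        struct read_job *job = arg;
        job->result = pread(job->fd, job->buf, job->count, job->offset);
        /* In a real server the result would be handed back to the event
         * loop (for example over a pipe or eventfd) rather than joined. */
        return job;
    }

    /* Returns 0 on success; the caller later collects the result. */
    static int start_async_read(struct read_job *job, pthread_t *tid)
    {
        return pthread_create(tid, NULL, read_worker, job);
    }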
Okay.
So, all right.
We had this, but there may be problems with the choice of using the client GUID.
It was brought to our attention by Tom Talpey of Microsoft
that the relevance of the client GUID may have changed in SMB.
Yes, there's a question?
So you mentioned the fault tolerance, right?
Fault tolerance, yes.
So now one process is serving the two connections; is that the only change?
Yes. Yes?
Well, I mean, there can be several reasons for a connection failing.
If there's a reason somewhere in between,
because a network cable is unplugged, a switch is dying or so on,
this is covered. Even if
a network adapter on the host fails,
that's covered. If the process
crashes, that's not covered.
But I mean,
I think that's the same situation in Windows
or threaded approaches.
So hopefully,
in that respect, the fault tolerance
should not be affected as much.
By the way, I mentioned
that for other
SMB features,
the witness protocol
especially, we will have a talk later today.
Directly after this talk,
we will hear a talk by
Volker Lendecke about our messaging,
which includes... so it is
mainly due to him that we now have the
FD passing; we didn't have that before,
and that's one of the important preparations.
And so he did
a whole new
Unix datagram
based messaging system inside Samba,
which is what we are building on here.
So you can learn about that in greater detail later.
Just as a side remark.
So there may be problems with the choice of using the client GUID.
This needs to be thought about,
but the possible problems are even more severe,
because our assumption, of course,
is that whenever a client tries to do a session bind
from one connection binding it to an existing session,
these two connections will use the same client GUID.
We thought the client GUID is the identifier of the client as an entity,
and this is only reasonable.
And we also assumed that the server actually enforces this,
I mean the Windows server.
And there's some evidence from the MS-SMB2 document.
I have noted two sections here:
receiving the create request,
and replay detection, which is an aspect of multi-channel.
Replay means one channel fails, so the network dies.
Another channel is intact, and the client doesn't know, because it didn't receive a reply packet for a certain packet,
whether the server has received it.
So it resends the packet over another channel
and marks it as a replay operation.
And this check, in the document, checks whether the client GUID is the same.
If it's not the same, then it's rejected.
So there is evidence.
There is more evidence that the client GUID is checked in various places. But the truth is, as I learned from Tom Talpey, the server
doesn't enforce it. And so we thought, oh well, we have to rethink
everything, that doesn't work, we can't rely on it. But yeah,
the latest information is the server does not enforce it, but our assumption is actually true in practice.
Clients can be expected to do it, because if they don't, something may really not be working.
And it's not explicitly documented like this in the docs, which created a lot of confusion for me in the last few weeks, actually.
The good news,
I heard that this will be documented.
It will be noted that
it may be a bad decision, even if
the client is free to choose a different
client GUID.
It may be a bad thing, and
it will be noted that it's completely okay
for server implementers to enforce
the equality of the client GUID
in multi-channel sessions.
Yay.
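So a server-side check along these lines becomes legitimate. A minimal sketch, with assumed types, that just compares the client GUID the new connection negotiated with against the one recorded for the session being bound:

    /* Sketch of enforcing client GUID equality on a session bind, which
     * the updated documentation reportedly allows server implementers to
     * do. The types here are assumptions for illustration only. */
    #include <stdbool.h>
    #include <string.h>

    struct smb_connection { unsigned char client_guid[16]; };
    struct smb_session    { unsigned char creator_client_guid[16]; };

    static bool session_bind_allowed(const struct smb_connection *conn,
                                     const struct smb_session *sess)
    {
        /* Reject the bind if the new transport connection negotiated with
         * a different client GUID than the one that created the session. */
        return memcmp(conn->client_guid,
                      sess->creator_client_guid,
                      sizeof(conn->client_guid)) == 0;
    }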
Yes.
Is there an alternative?
Or is there no alternative?
Is there anything else the Windows server could use?
Does it need one?
Yes.
You can assume that the Windows client always uses the same client GUID for... The Windows server?
Does the Windows server have to deal with the same problem,
identifying connections coming from the same client and associating sessions?
As far as I know, it has no other means than using the client grid.
Yeah, it doesn't have.
And so the obvious assumption should be that it's the same.
And this algorithm for checking the replay and validating the replay operation, I mean, it looks obvious.
But it's not documented as enforced.
Apparently, it is not enforced, but we can assume that it's like that.
So, well, a little strange situation. While we're at it, the client GUID also popped up in a different context,
namely the context of leases.
This is just a digression here, right?
But we were scanning the document.
We were looking.
So leases: basically the
client-side caching mechanisms that are handed up from the file system
through SMB to the client.
Leases are identified by lease keys.
And if you look in the document, there are algorithms for leasing in an object store, I think it was called, for when the client requests a lease.
So the client goes to the SMB server,
the SMB server
process on
Windows, basically,
requests a lease from the file system.
And the
just updated documentation,
just as of the release of Windows 10,
says, okay, up to Windows 7,
in the behavioral notes, the footnotes:
up to Windows 7,
the client identity,
or client lease ID
as it's called
as of now, consists basically
of the client GUID
and of the lease key.
Both come from the protocol,
so this is
combined into a certain numeric entity
and then provided to the file system as the lease identifier. Starting with Windows 8,
only the lease key is used, according to the newly updated document. But there's a problem
there, because all the other things, breaking a lease for instance, the object store indicating a lease break,
refer to, well, okay, the server takes the client GUID
that the file system has given it back
and looks in the global lease table list
and identifies the lease table to use,
and then uses this lease key to look this up.
So this is, from my point of view, inconsistent.
I don't really know what to do.
But, well, I'll have to report that to DocHelp so that this gets amended.
It's not so bad.
I mean, it's mostly an implementation detail, if you wish.
I think implementers have the choice of whatever key to use. And so we are currently identifying our leases in Samba with a combination of lease key and client GUID.
So we have this hierarchy that is referenced there.
It's just a note because we recently stumbled across this.
And, yeah, that's the situation.
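Conceptually, the lease identifier Samba ends up using can be pictured as the pair of the two protocol values; a sketch for illustration only, not Samba's actual structures:

    /* Conceptual sketch of identifying a lease by the combination of
     * client GUID and lease key, as described above. Not Samba's actual
     * structures, just the idea. */
    #include <string.h>

    struct lease_id {
        unsigned char client_guid[16];  /* from the negotiate request */
        unsigned char lease_key[16];    /* from the create request's lease context */
    };

    static int lease_id_equal(const struct lease_id *a, const struct lease_id *b)
    {
        /* Both members are byte arrays, so a flat compare is safe here. */
        return memcmp(a, b, sizeof(*a)) == 0;
    }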
So it's in the document, but beware,
the recent version of it
is slightly inconsistent there.
Okay,
end of the digression.
Any questions so far?
Just, I mean,
I have included this, slightly modified...
Oh, there is a question back there.
Yes?
So what happens if you receive
a session setup without a bind flag?
Aha, if I receive
a session setup without a bind flag
on that second connection?
But with a provided
session ID or without?
So we have a session ID zero, no bind flag.
Yeah, then it's a new session.
A new session.
Then you'll spawn another child?
No.
Child processes are only spawned upon a TCP connect.
So, this was already possible before.
It's just that originally one process in Samba serves a TCP connection,
and this can very well have multiple sessions inside, right?
That has been very possible before.
For instance, heavily used by terminal servers or something
and so this is
still valid of course
we will just have more sessions
and more connections
inside a single process
we will have to deal with that
All right, so the plan B.
What was our plan B
if the client GUID thing hadn't been resolved?
The plan B would have been to really
not
pass on by client GUID in
negotiate, but pass later in the session
bind. But as I said earlier,
we'd have to deal with
more complicated,
well, bookkeeping
then, and
it seems we don't have to do this.
So, that's basically
the
explanation of
our design of what's there.
What's the status right now?
There's a long list.
Let's just briefly go through it.
The messaging rework, using
Unix datagram sockets with sendmsg and so on, is done.
I think you'll hear
by Volker a lot more about this.
So FD passing
has been added to messaging.
Then all our
internal structures, SMB structures
and so on have been prepared
to be able
to take multiple
channels for one session.
So this is already released.
This has been at least pushed upstream.
The code, the session setup code,
has been prepared to cope with multiple channels.
It's still single channel in Upstream Master
because there's no trigger for it
because the session bind is still missing.
But all this is in preparation. You see,
the smbd message
to pass
a TCP socket, with the
negotiate packet
along with it, to another process
is essentially done. It needs to be polished
a little, but I'll show you the
stage in the software branch later.
The transfer based on
the client GUID in the negotiate is essentially done.
It's working.
The session bind is also essentially done.
Well, we were thinking about whether we have to implement
the moving, the passing of the connection by session ID.
Currently we probably won't;
we will just stick with what we have. Then we need to implement these replay and retry things. These are in
progress, working to some extent. There are details here, of course, that need to
be fixed up. Interface discovery, this FSCTL that I mentioned first, is also
work in progress. The point is, what I'll show you soon is, I have code where we can configure the characteristics of the interfaces.
What we still need to do is, just like ethtool, retrieve the characteristics from the kernel, from the libraries.
And this is, of course, not portable.
So we need to think about what to do on non-Linux systems and so on.
But we have by now a means to configure this
manually,
basically. And, yeah, of course, we need test
cases, but that's always work in progress, isn't it?
And in order to really
use it in our self-test, we need the support
for FD passing in our
socket wrapper library, but
this is, well, designed at least
and work on that is starting also.
So it's either done, essentially done, or some things are still work in progress.
But we have made quite good progress.
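The "just like ethtool" part refers to the kernel interface ethtool itself uses. A minimal, Linux-only sketch of reading a link speed that way might look as follows; it uses the legacy ETHTOOL_GSET command for brevity (newer link-settings ioctls exist) and is exactly the kind of non-portable code the portability concern above is about:

    /* Linux-only sketch of querying an interface's link speed the way
     * ethtool does, via the SIOCETHTOOL ioctl. Uses the legacy ETHTOOL_GSET
     * command for brevity; not portable, which is the concern mentioned. */
    #include <linux/ethtool.h>
    #include <linux/sockios.h>
    #include <net/if.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Returns the speed in Mbit/s, or -1 on error. */
    static long get_link_speed_mbps(const char *ifname)
    {
        struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
        struct ifreq ifr;
        long speed = -1;

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) {
            return -1;
        }

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (void *)&ecmd;

        if (ioctl(sock, SIOCETHTOOL, &ifr) == 0) {
            speed = ethtool_cmd_speed(&ecmd);  /* combines the speed_hi/speed fields */
        }
        close(sock);
        return speed;
    }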
Okay.
Okay, it's open source.
Where's the code?
The most recent state you can see in my own private repository on samba.org,
git.samba.org.
The branch here that I'm currently using is this one.
And as I'm working mostly together with Stefan Metzmacher on this,
you'll also find copies of this branch
where we basically play ping-pong at some times.
All right.
Okay.
Some considerations for clustering.
As I said, we have this clustering where we also have CTDB, the failover of IPs and so on.
What we'll have to think about is that we only support channels to one node,
but this is in our control, because we craft the reply to the interface information request. One important thing that we noticed
is that we shouldn't bind
connections to addresses that can move.
So if an address is moved by a failover of CTDB,
that would be pretty bad.
And so, in our CTDB cluster,
we probably need to add static, public-facing IPs
to the nodes,
and only use these in the reply for interface discovery.
But that's just a detail.
Let me go a little more quick.
So when will we have it?
So my current estimate is we'll have it in the next major release,
which is Samba 4.4.
And according to our plans,
so Samba Upstream has just reviewed and renewed the plans for release schedules.
We just released 4.3,
and we are now going forward to a six-month release cycle.
So the estimate is that this will be released in March 2016.
So unless something weird happens, you can expect to have multi-channel support in Samba
then.
Okay, there is a few details.
I wanted to show how we do our internal structures and a little bit how we reorganize it, but
I think given the time I will just skip briefly over it and go to a short demo.
Any questions before the demo?
Yes.
My question is, could the client GUID consideration be a problem when you have multiple users on the same client,
basically like different users located in the same session?
Would they be...
Different users are usually different sessions,
because, I mean, the session is the authenticated user context on the server.
The client GUID would be the same,
and we end up having the FD passing.
If the client GUID is the same,
yes, we will end up in the same process.
So we might have multiple multi-channel connections in one process.
That's right.
So I mean my...
In the case of the oplock way,
could that be a problem,
because there are basically two different users?
We are changing
our user contexts in the
server regularly, so the Samba
process is changing its user identity
when acting on behalf
of one user or another user.
That's already happening when we have, for instance,
a terminal server or
some server where multiple users
are going
over one connection.
That's usually...
That shouldn't be a problem,
actually, because
there's already code to do exactly that.
I expect that once the FD passing is done,
it's treated as if both
are part of the same session.
All the other stuff,
like breaking all the other sessions.
Right, right.
Of course, if you, for instance, send a break,
this refers to an open file handle,
which in the case of SMB2
belongs to a tree connect, to a session,
and we identify where to send it.
Another question?
Once you have multiple connections that your traffic is moving over... Another question. Well, first of all,
it's the client sending the sequence,
and we of course have to
make sure that the replies are somewhere in order.
But still, I mean, that's actually...
We just reply to the packets as they come in.
And when the client is distributing the requests over multiple channels,
we send the reply back over the same channel as it came in, usually,
so it shouldn't be
difficult for the client to reassemble everything correctly.
And I think the same problem applies to
any implementation.
I haven't seen
a problem with that yet.
Oh.
Well, yeah, that's right.
So usually...
Yeah.
Right.
Clients are sending over a single...
Jeremy says we are doing that already
with our multi-threaded architecture for IO operations.
That's right.
And Windows clients, being multi-threaded, already send
multiple packets that
comprise a longer stream of operations
over the same channel.
I mean, starting with Windows 7, I think, it was heavily used.
And
so the...
I don't think so.
No.
Right.
So there is one more question. Yeah. Yeah, the client...
I have a couple of minutes,
I can show it to you.
The client gets the list of interfaces
with speeds associated
with them, which the server just provides,
and then chooses based on that.
So if there's one 10-gig interface
and a couple of gigabit interfaces,
it will choose to do traffic
only on the 10 gigabit.
That's actually what's happening.
And the server,
so in the server,
we have to implement
how to get these numbers, right,
to send back.
Was that the question?
Right.
And yeah, so one thing is, ethtool on Linux has the ability,
so it sends an ioctl, which is probably what we'll do on Linux.
It's not implemented yet, but we have, for testing,
we have a means to configure speeds for certain interfaces.
So if we expose, let's say, 1 gig and 10 gig to the client,
do we see multi-channel being used on both interfaces?
No. It only chooses
interfaces of the same quality.
You see?
If there are two 1 gigabit interfaces
and these are the fastest,
it will use them
and so it will use both channels to do
traffic. I will show you exactly that
using Windows client against Samba.
But if then a 10 gigabit interface is added, traffic stops on the 1 gigabit interface and only the 10 gigabit interface is used.
And it doesn't spawn multiple channels or anything, just uses one connection?
In that case, yes.
I mean, with this RSS, receive-side scaling, capability, it may even spawn multiple channels to a single interface, but it depends.
So usually it's potentially one connection for each interface.
And depending on that, so even if the connections are established,
traffic is just sent over the most powerful ones.
So let me just break out into the demo.
What's happening here?
Aha, okay.
Let me check.
So what do I have here?
Is that vaguely readable back there?
This here, up here?
I guess I have to make it a little bigger, right?
Okay.
Is that readable?
Yes.
Okay, cool.
So what do we have here?
I have here, this is a PuTTY session.
So I use it to have one view.
PowerShell on the Windows.
This is Windows 2012 R2 server.
This is Samba.
And it's just a single non-clustered sample server, very easy. It has here the Git checkout of my current work-in-progress branch.
You see there are a lot of patches, mostly by Metze and myself,
and so a lot of work-in-progress patches, really a lot of them.
Hack, revert, whatever,
those kinds of patches.
So we are slowly cleaning it up, and stuff that's ready percolates down and goes into master.
In the past couple of weeks, stuff has gone out. So I've compiled this, and we can start the server here.
Started.
So there's SMBD.
There are several processes here.
Main process and already two forked sub-processes, but these
don't serve connections yet
Let me now...
I have already prepared a watch job,
where
the top part
will show the TCP
connections,
not including SSH connections,
because this is SSH here,
and down here we'll see smbstatus.
I've
augmented smbstatus also to show the session ID,
just out of interest.
So what do we do here?
So, wait.
No.
I'm not really good at this.
Sorry for that.
I don't know how to use.
Aha, there it is.
Aha. Let me just delete it for that. I don't know how to use. Ah, there it is. Aha.
Let me just delete it for now.
Delete z colon.
So I'm going to use.
I did a new session.
So you see up here, session appears, and here TCP connection.
This goes to the address I've specified here
we should
see what interfaces we do have
so
oops
so we have
I'm not using eth0;
eth1 is the one we're using here;
eth2, eth3, and eth4 is not up.
So what do we have?
One, two, three, four.
These are the four interfaces we are using.
And now I can show you...
I have configured it such that...
So, that's what I meant.
We don't have proper detection of interface speed yet,
but I have included configuration,
so I can, in our interfaces list,
I can add speed information to the interface.
That's what I want network interface discovery to present to our clients.
So, the first one is the slowest.
This is 100 Mbit.
This is Gigabit.
And the last one, which is not up.
E4 was not in the list of interfaces you've just seen.
The last one is 10 Gigabit.
So, and down here in the windows, you see, oh, let me just switch back.
Get multi-channel connection.
It has already seen, oh, I could have shown the wire shock,
but I think the time is not enough to see the response of the,
I would have to go back.
So it has seen, oh, this one has the 10 gigabit speed.
It doesn't know yet that this interface is not responding.
So let's try to, how do I search in PowerShell?
I'm going to copy, copy, sorry.
There's a big file.
I'm copying it.
And we see, for now, it is still using this one, this interface.
The point is that you have to do it a couple of times.
And at some point, it will use the other channels. Now, the other channel connections
are established, they weren't there before
so you see
and now you see
traffic is not
so the
there was this loop
this is a gigabyte file
and this is
really not about performance.
This is just on my poor small laptop.
It's just VM and the Samba is inside a container.
Where's the for loop?
I have this nice PowerShell for loop,
but I don't know how to search.
Can anybody help me with PowerShell?
I'm sorry.
Crap.
It was there.
There it is.
No.
No, this was the wrong folder.
There it is.
Okay.
Copying.
We see traffic only happening on these two, .20, .30.
These are the gigabit interfaces.
So you don't see any data on the 100 Mbit interface while we're copying. And now, while
this copy is going, I will just enable the 10-gigabit interface. Remember, don't expect
it to be faster. It's just the same interface; I just cheated and told Windows this is 10
gigabit. It's just to demonstrate that
it's working in principle. This is not about performance, right? So what we have... eth4, up.
So, as before, it will take a couple of seconds and a couple of copies for it to detect this and then add the channel for the faster interface.
I hope it will make it before it reaches 100.
But, well, I can restart the loop then.
Okay. Okay.
Wow.
Any questions while we're waiting?
Steve?
So, one thing I was curious about is the RSS flag.
Is there any way that you can figure that out,
query that,
from its config or some API?
No.
Also, according to the last things I heard, the relevance of this is really reduced.
So I don't know whether Windows makes a lot of use of that anymore.
So we need to check that.
But I don't know.
I have to find out exactly these things. You asked that the original claim
was that flag was on the
other one.
Even if you only had one address.
Exactly, yeah.
And if that's true, it seems logical
that it comes from the address in the hallway.
But, by the way,
you didn't see it because you didn't see my
trace, but I did send the
RSS flag back to the client.
And it doesn't do multiple connections to one address here.
Doesn't.
So I did send it.
And I can also show you...
So when will it happen?
I don't know.
So one quick thing here.
Just waiting for this to end.
So there is the... Update-SmbMultichannelConnection.
So, oh, it has now added the .40. Get-SmbMultichannelConnection. And this is the voodoo of PowerShell, FL star, right?
Okay.
So, here:
Server RSS capable: true. I sent it. It sets it to true
even though,
despite that, it doesn't do
multiple connections. I don't know. Maybe there's
more to it. But you see now
it's only using this
.40 interface to do
traffic.
Yeah.
When you run the connection
command, is it sending an FSCTL to... I think so, yeah.
But the list of interfaces is the same.
So it doesn't change when an interface is torn down or brought up.
The interface is there.
The address is in principle there.
It was the same response, but it then checks anew. And so it just cuts down the timeout until it re-checks whether another, more powerful interface is available, basically.
Okay, so this is now working.
That's the end of the demo.
That worked.
Wow.
Kind of.
Let's just end this here.
And let's briefly go back.
I just have very little time, oh no, like this, to continue,
but I've covered most of the stuff I wanted to say,
since we had a lot of questions.
I just want to make a couple of brief notes about
SMB Direct,
which is, well, SMB3
over RDMA. So the transport is RDMA-capable network interfaces instead of TCP, and
SMB Direct itself is rather a small wrapper protocol around SMB, or SMB3
rather, to be put onto RDMA transports.
And here the reads and writes
really use RDMA reads and writes.
And so that makes for
reduced latency and so on.
This is done,
oh wait, this is
done via multi-channel.
So first channel is always
TCP, and then you bind
another transport connection with RDMA and do RDMA over that.
And so we need multi-channel first, and this is a natural follow-up to the multi-channel efforts.
So, what is the chance to get this into Samba?
There are the wire structures, which we've basically provided.
We have, well, the prerequisite, which is multi-channel,
which is work in progress.
Mainly Metze has
thought about transport abstractions
in our code and has some work-in-progress
patches flying around since
many, many months now, I think.
But there is a fundamental
problem, so we can't do it
the same way as we do it with TCP multi-channel,
because, I mean, what do we do?
We are forking, we're taking the connections over a fork,
and then we are passing the connection with FD passing.
All that doesn't work with RDMA,
partly because the concepts are different
and partly because, where they are similar,
they are not fork-safe;
they do not support passing RDMA connections
over to different processes.
There's no mechanism for that.
And so there is an idea
to create a central RDMA proxy.
Out of time.
Yes, thank you.
Just very briefly.
Either as a proof of concept,
a user-space daemon, to have a quick development turnaround; but for production, as was detailed a little bit
in Ira's talk yesterday,
we envision going towards a kernel module,
in order to have really direct
remote memory access.
So, just as a finishing thing here,
you will recognize
the similarity to the
diagram we had before.
It's just that there are a couple
of extra twists. So, the gray box
here is that RDMA proxy
daemon, which we call SMBDD for now,
be it a kernel
driver or kernel daemon or
user-space daemon. So, we have the connection to the first child process: negotiate, session setup, initial session, done.
Then we have an RDMA connection coming in.
This ends up in this SMBDD daemon, the proxy daemon, which listens on RDMA.
A proxy Unix domain socket is created and passed down to the main smbd. And from here on, we can use our usual fork semantics.
So a connection comes in, we fork, create a new child.
Then the negotiate request comes in,
and we have the client GUID in it.
We can pass it down to the responsible child process.
And this can pass now not the TCP connection FD but it
can pass this proxy FD for
communication between the proxy and
the process pass it down here and can
die. And then we have
something very similar: we have all the
SMB processing in the same
smbd process, and we have the
communication channel with the proxy
daemon, where, by establishment of a shared memory area,
we can use RDMA reads and writes to directly go to the memory we need.
So that's a very high-level idea how to modify our SMB multi-channel implementation
to also include RDMA support.
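A very rough sketch of that hand-off idea; everything here apart from the system calls is hypothetical, and it only shows how a Unix socketpair could stand in for the RDMA connection towards the existing fd-passing machinery:

    /* Very rough sketch of the RDMA proxy hand-off idea: the proxy terminates
     * the RDMA connection itself and gives smbd one end of a Unix socketpair,
     * which can travel through the existing fd-passing machinery. All names
     * except the system calls are hypothetical. */
    #include <sys/socket.h>
    #include <unistd.h>

    struct rdma_conn;                                 /* proxy-private state */
    extern void proxy_attach_rdma(struct rdma_conn *c, int sock);  /* hypothetical */
    extern void pass_fd_to_main_smbd(int sock);       /* hypothetical: SCM_RIGHTS */

    static int hand_rdma_connection_to_smbd(struct rdma_conn *conn)
    {
        int sv[2];

        /* One end stays in the proxy, the other is passed to smbd and carries
         * the SMB traffic; bulk data would be moved by the proxy with RDMA
         * reads and writes into a shared memory area agreed with smbd. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            return -1;
        }

        proxy_attach_rdma(conn, sv[0]);
        pass_fd_to_main_smbd(sv[1]);
        close(sv[1]);                                 /* smbd now owns its copy */
        return 0;
    }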
But, of course, there are many problems to be solved.
We need to interact with hardware drivers and so on and so forth. And so this is
the idea of where we'll go. And that's the end of the talk. There's a lot of SMB3
and general SMB planning on our wiki.samba.org. Many of the things
can already be found in some form there, and so,
yeah, time is over,
but maybe we have time for one or
two more questions.
One more question, yeah.
Did you consider, for the file system access,
rather than, you know, passing
from one process to the other and doing it all in one SMB process,
having multiple smbds and connections,
but when it comes to processing,
just sending it to one smbd?
That would kind of make both RDMA and...
But sending every single read and write request
adds a lot of...
I didn't do that; well, I didn't try it,
but I wouldn't try it because, I mean,
it's overhead for every single write or read request.
I mean,
for each, that is not done in
the real channel.
So, I think we should
give the folks a little bit of time to
switch over. I mean, there are
more opportunities to ask questions or discuss
out in the coffee area
and so on later on. So thank you for your attention.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further
with your peers in the developer community.
For additional information about the Storage Developer Conference, visit storagedeveloper.org.