Storage Developer Conference - #23: Overview of Swordfish: Scalable Storage Management
Episode Date: October 20, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 23.
Today we hear from Richelle Ahlvers, Principal Storage Management Architect with Broadcom,
as she presents Overview of Swordfish: Scalable Storage Management,
from the 2016 Storage Developer Conference.
What is Swordfish?
How many people in here have heard of Swordfish before today or before this week?
Wow, that's, okay, Marali, you put your hand down.
How many people not involved in developing Swordfish have heard of Swordfish before this week?
Okay, that's better.
All right, so what we're going to cover here a little bit is talking about where Swordfish came from,
what the genesis for it is, and then a little bit of an overview of what the concepts are,
how we've developed it, and who's involved in building it. And then, obviously, questions.
So, who, basically, obviously, you guys all read this already, right?
And as Mark just said, the SNIA's Scalable Storage Management Technical Work Group
has been developing the Swordfish specification.
We actually announced the release on Monday.
Sorry, a tiny little bit of housekeeping here.
One of the things that you get an opportunity to do for sitting in and listening to this
presentation today is a chance to register to win a Phantom 3 drone.
And if I just perked up and got really animated at the thought of a Phantom 3 drone,
it's because I have one of these, and they're awesome.
So I'm a true nerd.
So Don and Mark and Arnold are passing out
what you need to do to register for these.
It's going to be an online registration,
so no need to be present to win.
Your presence here is your present to win part.
So if you don't have a ticket, wave your hand and make sure you get your ticket,
and then you go online and register to win.
But they're really, really fun.
You do have to register them with the FAA, though.
You don't need to register with the Future Farmers of America, though.
Anyway, so back to our regularly scheduled programming.
So what Swordfish is: the technical work group
was formed about nine months ago to start
and define the Swordfish specification, and we released version 1.0 on Monday.
And so I'll kind of repeat that again for you.
We formed this TWG nine months ago and released the spec on Monday.
So let's kind of go through the history of that.
One key thing, the disclaimer slide.
I'm sure everyone has seen a variant
of this disclaimer slide before.
But note on here, snia.org/swordfish.
So if you don't remember anything else,
remember snia.org/swordfish.
So that's where you can go to find any and all pertinent information from today and moving forward.
Okay.
So what are the drivers for swordfish?
So over the last several years, we've obviously had storage management standards for quite a while. I'm
sure a lot of you are familiar with SMI-S. And that's actually been a really good standard
for managing enterprise class storage. It's been extremely widely adopted. But there have
obviously been some things we needed to work on for that. We've had a lot of feedback from
customers and vendors alike on things that we need to think about in terms of not just SMI-S, but storage
management standards in general. So things like, you know, make them easier to implement and consume.
So from the implementation side, you know, the technologies we picked require a lot of, you know, very specific knowledge.
They're not used broadly.
And that's true for both the implementation side and the consumption side.
It's got a high learning curve.
So, you know, what can we do to simplify that?
Improve access efficiency.
So one of the things that we've done in the past
is we've basically defined standards largely driven by vendors.
And by vendors, I mean the people that build storage hardware
and storage infrastructure, more so than clients.
And the clients are, you know, the people that use it, right?
Either the storage applications or the direct end users. And so when you build things from the vendor
perspective, what you get is look at all my neat knobs and widgets and look at every feature that I
built. And so you don't think about, you know, necessarily I want to look at this particular attribute of this system
a thousand times.
What we've gotten is a lot of feedback over the years to say, I want to look at this particular
attribute. We've had to do a lot of refinement. We've done that in the current systems,
but what we need to do is actually when you're doing a new standard,
build that in from scratch. That's where improving access efficiency comes in.
The third and fifth ones kind of go together: providing useful access via a standard browser
and using standard tools, right? It's very hard, not just in SMI-S but in a lot of other
legacy standards, particularly even in the server space, to use them directly.
So if you look at a lot of the tools that DevOps people use directly in
their day-to-day work, you can't actually interact with these standards directly from those tools.
And so those are a lot of the feedback we've gotten over the years.
Another thing is just a shift in where storage is deployed.
You know, five, ten years ago, you had DAS, you had SAN,
but what we've seen is a rapid transition to including converged,
hyper-converged, hyperscale environments,
and the standards really haven't kept up with that.
So those are a lot of the areas of feedback we've gotten to say,
what can we do to expand the standards to those?
So there's a couple of different ways you could do that.
One is you could evolve the existing standards.
The other one is start from scratch, build something to address that.
And so we basically took the latter approach.
So with Swordfish, we basically said, okay, taking all of those things into account, what else can we do?
I said start from scratch, but we didn't really start from scratch, because we couldn't start from scratch completely and turn out a new standard in nine months.
I've already talked about us basically starting nine months ago and releasing a spec today. What we did instead was
steal liberally from a bunch of different sources. And one of those sources was a spec called
Redfish. So some of you may have heard of Redfish. How many of you have heard? Okay, so what Redfish is, it's actually a spec that the DMTF organization put out.
A lot of the same vendors involved in developing Swordfish were involved in developing Redfish.
Redfish is a standard that DMTF came out with to meet very similar objectives in the server
management space initially, with a plan to extend it.
It initially came out almost a year ago right now,
with base functionality for replacing IPMI for BMCs.
That's the level of functionality it came out with. But the base protocol infrastructure
is RESTful, with JSON data transport and OData metadata as well. So we were able to
basically take all of that, and I'll show you a little bit more about all of that.
But basically we were able to take all of that protocol level work
that they've done, as well as all of those schemas
they've defined about servers,
because as we all know,
a lot of storage is actually built using server components.
So we basically took all of those,
and we just leveraged them directly.
And what we do with Swordfish
is we just extend all of that Redfish
and focus exclusively on what we call the storage services.
And that's what Swordfish is.
Other things we did was we stole or leveraged, pick your favorite term,
work that was actually going on in SMI-S.
SMI-S was actually working on simplified models that were much more client-oriented.
Some of those things that I talked about on the last slide around what can we do to be more client-oriented,
what can we do to focus on getting those refactored APIs
so that somebody doesn't have to make the same call 1,000 times,
they can make it once,
get it a lot more user-centric,
and simplify the models.
We stole a lot of that. One of the other key features we've added into this, and this is
big, is we've actually moved to a class of service-based provisioning and monitoring model.
And so what that actually means, and again, this is very, very user-centric rather than equipment and vendor-centric,
is we've moved to a model where instead of having to understand every single knob and widget and configuration,
you can actually set it up so that the vendors can do it out of the box, and then the storage administrator can configure it so that when a DevOps guy or your IT admin comes in
and wants to configure the system,
what he actually gets presented with is,
here's the class of service that someone else has configured for him.
You just get capacity,
and whatever class of service you're permitted to use,
you just look at that and configure your capacity off that.
You don't have to know anything about the underlying infrastructure.
And that's completely configurable for your environment.
And that's done in a completely standard way, obviously.
It's a standard API.
And so then the vendors are able to differentiate all around that.
We cover block, file, and object.
And then the other thing that this does is because we're working so tightly with Redfish,
it covers very seamlessly the storage, server, and fabric together.
So there's existing simple fabric models,
and we'll be working to extend those to cover storage fabrics. Redfish is already working to extend those to cover networking fabrics.
It's really going to give us something we've been trying to do for a long time in the standard
space, a really true seamless standard API to cover your entire data center.
Okay.
So who is developing this?
I talked a little bit about, you know,
all the players really having a lot of overlap.
These companies are highlighted in blue down here,
or blue, purple.
The companies highlighted in purple here really have been the key players.
I want to highlight a couple of things.
Microsoft and VMware have really been very instrumental in this whole process from a client perspective. One of
the issues we had when we developed SMI-S, and one of the reasons it kind of took a long
time to get things really stabilized, was that it was largely driven by the vendors,
and it took time to get a lot of client engagement.
Here we have a lot of client engagement up front.
We'd really like a lot more input from clients,
but this has been very, very client-driven so far.
And as you can see, we've had Broadcom, my company,
Dell, EMC, HP, HPE, sorry, Intel,
with Intel bringing a lot of their input from their Rack Scale Design architecture in as well.
So that's got both the client and vendor inputs, which is a really good perspective. It also has a lot of that server, storage, and networking ecosystem input.
And then we've had Nimble
and then a couple of smaller players like Inova Development in as well,
looking at it from an infrastructure perspective.
So it's been a really good breadth.
And then also I'd highlight that most of these companies
are also key players in developing Redfish. We have
a lot of other companies here that have also been watching what's been going on. And most of these
companies are also companies that have been very active in developing SMI-S. But we'd love to have a lot more people come
and engage and work with us on validating what we've done
and expanding functionality,
validating, developing reference implementations,
developing real implementations,
and working with us moving forward.
And by the way, we don't actually have to wait till the end for questions. If people do
have questions, I'm happy to take questions anytime as we go through.
It will probably also help me not have coughing fits as I go forward here.
So I guess I probably could have clarified that up front.
All right, so what functionality did we include in V1?
You know, I keep saying, you know,
yay, we did this in nine months.
You're probably thinking,
yeah, probably didn't get much there.
Because we leveraged so much stuff,
we actually got a ton of functionality in here.
We actually have full block functionality,
full provisioning with class of service controls,
volume mapping and masking,
full replication capabilities, capacity and health metrics,
and then we also put file system on top of that.
So the file system leverages the entire block infrastructure
and then adds file system and file share schemas on top of it.
We've also got support for object drive storage.
I know earlier I said we were going to do object store.
This is not full object store.
This is for object drive, and so for anyone who's not aware, there's another technical
work group going in SNIA that is the object drive storage technical work group.
They're in the process or have already released a spec for object drives.
It's out for public review right now, so if you haven't heard of this, go look at that.
And this has support for the object drive storage in it.
And so if you're not aware of it, just go look at that.
It's a pretty interesting specification for folks to go be aware of.
Okay.
So diving down a little bit into what this stuff actually looks like.
Before I do that, any questions on what we've covered so far?
All right.
So what does this stuff actually look like?
We've talked a little bit about this being REST-based
and then using JSON and then OData metadata extensions as well.
And we also talked about it leveraging Redfish.
So the way this is actually structured, and what we've done, is it uses the same Redfish resources; we've just extended them.
So this slide and the next slide basically talk a little bit about directly how we've done that,
as well as starting to lead in a little bit to what a Swordfish implementation actually will look like.
So this is a slide we've leveraged from Redfish.
So if you see any of the Redfish presentations, you'll see this slide there.
So Redfish is fundamentally structured in this hierarchy.
So there's this notion called collections with entities inside them.
So that's the terminology there.
And the primary structure that they use is you have four,
and there's a few more things that have started to show up now,
but the primary structure is
you have systems,
which is the logical structure
for a computer system,
and then the chassis,
which is the physical part of the system.
And then the managers over,
I'm pointing to my screen.
Isn't that really helpful?
And then the manager.
Is there a pointer?
Ah, thank you.
And then there's a...
Ah, yay, a green pointer.
Managers, over here, is where you will either see functionality for a BMC or, in our case, where we will add things like a software management infrastructure.
So that's where we'll extend that.
And then there's other services here like session management, account management, schemas, and events.
And so these things all hang off of what we call the service root.
So the collection is where you'll go and see how many of those things exist.
So this is the logical system.
This is the physical system.
The distinction is pretty much that way.
If you're quibbling why something's in one spot or another,
it was probably a multi-hour discussion
as to why it should be one spot or another.
And, you know, everything's not quite that clean,
but it is where it is.
But you'll see over here exactly what you expect to see,
model, serial number, inventory information,
and then information like power and thermal
and rack hierarchies and stuff for a chassis, okay?
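
The hierarchy described above, a service root with collections hanging off it and individual entries inside each collection, can be sketched as a small in-memory mockup. This is only an illustration: the paths and property names follow Redfish conventions as commonly documented (`@odata.id` links, `Members`, `Members@odata.count`), while the member values and the `get` helper are made up for this sketch and stand in for real HTTP requests.

```python
# Illustrative in-memory mockup of a Redfish-style resource tree.
# All values (names, serial numbers) are invented for this sketch.
MOCKUP = {
    "/redfish/v1/": {
        "Systems": {"@odata.id": "/redfish/v1/Systems"},
        "Chassis": {"@odata.id": "/redfish/v1/Chassis"},
        "Managers": {"@odata.id": "/redfish/v1/Managers"},
    },
    "/redfish/v1/Systems": {
        "Members@odata.count": 1,
        "Members": [{"@odata.id": "/redfish/v1/Systems/1"}],
    },
    "/redfish/v1/Systems/1": {"Name": "Example logical system"},
    "/redfish/v1/Chassis": {
        "Members@odata.count": 1,
        "Members": [{"@odata.id": "/redfish/v1/Chassis/1"}],
    },
    "/redfish/v1/Chassis/1": {"Model": "ExampleModel", "SerialNumber": "SN-0001"},
    "/redfish/v1/Managers": {"Members@odata.count": 0, "Members": []},
}


def get(path):
    """Stand-in for an HTTP GET against the service (hypothetical helper)."""
    return MOCKUP[path]


def list_members(collection_name):
    """Follow the service root link to a collection and return member paths."""
    root = get("/redfish/v1/")
    collection = get(root[collection_name]["@odata.id"])
    return [member["@odata.id"] for member in collection["Members"]]


if __name__ == "__main__":
    for name in ("Systems", "Chassis", "Managers"):
        print(name, list_members(name))
```

The collection answers "how many of these exist", and each member link leads to the details of one entry.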
And so what we've done with Swordfish
is we basically looked at that entire hierarchy
and said, yep, we'll use that.
And we added all the stuff that's in purple.
And so in purple, we did a couple of things.
One is we recognized that in some configurations, our storage systems are built entirely and exactly out of standard servers. And so some of those storage systems will be using
exactly these systems.
But in some cases, we have storage systems
that are very similar, but have custom hardware,
and so there's also this logical construct
that's called a storage system.
And those things are very, very similar as well.
But we basically have...
The bulk of the focus is actually
in what we call the storage service.
So the storage service is where you'll see
all of what you expect to see,
a logical construct for storage.
This is where you see, you know,
the volumes, and the storage pools, and the endpoints, and groups, and this is where we've
added the class of service constructs, and all of that. So again, you see, you know, the collection
that says how many of them there are,
and this is where you see the details of each individual entry.
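
As a rough picture of what hangs off a storage service, here is a flattened, hypothetical mockup. The collection names echo the ones mentioned in the talk (classes of service, volumes, storage pools, endpoints, groups), but the exact property names and all values are assumptions for illustration; a real service would expose each collection as a link rather than an inline list.

```python
# Hypothetical, flattened mockup of one storage service.
# A real service would link to each collection instead of inlining it.
STORAGE_SERVICE = {
    "Name": "Example storage service",
    "ClassesOfService": [{"Name": "Gold"}, {"Name": "Silver"}],
    "Volumes": [{"Name": "vol1"}, {"Name": "vol2"}, {"Name": "vol3"}],
    "StoragePools": [{"Name": "pool1"}],
    "Endpoints": [],
    "StorageGroups": [{"Name": "group1"}],
}


def collection_counts(service):
    """Return how many entries each collection in the service holds."""
    return {
        key: len(value)
        for key, value in service.items()
        if isinstance(value, list)
    }


def entry_names(service, collection):
    """Return the names of the individual entries in one collection."""
    return [entry["Name"] for entry in service[collection]]


if __name__ == "__main__":
    print(collection_counts(STORAGE_SERVICE))
    print(entry_names(STORAGE_SERVICE, "Volumes"))
```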
So one thing about cardinality of all of this is for a Redfish implementation that's done on a BMC, what you might see is one of each of these.
For a large-scale storage system, what you might see is
a large number of storage services because that's the way
the system is developed.
This thing is just designed to be built on a much larger scale.
It's a very highly scalable system.
We have a question up here.
Hang on.
Hold on just a sec for the mic.
When you say class of service, could you give us some example of what kind of class you're talking about?
Is it related to QoS or is it a group or something like that?
Okay.
So the question is what do we mean by class of service.
So I think we will get into that in a little bit and have a few more details. But fundamentally, a class of service is defined to be based on
a set of capabilities.
And so you get to define these capabilities.
They can be based on protection.
They can be on capacity, on a whole set of attributes that we've defined.
But basically, it could be performance-related.
It could be protection-related.
It could be availability-related.
So some people call this quality of service instead of class of service.
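
To make the idea concrete, a class of service can be pictured as a named bundle of capabilities that an administrator defines once and that a user then picks from. The sketch below is purely hypothetical: the field names, the capability values, and the `permitted_classes` helper are invented here and are not taken from the Swordfish schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassOfService:
    """A named bundle of capabilities (all fields are illustrative)."""
    name: str
    protection: str    # e.g. "mirrored" or "none"
    performance: str   # e.g. "high" or "standard"
    availability: str  # e.g. "99.999" or "99.9"


# Catalog the storage administrator has configured ahead of time.
CATALOG = [
    ClassOfService("Gold", "mirrored", "high", "99.999"),
    ClassOfService("Silver", "mirrored", "standard", "99.9"),
    ClassOfService("Bronze", "none", "standard", "99.9"),
]


def permitted_classes(catalog, allowed_names):
    """Return only the classes of service this user is allowed to use."""
    return [cos for cos in catalog if cos.name in allowed_names]


if __name__ == "__main__":
    for cos in permitted_classes(CATALOG, {"Gold", "Silver"}):
        print(cos)
```

The user never sees the underlying knobs, only the names they have been permitted to use.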
We've chosen to call it class of service. And so I'll have, when I get a little
bit further in, I'll have some mock-ups that show a little bit more detailed examples. And so if I
don't have quite all the detail for you, then let me know. All right. Okay. So we also talked about how we extend Redfish.
This I want to highlight again.
One of the things that we do,
Redfish actually covers what we call local storage.
When you have storage attached to a local server,
Redfish covers that notion.
They have this notion of a volume that's got some set of attributes in it.
We also have a notion of a volume.
And so we didn't want those to diverge at all.
So what we've done is basically said we will not diverge.
Instead, we extend that volume.
We have all of those same attributes in there, but when you
move to Swordfish, you of necessity have a need for some additional attributes. And so the model
that we have to work between the organizations is to extend that to include all of those additional
attributes. One of the things that we didn't include in our V1 release,
but we will be including and developing
and including in a later release,
is an implementer's guide that's very specific,
that includes details for implementers
about which specific attributes to include
in specific implementations.
So our spec includes all of these things now.
We will just be adding additional guidelines
for when to use all of the attributes.
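
The "extend, don't diverge" approach to the volume can be sketched as follows: the extended volume keeps every attribute of the base volume and only adds new ones. The attribute names and values here are assumptions chosen for illustration, not the actual Redfish or Swordfish volume schema.

```python
# Base volume as a local-storage model might describe it (illustrative).
BASE_VOLUME = {"Id": "1", "Name": "vol1", "CapacityBytes": 1_000_000_000}

# Additional attributes a storage-service model might need (illustrative).
EXTRA_ATTRIBUTES = {"ClassOfService": "Gold", "StoragePool": "SpecialPool"}


def extend_volume(base, extras):
    """Add attributes to a base volume without redefining any existing ones."""
    overlap = base.keys() & extras.keys()
    if overlap:
        # Redefining a base attribute would mean the two models diverge.
        raise ValueError("extension redefines base attributes: %s" % sorted(overlap))
    return {**base, **extras}


if __name__ == "__main__":
    print(extend_volume(BASE_VOLUME, EXTRA_ATTRIBUTES))
```

A client that only understands the base attributes can still read the extended volume, because nothing it relies on has changed.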
Okay, so what does a Swordfish system look like?
Or can you see what a Swordfish system looks like today?
And the answer is yes.
Even though we don't actually have any implementations, you can actually see what
this looks like. Now, how do you do that? So one of our development tools for putting the spec
together is what we call mockups. And so this has actually helped us develop this a little bit more quickly.
And some of this goes back to the fact
that with the JSON infrastructure,
we can actually put together static views
of what a system would look like in JSON
and say, does this make sense?
Yes or no, or modify it,
instead of having to work entirely in schema.
And so what we've developed are actually three different sets of mockups,
one that's actually, here's a small-scale system,
here's a large-scale system that has everything,
and here's a file system view.
And so we've actually used those as both development tools,
and we've released all three sets of our mock-ups as part of our,
well, they're actually part of all of our work-in-progress releases,
but they'll be released as part of the spec bundle as well,
so that you can get a good sense of what different configurations would look like.
So I'll give you a sense of what Swordfish systems look like
by actually using part of that work.
So here we go.
So here's a little bit of the Swordfish mock-ups.
For those in the back, you should sit in front.
And as a note, all of the slides are online,
so you can actually... Thanks, Marty.
You can actually see the stuff.
You can download the slides.
You can also...
We actually have two different ways
you can actually look at this,
one of which is you can download the mock-ups,
put them on your own systems,
and navigate through them.
We're actually also putting all of these online at swordfish.mockable.io,
both in static form,
as well as adding some simulated interactions in a few areas
so that it can actually look like you're actively interacting with the system.
So, what you can actually do is interact with the system a little bit.
So, like I said, there's three different systems on here,
and this actually shows you a little bit of the service root.
So, you'll see, if this were
just a small-scale system, you wouldn't necessarily see all
of these things, but you can see the storage systems and the storage services here, as well
as some of those other elements we talked about. And so if you wanted to just navigate through,
you'd basically update the URL you're asking for at the top. And I navigate down into the storage service collection,
and now I can see, hey, there's three of these things on here.
So we move forward.
We picked one.
So what's actually in a storage service?
And so here's all of those things we actually already talked about.
So there's a class of service.
There's volumes.
There's pools.
There's groups.
And then there's actually also pointers to other resources
that the storage service references
or leverages, so things like the system and the chassis.
So there's ways to say, these are the relationships here, and so you
can actually navigate around and just
migrate your way around the system.
So you never have to go on a system.
You can actually just go into and navigate your way around
and find all the relationships.
I still haven't made you go to a schema and research anything.
You can actually just point your browser at a system
and navigate your way around and find everything.
This is completely different than interacting with an SMI-S-based system where you would
have had to go read a manual to find something.
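
The point about discovering everything by following links, without consulting a schema first, can be illustrated with a small crawler over a static mockup. This is a sketch under assumptions: the mockup contents are invented, and the only convention relied on is that links appear as `@odata.id` values.

```python
from collections import deque

# Invented static mockup; only the "@odata.id" link convention matters here.
MOCKUP = {
    "/redfish/v1/": {
        "@odata.id": "/redfish/v1/",
        "StorageServices": {"@odata.id": "/redfish/v1/StorageServices"},
    },
    "/redfish/v1/StorageServices": {
        "Members": [{"@odata.id": "/redfish/v1/StorageServices/1"}],
    },
    "/redfish/v1/StorageServices/1": {
        "Volumes": {"@odata.id": "/redfish/v1/StorageServices/1/Volumes"},
        "StoragePools": {"@odata.id": "/redfish/v1/StorageServices/1/StoragePools"},
    },
    "/redfish/v1/StorageServices/1/Volumes": {"Members": []},
    "/redfish/v1/StorageServices/1/StoragePools": {"Members": []},
}


def find_links(resource):
    """Yield every @odata.id value nested anywhere inside a resource."""
    if isinstance(resource, dict):
        for key, value in resource.items():
            if key == "@odata.id" and isinstance(value, str):
                yield value
            else:
                yield from find_links(value)
    elif isinstance(resource, list):
        for item in resource:
            yield from find_links(item)


def crawl(mockup, start):
    """Breadth-first walk from start, returning paths in visit order."""
    seen = []
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in seen or path not in mockup:
            continue
        seen.append(path)
        queue.extend(find_links(mockup[path]))
    return seen


if __name__ == "__main__":
    for path in crawl(MOCKUP, "/redfish/v1/"):
        print(path)
```

Starting from the service root alone, the walk reaches every resource in the mockup, which is the same thing a person does by clicking through in a browser.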
What if I want a file system?
Identical except that now I've also got a, right in about here I have to look, right
in here I now have a file system link where I can go dig down in
and see the details of the file systems.
Okay, so I actually want to do something.
So let's say I want to discover something about my system,
which is kind of what I've been doing.
I've just been navigating around and discovering stuff, right?
Let's say I want to discover something for a specific reason.
So, say, do I have space?
I want to check the capacity in a storage pool. So again, going back to the class of service,
I'm a DevOps guy, and my storage admin has told me that in a particular storage pool
or with a particular class of service,
and in this case, I'm in Boston,
and I have permission for anything in gold.
So I've done a search,
because I know how to use appropriate search parameters.
I know how to search for the storage pools that have the type of
class of service for Gold Boston.
Does that show up on you?
Or I happen to know that I'm
looking for, you know, name special pool because the other
search got truncated on this screen.
So I happen to know that I can search in special
pool. So I navigate my way down to special pool,
and I can look at the capacity here,
do appropriate calculations on it,
and say, hey, look, I do have enough capacity here.
And yay, I can go create a volume in this pool.
So now I could actually go in
and, since it's a REST API,
I can POST to create a volume in this pool.
And that's exactly how simple it is.
I don't have to worry about what array this is
and what attributes it is
or anything like that. It's already been all set up
for me by the storage admin.
Okay, so I don't actually even have to worry
about what vendor's equipment it is underneath either.
Storage admin did all of that for me.
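
The walkthrough above (find the pools with the permitted class of service, check the free capacity, then POST a new volume) might look roughly like this. Everything specific is an assumption for illustration: the capacity property names, the pool sizes, and the URL the volume is posted to are invented, and the request is only built, not sent.

```python
import json

TIB = 2 ** 40

# Invented pool data; "SpecialPool" and "GoldBoston" echo the talk's example.
POOLS = [
    {"@odata.id": "/redfish/v1/StorageServices/1/StoragePools/BasePool",
     "Name": "BasePool", "ClassOfService": "SilverBoston",
     "AllocatedBytes": 20 * TIB, "ConsumedBytes": 5 * TIB},
    {"@odata.id": "/redfish/v1/StorageServices/1/StoragePools/SpecialPool",
     "Name": "SpecialPool", "ClassOfService": "GoldBoston",
     "AllocatedBytes": 10 * TIB, "ConsumedBytes": 4 * TIB},
]


def pools_with_class(pools, class_name):
    """Return the pools offering the given class of service."""
    return [pool for pool in pools if pool["ClassOfService"] == class_name]


def free_bytes(pool):
    """Capacity still available in the pool."""
    return pool["AllocatedBytes"] - pool["ConsumedBytes"]


def build_create_volume_request(pool, name, size_bytes):
    """Describe the POST that would create a volume; nothing is sent."""
    if size_bytes > free_bytes(pool):
        raise ValueError("not enough free capacity in pool " + pool["Name"])
    return {
        "method": "POST",
        "url": pool["@odata.id"] + "/Volumes",  # assumed target, not from the spec
        "body": json.dumps({"Name": name, "CapacityBytes": size_bytes}),
    }


if __name__ == "__main__":
    pool = pools_with_class(POOLS, "GoldBoston")[0]
    print(free_bytes(pool) // TIB, "TiB free in", pool["Name"])
    print(build_create_volume_request(pool, "my-volume", 1 * TIB))
```

Nothing in the request names an array, a vendor, or a RAID level; the pool and its class of service carry all of that.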
All right, so that's a little bit about how everything works.
I think we're down to just a few minutes left.
So where are we?
Oh, I forgot this was a build slide.
Sorry.
Yep.
Like I said, we just finished the v1.0 spec,
released that this week.
We've had a bunch of interim releases
through the year.
snia.org/swordfish will tell you all of the rest of this
if you're interested in participating.
We would love to have you in several different ways
join the group and work with us on developing the spec.
If you're looking at it outside the group,
send feedback through the portal.
We're also working on setting up a storage management customer panel.
Email storagemanagement@snia.org for more information.
Everyone is wrong.
We're also going to be at Ignite next week.
There will also be a customer event that we're working on.
We have registrations for folks to attend on Tuesday night at Ignite next week.
Well, I think that's it. I'll leave it here.
Questions?
We have a...
I have the microphone.
Yes.
Let me bring the mic to you.
You talked about an implementation guide.
Do you define there the border between Redfish and Swordfish?
Yes.
So a couple of things.
One is with this first release,
we've actually put out a specification
and the beginning of a user's guide.
So we decided to focus on the user's guide first,
and we'll be putting out an implementation guide later.
So the implementation guide is targeted at the vendors,
but we're focusing on the users
and highlighting user interaction first.
What we put in the spec
is actually focused on just the Swordfish part,
and we refer back in the spec
to Redfish for everything else, to say,
that's the Redfish part.
What we actually expect to see from a client perspective
and a user perspective
is that the difference between Swordfish and Redfish
should be completely transparent.
And we actually have a station set up out here
in the mezzanine, thank you,
where we can actually walk you through a little bit
and show you how, from a client perspective,
that it should be completely transparent
when you're interacting with Swordfish versus Redfish.
And then I can show you the pieces
that are Swordfish versus Redfish there,
but from a client perspective,
you shouldn't be able to tell at all.
The other thing that we're doing
to make it completely transparent
is we're actually posting the schemas.
We have JSON and CSDL schemas.
We're actually posting those on the DMTF website.
So even when you're interacting there,
you don't have to come to SNIA versus DMTF
when you're building the system to get them.
They'll all be in one spot.
Any other questions?
Oh, come on.
How far does the drone go, at least?
It's a mile, by the way.
So I've got a question.
Is the standard somehow prepared
or will be prepared for somehow managing storage systems
like write or distribute file systems like Ceph
or something like that?
Is this a part of your work or will be, maybe?
So I think the question was...
I'm sorry, can...
You don't have to repeat it. No, I didn't quite catch all of the question.
Can you... Okay, so
my question was, is the standard
somehow prepared for
managing and monitoring solutions
such a right solution
or distributed file systems?
Stuff like that.
He's talking about what?
Is there data?
Oh, um... I'm sorry.
Like objects.
You don't support objects.
Oh, no.
Right.
So in the first version, we only included block and file.
But it is completely up to anybody coming in as to what extensions are added past that.
We expect to add object, and in the last couple of days,
I've talked to folks who also want to look at potential extensions for Ceph.
We've talked to folks who have an intent to come in and add some extensions into file share space.
But really, if we have two or more vendors
who have functionality that they want to add
into a particular area, we're open to that.
We already have a roadmap that says
we're going to be adding performance metrics.
We're going to be adding a bunch of capabilities
around events.
We have fabric extensions.
We're going to be doing some collaboration with DMTF in about three or four areas.
But really, any capability is fair game as long as we have two or more vendors,
or two or more companies, I shouldn't say.
I keep saying vendors. But two or more companies that are interested in adding to the standard.
Okay. Thanks, Rochelle.
Thanks for listening. If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further with your peers in the developer community.
For additional information about the Storage Developer Conference, visit storagedeveloper.org.