Storage Developer Conference - #132: Emerging Scalable Storage Management Functionality

Episode Date: August 10, 2020

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast, episode 132. So what we're covering in this session is a continuation of what we talked about earlier, but I was really just going to focus in and talk in a lot more detail about some of the things that we've added this year. It'll be interesting to see how
Starting point is 00:00:55 long this actually goes because I have not timed it. I just put all the material together. So it could either be really short or way long. Anyway, so, the disclaimer slide. So again, by now, if you're here, you hopefully have a pretty good idea of what Swordfish is, and so we're just going to dig in. We've talked about this a little earlier. This is the same slide I presented earlier. Is there anybody that didn't see this already? Okay.
Starting point is 00:01:40 I'm really mostly going to be focusing on this 2019 section here, which is, you know, what is the stuff we added? You know, the stuff I almost forgot to talk about in the previous session. But really talking about, you know, where we closed the gap, how we changed the functionality in there, what is all the functionality that actually went in to Swordfish V1.1, which I think I actually even forgot to say is all of the new functionality that went in this year, the bulk of it went into the release, was posted August 22nd. It's actually Swordfish v1.1.0. Everything we've done before that was v1.0.x.
Starting point is 00:02:23 So we actually rolled the minor version of the bundles because we viewed this was a significant enough change in functionality, both from the functionality we released as well as adding the features and profiles and everything to support that. So that release has been out a whole three weeks now, three weeks. So we've had so much feedback. But anyway, let me dig into that a little bit. There are actually a couple of other presentations that will go into the webcasts
Starting point is 00:02:58 and things out on profiles and some of the other stuff that do go into some other aspects. Okay, so what else did we add? We talked about the supported features. We talked about the functionality. We added consistency groups, and I've said this a couple, three times, but this is actually pretty important. We had storage groups and consistency groups running dual duty, and that was incredibly confusing.
Starting point is 00:03:25 So we split this out, and we now just have consistency groups. The next thing we added was replication functionality explicitly into both volumes and consistency groups. They're exactly the same set of functions, but there are six different ways to do replication. Previously, we had replication built in, but it was built in as part of class of service. There was only one explicit control in that.
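As a rough sketch of how a client might address these six replication operations (assign, create, remove, resume, reverse, suspend), the snippet below builds a Redfish-style action request for either a volume or a consistency group. It only constructs the request and sends nothing. The action names, the example URI, and the helper `build_replication_action` are assumptions for illustration and should be checked against the published Swordfish schema.

```python
# Hypothetical sketch: action names and URIs are assumptions, not taken
# from the talk. Verify them against the Swordfish Volume and
# ConsistencyGroup schema before relying on them.
REPLICATION_ACTIONS = {
    "assign": "AssignReplicaTarget",
    "create": "CreateReplicaTarget",
    "remove": "RemoveReplicaRelationship",
    "resume": "ResumeReplication",
    "reverse": "ReverseReplicationRelationship",
    "suspend": "SuspendReplication",
}


def build_replication_action(resource_uri, resource_type, operation, parameters=None):
    """Return (method, uri, body) for a replication action; nothing is sent."""
    if resource_type not in ("Volume", "ConsistencyGroup"):
        raise ValueError(f"unsupported resource type: {resource_type}")
    action = REPLICATION_ACTIONS[operation]  # KeyError for unknown operations
    # Redfish-style actions are POSTed to <resource>/Actions/<Type>.<Action>.
    uri = f"{resource_uri.rstrip('/')}/Actions/{resource_type}.{action}"
    return "POST", uri, dict(parameters or {})


# Example: suspend replication on a hypothetical volume.
request = build_replication_action(
    "/redfish/v1/Storage/1/Volumes/Vol1", "Volume", "suspend"
)
```

Because the same six operations apply to both resource types, one table of action names can serve volumes and consistency groups alike; only the type prefix in the URI changes.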
Starting point is 00:03:55 So you can basically tell the system what data protection capabilities you want, or you can say, okay, in addition to that, create a replica right now. But when you have a system that doesn't require a class of service, it becomes a little bit more important to have the configuration controls on replication. So this is an area that Chris Leonetti over here helped out quite extensively with. Thank you, Chris. And we added in these six areas. And these are all structured as actions,
Starting point is 00:04:31 which, going back to Jeff's talk earlier this morning, use POST, but they're a little bit different from a regular POST. So assign and create are two variants of creating a replica. One uses an existing volume or consistency group, and the other one says, no, use new space. And then the rest of these: remove is delete the relationship, not necessarily the object. Resume and suspend are pretty obvious, right? They're mirror pair suspend and resume. And then you have reversing the replication relationship, which works for some kinds of replication, if you have a source and target and
Starting point is 00:05:34 you want to reverse the ordering of that and switch who's source and who's target. Now, obviously, not every system is going to be able to support all of this. This is kind of intended to be the superset. So those are all of the replication capabilities. We also significantly expanded the volume schema. I'm going to contradict myself a little bit from what I said earlier, which is that we're trying to keep the schema relatively simple and straightforward and keep everything very client-centric. We added a lot to the volume schema. I think there were 10, no, 14 new properties that went in, in addition to the six
Starting point is 00:06:16 actions, in addition to a couple of new collections. So it got significantly bigger, but I will, you know, actually say that the bulk of those requests did come in from a client perspective. They did not come in from a vendor saying, I want to expose this property. If they did come in from a vendor, they came in from a vendor saying, I want to expose this because my clients want to see, you know,
Starting point is 00:06:44 my users want to see it in our client application. We also had some kind of interesting use cases come in driving, you know, from the RSD side on the client side coming in and driving, exposing journaling interfaces and things that are much different use cases than any of the external storage folks in the room were used to. So we spent some time getting interfaces exposed to match some of those. Okay, and then I also mentioned earlier, we've expanded the user's guide quite a bit, so we've added use cases that show you everywhere from how do I create a volume or a storage pool,
Starting point is 00:07:30 volumes from here's the Redfish, here's the Swordfish, here's Swordfish with class of service, storage pools. Here's how I do this. Here's how I create it with and without class of service. The other thing is the user's guide is also now sorted by feature. So we have, you know, the dozen features in there, and you can actually go see which use case corresponds to which feature. Okay, so the other thing that's been happening is the Swordfish ecosystem in general has been expanding. So this slide is one that the Redfish community has been using quite a bit.
Starting point is 00:08:17 So there's really kind of four phases in the evolution of a standard. There's the describe, which is developing the standard itself. Prescribe, which is how do we want to use that. Implementations and then test conformance. So these are just some of the partners and who falls into some of these various categories. So obviously DMTF and SNIA have been the ones just writing the specifications during the description. And then you get folks where we're partnering with and working with these other industry consortiums, like the Open Compute Project, ODCC, OpenSDS,
Starting point is 00:09:01 as they work on a prescriptive view of what parts of the standards they want to use. And then you actually start to see implementations come out, not just from the vendors but from other partners as well. So we have implementations, we have a lot in the open source space.
Starting point is 00:09:25 We have tools. We have a lot of work that we've been doing to expand the ecosystem, but there's also partnering with other groups there. And then we have, and we mentioned this earlier as well, tools and all of this work is also happening in the open source space. Tools and frameworks being developed and programs being developed to help test conformance to that to also help accelerate adoption. Okay.
Starting point is 00:09:55 So we've seen this before. What I wanted to do was kind of just circle back as to, you know, where all the new pieces were, and dig down into those into a little bit more detail. We have, here's the storage view. What did we really add? So when you have Redfish, the volumes are in the collections here. I actually talked about this.
Starting point is 00:10:20 This is where the new stuff actually got added. In version 1.0.x of Swordfish, everything in all of the Swordfish schema, pools, groups, consistency groups, and everything, attached exclusively off of storage services, and that obviously is very tightly linked to the class of service functionality. So what we did in 1.1, in making that functionality optional, is we had to attach those things someplace else. So we hit two birds with one stone here. By moving it up here and connecting it, we both made it optional
Starting point is 00:11:10 and closed the gap with the Redfish storage. So I already described earlier how you basically take the volumes and go from, in the Redfish model, posting to a volume collection, to, when you add storage pools, posting to the storage pools and working from there. So this is where we made the bulk of those changes. We dropped out that link between the storage and volume.
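The shift described here, from posting to a volume collection to posting against a storage pool, can be sketched as a small request builder. This is a minimal illustration under stated assumptions: the URIs, the `AllocatedVolumes` path segment, and the helper `volume_create_request` are hypothetical, and the body shows only a name and a capacity. Nothing is sent over the network.

```python
def volume_create_request(storage_uri, name, capacity_bytes, pool_id=None):
    """Build (method, uri, body) for creating a volume; nothing is sent.

    Without a pool, the request targets the volume collection under the
    storage resource. With a pool, it targets that pool instead. The
    exact collection paths are assumptions for illustration.
    """
    base = storage_uri.rstrip("/")
    if pool_id is None:
        uri = f"{base}/Volumes"
    else:
        uri = f"{base}/StoragePools/{pool_id}/AllocatedVolumes"
    return "POST", uri, {"Name": name, "CapacityBytes": capacity_bytes}


# Redfish-style creation, then the same request routed through a pool.
plain = volume_create_request("/redfish/v1/Storage/1", "Vol1", 10 * 2**30)
pooled = volume_create_request("/redfish/v1/Storage/1", "Vol1", 10 * 2**30, pool_id="Pool1")
```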
Starting point is 00:11:44 That's no longer the primary client link there. And we added all of these collections to the Redfish storage schema. So you can see, you know, adding this back in, we maintain all that backward compatibility with the storage services and have the new links back up to the top. What else is new? The other things we talked about and the capabilities, the supported features, all of the features and profiles are new. We'll go into detail of those in a minute.
Starting point is 00:12:35 The bulk of the rest of the functionality is not new. We've had all of that before. We added in being able to advertise and define what the required functionality actually is. We did add two things here. One is we added the consistency groups. The other one was we added the optional elements. The elements themselves existed. They're just now optional.
Starting point is 00:13:05 Okay. So, into features and profiles. Does that end up big enough? Okay. So, kind of keeping that whole life cycle in mind becomes really useful, especially when you start to bring in profiles, because they actually help you. They're the mechanism you use to both prescribe and test your functionality. So features, those, again, those are the high-level descriptions
Starting point is 00:13:37 that an implementation can advertise that it supports using that feature registry. So the profiles are the detailed descriptions that correspond to those features. What we've done, there's multiple ways you can do this. We could have combined them all into a single file, you know, a single actual file. So the profile files, it's a ridiculous phrase, but I use it anyway because it's hard to come up with anything better.
Starting point is 00:14:04 But the profile files themselves are also JSON formatted files. And in the first session this morning, Jeff talked about that a little bit. These specify in detail, and I'll show you all of the detail in a little bit. What are the required elements? So what are the required properties? What are the required values? What's optional? And so what we've done is basically said for each feature,
Starting point is 00:14:43 and I'll describe these, and I'll go through all of these in a little bit. For each feature, there is a profile file that corresponds to that. And the profile may have dependencies and things, but just to make things very clear, there's a one-to-one mapping. So if you want to look at a particular feature, you can go look at a single file to kind of see that. And so then from a client perspective, you can actually build up and say, you know, hey, I'm, I'll pick on Morali here, I'm VMware. We want to require that, you know, a Swordfish implementation has, you know, features X, Y, and Z. They must have discovery.
Starting point is 00:15:25 They must have event notification. They must do basic provisioning. And that's the VMware client requirement. So that's actually how you would feed that into the cycle on the right. So you could actually just say, we're going to check and say this implementation advertises those things, but we want to make sure that they go through the Swordfish CTP verification program over here and actually, you know, can have and have verified, you know, conformance for the, you know,
Starting point is 00:15:59 for those features in their implementation. So, you know, that's where you kind of get that set of required profiles, corresponds to those three features that feeds into our CTP certification program and that implementation needs to run those tests. All of that is open source or will be open source. It's all open source framework now. It's getting plugged together to become a cohesive system moving forward. Any questions on any of that? All right.
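The client-side requirement described here (an implementation must advertise discovery, event notification, and basic provisioning) amounts to a set comparison between what the client requires and what the implementation advertises. The sketch below assumes the advertised feature names have already been read from the features registry; the feature name strings and the helper `missing_features` are hypothetical.

```python
# Hypothetical feature names; the real registry defines the exact strings.
CLIENT_REQUIRED_FEATURES = {
    "SNIA.Swordfish.Discovery",
    "SNIA.Swordfish.EventNotification",
    "SNIA.Swordfish.BlockProvisioning",
}


def missing_features(required, advertised):
    """Return the required features that the implementation does not advertise."""
    return sorted(set(required) - set(advertised))


# An implementation that advertises only two of the three required features.
advertised = ["SNIA.Swordfish.Discovery", "SNIA.Swordfish.EventNotification"]
gaps = missing_features(CLIENT_REQUIRED_FEATURES, advertised)
```

A check like this only confirms what an implementation claims. Verifying that the claim holds is the job of the conformance tests the talk describes.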
Starting point is 00:16:41 Okay. So what are the actual profiles? This, rather than doing a list, since folks have been seeing my slides for a while, and you can tell I tend to do very text-based. You have pictures this time. So here's actually all of the profiles, all of the swordfish profiles. Now, each one of these actually has a relatively simple definition that goes along with it. I will go through the discovery one in detail. So I'm not going to talk through that one. I'll give you more detail on that. Because it's got a little bit more to it. So things like, let me grab the pointer again. Right end of the pointer.
Starting point is 00:17:29 So IO performance, for example, there is a class called IO statistics, a schema called IO statistics that hangs off of about four different other schema. It hangs off of volume and storage pool, file system. I think it hangs off of storage service as well. So maybe four or five places. What this says is, the way this profile is defined, it says if you have implemented, if you have one of those items and have implemented that, you have to do every single property.
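The IO performance rule (if the statistics object is implemented, every one of its properties must be present) is simple enough to sketch as a test helper. The property names below are an illustrative subset chosen as an assumption, not the full list from the schema, and `missing_io_properties` is a hypothetical helper.

```python
# Illustrative subset only; the real schema defines the full property list.
REQUIRED_IO_PROPERTIES = [
    "ReadIORequests",
    "WriteIORequests",
    "ReadIOKiBytes",
    "WriteIOKiBytes",
]


def missing_io_properties(io_statistics, required=REQUIRED_IO_PROPERTIES):
    """Return required properties absent from an implemented statistics object.

    `None` means the resource does not implement IO statistics at all,
    in which case the rule does not apply and nothing is reported.
    """
    if io_statistics is None:
        return []
    return [name for name in required if name not in io_statistics]


partial = missing_io_properties({"ReadIORequests": 10, "WriteIORequests": 4})
not_implemented = missing_io_properties(None)
```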
Starting point is 00:18:11 So it's not a very complicated algorithm. It just says if you implement it, if you want to advertise that, you have to implement this class and you have to do every property in them. It's a pretty simple set of requirements. Pretty simple to test for, too. So some of these other ones are pretty straightforward as well. Like block provisioning. You see there's a whole little hierarchy here that goes off of the provisioning ones.
Starting point is 00:18:42 But block provisioning is if there is a volume collection, or volumes anywhere in the system, you have to be able to modify, create and modify them. That sounds pretty simple when you go to explain it. It was a really ridiculously long argument that led up to that. But, you know, one of the things we did not do, at least for the initial implementations of all of these, was make them really complicated. We've actually deliberately left out, you know, a whole slew of required properties on all of these things.
Starting point is 00:19:21 Most of them have very simple definitions with them. There are just a handful of places where we have required properties. So one of them is, as we talked about, and I'll show the exact details of this in a minute, where we said for Swordfish discovery, you need to be able to differentiate between a Redfish and a Swordfish implementation. There's a required property in there for that. But we don't basically go through
Starting point is 00:19:50 and say, oh yes, for every endpoint you have to have the name and the manufacturer and all of those things. If we get requirements coming back that say, no, everyone really needs to do that, and we really need to flesh these things out more, we will do that. But what we wanted to do was really get a high-level set of functionality, base functionality, with the assumption
Starting point is 00:20:16 that implementations are going to populate a bunch of properties in these objects. And if we need to go back and, you know, get down to that level of detail, we will. Okay. So some of these other ones, just to kind of clarify, local and remote replication have very, very similar requirements. In fact, they're identical except that you need to be able to differentiate a local fault domain from a remote fault domain. And so that's the only reason that those two things are separate,
Starting point is 00:20:50 is you have to be able to differentiate your fault domains. That's as much from an implementation side as from a client side, although the client sides really also want to know that when they're setting up DR. So if you actually go look at those profiles they have like two or three words different in the entire files um there may be a way to to actually combine those two things together but from my perspective i was trying to just keep that one-to-one mapping I talked about. Okay, so mapping and masking. Again, it's very, or sorry, let me do this one.
Starting point is 00:21:33 Block capacity management is also very simple. You need to be able to basically expand the capacity of volumes or file systems, depending on whether it's block or file capacity management. So again, those profiles are very tiny, but they correspond to the target functionality. Okay. And then there were a couple of things over here that ended up specifically under class of service, and they're duplicated, which is the replication.
Starting point is 00:22:08 That's actually because the replication works differently over there. So we had to, even though it's basically still advertising replication capabilities, the underlying schema requirements were radically different, so we ended up with different profiles for those. All right. So I think I've probably, well, I've said a lot of the framework for this, but we haven't really described them yet.
Starting point is 00:22:37 So when an implementation goes to advertise features, it advertises them through the features registry, and it advertises the list of supported features. These are the features it is currently able to support. That includes this set of properties, feature name, description, the version, and a pointer to the corresponding profile definition. This whole thing, for anybody who's really familiar with Redfish, is a derivation of the message registry. We leverage the message registry file concept so that this, we don't have
Starting point is 00:23:16 to, we didn't have to reinvent anything around how we publish, how to find a profile, and where's a profile file, and, you know, do I have a local copy or am I using the one over there? Because the message registry file structure has all of that. So we're just kind of leveraging all of that completely rather than redefining it. We apologize for calling it message registry file instead of just registry file. That's too much to change.
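To make the shape of a features registry entry concrete, here is a small sketch that parses a registry-like JSON document and indexes it by feature name. The four fields mirror the ones named in the talk (feature name, description, version, pointer to the corresponding profile definition), but the exact key names and all values shown are assumptions for illustration, as is the helper `index_features`.

```python
import json

# Hypothetical registry payload: key names and values are assumptions.
REGISTRY_JSON = """
{
  "Features": [
    {
      "FeatureName": "SNIA.Swordfish.Discovery",
      "Description": "Example description text.",
      "Version": "1.0.0",
      "CorrespondingProfileDefinition": "SwordfishDiscovery.example"
    },
    {
      "FeatureName": "SNIA.Swordfish.EventNotification",
      "Description": "Example description text.",
      "Version": "1.0.0",
      "CorrespondingProfileDefinition": "SwordfishEventNotification.example"
    }
  ]
}
"""


def index_features(registry):
    """Map each advertised feature name to its full registry entry."""
    return {entry["FeatureName"]: entry for entry in registry.get("Features", [])}


features = index_features(json.loads(REGISTRY_JSON))
profile_for_discovery = features["SNIA.Swordfish.Discovery"]["CorrespondingProfileDefinition"]
```

Indexing by name keeps the lookup from feature to profile a single step, which matches the one-to-one mapping between features and profile files described earlier in the talk.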
Starting point is 00:23:45 Okay. And so this is I'm not going through the entire how these things get published because I thought about doing that but that's basically just a complete duplication
Starting point is 00:24:06 of a message registry presentation. And so if you actually go look at message registry, it's very similar, except these are probably not going to be changing all the time. It's not like messages, where you're going to get notifications. It's more like you'll just go request against a resource or a set of
Starting point is 00:24:37 resources. But this is basically how all of those various things we saw are described. So you'll end up with the SNIA.Swordfish.Discovery, the description, its version, and its corresponding profile. And then there's entries in, I think this is, yeah, 1.02. And this is published out on dmtf.org slash registry slash swordfish for all 13 of those boxes on that previous slide. So if you want to advertise your own features, which this also fully enables, these are the standard features. If you have OEM dot better version of feature, you just create your own registry, export it and say, yeah, sure, I support SNIA.replication, but we also have, you know, best vendor ever dot replication. And so, you know, clients will care about that one
Starting point is 00:25:51 rather than just the SNIA version. So the features registry, we've actually had quite a bit of interest in. We created this, obviously, for this purpose, but we've actually had several vendors interested in using this for a multitude of purposes. Okay, so going down a little bit into one of the profiles just to kind of walk through this.
Starting point is 00:26:21 I didn't talk about this one specifically, but I did point out these are really fairly straightforward. They're not super in-depth. I'm sure if you combined them all together, it would look a little bit more complicated. So the swordfish discovery basically says, if you implement, this sounds really obvious, but if you implement an object, you must support a get on it and on all of its properties.
Starting point is 00:26:56 Like, it's got to be readable. I know that sounds stupid, but we went ahead and specified it. And then for this one we have a set of required properties. So most of them actually do have something similar. So here we have that first one we've talked about several times, right? The computer system hosting roles property must be set to storage server. The other thing is you must have a volumes collection in storage. You must have a storage pools collection in storage. They don't actually have to have anything in them.
Starting point is 00:27:33 And the reason we say that is for the initialization case. When you first initialize a system and bring up a blank card from, you know, it doesn't have any volumes on it. That's perfectly fine. It doesn't have any storage pools created yet. That's perfectly fine. But those collections have to be there
Starting point is 00:27:59 so that you can create the first one. It also must have a collection. This one gets a little bit more complicated in the wording. You also must have a collection for storage media. So the difference you'll see here between the Redfish, the minimum requirements for Redfish, which was storage volume and drive, our minimum requirements here just add storage pools.
Starting point is 00:28:26 Plus we add a little bit of flexibility around drives. Because we basically say, you've got to have some kind of media. And it could be a drives collection, and we allow that to actually be in a couple of different places. And then we also require that you populate the storage system collection in the service root. And that's it.
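The discovery requirements just listed can be sketched as a check over a simplified in-memory model of a service. This is not a real Redfish payload: the dictionary layout, the key names, and the helper `check_discovery` are assumptions made for illustration, and the media check looks in only one place, whereas the talk notes that media may live in a couple of different places.

```python
def check_discovery(service_root):
    """Return a list of problems found against the discovery requirements.

    `service_root` is a simplified, hypothetical model: a dict with a
    "StorageSystems" list, each system holding "HostingRoles" and "Storage".
    """
    problems = []
    systems = service_root.get("StorageSystems", [])
    if not systems:
        problems.append("StorageSystems collection is missing or empty")
    for index, system in enumerate(systems):
        if "StorageServer" not in system.get("HostingRoles", []):
            problems.append(f"system {index}: HostingRoles lacks StorageServer")
        storage = system.get("Storage", {})
        # The collections must exist but may be empty (initialization case).
        for collection in ("Volumes", "StoragePools"):
            if collection not in storage:
                problems.append(f"system {index}: no {collection} collection")
        # Some media must exist; only a Drives list is checked in this sketch.
        if not storage.get("Drives"):
            problems.append(f"system {index}: no storage media found")
    return problems


# A freshly initialized system: empty collections, one drive.
fresh = {
    "StorageSystems": [
        {
            "HostingRoles": ["StorageServer"],
            "Storage": {"Volumes": [], "StoragePools": [], "Drives": ["Drive1"]},
        }
    ]
}
fresh_problems = check_discovery(fresh)
```

The empty `Volumes` and `StoragePools` lists pass on purpose: the collections only need to exist so that a client has somewhere to create the first volume or pool.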
Starting point is 00:28:51 That's as complicated as it gets for the discovery. And again, if we get pushback, if we get clients coming in and saying that, we can either create supplemental profiles that add on to that, or we can enhance the profiles. Okay, so I have a handful more slides that actually just kind of break this down. It should go through pretty fast. But basically the next three or four slides are just that actual profile written out. So here's the first one. Just a bunch of headers, right? So this one here says, here's the registry. It pulls in that feature registry and says this is the actual
Starting point is 00:29:41 supported feature that is used or required or dealt with somehow in this file. So then we get into the set hosting roles property. It was the first one on our list. So what this says is, in service root, storage systems, in the computer system, hosting roles: this conditional requirement basically says hosting roles must be set to
Starting point is 00:30:14 storage server. Okay, and then here's just the collection of requirements for those. The collection of requirements for collections? The set of requirements for the collections. So we have requirements, again, in that service root that says,
Starting point is 00:30:37 under computer system, in storage, you must have a volumes collection. You now must have the storage pools collection. And where I was saying, they don't have to have anything in them, that's where this members min count comes from. And then drives over here. There's not really good language. One of the things we're working on is one of the things that's missing in the language right now within the interoperability profile schema is the ability to compare two
Starting point is 00:31:07 things that aren't really related, I mean that aren't right next to each other in the schema. So if you have two properties inside the same schema, it's pretty easy to do. So right now, this is actually written out in the text. So when we go to create the test for it, it will have to be a hand-coded test, but we're working on trying to figure out how to actually expand the schema language to
Starting point is 00:31:36 do what's actually written in the purpose there so we can fix that but for now it's basically the intent is written there. But I haven't gone through in detail, but we have a structure inside the storage pool called capacity sources. And that's normally where you basically say,
Starting point is 00:32:01 hey, here's whatever media you're using for this. You can have pools of pools. You can have pools of volumes. You can have volumes of volumes. You can have volumes. So you can comprise pools hierarchically. You can have pools made of persistent memory. You can have memory chunks, all different types.
Starting point is 00:32:24 If you want to just use that drives array that's in storage, you can do that. So those are all the different options you can have. And this basically says, you've just got to have some media somewhere. So it's a really hopefully not overly verbose way of saying that. Okay. There's also going to be in the profile this general set of requirements.
Starting point is 00:32:57 So these things are basically like for any storage system collection in the unit, and here's a set of property requirements. Again, these don't really have anything in them other than this one where I said in that original table you must have at least one storage system collection. So that's effectively how, you know, it's not everything that's in the profile languages. It's just the profile language can get a little bit more complicated, but we're trying to keep our profiles hopefully a little bit simpler so folks can actually understand them and pick up and adopt them relatively easily
Starting point is 00:33:38 and also so that we can develop the tests pretty easily. Okay. So time-wise, didn't do terrible. So that was about it for the content. Where do we have questions? Questions? Questions involve what time's lunch?
Starting point is 00:34:13 Is it typically going to be one profile per feature? Yes. The question is, is it typically going to be one profile per feature, or is it one or more? If I'm writing them, I'm going to try and keep it to one-to-one. But the profile language doesn't care. So what Redfish has done, well, they're not using the features now, but they could. If I go back to here, one of the things that I was looking at for working with OCP: the OCP storage group's processes basically say they want to develop
Starting point is 00:35:07 an OCP interoperability profile for storage. What I would look at doing there would be basically writing a single profile that says include Swordfish Discovery, or requires Swordfish discovery and event notification and maybe block provisioning, whether that's up for debate in the group or not. Pick two or three of these in there. So they would technically have one profile, but it's pulling in three or four of our profiles.
Starting point is 00:35:50 Would it be a copy of the place there is a, it would reference, and it would use a feature in here that is basically, it's an equivalent of an include. Well, it's not really an include. It's a requires. Maybe it's in here. Yes, required profiles. This one doesn't have it because this is our base profile, but all of the rest of our profiles correspond to the arrows in that tree.
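The "requires" chaining described in this exchange, where one consortium profile pulls in several Swordfish profiles that in turn require the base discovery profile, can be sketched as a transitive lookup. The profile names and the `catalog` structure below are hypothetical, as is the helper `resolve_required`; a real profile file declares its required profiles in its own format.

```python
def resolve_required(profile, catalog, seen=None):
    """Return every profile reachable through required-profile references.

    `catalog` maps a profile name to the profiles it directly requires.
    The `seen` set also guards against reference cycles.
    """
    seen = set() if seen is None else seen
    for dependency in catalog.get(profile, []):
        if dependency not in seen:
            seen.add(dependency)
            resolve_required(dependency, catalog, seen)
    return seen


# Hypothetical catalog: one consortium profile that cherry-picks three others.
catalog = {
    "OCPStorage": ["SwordfishEventNotification", "SwordfishBlockProvisioning"],
    "SwordfishEventNotification": ["SwordfishDiscovery"],
    "SwordfishBlockProvisioning": ["SwordfishDiscovery"],
    "SwordfishDiscovery": [],
}
needed = sorted(resolve_required("OCPStorage", catalog))
```

Note that the top-level profile never names the discovery profile directly. It arrives through the chain, which is the same shortcut the speaker describes taking when a profile sits two levels down the hierarchy.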
Starting point is 00:36:39 So every other profile includes, well, not all of them. Most of the other profiles include SNIA discovery. If they go down two levels in the hierarchy, I didn't. I just included the one above it because it already includes SNIA discovery, or Swordfish discovery. So you're saying the OCP version is just a cherry-picked profile? Yeah. Instead of rewriting
Starting point is 00:37:05 everything from scratch. And anyway, that's a proposal I put out to them. We'll see how that goes. I don't see why they would want to rewrite something else from scratch.
Starting point is 00:37:21 The only reason I could see for that would be, and the reason I actually like that model is because if they decide they have some additional properties, then we could put that in the OCP specific thing. But I think, you know, so I'm kind of circling back a little bit to Don's question about, you know, is it one or multiple? That actually ends up being, from their perspective, it's kind of both. They end up with one. It's actually referencing our profile. Yes, it's a chain.
Starting point is 00:37:58 It ends up referencing our profiles. Yes. And then they could actually declare a new advertised feature, which is conformance to the OCP profile, in addition to advertising all the Swordfish profile features there too. Any other questions, or would folks like lunch? All right, thank you. Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org.
Starting point is 00:38:35 Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
