Storage Developer Conference - #49: Time to Say Good Bye to Storage Management with Unified Namespace, Write Once and Reuse Everywhere Paradigm

Episode Date: June 28, 2017

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast. You are listening to SDC Podcast Episode 49. Today we hear from Anjaneya Chagam, Principal Engineer, Intel, as he presents Time to Say Goodbye to Storage Management with Unified Namespace, Write Once, and Reuse Everywhere Paradigm from the 2016 Storage Developer Conference.
Starting point is 00:00:56 My name is Reddy Chagam. I'm a principal engineer and chief SDS architect at Intel, working in the Data Center Group. Most of my session today is about an open source SDS controller effort. There is work going on behind the scenes with a few storage vendors as well as big end customers, and I would like to highlight what exactly we are doing within the context of enabling storage management in the industry. There are specific announcements that are actually going to
Starting point is 00:01:35 come up in the next month or so; I won't be able to talk about those details, but I will give you a flavor for why we are doing this and what exactly the problem areas are that we are trying to address when it comes to storage management in the industry. Hopefully it gives you a sense of the need behind the work that is going to happen toward the later part of this year. So I'm going to cover a bit on software-defined storage and what I mean by it, and I'm going to take a couple of storage stacks,
Starting point is 00:02:15 one is OpenStack and the other is Kubernetes, to give you a flavor of how storage management is being done in the existing stacks. One gives you the virtualization layer, the other is a container framework, so I wanted to give you a pulse on how storage management is being done in these two different stacks, paint a picture of exactly the problems we are facing in general, both with cloud computing and with virtualization management frameworks, and give context behind what we are going to do with the open source
Starting point is 00:02:50 SDS controller proposal. I'll talk a little bit on the next steps and a specific call to action. By the way, it's a small audience, so if you have questions feel free to stop and ask. Looking at software-defined storage, there are two broad building block elements at the bottom. One, which you are probably familiar with quite a bit, is scale-up, the traditional SAN and NAS appliances. The other is scale-out, which can be anything that you instantiate on standard high-volume servers using
Starting point is 00:03:35 an open source flavor like Ceph or Swift, or a proprietary flavor like ScaleIO, vSAN, Nutanix, Storage Spaces Direct, and so on. An element of scale-out or an element of scale-up is essentially considered a storage system deployed in a data center, but in order to manage those pieces you need something at the management plane; we call that the software-defined storage controller. So what is the role of the software-defined storage controller? Fundamentally, it needs to have visibility into all the storage resources deployed in a data center; it
Starting point is 00:04:17 needs to provide a mechanism to provision storage resources to meet targeted SLAs. An SLA could be: I want X amount of capacity, I want X amount of latency, I want X number of IOPS, X amount of throughput. Anything and everything related to storage requirements, the controller needs to be able to understand, and it needs to be able to carve out those storage resources in an optimal fashion and work with any orchestrator. It doesn't have to be OpenStack only, it doesn't have to be CloudStack,
Starting point is 00:04:46 it doesn't have to be Docker or Kubernetes; it needs to be able to work with any orchestration framework out there. So that's kind of the notion of what I mean by software-defined storage: two different elements, with the control plane managed by a software-defined storage controller that manages all the storage resources.
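A minimal sketch in Python, using hypothetical names, of the kind of SLA-carrying provisioning request such a controller would have to understand before it can pick a backend; this is an illustration of the idea, not any shipping API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StorageRequest:
    """What an orchestrator asks the SDS controller for (illustrative only)."""
    capacity_gb: int                           # "X amount of capacity"
    min_iops: Optional[int] = None             # "X number of IOPS"
    max_latency_ms: Optional[float] = None     # "X amount of latency"
    min_throughput_mbps: Optional[int] = None  # "X amount of throughput"
    protocol: str = "iscsi"                    # data plane stays standard: nfs, iscsi, iser, nvme-of

# Any orchestrator (OpenStack, CloudStack, Kubernetes, ...) could hand the
# controller the same request; the controller decides which backend serves it.
req = StorageRequest(capacity_gb=500, min_iops=10_000, max_latency_ms=2.0)
```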
Starting point is 00:05:04 But once it carves out a resource, you will use the standard data plane protocols: NFS, iSCSI, iSER, NVMe over Fabrics, anything and everything related to block and file protocols, or maybe even future key-value protocols as well. So we are not talking about changing anything in the data plane; the data plane will continue to stay as is. It is only the control
Starting point is 00:05:30 plane: how do you manage the storage resources, how do you have visibility of the storage resources in a data center, and how do you allocate the storage resources in a much more optimal way? Okay, good. Does it also work in a DAS environment? DAS, direct-attached storage, is what I mean. So bare storage, you mean? Yeah, so it should. There is an element of standardization going on with Swordfish and Redfish, if you know the standards activities under the SNIA plus DMTF umbrella.
Starting point is 00:06:06 Today, if you look at most of the storage management, it is all focused on network-attached storage, scale-up or scale-out. But the goal is to actually rope in anything that happens on a compute node, local storage, as well as the fabrics and targets, so we have to include those elements too. Today it is mostly: there is scale-out storage and there is scale-up, I'm going to write a driver, I'm going to consume it, and everything else is done in its own fashion in each and every orchestration
Starting point is 00:06:37 stack's storage layer, right? But the intent is to bring in those elements as well. We talked about a single way of managing the storage, different kinds of storage. Yeah. And we have challenges where you have storage devices that are servers, then you have SAN, NAS, or different network devices. Exactly, yeah. So broadly classifying it, the way I look at it is
Starting point is 00:06:59 there is storage that is sitting in a compute node purely servicing the compute elements like virtual machines or containers; then you have targets, iSCSI targets and NVMe over Fabrics based targets, that are really needed as part of the provisioning flows as well as allocating the storage resources; then you have scale-out, some sort of service deployed on a standard high-volume server pool, like Ceph and Swift; and then you have scale-up. So we should be able to manage all those pieces. They're all considered storage resources, but depending on your orchestration stack, your mileage will vary, right?
Starting point is 00:07:39 Yeah. Okay. So let's click down into the major functions that you typically would like to see in the storage controller when it comes to control plane operations. The way I look at it is that you need to be able to manage the storage resources from the beginning, procuring your storage systems and deploying them in a data center, all the way to retiring your storage resources. So that's how I look at it,
Starting point is 00:08:10 and there are different functional building blocks that have to happen before you can actually start consuming the storage resources in a data center. When I say consuming: create volumes, create shares, attach volumes, attach shares, so that the compute side of the orchestration can take advantage of the storage resources. So typically you start off with provisioning.
Starting point is 00:08:33 Normally, if you have a pre-configured appliance, provisioning is going to be fairly lightweight other than maybe configuration and connecting the network fabric to the compute cluster. If it is scale-out, obviously you have to deploy the operating system, deploy the specific software, and have all the pieces in place before you can start consuming them. So there is heavier effort involved when it comes to deploying server-based scale-out storage stacks, all the way from operating system deployment to provisioning on top of it. Once you provision the storage resources and connect the storage
Starting point is 00:09:11 systems in a data center, you need to be able to discover them. There are lots of ways to discover, but most discovery happens to be: you log into the storage appliance and figure out what exactly it has. Depending on what vendor you are talking about, they may have a more integrated way of discovering the storage resources, but in general it's not a very well integrated and thought-out phase. Then you need to be able to take the storage resources and pool them, essentially group them into certain logical buckets so that you can manage them together. So you may have a performance-
Starting point is 00:09:49 oriented pool, you may have a throughput-optimized pool, you may have a capacity-oriented pool. You need to be able to group them, but you can't group them unless you know exactly what the backends are capable of, right? So there is an element of composing them into certain logical pools, and then you start consuming them, which is essentially creating blocks, file shares, and buckets, creating objects, and so on.
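A toy illustration of that composition step, with made-up capability records: once discovery has produced a description of each backend, they can be bucketed into performance, throughput-optimized, and capacity pools.

```python
def classify(backend: dict) -> str:
    """Pick a logical pool for a discovered backend based on its capabilities."""
    if backend.get("media") == "nvme" and backend.get("latency_ms", 10.0) < 1.0:
        return "performance"
    if backend.get("throughput_mbps", 0) >= 5_000:
        return "throughput-optimized"
    return "capacity"

discovered = [  # hypothetical discovery output
    {"name": "ceph-hdd-pool", "media": "hdd", "throughput_mbps": 2_000},
    {"name": "nvme-of-target-1", "media": "nvme", "latency_ms": 0.3},
    {"name": "san-array-a", "media": "ssd", "throughput_mbps": 8_000},
]

pools = {}
for b in discovered:
    pools.setdefault(classify(b), []).append(b["name"])
# pools -> {"capacity": ["ceph-hdd-pool"],
#           "performance": ["nvme-of-target-1"],
#           "throughput-optimized": ["san-array-a"]}
```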
Starting point is 00:10:25 Consuming is predominantly where most of the focus is in the orchestration stacks. If you look at most of the orchestration stacks, they assume that those first three building blocks are already taken care of, so you normally don't see them paying attention to those building blocks; they mostly start consuming with the assumption that the backends are already plugged in, already connected to the compute nodes, and everything is working fine. The later part is, obviously, once you start creating the volumes and shares, how do you maintain and monitor them, which is a critical component as well. I'm asking for 10,000 IOPS; am I really getting 10,000 IOPS? If you look at AWS, for example, with provisioned IOPS you are essentially asking for
Starting point is 00:10:58 X number of provisioned IOPS; am I really getting that? The visibility into what I asked for versus what I am getting is somewhat challenging depending on what tools you have and what framework you are dealing with. And then, obviously, retiring. The key part of retiring is that you need to be able to migrate the data out of the system that you want to retire, so there needs to be a mechanism to automatically migrate the data, which, depending on your storage backend, sometimes you may have and sometimes you may not.
Starting point is 00:11:28 And it doesn't normally work with cross-vendor storage backends. It probably works seamlessly within the same vendor's SKUs, but when you look at cross-vendor technologies, data migration is fairly challenging. So for the pieces at the bottom, most of the challenges are scale, and HA has been fairly problematic; if you look at OpenStack Cinder and Manila, there is a lot of focus on ensuring that the tools can actually scale and that there is a high-availability configuration baked in, so you can deploy for a large pool of machines and manage them. The discovery
Starting point is 00:12:05 and classification is the piece that is really missing. The third element is policy-based orchestration: if I have a policy on quality of service, things like IOPS, latency, and so on, how do I abstract it and convey that all the way down to the lowest level so it can be intelligent about how to carve out the resources? Multi-system orchestration: if you have multiple SAN backends, how do you manage them together as one entity?
Starting point is 00:12:32 That has been fairly challenging with most of the backends. And then there's automating the data migration piece: lots of tools in this space, mostly vendor-specific, and this is largely ignored. There is a lot of focus in the orchestration stacks on monitoring; there are lots of tools out there, Chef, Puppet, Nagios, that you can take advantage of, but the goal is to bring in the visibility of how my volume is behaving, how my
Starting point is 00:13:01 share is behaving from a performance and latency perspective. At a data center level, can you give that visibility? That's kind of the key thing. Yeah? You said lots of tools, but what about EOL tooling, migration in that area? Migration, again, there are certain basics. So the question is, are there lots of tools in terms of migration?
Starting point is 00:13:26 There are vendor-specific tools; let's say EMC has its own set of tools to migrate from EMC equipment to EMC equipment. So do NetApp, Huawei, Hitachi, and so on, and IBM of course. I think there are a couple of proprietary vendors who can actually do cross-vendor migration. There are some brute-force implementations from an open source perspective where they look at volume A and volume B; it's a plain vanilla offline export/import kind of scenario, but it's an offline migration, not really real-time inline migration. So it depends: if you are looking for very sophisticated inline migration, you are mostly better off taking the commercial tools.
Starting point is 00:14:06 If it is a very simple offline migration, there are lots of tools out there, but you have to do a lot of hand-stitching to get that done. I don't know if I answered your question. Yeah. It's still an area that's very much a work in progress. Yep, exactly.
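For flavor, a rough sketch of the "plain vanilla offline export/import" approach mentioned above: both volumes are detached and exposed as block devices on a migration host, then copied chunk by chunk. The device paths are hypothetical, and real tools add checksums, throttling, and cutover handling.

```python
CHUNK = 4 * 1024 * 1024  # copy 4 MiB at a time

def offline_copy(src_dev: str, dst_dev: str) -> int:
    """Brute-force offline copy of one block device to another; returns bytes copied."""
    copied = 0
    with open(src_dev, "rb") as src, open(dst_dev, "r+b") as dst:
        while True:
            buf = src.read(CHUNK)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
    return copied

# Example (hypothetical device paths; both volumes must be detached first):
# offline_copy("/dev/mapper/old-array-lun7", "/dev/mapper/new-array-lun3")
```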
Starting point is 00:14:23 Okay, so I'm going to take a look at a couple of orchestration stacks. Hopefully it will give you a flavor of how storage management is being done today, one from a virtualization management perspective and the other from container management. OpenStack is very popular for deploying and managing clouds; it takes care of compute, network, and storage. So you get all those three pieces when you look at OpenStack, somewhat similar to what you look for in an AWS type of implementation, but for private cloud.
Starting point is 00:14:56 At the top, Nova is the virtual machine management stack in OpenStack, and Horizon is the dashboard. When it comes to storage, we have four different building blocks. Manila is meant for creating shares. Cinder is for block: creating and managing volumes across multi-backend storage, scale-up and scale-out systems. Glance is mainly meant for images; if you have lots of virtual machine templates, it's a repository for managing those templates, creating them, snapshotting existing volumes and converting them into templates, and so on. And the last one is the object store: Swift is the default object store for
Starting point is 00:15:40 OpenStack. It's not really a control plane, right? It's an implementation; Swift is an end-to-end implementation of an object store, but there is also an API, the Swift API, that other backends actually use as a mechanism to integrate into OpenStack much more seamlessly. Looking at the flows, I wanted to show one specific flow to give a flavor of how the storage is being orchestrated. Cinder has support for LVM, which is the local DAS use case, and then scale-up: lots of storage vendors have drivers underneath Cinder,
Starting point is 00:16:22 and then we have scale-out driver plugins as well; Ceph kind of comes under the scale-out driver plugin, the RBD plugin. The initial step assumes that these things are actually deployed and configured in a data center. When you start configuring Cinder, you essentially provide a mechanism for how to connect to these storage backends, and then it starts pulling certain statistics: are the storage backends up or down, what is the state of the storage backends? It also pulls in certain stats, things like storage capacity and so on.
Starting point is 00:16:55 Cinder uses that information to figure out the best place to put volumes when a request comes in.
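A standalone sketch modeled on the Cinder driver convention of reporting backend statistics; the field names follow commonly reported keys, but the management-API client used here is imaginary.

```python
class ExampleBackendDriver:
    """Illustrative only: reports the stats a scheduler can use for placement."""

    def __init__(self, backend_name, array_client):
        self.backend_name = backend_name
        self.array = array_client  # hypothetical client for the array's management API

    def get_volume_stats(self, refresh=False):
        # refresh is accepted for interface parity; this sketch always queries.
        caps = self.array.query_capacity()  # e.g. {"total_gb": 50000, "free_gb": 12000}
        return {
            "volume_backend_name": self.backend_name,
            "storage_protocol": "iSCSI",
            "total_capacity_gb": caps["total_gb"],
            "free_capacity_gb": caps["free_gb"],
        }
```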
Starting point is 00:17:20 I will explain the flow a little bit later. So if you look at step number one, you can do this either from the command line or from the UI: you are essentially creating a volume. And when you are creating a volume, you are not specifying that you want to create it on EMC equipment or on scale-out Ceph. You are essentially saying: I want to create a volume, here is the size, and you can have different properties attached to it, including logical pools.
Starting point is 00:17:44 Things like: I want to create a volume in the performance pool. The performance pool may have three different backends, and it will go figure out the best way to pick one of them and create the volume in that backend. But when you start creating a volume, you're not really specifying any backend, any specific instance.
Starting point is 00:18:01 Cinder actually buries that abstraction deeply; it figures out the best way to pick the right storage system and create the volume. So once the volume creation request goes through the Cinder layer, it figures out the best possible candidate and connects to the storage backend using a driver to create a volume on that backend. Once that is created, the next step is that you want to either use that volume to boot from or attach it to an existing virtual machine instance. So you normally issue boot or attach
Starting point is 00:18:38 volume, which is step number three. Nova is the one that starts looking at volume properties. It connects to Cinder and says: give me the volume properties for this volume ID. That will have information about the storage backend and maybe the security credentials for how to connect to it, and what type of protocol it is exposing. All the details associated with that volume are provided by Cinder to Nova.
Starting point is 00:19:06 It can also pull the virtual machine template information from Glance and use that to boot the virtual machine. So step number four essentially gives you where the volume is located and what boot instance I need to be using to start up a virtual machine. And then it uses libvirt, which is essentially a control plane layer for QEMU/KVM virtual machine instantiation, to create the right set of virtual machine parameters and start the virtual machine.
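A hedged sketch of those steps using the OpenStack Python clients (python-cinderclient and python-novaclient); the endpoint, credentials, server UUID, and the "performance" volume type are placeholders, and the exact auth setup varies by deployment.

```python
from keystoneauth1.identity import v3
from keystoneauth1 import session
from cinderclient import client as cinder_client
from novaclient import client as nova_client

# Placeholder credentials; real deployments usually read these from clouds.yaml or env vars.
auth = v3.Password(auth_url="https://keystone.example:5000/v3",
                   username="demo", password="secret", project_name="demo",
                   user_domain_id="default", project_domain_id="default")
sess = session.Session(auth=auth)

cinder = cinder_client.Client("3", session=sess)
nova = nova_client.Client("2.1", session=sess)

# Steps 1-2: ask for a volume in the "performance" pool; Cinder's scheduler,
# not the caller, decides which backend actually hosts it.
vol = cinder.volumes.create(size=100, name="app-data", volume_type="performance")

# Step 3: attach it to an existing instance; Nova fetches the connection
# details (backend, protocol, credentials) from Cinder and wires it up via libvirt.
server_id = "<existing-server-uuid>"  # placeholder
nova.volumes.create_server_volume(server_id, vol.id)
```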
Starting point is 00:19:44 So that's the flow of what you see in OpenStack for creating volumes, attaching volumes, and starting virtual machines. But the key property is that Cinder is essentially giving you that layer of abstraction, so that you don't know which storage backend it is actually selecting. There is a scheduling component in Cinder; you can write your own scheduler, and you can schedule such that you are filtering based on a specific region or filtering based on certain performance properties. The default scheduler is fairly
Starting point is 00:20:12 primitive: it looks at all the storage backends, finds the one that has the most capacity, and picks that as the candidate for creating the volume. But you can plug in your own scheduler that is a lot more sophisticated and do very intelligent things around it. So that's kind of the flow. There are lots of other projects in OpenStack,
Starting point is 00:20:35 things like how you manage your security: Keystone handles user credentials, and then there is Barbican, which is security key management. Ceilometer is where you normally get telemetry information. And then Neutron is for the networking orchestration.
Starting point is 00:20:53 So Neutron, Nova, Cinder, Manila, Glance: those are the five building block components that really provide the foundation for compute, network, and storage orchestration. Okay, so hopefully you got the gist of how the flow happens. The key thing is that OpenStack in general, if you are aware of where it is going, has lots of production implementations, so these layers are being used in production; there is a production level of maturity with these stacks, and that's a good thing. Lots of vendors are writing drivers for
Starting point is 00:21:27 Cinder and Manila, so you see a big vendor ecosystem out there writing drivers, and you will find lots of driver support. Again, as I mentioned in the previous slides, it obviously doesn't have storage management that is completely baked in. It provides constructs for how to group storage resources, but you need to have visibility into what your storage resources are and what their properties are, you need to have a mechanism to logically group them yourself, manually or with custom tools or whatever it is, and then create config files for Cinder to say: here is the performance pool,
Starting point is 00:22:02 here is the capacity pool, here is the throughput-optimized pool. So you have to hand-stitch the configuration before Cinder is actually aware of the storage resources, the logical groupings, and how to consume the resources; that is something you have to do somewhat manually. The scheduling and monitoring are still evolving, specifically monitoring; lots of things are happening, but it's still not at a place where you can actually get end-to-end visibility of how the storage resources are
Starting point is 00:22:33 and what their properties are. Okay, any questions on OpenStack so far? Good, okay. Yeah, do we have any particular...? Yes, the basic Cinder scheduler essentially uses free space as the mechanism to pick the right backend, but there are a few different flavors of filters. There is an affinity filter, so you can say: I want to create this volume for this compute node. LVM needs an affinity filter, because you don't want to create the LVM volume on one node when your compute is on a different node, right? So there is an affinity filter, and there are a few other filters that you can potentially use, so there is a mechanism to extend it as well, but the default implementations are, I would say, fairly primitive in my opinion.
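A simplified illustration, not the actual Cinder code, of how filter-style scheduling works: filters prune the candidate backends (here an affinity-style host filter), then a weigher picks one, in this case simply the backend with the most free capacity.

```python
def host_affinity_filter(backends, required_host=None):
    """Keep only backends local to a given host, e.g. for LVM volumes."""
    if required_host is None:
        return list(backends)
    return [b for b in backends if b.get("host") == required_host]

def schedule(backends, required_host=None):
    candidates = host_affinity_filter(backends, required_host)
    if not candidates:
        raise RuntimeError("no backend satisfies the request")
    # Default-style weigher: most free capacity wins.
    return max(candidates, key=lambda b: b["free_capacity_gb"])

backends = [
    {"name": "ceph-pool", "host": None, "free_capacity_gb": 80_000},
    {"name": "lvm@node-7", "host": "node-7", "free_capacity_gb": 400},
]
print(schedule(backends)["name"])                          # ceph-pool
print(schedule(backends, required_host="node-7")["name"])  # lvm@node-7
```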
Starting point is 00:23:16 Okay, any other questions on OpenStack? Okay, so I'm going to cover a little bit on container orchestration. How many of you have heard about Kubernetes? Okay, cool, fantastic. There are two types of servers in Kubernetes. One, as you can probably guess, is the node, which is where you are really deploying containers. A node can be your physical machine or a virtual machine, but for you to manage a pool of machines, you need clustering software on top of it.
Starting point is 00:24:12 So the master server has a list of services that essentially provide a way to manage a pool of servers: things like how you deploy, how you select a specific pool of containers to deploy on a thousand-node cluster. There is a mechanism to schedule; the scheduler is essentially responsible for figuring out where the right place is to start your containers. The replication controller is the one that essentially gives you the property that there is a minimum and a maximum that you can set for a
Starting point is 00:24:49 pool. It is responsible for making sure that you have enough container instances running in a data center to get your job done. Things like load balancing: you have a front-end web server and you say I want 10, and it will make sure that there are 10 front-end container instances managing the load; if it goes down by two or so, the replication service is responsible for bringing them back up. That may be because your node is down: you are asking for 10 instances, and one node that has two instances running on it goes down.
Starting point is 00:25:20 It will make sure it starts those two on a separate node. So the whole mechanism of detecting node state and failure, starting those instances, and making sure it is meeting your criteria for the number of instances is the function of the replication controller.
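In spirit, the replication controller is a reconciliation loop; a toy sketch with a stand-in cluster object (none of these calls are real Kubernetes APIs):

```python
import time

def reconcile(cluster, app, desired):
    """Converge the running instance count toward the desired count."""
    running = cluster.count_running(app)            # hypothetical query
    if running < desired:
        cluster.start_instances(app, desired - running)
    elif running > desired:
        cluster.stop_instances(app, running - desired)

def control_loop(cluster, app, desired, interval_s=5.0):
    while True:                                     # controllers run forever
        reconcile(cluster, app, desired)
        time.sleep(interval_s)
```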
Starting point is 00:25:44 The API server is essentially the place you use as the mechanism to connect to the entire cluster: how do you bring up nodes, how do you take nodes offline, how do you add additional nodes, how do you deploy the Kubernetes services on nodes? You do all these things using the command-line tool called kubectl, which uses the API server. You can also use the REST APIs directly if you really want, and automate it that way too.
Starting point is 00:26:01 When you look at the node, Kubernetes has a concept called a pod. A pod is your basic abstraction for deploying compute instances, which is essentially a group of containers. This is the bottom-most abstraction for deployment. A pod can have one container or many containers. The reason they chose the pod is that it makes it easier to manage a pool of containers
Starting point is 00:26:35 from an administrative perspective, as well as things like how you manage the policies for networking and other pieces. It makes it a lot easier to manage a collection of containers as opposed to managing each one as its own entity. So a pod is a collection of containers, and there is an agent sitting on each and every node called the kubelet. The kubelet is essentially the entry point into a node to get anything and everything done.
Starting point is 00:27:05 It's an agent that executes operations on behalf of the master services. And the proxy is essentially a network proxy; it gives you the network virtualization services that are needed on a node. Stitching the nodes as well as the master servers together, there is a foundation layer called etcd. This is essentially a distributed key-value storage layer; it keeps all the metadata that is needed to survive node failures, as well as ensuring that your master servers can scale and so on. And then there is a concept called a persistent volume.
Starting point is 00:27:50 So you can essentially attach persistent volumes, and there are certain policies that you can define. There are three different policies that Kubernetes provides. One is capacity: how much capacity you need for a given volume. Then recycle, and then the actual access policy. The access policy could be: I want read-only volumes, I want read-write volumes, it is a private instance, it is a shared instance.
Starting point is 00:28:13 And the recycle policy says: when you are done, do you want to actually recycle the data, do you want to delete the volume or share, or do you want to just leave it as is? So three different policies, and the policies are still evolving.
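Those three knobs show up directly on a PersistentVolume object. A sketch using the Kubernetes Python client, with the NFS server and export path as placeholders (access modes and reclaim policies have since grown beyond what is described here):

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig

pv = client.V1PersistentVolume(
    metadata=client.V1ObjectMeta(name="demo-pv"),
    spec=client.V1PersistentVolumeSpec(
        capacity={"storage": "100Gi"},                   # the capacity policy
        access_modes=["ReadWriteOnce"],                  # the access policy
        persistent_volume_reclaim_policy="Recycle",      # the recycle policy
        nfs=client.V1NFSVolumeSource(server="10.0.0.5",  # placeholder backend
                                     path="/exports/demo"),
    ),
)
client.CoreV1Api().create_persistent_volume(pv)
```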
Starting point is 00:28:42 The key thing is that this is a growing community, a very good framework for container orchestration with sophisticated scheduling and lots of backing from a wide variety of storage vendors as well as other big companies. The big thing is that the storage interfaces are still evolving. As I said, there is a capacity-based policy, a little bit of access policy, and how you want to recycle the data, but things like performance attributes, how you express them, how you provision them, those things are still being worked out. And then most of the storage management is out of scope; it assumes that everything is provisioned and you're consuming it, so it ignores the fact that there is a provisioning part, a discovery part, a pooling part, and monitoring and maintenance; there is a little bit
Starting point is 00:29:18 of monitoring and maintenance, but it is fairly basic. So you kind of get the theme from OpenStack and container orchestration: they're very good at provisioning the volumes and shares. Your scheduling may not be sophisticated; your mileage will vary based on what orchestration stack you're picking. Every other area is largely ignored. There are lots of other open source flavors. If you look at Apache Mesos, it's essentially an application framework, so you can deploy big data, you can deploy infrastructure as a service, and they can coexist on the same data center infrastructure,
Starting point is 00:30:00 so it has a mechanism to up-level the abstraction to applications, and you can use those APIs to write your own application frameworks and deploy them in a data center. Apache Mesos is very popular. Docker Swarm is mainly meant for native clustering of Docker container instances. CoprHD, which I presented with EMC last year, is another open source flavor of a software-defined storage controller. CloudStack is very similar to OpenStack, an equivalent flavor. Eucalyptus is somewhat similar to OpenStack and CloudStack, but it is tilted more toward AWS-friendly, integration-friendly orchestration. And there are several others, like Giant,
Starting point is 00:30:45 OpenNebula, and a few others that are actually in the industry. So you get a flavor: there are lots of open source flavors out there, and all of them are trying to figure out the best way to integrate storage resources. If you look at the first three, Kubernetes, Docker, and Mesos, virtually every one of them has its own mechanism to integrate persistent volumes, which are essentially volumes coming from network-attached storage. Depending on which orchestration you are picking, your integration touch points will vary, and there are drivers in each one of them. And then EMC has something called
Starting point is 00:31:25 libStorage and REX-Ray, and there is also the FlexVolume concept from Diamanti. The concept is: instead of me writing a driver, I will give you an abstraction layer. You write to my abstraction, and I'll make sure that everything plugs into the container frameworks,
Starting point is 00:31:42 whether it is Kubernetes, whether it is Mesos, whether it is Docker; I'll be able to make sure they plug in very seamlessly. That's the concept of what you see with the EMC and FlexVolume approaches: a framework that lets you write drivers so you don't have to worry about which stack you are plugging in underneath. And then the last one is OpenStorage. This mostly revolves around having an API abstraction, both for the northbound APIs as well as the southbound APIs, as a mechanism to write drivers seamlessly;
Starting point is 00:32:14 that has been the approach, not from a standards perspective but rather from a specification and reference implementation perspective. So the last four mainly revolve around: I want to create an abstraction, you can write a driver based on my abstraction, and I will make sure it integrates seamlessly with all the other orchestration stacks, mostly focusing on the cloud native computing frameworks in the first three buckets. Okay, so what is the problem with this? If you piece all the things together
Starting point is 00:32:55 and look at all the container frameworks as well as the OpenStack frameworks (this is just a sample of four), when you add both proprietary and the other open source flavors you can actually see the magnitude of the problem. So what is the problem here? If you look at it from top to bottom, it is very important for us to look at it from a storage perspective. Can I attach storage that is sitting on a compute node, which is called DAS? Can I do that for iSCSI targets as well as NVMe over Fabrics targets?
Starting point is 00:33:27 Can I allocate storage resources in a way that is also compatible with network-attached storage, scale-up or scale-out? There is no parity between these different types of instantiation: DAS, iSCSI and NVMe over Fabrics target implementations, and scale-up and scale-out storage solutions. You see that there's not a whole lot of commonality when it comes to implementing those things.
Starting point is 00:33:52 So the goal is to figure out the best way to do that. Rather than having every vendor and every abstraction out there trying to position themselves as, hey, I have my own abstraction, why don't you write a driver underneath it, our goal is: is there a way to create a common abstraction that everyone can participate in and influence, as opposed to every vendor trying to drive this in a different direction? And the last thing is, from a storage vendor's perspective, in an ideal situation I want to write one driver; as you know, writing a driver and certifying a driver is a nightmare, and it takes a significant amount of effort to make that happen. So storage vendors are
Starting point is 00:34:28 looking at: can I write a driver, test it thoroughly, and from there onwards it's an integration problem that I need to focus on, as opposed to really writing a driver for every stack and every orchestration? So the storage vendor ecosystem pain point is: write once, reuse everywhere. So if I were to do this cleanly, what would it look like? Yeah, the picture is beautiful, but the challenge is going to be how we enable this, right? The way we are looking at it is we'll have an open SDS orchestration layer, we are calling it the open SDS controller, that will have plugins for all the orchestration stacks. That includes traditional computing
Starting point is 00:35:18 platforms like OpenStack and CloudStack as well as the cloud native computing frameworks, plus our hope and prayer is that the proprietary flavors will also come in, as a way to plug in the VMware and Microsoft stacks as well. That's essentially the top portion. But for us to do that, we really need to figure out the best way to abstract the storage resources. What is the basic set that we can use as a starting point? Is it a performance-oriented construct like IOPS, latency, throughput, space efficiency, and data services, things
Starting point is 00:35:53 like encryption and compression? We are looking at a few different concepts to say: this is good enough, and that will be the level of abstraction for us to actually provision the storage resources and use as a common integration mechanism with all these orchestration stacks. That's the goal.
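A speculative sketch of what that level of abstraction might look like: one normalized profile plus one driver interface that each backend implements once, with thin plugins adapting it to OpenStack, Kubernetes, Mesos, and the rest. None of this is the actual OpenSDS API, which had not been published at the time of the talk.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class StorageProfile:
    """Normalized capabilities/SLOs: performance plus data services."""
    iops: int = 0
    latency_ms: float = 0.0
    throughput_mbps: int = 0
    space_efficient: bool = False
    compression: bool = False
    encryption: bool = False

class BackendDriver(ABC):
    """Written once per backend: DAS, iSCSI/NVMe-oF target, scale-up, or scale-out."""

    @abstractmethod
    def discover(self) -> dict:
        """Report capabilities so the controller can classify and pool this backend."""

    @abstractmethod
    def create_volume(self, size_gb: int, profile: StorageProfile) -> str:
        """Provision a volume satisfying the profile; return its volume ID."""

    @abstractmethod
    def attach(self, volume_id: str, host: str) -> dict:
        """Return data-plane connection info (protocol, target, credentials)."""
```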
Starting point is 00:36:28 Then, when it comes to integrating with the existing stacks, our goal is not to start everything from the ground up. Can we reuse the existing frameworks, take advantage of their maturity, and address the problems that are not being addressed today, things like automated discovery and pooling, which nobody has addressed? We are going to look at those pieces as opposed to, oh, let's do this whole thing from the ground up again. So reuse the existing open source building blocks that are mature as a starting point. That includes the drivers: Cinder has gained significant traction, and we should be able to use the Cinder framework as well as the drivers as a starting point, as opposed to really writing from scratch. So the goal is to reuse as much as possible, address the critical pain points, and lay the foundation.
Starting point is 00:36:56 And at some point, we look for a common API here. There are a few elements that are actually happening in the Swordfish area; we'll be able to take advantage of enclosure management APIs as well as the NVMe over Fabrics target side: how do you discover them, how do you pool them, how do you abstract them? So the goal is to integrate with the standards bodies and make sure that is happening very seamlessly. So
Starting point is 00:37:27 that's essentially the focus: how do we simplify the integration of storage management, how do we reuse the existing storage building blocks, and how do we address the real pain points? Okay, I kind of summarized this on the previous slide: focus on the real-world end customer pain points. The ones that I really talked about: scale and HA is very painful, and we need to make sure that it's taken care of; that has been one of the top end user pain points. A way to discover and pool the storage resources has been the second.
Starting point is 00:38:02 Automated data migration has been one of those critical components: I need to be able to retire the systems, but I have data I need to migrate; can I do that without going through an expensive storage layer that I need to buy, which has lots of limitations and works only with certain vendors and certain combinations? So focus on the real pain points. Ensure that we are focusing on the pain points that are seen as priorities from the customer side.
Starting point is 00:38:34 Look at integrating with the broader orchestration ecosystem, virtualization backends as well as the container frameworks. Reuse the open source building blocks wherever it makes sense. And then the last goal is very important: we really want to make sure that the whole community is coming around this. The storage ecosystem needs to feel the pain, and they need to be part of this as opposed to not being part of it, so that's an important element. Standards bodies: I talked about Swordfish as one of the critical elements here, and CDMI could play a very critical role too. And then obviously the big service providers need to be part of it, to be able to influence what is
Starting point is 00:39:13 really useful, compared to just developers coming in and doing the work. So we are looking at essentially a vibrant ecosystem that has a mix of storage vendors, standards bodies, as well as end customers, to be able to influence the storage management problem in a broader way. There are discussions going on, and our goal is to announce this sometime this year. Because of all the NDA discussions that are going on, I can't really reveal who those companies are or when it is going to be announced,
Starting point is 00:39:49 and all that stuff, but it's coming up this year pretty soon, hopefully next month, but sometime this year. You will see that there is an open SDS controller effort starting under some sort of foundation; it could be OpenStack, or it could be CNCF, which is the Cloud Native Computing Foundation, or the Linux Foundation, one of those three. But in general, it's a broader ecosystem coming together to solve this problem. I'm really looking forward to you guys looking at this effort
Starting point is 00:40:17 and seeing what makes sense from a contribution perspective. We'd love to have you join this effort as well. Okay, so that's my last slide. We have time for Q&A, so let me know if this makes sense: are we solving the right problem, is this the right thing to do for the industry? Give us feedback. Questions? So for example, we have kind of... Yeah, so there is a prototyping effort that has been done by Intel and a few companies.
Starting point is 00:40:59 We were essentially doing: how do you discover, how do you pool? Obviously, because there is a lack of a uniform discovery protocol, you will have to have a driver mechanism for the discovery component. So we quickly settled on the need for drivers: if someone wants to discover their storage backends or scale-out systems, they need to have a driver to discover them. We settled on the need for a driver-based discovery mechanism as opposed to, let's go solve that interface problem. There is also some work going on; NetApp
Starting point is 00:41:34 has been doing quite a bit of this stuff, and they talked about it at SNIA two years ago or so. They are looking at SLO-based constructs: what's the best way to create that normalized storage abstraction from an orchestration layer perspective, for different dimensions, things like performance and data services. Performance could be how many IOPS I need for this volume; data services could be do I need compression, do I need at-rest encryption, what should my encryption management story be around it. So they have actually created a nice set of abstractions, and we are looking at some of that. There is also a little bit of the CDMI spec
Starting point is 00:42:16 that has a very good set of abstractions, but it is meant essentially for object-based backends. Still, we could beg and borrow from the existing standards elements as well to see what is the best way to start. So there is some prototyping effort going on. There are a couple of other companies doing some work; as I showed, EMC has done some work for the cloud native frameworks, Kubernetes and Mesos, using libStorage and REX-Ray. So they have done some work, and there are working examples there.
Starting point is 00:42:40 So the goal is: how do we take all these elements that are happening discretely, come together in a common way, and do this in an open source community to really address storage management in one place, as opposed to addressing it separately in OpenStack, CloudStack, Kubernetes, Mesos, Docker Swarm, you name it, plus the proprietary stacks? That's kind of the theme. So yeah, there are a few prototyping efforts that are actually in flight. We looked at the right design point for doing this
Starting point is 00:43:21 effort. There was a lot of debating behind the scenes to figure out whether we want to do something from scratch. It looks very sexy to say, write this in Go; you can attract a lot of the developer community out there and say, hey, this is the open source SDS controller we are going to write in the Go language, and you will attract a reasonable amount of the developer community to come in and pile on and do this work. But storage management is a very hard and painful problem; there is a reason why these companies haven't come together in the past to do this. So we talked quite a bit and said
Starting point is 00:43:54 the best starting point is the prior art that's out there: OpenStack Cinder and Manila have a very nice foundation, and there is a vibrant driver vendor ecosystem out there. Why don't we start from there, add on modules, and then plug in everywhere? That has been one of the options being explored. So I think in reality there are lots of moving parts. How do we take advantage of those moving parts to really do this in one place as opposed to discrete places,
Starting point is 00:44:29 and then focus on addressing the gaps? Any other questions? Good? All right. You get 15 minutes back. Is it 15 minutes or six minutes? One of those two. All right. Thank you. Thanks for listening.
Starting point is 00:44:58 If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storagedeveloper.org.
