Storage Developer Conference - #83: OpenStack Cinder as an SDS API

Episode Date: December 17, 2018

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast Episode 83.
Starting point is 00:00:40 My name is Sean McGinnis. I work for Huawei, but I've been involved for several years now with OpenStack, and I focus on upstream open source development. I'll be talking a little bit today about different ways that you can use Cinder that some people may not be aware of. So first, the background on Cinder. Cinder is the component of an OpenStack deployment that manages block storage. So in an OpenStack cloud, if you request a volume or block storage to be available for your instance or your containers or your bare metal, that goes through the Cinder service, which manages that and makes it available.
Starting point is 00:01:28 So it's just a management API, and it's an abstraction layer over different types of storage. It is not the actual data provider, the storage provider. It's not in the IO path in any way, and it's not meant to be something that can manage all your different types of storage tasks. It really has the purpose of just provisioning your storage. So that's maybe a point of clarification on the SDS part of the title. There are a lot of people who have different ideas of what it means to be software-defined storage. So just to make that clear, at least by my definition, something like Ceph, where you can take servers,
Starting point is 00:02:25 put a bunch of drives in them, and install software to make that into a storage device, is what I consider software-based storage. The definition I'm using here for software-defined storage is something that abstracts away the details of the storage device underneath and provides an API that you can use to programmatically manage your storage. So I'm talking in this presentation about how you can use Cinder as that abstraction
Starting point is 00:03:00 API to be able to say, I need ten 50-gig volumes, and Cinder can take those requests, and no matter what type of storage you use underneath, it's able to provide those volumes. To give a little background about how Cinder manages the storage, or what its model of a storage environment looks like: Cinder can manage one or multiple storage backends. This could be an enterprise SAN, or it could be a Linux box that's running LVM and exposing LVM volumes through iSCSI. There are really a lot of different options for what this backend is.
Starting point is 00:03:49 Depending on the type of storage, there can also be multiple pools within that storage backend. So when someone uses Cinder, it all starts with a volume type. A volume type is something that contains all of the attributes of the type of storage that you need. And in a cloud environment, you don't necessarily want the users of the cloud to know anything about your actual infrastructure underneath. So as a regular user, you just have a name, really. And that's what a volume type looks like to you. An administrator of Cinder, or of a full OpenStack cloud,
Starting point is 00:04:34 is the one who sets the internal details. They're called extra specs, inside a volume type, and that's what gets passed into Cinder. Cinder is then able to look at those extra specs and, based on the attributes of the volume being requested, figure out where to put that volume when there are multiple different storage backend types.
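(As a rough illustration of that split, here is a minimal Python sketch against the Cinder v3 REST API: an admin-side call creating a type whose extra specs point at a backend, and a user-side call that only mentions the type name. The endpoint, project ID, token, backend name, and type name are all placeholders, not values from the talk.)

```python
# Minimal sketch: volume types and extra specs via the Cinder v3 REST API.
# Assumes a Keystone token and Cinder endpoint; all names and IDs are placeholders.
import requests

CINDER = "http://controller:8776/v3/<project_id>"        # hypothetical endpoint
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

# Admin side: create a type whose extra specs tell the scheduler which backend to use.
vtype = requests.post(
    f"{CINDER}/types",
    headers=HEADERS,
    json={"volume_type": {
        "name": "gold",
        "extra_specs": {"volume_backend_name": "enterprise-san"},
    }},
).json()["volume_type"]

# User side: all a regular user ever sees is the type name.
vol = requests.post(
    f"{CINDER}/volumes",
    headers=HEADERS,
    json={"volume": {"name": "demo-vol", "size": 50, "volume_type": "gold"}},
).json()["volume"]
print(vol["id"], vol["status"])   # typically 'creating' until the backend finishes
```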
Starting point is 00:04:59 So you create a volume with a volume type. That will pick a storage backend and create a volume within it. Optionally, there can be a Glance image. Glance is kind of considered an image registry, so there can be different types of images available, and you can say, I need a boot volume, and have that OS made available right away on the volume that you create through Cinder. Multiple snapshots can be taken,
Starting point is 00:05:28 and volumes can be created off of those snapshots. And then, of course, volumes can be made available to hosts and removed and other things as well. So in a typical OpenStack cloud, Cinder, if you see in the bottom here, is just one box, and there are multiple different services that you put together to be able to actually create a full private or public cloud using OpenStack, to be able to do virtual machine management, bare metal management, configuring networks, things like that. But there's been a lot of work done in Cinder over the last couple of years to be able to just run Cinder
Starting point is 00:06:12 by itself, or pull out the pieces that you would need that aren't necessarily part of a full cloud, or what you'd consider a cloud. So a common thing I see is just running the Keystone service, which takes care of authentication, and the Cinder service. Or there is the possibility of just running Cinder by itself, with a configuration setting where you set the authentication strategy to noauth. And of course, noauth means that there is no authentication. So this works great in some environments, but you obviously want to do that in a controlled environment, because you're then just exposing an API, and anyone who knows about that API can access it and pretty much do all kinds of things. Cinder can be deployed on actual bare metal hosts.
Starting point is 00:07:07 It can be deployed in virtual machines. We have Dockerfiles to be able to create containers. There's also a project in OpenStack called Kolla whose whole purpose is taking these different OpenStack services and making containers out of them. So if you have a container environment, it makes it really easy; you can just deploy containers and have this service available there. Of course, you can manually install
Starting point is 00:07:32 system packages and get things set up. I wouldn't really recommend that; it's not the most fun way to deploy these services. And then, of course, there are all kinds of deployment tools that have done a lot of work to make it easy, with Ansible, Puppet, things like that. If you're interested in any more about all the services, how they're deployed, things like that,
Starting point is 00:07:58 there are a lot of good links here. This presentation will be available, so you don't need to try to write all this down, but there are just a couple I want to point out. The YouTube link there is a nice demo of a standalone Cinder deployment being quickly deployed in containers and run. We have a project,
Starting point is 00:08:19 or something within the Cinder source, called Blockbox, which has the Dockerfiles if you want to create your own Docker containers for the services. And there are some blog posts that go into some nice detail about capabilities as well. So that's the kind of internal, abstract model of how things are done within Cinder. As a user of Cinder, you have a few options.
Starting point is 00:08:43 There is a project called Horizon, which provides a web UI where you can just point and click and do whatever you need to do. Also, a lot of people just use the command line tools. You can easily script these, and I'll go into that a little bit later, but that gives you access to all kinds of details. And then there's the REST API; everything that you could possibly do with the service is exposed through a REST endpoint.
Starting point is 00:09:13 And under developer.openstack.org, there are all the details about what those API endpoints are, what you can expect to get back, what you need to pass in, all the nitty-gritty details. So, in addition to the REST API, there are also SDKs; I just want to point that out. So if you're looking at writing some type of system
Starting point is 00:09:44 and you need to have some storage management, if you're not using Python, which is what OpenStack is written in, or you don't want to pipe out to shell scripts, there are different SDKs, like Gophercloud if you're writing in Go. Most of the popular languages have some form of OpenStack SDK that you could use.
Starting point is 00:10:07 The only warning I would give there is some of them are pretty out of date. I know just out of curiosity, I went looking, and there is a C++ SDK if you really wanted to write Cinder or OpenStack management using those APIs, but it doesn't look like that one's been touched for quite a while, not really too surprisingly. But if you want it, it's out there,
Starting point is 00:10:31 and I think it did work at one point. It might need a little update at this point. So, to go through some scenarios showing other ways that Cinder can be used outside of OpenStack: say we have some production data, and we can't touch it directly. We want to access this data to do some tests or run some analysis against it. So we're going to take the volume that holds that data, take a snapshot, and from that create a new volume that we can then manipulate,
Starting point is 00:11:07 and we don't have to worry about affecting our production data. So in this example, I'm going to show Cinder. I'm also going to show Keystone, though it's not needed if you use no-auth. In this case, it's going to be a Windows SQL Server host, and we're going to use PowerShell to get that SQL data, mount it up, and be able to do something with it elsewhere. And this is going to use the REST API directly. So you probably can't even see that first link there, but if you're
Starting point is 00:11:39 interested in more of this, I have the full script. I'm going to show little bits and pieces of this so I don't completely bore you to death. But if it is interesting, something that you might want to do, there's a link there in the presentation. You'll be able to download that and use that as a starting point. If you haven't used PowerShell and you're in a
Starting point is 00:11:57 Windows environment, I'd recommend learning a little bit about it. It's a great command line interface. There are all these cmdlets, as they're called, that let you easily do a lot of things. So in this example, I'm going to show doing the REST API calls
Starting point is 00:12:17 just using regular command line calls. So something like Invoke-RestMethod is very similar to doing curl, if you're used to a bash or Linux/Unix type of environment. This is a big chunk of code, quite a bit here, but if we break it down it's really not as scary as it initially looks. I'm just using Invoke-WebRequest, which is like curl, and I'm posting some data to this authentication endpoint, which is Keystone. And it's a big chunk of JSON data.
Starting point is 00:13:02 All I'm doing is saying I'm passing in my identity. I'm going to use password authentication, and I pass in my username and my password. And that is how I get authenticated within an OpenStack cloud when using Keystone. That returns, in a header, something called X-Subject-Token, and all I need to do is extract that and pass it in as my authentication token for any further calls. So once I have that,
Starting point is 00:13:32 I'm able to get a list of all volumes. I just hit the REST endpoint for volumes, including those headers. And this Invoke-RestMethod makes it really easy. PowerShell is a command line shell, but it's very object-oriented, so it makes it really simple to access things without having to parse a bunch of strings to figure things out.
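(If PowerShell isn't your thing, the same two calls look roughly like this in Python with the requests library. This is a sketch under assumed endpoints and credentials; the token does come back in the X-Subject-Token header, and with a no-auth Cinder deployment the Keystone step can be skipped.)

```python
# Sketch of the same two REST calls in Python: authenticate against Keystone,
# then list volumes from Cinder. Endpoints and credentials are placeholders.
import requests

KEYSTONE = "http://controller:5000/v3"                 # hypothetical Keystone endpoint
CINDER = "http://controller:8776/v3/<project_id>"      # hypothetical Cinder endpoint

auth_body = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {"user": {
                "name": "demo",
                "domain": {"id": "default"},
                "password": "secret",
            }},
        },
        # Scope to a project so the token can be used against Cinder.
        "scope": {"project": {"name": "demo", "domain": {"id": "default"}}},
    }
}

resp = requests.post(f"{KEYSTONE}/auth/tokens", json=auth_body)
token = resp.headers["X-Subject-Token"]   # the token comes back in a response header

# Any further call just carries that token in X-Auth-Token.
# (With a no-auth Cinder deployment, the Keystone step can be skipped.)
volumes = requests.get(f"{CINDER}/volumes", headers={"X-Auth-Token": token}).json()["volumes"]
for v in volumes:
    print(v["id"], v["name"])
```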
Starting point is 00:13:59 So I get back a list of all the volumes, and I can just loop through that and look for the volume that's of interest in this case. And once I find that volume, I can then post to the endpoint for snapshots. Since this is a POST, this creates a new snapshot. I'm telling it to call it demo snap, I can give it a description, and here is where I pass the volume that I found when I looped through all my volumes. I just tell it which volume I need to take a snapshot of,
Starting point is 00:14:35 and it's really super simple. I look at the result, and I get the snapshot back. Then once I have that, I'm able to create a volume, telling it to create that volume from the snapshot. And this next part is a whole bunch of stuff I won't go into in detail, but it's just to show that PowerShell has a lot of really nice built-in cmdlets for things like getting iSCSI targets and connecting to the iSCSI target. Out of that, I can get my disks and find the disks. So this is the glue that ties that together.
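(The snapshot-and-clone step maps onto two more POSTs. Here is a hedged Python sketch of just that part, reusing the placeholder endpoint and token from the previous sketch; the source volume ID is whatever you found in the volume list. The host-side iSCSI attach stays platform-specific glue, PowerShell in this example.)

```python
# Sketch: take a snapshot of an existing volume, then create a new volume from it.
# The endpoint, token, and source volume ID are placeholders.
import requests

CINDER = "http://controller:8776/v3/<project_id>"
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}
source_volume_id = "<the volume id found by looping over the volume list>"

# Take a point-in-time snapshot of the production volume.
snap = requests.post(
    f"{CINDER}/snapshots",
    headers=HEADERS,
    json={"snapshot": {
        "volume_id": source_volume_id,
        "name": "demo-snap",
        "description": "point-in-time copy for testing",
    }},
).json()["snapshot"]

# A real script would wait here for the snapshot to reach 'available'.

# Create a new, writable volume from that snapshot; production stays untouched.
clone = requests.post(
    f"{CINDER}/volumes",
    headers=HEADERS,
    json={"volume": {
        "name": "demo-copy",
        "size": snap["size"],
        "snapshot_id": snap["id"],
    }},
).json()["volume"]
print("new volume:", clone["id"])
```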
Starting point is 00:15:10 So once I've created my volume and attached it to my host, I can log in and get it, and then down here at the bottom, I'm able to run SQL commands and attach that data file, on the copy of the volume, to a SQL instance. That way I can do whatever I need to do with that data and not have to worry about actually affecting
Starting point is 00:15:33 any of my production users. So if you're interested in anything in there, there's some nice documentation on PowerShell and then the REST API. So if you need to do anything different, anything special, you can look through the REST API and there's nice descriptions of what everything is
Starting point is 00:15:53 and how you could add those. So another scenario. Say you're using system configuration management and you need to deploy your servers in your data center, and you just need to make sure that all of them have the same kind of configuration setup. So in this example, using something like Ansible, as part of my Ansible playbook,
Starting point is 00:16:20 I can include that volume setup as part of my configuration and just push that out and have things done. So here I'm using Cinder with Keystone again. This should work on anything that Ansible works on. If you haven't looked at Ansible, it's really nice. Everything is just declarative where you have tasks and those tasks say various things.
Starting point is 00:16:50 In this case, I'm making sure that volumes are created for each of my hosts and there's a set of OpenStack tasks or I'm blanking now, I think they're called modules within Ansible
Starting point is 00:17:05 for interacting with OpenStack clouds, where all I have to do is use os_volume and declare that there should be a volume that is present. You declare a state: it could be absent, making sure it's not there; I'm saying present, and it needs to be 100 gig, and I'm giving it a name.
Starting point is 00:17:30 And when you run this task, Ansible is smart enough to fill in these variables and connect to the OpenStack deployment, configure the volume, and do everything it needs to do. So in the last example, I showed that big JSON blob that we're sending in in the beginning to get authenticated. I just want to point out here that all I'm saying is just Cloud Test Lab. And this is a benefit in Ansible that we get through the OpenStack SDK
Starting point is 00:17:59 that uses a nice mechanism. It's just a file called clouds.yaml. It's a YAML file that provides a list of clouds. So if you interact with a lot of different OpenStack deployments or a lot of standalone Cinder deployments, you can just give them names. And in this YAML file, you give all those details that we had to pass in manually in that last example.
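(To give a feel for how little the calling code needs once clouds.yaml holds the details, here is a hedged sketch using the Python openstacksdk, which is the same SDK the Ansible os_ modules build on. The cloud name, file contents, and volume values are assumptions for illustration, and the method names are as in reasonably recent openstacksdk releases.)

```python
# clouds.yaml (usually ~/.config/openstack/clouds.yaml) might contain, roughly:
#
#   clouds:
#     test-lab:
#       auth:
#         auth_url: http://controller:5000/v3
#         username: demo
#         password: secret
#         project_name: demo
#         user_domain_name: Default
#         project_domain_name: Default
#
# With that in place, code (and the Ansible os_ modules) only needs the name.
import openstack

conn = openstack.connect(cloud="test-lab")

# Ensure a data volume exists, roughly what an os_volume task is declaring.
name = "host01-data"
existing = [v for v in conn.block_storage.volumes() if v.name == name]
vol = existing[0] if existing else conn.block_storage.create_volume(name=name, size=100)
print(vol.id, vol.status)
```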
Starting point is 00:18:28 And just by looking at the name in this simple text file, it's able to pull out that information and do things. So it really simplifies using that. You can use that SDK directly, too, and it's the same thing: openstack.connect, and all I have to do is give it the name. So back to the main point. I'm able to create that volume, and then Ansible, of course, has all those other glue pieces that tie things together,
Starting point is 00:18:54 making sure a partition is on the volume, making sure that partition is formatted ext4, and making sure it gets mounted up somewhere we expect it to be on each host. And this is the way that you can really make sure that you have a consistent configuration deployed out to all your hosts. So there are all these Ansible modules, a whole set of them that start with os_, that are applicable to OpenStack, and they have nice references under docs.ansible.com. If you check out Ansible, www.ansible.com,
Starting point is 00:19:32 it has a lot of capabilities, some really cool things. This makes it really easy to tie some of those storage pieces into all the other different types of things that you can do with Ansible. And if you're interested in that clouds.yaml, there are more details there, and more details on how you can use that OpenStack SDK directly, at docs.openstack.org/openstacksdk. Okay, one more scenario. You have some type of job that you need to run over the weekend.
Starting point is 00:20:04 It's going to take a really long time. It maybe generates a lot of log files, and you will want to make sure that things don't run out of space before Monday and have to start all over again. So really not a best practice, but sometimes you've got to do what you've got to do. We'll have a script that checks every once in a while, make sure that our volume has enough space, and if we're running low on space, go out and expand that volume and make sure that we're not
Starting point is 00:20:32 going to run out of space and have to start everything again on Monday. Keystone again, and this can really be run anywhere, bare metal, VM, whatever, and we're just scripting the CLI in this case. It's really just this script. There are several things here, so I'll break it down so it's not quite such an overwhelming wall of text. All I'm doing is running df on my host to pull out the line for my volume
Starting point is 00:21:05 and see how much space is available. These lines just pull the individual values out of that line so that I can see how large my volume is and what percentage of it is being used. And then all I have is an if statement here: if the used space on that volume is greater than 90%, I just call openstack volume extend, take whatever my size is,
Starting point is 00:21:35 and add 10 additional gigs on top of that. Then I run whatever local commands are needed to actually consume that additional space that was made available. It's just a bash script. You run it as a cron job, or schedule it somehow, just to check periodically, and it can go out and manipulate your storage without you having to actually log into whatever storage management console and take care of things yourself.
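(The same watchdog idea, sketched in Python rather than bash. The mount point, cloud entry, volume name, threshold, and the 10 GiB increment are all assumptions, and it leans on openstacksdk's block storage calls instead of shelling out to the CLI; growing the filesystem afterwards is still a local, host-specific step.)

```python
# Sketch: watch local usage and grow the backing Cinder volume past a threshold.
# Mount point, cloud entry, volume name, and the 10 GiB bump are assumptions.
import shutil
import openstack

MOUNT = "/mnt/jobdata"
VOLUME_NAME = "weekend-job-data"

usage = shutil.disk_usage(MOUNT)
pct_used = usage.used / usage.total * 100

if pct_used > 90:
    conn = openstack.connect(cloud="test-lab")
    vol = next(v for v in conn.block_storage.volumes() if v.name == VOLUME_NAME)
    new_size = vol.size + 10
    # Ask Cinder for the extra space; the disk rescan and filesystem grow
    # still have to happen locally on the host afterwards.
    conn.block_storage.extend_volume(vol, new_size)
    print(f"requested extend of {VOLUME_NAME} to {new_size} GiB")
```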
Starting point is 00:22:03 If you're interested in that, docs.openstack.org has the documentation for the Python OpenStack client, and that has all kinds of capabilities, both for Cinder and for the majority of the components that are available in an OpenStack cloud. Anything for Cinder will be under openstack volume, with a subset of commands there. So this is all just trying to show different ways that you can use Cinder, in contexts where you might not think of it, outside of OpenStack. It can be used in a lot of different scenarios. I'd love to hear any other scenarios from people
Starting point is 00:22:47 about how you're using that. So feel free to reach out to me and let me know, and I'd love to share that information with other people and see other ways that we can make this useful. OpenStack or Cinder really is built for an OpenStack cloud, so it abstracts a lot of those details away. If you're interested in a broader scope of storage management and still have some of these capabilities,
Starting point is 00:23:16 I'd recommend taking a look at OpenSDS. That's adding capabilities where you can use it with an OpenStack cloud, you can use it for container storage, and there are a lot of integration points there as well. And I'd love to hear your thoughts on that, too, if you want to contact me. There's a meetup in this area coming up, hopefully in the beginning of November,
Starting point is 00:23:40 at IBM for OpenSDS. So that's definitely another one to take a look at, if this looks useful for you. And then, kind of a preview, it's a little early to talk about it, but we're also adding what's at least currently called cinderlib
Starting point is 00:24:00 into the Cinder source code. In particular, some people from Red Hat have had a strong need for this: they'd like to use just the drivers out of Cinder and get rid of all the scheduling capabilities and things like that. So cinderlib will soon be another option for scripting your storage management,
Starting point is 00:24:23 where you can, within Python code, at least so far since that's what everything is written in, really easily just load up an individual driver and have that same kind of abstracted, common way of working, where no matter what type of storage you're interacting with, you can use that driver as a bridge of sorts to do some of these storage management commands.
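(As a very rough preview, and with the caveat that the library was still settling when this talk was given, loading a single driver with cinderlib looks something like the sketch below. The LVM driver settings are illustrative assumptions that mirror what that driver would normally be given in cinder.conf.)

```python
# Rough sketch of the cinderlib idea: load one Cinder driver directly and
# manage volumes through it, with no API service or scheduler in the way.
# The LVM options below are illustrative; exact keys depend on driver and release.
import cinderlib

lvm = cinderlib.Backend(
    volume_driver="cinder.volume.drivers.lvm.LVMVolumeDriver",
    volume_group="cinder-volumes",
    volume_backend_name="lvm",
)

vol = lvm.create_volume(size=1)   # 1 GiB test volume
print(vol.id, vol.status)
vol.delete()
```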
Starting point is 00:24:58 So, a lot of options there. Hopefully I've given you some ideas of ways that you could use this in your own environments. And yeah, like I said, if you are using it way outside of an OpenStack cloud, I'd love to hear about it, and I'd love to share that. Thanks. Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe@snia.org. Thank you.
