Storage Developer Conference - #78: Managing Disk Volumes in Kubernetes

Episode Date: November 5, 2018

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast Episode 78. All right, so we're going to cover, obviously, kind of how we got here, Google and Kubernetes.
Starting point is 00:00:46 And we're specifically, obviously, this is the SDC, so we're going to be focusing on how the storage system looks and what we're doing in the storage system, including a new interface that we've just deployed and we'll be rolling out. And we've worked with probably some of you as kind of informing that interface, informing the interface spec and implementation. And then we'll kind of move over into what we can do in terms of, you know, going beyond the envelope today, right? I think kind of storage has come a long way, and Kubernetes has some core principles that we started out with. And I think those core principles have led us down
Starting point is 00:01:23 a fairly solid path that is loved by our users. And we'd like to extend those same principles into storage. And we'll need all of your help, I think, to get there. So Kubernetes, this is from 2013. Obviously the landscape's changed since then. But I always kind of look back to this quote to kind of, you know, rebase kind of the responsibility that comes by kind of being Google and then being part of that team.
Starting point is 00:01:59 And there was also, I think, a book released about kind of what would Google do. So here I'm kind'm here to tell you about what we did, why we did it, some of the past history. So I'm an engineering director in cloud storage and I've been at Google for about 12 years. I've seen it
Starting point is 00:02:17 grow from the humble beginnings around Borg. This was about 2005. And for people who already know Kubernetes, it's very similar to how Borg was architected back then. And we've kind of come a long way since then, right? We also had, you know, the GFS paper published.
Starting point is 00:02:42 In fact, back in the day, I remember spending nights and weekends trying to chase chunk servers and trying to do data recovery based on some failed servers and trying to fix bugs in some of these storage systems. I'm sure all of you have been there as you've kind of scaled a storage system from the beginnings to serving large enterprises. And I think data gives us a... It's almost a responsibility that you have to take on from a durability perspective. And unlike stateless workloads that you can just schedule and, yes, if there are failures,
Starting point is 00:03:20 you'll probably just refresh your cache and start serving again, you don't have that luxury with storage, right? As all you know, you have to build in some of the semantics to, one, keep the data safe, and also as it kind of grows, you have one small system, then you add on some other systems with different semantics and different kinds of usage. One may be columnar, one may be just block and file, one may be blob storage.
Starting point is 00:03:55 We all need to make sure that the same processes, the same guarantees, all of them work at scale. So from there, we got here, right? So we started with a few, you know, one or two storage systems, then it kind of ballooned up, and then we came to, all right, let's kind of open up all this infrastructure for everybody else. So Google went from serving one customer, Google, to serving thousands and thousands of customers with very, very different requirements.
Starting point is 00:04:31 And as a part of this kind of growth story, if you go back to the fundamentals that we saw from a Google perspective on kind of how we scale these systems, was, hey, you know, we should be able to treat these as cattle, not pets. And I'm sure all of you have heard this multiple times, right? But the cattle, not pets story, the end result is hopefully kind of this, right? That all of them are the same. It's this wonderful picture, a lot of green on it.
Starting point is 00:05:07 Everybody's eating the same thing. They're all healthy. They'll probably yield the same amount of meat, et cetera, et cetera, et cetera, right? But in our experience, kind of where this usually heads down, right, at least when you scale really quickly and move very fast through the various stages, it kind of ends up like this. Right? And it's not a fun story, but as you try to change the way storage is being accessed and storage is being managed and how customers' expectations on data durability have changed over time, right?
Starting point is 00:05:49 If 10 years ago, if somebody told you, hey, you know, I've got 11 nines of durability, maybe some of you would believe it, but not everybody, right? And here we are where 11 nines of durability is kind of average. It's the norm, right? Now, when you have to take that same level of guarantees to every single storage system that you support,
Starting point is 00:06:12 it becomes slightly hard. So we have to, I think, change the way we think about the problem a little bit so that we can kind of go back to this, right? This is the happy place. But we're not only looking at the internal consistency between storage systems, but something that has become very apparent to us is as we've
Starting point is 00:06:38 opened up the floodgates to offer this infrastructure to everybody else, we also need to work with the community to make sure that we're playing well with them and all of us are in this game together. And that's something that I think we'll cover a little more in terms of how do you not only free the infrastructure and democratize the infrastructure, but how can you be able to provide the same level of portability to data itself? Not the storage system, but data itself.
Starting point is 00:07:10 So here's where I hand off to Saad, who actually lives and breathes this on a day-to-day basis, on what went into thinking about the interface behind kind of freeing the data layer, at least the data management layer, and some of our decision and thought process. All right. Thank you very much, Nikhil. A quick
Starting point is 00:07:37 background on me. My name is Saad Ali. I'm a software engineer at Google, and I started working on the Kubernetes team pretty early on, before 1.0. I don't have a storage background. You guys are all storage experts, so I'm a little bit intimidated. But I ended up working a lot on the storage stack of Kubernetes, and the difference between what you guys do and what we do is more about we figure out how to consume storage in a
Starting point is 00:08:03 generic way, whereas you guys are focused on making that storage available and actually figuring out where those bits and bytes are going to, where the rubber meets the road, where those bits and bytes are actually going to be stored. So let me give you an overview today about what the Kubernetes storage system looks like. If you first get into this, what you're going to see is
Starting point is 00:08:25 that there's a lot of buzzwords and it's a lot to actually take in and try to understand how all of it fits together. And hopefully by the end of this presentation, you're going to have a better understanding of that. So the one principle that I want you to walk away with today is the idea that Kubernetes is all about workload portability. Let's keep that in our mind and figure out what that actually means. So for Kubernetes, the primary purpose as I see it is to act as an abstraction layer between the applications that want to deploy on some cluster, whether it's in the cloud or whether it's on-prem, and that underlying cluster, the resources's in the cloud or whether it's on-prem, and that underlying
Starting point is 00:09:05 cluster, the resources that are available from that cluster. That could be networking, that could be CPU, compute, storage, whatever it is. Kubernetes wants to be the layer between the application and that lower layer. The benefit of this is that it allows the application developers to be agnostic to the underlying storage system. The way I really like to think about this is that Kubernetes is acting a lot like a storage, a lot like an operating system. So if you think back to the days before operating systems,
Starting point is 00:09:38 if you were writing an application for some specific piece of hardware, you had to be keenly aware of the interfaces that particular piece of hardware exposed and customize your software for that piece of hardware. When operating systems became more popular, instead of worrying about what particular type of hardware your app is going to run on, you started worrying about, well, what OS am I going to write for? And in the distributed system world, we've been stuck in this kind of having to write custom distributed system applications for a
Starting point is 00:10:12 specific type of cluster, specific type of hardware for a very, very long time. And Kubernetes is finally trying to break that open and is kind of acting like an operating system. It will abstract away the resources that are available from the underlying hardware, and it'll expose a consistent interface to the application layer to be able to access those resources, do scheduling, and everything that really an operating system does. So what that means is that it doesn't matter what the underlying storage system is. You could run on Google Cloud, for example, on Amazon, or you can run on bare metal. The way that you actually deploy your application is going to be consistent because you deploy against a Kubernetes interface, and Kubernetes takes care of actually figuring out
Starting point is 00:11:02 how to deploy that application on that underlying cluster. Now, that's a very, very pretty picture that I just painted. But as with everything, storage and state makes things a lot more difficult. So in the ideal world, everything is stateless, and we don't have to worry about how to persist anything. And if everything is containerized, we can terminate an application from one machine, move it around to a different machine, get it started there. And just wherever resources are available,
Starting point is 00:11:34 we can easily move things around. The problem with that, of course, is that if you want to have a stateful application like a database or something that needs to actually persist state, how do you do that inside of a container? Your container's local file system is deleted after that container is terminated. And if you have containers that are acting as microservices,
Starting point is 00:11:59 different components that are interacting with each other, containers are essentially completely isolated. They don't really have a good way to be able to share information back and forth between them if they're actually working together. So those were some of the challenges that we wanted to solve with Kubernetes, with the Kubernetes storage and volume subsystem. So now something else to be aware of is, I think you all here are probably keenly aware of this. When I say storage, it's probably different from what you mean when you say storage and when somebody else talks about storage.
Starting point is 00:12:31 There are so many layers and so many abstractions and so many nuances to storage. At least at the layer that I work in, in cloud and distributed systems, storage can mean a number of things. It could mean the databases that run to store your state. It could be PubSub systems. It could be lower level like file or block. It could be anything. And so for Kubernetes, ideally,
Starting point is 00:12:59 we want to be able to abstract all of these things away in a manner that the consumer doesn't have to be aware of what particular type of these things they're consuming. They just say, I want block storage. Make it happen. But the reality is that we can't do all of it just yet. The reason is because the data path for a lot of these data services is not yet standardized. So what we ended up doing was focusing on the areas where the data path is standardized.
Starting point is 00:13:31 So if you look at file and block with SCSI and POSIX, the data path is standardized, and we can do a lot of cool things simply by focusing on the control path. Whereas for a lot of these other data services, think, for example, object stores, the interface, the protocol by which you actually write the bits to that service varies from one vendor to another vendor to another vendor. And so unless there is some way to be able to standardize that, Kubernetes has a difficult time trying to abstract that away. So for the sake of the Kubernetes storage and volume subsystem, we decided to focus on the left side,
Starting point is 00:14:07 which is the underlying file and block. The benefit of doing this is that ultimately the things on the right actually end up depending on the things on the left so that you can actually build those systems on top of the lower layer that we are exposing. So again, we're focusing on the left side. Data path is standardized, not yet on the right side where the data path is not yet standardized. So how do we do this?
Starting point is 00:14:32 So if you look at the Kubernetes system, we have what are called volume plugins. Volume plugins are just a way to be able to reference either a block device or a mounted file system that is accessible by any container that makes up a pod. If you're familiar with Kubernetes, the basic unit of scheduling is not a single container, rather a pod, which is a collection of containers.
Starting point is 00:14:57 So you could have containers that work together. For example, something that's pulling static content from a remote site and another container that's serving that data. So having some way to be able to share those files between those two containers would be nice. And one of the purposes that these volume plugins serve is that. So the plugins define the medium that backs a particular directory or a particular block device and defines how that actually gets set up. Let's dive into that a little bit more. But first, what are the volume plugins that we
Starting point is 00:15:31 have? I generally break this up into four categories. One is remote storage. Remote storage is fairly self-explanatory. The lifecycle of these storage systems is independent of the container or the pod. That means that you can write data to cycle of these storage systems is independent of the container or the pod. That means that you can write data to one of these storage systems, and your container can go away, and then it can come back and read and write data from that location, and it continues to exist. Ephemeral storage is a way to be able to get temporary scratch space, and we'll talk about that in a little bit. Then we have local storage. So it may be the case that you don't necessarily want to consume storage that is exposed by an
Starting point is 00:16:12 interface, either SCSI or otherwise, that's remote to your local system. You may want to just consume the local storage that's available directly on the machine that you're deployed on. There are a set of challenges in doing that, and we'll talk about that. But Kubernetes allows you to consume local storage as well. And then finally, there are out-of-tree volume plugins, in particular what we're calling the container storage interface, which is a new interface for being able to develop volume plugins like this without actually having to modify anything within Kubernetes. So if you are a storage vendor who's interested in plugging into Kubernetes, in the past you had to actually modify Kubernetes code to add your volume plugin into the system.
Starting point is 00:16:58 With CSI, that's no longer the case. You can develop your plugin independent of Kubernetes and be able to interact with Kubernetes. So let's focus in on ephemeral storage first. Ephemeral storage is fairly straightforward. Like I said, it's scratch space between the different containers that make up a pod. The reason for this is you have two containers that both want to share the same state. The basic volume plugin here is called an empty dir. It's created when the containers are started, and then it's deleted when the containers are all terminated. And any data written from one container
Starting point is 00:17:33 is visible from the other container at that directory path. Fairly straightforward. This is what it looks like in a Kubernetes definition file. You're defining a pod pod which is made up of two containers, container one and container two, and you have a volume that's called empty dir, and it's going to be mounted into both containers at this path slash shared. So if any of these containers writes to that path, it'll be visible from both containers. So that's nice. What else? On top of this, we built a few other volume plugins called Secret Config Map and Downward API.
Starting point is 00:18:13 These are actually components within the Kubernetes API. Ideally, configuration information and sensitive information like credentials or secrets, you do not want to embed those in your container image. You want those to live outside and be managed independent of the container itself. And the way that you do that, that Kubernetes allows you to do that, is you can actually define them in the Kubernetes API. Now, the easy way of accessing it
Starting point is 00:18:44 is you can actually talk directly to the Kubernetes API. So you can modify your application to make a request out to the Kubernetes API to fetch this information for your configuration or for your secrets. But that actually goes against one of the core principles of Kubernetes. One of the core principles we have is to meet the user where they are, meaning they shouldn't have to modify their application in order to work with Kubernetes. One of the core principles we have is to meet the user where they are, meaning they shouldn't have to modify their application in order to work with Kubernetes. Kubernetes should just work with the existing applications that you have. And so a lot of existing applications know how to consume both configuration information and secret information, whether it's certificates or passwords, as files. They can read a file and be able to extract that information.
Starting point is 00:19:26 So what these volume plugins do is they take this information that is in the Kubernetes API and automatically injects it as a file into the container. And that allows you to actually be able to consume this information that resides within the Kubernetes API server without having to modify your application because they can just read it and write it as a file. Next up, let's talk about remote storage. So again, the data here persists beyond the lifecycle of the container or of the pod. And there's a handful of examples.
Starting point is 00:20:06 If you're running in cloud environments, you have things like on Google Cloud, Google Cloud Persistent Disks, on Amazon, Elastic Block Storage. If you're running on-premise, this could be NFS, this could be some sort of SAN or NAS system that's exposed as Fiber Channel or iSCSI. We have a long, long list of storage systems that we support.
Starting point is 00:20:27 So what does this look like? How do you actually consume it? So when you define your workload, your pod definition, you say, okay, I have a container that I'm going to deploy. In this case, it's a busybox container that wakes up and goes to sleep for six minutes because it's a silly demo. But you also define the volumes that you want to be able to consume in this pod. And in this case, we say I want to consume a Google Cloud persistent disk called PandaDisk. And when this container starts, I want this disk to be mounted at the path /data. So this is fairly powerful. You have this nice, easy-to-write configuration file that defines how your application should be deployed.
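A sketch of that pod definition, roughly as described; the pod name is illustrative and the disk name is lowercased here since Kubernetes resource names must be lowercase:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sleepypod              # illustrative name
    spec:
      containers:
      - name: sleepycontainer
        image: busybox
        command: ["sleep", "360"]  # the silly demo: sleep for six minutes
        volumeMounts:
        - name: data
          mountPath: /data         # the disk shows up at /data in the container
      volumes:
      - name: data
        gcePersistentDisk:
          pdName: panda-disk       # references the pre-existing PandaDisk by name
          fsType: ext4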
Starting point is 00:21:09 Once you deploy this against Kubernetes, Kubernetes automatically takes care of making that storage available on the scheduled node and mounting it inside the container. And if that pod gets terminated or killed or if the node that this pod is running on dies for some reason, Kubernetes automatically reschedules this pod to a different machine and automatically makes that storage available on the new machine. So your application continues to run regardless of where it's scheduled and the storage is available regardless of where it's scheduled. That's a super powerful concept.
Starting point is 00:21:44 But there's a problem with this. And the problem is that in our pod definition, we defined the storage system by name that we wanted to use. In this case, we said, I have a pod that's going to use a Google Cloud persistent disk. Does anyone see a problem with that? So the primary challenge with this is that if you take that pod definition and you drop it onto a cluster that's not running within Google Cloud, now your application is effectively not going to run because there is no Google Cloud
Starting point is 00:22:17 persistent disk if you're running on-prem or running on Amazon, for example. So ideally, we want in Kubernetes some way to be able to decouple the storage that's going to be used to fulfill the request for this application from the actual underlying storage system so that you as an application developer can focus just on your application, not on the underlying storage system. So let's talk about how we make that happen. And again, this is all about the Kubernetes principle of workload portability. Ideally, we want to do this in a way that we don't inhibit the power of the underlying storage system. So all the work that you guys do to make storage more performant, make it more reliable, add tons of new features to it. We want to be
Starting point is 00:23:06 able to expose that through Kubernetes. But at the same time, we don't want the folks who are just writing applications to be exposed to that kind of complexity. So how do we balance this? How do we balance this? Making it very simple for the end user to consume while still enabling the full power and variety of all the different types of storage systems that are out there. So the way that we did this was by introducing a couple of objects in the Kubernetes API called the persistent volume claim and the persistent volume. These objects are designed to decouple storage implementation from storage consumption. So the way that it works is that a cluster administrator comes along. A cluster administrator is somebody who administers the storage system. They are familiar with the storage that's available on this cluster.
Starting point is 00:23:58 They can decide what storage to make available on that cluster to the users of that cluster. So what they do is create these persistent volume Kubernetes API objects, and they basically define sets of volumes that are available for use. So, for example, if you're a cluster administrator for Kubernetes, you can pre-provision different sizes of volumes and create persistent volume objects that make those volumes available for consumers to use. And as a user of Kubernetes, you don't have to worry about the underlying storage
Starting point is 00:24:37 that you're going to consume. You create a persistent volume claim that very generically describes the type of storage that you need. So in this case, you specify the capacity of the storage that you need and the access mode, meaning I want it to be read-write-many, read-write-once, just a very generic description of the type of storage that you need. So then what Kubernetes does is it takes the available storage, the PVs that were defined by the cluster administrator, and it tries to match your request
Starting point is 00:25:07 with what's available. And once it's done, you can use that storage by directly referencing the persistent volume claim object instead of directly referencing the exact type of storage that you want to use. So in your pod definition, instead of defining that you want to use GCE persistent disks, you say, I want to use this persistent volume claim called myPVC. And now if you were to take your workload and move to a different cloud environment or move to an on-premise environment, you would deploy the same pod configuration file and your same persistent volume claim object. And as long as the cluster administrator on that cluster has made PV objects available, everything just works automatically.
Starting point is 00:25:53 So on Google Cloud, this will automatically be matched to a Google Cloud persistent disk. And on premise, it may be fulfilled by an NFS share or, you know, if you're running on Amazon, EBS, or something else. So now we have effectively made your application deployment configuration files portable. So one of the issues that you probably noticed with this was the cluster administrator manually having to come in and predefine the volumes that are available for application developers to consume. That's tedious. It's difficult to predict and can result in outages where you maybe didn't provision enough and application developers want more. How do you handle that? So the way that Kubernetes tries to simplify this is through the concept of dynamic provisioning. And this is a concept that's very unique to Kubernetes.
Starting point is 00:26:51 Basically, what we do is when we know that there's a request for storage, we will automatically trigger the creation of a new volume automatically. This works very nicely in cloud environments where volumes can be provisioned dynamically, but it also works in on-premise environments depending on the storage system that you have and your idea of what provisioning means. So it could be for an NFS server automatically creating a new share and setting that aside. But it can vary from storage system to storage system, and the volume plugins get to define what it means for a particular storage system. So the way that this works is, again, another Kubernetes API object in this case called the storage class. The storage class is an object created by the cluster administrator that says, I know what underlying storage systems I have
Starting point is 00:27:45 available, and I want these storage systems to be available for use by application developers when they're trying to deploy a new workload on my system. And so they specify the provisioner that's going to be used to create the new storage, as well as a set of opaque parameters for defining what that new piece of storage is going to look like. And so I mentioned earlier that one of the challenges that we had was simplifying the consumption of storage while also enabling the power of the storage systems to shine through. All the different possible parameters that storage systems can have, things like replication, encryption, so on and so forth, we can't possibly enumerate all of those into the Kubernetes API. So instead, what we did was basically allow the volume plugins to define
Starting point is 00:28:39 an opaque set of key value parameters that will be specified by the cluster administrator and pass through to the storage system during provisioning. And so anything that your storage system allows customizing during the provisioning process, you can expose as an opaque parameter. And so in a Google Cloud environment, for example, this will include things like whether encryption is on or not, what type of disk to actually provision, whether it's an SSD or a spinning disk. But it can be whatever makes sense for your storage system. And in this way, we preserve the ability to not cripple, like not become the lowest common denominator in terms of what we expose to our end users, the full power of the storage system is still available to them. But instead of the end user who's deploying a
Starting point is 00:29:30 workload on the cluster worrying about it, it's the cluster administrator that gets to make the decision about how that happens. And again, as an end user, the only thing you would do is specify the type of storage class that you want to consume by taking a look at the underlying cluster and picking one. And even that you don't have to do. A cluster administrator can mark a storage class as default, and in that way, the end user doesn't even need to specify the storage class. It's automatically selected for them. So I think those are the fundamental components that make up the Kubernetes volume storage system, and they enable workload portability, ultimately.
Starting point is 00:30:17 So as an application developer, I don't have to worry about the underlying storage that fulfills my application workloads, but at the same time, I can get the power of whatever storage system is available to me on that cluster. All right, now let's take a step back and talk about a few other volume plugins that we have. We have host path volumes, which are fairly straightforward.
Starting point is 00:30:41 This means that we will expose a specific directory on the underlying host machine into the container. This is for test. Don't use it. Do not have your applications rely on this because things will break. If your application ends up getting moved to a different node, which happens in Kubernetes, if your node becomes unavailable for any reason,
Starting point is 00:31:04 any state that you wrote to the old machine is not going to be available on the new machine. So don't use that unless you know what you're doing. We do have a concept of local persistent volumes. Local persistent volumes is a way for us to be able to expose local volumes or the local storage that's available on individual nodes in a way that Kubernetes can understand. So I told you not to use HostPath for persisting data because if you do,
Starting point is 00:31:31 your workload can get scheduled somewhere else and that data is no longer available. With local persistent volumes, Kubernetes understands that this storage is available only on a specific machine. And once a workload starts using that particular piece of storage, Kubernetes knows that that workload must remain scheduled to that machine only and will not move it around to other machines.
Starting point is 00:31:57 The caveat here, of course, is that you have to understand that when you're using local persistent storage, you're basically reducing the reliability of your application. If that machine happens to die, the data that was on that machine and that application are now down. There is nowhere else that we can reschedule you because the data is not available to other machines. So the types of use cases that you would use this for are primarily two use cases. One is if you have a application level stateful application that handles replication at the application layer. So think of it as a database that replicates between its different shards.
Starting point is 00:32:40 And that way, if one shard goes away, it can tolerate that failure by automatically replicating to the other available nodes. So that's a good use case. And another use case that you'll see is a lot of these software-defined storage systems want to be able to consume the underlying storage that's available on the host machines and expose it as a virtual scale-out file system or block storage. And those systems can utilize the local persistent volumes to ingest the underlying local storage and then apply replication and other features on top of that and expose another storage system which end users can consume, which is more reliable and effectively remote. The last thing I wanted to focus on was how you write these volume plugins. So when we started with Kubernetes,
Starting point is 00:33:33 all the volume plugins were actually baked into the Kubernetes source code. The Kubernetes volume plugin interface was a Go language interface, and writing a new volume plugin meant that you had to actually check code into Kubernetes. This allowed us to move very quickly in the early days of Kubernetes because we didn't have to fixate on a interface. Instead, what we could do was continuously revise that
Starting point is 00:33:58 interface, and since all the volume plugins were also part of the same Kubernetes package, we could just update the consumers in every version of Kubernetes we shipped. Even if we modified the interface, we shipped with a set of volume plugins that would work with Kubernetes. Of course, this becomes difficult to manage over the longer term. Some of the challenges are, for us as Kubernetes developers, accepting third-party code that we can't test. So for example, the end-to-end tests that we do for Kubernetes don't exercise the Fiber Channel volume plugin. How do we know that it continues to work
Starting point is 00:34:38 with every single release of Kubernetes? Another challenge is just being required to give permissions to these volume plugins, that Kubernetes-level permissions, essentially. So all the permissions that we grant to any one of our components, these volume plugins automatically inherit, which is a security issue. And of course, if any one of these volume plugins has a bug, it can potentially crash the entire Kubernetes binary because it's running as part of that binary. So ideally, we want to remove these volume plugins from the core of Kubernetes. And as a storage vendor, you don't want to be aligned with the Kubernetes release schedule. It's a massive, massive project, and if all you're interested in is your little storage system, you don't want to have to deal with figuring out what the Kubernetes release cycle is, what the
Starting point is 00:35:29 patching system looks like, all of these things. And possibly you don't even want to open source your code, which is a requirement for Kubernetes. So how do we get around all of these issues? Actually, I'm a slide behind. So the previous slide was more about Kubernetes' storage system is awesome, but extending it is painful. So how do we make it less painful? The focus now is on an interface called the container storage interface.
Starting point is 00:35:59 I'm not going to deep dive into this because we don't have enough time, but think of it as taking that internal Kubernetes interface that we defined as a Go language interface, polishing it, flushing it out, and introducing it as an external gRPC interface. So now in order to integrate with Kubernetes, you write a binary that will implement a gRPC interface, and Kubernetes can talk to that interface over a Unix domain socket instead of it being compiled into the core of Kubernetes. So we started this project a couple years ago, and we're targeting the GA or 1.0 launch
Starting point is 00:36:42 of this project in the next quarter. So far, we've got about 12 volume plugins that are officially advertising implementing CSI. And if you are interested in figuring out how to integrate your storage system with Kubernetes, please take a look at CSI and reach out to me or anyone in the community to understand how to get started there. There was a legacy attempt at doing this within Kubernetes called Flex Volumes that we started
Starting point is 00:37:10 before CSI. And the difference between CSI and Flex is that Flex was a exec-based model. What this meant was that whenever we decided that we needed to do a volume operation like doing mounting or attaching, instead of looking within the Kubernetes binary, we would look for a file, an executable or a script on the host machine to try to fulfill that request. So that means that as a volume vendor, you can basically write your volume plug-in as an executable and deploy it on the machines.
Starting point is 00:37:44 And that's the way that Flex worked. But the challenge with that was, of course, deployment. You can write a file that you deploy to these systems, but what happens if that file gets deleted? What happens if a new node is created? How do you manage the deployment of that file automatically? These were the challenges that Kubernetes itself set out to solve. How can we depend on a storage system and make them solve this all over again? And so with the container storage interface, the way that you deploy a volume plugin is actually just another workload on top of Kubernetes. It's deployed as a pod that consumes a set of local storage, essentially,
Starting point is 00:38:26 and it does the mounting and everything inside the container. And so Kubernetes takes care of ensuring that the pieces that are required for your volume plugin remain running. And so if a new node comes in, Kubernetes will automatically deploy those bits there. If something happens to an existing node, Kubernetes can automatically recover on your behalf. So our recommendation is to look at container storage interface if you are interested in interfacing with Kubernetes. And that's all I have today. I'm going to hand it back to Nikhil for talking about what's going to happen in the future. Thanks, Nikhil.
Starting point is 00:39:10 All right. Still working. Thank you, Saad. That was a pretty deep dive into what we're trying to do. But we're not done yet. There's so much more left to be done here. And I think we'd like to make sure that we're communicating this to all of you and kind of working on this problem together. I think this is a pretty hard engineering problem.
Starting point is 00:39:35 So Saad spoke a lot about portability, workload portability. But I don't think we've solved the kind of stateful container portability completely. We're still looking at all these use cases today that have to solve the compute problem in a certain way and then solve the storage problem in a very different way. And these are real. I mean, there are customers that are waiting for a solution. So today, they're trying to say, okay, you know what? I'm going to solve a part of my problem, but
Starting point is 00:40:11 then we'll solve the storage problem when we get to it. But then it doesn't really realize the goals that we have, and then we have to put in a lot of hacks. I mean, we spoke a little bit about how local persistent volumes are kind of, they're half big right now. We still need to do better in order to kind of get them out there. And if you look at each of these use cases,
Starting point is 00:40:35 there are also very different types of storage devices as well. Some are file, some are block, right? And then how do you do things like, okay, I want to just take a snapshot of whatever I have and move it into a different environment. That's portability for me, right? I can do that with an application, but I can't move it, like, as a whole component, right? I just want my entire workload to move. And if you think about how Google has scaled over time, one of the key secrets
Starting point is 00:41:10 that we have, it's fairly a public secret, but the growth of administrators of our systems is fairly flat. Whereas the users of our systems is, I mean, it's an exponential growth, right? The number of engineers we hire versus the number of people we hire, the number of engineers we hire to develop applications on top versus the number of people we hire, say, SREs, that we
Starting point is 00:41:37 get to maintain the underlying storage infrastructure is very, very different. Now, we want to get that same concept of keeping it simple, keeping it abstracted away, keeping it like a simple set of cattle, right, that all of them look the same,
Starting point is 00:41:57 kind of pointing back to Saad's presentation on, if I could say, you know what, from a snapshot perspective, the user doesn't even know, right? All the user has to say is, hey, you know what, I just want something that I can run my CICD on. I don't have to go look at, oh, is this version X, Y, Z? Is it scrubbed of all my information? Is all the private stuff moved out?
Starting point is 00:42:21 Is no PII data in it? Where is it located? Is it in this cluster, this zone? All of that needs to go away. It needs to be magical. And that's where we're trying to get to with snapshot portability. And again, we'll do this as part of the usual Kubernetes review process, work with all of you on that problem. The other issue today is I think we've got some of the
Starting point is 00:42:51 things nailed down in terms of durability from a single pane of glass perspective. But we're not quite there yet. Again, this is a case of, OK, I can see all my compute workloads, but what's going on with my storage systems? Oh, wait, let me look at this Gluster dashboard here. Let me look at my NetApp dashboard here. Oh, but somebody is also using S3 on the side,
Starting point is 00:43:18 so let me go take a look at that. So if you need to be able to scale your organization's needs without throwing more people at the problem and be able to quickly address all of their needs and help and drive innovation, not only inside your company but across the community, we need to have a common standard by which all of the information in a Kubernetes storage cluster can be unified behind a single
Starting point is 00:43:47 pin bus. That's where we're also investing into. You'll see some proposals for potentially some common standards there and be able to reason about some of these metrics. I covered this a few times and and I'd like to kind of drill this in as well. We're at an interesting point where we're developing multiple interfaces. I've seen this movie play out at Google a few times again. There's a risk that we will diverge on potential solutions for managing the data across block, file, blob, even SQL databases. One of the statistics that I was a little surprised by is,
Starting point is 00:44:32 I was in this kind of Gartner review where they were talking about enterprise growth, SMB growth, and they actually see SMB growth flatlining over the next four to five years. And that's concerning, and I was like, okay, why is that? It turns out that moving their workloads into a cloud-native infrastructure is too complex.
Starting point is 00:44:58 They need too many people. The savings that you get from being able to, you know, use some of these cloud-native infrastructure or even, you know, migrating to something that's dynamically provisioned, you don't have any kind of fixed cost, and it's very usage-based, all of that is great, but then if you can't manage it at scale, then it falls apart, right? So I think it's still very, very useful that, you know,
Starting point is 00:45:26 and we'll have more of these. There'll be even more things that'll come up, even more different kinds of storage, even if you look at kind of just NoSQL databases themselves. There's so many today. But there's no common way that you can interact with them, and that's what we're trying to bring here with Kubernetes. And we want to be involved in the community,
Starting point is 00:45:45 be able to drive towards some common standards with no vendor lock-in at all, right? And be able to consciously empower the storage administrator so that he can empower, or he or she can empower, the actual users, right? actual users to drive that innovation. I think that's the last bit I had. And this is a quote I always look to, is be able to get to a point where,
Starting point is 00:46:24 it's just like a bottle of water, right? There's a lot that goes into getting that bottle of water on your desk. There's sanitation, there's planning, there are a bunch of dams everywhere. So being able to abstract all the complexity out so that your end users can innovate on top. By the way, for people who are wondering what those clouds in the background are, those are clouds generated by Google's data centers.
Starting point is 00:46:56 Those are the chillers that are cooling and they throw some mist out. So that's Google Cloud's data centers generating clouds. But yeah, I think going back to being able to abstract all the complexity out and simplify management of scale.
Starting point is 00:47:14 And that's been the secret to Google's success at building the largest systems, planet scale systems across compute, storage, networking. And we'd like to bring that same principles, same standards, same thought process, same level of engagement to the rest of the community. All right.
Starting point is 00:47:37 Thank you. Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at sneha.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the
