Storage Developer Conference - #78: Managing Disk Volumes in Kubernetes
Episode Date: November 5, 2018...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference.
The link to the slides is available in the show notes
at snia.org slash podcasts.
You are listening to SDC Podcast Episode 78.
All right, so we're going to cover, obviously,
kind of how we got here, Google and Kubernetes.
And we're specifically, obviously, this is the SDC, so we're going to be focusing on how the storage system looks and what we're doing in the storage system, including a new interface that we've just deployed and we'll be rolling out.
And we've worked with probably some of you as kind of informing that interface, informing the interface spec and implementation.
And then we'll kind of move over into what we can do in terms of, you know,
going beyond the envelope today, right?
I think kind of storage has come a long way,
and Kubernetes has some core principles
that we started out with.
And I think those core principles have led us down
a fairly solid path that is loved by our users.
And we'd like to extend those same principles into storage.
And we'll need all of your help, I think, to get there.
So Kubernetes, this is from
2013. Obviously the landscape's changed since then.
But I always kind of look back to this quote to kind of, you know,
rebase kind of the responsibility that comes by kind of being Google
and then being part of that team.
And there was also, I think, a book released about kind of what would Google do.
So I'm here to tell you about
what we did, why we did it,
some of the past history.
So I'm an engineering
director in cloud storage
and I've been at Google for about
12 years. I've seen it
grow from
the humble beginnings
around Borg. This was
about 2005.
And for people who already know Kubernetes,
it's very similar to how Borg was architected back then.
And we've kind of come a long way since then, right?
We also had, you know, the GFS paper published.
In fact, back in the day, I remember spending nights and weekends trying to chase
chunk servers and trying to do data recovery based on some failed servers and trying to
fix bugs in some of these storage systems. I'm sure all of you have been there as you've
kind of scaled a storage system from the beginnings to serving large enterprises. And I think data gives us a...
It's almost a responsibility that you have to take on
from a durability perspective.
And unlike stateless workloads that you can just schedule
and, yes, if there are failures,
you'll probably just refresh your cache and start serving again,
you don't have that luxury with storage, right? As you all know,
you have to build in some of the semantics
to, one, keep the data safe, and also
as it kind of grows,
you have one small system, then you add on some other systems with different
semantics and different kinds of usage.
One may be columnar, one may be just block and file, one may be blob storage.
We all need to make sure that the same processes, the same guarantees, all of them work at scale.
So from there, we got here, right?
So we started with a few, you know, one or two storage systems,
then it kind of ballooned up, and then we came to,
all right, let's kind of open up all this infrastructure for everybody else.
So Google went from serving one customer, Google,
to serving thousands and thousands of customers
with very, very different requirements.
And as a part of this kind of growth story,
if you go back to the fundamentals that we saw
from a Google perspective on kind of how we scale these systems,
it was, hey, you know, we should be able to treat these as cattle, not pets.
And I'm sure all of you have heard this multiple times, right?
But the cattle, not pets story, the end result is hopefully kind of this, right?
That all of them are the same.
It's this wonderful picture, a lot of green on it.
Everybody's eating the same thing.
They're all healthy. They'll probably yield the same amount of meat, et cetera, et cetera, et cetera, right? But in our experience, kind of where this usually heads down, right, at least when you
scale really quickly and move very fast
through the various stages, it kind of ends up like this.
Right? And it's not a fun story, but
as you try to change the way storage is being accessed
and storage is being managed and how customers' expectations on data durability
have changed over time, right?
If 10 years ago, if somebody told you,
hey, you know, I've got 11 nines of durability,
maybe some of you would believe it, but not everybody, right?
And here we are where 11 nines of durability
is kind of average.
It's the norm, right?
Now, when you have to take that same level of guarantees
to every single storage system that you support,
it becomes slightly hard.
So we have to, I think,
change the way we think about the problem a little bit
so that we can kind of go back to this, right?
This is the happy place.
But we're not only looking at the internal
consistency between storage systems, but something
that has become very apparent to us is as we've
opened up the floodgates to offer this infrastructure
to everybody else, we also need to
work with the community
to make sure that we're playing well with them and all of us are in this game together.
And that's something that I think we'll cover a little more in terms of how do you
not only free and democratize the infrastructure, but also be able
to provide the same level of portability to data itself?
Not the storage system, but data itself.
So here's where I hand off to Saad, who actually lives and breathes this on a day-to-day basis,
on what went into thinking about the interface behind kind of freeing the data layer,
at least the data management
layer, and some of
our decision and thought process.
All right.
Thank you very much, Nikhil.
A quick
background on me. My name is Saad Ali.
I'm a software engineer at
Google, and I started working on the Kubernetes
team pretty early on, before
1.0.
I don't have a storage background. You guys are all storage experts, so I'm a little bit
intimidated. But I ended up working a lot on the storage stack of Kubernetes, and the difference
between what you guys do and what we do is more about we figure out how to consume storage in a
generic way, whereas you guys are focused on making that storage available
and actually figuring out where those bits and bytes are going to,
where the rubber meets the road,
where those bits and bytes are actually going to be stored.
So let me give you an overview today
about what the Kubernetes storage system looks like.
If you first get into this,
what you're going to see is
that there's a lot of buzzwords and it's a lot to actually take in and try to understand how all of
it fits together. And hopefully by the end of this presentation, you're going to have a better
understanding of that. So the one principle that I want you to walk away with today is the idea
that Kubernetes is all about workload portability.
Let's keep that in our mind and figure out what that actually means.
So for Kubernetes, the primary purpose as I see it is to act as an abstraction layer
between the applications that want to deploy on some cluster, whether it's in the cloud or whether it's on-prem,
and that underlying cluster, the resources that are available from that cluster.
That could be networking, that could be CPU, compute, storage, whatever it is.
Kubernetes wants to be the layer between the application and that lower layer.
The benefit of this is that it allows the application developers to be agnostic to the underlying storage system.
The way I really like to think about this
is that Kubernetes is acting a lot like an operating system.
So if you think back to the days before operating systems,
if you were writing an application
for some specific piece of hardware,
you had to be keenly aware of the interfaces that particular
piece of hardware exposed and customize your software for that piece of hardware.
When operating systems became more popular, instead of worrying about what particular type
of hardware your app is going to run on, you started worrying about, well, what OS am I going
to write for? And in the distributed system
world, we've been stuck in this kind of having to write custom distributed system applications for a
specific type of cluster, specific type of hardware for a very, very long time. And Kubernetes is
finally trying to break that open and is kind of acting like an operating system. It will abstract
away the resources that are available from the underlying hardware, and it'll expose a consistent
interface to the application layer to be able to access those resources, do scheduling, and
everything that really an operating system does. So what that means is that it doesn't matter what the underlying
storage system is. You could run on Google Cloud, for example, on Amazon, or you can run on bare
metal. The way that you actually deploy your application is going to be consistent because
you deploy against a Kubernetes interface, and Kubernetes takes care of actually figuring out
how to deploy that application on that underlying cluster.
Now, that's a very, very pretty picture that I just painted.
But as with everything, storage and state makes things a lot more difficult.
So in the ideal world, everything is stateless, and we don't have to worry about how to persist anything. And if everything is containerized,
we can terminate an application from one machine,
move it around to a different machine,
get it started there.
And just wherever resources are available,
we can easily move things around.
The problem with that, of course,
is that if you want to have a stateful application
like a database or something
that needs to actually persist state,
how do you do that inside of a container?
Your container's local file system is deleted after that container is terminated.
And if you have containers that are acting as microservices,
different components that are interacting with each other,
containers are essentially completely isolated.
They don't really have a good way to be able to share information back and forth between them
if they're actually working together.
So those were some of the challenges that we wanted to solve with Kubernetes,
with the Kubernetes storage and volume subsystem.
So now something else to be aware of is, I think you all here are probably keenly aware of this.
When I say storage, it's probably different from what you mean when you say storage and when somebody else talks about storage.
There are so many layers and so many abstractions and so many nuances to storage.
At least at the layer that I work in, in cloud and distributed systems, storage can mean a number of things.
It could mean the databases that run to store your
state. It could be
PubSub systems. It could be
lower level like file or block.
It could be anything.
And so for Kubernetes, ideally,
we want to be able to
abstract all of these things away in a
manner that the consumer doesn't have to be aware of what particular type of these things they're consuming.
They just say, I want block storage.
Make it happen.
But the reality is that we can't do all of it just yet.
The reason is because the data path for a lot of these data services is not yet standardized.
So what we ended up doing was focusing on the areas where the data path is standardized.
So if you look at file and block with SCSI and POSIX, the data path is standardized,
and we can do a lot of cool things simply by focusing on the control path.
Whereas for a lot of these other data services, think, for example, object stores,
the interface, the protocol by which you actually write the bits to that service varies from one
vendor to another vendor to another vendor. And so unless there is some way to be able to
standardize that, Kubernetes has a difficult time trying to abstract that away. So for the sake of
the Kubernetes storage and volume subsystem,
we decided to focus on the left side,
which is the underlying file and block.
The benefit of doing this is that ultimately
the things on the right actually end up
depending on the things on the left
so that you can actually build those systems
on top of the lower layer that we are exposing.
So again, we're focusing on the left side. Data path is standardized,
not yet on the right side where the data path is not yet standardized. So how do we do this?
So if you look at the Kubernetes system, we have what are called volume plugins. Volume plugins
are just a way to be able to reference either a block device or a mounted file system that is accessible by any container
that makes up a pod.
If you're familiar with Kubernetes,
the basic unit of scheduling
is not a single container,
rather a pod,
which is a collection of containers.
So you could have containers that work together.
For example, something that's pulling static content
from a remote site
and another container that's serving that data.
So having some way to be able to share those files between those two containers would be nice.
And one of the purposes that these volume plugins serve is that.
So the plugins define the medium that backs a particular directory or a particular block device and defines how that actually gets
set up. Let's dive into that a little bit more. But first, what are the volume plugins that we
have? I generally break this up into four categories. One is remote storage. Remote
storage is fairly self-explanatory. The lifecycle of these storage systems is independent of the
container or the pod. That means that you can write data to one of these storage systems, and your container can go
away, and then it can come back and read and write data from that location, and it continues to exist.
Ephemeral storage is a way to be able to get temporary scratch space, and we'll talk about
that in a little bit. Then we have local storage.
So it may be the case that you don't necessarily want to consume storage that is exposed by an
interface, either SCSI or otherwise, that's remote to your local system. You may want to just consume
the local storage that's available directly on the machine that you're deployed on. There are a set of challenges in doing that, and we'll talk about that.
But Kubernetes allows you to consume local storage as well.
And then finally, there are out-of-tree volume plugins,
in particular what we're calling the container storage interface,
which is a new interface for being able to develop volume plugins like this without actually having to modify anything within Kubernetes.
So if you are a storage vendor who's interested in plugging into Kubernetes,
in the past you had to actually modify Kubernetes code to add your volume plugin into the system.
With CSI, that's no longer the case.
You can develop your plugin independent of Kubernetes and be able to interact with Kubernetes. So let's focus in on ephemeral storage first. Ephemeral storage is fairly
straightforward. Like I said, it's scratch space between the different containers that make up a
pod. The reason for this is you have two containers that both want to share the same state.
The basic volume plugin here is called an empty dir.
It's created when the containers are started,
and then it's deleted when the containers are all terminated.
And any data written from one container
is visible from the other container at that directory path.
Fairly straightforward.
This is what it looks like in a Kubernetes definition file.
You're defining a pod, which is made up of
two containers, container one and container two, and you have a volume that's called empty
dir, and it's going to be mounted into both containers at this path slash shared. So if
any of these containers writes to that path, it'll be visible from both containers. So that's nice.
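As a rough sketch, a definition file like the one being described might look something like this (the pod name, images, and commands here are placeholders, not taken from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch-demo        # placeholder name
spec:
  containers:
  - name: container-one
    image: busybox                 # placeholder image
    command: ["sh", "-c", "echo hello > /shared/msg && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /shared           # both containers see the same directory here
  - name: container-two
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /shared
  volumes:
  - name: scratch
    emptyDir: {}                   # created with the pod, deleted when the pod terminates
```

Anything container one writes under /shared is visible to container two, and the directory disappears when the pod goes away. So what else can we do on top of this?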
On top of this, we built a few other volume plugins called Secret, ConfigMap, and Downward API.
These are actually components within the Kubernetes API. Ideally, configuration information and
sensitive information like credentials or secrets, you do not want to embed those in your container image.
You want those to live outside
and be managed independent of the container itself.
And the way that you do that,
that Kubernetes allows you to do that,
is you can actually define them in the Kubernetes API.
Now, the easy way of accessing it
is you can actually talk directly to the Kubernetes API. So
you can modify your application to make a request out to the Kubernetes API to fetch this information
for your configuration or for your secrets. But that actually goes against one of the core
principles of Kubernetes. One of the core principles we have is to meet the user where they are,
meaning they shouldn't have to modify their application in order to work with Kubernetes.
Kubernetes should just work with the existing applications that you have. And so a lot of
existing applications know how to consume both configuration information and secret information,
whether it's certificates or passwords, as files. They can read a file and be able to extract that information.
So what these volume plugins do is they take this information that is in the Kubernetes API
and automatically injects it as a file into the container. And that allows you to actually be
able to consume this information that resides within the Kubernetes API server without having
to modify your application
because they can just read it and write it as a file.
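A minimal sketch of how that injection might look, using a hypothetical ConfigMap (a Secret works the same way, with a secret volume source instead):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                 # hypothetical name
data:
  app.properties: |
    log.level=info
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: busybox                 # placeholder image
    command: ["sh", "-c", "cat /etc/config/app.properties && sleep 3600"]
    volumeMounts:
    - name: config
      mountPath: /etc/config       # the ConfigMap keys show up here as plain files
  volumes:
  - name: config
    configMap:
      name: app-config
```

The application just reads /etc/config/app.properties as an ordinary file; it never has to talk to the Kubernetes API.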
Next up, let's talk about remote storage.
So again, the data here persists beyond the lifecycle of the container or of the pod.
And there's a handful of examples.
If you're running in cloud environments,
you have things like on Google Cloud,
Google Cloud Persistent Disks,
on Amazon, Elastic Block Storage.
If you're running on-premise, this could be NFS,
this could be some sort of SAN or NAS system
that's exposed as Fiber Channel or iSCSI.
We have a long, long list of storage systems that we support.
So what does this look like? How do you actually consume it? So when you define your workload,
your pod definition, you say, okay, I have a container that I'm going to deploy.
In this case, it's a busy box container that wakes up and goes to sleep for six minutes because it's a silly demo. But you also define the
volumes that you want to be able to consume in this pod. And in this case, we say I want to
consume a Google Cloud persistent disk called PandaDisk. And when this container starts, I want
this disk to be mounted at the path called slash data. So this is fairly powerful. You have this
nice, easy-to-write configuration file
that defines how your application should be deployed.
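A sketch of that kind of pod definition, assuming a pre-existing disk (the disk name and mount path come from the talk; everything else is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepy-pod                 # placeholder name
spec:
  containers:
  - name: sleepy-container
    image: busybox                 # the "silly demo" container
    command: ["sh", "-c", "sleep 360"]   # wakes up, sleeps for six minutes
    volumeMounts:
    - name: panda-volume
      mountPath: /data             # the disk is mounted here inside the container
  volumes:
  - name: panda-volume
    gcePersistentDisk:
      pdName: panda-disk           # the pre-provisioned "PandaDisk" from the talk
      fsType: ext4
```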
Once you deploy this against Kubernetes,
Kubernetes automatically takes care of
making that storage available on the scheduled node
and mounting it inside the container.
And if that pod gets terminated or killed
or if the node that this pod is running on dies for some reason, Kubernetes automatically reschedules this pod to a different machine and automatically makes that storage available on the new machine.
So your application continues to run regardless of where it's scheduled and the storage is available regardless of where it's scheduled.
That's a super powerful concept.
But there's a problem with this.
And the problem is that in our pod definition,
we defined the storage system by name that we wanted to use.
In this case, we said,
I have a pod that's going to use a Google Cloud persistent disk.
Does anyone see a problem with that?
So the primary challenge with this is that if you take that pod definition and you drop it onto a cluster that's not running within Google
Cloud, now your application is effectively not going to run because there is no Google Cloud
persistent disk if you're running on-prem or running on Amazon, for example. So ideally,
we want in Kubernetes some way to be able to decouple the
storage that's going to be used to fulfill the request for this application from the actual
underlying storage system so that you as an application developer can focus just on your
application, not on the underlying storage system. So let's talk about how we make that happen. And
again, this is all about the Kubernetes principle of workload portability.
Ideally, we want to do this in a way that we don't inhibit the power of the underlying storage system.
So all the work that you guys do to make storage more performant, make it more reliable, add tons of new features to it. We want to be
able to expose that through Kubernetes. But at the same time, we don't want the folks who are
just writing applications to be exposed to that kind of complexity. So how do we balance this?
We need to make it very simple for the end user to consume while still enabling the full power and variety of all the different types of storage systems that are out there.
So the way that we did this was by introducing a couple of objects in the Kubernetes API called the persistent volume claim and the persistent volume.
These objects are designed to decouple storage implementation from storage consumption.
So the way that it works is that a cluster administrator comes along.
A cluster administrator is somebody who administers the storage system.
They are familiar with the storage that's available on this cluster.
They can decide what storage to make available on that cluster to the users of that cluster. So what they do is create these persistent volume Kubernetes API objects,
and they basically define sets of volumes that are available for use.
So, for example, if you're a cluster administrator for Kubernetes,
you can pre-provision different sizes of volumes
and create persistent volume objects
that make those volumes available for consumers to use.
And as a user of Kubernetes,
you don't have to worry about the underlying storage
that you're going to consume.
You create a persistent volume claim
that very generically describes
the type of storage that you need.
So in this case, you specify the capacity of the storage that you need and the access mode, meaning I want it to be
read-write-many, read-write-once, just a very generic description of the type of storage that
you need. So then what Kubernetes does is it takes the available storage, the PVs that were defined
by the cluster administrator, and it tries to match your request
with what's available. And once it's done, you can use that storage by directly referencing the
persistent volume claim object instead of directly referencing the exact type of storage that you
want to use. So in your pod definition, instead of defining that you want to use GCE persistent disks,
you say, I want to use this persistent volume claim called myPVC.
And now if you were to take your workload and move to a different cloud environment
or move to an on-premise environment, you would deploy the same pod configuration file
and your same persistent volume claim object. And as long as the cluster administrator
on that cluster has made PV objects available, everything just works automatically.
So on Google Cloud, this will automatically be matched to a Google Cloud persistent disk.
And on premise, it may be fulfilled by an NFS share or, you know, if you're running on Amazon, EBS, or something else.
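Roughly, the three pieces fit together like this; the volume and pod names here are illustrative, with only the myPVC reference taken from the talk:

```yaml
# Created by the cluster administrator: a pre-provisioned volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-100g
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  gcePersistentDisk:               # on-prem this might be nfs, iscsi, fc, etc.
    pdName: panda-disk
    fsType: ext4
---
# Created by the application developer: a generic request for storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc                     # "myPVC" from the talk, lowercased for a valid name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
---
# The pod references only the claim, never the underlying storage system.
apiVersion: v1
kind: Pod
metadata:
  name: portable-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc
```

Only the PersistentVolume names a real storage system; the claim and the pod stay the same on any cluster.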
So now we have effectively made your application deployment configuration files portable.
So one of the issues that you probably noticed with this was the cluster administrator manually having to come in
and predefine the volumes that are available for application developers to consume. That's tedious. It's difficult to predict and can
result in outages where you maybe didn't provision enough and application developers want more.
How do you handle that? So the way that Kubernetes tries to simplify this is through the concept of dynamic provisioning.
And this is a concept that's very unique to Kubernetes.
Basically, what we do is when we know that there's a request for storage, we will automatically trigger the creation of a new volume automatically.
This works very nicely in cloud environments where volumes can be provisioned
dynamically, but it also works in on-premise environments depending on the storage system
that you have and your idea of what provisioning means. So it could be for an NFS server
automatically creating a new share and setting that aside. But it can vary from storage system
to storage system, and the volume plugins get to define what it means for a particular storage system.
So the way that this works is, again, another Kubernetes API object in this case called the storage class.
The storage class is an object created by the cluster administrator that says, I know what underlying storage systems I have
available, and I want these storage systems to be available for use by application developers
when they're trying to deploy a new workload on my system. And so they specify the provisioner
that's going to be used to create the new storage, as well as a set of opaque parameters for defining
what that new piece of storage is going to look like. And so I mentioned earlier that one of the
challenges that we had was simplifying the consumption of storage while also enabling the
power of the storage systems to shine through. All the different possible parameters that storage systems can have,
things like replication, encryption, so on and so forth, we can't possibly enumerate all of those
into the Kubernetes API. So instead, what we did was basically allow the volume plugins to define
an opaque set of key value parameters that will be specified by the cluster administrator and pass
through to the storage system during provisioning. And so anything that your storage system allows
customizing during the provisioning process, you can expose as an opaque parameter. And so in a
Google Cloud environment, for example, this will include things like whether encryption is on or not, what type of disk to actually
provision, whether it's an SSD or a spinning disk. But it can be whatever makes sense for your
storage system. And in this way, we preserve the ability to not cripple, like not become the lowest
common denominator in terms of what we expose to our end users, the full power
of the storage system is still available to them. But instead of the end user who's deploying a
workload on the cluster worrying about it, it's the cluster administrator that gets to make the
decision about how that happens. And again, as an end user, the only thing you would do is specify
the type of storage class that you want to consume by taking a look at the underlying cluster and picking one.
And even that you don't have to do.
A cluster administrator can mark a storage class as default, and in that way, the end user doesn't even need to specify the storage class.
It's automatically selected for them.
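A sketch of what that might look like; the class name, provisioner, and parameters here are just examples of the pattern, not a recommendation:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                       # hypothetical class name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: kubernetes.io/gce-pd  # the volume plugin that will create new volumes
parameters:                        # opaque key/value pairs, passed through at provisioning time
  type: pd-ssd                     # e.g. SSD vs. spinning disk, meaningful only to this provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim
spec:
  storageClassName: fast           # can be omitted entirely when a default class exists
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 200Gi
```

When this claim is created, Kubernetes asks the named provisioner to create a fresh 200Gi volume with those parameters, rather than matching it against pre-provisioned PVs.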
So I think those are the fundamental components that make up the Kubernetes volume storage system,
and they enable workload portability, ultimately.
So as an application developer,
I don't have to worry about the underlying storage
that fulfills my application workloads,
but at the same time, I can get the power of whatever storage system
is available to me on that cluster.
All right, now let's take a step back
and talk about a few other volume plugins that we have.
We have host path volumes, which are fairly straightforward.
This means that we will expose a specific directory
on the underlying host machine into the container.
This is for testing. Don't use it.
Do not have your applications rely on this
because things will break.
If your application ends up getting moved to a different node,
which happens in Kubernetes,
if your node becomes unavailable for any reason,
any state that
you wrote to the old machine is not going to be available on the new machine. So don't use that
unless you know what you're doing. We do have a concept of local persistent volumes. Local
persistent volumes is a way for us to be able to expose local volumes or the local storage that's
available on individual nodes
in a way that Kubernetes can understand.
So I told you not to use HostPath for persisting data
because if you do,
your workload can get scheduled somewhere else
and that data is no longer available.
With local persistent volumes,
Kubernetes understands that this storage
is available only on a specific machine.
And once a workload starts using that particular piece of storage,
Kubernetes knows that that workload must remain scheduled to that machine only
and will not move it around to other machines.
The caveat here, of course, is that you have to understand
that when you're using local persistent storage,
you're basically reducing the reliability of your application.
If that machine happens to die, the data that was on that machine and that application are now down.
There is nowhere else that we can reschedule you because the data is not available to other machines.
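For reference, a local persistent volume is roughly a PV with a local path plus node affinity, so the scheduler knows which machine the data lives on; the paths and node name below are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1             # hypothetical name
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # a disk that exists only on this one machine
  nodeAffinity:                    # tells Kubernetes which node this storage is tied to
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node-1"]       # hypothetical node name
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes are not dynamically provisioned
volumeBindingMode: WaitForFirstConsumer     # delay binding until the consuming pod is scheduled
```

Once a pod's claim binds to this volume, that pod will only ever be scheduled onto that one node.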
So the types of use cases that you would use this for are primarily two use cases.
One is if you have a stateful application that handles replication at the
application layer. So think of it as a database that replicates between its different shards.
And that way, if one shard goes away, it can tolerate that failure by automatically replicating to the other available nodes.
So that's a good use case.
And another use case that you'll see is a lot of these software-defined storage systems want to be able to consume the underlying storage that's available on the host machines and expose it as a virtual scale-out file system or block storage. And those systems can utilize the local persistent volumes to ingest the underlying local storage
and then apply replication and other features on top of that
and expose another storage system which end users can consume, which is more reliable and effectively remote.
The last thing I wanted to focus on
was how you write these volume plugins.
So when we started with Kubernetes,
all the volume plugins were actually baked
into the Kubernetes source code.
The Kubernetes volume plugin interface
was a Go language interface,
and writing a new volume plugin
meant that you had to actually check code
into Kubernetes. This allowed us to move very quickly in the early days of Kubernetes because
we didn't have to fixate on an interface. Instead, what we could do was continuously revise that
interface, and since all the volume plugins were also part of the same Kubernetes package,
we could just update the consumers in every version of Kubernetes we shipped.
Even if we modified the interface,
we shipped with a set of volume plugins that would work with Kubernetes.
Of course, this becomes difficult to manage over the longer term.
Some of the challenges are, for us as Kubernetes developers,
accepting third-party code that we can't test. So for example, the end-to-end tests that we do for Kubernetes don't
exercise the Fiber Channel volume plugin. How do we know that it continues to work
with every single release of Kubernetes? Another challenge is just being required to give permissions to these volume
plugins, that Kubernetes-level permissions, essentially. So all the permissions that we
grant to any one of our components, these volume plugins automatically inherit, which is a security
issue. And of course, if any one of these volume plugins has a bug, it can potentially crash the entire Kubernetes binary because it's running as part of that binary.
So ideally, we want to remove these volume plugins from the core of Kubernetes.
And as a storage vendor, you don't want to be aligned with the Kubernetes release schedule.
It's a massive, massive project, and if all you're interested in is your little storage system,
you don't want to have to deal with figuring out what the Kubernetes release cycle is, what the
patching system looks like, all of these things. And possibly you don't even want to open source
your code, which is a requirement for Kubernetes. So how do we get around all of these issues?
Actually, I'm a slide behind. So the previous slide was more about
Kubernetes' storage system is awesome,
but extending it is painful.
So how do we make it less painful?
The focus now is on an interface
called the container storage interface.
I'm not going to deep dive into this
because we don't have enough time,
but think of it as taking that internal
Kubernetes interface that we defined as a Go language interface, polishing it, fleshing it out,
and introducing it as an external gRPC interface. So now in order to integrate with Kubernetes,
you write a binary that will implement a gRPC interface, and Kubernetes can talk to that interface
over a Unix domain socket instead of it being compiled into the core of Kubernetes.
So we started this project a couple years ago, and we're targeting the GA or 1.0 launch
of this project in the next quarter. So far, we've got about 12 volume plugins
that officially advertise that they implement CSI.
And if you are interested in figuring out
how to integrate your storage system with Kubernetes,
please take a look at CSI and reach out to me
or anyone in the community
to understand how to get started there.
There was a legacy attempt at doing this within Kubernetes called Flex Volumes that we started
before CSI.
And the difference between CSI and Flex is that Flex was an exec-based model.
What this meant was that whenever we decided that we needed to do a volume operation like
doing mounting or attaching,
instead of looking within the Kubernetes binary,
we would look for a file, an executable or a script on the host machine to try to fulfill that request.
So that means that as a volume vendor, you can basically write your volume plug-in as an executable
and deploy it on the machines.
And that's the way that Flex worked.
But the challenge with that was, of course, deployment. You can write a file that you deploy
to these systems, but what happens if that file gets deleted? What happens if a new node is
created? How do you manage the deployment of that file automatically? These were the challenges that
Kubernetes itself set out to solve.
How can we depend on a storage system and make them solve this all over again?
And so with the container storage interface, the way that you deploy a volume plugin is actually just another workload on top of Kubernetes. It's deployed as a pod that
consumes a set of local storage, essentially,
and it does the mounting and everything inside the container.
And so Kubernetes takes care of ensuring that the pieces that are required for your volume plugin remain running.
And so if a new node comes in, Kubernetes will automatically deploy those bits there.
If something happens to an existing node, Kubernetes can automatically recover on your behalf.
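As a rough sketch of that deployment model, a CSI node plugin might be shipped as a DaemonSet along these lines; the driver name, image, and paths are hypothetical, and real deployments typically add registrar and provisioner sidecars:

```yaml
apiVersion: apps/v1
kind: DaemonSet                    # one copy per node; Kubernetes redeploys it on new nodes
metadata:
  name: example-csi-node           # hypothetical driver deployment
spec:
  selector:
    matchLabels:
      app: example-csi-node
  template:
    metadata:
      labels:
        app: example-csi-node
    spec:
      containers:
      - name: csi-driver
        image: example.com/csi-driver:1.0           # hypothetical image implementing the CSI gRPC services
        args: ["--endpoint=unix:///csi/csi.sock"]   # Kubernetes talks to it over this Unix domain socket
        securityContext:
          privileged: true         # needed so the container can perform mounts on the host
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
        - name: kubelet-dir
          mountPath: /var/lib/kubelet
          mountPropagation: Bidirectional           # mounts done inside the container reach the host
      volumes:
      - name: socket-dir
        hostPath:
          path: /var/lib/kubelet/plugins/example.csi.driver/
          type: DirectoryOrCreate
      - name: kubelet-dir
        hostPath:
          path: /var/lib/kubelet
```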
So our recommendation is to look at container storage interface if you are interested in interfacing with Kubernetes.
And that's all I have today.
I'm going to hand it back to Nikhil for talking about what's going to happen in the future.
Thanks, Nikhil.
All right.
Still working.
Thank you, Saad.
That was a pretty deep dive into what we're trying to do.
But we're not done yet.
There's so much more left to be done here.
And I think we'd like to make sure that we're communicating this to all of you and kind of working on this problem together.
I think this is a pretty hard engineering problem.
So Saad spoke a lot about portability, workload portability.
But I don't think we've solved the kind of stateful
container portability completely.
We're still looking at all these use cases today that have to solve the compute problem in a certain way
and then solve the storage problem in a very different way.
And these are real.
I mean, there are customers that are waiting for a solution.
So today, they're trying to say, okay, you know what? I'm going to solve a part of my problem, but
then we'll solve the storage problem when we get to it.
But then it doesn't really realize the goals that
we have, and then we have to put in a lot of hacks. I mean, we spoke a little bit about
how local persistent volumes are kind of,
they're half-baked right now.
We still need to do better
in order to kind of get them out there.
And if you look at each of these use cases,
there are also very different types
of storage devices as well.
Some are file, some are block, right?
And then how do you do things like,
okay, I want to just take a snapshot of whatever I have and move it into a different environment.
That's portability for me, right?
I can do that with an application, but I can't move it, like, as a whole component, right?
I just want my entire workload to move. And if you think about how Google has scaled over time, one of the key secrets
that we have, it's a fairly public secret, but the growth of administrators of our systems
is fairly flat, whereas the number of users of our systems, I mean,
it's exponential growth, right? The number of
engineers we hire to develop applications on top
versus the number of people we hire,
say, SREs, to maintain
the underlying storage infrastructure
is very, very different.
Now, we want to get that same concept of keeping it simple,
keeping it abstracted away,
keeping it like a simple set of cattle, right,
that all of them look the same,
kind of pointing back to Saad's presentation on,
if I could say, you know what,
from a snapshot perspective,
the user doesn't even know, right?
All the user has to say is, hey, you know what, I just want something that I can run my CICD on.
I don't have to go look at, oh, is this version X, Y, Z?
Is it scrubbed of all my information?
Is all the private stuff moved out?
Is no PII data in it?
Where is it located?
Is it in this cluster, this zone?
All of that needs to go away.
It needs to be magical.
And that's where we're trying to get to with snapshot portability.
And again, we'll do this as part of the usual Kubernetes review process, work with all of you on that problem.
The other issue today is I think we've got some of the
things nailed down in terms of durability from a single pane
of glass perspective.
But we're not quite there yet.
Again, this is a case of, OK, I can see all my compute
workloads, but what's going on with my storage systems?
Oh, wait, let me look at this Gluster dashboard here.
Let me look at my NetApp dashboard here.
Oh, but somebody is also using S3 on the side,
so let me go take a look at that.
So if you need to be able to scale your organization's needs
without throwing more people at the problem
and be able to quickly address all of their needs
and help and drive innovation,
not only inside your company but across the community,
we need to have a common standard
by which all of the information in a Kubernetes storage cluster can be unified behind a single
pane of glass. That's where we're also investing. You'll see
some proposals for potentially some common standards there and
be able to reason about some of these metrics.
I covered this a few times, and I'd like to kind of drill this in as well.
We're at an interesting point where we're developing multiple interfaces.
I've seen this movie play out at Google a few times before.
There's a risk that we will diverge on potential solutions for managing the data across block, file, blob, even SQL databases.
One of the statistics that I was a little surprised by is,
I was in this kind of Gartner review
where they were talking about enterprise growth, SMB growth,
and they actually see SMB growth flatlining
over the next four to five years.
And that's concerning, and I was like, okay, why is that?
It turns out that moving their workloads
into a cloud-native infrastructure
is too complex.
They need too many people.
The savings that you get
from being able to, you know,
use some of these cloud-native infrastructure
or even, you know, migrating to something that's dynamically provisioned,
you don't have any kind of fixed cost, and it's very usage-based,
all of that is great, but then if you can't manage it at scale, then it falls apart, right?
So I think it's still very, very useful that, you know,
and we'll have more of these.
There'll be even more things that'll come up,
even more different kinds of storage,
even if you look at kind of just NoSQL databases themselves.
There's so many today.
But there's no common way that you can interact with them,
and that's what we're trying to bring here with Kubernetes.
And we want to be involved in the community,
be able to drive towards some common standards
with no vendor lock-in at all, right?
And be able to consciously empower the storage administrator
so that he or she can empower
the actual users to drive that innovation, right?
I think that's the last bit I had.
And this is a quote I always look to,
is be able to get to a point where,
it's just like a bottle of water, right?
There's a lot that goes into getting that bottle of water on your desk.
There's sanitation, there's planning, there are a bunch of dams everywhere.
So being able to abstract all the complexity out so that your end users
can innovate on top.
By the way, for people who are wondering
what those clouds in the background are,
those are clouds generated by Google's data centers.
Those are the chillers that are cooling
and they throw some mist out.
So that's Google Cloud's data centers generating clouds.
But yeah,
I think going back to being able to
abstract all
the complexity out and
simplify management of scale.
And that's been the secret to
Google's success at building the
largest systems, planet
scale systems across
compute, storage, networking.
And we'd like to bring those same principles, same standards, same thought process, same
level of engagement to the rest of the community.
All right.
Thank you.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers
in the storage developer community.
For additional information about the