Storage Developer Conference - #85: Bulletproofing Stateful Applications on Kubernetes

Episode Date: February 6, 2019

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast, Episode 85. Good afternoon, everyone. My name is Dinesh Israni. I'm a software engineer at Portworx, and today I'm going to talk about bulletproofing your stateful applications.
Starting point is 00:00:59 So I'm going to go over what kind of issues you'll see when you start running stateful applications in production and how you can overcome them. These are the things I'm going to talk about. First I'll give you an intro to Portworx, because a lot of what I'm going to talk about depends on a software-defined storage solution like Portworx being installed on your Kubernetes cluster. Then I'm going to talk about the motivation for the STORK project and what it actually does. We're going to talk about how you can schedule your stateful applications more efficiently,
Starting point is 00:01:26 how you can basically use stock to do, to monitor your, monitor the health of your storage system, and also how you can use it to, uh, basically do disaster recovery and cluster migration. Uh, and at the end, I'll basically do a demo of the cluster migration, uh, feature, and then we can, uh, move to any questions if you have. So, uh, about Portworx. integration feature and then we can move to any questions if you have so about portworks support works is basically the first software defined storage solution for production ready stateful
Starting point is 00:01:53 applications the way portworks works is you install a daemon set on your kubernetes cluster and we go ahead and fingerprint all your storage devices on all the nodes, and basically form a virtual storage pool. And at that point, you can basically create container granular volumes. So basically, each pod will get its own volume, and you'll be able to use it in your stateful applications. And also, Portworx works on any deployment, so it can be bare metal, it can be VMs in your local, in your
Starting point is 00:02:26 private data centers, or it can be in any cloud. We also have support for snapshots and cloud snaps for DR purposes, and we also support bring your own key encryption for encrypting your volumes as well as your backups.
Starting point is 00:02:42 The good thing about Portworx is there's just one way of deploying it wherever you're running it, whether it's on-prem or on the cloud. And we ourselves run as a container so that we don't pollute your host namespace at all. So just to get a feel of the room, how many of you are running stateful applications
Starting point is 00:03:02 in Kubernetes or any other container orchestrator right now? Okay, cool. Any of you running that in production right now? Okay, cool. All right, so I'm not going to go into very deep on how applications and resources are managed on Kubernetes. I think the previous talk gave a good overview of that. But basically what happens
Starting point is 00:03:30 is you can specify so every application that you start up on Kubernetes, you can basically specify a persistent volume that you want to use with it. And the way it's done is it's in a declarative manner. You basically specify the pods you want to run and what volumes
Starting point is 00:03:45 you want to use. So why was, why did we start the STORC project? So STORC stands for Storage Orchestration Runtime for Kubernetes. So our main goal for starting STORC was to basically help run stateful applications more efficiently on Kubernetes. The issue is that the storage subsystem in Kubernetes is written in a very generic way so that it has to work with all the subsystems but to take care of to take advantage of new
Starting point is 00:04:13 software defined storage solutions it doesn't really work with that because you basically need to extend Kubernetes for those cases so that's why we started writing Stalk. Also, there is no way to manage the lifecycle of stateful applications natively in Kubernetes, so you can't really take snapshots,
Starting point is 00:04:35 migrate your stateful applications across clusters natively. So Stalk is actually written in a plugin model, so you can actually write drivers for any storage subsystem and it will work with STORK. And all of this is open source and it's available on our GitHub page. So, let's move on to the first problem that STORK is trying to solve. So for software defined storage solutions that are running on the same nodes as your Kubernetes nodes.
Starting point is 00:05:07 There is a problem that when you basically schedule these pods, you do not know where your data lies. And Kubernetes does not have the intelligence to figure out what type of storage system it's actually talking to. So, Storck basically tries to solve that problem. The way that it's done right now is you can basically specify labels and affinity rules for your applications when you start them up.
Starting point is 00:05:29 So you would basically say I want to pin my applications onto nodes which have some labels. The problem with this is that this does not scale. If you have a thousand applications, you have to basically go ahead and modify each one of them to figure out where your data is and then modify them. The other thing is this error prone. So you have to make sure that you actually modify all of them or you will end up with pods on nodes which actually don't have the data. The other thing is there's a concept of stateful sets
Starting point is 00:05:58 in Kubernetes, which is basically used for applications like Cassandra. So you basically say, I want to run n number of these pods, and I want to use a persistent volume claim for each one of these pods. The problem there is, how do you actually specify these labels and affinity rules for your pods? Since there is no one-to-one mapping for the labels and the pods, you can't basically templatize the affinity rules. So the solution that we used in Stark was we basically built a scheduler extender.
Starting point is 00:06:33 So the good thing about Kubernetes is that it's very extensible. And especially for schedulers, you can write extenders, which will be called every time a pod is going to be scheduled in your cluster. So there are a couple of calls that the scheduler will make out to this extender, and we've basically implemented two of them, which is basically to filter out nodes where your storage isn't available, and then we can also prioritize nodes where your data is local. So this is very helpful in cases of software-defined solutions where you have replicas placed on some nodes, and you preferably want to start your workloads on those nodes
Starting point is 00:07:08 so that you get better local performance. Another advantage of this is that you don't have to modify your applications at all. The only thing that you need to actually do is set your scheduler name to the new scheduler that you started up. And you don't have to set any affinity rules, you don't have to modify or add any node labels to any of your Kubernetes nodes. So how does this work exactly? So basically, the way the Kubernetes scheduler will get
Starting point is 00:07:41 a call to schedule a pod for multiple volumes, say V1 and V2. At this point, since we've configured a scheduler extender, the Kubernetes scheduler will basically send a filter request to Stork. And Stork will then basically talk to the storage driver. So in this case, we have a five-node cluster running Kubernetes, but we have a software-defined solution, Portworx, installed only on three of the nodes. Now, if Kubernetes was to go ahead and start the pod on node four,
Starting point is 00:08:11 it would basically fail at that point because it would not be able to find the volume to mount into the pod. So what Stork is going to do is it's basically going to talk to Portworx and figure out where it's actually running. And it's going to figure out that Portworx is running only on three of the five nodes. So it's basically going to filter out N4 and N5, and in the fourth step, it's basically going to tell Kubernetes that you can only schedule this part on node N1, N2, and N3.
Starting point is 00:08:38 Now, once the scheduler gets this, it's basically going to make the next call in its scheduling process. It's basically going to ask the scheduler again, OK, so here are, here are the nodes that this pod can be scheduled on. How do you prioritize these nodes? So in this example, Portworx has actually, has spread out the volumes for these, for these, for, for, has spread the replicas for these volumes across the cluster. So as you saw here, the pod had actually requested for volume V1 and V2, and Portworx placed replicas for volume V1 on node N1 and N2,
Starting point is 00:09:15 and replicas for V2 on N2 and N3. Now here, obviously, if the pod starts up on N2, we'll have the best performance because both the replicas will actually be local and we'll not be going over to the network. So at this point, when the prioritized request comes into Storck, Storck is again going to talk to Portworx and ask it,
Starting point is 00:09:34 these are the volumes that I need. Can you please tell me where the replicas for these volumes lie? And based on this information, it's basically going to rank the different nodes. So when it sends back the reply, it's basically going to say that N1 has a score of 100, N2 has a score of 200, and N3 has a score of 100 again. And then Kubernetes is basically going to rank these based on these scores
Starting point is 00:09:57 and other prioritization techniques that it uses. And it's basically going to end up starting the pod on node N2. So this is basically going to result in the best performance for your stateful applications. So apart from this, one of the other problems that you'll realize as soon as you start running stateful applications in Kubernetes is there is no way to actually monitor the health of your storage subsystems. So once a pod is actually scheduled on one of the nodes, if the storage subsystem somehow
Starting point is 00:10:31 runs into an error, the file system goes into a read-only mode, or you start seeing file system errors, Kubernetes will not be able to actually figure out that something's wrong with the pod. You might have health checks for your pods, but these don't always work. For example, for apps like MongoDB, which actually maintain data in memory, the flusher might actually be stuck because it's not able to flush data, but its health check will keep returning okay for a long time until a flush is actually done. So there are multiple things that can happen here, right?
Starting point is 00:11:06 Your software can crash. Your disk could actually be degraded, causing these issues. And the problem is Kubernetes will not realize that something's wrong because Kubelet is still running on that node. So Kubelet is basically a service that runs on each Kubernetes node,
Starting point is 00:11:22 and it basically monitors the health of the node as well as all the pods that are running on that node and returns it to the master. And then the scheduler basically works on that and figures out if the pods need to be rescheduled or it basically takes care of the entire lifecycle of the pod. And since Kubelet is still running on that node, it's not going to realize that anything's wrong.
Starting point is 00:11:43 And what happens in this case is you usually have to go in manually, delete the pod, and wait for it to actually be rescheduled onto another node on the pod. So the solution that we have here is, Portworx, STORK actually monitors the storage health on all the nodes. And as soon as it sees that Portworx has actually gone offline on a node, it, it basically queries for all the pods that are running on that node and figures out whether any of those pods were actually using Portworx volumes. Now, since Portworx is actually, or any storage driver is actually offline on
Starting point is 00:12:18 that node, it's going to realize that that pod is not going to be able to do its job. So it's basically going to kill the pod so that it gets rescheduled onto another node. Without this, what's going to happen is the pod is basically, if the pod was, if the storage system died while the pod was being scheduled, it's going to get stuck in pending state indefinitely
Starting point is 00:12:39 because it's not going to be able to mount the volume, or it's basically going to be in a state where it's not going to be able to do IOs. So this is basically how the flow would look. So basically, Stark would keep polling for the health of the underlying storage system regularly. And in this case,
Starting point is 00:12:58 we had basically scheduled the pod on node N2. Now since node N2 is offline at this point, Stark is going to realize that that pod needs to be rescheduled. And what it's going to tell the scheduler is that since, since the pod, since the node where the pod was scheduled is offline, please go ahead and reschedule it. And
Starting point is 00:13:18 at this point, it's going to go through the filter and the prioritize workflow again. So when the filter request comes in, it's again going to realize that the storage system is running only on nodes N1 and N3, and it's going to filter out N2. And eventually, the pod is going to get scheduled on either N1 or N3
Starting point is 00:13:37 because they both had the same scores. So it just reschedules the pod? It doesn't restart the node? No, it does not have control over the storage subsystem, right? So that is up to the storage subsystem to manage on its own. But since the replicas for the volume were available on different nodes, it will basically just reschedule the pod. Yes. Yes, and there could be multiple issues, right? I mean, the
Starting point is 00:14:08 stock cannot figure out what the storage system, what the issue is at that point. Yes. The main goal of stock is to schedule pods efficiently and have high availability for all your pods, even if there is an underlying issue with your storage subsystem. So Kubernetes takes care of the node, you take care of the storage. Yes, that's right. So it's not just Kubernetes though, right? At this point, so Kubelet might still be online on node N2, right? And you might be able to schedule other stateless apps.
Starting point is 00:14:40 But your storage is offline, so you'll not be able to use stateful apps on that node. So you need to basically be able to move your pods over. And there's no Kubernetes native way of doing this, because Kubernetes does not know how to talk to different storage subsystems at this point. So the other thing that Starg does is it helps you manage the lifecycle of your stateful apps. So right now there is no native way of managing your snapshots and disaster recovery in Kubernetes. Well, there's no entry way of managing that right now.
Starting point is 00:15:17 So what we've done is we've basically added support for snapshots. This is based on the Kubernetes incubator snapshot project. So we've basically pulled that into stock itself. And we've added support for two types of snapshots. So you can basically take local snapshots. So the snapshots will lie on the same cluster that is running a storage subsystem. Or you could also basically take a snapshot to any object store. So it could be any S3 compliant object store like Minio or AWS or Azure and Google.
Starting point is 00:15:50 Also, these snapshots are application-consistent, so you can specify rules that you want to run before you take the snapshot. For example, if you're running MySQL, you basically want to flush all the tables and you want to make sure that the database is logged before you actually take a snapshot, so you get an application-consistent snapshot.
Starting point is 00:16:08 And if you're running in-memory databases, which you, like Cassandra, you basically want to make sure that everything is flushed, so that at the point that you take the snapshot, you have the latest data available in there. So the way that you can do this is you can specify
Starting point is 00:16:24 commands that you want to run in either one of those pods or all of those pods. And we'll basically run the command, take the snapshot, and then you can also run a post-exec command to basically run the command after the snapshot has been taken. And this also works for a group of snapshots,
Starting point is 00:16:40 a group of volumes. So you can basically say that you want to snapshot all your Cassandra volumes and it'll basically freeze all of those volumes at one point. First of all, if you had provided a pre-snapshot hook, it would basically flush everything, freeze all the volumes, take the snapshot, unfreeze everything, and then go ahead with the snapshot. So one of the other things that we are adding in stock is this concept of cluster migration.
Starting point is 00:17:10 There are multiple reasons that you might want to migrate your workloads from one cluster onto another. One of the reasons is that you probably did not... So workloads are always growing, right? You might... When you actually provision your cluster, you might have allocated some resources for it. But at some point, you realize that you want to run a lot more workloads
Starting point is 00:17:31 and you want to move some of your workloads from your initial cluster onto another one. So that is the first use case where you want to basically augment your current storage and compute with another cluster. The other reason is a blue-green deployment. So you might be upgrading either your software-defined storage solution or Kubernetes, and you want to make sure that everything runs fine
Starting point is 00:17:52 before you actually switch it over to your new cluster. So, yeah. And the third one is basically dev tests. So you might have... For CI-CD, you might want to actually be running tests on some data from your production cluster. And this basically makes it very easy. So you would basically have cluster one, which is your production cluster, and you would be able to
Starting point is 00:18:14 specify another cluster. And you can say, I want to migrate all my data from cluster one to cluster two, run some tests on it, make sure everything's fine, and then push out the updated containers or applications onto your dev or production cluster. So this is a high overview of how the migration would happen. So the storage migration would happen at the storage layer. So in this case, Portworx would be responsible for migrating all your data.
Starting point is 00:18:44 And this can actually work between any type of cluster. So you could basically go from on-prem to the cloud and then back again. Or you could basically move from Google to Azure or vice versa. And you can do any kind of migration at that point. So Portworx will be responsible for moving the volumes and all the storage-based policies, and then we'll basically use Kubernetes to move all the resources from
Starting point is 00:19:10 one cluster to another. So, there are two paths to this cluster migration. So, the first is basically you need to pair these clusters, because you need to be able to say where you can actually migrate these resources to. So the way that you would do it is on your source cluster, you would basically use what in Kubernetes we call a custom resource definition.
Starting point is 00:19:39 So let me just talk about that. So in Kubernetes, you can basically define your own types, and you can basically write your controllers that look at these types and basically try to perform some jobs based on what has been provided in these custom app specs. So basically, we have created a custom resource definition for pairing two clusters. So when you're pairing two clusters, all you would need to do is you would need to basically provide it the IP of the other cluster and the storage token and the port that you want to talk to.
Starting point is 00:20:14 And in the config section, you would basically have to give it the Kubernetes config so that we can actually migrate the resources. Once you have this information, all you have to do is apply this and wait for it to be in ready state, which means that the storage has been, storage has been paired as well as the scheduler has been paired. So now once you have the two clusters paired, how do you actually
Starting point is 00:20:39 migrate the data and the resources, right? So, so we've added a second CRD to actually do this migration. So in this CRD, what you would specify is, you would specify the cluster pair. So here we basically called this a remote, we gave it a name, remote cluster. So in the second, in the migration CRD, we are basically going to say that the cluster pair that we want to migrate is called remote cluster.
Starting point is 00:21:04 And we also want to include all the resources. What this does is, so you can actually set this to false, and if you set it to false, all it's going to do is it's going to tell the storage subsystem just to move the volumes. It's not going to move any Kubernetes resources. So you're not going to see any applications,
Starting point is 00:21:20 PVCs or PVCs move to the remote cluster. The second option that you can provide is do you actually want to start the applications once they are moved? So this is very helpful in cases of DR scenarios where you want to maintain an active site and you want to backup everything to a passive site,
Starting point is 00:21:37 but you don't want to actually start up your applications. So you want to migrate your volumes and your applications, but not start the application. So what this is going to do is if you set this to false, it's actually going to set the replicas for your deployments and stateful sets to zero. So all your resources are actually available on the other side, but nothing's been started off yet.
Starting point is 00:21:58 And we also... So basically, what happens if your original cluster goes down, though? How do you know what replicas... how many replicas you should set the application to? So we'll actually put an annotation in the deployment on the stateful side to say how many replicas were actually started
Starting point is 00:22:14 on your source cluster. So in case your original cluster goes down, all you have to do is go to your new cluster, figure out what the replicas were on the other one, and just scale them up, and you are ready to go. So, and once you apply this spec, what's going to happen is
Starting point is 00:22:32 Storck is going to talk to the underlying storage subsystem. It's going to figure out the other thing that I missed was you can actually specify a list of namespaces and selectors that you want to migrate. So you don't want to migrate your entire cluster, right? You might want to migrate only a part of your cluster. So you can basically specify what namespaces you want to migrate, and you can also
Starting point is 00:22:51 give it label selectors. So you can migrate only some applications within a particular namespace, too. So basically when you apply this, what's going to happen is Storck is going to talk to your underlying storage subsystem. It's going to tell, what's going to happen is Stork is going to talk to your underlying storage subsystem. It's going to tell... It's going to figure out what volumes are actually mapped to this namespace and these label selectors. And it's going to tell the storage subsystem to basically migrate these volumes
Starting point is 00:23:15 from the source cluster to the remote cluster. And once that is done, Stork is basically then going to figure out what resources were actually a part of this namespace and these selectors, and it's going to apply it to the remote cluster. So this includes everything like deployments, PVCs, the PVs, your secrets,
Starting point is 00:23:37 as well as any service config that you might have on that cluster. All right, so... I'm going to go ahead and, I had two clusters over here. And I don't have any applications running right now, so I'm just going to do a get all in my, in the MySQL namespace, and you can see I don't have anything running here. Neither do I have anything running
Starting point is 00:24:44 on the destination cluster. So what I'm going to do first now is I'm going to spin up a MySQL service. I'm just going to show you what it looks like. So basically this is going to start up a MySQL deployment and we specified a storage class for the PVCs to use and this is basically going to say, this is basically saying I want to use a Portworx volume with a two-way replica.
Starting point is 00:25:09 And I want to create a PVC called MySQL data. And in the deployment, we basically said that we want to use the MySQL data PVC. And I want to basically mount it under valid MySQL. And like I mentioned, this is going to use stock, so this is the only change that you need to do to basically make it work with stock. You don't need to specify any node affinities or pod affinities to basically schedule your apps close to your data.
Starting point is 00:25:42 So once I have this, I'm just going to basically apply it. So this is going to spin up the app. And if we do this, we should see that the app is now up and running. So we basically have a MySQL pod running, and if we do a kubectl get pvc, we will see that a pvc, sorry, it's in the MySQL namespace,
Starting point is 00:26:18 a pvc has been created, and there is also a pv that's backing this. Now that we have this... So we still don't have anything over here. So I've basically already set up the cluster pair information over here. So if you look at this, what this specifies is it specifies the IP of one of the nodes on the remote cluster
Starting point is 00:26:42 and the token that we got from the storage subsystem on the other side. And it specifies the port that we want to use to talk to the storage. And in the config, we basically have the Kubernetes config so that we can talk to Kubernetes on the remote cluster. Now, once I do a kubectl get cluster pair, you'll see that I don't have any cluster pairs
Starting point is 00:27:06 created right now. So I'm just going to go ahead and create that. Now once we use this, Stalk also has a tool called StalkCurl to basically look at these custom resource definitions more clearly.
Starting point is 00:27:23 So if you do a kubectl cluster pair, you'll see that... You'll see that both the storage and the scheduler status are saying ready at this point. So once this is the case, we are basically ready to migrate resources from one cluster to the other. Now, I'm going to do this in two steps.
Starting point is 00:27:43 So in the first step, I'm basically going to say that I want to migrate all my... I want to migrate all... I want to migrate the MySQL namespace to the remote cluster, and I want to include the resources, but I don't want to start the applications. So what this is going to do is it's going to first migrate
Starting point is 00:28:03 all the volumes associated with the applications in MySQL. Then it's going to migrate all the resources. But the deployment that we have used, the replicas for the deployment will be set to zero on the remote cluster. So you'll see that no pods will actually spin up on
Starting point is 00:28:20 the other side. So I'm again gonna go ahead, do this. And if I do a get migration, so I'm just gonna make this into full screen, and you'll see that currently it's, since we only had one volume associated with MySQL, it's basically migrating the volumes right now. And this is gonna take, take a few minutes.
Starting point is 00:29:03 But on the other side, let me also run a watch. So what this is doing is basically watching for all the resources that are there in the remote cluster. And you'll see that as soon as this volume migrates, you'll see that it goes into the application stage, and then it's going to go into the final done stage. So I'm just going to watch on this. Yes, the volumes are migrated and it's basically migrating the resources and everything's done. And at that point, you see that all your applications, all the specs that were on the source cluster are now available in the destination cluster.
Starting point is 00:29:47 Now you'll see over here that the deployment is present, but the desired replicas for this is set to zero. And if you actually look at the deployment, you'll see that we'll end up... we ended up adding an annotation over here saying what the migration replicas were. So in this case, if your source cluster goes down, you can basically look at this and just increase the replica count to one and your application will be up. But what if you wanted to actually start them up again?
Starting point is 00:30:28 So now I'm just gonna use this, a similar, similar spec, and now I'm gonna say is, I wanna again do the replicate, I again wanna do the migration and I also wanna start up the applications this time. So one thing to note over here is all this, the migration is actually incremental. So the second time that you do a migration, it's not going to migrate all
Starting point is 00:30:53 your data. It's going to, since this works at a block level layer from Portworx, it's going to be able to figure out what has actually changed between the two migrations and migrate only the diffs. So the first time, if you have a large volume and a large amount of data, the migration might take time. But any subsequent migrations will just migrate the changes that you had. So now I'm just going to basically apply the second spec.
Starting point is 00:31:28 And I'm going to start a watch again over here. And you'll see that over here, the deployment is going to get updated, and as soon as the volumes and the resources are migrated, you're going to see that a new pod also spins up for your MySQL application. Question? Yeah. So is this migration accessible by a namespace
Starting point is 00:31:50 or Kubernetes? So all of this will work with RBAC, so you need to make sure that the admin actually gives permissions to the right people. So everything would be done at the Kubernetes RBAC level. Since this is
Starting point is 00:32:06 a CRD, you can basically specify which users have the capability to create the migrations as well as the cluster pairs. So you can actually, that can be very granular. So you can say some users have the ability to get cluster pairs, but they don't have abilities to create them. And then you can also restrict who has the ability to actually do the migrations. So that can be left up to a cluster administrator. So it doesn't have to be the user apps that actually do this. But they can always look at,
Starting point is 00:32:34 so you can always give them a get or a list ability so that they can actually see when the last migration was done. So here it just finished. As you can see that the MySQL pod actually just spun up and since we had said that we want to start the applications, the replica count was set to one at this point. So it's just going to basically create the container and spin up the application at this
Starting point is 00:32:55 point. All right. Sure. So when you do migration, you essentially are pressing on the source, replicating the data to the target. Yes. And then essentially it's up to the user to start the service. Yes, so that is the configuration value, right?
Starting point is 00:33:20 So you can select to either not start it up or start it up, depending. So in terms of best practices, how do you recommend somebody put this into a DR scenario? It sounds like you have to trigger this manually. Would you have a cron job or some migration thing running on a regular basis? So there's going to be a schedule to do the migration regularly too.
Starting point is 00:33:43 So you're going to be able to specify intervals. You're going to be able to specify at what point daily you want to take this, and weekly and monthly schedules. So it'll have a cadence of doing those migrations regularly. Is this managed by Kubernetes? No, so it won't be. The Kubernetes scheduler is just in charge of basically taking care of pods and scheduling pods. So there'll be another spec, basically, where you'll be able to specify a schedule.
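As a hedged sketch, a migration schedule along the lines described might be expressed as a pair of CRDs like the following. The kinds and fields shown are illustrative: this feature was described as upcoming at the time of the talk, so check the Stork repo for the real schema.

```yaml
# Hypothetical schedule policy: run the migration daily at a fixed time.
apiVersion: stork.libopenstorage.org/v1alpha1
kind: SchedulePolicy
metadata:
  name: nightly
policy:
  daily:
    time: "10:00PM"
---
# Hypothetical migration schedule referencing the policy above;
# the template is an ordinary Migration spec.
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: mysql-nightly-migration
  namespace: mysql
spec:
  schedulePolicyName: nightly
  template:
    spec:
      clusterPair: remote-cluster
      includeResources: true
      startApplications: false
      namespaces:
      - mysql
```

Because the migrations are incremental at the block layer, a frequent cadence keeps each run small and keeps the recovery point objective short.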
Starting point is 00:34:09 You'll be able to say when you want to do the migrations at what cadence. And at that point, we'll basically migrate the apps. The thing about scaling up apps on the other side, now that needs to be something that's taken care of at a higher service level because you need to figure out that your source or your primary cluster site is actually down. And then something needs to go in and basically increase the replicas over there. So that is not something that can be done
Starting point is 00:34:35 at this level, I think. So where I was coming from, the question was, how do I minimize my recovery point objective, right? So I have the shortest amount of time of potential loss. Yes. Yes, and what we're also applying to add is you'll be able to specify in your volume if 10% of the data changes trigger a migration. So that'll basically make sure that you're,
Starting point is 00:35:02 that, first of all, that the migration takes less time, and also you are up-to-date in terms of your data changes on the remote cluster, too. All right. I think I'm done with the presentation. Any more questions? Yes. Yes.
Starting point is 00:35:36 Yes, so basically there's a driver interface in Stork, and as long as you implement that interface, you'll be able to use Stork to do all of this stuff. It's all open source. So the entire project is open source on GitHub, yeah. So you can do everything. So including doing the extender
Starting point is 00:35:54 to basically prioritize your pods, doing the health monitoring, taking snapshots, as well as the cluster migration and the pairing, all of that, as long as you implement that driver interface in Stork, you'll be able to use this. Are there any other storage vendors' products tested or certified with Stork? Right now, there's only Portworx. Yeah. But it's an open source thing, so anybody from
Starting point is 00:36:22 the community is welcome to submit PRs to add support for their drivers, yeah. So, I mean, some of this stuff would be useful to get into CSI or Kubernetes eventually and have people start supporting it. Well, it depends. So, if you look at Kubernetes right now, if you try to add new features, there's a lot of push
Starting point is 00:36:48 to basically extend Kubernetes instead of putting stuff into the core right now. So it really depends on the Kubernetes community if they would want to add stuff like this. Yeah. Yeah. Yes. Yeah, and for CSI too, as long as we are able to get all this information through CSI APIs, we might be able to add a driver for CSI.
Starting point is 00:37:15 But at this point, I don't think there are enough APIs in CSI to implement all of the functionality that's required. All right, cool. So here's the link for GitHub in case you want to look at that. There's a link to the blog too. And if you want to try out Portworx Enterprise, feel free to send an email to info@portworx.com,
Starting point is 00:37:38 and I'll be hanging around after the talk if you want to chat too. Thanks.
