Grey Beards on Systems - 126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat
Episode Date: December 7, 2021. Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a Kubernetes storage provider. They have a high-performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need.
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it is my pleasure to introduce Alex Chircop, CEO of Ondat.
So Alex, why don't you tell us a little bit about yourself and what's going on at Ondat?
Hi Ray, hi Keith. It's great to be on the podcast.
So my name is Alex Chircop. I'm one of the founders and the CEO of Ondat, which you might previously have known as StorageOS.
And at Ondat, we're building a cloud-native storage solution for stateful applications in Kubernetes.
Really excited to be here.
Yeah, great, great.
So we've been talking to a lot of Kubernetes solutions these last couple of months and stuff like that.
What makes Ondat different from some of the other solutions we've talked with?
Right. So Ondat is different in three main ways.
The first is we're completely platform agnostic. So we're a software-only solution actually deployed as containers across any platform,
whether that's on-prem or in VMs or in cloud instances, and supporting all of the different Kubernetes distributions,
whether that's some of the more traditional ones like OpenShift or Rancher, for example,
or some of the newer ones like EKS and AKS and Anthos, for example,
which we're seeing a lot more pickup in.
The second point is we design the whole product around the developer and around enabling the
developer to run their applications in the easiest way possible.
So providing them with a frictionless experience to effectively run stateful applications,
just like they run stateless
applications. And then finally, we do all of that with the performance, scale, and reliability that you'd kind of expect for mission-critical systems, you know, and providing the ability to
scale over really large environments
and provide deterministic performance for those workloads,
like databases and message queues and those key functions
that provide the stateful capability to those applications.
Yeah, so I was looking at your website,
and there's like a page and a half of different applications
that you're connected to and stuff like that. I thought it's kind of interesting that you cover a lot of ground with that. So you mentioned reliability, availability, performance, those sorts of things. I saw something on your website about your performance. Can you explain how you're able to achieve high performance like that?
Sure. So there are a few things that contribute to that. But at a fundamental level, we're kind of engineered from the ground up to be
extremely low latency. And as you're aware, latency directly translates to things like transactions per second.
So the way we do this is through very low-level optimization throughout the entire IO queue chain,
the threading model, the network connectivity, et cetera,
to make this happen at a very efficient level.
We've also done a lot of work to use acceleration for encryption and hashing and a variety of other things to be exceptionally CPU optimized. We do a lot of work as well in terms of laying out data in the most optimal way to
deal with both throughput scenarios where applications like Kafka, for example, might be
focused very much on the number of megabytes per second you can push through, but also in terms of small block
random workloads. Exactly. And we also employ a number of optimizations as well
to make this happen in a fairly deterministic manner across a Kubernetes environment, which
by its very nature has a lot of moving
objects, right? So, you know, nodes come and go and clusters scale on demand. And we employ a lot
of automation to make sure that the workloads are kept as local to the applications as possible. And we effectively provide that confidence that if a job took five minutes yesterday,
it'll take five minutes today.
That's very unusual.
These days, right?
Right.
Now, help me out with some of the logical pieces here.
The team decided to build a platform that's kind of agnostic of the underlay.
But, you know, that's kind of an anti-pattern in storage.
You know, storage systems, you want to kind of touch, feel,
and understand the consistency of the underlay. How do you ensure, whether it's VM-based, et cetera, that, to your point earlier, that transaction that took five minutes yesterday is going to take five minutes today, regardless of, you know, kind of what's happening underneath?
Especially a problem in the cloud, quite frankly, right?
I'm sorry, go ahead.
No, no, no, it is.
And, you know, there's no special magic for the non-deterministic performance in the cloud, for sure.
But I think the key thing is to be able to make the best use of whatever the underlying technologies are, whether that's solid state or NVMe or things like that, which we're seeing a lot more of even in cloud environments, and which provide potentially hundreds of thousands of IOPS per device, even in those cloud instances. And then secondly, the other part, which is typically the other constraint, is the networking. To deal with that, we make sure that, for example, if there are failures or scaling events, we have a function called Delta Sync that intelligently resynchronizes only the changed data between nodes, because we often find that despite having very fast CPUs and very fast disks, often the network between nodes is the biggest bottleneck in these environments.
So is something like compression an option that can be turned on and off, or is it something that you do all the time for the data?
For network traffic, it's something that we just do all the time. We've just found that works best.
OK.
Is it placed on the storage media in a compressed format?
It is, right.
So in Ondat, every function, whether it's replication or encryption or compression, is just a matter of having a label on the volume in Kubernetes, right?
Exactly, right? So it's granular per volume, but obviously most people will deploy this as part of a storage class and effectively have a group of volumes with a particular class.
And the idea is that we'll encrypt data in transit and at rest.
We'll compress in transit and we'll also compress at rest based on those labels. And those are all things which are selectable, you know,
on a per volume or per class of application.
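For illustration, a class of volumes with those features switched on might look something like the following sketch. The storageos.com parameter names are indicative of Ondat's labeling style rather than an exact reference, so treat them as assumptions:

```yaml
# A sketch of a storage class with replication, encryption, and
# compression enabled via labels; parameter names are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ondat-secure
provisioner: csi.storageos.com        # Ondat's CSI driver name (assumed)
parameters:
  storageos.com/replicas: "2"         # keep two replicas of each volume
  storageos.com/encryption: "true"    # encrypt in transit and at rest
  storageos.com/compression: "true"   # compress in transit and at rest
allowVolumeExpansion: true
```

Any volume provisioned from this class inherits the whole feature set, which is the "group of volumes with a particular class" idea Alex describes.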
So, you know, the reason for all this is because stateful applications are coming online in the Kubernetes space.
What's driving the statefulness of containers? Obviously, they originally came out, they were all
stateless, ephemeral kinds of things that come and go as they need
and scale up and down as needs happen, but this statefulness
is a different world.
Right. Well, so
containers are effectively a different way of packaging an application.
And the container includes all the dependencies for an application.
So the application becomes self-contained and it means that the application is portable. And then with orchestrators like Kubernetes, the developers get this superpower
where effectively they can compose what their application requires in terms of, you know,
compute and memory, but also network connectivity and also storage connectivity. And so what we're seeing is this sort of the shift left, right, where developers are becoming responsible for defining all the different parts of the infrastructure.
And it started with things like testing and has moved on to security, networking, and storage too. And therefore, what we're really seeing is it's less about whether containers are stateless
or stateful.
It's about all applications need to store state somewhere.
And if you're using an orchestrator to automate this functionality, then why wouldn't you
also want to automate your stateful parts of the of the application because
at some level they all need state.
I always thought this, but nobody agreed with me.
No, absolutely. And the thing is, you know, when we talk about cloud, I have this sort of theory that cloud is not really about the place, right? Cloud is about the on-demand consumption model
and the automation and the self-service.
And effectively, that's what the developers
are getting with Kubernetes, right?
They're getting that capability
of specifying what they need,
whether it's in their dev environment
or their pre-prod or production environment.
And then Kubernetes kind of makes it so for them. And therefore, what they really also need is,
you know, the ability to have the same data services available in all those different
environments wherever Kubernetes is running. And maintaining, you know, the scale and availability and performance and dependability, which they
kind of just depend on from the infrastructure.
So I was at KubeCon a few weeks ago and we talked about governance an awful lot.
I think folks like Kelsey Hightower from Google will still argue that stateful apps on Kubernetes is not an appropriate approach. But we're not here to talk about whether or not it's appropriate, but to support what people are doing.
And I tend to agree with them. Kubernetes is probably not the platform to build stateful apps in. That wasn't the design pattern.
But I want to get on this topic of developers
and using tools and solutions like Ondat.
And you mentioned security and networking
and Kubernetes in general being kind of all encompassing
of defining application environments.
What I'm seeing is a separation. You know, we're generically using the term developer to talk about the team that builds and operates and makes available the Kubernetes platform, or the platform built on Kubernetes, to provide services.
Are you seeing kind of this bifurcation of teams where you have developers focused on the platform and then developers focused on building the application and one serves the other?
Or are you seeing kind of this
nirvana of the mystical developer doing it all?
It does vary between organizations and sizes of organizations, right? So there is a spectrum from pure application developers to, you know, DevOps teams to sort of platform managers, and everywhere in between. But in general, what we are seeing is this consistent shift left, where it's getting closer to the developer, and the processes around, for example, CI/CD are automating the changes in the environment between these different systems. So as an example, you might have the developer working on an environment on their laptop. They push changes to a Git repo, and a CI/CD process automatically pushes that into, say, a pre-prod environment. But the pre-prod environment is, say, in the cloud or even on-prem, and you have automated policies that effectively take the same standard definition of what the application needs and apply it correctly to the different environments.
So, for example, you might find that the developer working locally on their laptop might not need to worry about things like replication. And then high availability, say replication across availability zones, gets applied automatically as part of a policy when the application is migrated into those larger environments, as sketched below.
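As a rough sketch of that policy difference, and assuming illustrative parameter names, the same application manifest could bind to a differently configured storage class in each environment:

```yaml
# Dev cluster: a class with no replication for disposable local volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: app-data
provisioner: csi.storageos.com   # assumed Ondat CSI driver name
parameters:
  storageos.com/replicas: "0"    # no replicas on a laptop
---
# Prod cluster: the same class name, but replicated for high availability.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: app-data
provisioner: csi.storageos.com
parameters:
  storageos.com/replicas: "2"    # copies kept in separate failure domains
```

Because claims reference the class by name, the application manifest itself doesn't change between environments; only the policy behind the class does.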
And these things actually can happen automatically now, right?
And this is where you get the governance that you mentioned, right?
The developer can specify
and the DevOps teams specify
what their application needs.
But a lot of those policies can be implemented
in a centralized model, right?
Where they can apply, you know,
security requirements like encryption
or they can require,
they can define availability requirements
like replication.
And because of this, I think Kubernetes is basically able to cater for just about every
type of workload, right?
Because the incredible thing of giving developers the ability to specify and to compose their environment means that effectively they can build anything as a service. So now, if they have an operator for a database and they need a second database or 10 databases, they can just spin those up on demand as needed, and then tear them down, by the way. And that's another part of cloud-native technology, where systems like Ondat allow the creation and deletion of these volumes and the automation for these environments. And I think that's what the superpower is.
And it's true that in the beginning, maybe Kubernetes wasn't thought of for stateful workloads. But I think that's gone; that kind of concept really has evolved, because you wouldn't want to have two sets of operational processes, two sets of CI/CD, two sets of GitOps, two sets of data management systems, just to be able to manage...
One for the state, one for the app.
Right, exactly.
So let's talk about the shift left.
I'm really curious about it because this is where I'm finding much of the complexity.
So on that highest complexity, the stateful app: stateful is not less complex in Kubernetes than it is pre-Kubernetes, in monolithic application design architectures, et cetera. So Ondat is hiding and automating that complexity. What happens when it breaks and there's a need to kind of peel back the layers of complexity? Where are you seeing teams be successful and teams struggle, especially when it comes to something like Ondat and something as complex as enterprise storage?
Right. So let's break that down. The first part of that equation is how do we simplify
things? And the way we do this is we're deployed as a container on the different nodes. We
virtualize the storage that's available on each of those nodes, and it can be physical, virtual, or cloud disk.
And then we effectively establish a pool of storage with a data mesh that makes sure that volumes are available instantly on any nodes within the platform.
And then the developer can use dynamic provisioning to effectively create and define what they need out of the storage.
So they might say, for example, look, I want a data volume, and I'm going to give it a name called database one.
And it's going to be 100 gig, and it needs to be encrypted, and I need three copies of it.
And on that, we'll just make that happen under the covers.
We'll automatically create that data set and we'll set up the replication and the data mesh.
And it's completely transparent to the developer.
The developer can then say, I want to run my database container and I want it to connect to that data volume.
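As a sketch of that request in Kubernetes terms (the storageos.com labels are illustrative, and "three copies" is read here as one primary plus two replicas):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-one
  labels:
    storageos.com/replicas: "2"        # two replicas in addition to the primary
    storageos.com/encryption: "true"   # encrypted volume
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  containers:
    - name: db
      image: postgres:14               # any database image; Postgres as an example
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-one
```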
And Kubernetes will just run the application and connect the data volume into the namespace. So in that sense, it is extremely simple.
When things go wrong, if there is a node failure or component failures, or even complex scenarios where you have cluster partitioning events and things like that, then Ondat will automatically make sure
that the data continues to be available
and we will automatically make sure
that the desired state matches the actual state.
So if primary volumes or replica volumes
get lost or deleted,
we will recreate them and re-replicate them.
So when a cluster is partitioned because of some error in networking, I guess, or something like that, and you've got these volumes that, let's say, have three replicas, and maybe two of them are sitting on one part of the cluster and the other one is sitting on the other part of the cluster. So you will automatically start the process to make sure that you have three replicas available?
No, no. In Ondat, we have this concept of disaggregated consensus, where effectively every volume has a
mini brain and is able to make placement decisions and failover decisions independently of other
volumes within the cluster. So whichever node holds the primary volume effectively holds the lock, if you wish, for that primary copy of the data that controls that failover process.
And that's a strongly consistent process across the cluster, using Raft protocols and things like that to ensure that strong consistency. So whichever side of the cluster owns the primary, or owns the majority of the data, will automatically perform the recovery process.
The side that's effectively isolated will actually be physically disconnected,
even protecting against some of these nightmare scenarios
where you get split-brain clusters and things like that.
So effectively, every transaction and every IO has a transaction ID, and once a node has been isolated, any transaction IDs below that certain number are automatically discarded. So even if the node does reconnect to the network, it wouldn't cause any problems, and it would sort of self-heal.
So I guess my question is not necessarily self-healing,
but more of returning back to a known good state. You guys are doing a lot of neat stuff to make sure that data isn't lost, that there's the correct number of replicas, that you're adjusting for the complexity of the underlay. But what happens when the underlay itself is the root cause of the issue? Let's say this is on physical hardware and there were firmware updates the day before, which the developer either is or isn't involved in, but they are the pseudo-administrators of Ondat. And that is causing a performance issue. With the shift left, obviously developers only have purview to what they have purview to. How do Ondat and a support model help a developer who's not necessarily a hardware expert, or even a cloud infrastructure expert, get back to a known good state?
Yeah, that's a good question.
So when there are underlying issues, which could be performance, could be network related, could be intermittent issues or whatever, what we're looking to do is provide the telemetry and the monitoring to kind of give the developers and the DevOps teams the ability to diagnose those sorts of issues. And in fact, one of the things that we're building to continue to help developers is not only a frictionless way of deploying and licensing and activating the software, but also the visibility and the big-picture view of the cluster, so that they can diagnose performance bottlenecks, or lack of capacity, or the sorts of issues that might come up when doing an upgrade or scaling of a cluster, for example. So the idea is today we plug into the
telemetry and the monitoring capabilities and the observability capabilities of Kubernetes itself.
And as we develop our SaaS platform,
we give end users the ability to view this
in a more holistic manner
and be able to see the big picture
across not only just one cluster,
but across all of their clusters.
Because the other complexity typically is that we're seeing organizations deploy not only large clusters,
but a larger number of smaller clusters too.
So it's how clusters interact with each other that is also part of the equation there.
You talked a little bit about encryption and compression and those sorts of things.
Tell me a little bit more about the storage facilities that are available.
Do you offer synchronous replication?
Do you offer snapshotting?
I mean, what sort of data protection is built into the system?
You mentioned that you can turn up replicas from one to three, I guess.
And those are mirrors within the system.
I'm thinking, do you have RAID protection kinds of things?
Right.
So what we do is we set up replicas between nodes.
And we actually use synchronous replication to ensure data integrity and strong consistency across the different nodes. And we're topology-aware, so we can use availability zones to ensure that, say, replicas are in different data centers or different racks or different groups of servers, depending on the topology of your environment.
Would that be one Kubernetes cluster that spans multiple availability zones, or would that be different clusters?
No, that can actually be one cluster.
We have a number of customers today who are deploying across three or four availability zones, for example, and using Ondat as a way of actually replicating data across those availability zones in a transparent way.
So their application can sit on a node in any of the availability zones and transparently access
the data.
And if a node goes down or indeed the whole availability zone goes down, the application
can restart somewhere else and continue to access the data.
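For a sense of how that looks in practice: nodes in each zone carry the standard Kubernetes topology label, and a volume can ask for its copies to be spread across those zones. The topology-aware label below is an assumption about Ondat's naming:

```yaml
# Nodes advertise their zone with the standard well-known label, e.g.
#   topology.kubernetes.io/zone: eu-west-1a
# A volume can then request that its replicas land in different zones:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db
  labels:
    storageos.com/replicas: "2"            # copies to spread around
    storageos.com/topology-aware: "true"   # assumed label for zone-aware placement
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```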
So the data protection is predominantly based on replication or mirroring within the cluster.
It is.
And the reason for that was sort of architectural, but also goes back to the deterministic performance.
You know, there are a number of different pros and cons for different mechanisms, whether it's parity or erasure coding or replicas, et cetera. And we specifically chose replicas because that allows us to maintain the lowest latency for these environments.
Yeah, sure, sure. What about snapshots or instantaneous copies and that sort of thing?
Do you guys support that?
So we support different backup mechanisms
to allow for the protection of the data.
The snapshots themselves are on the roadmap
and those are coming very, very soon.
Okay.
What about replication across clusters?
So again, those are roadmap items where we're looking at different ways
of both consuming data across clusters
and replicating data across clusters and providing this multi-cluster
capability.
In general, we're finding that some of the federation capabilities in Kubernetes are still sort of being developed and relatively immature.
And we're working with that sort of timeline.
Alex, I'm really interested to know: what capabilities are we kind of missing from just the change in approach? When things shift left and you have a fresh set of developer eyes looking at something like enterprise storage and leveraging this to build applications, what motions or patterns or features are developing, what's happening in cloud native, that us traditional monolithic-application, stateful folks from the kind of on-premises world should probably start to take note of?
So...
Besides the automation.
Well, so the automation is key, but I think there are also two other things that are worth mentioning. The first is that in cloud-native environments the entry bar has been raised. You know, say 10, 15 years ago, we used to be talking about storage arrays with maybe tens of thousands of IOPS. And then, with flash arrays, we were talking about maybe hundreds of thousands of IOPS. But nowadays, with nodes having NVMe drives and local SSDs, we're talking about hundreds of thousands of IOPS per node.
And so what's more important
is to have a ubiquitous set of services,
whether that's replication or access
or encryption, for example, and compression
across all of those different environments.
So I think that's the first step.
The bar has gone up, and the environment is therefore inherently a little bit more forgiving. The second aspect to all of this is that not just the automation, but the ability to compose things, means that developers are tending to break things up into smaller chunks, and DevOps teams too.
And this translates to databases and message queues. In older environments, and maybe even now, I guess, we see database servers, which are big bare metal boxes, perhaps connected to a SAN.
And they'll run a really large database instance and potentially tens or dozens, sometimes even hundreds of little databases within a huge database instance.
But what we're seeing in the cloud native world is that they're actually breaking up those
databases into smaller instances. So you're moving from, for example, one big monolithic database to
10 smaller database instances, which give you greater flexibility because
they're not all combined on a single server requiring lots of cores and lots of memory
and therefore a lot more flexibility.
And it also means that as developers add applications or projects, they can fire up smaller instances
on demand. So what we're seeing is this translation to, you know, some of the concepts of microservices and the breakup of these monolithic applications also apply to databases.
They apply to message queues.
They apply to, you know, streaming systems like Kafka, for example. And when I say databases, it's everything from the more traditional
SQL databases to distributed databases too. It just applies to all of those things. So
breaking up of those systems in this composable environment makes it more flexible and less
error prone too.
Yeah. That's really interesting because so much of limitations or design limitations
is based on the control plane or infrastructure.
You look at why public cloud hasn't taken hold in private data centers.
It's because it's very hard to chop up the control plane small enough to get it into the private data center
to make it make sense financially and from a footprint perspective. So as developers are able to take a message queue and a database service and all these big, heavy services that were centralized, they can begin to decentralize and abstract them in a way that allows them to put smaller bits and pieces of the application geographically dispersed.
You get better performance, you get better response, and I hope in the future,
just better applications that serve needs that we previously had barriers to because of this centralized control plane issue.
No, that's absolutely true. And it's part of the journey. If you think of an application
being put in a container, which is now portable, now what we're saying is it's not just the
application, but it's everything that the application needs to think of and to talk to.
So it's the components that the application needs to process data. It's the VIPs and the service endpoints on the network that the application needs to communicate with or expose to the external world.
And all of that is configured in one way with Kubernetes, and that means that the whole application with its dependencies now becomes self-contained. And that gives developers a whole load of additional benefits. One, you're kind of limiting the failure domain and the blast radius when things go wrong; you're not taking out, say, 10 applications because your big multi-core bare metal box has gone down.
But also, secondly, it gives them a lot more flexibility with simple things like versions and patching levels and all of those sorts of things. I mean, we've all lived through the trauma of one application requiring a particular version or patch level of a particular database because of certification or whatever else.
And then other things which are using the same database server all break.
And now you don't have that anymore because each application can use it at their own level.
The other thing I was going to mention was scalability. The ability that Kubernetes brings to the table to scale up and down applications
is almost unfathomable in a normal non-container world, I would say. It's just not doable.
No, that's absolutely true. And it also enables environments where some nodes in the cluster maybe have local NVMe disks and are providing storage for the cluster, and other nodes are focused on compute and just consuming that storage in a transparent way. And therefore, those nodes can then scale up and scale down on demand based on the actual application.
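One hedged sketch of that storage/compute split: marking a node as a compute-only consumer of the pool. The computeonly label name is an assumption for illustration; in practice it would be applied with kubectl label:

```yaml
# A node that consumes volumes from the pool but contributes no local disk.
apiVersion: v1
kind: Node
metadata:
  name: worker-7
  labels:
    storageos.com/computeonly: "true"   # assumed Ondat label name
```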
So a quick question on the opposite end of the spectrum: Kubernetes isn't the answer for everything. So is Ondat specifically laser focused on Kubernetes and the Kubernetes environment? Or are there use cases that stretch beyond just using Ondat for Kubernetes?
Well, no, we're very focused on Kubernetes, and the way we've built our software is to run in a container with an exceptionally low overhead. You know, we've got customers running in tiny cloud instances with a couple of cores and four gigs of RAM, all the way to bare metal boxes with tens of cores and hundreds of gigs of RAM, and everywhere in between. So you're right. I think Kubernetes is a tool.
I kind of think nowadays of Kubernetes like the Linux kernel,
and there are a number of distributions and services
that are built on top of Kubernetes.
And more and more we'll begin to see this sort of functionality emerge
where Kubernetes will be kind of like an infrastructure abstraction layer
going all the way from edge to centralized data centers and cloud and a bit of things in between.
So what's the size of a typical Ondat deployment in, let's say, usable capacity? There is such a thing, right?
Yeah, it's hard to put specifics on that because it does vary quite a lot. Typically, many clusters are a few hundred nodes in size, but we've got some clusters which are a couple of thousand. We also see anything from a few tens of terabytes to a few hundreds of terabytes in these sorts of environments. Typically, the workloads that we're protecting or that we're
providing data services for is things like,
you know, databases and message queues and those,
you know, things like Elasticsearch and Kafka, et cetera.
So these are stateful services that are transaction oriented or that are actually running the system, and then other services might be providing object stores or things like that as a backup or an archive mechanism.
So how is this priced? Is it priced on a per-capacity basis? A per-data-storage-node basis?
It's very simple. It's effectively priced per node in a cluster.
Yeah.
So we have some differentiation for volume
and for bare metal versus VMs or cloud instances.
But it's really a simple node pricing.
And shortly with the SaaS platform, we'll have on-demand pricing as well.
So it's a more traditional cloud pricing model where it will be per hour.
Yeah.
So this is block storage in a Kubernetes environment.
It doesn't include file storage.
Is that true?
No, so there are two types of volumes.
In Kubernetes, they're called read-write-once and read-write-many.
And basically, read-write-once are effectively block volumes
with a file system that's mounted within the namespace of that application.
And read-write-many volumes are a shared file system.
So those are file systems which are shared across multiple nodes and multiple applications.
So think of it as the equivalent of NFS, for example.
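In plain Kubernetes terms, the two modes are just different access modes on the claim; a minimal side-by-side sketch:

```yaml
# Read-write-once: a block-style volume mounted by one pod at a time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
# Read-write-many: an NFS-like shared file system,
# mountable by many pods across many nodes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 20Gi
```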
And you're doing read-write-once. Is that my understanding? Correct?
No, no. We're doing both.
Oh, so you offer both.
Yeah.
So, you know, typically databases will be using read-write-once, so those block volumes. But there are many applications that have sort of packed…
Yeah, exactly. And also, there are lots of applications, even data transformation type systems, where one system will write data to a volume which is read by another system, and that kind of thing.
So you can have those containers
running on different nodes
sharing the same file system.
I have to ask the one question. Does it support Tanzu? Or how does that work in this environment?
How does ONDAT work in a VMware, Tanzu, Kubernetes grid kind of thing?
It just works natively. I mean, Tanzu is just Kubernetes on VMware, right?
So it's no different conceptually to say OpenShift or Rancher
or Anthos or EKS, or even some of the more
complex environments which we're seeing nowadays
where customers are deploying EKS anywhere on-prem
or Google Anthos in AWS, and all the different combinations that we see nowadays.
So you've tested it, or it's your belief that it runs fine?
I have to ask the question. I'm sorry.
No, we believe it runs fine. We've done some basic testing on the community edition so far.
Yeah, I think it would be more of a detriment to VMware if it didn't run fine.
This is kind of, I think this is one of those things that as traditional infrastructure folks,
we have to start to wrap our heads around that a Kubernetes-compliant distribution is a Kubernetes-compliant distribution. It's not quite the same as Linux and Linux kernels where, you know, it's like, is Red Hat supported versus Ubuntu versus whatever.
If it's a compliant distribution, it's a compliant distribution type of deal.
It is, that's true, but what we're seeing nowadays is a lot more opinionated distributions, right? So, as I mentioned, Kubernetes is almost like that kernel, but then there's all the stuff around it, right? So, for example, you'll get Google Anthos, which includes Kubernetes, of course, but it also includes Knative for serverless, and it includes Istio for network service meshes, et cetera. And different distributions will have different ways of doing that, and that extends to security and a variety of other things too. So not all distributions are identical, and the opinionation is actually good. I think customers want to have an opinionated distribution that actually does a lot of stuff out of the box, and not have to build everything from scratch. Right.
So I guess that does open the other question, which is, you know, as I look at the cloud-native kind of correlation of services between service meshes, et cetera: where do customers have to be careful about opinionated platforms when it comes to storage? Do you guys just plug into CNI and stay focused on the CNI plugin and the operators that you provide, or is there a reliance on, let's say, a certain service mesh for visibility?
Well, so Kubernetes provides an abstraction layer called CSI.
So similar to CNI for the network, CSI is the equivalent for storage.
And it allows Kubernetes to talk to the storage system.
But, you know, CSI doesn't guarantee that services are available. So, for example, you can use CSI to consume EBS volumes in Amazon, but that doesn't mean that a cluster that stretches multiple availability zones will be able to access EBS volumes across availability zones, because that's not an architectural possibility with Amazon. Similarly, you might come across other restrictions in terms of failover times, or encryption capabilities, or other services, right? And so CSI is effectively just a standard way for Kubernetes to access the storage system, but all storage systems are effectively very different, in much the same way that you could say iSCSI is a standard way of accessing a volume from a SAN array, but what the SAN array does with that volume is very different.
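To make that concrete, here is what the standard interface looks like from the Kubernetes side, using the AWS EBS CSI driver as one example; the class has the same shape whatever the backend does with it:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-ebs
provisioner: ebs.csi.aws.com             # the AWS EBS CSI driver
parameters:
  type: gp3                              # backend-specific parameter
volumeBindingMode: WaitForFirstConsumer  # EBS volumes are bound to a single zone
```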
Yeah.
All right.
Well, this has been great.
Keith, any last questions for Alex before we close?
No, it's been a great conversation.
All right. Alex, anything you'd like to say to our listening audience before we close?
Just one thing. We're growing incredibly rapidly and we're always actively recruiting talented Kubernetes and Golang engineers.
So if anybody is interested, please do come to our website
and let us know.
We'd love to hear from you.
Well, this has been great, Alex.
Thanks for being on our show today.
Thank you very much.
It's been great talking to you, Ray and Keith.
That's it for now.
Bye, Keith.
Bye, Ray.
And bye, Alex.
Bye-bye.
Until next time.
Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know. Thank you.