Grey Beards on Systems - 125: GreyBeards talk K8s storage with Tad Lebeck, US CTO for ionir

Episode Date: November 8, 2021

We had some technical difficulties with Matt getting on the podcast, so Ray had to fly solo. This month we continue our investigations into K8s storage with a discussion with Tad Lebeck (@TadLebeck), US CTO, ionir, a software defined storage system that only runs under K8s. ionir's Kubernetes Data Services platform is an outgrowth of …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here. Welcome to another sponsored episode of the Greybeards on Storage podcast, a show where we get Greybeards bloggers together with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. Now it is my pleasure to introduce Tad Lebeck, US CTO of ionir. So, Tad, why don't you tell us a little bit about yourself and what's going on at ionir? Yeah. So, nice to meet you, Ray. Yeah, ionir is, you know, one of the companies solving that persistent data problem in Kubernetes, you know, for stateful applications.
Starting point is 00:00:51 But we take a very different approach to it because, you know, the Kubernetes value prop is really around being able to run your applications anywhere that makes sense and moving them pretty easily. But stateful applications really kind of drag that down. You get anchored to where your data is. You know, a lot of people call that data gravity. What we've done is a unique thing where we're able to move data across space and time. And it sounds really kind of great, doesn't it? It's a cute trick if you can do it. I'll say that much. Yeah. Well, we do kind of bend the laws of physics with this too, because across time is pretty straightforward. We've seen this with other products in the past where you have continuous
Starting point is 00:01:28 data protection. Because our architecture, the way the storage system works is that we're able to dial back time to one second increments. So, you know, for instance, if you're running a database and you get dropped a table space by accident, shame on you. But instead of having to go back to the most recent snapshot, you can simply say, you know what, I'm just going to dial it back to the second before I did that and boom, you're back up and running again. And that's a really clever thing. So you don't have to think about doing snapshots. It's just automatically done as you're operating your data. Oh, that's nice. There's always limitations on how much of that you can do and stuff like that and the change rate and things of that nature.
Starting point is 00:02:06 But I guess we'll get into some of that stuff later. Yeah. And then the other thing where it really gets kind of crazy is moving across space. And, you know, we have this great demo where we take a Mongo database running in New York and we teleport it over to Dublin, Ireland. And Mongo has a way to, say, replicate that data over there and then bring it up so you can actually have remote access over there. And so what we do in the demo is we start the Mongo process for that, and then we start the Ioneer way of doing that. And we're able to get that data up and accessible within 40 seconds. It takes Mongo a little bit longer to kind of get itself initialized. But we run queries on the New York side.
Starting point is 00:02:49 And then when it's up and available on the Dublin side, we do the same queries. You can see the data's there. And we're able to get that up and going an order of magnitude faster than the Mongo. Ah, an order of magnitude faster. That's very impressive. Are you replicating before, or is this something that's done the instant you say, go ahead and do this?
Starting point is 00:03:11 Well, so it has to do with how the architecture works. And so we don't, you know, the traditional data addressing for storage is, you know, you have a volume and a volume offset, and you kind of, you know, in your persistent volume, that's how you access it. We present that through the container storage interface, but underneath the covers, we actually have a way of naming the data. We, you know, we create a unique name for it. And when we teleport that across, we send over that library of names so that the remote site can start accessing it. Things that are available in a globally deduped data store are available immediately. Things that aren't, they can reach across and pull them over. So you can start accessing the data.
Starting point is 00:03:54 You have longer latencies to start with. All writes are local. So it's up and running as soon as you're able to get that library or that directory. The metadata. Yeah. Defining the data sitting on the persistent volumes. Exactly. And so that allows us to really create as many copies as you want without any
Starting point is 00:04:12 replication necessary because, you know, the data is just referenced. And if it's in the same cluster, there's no actual, there's no copy on write per se. It's just simply we write new data. The way our data store works is it's immutable. So every write is, you know, item potent. We store that. And so another write will generate a new hash and create a new name for it. And they'll look up for these things from that, you know, from the container storage interface, includes the volume and the offset as well as the timestamp.
Starting point is 00:04:46 And so using that, you look up in the metadata server the actual unique name for it. Anybody who hashes to that same value will get the same reference to that. So that allows us to really kind of automatically dedupe things because there's no need to rewrite it again. And then as you do writes, if it exists already, then you don no need to rewrite it again. And then, you know, as you do writes, if it exists already, then you don't need to even write it. Typically, you know, deduplication occurs on some sort of a boundary. It can be at the file level. It can be at a block level. It's unclear to me, is Ionir a block device or a file device? So, you know, in the Kubernetes world, persistent volumes are block devices. That's what we present. And then, you know, Kubernetes lays a file
Starting point is 00:05:28 system on top of that. So we present a block device into the Kubernetes storage. And it's read-writable by multiple containers, or is it only a single container read-write? So the usage model in Kubernetes is there's not multiple reader-w they the notion is you have your data and you operate on it um and so because of the way our our dedupe works you can have multiple readers off the same data but they have their own view so when they do their writes they write to their own namespace yeah and that namespace is a um artifact of ionir it's not the kubernetes namespace well Yeah. And that namespace is a artifact of Ioneer. It's not the Kubernetes
Starting point is 00:06:07 namespace. Well, so it's, you know, when I see the namespace, it's really that addressable space that's presented for persistent volume claim. And so it's a Kubernetes notion. I gotcha. We're built for and in Kubernetes, you know, we're a microservices architecture. That's just what we do. We don't try to do this outside of Kubernetes. We're really built for and in Kubernetes. We're a microservices architecture. That's just what we do. We don't try to do this outside of Kubernetes. We're really built for and with Kubernetes. Right, right. So the containers, let's say, operating in a cluster,
Starting point is 00:06:35 when they're accessing the persistent volume, does the persistent volume have to be on the same worker node as the container is executing? No. It just has to be in the cluster someplace. It has to be in the cluster. Or when we do teleport outside of clusters, it's a special use case because you have to be able to.
Starting point is 00:06:52 But inside the cluster, there's the notion of those nodes that contain data, and there can be other worker nodes who are accessing data. And the worker nodes that contain data can also actually run containers as well. And they're running, obviously, the storage container service, whatever that turns out to be. That's right. So we run in Kubernetes as these microservices. And if Kubernetes decides to run other containers in that same worker node, that's up to them. And the right, so the item potency of the data means that, you know, if I do a right, it's forever. So I can go back to the first right to a block as well as the last right to, you know, the
Starting point is 00:07:40 block and all that's being maintained in Ioneer storage and metadata? Yeah, exactly. And now with caveats, right? Because as you mentioned before, if you run this long enough, you'll just consume all time and space. So we do have the ability to say, you know, look, preserve this for some time period. And you can dial that between, you know, hours, minutes, days. So the continuous data protection, you can apply and say for this persistent volume, save it for the last two days or something like that. How long to retain that? So you can,
Starting point is 00:08:14 you know, what window do you want to dial back into? And then we also have the ability to name certain points in there so we can bookmark places. And then those are preserved until you actually manually get rid of them. So like if a ransomware attack occurred, you could go say, okay, the second before a ransomware attack, the good volume, something like that. Absolutely. That's exactly one of the use cases. So, so Ioneer is, is, is the reboot of a company that was called Reduxio in the past. And so they were doing tin wrap software.
Starting point is 00:08:47 And, you know, as the market moved away from enterprise appliances, they pivoted into the Kubernetes space. And so that ability, they actually have great customer experience with, I think it was a police department on the East Coast where they actually had a ransomware attack and they were able to just dial back time and recover without any input. So a couple of things. Containers and persistent volumes, do they have affinity to, can they establish affinity to a node and things like that? So what we do in this environment is we actually utilize the ephemeral disks on the nodes, and we pool that together as a storage and present that as logical volumes in the environment.
Starting point is 00:09:32 So the nodes that have ephemeral drives that you've allocated as data storage nodes will present that. And so if Kubernetes wants to do hints towards people running closer to the data, they can do that. But that's not something that we would actually manage. And as far as data protection for the data, are you using like RAID structures or erasure protection or replication? The current version is using a three-way mirror we sort of reach a form um and we have on the roadmap to do erasure encoding um as we scale that out because you know these clusters can have more than three nodes as a matter of fact many of the clusters have a lot of nodes and we only require three-way mirroring across that so as we look at doing erasure encoding
Starting point is 00:10:24 you want to do larger writes there. So you have to be a little clever about taking small writes and grouping so you get effective stripe writing across the erasure encoding volumes. And the data is effectively, I would call it log structured on the ephemeral disks across all the solutions or is it? No, we don't. So we're not, you know, it's not log structured. So that's, we're a little bit different on that
Starting point is 00:10:51 in the sense that the way that we do it is, you know, we were able to do this three-way mirroring for Quorum and making sure that we have availability across that. But the metadata server is really the value on that. So there's no log replay or anything like that. We simply, you know, you query based on the timestamp you want and you get back the blocks that you want for that. Then you navigate your file system accordingly. So there's no log replay or anything like that.
Starting point is 00:11:21 Right, right. So the metadata server becomes a crucial data integrity aspect of the solution. How is that protected? Yeah, so the metadata server actually uses its own three-way mirroring, and it has another, distinct from the user data, way of storing that data because you don't want to have to rely on the same thing. So the metadata stripper actually does that,
Starting point is 00:11:47 and then we protect that independently. And so you mentioned, I don't know if it's data migration or data replication across clusters. And so in that scenario, you would replicate the metadata server for that named volume or for all the data sitting on the cluster? Only for the volumes that you want to move. And so, yeah. So we don't need to drag the whole thing over. We just simply say, hey, this is a volume. You want to bring that over here?
Starting point is 00:12:17 And then so at the target cluster, when accessing that named volume, if the data is not there, it goes out to the originating cluster, I guess, to get the data. But are you moving the data in the background so that in fact the data would be replicated in both places or how do you control that? So we just do the heat maps, bring things over.
Starting point is 00:12:41 We don't bring everything over. And then because of this global dedupe store in each cluster, the other cluster is going to have its own dedupe. And any data that hashes to the same name, there's no need to copy it over. So we don't start copying everything over because some of it may exist there already from other volumes. And so we simply do it on reference. So you kind of merge the data between clusters on a heat map basis. And so, you know, let's say I've got this volume name ray
Starting point is 00:13:18 on my cluster at home, my crypto cluster, and I want to replicate it to the TAD cluster sitting in, I don't know, East Coast someplace. So because the data is, you know, we do a secure hash on that. We have a name on it. If you have data that hash to that same value, it's the same data. So we don't have to copy that over that exists in your system already it just happens to have the same you know unique global unique
Starting point is 00:13:51 name for it and so for those they just exist already because we name the data based on the contents and if you happen to have data the same content i you know a string of zeros for instance we don't need to copy that over, right? It exists already in there. So we're not really merging the data. We're only simply saying, look, if you've already got data that exists with that same global unique name, we don't need to copy. Okay.
Starting point is 00:14:17 But, you know, so I got this volume on my crypto server called ETH1 or Chia12 or something like that. And I want to replicate it over to the TAD cluster on the East Coast. So the metadata gets merged, right? I mean, so this metadata that's associated with Chia 12 is a list of hashes, let's say, for the data blocks, right? Isn't that how this works? Yeah, you've got the right model.
Starting point is 00:14:43 And then that metadata is going to be copied over to the TAD cluster. And now Chia 12 as a volume sort of simulated but exists on the TAD cluster. Now as you access the data, are you generating the heat map at that point or do you already have a heat map from the ray cluster that you're copying over? We have the heat map from the access patterns from the cluster that was crossed. And you'll copy that data over time so it makes sure it's available at the TAD cluster. The data we need to copy, exactly.
Starting point is 00:15:19 So it allows you to teleport things across like that. Now those are unique cases. Probably the more germane use case that we should probably zero in more on is around, you know, think of your normal CICD pipeline. You've got a team of, you know, engineers working on this, and you've got a database that you've cleaned for the development team to use. They typically, you know, have to go through and make a copy of that data as they go through their DevOps pipeline.
Starting point is 00:15:51 So as they copy that, that takes time. And then they move it through the different integration tests, deploy to dev, deploy to QA. And as they go through that, they're copying data across the pipelines. With us, they simply just get another name for it and they start using it. And when they do hit an error, instead of having to create a new copy of it, they can just simply say reset it back to the time. So I have consistent data to test against again.
Starting point is 00:16:17 So, you know, if you look at that, that pipeline goes from hours down to minutes. Because you're not copying data anymore. And more importantly, you can now have parallel streams going because you don't pay a price for that extra data because mostly it's reading and what they do right is what they get separately. But now you can actually have multiple streams of developers working off that same data set without interfering with each other, and then resetting and moving along. So you really eliminate the data bottleneck, and you really let the pipeline go much faster. And that's a use case that's more typical, right? This notion of teleporting across is unique, and it's useful for some cases, but that's not the main use cases really around this DevOps pipeline acceleration. And that's because you main use case is really around this DevOps pipeline, you know, acceleration.
Starting point is 00:17:06 And that's because you're not really copying data to go from dev to test to QA or something like that. You're just copying metadata, I guess. In those cases, it's all in the same cluster. So you simply say, here, give me, you know, I want my own copy of that. You get the name of it and then you access the same things. But now when you do writes, you create your own write patterns. So why do you think that stateful containers are becoming more important in the Kubernetes world? I mean, it's always been stateless,
Starting point is 00:17:34 plus the databases have been outside the Kubernetes environment, but something's happened in the last couple of years to make stateful containers be more interesting. It's always been there, but I think that, you know, the initial push was let's get stateful containers be more interesting? It's always been there, but I think that, you know, the initial push was let's get stateful, let's get stateless, right? Let's just go do those. And you'd look at the big financial institutions, they've got, you know, 20,000 programmers, they've got all these applications that developed using various databases. They don't want to rewrite those to target them to a database service in the cloud environment. They want to bring them along because the guys who wrote those things have moved on,
Starting point is 00:18:11 but they want to bring it into this environment. And so what you're seeing now is as people bring stuff over. It's a lift and shift kind of thing, right? It's more the refactoring, right? They kind of bring it over. The database comes over as a service, but then they refactor the application to a microservices, but they don't need to have to retarget it. And so there's a lot of, you know, that's what's pulling things now, whereas before it was like, you know, the first wave was we're going to develop it in the cloud for the cloud,
Starting point is 00:18:40 and we're going to use all the services there. But as you start to move towards people thinking more seriously about multi-cloud, that's harder to do because to retarget to another cloud, then you have to retarget to the other database services. If you bring your data with you, you don't have to do that. So what's a typical installation look like for Ioneer these days from a data size perspective? And I'm not, I guess not the replicated data, but the actual data.
Starting point is 00:19:08 Yeah, the answer is always the one you hate, which is it depends. Because, you know, the example, one of our customers is using Citus, you know, the Postgres database. And for them, it's a fairly large database. And what they love about us is because we have that availability, when Citus would go down before,
Starting point is 00:19:33 they'd have to wait for it to rebuild everything. And with us, because of the availability and the mirroring, we're able to have them back up in minutes instead of hours. And so the size, I don't know the exact size. I mean, it's a fairly substantial thing because they're collecting all the security information. And then, you know, you look at the DevOps cases. In those cases, it's really hard
Starting point is 00:19:55 because for some of them it's production data and those can get fairly big. But most databases in that space don't get huge, right? I mean, you don't see multi-terabyte, you know, they're usually, you know, smaller databases that are built around for the application-specific use case. So you're not seeing the SAP oracles
Starting point is 00:20:18 moving into this space. Yet. Well, we haven't seen it yet. I don't know. That's a different discussion. You mentioned high availability. How would you characterize the Ioneer solution? Is it a highly available storage environment or not?
Starting point is 00:20:35 Yeah. So if you think about in the Kubernetes cluster environment, because you've got this data mirrored across a you know, a cluster of machines as, as nodes fail, especially in the public cloud, as they go away, we can keep running because we can still have quorum. If you blow the whole thing up and you have to kind of rebuild it, but, and that's, that's a cluster failure as opposed to a node failure. Node failures are much more frequent. And in that case we provide, you know, it just keeps running and then we can, you know, we can keep moving along on that.
Starting point is 00:21:08 So it is a high availability solution targeted at, you know, persistent workloads in this environment. So the metadata becomes extremely important during high availability solutions. Are you backing up the metadata? Is there some sort of, I mean, obviously the triple redundancy helps, but are you replicating it someplace else? So in the on-prem solution, it's, you know, you back up your systems, how you back them up. And we provide you to do that in the cloud. We persisted off to, you know, the Elastic Box storage in AWS, for instance.
Starting point is 00:21:45 And then you can snapshot that if you want. We haven't integrated with the other backup products as of yet, but that's the thing to start looking at it. And then on our roadmap for future work is really this ability to tier across different storage types because our microservices architecture allows us to add new storage nodes in the background. So we can have, you know, the, the ephemeral drives,
Starting point is 00:22:11 we can have EBS volumes and we can even have S3 and then the ability to move data across those as we need to. That's roadmap. It's in the future. It's not in the product today. So today would you be using a SSD storage? I mean, does the storage have to be a, the same capacity across all the storage nodes? I mean, the storage disks. And B, does it have to be the same type? Could you mix SSDs and disk drives, for instance?
Starting point is 00:22:35 And I'm not sure what the EBS terms are for those. The use case that we've seen mostly people have stopped using hard drives except for really large volume storage. And everybody just kind of solves their I.O. with ssd now because the price points are so low so we haven't really tried to mix and match ssd and hdd because that would just be the latencies are so far off so we really you know on the ephemeral drives we work with ssds typically with nvme if it's available because that's the best performance we can get out of it. And then we're working on a blended thing where we'll use that with a GP2 or whatever the storage you want to use on the EBS farms in the backside. So about how many customers, I mean, I noticed you have a free download, free online thing, but I didn't see anything about the pricing.
Starting point is 00:23:29 So maybe you can talk a little bit about the cost of such a solution. Yeah. So we typically price out like the other persistent storage solutions for Kubernetes, which is on a per node cost basis. So this is per node in the cluster, whether they're data or worker nodes? Per data node, per data. So I could, you know, I could potentially have a data node with 24 SSDs in it if I had, you know, on-prem or something like that. That's the same price as one with three? Yeah, we bent it around the idea of capacity-based pricing. And I think it's just, as you look at where the market's going, it's really, you know, this edge to private to public cloud. And we wanted a pricing model that worked across those. And we do price differently for the on-prem versus the cloud because the cloud things
Starting point is 00:24:11 typically, but it's still a per node basis. And I noticed on your website, you mentioned edge quite a lot. I hadn't considered persistent volumes being relatively useful on the edge. Are you seeing a lot of call for that? So we're getting into good conversations on it now. I don't want to be too specific on it, but if you look at the whole evolution of 5G, it's all Kubernetes based.
Starting point is 00:24:38 And as we start to see how the carriers are building out these micro data centers, if you will, themselves, well, they're actually putting little data centers out there as well. And then you look at some of the use cases where they're actually doing remote. One company I talked to, they were a mining company out of Scandinavia. And when they go into a place, they plop down a little edge data center. And then they have a bunch of drones that go up and survey the land and figure out the mining process for it. And that's local data then linked to remote data. We're not working with them yet,
Starting point is 00:25:17 but I just love their story because they they're really doing it for completely edge to the public cloud in that whole pipeline because they have to be able to run independently when they're in these remote locations, but they want to link it up to a central system because they use that data to optimize their mining processes. Right, right, right. So in that scenario where you've got this persistent Viam really being generated at the edge, but you want it to be available at the core center or something like,
Starting point is 00:25:48 or on the cloud. Would you be pushing more data than just a heat map when you replicate that data? I mean, you know, the heat map is an interesting idea, but the data doesn't actually exist in both places, right? So, so, so you're right. And that's. Well, the data exists if it's right. It's accessible, yes, but the latency is different. But you're right. In this case where you want to move data to the core,
Starting point is 00:26:13 you have to move the data, right? Because the other case we're talking about is more remote access coming in. This is really pushing data back up. And that's a roadmap thing for us to look at how we, you know, replicate data into a core as opposed to the current model, which is your data is operating in the cluster and you have, you know, you want to teleport access method out. As far as there's really no limit to how much teleportation or how many
Starting point is 00:26:41 different clusters I can teleport some ray volume out to if I want, Chia 12 or something like that. Yeah, that's right. It's really because it's a simple naming system and you basically just... And metadata replication kind of thing, yeah. Yeah. Huh. And then, so you mentioned Ddupe and you're using some sort of hash.
Starting point is 00:27:03 Is this like a SHA-256 or 512 on the data, or do you talk about that? So it's a SHA hash. I'm not going to get into the details of it, but exactly that, right? We do a secure hash of the data to come up with a unique name, and then the metadata server will map volume, offset, and timestamp into that secure hash for the data that was indexed by that. So most snapshot technologies and stuff like use copy on write or copy on, you know. We don't have to do that.
Starting point is 00:27:32 Yeah. If it exists, it exists. And the name is already there. And if it's new data, we just, we write the new data. So there's no copy on write per se. But the metadata has to be updated regardless. Yeah. So the metadata is going to track us.
Starting point is 00:27:48 So you're focusing in on the right thing and the real unique thing about this product and the implementation and a number of patents on that around how we're able to you know like the age-old problem of you know writing data and the metadata and you know how do you manage that synchronously because you don't want to end up with a dangling reference or data or something like that yeah yeah and we actually have a unique way of doing that simultaneously and maintaining that and we've got patents on that. And you mentioned earlier that you can add data nodes to the cluster on demand without having to take the system down or bring it back up or anything like that? That's right. And you can lose nodes as well.
Starting point is 00:28:42 I understand the lose nodes, but lose nodes is, you know, they were there all along and they're gone now. Now I want to add another node to this cluster. So how do you, do you spread the information now at this point or not? Yeah, we do. We spread it out. We don't
Starting point is 00:29:00 try to balance it completely, but we do start to balance it some so you get, you know, better resiliency and better utilization of different nodes. So you can keep adding them and we spread the data around on them. try to balance it completely, but we do start to balance it some so you get, you know, better resiliency and better utilization of different nodes. And so you can keep adding them and we spread the data around on them. Is the data physical address some sort of a hashing mechanism across the nodes? So, I mean, I've got a block I want to write to something. I have to decide where in the ephemeral storage or... Let's talk talk about the architecture for a little bit i think that okay go for that right because because you're you're going right where the architecture leads you to which is you know the basic model is that there's three main microservices to the product
Starting point is 00:29:36 and the metadata one we've talked a lot about there's also the um the front end that presents the csi um and we call those the pitchers. They basically catch the data from the thing and they send it off to the metadata server to map it. And then they go to the persistent storage, things we call the keepers. And the keepers will take care of storing that data. And so when you look at how this works,
Starting point is 00:30:03 the thing that interfaces the CSI is the one who knows how you're accessing the data. And he simply goes to the metadata server and says, you know, I've got this volume offset timestamp. Give me the name for it. And then he knows which keeper to go to to ask for it. And the keeper then will do the mapping to the actual physical storage. Even though the physical storage might be on some other data node. No, so the keeper is the thing that actually owns the data. They live where the data, where the storage devices are.
Starting point is 00:30:36 And so, you know, with the exceptions of things like if you have a keeper for S3, it's going to be the interface to S3 in this case. But they're the ones who actually know, they're the ones who map, you know, kind of like that name to the physical storage location. But in this case, the physical storage location is at least three different physical devices, right? The keeper in that case knows that. And do those physical devices have to be on the same node or can they be spread across different data nodes? We want them across separate nodes so we have better availability. So they are.
Starting point is 00:31:13 So the keeper actually can access the data on other nodes than the physical data nodes it's accessing. Yes. So in the case of this, it understands the topology of where the data is stored and how that works. Exactly. And so the keeper masks all that from everybody else. So anyone accessing it doesn't have to understand that. The keeper is the one who presents that. So the keeper, in my mind, is essentially a block server. It says, here's a block I want to access, and it goes and finds it and gives it back to you or something like that. Yeah, yes, exactly. So you come in with the name and say, here's the thing I want. I have the name for it. You go give me the data associated with that name and bring it back for me. And there's effectively a keeper associated with every data node?
Starting point is 00:32:01 So there's a keeper associated with every data storage type. So you can have keepers for this three-way mirrored ephemeral. You can have a keeper for EBS. You can have a keeper for S3. And they understand how that data topology is stored. So there's one keeper per tier, I would call it. Yeah, you can call it a tier. But the keeper has to be replicated across multiple nodes so in case one goes down, it can continue to go on and stuff like that. Exactly.
Starting point is 00:32:34 And does the keeper become a bottleneck in this scenario? If I've got flash devices on-prem, and I guess I would have at least two keepers, if not more, right? So the way it works is that, no, the keeper does not become the bottleneck on that, because it's simply the mapping of it. So if you've got a lot of data nodes, you understand the mapping of that, and you can say, look, I need to go get that data from this spot over here. But Ted, there's only one keeper for that solution, right? So there's a type of keeper. I didn't say that. I'm sorry. There's not necessarily one instance of keeper. There's a type of keeper.
Starting point is 00:33:16 And there's multiple instances of the keeper, yes. And so I could scale up the keepers if I thought I needed more. I could scale them down if I didn't need more, et cetera, et cetera. Absolutely. So because we built it to be kind of a good microservices thing, you can scale the catchers, you can scale the metadata servers, you can scale the keepers to match your workloads. And adding a new data node just adds it to a keeper set that it knows about.
Starting point is 00:33:49 Yes. And it can start migrating data if it needs to, but in the end, it's just there, right? Right. And that's all hidden from you. You don't have to think about that, right? The keeper just presents it to you and it will manage. So, you know, I forget what our biggest cluster is, but it's an on-site one. But it can scale pretty large, right? And so we haven't seen that big problem.
Starting point is 00:34:16 Petabyte? Pretty large. Terabytes? I think it's terabytes. I don't hold me to that one. I don't know the exact number on it. Yeah, yeah, yeah. So Kubernetes has become very important to VMware as well.
Starting point is 00:34:30 The whole, you know, I can't even think of the name now, but Tanzu services and stuff like that. Do you guys operate under any Kubernetes cluster, or is there certain ones that you work with? So it's funny, you bring up Tanzu and you say, do you operate under any? So today we work under any kind of standard Kubernetes cluster. So if you abide by that and then, you know,
Starting point is 00:34:56 so if it's your own operated cluster versus even a managed one, as long as it's, you know, the standard Kubernetes cluster, we work in it because we don't have any special dependencies on it. In Tanzu, we haven't done any testing with that. We've tested on Google Cloud, on Azure, on AWS, and then on Rancher and OpenShift in the data center, as well as the upstream version. Right, right, right, right. So I saw somewhere in your blog or website that was a one-line installation thing.
Starting point is 00:35:31 How does this work? Most installations take more than one line, I'll have to say. So it's the classic, you know, there's a lot of magic hidden behind that one line in terms of, you know, that's a script that actually takes and, and, you know, pulls down from the repository, the different microservices you need and make sure they get installed and things like that. So it's, it's not, you know, one line per se,
Starting point is 00:35:57 it's a script that actually manages that for you. So the unit is one line, but there's a lot more that goes on there. And the, and the free service free service, how is that limited? Or is it limited in time or limited in space? So it's limited to the number of nodes, the number of data nodes. So the idea is, you know, like a small cluster, you can operate it for free for infinity. But if you want to start scaling it up and having a lot more data nodes in it and stuff like that, then you should come back and talk to us. But there's no functionality gates or anything
Starting point is 00:36:29 like that that's when you do this? The thing is, right, you know, for that whole model, you really want to show the value. And if you hide it behind a paywall, they can't really undercover what the product can really do. And we view it as, you know, look, if they start using it, they're going to eventually grow to where they need to do more. Talk to me about operations of the system. I mean, is it all API driven and there's no GUI at all? Or is the GUI associated with it tells you, you know, how much storage you're consuming and whether you're going to go out and run out of storage or anything like that? So the answer is it's all through a REST API and we have a UI that communicates to the product through that same REST API.
Starting point is 00:37:07 And using Grafana or something like that? We have Prometheus and Grafana in the background, and we can monitor on that. And then we use the UI is written, I think, in Angular. So you can look at and view in what you want to do. You can specify a teleportation. You can say, look, I want to
Starting point is 00:37:31 create a backtrack time, you know, time travel back on this volume through the UI, or you can do it through the REST APIs. And you're pretty well integrated in the CICD tools and stuff like that? Well, so we're working on that, right? So what we do is we found that when we get outside talking to people,
Starting point is 00:37:52 there's real interest in this. And so we've been working with, you know, we don't have a Jenkins tool or a CircleCI tool, but we do have the ability to integrate with them. And we're talking to some folks who are doing kind of DevOps in a box where they want to integrate us into their solution. I can't talk about who that is just yet. But the idea is that we can help you integrate it into your pipeline. It's not a product because everybody's pipelines are their own pipelines. So we help you integrate it into it, or we'll work with partners
Starting point is 00:38:21 who can actually do that for you. All right. You mentioned partners. You guys do anything with like managed service partners or multi-tenancy, I guess is the question that's typically asked in this scenario, right? Do you guys support multi-tenancy? And what would it even look like in Kubernetes? I have no idea. Well, that's the problem is Kubernetes doesn't really do multi-tenancy. So you have to kind of like skin that cat each time. And it depends on the managed service provider. So in the one case, we have somebody where they're actually doing multi-tenancy, but it was how they did it, we just do Kubernetes multi-tenancy. You have to kind of go look at how they solved it and what roles they defined and how they're doing that.
Starting point is 00:39:12 But yeah, you have to support it somehow. We expect to make a lot more progress on that as we go forward because we think that in this Kubernetes universe, the biggest problem people have is skill set shortage. And so they're turning to trusted advisors like managed service providers and folks to help them implement. And so it's the logical thing to work with them because they're the ones who customers are turning to for help. You know, there's the really advanced shops who know what they're doing, but by and large, the bulk of the market is really looking for help on this.
Starting point is 00:39:46 That brings up a question about professional services. Do you guys have a professional services organization that people can call on to help deploy, install, and optimize these sorts of solutions? I'm not trying to be asked. We will help you do that. We don't have an organization pre-yet, but we will build that out.
Starting point is 00:40:04 And so we handle it now. And I wouldn't say it's a professional service organization, but we see that as a need that you have to do to help people be successful. And we're a startup, so we're going to help them be successful. And I didn't see anywhere where you guys talked about open source. So I'm assuming IronEar is all proprietary code, that sort of thing. Yep. That's safe to say. Okay. Okay. I think I'm about exhausted here. Ted, anything you'd like to say to our listening audience before we close? I mean, I think that you already made the intro of like, go check out our website. If you see, you know, try it out. It's a really cool product and, you know, we'd love to have you kind of some more. No, I think, Ray, you did a good job of poking all the little spots on it.
Starting point is 00:40:49 I enjoyed it. Well, this has been great, Ted. Thank you very much for being on our show today. Thank you, Ray. And that's it for now. Bye, Ted. Bye, Ray. Until next time. Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know.
Starting point is 00:41:08 And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.
