Grey Beards on Systems - 128: GreyBeards talk containers, K8s, and object storage with AB Periasamy, Co-Founder&CEO MinIO

Episode Date: January 27, 2022

Sponsored by: Once again Keith and I are talking K8s storage, only this time it was object storage. Anand Babu (AB) Periasamy, Co-founder and CEO of MinIO, has been on our show a couple of times now and it’s always an insightful discussion. He’s got an uncommon perspective on IT today and what needs to …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here. Welcome to another episode of the GreyBeards on Storage podcast, a show where we get GreyBeards bloggers together with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This GreyBeards on Storage episode is brought to you today by MinIO. And now it is my pleasure to reintroduce AB Periasamy, co-founder and CEO of MinIO, who's been on multiple GreyBeards podcasts. So, AB, why don't you tell us a little bit about yourself and what's new at MinIO? I'm one of the co-founders and CEO.
Starting point is 00:00:46 MinIO is an object store. And what's new, there's actually quite a bit of activity going on. I think the latest news is switching to the next gear. It's all about multi-cloud now. The adoption of MinIO has grown quite a bit, even inside the cloud, and we are targeting basically all the clouds and going after them. So object storage has been kind of a secondary storage for, I would say, decades now. I first started talking about it back in the 2000 through 2004 time frame. There's some discussion that
Starting point is 00:01:27 object storage might become more like primary storage. Where does MinIO fit into that framework? So that's something that we have been quite clear on from the beginning in terms of where we wanted to be, but the market has come full circle around it. Now the market has aligned itself to object storage as the primary storage. But if you look, it was not like something we predicted. It was already there when we started. If you looked into the public cloud, object storage was the foundation.
Starting point is 00:01:59 Whether you store some static websites, container images, application artifacts, all the way to AI/ML, right? Look at Snapchat to Snowflake. They're built on object store. From EMR to, even outside of AWS, BigQuery, Azure ML, Power BI, you name it. Object storage was always the foundation, always the primary storage. In fact, AWS, the whole cloud thing, started with object storage as a service first, that is, S3. Then came everything else. But outside of the public cloud, the industry was dominated by SAN and NAS vendors. They were in denial, right? They thought that
Starting point is 00:02:41 the future would be more like enterprise on the cloud, that is, file, block, and VMs. And a file, block, and VM cloud is called a managed service, like the traditional MSPs. That market has changed. I think now you can clearly see VMware is embracing Kubernetes wholeheartedly. The new model is containers and objects; Kubernetes is the infrastructure API standard. YAML is the definition of their stack, right? And today object storage has become the primary storage.
Starting point is 00:03:12 And as for why this change happened even outside of the public cloud, there are three primary use cases for SAN and NAS. Databases, right? Now look at Kafka, Elastic, ClickHouse, you name it, pretty much all the popular open source databases, and the old school ones, all the way from SQL Server, Vertica, Teradata, all of them have gone to object storage. Then look at VM images and snapshots: the VM images and the database snapshots, all of them have come to object storage. In fact, in the new world there are no VMs, only containers, and container images and
Starting point is 00:03:51 artifacts are always on object storage. And then the last thing is all the archival data. Actually, archival data not so much for the primary storage; for SAN and NAS, it's the AI/ML data, and that also came to object storage. So today, I think object storage is accepted as primary storage, not just inside the public cloud, but at private and edge as well. It's still a struggle seeing that in the enterprise to some extent, given the enterprise applications'
Starting point is 00:04:21 proclivity for block and file. But I see object being, you know, the major storage play in any of the cloud vendors these days. It's like EBS and those sorts of things are, you know, very rarely used anymore. It's all the data sitting on object storage, and EBS is maybe temporarily used and things of that nature. Yeah. So, Ray, you're hitting a really interesting point, which is that most of the resistance that I've seen to object storage is from traditional workflows. Let's say the data scientists are still kind of slow to pick up object because their tools don't natively speak object.
Starting point is 00:05:03 AB, I'd love to hear what you're seeing from an end user perspective, with legacy applications like HPC, et cetera. You mean something like Splunk doesn't use object? No, meaning like a lot of the analytic platforms that people use at the end client. Hadoop?
Starting point is 00:05:21 Yeah. Not Hadoop. Hadoop is definitely, you can get block storage. I'm talking about when people connect to file shares and run traditional analytics applications. That's where
Starting point is 00:05:35 I'm seeing R workloads, etc. Actually, Ray answered that question in just two words. You mean Splunk doesn't support object store? Actually, Splunk's SmartStore is built on the S3 API. Splunk doesn't know how to talk to SAN or NAS. In fact, if you talk to Splunk engineers, they will tell you that even for the Splunk hot tier, don't put it on SAN and NAS. The hot tier is more of a primary cache with some persistence. They make replicated copies of it on local drives. But for all
Starting point is 00:06:05 persistence, SmartStore is the way to go. And SmartStore is object store, S3 API. Every one of the analytics stacks today, if you look from the old school established players all the way to the most modern ones, all of them have gone to object store. In fact, the most successful one there that replaced SAN and NAS was HDFS, and HDFS is now coming to object store. For a while they were using the S3A adapter to make object store look compatible to the Hadoop applications, right? So it gave a file interface, the HCFS file interface, to object store, so the Hadoop applications, Hive and everything, you didn't have to rewrite. But nowadays, if you look at all this modern AI/ML, like Kubeflow, which is the data pipeline standard for Kubernetes, it uses the MinIO SDK to talk to S3 compatible
Starting point is 00:06:57 object store. So object store is now the standard for all analytics. Maybe it's a question of definitions. What do you mean by analytics, Keith, versus, you know, I look at analytics as a big data kind of solution. So if I'm using R, if I'm using some type of... R, which is, okay,
Starting point is 00:07:14 scientific language. Scientific language. It's like HPC. Like HPC has been extremely resistant to object storage because those users, those data scientists, because they're still data scientists, are not necessarily using Hadoop. They're not using modern applications. They're using the legacy tools they've always used. But I mean, R and Python and all these
Starting point is 00:07:36 other solutions that do data science, I mean, they have object storage APIs, don't they? Yeah, they have object storage APIs, but again, it's a question of where does the storage live today? If the storage doesn't live in object storage, you have to either move it from your file system based. It's a workflow question, like what comes first, the chicken or the egg?
Starting point is 00:07:58 I love object storage, absolutely, because it's cheap, deep, performant now, but my workloads exist. My data sets exist on file. That migration has been a bit slow in my experience. So there are two parts to this, right? One is a data scientist downloading some CSV, or JSON-type log data, or some kind of data set. They download it to their laptop and/or their
Starting point is 00:08:26 workstation and they process it on the local drives. Sure, you can run MinIO on those laptops too, right? The application developers do it all the time. But they would just store these CSV files as files, and if you're running an R script, right, the file system provided by the local operating system is just fine. But where did these data sets come from? They come from a large data repository that's on an object store. And it used to be HDFS; that became object store because the data scientists who access it are spread across locations. They actually can download the data set over HTTP securely
Starting point is 00:09:00 because the S3 API is just HTTP; it's a RESTful API. Accessing HDFS or SAN and NAS across the cloud, forget about it, even across the WAN, right? They found that object storage is a lot more convenient and secure for that. But for local processing, sure, the local file system is just fine. And outside of this use case, the HPC market is predominantly MPI-based workloads. They have their own
Starting point is 00:09:27 highly optimized MPI-IO, and the Lustre-type systems had native integration with the MPI systems. Sure, for the HPC community to go from, say, 92% to 94% efficiency, they have to make it twice as complex. They would, because it's worth it for them. And that's a market where, until we see a pull, it doesn't make much sense. But the commercial HPC market is quite different from the national labs. The commercial HPC market has moved on. You're talking like bioinformatics
Starting point is 00:10:04 and things of that nature? They're all object storage nowadays. It is moving away. So I see where you're coming from, Keith, but yeah, you're right. I mean, if I'm doing an application on my laptop or something like that or in my test environment,
Starting point is 00:10:19 I probably want to look at files, but that data is coming off of some object storage someplace in the cloud or someplace where it's all being gathered and stuff like that. Don't you think that's the case? Well, user habits are very hard to change, right? I think if you're coming out of an environment where you're at a university, et cetera, and you're learning this stuff new, and your first experience with interfacing with your data sets is from object storage,
Starting point is 00:10:50 you're just gonna keep that workflow. The opposite is true when you've spent years connecting to the F drive and running it off. User habits are extremely, extremely difficult to change. Whether or not the underlying technology is better is not as relevant as whether you can get the users to adopt it. And my question is more about that adoption. And it sounds like, AB, what you're telling me is that if you want a modern experience when it comes to data analytics, object storage is where it's at. Yeah, yeah, yeah. And actually, even the data science community, once we gave them a nice file explorer that they can open in their browser and download the data set and search all that,
Starting point is 00:11:37 if you give them something convenient, not necessarily a better technology, you're spot on there, right? If you just give them something even easier, they actually change their habits. But where file will continue to play a role is the enterprise community, particularly the ones who cannot hire software engineers to modernize with infrastructure as code. That's actually still a good part of the industry. It's the traditional VMware-based virtual. Yeah, yeah, exactly. I mean, you know, I came from the block world long before files even existed. So, yeah, it's something that's embedded in what I do. But to a large extent, you know, if I'm doing AI/ML types of things, it's all based on objects someplace. I mean, it's all sucking in objects, maybe coming into a file or a CSV file or something like that and being processed there. But in the
Starting point is 00:12:30 real world, for guys that are doing AI/ML, it's all objects. It seems to be, I don't know. You look at Kubeflow or look at MLOps or something like that. It seems like it's getting all its data from object stuff. The files will stay, but it's like mainframes are still a profitable business, right? Files and blocks. You know, MinIO doesn't work on mainframe yet. It probably does with a Z something, Kubernetes thing or something. I don't know, Linux.
Starting point is 00:12:57 Actually, surprisingly, there is a native port for the Power architecture, and there is actually a startup now, I think it's Model 9 or something. They use MinIO to modernize mainframe applications to become cloud compatible. They use MinIO there. It's a market that I don't have much expertise in.
Starting point is 00:13:17 I'd rather leave it to partners. Well, you mentioned the cloud earlier on, AB. I mean, it seems like the cloud has always been to a large extent object-oriented. And you made a statement there. I have to go back. You said AWS actually started with S3 alone. Is that what you're saying? Yeah, that is actually true. I never saw that. I never realized that. Yeah, actually, that's pretty interesting. I talk about that a lot. People think about EC2, one of the most common, if not still the most commonly used service outside of S3. I remember when Amazon announced their cloud services, AWS, and the service was S3, and I checked it out.
Starting point is 00:14:01 I was like, why would I ever use this? So take my advice with a grain of salt. I was obviously wrong about that along with probably thousands of others. But yeah, S3 is the oldest service out there. Yeah. So what about multi-cloud? The problem today with enterprise adoption of cloud is that occasionally AWS or Azure or Google go down. And I need to have services that now span clouds, like I had services that spanned data centers in the past, in order to keep up. Where does MinIO fit in that framework of multiple cloud operations? So when AWS goes down, when Google, all these clouds go down,
Starting point is 00:14:47 actually I do see tweets in the community saying, I am safe, I'm running MinIO and I did not get affected. They do talk about all this, right? But in reality, if you look at the clouds, uptime is definitely higher than most of the data centers they run themselves, right?
Starting point is 00:15:04 I don't think that uptime is a big deal. All infrastructures eventually go down here and there, but if I'm an Amazon customer, I would still feel confident that their engineers are competent enough to bring it back faster than my engineers. Of course, my engineers, I have confidence, but in general, right?
Starting point is 00:15:23 But I think the real reason why multi-cloud is happening is not even because they had a clear strategy. Today most CIOs have a strategy and a mandate that they have to be multi-cloud ready, but even those CIOs, at the end of the day, will tell you: we did a large contract with Google and then all of us standardized on Google. And you can see that most organizations have that exclusivity because they have to get that discount and they made the commitment. But the real reason why multi-cloud is happening is because the developers started building applications as microservices
Starting point is 00:15:58 and they containerized everything. When they containerized everything, they naturally brought in Kubernetes to orchestrate these containers. They detached their application stack from the public cloud, and they looked at public cloud as a choice between asking my IT to provision Dell or Supermicro servers and, here, with a YAML file, provisioning these servers in a moment. They left IT and went to the public cloud. It was more of infrastructure as a service. And then when they went there, object storage was seen no different from MongoDB or Elastic or Kafka.
Starting point is 00:16:30 It's the blob store they adopted. They brought in their software stack containerized and they pushed it to the cloud. And overnight, they said, my management told me I have to go to Google Cloud, and they redeployed the software stack on Google Cloud. It happened. So multi-cloud happened as a... You think multi-cloud is there
Starting point is 00:16:51 because Kubernetes and containerization occurred? I'm not sure which is first here in this environment. But even if you're running a cluster, let's say, in AWS and a cluster in Google Cloud, the data is a different question. I mean, so the data has got to be sitting someplace in this environment. Well, Ray, I want to interject. AB, I would love your feedback on this. I think I have a $2 billion proof point here.
Starting point is 00:17:21 I just read this morning, JP Morgan's Jamie Dimon was on the investor call, I'm assuming yesterday, the day before this recording, talking about how JP Morgan has invested $2 billion in cloud, much of that spend going to the data center to enable cloud. So I think, to AB's point, the cloud experience is what most businesses are driving towards. There's an IDC stat: 88% of enterprises want to be able to repatriate static cloud workloads, but have that cloud capability. And that starts with the underlying storage. This is GreyBeards on Storage. We believe in storage. And you need this storage across platforms to be able to do that.
Starting point is 00:18:14 So how does MinIO facilitate this storage residing on multiple platforms and stuff like that? Yeah. So on this one, and combining the previous question too, did Kubernetes and containers drive the strategy or the other way around, right? Our bet early on was, when we came in, there was already Google Cloud and Azure
Starting point is 00:18:38 starting their journey, and Amazon S3 was like the standard, right? We saw that each of them was incompatible with the others. And then outside of the public cloud, HDFS, SAN, NAS, anything you look at, it was an array of standards, every one of them incompatible with each other.
Starting point is 00:18:57 We knew that this was not going to be the way. In the long run, everything will look like AWS or it is AWS itself. That was an inevitability. So we knew that given enough time, the problem would be solved, right? But then, we don't need to fix the compute side. On the compute side, when we started, from Cloud Foundry and Mesos to Docker Swarm and Compose, there were so many standards.
Starting point is 00:19:26 We saw Kubernetes was better positioned. It was written in the infrastructure language of choice, Go, and they understood the community sentiments better. They drove it well, right? The declarative model was a better idea. We saw that the compute side would be solved, but on the data side, for this industry, switching from POSIX to an object API like S3 is itself a monumental task. It happened finally now, right?
Starting point is 00:19:53 At least it's happening, and it's happening in all the emerging markets. But for them to go from POSIX to multiple API standards, that's not going to happen, right? That's where we saw that, instead of releasing yet another open source standard, it's okay for us to stick to the S3 API. Amazon won't be unhappy about it. So we chose to promote the S3 API as the standard across all clouds. And MinIO's position is that MinIO can run inside AWS, on Outposts, it can run on Google Cloud,
Starting point is 00:20:24 even Rancher, Anthos, or VMware, everywhere. S3 cannot. And that was our bet. And we knew that in the long run, multi-cloud would be an inevitability, and we would be able to help the community at a giant scale. If we didn't do it, we would fail anyway. We focused on the application developers to help build a powerful ecosystem, and that paid off. So you solved the API problem by taking the S3 bet,
Starting point is 00:20:53 and it was a successful bet for MinIO, and you guys benefited from it and more power to you. But the question still remains, AB, where's the data? The fact that I can use S3 to access it is the right thing across every cloud and every on-prem environment in the world. Absolutely. Yeah. You know, now I have plenty of data points to actually understand what's happening. When a customer, say, comes from AWS S3 to MinIO, or say, even if they have MinIO on-prem and they went to cloud and they
Starting point is 00:21:25 deployed MinIO in the cloud, they actually don't move the data around, because the new data that's getting produced is more than all of the historic data combined. So they actually don't move the data, which is expensive and time consuming. They build the new infrastructure and the new data goes there, and they're all kept in silos. Some organizations choose to centralize, some organizations choose to go with a decentralized model, but overall they never move data. So you're saying that the data ends up being distributed or partitioned across these multiple cloud slash on-prem environments based on what applications started in that
Starting point is 00:22:06 particular environment and what data needs they had at that point. Is that how it plays out? Correct. Because you can move the application code, but not the data, but you can't move the application code if you're stuck with the API. And the way it supports multi-cloud is that you could have, let's say, an application running on-prem and reference a URL, which happens to be in AWS or Google or whatever, and still access the data, right? In theory, yes, right? But then what really happens is that even if they pick one cloud and deploy MinIO, they actually deploy in multiple regions across the world. And applications, wherever they are, they are also global. They tend to pick the one that is in
Starting point is 00:22:46 close proximity. Even though the S3 API of MinIO is HTTPS and you can access it from anywhere, the bandwidth costs are not the same, right? It's more than the storage infrastructure. So AB, it sounds like what you're talking about is that the value isn't necessarily in building multi-cloud apps, but in having a multi-cloud operating model in which you can adapt or move your workflows. Whether you're developing a point of sale application in one cloud or a data analytics platform in another, the way that you address your data is not changing across public clouds. That is very accurate. Yeah, that's very accurate.
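[Editor's illustration, not part of the recording] The point that "the way that you address your data is not changing" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the endpoint hostnames, bucket, and key below are hypothetical placeholders, the helper `object_url` is our own name, and a real request would also need authentication (S3 request signing), which is omitted here. It only shows that, with path-style addressing, the bucket and key portion of the URL stays identical while the endpoint is the one thing that changes per environment.

```python
from urllib.parse import quote, urlsplit

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for an object on an S3-compatible endpoint.

    Hypothetical helper for illustration; it does no signing or I/O.
    """
    # quote() keeps "/" in the key and percent-encodes spaces and the like.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"

# Hypothetical endpoints: one public-cloud region, one on-prem deployment.
ENDPOINTS = {
    "cloud_a": "https://s3.cloud-a.example.com",
    "on_prem": "https://minio.example.internal:9000",
}

BUCKET, KEY = "sales-data", "2022/01/pos events.csv"

urls = {name: object_url(ep, BUCKET, KEY) for name, ep in ENDPOINTS.items()}
for name, url in urls.items():
    print(name, url)

# The object path is the same everywhere; only the host differs.
paths = {urlsplit(u).path for u in urls.values()}
```

In this sketch an application would read the endpoint from configuration and leave the bucket and key handling untouched when it moves between environments.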
Starting point is 00:23:31 In fact, all they care about is their software. In simple, plain terms, in developer terms, right, they just want their applications to be containerized. And it's what VMware envisioned as the software-defined data center. It's more than the infrastructure layer. All they care about is their application stack. With a YAML file, if I can take my entire software stack and roll it out to any cloud on demand, I'm good to go.
Starting point is 00:23:55 And that's actually how they build: on Minikube, then it goes to a CI/CD environment, and then it goes to the production environment. That pattern has always been followed. None of these guys are actually building applications connected to EKS or AKS with all their data stored on S3. They don't build applications inside the cloud. They build them elsewhere. Even for their day one launches on S3, the born-in-the-cloud applications, those applications are not built in the cloud. And we find that all they care about is that their software stack is independent of any cloud. What made that possible was containers and Kubernetes.
Starting point is 00:24:32 Kubernetes, containers, all that stuff. You mentioned Minikube. I was going to try to run Minikube on my Macs here, my Mac cluster, but that's another story. So Kubernetes is the key to multi-cloud as you see it, right? So, AB, what you're saying is pretty much in line with what I found at KubeCon last year. I was sitting at the lunch table. It's one of my favorite spots to sit during conferences, listening to this modernization of the platform team. Before, the platform team might've provided VMware vSphere
Starting point is 00:25:08 and they provided VMs. Now the platform team is like this weird mix of these traditional infrastructure operators with developers sitting inside of that team that is consuming services like MinIO or deploying services for MinIO to enable, you know, I can call it more of an enablement team that's enabling the developers that are solving the business challenge. They're kind of a shim between the developers solving the business challenge and the cloud provider. Is that what you're seeing? Absolutely. You can see right here, the change to go to cloud, it cannot start with IT. The
Starting point is 00:25:49 problem is that the cloud is incompatible with the enterprise IT, right? File, block, VM: if you take that stack and retrofit it in the cloud, it doesn't run or it runs very poorly. So you have to involve developers. It's not just a matter of automation, right? Chef and Puppet tried that. It's not that case. This time they have to rewrite the application to go cloud native. Often they're finding that it's cheaper to rewrite than
Starting point is 00:26:13 retrofit. This time, all the organizations that have become cloud ready involved developers in the mix, where IT became ops-centric, DevOps-centric. They worked hand in hand. Those organizations succeeded. As for the rest, who resisted cloud and claimed that those developer tools are complicated,
Starting point is 00:26:33 what we found was there was a wall between IT and the development team. The development team went to cloud nevertheless. They never looked back. So, I mean, taking an enterprise application and making a cloud version of it, there's this lift and shift discussion, or refactoring and reimplementation kind of thing, or redesign altogether, I guess. So does lift and shift work or does it not? It doesn't. Actually, the biggest proof is VMware itself. Their first version of their cloud was VMware as a service, right? They took VMware.
Starting point is 00:27:13 I give you the same VMware as a hosted offering that looked just like an outsourced data center. That's not what cloud is. Customers basically say their version of cloud has to look like AWS, right? The AWS experience. That's what I think Google and Microsoft understood, and they did not give the same old software as a service. It's not just about automation, right? It's fundamentally incompatible, meaning throw away all the legacy. The biggest advantage of cloud is to break the legacy systems, throw them away.
Starting point is 00:27:46 We can build modern infrastructure like how we built it for ourselves. That was Amazon's message that resonated. You completely took that value away. And if you brought back legacy, then it's no better. We have to have a lengthy discussion about that offline. So where do opinionated solutions fit into this multi-cloud environment? It was a word I almost had to look up when I saw that.
Starting point is 00:28:19 I think opinionated is too broad, right? If opinionated is in the form of a stack, like we talked about the LAMP stack in the dot-com times, people still try to do this. Recently, last week, I came across Merck, some stack like that. I was like, what is that? I didn't even know what it was.
Starting point is 00:28:39 Then it was like MongoDB, something. But what I find is that the opinionated stacks in the form of PaaS never worked. It's something where even Amazon and other cloud players gradually increased providing the... instead of giving you an opinionated stack. Like Google, actually: the very first version was based on Python, a Google App Engine type model, right? It was closer to PaaS. It did not work. What we are finding consistently, all the time, is that developers don't like opinionated stacks. They want building blocks so they can compose
Starting point is 00:29:13 their own application infrastructure stack. But this topic alone requires like a whole day's discussion. You can see that the biggest success that ever happened in the PaaS world is Heroku. And it's not a big one, right? It was a small exit. It's still there, but PaaS never worked because developers don't like opinionated stacks. But having said that, we like being opinionated
Starting point is 00:29:39 in this sense: early on when we started MinIO, the community was like, hey, why not use the Swift API, because that's open source? Why are you promoting the S3 API? The S3 API is not even a standard and it's proprietary. My point was, pick one. You want the Swift API or you want the S3 API? I'm not going to do both. And the answer was very clear. Sometimes people will come and ask, actually they still ask, why don't you add an NFS API, give file, block, and object? And I still tell them the same thing: if I added a legacy protocol, all the advantage of the S3 API is gone. The advantage of the S3 API, the reason it is
Starting point is 00:30:18 incompatible, is because POSIX is legacy. If I do POSIX++, it's actually minus minus. I'll end up giving you a mediocre object storage and a terrible file system. Not because I don't know how to build a file system that was ... actually don't care about how you implement the specifics of the details. They just kind of want to consume storage. How are you seeing that being kind of validated in the market with developers? So developers, when they approach storage, when they approach Kubernetes, they want to provision storage. Like what are some of the pain points they're realizing?
Starting point is 00:31:12 Like, oh, the regular Kubernetes providers are not enough. Oh yeah, that's actually, there is a lot of confusion in this topic, right? I'll tell you from two different angles. The one, if you talk to the actual consumers of the storage itself, they actually don't even call it storage.
Starting point is 00:31:30 Most of these, they are all developers who are dealing with data. They look at it as a data store. MinIO, for them, is no different from MongoDB or Elasticsearch. It's just that if you are talking about metadata-type data where you want a powerful query interface, you would put it in a database. If you have blob data and you want lots of persistence, you would put it in an object store. They look at MinIO as an object store,
Starting point is 00:31:54 and that is how the consumers of the cloud, who are the application builders, data engineers, data architects, AI/ML data scientists, look at it: as just a data store, period, right? But if you talk to the infrastructure people there, particularly if you talk to the storage vendors, they brought in SAN and NAS and then wrote a CSI adapter, and they all want to look cool. These are the same old appliances. They suddenly become Kubernetes-ready and they claim that they are Kubernetes-native storage. So this is all the persistent volume stuff that was through CSI and all that stuff. CSI. But that's not what actually
Starting point is 00:32:27 the Kubernetes storage is about, right? The application, there is a big disconnect there and a lot of confusion too. Every one of these modern distributed data stores, if you look at who wants this, the traditional SAN, NAS... In the cloud, SAN and NAS are considered legacy. It's only meant to bring in legacy applications that cannot be rewritten, as a stopgap to buy yourself some time. That's when you
Starting point is 00:32:51 will go for efs or ebs otherwise imagine like snowflake written in ebs or efs they would not have started only right they that the csi providers are meant to give you legacy compatibility. But if you are talking about a modern application, even the databases themselves, right, that they are stateful sets, where would they store? Look at every one of the modern distributed database. They have gone scale out, and all they want is a local persistent volume. Like previously it started with host path
Starting point is 00:33:21 and then came local volumes, but even the local volumes do not have a CSI driver, which means they cannot be dynamically provisioned. And I saw that there was a problem in Kubernetes, that there are no CSI drivers to manage local drives, local JBODs, which is what every modern distributed data store and data processing framework is built on, not on SAN and NAS. VMware recognized this and brought in vSAN Direct for their Tanzu environment. But outside of it, this is
Starting point is 00:33:51 actually an emerging area of discussion. OpenEBS came with Local PV and I think Longhorn from Rancher also has something like this. I needed this for MinIO, whether or not I solved this problem for the rest of the industry. So we wrote something called DirectPV. It's a direct persistent volume. So all you want is a bunch of local drives automatically provisioned and managed for Kubernetes through a CSI driver. So it's just a volume manager, not a storage system. Storage systems are distributed data. So DirectPV is accessed through a CSI plugin or what?
Starting point is 00:34:27 Yeah, it is actually a CSI driver. And you give DirectPV all the local drives. And say I'm running, it's not just MinIO, right? So maybe I'm running Elasticsearch. And Elasticsearch makes replicated copies of its data sets on its local drives. For long-term persistence, they would put it on object storage; let's leave that aside. Just to run Elasticsearch, if you brought in SAN or NAS and put a scale-out system on a scale-up architecture, it wouldn't scale and it would be inefficient. All Elasticsearch
Starting point is 00:34:58 wants is local PV. But if you use just the Kubernetes-provided local PV, there is no CSI driver, so you have to manually pre-create these volumes and then provision. That's kind of inefficient; it breaks the automation. If you use DirectPV, when you provision your Elasticsearch or Kafka or anything, right, anything distributed, when you want these volumes, say when Elasticsearch says I want 10 TB on each of eight nodes, 10 TB local on each of these eight nodes, you make a volume claim and DirectPV will run
Starting point is 00:35:31 your Elasticsearch containers with the- Across all the nodes, across all the storage. Yeah, yeah. So I'm trying to understand here. So let's say I have MinIO using my local storage or defined for using DirectPV for local storage. But if I want to access object storage sitting out on the web and I'm a container, I still just use a URL. I don't have to do a persistent volume claim or anything like that, right?
Starting point is 00:35:54 Yeah, you don't need it. You just use the object store, the S3 API, just like Redis or MongoDB or anything else when you access data services. Actually, this is an interesting segue into what disaggregated storage is. The industry talks about disaggregated storage. If you talk to the application developers, they will tell you disaggregation is between stateless microservices and stateful data services, that is, data stores. That's what they mean by disaggregation. That's what cloud talks about as disaggregation. Talk to the storage vendors. They talk about disaggregation between the drive
Starting point is 00:36:29 and the storage systems, the data stores. Yeah, and the compute, right? And all that stuff. Yes, yes, yes. So if I'm using like MongoDB or something like that in a containerized application, I'm using a MongoDB API. I'm not creating a persistent volume on a Mongo database or anything like that, right? There is no... Underneath the MongoDB, MongoDB would make a persistent volume claim and then that's where...
Starting point is 00:36:55 That's where it stores its data and stuff like that. The application won't see it. Yeah. Again, Kubernetes seems to be the key to all this stuff. It is now the API of infrastructures. It's the API standard for infrastructure. Yeah, declarative and all that stuff. It's bizarre.
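As a rough illustration of the volume claim discussed above, here is a minimal Python sketch that builds a Kubernetes StatefulSet manifest whose volumeClaimTemplates request one local volume per replica. The storage class name `directpv-min-io`, the workload name, the image, and the helper `local_volume_statefulset` are assumptions made for illustration and do not come from the conversation; the eight replicas and the 10 TB size follow the Elasticsearch example mentioned earlier. The application itself would still talk to the data store over its API and never see these claims.

```python
import json

# Assumption: the storage class name a DirectPV install registers.
# Check your own cluster (for example with `kubectl get storageclass`).
DIRECTPV_STORAGE_CLASS = "directpv-min-io"


def local_volume_statefulset(name, image, replicas, size,
                             storage_class=DIRECTPV_STORAGE_CLASS):
    """Build a StatefulSet manifest (as a dict) in which every replica
    gets its own persistent volume claim from the given storage class."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # The claim named "data" below is mounted here.
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One claim is created per replica from this template; the CSI
            # driver behind the storage class provisions the volume.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload: eight replicas, 10 TB of local storage each.
    manifest = local_volume_statefulset(
        "search", "example.invalid/search:latest", replicas=8, size="10T")
    print(json.dumps(manifest, indent=2))
```

The printed JSON is an ordinary manifest, so it could be piped to `kubectl apply -f -`. Placing exactly one replica on each node would additionally need scheduling rules such as pod anti-affinity, which this sketch leaves out.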
Starting point is 00:37:15 It has to happen, right? Otherwise, it's very hard to build. Nowadays, it's not the installation rates. How do you operate at scale? Operations means every day you are rolling out new updates. Operations has become the most important problem. And without standardization, it's going to be very hard. Yeah, yeah, exactly, exactly, exactly.
Starting point is 00:37:36 So recently, there's been some new funding for MinIO. Our series was closed. Is that true? Yeah, the existing investors preempted with a term sheet, then Intel Capital, Pat Gelsinger himself, presented a term sheet, and that was very humbling to receive. Then SoftBank participated along with the existing investors. It's a 103 million round at a billion dollar valuation. And it's Series B.
Starting point is 00:38:11 And we were still a small team at the time when we got the funding. We were like a 40-member team. How does that compare to the overall market? This is a crowded space. When Ray hit me up to do this sponsored episode with me, I'm like, oh, another Kubernetes storage provider. How does this set you apart from your competitors when it comes to finances?
Starting point is 00:38:36 I never saw the amount of money you raise or the number of people you hire to be a real measure of success, right? It's still the case. You can see previously we raised only a 23 million Series A and it didn't slow us down. We were accelerating like crazy and still, like today, it's around 1.1 million Docker pulls a day.
Starting point is 00:38:57 And it's not just, you can... 1.1 million? Did you say 1.1 million Docker pulls a day? Yeah, that's just from Docker Hub alone. It's not including the private repositories and all the other repositories, right? I wasn't supposed to say this, but in my private infrastructure that I give my team access to, I found a MinIO VM. I'm like, whoa, I don't know if Ray infiltrated my team or not. Yeah. It's hard
Starting point is 00:39:25 to stop at this point. It's just there. But we knew that there is only one way that this market will consolidate. I actually like to go into a crowded market because the market is established. The hardest part is actually to go do concept selling
Starting point is 00:39:41 and create a new market. It's easier to go into an established market where there are so many players. You create a superior product. Superior, for me, is not more features or more beefy and shiny; it is fine craftsmanship. Listen to the users, give them just what they want, and with that fine touch and finish you get to connect with the users, right? If you put that product-first attitude, it's hard to go wrong. And that was the reason that we were quite confident that the market is so big. Ten years from now, no one will blame me that I picked the wrong market, because data is going to be everybody's problem. It is everybody's problem. It will be everybody's problem for the
Starting point is 00:40:22 foreseeable future. It kind of paid off, right? It's really simple ideas and focus that allowed us to get here. Yeah, yeah. You're in the right place at the right time. I'll say that much. And so what comes next in your world of MinIO? I mean, you've pretty much conquered S3 API compatibility. You're sitting here with a pot full of money.
Starting point is 00:40:49 And who are you going to go after next? So from our thinking point of view, we never saw anybody as a competition. We were eternally unsatisfied with our own creation. It's like every artist, if you look at, ask them about their past work, they're actually not happy in spite of being a big hit. We are just, we know we can do better. And good part is software,
Starting point is 00:41:17 there is always version one, version two, and version three. We can keep on improving. So that part is all I'm thinking, right? But on the other hand, now should I introduce the next product? Our skill is, actually, we are creators, product creators. We can go do MinIO of this, MinIO of that, and we can do all that, right? But then, on the other hand, for MinIO, while we got the land grab, the commercial journey is just starting.
Starting point is 00:41:50 And if I launch a new product, it will create a branding problem. Product creation is easy. Creating a business around it and promoting the brand is very, very hard. I am better off accelerating on the commercial journey. Right now, the customers are actually coming to us because we are deeply in production across many of these enterprises. And when they come in, I think it makes more sense for me to help them run their infrastructure at scale with more ease and better security and better operational visibility. Those are more important to me. And that's actually where basically if you look at Amazon S3 versus us, one thing
Starting point is 00:42:26 Amazon themselves admit is that S3 has become very complicated. And in MinIO's case, while the industry talks about how easy MinIO is, I actually think that it is not, right? There's still a long way to go for us, right? Yeah, make it easier. I think I will never go wrong if I do that as compared to launching new products. But I think one area I would continue to invest in, the next big shift, if at all, from MinIO to the next layer, is what you do with the data. Now that you stored all the data in MinIO,
Starting point is 00:42:56 we became the data infrastructure for your organization. What you do with the data is more important. Search type functions and unlocking the value of the data through AI machine learning functions, I think that's an area I would definitely invest. Interesting. Well, this has been great. Keith, any last questions for AB before we close? One use case that we haven't talked about has been
Starting point is 00:43:20 kind of the standard OS images, application artifacts, snapshots, backups, etc. Like the mundane tasks that we've done in traditional storage arrays. Where's MinIO in providing that type of capability? So it started out on the artifact side as a common use case. From JFrog to, I think it's Harbor, the container image repository, all of them, just storing the container images, that itself, it is clearly object storage
Starting point is 00:43:53 is the backend for that. And MinIO grew quite popular there. And what we started seeing was, it's not just about the container repository. They are constantly building new container images. Every day they make a new patch. How is this new commit tested? They build a new container and the whole stack,
Starting point is 00:44:10 whatever code has changed, all results in new containers and they get tested in a CI/CD automated framework, right? And that results in a flood of container images. It used to be VM images. That's why you needed all this copy data management, secondary data management. That has now become more of an artifactory store. And that market also came to object store. Pretty much all the CI/CD frameworks underneath, you will find that if it's
Starting point is 00:44:36 not MinIO, it has to still be an object store. Probably if it's not MinIO, it will be some kind of public cloud. Even the models, another one that's growing out of that, the application artifacts are container images, but now we are seeing a new class of artifacts that are just machine learning models. So the models themselves are becoming not quite containers, but container-like solution or artifacts. They are kind of the data container.
Starting point is 00:45:05 The platform with that, so versioning, et cetera, with that other thing. Exactly. Yeah, the whole MLOps thing is all about that. All right, AB, anything else you'd like to say to our listening audience before we close? I think the questions were great. I enjoyed the discussion.
Starting point is 00:45:23 We touched upon everything. Okay, good. Well, A.B., thanks for being on our show today, and thank you to MinIO for sponsoring this podcast. That's it for now. Bye, Keith. Bye, Ray. And bye, A.B.
Starting point is 00:45:37 Thank you, Ray. Thank you, Keith. Until next time. Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.
