Orchestrate all the Things - MinIO, the de facto open source standard for multi-cloud storage, becomes a unicorn after a $103 million Series B funding round. Featuring Founder AB Periasamy, CMO Jonathan Symonds

Episode Date: January 26, 2022

MinIO took an early bet in the cloud, and in Amazon's S3 becoming the de facto standard for application storage needs. The bet is paying off. Article published on ZDNet ...

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. MinIO took an early bet in the cloud and in Amazon's S3 becoming the de facto standard for application storage needs. The bet is paying off. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration. Let's start from the beginning, which is since we haven't connected before,
Starting point is 00:00:27 I would just like to ask you, AB, to share a few words about yourself and your background and the founder story for MinIO. I'm AB. I'm one of the co-founders, and there are three founders in total, including Garima Kapoor and Harsha. And my background: I previously built a startup called Gluster, or Gluster File System. It's an open source distributed file system. Red Hat ended up acquiring that startup in the past. And I've been in the open source space as long as I can remember. Open source distributed computing and storage and system software. That's pretty much my background.
Starting point is 00:01:14 And the history behind MinIO: Harsha and I were working on Bionics. After Gluster, we were just having fun. And Harsha was part of the Gluster team as well. Garima wanted to do a startup, and that's how all of this came together. But we put aside Bionics because you don't want to be experimenting in a startup where everything has to move fast, right? So we picked an idea that's long-term and here to stay and not only grows, it actually multiplies and grows bigger and bigger, right? So every direction you look at,
Starting point is 00:01:56 what was very clear for us was data is going to be a problem, right? That is, anything you do around data is here to stay. We picked the persistence layer of the data stack, that is, storing and managing all the data. And the opportunity for us was in the public cloud. The public cloud started with object storage, and between different public cloud services, they are proprietary. They are stuck with each other. Like everybody is proprietary.
Starting point is 00:02:28 If you wrote an application on one cloud, you cannot move it to another cloud. And then outside of the public cloud, everything was file, block, VM, incompatible and old school. The SAN, NAS, HDFS type technologies were incompatible with the cloud. And that was the opportunity we saw, that if data is everybody's problem across small, medium, large, every business is going to struggle with managing huge amounts of data, and they have to standardize on one piece of technology, what would that be?
Starting point is 00:02:59 It was very clear that cloud is here to stay, and cloud will reset the infrastructure standards. And the storage for the cloud is going to be the storage for everybody. And object store is the technology underneath the public cloud storage services. So what we did was we built software that is exactly compatible with the cloud,
Starting point is 00:03:21 at least the leading cloud here, in this case, Amazon S3 cloud storage. We built that as compatible with Amazon S3, but then you can run it inside AWS, on Google Cloud, Azure, all the way to the edge. You can run anywhere. It's software, it runs like your MongoDB or Postgres or any other data store, except it's used to store massive amounts of data. And that's basically the story. That was the idea. We built a piece of software that was very easy to use and released it to the community as open source, and it grew virally.
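To make the compatibility point concrete, here is a minimal sketch, not MinIO's own code, of what "same API, different endpoint" can look like from the application side. It assumes path-style S3 addressing (`<endpoint>/<bucket>/<key>`); the helper name `object_url` and the self-hosted endpoint are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style S3 object URL: <endpoint>/<bucket>/<key>."""
    # quote() keeps "/" unescaped by default, so nested keys stay readable.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


# Assumption: the second endpoint is a made-up address for a self-hosted,
# S3-compatible server; only the endpoint differs between the two calls.
AWS_ENDPOINT = "https://s3.us-east-1.amazonaws.com"
SELF_HOSTED_ENDPOINT = "https://minio.example.internal:9000"

for endpoint in (AWS_ENDPOINT, SELF_HOSTED_ENDPOINT):
    print(object_url(endpoint, "logs", "2022/01/26/app.log"))
```

The application logic stays the same in both cases; under these assumptions, moving between clouds or to the edge is a configuration change rather than a rewrite.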
Starting point is 00:03:57 Yeah, in retrospect, and I have to admit in advance that I wasn't familiar with MinIO before this announcement. In retrospect, seeing what you do and how you go about it, it all seems so obvious. So one question I have to ask is, when did you actually start that? I don't think you mentioned that previously. Yeah, we officially incorporated the company in November of 2014 and then things started coming to shape in 2015. Taking that into consideration, yes, it definitely looks obvious now.
Starting point is 00:04:37 Probably didn't look that obvious back then. And this is why you are in the position you are in today. You were pioneers in a way, I guess. Yeah, it feels so, but it is true that when we started in 2014, when I talked about this problem, everyone thought that if we are not useful inside AWS, we will not be useful anywhere else. And then I was like, inside AWS,
Starting point is 00:05:08 you already have AWS S3. Sure, it's a piece of software that can run inside AWS too. But the real problem is all the world's data, they have to standardize on one technology and one API. It did not make sense to a lot of people. It is possible that AWS can become the world's data center and all of the world's data goes to AWS S3.
Starting point is 00:05:33 We all can go home, right? It's easier. But it was hard for me to digest that that would be the case. My bet was that AWS will be very successful in convincing the industry the right way to build infrastructure, but it's highly unlikely that AWS will be the only infrastructure around. And if that's going to be the case, too many standards of APIs is not going to cut it. But then on the other hand, the world is producing so much data. That was an obvious problem. Every major trend, from AI/ML to, you name it, right, like IoT, autonomous driving, anything you pick, more data is coming. And it was an obvious problem for me. And you don't have to be an expert to tell that 10 years from now, the world will be drowning in data.
Starting point is 00:06:27 It was already the case to begin with. And from 7 billion people to trillions of devices producing data, it was so clear. And for us, we didn't have to be genius to come up with this conclusion. Well, you must have done something right. I mean, besides being at the right time and the right place, you must have done something right, because then to bring this all into context,
Starting point is 00:06:54 the occasion for having this conversation is that you're getting a pretty massive round of funding, which also happens to raise your valuation to so-called unicorn status. So, if you'd like to just share the specifics around this announcement and, well, how did you get there?
Starting point is 00:07:13 And to add some color to the question, it also has to do with the fact that your main license model is AGPL, if I'm not mistaken, which is, well, let's put it that way, it's not the most common model among unicorns, let's say. Yeah. So for us, everything we do, it reflects in our brand. Our brand is our culture and it has to be built around honesty, trust and love, right?
Starting point is 00:07:48 If you have to build a long-term brand, it's really important that everything you do, you establish trust and you make your community and users fall in love with you. And in that sense, right? Like for us, when we talk about open source, it's not a marketing strategy for us.
Starting point is 00:08:04 It's a philosophy we believe in, and at the same time we also have the responsibility to run a business. The common theme you see across the open source space: they look at open source as a powerful vehicle to distribute and get the adoption, but then they start cannibalizing their own adoption by making some important features proprietary. They go to open core model. Then you are neither open source nor fully proprietary. Even if you are completely proprietary,
Starting point is 00:08:32 you can tell your customers and community, this is it, right? You are being honest. But for me, it did not make sense. It was, for us, open source is about software freedom. And we have to be honest about it. Either you believe in it or you don't. The AGPL license allowed us to be that.
Starting point is 00:08:51 That we are actually, I would say, probably one of the few or the only one who can say that it's the exact same binary bits that a customer and a community gets. I'm not holding back any feature. Every important feature from advanced encryption, ransomware protection, name it, right? Every major enterprise feature is there in the open source product. But then how do we make money?
Starting point is 00:09:16 It's actually quite simple. AGPL license is a license that's for open source community. If you are building open source community, if you believe in software freedom equally, then we are in the same club. We all can benefit together from each other's contribution. But if you are building commercial proprietary application, you don't believe in software freedom, then those people I don't want to ignore. In fact, they are the one who can fund this. They would pay for a commercial license because they cannot open source their derivative works, their application. AGPL is actually the purest license of all.
Starting point is 00:09:50 Others tend to pick something like MIT Apache permissive license. And then the result of it is if you see every one of those companies who started with Apache license ended up as an open core making important features proprietary. That was in conflict with our core philosophy and beliefs. Okay. I think I saw, you know, looking around, doing a little bit of research before the conversation, I think I saw some reference on your website to something like object storage as a service.
Starting point is 00:10:22 And seeing that in my mind, I felt like, okay, so maybe this is part of the business model. So you actually use your own software to do precisely that: offer object storage as a service. And this is how you're able to make money while at the same time being open source with the AGPL license. So we actually don't provide service, right? So it is just a software.
Starting point is 00:10:48 It is no different from, say, Red Hat Enterprise Linux or MongoDB or any of these pieces of software that you would pay for the software license. You download the software, run in your infrastructure. We did launch MinIO as a marketplace offering tightly integrated with the public cloud services. So you can easily click and deploy and run your own multi-cloud object store. You are still in control of your data and software. Our community and customers deeply care that if it's a service, then their data is managed by us. I have access to their data and it's not
Starting point is 00:11:32 something that makes them feel comfortable, right? Whereas it's a piece of software, they can read all of the source code, they can download, they can run, but managed service or a marketplace offering gives you the fine balance. It still gives you all the benefits, somewhat closer to the service model, but you are in control. The software is running in your infrastructure and your data is in your control. I cannot see your data. That's actually very important for most of these modern enterprises today. Yeah, and I think I want to just double click on that because the confusion may come from the fact that we've made it very, very simple for IT to deploy object storage as a service
Starting point is 00:12:12 to their developer community. And so with just a handful of clicks from the MinIO console, they can now deploy object storage as a service. Where heretofore that was almost impossible. It was a very, very complex undertaking, whereas now it's a very, very simple undertaking. You don't even need to know how to spell Kubernetes to basically deploy a Kubernetes-powered object storage as a service to your clients internally. And so that's one of the reasons that MinIO's adoption has accelerated really over the last
Starting point is 00:12:40 18 months is that that capability just didn't exist in the market until we brought it to the market. Okay. Yeah, I realize that a lot of the appeal and also a lot of the capabilities that the platform comes with seem to come precisely from leveraging Kubernetes, basically, which enables you to be operational on all clouds, public and hybrid and whatnot. Yeah.
Starting point is 00:13:13 When we started MinIO, Kubernetes was not there, right? Like Docker, Docker Compose and Swarm. Then you look around from Cloud Foundry, Mesos, like all the way to VMware, every one of these technologies were incompatible with each other again. But at that point, MinIO itself was designed in a way that it followed all the cloud-native principles, that it has to be lightweight, scalable, almost runs like a microservice, except it's a persistent state because you store data in MinIO.
Starting point is 00:13:48 But we waited for the market to consolidate. As Kubernetes grew, it was quite clear for us that they got the ideas right. And we took a serious bet on Kubernetes. But MinIO itself, it can run from Raspberry Pi to all kinds of high-end servers, even POWER architecture, from really small to really large deployments. Kubernetes, the declarative model and the simplicity of Kubernetes allowed the application developers to manage their containers. Kubernetes itself came as an afterthought. Once they containerized the application, they actually put MinIO along with their application containers, but then they needed
Starting point is 00:14:30 something to naturally orchestrate their containers. And Kubernetes abstracted the differences between different clouds and even the private cloud. Once you built an application on containers and Kubernetes, you can move from VMware Tanzu to Rancher, OpenShift to any one of the clouds. It was an inevitability. We watched the trend, listening to our community. We naturally gravitated to what the community was consolidating towards. It made a lot of sense for us too. MinIO itself is not really tied to Kubernetes, but it became a natural fit because both of them were cloud native.
Starting point is 00:15:10 Okay. So, yeah, it seems like the core of the bet that you took early on was precisely around S3 and building something that is compatible with S3, while at the same time supporting all the different formats that different clouds work with. And I wanted to ask if there was something special that you did to achieve that from a technical point of view. So the core bet, while it is betting against Amazon S3 versus us, but the real
Starting point is 00:15:49 story behind it was, I actually like to go into a market that's quite crowded and then disrupt it with a better product. That was really the differentiator. That was really the driving force behind it. And then how do you disrupt is really the differentiation. It's very hard to go and create new markets. Instead, if I'm a product guy, and I would rather go into a market that's very established, and the market is already there. And it's the good part about object storage market is, even though it looked like crowded by the time I started, there were many players, but I knew that it will consolidate.
Starting point is 00:16:32 But at the same time, market will have an explosive growth. If I, when it consolidated, if I'm one of the last few players standing, then it will result in a huge success. So I knew the costs were too high, but then I also knew we can beat the competition with a better product.
Starting point is 00:16:52 And what I found was across the industry, you look around, the storage guys, particularly they came with a very old school mindset. And they're all like typical file system developers. In fact, I myself came from Gluster file system, but Gluster file system is like no other file system. I took different ideas. I didn't follow the crowd.
Starting point is 00:17:14 The problem in the storage space, particularly the enterprise space, they like to build complex, shiny, heavy products. That's not what the industry wants, particularly when you have lots of data and big infrastructure, building complex products falls apart. For me, it wasn't that hard to see that differentiation.
Starting point is 00:17:33 It's not like, oh, they got these clever dedupe algorithms or compression algorithms. All of the building blocks that were needed to build a better product were just there. It was simply a matter of being focused, taking a minimalist view that I found was very hard and rare in the enterprise space. They always think about how can I do more features, and ended up creating bloatware. I found that if I were the user, how would such a product be really? What would I expect from an architect? We relentlessly focused
Starting point is 00:18:08 on just the right features, and simplicity was the core driving design philosophy. It's a mindset that allowed us to differentiate. A better product, for us, was a product that resonated with the users: just the right features, being super disciplined. Like all the time, even early on, community will ask, why S3 API? Why not Swift API? At that time, OpenStack was still not dead, right? There was even HDFS API, all of it. I was like, pick one, right? It's not like I'm a fan of S3 API, but if you want Swift API, then it has to be Swift API. And I'll do only one thing, only one way to do it right. And I will not do multi-protocol.
Starting point is 00:18:52 That's not a minimalist design philosophy. They would even ask why not add file API? You can do a multi-protocol file, block and object. I'm like, that's a grand unified theory of storage. It sounds nice on paper, but these are completely independent problems. And the whole idea about object storage was to let go of legacy stuff. If I bring back the legacy stuff, then it's not plus plus, it's minus minus. I was just disciplined to say no a lot, and that's what allowed us to get here. Can I add a couple of things to that? So one is we owe this giant debt of gratitude to Amazon for the S3 API, right? They changed the developer mindset in terms of how you do storage. The
Starting point is 00:19:39 second thing is that because of the open source model and because of our focus on that just one thing, i.e. object storage through the S3 API, we were able to basically harden our implementation of that S3 API in a way that no proprietary software vendor could ever do. So we have tens of thousands of customers that hammer on our S3 API every single day, and we release weekly. So if there's any issue, we know about it within minutes, and we patch it within an hour or so. But that hardening of that and the large amount of people that are hammering on it every day have given us a significant advantage and have given the community and, you know, by extension, our users and our growing user base, the confidence to know that any application is going to run against our S3 API seamlessly.
Starting point is 00:20:36 True. Since S3 and its API, it's obviously such a central thing to what you do. I was wondering if there was ever any point in time that you were concerned about? Well, you know, the famous Oracle, Google API case. If there was any point in time that you were concerned that, you know, this may be an issue at some point for you? Yeah, it would be equally a bigger issue for Amazon as well. If they change the API, they would first... First, it's an API like you mentioned. It's not a standard.
Starting point is 00:21:14 In fact, the API definitions they published, they are more like a religious text. There are many ways to interpret and their own SDK is interpreted differently between different languages. Even within a Java SDK, different versions of Java, AWS Java SDK would interpret that API differently. We got the implementation right. But then now the issue of like it is now Amazon proprietary, it's not just about the legal rights to use, right? Even Amazon can simply keep changing. Even if they never sued anybody using AWS S3 API, but then they are changing the API because they control the upstream.
Starting point is 00:22:00 They control their own standard. If they keep changing, it is enough to upset the industry. But if they keep changing, it would also hurt their own customers. So the API is pretty much converged and Amazon services support some of the oldest APIs too. So it's not going to change, but they will keep adding. If they keep adding, we also keep adding. We are neck and neck with them. And today we reached a point of API
Starting point is 00:22:26 saturation, that it has all the right features that industry needs. Adding anything more is looking more like vanity. But overall I think we got the right functionalities, all the modern functionalities, both of us neck and neck, fully compatible with each other. But Amazon overall has been only kind to us and to everybody else. They actually saw that their mindset is somewhat similar to us too. If S3 API becomes popular, it means it'll result in more applications being Amazon ready. And their belief is that they can build a better product. They can build the best of the breed product compared to anybody else. And if that's the case, if S3 API indeed becomes a standard, it is now almost a standard, because their belief is they are the best implementation, industry will gravitate towards AWS and they will win anyway. Our bet is that, yes, we believe that
Starting point is 00:23:22 same thing too. But unlike Amazon, Amazon thinks that they will build a service and they will be the world's data center. Our bet is that we will build a better software and we will turn every infrastructure into an object storage service and we can build a better product than Amazon S3. But at this point, so far, I would say Amazon has been only kind and they have actually been public about it. I've seen one of their VP R&D only encouraging, promoting MinIO to their own community.
Starting point is 00:23:52 They are not afraid because they are confident about their capabilities, their ability to build a better product. Overall, so far, everything is very cooperative, going well. But if any day Amazon says, legally, I'm blocking you, like what Oracle did to Microsoft and Google, if any time Amazon says, you cannot do it, it gives us the right to go publish our own standard. It's not like we don't know how to build a better standard. We certainly do. S3 API has kind of become more complex than its roots. We can build a better S3 API. And today, we have a bigger ecosystem, if not equal. And if Amazon says it turns other way, we have a huge community.
Starting point is 00:24:40 If we publish a new standard and we publish it as an open standard that anybody can use without royalty, that would actually kill S3 API. So it's not in the interest of either party to just kill each other. Instead, consolidate, grow the market and let the winner take off. Yeah, I agree with you. It looks like it's a mutually beneficial relationship the way it is. When we started, our problem was that S3 was only available inside AWS. And what was the result of that? There was only a minority application segment that spoke S3 API.
Starting point is 00:25:16 And if this is going to be the case, we have no role to play. But first, we needed applications to emerge that speak S3 API. We worked very hard to create the ecosystem outside of AWS, which was actually larger, all the way from VMware environments (right now VMware ships MinIO as part of their stack, vSphere) to even converting your NetApp, HDFS, you name it: every one of these infrastructures can become S3 compatible. We wrote the first S3 v4 API implementation. We promoted it and created an ecosystem. Now that the ecosystem is built around MinIO, the problem pretty much has gone.
Starting point is 00:25:57 S3 API availability of applications speaking S3 API outside of AWS today is no longer a problem. In fact, it has become an advantage. These applications only know how to speak S3 API, and they lost their ability to talk to SAN and NAS today. Yeah, it's true that you do support a wide variety of environments. I have to say there is one in particular that piqued my interest, and that's the HDFS system. Because of, well, I'm one of those people that kind of lived, let's say, the rise and fall of Hadoop and its file system.
Starting point is 00:26:37 So I'm wondering how much of a market pull do you see for people who have invested in Hadoop and HDFS and now want to move away from that? We consistently come across, particularly our large deals, they are actually big data replacement. Why are they large deals? Simply because our pricing model is consumption-based. More data means bigger checks. And we found that most of these organizations, where is most of the data sitting? It's actually HDFS because they were machine generated data. Machine
Starting point is 00:27:26 generated meaning some form of logs. It could be application logs, all kinds of logs. Inside the machine, software writes these logs. That data coming in on a continuous basis, that filled these data volumes. We are talking about petabytes and petabytes of data. Most of these organizations, if they have 10 plus petabytes of data, if you look around, you ask them, where is it sitting? They will tell you it's HDFS. It's not some HCI, like Nutanix or VxRail type HCI, or some kind of NAS. It was in HDFS. HDFS actually played a major role in taking the control out of SAN and NAS and giving them a software-defined, large-scale, distributed storage system on commodity hardware.
Starting point is 00:28:14 But the problem with HDFS, as we all see, is that it's now collapsing under its own weight because it was complex, it was incompatible with the cloud, and our most popular deployments, like the top use case, are around AI/ML, big data analytics type workloads. We are routinely replacing Cloudera stack and HDFS with MinIO, Kubeflow, Kubernetes containers, data pipeline type architecture, and there is no HDFS in that model because HDFS isn't compatible here. But I would still give credit to HDFS for opening the market, and even with object store, there is a driver called S3A adapter. It makes object store S3 API look like it is a Hadoop compatible file system, and unmodified Hadoop applications
Starting point is 00:29:05 can be brought into Object Store. What is now surprising is the speed at which the AI/ML applications are getting rewritten. They are actually speaking native S3 API and they don't go through Hadoop HCFS translation layer. It does make it more efficient. Interesting. Thank you.
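As a rough illustration of the S3A route mentioned here, the sketch below assembles the Hadoop S3A connector settings that point an unmodified Hadoop or Spark job at an S3-compatible endpoint. The `fs.s3a.*` property names and Spark's `spark.hadoop.` prefix are standard; the helper names, endpoint and credentials are placeholders invented for this example, and a real deployment may need further settings.

```python
def s3a_settings(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Hadoop S3A properties for talking to an S3-compatible object store."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style access is commonly needed for self-hosted endpoints.
        "fs.s3a.path.style.access": "true",
    }


def to_spark_conf_args(settings: dict) -> list:
    """Render the settings as spark-submit '--conf spark.hadoop.<name>=<value>' pairs."""
    args = []
    for name, value in sorted(settings.items()):
        args += ["--conf", f"spark.hadoop.{name}={value}"]
    return args


# Hypothetical endpoint and credentials, for illustration only.
settings = s3a_settings("https://minio.example.internal:9000", "ACCESS", "SECRET")
print(" ".join(to_spark_conf_args(settings)))
```

With settings like these, a job would read `s3a://bucket/path` locations where it previously read `hdfs://` ones; applications that speak the S3 API natively skip this translation layer entirely.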
Starting point is 00:29:27 I think as we're about to wrap up, just one final question. What are your future plans? What are you going to be spending this money on? I would say that this money has to be spent on educating the market at an even
Starting point is 00:29:43 broader level. Object storage has plenty of capabilities for today's needs, I think, for some time to come. We'll continue to keep up the product innovation. Those are actually easier things, right? I would say making it even easier. The product, the technology operating at scale is not hard. But to understand, even writing a good policy definition requires a deeper understanding of the API today.
Starting point is 00:30:09 But that's not a good thing, right? You will see a lot of times that on Amazon, like the breaches happen simply because they wrote poorly written policies or they just gave permission to everything. It's not because S3 is insecure and it's the nature of the S3 API. I actually want to reach a point where average technician grade people with no expertise in storage, they can't spell Kubernetes. Can they operate at scale securely? That to me is better user experience.
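The point about poorly written policies can be sketched with a small check. The policy documents below follow the standard S3/IAM JSON policy shape (`Statement`, `Effect`, `Action`, `Resource`); the function name and the bucket name `app-logs` are hypothetical, and this is only a simple heuristic for spotting "permission to everything" statements, not a full policy evaluator.

```python
def _as_list(value):
    """Policy fields may hold a single item or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list:
    """Return indexes of Allow statements granting every S3 action or every resource."""
    flagged = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if any(a in ("*", "s3:*") for a in actions) or "*" in resources:
            flagged.append(index)
    return flagged


# "Gave permission to everything": any S3 action on any resource.
permissive = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# Scoped alternative: read-only access to one (hypothetical) bucket.
scoped = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::app-logs/*"],
    }],
}

print(overly_broad_statements(permissive))
print(overly_broad_statements(scoped))
```

A check along these lines is one way tooling could catch the permissive case before it reaches production, which is the kind of guardrail the "average technician" scenario would rely on.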
Starting point is 00:30:39 But even bigger than that is just educating the market on what is object storage. You would find at a small scale, they would start sticking the data blobs in database, not knowing that they should be using object store or enterprise IT still stuck with file and block. And that's a big, there is a big market out there that they are not DevOps. They are not the geeks like us. And they are drowning in data too. If we educate the market at a broad scale, explaining the very basics,
Starting point is 00:31:16 starting with what is object storage to using them at large scale, that requires a creation of large amounts of content and distributing it to the right hands. That for us is marketing. It's not like spending on big trade show banners and stuff, right? It's just for content creation and distribution, we would invest heavily in educating the market
Starting point is 00:31:38 about object storage at a large scale. Great. Thank you. And well, success. I wish you much success going forward and again,
Starting point is 00:31:49 congratulations for the funding. Thank you, George. Thank you, George. I hope you enjoyed the podcast. If you like my work, you can follow
Starting point is 00:31:58 Linked Data Orchestration on Twitter, LinkedIn and Facebook.
