Screaming in the Cloud - Open Source, AI, and Business Insights with AB Periasamy

Episode Date: March 14, 2024

Join Corey Quinn and MinIO's co-founder and CEO, AB Periasamy, for a look into MinIO's strategic approach to integrating open-source contributions with its business objectives amidst the AI evolution. They discuss the effect of AI on data management, highlight the critical role of data replication, and advocate for the adoption of cloud-native architecture. Their conversation examines the insights of data replication, mentioning its pivotal role in ensuring efficient data management and storage. Overall, a recurring theme throughout the episode is the importance of simplifying technology to catalyze a broader understanding and utilization that can remain accessible and beneficial to all.

Show Highlights:
(00:00) - Intro
(03:40) - MinIO's evolution and commitment to simplicity and scalability.
(07:25) - The significance of data replication and object storage's versatility.
(12:12) - Challenges and innovations in data backup and disaster recovery.
(15:21) - Launch of MinIO's Enterprise Object Store and its comprehensive features.
(20:50) - Balancing open-source contributions and commercial objectives.
(30:32) - AI's growing influence on data storage strategies and MinIO's role.
(34:33) - The shift towards software-defined data infrastructure driven by AI and cloud technologies.
(39:40) - Resources and the future of tech
(43:31) - Closing thoughts

About AB Periasamy:
AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS which was acquired by Red Hat in 2011. Following the acquisition, he served in the office of the CTO at Red Hat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India. He earned his BE in Computer Science and Engineering from Annamalai University.

Links:
MinIO: https://min.io/
Kubernetes: https://kubernetes.io/
AWS (Amazon Web Services): https://aws.amazon.com/
Twitter: @abperiasamy

Transcript
Starting point is 00:00:00 All you need is replication, and replication, copying the versions, the historic versions of the data, you actually solve the problem. Welcome to Screaming in the Cloud. I'm Corey Quinn. Back again after a little over a year to talk about what's new and exciting over in his world, we have AB Periasamy, who is the CEO and co-founder of MinIO. Thank you for joining me again. How's your year been?
Starting point is 00:00:30 It has been a wonderful year. Last year was kind of both ways, right? It saw the extreme. Customers had difficult time dealing with budgets. They even had to shrink existing deployments. One side, we saw extreme pressure on the budget. At the same time, within the same year, some of these customers actually expanded to a scale they never thought about. That was because of AI. We actually saw
Starting point is 00:01:00 both extremes, but net-net, towards the end of the year, it turned out to be our best year ever. I would say it was the worst and best year, but net-net, we delivered. This episode is brought to us in part by our friends at MinIO. With more than 1.1 billion Docker pulls, most of which were not due to an unfortunate loop mistake like the kind I'm like to make, and more than 37,000 GitHub stars, which are admittedly harder to get wrong, MinIO has become the industry standard alternative to S3. It runs everywhere.
Starting point is 00:01:34 Public clouds, private clouds, Kubernetes distributions, bare metal, Raspberries Pi, co-locations, even in AWS Local Zones. The reason people like it comes down to its simplicity, scalability, enterprise features, and best-in-class throughput. Software-defined, capable of running on almost any hardware you can imagine, and some you probably can't, MinIO can handle everything you can throw at it, and AWS has imagined a lot of things, from data lakes to databases. Don't take their word for it, though.
Starting point is 00:02:11 Check it out at www.min.io and see for yourself. That's www.min.io. The timing of this conversation is apt. From this recording, in about a week and a half, I'm giving a talk in Pasadena called Terrible Ideas in Kubernetes. And the week after that, I'm giving a talk here in San Francisco at SREcon about the economics of on-prem versus cloud. And this does tie directly into what you're talking about, because for the last couple of months, I've built a Kubernetes of my own in this spare room. And installed it as I was going down that particular path. And that was quite the learning experience.
Starting point is 00:02:46 The first time I did it, the entire cluster exploded and more or less ate itself. And what I learned at the time was, okay, it had nothing to do with MinIO itself and everything to do with the underlying storage subsystem, which did not work
Starting point is 00:02:59 if there was a symlink involved in its path. Instead of throwing an error like a sensible system might, it just tore the cluster down around its own ears. So that was exciting and fun. The second time went much more smoothly once I got those problems taken care of. And now I have an object store in the spare room,
Starting point is 00:03:16 just like, you know, mother intended. So that was exciting. And then there's also the economic story of talking about this at larger scale with companies that are doing this on more than a shoestring budget with a bunch of Raspberries Pi like I am. So it's been an interesting month just from my perspective, thinking about the confluence of those two things. Yeah, actually, that's pretty much the pattern across the industry. When we started MinIO, our idea was just this. At that time, it did not make sense to a lot of people
Starting point is 00:03:47 because we were saying, when you outgrow AWS, you will come to MinIO. It almost was laughable, right? And if you see most of our large deployments, actually customers, when they reached a certain scale, cloud was pretty expensive when it comes to data infrastructure. Data infrastructure never shrinks, it actually accelerates. New data comes in at an even larger pace than all of the past data put together. At that point, then they
Starting point is 00:04:15 actually look at colo-type infrastructure. And what they experience with MinIO is the simplicity. We made it ridiculously simple that it can even run on Raspberry Pi and laptops and home NAS systems not because that was the intended use case. It has to be very simple if you want to build at
Starting point is 00:04:35 exascale. And that simplicity naturally led MinIO to a larger install base. But our focus has always been that enterprise at large scale, data is going to be heart of the business. And if you build something simple enough that a combination of, let's say, Kubernetes for compute and MinIO for data infrastructure, that will give anybody a cloud infrastructure. The simple message was in the long run, we always knew the world will look like AWS or it is AWS. And this is where
Starting point is 00:05:07 cloud was built on open source and open source further accelerated and Kubernetes replacing Docker, Mesosphere, a bunch of other technologies, even Cloud Foundry. When we started, there were many alternatives and it was chaotic, but it's the nature of open source. Eventually, when the dust settled, Kubernetes took the compute and object store took the data side. And within object store, MinIO actually became the leading player and credit to its simplicity. There's a use case for everything. For example, everything that I've installed will speak to arbitrary object stores. In fact, in many of the examples I find, oh, here's how to use MinIO. And I do that
Starting point is 00:05:45 for some things. And for others, it would be a wildly inappropriate fit. For example, oh, where do I want to back up the storage volumes to? Well, maybe not the cluster itself. That's like the snake eating its own tail. And you kind of want these things somewhere else that you could use to rehydrate it. But then I started getting cost alerts because, huh, that's starting to cost a fair bit of money for a bunch of metrics that you don't actually need. And OK, so now I want to be judicious as far as what are the volumes I really want to make sure that I can recover? Not that many. But I also don't want to back it up to itself, because if there's a small to mid-sized fire, given my propensity to be lousy with a soldering iron, which is always a possibility, I want to have the ability to go forward. But having something locally that isn't, you know,
Starting point is 00:06:30 charging per API call or winding up metering what I do in strange ways in a test account is liberating in a very strange way. It means I don't have the same, I guess, economic sword of Damocles hanging over my head. But there are other problems with it. I have to think about disk capacity. I have to think about nodes dropping out of a cluster. Whereas with cloud, I don't have to worry about those specific things. I have other problems that I get to worry about.
Starting point is 00:06:59 So it's very much a trade-off. And there's no one path that's going to work for absolutely everyone. But I like the ability to wind up smoothing over some of the key differences. Because historically, if you wanted an object store, you were either going to be paying a cloud provider, or you were going to be buying something relatively janky and hoping it worked. Once you wind up, you know, making sure that your storage volumes don't crumble under the load of actually having data pass through them, MinIO is pretty great at that. Yeah. And this is actually a much deeper topic,
Starting point is 00:07:28 but I'll touch upon some of the problems that the industry is facing today. The backup itself has many terms. In fact, if you even see the backup vendors, the traditional backup vendors who backed up VMs and databases, they moved on to calling it copy data management, and from copy data management to cloud data management. Now they are into cybersecurity and AI. So that market is one side. On the other side, we are finding the volume of data is so large. If you actually look at the data lakes, the analytics data, the data that's feeding AI, that data is many folds.
Starting point is 00:08:07 It's not like just VM backups and some small-scale database backups. How do you back up object store is one question. Then how do you back up into object store is another. Replication, right? Yeah. Actually, interestingly, here's the thing. So why do people back up? Very simple thing.
Starting point is 00:08:27 If any disaster strikes, I want to be able to go back in time. So the real reason why they back up is, intentionally or unintentionally, I deleted, overwrote, something went wrong, I want to go back. And it's point-in-time recovery. And traditionally, industry did it with snapshots. The SAN and NAS did not give continuous data protection. You basically did snapshots and snapshot window, anything in between you could lose the data. The snapshots, because they are read-only clones
Starting point is 00:08:56 of that particular point in time, you then take a copy of the data to a remote site, but it would work for small-scale. When you have even 10 petabytes of data, the rate at which data changes in object store, the flowing in of data, you cannot possibly sync it up. When you take snapshots, it's not good enough. What customers want is to even file by file in object store,
Starting point is 00:09:21 it's called object. Every change you can actually capture, that's something that's unique about object store that SAN and NAS cannot do. When you have billions, sometimes hundreds of billions or trillions of objects, you can exactly point in time, this time exactly how this object looked like,
Starting point is 00:09:38 you can say that with object store because every mutation is an independent version. So you got point-in-time recovery by default built into object store when you enable object-level versioning. Now, I can go back in time, but what if the new update I rolled out somehow corrupted stuff or application accidentally deleted it, some kind of tampering happened? Then all you need is replication.
Starting point is 00:10:03 And replication, copying the versions, the historic versions of the data, you actually solve the problem of DR, point-in-time recovery with active-active synchronous replication. And you can even go multi-site replication. The enterprise still would feel comfortable if they're able to take a backup software and take a copy of the data and then
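Editor's note: the point-in-time recovery described above can be illustrated with a toy in-memory model. This is a minimal sketch, not MinIO's or S3's actual API. The class `VersionedBucket`, its methods, and the integer timestamps are hypothetical names chosen for illustration, and the sketch assumes writes arrive in increasing timestamp order.

```python
from bisect import bisect_right


class VersionedBucket:
    """Toy model of object-level versioning (hypothetical, not a real SDK)."""

    def __init__(self):
        # key -> list of (timestamp, data); data of None marks a delete.
        self._versions = {}

    def put(self, key, data, ts):
        # Every write appends a new independent version; nothing is overwritten.
        self._versions.setdefault(key, []).append((ts, data))

    def delete(self, key, ts):
        # A delete only adds a marker, so older versions stay recoverable.
        self._versions.setdefault(key, []).append((ts, None))

    def get(self, key, at=None):
        """Return the object as it looked at time `at` (latest if None)."""
        history = self._versions.get(key, [])
        if at is not None:
            times = [ts for ts, _ in history]
            # Keep only the versions written at or before `at`.
            history = history[:bisect_right(times, at)]
        if not history or history[-1][1] is None:
            raise KeyError(key)
        return history[-1][1]


bucket = VersionedBucket()
bucket.put("report.csv", b"v1", ts=100)
bucket.put("report.csv", b"corrupted", ts=200)  # a bad application update
bucket.delete("report.csv", ts=300)             # an accidental delete
recovered = bucket.get("report.csv", at=150)    # go back before the bad update
```

Because each mutation is kept as its own version, recovery here is a lookup rather than a restore from a snapshot window. Replicating the full version history to a second site, which this sketch does not model, is the step that turns this into disaster recovery.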
Starting point is 00:10:25 put it in some third-party system. Unfortunately, those systems for a small data set, if it's so critical, some parts of the bucket or some buckets, it's small enough, you want to do it, you can do it. When you want to back up a 100 petabyte volume, you also ask yourself that
Starting point is 00:10:41 now if this is the primary store in the cloud, they already answered this, right? There is only object store and there are different tiers of object store. Now anything you want to back up object store into that should not cost more than object store. At least equal I can understand
Starting point is 00:10:58 it can be cheap, it should be cheaper because you are not doing primary active IO on that system. Today any of the systems, whether it is tape or any, I don't want to name these vendors, but any of these secondary data management systems, if you take, they are actually more expensive than object store itself. Industry is going through a chaotic phase
Starting point is 00:11:18 because they are caught with the scale that they've never seen and the traditional backup software is falling apart. It's one of those areas that I've found where there's a lot of nuance and a lot of variance. I'm building something else unrelated to most of the stuff I talk about here, where, okay, I actually had to build a DR plan that would pass the sniff test.
Starting point is 00:11:38 And the honest answer was, is that, okay, there's not that much data we have that is not able to be reconstructed relatively easily, but we're taking that, and then we're snapshotting it over to an object store, and then we're replicating those not just to another region, but to another account. And the rule is, if you have access to one,
Starting point is 00:11:55 to the production environment, you do not have access to, write access, to the failover story, an overrided delete story, and vice versa, because it's about what, at least for my use case, what if you wind up getting compromised and someone acting as you logs in? I don't want to be able to have that be a blow out the company style of story. But I don't need that to be another provider at this point. If this ever were to grow to a point where you have to start explaining to auditors why you're not, then okay, switching that over is not the hardest problem to solve for. But you want to and you have very reliable storage system, NVMe drives, all that.
Starting point is 00:12:47 It does not matter. You rolled out an update to your application, and it just started overwriting corrupted data. Many reasons this can happen. Now, the versioning of object store automatically gives this for free when every new change happens. You can actually go back in time, but that's not the problem.
Starting point is 00:13:09 What if someone intentionally went and cleaned up that bucket? Sometimes it could be a malware attack, right? Some kind of third-party system. There was a data breach, and they went and explicitly deleted old versions. This is why you have object locking. Object locking, again, is something that compared to SAN or NAS,
Starting point is 00:13:32 in SAN or NAS, you never had file-level versioning. In object store, you have object-level versioning. And each and every version, every mutation that happened ever on your object store can actually be locked. It can be locked in two ways, like compliant. That can get expensive applied to the wrong things. Yeah, the thing is, you get this for free with object store.
Starting point is 00:13:52 And it's not just specific to MinIO, right? Amazon has it too. And these are actually vetted out by third-party experts called like Cohasset Associates. They understand this deeply. They do the assessment on Amazon. They did for MinIO, and they gave the assessment. This has to be compliance
Starting point is 00:14:08 grade. And there are modes where you can say, okay, I want it to be default lock upon creation of the object or any mutation. Nobody can change this. And I would say up to six years. After that, you can do whatever you want with it. Or sometimes
Starting point is 00:14:24 you can say that as an admin, I want to be able to unlock. Now I want to be able to delete this because I know for sure this has been abandoned. This project is decommissioned and I got approval from InfoSec team to actually delete the data, free up the space. Sure, admin can unlock it.
Starting point is 00:14:41 There is even a mode where the admin cannot unlock until that time comes. And that level of protection you have, and this is about every single object that every single change you did, that not just you can go back in time, you can be guaranteed that nobody can tamper with any such changes until the measured time elapsed.
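Editor's note: the two locking behaviors described above correspond to what the S3 Object Lock API calls governance mode and compliance mode. The following is a minimal sketch of the rule, not a real SDK call. The names `Mode`, `LockedVersion`, and `can_delete`, the integer clock, and the single `is_admin` flag are hypothetical simplifications (real systems gate the governance bypass behind a specific permission).

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    GOVERNANCE = "governance"  # a privileged admin may unlock early
    COMPLIANCE = "compliance"  # nobody may unlock before the retention date


@dataclass(frozen=True)
class LockedVersion:
    key: str
    version_id: str
    mode: Mode
    retain_until: int  # hypothetical integer timestamp


def can_delete(version, now, is_admin=False):
    """Decide whether one locked object version may be deleted at time `now`."""
    if now >= version.retain_until:
        # Retention period has elapsed: the lock no longer applies.
        return True
    # Before expiry, only governance mode allows an admin override.
    return version.mode is Mode.GOVERNANCE and is_admin


gov = LockedVersion("ledger.db", "v7", Mode.GOVERNANCE, retain_until=1000)
comp = LockedVersion("ledger.db", "v8", Mode.COMPLIANCE, retain_until=1000)

admin_unlocks_governance = can_delete(gov, now=500, is_admin=True)
admin_blocked_by_compliance = can_delete(comp, now=500, is_admin=True)
anyone_after_expiry = can_delete(comp, now=1000)
```

The lock applies per version, so every mutation of an object can carry its own retention date, which is what makes the history tamper-resistant as well as recoverable.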
Starting point is 00:15:03 It's one of those areas where it's a, it just becomes a concern that if you're not in that use case market, you don't understand or really appreciate exists. All it takes when I think I've seen it all is to talk to one more customer and suddenly my perspective on things tends to be turned on its head.
Starting point is 00:15:21 So relatedly, you have some news about a change coming out. Specifically, you're calling it the Enterprise Object Store, which, silly me, I sort of assumed that was the paid offering of what you already had. But apparently I am misled on that. What are you releasing? What's new and exciting? So it is an upgrade, a significant upgrade, I would say, to the paid offering. And what we saw with the enterprise was that towards last year, the last couple of years, we have been seeing the scale growing many folds.
Starting point is 00:15:52 And last year, particularly, we noticed that customers are reaching exascale. When you run at small scale, all you need is object store and Kubernetes. But when it comes to scale, you look at it not as an object store, you look at it as a data infrastructure. And when you have a data infrastructure, you don't need just MinIO. It's not just about MinIO alone. You need observability, catalog,
Starting point is 00:16:17 some global console that you can link multiple deployments, multiple tenants across multiple sites, sometimes even multiple clouds into one single console. From that to, let's say, a key management server. You turn on encryption, obviously, and then you need a key management server that can handle billions and billions of keys and also handle very high key creation and lookup per second.
Starting point is 00:16:41 So then I have all my data, but then how do I protect my data at the network level? You don't want the API to ever hit the server. Now you need a data firewall. All the firewalls out there are designed for application security. The closest thing that you have is web application firewall that understands HTTP traffic, but you still need to write specific rules. And that requires deeper understanding of data traffic, primarily S3 API, right? And then there are no data firewalls. So very soon, we realized watching customers that it's not just object store, they ended up buying a collection of products around it to actually complete the object store. We always had a definition of min in MinIO means minimalism.
Starting point is 00:17:29 And minimalism means the right quantity of something. If it is less, it's incomplete. If it's excess, it's more. It doesn't qualify as minimalism. So I wanted to keep it light. And every feature, we actually even tell customers, if you want a feature required, if you ask us a new feature, you have to give me 100 reasons why this is going to be useful for everyone. And if you ask me to remove, I would gladly do it.
Starting point is 00:17:55 But then when we looked at the object store, it was kind of incomplete when customers were deploying at scale because we were telling them to go bring these third-party components. Without it, you wouldn't be able to take... You can take MinIO to production, but operating MinIO in production, you needed these capabilities. And only then you can call this as a complete object store. So I saw those capabilities were essential part of the data infrastructure stack. So while we retain the capability for MinIO to talk to any key management server, any log monitoring system, any metrics monitoring system,
Starting point is 00:18:33 any firewall out there, load balancer out there, MinIO has the widest support integration for all these third-party services, but expecting customers to go do all the integration and supporting those and integrating those also fell on our lap. And that was the one time-consuming for us. So we clearly saw that what if they were purpose-built
Starting point is 00:18:56 just for MinIO and built into MinIO, you don't need to buy these custom batteries that were not designed for MinIO specifically. That's what makes this as enterprise object store. So the paying customers are getting all of these bundled in an upgrade that is the enterprise customers get all of these capabilities at no additional cost. There's a lot to be said for the approach. I find that very often in the land of Kubernetes, it's a, you can choose anything to do all of these parts.
Starting point is 00:19:26 And what that means is that there's no real golden path. Like being told which one of these things should I use and being met with it. There are so many different options. Great. I don't want to be set out to sea on an ice floe by myself if I'm having trouble with this stuff down the road. I want to have a commonly deployed approach to it. And that's one of the challenges you see with open-ended systems. Looking at all the things that MinIO can do,
Starting point is 00:19:51 that's great. That is fantastic for a number of use cases. I don't need 90% of versioning. That's awesome. I don't actually need it for my use case of my test lab here. Oh, the ability to wind up doing balancing and different tiers of storage. Great. These are Raspberries Pi. They are, there's not a whole lot of high-grade hardware versus low-grade hardware in here. It's, it's stuff that I do not need, but it's nice to know that it's there that let's be
Starting point is 00:20:16 honest, a mouse click away. Cause I am a strong proponent of click ops whenever I can be. It really does a decent job of meeting people where they are. It is open source and it has a remarkably high degree of polish for what I would consider to be a typical open source user experience. Did you find when you were building this out that there was a bit of a balancing act though to get there? Because it's a, on the one hand, you don't want to give away the thing that actually drives your revenue for free to the entire world and just hope that they'll do the
Starting point is 00:20:44 right thing and pay you because at a certain point of scale, it's hard to get companies to be philanthropic, but you also don't want to make the free experience so crappy that no one adopts it unless they're paying you. How do you strike the balance? It sounds difficult, but it's actually not that hard, right? End of the day, I always look at everything is a matter of trust and love. Brand stands for trust and love. I personally, for me, I call it free software, not open source.
Starting point is 00:21:11 But it's about software freedom and it's a philosophy. Either you believe in it or you don't. And then at the same time, MinIO is also a business, like you said. It's not a non-profit. I always believe that even non-profit requires funding to operate. A product like MinIO, it's not easy to pull off as a hobby-grade
Starting point is 00:21:31 project because if you are operating a giant data infrastructure and these are mission-critical financial information, you would not put it on a hobby-grade product, right? But then how do you build that level of resiliency? It's not like non-profits cannot do it. I think you can clearly see, from FSF to Apache Foundation to Linux Foundation, they've delivered incredible products. But for the time crunch we have, how fast we need to move, we as a company, I always found that it was much easier. If you have unlimited time, sure, right, then you can organically, with several mistakes here and there, you can get there. But I found that getting funded and running as a company also gets you to the same path.
Starting point is 00:22:16 Capitalist way of getting things done. Right. But if I believe in software freedom, then I can always be honest to the users, the community, and the customers. That is how you strike the balance. Even if it's proprietary software, if I just gave something away for free for a long time, and then suddenly one day I caught you, now you're stuck with me, now you have to pay me. If I forced them, I would lose them. So the core principles are always the same. Community or paying customers, you have a contract with them and it's built on trust. You never let them down. You can give more, but never take back what is given.
Starting point is 00:22:56 And the community for us is the heart of it. Our paying customers and community, the difference is that community is not hackers and hobbyists. Our community are our future customers. They are not playing with petabytes of data because they have some hardware lying around and they want to have fun. These are serious users. And when they reach a point in time where it is a critical part of their business and they are in production, these two points, when they click, right, they feel good that there is a commercial entity behind that when it reaches that maturity point,
Starting point is 00:23:30 they have someone to knock the door and become a paying customer. Red Hat showed that they could build a lot. They could build... At the time, Red Hat got bought by IBM. They were the largest software exit ever. Red Hat showed that open source or free software is not orthogonal. It doesn't compete with business.
Starting point is 00:23:53 It's complementary. And the same model works for us, too. As we make improvements, the paying customers, on the other hand, they have paid us. And I have an obligation to give them more value. And my end goal is to keep both of them happy, and the way to do it is to be honest. The community want source code. They want all the latest and greatest, and there is a huge, wide community. Enterprise customers, they actually wanted infrequent releases. They don't want to see the latest source code.
Starting point is 00:24:26 But at the same time, they want mission-critical reliability, and they want some neck to choke. And also, they were paying for all these clunky proprietary products around that they need to buy to operate MinIO. So we were replacing all of that as well.
Starting point is 00:24:42 So net-net, the balance is built on trust. I think it leads to a hard series of decisions, but you're right. The things that I use in my experimentation and in other capacities tend to be what I'm most familiar with. What I recommend to people when they have questions is what has worked for me, the things I am inherently most familiar with. And if, okay, if you gate it behind a, you must first pay us at least this much money to use the product, I am extremely unlikely to have much firsthand experience
Starting point is 00:25:11 with that in any of my fun experimentation style stuff. I'm not saying I'll never encounter it, but the odds of me getting to kick the tires on it are less. And as a result, it feels like it does, it definitely highlights that there is value to the open source distribution mechanism. But then you have the other side of it where companies seem to be taking the approach of, okay, now we're going to make sure that no one else can implement it themselves with source available licenses. Which, especially in your case, seems a little on the silly side.
Starting point is 00:25:38 I don't think that AWS is going to try to use MinIO code to implement an object store. They've already got one. It works in its own specific ways, and they are not grabbing stuff off the shelf to build S3. That's just the nature of that particular beast. So it's a delicate balancing act, but I think you've done a great job of mostly striking the right notes.
Starting point is 00:25:58 Yeah, the thing is, right, like the SSPL, Commons Clause, there are a bunch of licenses. They are not open source licenses, right? And they are not free software licenses either. If I choose that, I might as well say I am 100% proprietary. I will never mislead our community saying we are open source, but when it comes to actually trying it out, oh, wait a minute, you have these restrictions.
Starting point is 00:26:21 Those are proprietary licenses. There won't be any confusion. No one will be mad at us if you say, hey, MinIO is proprietary, right? But if I cause this ambiguity that MinIO is open source, but they look into the details of what license I have, and that is actually a proprietary license, that's where the friction is. And also that if you start as open source and you then one day suddenly change the license and go proprietary, that also you're taking back something you promised them.
Starting point is 00:26:48 They put their trust on you and you misled them, you let them down. That also causes friction. So you'd never take back what you gave them. And that is where the free software and open source approved licenses guarantee that. And maybe in the future, let's say there are companies
Starting point is 00:27:06 that adopt these open source and free software licenses, they can add new capabilities, not give back. Or in the future, they may even make it completely proprietary, but the community has the ability to fork and keep maintaining it, right? The thing is that you never take back something that you granted.
Starting point is 00:27:23 If you change the license like that and make it proprietary, that's point of friction. And also the other point of friction is the ambiguity that you call a proprietary license as open source license. I actually find that Apache license and AGPL license, they all have their own strengths and weaknesses. AGPL license for the server and the SDKs and everything, Apache license that works out great. And for the paying customers, that some of these enhanced capabilities are under proprietary license.
Starting point is 00:27:54 And I have no problem saying that it is proprietary. Community never gets upset about that. Other factors that also the industry tend to play with is what you give to the community. They want community as a... The come-in is like baiting them, right? You would classify these as baitwares. If you gave community something just enough
Starting point is 00:28:16 that they came in and you got them hooked, now essential features, like even Active Directory integration, it's very easy for them to just go. One engineer over two, three weeks can actually pull up an Active Directory integration. In the past, it was hard to do because no one had this code in the open source product.
Starting point is 00:28:36 MinIO and now other projects also have similar integration. You can understand how this works and contribute that capability. Now, what should I do? I should take those features, right? In MinIO's case, we only implemented all of these nice enterprise capabilities, the ones that you were talking about that even from multi-site
Starting point is 00:28:56 active-active replication is not even there on AWS, like instant active-active synchronous replication. From that to high-performance erasure code to all the enterprise-grade encryption capabilities, even the object locking that we were talking about, everything is available to the community. And when they go to production,
Starting point is 00:29:14 they were anyway buying these bunch of proprietary software to complete the infrastructure stack. And now I'm giving them so much more value replacing those proprietary components with the enterprise object store. So there was a nice balance. If I treated my community as if I just want to bait them, trick them into using MinIO, I would lose them. I'm better off going completely proprietary, educating the industry, hey, this is what I am. If I never mislead them, there are proprietary software vendors
Starting point is 00:29:47 doing just fine in the market. They never misled the community, right? So it's the same thing here. And I take some of this inspiration from the consumer products. Like if you look at whether it's YouTube or many of these products, they entirely disrupted the media industry,
Starting point is 00:30:04 the cable industry. And if they treated the user base as a way to just bait them, like the model as a bait, nobody would be using YouTube-like systems. They understood the delicate balance. They are an essential part of your system. And what you give and what you expect in return. As long as you are on the giving side, it's all good to go. One last topic I want to get into.
Starting point is 00:30:32 It seems like I can't have a podcast episode these days without touching on the zeitgeist here. But AI is absolutely something that is sucking up an awful lot of hype energy. And there's value here. It's not like crypto. It is clearly something that is useful. But whether you're doing your own model training or whether you're enriching what you have with existing access to data, it's hard to disagree that this is a scale problem and that access to a whole bunch of data is necessary in almost every
Starting point is 00:31:02 case. And I'm seeing that increasingly be object store. What are you seeing as far as what your customers are doing? Yeah, so it's now trendy for everybody to simply put, get a .ai domain name, right? And everything has become AI. But in our case, it's more true because there is a direct impact on our revenue. The reason why some of these GPU vendors,
Starting point is 00:31:25 their valuation is skyrocketing is because they are not able to make enough GPUs that customers can buy. And as long as that supply problem is there, that demand is there, they will be valued like that. And the first wave hit the GPU, the hardware vendors
Starting point is 00:31:42 who are directly serving the customers. Customers couldn't buy enough. The second wave, it's already hitting us. That's the data side. Customers are now realizing that now I have the GPU fabric, but then I need to build a data fabric. In the end, they actually look at data as their asset. Hardware, even GPUs, every one of them, they are fully aware in just a matter of two, three years, GPUs will become a commodity. Intel and AMD, all of them coming into the race,
Starting point is 00:32:10 and even ASIC players, there will be multiple GPUs. Every cloud vendor will make their own cheaper GPUs. GPUs are bound to become a commodity. Customers are now understanding the value of the data has grown multiple folds. Previously, with just the big data and analytics, traditional data science and machine learning alone showed the value of data is many folds. Whatever you spend on infrastructure is minuscule. And that has gone to a whole new level because we finally could understand real unstructured data, not the big data semi-structured data.
Starting point is 00:32:45 This one is complex text, human language, source code, anything that is quite long. Previously, machines had no idea. This was left to creative jobs. Only humans could do it. This has unfolded a whole new level of value. And for the first time, we are seeing enterprises have more than just snapshots and VM images. This time around, they are having audio, video, all kinds of unstructured data, documents that they would not previously store for a long time. Now, everything is valued. Every drive-through audio clip,
Starting point is 00:33:18 every conversation, every Zoom meeting, every document you can have, they are now capturing because you can easily make an LLM within hours. It can actually read billions of documents and become an expert on everything. It's like Data in Star Trek can read a book in a second. It's that experience enterprises have. A direct result of that is customers now see that
Starting point is 00:33:40 data is the core asset of their business. The scale, why we are finding it exciting is it has impacted our revenue already. And we are now talking about $10 million plus deals. And these are exascale deployments. And exascale was only in the realm of national labs because they could afford, they had the talent to run that kind of infrastructure. Now it's hitting enterprise.
Starting point is 00:34:09 And for us, we are now seeing that this is not an anomaly. We are seeing a repeat pattern here. And object store is well suited to take on that market. This is where SAN/NAS, the days of SAN/NAS
Starting point is 00:34:23 and the traditional enterprise store, they are on their way out. then the drive shelves. People are using commodity hardware and using distributed file systems, Ceph, or in my particular case, Longhorn. But there are ways to wind up having those volumes shared, or OpenEBS, or a bunch of different things in that direction, which, okay, well, is that going to be reliable enough? Well, it depends. If you can constrain blast radius and then layer something like MinIO on top of it that understands erasure coding or the fact you don't want every constituent volume living on the same virtual machine, you can start getting an awful lot of flexibility without massively spiking your price on top of it. And let me be very clear here, you're talking about $10 million deals. Compared to what it costs for SAN licensing at exabyte scale, there's no contest. There is. Absolutely.
Starting point is 00:35:26 In fact, when I was building Gluster distributed file system, my previous startup, right? And customers got the idea. Even investors would say, the investors would actually argue that, oh, but that's all only Yahoo and Google type applications. In fact, Facebook was also part of our community board. They were all using Gluster, but it was dismissed. Sure, there will never be other large scale companies like that again. Yeah, and that was dismissed. The software defined data store was dismissed as those are exceptions only. And they would even call these Yahoo and Facebook type deployments as that's not enterprise. And you ask these Facebook engineers, they will tell you, we are not enterprise.
Starting point is 00:36:11 Nobody else is enterprise. You have no idea what the level of security and scale that we are dealing with. And they understood very early on that modern infrastructure is software, and you need software engineers to build and operate the infrastructure, not hardware and IT people. They understood that. And what accelerated that was the same application vendors. Amazon did this for themselves, but Amazon also saw the value in giving it to others. Cloud was born and these were all built on open source and homegrown. The software engineers were building this infrastructure.
Starting point is 00:36:44 Imagine these vendors, if AWS S3 or EBS or anything is built on top of SAN or NAS, there won't be cloud today, right? That's why cloud fundamentally differentiated itself from the traditional hosting providers like Rackspace and everybody else. Even Rackspace tried to build their own system. But the application vendors, the SaaS providers, they had the scale. They understood the importance of it. They started hiring engineers to write their own database,
Starting point is 00:37:12 their own object store. Every one of our large-scale customers tried writing their own object store. Then they saw that they are the kind of team they can partner with and save. There's also some business risk. But nevertheless, right, software-defined data store, I always found the term software-defined object store was itself
Starting point is 00:37:33 funny because for me it was so obvious to our engineers, the customers who were in the cloud native space, it was so obvious you don't talk about software-defined MongoDB, software-defined Elasticsearch, software-defined database. Database is a far more complicated product than object store. They run as just a piece of software on commodity hardware. And why would you call object store as software
Starting point is 00:37:57 defined? It is because that is the only way we could explain to the traditional enterprise IT buyers who are used to buying storage in the appliance form factor. They used to even buy compute like SPARC, Sun, Solaris, right? They used to buy even compute as appliance, and they are now getting to that idea. The cloud, they understood the need and scale, and it was all software-defined from the beginning. They'd never talk about these terms. They didn't care about it. And now, finally, what we are seeing is that cloud and open source completely rewrote the enterprise landscape. All the new emerging deployments are going
Starting point is 00:38:32 containers and software-defined and on commodity hardware. What is accelerating this adoption is basically AI workloads. Both cloud and AI is pushing the software-defined trend. They don't even call it software-defined. They call it cloud-native, if at all. And the appliance-based business models are on their way out.
Starting point is 00:38:54 I think that is probably right. People are expecting a different substrate that they can build on top of with modern software engineering. And don't get me wrong. There's a very long tail of enterprises that exist that are still skeptical of virtualization, let alone cloud. But that's drying up.
Starting point is 00:39:12 It's a, at this point, if you haven't migrated to cloud or something cloudy at an enterprise, there's usually a reason for it other than, wait, what's this cloud thing I keep hearing about that worked in 2014? It doesn't work in 2024. So I'm firmly convinced that we're seeing a seismic shift here. I'm excited to see what this empowers people to build going forward.
Starting point is 00:39:31 When suddenly having deeply reliable storage is no longer something that either is in the cloud or is something that you have to build yourself from scratch and baling wire. This is going to turn into something interesting and I'm curious to see where it goes. If people want to learn more,
Starting point is 00:39:48 where's the best place for them to go to find you? Obviously, the community. And we are always open to having any of these engagements, even if it's not the community, just write to us. There are multiple ways to reach out to us. We are here to educate, right? And this is where even we do not look at marketing as fluffy advertisements, commercials, right? It's about educating the industry.
Starting point is 00:40:10 We want to educate them the right way. Modern buyers today are sophisticated. They know what they want and they can differentiate the right kind of information and trying to be salesy and trying to get their money, right? I think if you are altruistic here, if you educated them, gave them the right information, you build a long-term trust with them. That's what businesses are about, that they become a customer not because one time they want to just buy and walk away, at least not in the data infrastructure business. They want to build a long-term relationship with us. And it starts with having these kind of
Starting point is 00:40:45 conversations and educating them, even if you have no immediate need to buy or anything, right? You're not planning to engage with us commercially, or you might actually end up with our competition. That's quite all right. Any opportunity for us to educate the industry honestly, right? And it's going to help them, help us. We are always open. And the part about how the industry is now shifting towards software-defined, it's also the responsibilities on us that while the customers have the need to shift to modern software-defined cloud-native type infrastructure, they also understand that if they don't do this,
Starting point is 00:41:27 their business will become irrelevant. The change that AI is bringing is happening so fast. In the next three to five years, the entire industry landscape can look different. And if they don't adopt it, they're going to be in trouble. But at the same time, the cultural shift has to happen within the organization. This is where the public cloud, despite being expensive, played an enormous role in educating the industry.
Starting point is 00:41:52 This is how your infrastructure should be. And it prepared them already. The last five years have been so crucial. It prepared them how you look at modern infrastructure run by software engineering team. And that change came at the right time. And then now the AI is starting this journey. At the same time, vendors like us, right, we have a responsibility. If I told them that you replace your appliance with software,
Starting point is 00:42:17 and if software is not giving them the same benefit or better, they would not shift. It goes both ways. They want to change. I think customers are in the mood to change. They understand the urge to change. But if we make it harder for them to change, this change won't happen. This will result in a bubble burst, right? We saw this happen in the past. Anytime new big technologies come, there is a gold rush, and then eventually leads to a collapse.
Starting point is 00:42:51 AI is real, but I think if we don't get this right, the wheels will come off. This is an irreversible trend. It will happen. But I think the trend around that if everything is software, and if customers are unprepared for it, and if the infrastructure is complicated, they just wouldn't adopt it. There will be few giant winners consolidating the market, which is bad for the whole industry. This is where Kubernetes, even the open source community and all these projects have an important role. Work towards simplicity. Someone with the average technician grade skill can adopt and operate an infrastructure. If you don't do it, it won't transition. Yeah, and I think that's a good perspective to have on it.
Starting point is 00:43:30 I really want to thank you for taking the time to speak with me today. I really do appreciate it. And I look forward to hearing next year what you come up with by then. Yep, I enjoy these conversations and it's great to be here. Thank you for having me. Of course. AB Periasamy is the co-founder and CEO of MinIO.
Starting point is 00:43:47 This promoted guest episode has also been brought to us by MinIO. And I'm cloud economist, Corey Quinn. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that one day will get expunged because that platform isn't using Object Store to store the comments.
