Screaming in the Cloud - Making Open-Source Multi-Cloud Truly Free with AB Periasamy

Episode Date: March 28, 2023

AB Periasamy, Co-Founder and CEO of MinIO, joins Corey on Screaming in the Cloud to discuss what it means to be truly open source and the current and future state of multi-cloud. AB explains how MinIO was born from the idea that the world was going to produce a massive amount of data, and what it’s been like to see that come true and continue to be the future outlook. AB and Corey explore why some companies are hesitant to move to cloud, and AB describes why he feels the move is inevitable regardless of cost. AB also reveals how he has helped create truly free open-source software, and how his partnership with Amazon has been beneficial.

About AB

AB Periasamy is the co-founder and CEO of MinIO, an open source provider of high-performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (GitLab), Treasure Data (ARM), and Fastor (SMART).

AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software-defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to the scaling of commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore Laboratory’s “Thunder” supercomputer, which at the time was the second fastest in the world. AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.

AB is one of the leading proponents and thinkers on the subject of open source software, articulating the difference between the philosophy and the business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.

Links Referenced:
MinIO: https://min.io/
Twitter: https://twitter.com/abperiasamy
LinkedIn: https://www.linkedin.com/in/abperiasamy/
Email: ab@min.io

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode is sponsored in part by our friends at Chronosphere. When it costs more money and time to observe your environment than it does to build it, there's a problem.
Starting point is 00:00:40 With Chronosphere, you can shape and transform observability data based on need, context, and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at chronosphere.io slash corey-quinn. That's chronosphere.io slash corey-quinn. And my thanks to them for sponsoring my ridiculous nonsense. Welcome to Screaming in the Cloud. I'm Corey Quinn. And I have taken a somewhat strong stance over the years on the relative merits of multi-cloud and when it makes sense, when it doesn't.
Starting point is 00:01:21 And it's time for me to start modifying some of those. To have that conversation and several others as well with me today on this promoted guest episode is A.B. Aparasamy, CEO and co-founder of MinIO. A.B., it's great to have you back. Yes, it's wonderful to be here again, Puri. So one thing that I want to start with is defining terms. Because when we talk about multi-cloud, there are, to my mind at least, smart ways to do it and ways that are frankly ignorant. The thing that I've never quite seen is, it's Greenfield, day one, time to build something. Let's make sure we can build and deploy it to every cloud provider we might ever want to use. And that is usually not the right path. Whereas
Starting point is 00:02:06 different workloads in different providers, that starts to make a lot more sense. When you do mergers and acquisitions, as big companies tend to do in lieu of doing anything interesting, it seems like they find, oh, we're suddenly in multiple cloud providers. Should we move this acquisition to a new cloud? No, no, you should not. One of the challenges, of course, is that there's a lot of differentiation between the baseline offerings that cloud providers have. MinIO is interesting in that it starts and stops with an object store that is mostly S3 API compatible. Have I nailed the basic premise of what it is you folks do. Yeah, it's basically an object store,
Starting point is 00:02:45 Amazon S3 versus us. It's actually, that's a comparable, right? Amazon S3 is a hosted cloud storage as a service, but underlying technology is called object store. Minivo is a software and it's also open source. And it's the software that you can deploy on the cloud, deploy on the edge, deploy anywhere. And both Amazon S3 and Minivo are exactly S3 API compatible.
Starting point is 00:03:08 So drop-in replacement, you can write applications on Minivo and take it to AWS S3 and do the reverse. Amazon made S3 API a standard inside AWS. We made S3 API a standard across the whole cloud, all the cloud edge everywhere, rest of the world. I want to clarify two points because otherwise I know I'm going to get nibbled to death by ducks on the internet. When you say open source, it is actually open source. You're AGPL, not source available or we've decided now we're going to change our model for licensing because, oh, some people are using this without paying us money as so many companies seem to fall into that trap. You are actually open source and no one reasonable is going to be able to disagree with that definition.
Starting point is 00:03:49 The other pedantic part of it is when something says that it's S3 compatible with an API basis, like the question is always, does that include the weird bugs that we wish it wouldn't have? Or some of the more esoteric stuff that seems to be a constant source of innovation. To be clear, I don't think that you need to be particularly compatible with those very corner and vertex cases. For me, it's always been the basic CRUD operations. Can you store an object? Can you give it back to me? Can you delete the thing? And maybe an update, although generally object stores tend to be atomic. How far do you go down that path of being, I guess, a faithful implementation of what the S3 API does? And at which point do you decide that something is just honestly lunacy and you feel no need to wind up supporting that?
Starting point is 00:04:37 Yeah, the unfortunate part of it is we have to be very, very deep. It only takes one API to break. And it's not even like one API we did not implement, one API under a particular circumstance, right? Like even if you see like AWS SDKs, right? Java SDK, different versions of Java SDK will interpret the same API differently. And AWS S3 is an API. It's not a standard. And Amazon has published the REST specifications, API specs, but they are more like religious text. You can interpret it in many ways. Amazon's own
Starting point is 00:05:09 SDK has interpreted this in several ways, right? The only way to get it right is you have to have a massive ecosystem around your application. If one thing breaks, today if I commit a code and it introduces regression, i will immediately
Starting point is 00:05:25 hear from a whole bunch of community what i broke there's no certification process here there is no industry consortium to control the standard but then there is an accepted standard like if the application works then it works and only way to get it right is like amazon sdks the all of those language sdks to be simpler, but applications can even use Minio SDK to talk to Amazon. And Amazon SDK to talk to Minio, now there is a clear cooperative model. And I actually have tremendous respect for Amazon engineers. They have only been kind and meaningful, reasonable partnership. Like if our community reports a bug that Amazon rolled out a new update in one of the region and the S3 API broke, they'll actually go fix it.
Starting point is 00:06:07 They will never argue, why are you using Minivo SDK? They're engineers. They do everything by reason. That's the reason why they gained credibility. shift just because so much has been built on top of it over the last 15, almost 16 years now, that even slight changes require massive coordination. I remember there was a little bit of a kerfuffle when they announced that they were going to be disabling the bit torrent endpoint in S3, and it was no longer going to be supported in new regions, and eventually they were turning it off. There were still people pushing back on that. I'm still annoyed by
Starting point is 00:06:45 some of the documentation around the API that says that it may not return a legitimate error code when it errors with certain XML interpretations. It's kind of become very much its own thing. It is a problem. We have seen
Starting point is 00:07:01 even stupid errors similar to that. HTTP headers are supposed to be case insensitive. But then there are some language SDKs will send us a certain type of casing and they expect the response to be the same way. And that's not HTTP standard. We have to accept that bug and respond in the same way than asking a whole bunch of community
Starting point is 00:07:22 to go fix their application. And Amazon's problem are our problems too. We have to carry that baggage, but some places where we actually take a hard stance is like Amazon introduced that initially the bucket policies, like access control list, then finally came IAM. Then we actually, for us,
Starting point is 00:07:40 like the best way to teach the community is make best practices the standard, the only way to do it. We make best practices the standard, the only way to do it. We have been educating them that we actually implemented ACLs, but we removed it. So the customers will no longer use it. The scale at which we are growing, if I keep it, then I can never force them to remove. So we have been planting about how certain things that, if it's a good advice, force them to do it. That approach has paid off, but the problem is still quite real. Amazon also admits that S3 API is no longer simple, but at least it's not like POSIX, right?
Starting point is 00:08:13 POSIX is a rich set of API, but doesn't do useful things that we need to do. So Amazon's APIs are built on top of simple primitive foundations that got the storage architecture correct. And then doing sophisticated functionalities on top of the simple primitives, the atomic RESTful APIs, you can finally do it right and you can take it to great length and still not break the storage system. So I'm not so concerned. I think it's time for both of us to slow down and then make sure that the ease of operation and adoption is the goal than trying to create an API bible.
Starting point is 00:08:47 Well, one differentiation that you have that, frankly, I wish S3 would wind up implementing is this idea of bucket quotas. I would give a lot in certain circumstances to be able to say that this S3 bucket should be able to hold five gigabytes of storage and no more. You could fix a lot of free tier problems, for example, by doing something like that. But there's also the problem that you'll see in data centers where, okay, we've now filled up whatever storage system we're using.
Starting point is 00:09:14 We need to either expand it at significant costs and it's going to take a while, or it's time to go and maybe delete some of the stuff we don't necessarily need to keep in perpetuity. There is no moment of reckoning in traditional S3 in that sense, because, oh, you can just always add one more gigabyte at 2.3 or however many cents it happens to be, and you wind up with an unbounded growth problem that you're never really forced to wrestle with, because it's infinite storage. They can add drives faster than
Starting point is 00:09:43 you can fill them in most cases. So it just feels like there's an economic story, if nothing else, just about governance control and make sure this doesn't run away from me. And alert me before we get into the multi-petabyte style of storage for my Hello World WordPress website. Yeah. So I always thought that Amazon did not do this. It's not just Amazon, the cloud players, right? They did not do this because they want, it's good for their business. They want all the customers' data, like unrestricted growth of data. Certainly, it is beneficial for their business, but there is an operational challenge.
Starting point is 00:10:18 When you set quotas, this is why we grudgingly introduced this feature. We did not have quotas and we didn't want to because Amazon S3 API doesn't talk about quotas, but the enterprise community wanted this so badly. And eventually we yielded and we gave, but there is one issue to be aware of, right? The problem with quota is that you as an object storage administrator, you set a quota, like say this bucket, this application, I don't see more than 20 TB. I'm going to set a 100 TB quota. And then you forget it.
Starting point is 00:10:50 And then you think in six months they will reach 20 TB. Reality is in six months, they reach 100 TB. And then when nobody expected, everybody has forgotten that there was a quota set in place, suddenly applications start failing. And when it fails, it doesn't, even though the S3 API responds back saying that insufficient space, but then the application doesn't really pass that error all the way up. When applications fail, they fail in unpredictable ways. By the time the application developer realizes that it's actually object storage ran out of space, they lost time and it's the downtime. So as long as they have proper observability, because Minerva also has the
Starting point is 00:11:24 observability that it can alert you that you are going to run out of space soon. If you have those systems in place, then go for a quota. If not, I would agree with the S3 API standard that it's not about cost. It's about operational unexpected accidents. Yeah, at some level, we wound up having to deal with the exact same problem with disc volumes where my default for most things was at 70 i want to start getting pings on it and at 90 i want to be woken up for it so for small volumes you run up with a runaway log or whatnot you have a chance to catch it and whatnot and for the giant multi-petabyte things okay well why would you
Starting point is 00:12:03 alert at 70 on that, because procurement takes a while when we're talking about buying that much disk for that much money. It was a roughly good baseline for these things. The problem, of course, is when you have none of that and, well, it got full, so oops-a-doozy. On some level, I wonder if there's a story around soft quotas that just scream at you
Starting point is 00:12:21 but let you keep adding to it, but that turns into implementation details and you can build something like that on top of any existing object store if you don't need the hard limit aspect. Actually, that is the right way to do. That's what I would recommend customers to do. Even though there is hard quota, I will tell you, don't use it, but use soft quota. And the soft quota, instead of even soft quota, you monitor them. On the cloud, at least you have some kind of restriction that the more you use, the more you pay. Eventually, the month-end bills, it shows up.
Starting point is 00:12:49 On Minio, when it's deployed on these large data centers, it's unrestricted access. Quickly, you can use a lot of space. No one knows what data to delete, and no one will tell you what data to delete. The way to do this is there has to be some kind of accountability. The way to do it is actually have some chargeback mechanism based on the bucket growth and the business units have to pay for it. The IT doesn't run for
Starting point is 00:13:12 free. IT has to have a budget and it has to be sponsored by the applications team. And you measure, instead of setting a hard limit, you actually charge them that based on the usage of your bucket, you're going to pay for it. And this is an observability problem. And you can call it soft quotas, but it hasn't been to trigger an alert in observability. It's an observability problem. But it actually is interesting to hear that as soft quotas, which makes a lot of sense. It's one of those problems that I think people only figure out after they've experienced it once, and then they look like wizards from the future who, oh yeah, you're going to run into a quota
Starting point is 00:13:48 storage problem. Yeah, we all find that out because the first time we smack into something and live to regret it. Now we can talk a lot about the nuances and implementation and low-level detail of this stuff, but let's zoom out a bit. What are you folks up to these days? What is the bigger picture that you're seeing of object storage in the ecosystem? Yeah. So when we started, right, our idea was that the world is going to produce an incredible amount of data. In 10 years from now, we are going to drown in data. We've been saying that today, and it'll be true every year. You say 10 years from now, and it'll still be valid, right? That was the reason for us to play this game.
Starting point is 00:14:25 And we saw that every one of these cloud players were incompatible with each other. It's like early Unix days, right? Like a bunch of operating systems, everything was incompatible. And the applications were beginning to adopt this new standard, but they were stuck. And then the cloud storage players, whatever they had, like GCS can only run inside Google Cloud, S3 can only run inside AWS, and the cloud players game was bring all the world's data into the cloud. And that actually requires enormous amount of bandwidth and moving data into the cloud at that scale. If you look at the amount of data world is producing, if the data is produced inside the cloud, it's a
Starting point is 00:15:02 different game. But the data is produced everywhere else. Minivo's idea was that instead of introducing yet another API standard, Amazon got the architecture right. And that's the right way to build large scale infrastructure. If we stick to Amazon S3 API, instead of introducing yet another standard, RS50 API, and then go after the world's data. When we started in 2014, November, it's really 2015 we started, it was laughable. People thought that there won't be a need for MinIO because the whole world will basically go to AWS S3 and they will be the world's data store.
Starting point is 00:15:36 Amazon is very capable of doing that. The race is not over, right? And it still could be done now. The thing is that they would need to fundamentally rethink their, frankly, usurious data egress charges. The thing is that they would need to fundamentally rethink their, frankly, you serious, data egress charges. The problem is not that it's expensive to store data in AWS. It's that it's expensive to store data and then move it anywhere else for analysis or use on something else. So there are entire classes of workload that people should not consider the big three cloud providers as the place where that data should live because you're never getting it back. Spot on, right? Even if network is free, right? Amazon makes like, okay, zero egress,
Starting point is 00:16:09 egress charge. The data we are talking about, like most of Minio's deployments, they start at petabytes, like one to 10 petabyte feels like 100 terabyte. Even if network is free, try moving a 10 petabyte infrastructure into the cloud. How are you going to move it? Even with FedEx and UPS giving you a lot of bandwidth in their trucks, it's not possible, right? I think the data will continue to be produced everywhere else. So our bet was that we will be every, instead of you moving the data, you can run MinIO where there is data. And then the whole world will look like AWS S3 compatible object store. We took a very different path. But now when I say the same story that when what we started with day one,
Starting point is 00:16:49 it's no longer laughable, right? People believe that yes, Minio is there because our market footprint is now larger than Amazon S3. And as it goes to production, customers are now realizing it's basically growing inside a shadow IT and eventually businesses realize that
Starting point is 00:17:05 bulk of their business critical data is sitting on Minio and that's how it's surfacing up. So now what we are seeing this year particularly all of these customers are hugely concerned about cost optimization and as part of the journey there is also multi-cloud and hybrid cloud initiatives. They
Starting point is 00:17:21 want to make sure that their application can run on any cloud and the same software can run on any cloud or the same software can run on their colos like Equinix or a bunch of digital reality anywhere. And Minivo as a software, this is what we set out to do. Minivo can run anywhere inside the cloud all the way to the edge, even on Raspberry Pi. It's now, whatever we started with, now has become reality. The timing is perfect for us. One of the challenges I've always had with the idea of building an application, with the idea to run it anywhere, is you can make explicit technology choices around that. For example, object store is a great example because most places you go now
Starting point is 00:17:58 will or can have an object store available for your use. But there seem to be implementation details that get lost. And for example, even load balancers wind up being implemented in different ways with different scaling times and whatnot in various environments. And past a certain point, it's, okay, we're just going to have to run it ourselves on top of HAProxy or Nginx or something like it running in containers themselves, and you're reinventing the wheel. Where is that boundary between we're going to build this in a way that we can run anywhere and the reality that I keep running into, which is we tried to do that, but we implicitly, without realizing it, built in a lot of assumptions that everything would look just like this environment that we started off in.
Starting point is 00:18:39 The good part is that if you look at the S3 API, every request has the site name, the endpoint, the bucket name, the path and the object name. Every request is completely self-contained. It's literally a HTTP call away. And this means that whether your application is running on Android, iOS, inside a browser, JavaScript engine, anywhere across the world, they don't really care whether the bucket is served from EU or US East or US West. It doesn't matter at all. So, it actually allows you by API, you can build a globally unified data infrastructure. Some buckets here, some buckets there. That's actually not the problem. The problem comes when you have multiple clouds, different teams like part M&A, even if you don't do M&A, different teams, no two data engineer would
Starting point is 00:19:25 agree on the same software stack. Then they will all end up with different cloud players and some still running on old legacy environment. When you combine them, the problem is like, let's take just the cloud, right? How do I even apply a policy, that access control policy, that how do I establish unified identity? Because I want to know this application is the only one who is allowed to access this bucket can I have that same policy on Google Cloud or Azure even though they are different teams like that employer that project or that admin if he or she leaves the job how do I make sure that that's all protected you want unified identity you want to access unified access control policies where are the encryption keys stored?
Starting point is 00:20:08 And then the load balancer itself, the load balancer is not the problem. But then unless you adopt S3 API as your standard, the definition of what a bucket is, is different from Microsoft to Google to Amazon. Yeah, the idea of the puts and retrieving of actual data is one thing. But then you have, how do you manage it? The control plane layer of the object store. And how do you rationalize that? What are the naming conventions? How do you address it? I even ran into something similar somewhat recently when I was doing an experiment with one of the Amazon Snowball Edge devices to move some data into S3 on a Lark, and the thing shows up. It presents itself on the local network as an S3 endpoint, but none of their tooling can accept a different
Starting point is 00:20:46 endpoint built into the configuration files. You have to explicitly use it as an environment variable or as a parameter on every invocation of something that talks to it, which is incredibly annoying. I would give a lot for just to be able to say, oh, when you're talking in this profile, that's always going to be your S3 endpoint. Go. But no, of course not, because that would make it easier to use something that wasn't them. So why would they ever be incentivized to bake that in? Yeah, Snowball is an important element to move data, right? That's the UPS and FedEx way of moving data. But what I find customers doing is they actually use the tools that we built for Minayu, because the Snowball appliance also looks like a S3
Starting point is 00:21:23 API-compat compatible object store. And in fact, I've been told that when you want to ship multiple Snowball appliances, they actually put Minivo to make it look like one unit because Minivo can erasure code the objects across multiple Snowball appliances. And the MC tool, unlike AWS CLI, which is really meant for developers like low-level calls. MC gives you unique code details like LSC, PR, sync-like tools, and it's easy to move and copy and migrate data. Actually, that's how people
Starting point is 00:21:51 deal with it. Oh, God, I hadn't even considered the problem of having a fleet of snowball edges here that you're trying to do a mass data migration on, which is basically how you move petabyte scale datas. A whole bunch of parallelism, but having to figure that out on a case-by-case basis would be nightmarish. That's right. There is no good way to wind up doing that natively.
Starting point is 00:22:08 Yeah. In fact, Western Digital and there are a few other players too, Western Digital created a Snowball-like appliance, and they put Minio on it, and they are actually working with some system integrators to help customers move lots of data. But Snowball-like functionality is important
Starting point is 00:22:24 and more and more customers will need it. This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your engineers are burned out. They're tired from pagers waking them up at 2 a.m. for something that could have waited until after their morning coffee. Ring, ring, who's there?
Starting point is 00:22:43 It's Nagios, the original Call of Duty. They're fed up with relying on two or three different monitoring tools that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there is a better way. Observability tools like Honeycomb, and very little else because they do admittedly set the bar, show you the patterns and outliers of how users
Starting point is 00:23:05 experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and most importantly, great for your customers. Try free today at honeycomb.io slash screaming in the cloud. That's honeycomb.io slash screaming in the cloud. That's honeycomb.io slash screaming in the cloud. Increasingly, it felt like back in the on-prem days that you'd have a file server somewhere that was either a SAN or it was going to be a NAS. The question was only whether it presented it to various things as a volume or as a file share. And then in cloud, the default storage mechanism, unquestionably, was object store.
Starting point is 00:23:50 And now we're starting to see it come back again. So it started to increasingly feel in a lot of ways like cloud is no longer so much a place that is somewhere else, but instead much more of an operating model for how you wind up addressing things. I'm wondering when the generation of prosumer networking equipment, for example, is going to say, oh, and send these logs over to what object store?
Starting point is 00:24:12 Because right now it's still write a file and SFTP it somewhere else, at least the good ones. Some of the crap and still want old unencrypted FTP, which is neither here nor there. But I feel like it's coming back around again. Like when do even home users wind up instead of where do you save this file to having the cloud abstraction, which hopefully you'll never have to deal with an S3 style endpoint, but that can underpin an awful lot of things. It feels like
Starting point is 00:24:35 it's coming back and that cloud is the de facto way of thinking about things. Is that what you're seeing? Does that align with your belief on this? I actually fundamentally believe in the long run, right? Applications will go SaaS, right? Like you remember the days that you used to install QuickBooks and ACT and stuff like on your data center. You used to run your own exchange servers. Those days
Starting point is 00:24:58 are gone. I think these applications will become SaaS. But then the infrastructure building blocks for these SaaS, whether they are cloud or their own colo, I think that in the long run, it will be multi-cloud and colo all combined and all of them will look alike. But what I find from the customer's journey, the old world and the new world is incompatible. When they shifted from bare metal to virtualization, they didn't have to rewrite their application. But this time, it is a tectonic shift. Every single application you have to rewrite. If you retrofit
Starting point is 00:25:29 your application into the cloud, bad idea. It's going to cost you more and I would rather not do it. Even though cloud players are trying to make the file and block, like file system services and stuff, they make it available 10 times more expensive than object, but it's just to increase some legacy applications, but it's still a bad idea to just move legacy applications there. But what I'm finding is that the cost, if you still run your infrastructure with the enterprise IT mindset, you're out of luck.
Starting point is 00:25:58 It's going to be super expensive and you're going to be left out. Modern infrastructure, because of the scale, it has to be treated as code. You have to run infrastructure with software engineers. And this cultural shift has to happen. And that's why cloud, in the long run, everyone will look like AWS. And we always said that and it's now becoming true. Kubernetes and MinIO basically is leveling the ground. Everywhere it's giving ECS
Starting point is 00:26:20 and S3-like infrastructure inside AWS or outside AWS everywhere. But what I find the challenging part is the cultural mindset. If they still have the old cultural mindset and they want to adopt cloud, it's not going to work. You have to change the DNA, the culture, the mindset, everything. The best way to do it is go to the cloud first. Adopt it. Modernize your application. Learn how to run and manage infrastructure. Then ask economics question, the unit economics. Then you will find answers yourself.
Starting point is 00:26:50 On some level, that is the path forward. I feel like there's just a very long tail of systems that have been working and have been meeting the business objective. And, well, we should go and refactor this because, I don't know, a couple of folks on a podcast said we should. But isn't the most compelling business case for doing a lot of it? It feels like these things sort of sit there until there is more upside than just cost-cutting to changing the way these things are built and run. And that's the reason that people have been talking about getting off of mainframes since the 90s in some companies. And the mainframe is very much still there. It is so ingrained in the way that they do business, they have to rethink a lot of the architectural things that have sprung up around
Starting point is 00:27:29 it. I'm not trying to shame anyone for the state that their environment's in. I've never yet met a company that was super proud of its internal infrastructure. Everyone's always apologizing because it's a fire, but they think someone else has figured this out somewhere and it all runs perfectly. I don't think it exists. What I'm finding is that if you're running it the enterprise IT style, you are the one telling the application developers, here you go, you have this many VMs and you have a
Starting point is 00:27:55 VMware license and JBoss, WebLogic and a SQL server license. Now you go build your application, you won't be able to do it. Because application developers talk about Kafka and Redis and Kubernetes Kubernetes. They don't speak the same language. And that's when these developers go to the cloud and then finish their application, take it live from zero lines of code before IT can procure infrastructure and provision it to these guys. The change that has to happen is how can you give what the developers want? Now the reverse
Starting point is 00:28:24 journey is also starting. In the long run, everything will look alike. But what I'm finding is if you're running enterprise IT infrastructure, traditional infrastructure, they are ashamed of talking about it. But then you go to the cloud and then at scale, some parts of it you want to move. Now you really know why you want to move. For economic reasons, particularly the data-intensive workloads, becomes very expensive. And expensive and at that part they go to a colo but leave the applications on the cloud so it's the multi-cloud model i think is is inevitable the expensive pieces that where you
Starting point is 00:28:55 can if you are looking at yourself as hyperscaler and if your data is growing if your business focuses on data centric business parts of the data and data analytics, AAML workloads will actually go out if you're looking at unit economics. If all you are focused on productivity, stick to the cloud and you're still better off. I think that's a divide that gets lost sometimes. People say, oh, we're going to move to the cloud to save money. It's no, you're not. At a five-year time horizon, I would be astonished if that juice were worth the squeeze in almost any scenario. The reason you go for it, therefore, is for a capability story when it's right for you. That also means that steady-state workloads that are well understood can often be run more economically in a place that is not the cloud. Everyone thinks for some reason that I tend to be, it's cloud or it's trash.
Starting point is 00:29:41 No, I'm a big fan of doing things that are sensible. And cloud is not the right answer for every workload under the sun. Conversely, when someone says, oh, I'm building a new e-commerce store or whatnot, and I've decided cloud is not for me, it's, eh, you sure about that? That sounds like you are smack dab in the middle of the cloud use case. But all these things wind up acting as constraints and strategic objectives. And technology and single vendor answers are rarely going to be a panacea the way that their sales teams say that they will. Yeah. And I find organizations that have SREs, DevOps, and software engineers running the infrastructure, they actually
Starting point is 00:30:23 are ready to go multi-cloud or go to colo because they exactly know, they have the containers and Kubernetes microservices expertise. If you are still on a traditional SAN, NAS and VM architecture, go to cloud, rewrite your application. I think there's a misunderstanding in the ecosystem around what cloud repatriation actually looks like. Everyone claims it doesn't exist because there's basically misunderstanding in the ecosystem around what cloud repatriation actually looks like. Everyone claims it doesn't exist because there's basically no companies out there worth mentioning that are,
Starting point is 00:30:50 yep, we decided the cloud is terrible, we're taking everything out, and we are going to data centers at the end. In practice, it's individual workloads that do not make sense in the cloud. Sometimes just the back of the envelope analysis means it's not going to work out. Other times during proof of concepts and other times as things have hit a certain point of scale where an individual workload being pulled back makes an awful lot of sense. But everything else is probably going to stay in the cloud. And these companies don't want to wind up antagonizing the cloud providers by talking about it in public. But that model is very real. Absolutely. Actually, what we are finding is that the application side, like parts of their overall ecosystem within the company,
Starting point is 00:31:30 they run on the cloud. But the data side, some of the examples, these are at the range of 100 to 500 petabytes. The 500 petabyte customer actually started at 500 petabytes, and their plan is to go at exascale. And they are actually doing repatriation because for them, their customers, it's consumer facing and it's extremely price sensitive. When you're consumer facing, every dollar you spend counts. And if you don't do it at scale, it matters a lot. It will kill the business. Particularly last two years, the cost part became an important element in their
Starting point is 00:32:05 infrastructure. They know exactly what they want. They are thinking of themselves as hyperscalers. They get commodity, the same hardware, right? Just a server with a bunch of drives and network and put it on Colo or even lease these boxes. They know what their demand is. Even at 10 petabytes, the economic starts impacting. If you are processing it, the data side, we have several customers now moving to colo from cloud. And this is the range we are talking about. They don't talk about it publicly because sometimes you don't want to be anti-cloud. But I think for them, they are also not anti-cloud. They don't want to leave the cloud.
Starting point is 00:32:40 If they are completely leaving the cloud, it's a different story. That's not the case. Applications stay there. Data lakes, data infrastructure, object store, particularly if it goes to a colo. Now your applications from all the clouds can access this centralized, centralized meaning that one object store, you run on colo and the colos themselves have worldwide data centers. So you can keep the data infrastructure in a colo, but the applications can run on any cloud some of them surprisingly that they have global customer base and not all of them are cloud sometimes like some
Starting point is 00:33:11 applications itself if you ask what type of edge devices that they are running edge data centers they said it's a mix of everything what really matters is not the infrastructure infrastructure in the end is cpu network and drive it's a commodity. It's really the software stack. You want to make sure that it's containerized and easy to deploy rollout updates. You have to learn the Facebook, Google style running SaaS business. That change is coming. It's a matter of time and it's a matter of inevitability. Now, nothing ever stays the same. Everything always inherently changes in the full sweep of things. But I'm pretty happy with where I see the industry going these days. I want to start seeing a little bit less centralization around one or two big companies.
Starting point is 00:33:54 But I am confident that we're starting to see an awareness of doing these things for the right reason, more broadly permeating. The competition is always great for customers. They get to benefit from it. So the decentralization is a path to bringing and commoditizing the infrastructure. I think the bigger picture for me, what I'm particularly happy is for a long time, we carried
Starting point is 00:34:16 industry baggage in the infrastructure space. No one wants to change. No one wants to rewrite applications. As part of that equation, we carried POSIX baggage, SAN and NAS. You can't even do iSCSI as a service, NFS as a service. It's too much of a baggage. All of that is getting thrown out. The cloud players help the customers start with a clean slate. I think to me that's the biggest advantage. Now we have a clean slate, we can now go on a
Starting point is 00:34:44 whole new evolution of the stack, keeping it simpler, and everyone can benefit from this change. Before we wind up calling this an episode, I do have one last question for you. As I mentioned at the start, you're very much open source, as in legitimate open source, which means that anyone who wants to can grab an implementation and start running it, how do you, I guess, make peace with the fact that the majority of your user base is not paying you? And I guess, how do you get people to decide, you know what? We like the cut of his jib. Let's give him some money.
Starting point is 00:35:14 Yeah. If I look at it that way, right, I have both the heads, right, on the open source side as well as the business. But I don't see them to be conflicting. If I run as a charity, right, like I take donation, if you love the product, here is the donation box, then that doesn't work at all, right? I shouldn't take investor money and I shouldn't have a team because I have a job to pay their bills to. But I actually find open source to be incredibly beneficial. For me, it's about delivering value to the customer. If you pay me $5, I have to make you feel $50
Starting point is 00:35:43 worth of value. The same software you would buy from a proprietary vendor, why would... If I'm a customer, same software, equal in functionality. If it's proprietary, I would actually prefer open source and pay even more. But why are really customers paying me now? What's our view on open source? I'm actually the free software guy. Free software and open source are actually not exactly equal, right? We are the purest of the open source community. And we have strong views on what open source means, right? That's why we call it free software.
Starting point is 00:36:13 And free here means freedom. Free does not mean gratis, free of cost. It's actually about freedom. And I deeply care about it. For me, it's a philosophy and it's a way of life. That's why I don't believe in open core and other models that holding, giving crippleware is not open source, right? I give you
Starting point is 00:36:32 some freedom but not all, right? Like, it breaks the spirit. So, Minivo is 100% open source but it's open source for the open source community. We did not take some community developed code and then added commercial support on top. We built the product.
Starting point is 00:36:48 We believed in open source. We still believe and we will always believe. Because of that, we open sourced our work. And it's open source for the open source community. And as you build applications, the AGPL license and the derivative works, they have to be compatible with AGPL. Because we are the creator, if you cannot open source, your application derivative works,
Starting point is 00:37:10 you can buy a commercial license from us. We are the creator. We can give you a dual license. That's how the business model works. That way, the open source community completely benefits and it's about the software freedom. There are customers,
Starting point is 00:37:23 for them, open source is a good thing and they want to pay because it's open source. There are some customers that they want to pay because they can't open source their application and derivative works. So they pay. It's a happy medium. That way, I actually find open source to be incredibly beneficial. Open source gave us the trust more than adoption. It's not like free to download and use. More than that, the customers that matter, the community that matters, because they can see the code and they can see everything we did.
Starting point is 00:37:51 It's not because I said so, marketing and sales, you believe them, whatever they say. You download the product, experience it, fall in love with it. And then when it becomes an important part of your business, that's when they engage with us because they talk about license compatibility and data loss or a data breach. All that becomes important. Open source, I don't see that to be conflicting for business. It actually is incredibly helpful.
Starting point is 00:38:15 And customers see that value in the end. I really want to thank you for being so generous with your time. If people want to learn more, where should they go? I was on Twitter. Now I think I'm spending more time on maybe LinkedIn. I think if they can send me a request and then we can chat. And I'm always spending time with other entrepreneurs, architects, and engineers, sharing what I learned, what I know, and learning from them. There's also a community open channel. And just send me a mail at ab.min.io.
Starting point is 00:38:45 And I'm always interested in talking to our user base. And we will, of course, put links to that in the show notes. Thank you so much for your time. I appreciate it. It's wonderful to be here. AB Parasamy, CEO and co-founder of MinIO. I'm cloud economist Corey Quinn. And this has been a promoted guest episode
Starting point is 00:39:05 of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice that presumably will also include an angry, loud comment that we can access from anywhere because of shared APIs. and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
