Screaming in the Cloud - Making Open-Source Multi-Cloud Truly Free with AB Periasamy
Episode Date: March 28, 2023

AB Periasamy, Co-Founder and CEO of MinIO, joins Corey on Screaming in the Cloud to discuss what it means to be truly open source and the current and future state of multi-cloud. AB explains how MinIO was born from the idea that the world was going to produce a massive amount of data, and what it’s been like to see that come true and continue to be the future outlook. AB and Corey explore why some companies are hesitant to move to cloud, and AB describes why he feels the move is inevitable regardless of cost. AB also reveals how he has helped create a truly free open-source software, and how his partnership with Amazon has been beneficial.

About AB

AB Periasamy is the co-founder and CEO of MinIO, an open source provider of high performance, object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu where he serves on the board to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).

AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling of the commodity cluster computing to supercomputing class performance. His work there resulted in the development of Lawrence Livermore Laboratory’s “Thunder” code, which, at the time was the second fastest in the world. AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.

AB is one of the leading proponents and thinkers on the subject of open source software - articulating the difference between the philosophy and business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.

Links Referenced:
MinIO: https://min.io/
Twitter: https://twitter.com/abperiasamy
LinkedIn: https://www.linkedin.com/in/abperiasamy/
Email: mailto:ab@min.io
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Chronosphere.
When it costs more money and time to observe your environment than it does to build it, there's a problem.
With Chronosphere, you can shape and transform observability data based on need,
context, and utility. Learn how to only store the useful data you need to see in order to reduce
costs and improve performance at chronosphere.io slash corey-quinn. That's chronosphere.io slash
corey-quinn. And my thanks to them for sponsoring my ridiculous nonsense.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
And I have taken a somewhat strong stance over the years on the relative merits of multi-cloud
and when it makes sense, when it doesn't.
And it's time for me to start modifying some of those. To have that conversation
and several others as well with me today on this promoted guest episode is AB Periasamy,
CEO and co-founder of MinIO. AB, it's great to have you back.
Yes, it's wonderful to be here again, Corey.
So one thing that I want to start with is defining terms. Because when we talk about multi-cloud, there are, to my mind at least, smart ways to do it and ways that are frankly ignorant.
The thing that I've never quite seen is, it's Greenfield, day one, time to build something.
Let's make sure we can build and deploy it to every cloud provider we might ever want to use.
And that is usually not the right path. Whereas
different workloads in different providers, that starts to make a lot more sense. When you do
mergers and acquisitions, as big companies tend to do in lieu of doing anything interesting,
it seems like they find, oh, we're suddenly in multiple cloud providers. Should we move
this acquisition to a new cloud? No, no, you should not. One of the challenges, of course,
is that there's a lot
of differentiation between the baseline offerings that cloud providers have. MinIO is interesting
in that it starts and stops with an object store that is mostly S3 API compatible. Have I nailed
the basic premise of what it is you folks do? Yeah, it's basically an object store,
Amazon S3 versus us.
It's actually, that's a comparable, right?
Amazon S3 is a hosted cloud storage as a service,
but underlying technology is called object store.
MinIO is software, and it's also open source.
And it's the software that you can deploy on the cloud,
deploy on the edge, deploy anywhere.
And both Amazon S3 and MinIO are exactly S3 API compatible.
So drop-in replacement, you can write applications on MinIO and take it to AWS S3 and do the reverse.
Amazon made S3 API a standard inside AWS.
We made S3 API a standard across the whole cloud, all the cloud edge everywhere, rest of the world.
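To make that "drop-in replacement" idea concrete, here is a minimal sketch using boto3. The same client code covers the basic put/get/delete operations, and only the endpoint and credentials decide whether it talks to AWS S3 or to an S3-compatible server such as a MinIO deployment. The endpoint URL, keys, and bucket name are placeholders, and the bucket is assumed to already exist.

```python
# A minimal sketch of the "drop-in replacement" idea: the same boto3 code talks to
# either backend; only the endpoint and credentials change. All values are placeholders.
import boto3

def make_client(endpoint_url=None):
    # endpoint_url=None -> AWS S3; otherwise any S3-compatible server (e.g. a MinIO deployment)
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        aws_access_key_id="EXAMPLE_ACCESS_KEY",
        aws_secret_access_key="EXAMPLE_SECRET_KEY",
    )

# Point at a hypothetical local MinIO server instead of AWS by supplying an endpoint.
s3 = make_client(endpoint_url="http://localhost:9000")

# The basic CRUD operations look identical against either backend.
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello, object store")
obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())
s3.delete_object(Bucket="demo-bucket", Key="hello.txt")
```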
I want to clarify two points because otherwise I know I'm going to get nibbled to death by
ducks on the internet. When you say open source, it is actually open source. You're AGPL,
not source available or we've decided now we're going to change our model for licensing because,
oh, some people are using this without paying us money as so many companies seem to fall into that
trap. You are actually open source and no one reasonable is going to be able to disagree with that definition.
The other pedantic part of it is when something says that it's S3 compatible with an API basis,
like the question is always, does that include the weird bugs that we wish it wouldn't have?
Or some of the more esoteric stuff that seems to be a constant source of
innovation. To be clear, I don't think that you need to be particularly compatible with those
very corner and vertex cases. For me, it's always been the basic CRUD operations. Can you store an
object? Can you give it back to me? Can you delete the thing? And maybe an update, although generally
object stores tend to be atomic. How far do you go down that path of being, I guess, a faithful implementation of what the S3 API does?
And at which point do you decide that something is just honestly lunacy and you feel no need to wind up supporting that?
Yeah, the unfortunate part of it is we have to be very, very deep.
It only takes one API to break.
And it's not even like one API we did not
implement, one API under a particular circumstance, right? Like even if you see like AWS SDKs, right?
Java SDK, different versions of Java SDK will interpret the same API differently. And AWS S3
is an API. It's not a standard. And Amazon has published the REST specifications, API specs,
but they are more like religious text. You can
interpret it in many ways. Amazon's own
SDK has interpreted this
in several ways, right? The only way to
get it right is you
have to have a massive ecosystem
around your application.
If one thing breaks, today if I
commit code and it introduces a
regression, I will immediately
hear from a whole bunch of the community what I broke. There's no certification process here. There is no
industry consortium to control the standard, but then there is an accepted standard: if the
application works, then it works. And the only way to get it right is, like Amazon's SDKs, all of those
language SDKs, to be simpler, but applications can even use the
MinIO SDK to talk to Amazon, and the Amazon SDK to talk to MinIO. Now there is a clear cooperative
model. And I actually have tremendous respect for Amazon engineers. They have only been kind and
meaningful, reasonable partnership. Like if our community reports a bug that Amazon rolled out
a new update in one of the regions and the S3 API broke, they'll actually go fix it.
They will never argue, why are you using the MinIO SDK?
They're engineers.
They do everything by reason.
That's the reason why they gained credibility. It's an API that is also very hard to shift just because so much has been built on top of it over the last 15, almost 16 years now,
that even slight changes require massive coordination. I remember there was a little
bit of a kerfuffle when they announced that they were going to be disabling the bit torrent endpoint
in S3, and it was no longer going to be supported in new regions, and eventually they were turning
it off. There were still people pushing back on that. I'm still annoyed by
some of the documentation around the API
that says that it may not
return a legitimate
error code when it errors with certain
XML interpretations.
It's kind of become
very much its own thing.
It is a problem. We have seen
even stupid errors similar to that.
HTTP headers are supposed to be case insensitive.
But then there are some language SDKs
that will send us a certain type of casing
and they expect the response to be the same way.
And that's not the HTTP standard.
We have to accept that bug and respond in the same way
rather than asking a whole bunch of the community
to go fix their application.
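As a toy illustration of that quirk (not MinIO's actual implementation), a server that wants to stay lenient can look headers up case-insensitively, as the HTTP spec requires, while remembering the client's original casing so it can echo it back to SDKs that wrongly depend on it:

```python
# A toy sketch of tolerating casing-sensitive clients: case-insensitive lookup,
# but the original header names are remembered for echoing back.
class LenientHeaders:
    def __init__(self, raw_headers):
        # raw_headers: list of (name, value) pairs exactly as the client sent them
        self._original = {name.lower(): name for name, _ in raw_headers}
        self._values = {name.lower(): value for name, value in raw_headers}

    def get(self, name, default=None):
        # Case-insensitive lookup, per the HTTP spec.
        return self._values.get(name.lower(), default)

    def original_name(self, name):
        # The exact casing the client used, for echoing back to picky SDKs.
        return self._original.get(name.lower(), name)

headers = LenientHeaders([("x-amz-meta-Owner", "corey")])
print(headers.get("X-Amz-Meta-Owner"))            # "corey", regardless of casing
print(headers.original_name("x-amz-meta-owner"))  # "x-amz-meta-Owner"
```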
And Amazon's problems are our problems too.
We have to carry that baggage,
but some places where we actually take a hard stance:
Amazon initially introduced
the bucket policies, like access control lists,
then finally came IAM.
For us,
the best way to teach the community
is to make best practices the standard,
the only way to do it. We have been educating them that we actually implemented ACLs, but we removed it.
So the customers will no longer use it. The scale at which we are growing, if I keep it,
then I can never force them to remove it. So we have taken the stance that for certain things,
if it's good advice, force them to do it. That approach has paid off, but the problem is still quite real.
Amazon also admits that S3 API is no longer simple, but at least it's not like POSIX, right?
POSIX is a rich set of API, but doesn't do useful things that we need to do.
So Amazon's APIs are built on top of simple primitive foundations that got the storage architecture correct.
And then doing sophisticated functionalities on top of the simple primitives, the atomic
RESTful APIs, you can finally do it right and you can take it to great length and still
not break the storage system.
So I'm not so concerned.
I think it's time for both of us to slow down and then make sure that the ease of operation
and adoption is the goal, rather than trying to create an API Bible.
Well, one differentiation that you have that, frankly, I wish S3 would wind up implementing
is this idea of bucket quotas. I would give a lot in certain circumstances to be able to say that
this S3 bucket should be able to hold five gigabytes of storage and no more. You could
fix a lot of free tier problems, for example,
by doing something like that.
But there's also the problem that you'll see in data centers
where, okay, we've now filled up
whatever storage system we're using.
We need to either expand it at significant costs
and it's going to take a while,
or it's time to go and maybe delete some of the stuff
we don't necessarily need to keep in perpetuity.
There is no moment of reckoning
in traditional S3 in that sense, because, oh, you can just always add one more gigabyte at 2.3 or
however many cents it happens to be, and you wind up with an unbounded growth problem that you're
never really forced to wrestle with, because it's infinite storage. They can add drives faster than
you can fill them in most cases.
So it just feels like there's an economic story, if nothing else, just about governance control
and make sure this doesn't run away from me. And alert me before we get into the multi-petabyte
style of storage for my Hello World WordPress website. Yeah. So I always thought that Amazon
did not do this. It's not just Amazon, the cloud players, right?
They did not do this because it's good for their business.
They want all the customers' data, like unrestricted growth of data.
Certainly, it is beneficial for their business, but there is an operational challenge.
When you set quotas, this is why we grudgingly introduced this feature.
We did not have quotas and we didn't want to because Amazon
S3 API doesn't talk about quotas, but the enterprise community wanted this so badly.
And eventually we yielded and we gave in, but there is one issue to be aware of, right? The problem
with quota is that you as an object storage administrator, you set a quota, like say this
bucket, this application, I don't see more than 20 TB.
I'm going to set a 100 TB quota.
And then you forget it.
And then you think in six months they will reach 20 TB.
Reality is in six months, they reach 100 TB.
And then when nobody expected, everybody has forgotten that there was a quota set in place,
suddenly applications start failing.
And when it fails, even though the S3 API responds back saying insufficient space, the application doesn't really pass that error all the way up. When applications fail, they fail in unpredictable ways.
By the time the application developer realizes that it's actually the object
storage that ran out of space, they have lost time, and it's downtime. So as long as
they have proper observability, because MinIO also has the
observability that it can alert you that you are going to run out of space soon.
If you have those systems in place, then go for a quota.
If not, I would agree with the S3 API standard that it's not about cost.
It's about operational unexpected accidents.
Yeah, at some level, we wound up having to deal with the exact same problem with disk volumes,
where my default for most things was: at 70%, I want to start getting pings on it, and at 90%, I want to
be woken up for it. So for small volumes, if you wind up with a runaway log or whatnot, you have a chance
to catch it. And for the giant multi-petabyte things, okay, well, why would you
alert at 70% on that? Because procurement takes a while
when we're talking about buying that much disk
for that much money.
It was a roughly good baseline for these things.
The problem, of course, is when you have none of that
and, well, it got full, so oops-a-doozy.
On some level, I wonder if there's a story
around soft quotas that just scream at you
but let you keep adding to it,
but that turns into implementation details
and you can build something like that on top of any existing object store if
you don't need the hard limit aspect. Actually, that is the right way to do it. That's what I would
recommend customers to do. Even though there is a hard quota, I will tell you, don't use it,
but use a soft quota. And instead of even a soft quota, you monitor them. On the cloud,
at least you have some kind of restriction that the more you use, the more you pay.
Eventually, the month-end bills, it shows up.
On MinIO, when it's deployed in these large data centers,
it's unrestricted access.
Quickly, you can use a lot of space.
No one knows what data to delete,
and no one will tell you what data to delete.
The way to do this is there has to be some kind of accountability.
The way to do it is actually have some chargeback mechanism based on the bucket
growth and the business units have to pay for it. The IT doesn't run for
free. IT has to have a budget and it has to be sponsored by the applications
team. And you measure, instead of setting a hard limit, you
actually charge them that based on the usage of your bucket, you're going to
pay for it.
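A rough sketch of that "soft quota as observability plus chargeback" idea, assuming hypothetical bucket names and thresholds (and reusing the 70%/90% alerting levels from the disk-volume analogy): measure usage per bucket, report the numbers for chargeback, and alert instead of hard-failing writes. At real petabyte scale you would pull usage from the server's metrics rather than listing every object.

```python
# A sketch of soft quotas as an observability/chargeback problem. Bucket names,
# thresholds, and the endpoint are placeholders, not a recommended configuration.
import boto3

SOFT_QUOTAS_BYTES = {"analytics-raw": 100 * 1024**4}  # e.g. a 100 TiB soft quota
WARN_AT, PAGE_AT = 0.70, 0.90  # echo the 70%/90% disk-volume thresholds

def bucket_usage_bytes(s3, bucket):
    # Sum object sizes by listing; fine for a sketch, too slow for huge buckets.
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        total += sum(obj["Size"] for obj in page.get("Contents", []))
    return total

s3 = boto3.client("s3", endpoint_url="http://localhost:9000")  # any S3-compatible endpoint
for bucket, quota in SOFT_QUOTAS_BYTES.items():
    used = bucket_usage_bytes(s3, bucket)
    ratio = used / quota
    # This per-bucket usage number is also the input to a chargeback report.
    print(f"{bucket}: {used} bytes used ({ratio:.0%} of soft quota)")
    if ratio >= PAGE_AT:
        print(f"PAGE: {bucket} is above {PAGE_AT:.0%} of its soft quota")
    elif ratio >= WARN_AT:
        print(f"WARN: {bucket} is above {WARN_AT:.0%} of its soft quota")
```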
And this is an observability problem. You can call it soft quotas, but it comes down to
triggering an alert in observability. It's an observability problem. But it actually is
interesting to hear that framed as soft quotas, which makes a lot of sense. It's one of those problems
that I think people only figure out after they've experienced it once, and then they look like wizards from the future who, oh yeah, you're going to run into a quota
storage problem. Yeah, we all find that out because the first time we smack into something and
live to regret it. Now we can talk a lot about the nuances and implementation and low-level
detail of this stuff, but let's zoom out a bit. What are you folks up to these days? What is the
bigger picture that you're seeing of object
storage in the ecosystem? Yeah. So when we started, right, our idea was that the world is going to
produce an incredible amount of data. In 10 years from now, we are going to drown in data. We've
been saying that today, and it'll be true every year. You say 10 years from now, and it'll still
be valid, right? That was the reason for us to play this game.
And we saw that every one of these cloud players were incompatible with each other.
It's like early Unix days, right?
Like a bunch of operating systems, everything was incompatible.
And the applications were beginning to adopt this new standard, but they were stuck.
And then the cloud storage players, whatever they had, like GCS, can only run inside Google Cloud, S3 can only
run inside AWS, and the cloud players' game was to bring all the world's data into the cloud. And
that actually requires an enormous amount of bandwidth, moving data into the cloud at that scale. If
you look at the amount of data the world is producing, if the data is produced inside the cloud, it's a
different game. But the data is produced everywhere else.
MinIO's idea was that, instead of introducing yet another API standard, Amazon got the architecture right.
And that's the right way to build large-scale infrastructure. So we would stick to the Amazon S3 API, instead of introducing yet another standard of our own, and then go after the world's data.
When we started in 2014, November, it's really 2015 we started,
it was laughable.
People thought that there won't be a need for MinIO
because the whole world will basically go to AWS S3
and they will be the world's data store.
Amazon is very capable of doing that.
The race is not over, right?
And it still could be done now.
The thing is that they would need to fundamentally rethink
their, frankly, usurious data egress charges.
The problem is not that it's expensive to store data in AWS.
It's that it's expensive to store data and then move it anywhere else for analysis or use on something else.
So there are entire classes of workload that people should not consider the big three cloud providers as the place where that data should live because you're never getting it back. Spot on, right? Even if network is free, right? Say Amazon makes egress charges zero.
The data we are talking about, like most of MinIO's deployments, they start at
petabytes; one to 10 petabytes feels like 100 terabytes. Even if network is free, try moving a 10
petabyte infrastructure into the cloud. How are you going to move it? Even with FedEx and UPS giving you a lot of bandwidth in their trucks, it's not possible, right?
I think the data will continue to be produced everywhere else.
So our bet was that we will be everywhere: instead of you moving the data, you can run MinIO where there is data.
And then the whole world will look like AWS S3 compatible object store.
We took a very different path. But now, when I tell the same story
that we started with on day one,
it's no longer laughable, right?
People believe that yes,
MinIO is there, because our market footprint
is now larger than Amazon S3's.
And as it goes to production,
customers are now realizing
it's basically growing inside shadow IT,
and eventually businesses realize that
the bulk of their business-critical data
is sitting on MinIO, and that's how it's surfacing
up. So now what we are seeing
this year particularly is that all of these
customers are hugely concerned about cost
optimization, and as part of
that journey there are also multi-cloud and
hybrid cloud initiatives. They
want to make sure that their application can run
on any cloud, and the same software can run on any cloud or
on their colos, like Equinix or a bunch of Digital Realty facilities, anywhere. And
MinIO as software, this is what we set out to do. MinIO can run anywhere inside the cloud, all
the way to the edge, even on a Raspberry Pi. Whatever we started with has now become
reality. The timing is perfect for us.
One of the challenges I've always had with the idea of building an application,
with the idea to run it anywhere, is you can make explicit technology choices around that. For example, object store is a great example because most places you go now
will or can have an object store available for your use.
But there seem to be implementation details that get
lost. And for example, even load balancers wind up being implemented in different ways with different
scaling times and whatnot in various environments. And past a certain point, it's, okay, we're just
going to have to run it ourselves on top of HAProxy or Nginx or something like it running
in containers themselves, and you're reinventing the wheel. Where is that boundary between we're going to build this in a way that we can run anywhere and the reality that I keep
running into, which is we tried to do that, but we implicitly, without realizing it, built in a lot
of assumptions that everything would look just like this environment that we started off in.
The good part is that if you look at the S3 API, every request has the site name, the endpoint, the bucket name, the path and the object name.
Every request is completely self-contained.
It's literally a HTTP call away.
And this means that whether your application is running on Android, iOS, inside a browser, JavaScript engine, anywhere across the world, they don't really care whether the bucket is served from EU or US East or US West. It doesn't matter at all.
So the API actually allows you to build a globally unified data infrastructure.
Some buckets here, some buckets there. That's actually not the problem. The problem comes
when you have multiple clouds, different teams, like as part of M&A. Even if you don't do M&A,
different teams, no two data engineers would
agree on the same software stack. Then they will all end up with different cloud players and some
still running on old legacy environments. When you combine them, the problem is, let's take just
the cloud, right? How do I even apply a policy, that access control policy? How do I establish
unified identity? Because I want to know this application is the only one who is allowed to access this bucket. Can I have that same policy
on Google Cloud or Azure, even though they are different teams? That employee,
that project, or that admin, if he or she leaves the job, how do I make
sure that that's all protected? You want unified identity, you want
unified access control policies. Where are the encryption keys stored?
And then the load balancer itself, the load balancer is not the problem.
But then unless you adopt S3 API as your standard, the definition of what a bucket is, is different from Microsoft to Google to Amazon.
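As an illustration of what a unified, S3-style access policy can look like when the S3 API is treated as the standard, here is a hypothetical sketch: one JSON policy document applied with the same put_bucket_policy call against any S3-compatible endpoint. The principal, bucket, and endpoint below are made up, and identity mapping still differs between providers in practice.

```python
# A sketch of a unified, S3-style bucket policy: one JSON document, applied with the
# same call regardless of which S3-compatible endpoint is on the other side.
# The principal ARN, bucket name, and endpoint are hypothetical placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam:::user/reporting-app"]},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::reporting-data/*"],
        }
    ],
}

s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")
s3.put_bucket_policy(Bucket="reporting-data", Policy=json.dumps(policy))

# Read the policy back to confirm what is actually in force.
print(s3.get_bucket_policy(Bucket="reporting-data")["Policy"])
```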
Yeah, the idea of the puts and retrieving of actual data is one thing.
But then you have, how do you manage it? The control plane layer of the object store.
And how do you rationalize that? What are the naming conventions? How do you
address it? I even ran into something similar somewhat recently when I was doing an experiment
with one of the Amazon Snowball Edge devices to move some data into S3 on a lark, and the thing
shows up. It presents itself on the local network as an S3 endpoint, but none of their tooling can accept a different
endpoint built into the configuration files. You have to explicitly use it as an environment
variable or as a parameter on every invocation of something that talks to it, which is
incredibly annoying. I would give a lot for just to be able to say, oh, when you're talking in
this profile, that's always going to be your S3 endpoint. Go. But no, of course not, because that would make it easier to use something that wasn't
them. So why would they ever be incentivized to bake that in? Yeah, Snowball is an
important element to move data, right? That's the UPS and FedEx way of moving data.
But what I find customers doing is they actually use the tools that we
built for MinIO, because the Snowball appliance also looks like an S3
API-compatible object store.
And in fact, I've been told that when you want to ship multiple Snowball appliances,
they actually put MinIO on them to make them look like one unit, because MinIO can erasure code the
objects across multiple Snowball appliances. And the mc tool, unlike the AWS CLI, which is really
meant for developers making low-level calls, gives you Unix-like commands,
like ls, cp, sync-like tools,
and it's easy to move and copy and
migrate data. Actually, that's how people
deal with it. Oh, God, I hadn't even considered the
problem of having a fleet of snowball
edges here that you're trying to do a mass data migration
on, which is basically how you move petabyte-scale
data. A whole bunch of parallelism,
but having to figure that out on a case-by-case
basis would be nightmarish. That's right.
There is no good way to wind up doing that natively.
Yeah. In fact, Western Digital
and there are a few other players too,
Western Digital
created a Snowball-like appliance, and they put
MinIO on it, and they are actually
working with some system integrators to help
customers move lots of data. But
Snowball-like functionality is important
and more and more customers will need it.
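For a sense of what that copy-and-migrate workflow looks like in code, here is a hedged sketch that reads objects from one S3-compatible endpoint (say, a Snowball-like appliance on the local network) and writes them to another. The endpoints and bucket names are placeholders, and a real tool like mc adds parallelism, checksums, and retries on top of this basic loop.

```python
# A rough sketch of mirroring objects between two S3-compatible endpoints.
# Endpoints, credentials, and bucket names are placeholders; this reads each object
# fully into memory, which a production tool would avoid.
import boto3

source = boto3.client("s3", endpoint_url="http://appliance.local:8080")
target = boto3.client("s3", endpoint_url="http://minio.example.com:9000")

def mirror_bucket(src_bucket, dst_bucket):
    paginator = source.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = source.get_object(Bucket=src_bucket, Key=key)["Body"].read()
            target.put_object(Bucket=dst_bucket, Key=key, Body=body)
            print(f"copied {key} ({obj['Size']} bytes)")

mirror_bucket("ingest", "ingest")
```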
This episode is sponsored in part by Honeycomb.
I'm not going to dance around the problem.
Your engineers are burned out.
They're tired from pagers waking them up at 2 a.m.
for something that could have waited
until after their morning coffee.
Ring, ring, who's there?
It's Nagios, the original Call of Duty.
They're fed up with relying on two or three different monitoring tools
that still require them to manually trudge through logs
to decipher what might be wrong.
Simply put, there is a better way.
Observability tools like Honeycomb, and very little else
because they do admittedly set the bar,
show you the patterns and outliers of how users
experience your code in complex and unpredictable environments so you can spend less time firefighting
and more time innovating. It's great for your business, great for your engineers, and most
importantly, great for your customers. Try free today at honeycomb.io slash screaming in the cloud.
That's honeycomb.io slash screaming in the cloud.
Increasingly, it felt like back in the on-prem days that you'd have a file server somewhere that
was either a SAN or it was going to be a NAS. The question was only whether it presented it to
various things as a volume or as a file share. And then in cloud, the default storage mechanism,
unquestionably, was object store.
And now we're starting to see it come back again.
So it started to increasingly feel in a lot of ways
like cloud is no longer so much a place
that is somewhere else,
but instead much more of an operating model
for how you wind up addressing things.
I'm wondering when the generation of prosumer networking equipment, for example,
is going to say, oh, and send these logs over to what object store?
Because right now it's still write a file and SFTP it somewhere else,
at least the good ones.
Some of the crap ones still want old, unencrypted FTP,
which is neither here nor there.
But I feel like it's coming back around again.
Like when do even home users wind up instead
of where do you save this file to having the cloud abstraction, which hopefully you'll never
have to deal with an S3 style endpoint, but that can underpin an awful lot of things. It feels like
it's coming back and that cloud is the de facto way of thinking about things. Is that what you're
seeing? Does that align with your belief on this? I actually fundamentally believe in the
long run, right? Applications will go
SaaS, right? Like you
remember the days that you used to install
QuickBooks and ACT and stuff like that in
your data center. You used to
run your own Exchange servers. Those days
are gone. I think these applications
will become SaaS.
But then the infrastructure building blocks
for these SaaS, whether they
are cloud or their own colo, I think that in the long run, it will be multi-cloud and colo all
combined and all of them will look alike. But what I find from the customer's journey, the old world
and the new world is incompatible. When they shifted from bare metal to virtualization,
they didn't have to rewrite their application. But this time, it is a tectonic shift. Every single application you have to rewrite. If you retrofit
your application into the cloud, bad idea. It's going to cost you more and I would rather not do
it. Even though cloud players are trying to make file and block available, like file system services and
stuff, they make it available at 10 times the cost of object,
but that's just to accommodate some legacy applications,
and it's still a bad idea to just move legacy applications there.
But what I'm finding is that the cost,
if you still run your infrastructure
with the enterprise IT mindset, you're out of luck.
It's going to be super expensive
and you're going to be left out.
Modern infrastructure, because of the scale,
it has to be treated as code.
You have
to run infrastructure with software engineers. And this cultural shift has to happen. And that's why
cloud, in the long run, everyone will look like AWS. And we always said that and it's now
becoming true. Kubernetes and MinIO are basically leveling the ground, giving ECS-
and S3-like infrastructure inside AWS or outside AWS, everywhere.
But what I find the challenging part is the cultural mindset.
If they still have the old cultural mindset and they want to adopt cloud, it's not going to work.
You have to change the DNA, the culture, the mindset, everything. The best way to do it is go to the cloud first.
Adopt it. Modernize your application.
Learn how to run and manage infrastructure.
Then ask economics question, the unit economics.
Then you will find answers yourself.
On some level, that is the path forward.
I feel like there's just a very long tail of systems that have been working and have been meeting the business objective.
And, well, we should go and refactor this because, I don't know, a couple of folks on a podcast said we should.
But that isn't the most compelling business case for doing a lot of it.
It feels like these things sort of sit there until there is more upside than just cost-cutting to changing the way these things are built and run.
And that's the reason that people have been talking about getting off of mainframes since the 90s in some companies.
And the mainframe is very much still there.
It is so ingrained in the way that they do business, they have to rethink a lot of the architectural things that have sprung up around
it. I'm not trying to shame anyone for the state that their environment's in. I've never yet met
a company that was super proud of its internal infrastructure. Everyone's always apologizing
because it's a fire, but they think someone else has figured this out somewhere and it all runs
perfectly. I don't think it exists.
What I'm finding is that if you're running it the enterprise IT style,
you are the one telling the application developers,
here you go, you have this many VMs
and you have a
VMware license and JBoss,
WebLogic and a SQL
server license. Now you go build your application,
you won't be able to do it. Because
application developers talk about Kafka and Redis and Kubernetes. They don't speak the same language.
And that's when these developers go to the cloud and then finish their application, take it live
from zero lines of code before IT can procure infrastructure and provision it to these guys.
The change that has to happen is how can you give what the developers want? Now the reverse
journey is also starting.
In the long run, everything will look alike.
But what I'm finding is if you're running enterprise IT infrastructure, traditional
infrastructure, they are ashamed of talking about it.
But then you go to the cloud and then at scale, some parts of it you want to move.
Now you really know why you want to move.
For economic reasons, particularly the data-intensive workloads become very expensive, and at that point they go to a colo but leave the applications on the cloud.
So the multi-cloud model, I think, is inevitable. The expensive pieces, where you
can: if you are looking at yourself as a hyperscaler, if your data is growing, if your business
focuses on data-centric business, parts of the data and data analytics, the AI/ML workloads will actually go out if you're looking at unit economics. If all you are focused on is productivity,
stick to the cloud and you're still better off. I think that's a divide that gets lost sometimes.
People say, oh, we're going to move to the cloud to save money. It's no, you're not.
At a five-year time horizon, I would be astonished if that juice were worth the squeeze in almost any scenario.
The reason you go for it, therefore, is for a capability story when it's right for you.
That also means that steady-state workloads that are well understood can often be run more economically in a place that is not the cloud.
Everyone thinks for some reason that I tend to be, it's cloud or it's trash.
No, I'm a big fan of doing things that are sensible. And cloud is not
the right answer for every workload under the sun. Conversely, when someone says, oh, I'm building a
new e-commerce store or whatnot, and I've decided cloud is not for me, it's, eh, you sure about that?
That sounds like you are smack dab in the middle of the cloud use case. But all these things wind up acting as constraints and strategic objectives.
And technology and single vendor answers are rarely going to be
a panacea the way that their sales teams say that they will.
Yeah. And I find organizations that have SREs,
DevOps, and software engineers running the infrastructure, they actually
are ready to
go multi-cloud or go to colo because they exactly know, they have the containers and
Kubernetes microservices expertise.
If you are still on a traditional SAN, NAS and VM architecture, go to cloud, rewrite
your application.
I think there's a misunderstanding in the ecosystem around what cloud repatriation actually
looks like.
Everyone claims it doesn't exist because there are basically no companies out there worth mentioning that are,
yep, we decided the cloud is terrible, we're taking everything out, and we are going to data centers at the end.
In practice, it's individual workloads that do not make sense in the cloud.
Sometimes just the back of the envelope analysis means it's not going to work out. Other times during proof of concepts and other times as things have hit a certain point
of scale where an individual workload being pulled back makes an awful lot of sense. But everything
else is probably going to stay in the cloud. And these companies don't want to wind up antagonizing
the cloud providers by talking about it in public. But that model is very real. Absolutely. Actually,
what we are finding is
that the application side, like parts of their overall ecosystem within the company,
they run on the cloud. But the data side, some of the examples, these are at the range of 100
to 500 petabytes. The 500 petabyte customer actually started at 500 petabytes, and their
plan is to go at exascale. And they are actually doing repatriation because for them,
their customers, it's consumer facing and it's extremely price sensitive.
When you're consumer facing, every dollar you spend counts.
And if you don't do it at scale, it matters a lot.
It will kill the business. Particularly in the last two years,
the cost part became an important element in their
infrastructure. They know exactly what they want. They are thinking of themselves as hyperscalers.
They get commodity hardware, the same hardware, right? Just a server with a bunch of drives and network,
and put it in a colo or even lease these boxes. They know what their demand is. Even at 10 petabytes,
the economics start to have an impact. If you are processing it, on the data side, we have several customers now moving to colo from cloud.
And this is the range we are talking about.
They don't talk about it publicly because sometimes you don't want to be anti-cloud.
But I think for them, they are also not anti-cloud.
They don't want to leave the cloud.
If they are completely leaving the cloud, it's a different story.
That's not the case.
Applications stay there.
Data lakes, data infrastructure, object store, particularly if it
goes to a colo. Now your applications from all the clouds can access this centralized, centralized
meaning that one object store, you run on colo and the colos themselves have worldwide data centers.
So you can keep the data infrastructure in a colo, but the applications can run on any cloud. Some of them,
surprisingly, have a global customer base, and not all of them are cloud. Sometimes, for some
applications, if you ask what type of edge devices or edge data centers they are running,
they said it's a mix of everything. What really matters is not the infrastructure; infrastructure,
in the end, is CPU, network, and drives. It's a commodity. It's really the software stack. You want to make sure that it's containerized and easy to deploy
and roll out updates. You have to learn the Facebook, Google style of running a SaaS business. That change
is coming. It's a matter of time and it's a matter of inevitability. Now, nothing ever stays the same.
Everything always inherently changes in the full sweep of things.
But I'm pretty happy with where I see the industry going these days.
I want to start seeing a little bit less centralization around one or two big companies.
But I am confident that we're starting to see an awareness of doing these things for the right reason, more broadly permeating.
The competition is always great for customers.
They get to benefit from it.
So the decentralization is a path to bringing
and commoditizing the infrastructure.
I think the bigger picture
for me, what I'm particularly happy
is for a long time, we carried
industry baggage in the infrastructure
space. No one wants to change.
No one wants to rewrite applications.
As part of that equation, we carried
POSIX baggage,
SAN and NAS. You can't even do iSCSI as a service, NFS as a service. It's too much of a baggage.
All of that is getting thrown out. The cloud players help the customers start with a clean
slate. I think to me that's the biggest advantage. Now we have a clean slate, we can now go on a
whole new evolution of the stack,
keeping it simpler, and everyone can benefit from this change.
Before we wind up calling this an episode, I do have one last question for you. As I mentioned
at the start, you're very much open source, as in legitimate open source, which means that anyone
who wants to can grab an implementation and start running it. How do you, I guess, make peace with the fact that the majority of your user base is not paying you?
And I guess, how do you get people to decide, you know what?
We like the cut of his jib.
Let's give him some money.
Yeah.
If I look at it that way, right, I wear both hats, right, on the open source side as well as the business side.
But I don't see them to be conflicting.
If I run as a charity, right, like
I take donation, if you love the product, here is the donation box, then that doesn't work at all,
right? I shouldn't take investor money and I shouldn't have a team because I have a job to
pay their bills to. But I actually find open source to be incredibly beneficial. For me,
it's about delivering value to the customer. If you pay me $5, I have to make you feel $50
worth of value. The same software you
would buy from a proprietary vendor, why would you? If I'm a customer, same software, equal in
functionality, but one is proprietary, I would actually prefer the open source one and pay even more.
But why are customers really paying me now? What's our view on open source? I'm actually
the free software guy. Free software and open source are actually not exactly equal, right?
We are the purest of the open source community.
And we have strong views on what open source means, right?
That's why we call it free software.
And free here means freedom.
Free does not mean gratis, free of cost.
It's actually about freedom.
And I deeply care about it.
For me, it's a philosophy and it's a way of life.
That's why I don't believe in open core and other models that
hold things back; giving crippleware is not
open source, right? I give you
some freedom but not all, right?
Like, it breaks
the spirit. So, MinIO is
100% open source, but
it's open source for the open source community.
We did not take some
community developed code and then added commercial support on top.
We built the product.
We believed in open source.
We still believe and we will always believe.
Because of that, we open sourced our work.
And it's open source for the open source community.
And as you build applications, the AGPL license and the derivative works, they have to be compatible with AGPL.
Because we are the creator,
if you cannot open source your application,
your derivative works,
you can buy a commercial license from us.
We are the creator.
We can give you a dual license.
That's how the business model works.
That way, the open source community
completely benefits
and it's about the software freedom.
There are customers,
for them, open source is a good thing
and they want to pay because it's open source. There are some customers who
want to pay because they can't open source their application and derivative works.
So they pay. It's a happy medium. That way, I actually find open source
to be incredibly beneficial. Open source gave us the trust
more than adoption. It's not like free to download and use. More than that,
the customers that matter, the community that matters, because they can
see the code and they can see everything we did.
It's not because I said so, marketing and sales, you believe them, whatever they say.
You download the product, experience it, fall in love with it.
And then when it becomes an important part of your business, that's when they engage
with us because they talk about license compatibility
and data loss or a data breach.
All that becomes important.
Open source, I don't see that to be conflicting for business.
It actually is incredibly helpful.
And customers see that value in the end.
I really want to thank you
for being so generous with your time.
If people want to learn more, where should they go?
I was on Twitter. Now I think I'm spending more time on maybe LinkedIn. I think if they can
send me a request and then we can chat. And I'm always spending time with other entrepreneurs,
architects, and engineers, sharing what I learned, what I know, and learning from them.
There's also a community open channel. And just send me a mail at ab@min.io.
And I'm always interested in talking to our user base.
And we will, of course, put links to that in the show notes.
Thank you so much for your time.
I appreciate it.
It's wonderful to be here.
AB Periasamy, CEO and co-founder of MinIO.
I'm cloud economist Corey Quinn.
And this has been a promoted guest episode
of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review
on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star
review on your podcast platform of choice that presumably will also include an angry, loud
comment that we can access from anywhere because of shared APIs.
If your AWS bill keeps rising and your blood pressure is doing the same, you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS.
We tailor recommendations to your business,
and we get to the point.
Visit duckbillgroup.com to get started.