Orchestrate all the Things - MinIO, the de facto open source standard for multi-cloud storage, becomes a unicorn after a $103 million Series B funding round. Featuring Founder AB Periasamy, CMO Jonathan Symonds
Episode Date: January 26, 2022

MinIO took an early bet in the cloud, and in Amazon's S3 becoming the de facto standard for application storage needs. The bet is paying off. Article published on ZDNet ...
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
MinIO took an early bet in the cloud and in Amazon's S3
becoming the de facto standard for application storage needs.
The bet is paying off.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration.
Let's start from the beginning, which is since we haven't connected before,
I would just like to ask you, AB, to share a few words about yourself
and your background and the founder story for MinIO.
I'm AB.
I'm one of the co-founders; there are three founders in total: myself, Garima Kapoor, and Harsha.
And my background, I previously built a startup called Gluster, or Gluster File System.
It's an open source distributed file system. Red Hat ended up acquiring that startup in the past. And I've been in the open source space as long as I can remember.
Open source distributed computing and storage and system software.
That's pretty much my background.
As for the history: after Gluster, Harsha and I were working on Bionics, just having fun.
Harsha was part of the Gluster team as well.
Garima wanted to do a startup, and that's how all of this came together.
But we put aside Bionics because you don't want to be experimenting in a startup where everything has to move fast, right? So we picked an idea that's long-term and here to stay
and not only grows, it actually multiplies
and grows bigger and bigger, right?
So every direction you look at,
what was very clear for us was data
is going to be a problem, right?
Anything you do around data is here to stay. We picked the persistence layer of the
data stack, that is, storing and managing all the data. The
opportunity for us was in the public cloud. The public cloud started with object storage, and
between the different public cloud services, they are proprietary.
Everybody is proprietary, and you are stuck.
If you wrote an application on one cloud, you cannot move it to another cloud.
And then outside of the public cloud, everything was file, block, VM: incompatible and old school.
The SAN, NAS, HDFS-type technologies were incompatible with the cloud.
And that was the opportunity we saw,
that if data is everybody's problem across small, medium, large,
every business is going to struggle with managing huge amounts of data,
and they have to standardize on one piece of technology,
what would that be?
It was very clear that cloud is here to stay,
and cloud will reset the infrastructure standards.
And the storage for the cloud
is going to be the storage for everybody.
And object store is the technology
underneath the public cloud storage services.
So what we did was we built software
that is exactly compatible with the cloud,
at least the leading cloud,
in this case, Amazon S3 cloud storage.
We built it as compatible with Amazon S3, but then you can run it inside AWS, on Google Cloud,
Azure, all the way to the edge. You can run it anywhere. It's software; it runs like your MongoDB or
Postgres or any other data store, except it's used to store massive amounts of data. And that's basically the story.
That was the idea.
We built a piece of software that was very easy to use and released it to the community
as open source, and it grew virally.
Yeah, in retrospect, and I have to admit in advance that I wasn't familiar with MinIO
before this announcement.
In retrospect, seeing what you do and how you go about it, it all seems so obvious.
So one question I have to ask is, when did you actually start that?
I don't think you mentioned that previously.
Yeah, we officially incorporated the company in November of 2014, and then things started taking
shape in 2015.
Taking that into consideration, yes, it definitely looks obvious now.
Probably didn't look that obvious back then.
And this is why you are in the position you are in today.
You were pioneers in a way, I guess.
Yeah, it feels so, but it is true that when we started in 2014,
like when I told people about this problem,
everyone thought that
if we are not useful inside AWS, we will not be useful anywhere else.
And then I was like, inside AWS,
you already have AWS S3.
Sure, it's a piece of software
that can run inside AWS too.
But the real problem is all the world's data,
they have to standardize on one technology and one API.
It did not make sense to a lot of people.
My bet was that it's possible that AWS becomes the world's data
center and all of the world's data goes to AWS S3.
We all can go home, right?
It's easier.
But it was hard for me to digest that that would be the case.
My bet was that AWS will be very successful in convincing the industry the right way to build infrastructure, but it's highly unlikely that AWS will be the only infrastructure around.
And if that's going to be the case, too many standards of APIs is not going to cut it.
But then, on the other hand, the world is producing so much data. That was an obvious problem. Every major trend, from AI/ML to IoT,
autonomous driving, anything you pick, more data is coming. And it
was an obvious problem for me. You don't have to be an expert to tell that 10 years from now, the world will be drowning in data.
It was already the case to begin with.
And from 7 billion people to trillions of devices producing data, it was so clear.
And for us, we didn't have to be geniuses to come up with this conclusion.
Well, you must have done something right. I mean, besides being at the right
time and
the right place, you must have done something
right, because then to bring this
all into context,
the occasion for having this
conversation is that you're getting a
pretty massive round
of funding, which also happens
to raise your
valuation to so-called unicorn status.
So, if you'd like to just share the specifics around this announcement and, well, how did
you get there?
And to add some color to the question, it also has to do with the fact that your main
license model is AGPL, if I'm not mistaken,
which is, well, let's put it that way,
it's not the most common model among unicorns, let's say.
Yeah.
So for us, everything we do, it reflects in our brand.
Our brand is our culture and it has to be built around honesty,
trust and love, right?
If you have to build a long-term brand,
it's really important that everything you do,
you establish trust
and you make your community
and users fall in love with you.
And in that sense, right?
Like for us, when we talk about open source,
it's not a marketing strategy for us.
It's a philosophy we believe in and at the same time we also have the responsibility to run a business
The common theme you see across the open source space is that they look at open source as a
powerful vehicle to distribute and get adoption, but then they start cannibalizing their own
adoption by making some important features proprietary.
They go to an open core model.
Then you are neither open source
nor fully proprietary.
Even if you are completely proprietary,
you can tell your customers and community,
this is it, right?
You are being honest.
But for me, it did not make sense.
It was, for us, open source is about software freedom.
And we have to be honest about it.
Either you believe in it or you don't.
The AGPL license allowed us to be that.
That we are actually, I would say, probably one of the few or the only one who can say
that it's the exact same binary bits that a customer and a community gets.
I'm not holding back any feature.
Every important feature from advanced encryption,
ransomware protection, name it, right?
Every major enterprise feature
is there in the open source product.
But then how do we make money?
It's actually quite simple.
AGPL license is a license that's for open source community.
If you are building open source community,
if you believe in software freedom equally, then we are in the same club. We all can benefit together from each other's
contribution. But if you are building commercial proprietary application, you don't believe in
software freedom, then those people I don't want to ignore. In fact, they are the one who can fund
this. They would pay for a commercial license because they cannot open source their derivative works, their application.
AGPL is actually the purest license of all.
Others tend to pick a permissive license like MIT or Apache.
And the result of it is, if you look, every one of those companies who started with an Apache license ended up as open core, making important features proprietary.
That was in conflict with our core philosophy and beliefs.
Okay.
I think I saw, you know, looking around,
doing a little bit of research before the conversation,
I think I saw some reference on your website
to something like object storage as a service.
And seeing that in my mind, I felt like, okay,
so maybe this is part of the business model.
So you actually use your own software to do precisely that offer,
object storage as a service.
And this is how you're able to make money while at the same time
being open source with the AGPL license.
So we actually don't provide a service, right?
So it is just software.
It is no different from, say, Red Hat Enterprise Linux
or MongoDB or any of these pieces of software
where you pay for the software license.
You download the software, run in your infrastructure.
We did launch MinIO as a marketplace offering, tightly integrated with the public cloud services.
So you can easily click and deploy and run your own multi-cloud object store.
You are still in control of your data and software. Our community and customers deeply care that if it's a service,
then their data is managed entirely by us. I have access to their data, and it's not
something that makes them feel comfortable, right? Whereas with a piece of software,
they can read all of the source code, they can download it, they can run it. A managed service
or a marketplace offering gives you the fine balance.
It still gives you all the benefits, somewhat closer to the service model, but you are in control.
The software is running in your infrastructure and your data is in your control.
I cannot see your data. That's actually very important for most of these modern enterprises today.

Yeah, and I think I want to just double click on that, because the confusion may come from the fact
that we've made it very, very simple for IT
to deploy object storage as a service
to their developer community.
And so with just a handful of clicks from the MinIO console,
they can now deploy object storage as a service.
Where heretofore that was almost impossible.
It was a very, very complex undertaking,
whereas now it's a very, very simple undertaking. You don't even need to know how to spell Kubernetes
to deploy a Kubernetes-powered object storage as a service to your clients internally.
And so that's one of the reasons that MinIO's adoption has really accelerated over the last
18 months: that capability just didn't exist in the market until we brought it to the market.
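[Editor's illustration] The "Kubernetes-powered object storage" described above is normally deployed through the MinIO console or the MinIO Operator; as a rough sketch of what runs underneath, here is a minimal single-node manifest. The image name, server flags, and environment variable names are standard MinIO conventions, but the resource names and credentials are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: quay.io/minio/minio
          args: ["server", "/data", "--console-address", ":9090"]
          env:
            - name: MINIO_ROOT_USER
              value: "minioadmin"         # placeholder credentials
            - name: MINIO_ROOT_PASSWORD
              value: "minioadmin"
          ports:
            - containerPort: 9000         # S3 API
            - containerPort: 9090         # web console
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          emptyDir: {}                    # real deployments use PersistentVolumes
```

A production setup would use the operator with distributed mode and persistent volumes; the sketch only shows that the server itself is a single lightweight container.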
Okay.
Yeah, I realize that a lot of the appeal and also a lot of the capabilities
that the platform comes with seem to come precisely
from leveraging Kubernetes, basically,
which enables you to be operational on all clouds,
public and hybrid and whatnot.
Yeah.
When we started MinIO, Kubernetes was not there, right?
It was Docker, Docker Compose and Swarm.
Then you look around from Cloud Foundry, Mesos,
like all the way to VMware,
every one of these technologies were incompatible with each other again.
But at that point, MinIO itself was designed in a way that it followed all the cloud-native principles:
it has to be lightweight, scalable, almost running like a microservice,
except it has persistent state, because you store data in MinIO.
But we waited for the market to consolidate.
As Kubernetes grew, it was quite clear for us that they got the ideas right.
And we took a serious bet on Kubernetes.
But MinIO itself can run from a Raspberry Pi to all kinds of high-end servers, even POWER architecture,
from really small to really large deployments. Kubernetes, the declarative model and the
simplicity of Kubernetes allowed the application developers to manage their containers. Kubernetes
itself came as an afterthought. Once they containerized the
application, they actually put MinIO along with their application containers, but then they needed
something to naturally orchestrate their containers. And Kubernetes abstracted the differences between
different clouds and even the private cloud. Once you built an application on containers and
Kubernetes, you can move from VMware Tanzu to
Rancher, OpenShift to any one of the cloud. It was an inevitability. We watched the trend,
listening to our community. We naturally gravitated to what the community was
consolidating towards. It made a lot of sense for us too. MinIO itself is not really tied
to Kubernetes, but it became a natural fit because both of
them were cloud native.
Okay.
So, yeah, it seems like the core of the bet that you took early on was precisely around
S3 and building something that is compatible with S3,
while at the same time supporting all the different formats
that different clouds work with.
And I wanted to ask if there was something special that you did
to achieve that from a technical point of view.
So the core bet, while it looks like it is Amazon S3 versus us, the real
story behind it was that I actually like to go into a market that's quite crowded and then disrupt it
with a better product. That was really the driving
force behind it. And how you disrupt is really the differentiation. It's very hard to go and create new markets. I'm a product guy, and I would rather go into a market that's
very established, where the market is already there. And the good part about the object storage
market is, even though it looked crowded
by the time I started,
there were many players,
but I knew that it would consolidate.
But at the same time,
market will have an explosive growth.
When it consolidated,
if I'm one of the last few players standing,
then it would result in a huge success.
So I knew the costs were too high,
but then I also knew we can beat the competition
with a better product.
And what I found was across the industry,
you look around, the storage guys,
particularly they came with a very old school mindset.
And they're all like typical file system developers.
In fact, I myself came from Gluster file system,
but Gluster file system is like no other file system.
I took different ideas.
I didn't follow the crowd.
The problem in the storage space,
particularly the enterprise space,
they like to build complex, shiny, heavy products.
That's not what the industry wants,
particularly when you have lots of data
and big infrastructure,
building complex products falls apart.
For me, it wasn't that hard to see that differentiation.
It's not like, oh, they got these clever dedupe algorithms
or compression algorithms.
All of the building blocks that was needed to build a better product
was just there.
It was simply a matter of being
focused, taking a minimalist view that I found was very hard and rare in the enterprise space.
They always think about how they can do more features, and end up creating bloatware. I found that
I should ask: if I were the user, what would such a product really be? What would I expect as an architect? We relentlessly focused
on just the right features, and simplicity was the core driving design philosophy. It's a mindset that
allowed us to differentiate. A better product, for us, was a product that resonated
with the users: just the right features, being
super disciplined. Like all the time, even early on, community will ask, why S3 API? Why not Swift
API? Swift is OpenStack's API, and at that time OpenStack was still not dead, right? There was even the HDFS API, all of
it. I was like, pick one, right? It's not like I'm a fan of the S3 API, but if you want the Swift API,
then it has to be Swift API. And I'll do only one thing,
only one way to do it right. And I will not do multi-protocol.
That's not minimalism design philosophy.
They would even ask why not add file API?
You can do a multi-protocol file block and object.
I'm like, that's a grand unified theory of storage. It sounds nice on paper, but these are
completely independent problems. And the whole idea about object storage was to let go legacy
stuff. If I bring back the legacy stuff, then it's not plus-plus, it's minus-minus. I was just
disciplined enough to say no a lot, and that's what allowed us to get here.
Can I add a couple of things to that? So one is we owe this giant debt of gratitude to Amazon for the S3 API, right? They changed the developer mindset in terms of how you do storage. The
second thing is that because of the open source model and because of our focus on that just one thing,
i.e. object storage through the S3 API, we were able to basically harden our implementation of
that S3 API in a way that no proprietary software vendor could ever do. So we have tens of thousands
of customers that hammer on our S3 API every single day, and we release weekly.
So if there's any issue, we know about it within minutes, and we patch it within an hour or so.
But that hardening, and the large number of people that are hammering on it every day, have given us a significant advantage and have given the community and, you know,
by extension, our users and our growing user base,
the confidence to know that any application is going to run against our S3 API seamlessly.
True. Since S3 and its API are obviously such a central thing to what you do,
I was wondering if there was ever any point in time that you were concerned about,
well, you know, the famous Oracle versus Google API case.
Was there any point in time where you were concerned that this may become an issue for you?
Yeah, it would be an equally big issue for Amazon as well.
If they changed the API... First, it's an API, like you mentioned.
First, it's an API like you mentioned.
It's not a standard.
In fact, the API definitions they published,
they are more like a religious text.
There are many ways to interpret
and their own SDKs interpret it differently between different languages. Even
within the Java SDK, different versions of the AWS Java SDK would interpret that API differently.
We got the implementation right. But then there is the issue of it being Amazon proprietary; it's not just about the legal right to use it, right?
Even Amazon can simply keep changing it.
Even if they never sued anybody using the AWS S3 API, they could keep changing the API, because they control the upstream.
They control their own standard.
If they keep changing, it is enough to upset the industry.
But if they keep changing, it would also hurt their own customers.
So the API is pretty much converged and Amazon services support some of the oldest APIs too.
So it's not going to change, but they will keep adding.
If they keep adding, we also keep adding.
We are neck and neck with them.
And today we have reached a point of API
saturation: it has all the right features that the industry needs. Adding anything more looks
more like vanity. But overall, I think we got the right functionalities, all the modern functionalities,
both of us neck and neck, fully compatible with each other. But Amazon overall has been only kind to us and to everybody else. They actually
saw that their mindset is somewhat similar to us too. If S3 API becomes popular, it means it'll
result in more applications being Amazon ready. And their belief is that they can build a better
product. They can build the best-of-breed product compared to anybody else. And if that's the case, if the S3 API indeed becomes a standard, and it is now almost a standard,
their belief is that theirs is the best implementation, so the
industry will gravitate towards AWS and they will win anyway. Our bet is that, yes, we believe that
same thing too. But unlike Amazon, Amazon thinks that they will build a service
and they will be the world's data center.
Our bet is that we will build a better software
and we will turn every infrastructure into an object storage service
and we can build a better product than Amazon S3.
But at this point, so far, I would say Amazon has been only kind
and they have actually been public about it.
I've seen one of their VPs of R&D encouraging and promoting MinIO to their own community.
They are not afraid because they are confident about their capabilities, their ability to build a better product.
Overall, so far, everything is very cooperative, going well. But if any day Amazon says, legally, I'm blocking
you, like what Oracle did to Google, if at any time Amazon says, you cannot do it, it
gives us the right to go publish our own standard. It's not like we don't know how to build a better
standard. We certainly do. S3 API has kind of become more complex than its roots.
We can build a better S3 API.
And today, we have a bigger ecosystem, if not equal.
And if Amazon turns the other way, we have a huge community.
If we publish a new standard and we publish it as an open standard that anybody can use without royalty, that would actually kill S3 API.
So it's not in the interest of either party to just kill each other.
Instead, consolidate, grow the market and let the winner take off.
Yeah, I agree with you.
It looks like it's a mutually beneficial relationship the way it is.
When we started, our problem was that S3 was only available inside AWS.
And what was the result of that?
There was only a minority segment of applications that spoke the S3 API.
And if this is going to be the case, we have no role to play.
But first, we needed applications that speak the S3 API to emerge. We worked very hard
to create the ecosystem outside of AWS, which was actually larger: all the way from the VMware
environment, which right now ships MinIO as part of their vSphere stack, to even converting your
NetApp or HDFS, you name it. Every one of these infrastructures can become S3 compatible.
We wrote the first S3 v4 API implementation.
We promoted it and created an ecosystem.
Now that the ecosystem is built around MinIO, the problem has pretty much gone.
The availability of applications speaking the S3 API outside of AWS is no longer a problem today.
In fact, it has become an advantage.
These applications only know how to speak S3 API, and they lost their ability to talk to SAN and NAS today.
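[Editor's illustration] The "first S3 v4 API implementation" mentioned above refers to AWS Signature Version 4 request signing. As a sketch of what that involves, here is the signing-key derivation from AWS's published SigV4 specification, in plain Python; this follows the public spec and is not MinIO's actual code.

```python
import hashlib
import hmac

def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key-derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS Signature Version 4 signing key (date is YYYYMMDD)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")

def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Sign the canonical 'string to sign' and return the hex signature."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Every S3-compatible server has to reproduce this chain (plus canonical request construction) exactly, which is part of why compatible implementations are hard to get right.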
Yeah, it's true that you do support a wide variety of environments.
I have to say there is one in particular that piqued my interest,
and that's the HDFS system.
Because of, well, I'm one of those people that kind of lived, let's say,
the rise and fall of Hadoop and its file system.
So I'm wondering how much of a market pull do you see for people
who have invested in Hadoop and HDFS
and now want to move away from that?
We consistently come across, particularly in our large deals, deployments that are actually big data
replacements.
Why they are large deals?
Simply because our pricing model is consumption-based.
More data means bigger checks. And we found that in most of these organizations, where is most of the data sitting? It's actually HDFS, because it was machine-generated data. Machine-
generated meaning some form of logs. It could be application logs, all kinds of logs.
Software running inside machines writes these logs. That data, coming in on a continuous basis,
filled these data volumes. We are talking about petabytes and petabytes of data. Most
of these organizations, if they have 10 plus petabytes of data, if you look around, you
ask them, where is it sitting? They will tell you it's in HDFS. It's not some HCI, like
Nutanix or VxRail-type HCI, or some kind of NAS. It was in HDFS. HDFS actually played
a major role in taking the control out of SAN and NAS and
giving them a software-defined large-scale on commodity hardware, a distributed storage system.
But the problem with HDFS, as we all see, that it's now collapsing under its own weight because
it was complex, it was incompatible with the cloud. And our most popular deployments, the top
use case, are around AI/ML and big data analytics type workloads. We are routinely replacing the Cloudera
stack and HDFS with MinIO, Kubeflow, Kubernetes containers, data pipeline type architecture,
and there is no HDFS in that model because HDFS isn't compatible here.
But I would still give credit to HDFS for opening the market. And even for object store,
there is a driver called the S3A adapter. It makes the object store's S3 API look like a Hadoop-compatible
file system, so unmodified Hadoop applications
can be brought onto the object store.
What is now surprising is the speed
at which the AI/ML applications are getting rewritten.
They actually speak the native S3 API
and don't go through the Hadoop HCFS translation layer.
It does make it more efficient.
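[Editor's illustration] The S3A adapter mentioned above is configured on the Hadoop side. A sketch of the relevant core-site.xml properties, where the endpoint and credentials are placeholders; the property names are the standard Hadoop S3A keys.

```xml
<!-- core-site.xml: pointing Hadoop's S3A connector at an S3-compatible endpoint. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://minio.example.internal:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SECRET_KEY</value>
  </property>
  <property>
    <!-- MinIO-style deployments typically use path-style URLs -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

With this in place, unmodified Hadoop applications address data as s3a:// paths instead of hdfs:// paths.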
Interesting.
Thank you.
I think as we're about to wrap
up, just one final question.
What are your future plans?
What are you going to be
spending this money on?
I would say
that this money has to be spent on
educating the market at even
broader level.
Object storage has plenty of capabilities for today's needs, I think, for some time to come.
We'll continue to keep up the product innovation.
Those are actually easier things, right?
I would say making it even easier.
The product, the technology, operating at scale is not hard.
But even writing a good policy definition today requires
a deeper understanding of the API.
But that's not a good thing, right?
You will see a lot of times that on Amazon, breaches happen simply because people
wrote poor policies or they just gave permission to everything.
It's not because S3 is insecure; it's the nature of the S3 API.
I actually want to reach a point where average technician-grade people, with no expertise
in storage, who can't spell Kubernetes,
can operate at scale securely.
That to me is better user experience.
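[Editor's illustration] The point about poorly written policies can be made concrete: most S3 "breaches" come from wildcard grants, not from the protocol itself. A hedged sketch of a least-privilege bucket policy, built with the standard policy JSON shape; the bucket, account, and user names are hypothetical.

```python
import json

# A least-privilege policy: one named principal, read-only actions, one bucket.
# Contrast with the dangerous pattern of "Action": "*" on "Resource": "*".
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::123456789012:user/analyst"]},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::reports",     # for ListBucket
                "arn:aws:s3:::reports/*",   # for GetObject
            ],
        }
    ],
}

print(json.dumps(read_only_policy, indent=2))
```

The same policy document format is accepted by both AWS S3 and S3-compatible servers, which is why policy-writing skill transfers across them.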
But even bigger than that is just educating the market on what is object storage.
You would find that at small scale, they start sticking data blobs in a database,
not knowing that they should be using an object store, or enterprise IT is still stuck with file and block.
And there is a big market out there of people who are not DevOps.
They are not geeks like us.
And they are drowning in data too.
If we educate the market at a broad scale,
explaining the very basics,
starting with what object storage is,
all the way to using it at large scale,
that requires the creation of large amounts of content
and distributing it into the right
hands.
That, for us, is marketing.
It's not like spending on big trade show banners and stuff, right?
It's just for content creation and distribution, we would invest heavily in educating the market
about object storage at a large scale.
Great.
Thank you.
And, well, I wish you much success going forward, and again, congratulations on the funding.
Thank you, George.
Thank you, George.
I hope you enjoyed
the podcast.
If you like my work,
you can follow
Linked Data Orchestration
on Twitter,
LinkedIn
and Facebook.