Grey Beards on Systems - 107: GreyBeards talk MinIO’s support of VMware’s new Data Persistence Platform with AB Periasamy, CEO MinIO
Episode Date: September 25, 2020
The GreyBeards have talked with Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, before (see 097: GreyBeards talk open source S3… episode). And we also saw him earlier this year, a...t their headquarters for Storage Field Day 19 (SFD19) where AB gave a great discussion of what they were doing and how it worked …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeard bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This Greybeards on Storage episode is brought to you today by MinIO
and was recorded on September 17, 2020.
We have with us here today AB Periasamy, CEO of MinIO.
So, AB, why don't you tell us a little bit about yourselves
and the recent news about a partnership between VMware and MinIO.
Great. Thank you for bringing me on this channel.
It feels like not so long ago we spoke.
Yeah, it was a couple months back,
not that long ago.
That's good news.
Yeah, it feels like long ago, right?
And myself, I'm AB, Anand Babu Periasamy.
If you dedupe my name, you get AB, right?
And I'm one of the co-founders of MinIO,
and MinIO is object storage.
And the recent announcement about the VMware partnership is actually a big step in the
enterprise IT space, bringing Kubernetes to the enterprise. MinIO is now natively available on VMware Tanzu, and it is available as a service in the data persistence services layer.
Yeah, so VMware has been spending a lot of, I would say, development effort and marketing effort on Kubernetes on VMware, and Tanzu is their latest integration of that activity. And so I'm trying to understand how MinIO plays in the VMware environment nowadays.
Yeah, we saw the industry was split until recently, right?
There was enterprise IT, which was file, block, and VMs, right?
Or HCI; the closest innovation they had there was HCI. And you saw the cloud-native
world, which was Kubernetes and containers, where everything is elastic. These two worlds were
fundamentally incompatible, right? Containers and data services. And for data services, either you
have object storage or you have a database. And object storage is the primary storage out there,
right? Whether it's Snowflake or
Azure ML, Power BI, even static website hosting, everything in the cloud is built on object
storage. But look at the private cloud, or the enterprise IT environment:
completely incompatible, right? That has shifted this time. And VMware did it
so cleanly. If you look, the race all along
has been how to modernize enterprise IT so everybody in the end looks like AWS or one of
these public clouds. And our journey was to give the enterprise the storage side of things, which
is object storage. And this time around, by bringing Kubernetes support natively into the heart of vSphere, VMware enabled us to go on top of VMware, take full advantage, and bridge the two worlds.
You see the traction MinIO has; we pretty much own the Kubernetes space.
It's the de facto object storage of choice in the private cloud and in the hybrid cloud.
But when it came to enterprise IT, we were seen as shadow IT.
Is this something where somebody on vCenter can just fire up a MinIO object storage cluster?
I'm trying to understand how this all plays out in the VMware environment.
Yeah, and I'm even a little confused, because when I talk to VMware about object storage,
especially on vSAN, I get my hand slapped.
And they say, you know what, Keith?
vSAN is not for object storage; it's not optimized for object storage.
It's basically for file and VMFS.
That is what they addressed this time, right?
And the vCenter is the key, right?
So the IT still owns the physical layer,
and they control when to buy, when to upgrade,
when to fix a failed drive,
the physical to virtual or physical to container.
IT owns the physical resources and the SLA and stuff, right?
How do they manage it? Through the vCenter.
This time, IT can provision private cloud infrastructure,
multi-tenant, like full Amazon-like capability,
but more enterprise-hardened,
without ever learning to spell Kubernetes.
And this is all done entirely through the vCenter.
You don't even have to touch the kubectl command.
You don't need to know even that it's underneath powered by Kubernetes.
So it becomes just a vCenter data store?
I'm not sure even that's the right terminology here,
but isn't that what we're talking about here?
Yeah, so actually, I'm glad you paid attention to that detail. It's a very fine, subtle detail, but it actually is a huge shift in the industry, right?
Like you saw how enterprise IT
resisted software-defined storage, right?
What VMware calls the data persistence layer
is actually a huge upgrade to software-defined storage.
They just moved the industry forward several steps.
And this is all because that is how the public cloud already operates.
It is no longer about storage.
It's actually about data.
And when it comes to data, they put databases and object storage
at the same level.
It's actually the same layer.
In this announcement, if you look at the data persistence services,
three of them are object storage and one of them is a Cassandra database. As
more services come on, they are essentially going to be data services: either a database for
storing metadata or object storage. File and block are kind of gone, right? But
at the end of the day, the hard drive, solid state,
or some block storage is where
you still need to save the data.
So vSAN is actually the thin layer
that virtualizes the physical drives
or SSDs behind the Container Storage Interface (CSI),
enabling high-performance object storage
like MinIO, or distributed databases,
to run natively.
So effectively, this is providing a persistent data layer, a persistent storage layer for
the Tanzu container solution.
But it seems like this sort of stuff also applies to normal VMs and stuff like that,
doesn't it?
Yeah. In fact, the integration speaks volumes. Instead of just retrofitting,
if you tell the customers, you can just run Kubernetes on top of VMs, nothing changes,
it's just a marketing campaign. That's not what VMware did. They actually brought Kubernetes into
the vSphere layer, and they did some
fundamental improvements in a way that Kubernetes now got the benefit of VM-like isolation,
and you can now manage VMs and containers just alike. And with MinIO running in the Supervisor
Cluster, close to the vSAN Direct layer, you actually get the best of both worlds. And that to us is a big deal.
So I know some of this firsthand.
One of the big things that VMware did in the 1.0 release of vSphere 7.0
and Tanzu was implement namespaces, not just adopting container namespaces,
but adopting namespaces for vSphere
and vCenter itself. Is MinIO tying into the concept of namespaces across VMs and containers
and offering some new persistent layer of storage based on just Linux namespaces?
Yeah, so the VMware namespace now is actually
the Kubernetes namespace as well,
because they are all kind of converged now, right?
And namespace is the fundamental resource isolation.
And now just like applications are isolated
from each other through namespace,
and namespace is how IT would control
how resources are allocated
in a multi-tenant environment, that applies to the storage layer as well, meaning the object storage
layer or a database layer. Even inside MinIO, when you provision new tenants, the tenants could
be different departments inside your company, or an MSP onboarding multiple customers, or even,
within a particular department,
multiple applications that want different SLAs and different isolation and
security levels. So when you create multiple tenants, MinIO uses the same namespace mechanism
to isolate the tenants from each other. You may even run different
versions of MinIO in different tenants, and when you upgrade one tenant there is no disruption to the others;
each tenant is fully isolated.
So the applications and the data services, like MinIO
or a database, are all managed exactly like one fabric.
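AB describes each tenant living in its own namespace, so one tenant can be upgraded without touching another. A toy model of that property, assuming a hypothetical `tenant-<name>` naming scheme and placeholder version labels; it does not call the Kubernetes or MinIO APIs:

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    namespace: str      # hypothetical naming scheme: "tenant-<name>"
    minio_version: str  # placeholder labels, not real MinIO release names


@dataclass
class Cluster:
    tenants: dict = field(default_factory=dict)

    def add_tenant(self, name: str, version: str) -> Tenant:
        # One dedicated namespace per tenant; sharing one would defeat isolation.
        namespace = f"tenant-{name}"
        if name in self.tenants:
            raise ValueError(f"namespace {namespace!r} is already in use")
        tenant = Tenant(name, namespace, version)
        self.tenants[name] = tenant
        return tenant

    def upgrade(self, name: str, new_version: str) -> None:
        # Only the named tenant changes; every other tenant keeps its version.
        self.tenants[name].minio_version = new_version

    def versions(self) -> dict:
        return {name: t.minio_version for name, t in self.tenants.items()}


cluster = Cluster()
cluster.add_tenant("finance", "v1")
cluster.add_tenant("analytics", "v1")
cluster.upgrade("analytics", "v2")
print(cluster.versions())  # {'finance': 'v1', 'analytics': 'v2'}
```

The design choice being illustrated is that the namespace, not a shared process, is the unit of isolation, which is what lets two tenants run different versions side by side.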
Oh gosh, how well does this thing perform
under a vSAN solution with VMware and all that stuff?
You mentioned the problem, right?
Previously, when you asked about vSAN for object storage, it couldn't run it.
And we had the same problem.
We, of course, would like to make it easier for IT to control the physical layer.
We wanted to work on top of VMware. But the problem we had early on
was getting vSAN to hold petabytes of data.
It's not just the scalability part, right?
The other real problem is vSAN
as a software-defined network storage,
if we are running in one container on, say, Node 6,
and it's attached to a drive that is on Node 3,
now every I/O that we perform causes
write amplification: we write across the network, and we also have to erasure code on top of that, right?
This is the one that they beautifully fixed by introducing vSAN Direct,
which is new in 7.0 Update 1. vSAN Direct gives you host-local access, and
it also eliminates the RAID-controller-type bottleneck. If we actually get JBOD- or JBOF-type access, you can now bring in
thousands of drives to build a very large infrastructure. And still, IT, without hiring
Amazon-like DevOps, can manage the private cloud environment all through vCenter.
You mentioned petabytes.
I mean, object storage is known for having sizable storage repositories,
but I'm not sure I've seen many VMware installations
with petabytes of storage in the past.
I can't think of too many vSAN clusters in the petabytes.
That's what we're talking about, right?
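AB's earlier point about running on network-attached vSAN volumes is that protection gets layered: MinIO erasure-codes each object, and the volume underneath may protect each shard again. A minimal sketch of that arithmetic, assuming an 8 data + 4 parity erasure layout and a mirrored volume policy that stores two copies (both illustrative choices, not defaults stated in the episode); it ignores network hops and metadata:

```python
def physical_bytes_written(object_size: int, data_shards: int,
                           parity_shards: int, volume_copies: int = 1) -> float:
    """Bytes that land on drives for one object write (simplified model)."""
    if data_shards < 1 or parity_shards < 0 or volume_copies < 1:
        raise ValueError("invalid layout")
    # Erasure coding: K data + M parity shards, each 1/K of the object.
    erasure_bytes = object_size * (data_shards + parity_shards) / data_shards
    # If the volume under each shard is itself protected (e.g. mirrored),
    # every shard is stored volume_copies times. Host-local JBOD-style
    # access, as described for vSAN Direct, corresponds to volume_copies = 1.
    return erasure_bytes * volume_copies


GIB = 1024 ** 3
direct = physical_bytes_written(GIB, data_shards=8, parity_shards=4)
layered = physical_bytes_written(GIB, data_shards=8, parity_shards=4, volume_copies=2)
print(direct / GIB, layered / GIB)  # 1.5 3.0
```

Under these assumptions, layering a mirrored volume under erasure coding doubles the bytes written, which is the overhead host-local access avoids.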
Yeah, it was not possible before, but I can tell
you the use cases exist. Here is the problem, right? In every customer base that we have,
the IT is kind of frustrated that all the data processing and AI/ML workloads are run by the Hadoop
teams, and the Hadoop folks are now ditching HDFS, moving to MinIO, and going to
Kubernetes. And IT couldn't
manage those services. And the other problem was, even in the organizations that are entirely
managed under IT, Splunk, for example, is growing really fast inside these
organizations. And Splunk grows to petabytes in no time. The bulk of an organization's
data growth is actually machine-generated logs and event data,
and Splunk is getting standardized there instead of complicated services like Hadoop and HDFS.
But we couldn't bring Splunk and MinIO, the Splunk SmartStore and MinIO combination,
onto vSAN because of the same problem. And we actually have customer
requirements from a bank I can't name:
there are three different sites they have to consolidate, totaling something like 70 to 80 petabytes of data.
70 or 80 petabytes of data?
Yeah.
In one vSphere cluster or a couple of vSphere clusters, we're talking supercomputer stuff almost.
Actually, you know, it looks very big, right?
But not actually in the object storage space.
Like if you look at the dense deployments for MinIO,
they actually like in one of our customers,
they actually have 200 drives per chassis.
In just 16 servers, they are talking about 39 petabytes.
40 drives, 48 drives, 96 drives per chassis.
16 terabyte drives. These things are just amazing.
And I don't have a problem with petabytes of storage being accessed by vSphere cluster.
Typically, we just look outside of HCI or BYOD type of solutions to do this.
We're looking at, you know, purpose-built object store solutions or purpose-built
filers that could handle that scale.
It's really disruptive to think that you can get that, you know, in the native solution.
Yeah, a vSAN HCI solution.
That's what we're talking about here, right?
I can easily think of several use cases.
It's just that I would never have tried it.
It is disruptive.
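The chassis figures mentioned above translate into raw and usable capacity with simple multiplication. A sketch, assuming 16 servers with 96 drives of 16 TB each (one of the configurations mentioned, not the specific customer deployment) and an illustrative 8 data + 4 parity erasure layout; decimal units, ignoring filesystem and metadata overhead:

```python
def capacity_pb(servers: int, drives_per_server: int, drive_tb: float,
                data_shards: int, parity_shards: int) -> tuple:
    """Return (raw, usable) capacity in decimal petabytes."""
    raw_tb = servers * drives_per_server * drive_tb
    # With K data and M parity shards, only K/(K+M) of raw space holds data.
    usable_tb = raw_tb * data_shards / (data_shards + parity_shards)
    return raw_tb / 1000, usable_tb / 1000


raw, usable = capacity_pb(servers=16, drives_per_server=96, drive_tb=16,
                          data_shards=8, parity_shards=4)
print(f"raw {raw:.3f} PB, usable {usable:.3f} PB")  # raw 24.576 PB, usable 16.384 PB
```

Changing the parity setting trades usable space for fault tolerance; the ratio K/(K+M) is the whole story for capacity in this simplified model.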
In fact, that is how I thought myself:
data cannot move around anyway,
and elasticity and statelessness
make sense for the applications,
so the applications should be stateless containers.
And particularly if you look at MinIO, the entire MinIO server is a 45-megabyte static binary,
and it's super easy to start. Even an average JavaScript developer can run MinIO,
even if he or she doesn't know how to run Elasticsearch. It's that simple. So why would you actually
bring it onto Kubernetes? Why would you put MinIO in a container? All these questions. For me, it wasn't obvious when we started.
And we actually did not support Kubernetes. Even though it was designed to be cloud native,
I always thought that they would just buy these dense machines, run MinIO on top of it,
keep it simple, and application would be on the
containers, right? What actually happened was the community started maintaining these Helm charts.
They started putting MinIO in containers. And if you look at our downloads, more
than 61% are containers and Kubernetes. It was basically the community and customers pushing us towards it.
When I started asking these guys, why are you guys doing it, right?
I was surprised just like you.
And what they told me was they want to completely containerize their software infrastructure.
Sounds very familiar, like how VMware, everything has to be virtualized, right?
This time they want to virtualize the data layer as well.
Why?
Because they are saying that they roll out their software updates multiple times a month,
sometimes even multiple times a day.
And this is crucial for them when they containerize.
That's why they containerized and brought Kubernetes for orchestration.
You can now deploy on edge or private cloud, public cloud, anywhere.
And if you only virtualized or containerized the application side,
then when you go to Azure, you can't put an EMC appliance or a NetApp appliance there.
You can't even buy one.
This is why they want everything to be containerized.
Right. So you guys get a lot of downloads.
I mean, is it a highly active environment?
Yeah. I remember it was already growing in the first two years, right?
And then around 2017, we were just doing our Series A,
and it started an exponential rise.
And our investors were super excited, and I was telling them,
maybe it's one of the security fixes we did.
Everybody's rushing to update.
Don't count on it.
It will fall down.
But it actually kept accelerating,
growing faster and faster. We are nearly doubling every 18 months, actually.
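As a back-of-the-envelope reading of "nearly doubling every 18 months", assuming steady exponential growth (the episode gives no absolute download counts), with $N_0$ the starting count and $t$ in months:

```latex
N(t) = N_0 \cdot 2^{t/18}
\qquad
\frac{N(12)}{N_0} = 2^{12/18} = 2^{2/3} \approx 1.59
\qquad
\frac{N(36)}{N_0} = 2^{36/18} = 4
```

In other words, roughly 59% growth per year, or about 4x over three years.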
Oh my God.
Yeah.
And so this all seems to be part of VMware's push to, I'll say, conquer the container world as they've conquered the enterprise IT world.
It seems, right?
I mean, they're just trying to make this environment as useful to enterprise IT as they possibly can.
Yes.
Actually, for us, it was something that we wanted to do, but we could not do.
Just like the rest of the world says, don't fight the cloud. For us, we were there. We were born in
the cloud, right? But we didn't want to fight the IT because IT actually did important things like
SLAs and upgrades, updates. They still run the infrastructure, right? We have to incorporate
them, but we couldn't do it because we didn't want to be a hardware
appliance company.
This time around, VMware bridged the Kubernetes world, the cloud-native world,
and enterprise IT into one fabric by allowing us to run natively instead of retrofitting.
That made it possible.
We never wanted to fight IT, and now we don't have to.
So let's talk a little bit about that, not fighting IT and integrating
into existing flows. Because I like not fighting IT. I'm an
IT guy. If I'm not ready for containers
but I'm ready to move to vSphere 7.0,
what's the argument for MinIO in that environment?
So if you're IT, right?
Look at how it happened in the industry, right?
How IT saw these new developments in their lab,
in almost every case in our customer base, right?
It's very much like how Linux itself penetrated,
and then even application services.
In the past,
you would buy WebLogic and, say, a DB2
or SQL Server license,
and you would give it to the application developer:
now go build the application.
But nowadays the application team tells IT
that not only are they running Cassandra or Elastic and Kafka,
they have even orchestrated everything.
"Now I manage
my application infrastructure. I push multiple times; it's all CI/CD." And IT is like, I don't know
how to deal with that, and the applications team is like, let me do it. But for the
applications team, the priority is not SLAs, security, and a bunch of other
things; they don't even know how to spec out the hardware. And this is where the way VMware integrated it matters. If you look at the MinIO-specific case
itself, when you go to the vCenter UI to provision, all you
are saying is: for this tenant, how much capacity, how many nodes,
how much memory and CPU resources
you want to give. Then you say you want them to connect to an LDAP or OpenID identity manager,
and for encryption you connect to a key management service. You are basically just setting
what their bounds are, right, because you are in a better place to do that. You don't want some rogue application
to take over, or even something unintentional, right?
You still control that, and you are doing all of that without ever touching Kubernetes.
But once a tenant is provisioned, the applications team has the native Kubernetes API,
and they can do everything API-driven. They are IT's customers, the application team, and they would use the Kubernetes interface, but you would use the vCenter interface.
And both of them are nicely integrated.
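The vCenter-side workflow AB describes amounts to IT declaring bounds and integrations for a tenant, then checking requests against them. A hypothetical sketch of such a check: the field names, the quota structure, and the `kms.example.internal` endpoint are illustrative assumptions, not the actual vCenter or MinIO schema:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TenantRequest:
    # Hypothetical fields mirroring the inputs described for the vCenter UI.
    name: str
    capacity_tib: int
    nodes: int
    cpu_cores: int
    memory_gib: int
    identity_provider: str              # "ldap" or "openid" in this sketch
    kms_endpoint: Optional[str] = None  # key management service for encryption


@dataclass
class NamespaceQuota:
    # Upper bounds IT sets for the namespace (illustrative only).
    capacity_tib: int
    cpu_cores: int
    memory_gib: int


def check_request(req: TenantRequest, quota: NamespaceQuota) -> List[str]:
    """Return policy violations; an empty list means the request fits."""
    problems = []
    if req.nodes < 1:
        problems.append("at least one node is required")
    if req.capacity_tib > quota.capacity_tib:
        problems.append(f"capacity {req.capacity_tib} TiB exceeds {quota.capacity_tib} TiB")
    if req.cpu_cores > quota.cpu_cores:
        problems.append(f"{req.cpu_cores} CPU cores exceeds {quota.cpu_cores}")
    if req.memory_gib > quota.memory_gib:
        problems.append(f"{req.memory_gib} GiB memory exceeds {quota.memory_gib} GiB")
    if req.identity_provider not in ("ldap", "openid"):
        problems.append(f"unsupported identity provider {req.identity_provider!r}")
    if req.kms_endpoint is None:
        problems.append("no key management service configured")
    return problems


quota = NamespaceQuota(capacity_tib=500, cpu_cores=64, memory_gib=512)
ok = TenantRequest("analytics", 200, 4, 32, 256, "ldap", "https://kms.example.internal")
greedy = TenantRequest("rogue-app", 900, 8, 128, 256, "openid")
print(check_request(ok, quota))           # []
print(len(check_request(greedy, quota)))  # 3
```

The point of the sketch is the split AB describes: IT owns the bounds, and anything inside them is left to the application team's Kubernetes-driven workflow.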
Kind of like shadow IT came into existence doing this sort of stuff.
And now they can do it on real IT infrastructure and be administered and managed to some extent by the real IT organization.
It's really interesting.
So it's almost time.
I just wanted to ask one question: how is it working with VMware as a startup like yourselves?
Yeah.
You know, we work very closely with them on this.
And now I can tell you from my heart, right?
Actually, the team was wonderful.
For a company of their size, we felt like they are just another startup of our size.
They were moving fast at the same pace and also more than anything, right?
They were resourceful like a startup.
There is no bureaucracy, nothing in between. For a company of that size to behave like a startup, getting things done, was just
stunning, right? For us, we don't want to do things from a press release and marketing point
of view, right? We have to do something that is real and that will benefit our world, our customers,
and our community. And the reason for us to get engaged with VMware was that they were doing real work;
they were fixing the problems the right way.
And that is what enabled us and got us excited.
But working with them closely, I think they are just a big startup.
It seems to me that ever since they started playing in this container space,
their development cadence has increased. The vSAN team has always been, you know,
really quick to adopt technologies and stuff like that. I don't know. Keith, you're kind of
involved in that. What do you think of what they're doing these days?
Yeah, you know, I've been doing podcasts around vSAN for probably the past five or six years, and I remember when it was "vSAN, haha."
Wow, vSAN is in my data center.
I'm running vSAN.
So it's come a very long way.
It's at feature parity.
Most features enterprises care about for general purpose workloads are there.
And this is just another example, right?
Today, if you are rolling out software-defined everything, cloud native, right?
You need storage, networking, and compute,
and all of them have to be containerized or virtualized.
That's how you can roll out anywhere.
Otherwise, there is no cloud,
right? And they are now
there. They brought it together.
Yeah, it's
amazing. That's amazing.
All right, so Keith, any last questions for AB
before we leave?
No, I don't want to go down the rabbit hole
of looking at
solutions other than, no, I'll ask the question.
What if I'm not a vSAN fan?
What if I have VCF and I use VCF for the core?
What about solutions outside of vSAN?
So you can actually use MinIO
through TKGI,
Tanzu Kubernetes Grid Integrated.
You can run MinIO
just like any Kubernetes application, right?
That is also possible, where you would mount anything
that is exposed through CSI, like vSAN
or vVols, pretty much anything, right?
But I actually liked the current integration with the vSAN
and how it's tightly integrated
and we are in the supervisor cluster with more privileges
and you have the vCenter console.
But I think eventually, and
I can't speak for VMware, technically the same technology that was released could also support vVols. But it becomes more of a question: do you want to support legacy SAN and
NAS? Why would you build object storage on top of SAN and NAS? Not just object storage; the
distributed databases and data services
take care of replication, erasure code, everything themselves.
So vSAN is the right interface.
That's why they focused on this one rather than enabling vVols.
But I don't know, maybe in the future it might happen,
but it would only benefit the legacy investment.
If you have already made an investment in SAN or NAS,
it makes sense, but for all new deployments, I think vSAN Direct is the way to go.
Okay. AB, anything else you'd like to say
to our listening audience before we close?
Just one small point. If you look at the data persistence
announcement, it is not just about
hardware appliance vendors writing
a CSI driver and claiming to be Kubernetes compatible now.
This is actually the first time that you see the shift has happened: storage
software is now treated like a database, and it has to be available as a container.
This is where all the storage giants who are appliance vendors actually have no
role to play. They have to go back to the drawing board to build something that is not only software-defined;
it has to be container-native object storage built from scratch, which is what we
have been doing all along. It gave us a huge advantage.
Yeah, I would say so. Well, this has
been great. Thank you very much, AB, for being on our show today. And thanks again to MinIO for sponsoring this podcast.
Thank you for having me. I always enjoy talking to you both. Thank you, Ray. Thank you, Keith.
All right. That's it for now. Bye, Keith.
Bye, Ray.
And bye, AB.
Bye, everyone.
Until next time. Thanks.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.