Grey Beards on Systems - 128: GreyBeards talk containers, K8s, and object storage with AB Periasamy, Co-Founder & CEO MinIO
Episode Date: January 27, 2022. Sponsored by: MinIO. Once again Keith and I are talking K8s storage, only this time it was object storage. Anand Babu (AB) Periasamy, Co-founder and CEO of MinIO, has been on our show a couple of times now and it's always an insightful discussion. He's got an uncommon perspective on IT today and what needs to …
Transcript
Hey everybody, Ray Lucchesi here. Welcome to another episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors to discuss upcoming products,
technologies, and trends affecting the data center today.
This Greybeards on Storage episode is brought to you today by MinIO.
And now it is my pleasure to reintroduce AB Periasamy, co-founder and CEO of MinIO,
who's been on multiple GreyBeards podcasts.
So, A.B., why don't you tell us a little bit about yourself and what's new at MinIO?
I'm one of the co-founders and CEO.
MinIO is an object store.
And what's new? There's actually quite a bit of activity going on.
I think the latest news is switching to the next gear.
It's all about multi-cloud now.
The adoption of MinIO has grown quite a bit, even inside the cloud, and we are
targeting basically all the clouds and going after it.
So, object storage has been kind of a secondary storage for, I would say, decades now.
I first started talking about it back in the 2000 through 2004 time frame. There's some discussion that
object storage might become more like primary storage. Where does MinIO fit into that framework?
So that's something that we have been quite clear about from the beginning in terms of where we wanted to
be, but the market has come full circle around it. Now the market has aligned itself to object storage
as the primary storage.
But if you look, it was not like something we predicted.
It was already there when we started.
If you looked into the public cloud,
object storage was the foundation.
Whether you store some static websites,
container images, application artifacts,
all the way to AI/ML,
right? Look at Snapchat to Snowflake. They're built on object store, from EMR to even
outside of AWS: BigQuery, Azure ML, Power BI, name it. Object storage was always the foundation,
always the primary storage. In fact, AWS, the whole cloud thing started with object storage
as a service first, that is S3. Then came everything else. But outside of the public cloud,
the industry was dominated by SAN and NAS vendors. They were dismissive, right? They thought that
the future would be more like enterprise on the cloud, that is, file, block, and VMs.
And a file, block, and VM cloud is called a managed service,
like the traditional MSPs.
That market has changed.
I think now you can clearly see VMware is embracing Kubernetes wholeheartedly.
The new model is containers and objects; Kubernetes is the infrastructure API standard.
YAML is the definition of the stack, right?
And today, object storage has become the primary storage.
And why did this change happen? Even outside of the public cloud,
there are three primary use cases for SAN and NAS.
First, databases, right?
Look at Kafka, Elastic, ClickHouse, name it, pretty much all the popular
open source databases, and the old-school ones, all the way from SQL Server to Vertica to Teradata,
all of them, they have gone object storage. Then look at VM images and snapshots,
not the container images, the VM images, and the database snapshots; all of them
have come to object storage. In fact, in the new world there are no VMs; containers, container images, and
artifacts are on object storage, always. And then the last thing is all the archival data,
not so much on the primary SAN and NAS side, and the AI/ML data;
that also came to object storage.
So today, I think object storage is accepted
as primary storage, not just inside the public cloud,
but at the private cloud and edge as well.
It's still a struggle seeing that in the enterprise
to some extent, given enterprise applications'
proclivity for block and file.
But I see object being, you know, the major storage play in any of the cloud vendors these days.
It's like EBS and those sorts of things are, you know, very rarely used anymore.
It's all the data sitting on object storage, and EBS is maybe temporarily used and things of that nature. Yeah. So, Ray, you're hitting a really interesting point,
which is most of like the resistance that I've seen from object storage is
from traditional workflows.
Let's say the data scientists are still kind of slow to pick up object
because their tools don't natively speak object.
AB,
I'd love to hear what you're seeing
from an end-user perspective,
legacy applications, like HPC, et cetera.
You mean something like Splunk doesn't use object?
No, meaning like a lot of the analytic platforms
that people use at the end client.
Hadoop?
Yeah.
Not Hadoop.
Hadoop is definitely, you can
get block storage. I'm talking about
when people connect
to file shares and run
traditional
analytics applications. That's where
I'm seeing R
workloads, etc.
Actually, Ray answered that question in
just two words. You mean Splunk doesn't
support object store? Actually, Splunk SmartStore has an S3 API foundation. Splunk doesn't know how to talk to
their SAN or NAS. In fact, if you talk to a Splunk engineer, they will tell you even for the Splunk hot
tier, don't put it on SAN and NAS. The hot tier is more of a primary cache with some persistence.
They make replicated copies of it on local drives. But for all
persistence, SmartStore is the way to go. And SmartStore is object store, the S3 API. Every one of the
analytics stacks today, if you look from the old-school established players all the way to the
most modern ones, all of them have gone object store. In fact, the most successful one there that replaced SAN
and NAS was HDFS, and HDFS is now coming to object store. For a while they were using the S3A adapter to
make object store look compatible to the Hadoop applications, right? So it gave a file interface,
an HCFS file interface, to object store, so the Hadoop applications, Hive and everything,
you didn't have to rewrite. But nowadays, if you look at all the modern AI/ML tooling, like Kubeflow,
which is the data pipeline standard for Kubernetes, it uses the MinIO SDK to talk to S3-compatible
object stores. So object store is now the standard for all analytics.
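To make that concrete, here is a minimal sketch of pulling a data set from an S3-compatible object store with the MinIO Python SDK. The endpoint, credentials, bucket, and object names are placeholders, not anything discussed in the episode.

```python
# A minimal sketch using the MinIO Python SDK (pip install minio).
# The endpoint, credentials, bucket, and object names are placeholders.
from minio import Minio

client = Minio(
    "minio.example.com:9000",   # any S3-compatible endpoint
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,                # the S3 API is just HTTPS
)

# Download a training data set to the local file system for processing.
client.fget_object("datasets", "training/example.csv", "/tmp/example.csv")

# Or list what is available under a prefix in the bucket.
for obj in client.list_objects("datasets", prefix="training/", recursive=True):
    print(obj.object_name, obj.size)
```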
Maybe it's the question of definitions. What do you mean by analytics, Keith,
versus, you know,
I look at analytics
as big data kind of solution.
So if I'm using R,
if I'm using some type of...
R, which is, okay,
scientific language.
Scientific language.
It's like HPC.
Like HPC has been extremely resistant
to object storage
because those users,
those data scientists, and they're still data scientists, are not necessarily using Hadoop. They're not using modern applications.
They're using the legacy tools they've always used. But I mean, R and Python and all these
other solutions that do data science, I mean, they have object storage APIs, don't they?
Yeah, they have object storage APIs,
but again, it's a question of
where does the storage live today?
If the storage doesn't live in object storage,
you have to move it from your file-system-based storage.
It's a workflow question,
like what comes first, the chicken or the egg?
I love object storage, absolutely,
because it's cheap, deep, performant now,
but my workloads exist.
My data sets exist on file.
That migration has been slow, in my experience.
So there are two parts to this, right?
One is a data scientist downloading some CSV, some JSON-type log data, or some other kind of data set.
They download it to their laptop or their
workstation and process it on the local drives. Sure, you can run MinIO on those laptops too,
right? The application developers do it all the time. But they would just store these CSV files
as files, and if you're running an R script, right, the local operating-system-provided file
system is just fine. But where did these data sets come from?
They come from a large data repository that is an object store.
It used to be HDFS, and that became object store,
because the data scientists are spread across locations.
They can actually download the data set over HTTP securely,
because the S3 API is just HTTP; it's fully an API.
Try accessing HDFS or SAN and NAS across the cloud,
forget about even the WAN, right?
They found that object storage is a lot more convenient
and secure for that.
But for local processing, sure, a local file system is just fine.
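As a sketch of that pattern, a presigned URL turns an object into a time-limited HTTPS download link that any tool can fetch; all names below are placeholders.

```python
# Sketch: hand a data scientist a time-limited HTTPS link to a data set.
# The endpoint, credentials, bucket, and object names are placeholders.
from datetime import timedelta
from minio import Minio

client = Minio(
    "minio.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
)

# Anyone with this URL can fetch the object over plain HTTPS
# (curl, a browser, pandas.read_csv, ...) until it expires.
url = client.presigned_get_object(
    "datasets", "logs/2022-01.json.gz", expires=timedelta(hours=1)
)
print(url)
```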
And outside of this use case,
the HPC market is predominantly MPI-based workloads. They have their own
highly optimized MPI-IO, and the Lustre-type systems had native integration with the MPI
systems. Sure, for that HPC community to go from, say, 92% to 94% efficiency, they will make it twice as complex. They would, because it's worth it for them.
And that's a market where, until we see a pull,
it doesn't make much sense for us.
But the commercial HPC market
is quite different from the national labs.
The commercial HPC market has moved on.
You're talking like bioinformatics
and things of that nature?
They're all object storage nowadays.
It is moving away.
So I see where you're coming from, Keith,
but yeah, you're right.
I mean, if I'm doing an application
on my laptop or something like that
or in my test environment,
I probably want to look at files,
but that data is coming off
of some object storage
someplace in the cloud or
someplace where it's all being gathered and stuff like that. Don't you think that's the case?
Well, user habits are very hard to change, right? I think if you're coming out of an environment
where you're at a university, et cetera, and you're learning this stuff new, and your first experience with interfacing
with your data sets is from object storage,
you're just gonna keep that workflow.
The opposite is true when you've spent years
connecting to the F drive and running it off.
User habits are extremely, extremely difficult to change. Whether
or not the underlying technology is better isn't really relevant; it's whether you can get the users to
adopt it. And my question is more about that, about adoption. And it sounds like, AB, what you're
telling me is that if you want a modern experience when it comes to data analytics, object storage is where it's at.
Yeah, yeah, yeah. Even the data science community, once we gave them a nice file explorer where they can, from their browser, find and download the data set, search, and all that...
if you give them something convenient, not necessarily a better technology, you're spot
on there, right? If you just give them something even easier, they actually change their habits. But where file will continue to play a role
is the enterprise community, particularly the ones who cannot hire software engineers to modernize
infrastructure as code. That's actually still a good part of the industry. It's the traditional
VMware-based virtualization shops. Yeah, yeah, exactly.
I mean, you know, I came from the block world long before files even existed.
So, yeah, it's something that's embedded in what I do. But to a large extent, you know, if I'm doing AI, ML types of things, it's all based on objects someplace.
I mean, it's all sucking in objects, maybe coming into a file or a CSV file or something like that and being processed there. But in the
real world, guys that are doing AI ML, it's all objects. It seems to be, I don't know. You look
at Kubeflow or MLOps or something like that. It seems like it's getting all its data
from object storage. The files will stay, but, well, mainframes are still a profitable business, right?
Files and blocks.
You know, MinIO doesn't work on mainframe yet.
It probably does with a Z something,
Kubernetes thing or something.
I don't know, Linux.
Actually, surprisingly,
there is a native port for power architecture
and there is actually a startup now,
I think it's Model 9 or something.
They use MinIO to modernize mainframe applications
to become cloud compatible.
They use MinIO there.
It's a market where I don't have much expertise.
I'd rather leave it to partners.
Well, you mentioned the cloud earlier on, AB.
I mean, it seems like the cloud has always been
to a large extent object-oriented. And you made a statement there I have to go back to: AWS actually started
with S3 alone. Is that what you're saying? Yeah, that is actually true.
I never saw that. I never realized that.
Yeah, actually, that's pretty interesting. I talk about that a lot. People think about EC2, one of the most common, if not still the most commonly used service outside of S3.
I remember when AWS announced their or Amazon announced their cloud services, AWS, and the service was S3 and I checked it out.
I'm like, why would I ever... I said, why would I ever use this?
So take my advice with a grain of salt. I was obviously wrong about that along with probably
thousands of others. But yeah, the S3 is the oldest service out there.
Yeah. So what about multi-cloud? The problem today with enterprises' adoption of the cloud is that occasionally AWS or Azure or Google go down.
And I need to have services that now span clouds like I have services that span data centers in the past in order to keep up.
Where does MinIO fit in that framework of multiple cloud operations?
So when AWS goes down,
when Google, when all these clouds go down,
I actually do see tweets in the community
saying, I am safe, I'm running MinIO,
and I did not get affected.
They do talk about all this, right?
But the reality is, if you look into the clouds,
uptime is definitely higher
than in most of the data centers
they run themselves, right?
I don't think that uptime is a big deal.
All infrastructures eventually go down here and there,
but if I'm an Amazon customer,
I would still feel confident that their engineers
are competent enough to bring it back faster
than my engineers.
Of course, my engineers, I have confidence,
but in general, right?
But I think the real reason why
multi-cloud is happening is not because they had a clear strategy. Today most CIOs have a
strategy and a mandate that they have to be multi-cloud ready, but even those CIOs,
at the end of the day, will tell you, we did a large contract with Google and then all of us
standardized on Google. And you can see that most organizations have that exclusivity
because they have to get that discount and they made the commitment.
But the real reason why multi-cloud is happening is because
the developers started building applications as microservices
and they containerized everything.
When they containerized everything, they naturally brought in
Kubernetes to orchestrate
these containers. They detached their application stack from the public cloud, and they looked at
the public cloud as: instead of asking my IT to provision Dell or Supermicro servers, here, with a YAML file,
in a moment, I can provision these servers. They left IT and went to the public cloud.
It was more of infrastructure as a service. And then they went there.
Object storage was seen as no different from MongoDB or Elastic or Kafka.
It's the blob store they adopted.
They brought in their software stack, containerized,
and they pushed it to the cloud.
And overnight, when their management told them, I have to go to Google Cloud,
they redeployed the software stack on Google Cloud.
It happened.
So multi-cloud happened as a...
You think multi-cloud is there
because Kubernetes and containerization occurred?
I'm not sure which is first here in this environment.
But even if you're running a cluster,
let's say in AWS and a cluster in Google Cloud,
the data is a different question.
I mean, so the data has got to be sitting someplace in this environment. Well, Ray, I want to interject.
Maybe... I would love your feedback on this.
I think I have a $2 billion proof point here.
Just read this morning, J.P. Morgan, Jamie Dimon was on the investor call,
I'm assuming yesterday, the day before this recording, talking about how JP Morgan has
invested $2 billion in cloud, much of that spend going to the data center to enable cloud. So I think to AB's point, the cloud experience is what most businesses are driving towards.
Yeah, there's an IDC stat: 88% of enterprises want to be able to repatriate cloud workloads, but have that cloud capability.
And that starts with the underlying storage.
This is the GreyBeards on Storage.
We believe in storage.
And you need this storage across platforms to be able to do that.
So how does MinIO facilitate this storage residing on multiple platforms
and stuff like that?
Yeah.
So for this one, combining the previous question too,
whether Kubernetes and containers drove the strategy
or the other way around, right?
Our bet early on, when we came in,
was that Google Cloud and Azure were already
starting their journey,
and Amazon S3 was like the standard, right?
We saw that each of them was incompatible with the others.
And then outside of the public cloud,
HDFS, SAN, NAS, anywhere you look,
it was an array of standards,
every one of them incompatible with each other.
We knew that this was not going to be the way;
in the long run, everything would look like AWS
or it would be AWS itself.
That was an inevitability.
So we knew that, given enough time, the problem would be solved, right?
But we didn't need to fix the compute side.
On the compute side, when we started, from Cloud Foundry
and Mesos to Docker Swarm and Compose, there were so many standards.
We saw Kubernetes was better positioned.
It was written in the infrastructure language of choice,
Go, and they understood the community sentiments better.
They drove it well, right?
The declarative model was a better idea.
We saw that the compute side would be solved,
but on the data side, for this industry, switching from POSIX to an object API like S3 is itself a monumental task.
It finally happened now, right?
At least it's happening, and it's happening in all the emerging markets.
But for them to go from POSIX to multiple API standards, that's not going to happen, right? That's where we saw that, instead of releasing
yet another open source standard,
it's okay for us to stick to the S3 API.
Amazon won't be unhappy about it.
So we chose to promote the S3 API as the standard across all clouds.
And MinIO's position is that MinIO can run inside AWS,
on Outposts, it can run on Google Cloud,
even Rancher, Anthos, or VMware, everywhere.
S3 cannot.
And that was our bet.
And we knew that in the long run, multi-cloud would be an inevitability, and we would be able to help the community at a giant scale.
If we don't do it, we would fail anyway.
We focused on the application developers
to help build a powerful ecosystem, and that paid off.
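A rough sketch of what that bet means for application code: the same S3 calls run against AWS S3 or a MinIO endpoint on any other cloud or on-prem, with only the endpoint and credentials changing. All values below are placeholders.

```python
# Sketch: identical application code against AWS S3 or MinIO anywhere;
# only the endpoint and credentials change. All values are placeholders.
import io
import os
from minio import Minio

# e.g. "s3.amazonaws.com", a MinIO service on EKS/GKE/AKS/Outposts,
# or an on-prem cluster behind VMware, Rancher, or Anthos.
endpoint = os.environ.get("S3_ENDPOINT", "minio.example.internal:9000")

client = Minio(
    endpoint,
    access_key=os.environ["S3_ACCESS_KEY"],
    secret_key=os.environ["S3_SECRET_KEY"],
    secure=True,
)

# The same S3 call regardless of where the object store actually runs.
payload = b"hello, multi-cloud"
client.put_object("artifacts", "hello.txt", io.BytesIO(payload), length=len(payload))
```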
So you solved the API problem by taking the S3 bet,
and it was a successful bet for MinIO,
and you guys benefited from it, and more power to you.
But the question still remains, AB, where's the data?
The fact that I can use S3 to access it is the right thing across every cloud and every on-prem environment in the world.
Absolutely.
Yeah.
You know, now I have plenty of data points to actually understand what's happening.
When a customer, say, comes from AWS S3 to MinIO, or say they have MinIO on-prem and they went to the cloud and
deployed MinIO in the cloud, they actually don't move the data around, because the new data that's
getting produced is more than all of the historic data combined. So they actually don't move the data,
which is expensive and time consuming. They build the new infrastructure and the new data goes
there, and they're all kept in silos.
Some organizations choose to centralize, some organizations choose
a decentralized model, but overall they never move data. So you're saying that
the data ends up being distributed or partitioned across these multiple cloud
slash on-prem environments based on what applications started in that
particular environment and what data needs they had at that point. Is that how it plays out?
Correct. Because you can move the application code, but not the data, and you can't move the
application code if you're stuck with the API. And the way it supports multi-cloud is that
you could have, let's say an application running on-prem and reference a URL, which happens to be in AWS or Google or whatever, and still access the data, right?
In theory, yes, right?
But then what really happens is that even if they pick one cloud and deploy MinIO, they actually deploy in multiple regions across the world.
And applications, wherever they are, they are also global.
They tend to pick the one that is in
close proximity, even though the S3 API of MinIO is HTTPS and you can access it from anywhere; the
bandwidth costs are not the same, right? It's more than the storage infrastructure. So maybe it sounds
like what you're talking about is that the value isn't necessarily in building multi-cloud apps,
but in having a multi-cloud operating model in which you can adapt or move your workflows.
Whether you're developing a point-of-sale application in one cloud or a data analytics
platform in another, the way that you address your data is not changing across public clouds.
That is very accurate.
Yeah, that's very accurate.
In fact, all they care about is their software.
In simple, plain terms, in developer terms, right?
They just want their applications to be containerized.
It's what VMware envisioned as the software-defined data center,
but it's more than the infrastructure layer.
All they care about is their application stack.
If, with a YAML file, I can take my entire software stack
and roll it out to any cloud on demand, I'm good to go.
And that's actually how they build it: on Minikube,
then it goes to the CI/CD environment, and then to the production environment.
That pattern is always followed.
None of these guys are actually
building applications connected to EKS or AKS with all their data stored on S3. They don't build
applications inside the cloud. They build them elsewhere. Even applications that launch day one on S3,
born-in-the-cloud applications, are not built in the cloud. And we find that all they care about is for their software stack to be independent of any cloud.
What made that possible was containers and Kubernetes.
Kubernetes, containers, all that stuff.
You mentioned Minikube.
I was going to try to run Minikube on my Macs here, my Mac cluster, but that's another story.
So Kubernetes is the key to multi-cloud as you see it, right?
So, A.B., what you're saying is pretty much in line with what I found at KubeCon last year.
I was sitting at the lunch table.
It's one of my favorite spots to sit during conferences, listening to this modernization of the platform team.
Before, the platform team might've provided VMware vSphere
and they provided VMs.
Now the platform team is like this weird mix
of these traditional infrastructure operators
with developers sitting inside of that team
that is consuming services like MinIO, or deploying services like MinIO, to enable...
you know, I'd call it more of an enablement team that's enabling the developers who are solving
the business challenge. They're kind of a shim between the developer solving the business
challenge and the cloud provider. Is that what you're seeing? Absolutely. You can see right here: the change to go to the cloud cannot start with IT. The
problem is that the cloud is incompatible with enterprise IT, right? File, block, VM:
if you take that stack and retrofit it in the cloud, it doesn't run, or it runs very poorly.
So you have to involve developers. It's not just a matter of automation, right? Like
Chef and Puppet tried that. It's not that. This time
they have to rewrite the application to go
cloud native. Often they're finding that
it's cheaper to rewrite than
to retrofit. This time, all the
organizations that have become cloud ready
involved developers in
the mix, where IT became
ops-centric, DevOps-centric. They worked hand
in hand. Those organizations succeeded.
The rest, who resisted the cloud,
whoever claimed that those developer tools are complicated,
what we found was there was a wall between IT and the development team.
The development team went to the cloud nevertheless
and never looked back.
So, I mean, taking an enterprise application and making a cloud version of it, there's this lift-and-shift
discussion, or refactoring and reimplementation kind of thing, or a redesign-altogether, I guess,
solution. So, in your view, does lift and shift work or not? It doesn't.
Actually, the biggest proof is VMware itself. The first version of their cloud was VMware as a service, right?
They took VMware:
I give you the same VMware as a hosted offering, which looked like just an outsourced data center.
That's not what cloud is.
Customers basically say their version of cloud has to look like AWS,
right? The AWS experience. That's what I think Google and Microsoft understood, and they
did not give the same old software as a service. It's not just about automation,
right? It's fundamentally incompatible, meaning throw away all the legacy. The biggest advantage
of cloud is you break the legacy systems,
throw them away.
You can build modern infrastructure
like how we built it for ourselves;
that was Amazon's message that resonated.
If you retrofit, you completely take that value away.
And if you bring back legacy, then it's no better.
We have to have a lengthy discussion about that offline.
So where do opinionated solutions fit into this multi-cloud environment?
It was a word I almost had to look up when I saw it.
I think "opinionated" is too broad, right?
If opinionated is in the form of a stack,
like we talked about the LAMP stack in the dot-com times,
people still try to do this.
Recently, last week, I came across MERN,
or some stack like that.
I was like, what is that?
I didn't even know what it was.
Then it was like MongoDB, something.
But what I find is that opinionated stacks
in the form of PaaS never
worked. Even Amazon and the other cloud players gradually moved to providing
building blocks instead of giving you an opinionated stack. Google, actually, the very first version was
a Python-based, Google App Engine-type model, right? It was closer to PaaS. It did not work.
What we are finding consistently, all the time, is that the
developers don't like opinionated stacks. They want building blocks so they can compose
their own application infrastructure stack. This topic alone
requires a whole-day discussion, but you can see that the biggest success that ever happened
in the PaaS world is Heroku.
And it's not a big one, right?
It was a small exit.
It's still there, but PaaS never worked
because developers don't like opinionated stacks.
But having said that, we like being opinionated
in the sense that, early on when we started MinIO,
the community was like, hey, why not use
the Swift API, because that's open source; why are you promoting the S3 API? The S3 API is not even a standard,
and it's proprietary. My point was, pick one: do you want the Swift API or do you want the S3 API? I'm not going
to do both. And the answer was very clear. Sometimes they will come and ask, actually they still ask,
why don't you add an NFS API,
give file, block, and object? And I still tell them the same thing: if I added a legacy
protocol, all the advantage of the S3 API is gone. The advantage of the S3 API, the reason it is
incompatible, is because POSIX is legacy. If I do POSIX++, it's actually POSIX minus minus. I'll end up giving you a mediocre object storage and a terrible file system. Not because I don't know how to build a file system.
So developers actually don't care about how you implement the specific details.
They just kind of want to consume storage.
How are you seeing that being kind of validated in the market with developers?
So developers, when they approach storage,
when they approach Kubernetes,
they want to provision storage.
Like what are some of the pain points
they're realizing?
Like, oh, the regular Kubernetes providers
are not enough.
Oh yeah, there is actually
a lot of confusion on this topic, right?
I'll tell you from two different angles.
One: if you talk to the actual consumers
of the storage itself,
they actually don't even call it storage.
Most of them are developers
who are dealing with data.
They look at it as a data store.
MinIO, for them, is no different
from MongoDB or Elasticsearch.
It's just that if you are talking about metadata-type data
where you want a powerful query interface, you would put it in a database. If you have blob data and you want
lots of persistence, you would put it in an object store. They look at MinIO as an object store,
and that is how the consumers of the cloud, who are the application builders, data engineers,
data architects, AI/ML data scientists, look at it: as just a data store, period.
But if you talk to the infrastructure people there, particularly if you talk to the storage vendors,
they brought in SAN and NAS and then wrote a CSI adapter and they all want to look cool.
These are the same old appliances. They suddenly become Kubernetes-ready and they claim that they are Kubernetes-native storage. So this is all the persistent volume stuff that goes through CSI and all that stuff.
CSI.
But that's not what Kubernetes storage
is actually about, right?
On the application side, there is a big disconnect there
and a lot of confusion too.
Look at every one of these modern distributed data stores
and who actually wants the traditional SAN and NAS.
In the cloud, SAN and NAS are considered legacy; they're only meant to bring
legacy applications that cannot be rewritten, as a stopgap, to buy yourself some time. That's when you
will go for EFS or EBS. Otherwise, imagine Snowflake written on EBS or EFS; they would not
have even started, right? The CSI providers are meant to give you legacy compatibility. But if you are talking about a modern application,
even the databases themselves, right,
they are StatefulSets, so where would they store their data?
Look at every one of the modern distributed databases.
They have gone scale-out,
and all they want is a local persistent volume.
Previously it started with hostPath
and then came local volumes,
but even local volumes do not have a CSI driver, which means they cannot be dynamically
provisioned.
And I saw that there was a problem in Kubernetes: there are no CSI drivers to manage local
drives, local JBODs, which is what every modern distributed data store and data processing
framework is built on, not SAN and NAS.
VMware recognized this and brought in vSAN Direct for their
Tanzu environment. But outside of it, this is
actually an emerging area of discussion. OpenEBS came with
local PV and I think Longhorn from Rancher also has something
like this. I needed this for MinIO, whether or not I solved
this problem for the rest of the industry.
So we wrote something called DirectPV. It's a direct persistent volume. All you want is a
bunch of local drives automatically provisioned and managed for Kubernetes through a CSI driver.
So it's just a volume manager, not a storage system; storage systems are distributed data stores.
So DirectPV is accessed through a CSI plugin or what?
Yeah, it is actually a CSI driver.
And you give DirectPV all the local drives.
And when you make a claim, say, if I'm running, it's not just MinIO, right?
Maybe I'm running Elasticsearch.
And Elasticsearch makes replicated copies of its data sets on the local drives.
For long-term persistence, they would put it on object storage, but let's leave that aside. Just to
run Elasticsearch, if you brought in SAN or NAS and put a scale-out system on a
scale-up architecture, it wouldn't scale and it would be inefficient. All Elasticsearch
wants is a local PV. But if you use just the Kubernetes-provided local PV, there
is no CSI driver, so you have to manually
pre-create these volumes and then provision them. That's kind of inefficient; it breaks the automation. If
you use DirectPV, when you provision your Elasticsearch or Kafka or anything distributed,
when you want these volumes, say Elasticsearch says, I want
10 TB on eight nodes,
each of these eight nodes with 10 TB local,
you make a volume claim and DirectPV will run
your Elasticsearch containers with the...
Across all the nodes, across all the storage.
Yeah, yeah.
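As a rough sketch of that kind of claim, here is what a dynamic local-volume request that DirectPV could satisfy might look like via the Kubernetes Python client. The storage class name, PVC name, size, and namespace are assumptions, and a StatefulSet would normally express this as a volumeClaimTemplate rather than creating the claim by hand.

```python
# Sketch: a dynamically provisioned local-volume claim of the kind DirectPV
# can satisfy. The storage class name "directpv-min-io", the PVC name, the
# size, and the namespace are all assumptions, not taken from the episode.
# A StatefulSet would normally declare this as a volumeClaimTemplate.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="es-data-node-0"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="directpv-min-io",  # assumed DirectPV class name
        resources=client.V1ResourceRequirements(requests={"storage": "10Ti"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```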
So I'm trying to understand here.
So let's say I have MinIO using my local storage,
or defined to use DirectPV for local storage.
But if I want to access object storage sitting out on the web and I'm a container, I still just use a URL.
I don't have to do a persistent volume claim or anything like that, right?
Yeah, you don't need it.
You just use the object store, the S3 API, just like Redis or MongoDB or anything else when you access data services. Actually, this is an interesting segue into: what is disaggregated storage?
The industry talks about disaggregated storage.
If you talk to the application developers, they will tell you
disaggregation is between stateless microservices and stateful data services,
that is, data stores.
That's what they mean by disaggregation.
That's what the cloud means by disaggregation. Talk to the storage vendors, and they talk about disaggregation between the drives
and the storage systems, the data stores. Yeah, and the compute, right? And all that stuff. Yes,
yes, yes. So if I'm using like MongoDB or something like that in a containerized application,
I'm using a MongoDB API. I'm not creating a persistent volume on a Mongo database
or anything like that, right?
There is no...
Underneath the MongoDB, MongoDB
would make a persistent volume claim
and then that's where...
That's where it stores its data and stuff like that.
The application won't see it.
Yeah. Again, Kubernetes
seems to be the key to all this stuff.
It is now the API of infrastructure.
It's the API standard for infrastructure.
Yeah, declarative and all that stuff.
It's bizarre.
It has to happen, right?
Otherwise, it's very hard to build.
Nowadays, it's not the installation, right?
How do you operate at scale?
Operations means every day you are rolling out new updates.
Operations has become the most important problem.
And without standardization, it's going to be very hard.
Yeah, yeah, exactly, exactly, exactly.
So recently, there's been some new funding for MinIO.
A funding round was closed.
Is that true?
Yeah, the existing investors preempted a round
with a term sheet. Intel Capital, Pat Gelsinger himself, presented a term sheet, and it was
very humbling to receive that. Then SoftBank participated along with the existing investors.
It's a 103 million dollar round at a billion-dollar valuation.
And it's a Series B.
And we were still a small team at the time when we got the funding.
We were like a 40-member team.
How does that compare to the overall market?
This is a crowded space.
When Ray hit me up to do this sponsored episode,
I'm like, oh, another Kubernetes storage provider.
Like, how does this set you apart from your competitors
when it comes to finances?
I never saw the amount of money you raise
or the number of people you hire
as a real measure of success, right?
It's still the case.
You can see previously we raised only a 23 million Series A and it didn't slow us down.
We were accelerating like crazy and still, like today,
like around 1.1 million Docker pulls a day.
And it's not just, you can...
1.1 million? Did you say 1.1 million Docker pulls a day?
Yeah, and that's just from Docker Hub alone.
It's not including the private repositories and all other repositories, right?
I wasn't supposed to say this, but my private infrastructure that I give access to my team,
I found the MinIO VM.
I'm like, whoa, I don't know if Ray infiltrated my team or not.
Yeah, it's hard
to stop at this point. It's just
there. But we knew that
there is only one way that this market
will consolidate.
I actually like to go into
a crowded market because the market
is established. The hardest part
is actually to go do concept selling
and create a new market. It's
easier to go into an established
market where there are so many players. You create a superior product. Superior, for me, is
not more features or more beefy and shiny; it is fine craftsmanship. Listen to the users, give them just
what they want, and with that fine touch and finish you get to connect with the users, right? If you put that product-first attitude,
it's hard to go wrong. And that was the reason we were quite confident: the market is
so big that 10 years from now, no one will blame me for picking the wrong market, because data is
going to be everybody's problem. It is everybody's problem. It will be everybody's problem for the
foreseeable future. It kind of paid off, right?
It's really simple ideas, and focus, that allowed us to get here.
Yeah, yeah.
You're in the right place at the right time.
I'll say that much.
And so what comes next in your world of MinIO?
I mean, you've pretty much conquered S3 API compatibility.
You're sitting here with a pot full of money.
And who are you going to go after next?
So from our point of view, we never saw anybody as the competition.
We are eternally unsatisfied with our own creation.
It's like every artist: if you
ask them about their past work,
they're actually not happy with it in spite of it being a big hit.
We just know we can do better.
And the good part of software is
there is always version one, version two, and version three.
We can keep on improving.
So that's one part of my thinking, right? But on the other hand,
should I now introduce the next product?
Our skill is that we are creators, product creators.
We can go do MinIO of this, MinIO of that, and we can do all that, right?
But then, on the other hand, while MinIO got the land grab,
the commercial journey is just starting.
And if I launch a new product, it will create a branding problem.
Product creation is easy.
Creating a business around it and promoting the brand is very, very hard. I am better off accelerating the commercial journey.
Right now, the customers are actually coming to us because we are deeply in production across many of these enterprises.
And when they come in, I think it makes more sense for me to help them run their infrastructure
at scale with more ease and better security and better operational visibility.
Those are more important to me.
And that's actually where, if you look at Amazon S3 versus us, one thing
Amazon themselves admit is that S3 has become very complicated. In MinIO's case, while the industry
talks about how easy MinIO is, I actually think that it is not, right? There's still a
long way for us to go, right? Yeah, make it easier. I think I will never go wrong if I do that, as
compared to launching new products.
But I think one area I would continue to invest in,
the next big shift, if at all,
from MinIO to the next layer, is what you do with the data.
Now that you've stored all the data in MinIO,
we became the data infrastructure for your organization.
What you do with the data is more important.
Search-type functions and unlocking the value of the data
through AI and machine learning functions,
I think that's an area I would definitely invest in.
Interesting. Well, this has been great.
Keith, any last questions for AB before we close?
One use case that we haven't talked about has been
the standard OS images, application artifacts,
snapshots, backups, etc.
Like the mundane tasks that we've done in traditional storage arrays. Where's MinIO in providing
that type of capability?
So, that started out on the artifact side as a common use case.
Like, from JFrog to, I think, Harbor, the container image repositories, all of them,
just storing the container images,
that by itself, clearly object storage
is the backend for that.
And MinIO grew quite popular there.
And what we started seeing was,
it's not just about container repository.
They are constantly building new container images.
Every day they make a new patch.
How is this new commit tested?
They build a new container, and the whole stack,
whatever code has changed,
all results in new containers,
and they get tested in a CI/CD automated framework, right?
And that results in a flood of container images.
It used to be VM images.
That's why you needed all this copy data management,
secondary data management. That has now become more of an artifact store. And that market also
came to object store. Underneath pretty much all the CI/CD frameworks, you will find that if it's
not MinIO, it still has to be an object store. If it's not MinIO, it will probably be some kind
of public cloud. Even the models, that's another one growing out of that:
the application artifacts are container images,
but now we are seeing a new class of artifacts
that are just machine learning models.
So the models themselves are becoming not quite containers,
but container-like solutions or artifacts.
They are kind of the data container.
And the parallels with that, so versioning, et cetera,
come along with it.
Exactly.
Yeah, the whole MLOps thing is all about that.
All right, AB, anything else you'd like to say
to our listening audience before we close?
I think the questions were great.
I enjoyed the discussion.
We touched upon everything.
Okay, good.
Well, AB, thanks for being on our show today,
and thank you to MinIO for sponsoring this podcast.
That's it for now.
Bye, Keith.
Bye, Ray.
And bye, A.B.
Thank you, Ray.
Thank you, Keith.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.