Grey Beards on Systems - 124: GreyBeards talk k8s storage orchestration using CNCF Rook Project with Sébastien Han & Travis Nielsen, Red Hat
Episode Date: October 9, 2021. Stateful containers are becoming a hot topic these days, so we thought it a good time to talk to the CNCF (Cloud Native Computing Foundation) Rook team about what they are doing to make storage easier to use for k8s container apps. CNCF put us into contact with Sébastien Han (@leseb_), Ceph Storage Architect, and …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get GreyBeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends. With us today are Sébastien Han, Senior Principal Software Engineer and
Ceph Storage Architect, and Travis Nielsen, Senior Principal Software Engineer, both at
Red Hat.
So, Sebastian and Travis, why don't you tell us a little bit about yourselves and what's
going on with the Rook Storage Project?
Yeah, hi, this is Travis.
Glad to be with you today.
Working on what's happening with Rook today.
That's a big question. Lots is going on.
Maybe I'll start with a little background first on what Rook is and what we're trying to accomplish.
What Rook aims to do is really bring storage to Kubernetes
in a way that's natural to work with Kubernetes. And where we started, the storage platform we
started with as well was Ceph. So we knew Ceph was a great storage platform and it was built
long before Kubernetes ever existed. So where Rook started was we said,
well, let's bring Ceph into Kubernetes.
And the way we do that is with an operator.
So an operator works with Kubernetes CRDs,
or custom resource definitions,
to respond to the user's desired state.
So you want to deploy Ceph,
so you tell, you create these custom resources
that tell Rook how you want to deploy Ceph.
And then the Rook operator is the component
that goes and makes that happen.
It automates the install and everything around getting Ceph running in the cluster.
So it's worth pausing here for a sec to really dig into this deeply for our audience, at least a bit beyond the covers.
A CRD is an object you configure, let's say, in a generic sense, not just Kubernetes, but I want to describe a resource and how I'm going to attach the resource, etc.
So it's a rich definition of how you're going to use a resource.
And it's a self-concept or is it Kubernetes?
No, it's Kubernetes.
Yeah, it's an extension to the Kubernetes API.
When there is something you want to define, or something you want to see happening in Kubernetes,
and Kubernetes has no idea about it, that can be a storage cluster, for example, then you can define
a CRD that represents that cluster. And then on the side, you have an operator that will respond to requests
like the instantiation of that CRD.
Then the operator will respond
and then deploy a cluster, for example.
And really, think of the operator as taking all of the operational expertise
and bringing all of that into a software entity that will be responsible for
deploying, maintaining, and just managing the entire lifecycle of a piece of software. In our case,
it's the storage stack. So, Ray, if you think about the typical storage admin job, if I needed to
attach a bunch of LUNs to, let's go ancient, to an HP-UX instance, and I wanted to replicate those
LUNs locally or within, you know, some block-level storage, a CRD would be the equivalent of
defining that in whatever your OS or platform is. An operator automates that thing,
and it's an intelligent way to automate it.
So instead of having the administrator go back
and repetitively do the same thing
over and over again,
Kubernetes has this concept of an operator
which can intelligently do that repetitive task,
which can now be software defined.
And does it take
the place of CSI or something like that?
Or is the CSI still in the environment as well?
CSI is still in the environment as well.
I'd say what the custom resource definition is, the CRD,
it's just really a way to extend the Kubernetes resources
that are built into Kubernetes.
Because Kubernetes doesn't know about, or couldn't possibly create and define,
all possible types of resources that people would want to use with Kubernetes.
So they allow this plugin mechanism
so you can define your own types.
So the concept of a CRD
and sort of the framework for using CRDs
is built into Kubernetes,
but the actual definition and creation of those is defined by each project.
Like Rook has its set of CRDs that define the Ceph software for the Ceph storage cluster that's operating in the Kubernetes pods.
Is that the way it plays out?
Yeah, think of Rook as the management plane for the storage platform, which is Ceph.
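To make the CRD and operator idea concrete, here is a minimal sketch of what a CephCluster custom resource might look like when submitted with the Kubernetes Python client. The field values, such as the image tag, mon count, and device selection, are illustrative assumptions rather than a recommended layout.

```python
# Minimal sketch: declare a CephCluster custom resource and hand it to the
# Kubernetes API. The Rook operator watches this CRD and deploys Ceph to match.
# Field values (image tag, mon count, device selection) are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

ceph_cluster = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephCluster",
    "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
    "spec": {
        "cephVersion": {"image": "quay.io/ceph/ceph:v16"},        # which Ceph to deploy
        "dataDirHostPath": "/var/lib/rook",                       # host path for daemon config
        "mon": {"count": 3},                                      # monitor quorum size
        "storage": {"useAllNodes": True, "useAllDevices": True},  # which disks become OSDs
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephclusters",
    body=ceph_cluster,
)
```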
And I guess why this is needed is a really interesting rabbit hole to go down a bit, because a lot of times storage admins will look at something like a CSI driver and think, oh, OK, with this CSI driver for my VMAX array, the pods in the cluster can see the underlying storage.
What problem am I solving if I'm able to provide persistent block storage to a pod? And I think
that's where I think you guys can help us really understand the value of Rook from a Kubernetes perspective.
Yeah, with CSI specifically.
So CSI is a specific extension to Kubernetes for storage.
Of course, you can plug in your persistent volumes and mount them in your pods and all that.
So you have CSI drivers that are implemented according to that CSI API.
And so Rook actually installed a CSI driver
for working with Ceph volumes.
And so that's one aspect of Rook,
but Rook is not a CSI driver,
Rook installed the Ceph CSI driver.
And so CSI is a specific API,
is how I think of it,
for Kubernetes to work with the storage volume.
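As a rough illustration of that division of labor: an application never talks to Rook for I/O; it just claims a volume from a StorageClass that the Ceph CSI driver serves. A hedged sketch, assuming a block StorageClass named rook-ceph-block as in Rook's example manifests:

```python
# Sketch: an app requests storage through a PVC, and the Ceph CSI driver
# (installed by Rook) provisions and attaches the underlying RBD volume.
# The StorageClass name is an assumption about how the cluster was set up.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],            # single-node block volume
        storage_class_name="rook-ceph-block",      # served by the Ceph RBD CSI driver
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)
```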
So what I'm teasing out here
is I can have a Ceph provider
outside of like my cluster.
I can stand up,
as you mentioned earlier in the introduction,
Ceph has been around for long before Kubernetes.
So I can have an existing set of Ceph resources and I can have a CSI driver for that and I can connect via the Ceph CSI driver to there.
But that storage, by definition, sits outside of Kubernetes.
It's not part of my cluster. So when my cluster fails or my storage fails,
I lose that connectivity to the storage. As I understand it, it helps me think of storage in the same way that I think of Kubernetes resources.
So it effectively brings the Ceph cluster under the constraints and operational characteristics of a Kubernetes cluster.
It takes that Ceph cluster you were talking about, Keith, and brings it inside Kubernetes, I guess.
Is that what is going on here, gents?
Yeah, yeah. That's exactly what it is.
So your storage becomes, your Kubernetes becomes storage aware.
In my Dell Technologies example where we have our VMAX array and we're just using the CSI driver.
If I want to move my workloads, I can move my workloads.
But what about that persistent storage layer and that connectivity to the underlying storage?
The CSI driver doesn't magically make the new pods connect to the old storage.
Something has to orchestrate all of that new connectivity and the movement of the physical data plane to a new set of pods.
CSI does exactly that.
And I think that's where you guys can help me out
because this is where my kind of knowledge is failing
because I'm not a Kubernetes guy.
I know enough to be dangerous.
As I think of like traditional Ceph
and traditional Ceph storage providers
or NFS providers, because Rook
supports more than just Ceph. As I think about that, and I think about the storage coming
under Kubernetes control, what are the benefits over just CSI drivers to a traditional
external resource? What's the distinction that bringing Ceph underneath the Kubernetes cluster versus having
Ceph outside the Kubernetes cluster does for container apps, let's call it?
You're just adding the management layer.
If you have Rook deploying Ceph inside Kubernetes, that's what you get, essentially. Versus if you treat an existing external Ceph cluster
as external,
then you're more into this consumer relationship
than managing.
You're not managing anything at this point.
Rook is only responsible for bringing the connectivity
of that external cluster onto Kubernetes
and then pass it down to CSI
so that you can provide persistent storage
to your applications.
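A hedged sketch of that external, consume-only mode follows: the CephCluster resource just flags the cluster as external, and Rook wires up CSI connectivity instead of deploying Ceph daemons. The connection details (endpoints, keys) have to be imported separately from the existing cluster and are omitted here.

```python
# Sketch: point Rook at an existing Ceph cluster instead of deploying one.
# Rook then only provides CSI connectivity; it does not manage the daemons.
# The connection secrets/configmaps exported from the external cluster must
# be created separately (omitted here).
from kubernetes import client, config

config.load_kube_config()

external_cluster = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephCluster",
    "metadata": {"name": "rook-ceph-external", "namespace": "rook-ceph"},
    "spec": {
        "external": {"enable": True},     # consume an existing cluster, don't deploy one
        "dataDirHostPath": "/var/lib/rook",
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephclusters",
    body=external_cluster,
)
```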
But by having that Ceph cluster
inside of Kubernetes pods, let's say,
what are the advantages of doing that
versus having the Ceph cluster sitting outside?
You're saying both can be Rook managed
or Rook connected, I'll call it.
Yeah, so when it's inside Kubernetes,
then you get, let's say, dynamic response over failures, for example.
So if one of the components of Ceph fails,
then it can be immediately rescheduled onto another host.
And you can also do things like replica sets, for example,
when you can decide, I want this particular interface of Ceph
running multiple times. Because we haven't really dived into what Ceph is and what it does
and how the storage interface is structured, but essentially Ceph provides three methods to
interact with storage, so three storage interfaces: object, block, and file system.
And essentially, the object piece is really similar to
OpenStack Swift or Amazon S3, with which it is API compatible. It is really compliant with this
API. And these actually take the form of gateways, as we call them. And you could decide, okay, run three instances of all of those gateways
and aggregate all of them through the service endpoint in Kubernetes.
And if the scale goes up, then you can dynamically add more
and also scale down to where you were before.
So it's more responsiveness over scaling up and scaling down,
as well as also responding to failures.
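As a sketch of that gateway scaling, the object interface is declared through a CephObjectStore resource, and the gateway instances count is the knob being described here; Kubernetes aggregates the gateway pods behind a Service. The names and pool settings below are illustrative assumptions.

```python
# Sketch: an S3-compatible object store served by several RADOS Gateway pods.
# Changing spec.gateway.instances scales the gateways up or down; the pool
# layout shown is an illustrative assumption.
from kubernetes import client, config

config.load_kube_config()

object_store = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephObjectStore",
    "metadata": {"name": "my-store", "namespace": "rook-ceph"},
    "spec": {
        "metadataPool": {"replicated": {"size": 3}},
        "dataPool": {"replicated": {"size": 3}},
        "gateway": {"port": 80, "instances": 3},   # three RGW pods behind one Service
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephobjectstores",
    body=object_store,
)
```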
If one of the critical components of Ceph, for instance, the one maintaining the quorum, the monitors, is failing, then we can fail over and reestablish the quorum.
That's one of the benefits we get from running in Kubernetes.
So, I mean, the advantages of Kubernetes, obviously, are the auto-scaling kinds of things and automatic high availability, restarting containers that fail and stuff like that.
But now you have that sort of capability for the storage as well as the containerized application.
What about the physical disks and stuff that's sitting behind some Ceph node someplace or Ceph
pod or Ceph container? I'm not exactly certain what the right term is.
Yeah, well, I think you're right.
I mean, it depends on which environment you're deploying.
I mean, there is no magic.
If you run on bare metal,
then the disk is going to stick on the machine.
The disk is going to stick on the host.
If you run on more dynamic environments,
such as cloud environments,
like Amazon or Azure, for instance,
then the storage becomes portable because essentially there are VMs
where you have attached block devices onto.
So if that fails, then you can move it onto another virtual machine.
And yeah, if one machine fails, one VM fails,
then the storage, let's say an EBS volume,
will be rescheduled onto another VM.
And then the
Ceph cluster is healthy again. So let's, I really love where we're going with this
conversation. And I think, again, as we get into the nuances where we see the difference between
Ceph external and Ceph internal being provided by the Rook operator. So if I wanted to build a highly scalable or
highly redundant, let's focus on redundancy, on highly redundant solution, I can build it
with either model. I think the question becomes that of an operations management plane. If I do it with Ceph external, then I have to, as either an operator
or developer, glue together the things, the automation for when failure happens,
the reconnection, the visibility into the app to see
that either the pod or the storage is down.
Kubernetes tells me whether or not the pod is down. But in order to provide that
replication redundancy, et cetera, I have to build
that myself when I'm consuming. I have to build that automation
myself when I'm consuming it via Ceph external. There's some tools
out there that'll help me do
it. But what Rook says is basically your workload and your, when I define my workload and my storage,
I can define it in a single set of YAML or whatever. And Kubernetes manages the whole
thing for me with less manual thought on my end to make that work
is that correct? That's right, and that's where the Rook project, again, was born, really.
Ceph management generally requires that you hire a specialist who knows how to deploy
and upgrade and maintain it and handle failures. You need someone who really understands Ceph deeply.
So what Rook does is it helps abstract that
or remove some of that complexity.
And so then you can just define,
well, how do you want Ceph to run?
How many Ceph mons do you want?
Tell us where you want the OSDs to run
and we'll just make it happen.
And then, oh, if you want to upgrade to a new version of Ceph,
well, just tell us what version you want and we'll go sequentially
and safely upgrade all of the pods in succession.
So you don't have to worry about how Ceph upgrades work. And again,
managing Ceph is Rook's job,
the Rook operator's job so that you don't have to worry so much about the Ceph internals.
Yeah. Do you see a lot of, I don't know, implementations of Rook Ceph sitting in public cloud environments?
Or is it more bare metal or obviously a combination of both?
You mentioned that public cloud has some interesting characteristics with respect to disk placement, I guess, or disk
floating connectivity. Yeah. Yeah. It's really bringing even more high availability to the
storage when you deploy that, because the cloud provider is going to guarantee you X nines, I guess, for that storage. But if you take an environment like that and
you put Ceph on it, then you're just extending that,
because you will be replicating data also
on top of those block devices,
which hopefully are already replicated underneath as well,
but you're just bringing more redundancy to the platform
and more availability in general.
Right. And again, when Rook started, I thought,
why would anybody ever want to run Rook in the cloud?
You've got the cloud storage solutions.
So when it was initially created, it's like,
let's target bare metal, your own data center type of scenarios.
But what we really found is there are some common scenarios
to run in the cloud, which is, for example, limitations of the cloud providers.
Like you can only have a certain number of PVs per node.
So you quickly run into that limit. I think it's like 32,
and I forget how many exactly in some environments.
But there's a practically limitless number of PVs with Ceph, like thousands, or I've seen thousands in testing at least.
So the other thing, yeah, this kind of crosses the boundary of, you know, obviously containers and Kubernetes were kind of designed around stateless computation, I'll call it.
But, you know, we're bringing Ceph and Rook and persistent volumes.
All of a sudden now we have stateful containers.
Are you seeing a big adoption of stateful applications and stateful containers?
In Kubernetes deployments?
Yeah, yeah, exactly.
I mean, I guess you wouldn't need Rook without them, I suppose.
Right, exactly. I mean, that's why people use Rook, because at the end of the day, how can you build an application that doesn't need some state or storage, right?
You know, even the simplest website is backed by some storage.
Typically, those things are sort of sitting outside the Kubernetes operational environment, right?
I mean, it's like a database server or something like that sitting outside.
So, Ray, I think you're hitting a key point of what Rook is solving is this concept.
We need persistence, not only in database storage, but we need some type of file system persistence or data persistence for unstructured data across multiple pods.
Like the pods, the processes can be anywhere. We don't really care for that.
But data has gravity and we have to sometimes move the data with the gravity, I mean, with the workload.
And once we've once we ran into that architectural problem, well, the question is, how do we solve it? Do operators build the capabilities within Kubernetes to be data aware? Or do you build it outside of Kubernetes and then kind of have a dotted line relationship between the two. I think Rook solves that or answers that question in an
opinionated way to say that, well, you make Kubernetes aware. We still don't want to treat
the process or the control plane. And I think this is a control plane question. We don't want to
treat the data control plane as a pet. It's still a cattle. If the data control plane
dies, we still want to move the entire application, including the data, to another set of pods or
resources. And I think that's what Rook enables. I think there's isolation here between Ceph and
Rook that's important to understand.
I mean, I see Rook as being the one that deploys the Ceph cluster.
I see Rook being one that's sort of monitoring the Ceph cluster and Ceph operations and stuff.
But, you know, the containers are using persistent volumes through the CSI to talk directly to Ceph that's sitting on the cluster.
Isn't that how this works?
Yeah, that's absolutely right.
And that was one of our core architecture principles is the data path only goes from your application to Ceph.
Rook is not in that data path.
Rook's only at the management layer.
So we're not slowing anything down.
It's just you get the stability.
You know, Ceph has been around for a long time.
You want storage that's stable and durable and all that.
Right, right.
And you mentioned lifecycle management.
So I have no idea how you upgrade a Ceph cluster that's external to this world,
but you're providing some capabilities within Rook to, I don't know,
go from version X to version Y in Ceph?
Yeah, that's why you don't really have to know how to upgrade a Ceph cluster
to upgrade one when you use Rook,
because the only thing you have to do essentially is just tell us
which version you want, and we'll just go ahead and kick up the upgrade.
So the entire logic is, again, baked in the operator.
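In practice, that "just tell us which version" step is roughly a one-field change on the CephCluster resource; the operator notices it and rolls the daemons in order. A hedged sketch, with an illustrative target image tag:

```python
# Sketch: trigger a Ceph upgrade by changing the desired image on the
# CephCluster CR. The Rook operator then upgrades mons, OSDs, and gateways
# sequentially. The image tag here is illustrative.
from kubernetes import client, config

config.load_kube_config()

client.CustomObjectsApi().patch_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephclusters", name="rook-ceph",
    body={"spec": {"cephVersion": {"image": "quay.io/ceph/ceph:v17"}}},
)
```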
And the Rook operator that's running in the Kubernetes cluster someplace,
as containers, no doubt.
That's interesting.
So I guess part of the clarification I need as well is, you know, I watched the video.
I think Ceph is obviously the most mature presentation protocol for Rook.
Are there other supported underlays like NFS or block storage directly outside of what's provided via Ceph?
So if I wanted to use Rook to present NFS instead of Ceph.
You can do it through Ceph, actually.
So it's always through Ceph.
Ceph is the primary presentation or management plane technology that we're using.
Data.
Yes.
Data.
Yeah.
Technology.
Data gateway.
I'm not sure exactly what the terms are, but so you're saying if you wish to have a standard NFS box or a NAS box being serviced in this Kubernetes cluster, you could do this through Ceph file or something like that.
Well, Ceph has an interface to re-export with NFS.
So, yes.
I see.
As far as do you have the same sort of export for,
does Ceph also support object export like that,
or does it have to be maintained within the Ceph storage functionality?
Yeah, could you have some object storage endpoint beyond,
let's say you wanted to use Amazon S3,
and you're running this Kubernetes, you know, Amazon, you're running this,
this Kubernetes cluster in Amazon EKS or something like that.
Could you use physical S3 sub storage or would you have to use EBS kinds of
volumes?
Yeah, Rook always consumes EBS block volumes,
but then Ceph can turn around and expose the S3 endpoint from that.
So if you want an AWS S3 endpoint,
then you would just go straight to AWS S3.
So I guess the challenge is, and I guess,
and it's fine if it doesn't solve this problem.
Let's say that I have the problem that S3 is external object to me, external storage to my Kubernetes cluster.
So I have all of the same challenges of, you know, accessing any CSI provider when it's outside of the control plane of Kubernetes.
And I want to solve the problem of making my application
or my cluster data aware.
But I also want to use the power of S3 replication.
I don't want to replicate via, I don't
want to have a layer of abstraction on top of S3.
I want to replicate anywhere.
But I want to orchestrate the movement of the data
at the cluster level, or the attachment of the data at the cluster level.
Rook does not do that. If you want that capability, you have to use
Ceph. Rook is only orchestrating the Ceph storage. Ceph does really a lot of things. For instance, you were mentioning about S3. Ceph has an interface to consume objects,
just like you were consuming them through S3.
So it has gateways, which are S3 compliant.
So essentially, we're always playing catch up
with whatever comes next into S3 spec.
But Ceph has gateways that you can access
through the S3 protocol and you can also
set this up in a multi-site fashion, so you can have geographically distributed clusters that all
interact with each other and just replicate data across regions, for instance.
This is possible with Rook to a certain extent,
but this requires overall higher-level orchestration.
For this, ideally, we would need to have something
like Kubernetes Federation, which we don't have yet.
So that you would really orchestrate workloads
between regions.
But right now, we don't really have this.
But out of the box with Ceph and Rook, and with a little bit of extra configuration,
you could set up multi-site gateways pretty easily.
Interesting.
For the data alone, and you could potentially do some disaster recovery things at the other regions
if you wish to fire that sort of thing up there and that sort of stuff.
So we mentioned lifecycle, we mentioned configuration, and we mentioned monitoring.
Are there other capabilities of Rook with respect to the Ceph cluster that we should know about?
Access methods, maybe. We haven't really discussed that through CSI.
So what would that be, Sebastian?
So as I said,
Ceph is really capable of doing many things.
And it is really great when it comes to exposing storage
through different interfaces.
So it can be block, file system, and object.
Within the CSI spec,
it's always block-oriented. So it's not object. So what
you're consuming is always a raw block device or a block device with a file system on top of it,
which you can decide which one you want. And then you have different ways as an application to
consume that storage. So you can say, and that's what we call access methods, essentially, where you can specify, I want
that block device to be mapped slash attached, if you want, to multiple applications at the
same time.
And they will all be writing and reading at the same time, too.
So it's a...
I mean, use this for something like a file system, for instance, that would be supported
across a number of containers and stuff like that
are utilized by a number of containers. That's right. If you have an application that scales,
like let's say your app has a replica count of three, it runs three times, but it has to access the same data
store always. And with Ceph, you could tell it, okay, use the same PV, but attach it three times
to different containers, and they will all share the same store and they will be able to read and write
from that storage too.
This is the most advanced, I think,
that people might be looking at.
You can do this with block as well, if you want.
We can do this with file systems, of course.
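A rough sketch of that shared access pattern: the claim asks for ReadWriteMany, which the CephFS side of the driver can satisfy, and every pod that mounts the claim sees the same data. The StorageClass name follows Rook's CephFS example and is assumed here.

```python
# Sketch: a ReadWriteMany claim backed by CephFS, so multiple pods can mount
# and read/write the same volume at once. The class name is an assumption
# based on Rook's example manifests.
from kubernetes import client, config

config.load_kube_config()

shared_pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="shared-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],        # many pods, same volume
        storage_class_name="rook-cephfs",      # CephFS CSI driver
        resources=client.V1ResourceRequirements(requests={"storage": "20Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim("default", shared_pvc)
```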
So if I was to do like a database server
under Kubernetes and I wanted to use Rook Ceph and stuff like that.
I mean, this guy could potentially have multiple pods running to scale the database as it requires,
and then behind it, I guess it would be block storage rather than file storage.
But let's say it had file storage behind it.
Then in this situation, multiple pods could be accessing that persistent
volume that are supporting this database server in this Kubernetes cluster.
So yes and yes, but it's a bit over-optimistic, I think,
because doing databases operations over distributed file systems is,
I mean, it is always super heavy in terms of metadata.
So I guess, yeah, it is possible to...
Nobody would seriously consider this suicidal method.
I guess it depends.
I mean, it depends on the workload.
It depends on how heavy the workload is and how your rights are.
But I mean, practically speaking, technically speaking, it works.
But then would you want to do it is a different question.
You might be better looking at distributed databases instead.
But yeah, I mean, it's a valid example.
Just trying to put in a warning here.
There's a clear performance cost to using the shared file system.
So probably databases, yeah, you'd want the distributed database instead of the distributed files.
So they would probably use a block storage option under this configuration, but that
would work as well in this case.
The block storage volume could be shared across a number of pods running the database server
itself.
There is a downside to this.
I mean, in that case, you wouldn't map the same block device multiple times.
It will be a one-to-one relationship, and then you might have shards distributed.
But then the database will do the coordination.
Because if you map multiple block devices onto the same machine,
then the only thing you should do is just mount that file system as read-only.
Have a single writer and multiple readers.
If you have multiple writers, then you're just corrupting everything.
Yeah, unless there's some sort of synchronization across the writers and stuff.
Yeah, well, you have to use a clustered file system,
like ancient clustered file systems, I guess.
Well, this is what VMFS does, but it's
proprietary to VMware.
Yeah, they have GFS and the
old OCFS2 stuff.
Yeah, Veritas and all that stuff.
There are plenty of clustered file systems that
have existed over time, and VMFS
is the current one that
VMware is
using for much of their enterprise apps and stuff like that.
Oh, okay.
Yeah, yeah, yeah.
So how does this fit into like a CI, CD, DevOps kind of situation?
Can you roll out Rook changes?
You know, go from Rook version 10 to Rook version 11?
I don't know what the versions are
and stuff like that automatically.
Or is that something outside of this?
Well, you could do GitOps, yeah, for sure.
Yeah, we've definitely seen people doing that with CI-CD.
And as far as upgrading Rook itself,
the upgrade is usually just, well, update our CRDs,
those custom resource definitions, update the role-based access control, the RBAC, which is just that sometimes the privileges change as far as what Rook needs.
And then you update the operator, and then the operator automates everything after that.
Yeah, so I would expect this just enables the technology
enables the approach.
If your approach is to do rolling updates,
the technology is there from, let's say,
that step one is to update the Ceph cluster
and the components of that.
You can schedule Rook to do that,
or let's say step one is to update Rook.
And you have, you know, the technology is there.
You guys have kind of completely embraced, or you're agnostic to, the approach to managing Kubernetes in general.
This just integrates with whatever operational approach you've adopted.
That's right. And it really is, I mean, to the Kubernetes admin,
Rook should look like any other application
that they already need automation for in the cluster.
It's just another app.
So what's the next, like, what are the big areas of interest
the community would like to take Rook to?
Like, what's in the hopper?
Backup, disaster recovery, synchronous replication.
You know, these are enterprise kinds of situations, right?
I mean, does the system support a backup solution
or how would that play out in this environment?
Is it a Ceph issue or is it a Rook issue?
I don't know.
Yeah, well, DR is definitely one area,
well, probably the biggest area of focus we have right now
because it is a complex architectural problem.
How do you really support disaster recovery?
Where's the automation and where's the boundary
between that manual trigger that says,
yes, we do need the admin to decide when to failover?
Yeah, and I think there are two things, right, Travis?
Something you really worked on extensively is stretched clusters.
Because before going DR, we have to determine whether a cluster can be stretched across
two locations, because that is actually the ideal.
When you do a stretched cluster, then you get some sort of a DR for free, right?
You don't really have to do much.
If you're doing synchronous replication across a stretch cluster and things of that nature,
or they all have access to the same servers, it depends on what you're doing, right?
Yeah, for that stretch scenario, really, we're talking about still a single Kubernetes cluster,
so the latency still can't be too high, but
some people have, if you have two
data centers that are
geographically close enough,
and then you have,
basically, we need a third tiebreaker
node somewhere between the two data
centers, then you can
have that replication. You call that the witness node
or something like that. Arbiter, we call it.
So let's talk about observability and visibility
from a application pod orchestration perspective.
Is there any either roadmap or existing features
that allow operators to select Ceph clusters
based on attributes?
I guess the first question is, are there going to be multiple Ceph clusters in this Kubernetes
environment, or is it just one?
You can have as many as you want, but typically you have, I mean, typically you have one,
but yeah, it just depends on how you want to.
So I guess, Ray, you're asking a little bit better question than I am,
which is we're storage guys.
We like speeds and feeds.
Not all underlying storage is the same.
Sometimes I need cheap and deep storage.
Sometimes I need super fast storage.
And is the delineation a separate Ceph cluster? Is it the same cluster with different
storage pools? Like how do I give my developers the choice they need?
So I got some database that needs, you know, real fast block storage. And I've got some
machine learning solution that needs real sequential storage, and I've got some, I don't know, data analytics.
Well, I think it's one of the real goodnesses of Ceph, honestly, that
with Ceph you don't need to have multiple clusters where each will be dedicated to a
certain purpose, like, oh, this one is fast storage.
This one is archive.
This one is only file.
This one is only block.
No, no, you can manage all of that
through a single Ceph cluster.
And you can really isolate pools.
You can build a logical representation
of your platform architecture,
Like how many servers you have,
what type of disks are in those servers.
And you can divide that up
and expose that particular storage.
So you can say, okay, this set of machines
will be fast block storage oriented,
and then it will be exposed through a pool that will know,
okay, I have this pool of machines available,
and then I'm exposing and serving storage
in that manner.
Then through CSI, I can expose multiple pools with multiple providers through storage classes,
and then the storage classes will say,
okay, this is fast storage, go with it.
This is archiving intended, and then go with it.
And this is how the developers will determine
the type of storage they should consume.
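A hedged sketch of that pool-to-StorageClass chain: a CephBlockPool pinned to a device class, plus a StorageClass that points at it so developers can just name the class in their claims. The names and the device class are assumptions, and the StorageClass is trimmed; Rook's real examples also carry CSI secret parameters.

```python
# Sketch: carve out a fast pool on SSD-class OSDs and expose it through a
# StorageClass. Names and deviceClass are illustrative; Rook's example
# StorageClass also includes CSI secret parameters, omitted here.
from kubernetes import client, config

config.load_kube_config()

fast_pool = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephBlockPool",
    "metadata": {"name": "fast-pool", "namespace": "rook-ceph"},
    "spec": {
        "deviceClass": "ssd",          # only place this pool on SSD-backed OSDs
        "replicated": {"size": 3},     # keep three copies of the data
    },
}
client.CustomObjectsApi().create_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephblockpools",
    body=fast_pool,
)

fast_class = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="fast-block"),
    provisioner="rook-ceph.rbd.csi.ceph.com",   # Ceph RBD CSI driver installed by Rook
    parameters={"clusterID": "rook-ceph", "pool": "fast-pool",
                "csi.storage.k8s.io/fstype": "ext4"},
)
client.StorageV1Api().create_storage_class(fast_class)
```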
So on the container YAML file, for instance, they would say, I want fast storage or I want this storage pool.
And Ceph would automatically assign it to that.
If you give it the right hardware, yes.
Yeah, so from the application's perspective, the application requests storage from the storage class, and you'd have to define a storage class that corresponds to the Ceph pool.
And then if you've configured Ceph properly, then it...
And Rook does that all for me, right?
Right.
No kidding.
Rook won't read your mind for how to set up, but...
So once I have the Ceph-defined storage pools, then Rook will provide the storage class and the linkage between the two and all that stuff, is that what you're saying? So with EBC, I'm sorry, EBS, I can assign the ultra-fast volumes as one level that's
fast. And then if I'm doing, if Amazon even has spinning disk anymore, if I'm doing spinning disk,
I can put that as slow. I can define that as the Rook administrator. You guys are making
that easier for me to define that within Ceph.
So I don't need a Ceph expert to do that definition for me.
Right. Especially if you run in the cloud with Amazon, and you would know that this particular type of provider is bringing you NVMes. So you tell Ceph, OK, use that provider and give me OSDs.
And then you get a pool and then you create your own storage class that points to that pool,
and then you pass that storage class down to your
developers, and they can start consuming it. So yeah, we're really trying
to make that easy for you to consume.
I don't recall. Does Ceph do mirroring for
data protection, or does it support something like RAID 5 or RAID 6 or something like that?
Or, you know, what's the minimum number of Ceph nodes if such a thing exists?
Yeah.
Internally, Ceph will replicate the data for you; we recommend by default
three replicas.
So it just mirrors the three, and that's how it handles data protection if one
of them goes down.
Yeah. So in a Ceph cluster, you'd really want at least three nodes because of that number. You want
at least node redundancy. And there's other means of mirroring. The mirroring term, at least in the
Ceph world, is more about mirroring across clusters. So you can take the RBD image and
mirror it to the other cluster.
It's just a question of how many copies of the data are we talking about?
Is there one copy and there's parity or is there multiple copies required?
And in this case, you're saying multiple copies are required.
Yeah, and then it's on what level of abstraction are you talking about? Are there multiple copies?
Because I could consume at the provisioning level,
I could consume a service,
a provider that has by its nature redundant copies,
but they may not expose them.
So there's a lot of complexity in these abstractions.
Yeah, but I would say that you get underlying replication
even if you don't know about it, and you are somehow paying for it already.
And you're also never really guaranteed that you will ever get your disk back
if something fails.
And that's why you really have to have Ceph on top of it
to do the replication for you of your data.
And Ceph also has different types of replication.
You can tell it how many replicas,
or you can tell it to use erasure coding,
which then breaks it up.
So erasure coding would be what I would consider
a RAID-like functionality,
which has data and parity,
and how many parity groups might determine how many storage nodes can go down
or how many disks can go down and still recover the data.
So that's good.
It's good for availability, and it's also,
it becomes a little bit cheaper if you want to reduce your cost per terabyte as well when you're deploying.
It's a bit more expensive when it comes to computation, of course.
Yes, yes, or access itself. Yeah, yeah.
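For comparison with plain three-way replication, a hedged sketch of an erasure-coded pool; the chunk counts are illustrative and set the trade-off between how many failures the pool can absorb and the capacity overhead.

```python
# Sketch: an erasure-coded pool (data + coding chunks) for cheaper capacity,
# versus 3x replication. Chunk counts and failureDomain are illustrative.
from kubernetes import client, config

config.load_kube_config()

archive_pool = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephBlockPool",
    "metadata": {"name": "archive-pool", "namespace": "rook-ceph"},
    "spec": {
        "failureDomain": "host",                               # spread chunks across hosts
        "erasureCoded": {"dataChunks": 2, "codingChunks": 1},   # tolerates one chunk loss
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="rook-ceph", plural="cephblockpools",
    body=archive_pool,
)
```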
So what would be a typical sized Rook Ceph environment?
I mean, is it something you'd consider deploying petabytes to, or is it something more of a terabytes nature?
I'm just trying to get a handle on what
the typical environment might look like.
The scale is definitely up to petabytes. That's what
Ceph was designed for, at least petabytes.
We do have, I'm trying to think,
I know of at least one or two clusters
that are at a petabyte or two in Rook.
But there are also so many people in the upstream community
that we just never hear from.
I wish I knew what the real size of clusters was.
Yeah.
But the most famous one,
because they have been working with Ceph for a very long time and they're part of the Ceph Foundation
and have been really advocating their usage of Ceph, is CERN, which is the data scientists from Geneva.
And I think there's also like an Australian, the Monash University, and they have a few petabytes there. But CERN has, I
think they have, so they have multiple clusters and no, everything I'm saying is
public, by the way, so I think they have like between 20 and 60 petabyte
clusters on Ceph, and I'm sure there are much bigger clusters out there for the
object use case when they want to have...
I mean, something like CERN and Monash
is actually using Rook to manage those clusters
under Kubernetes?
Unfortunately, I think the answer is no.
They have multiple clusters too,
but I think they're experimenting.
Yeah, and the cluster I was mentioning
and they've commented publicly too, is the University of California. I think they've got a hundred or low hundreds number of nodes type of thing with a petabyte or two, supporting their universities across California.
Well, this has been great. So Keith, any last questions for Sebastian or Travis?
No, I think we've picked their brain as much as my brain will handle. I've learned way more about
Kubernetes than I ever wanted to learn about. So Sebastian or Travis, anything you'd like to say
to our listening audience before we close? Well, I think it was great to have you guys.
We could spend hours discussing this.
I think you could tell
that Travis and I
are really passionate
about what we do.
But it has been great
chatting with you guys,
for sure.
Yeah, absolutely.
And we'd love to hear
from anybody interested in Rook.
You can join the Rook Slack.
Go to rook.io.
That's our main website
and links to everything
interesting there.
There's KubeCon.
For the last few years,
we've always had talks at KubeCon that you can go back and listen
to, or there's the KubeCon North America
that's coming up in mid-October.
We'll have a couple of talks there.
So we'll look forward to
hearing more from people.
All right. Well, this has been great. Sebastian and Travis,
thank you for being on our show today.
Thank you.
Yep.
Thank you.
And that's it for now.
Bye, Keith.
Bye, Ray.
And bye, Sebastian and Travis.
Okay.
Bye.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and
Spotify as this will help get the word out. Thank you.