Grey Beards on Systems - 142: GreyBeards talk scale-out, software defined storage with Bjorn Kolbeck, Co-Founder & CEO, Quobyte
Episode Date: February 2, 2023
Software defined storage is a pretty full segment of the market these days. So, it's surprising when a new entrant comes along. We saw a story on Quobyte in Blocks and Files and thought it would be great to talk with Bjorn Kolbeck (LinkedIn), Co-Founder & CEO, Quobyte. Bjorn got his PhD in scale out …
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it is my pleasure to introduce Bjorn Kolbeck, co-founder and CEO of Quobyte.
So Bjorn, why don't you tell us a little bit about yourself and what's new at Quobyte?
Hey Ray, yeah, thanks for having me here.
I got into storage more or less by accident.
I started my career in high-performance computing and was part of a large European research
project. We got the data management work package, and that opened up the possibility to try
things, and we decided that we wanted to build a distributed parallel file system for grid
computing and that's how I met my co-founders
and how it all started.
After finishing my PhD, I worked at Google.
I wanted to do anything but storage.
But of course, when you work at Google,
you use the infrastructure.
And it's pretty impressive to see
how very small teams run this global infrastructure.
And that was the inspiration for my co-founder and
me to start Quobyte. We wanted to bring together the massive scale-out of high-performance
computing storage and the ease of operation, and also the scalability on the operational side, that
you see at Google, which basically comes from true software. So that's how my career and my research
led to what we have with Quobyte today.
So why don't you tell us, what are the big changes?
What does Quobyte bring to the table
that some of the other software storage players don't have?
Ease of use, obviously.
Yeah, ease of use is a big one.
That's one of our focus areas.
And we believe that Quobyte is easy to use.
That's why we have a free edition that you can just download and install.
I think that's the major differentiator to all the other commercial
software storage products out there.
They're not available as a download.
And my guess is that's because they're very complex to install and use.
So I think that's the biggest differentiator.
And then there are a lot of details on how we do things differently,
how we focus on fault tolerance, how we, for example, do the data
management of policies with a lot of flexibility.
I think this is where Quobyte is truly unique.
And on the other hand,
we are a parallel high-performance file system,
so you can get excellent throughput,
IOPS, and small-file performance from Quobyte.
So when you say parallel file system,
are you using like NFS version 3
or are you using your own client software, POSIX? How does the customer access
Quobyte? We do have our native clients for Linux, Mac, and Windows, and that's the preferred model
to access the storage. It's a POSIX-compatible file system. But with our own protocol, we avoid
all the problems you have with NFS, bottlenecks, failover problems.
And we can do parallel IO and also parallel metadata.
And then for applications that somehow require NFS, we also support version 3 and version 4 of that protocol.
But ideally, and I think most of our customers do, you would use the native client to get the best performance and also best fault tolerance.
And Bjorn, on that, so you mentioned you can go and download that. Do you have,
are there clients available, like you said, for basically Mac, Linux and Windows? Can you also,
is that also the server as well? Can you utilize that to basically, I guess,
take and create storage pools off of that?
On the server side, we only support Linux.
Okay.
We run in user space,
so we support all the major Linux distributions,
but no Windows and Mac.
That's really only the client side.
Yeah, yeah, yeah.
You mentioned somewhere in there parallel metadata.
You want to clarify what that means?
I mean, I'm trying to understand
what that looks like in the world today.
Yeah, it means that we basically scale out metadata as well.
So you can have a large number of metadata services.
So we have the split between metadata and data.
It's very common in the HPC world.
And you mentioned NFS initially, and there's also parallel NFS.
And one of the big downsides with both NFS versions is that even in the parallel version,
you talk to one controller node, the metadata server.
So this is where our protocol allows you to talk to a number of metadata servers, not just one, which means that you can scale out your metadata as well.
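To make the idea concrete, here is a minimal Python sketch of how a client might route metadata requests to one of several metadata servers by volume. The server names, port, and hash-based mapping are assumptions for illustration, not Quobyte's actual protocol.

    import hashlib

    # Hypothetical illustration: route metadata operations for a path to the
    # metadata server responsible for that path's volume. This only shows why
    # scale-out metadata avoids the single-endpoint bottleneck of NFS/pNFS.
    METADATA_SERVERS = ["mds-0:7860", "mds-1:7860", "mds-2:7860"]  # assumed addresses

    def volume_of(path: str) -> str:
        # Assume the first path component names the volume, e.g. /home-alice/...
        return path.strip("/").split("/", 1)[0]

    def metadata_server_for(path: str) -> str:
        # Deterministic volume-to-server mapping; a real system would consult a
        # registry or volume database rather than a plain hash.
        vol = volume_of(path)
        idx = int(hashlib.sha256(vol.encode()).hexdigest(), 16) % len(METADATA_SERVERS)
        return METADATA_SERVERS[idx]

    if __name__ == "__main__":
        for p in ["/home-alice/thesis.tex", "/home-bob/data.bam"]:
            print(p, "->", metadata_server_for(p))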
And so you mentioned storage servers running Linux software.
So are the metadata servers in the storage servers both running on those servers or are
they distinct services or how has that kind of deployment played out?
So we do have them as separate services.
So there is a metadata service and a separate data service,
but you can run them on the same machine in the end.
That's up to the customer, to the user, how they want to do it.
The default setup is to have all the services on the same machine because
then you basically utilize the scale out better.
But we do have use cases that are very metadata heavy where customers decide that they want to have that on dedicated machines.
And on that, say if you've got a mix of flash SSD
and then basically spinning rust on there too,
can you actually split those up
and run those services on different drives within a system?
Absolutely.
We also have metadata drives and then data drives. Those are separated because you want the performance isolation.
And then the metadata side should be a flash drive, simply because when you access file system parts that are cold, you want to page them in from the drive. You don't want to wait for a hard drive. And then on the data side, we support
both flash and hard drives and can use them as
tiers inside the same storage cluster and
even inside the same file.
How many nodes or systems do you need to start with to
create this distributed system?
Minimum is four nodes, simply for fault tolerance.
So we do either data replication with a quorum approach, which means you need three replicas,
or erasure coding.
And so four is the minimum number of nodes to have a production cluster where you
can lose one node and the system will still continue to operate properly. And then you can
scale that as much as you like.
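As a rough illustration of that four-node minimum (my own arithmetic, not vendor math): quorum replication with three replicas survives one failure, and k+m erasure coding survives m lost chunks.

    def replication_tolerance(replicas: int) -> int:
        # Majority-quorum replication stays available while a majority of
        # replicas survive, so it tolerates floor((replicas - 1) / 2) failures.
        return (replicas - 1) // 2

    def erasure_tolerance(data_chunks: int, parity_chunks: int) -> int:
        # k + m erasure coding tolerates the loss of any m chunks.
        return parity_chunks

    print(replication_tolerance(3))  # 1: three replicas keep a quorum through one failure
    print(erasure_tolerance(8, 3))   # 3: an 8+3 layout survives three lost chunks
    # With three replicas spread over four nodes, losing one node still leaves a
    # majority plus a spare node to re-replicate onto.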
Yeah, so when you said parallel metadata, metadata is presumably protected as well?
Is it protected through replication,
or is it the same as the data protection?
It's different.
So for the metadata, we use a replicated key value store.
So the mechanism is kind of the same, with primary election and a quorum approach.
But where data is basically file blocks,
our metadata is stored in an LSM-tree database that's replicated.
And so we have one primary per volume
with two backups,
and the backups are actually what you would call
hot standbys.
So they have the state and can take over immediately.
And when your cluster grows to like,
let's say 16 nodes,
you've got multiple primaries in that configuration
or there's just still one primary and multiple secondaries for the metadata?
No, then you can scale that out and have as many primaries as you like.
And they basically, our system tries to evenly distribute that
across your metadata servers so that every
metadata server is responsible for part of the system.
I'm sorry, did you mention that the metadata is partitioned across those primaries or it's
effectively the same metadata across all the primaries?
It's partitioned into what we call volumes.
So we have a very lightweight volume concept,
and then you can have a large number.
So what our users typically do is create, for example,
one volume per home directory instead of this giant home directory volume.
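A sketch of what that volume-per-home-directory pattern could look like when automated against a management API. The endpoint URL, token, and JSON fields are invented for illustration; the real API will differ, so treat this only as the shape of the idea.

    import requests

    # Hypothetical sketch: create one volume per user home directory through a
    # management API. URL, authentication, and payload fields are assumptions,
    # not Quobyte's documented interface.
    API = "https://quobyte-api.example.com/v1"
    TOKEN = "REPLACE_ME"

    def create_home_volume(user: str) -> None:
        payload = {
            "name": f"home-{user}",
            "quota_bytes": 500 * 2**30,   # 500 GiB quota (assumed field name)
            "labels": {"purpose": "home", "owner": user},
        }
        resp = requests.post(
            f"{API}/volumes",
            json=payload,
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=10,
        )
        resp.raise_for_status()

    for user in ["alice", "bob"]:
        create_home_volume(user)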
Yeah, that's interesting.
That's interesting.
On these machines, is this designed to run bare metal?
Or I assume since it's pretty much a software-defined stack,
you could actually run this within virtual machines or containers or anything like that.
Is there a specific kind of deployment style that you look at?
Or do you support one specific or multiples?
Multiples. You're right. It's software. As I mentioned, we run in user space, so we don't have a lot of requirements when it comes to the kernel version. That's why we run on bare metal.
You just take your favorite Linux distribution, install it on the servers, and then install our software in there.
Same works on virtual machines on the cloud
or bare metal machines on the cloud.
And we also run on Kubernetes.
So we provide a Helm chart
to install the Quobyte servers, the clients,
and our CSI plugin.
And then you can run Quobyte in Kubernetes
and also provide a shared file system to Kubernetes.
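As an illustration of the Kubernetes side, a ReadWriteMany PersistentVolumeClaim is roughly what a shared file system maps to. The storageClassName below is a placeholder, not necessarily what the Helm chart installs in a given cluster.

    import json

    # Hypothetical sketch: a PersistentVolumeClaim asking the CSI driver for a
    # shared (ReadWriteMany) volume. The storage class name is a placeholder.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-scratch"},
        "spec": {
            "accessModes": ["ReadWriteMany"],   # shared file system semantics
            "storageClassName": "quobyte-csi",  # assumed name
            "resources": {"requests": {"storage": "1Ti"}},
        },
    }

    # Kubernetes accepts JSON manifests, so this can be piped to kubectl apply -f -
    print(json.dumps(pvc, indent=2))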
Ah, that's interesting. That is interesting. So you're effectively running as containers
under Kubernetes, but not as containers under bare metal, right? Is that how I would read that?
Yeah, exactly. So we don't require you to have containers on bare metal. Our server
processes just run as a regular user, with cgroups. There's nothing special; you basically take a bare Linux
and install our software. Yeah, yeah, yeah. Talk about security. There was something on your
website mentioning that the solution is very...
You know, the other thing mentioned on the website was that it's not only file, but it's also object.
You want to talk about what your object protocol support is?
Sure. I mean, this is where we basically combined it into a single platform.
So we made sure that our object storage interface translates to the file system layer.
So it's not different namespaces. It's actually when you go through the object storage interface, you can access the file system,
which means you can access and share the same files, volumes, everything through the object storage interface.
It's really seamless. And it's like S3 then or compatible?
Yeah, it's S3 compatible.
I would say most applications don't notice the difference.
Admins will notice a difference
because they're operating on the file system
and can use file system permissions
and access control lists.
And then they are also effective for the object
storage access, so that makes their life much easier.
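A hedged sketch of what that unified namespace means in practice: a file written through the POSIX mount can be read back over the S3-compatible interface with a standard client such as boto3. The endpoint, bucket name, and credentials here are placeholders.

    import boto3

    # Hypothetical sketch: read a file that was written through the POSIX file
    # system via the S3-compatible interface. Endpoint, bucket, and credentials
    # are placeholders; the point is that both interfaces see the same namespace,
    # so no copy step is needed.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.quobyte.example.com",  # assumed endpoint
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Assume the volume "genomics" is exposed as a bucket of the same name.
    obj = s3.get_object(Bucket="genomics", Key="run42/sample.bam")
    print(obj["ContentLength"], "bytes readable over S3")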
Okay, that's interesting. Users don't see a difference. Okay. Now, back to security: what are the security options for the solution, then?
Yeah, security is a complex topic. There are many layers.
In the end, to make a storage system secure, you try to put as many fences around it as
possible.
Again, we have the advantage of our own protocol that allowed us to add more security features
than you have in NFS.
So the first layer is end-to-end data encryption, which is done inside the callback client.
So your data is encrypted and decrypted only on the client machine where it's actually produced or consumed.
And then the rest of the system, like the network, even the storage servers, don't need to be trusted, because the data is never stored in plain text anywhere, not on the drives and not over the network.
So that really end-to-end encryption is something you can't do with NFS.
Where are the keys for that then?
Is that on the client side or is that something that you're configured with?
That's something you can configure. The keys can either be in one of our services, or you can use an external key store.
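For readers unfamiliar with the pattern, here is a generic client-side encryption sketch. It illustrates the end-to-end idea only and is not Quobyte's actual cipher or key-management design.

    from cryptography.fernet import Fernet

    # Generic illustration of the end-to-end idea: data is encrypted on the
    # client before it ever reaches the network or the storage servers, and the
    # key lives outside the storage system (e.g. an external key store).
    key = Fernet.generate_key()          # in practice: fetched from a key store
    cipher = Fernet(key)

    plaintext = b"patient record 1234"
    ciphertext = cipher.encrypt(plaintext)   # this is all the servers ever see

    # Only a client holding the key can recover the data.
    assert cipher.decrypt(ciphertext) == plaintext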
Okay, that's good.
I like that.
Okay, go on.
So you were talking about end-to-end encryption
is kind of the first layer of security for the solution?
Yeah, because when you set that up correctly
and your keys are stored somewhere else, you
basically have a system where you don't need to trust the storage admins because they will
never have access to the data, to the clear text content.
Then we have the second layer that's basically for the network communication.
There we have optional TLS support.
So once you enable that,
you can define which networks you don't trust.
And then our clients and services will use TLS
to communicate with each other.
Comes, of course, at a price.
TLS connections are CPU intensive, slower.
So often that's good for wide area communication,
but you don't want that inside your cluster,
for example, inside the compute cluster.
And then with that kind of hand in hand, go the support for X509 certificates.
That's again, optional.
You can have a cluster that uses the typical IP network restrictions
to allow components to access the cluster, or you can use X.509 certificates. In that case,
all servers and also all clients need a valid certificate to join the cluster.
And the advantage here is that you don't need to trust the network. Even if someone gets access to the network,
is able to squeeze a node in there,
without a certificate, they can't access the storage system.
And the other advantage is that admins can put restrictions on certificates
and that way invalidate them if the certificate leaked
or restrict them to certain users.
So there's this typical scenario,
especially in research, where the researchers or data scientists love to have root access on the workstation. Yeah, in that case, with the certificates,
you give them one certificate that's tied to the user and then you don't care that they have root
access or can impersonate other users.
So that's a level of extra security that you get with the X.509 certificates.
Interesting. What about things like role-based access controls and things of that nature?
We support something that's similar to NFS v4 ACLs internally, and then we translate back and forth to the native representation.
So in Windows, we support a subset of the Windows access control.
So, for example, no inheritance because that can't be translated, but everything else you can modify it in the Windows settings.
Our client translates that and then modifies your NFS ACLs, same with POSIX or S3.
So those are unified, which makes it fairly easy to manage them and also audit those access
control settings.
Do you guys have any type of ability to replicate from one kind of data store to another data store for like disaster recovery style of capabilities?
Yeah, we have basically two mechanisms.
One is what we call volume mirroring, which is based on an event stream. So the remote cluster gets the updates
and then pulls the changes near-instantaneously
from the source cluster.
That's one option for DR.
Then the other option is a bit more versatile.
It's what we call the data mover.
It's a component that you can use to move, copy, or synchronize
data between Quobyte clusters. It's highly parallelized, but it's in the end something you
trigger, so you can run that every hour, or run that in the evening, depending on your needs.
So that's for cases where you have, for example, limited network bandwidth between the clusters or
where the near-instantaneous updates would be too much.
So an asynchronous based kind of mode that you can trigger?
Exactly. Both of them are asynchronous, but one is,
depending on your bandwidth, I would say near real time. And the other one is something that's
really more like eventual consistency.
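A sketch of the "run it every hour" style of data-mover trigger. The API endpoint and payload are assumptions for illustration; in practice this would more likely be a cron job or the product's own scheduler.

    import time
    import requests

    # Hypothetical sketch: trigger the asynchronous data-mover job on a fixed
    # interval instead of relying on near-real-time mirroring.
    API = "https://quobyte-api.example.com/v1"

    def trigger_sync(source_volume: str, target_cluster: str) -> None:
        requests.post(
            f"{API}/datamover/jobs",
            json={"source": source_volume, "target": target_cluster, "mode": "sync"},
            timeout=10,
        ).raise_for_status()

    while True:
        trigger_sync("genomics", "dr-site")
        time.sleep(3600)  # once per hour, matching the example in the discussion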
That's nice.
So you mentioned the multiple services.
You mentioned the client and the storage and metadata services.
Is there a CLI or a GUI-level operational console, dashboard kind of thing,
as well? And where would that run, I guess? Yeah, both of them are available. So we have a command-line
tool, and we have a web interface. Both of them talk to our API, so we're kind of API-first, and admins can use the API to automate
or do anything they can do
with the command line tool or the web console.
So both of those tools actually talk to our API
and API and web console are just additional services
that you run on one of or multiple of the machines
where the Quobyte data and metadata services are running.
And so you mentioned erasure coding, obviously at three plus one for the four-node setup.
As you go to, let's say, 16 nodes, does that erasure coding change, or how is that?
Is that something the customer configures,
or something that you automatically change,
or does it just stay at four to one or three to one?
No, you can adapt that automatically.
And customers can select the erasure coding schema.
There are a few schemas that we recommend
because you need enough redundancy to survive node outages.
And customers can select them if they have enough machines and disks in the system.
So as you add machines, you can go wider up to,
for example, 12 plus three, I think it is.
And then our data mover can also recode files.
So that's the use case where you have files that were, I don't know, replicated or 4 plus 2 erasure coded, then you add more nodes, and then the data mover can recode them in the background if you want that.
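For a sense of the trade-off between those schemas, the raw-capacity overhead of a k+m layout is (k+m)/k, and it survives the loss of any m chunks without losing data.

    # Quick comparison of the erasure-coding widths mentioned above, plus 3-way
    # replication: raw capacity needed per usable byte, and how many chunk or
    # copy losses each scheme survives without data loss.
    schemes = {
        "3x replication": (1, 2),   # 1 data "chunk", 2 extra copies
        "4+2": (4, 2),
        "8+3": (8, 3),
        "12+3": (12, 3),
    }

    for name, (k, m) in schemes.items():
        overhead = (k + m) / k
        print(f"{name:>15}: {overhead:.2f}x raw per usable, tolerates {m} losses")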
Is that something that's automatically done as you add nodes to the cluster?
Is that something that somebody would have to trigger through API, CLI, or GUI?
That's something you have to trigger because the recoding is pretty expensive.
It basically reads all your data, writes it out again,
needs to recompute or compute the parity.
So that's a pretty expensive task.
It's something you don't want the system to do automatically.
You mentioned being able to scale it out.
Are there any limits to the scaling?
How far have you practically scaled it out? And are there any theoretical maximums, I guess?
In terms of production, the customers we can talk about, I think, are in the range of 250 servers.
And we have seen 20,000 clients accessing the storage. That's what we
can talk about. Theoretically, there aren't any limits in our system in terms of the number of
servers or clients. We use an architecture where we do the primary election decentralized, so only the nodes that have a copy of the file or the metadata need to talk to each
other to coordinate who becomes primary.
And this allows us to scale without, you know, things like a centralized lock service, which
then easily would become a bottleneck because we don't have that.
We really have linear scaling to a large number of nodes.
Is there any specific cluster consensus algorithms that you use
for doing that? Or is that proprietary?
It is proprietary. I mean, if you look into our publication
list, you'll find it. It's basically
how would I best describe that?
Inspired by Paxos might be the right word.
Yeah.
I've dealt with Paxos and Raft and a few others in my day.
You might.
Yeah.
Go ahead.
The tricky part is to avoid the disk IO, the persistent storage that you need to do with both.
So that's the main thing: we don't use disk IO for
the consensus, which would basically ruin the performance of your storage system.
Yeah, it gets very chatty the more nodes you add into it.
So yeah, I remember going through that many times with Paxos as well.
I'm not that familiar with Paxos, but that's okay. You mentioned primary election,
or something; I can't even think what the other word was. So if, let's say, in a cluster
I'm trying to access a file and it's erasure coded, is there one storage server that becomes the primary
conduit for that file I/O, and then the data could go to any of the storage servers
associated with the erasure code? Or, I guess, the first question is: is erasure coding
done on a node-by-node basis or on a drive-by-drive basis? It appears node by node. Yeah, so maybe we have to
take a step back. When you create a file, the metadata server that's responsible for that volume
looks at the policy engine and the policies you configured, because that allows you to do
things in a very fine-granular way.
So you could have a policy that says
Ray's files are always erasure coded and on flash, and
Bjorn's files are replicated and end up on hard drive.
So you could, you have a lot of options there.
So the metadata server first looks at the policies to decide how to
handle your file.
If you use synchronous replication,
it would assign three copies on three different machines
and also three different drives.
And then we have a primary election
to make sure that the three copies are in sync,
that you have consistent
copies of your file.
If it's erasure coded,
then we actually spread it across ideally eight plus three servers. So if you have 12 servers or 11 servers, we would spread it across them so that each chunk is on a
different server so that you have really the full fault tolerance for erasure coding. And then there
is no primary election, because that's only needed for replication.
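A toy sketch of that policy-driven placement decision (invented rule syntax, not Quobyte's policy engine): on file create, attributes such as owner or extension are matched against ordered rules to pick a redundancy scheme and tier.

    # Illustrative sketch of policy-driven file placement: the metadata service
    # matches file attributes against ordered rules and picks a layout.
    POLICIES = [
        # (predicate, layout)
        (lambda f: f["owner"] == "ray",   {"redundancy": "ec_8_3",      "tier": "flash"}),
        (lambda f: f["owner"] == "bjorn", {"redundancy": "replicate_3", "tier": "hdd"}),
        (lambda f: f["name"].endswith(".bam"), {"redundancy": "ec_8_3", "tier": "hdd"}),
    ]
    DEFAULT = {"redundancy": "replicate_3", "tier": "flash"}

    def layout_for(file_attrs: dict) -> dict:
        for predicate, layout in POLICIES:
            if predicate(file_attrs):
                return layout
        return DEFAULT

    print(layout_for({"owner": "ray", "name": "results.csv"}))
    print(layout_for({"owner": "dana", "name": "sample.bam"}))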
So in that situation,
typically, well, I guess I don't know.
So in an NFS perspective,
the file request would go to an IP address,
which would be what I would consider
like the primary storage conduit for the data.
The data could be anywhere on the cluster.
But in the case of your solution, I could actually go to any of the storage nodes on
the cluster, whether I'm NFS or client.
I guess I don't understand how it connects.
Yeah, it's very different from NFS. You're right. When you're a client and you talk to NFS,
you basically have this one NFS endpoint and then something happens behind that server.
That's also the problem because you're talking to that one NFS endpoint. So that's a bottleneck.
If everyone's talking to the same one, you know, it's a typical problem you have in NFS-based solutions.
So our client takes a different approach.
Our client directly communicates
with the right metadata service
and then ask the metadata service
for the file location.
And then the metadata service
will tell the client that this file,
for example,
is erasure coded and the stripes are spread across these 11 drives and servers. And then the client
can directly go to the data services and retrieve the data from there, or write the data, with the metadata
servers out of the loop. And that's the great thing for scalability, because on one hand,
you don't need to talk to the metadata service anymore. On the other hand, the clients are able to talk directly to all the data services that have the necessary data. So you have a system
where you don't have those bottlenecks of NFS gateways that need to redirect the data.
Okay. So typically, let's say you've got an 8 plus 3 stripe here. The mapping between, let's say, a file record, if such a thing exists, or a byte and offset, let's say, and, you know, a location within that stripe is typically non-computational.
I mean, it's something that has to look up some table, and that typically requires
some sort of metadata service.
But in your case, you don't do that.
I'm trying to understand how that all plays out.
Yeah.
Often, what counts as metadata in distributed systems isn't the same for all systems. Some do distributed metadata where they consider the block allocation,
the metadata.
In that case, you're basically similar to a local file system.
You have the block allocation table for a file, which can get very big.
And then you constantly talk to the metadata.
Yeah, yeah, yeah.
Yeah. OK.
But in your case...
Yeah, some file systems are part of that category.
And then we have an architecture
that's somewhat similar to the Lustre file system
where we don't do block allocation at global level.
The metadata basically has the allocation
in large chunks, think gigabytes,
for a file, what we call segments.
And then for each segment,
it has the list of drives for the chunks.
So for eight plus three erasure coded files,
it would be 11 drives.
And then the client based on the offset
and this very small table
can actually locate the servers that have the data
and then the servers do the block allocation locally.
But that's something the client doesn't care about.
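A simplified sketch of that segment idea: the client holds a tiny per-file table mapping multi-gigabyte segments to the drives holding their chunks, and resolves any byte offset locally. The segment size, stripe unit, and naming below are assumptions for illustration.

    # Illustration of the "small location table" idea (sizes and layout are
    # assumptions, not the real on-wire format): the metadata service hands the
    # client, per multi-gigabyte segment, the list of drives holding the 8+3
    # erasure-coded chunks. After that, the client computes the target drive
    # from the byte offset on its own.
    SEGMENT_SIZE = 4 * 2**30   # assumed 4 GiB segments
    STRIPE_UNIT  = 1 * 2**20   # assumed 1 MiB stripe unit across the 8 data chunks

    # segment index -> 11 drives (8 data + 3 parity) for that segment
    segment_table = {
        0: [f"server{i}:drive{i % 4}" for i in range(11)],
        1: [f"server{(i + 5) % 12}:drive{i % 4}" for i in range(11)],
    }

    def locate(offset: int) -> str:
        segment = offset // SEGMENT_SIZE
        drives = segment_table[segment]
        within = offset % SEGMENT_SIZE
        data_chunk = (within // STRIPE_UNIT) % 8   # which of the 8 data chunks
        return drives[data_chunk]

    # The client resolves offsets locally: no metadata round trip per I/O.
    print(locate(0))            # first stripe unit of segment 0
    print(locate(5 * 2**30))    # lands in segment 1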
I got it. The client takes the offset, goes to this tiny table, and then hardly ever
talks to the metadata service again. Yeah, well, NFS is pretty chatty with respect to a lot of
the metadata stuff, so that could be a potential bottleneck. Let's talk about performance. You mentioned that you can have a multi-tiered storage pool.
Is the tiering on a file basis or is the tiering on a storage pool basis slash volume, I guess, is the question I would ask.
It's actually on a per file basis.
What?
That's, again, where our policy engine comes in. So you could
create a policy that says, if you want to do that, your own files are tiered down from
flash to hard drives after 10 days. Things like that. So of course, usually you don't do that on that granularity,
but we have customers that say, for example,
certain files like the big, in life science,
it's BAM files, those are huge, several gigabytes.
Those files should go directly to hard drives
because they're only read sequentially,
they're never modified.
So just put them directly on hard drives,
don't even bother with the flash tier.
So it's really on a file level
and tiering is just a different kind of policy.
We have policies where we automatically switch
from writing to flash with replication
to erasure coding on hard drives inside the same file.
Because you often have those use cases
where you have tiny files of metadata files,
and then you have large files that contain the actual data.
Think about machine learning.
That's fine.
Wait.
So let me try to understand.
So let's say it's Bjorn's files, and I want to tier them.
I can have it go to Flash and be replicated on the Flash tier
for, let's say, 30 days. And after that, it can go to disk and be erasure coded. Is that what you
said? That's one option. What? Nobody does that. Nobody does that. Yes, that's what we do.
How many different tier levels can you have?
As many as you like.
In the end, we tag devices.
So we have the policies.
We have devices that have tags.
And then we have files.
And for tags, the system automatically creates HDD, SSD, and NVMe tags.
So this would be your traditional three tiers that you have. But we have customers that,
for example, have the high-density hard drive servers for archival. Then they create an archival
tag for those drives and use that for very cold storage, for example.
So that's-
Ray likes to tier to tape.
No, no, no.
Let's not talk about that stuff.
Thanks, Jason.
That was a low blow.
Or maybe it was a high blow.
I don't know.
But this is pretty unusual
to change the configuration
of the data protection based on what tier.
And the fact that you've got unlimited numbers of tiers is also pretty unusual, I would say.
And then you also mentioned that there's some other policy engines, like you could define
a policy, but do you have any automated policies that go in and kind of check the heat maps
of data or anything like that?
The automated policies would be what I described: we
automatically switch from replication on flash to erasure coding on hard drives inside a file when
it grows. That's because most customers have this problem that they have a ton of tiny files
and then very large files, and this policy automatically handles them correctly.
So the small files end up on Flash.
Your large files end up on hard drives.
So you don't need to tune anything, and you don't need to do tiering.
Because tiering is also expensive.
You read the data, and then you push it down.
In that case, the system automatically optimizes the file based on the file size.
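A minimal sketch of that automatic size-based switch (the threshold and names are invented for illustration): small data stays replicated on flash, and once a file grows past the threshold, further writes go out erasure coded on hard drives, with no separate tiering pass.

    # Sketch of the automatic small-file/large-file policy described above.
    SWITCH_THRESHOLD = 64 * 2**20   # assumed 64 MiB switch-over point

    def placement_for_write(current_file_size: int) -> dict:
        if current_file_size < SWITCH_THRESHOLD:
            return {"redundancy": "replicate_3", "media": "flash"}
        return {"redundancy": "ec_8_3", "media": "hdd"}

    print(placement_for_write(4 * 2**10))   # tiny metadata file -> flash, replicated
    print(placement_for_write(8 * 2**30))   # multi-GiB BAM file -> hdd, erasure coded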
What other policy specifications are there besides the tiering stuff?
I mean, you mentioned security, obviously.
I guess that's also on a file level, the encryption?
Encryption is, well, in theory, it's on a file level, but you
would define that per volume.
Okay.
What other sort of policy attributes are there?
Security settings like which users are considered super users for Kubernetes, for example,
how to translate user IDs and how to store them when you don't have a username associated with that. Client settings like metadata cache times.
What else is there?
Quality of service levels, all of that.
Basically everything is controlled through policies.
Yeah, yeah, yeah, yeah.
So you mentioned the free solution.
Is that sort of based on how many gigabytes you have for storage, or is there a certain timeframe for that? Or how does the free versus non-free work?
The free edition is free forever. And it includes up to 150 terabytes of hard drives, and I believe 30 terabytes of flash. So it's something you can actually use.
I'm sorry, did you say the free service was up to 150 terabytes of disk and 30 terabytes of flash?
Yes.
Jesus.
What are you...
Okay, there's a limitation on number of servers.
Is that what you said?
No limitation on number of servers.
There's a limitation on the number of clients
that can access the storage system.
And there are certain enterprise features
that you don't get with the free edition.
Okay.
So, for example, the security features aren't included.
But yeah, in the end, we're a scale out file system
and we want to make sure that our free edition
is actually meaningful.
That's why you get 150 terabytes.
150 terabytes could go for a pretty long time
for most of my applications.
I don't know about yours, Jason,
but I think I'd go pretty much forever.
Yeah, then
just download it from our website. I'm going to.
Okay, so tell me, how is the solution priced
after 150 terabytes of disk? Jesus.
I'm sorry. So our infrastructure edition
is priced as an annual subscription.
So we basically give you a subscription based on the capacity for flash and hard drives.
Subscription includes support and then obviously also updates to the software.
And it's a typical volume discount.
So the more capacity you buy, the lower the per gigabyte price is.
That's nice.
Nice.
And so is that licensed for one cluster or for as many clusters as I want?
So, I mean, let's say I'm licensed for, I don't know, a petabyte.
That would be a Jason thing.
A petabyte of disk storage.
Can I run 10 clusters or is that just one cluster or how does that play out?
That's just one cluster.
Okay, per cluster.
Per cluster.
I got you.
But you could buy ten 100-terabyte licenses
to spread it across your clusters.
Yeah, yeah, exactly, exactly.
And the replication, the data mover or the event-driven mirroring,
is that something added to the solution
that you have to purchase or is that available
with the base price?
That's available with the base price.
So we just have, I would say, a very transparent price model,
unlike a lot of other storage companies.
We have the free edition or what we call the infrastructure edition, and the infrastructure edition comes with all features.
Okay. Yeah. Nice. Nice. Nice.
So tell me about this. Recently there was a client, I guess in life sciences, that purchased the solution.
Can you tell me why they ended up with Quobyte?
Yeah, that's HudsonAlpha. They're based in Huntsville, Alabama. They decided to go with
Quobyte because they like the simplicity and also the way they can offer storage as a service to
their internal customers,
because, well, customers might be the wrong word; users.
That's something that we see more frequently now that especially in life
science, but also in academia, um, research institutes and companies
try to consolidate the storage.
So often every group does their own storage and they're trying to bring
it onto a centralized storage platform to have better quality of service and also better cost
in the end. And then you need to do it similar to what you have on the cloud, where you
can offer storage as a service to your users. And this is where our flexibility
with the API is helpful, where we have, for example, multi-tenancy features so that you can isolate
your different tenants, thinly provisioned volumes so that you can do over-
subscription and sell 100 terabytes without reserving 100 terabytes, all of those things.
Right, right, right, right, right, right, right.
Storage itself is very, I mean, it's a very horizontal market space.
Have there been any specific vertical markets that have been really a good, kind of a really
good selling proposition for you?
Yeah, because we focus on scale-out, in the end it's any use case where you have data growth
and the customers are struggling with that. So life science is a pretty good example because,
you know, if you look at genome sequencing or anything that works with images, whether it's
microscopy, cryo-EM, the resolutions are getting higher
and they're just swamped with data.
Resolutions are going up.
Yeah, yeah, yeah.
Yeah.
Media entertainment is similar because same thing,
4K, 6K, 8K.
So resolutions are going up.
Data is very valuable.
So they just, you know,
the amount of data they have to store and process is just growing every year. So that's a good market. Then the whole machine learning
sector is very demanding on the storage as well. And if you think about autonomous
driving, you just have huge amounts of data there, so that's a good market. And then also traditional high-performance computing.
I think COVID definitely brought more attention
to the high-performance computing side.
Yeah.
And also a lot of funding from the governments
for research institutes.
So that has been a phenomenal market for us too.
Right, right, right, right.
So a lot of file systems do well with small files or well with large files. It's kind of hard to do well with both.
Is it because of the way you are doing tiering?
So you mentioned one of the policies was to take the file and put it on flash at first.
And as it grows to, let's say, oh, I don't know,
10, 100 gigabytes, you move it to disk.
Is that a policy option for you?
Yeah.
And in the end, it's exactly the reason
why we can handle both small files
and large files very well.
We use our synchronous replication for small files.
And for large files, we use erasure coding to make them efficient.
And for large files, it's throughput optimized.
And then by combining the two inside the same file, we make that automatic
so that you don't really care whether you have small and large files or mix
in a volume. So
by implementing both mechanisms,
we can deliver high performance for small files, high performance for random IOPS like databases or VMs, and high performance for throughput workloads with the erasure coding.
Yeah, interesting.
So the policy could be time-based, it could be size-based.
I guess it could be change-based and stuff like that. I mean,
I guess that doesn't make any sense. Sorry, don't even go there, Bjorn. Yeah, it could
be file-extension based. Anything you have in the file metadata you can actually
use for the policies: username, file extension, last access time.
All of that stuff.
All of the attributes.
Yeah, yeah, yeah, yeah.
There's something else rattling around here.
I was trying to think what it was.
So you mentioned usability, API.
So Google is all about API and I'll call it DevOps kind of world
where everything is code managed and stuff like that.
Do your customers use your solution like that?
I mean, obviously, because it's API driven, it could be.
It depends on the customer, I would say.
We have customers that definitely use us, you know, infrastructure as code.
We also have Ansible scripts, and it's very easy to set up
Quobyte clusters.
We have customers that spin up
Quobyte clusters on the cloud
temporarily to provide the storage
for a workflow that runs
on the cloud, for example.
Yeah, that was the other question.
Which clouds do you work on?
And are you in quote unquote
marketplaces on those clouds
or is that something else?
We're in the Google Cloud Marketplace.
But we also run on Azure, Amazon, and Oracle Cloud, all of them tested.
Okay.
And as far as the Marketplace perspective, you're in the Google Cloud Marketplace, but not on the other ones I gather?
Yeah, but that will change. Yeah, over time, obviously. God, I think I'm done with
questions. I was going to ask about backups and stuff like that. Do you talk with any
backup solutions? I mean, it's one thing to have erasure coding and synchronous replication,
but if the cluster goes down, the cluster goes down, you want to try to restore it someplace else, right?
Yeah, you should have backups.
We should all have backups.
Pretty clear. So yeah, we work with some backup vendors. It's pretty straightforward because we're
a parallel POSIX file system, so you add extra nodes to run the backup from,
and anything that does backups of POSIX file systems works with us. We're also working with
vendors that have solutions for tiering to tape, for example. And then it's pretty easy: they
often have an S3 interface, and our next release will support that, so you can tier off to them or
copy onto such a solution that then moves it to tape. I mean, external tiering always comes with
footnotes, right? When you tier to some external system, you need to tier it back in. I'm not a big fan
of external tiering; it's often very cumbersome. But customers ask for it,
and that's why we're adding it.
Okay.
You know, I think that's about it, Jason.
Do you have any last questions for Bjorn?
I don't think so.
It's an impressive-looking solution.
Yeah, exactly. Okay, Bjorn,
anything you'd like to say to our listening audience
before we close?
I think, yeah, try the free edition.
That's the best way to actually see for yourself all the things we can do and that storage actually can be easy. Yeah, don't say that too loud, but yeah. Well, this has been great, Bjorn. Thanks
very much for being on our show today. Thank you, Ray. And that's it for now.
Bye, Bjorn.
Bye, Jason.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify,
as this will help get the word out.