Grey Beards on Systems - GreyBeards talk data-aware, scale-out file systems with Peter Godman, Co-founder & CEO, Qumulo
Episode Date: July 13, 2015
In this podcast we discuss Qumulo's data-aware, scale-out file system storage with Peter Godman, Co-founder & CEO of Qumulo. Peter has been involved in scale-out storage for a while now, coming from (EMC) Isilon before starting Qumulo. This time, though, he's adding data-awareness to scale-out storage. Presently, Qumulo is vertically focused on the HPC and media/entertainment market …
Transcript
Hey everybody, Ray Lucchesi here and Howard Marks here.
Welcome to the next episode of Greybeards on Storage monthly podcast,
a show where we get Greybeards storage and system bloggers to talk with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
Welcome to the 22nd episode of Greybeards on Storage, which was recorded on July 6, 2015.
We have with us here today Peter Godman, CEO and founder of Qumulo.
Why don't you tell us a little bit about Qumulo, Peter?
Thanks, Howard and Ray.
Qumulo is an enterprise data storage company that was founded in 2012.
We've been working on our product
for about three and a quarter years.
We launched our company right about three months ago
in March of 2015.
In the very limited amount of time
that we've had product on the market,
we've acquired right around 30 customers.
And the way to quickly understand what Qumulo does is we build storage that scales really, really easily,
but that most importantly is great at telling you about what data you have stored.
Traditional storage tends to focus on just the container and managing the
container. And we do that, but we also focus on helping people understand their data footprint
itself and manage it. So you've been out on the market, you said about three months,
four months. Is that true? Yeah, so we actually started selling our first product, which is our QC24 product, in about August of last year.
But we sold product for quite a while in stealth mode.
So we went out and found people that had the problem that we were trying to solve.
And then with about 15 customers in March of 2015, we finally launched the company and product and announced
the new hardware platform.
That's great.
So I'm a little bit confused by what you mean about information about your data because
I've heard terms like that mean so many different things.
But before we get there,
let's go to just the real basics.
Block, file, scale out, scale up?
Yeah, we build a scale out file storage system.
So we say we build the world's first data-aware scale-out NAS.
Okay. And data awareness means what? Okay. So back to data awareness. Lots of people
want to talk about data in a lot of different ways. And you're right to pause on what that actually means for any particular vendor.
The problem that we're focused on is one where organizations need to store billions now of individual files across petabytes of data.
And rapidly come to the realization that they don't necessarily know what they have anymore, why it's growing, where the performance
goes in terms of the actual data, who's using individual files, and then also grapple with
trying to back up and archive and understand what things need to be archived out of their data sets.
So we're not a big data company. Big data companies are really focused on extracting value out of your individual files.
You can kind of think of us as the big data of your metadata company. So you have metadata
associated with billions and billions of files, and we help folks understand what those files
actually are and what they have. I guess one question I would have is, are you creating additional metadata beyond what I would consider the standard NFS or CIFS-SMB metadata for files?
Today, we're focused on the metadata that already exists inside files. You can imagine us extending that to be any sort of extensible metadata and the ability to
provide analytics on extensible metadata and index, et cetera. So the canonical use case there,
I guess, you know, you imagine the first step here in data awareness is tell me what proportion
of storage in my system is consumed by files that are, you know, three to seven megabytes in size. The next step
might be, you know, tell me about what proportion of storage is consumed by files that are three to
seven megabytes that are 192 kilobit MP3s or something like that. So you start with just the
POSIX metadata. Next step out is things you can infer about the file by looking at it. And then
the step beyond that, I think, would be the individual metadata tags
that applications or end users make about the data.
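As a concrete illustration of that first step, here is what answering a question like "what share of my capacity is in 3-to-7 MB files?" looks like if all you have is POSIX metadata and a brute-force walk. The path and size band below are placeholders; the approach described later in the conversation exists precisely to avoid this kind of full traversal. A rough sketch:

```python
import os

def size_band_share(root: str, low: int = 3 * 1024**2, high: int = 7 * 1024**2) -> float:
    """Fraction of used capacity held in files whose size falls in [low, high).

    Brute-force walk over POSIX metadata only: fine for a demo tree, hopeless
    at billions of files, which is exactly the problem being described.
    """
    band = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.stat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += size
            if low <= size < high:
                band += size
    return band / total if total else 0.0

print(f"{size_band_share('/data') * 100:.1f}% of capacity is in 3-7 MB files")
```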
Interesting. And you said scale out.
So is there a limit to the number of nodes?
I guess that's a critical question here.
I mean, architecturally, there's always a limit of some type, right?
Yeah. So our primary limit is just governed by
what we can assemble to put together and test. So we build an N-way scale-out system. And the
assumption is that we scale linearly. And in practice, our ability to deliver bigger and
bigger systems is gated more than anything else by our ability to assemble more and more hardware.
Practically speaking, we don't see a huge number of scalability limitations for our architecture right now.
That said, our biggest systems in field are right around the 20-node mark right now.
That's not bad.
And it's a shared nothing model, right?
That's right. We build a shared nothing system. So the way to think about it is that all protection inside our scale-out file system lives at a scalable block layer.
So there's a scalable block layer in our system that provides transactional semantics and also data protection.
The file system itself thinks it's talking to pretty much a linear address space, but the file system itself is also a scale-out entity.
So whereas some object systems and some scale-out systems will build protection and transactional semantics into the file layer, we actually build it into the block layer. That gives us better data efficiency, because we can protect files against each other rather
than just against themselves and also higher transactional performance.
Interesting distinction you draw there Peter. Probably have to probe that for a little bit.
But so as far as the hardware is concerned, is there a specific
inter-cluster network that you require or anything like that in InfiniBand or something like that?
Today, our smaller systems use 10 gigabit Ethernet and our larger systems use 40 gigabit Ethernet.
We're committed to open standards at this point.
I've been involved in systems before that used InfiniBand, for example, and using that is pretty convenient.
But at this point, you can get what
you need out of Ethernet. And, you know, one of the themes for our company is that as time moves
forward, vendors of something that, you know, you and I would call storage will have less and less
control over our hardware environment. So in the future, storage systems will run
in other people's infrastructure as a service clouds.
In the future, our customers will want more and more
to use white box hardware
that they're procuring on their own.
And so we're focused on using all open hardware components.
So Ethernet for interconnectivity.
And rather than using a proprietary NVRAM,
we use SSD to take all writes in the system and to handle commitment of those writes.
Okay so the SSD is for a write buffer only or is it used for reads as well?
Yeah so all data coming into the system lands in SSD
and data may be read from SSD as well.
And as the SSD begins to get full,
data is selected to migrate down to spinning disk
as a cold tier.
So storage tiering to a large extent.
Yeah.
I've dealt with this problem for a long time.
And the traditional solution, scale aside,
has been to run some third-party product
that walks the file system
and copies the metadata into a database
so that I can query it.
Have you guys made the actual file system metadata that kind of queryable, indexable database?
Or are there two data structures there?
And how do you manage that?
Yeah, great question.
We, like you, observed this when we started the company. Just as an aside, we conducted a great deal
of research starting the company; we ended up talking to about 600 administrators of data
storage. And one of the things that we found, like you, is that over and over again you see this pattern
of things that walk the file system, extract metadata, and put it into a separate database. And what we heard when we talked to
people doing that is it provides a solution to some problems. If you don't need up to the minute
information or even up to the hour or day information, you can get a kind of view of
what's going on in your storage. But in creating that view, you had to have a separate database
to scale and manage a piece of software
that walks through files in a file system
and then pushes it out to this separate database.
And in return for that,
you get this kind of out-of-date information.
If that walker is single-threaded,
it might take, say, 50 days to index a billion files.
And if it's multi-threaded, then you consume a lot of IOPS
looking at the same files over and over and over again,
making sure that they haven't changed.
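Working backwards from Peter's numbers (a quick sanity check, not anything Qumulo published): a billion files in 50 days implies the single-threaded walker averages only a couple of hundred metadata operations per second, which is plausible for one thread hammering inodes on spinning disk.

```python
# Back-of-the-envelope: what rate does "a billion files in 50 days" imply?
files = 1_000_000_000
days = 50
ops_per_second = files / (days * 24 * 3600)
print(f"{ops_per_second:.0f} metadata ops/sec")  # ~231 -- a single-threaded walker's pace
```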
So big problem.
People are really unhappy with this situation.
And the existing file systems bog down.
Yeah, that's right.
The file systems are just not designed for that. You know, even with a backup job going, I want to check the archive bit on these four billion files.
It takes forever.
That's right.
That's right.
So, exactly right. So creation of a backup catalog, for example, starts to take forever and it results in large systems that are almost impossible to back up.
Answering any kind of complicated question becomes really difficult.
And then last, the caches of storage systems are not there to cater to the needs of things that want to repeatedly read all metadata out of the system.
And yet in a traditional storage environment, that's what
ends up happening. All of your cache is completely filled with inodes associated with periodic walks
of the entire file system tree. So that's a really bad situation. And we believe that storage should
be rather more intelligent about that and should be able to answer complicated questions about data
footprint virtually instantaneously.
So what we decided to do, you asked the question about is it built in or is it external.
So our database is completely built into the file system tree itself.
And what we do is we basically build a hierarchical database into the file system such that you can answer really quickly
complicated questions about resource usage and also really rapidly identify,
for example, all of the files in the system that have changed in the last 24 hours.
So functionality was recently added
in the last few weeks
that makes that query, for example,
very, very fast on our system.
And what you can do inside any directory inside...
Kind of the logical equivalent
of being able to put an index on a metadata attribute.
Yeah, that's right.
So one of the challenges
with traditional
relational database approach is it gets more and more expensive the more indexes you add.
Keeping track of all these different data structures is really expensive. So what we
came up with is a way that individual attributes of things can have functions applied to them,
and those can be aggregated inside the file system tree. So, for example,
one of the attributes that we aggregate is the maximum of the change time inside a directory structure. So, you can look at any directory in the system and say, tell me about the most
recent change time that exists somewhere in this tree. And if you, for example, want to say, I just want to know about these changes.
For all the MP4s, I can very quickly find out
if there are any at any node in the tree
without having to enumerate its members.
That's the idea.
So, you know, you can imagine recursively
starting at the root directory saying,
hey, does this thing, does this directory have anything that's been changed in the last day?
No? Okay, I don't need to deal with it then.
And you do that recursively in your search for files that match individual criteria.
So that's how indexing works inside Qumulo Core.
And we did that, as you say, in response to people's pain associated with building these external databases of metadata.
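To make the idea concrete, here is a toy sketch of the aggregation-plus-pruning pattern Peter describes; it is my own illustration, not Qumulo's implementation. Each directory carries an aggregate (here, the maximum change time of everything beneath it), so a query for recently changed files can skip whole subtrees without enumerating them.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    ctime: float                 # change time of this file or directory itself
    children: list = field(default_factory=list)
    max_ctime: float = 0.0       # aggregate: newest change time anywhere below

def update_aggregates(node: Node) -> float:
    """Recompute the max-change-time aggregate bottom-up (a real system would
    maintain it incrementally on every write instead)."""
    node.max_ctime = max([node.ctime] + [update_aggregates(c) for c in node.children])
    return node.max_ctime

def changed_since(node: Node, cutoff: float, path: str = ""):
    """Yield paths changed at or after cutoff, pruning subtrees whose aggregate is too old."""
    if node.max_ctime < cutoff:
        return                   # nothing newer anywhere below: skip the whole subtree
    full = f"{path}/{node.name}"
    if node.ctime >= cutoff:
        yield full
    for child in node.children:
        yield from changed_since(child, cutoff, full)

# Toy usage: only the subtree containing recent changes ever gets walked.
root = Node("media", ctime=1.0, children=[
    Node("renders", ctime=5.0, children=[Node("shot42.mp4", ctime=9.0)]),
    Node("archive", ctime=2.0),
])
update_aggregates(root)
print(list(changed_since(root, cutoff=8.0)))   # ['/media/renders/shot42.mp4']
```

The same trick generalizes to other aggregates, such as total bytes per subtree or counts by file type, which is what makes the "tell me about all the MP4s that changed today" style of query cheap.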
It's almost like you built your own internal database
for the metadata. It's not really a relational database
per se, but you've optimized the information. It's an older
model. Yeah, absolutely. ISAM kind of thing.
Absolutely right. More CODASYL-like.
Yeah, I suppose, yeah.
Okay. Yeah, so
the thing is with
relational model is it's wonderful to be able to
ask really, really complicated queries
about very large data sets.
But, you know,
when you build a file system, it has to perform
like a file system. We sell a scalable file system product
and so updating 10 indexes
for every inode write we do
just wouldn't work
and so we had to find a way to build
a high performance database that could
answer most of the questions people had
and still have a file system
My problem with
things like ILM
over the years
has in no small part been
just that the metadata available
in POSIX
is so meager
that it makes it difficult
to make decisions.
Do you have
any extensibility to this?
So today, we don't have an implementation of extensible metadata in our file system.
You can expect that to change relatively quickly.
What we do have is the ability to, as you say, first leverage POSIX metadata and then next derive it automatically from the files themselves.
So lots and lots of file types have headers where you can read the information about the file.
For example, find out if it's an MP3 or whatever kind of file it is.
When you start getting to JPEGs, there's a lot of data there.
Yeah, right.
There's actually a lot of data inside systems, and most of it never gets consulted.
And it's going to be a long road for us to pull all of that data out of all of those different file types. It's one of the things that makes a vertically focused go-to-market strategy really appropriate for this sort of product because you need to understand 10, 20, 50 file formats at a time, rather than every file format the world has ever,
you know, has ever come up with, which would be impossible. So, yeah, in answer to your question,
as I say, I see this as a three-stage journey. The first is just POSIX. The second is,
you know, POSIX plus whatever's in the file. And then the third stage will be
applications and possibly individuals tagging extended attributes on data.
And of those, you know, like you, I've seen for a long time that getting humans to do any manual tagging of metadata on files is very painful and difficult.
I blame Bill Gates. You know, I worked for a while on a system at Real Networks to do with having all content creators just tag metadata on their created content.
And that project went on for a very long time.
And I think the folks who were running it came to terms with the fact that getting humans to do things like that would always be very, very difficult.
The funny part is there's one class of humans it's not that difficult for,
and that's attorneys.
Yeah, librarians maybe or something like that.
Because if you go to a law firm, law firms don't open Microsoft Word or WordPerfect,
and amazingly enough, a lot of law firms still run WordPerfect,
but a lawyer doesn't sit down to write a pleading
by opening the word processor.
He opens the document management system,
and he says, I want to create for this client,
for this matter, this kind of document,
and then it creates all the metadata.
And he can't hit save until he enters all the rest of the metadata.
And if Microsoft Office in 1995 had had an easy way for us to say,
don't let anybody leave without filling out these four fields,
life would be much easier today.
For these sorts of things, yes, yes, yes.
So you mentioned a vertically focused
go-to-market,
so I assume you're vertically focused,
and what markets are you going after?
Yeah, today the market
that we've enjoyed the most traction in
has been media,
and we're also targeting life sciences and oil and gas also.
And as I say, probably slightly more than two.
Sorry, go ahead.
No surprises there.
No surprises there.
Yeah.
Yeah, so the way I look at those markets is, you know,
one way is to say this is commercial high-performance computing.
See folks with really high-performance requirements that are using computers to analyze or create content.
But the other side of that, I think, that unifies all of these fields is it's humans and computers working together on analyzing and
creation. One thing that we've observed is when it's just an application that is the only data
accessor, there you tend to see a lot more adoption of object storage. And also that application is
going to track metadata associated with its stored assets on its own.
When there are humans on the other side of it, though, humans are great at creating things, and they're also great at creating waste product along the way that goes unindexed and not understood.
And so these verticals have this in common.
It tends to be places where there are people that are directing the creation and analysis of large amounts of data.
I'll give you a random example.
I was talking with a large biotech company recently that commented they had had a researcher visiting from Europe.
And he had stayed for about 90 days.
And sometime three months after his departure, they were sort of looking at the storage footprint. The storage
footprint was about 30 petabytes total. And they said, you know, something's just not adding up
here. It's like someone came along and just sort of inflated the balloon a while ago. And it turns
out that this researcher had, during his tenure at the institution, created about a petabyte of data
and left it lying around all over this 30 petabyte file store.
And this organization had no way of enumerating all of the data that belonged to this person
or no way of understanding the fact that it was this person that had created this rapid growth and footprint.
And what we see is that when folks do have billions of files, you see this problem over and over and over again,
particularly when there are humans involved because the humans are making decisions.
They're creating temporary artifacts.
And humans will often work with the belief that the underlying resources are free, storage in particular.
Well, then it doesn't come out of my budget.
Yeah. I have the same problem with my own desktop.
Yeah, yeah.
It's free, right?
I never got a bill for it.
Yeah, well.
So to solve that problem,
I would want more than POSIX metadata holds for me.
Well, doesn't POSIX indicate the
creator of the...
There's owner. Owner, yeah.
It's not quite the same.
Yeah.
And I kind of want historical.
You know, who wrote this
file last?
Not to mention the whole security auditing
part. Yeah.
So, you know, this particular example is a good example of one where just POSIX is sufficient.
If you track through time utilization by owner or something,
and you just use graph how that's developing over time,
you can see this pretty easily just based on POSIX.
However, I completely agree with you that sometimes, you know, when all
you're seeing is sort of the most recent owner and the user is different from the owner, then
that's not sufficient. And what you actually want to see is information about the individual
operations that have happened against the file. And I, you know, I expect that that's where audit
comes in. That, you know, audit, usually the expectation is that is a system to preserve
individual operations that have occurred against a file. Security in these spaces is obviously a
huge concern as well. You know, right back to humans working on data, when it's a single
application working on data, you don't normally have enormous security concerns. It's either
secure or it's not, and it tends to be perimeter security. When you need to have a thousand people
working together on a piece of sensitive data, things get a lot more complicated, and that's
where audit comes into play, as you say. So you have audit in the system today, or is that something
that's one of these future... It's a future thing. We don't have it in the system today. And you mentioned security.
Do you encrypt the data in the file system?
Today, we don't have encryption in the file system.
You're not even logging audit data right now.
No, we're not logging audit data right now.
Because, well, that's too bad.
Yeah.
I think that, you know...
Not that I want logs,
because that's the real problem with auditing, you know.
Even as humble a file server as Windows
has had all of that auditing available for decades,
but as soon as you turn it on,
you end up with megabytes of cryptic log files.
And then you need to do.
Splunk or whatever.
Or you need the big data of your metadata.
A couple of points about this.
We as a company, Qumulo, try to be very transparent about what we do and what we don't do at any given point in time.
We operate in a space where mature products have hundreds of years of engineering investment in them, and it's pretty much inescapable: you can't build a full-featured, scalable storage product that handles file protocols today without an enormous
investment of human capital, basically. And so prioritization is a constant part of our existence.
So one thing that's kind of notable and interesting about the way we deliver
our technology is in response to this particular problem. So we actually ship new versions of our storage software every two weeks.
What?
Yeah, every two weeks.
Wait, wait, no.
It used to take us longer than two weeks just to validate a new version of a system of storage.
Right.
So, yeah, so I'll tell you about how we do that. Almost every one of our customers updates to almost every build that we put out,
every two-weekly build.
So that sounds crazy, right?
It sounds like, well, that's just a recipe for total mayhem.
So it's actually not.
And it helps to understand how traditional storage systems have been qualified
and certified.
So the traditional way and the way that I've been exposed to in the past is that software developers write software.
And occasionally we'll write a test here and there.
And then the whole bag gets thrown over to a QA organization whose job it is to verify that it works, for some definition of verify that it works.
And so what you have is a huge disconnect between the people who are producing the artifact and the people who are certifying it, right?
And the people who certify end up doing really more tests than anything else.
So we looked at that and looked at what people were doing in the SaaS world with rolling out software continuously
and kind of looked at how are people accomplishing that
without periodically having their website fall apart?
And the answer for us ended up being,
we have a team of software engineers
that write almost all of the test coverage for their own code.
So we have a really maniacal focus on testing
every single aspect of a piece of code
as it goes into the code base.
So we do certification.
For us, it takes nine or ten days.
But we don't expect any significant issues
to show up in that process,
and rarely do we see any significant issues show up.
And the reason is that by the time,
well, let's say 12 hours after a developer checks a piece of code into our system, it's already
undergone a million tests or thereabouts, and many tens of thousands of unique tests of different
facets of our software. So we're building things in a new way, and it's possible that other storage
companies have done this in the past, and I think it's probable that most have never done things this way, and it actually works out really well.
So we measure the number of days it takes for us to respond to a customer request for a new feature or a piece of, let's say, relatively minor functionality, and we measure that today as about 10 days, 10 business days from request to delivery.
So we turn things around really, really fast.
And we do that having a pristine track record of no outage and no data loss across now 18 months of having systems in deployment.
That's amazing.
It's almost agile development applied to storage.
It's absolutely agile development applied to storage.
It's wonderful and scary all at the same time.
Well, you know, I have to be honest with you.
Remember, we're not gray beards for nothing.
And if I could grow facial hair, I think it would be gray.
So I'm with you.
You know, I too was skeptical about this.
Honestly, Neil's my co-founder.
He's VP of engineering.
He said, no, Pete, look, we've got to do it this way because this is how things are going to be done.
And these are all the great things.
And I was honestly quite skeptical, but it's worked out really well.
Yeah, I would say, you know, in the old days a six month validation activity was
relatively good.
Nine to ten days
seems...
The world is moving faster than it used to,
Ray. Wow, yeah, I
understand that. Although,
this method
and turning
around new ASICs
in ten days would become a challenge, to say the least.
Yeah, I don't know if you could take all of this and apply it to hardware.
It's worked pretty well in the software domain.
But that all fits in with trying to keep anything proprietary out of hardware.
We build systems where, as I say, there's no custom NVRAM.
There's no custom fabric. It's Ethernet and it's SSD, and that lets us move really fast on software.
So what's the go-to-market model, software only?
Yeah, so even though we build a technology where pretty much all of the IP is in software,
most of the systems that we sell
are with bundled hardware. So we have two lines of appliances, the QC24 and the QC208. The number
just signifies the amount of raw capacity in a system. And almost everyone buys those appliances.
So one of the things that we observed along the way, talking to lots of folks that buy data storage, is that there is huge demand for software-only storage at the very, very high end of the market.
When you talk to the largest investment banks, when you talk to hyperscale internet companies,
people want software, but everyone else needs to buy appliances because no one can afford to qualify
a piece of hardware against our software because it's expensive. It takes a long
time to make sure this piece of hardware isn't going to lie and destroy your data along the way.
Yeah, there's another pocket of demand all the way at the bottom.
It's software only lives at both ends. Yeah, it's the extreme side of things.
I agree. We see it at the bottom as well. But in the middle, people
want to buy...
The funny thing is, frequently
people want to do POCs on the hardware
that's already in their data center, but then want to
go into production on your
appliances. Yeah.
So we do a lot of
proofs of concepts and
evaluations using
virtual machines. It's really handy to have something software-only because people get to look at it in virtual machine context.
But when things actually roll into production, they're mainly on appliances.
We do have one customer today that is software only.
So we're quite willing to do it for the right opportunity. But you over and over again run into this problem of how are you going to qualify that this piece of hardware isn't going to let you down.
And you need to have enough scale to make that worth doing.
Yeah.
Yeah, yeah.
Makes sense.
So speaking of performance, do you guys have any benchmark types of numbers for your systems? I think we publish benchmarks out to all
of our customers every two weeks. We don't presently talk about performance numbers publicly.
Okay. But the architecture of the system sounds like it's set up to provide good performance with both large and small IOs.
Yeah, that's right.
The system is designed to deliver good small IO performance.
Our scale-out block store is based on a 4K block size, for example.
And simultaneously, one of the expectations people have of a scale-out file offering is that it delivers very high throughput,
and we deliver high throughput as well.
Right.
Well, I mean, especially in some of the markets you're in, like media,
where I'm going to suck a multi-megabyte file up and render it for a while
and then push it back down.
Yep, that's right.
So how's the data laid out
on the back end? So I mean
each node
controls
a certain
hash space for the
scale out file
system or is it
This is the point where I'm going to start
practicing the law without a license I think.
I'll talk a little bit about it.
And then Neil or Aaron will run into my office at the end of this and say, you got that totally wrong, Pete.
So basically the block store in the system is a segmented, distributed block store where five-gigabyte contiguous address spaces are laid out between individual nodes.
So the file system itself says, okay, these next blocks are going to go into,
we call these things P-stores, these five gigabyte contiguous regions.
So it'll say, I'm going to put this into P-store 11,
and I'm going to write it offset 102 or something like that.
So the file system knows about that segmented P store space,
but it doesn't know anything about how those P stores are actually laid out
and distributed between individual nodes in the system.
So it's not a pure linear address space the file system sees.
It's segmented, but it nets out to be the same thing.
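A minimal sketch of that segmented address space as I understand the description (the numbers and placement map below are invented for illustration): the file system addresses blocks as a (P-store, offset) pair, and a separate layer decides which nodes hold the copies of each five-gigabyte P-store.

```python
PSTORE_BYTES = 5 * 1024**3   # each P-store is a 5 GiB contiguous region

# Hypothetical placement map: P-store id -> the nodes holding its copies.
pstore_placement = {
    10: ("node-1", "node-3"),
    11: ("node-2", "node-4"),
}

def resolve(pstore_id: int, offset: int):
    """Translate a file-system-level (P-store, offset) address into per-node targets.

    The file system only reasons about this segmented space; it never sees which
    physical node or drive actually stores the blocks, or how they are protected.
    """
    assert 0 <= offset < PSTORE_BYTES, "offset must fall inside the P-store"
    return [(node, pstore_id, offset) for node in pstore_placement[pstore_id]]

# e.g. "write this block into P-store 11 at offset 102"
print(resolve(11, 102))
```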
And the protection level is like RAID 6 kinds of things, or is it RAID 1?
So yeah, today, each of the P-Stores is a mirrored P-Store. So it's a two-way mirrored P-Store.
So a couple of notes about why we do this. We're actually working on erasure coding right now,
but I can't tell you yet when it will come out. It will be in the not very distant future.
So today, we do 2x mirroring. So,
you know, the key to reliability in a mirrored system is having incredibly fast rebuild times.
They have to be very, very fast. And one of the reasons why we did all protection at the block
layer instead of the file layer is it's the only way that you can make hard disks operate entirely sequentially while you're rebuilding data.
So in Qumulo Core, as the system re-protects after a drive failure, all drives in the entire system
participate in reading data that was left exposed by the failure of that component,
and they all operate entirely sequentially.
And what that means is on a minimum-sized QC208,
our 208 terabytes per node system,
it takes about, as I understand it,
about an hour and 20 minutes
to re-protect away from a failed 8-terabyte drive.
So about an hour and 20 minutes
for a failed eight-terabyte drive. And then that amount of time halves every time you double the
system size. So an eight node system should be about 40 minutes and a 16 node system should be
about 20 minutes. And that scales down. And that's regardless of whether you have small files or
large files or a transactional access pattern or a sequential access pattern.
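Taking those figures at face value, the scaling is simple: rebuild time is inversely proportional to the number of nodes reading sequentially. A quick sketch, assuming the roughly 80-minute figure applies to a minimum-sized (four-node) cluster, as the conversation implies:

```python
def rebuild_minutes(nodes: int, baseline_nodes: int = 4, baseline_minutes: float = 80.0) -> float:
    """Estimated re-protect time for one failed 8 TB drive, assuming the time
    halves each time the node count doubles (i.e. scales as 1/nodes)."""
    return baseline_minutes * baseline_nodes / nodes

for n in (4, 8, 16):
    print(f"{n:2d} nodes: ~{rebuild_minutes(n):.0f} minutes")   # ~80, ~40, ~20
```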
But a node failure that takes multiple drives offline
is also going to take linearly longer.
Yeah, so node failure, we deal with those entirely through just node replacement.
So we just bring in another chassis, pull out the drives, and put them in the new chassis.
We don't have problems with having a custom NVRAM or something.
It's just SSDs and disks.
So if you put them in a different chassis, then you're back to normal.
Right.
So you really don't have to rebuild.
You just kind of do a mind meld between the old down node and the new node that you just brought in.
You just move the disks and the SSDs,
and then you're up and running, I guess.
Yep.
And you provide a spares kit
that a customer can buy to be able to do that?
Yeah, that's right.
So, well, we have, yeah, so we'll always sell it.
I don't want to wait the four hours
for something I can do myself.
Oh, right.
It's really one of the things that annoys me most
is when I walk into data centers and they go,
yeah, and we have 172 of these.
And when something goes wrong, we call
and they bring us one to replace it.
Right, right.
That's like, you have 172?
Why don't you have 174?
So, you know, the other side of that is
you don't necessarily want that. You know, we could deliver extra chassis to everyone with a bunch of blank drives.
But you'd have to spare out enough capacity that that's a lot of capacity that's just sort of sitting still for a redundant system.
So the way we deal with it is if a node fails, the system stays up and we, you know, we bring in another chassis and then just transfer the disks over, and then you're done.
Okay.
So I need to hit the performance thing one more time, Peter.
Yeah, I've been following spec SFS since I was a wee child.
And I'm looking at it.
Every once in a while, we see a scale-out system out there,
and it performs real well, does lots of high numbers of NFS ops or CIFS ops
and things of that nature.
But for the recent version, SpecSFS 2014, there have been zero submissions.
And I was wondering if you're aware of what the problem is
with respect to that
and number one
and number two
if you are planning
to release a spec SFS submission.
Yeah, so we will.
We're looking forward
to doing so.
We are kind of
in an interesting gap
again on spec.
I suppose you wouldn't know this but I was involved in doing a lot of the spec work at
Isilon a long time ago. I was one of the people working on performance
of that system. And I don't know what's up
with the new one. I know that it's application-centric. I don't know if there are compatibility
issues for people running presumably extracted parts of application backends against storage.
I wish I knew the answer to the question, but I don't know why people aren't publishing right now.
In my last performance analysis, I kind of went into this. I think it's not as
bad as I suspect. I mean, when SpecSFS,
oh, I don't know, the prior one, SFS97_R1 or something like that,
changed over to the next one, SFS 2008.
There was probably a quarter delay for a couple of items,
but it wasn't really a big change.
This one is a sizable change.
They went to applications.
They went to an operating system file stack.
It's a major change from my perspective.
It seems the limited amount I've read about it makes it seem like a huge change.
As you know, the problem with all benchmarks is that vendors have a really great incentive to go game them.
Any way they can figure out how to do that, you know, by making replicas of, you know, whole data sets or figuring out what, you know, what particular pattern of bytes gets written into a file and a benchmark or whatever silly stuff.
Peter, when I was at PC Magazine,
one manufacturer of video chips customized them to run our benchmarks faster.
Video chips?
Yeah.
This VGA chip would identify the data pattern that we wrote as part of our benchmark and would read the screen out of a buffer rather than render it.
I don't want to hear this, Howard.
So, you know, I've been on the other side of that and I've seen sort of all the incentives that creates to optimize towards a particular benchmark.
I actually really like spec as a benchmark because traditionally it hits a lot of different operation types and is focused on stuff that matters.
Even though everyone reports IOPS on spec, it does report latency, and latency really does matter. So I like all that about that lineage, but at the same time,
we've, you know, we've only cursorily run spec in the life of our company and we primarily focus
our efforts and performance to, you know, are our customers satisfied with the performance that they
get and optimizing only for what they're trying to do because that seems like a healthier short-term direction for the company.
I think longer term, you can expect us to say,
oh, yeah, but here's the deal.
As you move from three verticals to 20,
the more general case applications
become more significant.
There is a video version of the SpecSFS.
So one of the four applications is a video,
almost a media server kind of thing.
So you might take a look at that.
All right.
You know, we've kind of hit the 40-minute mark here.
I got one more thing I want to talk about, though, Ray.
All right, Howard, go for it.
Because the problem comes now, I've got this ginormous scale-out file system.
How do I back it up?
Yeah, how do you back it up?
Because we've already talked about how walking the file system to find out who's got the archive bit set isn't practical when I've got millions and millions of files.
And your index will speed that up, but only to the point where NetBackup can handle it.
Well, yeah, so there are a few things that go wrong with backup.
The first is identifying rapidly all of the files
that have changed in a certain time period.
We solved that problem.
I mean, you can get an answer to that question
that takes an amount of time in proportion to the number of changed files.
And you don't have to return the whole thing at once. We can even stream that information back,
right? So I actually provide a stream of things that have changed. So we feel like the technology
we've built is going to solve that particular problem. And the second problem, a lot of backup
systems... But you have to do that in a way that doesn't require me to rewrite
my backup app. Right. Yes. My next point was about backup applications. So, you know, so backup
applications need to understand that. And then second, they need to be able to parallelize very
large data sets. If Qumulo Core comes back and says, oh, this is the first time you're backing this up, and there are 1.6
billion files in here. Most backup applications are going to choke, especially if they're running
single threaded. Now, if they have the ability to parallelize their operations, the situation looks
better. But usually they're going to either need to look at all data in the system to figure out how to parallelize,
or they need some help in some way.
So the other thing that we're doing through our analytics is allowing very rapid partitioning of very large trees.
You say, break down this tree of 1.7 billion files and 3 petabytes of data
into 100 equal-sized chunks, right?
And because of the hierarchical database functionality of our system,
we can do that near instantaneously,
actually carve a huge file system into many equal-sized chunks ready for parallelization.
Are you actually changing the directory structure, or are you doing this virtualized?
No, we actually can just respond with a list of inclusions and exclusions that define that slice of a file system tree.
So if you imagine how you describe inclusions and exclusions to rsync, you can imagine that Qumulo can say these are the inclusions and exclusions that define this particular slice of this tree.
So that's what we're doing on that front is we make it so that we can actually help the application parallelize.
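Here is a rough sketch of what that partitioning could look like; it is entirely my own construction, not Qumulo's code. Given per-directory size aggregates (which the hierarchical database already maintains), you can greedily cut the tree into roughly equal-sized slices, each expressed as a list of paths that plays the role of the rsync-style include/exclude set handed to a backup worker.

```python
from dataclasses import dataclass, field

@dataclass
class Dir:
    path: str
    own_bytes: int                        # bytes of files directly inside this directory
    children: list = field(default_factory=list)

    def total(self) -> int:
        # In the real system this aggregate is maintained in the tree; here we recompute it.
        return self.own_bytes + sum(c.total() for c in self.children)

def partition(root: Dir, chunks: int):
    """Greedily split the tree into roughly `chunks` include-lists of similar size."""
    target = root.total() / chunks
    slices, current, current_bytes = [], [], 0

    def visit(d: Dir):
        nonlocal current, current_bytes
        if d.total() <= target - current_bytes or not d.children:
            # The whole subtree fits in the current slice (or cannot be split further).
            current.append(d.path)
            current_bytes += d.total()
            if current_bytes >= target:
                slices.append(current)
                current, current_bytes = [], 0
        else:
            # Subtree too large: claim its direct files, then split its children across slices.
            current.append(d.path + "/*")
            current_bytes += d.own_bytes
            for c in d.children:
                visit(c)

    visit(root)
    if current:
        slices.append(current)
    return slices
```

Because the output is just lists of paths, nothing about the directory structure changes; each backup worker simply gets a different slice to walk in parallel.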
Now, as I said, we're a very transparent company.
Our integrations with backup software, they're not there yet,
but we're actively working on integrating with applications
to make it so that this thing is the first very large file system
that is actually easy to back up.
Okay.
You mentioned parallelization. Do you support like NFS v4?
No. So that's a whole separate topic in terms of parallel access through clients.
No, we don't do NFS v4. We will do as demand dictates.
I think we're still kind of in chicken-and-egg territory on NFS v4, or pNFS particularly,
in that client support is still a bit problematic and client compatibility is still problematic,
and then vendor support is also patchy, and we're part of that part of the problem.
But it's tough to prioritize that when there's only so much actual demand. Yeah, I guess the other question on
version levels with the SMB3, are you guys up to that level yet? Yeah, we do SMB2.1 right now,
and again, it's demand-based. I think the big driver for SMB3 will be completely transparent failover semantics
when nodes come and go in your scale-out storage system.
That's going to be one of the earliest drivers for us.
Rather than the performance gains,
which not as many people ask for,
it'll be completely undetectable failure.
You know, no dialogue box failure.
Yeah.
I guess I have one further question.
Snapshot support?
Not today.
Not today.
High on the priority list.
I understand.
No, I think I've got it.
It's, you know, like a lot of the scale-out systems,
you know, they're hitting that set of verticals where the requirements are pretty well definable.
I like the idea.
I like the analytics.
And, of course, I always want more.
Yeah, don't we all?
We wouldn't be greybeards if we didn't.
No.
So, Peter, is there anything else you'd like to say?
You're with
me on that too. I always want more; I'm never, never satisfied. No, I mean, you know, to your point
about wanting more, stay in touch. These are vectors, not points, you know, so things
change fast with us. So I'll look forward to keeping you and hopefully your audience up to
date.
All right, good. Well, this has been great. It's been a pleasure to have Peter with us here on our podcast.
Next month, we'll talk to another startup storage technology person.
Any questions you want to ask, please let us know.
That's it for now.
Bye, Howard.
Bye, Ray.
Until next time.
Thanks again, Peter.
Thank you so much, Howard and Ray.
All right.
Thank you.
Until next time.