Grey Beards on Systems - GreyBeards talk scale-out storage with Coho Data CTO/Co-founder, Andy Warfield
Episode Date: February 3, 2014. Welcome to our fifth episode. We return now to an in-depth technical discussion of leading edge storage systems, this time with Andrew Warfield, CTO and Co-founder of Coho Data. Coho Data supplies VMware scale-out storage solutions with PCIe SSDs and disk storage using the NFS protocol. Howard and I talked with Andy and Coho Data at …
Transcript
Hey everybody, Ray Lucchesi here and Howard Marks here. Welcome to the Greybeards on Storage monthly podcast, a show where we get greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming projects, technologies, and trends affecting the data center today.
Welcome to the fifth episode of Greybeards on Storage.
It was recorded on January 20, 2014.
We have here with us today Andy Warfield, CTO and co-founder of Coho Data.
Why don't you tell us a little bit about yourself and your company, Andy?
Sure. Thanks, Ray. Well, the company has been around for just over two years at this point.
We kind of come from a mixed storage and virtualization background.
I was one of the guys that wrote the Xen hypervisor when I was a grad student in the UK. And then I was involved in XenSource with both of my co-founders, Keir Fraser and Ramana Jonnala.
We're building a high-performance scale-out storage system that incorporates a bunch of
pretty interesting technologies, not the least of which are PCIe Flash and software-defined networking.
And so the premise of the company is basically to build a scale-out storage service like
you might see in the larger cloud providers, but package it up in a way that you can deploy
it inside an enterprise data center.
That's very interesting.
I guess the first question I had about this is the software-defined networking layer.
Not a lot of storage companies out there package their systems with software-defined networking.
Can you explain why you thought that was necessary in your environment?
Yeah, yeah, that's been around for a while.
Flash was being packaged as a SAS or SATA device, and it was basically a disk that didn't have the same kind of seek penalty. And as Flash has started to move on to faster buses, PCIe first and more recently on to
DIMM slots, the bandwidth that's available into the device is significantly faster than
we've seen from disks in the past.
And one of the early realizations that we made working with some loaner gear from Fusion IO was that a single device, even under a pretty aggressive random workload, was capable of saturating an entire 10 gig port.
Which meant that you were going to have to push a bunch of the logic in the storage system out to incorporate the network.
And that's where we really started looking at SDN. So a random workload and a single device is able to saturate a 10 gigabit
Ethernet NIC?
Is that what you said?
A NIC with dual ports or a single port and all that, you know, jumbo frames
and all that junk.
Howard, jump in any time you want with the networking stuff.
Well, a single port and jumbo frames really doesn't help all that much.
My experience with iSCSI has been jumbo frames get you somewhere like 3-5% better performance.
But remember, Ray, we're talking about Flash, so the fact that it's random doesn't slow it down.
So can one PCIe SSD or Fusion-io card deliver more than 10 gigabits per second of data?
Yep, sure can.
If it's the right data, right? That's the other part of this problem.
Yeah, yeah, absolutely. Well, so the interesting thing at that point, right, at 10 gigabits per second, or around a gigabyte a second, you're already faster than an entire 600 megabyte-per-second SAS or SATA bus.
You're dealing with storage performance that exceeds a fully populated SAS or SATA bus pumping, you know, at full speed.
And so, you know, it really begged a reconsideration of the software and of the storage architecture
in that, you know, the way that we've always built storage has really been around assuming
disks were the slowest things.
And what we saw with the Flash was the device was fast enough to saturate the NIC,
and saturating that NIC actually consumed a fair bit of CPU.
And so the thing that we're faced with in going to multiple devices,
which you still need to do because you still want redundancy,
you still want to be able to scale up capacity,
is that a bunch of that coordination and aggregation of the devices
needs to move out to incorporate the network.
And so we use SDN a few different ways around that.
Yeah, I found it fascinating over the past few months
to see how different vendors are coming out with different scale-out systems
and the whole how you get requests to the node that has the data
varies so substantially across the models.
Absolutely.
It's something that's actually really evolved a lot, right?
So the scale-out and data distribution problem predates Flash, obviously.
With a lot of object storage systems where you're just trying to scale a capacity on lots and lots of disks,
you would use things like consistent hashing to try and throw the data out at all those disks.
Yeah.
Right. And the latency of stopping at an ingest node and figuring out where you had to go next was acceptable because it was a disk on the back end. But now that it's Flash on the back end, adding a millisecond of latency on a reference starts to add up.
Yes.
Yes, absolutely.
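(For readers following along: a minimal sketch of the consistent hashing idea mentioned above. The class and node names are illustrative assumptions, not Coho's code; each node gets several points on a hash ring, and an object lands on the first node point at or after its own hash.)

import hashlib
from bisect import bisect, insort

class ConsistentHashRing:
    # Toy consistent-hash ring: an object maps to the first node point at or
    # after the object's hash, wrapping around the end of the ring.
    def __init__(self, nodes, vnodes=64):
        self._points = []                 # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):       # virtual nodes smooth the spread
                insort(self._points, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, object_key):
        idx = bisect(self._points, (self._hash(object_key),)) % len(self._points)
        return self._points[idx][1]

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("vmdk-0042/chunk-17"))   # e.g. 'node-2'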
So are you saying you're doing the hashing in the OpenFlow layer?
Is that to decide where to go?
No, we're not doing the hashing at that layer.
There's opportunities there.
It's actually, there's a bunch of kind of interesting stuff
that we can talk about there.
We've learned an awful lot about the merchant silicon that these switches are shipping with
and the capabilities of OpenFlow on this stuff.
So there are kind of two layers that it's interesting to think about the protocol at
or the work that we're doing at.
Well, so maybe it's useful to take a step back
and just talk about the hardware architecture.
Sure, sure.
Yeah, yeah.
So we're a scale-out system, as mentioned.
The building blocks of the system
are balanced combinations of flash, CPU, and network, right?
So we go and buy the best sort of performance price point
from what we can get in terms of commodity hardware
and all of our stuff is software that sits on top of it.
So our GA product is a 2U, two-server OEM box.
So it's two physical servers.
Each of the physical servers has two 10 gig ports,
two sockets, and two PCIe flash devices. We also incorporate a tier of 12 spinning disks across that 2U, six per side. And so we call those server instances microarrays in the architecture, to sort of suggest that they're, you know, the smallest sort of array analog in terms of building scale-out storage. And we manage the two halves of them as sort of virtual instances, right? So you have a wire coming in on the 10 gig ports, and that's being handled directly by a CPU and mapping down onto flash. And so to scale up the system, you just plug these things into the switch.
We run software on the switch as well as on the nodes.
And so a sort of like smallest possible deployment of the system is a single switch and a single
2U box, which gives you 40 gigabits of connectivity into your storage across the four ports. Being inclusive of the switch means that your clients,
and our GA really focuses around virtualization and ESX clients,
your clients can be run directly into the switch,
and so they can take advantage of the full cross-sectional bandwidth
of the switch into the storage.
So we can actually pump 40 gigabits of storage out onto four clients as a starting point.
So getting back to the question of how the forwarding works, you can really think of our
stack in two layers, right? The base layer is the aggregation, right? It's a bare metal
object store where you combine all of these storage elements and allow them to be addressed.
And at that layer, you get horizontal addressing across everything.
You can talk directly to data across all the devices.
The interesting problem that comes up on top of that is there have been object storage companies kind of forever, right? And one of the big Achilles' heels of building an object storage product
is that you have to demand interface changes on the client,
or you have to build a single gateway that ends up being a big performance bottleneck.
And so the thing that we've done there is we've taken advantage of the fact
that we've got this scale-out platform with CPU and network
to actually build a horizontal scale-out presentation layer, right?
So we built an object store, but we said,
we're never going to be able to sell a brand-new object store.
Let's go in with a protocol that's actually relevant
in these virtualization environments.
And so we started with NFS.
And so the NFS implementation is a horizontal NFS controller, right?
And by that I mean that you see a single NFS server IP address on the other end of the switch, right?
The clients, the ESX hosts are configured to talk to a single NFS IP address.
When those connections come in, they're balanced across as many microarrays as you have.
So the NFS implementation that terminates that IP address is actually scaled across all of the hosts.
It's basically a distributed TCP endpoint that runs across all of them.
So that also helps solve the problem that NFS doesn't natively do multipathing.
Yes, yes.
There's lots of room to continue to improve on that.
But the place where we are right now is that basically every single NFS session that comes into the switch can run active on an independent link. So if you have four ESX hosts, they can come in over four wires,
all appearing to talk to the same NFS server.
And what's more, we can dynamically move those connections around in response to load and data location and stuff like that as the system runs.
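(A hedged sketch of the session-balancing idea just described: many NFS sessions target one virtual IP, the control plane pins each new session to the least-loaded microarray, and a live session can be reassigned, for example toward the node holding its data. The class names, node names, and two-node setup are illustrative assumptions, not Coho's implementation.)

# Illustrative control-plane sketch: balance NFS sessions that all target one
# virtual IP across several microarrays, and move a session when desired.

class SessionBalancer:
    def __init__(self, microarrays):
        self.load = {m: 0 for m in microarrays}   # sessions per microarray
        self.assignment = {}                      # client -> microarray

    def admit(self, client_ip):
        # Pin a new NFS session to the currently least-loaded node.
        target = min(self.load, key=self.load.get)
        self.assignment[client_ip] = target
        self.load[target] += 1
        return target

    def rebalance(self, client_ip, preferred):
        # Move a live session, e.g. toward the node holding its hot data.
        # In the system described here, the switch rewrites forwarding rules
        # so the move is transparent to the ESX host.
        current = self.assignment.get(client_ip)
        if current and current != preferred:
            self.load[current] -= 1
            self.load[preferred] += 1
            self.assignment[client_ip] = preferred

balancer = SessionBalancer(["microarray-1", "microarray-2"])
print(balancer.admit("10.0.0.11"))   # first ESX host lands on one node
print(balancer.admit("10.0.0.12"))   # second host lands on the other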
So that was the question.
So if the data happens to be on, let's say you've got, I don't know, a four-server configuration
or even an eight-server configuration.
So the disks that are behind the pair of servers, are they shared across all the servers or
are they only accessible to the pair of servers?
How does that play out?
Because, I mean, if the data is sitting on, you know, let's say disks associated with the fourth pair rather than the first pair, how does that get moved around and stuff?
Oh, okay.
Yeah, that's a good question.
So the disks in the system are built in behind the flash.
And so the object store that we built that runs on each node manages those things as a set.
The interesting bit of that part of the system's design is we designed it first for flash
and then added the support for disks later.
So unlike a lot of the storage systems that you see today where you've got a 20- or 30-year-old RAID implementation
with a file system and someone's gone, okay, how do I stick this flash on top of it?
And then done that to vastly varying degrees of success.
Yeah.
We kind of came at it from the other end,
just out of looking at the speed of the hardware that we had.
So we built it for flash, and then we asked the question,
okay, we need more capacity.
How do we demote cold data off to the disks? So the Flash is the primary tier in the architecture,
and the disk is the slower tier. To your question, though, Ray, if the data is on a different
microarray than the one that you're coming into, there's absolutely a hop there, right?
So in behind the gateway, the request gets forwarded one hop to pull that stuff across.
Now, we're doing some pretty exciting stuff with the switch and with the TCP implementation
where I expect that later this year, that hop will be in one direction, right?
So you'll, you know, if you're trying to do a read, the read will enter through the NFS implementation.
It'll get forwarded to the node that has the data, but that node will reply directly out on the NFS session with the data.
Yeah, okay.
So there's a bunch of interesting ways that we can take advantage of the switch, but there's currently not enough capability in the forwarding logic on the switch to reach inside the TCP stream where the NFS requests are and pull out the NFS headers, which is what you would need to do to forward the stuff directly to the right nodes.
Right. Okay. But just to be straight, the one cabinet is two micro nodes, or microarrays, but each microarray owns its own storage, right?
That's right. That's right.
So it's not an HA pair, it's just two nodes that happen to be in the cabinet.
That's right.
That's exactly right.
It would be as if they were two 1U blade servers sitting on top of each other, or two 1U pizza boxes sitting on top of each other.
Okay.
And the disk implementation is a RAID something or another, or erasure coding?
No, we're not doing any coding on that right now, although we're doing some work on building a capacity scale-out story that'll let us, later this year, offer much larger-scale capacity growth in the disk tier independent of the flash tier.
But right now the disks are not RAIDed. They're simply, you know, cold pools of data that the flash can demote down onto. We get redundancy by doing replication across nodes.
Okay. All right. So data comes in, it gets chunked up into objects, and then that object gets written to two or three nodes depending on what level I selected?
That's right. That's right. You can, I mean, in some ways you can think
of what's happening with the way the data is forwarded as being heavily influenced by a sort of network forwarding path.
And so the thing that we're providing
on the back end
is basically a coarse-grained
object system.
I like to think of it
as being a lot like
what we did with virtual machines
when we did the work on Xen.
The analog there is that with the stuff we did with Xen,
we realized that the CPU was way more powerful
than any of the application workloads
that were being put on top of it.
And so dividing that CPU up into something
that gave you controlled sharing
and let you run multiple of those workloads
without trying to add any extra crap in terms of, you know, layers of abstraction, was a good way to get additional value.
And these flash devices look a lot like that. You know, there are a few cases of very demanding apps, but for the most part they're hard pressed to put enough application workload on one of these devices to saturate it. And so the abstraction we have is an object
as a pretty coarse-grained container for data.
And the object then on each of your flash devices
with disks in behind them gets composed up
into striping and replication layers.
Those things give you a sort of forwarding graph, right,
like you would have for routing a packet in an IP network.
And the read and write requests follow those paths down onto the appropriate devices.
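(To make the forwarding-graph idea concrete, here is a rough sketch under assumed structure, not Coho's code: a striping layer routes a request to the replica set that owns the addressed range, and the replica set fans a write out to every device holding a copy.)

# Illustrative forwarding graph: Stripe -> Replicate -> Device. The node
# types, names, and fan-out are assumptions for illustration only.

class Device:
    def __init__(self, name):
        self.name = name
    def write(self, offset, data):
        return [f"{self.name} wrote {len(data)}B @ {offset}"]

class Replicate:
    # Writes go to every child; reads could be served from any one of them.
    def __init__(self, children):
        self.children = children
    def write(self, offset, data):
        return [line for c in self.children for line in c.write(offset, data)]

class Stripe:
    # Requests are routed to the child that owns the addressed stripe.
    def __init__(self, children, stripe_size=64 * 1024):
        self.children, self.stripe_size = children, stripe_size
    def write(self, offset, data):
        child = self.children[(offset // self.stripe_size) % len(self.children)]
        return child.write(offset, data)

graph = Stripe([
    Replicate([Device("flash-1a"), Device("flash-2a")]),
    Replicate([Device("flash-1b"), Device("flash-2b")]),
])
print(graph.write(128 * 1024, b"x" * 4096))   # lands on both replicas of stripe 0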
And so our NFS implementation is just one possible client of that thing.
We've internally built a version of MySQL where we've linked MySQL directly into those dispatch libraries.
And MySQL then is able to
talk across all the devices
in the full width of the switch.
And the MySQL implementation
does direct requests? Yes, that's right.
Okay.
Wait a minute, wait a minute, wait a minute. Is this a database
machine? Is that what you said there, Andy?
It's more just a proof of concept of what we're trying to do on the base, right?
That for a lot of the customers that we've been talking to, right,
certainly the more enthusiastic customers over the past year,
the kind of performance that we're getting off of even the 2U, right? We're getting 180K IOPS of 80/20 random 4K I/O
with replication off of the 2U.
And so the customers that are really kind of getting excited about that
tend to be, you know, medium to large enterprises.
And to some degree, a lot of them end up having
in-house development of one sort or another.
Either they're a software shop that's building stuff that they're packaging on arrays or they're, you know, a media and entertainment site that's got a bunch of stuff.
And they're really, really interested in the idea of being able to take on a bit of development work to integrate against a richer storage abstraction and get, you know, consequent performance benefits.
And so the idea there is that, you know, we solve an immediate problem with NFS
and this MySQL prototype, right, and it's really just a POC around one idea,
is that you can adapt an application to not have to go over this NFS or iSCSI bottleneck,
but instead to be able to talk directly across 10 or 20 flash devices
with very low latency and great scale of performance.
And now you just have to extend atomic rates across that,
and we're really in business.
Yeah, that's right.
So the SSD layer or the flash layer is both doing,
I'll call it read caching as well as write caching?
Yeah, it's read and write.
It's primary storage.
It's primary storage and then you have cold storage, which is the disks behind it.
Yeah, yeah.
This is going to take some thought.
So let me give you another little wrench in there to wrap your head around. The cold storage thing is really interesting, in that in array design, in storage design, for a long time we've had, you know, naturally a hierarchy of storage, right? Even with spinning disks, you had some amount of battery-backed RAM on the buffered write path. And then you had in-RAM read cache, typically, right?
And, you know, that was how vendors initially adopted Flash, right, was to just build a larger read cache.
Some of them are still stuck there.
Yeah, some of them. But the thing that we saw on that side is that the policies that you end up using to manage that cache
are all variants of LRU, basically, right?
They're LRU or, you know, the sort of later improvements in LRU like ARC and CAR and so
on that do a little bit better job on specific workloads, right? But basically, they're working with this hot end of the workload
and just trying to keep the stuff that's going to be reaccessed inside memory.
And the thing that we noticed is when you grow the amount of fast memory
that you have relative to the amount of slow memory,
you move out on that curve of access frequency.
Yeah, a typical hybrid system that manages its cache well is well into the long tail
by the time you get to the least recently used block.
Yes, exactly.
And so, you know, it's not hard for, you know, even normal workloads.
We did a bunch of analysis early on.
Microsoft Research released a bunch of enterprise traces from their data center a few years ago into the SNIA trace archive.
And so it's about 14 or 15 workloads running over a week.
And it's a big variety of workloads.
So it's a pretty good resource for doing storage modeling.
And if you look at that thing, you know, running all of those workloads as if you were sitting on top of a single hypervisor, and you start modeling what your hit rates are like as you get out into the tail of the curve for LRU, you're looking at inter-access times in the neighborhood of like 20 hours or more, right?
So the amount of time between accesses to a given piece of flash is potentially a day,
right?
And to us, that seems kind of like a waste of money, right, to go and stick in really expensive flash, right?
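(A small sketch of the kind of analysis being described: measure the gap between successive accesses to each block in a trace. A block whose re-access gap is measured in tens of hours gains little from sitting in flash under a pure LRU policy. The trace record format below is an assumption for illustration, not the actual SNIA schema.)

# Illustrative: per-block inter-access (re-reference) times from a trace of
# (timestamp_seconds, block_number) records.

from collections import defaultdict

def inter_access_gaps(trace):
    last_seen = {}
    gaps = defaultdict(list)
    for ts, block in trace:
        if block in last_seen:
            gaps[block].append(ts - last_seen[block])
        last_seen[block] = ts
    return gaps

trace = [(0, 7), (60, 7), (60, 9), (72_000, 9), (90_000, 7)]  # toy data
for block, g in inter_access_gaps(trace).items():
    hours = [round(x / 3600, 2) for x in g]
    print(f"block {block}: re-access gaps {hours} hours")
# A block only touched every ~20 hours is a poor use of expensive flash.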
Yeah, but now you've got to figure out something more valuable to stick there.
Yes. Well, you've got to do something.
And so on that end of the system…
Or you reduce the cache, right?
Yeah, but people have proven they're willing to pay for the flash.
Yeah.
And from a marketing point of view –
They're willing to pay for the performance.
Yeah, but from a marketing point of view, if you just said, you know, 90% of the benefit comes from half the flash.
So instead of giving you a terabyte of flash, I'm going to give you half a terabyte of flash.
People are going to look at the spec sheet and go, that's not a good idea, and move on.
So to some extent, you've got to go, okay, I have this much flash because everybody else has that much,
is in the 10% flash to disk ratio, and if I go down to 5%, nobody will believe my performance claims.
So now, what's more valuable to use that other piece of flash for?
Exactly. How can we use it to get more out of it?
And so that's where we spend a great deal of time internally.
We'll be announcing some stuff, I think, over the next few months that's pretty fun on this side.
But one of the things that we've put a fair bit of effort into is actually taking continuous online traces of workloads running on the array or running on the storage system and trying to do much richer analytics in terms of predicting, you know, what data is going to be accessed based on the recent access patterns,
based on the time of day, the time of week, and so on.
And we use that to inform everything from promotions out of disk
to how we actually lay out data when we demote it out of Flash.
So based on the way that you've seen data accessed in the past,
you may not want to be sequential when you write it out onto the spinning disk tier.
You may want to prefer to write out data that's accessed closer together.
And so, that's an aspect of the system that's been pretty exciting.
And an interesting thing there is we really think there's significant value to be had by tying analytics and scalable analytics into a scalable storage system.
Wait a minute.
Andy, are you saying you're running Hadoop on this storage cluster to do big data analytics on the IO traces that the customer is doing?
That's right.
We're using our own stack for this right now, but we've certainly been doing a lot of work internally with Hadoop,
and it's exactly the direction we're going.
We're our own customers on this for now,
but as we get better experience with integrating it against the storage system,
we will be rolling out support for allowing customers to do analytics on the
system next to their data.
So avoiding a lot of the data movement associated with that.
So you're doing this offline.
Thank God.
Okay.
I got you.
I understand.
But the trend over the past 18 months or so has been for the innovative companies in storage
to be collecting a lot more counters.
Yes, yes.
So that analytics can be done.
I mean, Nimble's doing some interesting things there as well.
Yeah, yeah.
Where, you know, the data we used to have was minuscule compared to what some of these systems are now collecting.
Yeah, yeah, yeah.
It's absolutely true.
I mean, I guess it's a shift in the model. The storage hardware used to be both completely isolated from the rest of the world and also pretty constrained in terms of the I/O budget and compute it had for populating your cache, for laying out data on disk, for all these things. The hardware in a flash-era system is far less constrained, and customers now accept phone-home facilities for remote analytics, which is the case in some of the other vendor examples.
Both of those things mean you can do a heck of a lot more stuff with the data.
Yeah. I want to tell you, we used to get, this is back in the 90s, we used to get like 8K state
saves and we'd get like, you know, a floppy's worth of data off a machine.
Something like that was a godsend, you know.
Today, I don't know, you know, we were talking of Nimble.
I think they talked about 20 million counters a day.
Now, you know, obviously they're taking something on the order of every minute
or every hour or something like that, but still, it's quite a lot of data.
Yeah, well, with a full trace running on our gear,
at around 10 nodes when you're pulling a million IOPS through,
we can configure the thing to take a trace event for every single one of those requests.
So it's a million trace events a second.
I like the 8K floppy example.
No, no, no. Don't go there. Don't go there.
Just showing we've been there and done that.
Yeah, yeah, yeah.
A couple of times.
So, you know, the first question I ask about any scale-out environment is: how big can it scale to?
So we test today deployments up to between 20 and 30 nodes of the system. And we're actually a little bit economically and power constrained in terms of growing
our test lab up greater than that.
So we're not architecturally limited at that point, but that's the point that we've tested
to.
Yeah, yeah.
So have you run any sort of, you know, performance benchmarks like SpecSFS on the product?
I can't say as I've seen any results from you guys.
No, no, no. I'm glad you asked that. I meant to mention that earlier, actually.
So the way that we've approached things, we're pretty pragmatic on a lot of this stuff. The decision to do an NFS implementation for VMware was a pretty careful
one, right? VMware represents a fair bit of new storage spend. So as a company, it's a good initial
market. And NFS on VMware means that you get visibility into the individual VM image files,
which you don't get on top of the block
presentations to VMware.
They stick VMFS, their cluster file system, on top of it, and it obfuscates which things are being accessed,
which means that you can't provide per VM policy.
The other big benefit to taking that decision was from an NFS implementation perspective,
we could
really focus on the data path, right? So we're, you know, we're focused in the initial product
on building a really scalable, high-performance NFS v3, interestingly, server, right? It's v3,
but it's getting a lot of the benefits.
That's all VMware supports.
Yeah, exactly. It's getting a lot of the benefits that you might otherwise get from something like pNFS in terms of being able to scale out, but not have to worry about a lot of that. In the room next to me, the engineering team is actually sitting with SpecSFS in regression.
So we actually run SpecSFS in our regression suite, and the team is beavering away at that number.
So later this year, we should have a fully general NFS implementation.
That's really impressive because it's not the NFS implementation.
It's the file system. Like, VMFS only has to deal with a limited set of uses.
Yes.
When you start saying, you know, I'm going to build this distributed file system and it's going to support POSIX record locks so that we get the ability to run that kind of application across the scale-out system. That starts getting a lot more difficult.
Yeah, it's pretty exciting to see the work taking place.
Well, I guess one of the things there is that the redundancy, the scale-out, and the availability work in terms of the actual data on disk has kind of been solved at that scale-out object store layer.
The client connection management for NFS is being handled at the SDN and NFS protocol presentation there, right?
So that's where we're doing things like using the SDN switch
to monitor traffic demand, moving sessions around, and stuff like that.
And now, like you say, we're building this scalable metadata representation
that takes the NFS namespace and does the exact same thing
that we've done to the other two components with that, right?
That builds the namespace as a distributed tree, that scales it out across all the nodes,
that integrates to move clients that are accessing similar working sets onto the same nodes,
so that we don't have to remote accesses to locks and things like that.
And so, you know, it's underway.
And I think I'm pretty hopeful about where we're going to get to with it later this year.
Yeah, the reason I mentioned SpecSFS is to try to get some idea on the latency of your storage versus, you know, competitive offerings and stuff like that.
And SpecSFS supports a thing called, I think it's overall response time. It's really an average across, you know, from the lowest performance to the highest performance at some specific configuration.
So what would you say the latency of your system is?
I mean, obviously, with SSDs and stuff like that, latency can be fairly small in a block storage configuration,
but that's a different animal for files.
So what are you after there?
Oh, I'm just after a number, Andy.
1.7 or 1,600 or, you know, I don't know.
So, you know, assuming we've got a VM running, and I'll give you that it's a flash hit and we're not going to the spinning disks, what kind of transactional latency is that VM going to see?
So the latencies that we're seeing there from a VM, I think, from the numbers I was looking at when I was doing stuff last week on this, right now we're in the neighborhood of a millisecond, plus or minus half a millisecond, on an unloaded system.
Yeah, okay.
So if your queues are empty, we're getting really, really solid flash performance that factors in the network round trip on that.
The latency numbers that a lot of people talk about on these things always drive me nuts
because with any storage system, as you add load, your queuing increases,
and queuing ends up being a dominant factor in latency.
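(Howard's point can be illustrated with the textbook M/M/1 approximation, a simplifying assumption rather than a model of any particular array: mean response time is the service time divided by one minus the utilization, so latency blows up as the box approaches the load at which its peak IOPS number was measured.)

# Illustrative M/M/1 queueing model: response time = service time / (1 - utilization).
# Real arrays are not M/M/1; treat this only as a shape-of-the-curve sketch.

def response_time_us(service_time_us, utilization):
    assert 0 <= utilization < 1
    return service_time_us / (1.0 - utilization)

for util in (0.10, 0.50, 0.80, 0.95):
    print(f"{util:.0%} busy -> {response_time_us(100, util):.0f} us")
# 10% busy -> 111 us, but 95% busy -> 2000 us for the same 100 us device:
# quoting peak IOPS and idle-system latency together hides this.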
It drives me nuts to see somebody claiming I'm doing a million IOPS and I'm doing, you know, 500 microseconds of latency, right? Those two things.
Well, you better be saying you're doing them both at the same time.
Exactly. The implication is always that they are, and we spend a lot of conversations with people talking about that.
And sometimes they are and sometimes they're not.
And this is the problem with benchmarking.
Unfortunately, you're talking to Ray, who reads the stuff out of spec,
and SPC with a fine-tooth comb,
and me, who runs a test lab, so, you know.
Yeah.
Go ahead.
We know the problems with today's benchmarks.
Yeah.
And, you know, and the problem where you can say,
look, I did a million IOPS,
and look, I did 500 microseconds of latency,
but you didn't say, you know, at the same time.
Yes. All right.
And you didn't say how big the data set was because you're a hybrid
and you were running it against a one gigabyte test file.
Exactly, with two gigabytes of flash.
All right, so SpecSFS does a fairly reasonable job with latency
because it's an overall average from the lowest workload to the peak workload.
SPC, block-oriented services, has a least response time, which is like 10% load.
So it's really, you know, no queuing delay kind of environment.
Okay.
But that's a block storage configuration. SpecSFS has got, at least it provides an average across the whole
spectrum of workloads that
you are benchmarking.
So I prefer that, but
that's the way it is.
Just to clear the air on benchmarking.
I won't mention it again.
Well, I'll look forward to
coming back to you with some
spec numbers later in the year.
I think we'll have some exciting stuff there. I'd be happy to look
at that. All right.
God, we're at about 45 after.
Is there any other
questions, Howard? Yeah, there's just one
more set of things I want to talk about,
which will go pretty quick.
If I remember right, you guys are doing
per VM snapshots and per VM replication?
Yes, that's right.
And you're not currently
reducing the data on the disk, right?
No, we're not doing any data reduction
in the current implementation.
We decided to, like I said,
focus entirely on making sure
that the performance was there
in our GA.
Okay.
We have an internal implementation
of dedupe that'll go out when we push it through the testing paces.
We also have a rather simpler internal implementation of compression that we're looking at putting in ahead of the dedupe work.
The compression one is obviously a lot easier than the dedupe as a feature to build.
Although you're being object-based on the back end.
Makes this all easier.
Helps the dedupe.
Oh, yeah, absolutely.
Yeah, it's a longer discussion about how the dedupe works,
but it's a pretty interesting one that I'd be happy to go through at some point later on.
We'll save that for another day.
I mean, the main fun thing on both of those is that we push it to be asynchronous, right?
So the dedupe implementation takes hints on the live data path. We actually, since we're doing CRCs for data at rest,
we use the CRCs as hints on opportunities to dedupe.
But in both the compression and dedupe case,
those things actually kick in off the write path
and come into play on the demotion path.
And so we're able to use them
without having to sit right on the hot data path
and giving away performance.
I could argue that one either way.
Yeah, as we all could.
Yeah.
All right.
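(A hedged sketch of the hint-based, off-the-write-path dedupe just described. All names are illustrative and this is not Coho's implementation; the key point is that a matching CRC only nominates a candidate, and the bytes are compared before two objects are made to share a block.)

# Illustrative sketch of CRC-hinted, asynchronous dedupe at demotion time.
# A matching CRC is only a hint (CRCs collide), so the bytes are compared
# before two objects share a copy.

import zlib

class DemotionDeduper:
    def __init__(self):
        self.by_crc = {}          # crc32 -> (block_id, data) already stored

    def demote(self, block_id, data):
        crc = zlib.crc32(data)
        hit = self.by_crc.get(crc)
        if hit and hit[1] == data:          # confirm: the hint is not proof
            return ("shared", hit[0])       # point this block at the existing copy
        self.by_crc[crc] = (block_id, data)
        return ("stored", block_id)

dedup = DemotionDeduper()
print(dedup.demote("obj-1/blk-0", b"A" * 4096))   # ('stored', 'obj-1/blk-0')
print(dedup.demote("obj-2/blk-9", b"A" * 4096))   # ('shared', 'obj-1/blk-0')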
Okay, so you mentioned no deduplication or compression into the flash or into the back end.
I assume it meant across the whole system, right?
That's right. None at all right now.
Well, this has been great.
Thank you, Andy, for being on our call.
Next month, we'll talk to another startup slash storage technology person.
Any questions you have, let us know.
That's it for now.
Bye, Howard, and thanks again, Andy.
Until next time.
Thanks, Andy.
So long, Ray.
Until next time.
All right.
See you guys.
Thanks.
Bye.
Bye.
Bye.