Grey Beards on Systems - GreyBeards talk scale-out storage with Coho Data CTO/Co-founder, Andy Warfield
Episode Date: February 3, 2014. Welcome to our fifth episode. We return now to an in-depth technical discussion of leading edge storage systems, this time with Andrew Warfield, CTO and Co-founder of Coho Data. Coho Data supplies VMware scale-out storage solutions with PCIe SSDs and disk storage using the NFS protocol. Howard and I talked with Andy and Coho Data at …
Transcript
Hey everybody, Ray Lucchesi here and Howard Marks here. Welcome to the Greybeards on Storage monthly podcast, a show where we get greybeard storage and system bloggers to talk with storage and system vendors to discuss upcoming projects, technologies, and trends affecting the data center today.
Welcome to the fifth episode of Greybeards on Storage.
It was recorded on January 20, 2014.
We have here with us today Andy Warfield, CTO and co-founder of Coho Data.
Why don't you tell us a little bit about yourself and your company, Andy?
Sure. Thanks, Ray. Well, the company has been around for just over two years at this point.
We kind of come from a mixed storage and virtualization background.
I was one of the guys that wrote the Xen hypervisor when I was a grad student in the UK. And then I was involved in XenSource with both of my co-founders, Keir Fraser and Ramana Jonnala.
We're building a high-performance scale-out storage system that incorporates a bunch of
pretty interesting technologies, not the least of which are PCIe Flash and software-defined networking.
And so the premise of the company is basically to build a scale-out storage service like
you might see in the larger cloud providers, but package it up in a way that you can deploy
it inside an enterprise data center.
That's very interesting.
I guess the first question I had about this is the software-defined networking layer.
Not a lot of storage companies out there package their systems with software-defined networking.
Can you explain why you thought that was necessary in your environment?
Yeah, yeah, that's been around for a while.
Flash was being packaged as a SAS or SATA device, and it was basically a disk that didn't have the same kind of seek penalty. And as Flash has started to move on to faster buses, PCIe first and more recently on to
DIMM slots, the bandwidth that's available into the device is significantly faster than
we've seen from disks in the past.
And one of the early realizations that we made working with some loaner gear from Fusion IO was that a single device, even under a pretty aggressive random workload, was capable of saturating an entire 10 gig port.
Which meant that you were going to have to push a bunch of the logic in the storage system out to incorporate the network.
And that's where we really started looking at SDN. So a random workload and a single device is able to saturate a 10 gigabit
Ethernet NIC?
Is that what you said?
A NIC with dual ports or a single port and all that, you know, jumbo frames
and all that junk.
Howard, jump in any time you want with the networking stuff.
Well, a single port and jumbo frames really doesn't help all that much.
My experience with iSCSI has been jumbo frames get you somewhere like 3-5% better performance.
But remember, Ray, we're talking about Flash, so the fact that it's random doesn't slow it down.
So can one PCIe SSD or Fusion-io card deliver more than 10 gigabits per second of data?
Yep, sure can.
If it's the right data, right? That's the other part of this problem.
Yeah, yeah, absolutely. Well, so the interesting thing at that point, right, at 10 gigabits per second, or around a gigabyte a second, you're already faster than an entire 600 megabyte-per-second SAS or SATA bus.
You're dealing with storage performance that exceeds a fully populated SAS or SATA bus pumping, you know, at full speed.
And so, you know, it really begged a reconsideration of the software and of the storage architecture
in that, you know, the way that we've always built storage has really been around assuming
disks were the slowest things.
And what we saw with the Flash was the device was fast enough to saturate the NIC,
and saturating that NIC actually consumed a fair bit of CPU.
And so the thing that we're faced with in going to multiple devices,
which you still need to do because you still want redundancy,
you still want to be able to scale up capacity,
is that a bunch of that coordination and aggregation of the devices
needs to move out to incorporate the network.
And so we use SDN a few different ways around that.
Yeah, I found it fascinating over the past few months
to see how different vendors are coming out with different scale-out systems
and the whole how you get requests to the node that has the data
varies so substantially across the models.
Absolutely.
It's something that's actually really evolved a lot, right?
So the scale-out and data distribution problem predates Flash, obviously.
With a lot of object storage systems where you're just trying to scale a capacity on lots and lots of disks,
you would use things like consistent hashing to try and throw the data out at all those disks.
Yeah.
Right. And the latency of stopping at an ingest node and figuring out where you had to go next was acceptable because it was a disk on the back end. But now that it's Flash on the back end, adding a millisecond of latency on a reference starts to add up.
Yes.
Yes, absolutely.
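(For readers following along: a minimal sketch of the consistent hashing idea mentioned above. The class and node names are illustrative assumptions, not Coho's code; each node gets several points on a hash ring, and an object lands on the first node point at or after its own hash.)

import hashlib
from bisect import bisect, insort

class ConsistentHashRing:
    # Toy consistent-hash ring: an object maps to the first node point at or
    # after the object's hash, wrapping around the end of the ring.
    def __init__(self, nodes, vnodes=64):
        self._points = []                 # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):       # virtual nodes smooth the spread
                insort(self._points, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, object_key):
        idx = bisect(self._points, (self._hash(object_key),)) % len(self._points)
        return self._points[idx][1]

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("vmdk-0042/chunk-17"))   # e.g. 'node-2'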
So are you saying you're doing the hashing in the OpenFlow layer?
Is that to decide where to go?
No, we're not doing the hashing at that layer.
There's opportunities there.
It's actually, there's a bunch of kind of interesting stuff
that we can talk about there.
We've learned an awful lot about the merchant silicon that these switches are shipping with
and the capabilities of OpenFlow on this stuff.
So there are kind of two layers that it's interesting to think about the protocol at
or the work that we're doing at.
Well, so maybe it's useful to take a step back
and just talk about the hardware architecture.
Sure, sure.
Yeah, yeah.
So we're a scale-out system, as mentioned.
The building blocks of the system
are balanced combinations of flash, CPU, and network, right?
So we go and buy the best sort of performance price point
from what we can get in terms of commodity hardware
and all of our stuff is software that sits on top of it.
So our GA product is a 2U, two-server OEM box.
So it's two physical servers.
Each of the physical servers has two 10 gig ports,
two sockets, and two PCIe flash devices. We also incorporate a tier of 12 spinning disks across that 2U, six per side. And so we call those server instances microarrays in the architecture, to sort of suggest that they're, you know, the smallest sort of array analog in terms of building scale-out storage. And we manage the two halves of them as sort of virtual instances, right? So you have a wire coming in on the 10 gig ports, and that's being handled directly by a CPU and mapping down onto flash. And so to scale up the system, you just plug these things into the switch.
We run software on the switch as well as on the nodes.
And so a sort of like smallest possible deployment of the system is a single switch and a single
2U box, which gives you 40 gigabits of connectivity into your storage across the four ports. Being inclusive of the switch means that your clients,
and our GA really focuses around virtualization and ESX clients,
your clients can be run directly into the switch,
and so they can take advantage of the full cross-sectional bandwidth
of the switch into the storage.
So we can actually pump 40 gigabits of storage out onto four clients as a starting point.
So getting back to the question of how the forwarding works, you can really think of our
stack in two layers, right? The base layer is the aggregation, right? It's a bare metal
object store where you combine all of these storage elements and allow them to be addressed.
And at that layer, you get horizontal addressing across everything.
You can talk directly to data across all the devices.
The interesting problem that comes up on top of that is there have been object storage companies kind of forever, right? And one of the big Achilles' heels of building an object storage product
is that you have to demand interface changes on the client,
or you have to build a single gateway that ends up being a big performance bottleneck.
And so the thing that we've done there is we've taken advantage of the fact
that we've got this scale-out platform with CPU and network
to actually build a horizontal scale-out presentation layer, right?
So we built an object store, but we said,
we're never going to be able to sell a brand-new object store.
Let's go in with a protocol that's actually relevant
in these virtualization environments.
And so we started with NFS.
And so the NFS implementation is a horizontal NFS controller, right?
And by that I mean that you see a single NFS server IP address on the other end of the switch, right?
The clients, the ESX hosts are configured to talk to a single NFS IP address.
When those connections come in, they're balanced across as many microarrays as you have.
So the NFS implementation that terminates that IP address is actually scaled across all of the hosts.
It's basically a distributed TCP endpoint that runs across all of them.
So that also helps solve the problem that NFS doesn't natively do multipathing.
Yes, yes.
There's lots of room to continue to improve on that.
But the place where we are right now is that basically every single NFS session that comes into the switch can run active on an independent link. So if you have four ESX hosts, they can come in over four wires,
all appearing to talk to the same NFS server.
And what's more, we can dynamically move those connections around in response to load and data location and stuff like that as the system runs.
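(A hedged sketch of the session-balancing idea just described: many NFS sessions target one virtual IP, the control plane pins each new session to the least-loaded microarray, and a live session can be reassigned, for example toward the node holding its data. The class names, node names, and two-node setup are illustrative assumptions, not Coho's implementation.)

# Illustrative control-plane sketch: balance NFS sessions that all target one
# virtual IP across several microarrays, and move a session when desired.

class SessionBalancer:
    def __init__(self, microarrays):
        self.load = {m: 0 for m in microarrays}   # sessions per microarray
        self.assignment = {}                      # client -> microarray

    def admit(self, client_ip):
        # Pin a new NFS session to the currently least-loaded node.
        target = min(self.load, key=self.load.get)
        self.assignment[client_ip] = target
        self.load[target] += 1
        return target

    def rebalance(self, client_ip, preferred):
        # Move a live session, e.g. toward the node holding its hot data.
        # In the system described here, the switch rewrites forwarding rules
        # so the move is transparent to the ESX host.
        current = self.assignment.get(client_ip)
        if current and current != preferred:
            self.load[current] -= 1
            self.load[preferred] += 1
            self.assignment[client_ip] = preferred

balancer = SessionBalancer(["microarray-1", "microarray-2"])
print(balancer.admit("10.0.0.11"))   # first ESX host lands on one node
print(balancer.admit("10.0.0.12"))   # second host lands on the other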
So that was the question.
So if the data happens to be on, let's say you've got, I don't know, a four-server configuration
or even an eight-server configuration.
So the disks that are behind the pair of servers, are they shared across all the servers or
are they only accessible to the pair of servers?
How does that play out?
Because, I mean, if the data is sitting on, you know, let's say disks associated with the fourth pair rather than the first pair, how does that get moved around and stuff?
Oh, okay.
Yeah, that's a good question.
So the disks in the system are built in behind the flash.
And so the object store that we built that runs on each node manages those things as a set.
The interesting bit of that part of the system's design is we designed it first for flash
and then added the support for disks later.
So unlike a lot of the storage systems that you see today where you've got a 20- or 30-year-old RAID implementation
with a file system and someone's gone, okay, how do I stick this flash on top of it?
And then done that to vastly varying degrees of success.
Yeah.
We kind of came at it from the other end,
just out of looking at the speed of the hardware that we had.
So we built it for flash, and then we asked the question,
okay, we need more capacity.
How do we demote cold data off to the disks? So the Flash is the primary tier in the architecture,
and the disk is the slower tier. To your question, though, Ray, if the data is on a different
microarray than the one that you're coming into, there's absolutely a hop there, right?
So in behind the gateway, the request gets forwarded one hop to pull that stuff across.
Now, we're doing some pretty exciting stuff with the switch and with the TCP implementation
where I expect that later this year, that hop will be in one direction, right?
So you'll, you know, if you're trying to do a read, the read will enter through the NFS implementation.
It'll get forwarded to the node that has the data, but that node will reply directly out on the NFS session with the data.
Yeah, okay.
So there's a bunch of interesting ways that we can take advantage of the switch, but there's currently not enough capability in the forwarding logic on the switch to reach inside the TCP stream where the NFS requests are and pull out the NFS headers, which is what you would need to do to forward the stuff directly to the right nodes.
Right. Okay. But just to be straight, the one cabinet is two micro nodes, or microarrays, but each microarray owns its own storage, right?
That's right. That's right.
So it's not an HA pair, it's just two nodes that happen to be in the cabinet.
That's right.
That's exactly right.
It would be as if they were two 1U blade servers sitting on top of each other, or two 1U pizza boxes sitting on top of each other.
Okay.
And the disk implementation is a RAID something or another, or erasure coding?
No, we're not doing any coding on that right now, although we're doing some work on building a capacity scale-out story that'll let us, later this year, offer much larger-scale capacity growth in the disk tier independent of the flash tier.
But right now the disks are not RAIDed. They're simply, you know, cold pools of data that the flash can demote down onto. We get redundancy by doing replication across nodes.
Okay. All right. So data comes in, it gets chunked up into objects, and then that object gets written to two or three nodes depending on what level I selected?
That's right. That's right. You can, I mean, in some ways you can think
of what's happening with the way the data is forwarded as being heavily influenced by a sort of network forwarding path.
And so the thing that we're providing
on the back end
is basically a coarse-grained
object system.
I like to think of it
as being a lot like
what we did with virtual machines
when we did the work on Xen.
The analog there is that with the stuff we did with Xen,
we realized that the CPU was way more powerful
than any of the application workloads
that were being put on top of it.
And so dividing that CPU up into something
that gave you controlled sharing
and let you run multiple of those workloads
without trying to add any extra crap in terms of, you know, layers of abstraction, was a good way to get additional value.
And these flash devices look a lot like that. You know, there are a few cases of very demanding apps, but for the most part they're hard pressed to put enough application workload on one of these devices to saturate it. And so the abstraction we have is an object
as a pretty coarse-grained container for data.
And the object then on each of your flash devices
with disks in behind them gets composed up
into striping and replication layers.
Those things give you a sort of forwarding graph, right,
like you would have for routing a packet in an IP network.
And the read and write requests follow those paths down onto the appropriate devices.
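(To make the forwarding-graph idea concrete, here is a rough sketch under assumed structure, not Coho's code: a striping layer routes a request to the replica set that owns the addressed range, and the replica set fans a write out to every device holding a copy.)

# Illustrative forwarding graph: Stripe -> Replicate -> Device. The node
# types, names, and fan-out are assumptions for illustration only.

class Device:
    def __init__(self, name):
        self.name = name
    def write(self, offset, data):
        return [f"{self.name} wrote {len(data)}B @ {offset}"]

class Replicate:
    # Writes go to every child; reads could be served from any one of them.
    def __init__(self, children):
        self.children = children
    def write(self, offset, data):
        return [line for c in self.children for line in c.write(offset, data)]

class Stripe:
    # Requests are routed to the child that owns the addressed stripe.
    def __init__(self, children, stripe_size=64 * 1024):
        self.children, self.stripe_size = children, stripe_size
    def write(self, offset, data):
        child = self.children[(offset // self.stripe_size) % len(self.children)]
        return child.write(offset, data)

graph = Stripe([
    Replicate([Device("flash-1a"), Device("flash-2a")]),
    Replicate([Device("flash-1b"), Device("flash-2b")]),
])
print(graph.write(128 * 1024, b"x" * 4096))   # lands on both replicas of stripe 0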
And so our NFS implementation is just one possible client of that thing.
We've internally built a version of MySQL where we've linked MySQL directly into those dispatch libraries.
And MySQL then is able to
talk across all the devices
in the full width of the switch.
And the MySQL implementation
does direct requests? Yes, that's right.
Okay.
Wait a minute, wait a minute, wait a minute. Is this a database
machine? Is that what you said there, Andy?
It's more just a proof of concept of what we're trying to do on the base, right?
That for a lot of the customers that we've been talking to, right,
certainly the more enthusiastic customers over the past year,
the kind of performance that we're getting off of even the 2U, right? We're getting 180K IOPS of 80/20 random 4K I/O
with replication off of the 2U.
And so the customers that are really kind of getting excited about that
tend to be, you know, medium to large enterprises.
And to some degree, a lot of them end up having
in-house development of one sort or another.
Either they're a software shop that's building stuff that they're packaging on arrays or they're, you know, a media and entertainment site that's got a bunch of stuff.
And they're really, really interested in the idea of being able to take on a bit of development work to integrate against a richer storage abstraction and get, you know, consequent performance benefits.
And so the idea there is that, you know, we solve an immediate problem with NFS
and this MySQL prototype, right, and it's really just a POC around one idea,
is that you can adapt an application to not have to go over this NFS or iSCSI bottleneck,
but instead to be able to talk directly across 10 or 20 flash devices
with very low latency and great scale of performance.
And now you just have to extend atomic rates across that,
and we're really in business.
Yeah, that's right.
So the SSD layer or the flash layer is both doing,
I'll call it read caching as well as write caching?
Yeah, it's read and write.
It's primary storage.
It's primary storage and then you have cold storage, which is the disks behind it.
Yeah, yeah.
This is going to take some thought.
So let me give you another little wrench in there to wrap your head around. The cold storage thing is really interesting, in that in array design, in storage design, for a long time we've had, you know, naturally a hierarchy of storage, right? Even with spinning disks, you had some amount of battery-backed RAM on the buffered write path. And then you had in-RAM read cache, typically, right?
And, you know, that was how vendors initially adopted Flash, right, was to just build a larger read cache.
Some of them are still stuck there.
Yeah, some of them. But the thing that we saw on that side is that the policies that you end up using to manage that cache
are all variants of LRU, basically, right?
They're LRU or, you know, the sort of later improvements in LRU like ARC and CAR and so
on that do a little bit better job on specific workloads, right? But basically, they're working with this hot end of the workload
and just trying to keep the stuff that's going to be reaccessed inside memory.
And the thing that we noticed is when you grow the amount of fast memory
that you have relative to the amount of slow memory,
you move out on that curve of access frequency.
Yeah, a typical hybrid system that manages its cache well is well into the long tail
by the time you get to the least recently used block.
Yes, exactly.
And so, you know, it's not hard for, you know, even normal workloads.
We did a bunch of analysis early on.
Microsoft Research released a bunch of enterprise traces from their data center a few years ago into the SNIA trace archive.
And so it's about 14 or 15 workloads running over a week.
And it's a big variety of workloads.
So it's a pretty good resource for doing storage modeling.
And if you look at that thing, you know, running all of those workloads as if you were sitting on top of a single hypervisor, and you start modeling what your hit rates are like as you get out into the tail of the curve for LRU, you're looking at inter-access times in the neighborhood of like 20 hours or more, right?
So the amount of time between accesses to a given piece of flash is potentially a day,
right?
And to us, that seems kind of like a waste of money, right, to go and stick in really expensive flash, right?
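(A small sketch of the kind of analysis being described: measure the gap between successive accesses to each block in a trace. A block whose re-access gap is measured in tens of hours gains little from sitting in flash under a pure LRU policy. The trace record format below is an assumption for illustration, not the actual SNIA schema.)

# Illustrative: per-block inter-access (re-reference) times from a trace of
# (timestamp_seconds, block_number) records.

from collections import defaultdict

def inter_access_gaps(trace):
    last_seen = {}
    gaps = defaultdict(list)
    for ts, block in trace:
        if block in last_seen:
            gaps[block].append(ts - last_seen[block])
        last_seen[block] = ts
    return gaps

trace = [(0, 7), (60, 7), (60, 9), (72_000, 9), (90_000, 7)]  # toy data
for block, g in inter_access_gaps(trace).items():
    hours = [round(x / 3600, 2) for x in g]
    print(f"block {block}: re-access gaps {hours} hours")
# A block only touched every ~20 hours is a poor use of expensive flash.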
Yeah, but now you've got to figure out something more valuable to stick there.
Yes. Well, you've got to do something.
And so on that end of the system…
Or you reduce the cache, right?
Yeah, but people have proven they're willing to pay for the flash.
Yeah.
And from a marketing point of view –
They're willing to pay for the performance.
Yeah, but from a marketing point of view, if you just said, you know, 90% of the benefit comes from half the flash.
So instead of giving you a terabyte of flash, I'm going to give you half a terabyte of flash.
People are going to look at the spec sheet and go, that's not a good idea, and move on.
So to some extent, you've got to go, okay, I have this much flash because everybody else has that much,
is in the 10% flash to disk ratio, and if I go down to 5%, nobody will believe my performance claims.
So now, what's more valuable to use that other piece of flash for?
Exactly. How can we use it to get more out of it?
And so that's where we spend a great deal of time internally.
We'll be announcing some stuff, I think, over the next few months that's pretty fun on this side.
But one of the things that we've put a fair bit of effort into is actually taking continuous online traces of workloads running on the array or running on the storage system and trying to do much richer analytics in terms of predicting, you know, what data is going to be accessed based on the recent access patterns,
based on the time of day, the time of week, and so on.
And we use that to inform everything from promotions out of disk
to how we actually lay out data when we demote it out of Flash.
So based on the way that you've seen data accessed in the past,
you may not want to be sequential when you write it out onto the spinning disk tier.
You may want to prefer to write out data that's accessed closer together.
And so, that's an aspect of the system that's been pretty exciting.
And an interesting thing there is we really think there's significant value to be had by tying analytics and scalable analytics into a scalable storage system.
Wait a minute.
Andy, are you saying you're running Hadoop on this storage cluster to do big data analytics on the IO traces that the customer is doing?
That's right.
We're using our own stack for this right now, but we've certainly been doing a lot of work internally with Hadoop,
and it's exactly the direction we're going.
We're our own customers on this for now,
but as we get better experience with integrating it against the storage system,
we will be rolling out support for allowing customers to do analytics on the
system next to their data.
So avoiding a lot of the data movement associated with that.
So you're doing this offline.
Thank God.
Okay.
I got you.
I understand.
But the trend over the past 18 months or so has been for the innovative companies in storage
to be collecting a lot more counters.
Yes, yes.
So that analytics can be done.
I mean, Nimble's doing some interesting things there as well.
Yeah, yeah.
Where, you know, the data we used to have was minuscule compared to what some of these systems are now collecting.
Yeah, yeah, yeah.
It's absolutely true.
I mean, I guess it's a shift in the model. The storage hardware used to be both completely isolated from the rest of the world and also pretty constrained in terms of the I/O budget and compute it had for populating your cache, for laying out data on disk, for all these things. The hardware in a flash-era system is far less constrained, and customers now accept phone-home facilities for remote analytics, which is the case in some of the other vendor examples.
Both of those things mean you can do a heck of a lot more stuff with the data.
Yeah. I want to tell you, we used to get, this is back in the 90s, we used to get like 8K state
saves and we'd get like, you know, a floppy's worth of data off a machine.
Something like that was a godsend, you know.
Today, I don't know, you know, we were talking of Nimble.
I think they talked about 20 million counters a day.
Now, you know, obviously they're taking something on the order of every minute
or every hour or something like that, but still, it's quite a lot of data.
Yeah, well, with a full trace running on our gear,
at around 10 nodes when you're pulling a million IOPS through,
we can configure the thing to take a trace event for every single one of those requests.
So it's a million trace events a second.
I like the 8K floppy example.
No, no, no. Don't go there. Don't go there.
Just showing we've been there and done that.
Yeah, yeah, yeah.
A couple of times.
So, you know, the first question I ask about any scale-out environment is: how big can it scale to?
So we test today deployments up to between 20 and 30 nodes of the system. And we're actually a little bit economically and power constrained in terms of growing
our test lab up greater than that.
So we're not architecturally limited at that point, but that's the point that we've tested
to.
Yeah, yeah.
So have you run any sort of, you know, performance benchmarks like SpecSFS on the product?
I can't say as I've seen any results from you guys.
No, no, no. I'm glad you asked that. I meant to mention that earlier, actually.
So the way that we've approached things, we're pretty pragmatic on a lot of this stuff. The decision to do an NFS implementation for VMware was a pretty careful
one, right? VMware represents a fair bit of new storage spend. So as a company, it's a good initial
market. And NFS on VMware means that you get visibility into the individual VM image files,
which you don't get on top of the block
presentations to VMware.
They stick VMFS, their cluster file system, on top of it, and it obfuscates which things are being accessed,
which means that you can't provide per VM policy.
The other big benefit to taking that decision was from an NFS implementation perspective,
we could
really focus on the data path, right? So we're, you know, we're focused in the initial product
on building a really scalable, high-performance NFS v3, interestingly, server, right? It's v3,
but it's getting a lot of the benefits.
That's all VMware supports.
Yeah, exactly. It's getting a lot of the benefits that you might otherwise get from something like pNFS in terms of being able to scale out, but not have to worry about a lot of that. In the room next to me, the engineering team is actually sitting with SpecSFS in regression.
So we actually run SpecSFS in our regression suite, and the team is beavering away at that number.
So later this year, we should have a fully general NFS implementation.
That's really impressive because it's not the NFS implementation.
It's the file system. Like, VMFS only has to deal with a limited set of uses.
Yes.
When you start saying, you know, I'm going to build this distributed file system and it's going to support POSIX record locks so that we get the ability to run that kind of application across the scale-out system. That starts getting a lot more difficult.
Yeah, it's pretty exciting to see the work taking place.
Well, I guess one of the things there is that the redundancy, the scale-out, and the availability work in terms of the actual data on disk has kind of been solved at that scale-out object store layer.
The client connection management for NFS is being handled at the SDN and NFS protocol presentation there, right?
So that's where we're doing things like using the SDN switch
to monitor traffic demand, moving sessions around, and stuff like that.
And now, like you say, we're building this scalable metadata representation
that takes the NFS namespace and does the exact same thing
that we've done to the other two components with that, right?
That builds the namespace as a distributed tree, that scales it out across all the nodes,
that integrates to move clients that are accessing similar working sets onto the same nodes,
so that we don't have to remote accesses to locks and things like that.
And so, you know, it's underway.
And I think I'm pretty hopeful about where we're going to get to with it later this year.
Yeah, the reason I mentioned SpecSFS is to try to get some idea on the latency of your storage versus, you know, competitive offerings and stuff like that.
And SpecSFS supports a thing called, I think it's overall response time. It's really an average across, you know, from the lowest performance to the highest performance at some specific configuration.
So what would you say the latency of your system is?
I mean, obviously, with SSDs and stuff like that, latency can be fairly small in a block storage configuration,
but that's a different animal for files.
So what are you after there?
Oh, I'm just after a number, Andy.
1.7 or 1,600 or, you know, I don't know.
So, you know, assuming we've got a VM running, and I'll give you that it's a flash hit and we're not going to the spinning disks, what kind of transactional latency is that VM going to see?
So the latencies that we're seeing there from a VM, I think, from the numbers I was looking at when I was doing stuff last week on this, right now we're in the neighborhood of a millisecond, plus or minus half a millisecond, on an unloaded system.
Yeah, okay.
So if your queues are empty, we're getting really, really solid flash performance that factors in the network round trip on that.
The latency numbers that a lot of people talk about on these things always drive me nuts
because with any storage system, as you add load, your queuing increases,
and queuing ends up being a dominant factor in latency.
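(Howard's point can be illustrated with the textbook M/M/1 approximation, a simplifying assumption rather than a model of any particular array: mean response time is the service time divided by one minus the utilization, so latency blows up as the box approaches the load at which its peak IOPS number was measured.)

# Illustrative M/M/1 queueing model: response time = service time / (1 - utilization).
# Real arrays are not M/M/1; treat this only as a shape-of-the-curve sketch.

def response_time_us(service_time_us, utilization):
    assert 0 <= utilization < 1
    return service_time_us / (1.0 - utilization)

for util in (0.10, 0.50, 0.80, 0.95):
    print(f"{util:.0%} busy -> {response_time_us(100, util):.0f} us")
# 10% busy -> 111 us, but 95% busy -> 2000 us for the same 100 us device:
# quoting peak IOPS and idle-system latency together hides this.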
It drives me nuts to see somebody claiming I'm doing a million IOPS and I'm doing, you know, 500 microseconds of latency, right? Those two things.
Well, you better be saying you're doing them both at the same time.
Exactly. The implication is always that they are, and we spend a lot of conversations with people talking about that.
And sometimes they are and sometimes they're not.
And this is the problem with benchmarking.
Unfortunately, you're talking to Ray, who reads the stuff out of spec,
and SPC with a fine-tooth comb,
and me, who runs a test lab, so, you know.
Yeah.
Go ahead.
We know the problems with today's benchmarks.
Yeah.
And, you know, and the problem where you can say,
look, I did a million IOPS,
and look, I did 500 microseconds of latency,
but you didn't say, you know, at the same time.
Yes. All right.
And you didn't say how big the data set was because you're a hybrid
and you were running it against a one gigabyte test file.
Exactly, with two gigabytes of flash.
All right, so SpecSFS does a fairly reasonable job with latency
because it's an overall average from the lowest workload to the peak workload.
SPC, block-oriented services, has a least response time, which is like 10% load.
So it's really, you know, no queuing delay kind of environment.
Okay.
But that's a block storage configuration. SpecSFS has got, at least it provides an average across the whole
spectrum of workloads that
you are benchmarking.
So I prefer that, but
that's the way it is.
Just to clear the air on benchmarking.
I won't mention it again.
Well, I'll look forward to
coming back to you with some
spec numbers later in the year.
I think we'll have some exciting stuff there. I'd be happy to look
at that. All right.
God, we're at about 45 after.
Is there any other
questions, Howard? Yeah, there's just one
more set of things I want to talk about,
which will go pretty quick.
If I remember right, you guys are doing
per VM snapshots and per VM replication?
Yes, that's right.
And you're not currently
reducing the data on the disk, right?
No, we're not doing any data reduction
in the current implementation.
We decided to, like I said,
focus entirely on making sure
that the performance was there
in our GA.
Okay.
We have an internal implementation
of dedupe that'll go out when we push it through the testing paces.
We also have a rather simpler internal implementation of compression that we're looking at putting in ahead of the dedupe work.
The compression one is obviously a lot easier than the dedupe as a feature to build.
Although you're being object-based on the back end.
Makes this all easier.
Helps the dedupe.
Oh, yeah, absolutely.
Yeah, it's a longer discussion about how the dedupe works,
but it's a pretty interesting one that I'd be happy to go through at some point later on.
We'll save that for another day.
I mean, the main fun thing on both of those is that we push it to be asynchronous, right?
So the dedupe implementation takes hints on the live data path. We actually, since we're doing CRCs for data at rest,
we use the CRCs as hints on opportunities to dedupe.
But in both the compression and dedupe case,
those things actually kick in off the write path
and come into play on the demotion path.
And so we're able to use them
without having to sit right on the hot data path
and giving away performance.
I could argue that one either way.
Yeah, as we all could.
Yeah.
All right.
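(A hedged sketch of the hint-based, off-the-write-path dedupe just described. All names are illustrative and this is not Coho's implementation; the key point is that a matching CRC only nominates a candidate, and the bytes are compared before two objects are made to share a block.)

# Illustrative sketch of CRC-hinted, asynchronous dedupe at demotion time.
# A matching CRC is only a hint (CRCs collide), so the bytes are compared
# before two objects share a copy.

import zlib

class DemotionDeduper:
    def __init__(self):
        self.by_crc = {}          # crc32 -> (block_id, data) already stored

    def demote(self, block_id, data):
        crc = zlib.crc32(data)
        hit = self.by_crc.get(crc)
        if hit and hit[1] == data:          # confirm: the hint is not proof
            return ("shared", hit[0])       # point this block at the existing copy
        self.by_crc[crc] = (block_id, data)
        return ("stored", block_id)

dedup = DemotionDeduper()
print(dedup.demote("obj-1/blk-0", b"A" * 4096))   # ('stored', 'obj-1/blk-0')
print(dedup.demote("obj-2/blk-9", b"A" * 4096))   # ('shared', 'obj-1/blk-0')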
Okay, so you mentioned no deduplication or compression into the flash or into the back end.
I assume it meant across the whole system, right?
That's right. None at all right now.
Well, this has been great.
Thank you, Andy, for being on our call.
Next month, we'll talk to another startup slash storage technology person.
Any questions you have, let us know.
That's it for now.
Bye, Howard, and thanks again, Andy.
Until next time.
Thanks, Andy.
So long, Ray.
Until next time.
All right.
See you guys.
Thanks.
Bye.
Bye.
Bye.