Storage Developer Conference - #62: Getting it Right: Testing Storage Arrays The Way They’ll be Used

Episode Date: January 31, 2018

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast Episode 62. Hi everybody. After lunch, this is the most difficult presentation and session of the day, because in about 10 minutes lunch is going to kick in, and as hard as you try, you'll start nodding off. We call that the church nod. So I'll try to keep you entertained.
Starting point is 00:01:02 Yeah, sure. As it says here, I'm Peter Murray and I'm from Virtual Instruments, and I've got a big long title here for the presentation. Virtual Instruments, some of you probably know us a little bit better as SwiftTest, Load DynamiX, or that SwiftTest box. That's where I come from. And for about 10 years now, we've been building and enhancing what I will still say is the best load generator on the planet. Okay, I know I'm doing a little bit of commercial stuff. I should stop doing that and just talk about what I'm here to talk about.
Starting point is 00:01:43 But I want to give you a little background. I've been an SE from the time we started out with this, and I say that I have the best job in the world for a seven-year-old boy: I break stuff, people give me money, and I don't even have to fix it. That's my job, and it's pretty cool. And with what we have, we can actually break almost anything pretty easily. Okay, enough of that. The reason I tee it up that way is that I just want to talk to you about testing. About four years ago, I joined the SNIA Solid State Storage System Technical Working Group. We call it the S4 TWIG because hardly anybody can remember the full name of it.
Starting point is 00:02:24 And the goal of it was to come up with a methodology for testing. Originally, we said we want to test solid-state arrays. Well, it turns out that if you get the right methodology, it doesn't matter what you're testing. If it's storage and you test it that way, it won't penalize anything in particular, and it'll help people show off their features to their best effect when you're really testing. Over the years of doing this, I remember the first time we came here and people said, Swift what? We'll talk to you later. Oh, that looks like a kind of nice little wrapper around
Starting point is 00:03:02 smbtorture. A lot of things were said back then. And over the years, more and more people have gotten to know us, and more and more people are using us. It's pretty much all of the vendors, certainly the tier one vendors, most of the second tier, and now a lot of enterprises. And the reason that they use us is that we help them test their equipment the way it's going to be used. So that's why we're saying: test storage arrays the way they'll be used. And this is the first of two presentations. So I'm talking to you guys as storage vendors, which is probably most of us here. Is that clear?
Starting point is 00:03:35 Any service providers or any standards trolls like me? Okay. There's at least one user here. Really? Oh, yeah, I'm sorry. That's right. We'll convert you, too. Okay.
Starting point is 00:03:50 So I want to talk to just give you some ideas of how we do what we're doing. Now, there is this solid-state storage system technical working group. I got it out again. It's working on coming up with a methodology for testing flash arrays. There's still a lot of discussion. There's still a lot of viewpoints. So I'm going to give you the perspective
Starting point is 00:04:12 that I have from my company. And it's not just from me. We have a professional services group that goes out and does proofs of concept, helps people do testing, runs bake-offs, and things like that. Every week, they're out doing something new. And what I'm going through is kind of a distillation of what we've been doing for a lot of years.
Starting point is 00:04:34 I hope you find it useful. So I'm going to talk, blah, blah, blah. I'll talk about these things. If I don't talk about it, if I miss anything, come back and yell at me later. And as I said, this is where I'm going to talk about the methodology, the way we believe things should be done. Wednesday afternoon, I'm going to do a second presentation about the way that we have implemented our methodology
Starting point is 00:05:00 and what we do and how it works. So solid-state arrays became mainstream when nobody was looking. Yeah, the hyperscalers are still using drives, and more power to them. They've got a working model, and it goes great for them. But in enterprises, solid-state storage is displacing the old disk drives. Spinning rust is used less and less. There are some things that are different about solid-state arrays, and we really need to test differently.
Starting point is 00:05:33 In the past, we had programs like Iometer and FIO and others that did straight I-O testing. They were developed from disk utilities. They were not concerned with anything other than trying to get access and get something back from a drive. And you could test randomly across a drive, or you could test sequentially, and you could do reading and writing together. Well, with solid state, the world kind of changed. We've got new features,
Starting point is 00:06:01 and we really need to test for those features. Anybody ever used to see dedupe and compression on a spinning drive? Well, you could wait a really long time to get your data back and do it, but nobody bothered because it didn't help. Well, that's changed a lot, so we're going to look at some things like that. And we believe that the best way to test now isn't just to use the synthetic read-write 80-20 mixture of synthetic writes and reads and random or sequential and all that, because it doesn't tell you how applications are going to run in a data center. What's going
Starting point is 00:06:39 to give you a better idea of what's going to run in a data center is to emulate an application as closely as you can. Now, that doesn't mean that we're perfect. We've still got a long way to go. It's a process we're going through. We're working like crazy on it right now to get finer and finer grained control of what those applications look like. So I want to talk now about the elements. What's common about an application stream? It has locality. That means there is a certain pattern to where things are written and when they're written, where they're read and when they're read. And this is true even though people talk about the I/O blender effect, saying that especially when you're working with a lot of VMs, everything looks random.
Starting point is 00:07:27 I'm going to show you a chart, actually an HTML page, that shows that's not true, and that what I'm going to show you is, we found, more honestly what's out in the world. So we need to understand locality. Where is stuff being written?
Starting point is 00:07:42 How is it being grouped together where it's written? Access patterns. If you go in and look at performance stats on an array, especially older arrays, you'll look at it and see an average block size of 32, 33K. I still haven't seen a 33K block in my life. But it's a mixture. So they take all of that and crunch a number,
Starting point is 00:08:05 and 33 is what they give you. Well, it's made up, actually, of multiple different sizes. So we want to test with those sizes. We want to test in realistic ways. Uniform random write. I'm going to hit this again, and so I'll be repeating myself. But uniform random write is the way some vendors say to test solid-state arrays because it makes their array look really good.
Starting point is 00:08:27 Vendors will say test with this certain IOPS pattern using this particular load generator, but they don't use bursts, and it distorts the performance. They'll say test without data content. Who needs that? Well, it depends. You know, not every array is good for every single application. If you've got a highly compressible application, then you probably want to test with compression and dedupe
Starting point is 00:08:53 to find out how well it does, because that's going to lower the cost of the array that you get. And lastly, testing with bursts. Subsecond bursts are present everywhere. Anybody ever seen a computer that sends traffic out at 100 IOPS? No, it dumps the file. It gets rid of it. So we're going to talk more about that.
Starting point is 00:09:13 So we want to understand what's going to happen in production. This is from my marketing guy. So for those of you who haven't figured it out yet, you don't get as many write cycles from solid state storage. Okay, we get it. So they try it. A friend of mine at one of the vendors says it's best to call these things write avoidance arrays. So they coalesce a bunch of writes up into a memory page, and then they lay them down.
Starting point is 00:09:43 And then they watch that, because where that information was coming from when it was laid down is likely to be followed by other stuff in that same region. That's the way an application operates. If you think of adding to a customer file, it's pretty sequential. You're adding, adding, adding, all day long. So in many ways, those bands show up and you see them.
Starting point is 00:10:06 So the write time really is implementation dependent. If you're running JBOD, there's not so much. But if you're looking at most of the modern arrays now, they are doing a lot of sophisticated computing and math to write things as efficiently as possible, to avoid overwrites, to speed access when somebody does read it back, if they ever do. And we want to make sure that that gets done right.
Starting point is 00:10:35 Oh, I said here, reading is free. Well, reading doesn't wear the solid-state memory. It doesn't cause any problems. Writing does. So it's a little bit different, but reading still introduces problems: you can get blocked. If an array is trying to get to some location and it's being written to at that moment, it either has to go to RAID and get to another copy of it, or it has to do some sort of work to get to it, to keep things going as fast as they need to be to maintain the overall latency.
Starting point is 00:11:06 In many cases, I won't say in most cases, data reduction, which is compression and dedupe and pattern reduction, requires that post-processing occurs; there's just not enough time. And it's going to be interesting to see what happens with NVMe, because at the much higher data rates, we're going to have to have something that can just scream, slamming things down and making less space out of it. And that may be pushed to the side. That may all be post-processing, as we'll see.
Starting point is 00:11:39 That's still unfolding. So I say it typically doesn't affect write speed, because they'll just go through initially when they're taking things in, do what they can to that data, and save the rest for later, when they can come back and look through the array. Okay, preconditioning is a religious topic. That's for my friend in the back. There's a lot of different opinions out there, everything from, oh, you know, overwrite it twice with random data, go back and splatter the data for 20 seconds,
Starting point is 00:12:10 and then you're done, you've done your preconditioning, and it's not going to change very much. There's other people that say you need to do this for days. And the reason you're doing that is we're not writing to disk drives anymore. We're writing to metadata. When we do a write, the array looks at that block that comes in. First it sees if it's a duplicate of something that's come before. If it's not, great. Then it looks at it and sees if it can smash it to make
Starting point is 00:12:37 it smaller. And if it can, it writes the reduced version. If it can't, it writes it as is. And that's how the writing occurs. And if that same pattern is seen later, you don't have to write. You just make a reference to it. And the smaller the chunk you use, and it goes down to about 512 bytes, the smallest I've seen so far, if you go down to that smaller chunk,
Starting point is 00:12:59 then it's more likely that you'll see duplicates. It's just, no, statistics say that that's the case. So there's a whole bunch of tradeoffs that vendors use for this. But this can affect the speed of writing. Preconditioning can affect all of this. And I say more Wednesday. Come hear me Wednesday. Question?
Starting point is 00:13:21 So what other forms of preconditioning are there, such as simulated fragmentation? Yeah, I don't want to spend a whole lot of time on that. I'll spend more, I think, when I talk on Wednesday. But the latest argument is either you do this for a long time, or you try to do it across block boundaries and do everything you can to dirty it up and mess it up. But all you're really trying to do is come up with a bunch of checksums and then look for those checksums.
Starting point is 00:13:53 And if they exist, you don't store again. Some vendors can do dedupe across block boundaries as well. So something gets appended and data moves out; they can still go find that new chunk and just save a reference to it. It depends on the vendor, and the algorithms are getting more sophisticated all the time. Okay, so we're trying to emulate real applications. Duh, that's what I already said. Avoid uniform random write distribution, because that's a particular vendor tactic for selling arrays. And it works very well if you have a lot of reserve flash. Now, all the vendors have some reserve flash:
Starting point is 00:14:30 what's presented to you is less than what's there. And the more that you have in reserve, the less you have to worry about what's happening in the background. And that really matters most for garbage collection. When an array has to stop to go blank a particular memory page, and it's not going to be a block anymore, it'll be that memory page, it takes time. And searching for that can be very painful.
Starting point is 00:14:55 And vendors that use less reserve flash have a harder time getting that done. And the array will reflect that by slowing down, especially if you're doing uniform random writes. Can anybody tell me an application that uniformly randomly writes and never reads? Pardon? Any logging process, like Exchange, web servers, you're never going to go look at it. Well, okay, and if you're going to say Exchange and web servers, I'll make an argument back: you're not seeing it at the rates that are going to choke a solid-state array.
Starting point is 00:15:32 But point taken. Okay, it's unlikely. Yeah, that's right. They are sequential. Yeah, there's sequentiality to it. So that uniform random pattern that comes from FIO and comes from Iometer was relevant when we were doing disk drives. It's just less relevant now.
Starting point is 00:15:57 And that's not what your typical application looks like. Just wait until I get to that slide and show you; I'm going to be so happy, and if it doesn't work, I'm going to be really irritated and have to take you all outside or something. Okay, so, yeah, I know, and you're getting what you pay for today, I'll tell you. So don't do uniform random writes. Use multiple block sizes. Duh. Now, some people argue back and say, oh, my Oracle application is all 8K writes every time, that's all it does, and that's the end of the story. Well, sometimes. But one thing about flash arrays, what are people doing?
Starting point is 00:16:35 They're combining applications. They're not running just one application on a stand with multiple racks anymore. They're combining these together, and when you get that, that changes it. More on that later. So we also want to make sure we test in the presence of the other stuff that goes on: all the enterprise features, backups, snapshots, replication, periodic access like end-of-month processing and end-of-quarter processing, whatever that might be, which has vastly different traffic patterns. Okay, so what do we care about?
Starting point is 00:17:06 Write/read ratios, random/sequential access ratios, the access pattern drift. This is, you know, why am I losing it here? This is where we talk about clustering, where we talk about, oh, never mind. I'll come back to it later. Block sizes, alternate paths. When you're writing to an array, especially if you're using Fibre Channel with MPIO, you've got stuff coming in on multiple paths into the array, so you want to test in a way that looks more
Starting point is 00:17:36 like that, rather than just trying to get one endpoint and read everything that came in there, because you may only be getting half of the story. There may be traffic coming in from multiple interfaces. Okay, so locality is this idea that stuff tends to get clustered when it's written during a day. And this is even true with blended applications. The I/O blender effect that has famously been talked about with virtualized applications isn't entirely true. There's still some locality to this. So we're defining: where something is written or read, that's
Starting point is 00:18:21 spatial locality; when it's written or read is temporal locality. Both of those are present. And hotspots represent this locality. With hotspots, you tend to be writing in a narrow area, and that tends to drift over time as you're writing, okay? So if you don't test with locality, you're not stressing the arrays the way they're going to be used in production. And again, if it's just JBOD, that's going to have less effect than if you've got a controller with sophisticated algorithms that are deciding what to lay down and when and how and all that. With a lot of arrays today, you don't write to one, zero, offset 100. You write to a logical representation of that. And where that gets put on the array
Starting point is 00:19:05 may be anywhere. That's because of wear leveling. The whole idea is to make that array last as long as it can. So you don't want to keep writing to the same location and burn it out. Wear leveling takes care of that. So all of that is abstracted when you're writing to the array. You're writing to a logical space. It's writing to the physical space on the back end. So I've got a demonstration. It's showing write locality for a LUN with real-life
Starting point is 00:19:34 Oracle ASM access. The horizontal axis is going to be the LUN broken up into 100 parts. So if something comes in at the left, that's towards the lower end of the address range. If it comes in at the right, it's towards the higher end of the address range. And each refresh of what I'm about to show you covers a one-minute interval. You're not going to be able to get this,
Starting point is 00:19:56 unfortunately. I wasn't able to figure a way to save it. But if you come to me, we can somehow get you. I'll send you the location or something. Because it is pretty interesting to see this. So the height is just normalized for each refresh. It's not trying to show relative intensity. It's intensity rather to show that the hotspots are there and that they move over time. So my friend Lou Leidigson at Pure Storage is the guy who got this for me. And now is the moment of truth.
Starting point is 00:20:25 And we'll see if my access to the Hyatt network worked. Okay, so now I'm going to have to come up here and dance and show you what... There it is, there it is. No dancing. And you're not seeing it. Let me see if I can do this. Yeah, I'll just let it run for a little while. This is why you go and try everything before you do an actual presentation. But you can see what's going on there.
Starting point is 00:21:17 This is not random. There is a definite pattern to where everything is being written, and this pattern changes over time. Sometimes there's a lot of clustering, sometimes there's less, but this is one of many views that Lou has done, and they all reflect this. They have not been able to find applications that don't do this. Yes, sir? Yeah, yeah, sure. There. So it's just a representation of the LUN broken up into a thousand
Starting point is 00:22:03 different sections, okay? And it gives us a little bit more to see down here. So this is happening. This is happening all the time. We just hadn't been able to see this yet. One thing that I keep... I don't want to sound like a commercial for these guys, but they've got a way of taking a standby controller and just monitoring everything that's coming in.
Starting point is 00:22:24 And they were able from that to generate these graphs that show that locality as traffic is coming in. Pretty cool stuff. So it's there; I proved it, if you believe it. Is that the virtual view or the hardware view? That's one LUN. And actually, yes, it is the virtual view, the LBA view. Not the physical, because the way it goes on the back end, who knows. That's all handled for you. The algorithms take care of that for their wear leveling.
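To test with that kind of behavior rather than uniform random, you can synthesize offsets that cluster in a narrow band which slowly drifts across the LUN. Here's a minimal sketch of that idea in Python; the band width, drift rate, and the 80/20 split between hot and background I/O are made-up illustrative parameters, not numbers from the talk or from any product.

```python
import random

def drifting_hotspot_offsets(lun_blocks, n_ios, hot_frac=0.8, band_frac=0.05,
                             drift_per_io=1e-4, seed=1):
    """Yield block offsets where most I/Os land in a narrow 'hot' band that
    drifts across the LUN over time; the rest are spread uniformly."""
    rng = random.Random(seed)
    band = max(1, int(lun_blocks * band_frac))   # width of the hot band in blocks
    centre = rng.random()                        # band centre as a fraction of the LUN
    for _ in range(n_ios):
        if rng.random() < hot_frac:
            lo = int(centre * (lun_blocks - band))
            yield lo + rng.randrange(band)       # clustered write inside the hot band
        else:
            yield rng.randrange(lun_blocks)      # occasional write elsewhere
        centre = (centre + drift_per_io) % 1.0   # the hotspot slowly moves, as in the demo
```

Bucket those offsets into, say, 100 or 1,000 regions per minute of simulated time and you get a histogram shaped much like the refreshing chart described above.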
Starting point is 00:23:06 It worked. I'm all pleased. Okay, good, I'm back on it. Okay, we just talked about this: access is not uniformly random. Hotspots are accessed more frequently than other areas during a defined time period, and it varies according to what you're doing. We have some examples here, like indexes, temp files, logs, journals, things like that. And when we're testing, we want to emulate that there are hotspots, and emulate that drift as well. Here's one example that we saw of where things were written. 1% of all access regions got 35% of the IOs. That's quite a bit. And it varies according to each of those; as you get up
Starting point is 00:23:59 to higher numbers, those regions get less of the IOs over time. And the FIO developer, as it says here, says that this is a simplified example, and that it's actually even a little bit more complex than this. But hotspots are present, and skew is present. So, yeah, 25 to 35K is the average reported block size. We look at vendor performance reports, vendor performance processes.
Starting point is 00:24:32 Virtually every vendor can produce a CSV file that talks about performance during a time period. For any of you guys who work for vendors that are willing to share with us, we beg you, please do, because we can take the output of that CSV file and turn it into input for a workload model. I'll talk a lot more about this on Wednesday, but the idea is that we take this information that's reported from the array. Did Eden leave? Eden left. Darn it.
Starting point is 00:25:09 Darn it. Eden Kim's working on another model that works by using a small program in an initiator or target, if I understand it right. Is that correct? That is able to record what happens on the way out or going through one of the interfaces. And he gets pretty good resolution out of it. The problem that we have, really, is getting adequate resolution for this. If you get a five-second interval,
Starting point is 00:25:36 you could have a horrible spike for a couple of seconds that really impacted performance, but you average it over five minutes and it looks like nothing. So the better we can get this, the better. And without being too much of a plug, Virtual Instruments is working on that. Now, the VMAX team has got something called Trace SRT that allows them to get really fine time-grain intervals: the command, what it was, how long it lasted, and things like that, which allow you to recreate
Starting point is 00:26:08 a really accurate model of what happened during that time. And that varies all over the map, from some vendors that give you these five-minute reporting intervals, which is not very useful at all. It still may be a little bit better than doing things completely the old way of testing, but the finer the time-grain interval, the better, and we're working towards getting below one second. It's a limitation of where we are today with computing: trying to move all that data and at the same time say, oh, by
Starting point is 00:26:40 the way, we did all this stuff. It's pretty difficult to do. But we're going in that direction. So, 25 to 35K: applications often don't use a uniform block size across an entire application. We want representative block sizes. So you set up percentages of different block sizes and you mix those in. Now, we are doing a statistical model. This is not an exact replay of an application, because if we had the desire to do that, we'd have to have a second data center to collect all the data to do it. Because, you know, a PCAP at 10 gig, much less 40 gig or 100 gig, is just huge amounts of data. So we summarize, distill that down, and we're using that.
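As a toy illustration of that kind of block-size mixture, here's a sketch in Python. The particular sizes and percentages are invented for the example; the point is only that the weighted mix averages out to roughly the 33K an array's summary stats might report, even though no single I/O is ever 33K.

```python
import random

# Hypothetical mixture of I/O sizes and their weights (not measured data).
BLOCK_SIZE_MIX = [(4096, 0.25), (8192, 0.40), (65536, 0.25), (131072, 0.10)]

def sample_block_size(rng):
    """Draw one I/O size from the weighted mixture above."""
    sizes, weights = zip(*BLOCK_SIZE_MIX)
    return rng.choices(sizes, weights=weights)[0]

rng = random.Random(3)
sizes = [sample_block_size(rng) for _ in range(100_000)]
print(sum(sizes) / len(sizes))   # ~33K average, yet no 33K I/O was ever issued
```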
Starting point is 00:27:20 But we do a synthetic model of it. So burst. Here we go. Craig is in the back of the room to correct me if I blow this one again. So here's this burst that happened during a particular time period. The host performance, they see this going up. During this time, the IOPS didn't change. Latency didn't really change. But there was this huge burst that happened. So it comes and it goes. These are real. Computers don't send 100 IOP files. They get rid of them as quickly as they can. When that happens, it tends to load up
Starting point is 00:28:05 the CPU, it loads up the queues, it can mess up the processor memory, the buffers, and the queues, and can dramatically lower performance. There's a certain vendor that's saying recently that we should all increase our queue depths. And we argue just the opposite, that the lower the queue depth setting, the more likely you are to avoid network congestion and you're going to get more work done during a given time. So, yeah, 50 milliseconds. Okay, yeah. Is this here?
Starting point is 00:28:40 No, no, it's not. Those heavy burst times are actively impacting performance while they occur. The more things are sitting waiting to get done, the more it affects everybody going to that array. So for reads or writes, which one is it? Okay. So we see here that we have minimum pending exchanges.
Starting point is 00:29:05 There was a lot of times where there was just nothing happening over this period of time. It's almost an hour. We'll see the top average pending. There was actually, during a whole lot of the time, a lot that actually was waiting to get done, which means somebody's getting backed up. And we see the max pending up here,
Starting point is 00:29:24 our average about 40 and lowest about 20. That was happening over that time. Max, we see that it was well over 100. So this is what's happening on the wire. And this is going to be what really can hurt an application. So let's look at reads. If this was just a constant number of IOPS, we're getting, what, 40 to 80k IOPS,
Starting point is 00:29:59 we would see a much more even response time. What we're seeing here is a range of 200 to 400 over a 50 millisecond barrier. So the bursts were 40 times as high as the average. What this means is that even in a second, stuff hits the array, it gets backed up a little bit, and then it drains and keeps on going. And there's periods where nothing else is going on, but where those peak times occur,
Starting point is 00:30:30 it's actually seriously impacting the performance of that array. And that gets missed a lot. IOPS, we're seeing it's only about 2K IOPS here, but there was significant amounts of delay there. Okay, so writes, real writes. Here's an example. We're looking at IOPS. IOPS were almost nothing there except for one big peak during a brief time.
Starting point is 00:31:01 But when we look at the 50 millisecond peak, we see that it's even worse for writes than what we saw for reads. So here's the reads. Oh, no. There's the reads, there's the writes, and those bursts were slowing us down. Okay, so there's a range of these as well, and that's what this is showing.
Starting point is 00:31:30 The peak, the bottom, the mean. So a couple of these guys were really working, getting greatly increased latency. It was less bad over time. But the IOPS here were much more stable. And you're going to see that in a real application. Sir? Ma'am? So where is this being observed at?
Starting point is 00:31:59 This is being observed off of a wire using a tap, using Virtual Instruments' VirtualWisdom application. This is our data. Yeah. The first image that we showed you was an actual customer issue where the only observable problem in their environment, where the latency more than doubled, was the fact that they were having these write bursts.
Starting point is 00:32:24 Yeah, it's the only thing that changed. We also went through and analyzed our customer data, and this is common in all of the data that I've looked at: most of the time, within every second there's a short period where most of the I/O for that entire second occurs. And in some cases all of the writes for the entire second occur within a fraction of that second. So it gives you an idea that things are not a flat line at the sub-one-second level. They're all over the map, even if you thought it was steady, high IOPS. And we've had multiple vendors come back to us and say, you suck.
Starting point is 00:33:16 You give lower performance than Iometer does. Or, I'm sorry, VDBench. Because they were running VDBench without bursts. And yeah, we do. When you make the traffic more real, you don't get as much data through. So writes, reads, the range of response times. Now, Load DynamiX, our load generator, by default breaks up transmissions and bursts them in less than a second. That's
Starting point is 00:33:45 the way that the product was designed and the way that it works. So you'll see here we have a second broken up into ten pieces, and those bursts are placed in there. And when we ran tests against arrays, performance was slightly lower than what they would get running something like VDBench without any bursting. And this is what those arrays are going to see in production, not that steady flat line that you get out of a storage test tool.
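A rough sketch of what that sub-second bursting means, in Python. This isn't the Load DynamiX algorithm, just an illustration: one second's worth of I/Os is packed into a couple of 100 ms slices instead of being metered out evenly, so the average IOPS stays the same while the instantaneous queue pressure goes way up.

```python
import random

def burst_schedule(iops_per_sec, slices=10, busy_slices=2, seed=5):
    """Return the I/O count per 100 ms slice, packing the whole second's
    load into a few busy slices rather than spreading it evenly."""
    rng = random.Random(seed)
    hot = rng.sample(range(slices), busy_slices)   # which slices carry the burst
    counts = [0] * slices
    for _ in range(iops_per_sec):
        counts[rng.choice(hot)] += 1
    return counts

print(burst_schedule(25_000))
# Something like [0, 0, 12493, 0, 12507, 0, 0, 0, 0, 0]: the same 25K IOPS average,
# but the array sees far higher instantaneous pressure while a hot slice is active.
```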
Starting point is 00:34:24 Oh, what am I trying to read here? Oh, action. It's 25K IOPS here, and we're getting a response time of about 40K microseconds. 40 milliseconds, sorry. 40 milliseconds. That's with the burst. 40 milliseconds is pretty bad. Now, if you take this down to 80%, take that rate before and drop it down by 20%,
Starting point is 00:34:50 we see that we actually were getting 20k IOPS through. We're getting down to 6 milliseconds response time. This is all what we're talking about, sizing the array for what it's going to really see in production. So that's running at 80% of maximum, running without bursts, which is what a lot of guys report as their numbers. They'll take and just run this straight and meter it out. And we see that we're showing 15,000 or 25,000 instead of the 20,000 that we saw at the last screen. But they're showing it at 1.2 milliseconds latency,
Starting point is 00:35:28 and this is what they publish. If you're sitting in the data center running applications, this is not what those arrays are going to see. So these are the hero numbers, and testing with those bursts, testing with that realism, is going to give you a lot better idea of what you're going to get when you really run the array. So here's the difference here. I love this one.
Starting point is 00:35:49 So here's the difference. I love this one. With bursts at 80% of maximum, we saw they were getting 20K IOPS, throughput of about 1250 megabytes per second, and six and a half milliseconds latency, and this is the real world. With bursts at full rate, 25K IOPS jumped the throughput up slightly but hugely increased the latency. Unacceptable. And without the bursts, we're seeing 25K just like we were seeing with bursts, but we're seeing 1.2 millisecond response time, which is a myth, but which is oftentimes published. So bursts matter. Okay, moving on. Any questions about that before I go on? Okay, data content.
Starting point is 00:36:31 Most of the arrays are using data reduction: deduplication, compression, and pattern reduction. Like I said, if they see a block of all zeros, they make a reference and don't store anything. It's a metadata reference. If something comes in, they checksum it. If they haven't seen it before, they try to reduce it. If it gets bigger or it doesn't reduce, they save it as is.
Starting point is 00:36:56 If it does reduce, they save it down. And then they'll have a varying number according to the array of times they can repeat that before they have to make another save of that same piece of data. Coming back out the other way, because we have solid state memory now and we have faster processors, it's possible to what they call rehydrate or re-expand that data and get it out to the sender much more quickly. So all of those are there. Not all arrays do compression. Not all of them do dedupe. Not all of them do pattern reduction. So you want to know what's going to work?
Starting point is 00:37:30 Test for it. And it's not just to say, oh, it got this dedupe ratio or this compression ratio. That doesn't save a customer a lot of money. The more efficiently data can be deduplicated and compressed, especially at higher rates, the less you need of that stuff. And for customers, that's an important consideration. Okay, we all want to sell more as vendors, but we also want to be considerate of this as well.
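Putting that decision flow into code makes it easier to see what a test has to exercise. Here's a minimal, hypothetical model in Python; the class name, the SHA-256 fingerprint, and the zlib compression are stand-ins for whatever a real array actually does, which varies by vendor.

```python
import hashlib
import zlib

class ReductionModel:
    """Toy model of an inline dedupe + compress write path (illustrative only)."""

    def __init__(self):
        self.stored = {}          # fingerprint -> stored bytes (maybe compressed)
        self.logical = 0
        self.physical = 0

    def write(self, block: bytes) -> None:
        self.logical += len(block)
        key = hashlib.sha256(block).hexdigest()
        if key in self.stored:                       # seen before: metadata reference only
            return
        reduced = zlib.compress(block)
        keep = reduced if len(reduced) < len(block) else block   # save whichever is smaller
        self.stored[key] = keep
        self.physical += len(keep)

    def ratio(self) -> float:
        return self.logical / max(self.physical, 1)  # overall data reduction ratio
```

Feed it nothing but unique random blocks and the ratio stays near 1:1; feed it the kind of repeating, compressible stream described next and the ratio climbs, which is exactly the difference a test with realistic data content is meant to expose.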
Starting point is 00:37:57 So, we create data content patterns and we turn those into streams, which essentially take blocks repeating in various amounts. A block with a certain checksum may be repeated 10,000 times during a test. A different one may not be repeated at all; it may just be random. But the combination of them gives a stream that looks like an application. I talked about the repeating and non-repeating patterns. Pattern lengths vary.
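A sketch of generating that kind of stream, assuming you want to dial in roughly how much of it is duplicate, zero-filled, or unique data. The fractions and pool size are made-up knobs for illustration, not values from the talk.

```python
import os
import random

def content_stream(n_blocks, block_size=8192, dup_frac=0.5, zero_frac=0.1,
                   pool_size=32, seed=11):
    """Yield block payloads: some repeat from a small pool (dedupe hits),
    some are all zeros (pattern reduction), the rest are unique random data."""
    rng = random.Random(seed)
    pool = [os.urandom(block_size) for _ in range(pool_size)]  # blocks that recur
    for _ in range(n_blocks):
        r = rng.random()
        if r < zero_frac:
            yield bytes(block_size)        # zero block: the array just references it
        elif r < zero_frac + dup_frac:
            yield rng.choice(pool)         # duplicate of something seen earlier
        else:
            yield os.urandom(block_size)   # unique and effectively incompressible
```

Pushing a stream like this through the ReductionModel sketch above gives a rough expected reduction ratio to compare against what an array under test reports.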
Starting point is 00:38:35 As I said, some arrays can detect patterns across block boundaries, so they're nominally able to get a little bit more compression out. Next is thread count and queue depth. So when you're testing, first we'll say, okay, I want to understand a technology. Then I choose that technology, and I want to know what configuration of that technology is going to work best. And when I'm doing that testing, I want to be able to test using different queue depths to find out what gives me the best performance, and combine that with different thread
Starting point is 00:39:10 counts to find out what capacity, what maximum number of users and such, is going to work for a given size of an array. Maybe I have to go up to a second controller, or scale out to multiple nodes, to essentially maintain the latency that I want for that application over time. So we find the max IOPS an array can do per thread, per queue depth, and then a total for a given number of threads and queue depths. And then we increase this to find out where the limits are. You shouldn't really be testing just for today.
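In outline, that exploration is just a sweep over the two knobs, keeping the best result that still meets the latency target. A hedged sketch, where `run_workload` stands in for whatever harness you actually have and is assumed to return measured IOPS and latency:

```python
import itertools

def sweep(run_workload, thread_counts=(1, 4, 16, 64),
          queue_depths=(1, 2, 4, 8, 16, 32), latency_target_ms=2.0):
    """Try each (threads, queue depth) pair; keep the highest-IOPS result
    whose latency still meets the target."""
    best = None
    for threads, qd in itertools.product(thread_counts, queue_depths):
        iops, latency_ms = run_workload(threads=threads, queue_depth=qd)
        if latency_ms <= latency_target_ms and (best is None or iops > best["iops"]):
            best = {"iops": iops, "threads": threads, "queue_depth": qd,
                    "latency_ms": latency_ms}
    return best
```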
Starting point is 00:39:44 You want to test what an application can do today, and then the lease period is going to be somewhere between three and seven years, something in there, and they're going to have to refresh at the end of that time. So you want to make sure at the end of that time the array is still performing. This is important, even going back into testing with disk-based arrays. There was a company we worked with that had installed, they had five stands, and they put the first one out when their traffic was relatively low, worked fine.
Starting point is 00:40:13 It was a limitation that they didn't understand in that particular array, and it caused a very public outage. So, oops. That's why you want to test up to what your expected maximums are going to be. Next: because we're talking about solid-state, in many cases, not all, and I'm not trying to claim that everybody's going to be throwing 50 different
Starting point is 00:40:40 applications on one array, but we're seeing more of it coming, right? We're seeing more of that happening. So we want to test those applications together. Find out the performance of an individual app and then start combining them. And again, crank up the numbers to find out what the maximum performance is going to be. And last night I was talking to my friend SW, and he said, you want to be careful about this. I was saying, ah, you should just test them all at peak at the same time. Well, that's not going to happen. The apps aren't necessarily all going to hit peak performance at the same time, so you want to be sensitive to that.
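One way to sanity-check a combined test before running it is to add up the per-app load timelines your baselining produced and look at the worst minute, as in this small sketch; the timelines, offsets, and the idea of checking only the additive IOPS peak are simplifications for illustration.

```python
def combined_peak(app_timelines, start_offsets):
    """Sum per-minute IOPS timelines for several apps, each shifted by its
    start offset, and return the worst combined minute."""
    length = max(len(t) + o for t, o in zip(app_timelines, start_offsets))
    total = [0] * length
    for timeline, offset in zip(app_timelines, start_offsets):
        for minute, iops in enumerate(timeline):
            total[minute + offset] += iops
    peak = max(total)
    return peak, total.index(peak)   # peak combined IOPS and when it occurs

# Hypothetical example: an OLTP spike at minute 60 and a backup starting at minute 90
oltp = [5000] * 60 + [20000] * 10 + [5000] * 50
backup = [0] * 90 + [15000] * 30
print(combined_peak([oltp, backup], [0, 0]))   # the two peaks don't stack in this case
```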
Starting point is 00:41:15 The apps aren't necessarily all going to hit peak performance at the same time, so you want to be sensitive to that. Look at the patterns, you do some baselining, understand how the applications are performing, and then put them together in ways that's going to make sense over a test, say, where you emulate a day or something like that. And make sure that the additive peaks don't exceed your latency targets. You want to stay within that. So we want to emulate each application, combine them, and then test, of course, with the other processes that
Starting point is 00:41:49 are going to go on, snapshots, replications, backups, periodic processing, and all that. And take care to make sure that the peak times are represented, not just make this one big additive jump on top of everything. As I say here, this is a work in progress. We're working on this. There's still a lot of work to be done,
Starting point is 00:42:11 but it's where we can do a better job of helping to represent these applications running. But this is pretty much where we are today. And we need ways of speeding this up. Somebody doesn't have usually the luxury of spending, oh, we'll take a month to test this array and a month to test the next one. You maybe get a week or two to crank it out. And so you have to make it more efficient.
Starting point is 00:42:32 We're early on in this, and it's getting better. Okay. I already said this. You want to make sure that you're sized properly, so crank up the numbers if you get an application running at a certain rate today. I think my assertion that it's okay to make the assumption that as traffic rises, the components of the application are still going to work approximately the same. I may be wrong, but it's the best we know today. So try to take today's rates and boost them up
Starting point is 00:43:02 and just make sure that the latency targets are met, that the system doesn't fall over on its side. So this all used to be black art. It's not so much anymore. It's still not clear, but it's clearer than it was. There is no such thing as a perfect synthetic workload. The perfect workload is the workload that's running in the data center on the equipment it's running on now being moved over. But that's not a very easy thing to do to try to test five different arrays by running your current application on them. So synthetic is what we've got, and it works pretty darned well.
Starting point is 00:43:40 Customers can see how closely their workload matches a model. We do this all the time. They'll get performance stats from their application. We will model it and run it, and we find that the peaks, valleys, the temporal stuff stays almost dead on with that. So we can emulate this now. The days of not being able to do it are gone.
Starting point is 00:44:01 We can do better than we used to do. And I think this is changing where we're headed. So thanks. This is my ad. Please come hear the second presentation, which I'm going to do Wednesday, where I'm going to talk about an implementation that we've done. And I'm trying not to make this too much of a commercial. We've just been working on this for a long time, and we think we have something valid to let you know about. So Wednesday at 4:05, come on down. This is going to be posted online, I think, with the latest version of the last screen that actually says Virtual Instruments instead of your name here.
Starting point is 00:44:37 Come and see us. Any questions? Sure. Could you touch on garbage collection in flash devices? I call it the Super Bowl problem: your system is pretty idle until the one time you really have load and the counters start climbing, and it may kick in the garbage collection at the one time you don't want it to. Right, when everything gets delayed. Yes.
Starting point is 00:45:03 Yeah. Consistently? Some vendors are more vulnerable to it than others. It's a matter of sizing. Okay. In general, I think it's safe for me to say that the less reserve flash you have, the more prone you would be to that.
Starting point is 00:45:22 And if you're operating in a JBOD environment where you don't have metadata, as I said, we're writing, essentially, we're writing logically, but that turns into metadata and what gets written physically is something completely different. There's a lot of work going on in that direction. But the thing is, NVMe is gonna blow this all up.
Starting point is 00:45:50 I'm looking really forward to seeing what happens, but there's gonna be a lot less time to do stuff with data as it's going through, because it's gonna be coming through at higher and higher rates. I'm gonna use this. I was at one of those customer love-ins a couple months ago, and their comment was, by 2020,
Starting point is 00:46:10 we're going to be generating five zettabytes a day of CCTV, of video. It's got to be stored somewhere. The Internet's going to have a capacity of about 1.5 zettabytes. So everything to the cloud seems a pretty big stretch to me. Now, there are techniques that we can use to compress and do all kinds of things like that, but if it's something that the lawyers have got to see, it's still a lot, and if you're monitoring at a busy time, it's going to be an issue. I just got the hook, so if you'd like to talk some more, come check me out afterwards. And also on
Starting point is 00:46:48 Wednesday, come by if you've got the time. I'd love to talk to you a little bit more. So thanks for now. Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
