Grey Beards on Systems - 55: GreyBeards storage and system yearend review with Ray & Howard
Episode Date: December 30, 2017. In this episode, the Greybeards discuss the year in systems and storage. This year we kick off the discussion with a long-running IT trend which has taken off over the last couple of years. That is, recently the industry has taken to buying pre-built appliances rather than building them from the ground up.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks here.
Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get
Greybeards storage and system bloggers to talk with storage and system vendors to
discuss upcoming products, technology, and trends affecting the data center today.
This is our 55th episode of Greybeards on Storage, which was recorded on December 21st,
2017.
It's just Howard and I today, and it's our annual year-end podcast. So Howard, what's the news? Well, I think the
biggest thing is actually a long-term trend that in the never-ending battle between build and buy,
buy is definitely winning. Now, part of it is because I was a consultant ten years ago,
but I think in general it's still true
that ten years ago we built things.
We bought storage and networking and servers
and we cobbled them together ourselves.
Installed the software
and did all the software configuration
and all that stuff.
It was a lot more of a hands-on environment in the old days.
Yeah.
Well, today, you know, whether it's converged infrastructure or hyper-converged infrastructure or kind of the ultimate build – you know, the ultimate buy is not even to buy but to rent and go, I'll just put it in the public cloud.
The cloud.
Yeah.
Yeah. Yeah.
It used to be that every organization's applications were unique little snowflakes.
And they weren't even pets.
They were Paris Hilton's chihuahua.
And then we shifted to where they were pets.
And, you know, I've got dogs and pets are great.
But today – No, I heard something called cattle.
I started to call applications cattle.
And, well, it's actually servers that should be cattle.
Yeah, I understand.
It's a pets versus cattle discussion in the container space.
But, yeah, go ahead.
But that's, you know, that's a different conversation because that's about software development
models and operations models. And, right. Right.
And I think that I don't buy it for the enterprise.
I mean, I buy that the physical servers should be cattle.
And that was one of the great things that, that vSphere brought us.
Right. Instead of having an Exchange cluster of
two servers and a SQL Server cluster of two servers and three domain controllers that I
keep separated, you know, in separate racks connected to separate switches,
with all of those things virtualized, I just have hosts, and a host is a host is a host.
And the real key to the pets versus cattle argument or discussion is that when your pet gets sick, you panic and go to the vet. And when your cow gets sick and falls over amongst the thousand
other cows, you just leave it there. And in web scale operations, that can be true of
all sorts of microservices that, you know, I've got 57 shards of MySQL. And if one of
those shards goes down, then I still have 56 shards and I'm okay.
No big problem, yeah.
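(Editor's sketch, not from the episode: a minimal Python illustration of the "cattle" idea Howard describes. Requests hash across 57 MySQL shards; if one shard falls over, the other 56 keep serving their users. The hostnames and the DOWN set are hypothetical.)

```python
import hashlib

SHARDS = [f"mysql-shard-{i:02d}.example.internal" for i in range(57)]
DOWN = {"mysql-shard-13.example.internal"}   # the cow that fell over

def shard_for(user_id: str) -> str:
    """Each user lives on exactly one shard, picked by hash."""
    digest = hashlib.sha1(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def handle_request(user_id: str) -> str:
    shard = shard_for(user_id)
    if shard in DOWN:
        # Roughly 1/57th of users are affected; nobody rushes to the vet.
        raise RuntimeError(f"{shard} is down, retry after rebuild")
    return f"served from {shard}"

for uid in ("user-42", "user-7", "user-1001"):
    try:
        print(handle_request(uid))
    except RuntimeError as err:
        print(err)
```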
But that implies that you're doing
what the hyperscalers do, what Facebook does.
And you're managing nine applications
across billions of users.
As opposed to the corporate data centers I've worked in,
where I have a thousand applications,
but more than half of them have less than 50 users.
Yeah, yeah, yeah.
And, you know, there you just, I mean, you can say, well, if that application is having a bad day, I'll just tell those 30 users to go home.
But it's not the same.
Yeah, yeah.
So back to the infrastructure side of this discussion, the build versus buy. You seem to be saying that there is a general trend, or I would say a seesaw here, between companies going out and buying the parts and putting them together into an infrastructure environment, versus going out and buying, you know, an appliance
that does it all.
Is that what you're talking about there?
Oh, yeah.
You know, when I go out and I talk to the enterprise users that I know, you know, they're
buying everything from V-blocks to VxRail to Nutanix or SimpliVity or, you know, some
other hyper-converged solution.
Scale computing, for instance, yeah.
Not really so much in the enterprise.
Yeah, that's true.
Yeah, yeah, yeah.
But if you look at a Vblock, it's a Unity.
I almost said VNX.
Yeah, it's a Unity world now, okay.
But it's a Unity or a VMAX, and UCS servers and Cisco
switches. And there's a little bit of added management software, but it's basically the
same gear that you probably were going to buy 10 years ago. And you would have bought them
separately from each of those separate vendors and put them together on your shop floor and tested them and run it.
And that would have taken your senior architect a week to decide exactly how much memory and what processor and, you know, which NIC to use.
And am I going to use SFP?
Checking compatibility and all that garbage.
Right.
And then in the assembly stage, it's going to take two days for making sure that the firmware on every component is at the
level that's on the HCL. And then racking and stacking it all and all that stuff.
Well, I'm going to, even if we assume that, you know, the VAR is going to send somebody,
because I just spent a million dollars with them, he can send me a guy for a day.
Okay. Okay. I got you. To do the racking and stacking.
Right. Right. There's two, you know, in terms of converged infrastructure, there's two pieces.
You know, first of all, there's one throat to choke.
Right, support-wise.
Yeah, I got you.
And, you know, because I was having conversations with some people just yesterday about, you know, virtualized exchange.
And, you know, Microsoft saying, you know, we don't want your virtualized Exchange unless it's
with Hyper-V. And just the research to go through all of that stuff. And then, I've got
six vendors and they're all going to point fingers at each other. So there's that side of it. And then there's:
I don't need to tie up my smartest infrastructure system architect. Doing the prep work and testing and all that stuff.
Doing the product selection.
Yeah.
You know, not the gross product selection,
but the detailed product selection.
Uh-huh, uh-huh, uh-huh.
And then it all comes set up so that the time and effort
from I need something to support these 5,000 VMs
to I have on my floor something I can start putting those 5,000 VMs on
is reduced.
Considerably, yeah.
Yeah, I mean, we're going from what was probably 90 days
to what's probably 30 days.
Oh, gosh.
In some cases, even shorter.
I mean, these guys, they're touting like hours from
power on to operation.
But that doesn't include from PO to loading dock.
That's true. That's true. That takes time.
Yeah, most of that time is from "I want one" to PO, to when's it on my dock?
Yeah, yeah, yeah. Bill me later.
Right. But that's time, not effort.
Right. That's true. That's true. And I think
that the reason we're seeing this is that we've abstracted things so far and we've commoditized
so many pieces of the infrastructure that we don't have to carefully select things anymore. The difference between a Dell server and a Lenovo server and a Supermicro server and a Cisco server aren't so enormous.
As they used to be.
As they used to be.
And so this whole sea change is really substantial.
And I don't think it's gone away.
I think this is semi-permanent.
And you think it's not only applicable to servers but networking gear, storage and all the other infrastructure surrounding these things.
Well, networking gear is an interesting case because the network guys are disgustingly loyal.
To whatever vendor they've grown up with and love.
I mean, I know guys who have their CCIE number on their business cards, and they're more loyal to Cisco than they are to their employer.
Yeah, well, I imagine there's some monetary advantage from that perspective.
But yeah, I agree with what you're saying. But I mean, that goes so far as I have heard of cases where senior management decided we're not going to buy Cisco for this new data center.
We're going to buy Juniper because they have this layer two fabric technology that we like.
And several of the network guys up and quit because they didn't want anything that's not Cisco. Yeah. I mean,
I understand the investment. You know, I spent 20 years learning this CLI.
But as we start getting into more and more automated environments, that 20-year investment
in the CLI is going to become less and less valuable. And I mean, I was a consultant.
And the value I delivered as I would walk into an organization
and have the meeting with the Cisco guys and the Juniper guys
and know enough about not, I mean, I was never a guy
who knew everything about both of them.
But I knew enough about both of them to know when the sales guy was lying. Yeah. Which is the important aspect I might add. Yeah. Well, it's like,
oh, so you're saying that they don't have this thing, but they, you know, but I know they have
this equivalent thing. And I don't care how you solve that problem. I just want the problem solved.
But I mean, a similar problem exists here. I mean, if you look at the converged infrastructure,
networking is part of that.
Yeah.
But if you look at a Vblock, the networking that connects those servers to that storage and
connects those servers to each other is included. And then you have like four lines that come out
of that.
Right. And connect to the rest of the network.
That connect to the global world, yeah.
Yeah. You know, you treat it like it's an appliance. The network guys don't play with that switch anymore.
Yeah.
What I'm really hoping and looking forward to is that the hyper-converged folks reach the level that converged infrastructure is at.
For their software-defined networking?
Well, for their networking at all.
Okay.
And I wrote a blog post a couple of months ago
about how I've read 400 articles that all start with
hyperconverged is compute and networking and storage all in one.
And at that point, I had never seen a hyperconverged solution
that did anything with the network.
Nutanix, if you run Acropolis and you really buy into the Nutanix.
The whole worldview, yeah.
Will do some control of the switches.
But again, it's a very limited set of switches.
And in the VMware world, they'll sell you NSX and say, you don't have to control the switches.
We'll do everything in an overlay
and the switches can be stupid.
Right, if you want.
But I think NSX is overkill for the mid-market.
That for anybody but a really large organization, with high security requirements or high security understanding where the micro-segmentation and such pays off...
Pays off.
I think NSX is overkill. And I would really love to see a hyper-converged vendor, you know, like Scale or like VMware, say, you know, we're going to use the OpenStack
Neutron driver that most switches provide, and provide network control kind of through that. So that when you say, create a VM
and put it on VLAN 27, it makes sure that all the ports that connect to the hosts that
that VM is going to run on access VLAN 27 properly. Yeah, we haven't reached that point yet. But they're not there yet.
But I think that's where HCI lives up to the hype
of being able to say that we have integrated compute
and storage and networking.
Today, it's compute and storage, and you need a network,
but they're not really doing all that much
to support the network.
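(Editor's sketch of the Neutron-driven hookup Howard is asking for, using the openstacksdk Python client: create a VLAN-backed network, in this case VLAN 27, and boot a VM on it, leaving whatever ML2 mechanism driver the switch vendor supplies to program the physical ports. The cloud profile "lab", the "physnet1" mapping, the CIDR, and the image and flavor IDs are all assumptions, and provider networks like this normally require admin credentials.)

```python
import openstack

conn = openstack.connect(cloud="lab")

# Create a VLAN-backed provider network on VLAN 27 (needs admin rights).
net = conn.network.create_network(
    name="app-vlan-27",
    provider_network_type="vlan",
    provider_physical_network="physnet1",   # assumption: mapped in ML2 config
    provider_segmentation_id=27,
)
conn.network.create_subnet(
    network_id=net.id, name="app-vlan-27-v4",
    ip_version=4, cidr="10.0.27.0/24",
)

# Boot the VM; whichever host it lands on, the switch-level driver is what
# would make sure the host-facing ports actually carry VLAN 27.
server = conn.compute.create_server(
    name="app-vm-01",
    image_id="IMAGE_UUID",       # hypothetical
    flavor_id="FLAVOR_UUID",     # hypothetical
    networks=[{"uuid": net.id}],
)
```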
And frankly, a lot of the hyper-converged solutions from the second-tier vendors are just storage that runs on the compute.
Yeah.
With virtualization.
We wrote a storage layer and we run it as VMs.
Completely different UI.
Right.
So, I mean, back to this build versus buy trend in the industry. It's really that companies aren't doing the detail work to get something that works together anymore; they're purchasing this stuff in a package.
Yeah, and it's not just on the primary side.
You know, we're seeing this on the secondary side too. If you look at Cohesity and Rubrik, and, I mean, we talked to Commvault just the other day about their appliance solution, and Unitrends and, you know, a couple of other players, they're all building backup appliances that aren't just a target, like a Data Domain, but the data mover and the target integrated.
Yeah, as a combined solution.
I guess Veeam's got something similar as well.
I haven't seen it from Veeam.
You know, Veeam's...
Maybe I'm confused because I had an NDA discussion with them,
so maybe I...
Yeah, I mean, certainly Veeam has Veeam has bits and pieces of the solution.
And they could and probably should package it up as an appliance.
Right.
But they start facing the problem that they're Switzerland.
Yeah, yeah.
And as soon as they say, yes, and we decided we're going to build this
and it's going to be on super micro appliances,
you've got channel pushback and you've got OEM pushback.
And, you know, they probably want to do something like the VMware ready node solution
where, you know, here's the blueprint and everybody can sell this.
The vSAN ready node.
Yeah, yeah, yeah.
Yeah, I got you.
And with these following, you know, channel partners
and or vendors and stuff like that.
Right.
You know, it shouldn't matter.
You know, somebody like Veeam shouldn't care
if the integration happens at, you know, Avnet or Arrow
before it goes to the VAR or at the VAR.
It's just – but if you look even at the vSAN sales, we think about vSAN as a roll-your-own
solution, but the majority of the cash that's flowing into VMware doesn't come from that.
It comes from either ReadyNodes or VxRail because people want to buy these things
pre-assembled. Yeah. Well, that's the build versus buy thing again. Because I mean, by buying the
appliance, they get it all without having the problems and the worry and the time. And again,
it's a single support place and all that other stuff as well. I think the biggest piece is, as we well know, because we talk to our friends, it's much
easier for an IT manager to spend his hardware budget than to hire people.
And, you know, this is why we don't run OpenStack in most organizations because VMware cost
half a million dollars more than OpenStack did.
But I would have had to hire two guys at 250 a
year to run OpenStack. Anyways, yeah. And that means, A, I have to find two guys who know
OpenStack. Right, right. And get them here. And yeah. So this is in part a reaction to a skills
shortage. It's in bigger part a reaction to just the way corporate budgeting
works. And, you know, as far as I can tell, just that the average solution is so much better than
it used to be that the extra 5% you could get by going with the best of breed isn't worth it.
Doing it yourself. Yeah. Well, that's interesting.
So that brings up really the next level of discussion here.
We were talking earlier this year with Micron and others with storage class memory and stuff like that.
But really, the thing that seems to have come out of the woodwork is this 3D TLC SSD stuff. The vendors, you know, Samsung, Micron,
and I think Hynix is on its way,
and Toshiba to some extent, are all starting to roll out
this 3D NAND, which is taking the capacity over
cost trend to the next level, I would say.
The last data I saw was that 3D NAND surpassed planar NAND in bits sold last quarter or the quarter before.
And it's just amazing what's coming out. I mean, the size of these things is gigantic.
Well, okay, let's talk about the process first, and then we can talk about the products, because there's interesting stuff on both sides of that.
Okay, absolutely, absolutely.
You know, in terms of the process, it took all four foundries a substantial amount of time.
A while to get it right.
To get to the point where one wafer of 3D was more sellable bits than one wafer of planar.
Even though they were doubling or tripling the capacity by going to a third dimension.
Yeah, the yield was a killer, right?
Yeah.
And, you know, if you say, yes, every chip holds four times as much, but we only make a fifth as many chips that we can sell.
Good, good chips.
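(Editor's sketch of the wafer economics Howard is describing: four times the bits per chip at a fifth of the sellable chips still means fewer sellable bits per wafer. The 4x and one-fifth figures are his; the chips-per-wafer, planar capacity, and mature-yield numbers are placeholder assumptions just to make the comparison concrete.)

```python
CHIPS_PER_WAFER = 1000           # assumed
PLANAR_GBITS_PER_CHIP = 128      # assumed planar die capacity
GOOD_YIELD_PLANAR = 0.90         # assumed mature-process yield
GOOD_YIELD_3D_EARLY = 0.90 / 5   # "only make a fifth as many chips we can sell"

planar_bits = CHIPS_PER_WAFER * PLANAR_GBITS_PER_CHIP * GOOD_YIELD_PLANAR
early_3d_bits = CHIPS_PER_WAFER * (PLANAR_GBITS_PER_CHIP * 4) * GOOD_YIELD_3D_EARLY

print(planar_bits, early_3d_bits)   # ~115,200 vs ~92,160 sellable gigabits
# Four times the capacity at a fifth of the good die still sells fewer bits
# per wafer, which is why the crossover took the foundries so long.
```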
And that's what happens at the beginning of a complicated process. From Jim Handy, you know, we learned that, yes, now you have to drill, metaphorically
drill because nobody makes a drill that small,
a quarter-inch hole 700 feet deep with the sides being perfectly straight. This is a process that
took people a while. But the net result is, you know, since the foundries have made the transition to where they can make the 3D stuff profitably, we're going to start seeing the supply side loosen up.
I saw some data from disk trends that the spot market and contract prices for Flash were actually down two or three percent
in the last quarter as opposed to having been flat or up for quite a while.
Rising, yeah.
And even though an enterprise SSD still costs 10 times as much as a spinning disk on a per gigabyte basis.
Now that it's end of year, I'm seeing people go,
yes, and just as I predicted, SSDs are killing hard drives.
And it's like, really?
I mean, the price difference is the same.
Things haven't changed as far as I'm concerned this year.
Now, again, from our conversation with Jim Handy,
the longer a shortage lasts, the steeper the price decline at the end.
That has to be made up, yeah.
Yeah.
And this has been a very long shortage.
Right.
By the standards of the semiconductor business.
So there's going to be a big price drop second half of 2018.
When all these guys kick into high gear and stuff like that. And I'll be very happy about that.
In terms of product, 3D and the denser chips means people are making denser SSDs.
God, these things are huge.
Well, I mean, Viking at the Flash Memory Summit introduced a 50 terabyte SSD.
Yeah, yeah. Seagate. Yeah, yeah. Same thing. That's insane.
Yeah, Seagate, I think, had a 64.
Yeah, insane. Insane.
Well, they're not insane.
They cost an arm and a leg, but yeah.
But they are focused for a very small number of customers.
And here, this is something that I have to start
educating people about, the hard drives.
You know, we got 10, 12, 14 terabyte hard drives today.
Nobody should be buying two 14 terabyte hard drives
and running them in RAID 0 or in RAID 1.
Yeah, yeah, yeah, yeah.
These devices are designed for people
who need a lot of them.
Right.
Yeah.
A lot of bulk storage, yeah.
And I would always get to where I have 40 devices before I start worrying about big devices.
Because you want the potential to do erasure code across those and for those devices to fail and not kill yourself, et cetera, et cetera.
I want a many-to-many rebuild.
You know, whether that's erasure coding or chunklet RAID, you know, it's the same thing.
The difference there is, you know, how far apart the drives are. A 50 terabyte SSD, if it could accept data as fast as the spec sheet says
it could accept data for extended periods of time, which we both know is never true.
Yeah.
It would take four days to fill.
Yeah.
Right. I did the math.
An SSD, mind you. Yeah. Yeah. Yeah. Yeah. I know. I know. And it can't do that.
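(Editor's sketch of the "four days to fill" arithmetic. The 50 terabyte capacity comes from the discussion; the sustained write rate is an assumed figure chosen only to show the order of magnitude, not a spec-sheet number.)

```python
CAPACITY_BYTES = 50e12
SUSTAINED_WRITE_BPS = 150e6      # assume ~150 MB/s of real sustained writes

seconds = CAPACITY_BYTES / SUSTAINED_WRITE_BPS
print(seconds / 86400)           # ~3.9 days to fill the drive end to end
```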
You don't want to be doing a one-to-one rebuild in traditional RAID-style systems.
Yeah.
Or even a five-to-five rebuild when your exposure is that big.
Right.
Yeah.
I mean, even if I had 20 of them and a good chunklet system, it's still a four- to ten-hour rebuild.
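(Editor's sketch of why a many-to-many rebuild matters: with chunklet RAID or erasure coding, the surviving drives share the rebuild work, so rebuild time shrinks roughly with the number of participants. The per-drive rate is an assumption; the four-to-ten-hour figure above is Howard's.)

```python
FAILED_DEVICE_TB = 50
PER_DRIVE_REBUILD_MBPS = 150     # assumed sustained rebuild rate per drive

def rebuild_hours(participating_drives: int) -> float:
    rate_bps = participating_drives * PER_DRIVE_REBUILD_MBPS * 1e6
    return (FAILED_DEVICE_TB * 1e12) / rate_bps / 3600

print(rebuild_hours(1))    # one-to-one spare rebuild: ~93 hours exposed
print(rebuild_hours(20))   # 20 drives sharing the work: ~4.6 hours,
                           # the four-to-ten-hour ballpark, depending on rate
```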
And so, you know, it's funny.
You look at like the register
and yes, the people we know read the register,
but the people who comment on the register
generally have never seen a data center from the inside.
And so you get, you know, comments about how
this is not an appropriate consumer product.
It's like, yes, this is not a consumer product.
This is a forklift.
Only industrial users use forklifts.
We need these things.
Yes.
Yeah.
But the, you know, the bigger impact in terms of how things work,
as opposed to just, oh my God,
I remember selling 30 meg hard drives for five grand and now I hold 50 terabytes
in one hand, is the transition to NVMe. That's moving fast. Right, right, right. A lot faster
than anybody expected, I guess. Oh, we did do a forecast on that and we did have certain people
say it was going to come out and I'm going to have to do a forecast contest wrap up here in the next month or so because the end of 2017 is here. But the
other thing was the storage class memory, right? We all thought storage class memory would make
a major inroad across the industry. And what we're finding is very niche product at this point
because of the cost. We've been being teased with storage class memory for years.
Yeah, yeah.
You think it's a holographic storage kind of thing?
No, it's there.
Holographic storage is five years away.
It has been for the past 20 years.
Yeah, yeah.
Storage class is a year away, and it's been for the last three years.
And right now I say it's 18 months away.
First of all, when I use the term storage
class memory, I'm not talking about the media. I'm talking about the use case. And that means
memory addressed in the DIMM socket. Right. Storage. And persistent memory. And Diablo teased us with that, but never really delivered.
Yeah.
You know, in their short life and poor Diablo, they've gone away.
But, you know, they delivered two products, one of which goes in a DIMM socket, but you treat it like an SSD.
Right.
And the other goes in a DIMM socket and it uses flash, but it's volatile.
And neither one of those was really a great solution.
Right.
For what?
Yeah.
Everybody wanted persistent RAM effectively.
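(Editor's sketch of the persistent-memory use case being described, memory-mapped rather than block-addressed. It assumes a hypothetical DAX-mounted pmem filesystem at /mnt/pmem0; production code would use PMDK/libpmem for proper cache-line flushing rather than a plain mmap flush.)

```python
import mmap
import os

PATH = "/mnt/pmem0/counter.bin"   # hypothetical DAX-mounted pmem filesystem
SIZE = 4096

fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)
buf = mmap.mmap(fd, SIZE)

# Load-modify-store directly against the mapping: no block I/O path at all.
count = int.from_bytes(buf[0:8], "little")
buf[0:8] = (count + 1).to_bytes(8, "little")
buf.flush()     # real pmem code would issue cache-line flushes via libpmem

buf.close()
os.close(fd)
```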
When we start looking at, you know, 3D XPoint is still the leader of the next generation memory products.
And I, you know, I built and bought a new cluster of servers for the lab in the fall
because I finally got to the point where I needed the capacity.
And I bought for each one of them the 32 gig M.2 Optane drives
that Intel's selling.
No kidding.
Well, it's 32 gig.
They were 80 bucks.
I'm impressed.
Yeah, yeah, yeah. Of course, when I say new servers, I mean new from eBay, so they're
a generation or so back, but not too far off. You know, whatever is $1,000 on eBay.
Right.
And they don't have M.2 slots,
so I had to get the M.2 to add in card adapters for another $10.
So Optane is around, and, you know, the 400, or 379 I think it is, gig Optane cards are available.
Yeah.
But when Intel released the latest generation of Xeons,
they called Scalable, because E3, E5, E7 wasn't hard enough.
Now we have to have gold, silver, bronze, platinum, plutonium, and rhodium.
Right, right, right.
Oh, yeah.
Is this a marketing thing here?
But when they announced that,
they did not put into the chipset and the motherboards support for SCM.
So it will be the next generation of Xeons
before this really persistent memory becomes viable.
Prevalent, yeah.
And at that point, all of a sudden, SAP HANA goes from being an edge case to a mainstream
product.
You know, bringing up SAP, it's another one of these build versus buy.
In the old days, all these enterprise environments had their own HR system, their own payroll,
their own inventory system, their own ERP. Now everybody's using SAP. Yeah, but that happened
in the 90s. I'm still seeing it happen. I'm sorry. PeopleSoft, Oracle Financials, SAP,
you know, the pack. In the 90s, we went from completely custom accounting software where everyone was a snowflake to customized versions of these out-of-the-box ERP systems.
Right.
Okay.
So we'll table that discussion and move back to the world today.
So SCM emerging sometime with the next generation of chips.
Yeah, 15 to 18 months.
And servers.
And by that time, hopefully the fabs will get to a point where they can actually create enough of this stuff to meet demand.
Intel and Micron.
And notice Intel selling Optane, but Micron hasn't started selling their product yet.
Yeah.
And that's because there isn't enough of it.
Right, right.
Yeah, yeah, at a reasonable price.
So Intel and Micron are losing money on every Optane SSD.
Yeah, I've been there, done that, don't like that.
How much of that is Intel losing money
and how much of it is Micron losing money?
I'm not getting into because I have no idea.
Right, right, Yeah, yeah.
It's a whole new process.
And so they have to run a couple of thousand wafers to see how,
and then make a tweak and then run another couple of thousand wafers
to see how the process goes.
Yeah.
To learn.
It's a painful way to try to lock down the process and get the yield up.
So what's going on right now is Intel has this optane channel to sell as many chips
as they have to make to learn the process.
Right.
At whatever price they can get for them, because they have to make these chips because two
years from now, they have to be able to make them profitably.
And the only way to learn the process is to run the process.
Yeah, yeah, yeah.
So next year sometime I expect that the process will start to work.
Yields will be up high enough that, you know,
the demand for Optane drives doesn't suck it all up.
And then Micron will start selling their equivalent.
And they had a brand name and I forget what it is.
And then they'll start selling those.
And then they'll start, you know, then we'll start seeing, you know, products designed to deal with
the production stages. Of those solutions, yeah.
Yeah. But it's not, you know, that there isn't demand or that they can't make enough. It's that
if they made enough, they would lose more money.
They have to iterate the process enough.
They want to carefully control the channel, the process, and the whole world so they learn the most with the least amount of pain
to get to a point sometime in the future where the yields will be sufficient
so they can start opening the spigots.
Interesting.
Right.
Well, and that brings us to our third topic.
That's correct.
When we can start buying not gigabytes,
but petabytes of Optane,
we'll start seeing NVMe over Fabrics
accessing that Optane.
And I've been incredibly impressed
at the rate at which the industry,
if not the market,
has been adopting NVMe over fabrics.
Yeah, we were both at the, I think it was the Excelero Storage Field Day event. And I was just
blown away with the capability that they had. I mean, the speed, the performance,
it didn't have any capability, mind you, other than reading and writing 4K. But, you know,
it was just amazing. You know, speed in and of itself has value.
Absolutely.
To some extent, in our world today, I would say too much.
I wouldn't say too much, Howard.
I'm kind of a performance kind of guy, and you're a performance guy.
I mean, we're all kind of flacked and loaded on this stuff.
But if the stock exchanges had, instead of saying,
first bid that comes in wins, said every 50 milliseconds we're going to batch all the bids that came in and treat them as coming in at the same time.
Yeah, it would be a different world. Being first by milliseconds doesn't serve any societal purpose. It just makes the guys who can put their servers
a hundred feet closer to the main switch, gives them the advantage of speed and lets them monetize
it. Societal advantages, the whole philosophical discussion on this.
Yes. And I, and I could spend, you know, a good hour.
We could spend an hour, but we can't do that here. So, yeah, E8
Storage is the other guys that have, you know, a viable product. So Excelero and E8 are both,
you know, pre-standard products. When you say pre-standard, that seems to imply, okay, so it's
not an NVMe over Fabrics standard that's being used at this point. They're both kind of making their own tweaks to the protocol in order to support what they need to do.
More that they both came up with their own NVMe-over-networks protocols before NVMe over Fabrics was a standard.
Right, right.
I did hear the Fiber Channel guys did do a plug fest on NVMe over Fabric here in the last couple of weeks.
NVMe over Fabrics with Fiber Channel as the Fabric is, well...
Nobody wants.
I shouldn't say that.
My friends at Brocade will be pissed, but yeah.
Cavium and Broadcom, who now owns that piece of Brocade, are interested.
Yeah, yeah.
But I think there's enterprise interest too.
Just because the infrastructure is sunk cost, I think.
Well, there's – it's let me leverage what I know.
It's what I know and love.
Yeah, and that too.
It's a storage thing and it keeps the network guys out of it.
And it's secure.
Well, I mean, any dedicated network is secure. Yeah, yeah.
Frankly, you know, because I come from the commercial as opposed to the enterprise world, the
discussions about, you know, how you should encrypt your traffic over the fiber channel
in flight between the server and the disk array never made any sense to me. Because the only time I could make a man-
in-the-middle attack on that fiber channel is if I was standing in your data center.
Yeah, or burrowed in underneath the floor.
And if I'm literally standing in your data center, wouldn't it be easier to hack the server than to
hack the fiber channel?
Yeah.
So, you know, part of the "fiber channel is secure" thing is really that it's a dedicated network.
It's secure because you can't get to it.
Right.
And I appreciate that.
And I've done a lot of iSCSI on dedicated switches for that reason.
Oh, yeah.
Absolutely.
But other than that, you know, we get down to security by obscurity. It's like, which, oh, God, I can't believe I'm actually going to say,
it's an advantage of tape.
All right, we won't go there.
But back to this NVMe over fabric stuff, the adoption is rapid.
And, you know, I'm seeing, you know, a lot of these,
you can probably get a vSAN ready appliance with NVMe SSDs.
Oh, you can.
You don't even have to go over fabric.
And, you know, since vSAN just talks to the device, as long as there's a driver for it in the hypervisor, you can do that.
And I'm sure people will because the price difference between NVMe SSDs
and SAS SSDs
isn't all that huge.
Not that high, yeah.
And certainly
for the read buffer,
it would make a huge amount of sense.
Oh, God, yeah.
But I think that,
you know,
the interesting thing is to see
how much
the really low latency
of NVMe SSDs
brings us back to managing storage in the host
as opposed to simply presenting it as a common protocol.
You know, we had a, I got excited about this about 10 years ago
with server-side caching and SSDs, and it didn't happen.
Yeah, it didn't happen because of a number of reasons.
I mean, I think the complexity of integrating that cache with the storage system cache was
high.
And then who controlled it?
It was the other side.
Was it a host or a storage side?
Well, and that became, was it the host administrator or the storage administrator?
So there were political problems.
And, you know, and what I think the biggest problem is just by the time people got around
to looking seriously at it, it was time for them to do an array refresh anyway.
And a hybrid or an all-flash array solved that problem in one place. And there was
too much rolling your own of the server-side caching. We really need to get Satyam back
to talk to us about the caching stuff someday, when he gets a chance, when he gets his head above
water. But if you look at solutions like E8, where a lot of the volume manager runs in the host, not in the shelf,
or Datrium. Yeah, or Excelero for that matter, right? Yeah, it's all...
Well, yeah, Excelero, it's all run from the client side.
Yeah, absolutely. And so, whether we're going to see a shift of the pendulum back to host-managed storage is an interesting architectural question.
Yeah, yeah.
I don't see it, Howard.
No, I see lots of technical advantages and lots of market resistance.
That's an interesting discussion in and of itself.
But I also see lots of FUD around NVMe over fabrics and NVMe in general.
I see vendors going, only our architecture will work for NVMe.
And they're a vendor who doesn't actually support NVMe yet.
And they're arguing, you shouldn't buy that guy who does support NVMe because we'll do it better in the future.
I think all this stuff is going to be pretty much resolved over the next year.
I think NVMe over Fabric standards will ultimately emerge that support high availability, fault tolerance, and all the capabilities that are necessary.
Looking at the NVMe group roadmap, it's 18 months, I think.
So it's a year and a half away.
Before NVMe over Fabrics is really ready for prime time.
They need to include things like multi-path support. And today it's like, well, you can go to the eight or Accelero and not worry about it because they've solved that problem their own way
and it's proprietary. Or you can use standard NVMe over Fabrics protocols and have to deal
with it yourself somehow.
So I think – I can't believe it's going to take 18 months, Howard.
I just – I can't believe it.
Well, that's till the –
The world will have actually adopted the standard and actually implemented it 12 months before that.
Yeah.
Well, that's till the standard is fully approved.
You know there's always a nine-month period between –
Okay, we wrote the draft and now we're voting on it and somebody objects – and somebody said that that comment is misplaced.
So we have to do another round of voting and a round of voting takes 30 days.
But it's – there's that period where you have the draft.
You can write code to the draft and you know there's not going to be any changes that require more than a patch.
Right, right. Technicals. Yeah, yeah, yeah.
Right.
Technical impact changes, yeah.
But what I'm seeing kind of faster is NVMe over Fabrics on the backend.
Oh, yeah, where you've got a little bit more control over the whole world. Right. You don't, you know, I don't need to have multi-path support built into the protocol.
If it's my host, if it's my disk array controller talking to my JBOF, then I can just have the
logic in the disk controller, try multiple paths using whatever code I want.
I guess the question then becomes, does that high-performance backend translate to front-end low latency?
I think there's some level of it does, but not all of it, obviously.
Well, certainly adding data services in the form of a controller is going to add latency.
It's going to impact this, yeah. It's going to add some latency. The question is: are the data services worth the latency?
Yeah. But, you know, I see two or three years from now a data center architecture
where we have NVMe over Fabrics JBOFs that cost about what a SAS expander JBOD costs today, because there's several vendors
reducing all of that to a chip. And the cloud native applications in my data center,
the stuff that I've written in the past year or two, where resiliency is an application-level
problem, not an infrastructure-level problem, talk directly to those JBOFs. And so those servers each
make sure that it writes three replicas to three different JBOFs.
Without a controller? Without dual controllers? Without a storage fabric? What?
Well, I mean, that's how cloud native applications work. That, you know, resiliency runs
at the application layer.
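(Editor's sketch of the application-level resiliency Howard describes: the application itself writes three replicas to three different JBOF-backed targets and declares success on a quorum of acknowledgements. The target names, quorum rule, and put_block() transport are entirely hypothetical.)

```python
import random

JBOFS = ["jbof-a", "jbof-b", "jbof-c", "jbof-d"]
REPLICAS = 3
QUORUM = 2          # assumption: 2 of 3 acks is good enough to return success

def put_block(target: str, key: str, data: bytes) -> bool:
    """Placeholder for an NVMe-oF / RDMA write to one JBOF-backed namespace."""
    return True     # pretend the write landed

def replicated_write(key: str, data: bytes) -> bool:
    targets = random.sample(JBOFS, REPLICAS)   # three *different* JBOFs
    acks = sum(1 for t in targets if put_block(t, key, data))
    return acks >= QUORUM

if __name__ == "__main__":
    assert replicated_write("object-0001", b"payload")
```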
Now the problem is testing resiliency individually in each application in your data center is really expensive.
So.
Yeah, but if you only have a couple of applications, it's okay.
If you have a small number of applications with a large number of users, then.
Yes.
Then that makes sense.
Development and deployment model makes sense.
But you're always also going to have
a large number of applications,
each of which has a small number of users.
And for those applications,
we can have controllers that run as VMs,
software-defined storage,
that consume that same Flash
and whatever comes after Flash
from those JBOFs, and presents it out with data services
to the applications that want data services.
Right.
But it's all infinitely composable. I can allocate, in Ansible or Puppet, slices of a hundred gigabytes out of a JBOF to a controller that is going to then present it maybe as
fiber channel, maybe as NVMe over Fabrics.
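(Editor's sketch of the composability idea: carve a 100-gigabyte slice out of a JBOF and attach it to a controller that will present it upstream. The composer endpoint and its REST API are entirely hypothetical; in practice this would sit behind an Ansible or Puppet module rather than raw requests calls.)

```python
import requests

COMPOSER = "https://composer.example.internal/api/v1"   # hypothetical endpoint

def allocate_slice(jbof: str, size_gb: int, controller: str) -> str:
    """Ask the (imaginary) composer to bind a namespace slice to a controller."""
    resp = requests.post(
        f"{COMPOSER}/jbofs/{jbof}/slices",
        json={"size_gb": size_gb, "attach_to": controller},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["slice_id"]

# e.g. called from an Ansible- or Puppet-driven provisioning step:
# slice_id = allocate_slice("jbof-a", 100, "sds-controller-03")
```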
Do I sense a startup in here someplace, Howard?
Well, Kaminario announced something like this.
Oh, gosh.
Yeah.
But they're a ways away.
Yeah.
And I have to admit, I'm working with them on the idea and on some projects.
Okay.
Listen, Howard, we're running up against our time limit here.
But let me, you know, before we run out of time, let me say, I see this coming from other people too.
That's good.
That's good. Are there any last statements you'd like to make to our listening audience?
Merry Christmas to all and to all a good night.
Okay.
This might actually be released after Christmas.
So happy holidays,
whatever your worldview is.
Next month,
we will talk to another
systems storage technology person.
Any questions you want us to ask,
please let us know.
And if you enjoy our podcast,
please tell your friends about it
and please review us on iTunes
as this will also help
get the word out.
That's it for now.
Bye, Howard.
Bye, Ray.
Until next time.
That's a wrap.
Woo-hoo!