Grey Beards on Systems - 73: GreyBeards talk HCI with Gabriel Chapman, Sr. Mgr. Cloud Infrastructure NetApp

Episode Date: October 26, 2018

Sponsored by: NetApp. In this episode we talk HCI with Gabriel Chapman (@Bacon_Is_King), Senior Manager, Cloud Infrastructure, NetApp. Gabriel presented at the NetApp Insight 2018 Tech Field Day Extra (TFDx) event (video available here). Gabriel also presented last year at the VMworld 2017 TFDx event (video available here). If you get a chance, we encourage you to watch the videos.

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Howard Marks here. Welcome to another sponsored episode of the Greybeards on Storage podcast. This Greybeards on Storage podcast is brought to you today by NetApp HCI and was recorded on October 23rd, 2018. We have with us here today Gabriel Chapman, Senior Manager, Cloud Infrastructure. So, Gabriel, why don't you tell us a little bit about yourself and what's new at NetApp HCI? Yeah, my name's Gabriel Chapman, obviously. I have been a Senior Manager here at NetApp, working and focusing on our cloud infrastructure products,
Starting point is 00:00:44 one of which is our NetApp HCI product line. I've got about a 22-year background. While I can't grow a beard, when it does come in, it does show up gray. So I think of myself as a new entrant into the greybeards club. We'll make you an honorary member. My background is I started working in IT as a tech support engineer in 1996 for a bulk mail software company, spent about 15 years in the end user space, and then got lured into the dark side of vendors.
Starting point is 00:01:16 And I enjoy it, right? I don't know if I can ever go back at this point, but for now I still kind of like and enjoy seeing how the sausage is made and being part of bringing products to market. And the NetApp HCI is a solution based on the SolidFire acquisition? Is that pretty much so? Correct. I came to NetApp via the SolidFire acquisition.
Starting point is 00:01:39 The NetApp HCI product is based on Element Software, which was kind of the intellectual property, the prime mover behind what the SolidFire product technology is. And the Element Software solutions really encompass today the SolidFire all-flash array and the NetApp HCI hyperconverged or hybrid cloud infrastructure technologies. And there was a lot of discussion about some of the new capabilities that are coming out at the, I guess I'd call it the cloud side of things. And you're starting to bring some of that stuff to the HCI as well?
Starting point is 00:02:11 Right. You know, we look at NetApp as making the transition over the last couple years under George's management to kind of refocus. And we've kind of replatformed the company into kind of three areas. One focused on our traditional on-tap technologies and what we're calling cloud-connected flash through leveraging the concept of data fabric. We have Anthony Lai's organization running cloud data services with our integration with the hyperscalers and the services that they're taking
Starting point is 00:02:38 and putting up in the public cloud space. Cloud infrastructure, our primary goal and focus is to build private clouds with customers or build cloud infrastructure with customers. But also potentially act as a landing point for those same cloud services that we're putting in the public cloud space. Okay, so some of those services today like cloud volumes and Kubernetes as a service, I saw it was kind of a preview of that at the show as well. You see that flowing out to the HCI as well? Yes. Ultimately, I look at it as an infrastructure vehicle or a consumption vehicle that delivers
Starting point is 00:03:11 services on it. We can still provision your traditional infrastructure workloads in a virtualized manner, but we do see this major shift towards containerization and cloud native application design and structure. If I want to go to where the future is going, where we see everybody kind of embracing the public cloud and then a little bit of the public cloud reaching into our own data centers, there's a synergy between those two. And I think offering services that are enterprise class, that enterprise customers are accustomed to using and are familiar with in the public cloud space
Starting point is 00:03:45 makes a whole lot of sense. And still being able to, in turn, take those services and bring them down on-prem affords customers an operational model that's much more efficient than having to re-platform, re-tool, and rework all the work that they've done in the past. It's kind of a different game, because I'm also people are moving their applications from on-prem to the cloud. It's kind of a different game because most of the people are moving their applications
Starting point is 00:04:05 It's kind of a different game because most of the people are moving their applications from on-prem to the cloud. Here's an opportunity of actually moving them back to an HCI solution, ultimately, right? Well, which is what you need to do when you discover how big the public cloud bill becomes. I hear a lot that the public cloud bill is big. The bill being unruly, I hear that often. And I think that the reality is I don't think there's any one right way to do it. I think there's the right way for your company to do it. And some of those right ways would be to run things in the public cloud space. Another way would be to run it on-prem and have absolute control over it
Starting point is 00:04:39 and more control over your own destiny. And I think giving customers the flexibility and choice to do that and make those decisions for them to gain a competitive advantage is key. And for most people it's really going to turn out to be some of both. And that's the whole concept behind hybrid cloud and multi cloud and having a cloud strategy. When we talked with our partners they're all focused on a multi cloud strategy and how they get their customers prepared to go into that space. Not everything's going to go up there.
Starting point is 00:05:07 We had the lift and shift conversation today. I mean, it's prohibitive. But if it gives you an edge, if it actually can be viable for you and you can make it work, it could be to your advantage. It only takes time and effort to do it. Yeah, even for my own stuff in the lab, we collect data and then push it up to AWS so I can analyze it in EMR because setting up a Hadoop cluster was just more trouble than it was worth. Yes, and now Hadoop has clustered itself with Cloudera and Norton coming together.
Starting point is 00:05:32 Let's talk a little bit about how NetApp HCI is different. And we could talk about the architectural part, but you mentioned something in the presentation earlier today about ONTAP Select. And I've always found it funny that the HCI solutions first gained some traction with VDI. But VDI is such an intensively file services-based application model. And most of those guys didn't do file services. Well, building a file system is hard. I think it was my first or second week
Starting point is 00:06:05 at NetApp and I didn't know Dave Hitz. And so I'm sitting next to him at lunch and we're talking about things and I'm like, and somebody brings up something like, you know, they get a new file system. Like, you know how hard it is to build a file system? And I'm looking at Dave and he's like, why, yes, I do. For those listeners who don't know, Dave Hitz was a founder, is a founder at NetApp and essentially wrote waffle NetApp's file system. I was an architect. You know if I if I look at it from that approach right I look across the ecosystem right side plays most people are using NFS as a transport because it's easy to aggregate it's treat VMs as objects and you can manipulate them a little easier we are a
Starting point is 00:06:43 block storage protocol with the Element Software solution. And so we have a different, a little different bit of approach. The goal in our minds was that we should be able to take anything that we do and offer it as a consumable service or a virtual machine. And ONTAP Select was built in part to basically give you all the goodness of ONTAP and rich NFS and SIFS and SMB integration points. And for certain customers, especially like in the VDI space, it makes sense to offer that as part of the deployment mechanism.
Starting point is 00:07:16 So for a customer- Well, for the VDI space, for the robo case, it's a file server is almost always the first server that goes- Right, and so instead of running FreeNAS, I can give you something that's got some enterprise functionality in it. And I can help put a metro cluster on top of it, leveraging those types of services. I'm not saying it's perfect for everything. I think for smaller implementations, it totally makes sense. Obviously, cost is one issue, but feature functionality trade-offs are another. And there's nothing that really stops us from saying, well, hey, you have two terabytes here of file services that you really want,
Starting point is 00:07:52 and then the rest of your stuff is block, and that's fine. If you want to get more performant or robust, I have an entire portfolio of fast solutions I can tack onto this system. So it's not like we're saying, hey, we're only going to go down one route. We're going to be flexible and open because I have a portfolio. Right. Well, this is the advantage. One of the things that SolidFire had brought to the table that was kind of unusual is a fairly sophisticated quality of service perspective. Does that stuff flow into the HCI? Absolutely. There is no difference between the all flash array and the HCI construct. It's just a different packaging exercise for the storage nodes. In the HCI, if you go with the 2U appliance model that's four sleds,
Starting point is 00:08:33 you have a half-width 1U storage blade that essentially sits inside there, and it's going to have six disks instead of 10 or 12 that the 1U components have. Now, we don't stop you from taking existing solid fire that sits on your floor and integrate it into the HCI or new solid fire in the new systems or nodes that we've announced this week, the 610s. We can integrate those into the compute piece as well. We're expanding our compute options as well,
Starting point is 00:09:01 so we have a new GPU, a 2U node for GPU accelerations for VDI use cases. So we something that comes up common in discussion, and we had this discussion with a really large financial services company today. Does the 2U, 4-node solution make sense for you as a customer? And for some of them, they're like, no, it absolutely does not. We want a commoditized 1U pizza box that either has storage or doesn't, and we want to scale those resources according to that way. We storage or doesn't, and we want to scale those resources according to that way.
Starting point is 00:09:46 We want racks of servers, and we want racks of storage, and that's how we architect and design our solutions. Now, can we get the benefits of the underlying ElementOS software, you know, rich QoS? We have an industry-leading QoS. Nobody's really able to set min, max, and burst and scale it the way we do and guarantee performance across mixed workloads. It's one of the things. You've definitely got the best control in the QoS.
Starting point is 00:10:09 There's a little complexity that comes with that. Well, you know, as we start to automate more of that, like through QoS policies, or if we leverage it with vVols and storage policy-based management and creating, you know, policy-driven automation around the storage piece, even working towards something that's more dynamic, you know, policy-driven automation around the storage piece, even working towards something that's more dynamic. You know, we have customers do some really sophisticated things with our APIs based on triggers that, you know, we would like to productize, but, you know, we would like to do it a different way.
Starting point is 00:10:36 You know, and then that really boils down to when we look at the HCI landscape and the choices architecture that most people have made, it is a shared core model that combines storage and compute in every single node that you're provisioning for the most part. And that's good for low point of entry, and it's good to a certain point, but then we start to see stranding of resources, and we see lots and lots of silos of little clusters being managed by a cluster manager, but we don't see a commonality. And the more more you share resources the more you have contention for them.
Starting point is 00:11:08 Well that's just it. If I don't have to contend, I can isolate performance and SLAs around compute and memory real easily with the hypervisor controls but I wasn't able to do it the storage layer quite as effectively. So when I go in there I look at it's like I can a provision just the proper amount of storage for your applications, the proper amount of storage for your applications, the proper amount of commute for your applications, and I can mix workloads with confidence without having to worry about contention. Then I'm starting to leverage infrastructure the same way I do in the public cloud space. I think that's what's attractive.
Starting point is 00:11:37 And then also SolidFire is very API-driven from a perspective of having all the capabilities pretty much driven by APIs, and all that flows through the HCI as well. Correct. If you go back in the history of SolidFire, the first three releases didn't have a UI. What? It wasn't until we actually had to demonstrate the product that we had to skin an API. Basically, the UI is just an API. Basically, all the UI is just an API
Starting point is 00:12:07 call mechanism. Which is how it should be. Which is how it should be. But it's interesting looking at, you know, SolidFire started out selling into
Starting point is 00:12:14 the solution provider market. And we weren't selling to storage people for one. No, but corporate data centers are becoming
Starting point is 00:12:23 much more orchestrated. So these things are becoming much more important in the corporate world. It's really interesting to watch solution providers, solutions, and high-performance computing ideas move their way into the data center. Right. We're seeing a blurring of some of those concepts coming down. I look at it for the truly unique guys out there, like the Googles and Facebooks. I mean, obviously, they might as well be their own server manufacturer of record. The traditional enterprise, while they look at that and go, gosh, we could do just like that,
Starting point is 00:12:56 then welcome to supply chain. It's something you've never done before. The whole idea that corporate America should run their data centers the way hyperscalers run their data centers is just wrongheaded to me. Look at what they're doing with HCI and all that stuff. It's moving in that direction. They're trying to get there. And the Kubernetes as a service being deployable on an HCI is a very interesting solution.
Starting point is 00:13:17 There are lots of pieces of what the hyperscalers do that make sense to adopt. But the key difference between a hyperscaler and a corporate data center is the number of applications that you run. If I'm Chevron with 50,000 applications or if I'm Twitter with one. Or you're Facebook with six. Right. And you know what your core competency is and you know the things that you're really
Starting point is 00:13:40 good at and they can design a server to do just what it needs to do for their operations. Whereas we're taking enterprise class, all the resiliency and redundancies built in those systems. I mean, you saw HP go into CloudLine and kind of go in there and then kind of pull back. And there's either people who want dirt cheap that I've kind of put together myself, or you have other people who want
Starting point is 00:14:04 total UCS management and all the bells and whistles. Yeah, I was in, we were in a session earlier with George and talking to him and had a question about, you know, scaling the speed of development from, you know, normal ONTAP six months to something lower. And he says, well, in the cloud, there's really only one application I have to work with versus, you know, if I'm going to try to put that on all these appliances out there with a gazillion applications, it's a different game, really. And honestly, do my customers want to update their infrastructure every six weeks?
Starting point is 00:14:33 That soon? No. No, no. Probably not. We've pulled back to where we're doing about one major update every eight months. Oh, really? Okay. For solid fire.
Starting point is 00:14:41 Even though you guys are doing it a lot sooner, right? We do like a single, like usually there's an annual point release, like the big one. So we're going to go from neon, you know, oxygen to neon or something of that nature, the big releases. And then we'll have a couple like auxiliary releases that either, not necessarily bug fixes, but sometimes we'll introduce new features. And that's kind of the course for the year, realistically, because, you know, I think updating your environment twice a year is probably about as much as most people's haul rate.
Starting point is 00:15:13 From an enterprise perspective, yeah. But then there's other people like, you know what, I have this content that just talks about the duality inside most organizations. I have the people that are resistant to change. I have the people that only want change. And every organization has both, right?
Starting point is 00:15:31 There's the people who are going to download the open source software and put it on there and try to put it in production. And then I got people that are sitting on, their firmware on their switches is two years old because they're hesitant to update it. Well, that's it. It ain't broke, don't fix it.
Starting point is 00:15:45 But my favorite example of that is, I was working at an advertising agency, and the last Mac, skinny little screen Mac, we took that off some woman's desk and gave her a brand new Mac with a color screen, and she went and cried. You took my friendly Mac with the smile away. Talking about HCI, you mentioned that you've recently come out with a GPU-intensive version for VDI.
Starting point is 00:16:13 Yeah, we have a partnership with NVIDIA. And we noticed that, you know, end-user computing, virtual desktop type solutions are very, they're gaining more traction and adoption, especially specific verticals. We see them a lot in healthcare. We see them a lot in design work. And realistically, we looked at the partnership that we have with NVIDIA and we kind of, we built the proper box to do, you know, X number of desktops, depending on the type of work classification you're looking at. So now I really can run AutoCAD and NVIDIA? Well, yeah, you can. And then the beauty of behind that is it's never, in my viewpoint, it's never a cost saving in like removing a desktop off somebody's, you know, there's management efficiencies that you gain there and you can reduce some costs there.
Starting point is 00:16:56 But it's a security play, right? Right. me work out to India or China or something like that, instead of having to move terabytes of files there and possibly lose control of them, you can remote in and do your design work locally in my own data center where I have absolute control and I know you can't take anything. And then if I lose that laptop, I haven't lost all of those designs because they're not on the local disk. Exactly. And that's the bigger play for it.
Starting point is 00:17:22 I see it a lot in hospitals where we've had some very good success with some large VDI implementations. Yeah, well, hospitals are kind of the absolute best case for VDI. Yeah, or design studios and things of that nature. Mm-hmm. Yeah, it's interesting. DreamWorks was one of your major discussions here, and it was interesting. Yeah, the partnership with DreamWorks has definitely expanded and grown beyond just kind of the customer and client relation
Starting point is 00:17:48 and into more of an actual technical partnership. You heard him discuss it today on the main stage, and I think that that's compelling and interesting. Scotty and the group over there, amazing technologists, and it's always a pleasure to go and have a discussion with that group. So the HCI solution is a four configuration with half a sled is effectively a storage solution and the other three are available for compute? It's actually, you basically can have a start
Starting point is 00:18:18 with a blank chassis and you can choose the configuration of storage and compute in certain ratios. So we have to meet a minimum config of two compute nodes. Okay. So for VMware high availability. And then we start with a minimum of four storage nodes because we like high availability and we like the ability to tolerate failure.
Starting point is 00:18:36 And from that point, you can scale to meet the hypervisor maximum. So I can scale to 64 compute nodes in a given cluster. I can scale. We usually put an arbitrary limit about 40 storage nodes right now. And that's realistically, I mean, like, is petabyte a good failure? So I thought, oh, a fairly large failure domain? I mean, for the places where HCI works well.
Starting point is 00:18:57 A petabyte would seem to be more than that. At some point you kind of go, I'd rather have four small clusters than one big cluster, just for failure domain management. So there are a couple of ways to do that. And the nice thing about the solution is I can, let's say I had 40 compute nodes. I mean, I don't have to run all those, I don't have to run VMware. They can run a different hypervisor. They can run a different operating system.
Starting point is 00:19:20 They can run multiple clients and have one single common storage plane underneath and get the efficiency of the scale. I just got that. Because, I mean, there are other HCI vendors that will run multiple hypervisors, but not at the same time. On the same storage footprint, right? So if I'm running KVM and I'm running Red Hat
Starting point is 00:19:38 OpenShift and I'm running Kubernetes and I'm running VMware, I'll bare metal effectively on different small little numbers of clusters inside the master, you know, the footprint that I've put to the rack. But I have one single storage domain where I have global inline dedupe compression
Starting point is 00:19:54 and provisioning. And that means I can do things like generate data on nodes on one of those platforms, take a snapshot, mount it on the other platform to analyze it. And vice versa. Well, but not have to copy the data. Yeah, or clone it.
Starting point is 00:20:10 Right. And so that's when I look at, like when we were talking about the architectural differences, I think there's a lot of nuance in there that gets lost until you actually sit down and have that discussion. Yes, it's great that I can manage 20 HCI clusters that are all thing, but I don't have a common plane of data
Starting point is 00:20:26 that I'm working with. All right, gents, this has been great. Gabriel, Howard, is there any last questions you have for Gabriel? I did want to come back, you were talking about form factor other than the four node 2U high density. Is that roadmap or are you guys?
Starting point is 00:20:42 Yeah, we have some roadmap stuff around. Well, obviously we have a different compute node. We still take the the the triton solid fire node the 610 right the one that's the one the one you we we can take the one you and connect it to the hci we can take the the the half width and use it as part of it okay but but today i can't buy all one use and get the packaging yeah the compute part isn't on the one you right now it's something we would obviously want to look at investigating is as we start to explore and expand and get more towards a configure to order type of yeah well especially as we get into things like GPUs where slot right slots
Starting point is 00:21:17 be different cards critical factor yeah and that's obviously the first use case around the one you possibility would be a different type of GPU acceleration or maybe a different use case for GPU acceleration. Okay. Gabriel, anything you'd like to say to our listening audience? Other than come check out NetApp, come check out NetApp HCI, come take a look at cloud.netapp.com and take a look, poke around, go deploy a Kubernetes cluster. You could do that today. There's a lot of stuff coming down the road that I think will really impress people.
Starting point is 00:21:46 Okay. Well, this has been great. Thank you very much, Gabriel, for being on our show today. Thank you. And thanks for sponsoring this podcast. Next time we'll talk to another system storage technology person. Any question you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. And please review us on iTunes and Google
Starting point is 00:22:02 Play, as this will also help get the word out. Remember, five stars are out. That's it for now. Remember, five stars are good. That's it for now. Bye Howard. Bye Ray. And bye Gabriel. Bye guys. .
