Grey Beards on Systems - GreyBeards talk converged infrastructure with Dheeraj Pandey, CEO Nutanix

Episode Date: October 2, 2013

Episode 1: Converged Infrastructure. Welcome to our inaugural GreyBeards on Storage (GBoS) podcast. The podcast was recorded on September 27th, 2013, when Howard and Ray talked with Dheeraj Pandey, CEO of Nutanix. Our first podcast ran to ~48 minutes and was a broad, wide-ranging conversation that discussed everything from the specifics of Nutanix solutions to broader …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here. And Howard Marks here. Welcome to the inaugural version of the Greybeards on Storage monthly podcast, the show where we get Greybeard bloggers together with storage and system vendors and startups to talk about upcoming products, technologies, and trends that are affecting the data center today. Now, if you're a startup or a vendor representative and want to be considered for a future Greybeards podcast, feel free to contact either myself, my Twitter ID is Ray Lucchesi, and
Starting point is 00:00:39 email is ray at silvertonconsulting.com, or you can also contact Howard, whose Twitter handle is deepstoragenet, and his email is hmarks at deepstorage.net. This is the first episode, and it was recorded on September 27, 2013. We have with us here today Dheeraj Pandey, CEO of Nutanix, the virtual computing platform, to discuss hyper-converged systems. Dheeraj, why don't you tell us a little bit about yourself and your company? Thank you, Ray, and thank you, Howard, for the opportunity.
Starting point is 00:01:18 And thank you, everyone, for listening to this podcast. This is Dhiraj Pandey, and I'm the founder and CEO of Nutanix. At Nutanix, we build what we call the virtual computing platform, and we'll talk more about the platform itself in due course. Big picture, the company was founded almost four years ago, and we're about 300 employees now across 25 countries. And we've raised more than $72 million from VC investors in the last four years. The company actually builds a lot of storage on the server side. So it's what people call software-defined storage on the server side.
Starting point is 00:02:03 And the virtues for the product are that it's scale-out and software-defined. It's converged as well, so we'll talk about convergence in the next 10, 15 minutes. And that's it. Look forward. Okay. So what exactly is hyper-convergence in your mind? Well, people have been talking about convergent infrastructure for the last 18 months or so. The true essence of convergence is time to value, speed to market, and also a single throat to choke. On one end of the spectrum, if you look at AWS, Amazon AWS, you can spin up a virtual machine in seconds through APIs and programs. On the other end of the spectrum is what people have been doing for the last 15, 20 years, the idea of stitching together infrastructure from scratch,
Starting point is 00:03:05 buying storage, buying servers, buying switches, and putting them all together. And sometimes these projects, infrastructure projects, can run into months before you can even spin up your first virtual machine. So we live in this day where public cloud computing can make things really agile. And the idea around convergence is to figure out how private cloud computing can be just as agile. If you look at VMware, it's talking about software-defined data centers. Now, the essence of software-defined data centers
Starting point is 00:03:46 is that the hypervisor is the new operating system of the data center, and everything runs on top of the hypervisor. So in the last 10 years, we have virtualized a lot of business applications. And going forward, we'd see a lot of other data center services like storage, like networking, like security, also be virtualized and then put on top of Hypervisor.
Starting point is 00:04:11 Now, once you do that, you've basically converged all the data center services, including applications, on the server itself. And to us, that is the true essence of convergence and also what software-defined really means. You can spin up new services, new firewalls, new storage controllers, new load balancers, using APIs and programs as opposed to racking and stacking them manually using lots of arms and legs. As a former consultant, I really have mixed minds about this. I mean, you're taking food out of my mouth. I don't get to spend six months building a new infrastructure.
Starting point is 00:05:02 Yeah, I think that's actually, it cuts both ways. If you look at technology in general, what's right for the customer is what's right for the channel and what's right for the consultants, because the enemy is not convergence or the fact that you can do things fast. The enemy is budgets, and the enemy is how long it takes to put things together. I think IT's biggest enemy is time. When you look at the consumers of IT department, developers and line of business departments you know they are looking for agility and if IT and consultants who serve IT they hide behind the inefficiencies of professional services consumers will anyway bypass IT and go to let's's say, Amazon or Rackspace or
Starting point is 00:06:07 Google or something like that. And I think it happened with business applications like CRM and ERP and increasingly in human resources and accounting payroll. All that stuff is moving to the cloud because a lot of things were not agile with IT. So I think it's time to look beyond the last 10, 15 years. And, you know, this is also an opportunity for the professional services people to bring up the value of their skill sets to things at a higher level than just stitching together infrastructure. It's like, you know, if you think about 20 years ago, we used to write, you know, programs in COBOL and even assembly languages. And over time, we have abstracted those things out into higher level languages like Java and Ruby and Python and things like that.
Starting point is 00:07:04 And in some sense, that's what's happening to the world of infrastructure as well. I think consultants and value-added resellers who actually do respect their own services will realize that this lifts off everybody, and they start to think about higher value add as opposed to the lower level value add. You know, one thing you said, Dhiraj, about the new data center operating system, it's kind of an interesting concept here. I mean, it's trying to incorporate just about everything in the data center, not just applications anymore, but all the other infrastructure services into a software functionality layer. Is that how you see it?
Starting point is 00:07:55 Yeah, I think what we've seen with server consolidation in the last 10 years, which was more of a horizontal consolidation, this decade is about vertical consolidation, where all the tiers of the data center can actually live in the hypervisor on top of virtualization. And that's how Amazon is doing it. Amazon has stopped buying pizza boxes. They don't have people
Starting point is 00:08:25 racking and stacking new firewalls and new VPN servers and new load balancers for a new tenant that they actually add programmatically. I mean,
Starting point is 00:08:38 tenants like small businesses or developers or DevOps groups, they actually just sign up with Amazon without even picking up the phone. And an entire virtual private cloud comes up programmatically. It's all provisionable through APIs. So I think that's the day and age that the public cloud world is living in.
Starting point is 00:09:01 That's how the infrastructure, the service folks are doing things. And I think it's time that some of that actually bleeds over into the enterprise as well. And the other thing you mentioned was the biggest enemy of IT is time. That's kind of an interesting concept. It's time to deployment. It's time to service. Is that how you see those sorts of things? Yeah, I mean, what I meant by that is agility how you see those sorts of things yeah i mean what i meant by that is agility you know how quickly can you bring things up and i mean if you think about on the consumer side you know you and i as consumers we are always looking for
Starting point is 00:09:37 things as quickly as we can get them and the whole paradigm shift with mobile devices and things like that, the apps that we have for everything in life, even for simple things like renting a car through what's happening with Uber and companies like that, Airbnb and all that stuff. So what's true on the consumer side is also true on the enterprise B2B side. People just value time and they're willing to pay for it. They want things to be done faster because agility is king and everything. And I think...
Starting point is 00:10:19 Time is ultimately the most constrained resource regardless of who you are. Yeah. There just isn't enough time. And anything we can do that eliminates steps from things, even such basic things as ordering a V-block instead of putting together the list of parts that you need to order means you have time to go do something that somebody else can't do. Well said, Howard. I think people are willing to pay for it.
Starting point is 00:10:50 That's the way to look at it. Yeah. So I guess, mentioning vBlock, so how does something like Nutanix compare to vBlock and vSpecs and FlexPod and FlexSystems, et cetera, et cetera? I mean, is it taking it another step beyond what those systems offer? VBlock and FlexPod and some of these converged infrastructure products out there, they're in the right direction. They're trying to solve problems that IT faces today, and I think it validates
Starting point is 00:11:26 what Nutanix is in a big way as well. They decided to put coalitions of companies together, like VCE is VMware, Cisco, and EMC. Blackspot is VMware, Cisco, and NetApp. They're trying to come together to bring together disparate technologies into one chassis, into one rack. And I think it is going and addressing this problem. And eventually it's about time to value and also agility. And perhaps even one more thing, which is single throat to choke. But I look at uh them as uh bandits you know it's like uh if you roll back six seven years when salesforce.com was talking about cloud oracle and ibm said oh we've been doing cloud for 20 30
Starting point is 00:12:19 years now you know oracle on demand is basically. So if you don't want to run something on premise, we can do that for you. And IBM, you know, had similar services via global services and so on. And they were trying to poo-poo Salesforce.com for, you know, coming up and spearheading the whole idea of software as a service or cloud computing. But if you look beneath the covers, they were day and night different. I mean, what Oracle on Demand was, was nothing more than outsourcing services to Oracle. And Oracle is still running, you know, hundreds of instances of e-business suite and Oracle database and things like that, which are all silos. So there's like one silo per customer, and it just didn't have economies of scale. On the other hand, Salesforce had built a truly multi-tenant cloud, and I think that was the essence of the cloud itself. In the case of Nutanix, it's very similar. I think in the spectrum, you can think
Starting point is 00:13:25 of, or in the continuum, you can think of vBlock as a converged infrastructure, but you can push this a whole lot towards software where things are truly converged in the sense that, you know, you don't need separate storage appliances and separate compute servers, can bring them all together via software on this x86 platform. And you share the hardware between business applications and storage controllers and other such services like firewalls and VPNs and load balancers and such. But more than just that, I think the idea of pay-as-you-grow is is very relevant in this discussion vblock and flex pod again the build silos in two vblocks cannot spill over into each other and they cannot fail over into
Starting point is 00:14:19 each other transparently unless you're using some software on the server side like VMware file system. But then again, it doesn't work with OpenStack, KVM, or other hypervisors. So I think the true essence of what Nutanix has done is the idea of scale-out, the fact that you should consume infrastructure elastically and fractionally. So you can start small. I mean, Nutanix can start as low as $50,000, and then you can keep adding more and more nodes to it and basically results in a very elastic infrastructure,
Starting point is 00:15:01 which I think is at the heart of next generation data center architectures as well. So that's... Sorry, go ahead, Howard. From where I sit, things like vBlocks solve the procurement problem, but they don't really solve the operational problem. And if you're going to go set up vCake and let your user departments actually do on-demand creation of VMs to host their applications, you don't know what they're going to do ahead of time. You need a lot more flexibility in being able to expand the system. You don't think Vblock and those other certain levels of automation that go beyond the pure
Starting point is 00:16:08 VMware data center solutions. There's an additional software layer, but if you buy a – I'm not going to know offhand what the model number is, but if you buy the vBlocks that's six ucs servers and the associated switching and that vnx it's those servers and that vnx and and if you use that to set up a private cloud and it's more successful than you expected in six months in it that configuration's almost full you can't really add performance to that vnx you You've got to get another one. And if you're using software— You've got to get another Vblock? Is that what you mean?
Starting point is 00:16:53 Yeah. Yeah, yeah, yeah, yeah. Okay. And if you're dealing with a scale-out system, if it's more successful, you just order more bricks. Yeah, yeah. And I think this is, again, at the heart of the discussion here, the idea of software-defined, which means different things to different people. But to us, it actually means everything is VM-aware. You don't deal with LUNs and volumes and file systems and constructs that were built for the last generation, which was around physical servers.
Starting point is 00:17:29 If you set up VK and your storage management is not on a pervium basis where you can take backups on a pervium basis, you can show flash love on a pervium basis or you can set VR policies or snapshots or clones or dedupe policies or compression policies on a pervium basis, then you're not truly a multi-tenant. It's important to realize that hardware has to stay undifferentiated in this new world of a multi-tenant cloud. And then you have to basically set policies on virtual entities like virtual machines, as opposed to on relatively physical constructs like LUNs and volumes and file systems. Go ahead, Howard.
Starting point is 00:18:21 Doesn't that ultimately mean that we need to do things like storage DRS? Excuse me, storage QoS? Yeah, absolutely. I think QoS is one service which basically brings differentiation in terms of performance and how you want to throttle a set of VMs over others, how you want to guarantee IOPS over others. And there's 10 other things like that. Like, you know, as I said, even DR policies have to be set on a per VM level. You must be able to say that these set of VMs will have the replication throttle
Starting point is 00:18:59 because those other set of VMs are actually going through. So quality of service goes end-to-end in all workflows, not just in the direct IO path of NFS or Fiber Channel or iScali, which is just the user request, but goes end-to-end, including things like backup throttling or quality of service around backups or archiving to Amazon and network QoS and things like that. So it's a very rich and very profound service. Yeah, you have a much finer-grained version of multi-tenancy than what I'm used to.
Starting point is 00:19:41 And I look at multi-tenancy from a storage perspective as defining virtual control units with virtual LUNs and that sort of thing and assigning those to almost application centers that could be potentially across multiple VMs, et cetera, et cetera. It's almost as if the multi-tenancy from your view is the VM is a world. And it can be hosted on just about any storage, any hardware out there, or networking, or server, or storage. But the VM world is sacrosanct and separated and distinct from all other VM worlds out there. Is that kind of how you see it?
Starting point is 00:20:22 I look at a VM as the smallest unit of provisioning and the smallest unit of performance management and debugging and things like that. It's like if you look at a multi-tenant software-as-a-service application, like, say, Salesforce, You can design the schema of Salesforce where every customer has a table of their own, like an Oracle table or something of their own. Or you can say, you know what,
Starting point is 00:20:53 all these customers will have rows that sit in a single table in this large Oracle database, something. So the economies of scale come more when you start to share all these things across different tenants or customers. And so, again, it's a continuum of how well you designed your multi-tenant app itself. And in the case of Nutanix, we believe that the smallest quote-unquote row is a virtual machine. And if you can go and performance manage it and analyze it and visualize it not just a single vm but even a collection of vms a single vm is is the degenerate case of this but
Starting point is 00:21:33 you think of a collection of vms and and uh you think about performance managing them quality of service uh backup clones dr snapshot all that stuff at that level. That's where the true value of multi-tenancy comes in. We need to shift the unit of management from the LUN, which was a convenient abstraction, to the application, and that application is made up of some group of VMs. You know, moving to the VM is a good first step. And I still can't believe that, you know, when VMware did VAII for NFS, they didn't include a primitive for make a snapshot because we now have almost half a dozen vendors that can do per VM snapshots in their storage system.
Starting point is 00:22:35 And they're really useful, but they're not part of the mainstream workflow yet. In fact, that's what I meant by a collection of VMs because you can logically take a collection of virtual machines and that makes your application. And then I think if I understand you right, Howard, you're saying that then you can go inside the VM itself and look
Starting point is 00:22:58 at the behavior of Linux and Windows services and processes and try to understand how they're doing what they're doing. I think from the infrastructure layer, a collection of VMs is a good start, but definitely one can go and even understand remote procedure calls and things like that
Starting point is 00:23:18 across two services where a service is a Java VM inside a Linux VM, and the other service is a Windows service running inside a Windows VM. Yeah. So just for a second here, let's get off to software discussion. Can you tell me a little bit about what sort of hardware ships with a Nutanix system? Yeah. So, you know, in Nutanix, almost all the IP is in software.
Starting point is 00:23:45 Of course. But we made a decision early on that we have to bring this as a merchant-grade product that is bought and sold through the channel, you know, one-tier or two-tier distribution model. And it was important to reduce friction at the time of delivery and deployment. So in the mid-market and also for the higher end of mid-market, and even for large customers, for their remote offices, branch offices, an appliance with the right form factor. We didn't want to, again, stitch together an unknown piece of hardware on our software and take like two months to decide, you know, things like dead and arrival hardware or performance issues with, you know, new exotic pieces of hardware and things like that. or IP and we take off-the-shelf commodity hyperscale servers that are built in Taiwan and again I think that's the other religion I would say of
Starting point is 00:24:52 Nutanix that Taiwan is good enough you know what they build in terms of hardware if you can put software on top of it on the enterprise grid and that's exactly what Amazon and Facebook and Google actually do. They don't buy blade servers from Cisco or IBM or HP or any such branded x86 server company. They go and buy unbranded white boxes. Well, the cloud scale guys go a step further than I'm comfortable with. I think you're right. Using Velcro to hold the disk drive to a motherboard in a data center
Starting point is 00:25:27 is a little bit much for me. Some of that, by the way, some of that is also marketing. A lot of them do buy Quanta and Supermicro and a couple others from Taiwan. And I think that's what I mean. The fact that enterprises can now, using Nutanix software or other such companies' software, they can make commodity x86 through the enterprise
Starting point is 00:25:54 grid. So we take these hyperscale servers from Taiwan and we use Intel SSDs on the SATA backplane and 1TB or 2TB or 4TB drives in the backplane to build these hybrid flash and spindles kind of converging infrastructure arrays. So it's effectively commodity servers, commodity storage. You didn't mention a networking component, but I assume that would be commodity networking as well? Yeah, off-the-shelf networking. We do resell Arista switches, 10-gigabit switches,
Starting point is 00:26:38 but many a time we don't make that call for the end customer because philosophically networking guys have their own religion about cisco or juniper or rest and we don't want to add more friction by you know shipping a switch that we mandate almost all 10 gigabit switches which are relatively uh you know low latency whether it's Cisco or Arista or even Juniper, I think for that matter, works with us. Now, if the customer really wants a single throat to choke in terms of supporting switches as well, we can ship them in Arista.
Starting point is 00:27:17 And our support team is truly converging that way, the people who have expertise in virtualization and people who understand storage and people who understand performance of VDI and big data and things like that. But there's a lot of CCIE and CCNA experts within the support team as well who can support Cisco switches and understand the details of networking as well. Yeah, and a shocking number of those network guys are more loyal to Cisco than they are to their employers. Yeah, that is well said, well said.
Starting point is 00:27:56 I think that's the observation that we made, Howard, as well, that we could ship our own switch, but we'd rather let the workload guys, you know, the virtualization guys, work it out with their networking guys to decide what top of the rack switch they really want. A couple of questions come to
Starting point is 00:28:20 mind. Maybe I should start with the cluster interconnect is effectively 10 gig Ethernet?'s not infinite or anything like that yeah it's 10 gigabit Ethernet and if you think about it we've basically converged the compute plane which is the application first while VMware plane with the storage back plane as well so the frontplane and the backplane now share the common fault tolerant
Starting point is 00:28:51 10 gigabit plane and you can use virtual switches and quality of service traffic shipping and software to figure out what part of the fat pipe is used by the frontplane and what part of that pipe is used by the front plane
Starting point is 00:29:05 and what part of that pipe is used by the back plane itself. The storage almost seems like the key to some of this. I mean, arguably V-blocks and flex pods and stuff like that are using enterprise-class storage services and storage machinery and equipment today. But you're taking a different tack with what I would call direct access storage devices on the server. But somehow you're sharing that storage across the cluster. So therein lies an interesting key to this whole discussion in my mind. Welcome to the future, Ray.
Starting point is 00:29:50 I guess. In fact, you know, that's a good observation, Ray. A lot of the IP, I mean, the heart of Nutanix is data. We manage data as well as it has been managed in a dual controller storage appliance and better you know we basically look at the hub-and-spoke architecture of the last 20 years and we say the storage appliance a dual controller storage appliance was sitting at the hub of a hub-and-spoke and we basically disaggregate the hub. There's no need to have this unscalable, expensive, big-iron approach to storage.
Starting point is 00:30:32 You can put a mini-controller as a virtual machine on every server, and then these controllers talk to each other to form our distributed system. So the heart of Nutanix is a distributed system. Now, if you look under the hood of Nutanix, there's a distributed file system. And that file system uses, you know, those popular tenets of big data. It has a NoSQL metadata service. It uses MapReduce for pretty much everything, whether it's recovering disks to rebalancing cluster nodes
Starting point is 00:31:11 to doing tiering of data to archiving data to making snapshots more efficient lazily in the background, pretty much everything we use MapReduce for. and then we have a modified version of Cassandra, which is the NoSQL service I talked about. We use ZooKeeper, which is a cluster manager that a lot of people use in the big data web scale environments. So under the hood, Nutanix is really a big data application that solves a lot of the mundane problems in the data center,
Starting point is 00:31:50 and it basically ties itself into existing virtualization stacks and management stacks and things like that. So we've taken big data as a framework, and we said let's bring it down to the masses. If you think of Splunk, Splunk is like a big data application that solves some mundane problems of log analytics for the IT administrator. And if you extend that further ahead, you can solve enterprise-grade storage problems like the way Splunk solves log analytics problems. You can think of Nutanix as a company that has built a product that exposes NFS and iSCSI and SMB 3.0
Starting point is 00:32:47 and integrates with VMware and OpenStack and Microsoft SCVMM to really bring I would say continuity to the administrators so that they don't have to really think about changing their workflows and their lives. You know, everything just works out of the box. It seems almost unbelievable to me that you could take something, some of these ideas, NoSQL, Database, Cassandra, MapReduce, and implement enterprise-level storage characteristics, utilizing that as
Starting point is 00:33:34 sort of the framework or the backbone for it. I can't get my head around it. And Howard, maybe you can help me here. Or maybe there's just no excuse well i mean in in part you have to remember that you know moore's law has been a huge gift to all of us and so just the fact that processors have gotten so much more powerful makes it possible to do this kind of thing um and you know conceptually if you look at something like vsan you know it's not based on cassandra and map reduce but it's you know another way to try and solve the same problem um of saying you know we have compute resources
Starting point is 00:34:23 and we have direct-attached disk, and how do we make that direct-attached disk shared and reliable? But I can understand something like vSAN, which is a special-purpose program that implements storage protocols internal to VMware. What I find difficult to believe is that somebody can use a NoSQL database and MapReduce and things of that nature to implement effectively a distributed storage service for VMs and get away with it. Yeah, and by the way, there's some precedence to this. If you go back 20 years, storage vendors were basically writing operating systems from scratch. And then came NetApp and said, you know, we can use FreeBSD and we can modify FreeBSD.
Starting point is 00:35:13 We don't have to reinvent process management and memory management and IO subsystems. We can use FreeBSD as our base. And then around the year 2000, Linux became good enough for a lot of the data center community, including switch vendors and load balancer vendors and including storage vendors. And then fast forward four or five years, MySQL became good enough for a lot of the metadata that these storage vendors and these switch vendors were actually maintaining.
Starting point is 00:35:50 There was Berkeley DB and there was MySQL, the two databases that people were using. So open source has been the foundation of innovation, even in the data center community for the last 20 years. What we have done is looked at it a little bit more closely and said there's more that you can get out of open source so that you don't have to spend time implementing plumbings that basically nobody pays you for. People pay you for services on top of that, like deduplication, compression, disaster recovery, backup, clone, snapshots, VAI integration, things like that.
Starting point is 00:36:30 Yeah, just storing the data safely is kind of assumed. Exactly. And figuring out, I mean, obviously we have changed a lot of Cassandra. Like, you know, Cassandra by itself is eventually consistent, so it's not good enough for the storage world. We have built a distributed consensus algorithm on top of Cassandra to make it strictly consistent, but at least we didn't have to reinvent some of the Cassandra wheel around consistent hashing so that you can redistribute the keys when you add a node or remove a node. We didn't have to reinvent concepts like SS tables,
Starting point is 00:37:06 which are LSM trees which convert random writes into sequential writes and so on. So on that low level of Cassandra is something that we didn't have to reinvent from scratch. Yeah, yeah. So you have tied a lot of this together with your own IP
Starting point is 00:37:21 to try to make it more effective in an enterprise storage solution. Yeah, absolutely. And I think one other aspect of Nutanix that I haven't touched upon is how we're spending a lot of time on ease of use and reducing friction. And there's a whole fabric that we're building, which is a systems management fabric, which will be the V center of the next generation. Think about it. I think based on NoSQL, based on MapReduce, based on collecting bazillion amounts of statistics and data from all across the data center entities, you know, virtual and physical.
Starting point is 00:38:01 And analyzing historical stats and then going and visualizing it. I think this whole systems management of the next decade is a problem of scale and a problem of design. Can you bring consumer-grade design and visualization to this idea of systems management? And can you bring a web scale to the idea of collecting a lot of data and analyzing a lot of it? It's an interesting trend in the storage market that the startups, including yourselves, have started producing huge amounts of analytical data. Yeah. That, you know, the storage system used to be much more of a black box,
Starting point is 00:38:50 and now, you know, dashboards are showing us a lot of what's going on at a much more granular level than they used to. And that, of course, is going to mean there's a lot more data to analyze. Yeah. So a couple of questions here. How far can these systems scale up? Is there some sort of VMs per
Starting point is 00:39:12 node kind of view of this? A node effectively is server and storage and then there's networking outside of it that supports the fabric between them. That's correct, right? Yeah, and then you can have different kinds of nodes you can have heavy nodes you can have storage heavy nodes you can have flash heavy nodes and
Starting point is 00:39:32 they can all be stitched together using this single fabric that makes the whole thing look like one single system so you can independently scale compute and storage or performance and capacity by adding different kinds of nodes and one cluster itself. Now, the way it actually scales. So there's two things here. One is architecturally, what have we done? So we have two kinds of, I would say, metadata stores. One is the configuration database, which basically stores information about machines
Starting point is 00:40:09 and virtual machines and virtual disks and things like that. And that's stored in ZooKeeper, which has been known to scale to tens of thousands of... in Facebook and... And then there's the NoSQL Cassandra itself which is again a partition that basically
Starting point is 00:40:32 grows as you add more machines so as you add more machines in the Nutanix cluster it's not just the user data that gets produced it's also the metadata that gets rebalanced and the way we do rebalancing of keys is not using a simple mod n arithmetic where everybody now has to give up their keys to this
Starting point is 00:40:56 new guy who's been added to the cluster because that is an unscalable solution if you have 100 nodes and you add one more node you don't want all the other existing hundred nodes to actually reshuffle their keys to be able to bring the hundred first node of the ensemble so we use a consistent hashing to really do a lot of this scaling as well so architecturally there is no single point of bottleneck or a single point of failure in the Nutanix product itself. Now, in terms of what we have actually gone and shown, we've gone up to 52 nodes in a single fabric. So you can have a single large data store that could be a petabyte in size. And nobody in the VMware world has actually been able to show a single large data
Starting point is 00:41:45 store that scales to a petabyte just yet. And we can actually go to 100 nodes or 200 nodes if we had to. Our largest customer is about more than a thousand nodes right now. And they just carve out 52 node fabrics, data fabrics, and they're able to use a single console to manage all of them. So the control fabric is a single pane of glass, which is able to manage all these 52-node fabrics. And the 52-node fabric is a distributed file system that is a single system that you can use as a single data store. And that environment, 1,000 nodes environments with 52-node clusters,
Starting point is 00:42:35 so each of those clusters is effectively a separate, and I'm not sure what the terminology in VMware is, but a separate HA environment or a separate vMotion environment, and you can't move a VM between them without more serious activities. But within that cluster, you can move VMs and storage to your life's content to some extent. Exactly. And I think it's more of a – Well, I mean vSphere clusters don't scale to 52 nodes. So the storage backend extends – it's one file system across the 52 nodes, but it would be multiple vSphere clusters on top of that.
Starting point is 00:43:12 Yeah, yeah, yeah. But I think the important thing here is theory versus practice. I think even if you look at Facebook, I mean, we have a bunch of developers from Google and Facebook working for us. They talk about how they would stop a single fault domain to go beyond a rack, because they knew that if a rack failed for whatever reason, only a well-defined sliver of the user community would be inaccessible or would be unable to use Facebook.
Starting point is 00:43:43 But they didn't want to have a fault domain that was 10 racks or 50 racks or whatever because they knew that if that failed, then a whole bunch of users. So it's important to actually keep that fault domain relatively contained as well. All right. We're coming up about the time where we're over. I don't have any further questions. Howard, do you have any further questions? Well, the only other thing I think that's important to talk about is how
Starting point is 00:44:11 software-defined storage as a software product like vSAN affects folks like Nutanix that are doing hyperconvergence and is the fact that affects folks like Nutanix that are doing hyperconvergence? And, you know, is the fact that that concept comes from VMware just an endorsement, or is that a threat? I think I'll answer this question in a couple of ways. One is that people who look at Nutanix as a converged infrastructure, hardware, software, appliance, tightly integrated with software, if they look at us like that, then they underestimate us.
Starting point is 00:44:51 They don't understand what we have in store in the next two to three quarters. So I'll stop at that in terms of the kind of packaging that we are thinking about for the future. I think there is a world of good enough where I think vSAN would come in and make smaller storage appliances like EcoLogic and Compellent and Left Hand and companies like that a little bit less relevant. Not totally relevant, but a little bit less relevant, especially for good enough computing like Test and Dev or non-persistent virtual desktops or things like that.
Starting point is 00:45:39 Because vSAN doesn't have the enterprise-grade features, and they've spent like five years doing R&D on that. But I do expect it to open the market where people can now think of building data centers without using a storage appliance. And that's a huge win for us. I mean, as a company, we'd love to ride the coattails of VMware marketing where they're able to open doors for us, especially in, you know, very religious markets like Global 2000 where the server guys and the storage guys, they are different territories and different teams and different agendas and goals and so on.
Starting point is 00:46:20 I think VMware really opens up that market because we don't have to be the first ones to preach convergence on the server. They already are spending hundreds of millions of dollars in marketing doing that. Now, there, it's going to be about, at least from Nutanix's point of view, about hypervisor agnostic architecture. Can you do this with VMware and can you also do this with OpenStack KVM? Can you do this with Hyper-V? And can you spill over to Amazon? All sorts of things in the middle. I think to the global
Starting point is 00:46:52 2000, they want choice as always. They want flexibility and we argue that software defined storage on the server side needs to live above the hypervisor, not inside the hypervisor, which is similar to what Oracle did about 20 years ago or 25 years ago.
Starting point is 00:47:12 They went and argued that databases don't belong inside mainframe. They don't belong inside operating systems. If you make databases run above operating systems, then you can run it on all operating systems as well. And I think that's kind of the deja vu that we are looking at right now we believe that fabrics like storage which are sitting on the server or even fabrics like the control fabric systems management and analytics and visualization monitoring and performance management of data centers they must be hypervisor agnostic they must live above the commodity sheet metal,
Starting point is 00:47:47 which is the hypervisor itself. All right, well, that's great. Thank you, Dheeraj, for being on our call. Next month, Greybeards on Storage focus our discussion to server-side flash caching, which should be exciting. Any questions you have on server-side flash, please let us know. That's it for now.
Starting point is 00:48:08 Bye, Howard. And thanks again, Dheeraj, for being on our call. Until next time. Thank you. Thanks, guys. It's been fun. Talk to you guys later. Yep.
Starting point is 00:48:19 Bye. Bye. Bye. Bye.
