Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x15: Enabling AI Applications through Datacenter Connectivity with Nvidia
Episode Date: April 13, 2021

AI applications typically require massive volumes of data and multiple devices within the datacenter. Nvidia acquired Mellanox to bring industry-leading networking products in-house to enable next-generation applications, including artificial intelligence. Kevin Deierling joins Chris Grundemann and Stephen Foskett to discuss the Nvidia vision for a datacenter-wide compute unit with integrated networking to bring all of these components together. This represents a continuous evolution of computing, from supercomputers to HPC to big data to AI, all of which have required more compute, memory, and storage resources than any one device and require the connectivity to bring it all together.

Three Questions:
- How long will it take for a conversational AI to pass the Turing test and fool an average person?
- When will we have video-focused ML in the home that operates like the audio-based AI assistants like Siri or Alexa?
- Are there any jobs that will be completely eliminated by AI in the next five years?

Guests and Hosts:
- Kevin Deierling, SVP Nvidia Networking. Connect with Kevin on LinkedIn or find him on Twitter at @TechseerKD.
- Chris Grundemann, Gigaom Analyst and Managing Director at Grundemann Technology Solutions. Connect with Chris on ChrisGrundemann.com or on Twitter at @ChrisGrundemann.
- Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 4/13/2021 Tags: @SFoskett, @ChrisGrundemann, @TechseerKD, @Nvidia
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings experts in enterprise infrastructure together to discuss applications
of AI in today's data center.
Today, we're discussing the requirement for connectivity between different elements of
the data center that are all working
together. First, let's meet our guest, Kevin Deierling. Hi, I'm Kevin Deierling. I'm the SVP
of networking at NVIDIA. I came to NVIDIA through the acquisition of Mellanox early last year.
Really excited to talk about the connectivity for AI. And if you're interested, you can reach me at
LinkedIn, or you can follow, tune in, because this is GTC week, and you can watch us at GTC.
That's the NVIDIA conference. Thanks. And I'm your co-host, Chris Grundemann. I'm a consultant,
content creator, coach, and mentor. You can learn more on chrisgrundemann.com.
And I'm Stephen Foskett,
organizer of AI Field Day and publisher of Gestalt IT. You can find me on most social
media networks, including Twitter at @SFoskett. So Kevin, many of the people in our audience are
very, very familiar with NVIDIA. After all, NVIDIA is probably the 800-pound silicon gorilla of the AI space.
But many of them may not be aware of the connectivity and networking capabilities
that NVIDIA acquired from Mellanox,
and also may not be aware of just how important connectivity is to modern AI applications.
I wonder if you can just sort of set the stage by telling us about that, about how connectivity is sort of holding back the deployment of AI applications.
Yeah, I think it's surprising, but we're actually the leader in high-performance networking today with 25 gig, 50 gig, 100 gig networks. And the reason that's important is because with AI, there's so much
data that getting access to that data and communication between these accelerated
servers that are based on the GPU is super important. And so that's what we do.
Awesome. And so when you approach networking for AI applications, or infrastructure that's
going to support AI applications, are there any fundamental differences in the way you look at a network
if it's going to be built specifically for AI applications?
Yeah, I think there is, because in a traditional environment, you might have your application
running on a single server.
But with AI, the data center is the new unit of computing. It's the entire data center because you're actually running a bunch of jobs across all sorts of different servers.
We're stitching together all of these different AI services.
And when you do that, the networking performance becomes critical.
So throughput and latency and all of the accelerations that we build into the network are vital.
So your old one gig or even 10 gig network just doesn't cut it.
Yeah, that makes sense.
And then is there anything in addition to, you know, really just structuring that network for high bandwidth, low latency, low jitter, right?
I mean, just really making a rock solid network to allow AI microservices to be spread across an entire data center.
Is there more a network can do for AI, right? Are there things inside the network where we can
do some pre-processing or things like that? Yeah, that's a great question, because in fact,
there is. The data center being the new unit of computing means that all of the data center services can then be offloaded. So if you
run software-defined networking, software-defined storage, and software-defined security, you consume
30% of the CPU cores on your servers. And what we do is accelerate that. So we have what's called
the data processing unit that goes into servers, and it runs all of
that infrastructure that feeds the AI applications and connects GPUs and CPUs together. So the DPU
really becomes the third element of the data center. And then of course, the switch fabric
to connect everything with high performance and low latency. Now, you know, we've seen, I mean,
other folks would call them smart NICs, right? I mean, I think this is kind of an outgrowth of that,
or a next generation of that, right, the DPU?
And the first question, I think, that comes to some folks' minds anyway
is why not just offload that completely? I mean, if you're essentially building a server
on a card that you're putting inside of another server, is that actually more efficient or better,
and in what ways, than just putting another server beside that server? Yeah, so we really want to
put the DPU in every box, and there's very good reasons to do that. First of all, the data
processing unit is really good at some of the tasks that are needed for AI workloads. So whether
we're doing encrypted models or we're moving low latency data with
what's called GPU direct storage directly from GPUs to GPUs or GPUs to storage, we can do that
built into the network. But there's also a huge benefit of decoupling the application software
stack from the networking and the security and the storage stack. So it actually provides huge flexibility and performance gains.
And that isolation benefit is really, really powerful.
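To put a number on the 30% figure mentioned above, here is a minimal back-of-the-envelope sketch of how many host cores an offload like this could hand back to applications. The fleet size, per-server core count, and the resulting totals are hypothetical assumptions for illustration; only the 30% fraction comes from the discussion.

```python
# Back-of-the-envelope: host CPU cores reclaimed when infrastructure
# services (SDN, storage, security) move from the host CPU to a DPU.
# The 30% figure comes from the discussion; everything else is assumed.

SERVERS = 1_000            # hypothetical fleet size
CORES_PER_SERVER = 64      # hypothetical host core count
INFRA_FRACTION = 0.30      # share of cores consumed by software-defined
                           # networking, storage, and security on the host

infra_cores_per_server = CORES_PER_SERVER * INFRA_FRACTION
reclaimed_cores = SERVERS * infra_cores_per_server

print(f"Cores per server spent on infrastructure: {infra_cores_per_server:.1f}")
print(f"Cores reclaimed across the fleet:         {reclaimed_cores:,.0f}")
# With these assumptions: 19.2 cores per server, about 19,200 cores
# fleet-wide, i.e. roughly 300 additional 64-core servers' worth of
# application capacity returned to the AI workloads.
```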
So if I can push back architecturally on this,
I'm kind of hearing two architecture stories here.
On the one hand, we're seeing DPUs going into every server, as you just mentioned.
But at the same time, we're also hearing about this concept of a data center-wide compute unit. And how is it that those two sort
of conflicting elements are brought together into a unified whole? I suppose the answer is networking,
right? That's exactly right. So the networking, if you think about a traditional computer, you have a backplane with IO connectivity between, you know, storage and that might be something like PCIe.
In a data center scale computer, it's the network. But now things are ephemeral: you've got containerized systems that are coming up and going away all the time, little microservices. And so you can
no longer afford to build your security and your interconnect by manually configuring ACLs on
switches. It needs to be automatically provisioned. It needs to be adaptive. Everything has to be
happening in real compute time, not in human being time. And this, I think, connects with sort of what Mellanox has historically excelled at.
Because again, maybe not everybody's as familiar with the company and its products as I am.
But Mellanox, for the longest time, was the champion of this idea of basically sharing
memory with massive bandwidth outside of a traditional compute unit, whether it's a storage
server or a compute server. And so fundamentally, by bringing that technology into NVIDIA,
it seems to me that you are sort of at once exploding the computer, but also kind of
consolidating it as well. Because essentially, the multiple servers, multiple storage devices, all of them become sort of a
unified shared memory fabric almost. Is that the right way to put it? Yeah, that's really a great
perspective because that is our heritage. We're the supercomputing leaders with our technology,
which is called InfiniBand. And interestingly, the supercomputer and the cloud are closely
related cousins. And so if you think about it, early on in the history of the internet,
there was a question that somebody asked the founders of Google, what could they get that
would really help them? And they said, hey, if we could put the entire internet into memory,
that would be really useful. Well, effectively, that's what we're doing with these giant data center scale computing.
We're putting massive amounts of AI workloads, all the data associated with it, all the models,
all that information into memory on all the different servers. And then the network is
critical because we offload, accelerate, and isolate the network.
And we connect all of these things as if it's a giant pool of memory that we're sharing.
And of course, there's always storage too.
There's a memory hierarchy.
You never fit everything into the main memory, but that's effectively what we're doing.
And the network's critical to make it happen.
Yeah.
So you mentioned the InfiniBand and supercomputing.
And of course, that's really where, you know, Mellanox dominated.
But then over time, the technology advanced
and a lot of those concepts came down
sort of from supercomputing to HPC to big data
and now into AI.
And I see a very straight line in terms of architecture
between those concepts.
So where originally it was sort of, you know,
proprietary systems for supercomputers. Now, you know, we're looking at, you know,
more conventional protocols. We're looking at ethernet, we're looking at, you know,
x86 servers and so on. But the idea is the same that the, that the data processing spans
something greater than a CPU or greater than a server.
Yeah, that's exactly right. Because the scale of the AI problems that we're solving simply don't
fit into one or two or 10. We're talking a thousand servers. And when you have that,
the networking becomes critical. And the other key part is that when we have a thousand microservices,
you have a problem that's called the tail at scale. It's not the average latency of the response,
it's the worst case latency. Because if you have a thousand microservices and you call all thousand
of them, okay, most of the time it returns in 10 milliseconds, but every once in a while it takes
100 milliseconds.
If you have a thousand microservices, you see that tail latency every single time. And so having determinism built into your network, so that we're not relying on software but accelerating things in hardware, is critical.
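The arithmetic behind that tail-at-scale point is easy to check. Here is a minimal sketch; the 1% tail probability is an illustrative assumption consistent with the 10 ms typical versus occasional 100 ms figures above, not a measured number.

```python
# Tail at scale: if each microservice call is slow (e.g. ~100 ms instead
# of ~10 ms) only 1% of the time, a request that fans out to N services
# in parallel almost always waits on at least one slow call.

def p_hit_tail(n_services: int, p_slow: float = 0.01) -> float:
    """Probability that at least one of n parallel calls lands in the tail."""
    return 1.0 - (1.0 - p_slow) ** n_services

for n in (1, 10, 100, 1000):
    print(f"{n:>5} services -> P(see tail latency) = {p_hit_tail(n):.5f}")
# 1 service    -> 0.01000
# 100 services -> 0.63397
# 1000 services-> 0.99996, i.e. effectively every request runs as slow as
# the worst call, which is why deterministic networking matters more than
# average latency.
```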
Interesting. Yeah, and that kind of reminds me
back into the network side, which is if you're offloading some of these functions, right?
So the SDN functions, those kind of things into the DPU.
Does that mean that the network switches in between can have less intelligence call it, right?
And they're just kind of really moving packets quickly in that core network where a lot of the features and functionality, that kind of service edge ends up in the DPU.
Is that how it gets architected?
Yeah, I think there's a push
and pull between where the intelligence is. Is it in the DPU or is it in the switching? We see both.
We see both models being deployed. The key advantage of putting it in the edge is that it
becomes inherently scalable. So you don't have a centralized resource that might run out of memory or the ability to cache flows and things like that. But in many cases the switch looks at the network and says, hey, I've got multiple
paths to get to my endpoint. I'm going to make a decision. So there's a different kind of
intelligence that's moving to the switch from the edge. So what else? We're talking about kind of a
lot of different offloading and shifting things around and moving different pieces, right? And
this isolation was another thing
you touched on earlier with Stephen.
I think it's really interesting.
And I have to assume that that has
potential security implications as well, right?
As being able to isolate different pieces
of what you're doing,
whether it's the networking functions or something else,
should be able to provide different layers of security
that weren't previously available.
Yeah, that's right.
So we talked about offload, accelerate, and isolate.
And that isolation piece, security is a critical aspect of that.
Because when you're running your software-defined networking and storage and security on the
x86 processor in the application processing domain, if the x86 is compromised by an application,
then so are all of your provider policies. So your security policy is compromised. And we've
seen things like Spectre and Meltdown bugs, which actually can come in because you're inviting
third parties, you're inviting customers, your employees are bringing apps, downloading them,
and installing them right in the middle of your data center. And once you've compromised your x86 application processing domain,
because those aren't trusted anymore, now you've compromised all your security policies. So the
isolation is huge. So another thing you mentioned in there that I thought was really interesting
is sort of this interesting counterpoint between, on the one hand, massive
scalable systems, and on the other hand, smaller and smaller containerized and microservices
applications and endpoints. And I think it's really interesting that at the same time that
the data center is getting big, the applications are getting small. And for that reason, I think
that we can bring this back to this whole machine learning
applications concept, because what we've seen is that a lot of application of machine learning
goes into things like, you know, microservices that serve a specific job and process a specific
bit of data. I think that most of us are aware of sort of a personal digital assistant where you say, hey, keyword, turn on the lights or something, right? That's a classic example of
an artificial intelligence that's implemented in sort of a microservices approach. Because
essentially, every time you say that, the infrastructure basically builds up whatever
is required to service your request,
does your thing, and then tears it all back down. And that gets to your sort of worst case scenario
of infrastructure, because essentially a user's experience with one of those machine learning
assistants, that I'm not going to say the name of, is basically predicated on their experience every time they
use it. And if they use it, and sometimes it takes it 15 seconds to respond, they're going to say,
well, this thing stinks, even if most of the time it responds in half a second. So I think that,
is that really what you're trying to say? That, you know, not only do you need to bring all this
stuff together, but you need to be very deterministic in terms of making it reliable, making it work every time. That's right. Because, you know, the things about
human beings and our interactions is we're pretty slow, but 100 milliseconds or 200 milliseconds is,
you know, normal for human response times. That's actually incredibly hard if you think about the ensemble of AI services that's
required. If you're asking a digital assistant, then it's doing voice recognition and translating
that to text. It's doing natural language processing. Maybe you're asking it to see
something in a retail application. So it has to go search a database, find out the best matches for your background, and then put that in reverse to talk back to you and say, hey, I found
this. So it synthesizes the language in reverse, and then it actually does all of that. And it has
to do that in human response times so that we don't feel awkward. It has to be good responses. The amount and the number
of different services that are involved to deliver that kind of real-time response requires that the
network is deterministic and super low latency. And we spend a great deal of time making that happen.
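To make that ensemble-versus-budget point concrete, here is a minimal sketch of a latency budget. The stage names follow the ones just described, but every millisecond value is an invented placeholder rather than a measured figure.

```python
# A single assistant request fans through several AI services in sequence.
# Stage names mirror the discussion; the latencies are illustrative only.

HUMAN_BUDGET_MS = 200  # roughly what still feels "instant" in conversation

pipeline_ms = {
    "speech-to-text":              40,
    "natural language processing": 30,
    "database / catalog search":   60,
    "response synthesis (TTS)":    40,
    "network hops (sum)":          20,
}

total = sum(pipeline_ms.values())
for stage, ms in pipeline_ms.items():
    print(f"{stage:<30} {ms:>4} ms")
print(f"{'total':<30} {total:>4} ms  (budget {HUMAN_BUDGET_MS} ms)")
# With these assumptions the pipeline lands at 190 ms, leaving ~10 ms of
# slack; a single jittery network hop or tail-latency microservice call
# blows the budget, which is the argument for a deterministic fabric.
```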
And as someone who's got a lot of experience in the enterprise networking space as well, not to throw stones, but would you say that most enterprise networks and data centers are architected in a way that allows them to implement applications like this?
Or would you say that maybe they have some work to do?
Yeah, they got a lot of work to do, frankly.
We see this sometimes where we'll bring in our new AI platforms, our edge servers from our partners that have GPUs in them.
And then whether it's storage connectivity, they'll just assume, okay, well, we'll just use that with our old network.
We've got, you know, some one gig, we've moved to 10 gig.
We have in our platforms, in our AI platforms, we can put something like two terabits of bandwidth
into a single server. So it's massive data that's required because we're stitching these things
together. We're scaling out the computer. Again, if you think about the backplane of a server,
and now it needs to run across the entire data center. That's the level that we're talking about here.
We put 200 gigabit per second adapters into our DGX servers, and then we multiply that
times nine.
So that's 1.8 terabit of throughput from server to server and server to storage connectivity.
That's what's needed.
It's not your network that you installed even five years ago.
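As a quick sanity check on those link speeds, here is a small sketch comparing how long it would take to move a hypothetical 1 TB shard of training data at line rate over each class of network. The 1 TB size and the zero-overhead assumption are both illustrative; the 9 x 200 Gb/s figure matches the DGX math above.

```python
# Time to move a 1 TB shard at different link speeds (ideal line rate,
# no protocol overhead). 200 Gbit/s x 9 adapters = 1.8 Tbit/s per server.

DATASET_BITS = 1e12 * 8  # 1 TB expressed in bits (illustrative size)

links_gbps = {
    "1 GbE (legacy)":     1,
    "10 GbE":             10,
    "100 GbE":            100,
    "200 Gb/s adapter":   200,
    "9 x 200 Gb/s (DGX)": 1800,
}

for name, gbps in links_gbps.items():
    seconds = DATASET_BITS / (gbps * 1e9)
    print(f"{name:<22} {seconds:>8.1f} s")
# 1 GbE: ~8000 s (over two hours) per server; 9 x 200 Gb/s: ~4.4 s.
```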
Yeah, that scale is just simply massive, right? I mean, I'm still just kind of wrapping my head
around multiple terabits to a single server. And then obviously, I mean, then aggregating that out,
I mean, does this just become a really, really flat network? Or I mean, is the backbone,
is there a core there that's five, 10 times that to be able to aggregate servers to get them to
crosstalk and things.
Yeah.
So these are almost always what's called the fat tree network, which is a constant bisectional bandwidth network.
So we see the same amount of bandwidth on the leafs and then going up through a spine.
So we're going to provide all to all connectivity where every server can talk to every other
server at full bandwidth.
And to do that, you'll see that
those spine switches, we're already shipping 400 gigabits per second today. 800 gig is right around
the corner; the specs are being standardized and are solidifying. And so we'll start to see
that happening. So we're seeing 100, even 200 gig to each of the servers at the endpoints. We can multiply those up for storage and AI boxes.
And then in the core today, it's 400, 800 is right around the corner.
So that's 800 gig per port.
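The constant bisectional bandwidth idea reduces to a simple capacity check: a leaf switch is non-blocking when its uplinks to the spine carry as much aggregate bandwidth as its server-facing ports. A minimal sketch, with port counts and speeds chosen only for illustration rather than taken from any specific product:

```python
# Non-blocking (constant bisectional bandwidth) leaf check:
# uplink capacity to the spine must match downlink capacity to servers.

def leaf_is_nonblocking(server_ports: int, server_gbps: int,
                        uplink_ports: int, uplink_gbps: int) -> bool:
    down = server_ports * server_gbps   # aggregate server-facing bandwidth
    up = uplink_ports * uplink_gbps     # aggregate spine-facing bandwidth
    print(f"downlink {down} Gb/s vs uplink {up} Gb/s")
    return up >= down

# e.g. 16 servers at 200 Gb/s need 3200 Gb/s of uplink, which
# 8 spine-facing ports at 400 Gb/s provide exactly.
leaf_is_nonblocking(server_ports=16, server_gbps=200,
                    uplink_ports=8, uplink_gbps=400)   # -> True
```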
Yeah. Wow. Wow. That's huge. And you know,
that reminds me of something you were saying earlier about kind of the
latency issue or potential latency issue, right?
Where the longest response ends up slowing the whole thing down,
which when you combine that with the idea of thinking of the data center
as the unit of compute, it reminds me of, was it Amdahl's law, right?
And how much you can get out of a whole job
being limited by how long one serial piece takes, right?
If you break things down to parallelize them.
And that seems like the network now becomes a
factor in that. And actually, you know, being able to speed up your AI applications are really,
really going to be reliant on that kind of longest piece of latency, biggest piece of bandwidth,
et cetera. Yeah, you nailed it. So, you know, if you think about a job that starts off at 1,000 seconds, where 999 seconds can be parallelized.
And then one second is serialized, meaning that a bunch of nodes need to talk to each other
to get the data that they need to continue processing.
Now, all of a sudden, you start scaling out that application.
And you're doing that with GPUs, which themselves are massively parallel processing engines.
You've got a thousand or more individual processing engines on a GPU.
And now you take a thousand of those.
So now you have a million X speed up on what used to be 99.9% of the time.
And when you get a million X speed up, that 999 seconds becomes about a thousandth of a second.
And all of the time is that synchronization process.
What used to be one second out of 1,000 seconds is now one second out of 1.001 seconds.
So a million X parallel is just a huge gain, and now the networking becomes the bottleneck.
And that's where we are today, And we're actually accelerating the network.
We're doing in-network computing so that we're doing those collective operations as the data
is moving through the network.
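That 999-second example is Amdahl's law, and the arithmetic can be reproduced directly. The million-fold speedup on the parallel portion follows the thousand GPUs times roughly a thousand engines each mentioned above; nothing else is assumed.

```python
# Amdahl's law with the numbers from the discussion:
# a 1,000 s job = 999 s parallelizable + 1 s serial (synchronization).

def amdahl_runtime(parallel_s: float, serial_s: float, speedup: float) -> float:
    """Wall-clock time after accelerating only the parallel portion."""
    return parallel_s / speedup + serial_s

original = 999 + 1
accelerated = amdahl_runtime(parallel_s=999, serial_s=1, speedup=1_000_000)
print(f"before: {original} s, after: {accelerated:.3f} s")
print(f"overall speedup: {original / accelerated:.0f}x")
# after ~1.001 s; the overall speedup is capped near 999x by the one
# second of communication, which is why the network becomes the bottleneck.
```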
It's interesting, the architectural difference that you mentioned as well, because I think
a lot of people just aren't used to thinking of networking that way.
They're used to thinking of networking in a very hierarchical way where
there's sort of, there's the spine, there's the leafs, there's the clients,
you know, maybe you have top of rack switches and, you know,
it's sort of a very North South networking approach,
not an East West networking approach.
But what I'm hearing you say is that really it's an any to any network now.
Is that right?
Yeah, absolutely.
So, you know, there's the famous Sun slogan, and Scott McNealy, who said the network is the computer.
What that meant back in the 80s when he articulated that was that computers could connect to each other, and that was their value.
Today, it's the data center is the new unit of computing. That's what Jensen said
when he bought Mellanox. And he said that together, we'll be able to optimize across the entire data
center stack. So that's software, that's accelerated computing, and it's the network. And so what we're
seeing is, if the data center is the new unit of computing, you've got a single computer that's spanning thousands of things.
And the networking becomes critical to that.
So really, when you start to think about how to optimize the network, it's all to all.
Everything is talking to everything else.
Latency, determinism, all of that's super important. So offloading, accelerating, isolating that we're
doing with the DPU, and then making it very, very efficient through the switches and the networking
fabric, all of that becomes part of the computing problem. So what comes next in networking? You
know, we've seen as well that, you know, with PCI Express Gen 5 and CXL and all these concepts like that,
you know, Gen Z, what are we going to see next in terms of building systems that are bigger than
the systems themselves? Yeah, so obviously you alluded to some of the things that we're talking
about, which is the new local interconnects
with the PCI Express Gen 5 and CXL. That's great. That lets us get the bandwidth out of the box
so that we can use that on the network side. But I think you'll also start to see CXL, for example,
does cache coherency. And you have these DPUs where they have memory. The data processing unit has memory.
The GPU has memory and the CPU has memory.
Now, all of a sudden, those memories can start to become really almost interchangeable.
We're no longer having to say, hey, I'm going to go through some really slow IO path to get to the memory.
That's what our history is, is sharing memory between nodes
as efficiently as we can. But PCI Express has always been a bottleneck. And now all of a sudden
with cache coherency, both in the box and also just the ability to put memory where it needs to
be, put the data in the memory where it's appropriate and to have transparent, low latency access to that.
Again, you're going to have memory that's distributed throughout your server
and then throughout the data center.
And we're going to be able to access it at latencies where
we're going to start to worry about the speed of light in fiber optics,
and say, oh, we need to figure out how to lay out our data center differently
to put things closer together.
We actually spend a ton of time thinking about the speed of light in fibers versus through the air versus our SerDes, which is the serializer/deserializer technology. And as we get to the
faster speeds, everything becomes important. Error correction on your links can add latency.
And if you're using that for memory,
you really need to scratch your head and say,
is there something we can do differently to reduce the latency of our networking?
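For a sense of why fiber length itself enters the latency budget, here is a small sketch of propagation delay. The refractive index of roughly 1.47 for standard single-mode fiber is a commonly cited value, and the cable-run lengths are illustrative.

```python
# Propagation delay in optical fiber: light travels at roughly c / n,
# where n ~ 1.47 for standard single-mode fiber (~5 ns per meter).

C = 299_792_458          # speed of light in vacuum, m/s
N_FIBER = 1.47           # approximate refractive index of silica fiber

def fiber_delay_ns(meters: float) -> float:
    """One-way propagation delay, in nanoseconds, over a fiber run."""
    return meters / (C / N_FIBER) * 1e9

for run in (3, 30, 100, 500):   # illustrative data center cable runs
    print(f"{run:>4} m of fiber -> {fiber_delay_ns(run):7.1f} ns one way")
# 100 m is ~490 ns each way, on the same order as a switch hop, so the
# physical layout of the data center starts to show up in the latency
# budget at these speeds.
```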
So we've really kind of zoomed in here
on some pretty nerdy networking concepts.
I wonder if maybe we can leave our AI-focused audience
with sort of a message.
Like, what should they be looking for in connectivity in order to support AI applications?
You know, what's required?
You know, what are really the AI implications of this totally new data center architecture
that's being built, well, frankly, for them?
Yeah, that's a great question.
So one of the things that we've
realized is that every business is going to become AI. And by that, I don't mean that they're going
to be focused on AI as their core value, but whether it's a paper business or a pharmaceutical
company, they will embrace and utilize AI to provide better products and services to their customers.
But in order to do that, we don't want to have every single company become experts in AI.
So we build platforms. We work with our server partners and security partners and storage
partners to validate those platforms. And a platform is more than just the hardware.
So we have AI frameworks for all of the businesses.
So whether it's retail or natural language processing
or pharmaceuticals or robotics,
all of those, we have AI platforms that we deliver.
So the hardware is certified.
We've sized it.
We've done all the architectural matching of memory
and storage and networking and GPUs.
And then the platform that sits on top of that, we've pulled everything together so that a retail business can just build with their expertise on top of that AI platform.
So I think the key thing with AI is don't try to go become the expert, don't hire all your own PhDs that are going to go write core deep learning
algorithms. That's a really, really challenging thing. That's what we do. We invest massive
amounts in doing that. Instead, build on the platforms that are out there, both hardware
and software AI frameworks, and take your business expertise and build on top of what's there already.
Well, I really appreciate that
message. And I think that that's probably in line with what we've heard from a lot of the guests
here on Utilizing AI, as well as at AI Field Day. So before we go, Kevin, I think the time has come
for us to spring a couple of questions on you. And a special note for the readers and listeners,
we haven't given him a heads up on
these questions. So he's going to give us sort of an off-the-cuff answer to areas that may be a
little bit beyond his expertise, but hopefully it'll be a lot of fun for everybody involved.
So here we go. One of the things you mentioned was latency in communication and interpersonal
communication, especially verbal. And that's
actually one of our questions. When do you think that we're going to see a conversational verbal AI
that can fool an average person and pass the Turing test, basically make an average person
think they're talking to another person? That's great. So that's the Blade Runner question here. So that Turing test is
really cool. And I think we're there now. I think that today we could have an AI natural language
conversation on a topic where, unless you're really an expert and able to drill in, like that
great Blade Runner movie scene where he starts asking questions
of the robot and then figures out that indeed it isn't a human being, you couldn't tell. But I think we're at
the point that for most natural language processing, I think we're there. Great. Yeah,
I'm not talking about a Voight-Kampff test here. I'm talking about a Turing test, so it's okay.
I catch your reference. Yes. So number two, I know that NVIDIA is very deeply involved in video
processing. When will we see a video-focused machine learning assistant that operates like
the audio assistants that we have now? In other words, something built into a camera that's
reacting to things just like the audio systems do? Yeah, that's a great question too.
So we spend a lot of time on video here.
We have something called Metropolis,
which is our smart city,
where today most of it is unidirectional.
The camera feeds are going into an AI engine
that's actually doing inferencing,
figuring out what we're seeing,
and then maybe responding in some way.
And I think ultimately we'll see that response closed loop
so that the camera itself starts to do things.
And I think it's right around the corner.
I think you'll see that in some of the ensembles
that we're building, for example, with our Omniverse,
which is a virtual world where you've got cameras and robots and real people
and AR and VR all interacting in a closed loop environment. It's the coolest thing. Come to GTC
and watch. We'll be showing some of that. Well, that sounds great. And absolutely. And so finally,
the last question is, are there any specific jobs that people hold today that will be completely
eliminated? No one will have that job anymore because of AI in the next five years.
You know, that's a tough question. I don't think that there'll be jobs that are completely
eliminated, but they will be marginalized. And, you know, I think you have to embrace AI because, like all
new technologies, it can be disruptive. But I think the cool thing is that it's
going to create more new opportunities than the things that it eliminates.
And I think the real key is, don't be afraid of this, but look out, figure out what's
happening. And if
you're doing something that can be done by a machine and software, eventually it probably will be.
Okay. So look at things that you're doing and figure out where is that new area that really
requires human insight, requires human emotion. All of those things are going to continue to exist. Human
beings are going to inspire AI, and AI is going to support human beings. So figure out, if you're
doing something that you think, you know, I see something that's coming, potentially that can be
done by somebody else, inspire yourself. Figure out how AI can work for you because it will.
Well, thank you so much for this discussion. Where can people connect with you to learn more about your thoughts on artificial intelligence and other data center topics?
Yeah, so this is GTC week. It's online, it's virtual, it's free. There's going to be some awesome sessions.
You know, my boss Jensen is going to give his keynote and introduce a whole bunch of new great technologies, some of which we've been talking about today.
So come to GTC and then look me up on LinkedIn.
So Kevin Deierling from NVIDIA.
How about you, Chris? Anything exciting you've got in the works?
Everything can be found on my website, chrisgrundemann.com, or you can follow me on Twitter at @ChrisGrundemann.
Excellent. And as for me, the thing that I'm mostly focused on right now is preparing for our upcoming AI Field Day.
So that's coming May 26th through 28th online, streaming to you at techfieldday.com. So please do check that out. We're very excited about having a great group
of delegates and presenting companies for that event. And of course, tune in every Tuesday for
Utilizing AI. Thank you very much for listening to this episode of the Utilizing AI podcast.
If you enjoyed this discussion, please do subscribe, rate, and review the show on iTunes
or your favorite podcast platform,
since that really does help us. And please share this show with your friends.
This podcast was brought to you by gestaltit.com, your home for IT coverage across the enterprise.
For show notes and more episodes, please go to utilizing-ai.com or find us on Twitter at utilizing underscore AI. Thanks, and we'll see you next week.