Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 3x03: Platform Considerations For Deploying AI At Scale with Tony Paikeday of NVIDIA
Episode Date: September 21, 2021

Enterprises are working to simplify the process of deploying and managing systems to support AI applications. That's what NVIDIA's DGX architecture is designed to do, and what we'll talk about on this... episode. Frederic Van Haren and Stephen Foskett are joined by Tony Paikeday, Senior Director, AI Systems at NVIDIA, to discuss the tools needed to operationalize AI at scale. Although many NVIDIA DGX systems have been purchased by data scientists or directly by lines of business, it is also a solution that CIOs have embraced. The system includes NVIDIA GPUs, of course, but also CPU, storage, and connectivity, and all of this is held together with software that makes it easy to use as a unified solution. AI is a unique enterprise workload in that it requires high storage IOPS and low storage and network latency. Another issue is balancing these needs to scale performance in a linear manner as more GPUs are used, and this is why NVIDIA relies on NVLink and NVSwitch as well as DPU and InfiniBand to connect the largest systems.

Three Questions

How big can ML models get? Will today's hundred-billion parameter model look small tomorrow or have we reached the limit?
Will we ever see a Hollywood-style “artificial mind” like Mr. Data or other characters?
Can you give an example where an AI algorithm went terribly wrong and gave a result that clearly wasn’t correct? *Question asked by Mike O'Malley of SenecaGlobal.

Guests and Hosts

Tony Paikeday, Senior Director, AI Systems at NVIDIA. Connect with Tony on LinkedIn or on Twitter at @TonyPaikeday.
Frederic Van Haren, Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on Highfens.com or on Twitter at @FredericVHaren.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen’s writing at GestaltIT.com and on Twitter at @SFoskett.

Tags: @TonyPaikeday, @nvidia, @SFoskett, @FredericVHaren
 Transcript
    
                                         I'm Stephen Foskett.
                                         
And I'm Frederic Van Haren.
                                         
                                         And this is the Utilizing AI podcast.
                                         
                                         Welcome to another episode of Utilizing AI,
                                         
                                         the podcast about enterprise applications for machine learning,
                                         
                                         deep learning, and other artificial intelligence topics.
                                         
This week, Frederic and I are talking about the many ways in which different platforms are challenged by AI applications, and the fact that AI requires a completely different set of infrastructure and resources than conventional applications.

Yeah, indeed. I mean, AI really is based on a bunch of software
                                         
    
frameworks that are heavily mathematically based. And the need for mathematical multiplications and executions per second has grown so fast that the traditional concept of a CPU hasn't worked. And so there's a need for a much faster capability to process all those multiplications.
                                         
                                         And the GPUs obviously are a perfect solution to solve those problems.
                                         
                                         We've talked in previous episodes about the need for much more networking bandwidth and storage capacity, storage resources in terms of performance, memory, and of course, as you mentioned, GPUs.
                                         
                                         And of course, any talk of GPUs leads to the monster of the GPU market, which is NVIDIA.
                                         
So we're very pleased to be joined today by somebody from NVIDIA. Tony Paikeday is somebody
                                         
                                         that we've talked with previously about all the interesting aspects of, well,
                                         
    
                                         the changes that are coming in platforms to support enterprise AI applications. Tony,
                                         
                                         why don't you introduce yourself a little bit to the audience?
                                         
Hi, thanks for having me. I'm Tony Paikeday from NVIDIA. So I'm responsible for AI systems,
                                         
                                         product marketing at NVIDIA. We have a portfolio of enterprise solutions called the DGX system.
                                         
So a lot of my team's charter is around helping enterprises around the world, you know, kind of democratize access to AI and AI infrastructure and build incredible applications to help power their business.

It seems to me that the key to
                                         
                                         understanding the DGX systems is not that it's some kind of, I don't know, special configuration.
                                         
                                         It's that it's all about balance. It's about balancing the system resources to support the
                                         
                                         GPU in AI and other GPU heavy workloads. And that to me is the key here, because I think a lot of people
                                         
    
                                         think, well, if I just throw everything at the problem, then everything will work great. And
                                         
                                         that might be true, but it strikes me that DGX really isn't about throwing everything at the
                                         
                                         problem. It's making sure that the system is ready to keep the GPU busy. Is that right?
                                         
                                         Yeah, that's certainly, excuse me, a very important part of this.
                                         
                                         But I would actually say that the way we've looked at the problem in enterprise is around
                                         
                                         how do you simplify how enterprises can deploy and manage infrastructure specifically for the
                                         
purpose of running AI workloads. And so when you look at it
                                         
                                         from that perspective, there's a lot of layers in the equation. Certainly the GPUs that are there,
                                         
    
                                         all the things that surround it from an IO, bandwidth, memory, storage, network fabric,
                                         
                                         all those things matter certainly. The architecture that we're talking about, this design balance is very important.
                                         
                                         And I think there's a lot of organizations that oftentimes try to piece this stuff together. And
                                         
                                         sometimes you have the expertise, sometimes you don't necessarily in terms of striking the right
                                         
                                         design balance to ensure that GPUs are kept fed with data during a training run.
                                         
But even beyond that, what we found at NVIDIA is that just as important as, or even more important than, the hardware is obviously the software.
                                         
                                         We spent a lot of time within the DGX business unit
                                         
    
                                         and NVIDIA at large optimizing a complete software stack.
                                         
And what we've realized is that everyone from data science developers, practitioners, to
                                         
                                         people who manage IT infrastructure, they essentially need the right tools and platform
                                         
                                         such that they can actually operationalize AI at scale. And what I mean by that is being able to
                                         
                                         see more of their valuable intellectual property in terms of viable models and prototypes actually deployed in production.
                                         
                                         This is a classical problem that's been solved in conventional enterprise apps,
                                         
    
but a lot of businesses are right now struggling with how to manage and scale a workflow
                                         
                                         that can allow data science developers to do incredible things on one end
                                         
                                         and have it realized in production applications at the other. So we spend a lot of time thinking about the tooling and the software
                                         
                                         that needs to enable that. And then in combination with that, making expertise available to our
                                         
                                         customers such that when they have a question about a framework, about a model type, about a
                                         
                                         use case, or about things like drivers, libraries,
                                         
                                         and communication primitives,
                                         
                                         that they have someone that they can talk to.
                                         
    
                                         So really for us, the approach has been full stack
                                         
                                         in that respect to help organizations scale AI.
                                         
                                         Yeah, I totally agree with that.
                                         
                                         I mean, not so long ago, you needed like an MBA
                                         
                                         and a bunch of PhDs to set up an AI environment.
                                         
                                         I mean, let alone the complexity of the hardware.
                                         
So I do really agree with what you said, that there is a need for, first of all, a complete software stack, right?
                                         
                                         So many more people do AI today or have to do AI to stay competitive.
                                         
    
And they need not only the tools but also the support in order to
                                         
                                         make this happen. A lot of people talk about the democratization of AI where you provide the
                                         
                                         hardware and I think there's one key component you brought up which is the software. The hardware by
                                         
                                         itself is not the end solution, it's the complete stack and the ability to help end users.
                                         
                                         And I also want to point out that NVIDIA is not necessarily known
                                         
                                         for contributing a lot in the open source community,
                                         
                                         but the reality is they do, right?
                                         
                                         And it's just to enable not only the hardware,
                                         
    
                                         but the ability for people to do AI that don't necessarily know all the different components.
                                         
Yeah, I'm so glad you brought that up, Frederic, because if you think about applications like NLP or recommender systems or any number of foundational AI use cases, many organizations, maybe most, lack the data science expertise or the
                                         
                                         access to data sets or experience building and training models to be able to do that
                                         
                                         stuff from scratch.
                                         
                                         So we really do folks a disservice if we don't give them ready-made tools that allow them
                                         
                                         to pick, for instance, pre-trained models off the shelf
                                         
                                         and then do fine-tuning around the edges to optimize them for their unique vocabulary or
                                         
                                         their unique set of problems, right? So I think increasingly the state of the art is one that
                                         
    
                                         reflects what you just described, namely offering more and more ready to kind of plug and play type applications delivered in, again, pre-trained
                                         
                                         models, scripts, and other content that lets developers exert a lot less effort doing the
                                         
                                         foundational work and simply being able to now plug and play into their enterprise.
                                         
                                         Right. I would even say that customers today are basically saying, I want a solution.
                                         
                                         They don't want to detail all the different little items.
                                         
                                         It's to the point where people understand that the expertise is with NVIDIA from a software and a hardware stack. And they're just trying to, you know, let's buy a solution that works for us.
                                         
Yeah. You know, DGX has been around for five years. And if I look at our own trajectory, which I think reflects what we've seen in the broader market, so much of the early work that we saw with AI pioneers was born on the backs of what you'd consider hyperscale-type organizations, who had deep pockets and an incredible bench of expertise to build incredible infrastructure to solve really complex problems,
                                         
right? And you'd expect them to, because they have the capital and operating budget to do that.
                                         
    
And a lot of what we'd seen over time you might even classify, if you're looking at enterprises,
                                         
                                         as potentially shadow AI, where you'd have data science teams or business units building what
                                         
                                         they needed to build outside of any kind
                                         
                                         of IT governance and certainly outside of any kind of IT shared infrastructure.
                                         
                                         And now we're seeing more and more CIOs and IT leaders wanting to define the infrastructure
                                         
strategy because they see, in some respects, a lot of costs running out of control as developers run up OPEX trying to do a DIY platform instead of having an IT-sanctioned environment that centralizes all of that. So this has now put IT in the lens and
                                         
                                         forced us all to think about how do we make it simpler for IT teams to manage.
                                         
    
Yeah, so we're talking about the DGX. I mean, with the DGX, I think GPUs. I mean, one other thing which is important is, as Stephen mentioned, there is not just the GPUs, which you could consider the compute, but there's also the network and the storage, right?
                                         
                                         So from a network perspective, most people know that you have acquired Mellanox.
                                         
                                         So you kind of have the two pieces from a hardware perspective. And then with storage, I assume you're partnering with industry leaders from a storage perspective.
                                         
                                         And then also the announcements about ARM was also very interesting, where you kind of tried to optimize your infrastructure.
                                         
                                         So is that a trend you see from enterprises where you kind of feel the need to fill the gap, so to speak, like with Mellanox and other components?
                                         
Do you see yourself kind of selling not just the DGX, but a solution where you provide NVIDIA-stamped network and storage?

Yeah, this is really driven by the problem statement: how do you achieve the fastest time to solution on the most complex AI problems an enterprise might face?
                                         
                                         And the challenge has been when the essential resources
                                         
    
are disaggregated from each other and treated almost piecemeal,
                                         
                                         you have a real challenge in terms of how do you parallelize
                                         
                                         the problem,
                                         
                                         the computational problem, in a way that you can effectively scale and shrink that time to solution,
                                         
                                         right? And when things are not necessarily cohesive, or there is a lot of time and distance
                                         
                                         between where the data lives and where the compute lives, and there's a lot of latency in the fabric connecting those
                                         
                                         things, you stretch time to solution out and your ability to distribute the problem, let's say on a
                                         
                                         training run across more and more nodes because you need to scale compute capacity, it escalates
                                         
    
or exponentially increases out of control, whereby a language model that, you know,
                                         
                                         you're trying to build might take weeks to months, but with the optimized architecture,
                                         
                                         and if you are able to, again, shrink time and distance between all these resources and make
                                         
                                         them appear as one computational unit, one really big processor with really big RAM and commensurate storage, then you
                                         
                                         now are able to take that problem that was solvable in weeks or worse and now deliver
                                         
                                         an answer on a training run in potentially minutes or hours.
                                         
                                         And that's kind of what's forcing all of this is that we see where the state of the
                                         
                                         art is going with the most important use cases
                                         
    
                                         that enterprises are trying to tackle. And we're seeing the need for a purposeful approach in terms
                                         
                                         of the right kind of network fabric with minimal latency and highest bandwidth, with having the
                                         
                                         right kind of storage subsystem that's compute proximate and compute that's data proximate, all those things coming
                                         
                                         together, forcing you into really this optimal architecture, kind of like we started out with
                                         
this design balance problem, right?

Right. Yeah. And then I would like to talk a little bit about
                                         
                                         scalability, right? The funny thing about an AI project is a successful AI project will actually
                                         
                                         generate even more data, right? So your problem actually becomes worse and
                                         
                                         worse over time if you don't have a scalability model. Is that something that you see happening
                                         
    
                                         in the future as well where people keep on adding more and more data or do you see technologies like
                                         
transfer learning kind of helping you take some shortcuts left and right in order to kind of not start from scratch and always have to add data, but to start from a baseline?
                                         
                                         Yeah, I think a number of these techniques will help mitigate certainly the amount of data coming into the enterprise that's either fueling initial
                                         
                                         model development and prototyping or simply operational data that comes in every day,
                                         
                                         every minute, every second being processed by models at the edge of the enterprise.
                                         
                                         I think what all of this is forcing is a rethinking of where we situate this infrastructure
                                         
                                         relative to the data.
                                         
                                         We think a lot about data gravity.
                                         
    
                                         That's been top of mind for us and a lot of folks in our ecosystem
                                         
                                         because we see what's happening and we see that a lot of organizations
                                         
                                         are spending a lot of time and effort fighting data gravity.
                                         
And it's like fighting planetary gravity. You know,
                                         
                                         you spend a lot of time and money escaping it or working against it when really what you want to
                                         
                                         do is bring your resources to where the data is, bring your applications to where the data is. And
                                         
this is why we see a lot of organizations, for instance, repatriating workloads in close proximity to where that
                                         
                                         data is being created.
                                         
    
I heard the bumper-sticker version of this years ago: "train where your data lands."
                                         
                                         And I've kind of assimilated it and made it my own because I definitely feel that that's
                                         
                                         true.
                                         
And it's true for a lot of our customers as well, in terms of their mentality around what they need to do as far as their infrastructure and resource strategy.
                                         
                                         Yeah, I think data gravity is,
                                         
                                         I mean, if you ask a lot of enterprises,
                                         
                                         they will bring that up
                                         
    
                                         as one of the issues they're having.
                                         
                                         But I think also data gravity
                                         
                                         has to do a lot with architecture, right?
                                         
                                         So how it's designed, you know,
                                         
with AI, typically you can start small.

If you don't have an architecture that scales, nothing is going to help you with that.
                                         
Right. And that's why I was mentioning earlier that a solution where the architecture is kind of built in will help enterprises be more efficient without having to think about it. Right.
                                         
    
You don't want to make the same mistakes over and over and over.
                                         
Yes. Yeah, absolutely. Yeah. That's what I was kind of trying to get at earlier when I
                                         
                                         was talking about the balance of the system, because it's not about delivering maximum GPU
                                         
                                         throughput, because of course you can't do that unless you can deliver a system that can keep
                                         
those things fed. And I think that maybe that's one area where people sell NVIDIA a little bit short, because they look at the company as basically the GPU company,
                                         
                                         and they don't think of it as the systems company. And I guess that must be tough for you,
                                         
                                         because it's like, you're the systems guy, right? I mean, that's, that's your thing.
                                         
    
                                         Yeah, definitely. So, you know, traditionally, or historically, people have not thought of us as an enterprise
                                         
                                         systems company or a provider of IT infrastructure. And for the very reasons you described,
                                         
                                         and rightfully so, if you look at kind of our heritage, where we came from,
                                         
                                         that doesn't surprise me at all. I think gradually, we are starting to change that mindset, but, you know, enterprises
                                         
                                         also have expectations in terms of, you know, we love hero stories around big science and a lot of
                                         
                                         the incredible work done at the leading edge of research. But, you know, I think what we've worked
                                         
                                         on intently over the last few years is with our customers, with industries
                                         
                                         helping to showcase, I think the less sexy, but maybe more boring, but fundamental pragmatic
                                         
    
                                         use cases that are powering businesses today, especially in like challenging turbulent times,
                                         
                                         things that are helping them to improve customer intimacy
                                         
                                         and enhancing every customer interaction, streamlining operations and reducing costs
                                         
                                         and delivering competitive agility.
                                         
                                         These are things that almost every business we talk to cares about.
                                         
                                         And coincidentally, they're now talking with us about how to deploy the right kind of AI infrastructure to do these exact kinds of things, which AI is obviously really great at.
                                         
                                         Yeah, without making it too much about you personally, I think the fact that NVIDIA has people like you who come from more of an enterprise, you know, data center background instead of people who are just more, you know, GPU focused. I think
                                         
                                         that that actually really helps the company. And frankly, that's one of the things, in my opinion,
                                         
    
                                         that they got with Mellanox as well, is that you've got a group of people who are used to
                                         
                                         selling not just into HPC, which of course they are, and cloud, but into enterprise environments
                                         
                                         as well. Did you find that it was a challenge though? I mean, is this a continuing challenge in the core DNA of the AI compute architecture,
                                         
                                         right, of these AI compute systems. And I think there's kind of two key modes of operation or
                                         
                                         two kinds of IT essentially that we see flourishing now in enterprises that are really leaning into AI.
                                         
                                         And on one end, you have infrastructure that customers need that is
                                         
                                         purpose-built to only do one thing and one thing only, and that is to take the most complex model
                                         
                                         and shrink it down into, you know, the shortest amount of time possible to deliver an answer,
                                         
    
                                         right? Shortest time per training run and the most complex AI problem that they've got.
                                         
                                         And they come to us for that, right?
                                         
                                         And they know, for instance, that a DGX system,
                                         
                                         it's got one purpose in life, which is to execute that training run
                                         
                                         as fast as possible and iterate as fast as possible.
                                         
                                         But on the flip side of it,
                                         
                                         on the other side of IT,
                                         
    
                                         when it comes to deploying a tuned model, one that's ready for inference, one that's ready to be used in operation production, enterprises have a lot of viable solutions that embed our technology, but in what we would consider approved servers or certified servers, we call them NV certified, but essentially servers that
                                         
                                         incorporate the best of our technology, but offered by our ecosystem partners. And so we see this
                                         
                                         duality coming together and most enterprises kind of need both. They need the development
                                         
                                         infrastructure, they need the deployment infrastructure. We're kind of in all of it.
                                         
                                         And, you know, this also brings our valued server partners into the equation as well, because
                                         
                                         they enable this, you know, our customers have made a huge investment in a lot of household names
                                         
                                         that are pervasive across their data center. And they want to be able to leverage that same
                                         
                                         investment to be able to run these applications and deployment at scale. And so we want to enable
                                         
    
                                         that. And that's important, of course, because as you
                                         
                                         mentioned, there are companies that NVIDIA works closely with and partners with, and certainly
                                         
                                         you're not going out there and fighting with any of these big ISVs. You're partnering with them
                                         
                                         and enabling them to sell solutions as well. Yeah, absolutely. And that's, you know, also part of why we build
                                         
                                         the platforms that we do. Oftentimes, what we see missing in the marketplace is a gap that we can
                                         
                                         step into and build kind of the proof point or the blueprint, offer that blueprint to our ecosystem,
                                         
                                         our partners who know how to do this stuff at scale can take that
                                         
                                         and then build solutions using it. And that's kind of the mission of DGX. It's the mission of a lot
                                         
    
                                         of the things that we build that essentially provide that blueprint to ISVs around the world
                                         
                                         who do this really well at scale and let them be successful with it. So yeah, you're absolutely right. Yeah, you talked a little bit about corporate IT. I think one of the trends we have
                                         
                                         seen in the past is that typically GPUs were just for the research crowd, right? The people that
                                         
                                         were really technical, had an understanding of what was going on. I think nowadays, because AI is such a dominant factor, that it's more moving towards the corporate IT world where it's becoming more and more standard.
                                         
                                         However, there is still a technology gap between the hardcore research people, if you want, and corporate IT.
                                         
                                         How do you see enterprises have conversations with NVIDIA?
                                         
                                         Do you feel that the conversations are more technical or more strategic?
                                         
                                         It's really both.
                                         
    
                                         It's evolved over time.
                                         
                                         It started off, and it still is the case that, you know, developers are our best friend.
                                         
                                         They grow up, you know, from the earliest stages and starting in school using our toolkits
                                         
                                         and our software.
                                         
                                         Many of them cut their teeth on CUDA
                                         
                                         and learn how to work with our GPUs from there. And then they eventually land in enterprise and
                                         
                                         building incredible things. So over the years, obviously, we've courted the developer community
                                         
                                         because we want them to have an incredible experience using our software to do what they
                                         
    
                                         need to do, their life's most important work.
                                         
                                         On the flip side of it, obviously, increasingly, we're getting on the CIO's radar.
                                         
                                         And we have regular roundtables with CIOs around the globe meeting with our executives.
                                         
                                         And we have this ongoing dialogue with pretty much every enterprise focused AI business that's out there.
                                         
                                         And a lot of times those conversations turn to, you know, how do I take this stuff that used to
                                         
                                         be science projects and now operationalize it at scale? And those conversations very much are
                                         
                                         around strategy. They're very much around things like MLOps and how to manage this kind of
                                         
                                         infrastructure and why do I need purpose-built infrastructure and what is the ecosystem and the
                                         
    
                                         whole offer look like beyond a really fast computational box? What is the complete end
                                         
                                         architecture look like? So it's really kind of hitting both ends. And this is why you see, for instance, our partnership with companies
                                         
                                         like VMware, right? And those in the storage community, we work with them because we know
                                         
                                         that a lot of our partners are the trusted names with leadership. And by us working together,
                                         
                                         we're simplifying a problem for our customers and enabling them to onboard this infrastructure much quicker.
                                         
                                         They all need to do this.
                                         
                                         And many of them are doing this because they need to scale.
                                         
                                         They need to solve the shadow AI problem, you know, development silos spawning across their enterprise.
                                         
    
                                         They want to consolidate people, process, and platform.
                                         
                                         And so centralized, shared IT infrastructure enables them to do that. And
                                         
                                         so we see ourselves sitting at the table with them trying to help them with that.
                                         
                                         Right. I mean, it seems like you're helping customers with their AI roadmap, not necessarily
                                         
                                         the NVIDIA component of it. It's just like a roadmap and how NVIDIA can help and point out which partners can help them be successful.
                                         
                                         Yeah, very much so. I think the most valuable guidance we can give in the conversations we want to have with these customers,
                                         
                                         you know, on the development side of it is to share with them the best of what we know, whether that's commercialized or not. Oftentimes it's not simple. It's oftentimes just simply a matter of connecting, you know,
                                         
                                         a developer researcher with one of our own, you know, PhDs or scientists or researchers and
                                         
    
                                         letting those conversations happen because we've made it a point to onboard some of the best talent
                                         
                                         we can find out there from science and academia
                                         
                                         and enterprise and make those resources available to our customers so that they can benefit
                                         
                                         from the same things we're figuring out at the same time.
                                         
                                         I mean, we run a fairly large R&D shop and we are spending a lot of effort trying to
                                         
                                         move the ball forward in some really critical places for enterprises.
                                         
                                         We want them to be a part of that exploration and share in what we figure out along the way.
                                         
                                         I wonder if we can turn a little bit here, a little bit toward more of the specifics of
                                         
    
                                         what AI requires from a system. So you've been intimately involved with developing these
                                         
                                         specialized solutions for AI applications. And of course, this is utilizing AI.
                                         
                                         So maybe, can you talk a little bit about the various aspects of the system and maybe anything
                                         
                                         you've learned over the years that is essentially, you know, really a critical component to building
                                         
                                         a system to support AI applications? Yeah, it's a great question, Stephen.
                                         
                                         The reality is that AI is unique in how it consumes resources, unlike a typical enterprise workload. And achieving, as I was saying, the fastest time to solution on a model requires having enough
                                         
                                         computational power in combination with ultra high performance storage, really high IOPS,
                                         
                                         really low latency to feed those data sets during a training run with everything connected over a
                                         
    
                                         very high speed, low latency network fabric, which increasingly obviously is InfiniBand when you're
                                         
                                         talking about multi-system, large cluster implementations. This is typically what we've found is needed to tackle
                                         
                                         the largest AI problems or models that need to be ultimately parallelized over multiple systems,
                                         
                                         if you want to have a reasonable timeframe within which to train those models.
                                         
                                         We talked about design balance. One of the challenges and the things that we solve for is ensuring linearity of performance with system scale.
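The "linearity of performance with system scale" idea can be illustrated with a small sketch. This is a toy model, not NVIDIA's methodology: it assumes training time is a perfectly divisible compute term plus a communication term that grows with the number of systems. The function names and the per-system overhead parameter are hypothetical and chosen only for illustration.

```python
def ideal_time(single_system_hours: float, n_systems: int) -> float:
    """Training time if performance scaled perfectly linearly."""
    return single_system_hours / n_systems


def modeled_time(single_system_hours: float, n_systems: int,
                 comm_overhead_hours: float) -> float:
    """Toy model: compute shrinks with n, communication grows with n - 1.

    comm_overhead_hours is a hypothetical cost added per extra system.
    """
    compute = single_system_hours / n_systems
    communication = comm_overhead_hours * (n_systems - 1)
    return compute + communication


def scaling_efficiency(single_system_hours: float, n_systems: int,
                       comm_overhead_hours: float) -> float:
    """Ratio of ideal to modeled time; 1.0 means perfectly linear scaling."""
    return (ideal_time(single_system_hours, n_systems)
            / modeled_time(single_system_hours, n_systems, comm_overhead_hours))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 140):
        eff = scaling_efficiency(100.0, n, comm_overhead_hours=0.5)
        print(f"{n:>3} systems: efficiency {eff:.3f}")
```

With zero overhead the efficiency stays at 1.0 for any system count; with any positive overhead it falls as systems are added, which is the diminishing-returns effect that faster interconnects are meant to reduce.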
                                         
                                         And this is at multiple levels. connected GPUs and I should be able to distribute my problem and get faster and faster performance
                                         
                                         as I string together more and more of these systems. But we know that that breaks down
                                         
                                         because as your problem gets larger and you try to parallelize it over more and more systems,
                                         
    
                                         you incur a lot more communications overhead to distribute the problem across all those systems. And so you get diminishing returns
                                         
                                         as you scale, if you approach it in the classical way of just, you know, a lot of PCIe connected
                                         
                                         GPUs with a pretty standard Ethernet fabric and multiple systems. And that's why a lot of what
                                         
                                         is in the core DNA of our systems, things like NVLink, which offers a high bandwidth inter-GPU bus, if you will,
                                         
                                         to make eight GPUs seem as one, right? And NVSwitch is the other part of that technology
                                         
                                         that creates that inter-GPU fabric. Combined with the InfiniBand fabric connecting multiple systems, when you go from one to two to four to
                                         
                                         140, which is, you know, the ultimate in scale that I think we've, you know, been able to
                                         
                                         demonstrate with what we call a DGX SuperPOD. This, you know, having this approach at the core
                                         
    
                                         system level, and then at the scale out level and being prescriptive around what the
                                         
                                         network topology looks like across all those systems has enabled us to demonstrate that
                                         
                                         linearity of performance such that there is no or minimal drop-off as you get to your 140th system. And it used to be, I would be floored by the idea of anybody stringing together 140
                                         
                                         systems over a single network fabric. You know, a couple of years later, I'm actually not that
                                         
                                         surprised because there are so many organizations who are doing exactly that, especially in areas
                                         
                                         like NLP and large recommenders and autonomous vehicle system
                                         
                                         development. This is the kind of scale that they're operating at, such that they can iterate fast
                                         
                                         and get an answer to a training run in, you know, hours and days instead of weeks
                                         
    
                                         and months, kind of thing. Yeah, I agree. A couple of years ago, when two GPUs in the server was considered
                                         
                                         like the top and I said we wanted more, they called us crazy. And then not that long after,
                                         
                                         you know, there were more GPUs. So it seems like NVIDIA, I mean, first of all, AI is all about
                                         
                                         bottlenecks, right? You always have bottlenecks. You try to solve the bottlenecks with NVLink and so on. So you kind of keep on solving the bottlenecks you see on the road. Is there
                                         
                                         any particular bottleneck you see today that NVIDIA really focuses on and believes that it will
                                         
                                         accelerate and remove a lot of roadblocks? Yeah, if you look at, for instance,
                                         
                                         GPUDirect Storage and Magnum IO,
                                         
                                         we're solving for the problem of the inherent latency
                                         
    
                                         incurred when the data path has to move through
                                         
                                         this host CPU before it gets to the GPU.
                                         
                                         And essentially short-circuiting that
                                         
                                         and offering a streamlined path from the data store
                                         
                                         in, like, external storage through a NIC in the server, that's obviously optimized with what we call a DPU, a data processing unit, direct to the GPU.
                                         
                                         It speaks to the very thing you raised, Frederic, namely this idea of eliminating every bottleneck, as we see it, that's inhibiting larger and larger levels of scale.
                                         
                                         So GPUDirect Storage is probably the latest example I have, you know, in combination with the DPU,
                                         
                                         that's allowing us to now ensure that there is minimal latency, minimal speed bumps between where the data lives and the computational power that needs to act upon
                                         
    
                                         it, right? So that's very much the way you described it is perfect. Namely, it is really about
                                         
                                         eliminating those bottlenecks, especially when you talk about distributing a problem over multiple
                                         
                                         systems. Yeah. So you're talking about reducing the impact of the host? How about eliminating the host completely?
                                         
                                         I mean, isn't that what the ARM idea was,
                                         
                                         is to kind of provide a bootable GPU
                                         
                                         that didn't require a host?
                                         
                                         I actually don't know that we would,
                                         
                                         I wouldn't look at it necessarily as eliminating the host.
                                         
    
                                         I see it as an adjunct
                                         
                                         and there is a natural bifurcation
                                         
                                         of what functions exist on one
                                         
                                         kind of processor versus the others, because essentially, you know, we're always going to have
                                         
                                         mixed workloads, except in the realm of, you know, training, where I think if you're trying to
                                         
                                         implement infrastructure to train very challenging models, very complex problems,
                                         
                                         you're going to be very purposeful
                                         
                                         and very singular in what kinds of workloads run on that infrastructure. But increasingly,
                                         
    
                                         there is kind of this deployment infrastructure that needs to handle a much wider palette
                                         
                                         of mainstream acceleratable applications. And in those environments, you need to have a way to still support applications that depend on
                                         
                                         traditional CPU, but also can offload as much of what doesn't need to be done on a CPU onto devices
                                         
                                         like a DPU, as an example. So we see in kind of those heterogeneous environments, you're still
                                         
                                         going to have kind of this multiplicity or duality of processor types. So it sounds like really NVIDIA
                                         
                                         is not only the GPU company that everybody thinks, but also a major player in enterprise AI
                                         
                                         applications. And I think that that may not come as a surprise to a lot of our listeners, but
                                         
                                         maybe some of them it might, because many of the people who are just starting to look at deploying
                                         
    
                                         AI applications are starting to ask themselves, what kind of system am I going to need in order
                                         
                                         to support this? And quite frankly, the answer is that, you know, NVIDIA has already answered
                                         
                                         that question for you with the DGX systems and in partnership with many of the familiar names
                                         
                                         that you're probably already working with today. So you can find yourself a balanced system that not only supports small AI and ML applications,
                                         
                                         but can scale up to really massive proportions here.
                                         
                                         And they got that covered for you, especially now that they're rolling out new products
                                         
                                         and technologies.
                                         
                                         So the time has come, Tony, to move
                                         
    
                                         on to the fun part of the podcast, where we talk about some things that are a little unexpected.
                                         
                                         And that leads us to our famous three questions. This tradition started back in season two,
                                         
                                         and we're now carrying it through to season three. But, you know, we're adding a little twist here.
                                         
                                         So our guest has not been prepared for these questions ahead of time and we're going to get their answers off the cuff right as we speak.
                                         
                                         The difference this season is that I'm going to ask a question and Fred's going to ask a question, but the third question actually comes from a previous guest on the podcast.
                                         
                                         And of course, if Tony has one, he can pay it forward here and ask a question of a future guest on our podcast.
                                         
                                         So let's kick things off. Fred, do you want to take the first question?
                                         
                                         Sure. So how big can ML models get? I mean, today there's hundreds of billions of parameters for a model, which might look small tomorrow.
                                         
    
                                         You know, is there a limit, you know,
                                         
                                         can it keep on growing?
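To put "hundreds of billions of parameters" in perspective, here is a rough back-of-the-envelope sketch. It counts only the memory needed to hold the model weights; training also needs room for gradients, optimizer state, and activations, so real requirements are higher. The 175 billion figure is GPT-3's published parameter count; the 80 GB per-GPU memory and the helper names are assumptions made for illustration.

```python
import math

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus_for_weights(num_params: int, gpu_memory_gb: float,
                         precision: str = "fp16") -> int:
    """Smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, precision) / gpu_memory_gb)


if __name__ == "__main__":
    for name, params in (("8B model", 8_000_000_000),
                         ("175B model", 175_000_000_000)):
        gb = weight_memory_gb(params)
        gpus = min_gpus_for_weights(params, gpu_memory_gb=80.0)
        print(f"{name}: {gb:.0f} GB of fp16 weights, at least {gpus} GPU(s)")
```

Even before any training overhead, a model at this scale no longer fits on a single accelerator, which is one reason the multi-GPU and multi-system scaling discussed earlier matters.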
                                         
                                         Yeah, it's a, it's a great question.
                                         
                                         I am always careful not to try and define an upper bound because when I thought 8 billion parameters on a language model was a big deal,
                                         
                                         lo and behold, you know, a few years later, here we are with GPT-3, right?

                                         So I, you know, all of this

                                         will evolve and continue to scale in response to the ability of the infrastructure and the tools
                                         
                                         to enable models of that size. I can definitely see that the use cases and applications will only
                                         
    
                                         continue to drive us to larger and larger models.
                                         
                                         So I'd really say that there isn't a conceivable upper bound if we're kind of keeping our imaginations open to the art of the possible or the art of what could be.
                                         
                                         Excellent. Now for something a little bit more fun.
                                         
                                         So in Hollywood, they love to show us artificial
                                         
                                         intelligence that's basically an artificial person, like Mr. Data or somebody like that.
                                         
                                         Do you think we'll ever get to that point where we'll have just sort of a general artificial mind,
                                         
                                         somebody that we interact with walking around that's AI? You know, I think about this in two
                                         
                                         ways. One is, if I really, you know, put on the science fiction
                                         
    
                                         hat, the idea of a sentient being that is aware of itself and that you can interact with
                                         
                                         like in a truly human way, that part of it, you know, kind of freaks me out to be perfectly honest.
                                         
                                         I mean, I don't know that any of us are really prepared for it, but maybe that's an eventuality that happens sometime way off in the distant future.
                                         
                                         If you look at the trajectory of things, you kind of wonder, could we eventually get there?
                                         
                                         And that's a really hard one to wrap one's mind around. But what increasingly is apparent to me is with the advent of these incredibly large,
                                         
                                         as we say, big NLP type models, they are increasingly presenting themselves in a way
                                         
                                         that you almost think behind the covers, there's someone incredibly smart. I recently saw a video
                                         
                                         pitting three different generations
                                         
    
                                         of GPT models against each other doing trivia questions and such. And it floored me how
                                         
                                         quickly they could deliver answers to some of the most, you know, arcane type questions and details.
                                         
                                         And I think that that level of intelligence backed by essentially an algorithm that knows how to connect the dots from oceans and oceans of data in milliseconds.
                                         
                                         I think that's something very real, very real and very possible.
                                         
                                         And we're already kind of seeing that hit the doorstep of enterprise, just to be quite honest.
                                         
                                         Our third question comes from a guest on season three, episode two. Take it away, Mike. This is Mike O'Malley, SVP of marketing and sales

                                         for SenecaGlobal. And my question is, can you give an example where an AI algorithm went terribly
                                         
                                         wrong and gave a result that clearly wasn't correct? I'd love to hear that. Yeah, you know, one example that's probably a lesson for all of us is I've seen where
                                         
    
                                         we've had NLP-based chatbots, you know, engaging the Twitterverse and basically, over
                                         
                                         time, you know, evolving to give answers that were really off color, really inappropriate,
                                         
                                         really bad for general consumption, but the algorithm was simply doing what it was programmed to do in response to
                                         
                                         the input received. And I think it's also a lesson in how, while this technology is incredibly
                                         
                                         powerful, there needs to be careful governance and thought around the data fueling these things and
                                         
                                         looking for things like bias and looking for explainability and understanding how the answer
                                         
                                         to the question is derived, such that we don't have AI that essentially goes completely off the
                                         
                                         rails and says or does a bunch of things that could really embarrass us or worse.
                                         
    
                                         Thank you so much, Mike. And Tony, thank you very much for joining us today. We look forward to
                                         
                                         hearing what your question might be for a future guest. And if you, the listeners, want to be part
                                         
                                         of this, you can. Just send an email to host@utilizing-ai.com and let us know you want to be part of our
                                         
                                         three questions segment.
                                         
                                         Tony, thank you again for joining us today.
                                         
                                         Where can people connect with you and follow your thoughts on enterprise AI and other topics?
                                         
                                         Well, first of all, thank you for having me, Stephen and Frederic.
                                         
                                         I had a great time.
                                         
    
                                         You guys can find me at @TonyPaikeday on Twitter, and I'm on LinkedIn as well.
                                         
                                         Great. We'll include that in the show notes. Fred, how are things going with you?
                                         
                                         Doing well. So it's funny that we're having this conversation with NVIDIA, because I'm working

                                         on a project with a SuperPOD. So I'm learning all the internals and the outsides of the

                                         SuperPOD. So I'm looking forward to doing that.
                                         
                                         Excellent.
                                         
                                         And as for me, I've been really enjoying following some of the announcements coming out
                                         
                                         of all the various exciting AI products.
                                         
    
                                         And we've been covering a lot of that
                                         
                                         on the Gestalt IT Rundown on Wednesdays
                                         
                                         at gestaltit.com.
                                         
                                         So thank you everyone for joining us here
                                         
                                         for the Utilizing AI podcast.
                                         
                                         If you enjoyed this discussion,
                                         
                                         please do shoot us a

                                         rating and review on iTunes because that sure does help. And also please share the show with your
                                         
    
                                         friends. This podcast is brought to you by gestaltit.com, your home for IT coverage from
                                         
                                         across the enterprise. For show notes and more episodes, go to utilizing-ai.com or find us on
                                         
                                         Twitter at @utilizing_AI. Thanks for joining and we'll see you next week.
                                         
