Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 3x03: Platform Considerations For Deploying AI At Scale with Tony Paikeday of NVIDIA

Episode Date: September 21, 2021

Enterprises are working to simplify the process of deploying and managing systems to support AI applications. That's what NVIDIA's DGX architecture is designed to do, and what we'll talk about on this episode. Frederic Van Haren and Stephen Foskett are joined by Tony Paikeday, Senior Director, AI Systems at NVIDIA, to discuss the tools needed to operationalize AI at scale. Although many NVIDIA DGX systems have been purchased by data scientists or directly by lines of business, it is also a solution that CIOs have embraced. The system includes NVIDIA GPUs of course, but also CPU, storage, and connectivity, and all of this is held together with software that makes it easy to use as a unified solution. AI is a unique enterprise workload in that it requires high storage IOPS and low storage and network latency. Another issue is balancing these needs to scale performance in a linear manner as more GPUs are used, and this is why NVIDIA relies on NVLink and NVSwitch as well as DPUs and InfiniBand to connect the largest systems.

Three Questions

1. How big can ML models get? Will today's hundred-billion parameter model look small tomorrow or have we reached the limit?
2. Will we ever see a Hollywood-style “artificial mind” like Mr. Data or other characters?
3. Can you give an example where an AI algorithm went terribly wrong and gave a result that clearly wasn’t correct?*

*Question asked by Mike O'Malley of SenecaGlobal.

Guests and Hosts

- Tony Paikeday, Senior Director, AI Systems at NVIDIA. Connect with Tony on LinkedIn or on Twitter at @TonyPaikeday.
- Frederic Van Haren, Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on Highfens.com or on Twitter at @FredericVHaren.
- Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 9/21/2021
Tags: @TonyPaikeday, @nvidia, @SFoskett, @FredericVHaren

Transcript
Starting point is 00:00:00 I'm Stephen Foskett. And I'm Frederic Van Haren. And this is the Utilizing AI podcast. Welcome to another episode of Utilizing AI, the podcast about enterprise applications for machine learning, deep learning, and other artificial intelligence topics. This week, Frederic and I are talking about the many ways in which different platforms are challenged by AI applications, and the fact that AI requires a completely different set of infrastructure and resources than conventional applications. Yeah, indeed. I mean, AI really is based on a bunch of software
Starting point is 00:00:46 frameworks that are heavily mathematically based. And the need for mathematical multiplications and executions per second has grown so fast that the traditional concept of a CPU hasn't worked. And so there's a need for a much faster capability to process all those multiplications. And the GPUs obviously are a perfect solution to solve those problems. We've talked in previous episodes about the need for much more networking bandwidth and storage capacity, storage resources in terms of performance, memory, and of course, as you mentioned, GPUs. And of course, any talk of GPUs leads to the monster of the GPU market, which is NVIDIA. So we're very pleased to be joined today by somebody from NVIDIA. Tony Paikeday is somebody that we've talked with previously about all the interesting aspects of, well,
Starting point is 00:01:45 the changes that are coming in platforms to support enterprise AI applications. Tony, why don't you introduce yourself a little bit to the audience? Hi, thanks for having me. I'm Tony Paikde from NVIDIA. So I'm responsible for AI systems, product marketing at NVIDIA. We have a portfolio of enterprise solutions called the DGX system. So a lot of my team's charters are around helping enterprises around the world, you know, kind of democratize access to AI and AI infrastructure and build incredible applications to help power their business? It seems to me that the key to understanding the DGX systems is not that it's some kind of, I don't know, special configuration. It's that it's all about balance. It's about balancing the system resources to support the GPU in AI and other GPU heavy workloads. And that to me is the key here, because I think a lot of people
Starting point is 00:02:45 think, well, if I just throw everything at the problem, then everything will work great. And that might be true, but it strikes me that DGX really isn't about throwing everything at the problem. It's making sure that the system is ready to keep the GPU busy. Is that right? Yeah, that's certainly, excuse me, a very important part of this. But I would actually say that the way we've looked at the problem in enterprise is around how do you simplify how enterprises can deploy and manage infrastructure specifically for the purpose of running AI workload. And so when you look at it from that perspective, there's a lot of layers in the equation. Certainly the GPUs that are there,
Starting point is 00:03:32 all the things that surround it from an IO, bandwidth, memory, storage, network fabric, all those things matter certainly. The architecture that we're talking about, this design balance is very important. And I think there's a lot of organizations that oftentimes try to piece this stuff together. And sometimes you have the expertise, sometimes you don't necessarily in terms of striking the right design balance to ensure that GPUs are kept fed with data during a training run. But even beyond that, what we found at NVIDIA is just as importantly or even more importantly than the hardware is obviously the software. We spent a lot of time within the DGX business unit
Starting point is 00:04:15 and NVIDIA at large optimizing a complete software stack. And what we've realized is that everyone from data science developers, practitioners, to people who manage IT infrastructure, stack. And what we've realized is that everyone from data science developers, practitioners to people who manage IT infrastructure, they essentially need the right tools and platform such that they can actually operationalize AI at scale. And what I mean by that is being able to see more of their valuable intellectual property in terms of viable models and prototypes actually deployed in production. This is a classical problem that's been solved in conventional enterprise apps,
Starting point is 00:04:50 but a lot of businesses are right now struggling with how to manage and scale workflow that can allow data science developers to do incredible things on one end and have it realized in production applications at the other. So we spend a lot of time thinking about the tooling and the software that needs to enable that. And then in combination with that, making expertise available to our customers such that when they have a question about a framework, about a model type, about a use case, or about things like drivers, libraries, and communication primitives, that they have someone that they can talk to.
Starting point is 00:05:30 So really for us, the approach has been full stack in that respect to help organizations scale AI. Yeah, I totally agree with that. I mean, not so long ago, you needed like an MBA and a bunch of PhDs to set up an AI environment. I mean, let alone the complexity of the hardware. So I do really agree what you said, that there is a need for, first of all, a complete software stack, right? So many more people do AI today or have to do AI to stay competitive.
Starting point is 00:06:02 And they need not only the tools tools but also the support in order to make this happen. A lot of people talk about the democratization of AI where you provide the hardware and I think there's one key component you brought up which is the software. The hardware by itself is not the end solution, it's the complete stack and the ability to help end users. And I also want to point out that NVIDIA is not necessarily known for contributing a lot in the open source community, but the reality is they do, right? And it's just to enable not only the hardware,
Starting point is 00:06:42 but the ability for people to do AI that don't necessarily know all the different components. Yeah, I'm so glad you brought that up, Frederick, because if you think about applications like NLP or recommender systems or any number of foundational AI use cases, many organizations, maybe the most, lack the data science expertise or the access to data sets or experience building and training models to be able to do that stuff from scratch. So we really do folks a disservice if we don't give them ready-made tools that allow them to pick, for instance, pre-trained models off the shelf and then do fine-tuning around the edges to optimize them for their unique vocabulary or their unique set of problems, right? So I think increasingly the state of the art is one that
Starting point is 00:07:36 reflects what you just described, namely offering more and more ready to kind of plug and play type applications delivered in, again, pre-trained models, scripts, and other content that lets developers exert a lot less effort doing the foundational work and simply being able to now plug and play into their enterprise. Right. I would even say that customers today are basically saying, I want a solution. They don't want to detail all the different little items. It's to the point where people understand that the expertise is with NVIDIA from a software and a hardware stack. And they're just trying to, you know, let's buy a solution that works for us. Yeah. You know, you know, DGX has been around for five years. And if I look at our own trajectory, which I think reflects what we've seen in the broader market, so much of the early work that we saw with AI pioneers was born on the backs of like what you'd consider hyperscale type organizations who had deep pockets and incredible bench of expertise to build incredible infrastructure to solve really complex problems, right? And you'd expect them to do because they have the capital and operating budget to do that.
Starting point is 00:08:51 And a lot of what we'd seen over the time might even classify, if you're looking at enterprises as potentially shadow AI, where you'd have data science teams or business units building what they needed to build outside of any kind of IT governance and certainly outside of any kind of IT shared infrastructure. And now we're seeing more and more CIOs and IT leaders wanting to define the infrastructure strategy because they see, in some respects, a lot of costs running out of control as developers run up OPEX trying to do DIY platform instead of having an IT sanctioned environment that centralizes all of that. So this has now put IT in the lens and forced us all to think about how do we make it simpler for IT teams to manage.
Starting point is 00:09:40 Yeah, so we're talking about the DGX. I mean, the DGX, I think GPUs. I mean, one other thing which is important is, as Stephen mentioned, there is not just the GPUs, which you could consider that compute, but there's also the network and the storage, right? So from a network perspective, most people know that you have acquired Mellanox. So you kind of have the two pieces from a hardware perspective. And then with storage, I assume you're partnering with industry leaders from a storage perspective. And then also the announcements about ARM was also very interesting, where you kind of tried to optimize your infrastructure. So is that a trend you see from enterprises where you kind of feel the need to fill the gap, so to speak, like with Mellanox and other components? Do you see yourself kind of selling not just the DGX, but a solution where you provide NVIDIA stamped network, and storage? Yeah, this is really driven by the problem statement, how do you achieve the fastest time solution on the most complex AI problems an enterprise might face? And the challenge has been when the essential resources
Starting point is 00:10:57 are disaggregated from each other and treated in almost piecemeal, you have a real challenge in terms of how do you parallelize the problem, the computational problem, in a way that you can effectively scale and shrink that time to solution, right? And when things are not necessarily cohesive, or there is a lot of time and distance between where the data lives and where the compute lives, and there's a lot of latency in the fabric connecting those things, you stretch time to solution out and your ability to distribute the problem, let's say on a training run across more and more nodes because you need to scale compute capacity, it escalates
Starting point is 00:11:39 or exponentially increases out of control whereby language, you know, language model that, you know, you're trying to build might take weeks to months, but with the optimized architecture, and if you are able to, again, shrink time and distance between all these resources and make them appear as one computational unit, one really big processor with really big RAM and commensurate storage, then you now are able to take that problem that was solvable in weeks or worse and now deliver an answer on a training run in potentially minutes or hours. And that's kind of what's forcing all of this is that we see where the state of the art is going with the most important use cases
Starting point is 00:12:25 that enterprises are trying to tackle. And we're seeing the need for a purposeful approach in terms of the right kind of network fabric with minimal latency and highest bandwidth, with having the right kind of storage subsystem that's compute proximate and compute that's data proximate, all those things coming together, forcing you into really this optimal architecture, kind of like we started out with this design balance problem, right? Right. Yeah. And then I would like to talk a little bit about scalability, right? The funny thing about an AI project is a successful AI project will actually generate even more data, right? So your problem actually becomes worse and worse over time if you don't have a scalability model. Is that something that you see happening
Starting point is 00:13:13 in the future as well where people keep on adding more and more data or do you see technologies like transfer learning kind of helping you taking some shortcuts left and right in order to kind of not start from scratch and always having to add data, but to start from a baseline? Yeah, I think a number of these techniques will help mitigate certainly the amount of data coming into the enterprise that's either fueling initial model development and prototyping or simply operational data that comes in every day, every minute, every second being processed by models at the edge of the enterprise. I think what all of this is forcing is a rethinking of where we situate this infrastructure relative to the data. We think a lot about data gravity.
Starting point is 00:14:09 That's been top of mind for us and a lot of folks in our ecosystem because we see what's happening and we see that a lot of organizations are spending a lot of time and effort fighting data gravity. And it's like fighting planetary gravity, you know, you spend a lot of time and money fighting data gravity. And it's like fighting planetary gravity. You know, you spend a lot of time and money escaping it or working against it when really what you want to do is bring your resources to where the data is, bring your applications to where the data is. And this is why we see a lot of organizations, for instance, repatriating workload in close proximity to where that data is being created.
Starting point is 00:14:47 I heard this bumper sticker version of this years ago, train where your data lands. And I've kind of assimilated it and made it my own because I definitely feel that that's true. And it's true for a lot of our customers as well. And in terms of their mentality around what they need to do as far as their infrastructure and resource strategy. Yeah, I think data gravity is, I mean, if you ask a lot of enterprises, they will bring that up
Starting point is 00:15:12 as one of the issues they're having. But I think also data gravity has to do a lot with architecture, right? So how it's designed, you know, with AI, typically you can start, you can start small. If you don't have an architecture that scales, nothing, nothing is going to help you with that. Right. And that's why I was mentioning earlier, the solution where a solution where the architect is kind of built in will help enterprises be more efficient without having to think about it. Right.
Starting point is 00:15:41 You don't want to, you don't want to make the same mistakes over and over and over. Yes. Yeah, absolutely. Yeah. That's what I was kind of trying to get at the first year when I was talking about the balance of the system, because it's not about delivering maximum GPU throughput, because of course you can't do that unless you can deliver a system that can keep those things fed. And I think that maybe that's one area that people sell NVIDIA a little bit short on is because they look at the company as basically the GPU company, and they don't think of it as the systems company. And I guess that must be tough for you, because it's like, you're the systems guy, right? I mean, that's, that's your thing.
Starting point is 00:16:19 Yeah, definitely. So, you know, traditionally, or historically, people have not thought of us as an enterprise systems company or a provider of IT infrastructure. And for the very reasons you described, and rightfully so, if you look at kind of our heritage, where we came from, that doesn't surprise me at all. I think gradually, we are starting to change that mindset, but, you know, enterprises also have expectations in terms of, you know, we love hero stories around big science and a lot of the incredible work done at the leading edge of research. But, you know, I think what we've worked on intently over the last few years is with our customers, with industries helping to showcase, I think the less sexy, but maybe more boring, but fundamental pragmatic
Starting point is 00:17:15 use cases that are powering businesses today, especially in like challenging turbulent times, things that are helping them to improve customer intimacy and enhancing every customer interaction, streamlining operations and reducing costs and delivering competitive agility. These are things that almost every business we talk to cares about. And coincidentally, they're now talking with us about how to deploy you know, deploy the right kind of AI infrastructure to do these exact kind of things, which AI is obviously really great at. Yeah, without making it too much about you personally, I think the fact that NVIDIA has people like you who come from more of an enterprise, you know, data center background instead of people who are just more, you know, GPU focused. I think that that actually really helps the company. And frankly, that's one of the things, in my opinion,
Starting point is 00:18:09 that they got with Mellanox as well, is that you've got a group of people who are used to selling not just into HPC, which of course they are, and cloud, but into enterprise environments as well. Did you find that it was a challenge though? I mean, is this a continuing challenge in the core DNA of the AI compute architecture, right, of these AI compute systems. And I think there's kind of two key modes of operation or two kinds of IT essentially that we see flourishing now in enterprises that are really leaning into AI. And on one end, you have infrastructure that customers need that is purpose-built to only do one thing and one thing only, and that is to take the most complex model and shrink it down into a, you know, a shortest amount of time possible to deliver an answer,
Starting point is 00:19:21 right? Shortest time per training run and the most complex AI problem that they've got. And they come to us for that, right? And they know, for instance, that a DGX system, it's got one purpose in life. It has one purpose in life is to execute that training run as fast as possible and iterate as fast as possible. But on the flip side of it, on the other side of IT,
Starting point is 00:19:42 when it comes to deploying a tuned model, one that's ready for inference, one that's ready to be used in operation production, enterprises have a lot of viable solutions that embed our technology, but in what we would consider approved servers or certified servers, we call them NV certified, but essentially servers that incorporate the best of our technology, but offered by our ecosystem partners. And so we see this duality coming together and most enterprises kind of need both. They need the development infrastructure, they need the deployment infrastructure. We're kind of in all of it. And, you know, this also brings our valued server partners into the equation as well, because they enable this, you know, our customers have made a huge investment in a lot of household names that are pervasive across their data center. And they want to be able to leverage that same investment to be able to run these applications and deployment at scale. And so we want to enable
Starting point is 00:20:42 that. And that's important, of course, because as you mentioned, there are companies that NVIDIA works closely with and partners with, and certainly you're not going out there and fighting with any of these big ISVs. You're partnering with them and enabling them to sell solutions as well. Yeah, absolutely. And that's, you know, also part of why we build the platforms that we do. Oftentimes, what we see missing in the marketplace is a gap that we can step into and build kind of the proof point or the blueprint, offer that blueprint to our ecosystem, our partners who know how to do this stuff at scale can take that and then build solutions using it. And that's kind of the mission of DGX. It's the mission of a lot
Starting point is 00:21:31 of the things that we build that essentially provide that blueprint to ISVs around the world who do this really well at scale and let them be successful with it. So yeah, you're absolutely right. Yeah, you talked a little bit about corporate IT. I think one of the trends we have seen in the past is that typically GPUs was just for the research crowd, right? The people that were really technical, had an understanding of what was going on. I think nowadays, because AI is such a dominant factor, that it's more moving towards the corporate IT world where it's becoming more and more standard. However, there is still a technology gap between the hardcore research people, if you want, and corporate IT. How do you see enterprises have conversations with NVIDIA? Do you feel that the conversations are more technical or more strategic? It's really both.
Starting point is 00:22:30 It's evolved over time. It started off, and it still is the case that, you know, developers are our best friend. They grow up, you know, from the earliest stages and starting in school using our toolkits and our software. Many of them cut their teeth on CUDA and learn how to work with our GPUs from there. And then they eventually land in enterprise and building incredible things. So over the years, obviously, we've courted the developer community because we want them to have an incredible experience using our software to do what they
Starting point is 00:23:03 need to do, their life's most important work. On the flip side of it, obviously, increasingly, we're getting on the CIO's radar. And we have regular roundtables with CIOs around the globe meeting with our executives. And we have this ongoing dialogue with pretty much every enterprise focused AI business that's out there. And a lot of times those conversations turn to, you know, how do I take this stuff that used to be science projects and now operationalize it at scale? And those conversations very much are around strategy. They're very much around things like MLOps and how to manage this kind of infrastructure and why do I need purpose-built infrastructure and what is the ecosystem and the
Starting point is 00:23:51 whole offer look like beyond a really fast computational box? What is the complete end architecture look like? So it's really kind of hitting both ends And this is why you see, for instance, our partnership with companies like VMware, right? And those in the storage community, we work with them because we know that a lot of our partners are the trusted names with leadership. And by us working together, we're simplifying a problem for our customers and enabling them to onboard this infrastructure much quicker. They all need to do this. And many of them are doing this because they need to scale. They need to solve the shadow AI problem, you know, development silos spawning across their enterprise.
Starting point is 00:24:39 They want to consolidate people, process, and platform. And so centralized shared IT infrastructure enables to do them. And so we see ourselves sitting at the table with them trying to help them with that. Right. I mean, it seems like you're helping customers with their AI roadmap, not necessarily the NVIDIA component of it. It's just like a roadmap and how NVIDIA can help and point out which partners can help them be successful. Yeah, very much so. I think the most valuable guidance we can give in the conversations we want to have with these customers, you know, on the development side of it is to share with them the best of what we know, whether that's commercialized or not. Oftentimes it's not simple. It's oftentimes just simply a matter of connecting, you know, a developer researcher with one of our own, you know, PhDs or scientists or researchers and
Starting point is 00:25:36 letting those conversations happen because we've made it a point to onboard some of the best talent we can find out there from science and academia and enterprise and make those resources available to our customers so that they can benefit from the same things we're figuring out at the same time. I mean, we run a fairly large R&D shop and we are spending a lot of effort trying to move the ball forward in some really critical places for enterprises. We want them to be a part of that exploration and share in what we figure out along the way. I wonder if we can turn a little bit here, a little bit toward more of the specifics of
Starting point is 00:26:15 what AI requires from a system. So you've been intimately involved with developing these specialized solutions for AI applications. And of course, this is utilizing AI. So maybe, can you talk a little bit about the various aspects of the system and maybe anything you've learned over the years that is essentially, you know, really a critical component to building a system to support AI applications? Yeah, it's a great question, Stephen. The reality is that AI is unique in how it consumes resources, unlike typical enterprise workload. And achieving, as I was saying, fastest time solution on a model requires having enough of those computational power of those resources in combination with ultra high performance storage, really high IOPS, really low latency to feed those data sets during a training run with everything connected over a
Starting point is 00:27:12 very high speed, low latency network fabric, which increasingly obviously is InfiniBand when you're talking about multi-system, large cluster implementations. This is typically what we've found is needed to tackle the largest AI problems or models that need to be ultimately parallelized over multiple systems, if you want to have a reasonable timeframe within which to train those models. We talked about design balance. One of the challenges and the things that we solve for is ensuring linearity of performance with system scale. And this is at multiple levels. connected GPUs and I should be able to distribute my problem and get faster and faster performance as I string together more and more of these systems. But we know that that breaks down because as your problem gets larger and you try to parallelize it over more and more systems,
Starting point is 00:28:17 you incur a lot more communications overhead to distribute the problem across all those systems. And so you get diminishing returns as you scale, if you approach it in the classical way of just, you know, a lot of PCIe connected GPUs with a pretty standard ethernet fabric and multiple systems. And that's why a lot of what is in the core DNA of our systems, things like NVLink, which offers a high bandwidth inter-GPU bus, if you will, to make eight GPUs seem as one, right? And NVSwitch is the other part of that technology that creates that inter-GPU fabric. Combined with the InfiniBand fabric connecting multiple systems, when you go from one to two to four to 140, which is, you know, the ultimate in scale that I think we've, you know, been able to demonstrate with what we call a DGX super pod. This, you know, having this approach at the core
Starting point is 00:29:20 system level, and then at the scale out level and being prescriptive around what the network topology looks like across all those systems has enabled us to demonstrate that linearity of performance such that there is no or minimal drop-off as you get to your 140th system. And it used to be, I would be floored by the idea of anybody stringing together 140 systems over a single network fabric. You know, a couple of years later, I'm actually not that surprised because there are so many organizations who are doing exactly that, especially in areas like NLP and large recommenders and autonomous vehicle system development this is the kind of scale that they're operating so at such that they can iterate fast and and and um get an answer to a training run in you know in hours and days instead of weeks
Starting point is 00:30:19 and months kind of thing yeah i agree a couple of years ago when when two two gpus in the server was considered like the top and i said we wanted more uh they called us crazy and then not not that long after you know there were more more gpus so it seems like nvidia i mean first of all ai is all about bottlenecks right you always have bottlenecks you try to solve the bottlenecks with NVLink and so on. So you kind of keep on solving the bottlenecks you see on the road. Is there any particular bottleneck you see today that NVIDIA really focuses on and believes that it will accelerate and remove a lot of roadblocks? Yeah, if you look at, for instance, GPU direct storage and MagmaIO, we're solving for the problem of the inherent latency
Starting point is 00:31:11 incurred when the data path has to move through this host CPU before it gets to the GPU. And essentially short-circuiting that and offering a streamlined path from the data store in like external storage through a NIC in the in the server that's obviously optimized with what we call a DPU a data processing unit direct to the GPU. It speaks to the very thing you raised Frederick namely this idea of eliminating every bottleneck as we see it, that's inhibiting larger and larger levels of scale. So GPU direct storage is probably the latest example I have, you know, in combination with the DPU, that's allowing us to now ensure that there is minimal latency, minimal speed bumps between where the data lives and the computational power that needs to act upon
Starting point is 00:32:08 it, right? So that's very much the way you described it is perfect. Namely, it is really about eliminating those bottlenecks, especially when you talk about distributing a problem over multiple systems. Yeah. So you're talking about reducing the impact of the host? How about eliminating the host completely? I mean, isn't that what the ARM ID was, is to kind of provide a bootable GPU that didn't require a host? I actually don't know that we would, I wouldn't look at it necessarily as eliminating the host.
Starting point is 00:32:39 I see it as an adjunct and there is a natural bifurcation of what functions exist on one kind of processor versus the others because um essentially you know we're always going to have like mixed workloads except in in the realm of you know training where i think if you're trying to implement infrastructure to train uh very challenging models very complex problems you're going to have you're going to be very purposeful and very singular in what kinds of workloads run on that infrastructure. But increasingly,
Starting point is 00:33:10 there is kind of this deployment infrastructure that needs to handle a much more wider palette of mainstream acceleratable applications. And in those environments, you need to have a way to still support applications that depend on traditional CPU, but also can offload as much of what doesn't need to be done on a CPU onto devices like a DPU, as an example. So we see in kind of those heterogeneous environments, you're still going to have kind of this multiplicity or duality of processor types. So it sounds like really NVIDIA is not only the GPU company that everybody thinks, but also a major player in enterprise AI applications. And I think that that may not come as a surprise to a lot of our listeners, but maybe some of them it might, because many of the people who are just starting to look at deploying
Starting point is 00:34:04 AI applications are starting to ask themselves, what kind of system am I going to need in order to support this? And quite frankly, the answer is that, you know, NVIDIA has already answered that question for you with the DGX systems and in partnership with many of the familiar names that you're probably already working with today. So you can find yourself a balanced system that not only supports small AI and ML applications, but can scale up to really massive proportions here. And they got that covered for you, especially now that they're rolling out new products and technologies. So the time has come, Tony, to move
Starting point is 00:34:45 on to the fun part of the podcast, where we talk about some things that are a little unexpected. And that leads us to our famous three questions. This tradition started back in season two, and we're now carrying it through to season three. But, you know, we're adding a little twist here. So our guest has not been prepared for these questions ahead of time and we're going to get their answers off the cuff right as we speak. The difference this season is that I'm going to ask a question and Fred's going to ask a question, but the third question actually comes from a previous guest on the podcast. And of course if Tony has one he can pay it forward here and ask a question of a future guest on our podcast. So let's kick things off. Fred, do you want to take the first question? Sure. So how big can ML models get? I mean, today there's hundreds of billions of parameters for a model, which might look small tomorrow.
Starting point is 00:35:43 You know, is there a limit, you know, can it keep on growing? Yeah, it's a, it's a great question. I am always careful not to try and define an upper bound because when I thought 8 billion parameters on a language model was a big deal, lo and behold, you know, a few years later, we were here in a GPT-3, right? So I, you know, a few years later, we're here in a GPT-3, right? So I, you know, all of this will evolve and continue to scale in response to the infrastructure and the tools ability to enable models of that size. I can definitely see that the use cases and applications will only
Starting point is 00:36:22 continue to drive us to larger and larger models. So I'd really say that there isn't a conceivable upper bound if we're kind of keeping our imaginations open to the art of the possible or the art of what could be. Excellent. Now for something a little bit more fun. So in Hollywood, they love to show us artificial intelligence that's basically an artificial person, like Mr. Data or somebody like that. Do you think we'll ever get to that point where we'll have just sort of a general artificial mind, somebody that we interact with walking around that's AI? You know, I think about this in two ways. One is, if I really, you know, put on the science fiction
Starting point is 00:37:05 hat of things, the idea of a sentient being that is aware of itself and you can interact it like in a truly human way, that part of it, you know, kind of freaks me out to be perfectly honest. I mean, I don't know that any of us are really prepared for it, but maybe that's an eventuality that happens sometime way off in the distant future. If you look at the trajectory of things, you kind of wonder, could we eventually get there? And that's a really hard one to wrap one's mind around. But what increasingly is apparent to me is with the advent of these incredibly large, as we say, big NLP type models, they are increasingly presenting themselves in a way that you almost think behind the covers, there's someone incredibly smart. I recently saw a video pitting three different generations
Starting point is 00:38:05 of GPT models against each other doing trivia questions and such. And it floored me how quickly they could deliver answers to some of the most, you know, arcane type questions and details. And I think that that level of intelligence backed by essentially an algorithm that knows how to connect the dots from oceans and oceans of data in milliseconds. I think that's something very real, very real and very possible. And we're already kind of seeing that hit the doorstep of enterprise, just to be quite honest. Our third question comes from a guest on season three, episode two. Take it away, Mike. This is Michael Malley, SVP of marketing and sales for Seneca Global. And my question is, can you give an example where an AI algorithm went terribly wrong and gave a result that clearly wasn't correct? I'd love to hear that. Yeah, you know, one example that's probably a lesson for all of us is I've seen where
Starting point is 00:39:11 we've had NLP-based chatbots in, you know, engaging the Twitterverse and basically over time, you know, evolving to give answers that were really off color, really inappropriate, really bad for general consumption, but the algorithm was simply doing what it was programmed to do in response to the input received. And I think it's also a lesson in how, while this technology is incredibly powerful, there needs to be careful governance and thought around the data fueling these things and looking for things like bias and looking for explainability and understanding how the answer to the question is derived, such that we don't have AI that essentially goes completely off the rails and says or does a bunch of things that could really embarrass us or worse.
Starting point is 00:40:09 Thank you so much, Mike. And Tony, thank you very much for joining us today. We look forward to hearing what your question might be for a future guest. And if you, the listeners, want to be part of this, you can. Just send an email to host at utilizing-ai.com and let us know you want to be part of our three questions segment. Tony, thank you again for joining us today. Where can people connect with you and follow your thoughts on enterprise AI and other topics? Well, first of all, thank you for having me, Stephen and Frederick. I had a great time.
Starting point is 00:40:38 You guys can find me at Tony Paikaday on Twitter, and I'm on LinkedIn as well. Great. We'll include that in the show notes. Fred, how are things going with you? Doing well. So it's funny, we're having this conversation with NVIDIA because I'm working on a project with a super pot. So I'm learning all the internals and the outsides of the super pot. So I'm looking forward to doing that. Excellent. And as for me, I've been really enjoying following some of the announcements coming out of all the various exciting AI products.
Starting point is 00:41:13 And we've been covering a lot of that on the Gestalt IT Rundown on Wednesdays at gestaltit.com. So thank you everyone for joining us here for the Utilizing AI podcast. If you enjoyed this discussion, please do shoot us a rating review on iTunes because that sure does help. And also please share the show with your
Starting point is 00:41:30 friends. This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise. For show notes and more episodes, go to utilizing-ai.com or find us on Twitter at utilizing underscore AI. Thanks for joining and we'll see you next week.
