Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x13: AI Needs Non-Traditional Storage Solutions with James Coomer of DDN

Episode Date: March 30, 2021

AI applications have large data volumes with lots of clients, and conventional storage systems aren't a good fit. In this episode, James Coomer from DDN talks about the lessons they have learned building storage systems to support AI applications. Training and inference involve terabytes or petabytes of data, often large files and streaming data. For example, autonomous driving applications generate up to hundreds of terabytes of data per vehicle drive, resulting in petabytes of data to ingest and process. DDN's parallel filesystem goes a step further than NFS with an intelligent client that directs I/O to leverage all network links and storage endpoints available. Deep learning loves data, and a smart client can make the whole application faster. Because data is the biggest AI challenge today, an advanced storage solution can really help deliver AI solutions in the enterprise. Although most companies realize that finding expertise (data scientists, etc.) is a major challenge, building infrastructure to support them is just as critical.

Guests and Hosts:

James Coomer, Senior Vice President for Products at DDN. Connect with James on LinkedIn or learn more on Twitter at @DDN_Limitless.

Andy Thurai, technology influencer and thought leader. Find Andy's content at theFieldCTO.com and on Twitter at @AndyThurai.

Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Transcript
Starting point is 00:00:00 Welcome to Utilizing AI, the podcast for machine learning, deep learning, and other artificial intelligence topics. Each episode brings experts in enterprise infrastructure together to discuss applications of AI in today's data center. Today, we're discussing the challenges of building scalable storage systems to support AI applications. First, let's meet our guest, James Coomer from DDN. Hello. Yes, I'm James Coomer. I'm Senior Vice President of Products at DDN. So go to ddn.com and you'll find the website for our company. We've been around for over 20 years building storage and data management solutions for really tough data challenges. So when the capacities get big, when the performance challenges are very large,
Starting point is 00:00:51 then DDN is the company which our customers come to. I am Andy Thurai, founder and principal at thefieldcto.com. You can find me on Twitter at Andy Thurai and on LinkedIn. You can also check us out at thefieldcto.com, where we do a lot of emerging tech consulting work, AI, ML, and the cloud. And I'm Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT. You can find me right here every week on Utilizing AI. I also am organizing the AI Field Day event, which is coming up soon. And you can connect with me on Twitter at S Foskett. So James, I'm very familiar with DDN, having been in the enterprise storage space myself
Starting point is 00:01:30 for quite a while. And I know that DDN has tackled some really challenging high volume, high capacity, high performance storage for the media and entertainment space in the past. I was not at all surprised to see as well that y'all are working to support AI applications. So maybe we can just start there
Starting point is 00:01:53 with just a little bit of a foundation of, you know, what are the challenges of supporting these kinds of applications that require, you know, tremendous amounts of data, tremendous amounts of throughput, lots and lots of clients with simultaneous access. These are not conventional storage applications. Yeah, that's right. So we have indeed been exposed to the real full breadth of tough data
Starting point is 00:02:17 challenges, whether that's in high-performance computing or media, streaming videos, etc. But AI is a bit different. I mean, firstly, there's not just one thing that represents the AI data workflow. It's, of course, a big pipeline, and it all starts with data coming in in the first place. And that data might be coming in from vehicles, recording with cameras, LiDAR, radar; it might be coming in from satellites; it might be coming in from life sciences instruments, sequencing machines, etc. So that ingest phase is part one. Then there's a data labeling phase. Then there's, of course, the all-important deep learning phase, where we're teaching and training our models based on these big data sets. Then, when we move into production, we're performing inference, a whole different class of IO challenges right there.
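To keep the shape of that pipeline in mind, here is a minimal, purely illustrative Python sketch of the loop James describes: ingest, label, train, infer, and feed results back into retraining rather than into a cold archive. Every function name here is a hypothetical stand-in, not DDN's or any framework's actual API.

```python
# Toy sketch of the AI data pipeline described above; all names are
# invented placeholders, not a real API.

def ingest(sources):
    # Phase 1: data arrives from vehicles, satellites, sequencers, etc.
    return [f"sample-from-{s}" for s in sources]

def label(samples):
    # Phase 2: attach supervision to each raw sample.
    return [(s, "label") for s in samples]

def train(labeled):
    # Phase 3: deep learning over the whole labeled set.
    return {"version": 1, "trained_on": len(labeled)}

def infer(model, samples):
    # Phase 4: production inference, a different I/O pattern from training.
    return [(s, f"prediction-v{model['version']}") for s in samples]

# The endpoint is an active archive: outputs and old data flow straight
# back into retraining instead of going cold.
corpus = label(ingest(["vehicle-A", "satellite-B"]))
model = train(corpus)
print(infer(model, ingest(["vehicle-C"])))
```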
Starting point is 00:03:05 And then that data moves on, typically not really to be archived, but to be reused and go back into these model retraining events. And so we have this sort of endpoint, which isn't a traditional archive, but is a very, very active archive. So data volumes get very large, but we still want to access them very, very regularly. So that shape is genuinely different from anything else we've seen. And DDN has been really working on our technologies and our services in order to support customers with these large data volumes from which they want to extract value at big scales. Okay. So you talked about a few areas in there where AI has a data problem. We're producing a lot of data, but also getting the data ready for AI is a problem, from data collection onward. I know some of the largest autonomous car vendors are using you, without naming names.
Starting point is 00:04:13 But people don't realize, when you have a live car going in, it's not just the instrument data, but the camera and the whole video and everything that comes in, in a matter of, you know, hours. When you do it for days, and when you store it and try to make meaning out of it, and try to inference from that, you're talking about terabytes of data. So scaling enterprise production AI to terabytes of data, or even petabytes of data, is not easy. Is that a common problem that you see? And how are some of the guys solving this? Yeah, you're right. This is really problem number one, the most basic problem we try to solve. So as you say, in fact, in this autonomous driving world,
Starting point is 00:05:01 these vehicles come back from a drive around the city with between one and 200 terabytes of data each. And there's many vehicles and there's many cities. So it easily turns into hundreds of petabytes. And that's a challenge because particularly in AI, you don't want to siloize your data. You don't want to end up with 100 silos of 50 terabytes each or 50 silos of a petabyte each. Really, you want to have a very large, scalable, centralized data storage, which is robust, etc. And that's really where DDN comes in. So apart from the performance and all the other features, managing big data volumes is what we've architected the systems for. And the way we do it is by scaling the storage infrastructure in a very different way from any other storage system.
Starting point is 00:05:50 And this is a general principle, really. It's called parallel file systems. And, you know, in general, we can look at traditional NFS file systems, which companies and organizations might use for NAS and maybe even to serve virtualization environments. And they have a scaling property which really doesn't help them stretch to tens of petabytes and beyond. So the way we get around that is by adding a bit of intelligence into the network and into the compute systems. So to scale big, we need to scale the problem to include all those elements which
Starting point is 00:06:24 are also scaling: the compute scaling, the network scaling bigger and bigger. And by having an intelligent client on the other end of the network, we can really help systems scale really limitlessly. And we do support some of the biggest supercomputers, AI supercomputers, in the world. So just to dig into that a little bit, the reason that helps is that this intelligent client is sharing the problem of scale. It's able to direct IO, direct the data, and read the data from where it lies. Compare that to traditional NFS: there's no intelligence there. It has to go to something, and then something else has to proxy for that. So in traditional NFS, you get this sort of big backend data movement, which is ultimately not scalable
Starting point is 00:07:07 because you're trying to squeeze all of that scaling problem into the backend infrastructure. By having an intelligent client, we're kind of sharing the problem out a little bit. And it allows us to scale by compute, scale by network, and scale by storage. And that's what really helps us have the systems which are 100 petabytes plus.
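As a toy illustration of that contrast: a parallel-file-system client that knows a file's layout can fetch stripes from every storage endpoint concurrently, instead of funneling every byte through a single NFS server. The endpoint names and striping scheme below are invented for illustration; this is not DDN's client API.

```python
import concurrent.futures

# Hypothetical layout: a file striped in 1 MiB units across four storage
# servers. An intelligent client reads from all of them at once.
ENDPOINTS = ["oss-0", "oss-1", "oss-2", "oss-3"]
STRIPE = 1 << 20  # 1 MiB

def fetch_stripe(endpoint, offset):
    # Stand-in for a network read issued directly to the storage server
    # that owns this stripe (RDMA or TCP in a real client).
    return (endpoint, offset)

def parallel_read(file_size):
    offsets = range(0, file_size, STRIPE)
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        futures = [
            pool.submit(fetch_stripe, ENDPOINTS[(off // STRIPE) % len(ENDPOINTS)], off)
            for off in offsets
        ]
        return [f.result() for f in futures]

# Eight stripes fan out across all four endpoints in parallel, rather than
# queuing behind one server as they would with plain NFS.
print(parallel_read(8 * STRIPE))
```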
Starting point is 00:07:28 First of all, there are 20 areas in there I want to double-click on, but let's start out with the obvious one first. You talked about, you know, collecting 20, 30 terabytes of data per car, per city. And so some of them are already mapped out, meaning that the cars, the autonomous vehicles, are able to inference based on what's being fed to them. But the actual problem will come because, you know, video is a thick load, and not only a thick load, but also an unstructured load. People don't realize that, right? So it's hard to make meaning out of it. When you're getting terabytes of data which is unmapped, for example if the car is driving in a territory or terrain which is not already in the map, how do you inference that? But more importantly, how do you take that volume of data and update your model so that you can feed it back into all the other autonomous
Starting point is 00:08:15 vehicles that are coming into the area, or into the general overall map? That's a major problem, isn't it? It is. And it's a very complex process that our customers have been developing over the past few years. So you're right, 100 terabytes of video and LiDAR and radar data can come out of a vehicle after a trip around a city. And they do keep going around the same city many times, because you need to see the city in different weather environments, and you need to see these rare events; you want them to happen because you want to capture them. And what happens, of course, for some of these scenarios, is there's somebody in the car and they're labeling things live. So they'll be going around in the vehicle, labeling stuff, saying: I've seen a cat; there's a signpost which says stop. And they'll be adding some labels as they go around to add some supervision to this process. So the data comes in pre-labeled, but then has to go through a big process to add more labels.
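As an entirely hypothetical illustration, a pre-labeled frame record from such a drive might look something like the sketch below before the heavier offline labeling passes run. The field names are invented, not any vendor's real schema.

```python
# Hypothetical record for one sensor frame; field names are invented for
# illustration only.
frame = {
    "vehicle_id": "car-042",
    "timestamp": "2021-03-30T09:15:00Z",
    "sensors": {"camera": "frame_000123.raw", "lidar": "sweep_000123.bin"},
    "live_labels": ["cat", "stop sign"],  # added by the person in the car
    "derived_labels": [],                 # filled in by later labeling passes
}

# A later labeling pass enriches the record.
frame["derived_labels"].append("stop sign with snow on top")
print(frame["live_labels"] + frame["derived_labels"])
```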
Starting point is 00:09:08 The critical thing in autonomous driving, in fact in all of AI, is having trust in this AI. And what that really translates to is knowing the complete set of data, of objects, that you've put through your deep learning program. So if you've once seen a stop sign with snow on the top, you want to be able to label that: I've seen a stop sign
Starting point is 00:09:32 with snow on the top, and get that into your model. So this huge challenge is complex, and the data feeds in, yes, every day from each vehicle, but it goes around in a big cycle, as they have re-simulation and virtual simulation, with virtual environments and these virtual cars driving around virtual environments, adding more and more data. So in fact, you know, while it seems a big data problem, by its nature deep learning loves data; it just wants more and more. And so for our customers it's more of a challenge of working out where the unique objects are, to make sure that the mass of objects, even the unique rare events, is pushed back into that model. It is a big challenge.
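One common way to make sure rare events like that snowy stop sign actually influence the model is to oversample them during training. Here is a minimal PyTorch sketch, assuming the rare samples have already been identified; the weighting is illustrative, not a recommendation from the episode.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy data: 1,000 samples, only 10 of which are flagged as rare events.
features = torch.randn(1000, 8)
is_rare = torch.zeros(1000, dtype=torch.bool)
is_rare[:10] = True
dataset = TensorDataset(features, is_rare.float())

# Give rare samples 100x the draw probability of common ones, so batches
# regularly include the snowy-stop-sign cases instead of almost never.
weights = is_rare.float() * 99.0 + 1.0
sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

batch_features, batch_is_rare = next(iter(loader))
print(f"rare samples in this batch: {int(batch_is_rare.sum())} / 32")
```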
Starting point is 00:10:14 So this has been a perennial challenge in the storage space. And I think that many people outside of the storage industry might not understand that storage clients, storage endpoints, are really not intelligent at all. And that has always held back the development of storage. So whether it's a conventional file server or SMB or NFS or even cloud storage systems, for the most part the client has a very limited dialect, a very limited range of interaction, and basically just treats the storage system as, you know, something else: hey, something else, store this data for me. Rather than being actively involved in data placement and data labeling and data classification and data movement and all that. So what you were just describing, James, makes it sound as though
Starting point is 00:11:12 not only is your client actively participating in sort of where to put the data and how to get the data there, but also actively participating in sort of categorizing data. Did I hear that wrong? Well, we're assisting that process, basically by relieving the infrastructure of undue load. But you mentioned a bunch of things, and we talked a bit about how an intelligent client with a bit more intelligence than an NFS client can help scaling. But also, of course, what it can do is protect data from corruption on the network. We've got the data from the application. We can make
Starting point is 00:11:49 sure it's arriving at the storage safe, just as you sent it. We can help performance because we're not handing off the problem to an NFS protocol; we're actively engaged in moving data over the network. And in fact, our intelligent client, and in general these intelligent clients, can have a virtual networking layer to optimize the data movement through the network as well. So the fun thing is, by having this intelligent client, we're not just giving you fast storage. We're helping you make the most of the network layer, because we're really using RDMA or similarly fast protocols across that network. We're helping you use your compute and apply labeling and compute-intensive processing, because
Starting point is 00:12:31 our intelligent client is offloading the work, the network load, from the compute system. So basically, not only do we help storage with our intelligent client, we really help the whole infrastructure. And in fact, the biggest benefit for customers is that these intelligent clients make the applications go faster, which is something people often don't really measure. They'll think about a storage system and think, well, I want this storage system because it says it's going to do 10 gigabytes a second or a million IOPS or something like this. And of course, as we all know, the critical thing is how much faster is your application going to go? And often the application isn't actually held up by the backend storage system itself. The storage system is wonderfully powerful, but it's everything in between the
Starting point is 00:13:19 storage and the application that is not allowing that potential to get through. So by spreading this storage loveliness right to the application, we can track the data, we can accelerate the data, all the way into the application. So where we really think the intelligent clients are important is accelerating workflows. And we really mean that, because we're interfacing directly with the application. We're not handing it off to NFS and handing it off to a network protocol; we're interfacing directly, so we can push the data right into where it's needed. And with AI, like we're talking about here,
Starting point is 00:13:54 the optimizations we've put in to cope with TensorFlow, to cope with PyTorch, to cope with the sorts of behaviors they have, we can put in those fixes, those improvements, not just at the storage layer, but at the network layer and the client layer as well. So we can kind of fix the whole data path end-to-end, into the application and out again. So there are a couple of areas in there I want to double-click on. But first of all, I love the concept of, how should I put that? Feeding dumb data to intelligent processes, right? Because most of the intelligent applications and processes are held back by,
Starting point is 00:14:35 as I said, not the data being dumb. Well, data is data; there's no smart or dumb data. But, you know, it's how to make meaning out of it and get the right data in place, right? So that's the problem. So did I hear you say that, you know, regardless of the framework, whether it's PyTorch, TensorFlow, doesn't matter what it is, on the fly you'd be able to figure out what data is needed when, and then you're able
Starting point is 00:14:57 to assemble the data on the fly using AI and then provide it? Yeah, so you open up a couple more avenues to discuss here. One is that a data platform in general really needs to be flexible. You know, these companies are investing a lot of money in the data itself, in the data scientists, in the AI processes. And the last thing they want is to suddenly find a roadblock in their storage system. So flexibility, protocol support, is important. And, you know, we put in optimizations for individual AI frameworks. So you're right: TensorFlow, PyTorch, we run these in our labs and make sure we can cope with the sorts of IO they generate. So just as an example, some of these applications like to use mmap as their primary POSIX call. That can be particularly troublesome
Starting point is 00:15:43 to some storage infrastructures. So we've optimized the mmap call; even though it sounds like a minute detail, it can have a huge impact on the performance of these AI frameworks. And then you mentioned finding the right data. Typically, the user decides the data set they're going to train on. So just like in autonomous driving, or in life science or in finance, you basically want to choose the data set so your model is holistically trained across the correct span of elements that you want to train upon. They already know that, so we don't have to help with that problem. But what we can do is look at this very large, scalable storage system in the background: with kind of intelligent algorithms, you can find the data that tends to be hot, you can find some
Starting point is 00:16:32 tendencies, and optimize that data into the right place for quick access. So that means, you know, you might have flash, you might have HDDs, and of course, when we're talking about tens or hundreds of petabytes, HDD is still, you know, king when it comes to price per terabyte. But we can sit there in the background, juggle this data around, and optimize the hot data into flash at very, very large scales, the sort of thing that's easy to do in a tiny enterprise system, but very, very difficult to do when the system contains massive amounts of unstructured data. So, yeah, we do all that stuff. So, A, we do have to have flexibility and support SMB, NFS, and S3, as well as our intelligent client. And then in the background, we also do kind of clever
Starting point is 00:17:17 things to try and optimize the data placement, so it's ready for some reads or whatever.
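A toy sketch of the background tiering idea James just described: watch access patterns, promote hot data to flash, and leave the bulk on HDD. The threshold and tier names are invented for illustration; real systems use far more sophisticated policies.

```python
from collections import Counter

PROMOTE_AFTER = 3   # invented threshold: promote after three reads
access_counts = Counter()
tier = {}           # path -> "flash" or "hdd"

def record_read(path):
    # Count accesses in the background; promote hot files to flash and
    # leave cold bulk data on cheaper HDD capacity.
    access_counts[path] += 1
    if access_counts[path] >= PROMOTE_AFTER:
        tier[path] = "flash"   # hot: serve from NVMe
    else:
        tier.setdefault(path, "hdd")

for _ in range(4):
    record_read("/data/train/hot-shard.bin")
record_read("/data/raw/drive-0001.bag")
print(tier)  # hot shard promoted to flash; raw dump stays on HDD
```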
Starting point is 00:18:01 So then let me ask you a follow-up question on that. The problem domain that you described, that's actually a major problem with most of these enterprises, particularly in the model creation phase, right? Because they have this probably unlabeled, undocumented, unstructured dark data, tons of it, spread all over the place, because not all of them necessarily have centralized storage. It's a massive issue, particularly when it comes to HPC, because they need to assemble all of it for things like medical imaging, autonomous driving, or a combination thereof. But isn't that the same problem that some of the data lakes are trying to solve? How are you unique and what is your differentiation? So the intelligent client is part of that. And the other part is we're POSIX. So what we've found with data lakes, which for the past 10 years has traditionally meant big data, which has also implied Hadoop: the challenge there is that it's quite a particular storage infrastructure. It was built for batch workloads. And, you know, it's not POSIX; it's its own special thing. And what we're finding is, well, firstly, AI frameworks go massively fastest when using POSIX. People think of it as a complex protocol, but actually it's super fast.
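To make the earlier mmap point concrete: many AI frameworks memory-map files rather than issuing explicit reads, which turns file access into page-granular, kernel-driven I/O; that access pattern is what the storage side has to handle well. A minimal, self-contained sketch:

```python
import mmap
import os
import tempfile

# Write a small sample file, then access it the way some frameworks do:
# via mmap, so reads become memory accesses served by the kernel's pager.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096 + b"payload")

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    # Random access with no read()/seek(); the storage system sees small,
    # page-sized requests, which is why an unoptimized mmap path can hurt.
    print(m[4096:4103])  # b'payload'
```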
Starting point is 00:18:45 So the fastest way of accessing your data is through POSIX, especially with an intelligent client such as the one that DDN provides. So Hadoop's kind of out the window now. It's Spark, it's TensorFlow, it's PyTorch. Everything's maximally in memory, and they're all using POSIX for the fastest access. But not exclusively, so other protocols are still important. So we can still interact with the legacies of batch workloads because we're POSIX, absolutely standard, and we can export and allow people to access our data through NFS, S3, and SMB. So flexibility is key, flexibility at decent speed, but the best performance is, no question, through POSIX, accelerated by these intelligent clients. And I'll just jump in here as the resident storage nerd.
Starting point is 00:19:33 So when he says POSIX, what he's referring to is a set of IEEE standards that define how software should interact with the operating system, including storage. And so essentially, when you hear that a system is POSIX compliant, what they're saying is that the system uses standard accesses for various components, including storage. And so it's not some weird proprietary thing. They're engaging with standard operating systems like Unix and Windows in a standard way. Yes, thanks. So the stuff on your laptop is a POSIX file system.
Starting point is 00:20:14 The stuff in our global parallel file systems is a POSIX file system, the same standards, which basically mean the data is correct when you read it and correct when you write it, and there's a certain format; that's all built in. So let me pull you back, you guys are going too deep for me, you know, with all the technical details. Let me pull you back a little bit, level up, and ask you a question about that. Look, at the end of the day, the major problem AI has is, you know, a data problem, whether it's data collection, data storage, data labeling, providing the right data, all kinds of issues.
Starting point is 00:20:53 But so if they were to use a solution similar to yours, I mean, there are a ton of areas that you touched on. You almost seem to cover every problematic angle of AI, the data problem of AI, that is. How does it help companies? If I'm creating a model and it used to take three days, will I do it in half an hour? What's my advantage? Why do I care? Why do I use you? Yeah, very, very good question. So the first one, of course, is time to market. So when companies have invested so much,
Starting point is 00:21:25 they want to build a system that's going to scale, and start quickly and then scale pretty easily. The starting quickly thing is resolved by companies such as ours who work very closely with the vendors who are creating the GPU infrastructure. So we've been working closely with NVIDIA, very tight integrations. We've got multiple reference architecture publications. Our systems are plugged into NVIDIA's largest supercomputer. So customers can really pick a solution off the shelf, starting small, starting literally this large, and scaling to the largest supercomputers in the world. They can pick that and see the performance they can expect to get from it in a white paper. Not because we've made it all up; we've really tested it in these huge labs. So that's advantage number one. It's time to market with a solution that's going to ultimately scale. Now, you mentioned,
Starting point is 00:22:32 so why else would customers sort of come to DDN, or come to a company like DDN? I guess the other thing is we do specialize just in storage and data management. So that specialization is always important to our customers, because whenever they're deploying, there's so much risk involved around the whole space of developing a new AI strategy, and issues always happen. There are always funny things happening in the network. Applications behave strangely for whatever reason. There are bugs. Having expertise which really knows networks, really knows compute, really knows storage, that's important. And one of the reasons DDN is quite good at this is because we do span the network.
Starting point is 00:23:17 We work in the compute. We work in network. So the problem doesn't stop at our storage. The problem really encompasses the whole environment. So any organization which will partner with these customers, with these big strategies, with high risks, would do well to find a storage partner who's got the expertise, not only just in storage, but in network and compute, because it's all super connected.
Starting point is 00:23:40 And finally, you know, the focus is shifting. These various questionnaires by various analysts basically ask CIOs where the biggest challenges are in AI. And three, four years ago, finding data scientists was number one, number two, and number three. It's really shifting now: as people are starting to implement, they're realizing infrastructure is actually a big issue at scale. It was always there, but now it's really come to the fore. It's security, it's scale, it's performance. And there are two pieces there. One is people are starting to understand that in order to develop an AI strategy, not only do you want to cover everything in your plan, but you want to have overhead.
Starting point is 00:24:31 You want to be able to build a space where your data scientists can really innovate, because it might be those people who actually, you know, find the golden egg from the golden goose and make your company really be amazingly competitive. So you need to have a system that's not only capable but more than capable; you want these data scientists to have a lot of freedom to innovate, and to de-risk the whole environment by having expertise that's used to dealing with these very, very tough data challenges. So all that comes together. And I think, you know, companies are in a difficult position, given the biggest risks and the huge potential rewards. And so they need to think: is traditional storage going to work today and in the future? And what kind of services and surrounding portfolio and people do I need around that solution to help them succeed much, much longer term? Because one thing they've all got in common,
Starting point is 00:25:32 these customers, is they don't quite know what's going to be truly successful on day one. There's always going to be surprises on that route, so you need to keep flexible. I think another thing that a lot of companies are faced with is a lot of conflicting information, and kind of almost FUD in the marketplace, about what you really need in terms of storage and so on to support AI applications. And certainly we've heard people say, oh, AI applications, they need to be all flash, for example. There's no way you can build
Starting point is 00:26:05 AI applications using disk. It has to be all flash in order to support the performance requirements. Or we'll hear somebody say, oh, well, you know, it has to be a distributed solution where the storage lives on the clients instead of in a centralized server. Or it has to be object, or it has to be NFS, or it has to have some proprietary interface. There is a lot of confusion out there, and a lot of companies out there selling basically what they've got on the truck instead of what the true answer is. So, I mean, how do you answer that, considering that essentially you're selling a pre-existing system, not one that was specifically designed for AI? What makes this system better than others? Well, it's a very good question.
Starting point is 00:26:52 And the answer is that almost everything you mentioned, we actually do. So we can provide storage that resides in compute. We provide S3 object stores. We provide HDD and all-flash. So we've got relatively little bias when it comes to this area, and when we're talking to our customers, we've really got a very broad solution portfolio that does everything at scale. So given that, as a starting point, you know, never totally objective, but reasonably objective, we've got a broad range of solutions. What do we say? Well, we do say, of course, flash is best, but economics comes into play. And when our customers have 100 petabytes of storage, there's really, you know, it's going to be
Starting point is 00:27:47 2030 before even the lowest-cost flash kind of gets into its dollars-per-capacity competitiveness against today's HDD; it's still quite a way off, which is why we built these hybrid solutions that really scale. So of course we love to sell flash to customers, all-NVMe is great, but when there's just literally an economic boundary there, then we can at least give them the best HDD for their money. And we can also handle that so they mainly see flash performance but mainly pay for the cost of active capacity, which is of course, you know, the ideal place to be. In terms of objects, you know, we can talk to our customers about the benefits of S3 object protocols; we've been using them for over 15 years. Where it comes into play in AI is really for ingest. You've got these dumb systems out there, maybe they're satellites, maybe they're CCD cameras or whatever. They want
Starting point is 00:28:38 to push things into your network through S3. It's a very handy protocol when your devices are scattered around and they're relatively dumb. So bring stuff in through S3, that's great, but there's no way you can perform with S3. You can't do your inference with S3; the protocol is inherently slow compared to POSIX. So that's the argument. As for the argument about putting all your data inside the compute: it's always got to come out. Unfortunately, it's always got to come out at some point. And so you really just add more and more complexity by mainlining on that route. It's all right to have an element of caching, if you like, in the compute, but mainlining on 'I'm going to put my storage in my compute' is, you know,
Starting point is 00:29:22 an endless disaster, because the data's got to come out at some point. And the manageability of data on storage devices inside computers is not very good compared to managing data in proper storage systems. So we do cover all these elements. We have thought about them a lot while we were architecting our systems for AI, which is really why we've come around to the system we ship today, which we think is the best of all those worlds. Well, thank you so much for that. And unfortunately, we do kind of have to wrap up here, but before we do that, I'd like to move on to the lightning round of bonus questions here at the end, which is always a lot of fun. So a warning to the audience: James has not been given a heads up
Starting point is 00:30:02 about which questions I'm going to ask him, and we'll see what he comes up with here as an answer for some of these. And of course, it's all in fun. So here you go. Let's kick things off. Number one: would you say that machine learning, deep learning, and artificial intelligence are synonymous, or do those terms mean very different things? They mean different things. AI is the big wide picture. Deep learning is the particular sort of machine learning which benefits from large data volumes. All right, next question. DDN has a long history in the video space, storing video and processing video. When will we have ML that's video-focused and that operates the same way that Siri or Alexa works with audio? Well, kind of now, I suppose. A lot of our customers are
Starting point is 00:30:57 doing real-time video inferencing, some really fun ones, actually. So in fact, there was a recent announcement in London, I live in the UK and this has been in London, about a new store which changes the shopping experience. And we've had three customers do this over the past three years. Very interesting companies. So they have the cameras in the shops, in the supermarket. And they don't look at you; they're not looking at your face and recognizing your face. That would be bad. They're looking at what you put in your bag. So they're seeing the cucumber, they're seeing the tin of beans, and they're tracking it all. And when you walk out of the shop, hey, your phone goes ping, and it says, would you like to pay for this
Starting point is 00:31:39 now? And you go, yes, you pay for your cucumber and your tin of beans without having to go through a checkout. So a lovely example of real-time video streaming, which hopefully means that all those people who work in supermarkets can spend their time helping you find the cucumbers and beans rather than stalling you at your checkout experience. So that's one of many. And there are lots of these rather nice video streaming inference examples, often used in film work, when you might be videoing outside and you want to blur out, in real time, some advertising which you didn't want to display in your live stream and broadcast to millions of TV viewers. And again, we've got these inference capabilities which are going to blur out these adverts as you pan your camera around the city center, so you don't accidentally advertise for somebody else. Lots of great examples. In fact, the most fun world is probably the video inference world. I knew that you'd have something to say about that. Again, this is a, you know, you guys are
Starting point is 00:32:36 a big gorilla in that market. So cool. And then following on from that, you mentioned supermarket workers. Are there any jobs, maybe not supermarket workers, but are there any jobs generally that are going to be completely eliminated by AI, jobs that will no longer exist in five years? Five years. Oh, damn it. Are you going to give me 10? So I think the most likely candidates, well, firstly, are some elements of factory work. You know, with Industry 4.0, there are some things people are doing today which kind of really
Starting point is 00:33:15 don't make sense on these manufacturing lines. So that's a big area for improvement and optimization. And the whole COVID thing has only accelerated that. The need not to have people unnecessarily sitting together in factory environments is an important area for improvement for manufacturing. And then ultimately, self-driving cars, they're a big one. It's not really five years, to be honest. But we'll start to see point solutions.
Starting point is 00:33:44 For example, when those lorries come into the huge great logistics park to pick up their parcels, as soon as they come through the gates, then the driver can go hands-off and the truck can drive itself into the right place, park itself in the right place, according to a computer algorithm, according to the logistics system, without the drivers messing things up. So there'll be point solutions in vehicles outside of the commercial, the open road, where we can put AI in charge to improve the process. You almost stole a fourth one of our questions, because we do have a question about autonomous driving that we ask a lot as well, and your answer was definitely in line with what we've heard from others on that question. So in fact, I would say that overall we've heard pretty much consensus on a lot of these, and some differing opinions on some others. But thank you so much for joining us, James. It's been great to talk with you and learn a little bit more about how you and DDN are working in this new world of AI and
Starting point is 00:34:46 ML. Can you let us know where people can connect with you if they'd like to continue this discussion and follow your thoughts? Yeah, so go to DDN.com. Take a look at the blog there. Myself and my colleagues often put out our blogs there. You can email me at jcoomer at ddn.com, J-C-O-O-M-E-R at ddn.com. And if you want to hear from me and three of my colleagues, we'll all be presenting at GTC. That's the NVIDIA conference on AI
Starting point is 00:35:17 starting 12th of April. And we have four presentations going on there. How about you, Andy? What have you been up to lately? Doing a bunch of work in the AI stuff, as you might have seen. Just published an observability report, which is an external AI apps kind of thing
Starting point is 00:35:33 for GigaOm, and doing a lot of work in that space. So you can check out most of my work at thefieldcto.com. As always, you can follow me on Twitter at Andy Thurai or connect with me on LinkedIn, or find most of my work at thefieldcto.com. And as for me, you can find me on Twitter at S Foskett.
Starting point is 00:35:54 You can also find me every week on Wednesdays for the Gestalt IT News Rundown posted to gestaltit.com. And of course, every Tuesday here on Utilizing AI. So thank you for listening to the Utilizing AI podcast. If you enjoyed this discussion, please do subscribe, rate, and review since that helps our visibility. And please do share this show with your friends and share it on social media.
Starting point is 00:36:20 This podcast is brought to you by gestaltit.com, your home for IT coverage across the enterprise, and thefieldcto.com. For show notes and more episodes, go to utilizing-ai.com or find us on Twitter at utilizing underscore AI. Thanks for listening and we'll see you next week.
