Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x18: Taking Machine Learning on the Road with IBM and B-Plus
Episode Date: May 4, 2021

Development of autonomous vehicles is an excellent example of machine learning applied to industrial IoT. In this episode, Alexander Noack of b-plus and Frank Kraemer of IBM Germany join Chris Grundemann and Stephen Foskett to discuss data collection on the road, central processing, and AI model training. Machine learning is part of autonomous vehicle development and is also used in production vehicles. It is also used to filter data and enhance processing, and this is the same concept found in many edge and industrial use cases. Edge computing is relevant beyond AI, and these technologies are complementary, with the edge moving right into vehicles, factories, retail outlets, medical facilities, and more.

Three Questions

When will we see a full self-driving car that can drive anywhere, any time?
Are there any jobs that will be completely eliminated by AI in the next five years?
How small can ML get? Will we have ML-powered household appliances? Toys? Disposable devices?

Guests and Hosts

Frank Kraemer, Systems Architect at IBM Germany. Connect with Frank on LinkedIn or on Twitter @IBM.
Alexander Noack, Managing Director at b-plus. Connect with Alexander on LinkedIn.
Chris Grundemann, Gigaom Analyst and Managing Director at Grundemann Technology Solutions. Connect with Chris at ChrisGrundemann.com or on Twitter at @ChrisGrundemann.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 5/4/2021
Tags: @SFoskett, @ChrisGrundemann, @IBM
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings experts in enterprise infrastructure together to discuss applications
of AI in today's data center and beyond.
Today we're talking about taking machine learning on the road, literally, with B-plus technology and IBM.
First, let's meet our guests.
First up, we've got Alexander Noack from b-plus technologies.
Yeah, hello, Stephen. Thank you very much for the opportunity to join.
My name is Alexander Noack. I'm the managing director of b-plus technologies.
And I worked in this field of developing autonomous cars
and ADAS sensors specifically over the last eight years,
accompanying a lot of tier one and OEM customers.
And I'm happy to join this podcast
and share my experience on that.
Thank you, Alexander.
It's nice to have you.
Also, we have Frank Kramer from IBM.
Thank you for having me.
This is Frank Kramer.
I'm an IBM systems architect and I'm working for IBM in Germany, working with a lot of automotive customers.
And we see a huge trend in using AI in the development of robotic cars and happy to join
the discussion here.
And of course, I'm joined by my co-host, Chris Grundemann.
Yes, I'm Chris Grundemann.
I'm a consultant, content creator, coach, and mentor.
And you can learn more at chrisgrundemann.com.
And I'm Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
You can find me on most social networks, including Twitter, at sfoskett.
Alexander and Frank, Chris and I had a briefing with you about a month ago
where we were really excited to learn about how you're using machine learning in autonomous
vehicles. Those of you who have listened to our podcast know that this is an area of fascination
for us. And particularly with regard to the podcast, we're interested in how the experience of
collecting analytics and telemetry and modeling machine learning in autonomous vehicles applies
to industrial IoT and machine learning and enterprise generally. So before we start,
Alexander, I wonder if you can describe a little bit of what B-Plus is doing in autonomous vehicle development.
Yes, for sure. Thank you very much for that.
Well, B-Plus and B-Plus technology specifically, we are a group of companies working along with the Tier 1 and OEM customers.
And we've been doing that for 20 years now.
So we started pretty early in ADAS, so driver assistance system development. And now we are
working in the field of AD, so autonomous drive development a
lot. The point is that for everything the car recognizes,
no matter which autonomous car it is, or which autonomous system
it is, you need perception sensors. So we have a decent
amount of cameras, radars, and lidars on those cars.
And those are the input sources for everything which happens in an autonomous car at the end of the day.
Usually there's some high-end compute platform inside the car looking at those technologies which are on the road.
A lot of them are based on NVIDIA technology, for instance.
But all these high performance compute platforms
are not really operating without the sensors.
So the point is for developing the driver,
which needs to sort of work in the autonomous car,
you need to learn about what is the perception of the car?
What is the camera seeing?
What is the laser beam seeing from the lidar?
What is the radar looking at?
And specifically in every scenario the car is running in, you have a different kind of information set coming from those sensors.
The point we do in this case is we capture all the raw information coming from these
sensors as being the input source for every compute platform, which then runs the algorithm, which is specifically
then also deployed in the series production car at the end of the day. So our customers aim at
developing this functionality, developing these algorithms, based a lot today on AI-trained
neural networks, for instance, or anything similar to this kind of technology. So the input source, as I said, is the sensors.
So we talk about gigabit streams coming from cameras
with high resolution.
We talk about a hundred megabit streams coming from radars.
We even talk about lidars adding to that
with a number of megabytes per second.
So you end up with a couple of gigabytes per second of capture, which results
at the end of the day in about 60 to 90 terabytes per day on a test drive shift. All this data is of
course being used specifically at the data center side mainly for AI training and sorting that data,
labeling that data, filtering that data is happening on the data center side usually.
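To put those numbers in perspective, here is a minimal back-of-the-envelope sketch in Python. The sensor counts and per-sensor bit rates below are illustrative assumptions, not b-plus specifications; the point is simply that a handful of multi-gigabit camera streams plus radar and lidar adds up to the 60 to 90 terabytes per shift Alexander mentions.

```python
# Back-of-the-envelope estimate of raw capture volume per test shift.
# Sensor counts and per-sensor bit rates are illustrative assumptions only.
SENSORS = {
    # name: (count, megabits per second each)
    "camera (high resolution)": (6, 3000),  # multi-gigabit raw streams
    "radar":                    (5, 100),   # ~100 Mbit/s each
    "lidar":                    (2, 70),    # tens of MB/s -> ~70 Mbit/s
}
SHIFT_HOURS = 8

total_mbit_s = sum(count * rate for count, rate in SENSORS.values())
gbytes_per_s = total_mbit_s / 8 / 1000              # Mbit/s -> GB/s
tbytes_per_shift = gbytes_per_s * 3600 * SHIFT_HOURS / 1000

print(f"aggregate ingest: {gbytes_per_s:.2f} GB/s")
print(f"per {SHIFT_HOURS}-hour shift: {tbytes_per_shift:.1f} TB")
```

With these assumed figures the script prints roughly 2.3 GB/s and about 67 TB per eight-hour shift, squarely in the range described above.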
But what we did is we took neural networks into the car.
So there's no point in capturing, let's say, hours of driving on boring highways without having the right scenarios being captured. So the point is to really trigger recordings based on scenario detections which
are running on neural networks on GPUs inside the test car already, which reduces the number of
data sets being generated and post-processed at the end of the day. And along the way, we accompany
the projects of our customers, we help them in data acquisition, of course,
and we put this data into data centers
at the ingest stations and co-location spaces
where we, and this is the point
where we work together with Frank and his team,
where we hand over to IBM storage solutions,
for instance, or anything similar to that,
which then feeds the AI training on the data center.
So our target is to really help our customers develop their sensors, develop their perception functionality,
and at the end, develop their drive functionality.
Wow, that's amazing. I mean, even just those numbers, right, 60 to 90 terabytes a day in a single vehicle is pretty astounding.
But I think there's a lot that I want to dig into.
But first, I mean, just rolling all the way back to the beginning and maybe a really basic question.
But these systems in the car, I mean, obviously, you know, when we're talking about NVIDIA GPUs and IBM storage, I mean, this is stuff that usually runs in a data center.
How are you powering that in the car?
Is there additional power systems? Or is it
running on the 12-volt native system? Are you doing converters? Or I mean, just really basics,
how is this even turning on in a car? Well, yeah, you're completely right. That's, of course,
one of the challenges we're facing in these kinds of test cars. Those cars are specifically modified, let's say, to really power also those high
performance compute nodes. And having a, let's say, decent performance on the recording side,
which we usually talk about for logging in gigabytes per second, let's say, that's something
you really could power out of the standard electrical network of the car, which is a few hundred watts of consumption.
But having an AI compute node in there, literally our device which we put in there is a dual Xeon platform with multiple GPUs.
So it's something which can also consume up to one kilowatt. And this is something which needs to be decently prepared inside the cars.
And we're working with specialists on that side, of course, to add additional power batteries, buffering, power sequencing mechanisms,
and of course additional generators inside the car, because the standard generator you have there
for powering the 12-volt system really isn't enough to drive that. And this is the specification
we see in those kinds of test cars. So yes, they certainly need to be modified and cannot be run
in a series production car. In the series production car, of course, everything is tweaked to really cover the
performance based on 12 volts and based on a small power footprint. And this is where we see the
technology of our partners like NVIDIA, let's say, moving into less and less power consumption
but ever-rising performance. And we have also seen other players in the market who really work in this
direction, to have a small, let's say, power footprint with a large compute power behind it.
And this seems like it would be particularly relevant in the electric vehicle space because,
of course, electric vehicles by default have a larger power envelope. But that power envelope is generally used to power
the vehicle, not to power onboard systems like you're describing. So even there, you might need
additional generators and power, I believe. But of course, this is only for development.
We're not talking about a kilowatt of machine learning processing in production cars.
The data comes from the development vehicles and then goes
back to central processing. Is that right, Frank? Yes, absolutely right. What we see here is
that there is not only a single test and development car, so all these large automotive
companies have fleets of these cars. And it's very important for these fleets that they are in use all over the world.
So in Asia, in Europe, in the US,
as the traffic conditions and all the data
they have to capture are totally different there.
But all the data has to come together
into a single data lake.
And what we see here is that the data ingesting coming
from the recording from B+, let's say 20 cars around the world, each 50 terabytes a day, that's a hard data problem.
And in the data center, we see that data ingesting and the preparation cycles are very, very time consuming.
Time to data, this is a relevant metric in order for the engineer to start working.
And what we see, of course, is many silos of infrastructure
for various analytics use cases.
So with the data, from an engineering perspective,
we see problems with multiple copies of the same data
without a single source of truth.
It is very, very hard to manage this on a worldwide basis.
And of course, cloud is a helpful thing.
Hybrid cloud is a helpful thing.
But the sheer amount of data and the volume and to find the needle in the haystack, of course, you can imagine that's a big IT problem.
And of course, this is not only happening for the development. We have to keep all these AI models because they are a work in progress.
Traffic signs might change in some place of the world.
So we have to retrain this thing.
So there will be an update in the future for your consumer car, but the development has
to go on and has to go on here.
And yeah, this seems relevant too. So
following the data, we start with sort of these development vehicles that are doing data collection
and processing in the field as part of the early phases of development. Then we pull the data into
the data lake and we're doing processing there. And then the next thing, Frank, that you mentioned
is this concept that we would train and retrain models
and then deploy those models to production vehicles. Is that right? What we see is that about,
let's say, 10 or 15 percent of the data which is recorded on the street will go into the AI
training process. When the AI training is done, and typically inside such an AI training process it's not only one AI model, it's about 15, then comes the reprocessing. And this is to make sure that the AI model is really safe
and is really capable of working in all conditions: you have to reprocess all the data that you
have collected.
And these data libraries, we see 200 petabytes to 500 petabytes.
This is a typical size.
And this is all the data which is in movement daily,
more or less.
And it's a little bit bigger than what
we normally see in databases or data lakes
not using AI technology.
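A quick arithmetic sketch ties these figures together. The fleet size, per-car volume, and retention periods below are assumptions consistent with the numbers mentioned in this conversation, not IBM or b-plus data.

```python
# Rough fleet-scale data growth, using figures consistent with the discussion.
cars = 20                   # assumed worldwide test fleet
tb_per_car_per_day = 50     # recorded per car per test-drive day
training_fraction = 0.10    # ~10-15% of recordings feed AI training

daily_ingest_tb = cars * tb_per_car_per_day
print(f"daily fleet ingest:      {daily_ingest_tb} TB (~{daily_ingest_tb / 1000:.0f} PB)")
print(f"into training per day:   {daily_ingest_tb * training_fraction:.0f} TB")

# A few hundred days of retained recordings lands in the 200-500 PB range.
for days in (200, 365, 500):
    print(f"library after {days:3d} days: {daily_ingest_tb * days / 1000:.0f} PB")
```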
Yeah, absolutely.
It's almost staggering, the size of data there.
And which I think kind of harks back to something, Alexander, you said a little bit earlier, which was, I think you started talking about kind of the smart recording or specific scenes you're looking for.
Can you tell us a little bit more about how the system on the car knows what to record? Because it sounds like if you recorded everything for the whole drive, you would just actually be overwhelmed, right? And so what you're actually doing is targeting specific things to learn the next step for the algorithm.
Is that right?
Yes, that's completely right, Chris. In the past, data was captured based on specific test drives, an eight-hour shift with data being either continuously recorded or even
triggered by a guy sitting next to the test driver,
who has a list of things he needs to look at,
who has a list of scenarios they are looking for.
And we moved that to a more, let's say, sophisticated approach,
and we can deploy algorithms, for instance: give me a crossing with, let's say, three pedestrians
and five cars and red-light scenarios, these kinds of things. We deploy that into the
vehicle, and the vehicle itself can sort of get the data from the cameras which are there and, by this algorithm, which is then also based on GPU processing, of course, finds out there is a scene like that. And then we have a pre-trigger and a post-trigger time. So we keep like 15 to 30 seconds in RAM permanently.
And once the trigger occurs,
of course, I can look like 30 seconds back
and of course also capture the whole scene,
like a minute additionally.
And this gives us the possibility
to really bring test fleets onto the
roads which don't necessarily need a trained co-pilot or a trained driver, because the system
itself can do that. And there are even customers looking at, let's put it like that, taxi drivers
who drive around the streets, and they want to bring them into the concept of getting data stored.
But of course, having a taxi driver on the road is a different story than having a trained test driver or having an engineer.
And in certain stages of the development, every such concept makes sense, of course.
But this is something which happens. So basically,
what we have is the camera, for instance; it streams the data into the recorder,
the recorder permanently keeps that in RAM, not on disk, but also forwards the data to the
high-performance compute node. The high-performance compute node checks the camera picture, and once
it sees a scenario, it can trigger the recorder, and the recorder basically writes it down to the SSDs.
So this is the mechanism behind that and through let's say connected services, we can deploy these algorithms into the cars.
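As a rough illustration of that trigger mechanism, here is a hedged Python sketch of a pre-trigger/post-trigger ring buffer. The buffer lengths, frame rate, and the scene_detector placeholder are assumptions for illustration only; the real in-vehicle recorder and GPU inference pipeline are of course far more involved.

```python
# Sketch of scenario-triggered recording with pre/post-trigger buffering.
# Buffer lengths, frame rate, and trigger logic are illustrative assumptions.
from collections import deque

PRE_TRIGGER_SECONDS = 30    # kept in RAM at all times
POST_TRIGGER_SECONDS = 60   # recorded after a scene is detected
FRAME_RATE = 30             # assumed camera frames per second

ring_buffer = deque(maxlen=PRE_TRIGGER_SECONDS * FRAME_RATE)

def scene_detector(frame) -> bool:
    """Placeholder for the GPU inference that flags scenarios of interest,
    e.g. 'a crossing with three pedestrians, five cars, and a red light'."""
    return frame.get("interesting", False)

def persist(frames):
    """Placeholder for writing the clip to the in-vehicle SSD array."""
    print(f"writing {len(frames)} frames to SSD")

def run(camera_stream):
    camera_stream = iter(camera_stream)
    for frame in camera_stream:
        ring_buffer.append(frame)              # always buffering in RAM, not on disk
        if scene_detector(frame):
            clip = list(ring_buffer)           # the ~30 s before the trigger
            post = POST_TRIGGER_SECONDS * FRAME_RATE
            for _, later in zip(range(post), camera_stream):
                clip.append(later)             # plus ~60 s after the trigger
            persist(clip)
```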
We can put missions into the cars so that the test driver knows exactly what he needs to do or where he needs to drive,
let's say. So this is where the recording, which in the past was typically very, very human-triggered, let's say, is now getting much smarter than it ever could be.
It's fun because for a long time now, we've been saying on this podcast that AI is my co-pilot.
So when we're talking about all sorts of applications of machine learning and AI in industrial settings, we've been trying to say it's not that AI is replacing jobs or getting rid of people.
It's that AI is helping people to do things that they normally couldn't do.
It's helping people to deal with more data and deal with more situations and more triggers and more examples than ever they could before.
And in order to summarize that, we've always said AI is my co-pilot. It's hilarious to think
that for you, AI is literally your co-driver. It's literally doing the things that a co-driver
would have been doing or an engineer would have been doing in the car.
You know, Frank, I think that from an IBM perspective as well,
this is something that you might see as more generally applicable beyond autonomous driving, right?
So that this same kind of concept could be applied to all areas of industrial IoT.
Yes, that's absolutely right. And there's one more aspect I want to add to Alexander's idea here.
What we see is the crossing of two metadata information problems.
And we have the IT metadata, I think, which we all know of.
It's the classical file system metadata: file type, file size, all these things.
But when we come to the AI and to the preselection and to the engineering part of it, you have a different metadata language.
Engineers are interested in speed, in weather conditions, in place, GPS information, temperature, etc.
So we have to cross both metadata information from a technical perspective.
This is where, let's say, from an IBM perspective,
we have fast file systems
and we know how to deal with these kinds of problems.
But we have to help the engineers.
And this is not only true for autonomous driving.
This is true for all the other use cases where we see AI.
It is the domain-specific metadata language,
which is important.
For a medical guy,
he has another domain-specific metadata language.
And we have to match both in order to make AI, let's say,
usable, helpful, and scalable more or less.
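To make that crossing of the two metadata worlds concrete, here is a hedged sketch in Python. The file names, fields, and values are invented for illustration; the point is only that an engineer's domain query (weather, speed, location) has to resolve to concrete files that the storage layer can stage.

```python
# Sketch: joining file-system metadata with domain (engineering) metadata
# so a query expressed in engineering terms resolves to concrete files.
# Catalog structure and field names are assumptions for illustration.
import pandas as pd

fs_metadata = pd.DataFrame([
    {"file": "drive_0001.mdf", "size_gb": 412, "created": "2021-04-12"},
    {"file": "drive_0002.mdf", "size_gb": 388, "created": "2021-04-13"},
])

domain_metadata = pd.DataFrame([
    {"file": "drive_0001.mdf", "weather": "rain",  "avg_speed_kmh": 95, "gps_region": "Bavaria"},
    {"file": "drive_0002.mdf", "weather": "clear", "avg_speed_kmh": 42, "gps_region": "Munich"},
])

catalog = fs_metadata.merge(domain_metadata, on="file")

# Engineer's query in domain terms: rainy highway driving.
selection = catalog[(catalog.weather == "rain") & (catalog.avg_speed_kmh > 80)]
print(selection[["file", "size_gb", "gps_region"]])
```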
And I think the other thing that the autonomous driving
and training of the cars highlights, at least for me:
a lot of folks in the circles I run in are talking about edge, and about moving edge data centers out towards the edge of the network in order to catch streaming data coming off of vehicles into quote-unquote edge data centers
at every cell site or wherever.
Here, we've instead turned the car into the edge data center.
And I wonder if that's true in other fields as well, right?
Are we bringing the edge close to hospitals
and manufacturing plants and power plants
and then these kinds of things,
or are we actually taking the compute and storage
directly into these buildings and these facilities?
It's a good question, but I think we see variations here. And definitely edge computing and edge AI-based computing is a very, very relevant topic. IBM just released this week
IBM Spectrum Fusion, which is an edge data center concept running storage, AI, GPU, and CPU computing all in one,
and it actually fits very, very nicely into the problems you described.
Going typically into data centers very, very close to a specific problem.
And we all know latency is the problem.
Speed of light is constant.
And the sensors will give us more and more data.
So I think it's always a compromise between what the problem is you have to solve, what
the compute power is you need for that, and how your engineering or how your development
team is distributed around the world in order to, let's say, solve the problem in a fashion.
I think we will see much more of these hybrid approaches
in this case.
And I think that was also apparent at
the NVIDIA GTC conference I was part of
over the last couple of weeks.
And we saw a lot of these use cases
where AI is moving, not only in the data center,
but moving out to the edge.
And it's a very, very clear trend.
And it relates to all the industries,
from manufacturing to mining to oil and gas
to automotive to medical, et cetera.
It makes a lot of sense.
And I can see that definitely the hybrid approach being,
I mean, even in the autonomous vehicles,
you're doing the same thing, right?
You have a lot of capability in the vehicle,
but you're still bringing that back
and then offloading it into a central data lake.
So I see that there as well.
In that kind of distributed hybrid model, does Kubernetes and containers play a role in being able to allow that to flex that way?
Yeah, I think Kubernetes and the container technology is the lingua franca in this game. This is, let's say, the way the development people, whether it's an automotive engineer or a doctor, will use their tools and their software in order to work on the data.
That's the universal way of getting to the compute.
And it's even no problem to move containers inside the car. We see this already,
and the idea behind the container,
we all know that it's an easy-to-use, app-like concept,
and this works very well.
And I think we will see much more in the future.
And it's a very good concept also working
in these hybrid scenarios.
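To illustrate that lingua-franca idea, here is a hedged sketch of how a containerized scene-detection workload might be described for Kubernetes and pinned to edge (in-vehicle or near-vehicle) nodes. The image name, node label, and GPU request are assumptions for illustration, not anything IBM or b-plus specified.

```python
# Sketch: building a Kubernetes Deployment manifest for an edge inference
# workload. Image name, labels, and resource figures are assumptions.
import yaml  # PyYAML

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "scene-detector"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "scene-detector"}},
        "template": {
            "metadata": {"labels": {"app": "scene-detector"}},
            "spec": {
                # Pin the pod to nodes labelled as edge / in-vehicle hardware.
                "nodeSelector": {"node-role/edge": "vehicle"},
                "containers": [{
                    "name": "detector",
                    "image": "registry.example.com/scene-detector:latest",
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }],
            },
        },
    },
}

print(yaml.safe_dump(deployment, sort_keys=False))
```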
I think we've even seen situations
where Kubernetes is used not just for the infrastructure
but also for the production.
So treating, for example, you know,
different machines at a factory
as basically Kubernetes instances and using Kubernetes to manage the machines themselves as if they were containers.
And I'm wondering in the future how that might happen. We've also seen communications applications where Kubernetes is used not just to manage deployment of applications,
but also to manage the literal cell sites themselves or satellites themselves as autonomous
items. So it's a really interesting concept and it shows how concepts or ideas from one part of
the world, one part of the industry can come into another part of the
industry and become relevant beyond that. And for me, again, to the topic at the start of this,
that was the thing that I think was most interesting in talking to Alexander and B-plus
about how autonomous vehicle development works. Thinking about not just, of course, the excitement of autonomous vehicles, but also how that concept can be used and applied to greater parts of the industry. So before we wrap up here, I thought I
would just give you a moment, you know, to sort of sum up from your perspective. I'll start with
Alexander. What is the big message? What would you like people who are deploying
AI and machine learning to know? What lesson would you like to share with them
from autonomous vehicle development? Well, the point is, what we definitely look for
in these kinds of development vehicles is high-quality data, because of course, getting into AI training
and training neural networks, you need high-quality data. So you have to be aware of
what you do with the sensor data and what kind of input sources you get into your neural network
training. And this is where we dedicatedly look at raw data, so no compression, nothing at all, in this
concept. And of course, very, very closely time-correlated data capturing. So we have time
synchronization in place which ties every data stream together with nanosecond accuracy,
because there's no point in training a model with a delay between the camera and the
radar of a few milliseconds, because this will void the functionality at the end of the day.
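As a small software-level illustration of why that time correlation matters, here is a hedged Python sketch that pairs camera frames with radar scans by nearest timestamp and drops anything outside a tolerance. The 1 ms tolerance and the (timestamp, payload) layout are assumptions for illustration; as Alexander describes, real systems synchronize at capture time with dedicated hardware rather than fixing it up afterwards.

```python
# Sketch: pairing two sensor streams by timestamp with a sync tolerance.
# Timestamps are in nanoseconds; tolerance and data layout are assumptions.
import bisect

SYNC_TOLERANCE_NS = 1_000_000  # 1 ms: pairs further apart than this are dropped

def align(camera_frames, radar_scans):
    """Both inputs: lists of (timestamp_ns, payload), sorted by timestamp.
    Returns (camera_payload, radar_payload) pairs captured within tolerance."""
    radar_ts = [t for t, _ in radar_scans]
    pairs = []
    for t_cam, cam in camera_frames:
        i = bisect.bisect_left(radar_ts, t_cam)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(radar_ts)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(radar_ts[k] - t_cam))
        if abs(radar_ts[j] - t_cam) <= SYNC_TOLERANCE_NS:
            pairs.append((cam, radar_scans[j][1]))
    return pairs
```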
And this is something you decently have to look for. A really good concept for getting the right
data, getting it in high quality, getting it stored properly, and of course getting it also
handled easily into the IT infrastructure. And this is where we also bring technology into the car in order to ease that
process to really ingest the data at the end of the day. It's a very good point. I think data is
the big thing here. And I think about 80% of the time on the AI front has to be spent on data quality. This is true
for the data recording inside the car, as Alexander said, but I think it's also
true when it comes to the preparation of the AI training. It's not the
algorithm itself, it's not TensorFlow, it's not these things. These are pretty much usable and done, and you don't have to reinvent the wheel here. But the data which is fed into the AI algorithm,
this data is the relevant part, more or less. And here we see huge data lakes,
the right tooling, open source, container-based: that, I think, is the topic of today.
Well, thank you very much.
It's been really interesting to learn about this.
Before we go, we do have a tradition here
where we ask our guests
three unanticipated surprise questions
just to get a feel for how they see
the field of artificial intelligence
developing in the future.
Now, those of you who've listened
to the podcast before probably think that this is a slam dunk question for our guests, since a couple
of these questions are specifically about autonomous driving. So, you know, I'm going to get that one
out of the way right from the start. Let's put Alexander on the spot here with our key questions.
So, Alexander, this is something we've asked numerous guests every week here on
Utilizing AI, and you probably have more insight into this question than anyone we've asked. So,
here you go. When will we see a full self-driving car that can drive anywhere at any time in any
conditions? That's really the million-dollar question Stephen. So yes, there's a lot of
movement in this direction. There's a lot of development going on. And there was a saying
that in 2020, we will see something. There was a saying that in 2025, we will see something.
And I read another article which just claimed that no matter when, autonomous driving will always be five years away.
And there is some truth in that because the complexity of the world to really go everywhere will take several years still from now.
But there will be dedicated use cases.
And this is what we already see.
We see people movers going onto the streets.
We have regulations, legal regulations, which are more and more focusing on allowing that
as well. So there are areas where we will see it within the next couple of years in
full autonomous operation, of course. But the first things happening to passenger cars will be sort of highly automated drive platforms.
So this will help you run the highway without any interaction anymore.
And this is, I think, further down the road, like two or three years time, maybe five years timeframe.
But having a device or a car which can operate in every kind of scenario,
my personal opinion, it's going to be like 2030 plus. Well, thank you very much for that. As I
said, you're probably in a better position to answer that than many of our guests. So I appreciate your
answer to that. Frank, let's throw you a little bit of a curveball here.
Can you think of any jobs that will be completely eliminated
by AI in the next five years?
I was listening to a nice podcast about jet fighter pilots.
And I think we will see very soon
that all these dangerous jobs, very dangerous jobs, and I think being a pilot in the combat
space using jets in order to do dangerous things,
this will no longer be done by people.
OK, very good.
And actually, for the final question,
I'm going to give each of you a chance to answer this one
since we've got two guests and three questions.
So let's go with Alexander first.
How small can ML get?
Will we have machine learning powered household appliances?
How about toys or even disposable machine learning devices?
Well, my personal opinion is that once we see that kind of shrinking of technology and
still having a decent kind of compute power to really work on neural networks, I think
we will most likely not see disposable things running ML, but household devices, certainly,
there will be things coming up on that. There are even, I think, already things like
televisions with artificial intelligence and these kinds of things. So we see that and it will get
even smaller in future, looking at the chip makers, looking at the performance they have,
looking at the mechanics they're using in there.
What do you think, Frank?
Yeah, I think, yes, absolutely.
It's, I think, a question of the price of the sensors,
and I've seen vacuum cleaners
for normal household use that use lidar technology.
So I think that's a relevant sign here.
The cheaper the sensors, the more sense it makes
to use these devices together with AI.
Well, thank you both very much for joining us today.
Where can people connect with you
and follow your thoughts on artificial intelligence and
other enterprise IT topics?
Frank?
You can find me on LinkedIn, and you can find my latest NVIDIA presentation at the NVIDIA
GTC conference.
Look for IBM and NVIDIA.
The number was SS33232.
And it's a nice video, a nice presentation, and it's free. So feel free to check it out, or
send me an email, no problem. Thank you very much. Yeah, there were so many great presentations at
GTC. It's nice to call out the ones that are most interesting. Alexander, how about you? Where can
we connect with you? Well, you can also find me on LinkedIn. And we just recently did a post about our demo car, which really shows that technology. You can find that also on LinkedIn, but we also have a dedicated site for that, which is max.b-plus.com. And you will find a lot of things there, even videos. It's, of course, for free. So look into that. And there is even, let's say, virtual shows. We do that online
and show you the examples if you want to see that, see the demo in life, that's also possible.
Thank you very much. And Chris, have you been up to anything interesting lately?
I think lots of things. ChrisGrundemann.com is the hub for many of those things, or I'd love
to have conversations on LinkedIn as well. And you can follow me on Twitter at @ChrisGrundemann.
Thanks a lot, Chris. And as for me, I do want to call out in relation to GTC, we did do quite a lot of coverage at gestaltit.com of the GTC announcements, including an editorial that I'm particularly proud of, an editorial video, where I talk about the NVIDIA Grace platform and how most of the people in enterprise IT have that platform totally wrong. So if you just Google
Foskett NVIDIA Grace, you'll probably find it, or you can go to gestaltit.com and click videos.
Checksum is the title of our editorial video series. Of course, we're also planning our AI
field day event coming up. If
you go to techfieldday.com and click on AI, you will see a list of companies that are going to
be presenting at our upcoming AI Field Day, as well as some of the delegates that are signing
up to join us around the table virtually for that event, which is coming here next month in May. So
please do check that out as well. I think you'll see some familiar faces there
in that list. And of course, if you're listening and you'd like to be involved in that, just reach
out to me at @SFoskett on Twitter. I would love to hear from you. Thank you for listening to the
Utilizing AI podcast. If you enjoyed this discussion, please do subscribe, rate, and review
the show in iTunes. We've noticed that that really does help, even though everybody says it, it really does help.
So please do that.
Also, please do share this show with your friends
and connect them with the Gestalt IT
and Utilizing AI community.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
go to utilizing-ai.com or find us on Twitter at utilizing underscore AI.
Thanks, and we'll see you next Tuesday.