Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x18: Taking Machine Learning on the Road with IBM and B-Plus
Episode Date: May 4, 2021

Development of autonomous vehicles is an excellent example of machine learning applied to industrial IoT. In this episode, Alexander Noack of b-plus and Frank Kraemer of IBM Germany join Chris Grundemann and Stephen Foskett to discuss data collection on the road, central processing, and AI model training. Machine learning is part of autonomous vehicle development and is also used in production vehicles. It is also used to filter data and enhance processing, and this is the same concept found in many edge and industrial use cases. Edge computing is relevant beyond AI, and these technologies are complementary, with the edge moving right into vehicles, factories, retail outlets, medical facilities, and more.

Three Questions

When will we see a full self-driving car that can drive anywhere, any time?
Are there any jobs that will be completely eliminated by AI in the next five years?
How small can ML get? Will we have ML-powered household appliances? Toys? Disposable devices?

Guests and Hosts

Frank Kraemer, Systems Architect at IBM Germany. Connect with Frank on LinkedIn or on Twitter @IBM.
Alexander Noack, Managing Director at b-plus. Connect with Alexander on LinkedIn.
Chris Grundemann, Gigaom Analyst and Managing Director at Grundemann Technology Solutions. Connect with Chris at ChrisGrundemann.com or on Twitter at @ChrisGrundemann.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 5/4/2021
Tags: @SFoskett, @ChrisGrundemann, @IBM
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings experts in enterprise infrastructure together to discuss applications
of AI in today's data center and beyond.
Today we're talking about taking machine learning on the road, literally, with B-plus technology and IBM.
First, let's meet our guests.
First up, we've got Alexander Noack from b-plus technologies.
Yeah, hello, Stephen. Thank you very much for the opportunity to join.
My name is Alexander Noack. I'm the managing director of b-plus technologies.
And I worked in this field of developing autonomous cars
and ADAS sensors specifically over the last eight years,
accompanying a lot of tier one and OEM customers.
And I'm happy to join this podcast
and share my experience on that.
Thank you, Alexander.
It's nice to have you.
Also, we have Frank Kramer from IBM.
Thank you for having me.
This is Frank Kramer.
I'm an IBM systems architect and I'm working for IBM in Germany, working with a lot of automotive customers.
And we see a huge trend in using AI in the development of robotic cars and happy to join
the discussion here.
And of course, I'm joined by my co-host, Chris Grundemann.
Yes, I'm Chris Grundemann.
I'm a consultant, content creator, coach, and mentor.
And you can learn more at chrisgrundemann.com.
And I'm Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
You can find me on most social networks, including Twitter, at sfoskett.
Alexander and Frank, Chris and I had a briefing with you about a month ago
where we were really excited to learn about how you're using machine learning in autonomous
vehicles. Those of you who have listened to our podcast know that this is an area of fascination
for us. And particularly with regard to the podcast, we're interested in how the experience of
collecting analytics and telemetry and modeling machine learning in autonomous vehicles applies
to industrial IoT and machine learning and enterprise generally. So before we start,
Alexander, I wonder if you can describe a little bit of what B-Plus is doing in autonomous vehicle development.
Yes, for sure. Thank you very much for that.
Well, B-Plus and B-Plus technology specifically, we are a group of companies working along with the Tier 1 and OEM customers.
And we've been doing that for 20 years now.
So we started pretty early in ADAS, so driver assistance system development. And now we are
working in the field of AD, so autonomous drive development a
lot. The point is that for everything the car recognizes,
no matter which autonomous car it is, or which autonomous system
it is, you need perception sensors. So we have a decent
amount of cameras, radars, and lidars on those cars.
And those are the input sources for everything which happens in an autonomous car at the end of the day.
Usually there's some high-end compute platform inside the car looking at those technologies which are on the road.
A lot of them are based on NVIDIA technology, for instance.
But all these high performance compute platforms
are not really operating without the sensors.
So the point is for developing the driver,
which needs to sort of work in the autonomous car,
you need to learn about what is the perception of the car?
What is the camera seeing?
What is the laser beam seeing from the lidar?
What is the radar looking at?
And specifically in every scenario the car is running in, you have a different kind of information set coming from those sensors.
The point we do in this case is we capture all the raw information coming from these
sensors as being the input source for every compute platform, which then runs the algorithm, which is specifically
then also deployed in the series production car at the end of the day. So our customers aim at
developing this functionality, developing these algorithms, based a lot today on AI-trained
neural networks, for instance, or anything similar to this kind of technology. So the input source, as I said, is the sensors.
So we talk about gigabit streams coming from cameras
with high resolution.
We talk about a hundred megabit streams coming from radars.
We even talk about lidars adding to that
with a number of megabytes per second.
So you end up with a couple of gigabytes per second of capture, which results
at the end of the day in about 60 to 90 terabytes per day on a test drive shift. All this data is of
course being used specifically at the data center side mainly for AI training and sorting that data,
labeling that data, filtering that data is happening on the data center side usually.
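To put those numbers in perspective, here is a minimal back-of-the-envelope sketch in Python. The sensor counts and per-sensor bit rates below are illustrative assumptions, not b-plus specifications; the point is simply that a handful of multi-gigabit camera streams plus radar and lidar adds up to the 60 to 90 terabytes per shift Alexander mentions.

```python
# Back-of-the-envelope estimate of raw capture volume per test shift.
# Sensor counts and per-sensor bit rates are illustrative assumptions only.
SENSORS = {
    # name: (count, megabits per second each)
    "camera (high resolution)": (6, 3000),  # multi-gigabit raw streams
    "radar":                    (5, 100),   # ~100 Mbit/s each
    "lidar":                    (2, 70),    # tens of MB/s -> ~70 Mbit/s
}
SHIFT_HOURS = 8

total_mbit_s = sum(count * rate for count, rate in SENSORS.values())
gbytes_per_s = total_mbit_s / 8 / 1000              # Mbit/s -> GB/s
tbytes_per_shift = gbytes_per_s * 3600 * SHIFT_HOURS / 1000

print(f"aggregate ingest: {gbytes_per_s:.2f} GB/s")
print(f"per {SHIFT_HOURS}-hour shift: {tbytes_per_shift:.1f} TB")
```

With these assumed figures the script prints roughly 2.3 GB/s and about 67 TB per eight-hour shift, squarely in the range described above.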
But what we did is we took neural networks into the car.
So there's no point in capturing, let's say, hours of driving on boring highways without having the right scenarios being captured. So the point is to really trigger recordings based on scenario detections which
are running on neural networks on GPUs inside the test car already, which reduces the number of
data sets being generated and post-processed at the end of the day. And along the way, we accompany
the projects of our customers, we help them in data acquisition, of course,
and we put this data into data centers
at the ingest stations and co-location spaces
where we, and this is the point
where we work together with Frank and his team,
where we hand over to IBM storage solutions,
for instance, or anything similar to that,
which then feeds the AI training on the data center.
So our target is to really help our customers develop their sensors, develop their perception functionality,
and at the end, develop their drive functionality.
Wow, that's amazing. I mean, even just those numbers, right, 60 to 90 terabytes a day in a single vehicle is pretty astounding.
But I think there's a lot that I want to dig into.
But first, I mean, just rolling all the way back to the beginning and maybe a really basic question.
But these systems in the car, I mean, obviously, you know, when we're talking about NVIDIA GPUs and IBM storage, I mean, this is stuff that usually runs in a data center.
How are you powering that in the car?
Is there additional power systems? Or is it
running on the 12-volt native system? Are you doing converters? Or I mean, just really basics,
how is this even turning on in a car? Well, yeah, you're completely right. That's, of course,
one of the challenges we're facing in these kinds of test cars. Those cars are specifically modified, let's say, to really power also those high
performance compute nodes. And having a, let's say, decent performance on the recording side,
which we usually talk about for logging in gigabytes per second, let's say, that's something
you really could power out of the standard electrical network of the car, which is a few hundred watts of consumption.
But having an AI compute node in there, literally our device which we put in there is a dual Xeon platform with multiple GPUs.
So it's something which can also consume up to one kilowatt. And this is something which needs to be decently prepared inside the cars.
And we're working with specialists on that side, of course, to add additional power batteries, buffering, power sequencing mechanisms,
and of course additional generators inside the car, because the standard generator you have there
for powering the 12-volt system really isn't enough to drive that. And this is the specification
we see in those kinds of test cars. So yes, they certainly need to be modified and cannot be run
in a series production car. In the series production car, of course, everything is tweaked to really cover the
performance based on 12 volts and based on a small power footprint. And this is where we see the
technology of our partners like NVIDIA, let's say, moving into less and less power consumption
but ever-rising performance. And we have also seen other players in the market who really work in this
direction, to have a small, let's say, power footprint with a large compute power behind it.
And this seems like it would be particularly relevant in the electric vehicle space because,
of course, electric vehicles by default have a larger power envelope. But that power envelope is generally used to power
the vehicle, not to power onboard systems like you're describing. So even there, you might need
additional generators and power, I believe. But of course, this is only for development.
We're not talking about a kilowatt of machine learning processing in production cars.
The data comes from the development vehicles and then goes
back to central processing. Is that right, Frank? Yes, absolutely right. What we see here is
that there is not only a single test and development car, so all these large automotive
companies have fleets of these cars. And it's very important for these fleets that they are in use all over the world.
So in Asia, in Europe, in the US,
as the traffic conditions and all the data
they have to capture are totally different there.
But all the data has to come together
into a single data lake.
And what we see here is that the data ingesting coming
from the recording from B+, let's say 20 cars around the world, each 50 terabytes a day, that's a hard data problem.
And in the data center, we see that data ingesting and the preparation cycles are very, very time consuming.
Time to data, this is a relevant metric in order for the engineer to start working.
And what we see, of course, is many silos of infrastructure
for various analytics use cases.
So with the data, from an engineering perspective,
we see problems with multiple copies of the same data
without a single source of truth.
It is very, very hard to manage this on a worldwide basis.
And of course, cloud is a helpful thing.
Hybrid cloud is a helpful thing.
But the sheer amount of data and the volume and to find the needle in the haystack, of course, you can imagine that's a big IT problem.
And of course, this is not only happening for the development. We have to keep all these AI models because they are a work in progress.
Traffic signs might change in some place of the world.
So we have to retrain this thing.
So there will be an update in the future for your consumer car, but the development has
to go on and has to go on here.
And yeah, this seems relevant too. So
following the data, we start with sort of these development vehicles that are doing data collection
and processing in the field as part of the early phases of development. Then we pull the data into
the data lake and we're doing processing there. And then the next thing, Frank, that you mentioned
is this concept that we would train and retrain models
and then deploy those models to production vehicles. Is that right? What we see is that about,
let's say, 10 or 15 percent of the data which is recorded on the street will go into the AI
training process. When the AI training is done, and typically inside such an AI training process it's not only one AI model, it's about 15, then comes the reprocessing. And this is to make sure that the AI model is really safe
and is really capable of working in all conditions: you have to reprocess all the data that you
have collected.
And these data libraries, we see 200 petabytes to 500 petabytes.
This is a typical size.
And this is all the data which is in movement daily,
more or less.
And it's a little bit bigger than what
we normally see in databases or data lakes
not using AI technology.
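A quick arithmetic sketch ties these figures together. The fleet size, per-car volume, and retention periods below are assumptions consistent with the numbers mentioned in this conversation, not IBM or b-plus data.

```python
# Rough fleet-scale data growth, using figures consistent with the discussion.
cars = 20                   # assumed worldwide test fleet
tb_per_car_per_day = 50     # recorded per car per test-drive day
training_fraction = 0.10    # ~10-15% of recordings feed AI training

daily_ingest_tb = cars * tb_per_car_per_day
print(f"daily fleet ingest:      {daily_ingest_tb} TB (~{daily_ingest_tb / 1000:.0f} PB)")
print(f"into training per day:   {daily_ingest_tb * training_fraction:.0f} TB")

# A few hundred days of retained recordings lands in the 200-500 PB range.
for days in (200, 365, 500):
    print(f"library after {days:3d} days: {daily_ingest_tb * days / 1000:.0f} PB")
```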
Yeah, absolutely.
It's almost staggering, the size of data there.
And which I think kind of harks back to something, Alexander, you said a little bit earlier, which was, I think you started talking about kind of the smart recording or specific scenes you're looking for.
Can you tell us a little bit more about how the system on the car knows what to record? Because it sounds like if you recorded everything for the whole drive, you would just actually be overwhelmed, right? And so what you're actually doing is targeting specific things to learn the next step for the algorithm.
Is that right?
Yes, that's completely right, Chris. In the past, data was captured based on specific test drives, an eight-hour shift with data being either continuously recorded or even
triggered by a guy sitting next to the test driver,
who has a list of things he needs to look at,
who has a list of scenarios they are looking for.
And we moved that to a more, let's say, sophisticated approach,
and we can deploy algorithms, for instance: give me a crossing with, let's say, three pedestrians
and five cars and red-light scenarios, these kinds of things. We deploy that into the
vehicle, and the vehicle itself can sort of get the data from the cameras which are there and, by this algorithm, which is then also based on GPU processing, of course, finds out there is a scene like that. And then we have a pre-trigger and a post-trigger time. So we keep like 15 to 30 seconds in RAM permanently.
And once the trigger occurs,
of course, I can look like 30 seconds back
and of course also capture the whole scene,
like a minute additionally.
And this gives us the possibility
to really bring test fleets onto the
roads which don't necessarily need a trained co-pilot or a trained driver, because the system
itself can do that. And there are even customers looking at, let's put it like that, taxi drivers
who drive around the streets, and they want to bring them into the concept of getting data stored.
But of course, having a taxi driver on the road is a different story than having a trained test driver or having an engineer.
And in certain stages of the development, every such concept makes sense, of course.
But this is something which happens. So basically,
what we have is the camera, for instance; it streams the data into the recorder,
the recorder permanently keeps that in RAM, not on disk, but also forwards the data to the
high-performance compute node. The high-performance compute node checks the camera picture, and once
it sees a scenario, it can trigger the recorder, and the recorder basically writes it down to the SSDs.
So this is the mechanism behind that and through let's say connected services, we can deploy these algorithms into the cars.
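As a rough illustration of that trigger mechanism, here is a hedged Python sketch of a pre-trigger/post-trigger ring buffer. The buffer lengths, frame rate, and the scene_detector placeholder are assumptions for illustration only; the real in-vehicle recorder and GPU inference pipeline are of course far more involved.

```python
# Sketch of scenario-triggered recording with pre/post-trigger buffering.
# Buffer lengths, frame rate, and trigger logic are illustrative assumptions.
from collections import deque

PRE_TRIGGER_SECONDS = 30    # kept in RAM at all times
POST_TRIGGER_SECONDS = 60   # recorded after a scene is detected
FRAME_RATE = 30             # assumed camera frames per second

ring_buffer = deque(maxlen=PRE_TRIGGER_SECONDS * FRAME_RATE)

def scene_detector(frame) -> bool:
    """Placeholder for the GPU inference that flags scenarios of interest,
    e.g. 'a crossing with three pedestrians, five cars, and a red light'."""
    return frame.get("interesting", False)

def persist(frames):
    """Placeholder for writing the clip to the in-vehicle SSD array."""
    print(f"writing {len(frames)} frames to SSD")

def run(camera_stream):
    camera_stream = iter(camera_stream)
    for frame in camera_stream:
        ring_buffer.append(frame)              # always buffering in RAM, not on disk
        if scene_detector(frame):
            clip = list(ring_buffer)           # the ~30 s before the trigger
            post = POST_TRIGGER_SECONDS * FRAME_RATE
            for _, later in zip(range(post), camera_stream):
                clip.append(later)             # plus ~60 s after the trigger
            persist(clip)
```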
We can put missions into the cars so that the test driver knows exactly what he needs to do or where he needs to drive,
let's say. So this is where the recording, which in the past was typically very, very human-triggered, let's say, is now getting much smarter than it ever could be.
It's fun because for a long time now, we've been saying on this podcast that AI is my co-pilot.
So when we're talking about all sorts of applications of machine learning and AI in industrial settings, we've been trying to say it's not that AI is replacing jobs or getting rid of people.
It's that AI is helping people to do things that they normally couldn't do.
It's helping people to deal with more data and deal with more situations and more triggers and more examples than ever they could before.
And in order to summarize that, we've always said AI is my co-pilot. It's hilarious to think
that for you, AI is literally your co-driver. It's literally doing the things that a co-driver
would have been doing or an engineer would have been doing in the car.
You know, Frank, I think that from an IBM perspective as well,
this is something that you might see as more generally applicable beyond autonomous driving, right?
So that this same kind of concept could be applied to all areas of industrial IoT.
Yes, that's absolutely right. And there's one more aspect I want to add to Alexander's idea here.
What we see is the crossing of two metadata information problems.
And we have the IT metadata, I think, which we all know of.
It's the classical file system metadata: file type, file size, all these things.
But when we come to the AI and to the preselection and to the engineering part of it, you have a different metadata language.
Engineers are interested in speed, in weather conditions, in place, GPS information, temperature, etc.
So we have to cross both metadata information from a technical perspective.
This is where, let's say, from an IBM perspective,
we have fast file systems
and we know how to deal with these kinds of problems.
But we have to help the engineers.
And this is not only true for autonomous driving.
This is true for all the other use cases where we see AI.
It is the domain-specific metadata language,
which is important.
For a medical guy,
he has another domain-specific metadata language.
And we have to match both in order to make AI, let's say,
usable, helpful, and scalable more or less.
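To make that crossing of the two metadata worlds concrete, here is a hedged sketch in Python. The file names, fields, and values are invented for illustration; the point is only that an engineer's domain query (weather, speed, location) has to resolve to concrete files that the storage layer can stage.

```python
# Sketch: joining file-system metadata with domain (engineering) metadata
# so a query expressed in engineering terms resolves to concrete files.
# Catalog structure and field names are assumptions for illustration.
import pandas as pd

fs_metadata = pd.DataFrame([
    {"file": "drive_0001.mdf", "size_gb": 412, "created": "2021-04-12"},
    {"file": "drive_0002.mdf", "size_gb": 388, "created": "2021-04-13"},
])

domain_metadata = pd.DataFrame([
    {"file": "drive_0001.mdf", "weather": "rain",  "avg_speed_kmh": 95, "gps_region": "Bavaria"},
    {"file": "drive_0002.mdf", "weather": "clear", "avg_speed_kmh": 42, "gps_region": "Munich"},
])

catalog = fs_metadata.merge(domain_metadata, on="file")

# Engineer's query in domain terms: rainy highway driving.
selection = catalog[(catalog.weather == "rain") & (catalog.avg_speed_kmh > 80)]
print(selection[["file", "size_gb", "gps_region"]])
```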
And I think the other thing that the autonomous driving
and training of the cars highlights, at least for me:
a lot of folks in the circles I run in are talking about edge, and about moving edge data centers out towards the edge of the network in order to catch streaming data coming off of vehicles into quote-unquote edge data centers
at every cell site or wherever.
Here, we've instead turned the car into the edge data center.
And I wonder if that's true in other fields as well, right?
Are we bringing the edge close to hospitals
and manufacturing plants and power plants
and then these kinds of things,
or are we actually taking the compute and storage
directly into these buildings and these facilities?
It's a good question, but I think we see variations here. And definitely edge computing and edge AI-based computing is a very, very relevant topic. IBM just released this week
IBM Spectrum Fusion, which is an edge data center concept running storage, AI, GPU, and CPU computing all in one,
and it actually fits very, very nicely into the problems you described.
Going typically into data centers very, very close to a specific problem.
And we all know latency is the problem.
Speed of light is constant.
And the sensors will give us more and more data.
So I think it's always a compromise between what the problem is you have to solve, what
the compute power is you need for that, and how your engineering or how your development
team is distributed around the world in order to, let's say, solve the problem in a fashion.
I think we will see much more of these hybrid approaches
in this case.
And I think that was also apparent at
the NVIDIA GTC conference I was part of
over the last couple of weeks.
And we saw a lot of these use cases
where AI is moving, not only in the data center,
but moving out to the edge.
And it's a very, very clear trend.
And it relates to all the industries,
from manufacturing to mining to oil and gas
to automotive to medical, et cetera.
It makes a lot of sense.
And I can see that definitely the hybrid approach being,
I mean, even in the autonomous vehicles,
you're doing the same thing, right?
You have a lot of capability in the vehicle,
but you're still bringing that back
and then offloading it into a central data lake.
So I see that there as well.
In that kind of distributed hybrid model, does Kubernetes and containers play a role in being able to allow that to flex that way?
Yeah, I think Kubernetes and the container technology is the lingua franca in this game. This is, let's say, the way the development people, whether it's an automotive engineer or a doctor, will use their tools and their software in order to work on the data.
That's the universal way of getting to the compute.
And it's even no problem to move containers inside the car. We see this already,
and the idea behind the container,
we all know that it's an easy-to-use, app-like concept,
and this works very well.
And I think we will see much more in the future.
And it's a very good concept also working
in these hybrid scenarios.
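To illustrate that lingua-franca idea, here is a hedged sketch of how a containerized scene-detection workload might be described for Kubernetes and pinned to edge (in-vehicle or near-vehicle) nodes. The image name, node label, and GPU request are assumptions for illustration, not anything IBM or b-plus specified.

```python
# Sketch: building a Kubernetes Deployment manifest for an edge inference
# workload. Image name, labels, and resource figures are assumptions.
import yaml  # PyYAML

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "scene-detector"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "scene-detector"}},
        "template": {
            "metadata": {"labels": {"app": "scene-detector"}},
            "spec": {
                # Pin the pod to nodes labelled as edge / in-vehicle hardware.
                "nodeSelector": {"node-role/edge": "vehicle"},
                "containers": [{
                    "name": "detector",
                    "image": "registry.example.com/scene-detector:latest",
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }],
            },
        },
    },
}

print(yaml.safe_dump(deployment, sort_keys=False))
```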
I think we've even seen situations
where Kubernetes is used not just for the infrastructure
but also for the production.
So treating, for example, you know,
different machines at a factory
as basically Kubernetes instances and using Kubernetes to manage the machines themselves as if they were containers.
And I'm wondering in the future how that might happen. We've also seen communications applications where Kubernetes is used not just to manage deployment of applications,
but also to manage the literal cell sites themselves or satellites themselves as autonomous
items. So it's a really interesting concept and it shows how concepts or ideas from one part of
the world, one part of the industry can come into another part of the
industry and become relevant beyond that. And for me, again, to the topic at the start of this,
that was the thing that I think was most interesting in talking to Alexander and B-plus
about how autonomous vehicle development works. Thinking about not just, of course, the excitement of autonomous vehicles, but also how that concept can be used and applied to greater parts of the industry. So before we wrap up here, I thought I
would just give you a moment, you know, to sort of sum up from your perspective. I'll start with
Alexander. What is the big message? What would you like people who are deploying
AI and machine learning to know? What lesson would you like to share with them
from autonomous vehicle development? Well, the point is, what we definitely look for
in these kinds of development vehicles is high-quality data, because of course, getting into AI training
and training neural networks, you need high-quality data. So you have to be aware of
what you do with the sensor data and what kind of input sources you get into your neural network
training. And this is where we dedicatedly look at raw data, so no compression, nothing at all, in this
concept. And of course, very, very closely time-correlated data capturing. So we have time
synchronization in place which ties every data stream together with nanosecond accuracy,
because there's no point in training a model with a delay between the camera and the
radar of a few milliseconds, because this will void the functionality at the end of the day.
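As a small software-level illustration of why that time correlation matters, here is a hedged Python sketch that pairs camera frames with radar scans by nearest timestamp and drops anything outside a tolerance. The 1 ms tolerance and the (timestamp, payload) layout are assumptions for illustration; as Alexander describes, real systems synchronize at capture time with dedicated hardware rather than fixing it up afterwards.

```python
# Sketch: pairing two sensor streams by timestamp with a sync tolerance.
# Timestamps are in nanoseconds; tolerance and data layout are assumptions.
import bisect

SYNC_TOLERANCE_NS = 1_000_000  # 1 ms: pairs further apart than this are dropped

def align(camera_frames, radar_scans):
    """Both inputs: lists of (timestamp_ns, payload), sorted by timestamp.
    Returns (camera_payload, radar_payload) pairs captured within tolerance."""
    radar_ts = [t for t, _ in radar_scans]
    pairs = []
    for t_cam, cam in camera_frames:
        i = bisect.bisect_left(radar_ts, t_cam)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(radar_ts)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(radar_ts[k] - t_cam))
        if abs(radar_ts[j] - t_cam) <= SYNC_TOLERANCE_NS:
            pairs.append((cam, radar_scans[j][1]))
    return pairs
```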
And this is something you decently have to look for. A really good concept for getting the right
data, getting it in high quality, getting it stored properly, and of course getting it also
handled easily into the IT infrastructure. And this is where we also bring technology into the car in order to ease that
process to really ingest the data at the end of the day. It's a very good point. I think data is
the big thing here. And I think about 80% of the time on the AI front has to be spent on data quality. This is true
for the data recording inside the car, as Alexander said, but I think it's also
true when it comes to the preparation of the AI training. It's not the
algorithm itself, it's not TensorFlow, it's not these things. These are pretty much usable and done, and you don't have to reinvent the wheel here. But the data which is fed into the AI algorithm,
this data is the relevant part, more or less. And here we see huge data lakes,
the right tooling, open source, container-based: that, I think, is the topic of today.
Well, thank you very much.
It's been really interesting to learn about this.
Before we go, we do have a tradition here
where we ask our guests
three unanticipated surprise questions
just to get a feel for how they see
the field of artificial intelligence
developing in the future.
Now, those of you who've listened
to the podcast before probably think that this is a slam dunk question for our guests, since a couple
of these questions are specifically about autonomous driving. So, you know, I'm going to get that one
out of the way right from the start. Let's put Alexander on the spot here with our key questions.
So, Alexander, this is something we've asked numerous guests every week here on
Utilizing AI, and you probably have more insight into this question than anyone we've asked. So,
here you go. When will we see a full self-driving car that can drive anywhere at any time in any
conditions? That's really the million-dollar question Stephen. So yes, there's a lot of
movement in this direction. There's a lot of development going on. And there was a saying
that in 2020, we will see something. There was a saying that in 2025, we will see something.
And I read another article which just claimed that no matter when, autonomous driving will always be five years away.
And there is some truth in that because the complexity of the world to really go everywhere will take several years still from now.
But there will be dedicated use cases.
And this is what we already see.
We see people movers going onto the streets.
We have regulations, legal regulations, which are more and more focusing on allowing that
as well. So there are areas where we will see it within the next couple of years in
full autonomous operation, of course. But the first things happening to passenger cars will be sort of highly automated drive platforms.
So this will help you run the highway without any interaction anymore.
And this is, I think, further down the road, like two or three years time, maybe five years timeframe.
But having a device or a car which can operate in every kind of scenario,
my personal opinion, it's going to be like 2030 plus. Well, thank you very much for that. As I
said, you're probably in a better position to answer that than many of our guests. So I appreciate your
answer to that. Frank, let's throw you a little bit of a curveball here.
Can you think of any jobs that will be completely eliminated
by AI in the next five years?
I was listening to a nice podcast about jet fighter pilots.
And I think we will see very soon
that all these dangerous jobs, very dangerous jobs, and I think being a pilot in the combat
space using jets in order to do dangerous things,
this will no longer be done by people.
OK, very good.
And actually, for the final question,
I'm going to give each of you a chance to answer this one
since we've got two guests and three questions.
So let's go with Alexander first.
How small can ML get?
Will we have machine learning powered household appliances?
How about toys or even disposable machine learning devices?
Well, my personal opinion is that once we see that kind of shrinking of technology and
still having a decent kind of compute power to really work on neural networks, I think
we will most likely not see disposable things running ML, but household devices, certainly,
there will be things coming up on that. There are even, I think, already things like
televisions with artificial intelligence and these kinds of things. So we see that and it will get
even smaller in future, looking at the chip makers, looking at the performance they have,
looking at the mechanics they're using in there.
What do you think, Frank?
Yeah, I think, yes, absolutely.
It's, I think, a question of the price of the sensors,
and I've seen vacuum cleaners
for normal household use that use lidar technology.
So I think that's a relevant sign here.
The cheaper the sensors, the more sense it makes
to use these devices together with AI.
Well, thank you both very much for joining us today.
Where can people connect with you
and follow your thoughts on artificial intelligence and
other enterprise IT topics?
Frank?
You can find me on LinkedIn, and you can find my latest NVIDIA presentation at the NVIDIA
GTC conference.
Look for IBM and NVIDIA.
The number was SS33232.
And it's a nice video, a nice presentation, and it's free. So feel free to check it out, or
send me an email, no problem. Thank you very much. Yeah, there were so many great presentations at
GTC. It's nice to call out the ones that are most interesting. Alexander, how about you? Where can
we connect with you? Well, you can also find me on LinkedIn. And we just recently did a post about our demo car, which really shows that technology. You can find that also on LinkedIn, but we also have a dedicated site for that, which is max.b-plus.com. And you will find a lot of things there, even videos. It's, of course, for free. So look into that. And there is even, let's say, virtual shows. We do that online
and show you the examples if you want to see that, see the demo in life, that's also possible.
Thank you very much. And Chris, have you been up to anything interesting lately?
I think lots of things. ChrisGrundemann.com is the hub for many of those things, or I'd love
to have conversations on LinkedIn as well. And you can follow me on Twitter at @ChrisGrundemann.
Thanks a lot, Chris. And as for me, I do want to call out in relation to GTC, we did do quite a lot of coverage at gestaltit.com of the GTC announcements, including an editorial that I'm particularly proud of, an editorial video, where I talk about the NVIDIA Grace platform and how most of the people in enterprise IT have that platform totally wrong. So if you just Google
Foskett NVIDIA Grace, you'll probably find it, or you can go to gestaltit.com and click videos.
Checksum is the title of our editorial video series. Of course, we're also planning our AI
field day event coming up. If
you go to techfieldday.com and click on AI, you will see a list of companies that are going to
be presenting at our upcoming AI Field Day, as well as some of the delegates that are signing
up to join us around the table virtually for that event, which is coming here next month in May. So
please do check that out as well. I think you'll see some familiar faces there
in that list. And of course, if you're listening and you'd like to be involved in that, just reach
out to me at @SFoskett on Twitter. I would love to hear from you. Thank you for listening to the
Utilizing AI podcast. If you enjoyed this discussion, please do subscribe, rate, and review
the show in iTunes. We've noticed that that really does help, even though everybody says it, it really does help.
So please do that.
Also, please do share this show with your friends
and connect them with the Gestalt IT
and Utilizing AI community.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
go to utilizing-ai.com or find us on Twitter at utilizing underscore AI.
Thanks, and we'll see you next Tuesday.