Grey Beards on Systems - 79: GreyBeards talk AI deep learning infrastructure with Frederic Van Haren, CTO & Founder, HighFens, Inc.
Episode Date: January 28, 2019
We've talked with Frederic before (see: Episode #33 on HPC storage) but since then, he has worked for an analyst firm and now he's back on his own again, at HighFens. Given all the interest of late in AI, machine learning and deep learning, we thought it would be a great time to catch up.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to the next episode of Greybeards on Storage podcast,
a show where we get Greybeards storage bloggers to talk with system vendors
and technologists to discuss upcoming products, technologies, and trends affecting the data
center today. This Greybeard on Storage episode was recorded on January 23rd, 2019.
We have with us here today, Frederic Van Haren, CTO of HighFens. Frederic's an old friend who's
been on our show before and is focused on AI consulting and services. So Frederick,
why don't you tell us a little bit about yourself and what you've been up to? Sure. As you mentioned, my name is Frederic Van Haren. In a previous life,
I used to build large HPC clusters for speech recognition. I did that for more than a decade
and wanted to do something else. So, I thought it was a grandiose idea to become a consultant.
Think about HPC, big data, AI; those are typically the areas I play in. So what
have I been doing? Well, mostly I've been doing a lot of NDA work for storage vendors,
both startups and established vendors. And the technologies there were typically
NVMe or object storage. Those are the two technologies
that storage vendors were really looking for.
Well, yeah, what's happening at the extremes?
Because we figured the middle out.
Yeah, apparently.
Okay, so disk is not a discussion point.
Okay.
Okay.
So as well as helping vendors and customers with AI projects, right,
going from early stages to complete transformation projects.
But also I've been doing some analyst work really to get some additional name recognition
because you can't get enough of that, right?
Why do you think we're here?
So we've been hearing for the past year or 18 months a lot from storage vendors about AI and ML applications.
And over and over I get press releases.
Would you like to speak with us about this NVIDIA thing
that we've combined with our storage?
But Ray and I are old,
and so we understand OLTP as a workload,
and we understand VMs as a workload.
How does AI deal with the storage?
How do we keep people running the applications you want to run happy?
Yeah, it's a question, unfortunately, with a very long answer, because AI is really…
Well, we got about an hour.
If you look at the definition of AI, or at least most of the implementations today, it's really using
human reasoning to create a model that predicts future outcomes driven by past events. And
what are past events? Past events really mean data. So you're going to use data and process
data as fast as you can in order to get results. That is really AI from a high level. You know,
since we have an hour, I can describe a little bit, you know, the history of AI. So AI is a term
that almost has been around as long as HPC, right? And in the meantime, AI today means something
completely different than it meant in the 50s and the 80s. So in short, let's call traditional AI whatever happened between 1950 and 1980.
And so what happened in those days is if they had a problem to solve,
let's take the game of chess as an example, right?
So in order to come up with a model or an application that could play chess against a human, they had to extract all the rules and all the knowledge and how to play offense and how to play defense and how to recognize patterns.
All of that information, they had to get it from somewhere.
So in those days, they had to extract that information out of a Garry Kasparov and another grandmaster and come up with rules.
And those rules would be coded into an application.
Mind you, I wrote a checkers program in college and I got the award for the most computer time used in one semester ever.
I'm not sure it's been revoked since then, but they eventually went to a more client-server solution.
But at the time it was mainframe systems and I just blew through it. It was fun.
That's crazy. Yeah, I mean, so if you think about it, it was all about, you know, taking rules, taking whatever a grandmaster
knew, translating that into code, and then you had a model. The problem was that if you wanted the model to learn, you actually had to programmatically implement that all over again.
And as you know, that's kind of a slow process and takes a lot of time and interaction from a programmer.
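To make that concrete, here is a rough, hypothetical sketch of the hand-coded style Frederic is describing; the piece values and the evaluation function are stand-ins, not taken from any actual chess engine of that era:

```python
# Hypothetical hand-coded evaluation: every piece of "knowledge" is a rule
# a programmer extracted from an expert and typed in by hand.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate_position(my_pieces, opponent_pieces):
    """Score a position using fixed, expert-derived rules."""
    score = sum(PIECE_VALUES[p] for p in my_pieces)
    score -= sum(PIECE_VALUES[p] for p in opponent_pieces)
    return score

# Teaching the program anything new means editing the rules and redeploying the code.
print(evaluate_position(["queen", "pawn"], ["rook", "knight"]))  # 10 - 8 = 2
```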
So let's move a little bit forward.
So if you look at machine learning, so machine learning is considered a subset of AI or the traditional AI, if you wish.
And now we're kind of in the 1980s-to-2010 era. And so the way you can think about that is Hadoop
and Spark, those kinds of technologies where the open source community kind of kicked in and said,
look, you know, if you want to process things, you don't have to buy that large, expensive infrastructure.
You can use open source things like Linux, Spark, Hadoop.
And then from a hardware perspective, you also had the benefit of Moore's law, right?
Everything was going faster and faster, as well as the fact that hardware was becoming cheaper.
Yeah, and storage was becoming cheaper as well.
Yeah.
And so if we apply this again to the chess game, right, so the way you can look at it
is that ML applied to chess is where you still implement the rules of the chess game into
your ML system.
You know, from a technology standpoint, we'd call that the features.
That's kind of the right word to use.
And then you would use some data from a bunch of chess games that have been played.
So the combination of those features, slash the rules, together with the data let you build a model that could learn from itself. So every time you use the model, people played against it, and that would be
new data. That new data then can be put in the picture again. You create an updated model,
and there you are. You have something that works. So if I get this right, in the old days,
we would create rules that were, you should move your knight this way. And today we say,
these are the rules of chess. A knight is allowed to move in these ways and watch these 6 million
games and figure out how you should move it. That's right.
And because in those days, you know, one of the things that was a problem for ML was a
lack of what we call good quality data, right?
There was no lack of data, but a lack of good quality data, meaning very
good chess games,
well verified, without errors and all that stuff.
Because once you start heavily relying on data
and you use data as a way to make decisions,
bad data or bad quality data kind of means
that your output also will be affected by them.
So I can't enter the chess games between the nine-year-olds who'll cheat
when the other guy turns his back?
That's right.
Or the cheating will end up in the model, right?
Yeah.
I suppose that could be good and bad, but yeah, okay.
So that's machine learning.
So machine learning is really taking advantage of the data you have.
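To make the "features slash rules plus data" idea concrete, here is a minimal sketch, assuming scikit-learn and entirely made-up chess features; the point is that the programmer still supplies the features, but the data sets the weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-crafted features for each recorded position:
# [material balance, piece mobility, king safety] -- the "rules" we supply.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 3))
# Label: 1 if the side to move went on to win that game, else 0.
y = (X @ np.array([1.5, 0.8, 0.4]) + rng.normal(scale=0.5, size=5000) > 0).astype(int)

# The model learns how much each feature matters from the game data,
# instead of a programmer hard-coding the weighting.
model = LogisticRegression().fit(X, y)
print(model.coef_)  # learned importance of each hand-crafted feature

# New games played against the system become new rows of X and y,
# and refitting updates the model -- no reprogramming of rules.
```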
And then we enter kind of the 2010 area and now,
which is deep learning.
So what's so specific about deep learning?
Well, imagine that if you have enough data
that represents a significant amount of chess games,
instead of having the features slash the rules being dictated to you,
you can just start with data
and let the data figure out how the game is played.
Yeah, I'd say let the model figure out how the game is played based on the data, right?
That's right.
And since I never saw a knight move one square forward, he can't.
Yes.
And because of that, you enter a different level of complexity, right? So now that you need a lot of data, you also have to find a way to analyze that enormous amount of data in a timely fashion. That's where the open source community came to the rescue, where they started delivering open source frameworks that
would do all the analytical pieces for you, right? So the frameworks that
exist today are Caffe and TensorFlow and PyTorch and Keras. You probably have
heard those names flying around. Those are not
data, but those are the frameworks that allow you to process that data.
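As a minimal sketch of what those frameworks give you, assuming TensorFlow/Keras and its bundled MNIST digit images as a stand-in for "lots of labeled data", the network learns its own features from raw pixels instead of being handed rules:

```python
import tensorflow as tf

# Raw pixels in, labels out -- no hand-written features or rules.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),   # the network learns its own features here
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training: the framework handles the math and drives a GPU if one is present.
model.fit(x_train, y_train, epochs=1, batch_size=128)
print(model.evaluate(x_test, y_test))
```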
Now, those frameworks also realize that they should take advantage of the new processing
capabilities that exist today. And the processing capabilities are, you know, the number one used
today really is GPUs, right? So instead of using a CPU, they decided to use GPUs.
Google came up with the TPU, the Tensor Processing Unit, which is their hardware implementation. The software side of that
is TensorFlow. And then FPGAs and ASICs, it's kind of a flashback for me to DSPs and the like. But in reality, the way you can look at that is that this is hardware that allows you to do fast mathematical calculations.
And each one of those is taking shortcuts, mathematical shortcuts, without really impacting the results so much. It seemed like the TPUs had very limited precision, I think, was a key there.
I mean, rather than like 32 bits or 16 bits or 64 or whatever, it was almost on the order
of 8 or 16 bits.
Yeah.
So the goal of the tensor operation was to do a matrix multiply of two 4x4 matrices.
But each of the elements, the numbers, is a 16-bit floating point value,
while the outcome is a 32-bit floating point value, right?
So the message there is not to do 32-bit floating point calculations
out of the gate, but to use 16-bit floating point and end up with 32-bit, to do it faster.
It's all about cutting corners without impacting the outcome.
And so that's really what's happening.
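Here is a small NumPy sketch of that arithmetic idea; it is not how the hardware is programmed, just a way to see that FP16 inputs with FP32 accumulation barely move the answer:

```python
import numpy as np

# Reference data in full FP32 precision
a32 = np.random.rand(4, 4).astype(np.float32)
b32 = np.random.rand(4, 4).astype(np.float32)

# Round the inputs to FP16 (what the tensor unit reads) ...
a16, b16 = a32.astype(np.float16), b32.astype(np.float16)

# ... but accumulate the products in FP32 (what the tensor unit writes out)
c_mixed = a16.astype(np.float32) @ b16.astype(np.float32)

# Compare against the all-FP32 result: the shortcut barely changes the answer
c_full = a32 @ b32
print(np.max(np.abs(c_full - c_mixed)))  # typically on the order of 1e-3
```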
And if you look at a physical piece of hardware,
if you take a 1U server with two sockets, it might have like 40 CPU cores.
If you take a 4U server with eight GPU cards in it, that 4U server can have 40,000 GPU cores.
So much greater parallelism for the simple thing that we're doing.
That's right.
So taking advantage of parallelism. But it also comes with a whole different bag of problems, right? You have to deal with storage, you have to deal with processing capabilities, GPUs, TPUs, FPGAs,
and then also the interconnect, right? If you have fast storage and you have fast GPUs,
you need a network that not only does the fast processing, but also keeps the latency low,
right? So imagine that all those GPU cards have to talk to each other,
you know, kind of an MPI-like workload.
And in order to get, you know,
to keep those things going real time,
you need to have the lowest latency you can have, right?
So for DL, you need innovation in all those areas.
But I must say that the open source community with the frameworks is kind of putting everything upside down, right?
So if we look at the traditional AI, everything was about the code and the rules, meaning that, you know, your source code was the IP.
While today, the open source community is delivering the tools.
So that's not your IP.
Your data is the IP.
Yeah.
It's all about the data.
Yeah.
Okay.
So would it be, I mean, to oversimplify it, this GPU, TPU, FPGA compute core is so fast and so expensive, we want to build the rest of the system around it to make
sure that it's constantly fed? So I got two questions here. So there's, I'll call it a
training phase and there's an operational phase of any AI and it's probably some other phases,
I don't know. But during the training phase, you're taking this data and you're passing it into this framework model
configuration and using it to build the model intelligence, I'll call it. I'm not exactly
certain what I'd call it. So that's happening during training. But once you deploy the model,
let's say it's a, I don't know, self-driving car or something like that. So the model is sitting there probably in the car, I guess, because it needs to have real-time control.
It's taking as input data from the sensors, lidar, cameras, sound, you know, movement,
you know, also the vehicle speed and direction and those sorts of stuff. And then it's somehow
taking all that information and saying, okay,
this is where you want to go in this situation at this moment.
And then the next moment is another set of data comes in, et cetera.
Is that how a sort of thing works?
Yes. So, so indeed.
So AI really consists of two pieces. One is, is building the model, right?
So you have to,
you have to have a base model in order to do anything.
And as you mentioned, it's called training, right?
So let's assume a typical example is if you have 10,000 pictures of something
and you want to use that for AI to recognize if there is a cat or a dog in the picture, then what you would do is you would take
80% of your pictures and you would use them to create a model. And so that's the training phase.
And the outcome of the training phase is a model. And that model is something you can use
in production. The actual term that people typically use is inference.
Inference is the process where you deploy your model and you validate it against new
input, which is people using the system. So in the car, you get that feed, and then there's a feedback loop. Once the model has predicted
whatever came in, there is a feedback loop where you send that data back,
add it to your training, and you can update your model. It's kind of an infinite loop, right?
So as long as you use the system, you can use that data to improve the model.
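As a hypothetical sketch of that lifecycle, with made-up function names that are not any particular framework's API: train once in batch, serve the model in real time, and log every prediction so it can feed the next adaptation run:

```python
# Hypothetical shapes only -- train, infer, and the feedback log are stand-ins,
# not any particular framework's API.
feedback_log = []

def train(dataset):
    """Batch phase: crunch the historical data and emit a model artifact."""
    return {"weights": sum(dataset) / len(dataset)}   # toy "model"

def infer(model, new_input):
    """Real-time phase: score one new input and remember it for later."""
    prediction = new_input > model["weights"]
    feedback_log.append((new_input, prediction))      # raw data + result, kept for retraining
    return prediction

model = train([0.2, 0.4, 0.6, 0.8])        # training cluster, runs as a batch job
print(infer(model, 0.7))                   # inference, answered immediately

# Later (every four hours, nightly, whatever the SLA says), the logged
# inputs and outcomes are folded back in and the model is adapted.
```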
Is that loop process,
it's almost like a batch,
you know, yet another training phase
with all the data that you had before,
plus any new data you have verified
and vetted and all that stuff.
And you go through and do the training pass again,
or is it real time?
You're feeding that new data into the
model and it's adjusting itself however. Yeah, it depends on the application. Training typically
is batch and inference is typically real time. But to give you an example: Waze,
the app that most people now use to navigate, is obviously real time, because
if there's an accident somewhere,
you don't want to know about it in an hour.
You want to know now, right?
So there is a little bit of a real-time aspect
and you can, you know, there are ways to deal with that.
But then there are other products where,
let's say, if you use a speech recognition application
for a bank where updating the model is not something that has to happen right away, right?
So you might have an SLA and the SLA might say, I expect four-hour turnaround for new models, right?
And then the user uses the system and then it goes along.
But in general, training is batch and inference is really real-time.
So you could have like a four-hour model update frequency.
I mean, I always thought, you know, the DevOps guys doing new code every day is pretty bizarre.
But changing the model every four hours is a reasonable thing?
Yes, it's reasonable in the sense, well, if the application and your service allows it.
But it's all automated, right?
There's no human interaction, right?
The choice there is what kind of service do you deliver?
How fast do you need it?
And how do you handle it, right?
But it's all automated.
It's just GPU or CPU time. And typically people have a training cluster on one side
and then on the side they use this closed loop, or adaptation, if you wish.
You know, the term to change your model is typically referred to as adaptation.
We adapt the model, right?
And to kind of personalize it to you.
Because that's really what you want, right?
If you build an application for a service, let's say, you know, Amazon, not that I worked on Amazon,
but I presume that Amazon, when you log in for the first time in Amazon, has no idea what you want,
but they give you a basic line of products on the front page.
And then as you use the system, the recommendations will adjust just for you, right?
And soon enough you'll be like everybody else, and as soon as you buy something,
you'll go to Facebook and see ads for the thing you just bought. Yeah, there are some articles where Amazon is thinking of shipping stuff to you before you
actually decided to purchase them.
Oh, that would be interesting.
Let's not go there.
You know, as a geek, I got a couple of nuts and bolts questions.
So when you talk about the model, that's a data structure of some sort, right?
It's a combination of data and code.
Yeah.
I look at this as a neural net with weights that have been finally adjusted to support whatever you're trying to predict, I think.
That's right.
So training is both updating the data structure and essentially
self-modifying code. It's code generation. I would call it code generation. I would say
the training is more like, you know, you go through the process of updating those weights
based on some model architecture, I'll call it. And then there's a process where the model architecture
is actually tweaked based upon its accuracy
and those sorts of things.
But that's done, that seems like it's done more like once,
the model tweaking and architecture tweaking,
but the learning can happen multiple, multiple times,
or training rather, or adaption, I should say.
Yeah, I mean, so the way it works, I mean, Ray kind of went the technical route here, but
when you look at neural networking, it's indeed about weights, right? So you have different kinds of inputs, and
then you have to decide how much weight you're going to put on a particular input, right? And changing those weights
is what the neural network will do.
And by changing the weights,
the accuracy will change.
But every time you add new data,
you know, you have to redo those weights.
A model is where you figured out, or you think you figured out, the weights,
and you kind of use those weights statically until you adapt your model and then you update
the weights to what you just learned, right? But it's really a lot of math, a lot of weights.
Everything can change and it can change automatically.
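A toy sketch of what "changing the weights" means in practice; this is plain gradient descent on a made-up linear problem, not any production training loop:

```python
import numpy as np

# A toy single-layer "model": weights the training process keeps adjusting.
rng = np.random.default_rng(0)
X = rng.random((200, 3))                 # input data (past events)
y = X @ np.array([0.5, -1.0, 2.0])       # target outcomes

w = np.zeros(3)                          # the model's weights, before any training

# Each pass nudges the weights to reduce the error on the data;
# "adaptation" later on is just more of these nudges on new data.
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

print(w)   # ends up close to [0.5, -1.0, 2.0]
```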
Okay, so I get that part, but I'm still stuck on how do these frameworks talk to storage?
Because I know media and entertainment people talk to files or objects and databases want to do small block IOs.
What are these applications doing?
Yeah.
So let's talk a little bit about workloads and how the data kind of flows through a system
because it's kind of different from what you traditionally would have done in the past.
So the first thing you have to deal with is different types of storage,
because at first you need to take care
of ingesting your data, right?
So before you can do any training,
you have to ingest that data.
So where are you gonna store that data
and where is that data coming from, Right. And as Ray already mentioned, it could be from you could collect all your data centralized or it could come from IoT devices.
But the bottom line is, you have to deal with data ingestion.
So any storage requirements for that are heavily write-oriented.
So you're going to ingest a lot of data.
So there the focus is a lot of writing.
After you ingested the data, you have to prepare your data.
So we talked a little bit about data quality before and some pre-processing, and that's
a storage device or storage architecture where there's a lot of read-write going on, followed
by the actual training.
So the model training, as you can suspect,
is heavily read-write.
And then moving away from training and going to inference,
it's a lot of reading.
So if you look at what kind of storage do I use,
do I use block, do I use file? And do I use object?
In reality, all of the above could actually work.
It depends on where you are in your cycle.
Yeah, but the all-of-the-above model would be: ingest here, copy it there, run the next step, push it to a third place to do the inference.
Yes, and it's not a single storage device, right?
It's typically a solution, or an architecture, that consists of different types of storage, and your data moves through various stages as you're processing
it or preparing it or engineering it, I guess I'd call it.
Yeah.
Yeah.
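A small sketch summarizing the stages and the dominant I/O pattern Frederic walks through above; the tier names are illustrative, not tied to any particular product:

```python
# Illustrative summary of the pipeline stages and their dominant I/O patterns
# (tier choices are examples, not a recommendation for any specific product).
stages = [
    {"stage": "ingest",      "io": "heavy write",                  "tier": "capacity / object store"},
    {"stage": "preparation", "io": "lots of read and write",       "tier": "mixed tier"},
    {"stage": "training",    "io": "heavy read-write, low latency", "tier": "NVMe / flash near the GPUs"},
    {"stage": "inference",   "io": "mostly read",                  "tier": "small model artifact, served"},
]
for s in stages:
    print(f'{s["stage"]:<12} {s["io"]:<34} {s["tier"]}')
```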
So in the early days, you know, think about traditional AI
and a little bit machine learning.
What people would do is they would create file systems
with different data profiles.
And based on the data profiles, they would go,
this is for data ingest, this is for data preparation, and this is for training. And because training is processing a lot of data in a timely fashion, you can expect that
anything that hits or gets close to a CPU or GPU is where you expect a storage device
that is high performance and very low latency.
So, I mean, a lot of the stuff I've toyed with, mind you, I'm only toying with
machine learning. Usually it just reads in the data and converts it to
what I'll call an in-memory data structure. And that works for
small files and stuff like that. But for some of these massive
data sets that they're feeding into these machine learning things, I mean,
that doesn't work, right? I mean, you actually have to do reads and writes
actually directly to some storage files or something,
I guess, right?
Yes, that's right.
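For the point Ray is making, here is a minimal sketch, assuming PyTorch and a made-up directory of sample files, of streaming training data from storage on demand instead of holding it all in memory:

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class OnDiskSamples(Dataset):
    """Reads one sample file from storage per request instead of holding
    the whole (possibly multi-petabyte) dataset in memory."""
    def __init__(self, data_dir):
        self.paths = sorted(
            os.path.join(data_dir, f)
            for f in os.listdir(data_dir) if f.endswith(".npy")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        sample = np.load(self.paths[idx])   # hits the storage tier right here
        return torch.from_numpy(sample)

# The DataLoader streams batches and prefetches in background workers,
# which is exactly why the storage underneath needs high throughput and low latency.
# "/data/train" is a hypothetical path for illustration.
loader = DataLoader(OnDiskSamples("/data/train"), batch_size=64, num_workers=8)
for batch in loader:
    pass  # feed the GPUs
```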
And so there are many, many ways to do it.
So the traditional way of just putting a SAN out there
and just hitting the SAN doesn't work anymore.
So what you need to do is, you know,
you have to deal with data gravity and data locality.
And so the decision then is, ideally, you would bring your data as close as possible to your CPUs or GPUs, which would mean in that box.
In that final form that you need it to be in and stuff like that.
Yeah, yeah.
In the final form, yes.
In the training phase, yes.
And it goes all over the place, right? So I've seen direct attached,
I've seen people loading up with flash drives, I've seen other people use NVMe drives, particularly
for the extreme low latency. I've seen people create high-performance file systems across those servers with GPUs
so that they can kind of work the data locality angle, you know, kind of creating data
locality. Almost a cache for the data from the file system kind of thing. Yeah, it has to be tiered, right?
Because imagine that you're sitting on 50 petabytes of data.
50 petabytes, okay.
50 petabytes, yeah.
I mean, there's a lot of people that have 50 petabytes today.
I think Howard's got it in the back of his lab, don't you, Howard?
Not quite.
But I do remember not all that long ago hearing
vendors go, and we have three customers, each with a petabyte. Yeah, that's beyond that now,
Howard. Okay, I got you. Go ahead, Fred. I'm Frederick. I'm sorry. Yeah, no problem. I mean,
if you have 50 petabytes and you know that for the next three days, you're going to process half a petabyte of that storage, it doesn't make any sense to put the 50 petabytes all on high availability storage, right?
That's very expensive.
So you come up with tiers. You talked about a cache, a tier-zero model, where you move your data as close as possible.
And then you probably have a second tier, maybe Flash or at least maybe still SAS, who knows.
That gives you the ability to move that data in and out really quickly.
And then you have a large pool of data where the data is kind of at rest, but you're waiting for some kind of an orchestrator
that moves that data up and down, right? And you can see that with storage vendors.
You know, we talk about AI, using AI with storage solutions, but the storage vendors themselves are
also building AI in their storage devices because they realize that there is a need
for such a thing. So they're also trying to help you out by pre-caching, pre-fetching.
Their caching algorithms are getting more and more sophisticated.
Yeah, we're still waiting for the day when the storage array recognizes that
it's the 30th of the month
and month-end close starts tomorrow, so I better promote all that data today.
I think they're out there, Howard.
Well, I mean, they don't do it today because the amount of data they need
to manage that long, you know, a year-long time horizon
and know this thing that happens once a month is about to repeat,
they haven't quite made it to that yet, but I see it coming.
But if you purely look at storage, it's very complex because I think, Howard, you said it earlier, right?
So one of the most expensive components nowadays around machine learning, deep learning, is those GPUs.
Those GPUs are not cheap, right?
So if you have a whole army of those, you have to make sure that you keep your GPUs busy, all those 40,000 cores per 4U server.
Yeah, I paid a lot for them.
I'm going to keep them busy.
I use crypto mining to actually keep mine busy.
That's why they're so expensive. It's all your fault, right?
Probably. And others like me, yes. Actually, I think they've fallen since the Bitcoin crash. But looking at a pipeline that's got multiple copies of data, I'm thinking, wow, I really want to throw NVMe over Fabrics at this.
I can replace some of those copy-the-data-from-the-training-cluster-to-the-inference-cluster steps with an NVMe namespace and not actually move the data.
I don't know if I interject here, but it seems like the data you're using during inferencing
slash deployment of the model and the data that you're using during the training phase
might be two different sets of data.
I don't know.
Frederick, you want to comment on that?
Yes, they are, definitely. I mean, let's assume in the 50 petabyte example that you use that for training,
but when you deliver your model, your model is a fraction of that, right?
And that's because you kind of use the neural networking to deduce a model and the weights.
Based on the sensor information and data coming in
in almost real time, right? Kind of thing, right? Yes. I mean, really what you're doing
when a user is using the model is statistically comparing what they're saying
or what you're doing with the model you have. And that has to be small. You don't need a large
computer environment to do inference.
I mean, that would defeat the purpose.
But you also have to, I don't know, archive the data as it's coming off and the classification
slash prediction or whatever the inference actually did.
That all has to be archived someplace.
So you are, you know, I don't know, capturing the data.
Yeah, well, so some jamoke walks into the casino.
You take his picture.
You run it.
You save it in the database for training in the next training cycle.
You send it to the inference engine to see if he's allowed to gamble in your casino or
not.
The only result out of the inference is: the guy's a wise guy, throw him out.
Yeah. And so you record the inference, you record the image, and then
once a week I plug in the new images. Every four hours, you're saying? You know, I'm in the
casino business, I don't have that strict an SLA. Major banks, maybe.
Yeah, okay. Okay. Interesting.
Yeah, you're getting, I don't know what you call it, real-time sensor information in.
The model is making some inference out, and that data has to be recorded so you can run the adaption cycle again. So a lot of that data that I'm keeping that has been used to train the model won't get used frequently because every four hours or every day I'm going to feed the new data in and refine the model.
And I'm only going to go back to all the other data.
Yeah, it's data. It's the data with the results, right?
So if you, let's say that you're doing, again, the example I used before, image recognition of cats and dogs, right?
So you have a model that recognizes cats and dogs.
Somebody feeds it a picture which shows a cat and the system says, hey, it's a dog.
And the user says, not really.
And so you feed that data back as well as the metadata about the inference, which is, hey, this was not recognized as a cat.
Right. So again, to oversimplify, I've got a JPEG and it's got metadata that says it's a dog.
And we sent the JPEG through and the system said it's a cat.
When I retrain, I add metadata that says you thought this was a cat last time.
Yes.
So the adaption cycle is...
So here's the question, Frederick.
Does the adaption cycle just go and process the new data and new inferencing and the new metadata?
Or does it go through a whole complete pass across the training data plus the new information?
Typically, it just takes the base model and modifies the base model.
Okay, so a training instance with the new data.
Yes.
I mean, remember that when you deploy an application for the first time,
you have a base model that works for everybody.
What you want to do is to customize it just for you, right?
And so whatever you do with the system,
the system will adapt that model specifically for you.
And so you're not rerunning the whole thing.
You're just changing the weights, you know, if we're talking technically here.
You're just changing the weights such that it will be more in favor of what you said.
And then depending on if the system believes that this is also applicable
to other people, it might go back to the bigger pool.
And then next time when they use the bigger pool to rerun larger models, they have that new data.
But it's not like, you know, in the 50 petabyte example, they process all 50 petabytes every four hours, right?
I was thinking that would be quite an interesting scenario.
That would be a lot of bandwidth.
Okay.
Now I see what you're saying.
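A hedged sketch of that adaptation step, assuming scikit-learn's SGDClassifier, whose partial_fit updates the existing weights with just the new batch rather than re-reading the full training set:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

# Base model: trained once, in batch, on the big historical dataset.
X_base = rng.normal(size=(100_000, 10))
y_base = (X_base[:, 0] > 0).astype(int)
model = SGDClassifier(loss="log_loss")
model.partial_fit(X_base, y_base, classes=[0, 1])

# Adaptation: every few hours, fold in only the newly collected,
# corrected samples -- the base weights are nudged, not rebuilt.
X_new = rng.normal(size=(500, 10))
y_new = (X_new[:, 0] > 0).astype(int)
model.partial_fit(X_new, y_new)

print(model.predict(rng.normal(size=(1, 10))))
```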
Yeah, but 30 of those 50 petabytes are essentially cold data, relatively speaking.
So storage-wise, you know, having something that provides both economical, large amounts
of storage as well as low-latency, high-performance storage,
and tiering between those two,
either automatically or programmatically.
Those are the sorts of things you'd think would be useful
for at least the training side of this coin.
Would you agree with that?
Yes.
Training is where all the data is,
and that's where data management comes in, and where you deal
with data volume, right? Inference is about scalability from a unit perspective, right? So you have one
unit to do inference with a model and then you have, you know, let's say a thousand instances
if you want to have a thousand people using your system at the same time, right?
Yeah, but going back to the storage and how to use it with AI,
and I think I mentioned before that GPUs are very expensive,
and in order to take full advantage of those GPUs, you have to make sure that you feed those GPUs with enough data and keep them fed with data.
And it's really, really difficult to do.
And so what you will see is that a lot of storage vendors will come up with a solution that includes the three infrastructure components and then one or
more frameworks.
For example, you will see most vendors provide a solution that includes Mellanox for the
interconnect and the DGX-1 from NVIDIA.
So the DGX-1 is a box with a bunch of GPUs that is
fully optimized.
I wouldn't say it's plug and play,
but at least it eliminates
the fact that a lot of people
need to learn a lot about GPUs
in order to get it to work.
And then the storage vendor
will plug in their
storage device.
There's many vendors who come up with this architecture.
And then on top of that,
they will deliver one or more platforms
like frameworks, I should say,
like Keras or PyTorch or anything like that,
such that they can deliver this
as a solution to the customer
and the customer will know that the ratio and the performance
and the latency between the Mellanox, the DGX-1,
and their storage solution is as optimal as possible to start with,
because getting those things to work together
is not an easy feat. And the DGX has storage as well, I mean, NVMe direct-access storage.
It's a little bit of caching. Its goal is to provide enough flexibility so it can boot; it delivers fast communication between the GPU cards.
There's a technology from NVIDIA called NVLink.
As you can suspect, those cards typically go through the PCI bus. But if you want to have GPU cards communicate over the PCI bus,
eventually that's going to become a bottleneck.
So what NVIDIA decided to do is to come up with a protocol
where the GPU cards amongst themselves can talk at a higher speed
such that they don't take over the PCI bus.
And it does come with some storage, some RAM.
It comes with networking as well.
But it doesn't have enough storage to do a large amount of training, right?
And I have to specify that the solutions being presented today are almost all aimed at training, because that's where the heavy-duty work is.
So you would not use something like this for inference unless you have a model that is also heavy and requires a lot of GPU processing. You will see today that a lot of the AI applications do their training with GPUs or TPUs, but in
production, they might use CPUs.
That's interesting.
Okay, I got it.
Well, gosh, guys, this has been great.
Howard, any last questions for Frederick?
Well, actually, just one.
So these, I mean, we've seen these machine learning stacks from Pure and from NetApp that I remember.
IBM's got one. I'm sure EMC's probably got one by this time.
It's starting to sound like converged infrastructure for ML, which just makes sense.
Yes.
And that's exactly it.
I mean, the solution I described with Mellanox and the NVIDIA DGX-1, it's exactly that, right?
I think it's a little bit of converged, but also kind of a starter kit, right?
Because people don't know how to start and selling those three
components separately, you know, storage, compute, and network is very, very difficult. So the
converged approach is an economical kind of starter kit. Just like people didn't know how
to build a data center for VMware, so VCE sold them a Vblock. So my question would be, as I look more and more at this AI stuff, it seems its applicability is so
wide. It's almost as if just about any corporate entity and any organization of any size whatsoever
could seem like they could take advantage of this sort of thing. And, you know, here I am, a one-person organization.
I was able to do some modeling with, you know, blog popularity prediction and stuff like that.
And I didn't get perfect accuracy, but I only have, you know, less than a thousand posts and stuff like that.
But even at this point, I can use it and actually take advantage of some of it.
Yeah.
I think you said could take advantage.
I should replace could with should.
I mean, in the world we live in now, it's a competitive advantage.
Ignoring AI today is just waiting for one of your competitors to figure out, based on their data,
how better to serve their customers.
Yeah.
Yeah.
Has been for a long time.
I remember doing the first data warehouse projects at the casinos.
It's like, wait, two weeks from Wednesday, we have extra hotel rooms.
Who gambles two weeks from Wednesday?
Yeah.
Find those guys and see if you can't rattle their cage and get them to come.
Oh, you call them and offer them a free room and a lobster dinner.
Yeah, yeah.
I understand.
Literally.
And that was AI decades ago, effectively.
Well, it wasn't AI because we had to know what we were looking for and then look in the data warehouse.
The AI part would be figuring it out ahead of time.
Yeah, automatically.
All right.
I don't know if I saw, Frederick, anything you'd like to say to our listening audience?
I think everybody should try out AI.
It doesn't take a lot to try AI.
I mean, even in the public cloud, it's easy to try it out.
I think if you have data or you think you can take advantage of AI, you know, just try it out.
Don't be scared.
Nowadays, it's relatively easy to get things going.
You know, you don't need a PhD anymore to do some AI. I did it on my laptop.
That's crazy.
Well, okay.
Gents, this has been great.
Thank you very much, Frederick,
for being on our show today.
Well, thanks for having me.
And Frederick, you're available for people
with deep pockets who need help with this, right?
Always.
Please tell them where.
Yeah.
So my website is highfens.com.
So next time we'll talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
And please review us on iTunes and Google Play as this will help us get the word out.
That's it for now.
Bye, Howard.
Bye, Ray.
And bye, Frederick.
Bye-bye.
Until next time.