Grey Beards on Systems - 79: GreyBeards talk AI deep learning infrastructure with Frederic Van Haren, CTO & Founder, HighFens, Inc.

Episode Date: January 28, 2019

We’ve talked with Frederic before (see: Episode #33 on HPC storage) but since then, he has worked for an analyst firm and now he’s back on his own again, at HighFens. Given all the interest of late in AI, machine learning and deep learning, we thought it would be a great time to catch up …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Howard Marks here. Welcome to the next episode of the Greybeards on Storage podcast, a show where we get Greybeards storage bloggers to talk with system vendors and technologists to discuss upcoming products, technologies, and trends affecting the data center today. This Greybeards on Storage episode was recorded on January 23rd, 2019. We have with us here today Frederic Van Haren, CTO of HighFens. Frederic's an old friend who's been on our show before and is focused on AI consulting and services. So Frederic, why don't you tell us a little bit about yourself and what you've been up to? Sure. As you mentioned, my name is Frederic Van Haren. In a previous life,
Starting point is 00:00:49 I used to build large HPC clusters for speech recognition. I did that for more than a decade and wanted to do something else. So, I thought it was a grandiose idea to become a consultant. Think about HPC, big data, AI, that's typically the areas I play in. So what have I been doing? Well, mostly I've been doing a lot of NDA work for storage vendors, being startups as well as established vendors. And so the technologies there were typically NVMe or objects. Those are the kind of the two technologies that storage vendors really were looking for. Well, yeah, what's happening at the extremes?
Starting point is 00:01:32 Because we figured the middle out. Yeah, apparently. Okay, so disk is not a discussion point. Okay. Okay. So as well as helping vendors and customers with AI projects, right, going from early stages to complete transformation projects. But also I've been doing some analyst work really to get some additional name recognition
Starting point is 00:01:52 because you can't get enough of that, right? Why do you think we're here? So we've been hearing for the past year or 18 months a lot from storage vendors about AI and ML applications. And over and over I get press releases. Would you like to speak with us about this NVIDIA thing that we've combined with our storage? But Ray and I are old, and so we understand OLTP as a workload,
Starting point is 00:02:17 and we understand VMs as a workload. How does AI deal with the storage? How do we keep people running the applications you want to run happy? Yeah, it's a question, unfortunately, with a very long answer because AI is really… Well, we got about an hour. If you look at the definition of AI, or at least most of the implementations today, it's really using human reasoning to create a model to predict future outcomes driven by past events. And what are past events? Past events really means data. So you're going to use data and process data as fast as you can in order to get results. That is really AI from a high level. And, you know,
Starting point is 00:03:06 since we have an hour, I can describe a little bit, you know, the history of AI. So AI is a term that almost has been around as long as HPC, right? And in the meantime, AI today means something completely different than it meant in the 50s and the 80s. So in short, let's call traditional AI whatever happened between 1950 and 1980. And so what happened in those days is if they had a problem to solve, let's take the game of chess as an example, right? So in order to come up with a model or an application that could play chess against a human, they had to extract all the rules and all the knowledge and how to play offense and how to play defense and how to recognize patterns. All of that information, they had to get it from somewhere. So in those days, they had to extract that information out of a Garry Kasparov and another grandmaster and come up with rules.
Starting point is 00:04:05 And those rules would be coded into an application. Mind you, I wrote a checkers program in college and I got the award for the most computer time used in one semester ever. I'm not sure it's been revoked since then, but again, they went for a more client-server solution. But at the time it was mainframe systems and I just blew through it. It was fun, but it's crazy. Yeah, I mean, so if you think about it, it was all about, you know, taking rules, taking whatever a grandmaster knew, translating that into code, and then you had a model. The problem was that if you wanted the model to learn, you actually had to programmatically implement that all over again. And as you know, that's kind of a slow process and takes a lot of time and interaction from a programmer.
Starting point is 00:04:57 So let's move a little bit forward. So if you look at machine learning, so machine learning is considered a subset of AI, or the traditional AI, if you wish. And now we're kind of in the 1980s to 2010. And so the way you can think about that is Hadoop and Spark, those kinds of technologies, where the open source community kind of kicked in and said, look, you know, if you want to process things, you don't have to buy that large, expensive infrastructure. You can use open source things like Linux, Spark, Hadoop. And then from a hardware perspective, you also had the benefit of Moore's Law, right? Everything was going faster and faster, as well as the fact that hardware was becoming cheaper.
Starting point is 00:05:44 Yeah, and storage was becoming cheaper as well. Yeah. And so if we apply this again to the chess game, right, so the way you can look at it is that ML applied to chess is where you still implement the rules of the chess game into your ML system. You know, from a technology standpoint, we'd call that the features. That's kind of the right word to use. And then you would use some data from a bunch of chess games that have been played.
Starting point is 00:06:27 So the combination of those features slash the rules together with the data, you were able to build a model that could learn from itself. So every time you use the model, people played against it. That would be new data. That new data then can be put in the picture again. You create an updated model, and there you are. You have something that works. So if I get this right, in the old days, we would create rules that were, you should move your knight this way. And today we say, these are the rules of chess. A knight is allowed to move in these ways and watch these 6 million games and figure out how you should move it. That's right. That's right. And because in those days, you know, one of the things that was a problem for ML was a
Starting point is 00:07:14 lack of what we call good quality data, right? It was not, there was no lack of data, but a lack of good quality data, meaning very good chess games, yes, well verified, without errors and all that stuff. Because once you start heavily relying on data and you use data as a way to make decisions, bad data or bad quality data kind of means that your output also will be affected by them.
Starting point is 00:07:43 So I can't enter the chess games between the nine-year-olds who'll cheat when the other guy turns his back? That's right. Or the cheating will end up in the model, right? Yeah. I suppose that could be good and bad, but yeah, okay. So that's machine learning. So machine learning is really taking advantage of the data you have.
Starting point is 00:08:03 And then we enter kind of the 2010 area and now, which is deep learning. So what's so specific about deep learning? Well, imagine that if you have enough data that represents a significant amount of chess games, instead of having the features slash the rules being dictated to you, you can just start with data and let the data figure out how the game is played.
Starting point is 00:08:29 Yeah, I'd say let the model figure out how the game is played based on the data, right? That's right. And since I never saw a knight move one square forward, he can't. Yes. And because of that, you enter a different level of complexity, right? So now that you need a lot of data, you also have to find a way to analyze that enormous amount of data in a timely fashion. That's where the open source community came to the rescue, where they started delivering open source frameworks that would do all the analytical pieces for you, right? So the frameworks that exist today are Caffe and TensorFlow and PyTorch and Keras. You probably have heard those names flying around. What those are, those are frameworks that are
Starting point is 00:09:21 not data; those are the frameworks that allow you to process that data. Now, those frameworks also realize that they should take advantage of the new processing capabilities that exist today. And the processing capabilities are, you know, the number one used today really is GPUs, right? So instead of using a CPU, they decided to use GPUs. Google came up with the TPU, the Tensor Processing Unit, which is their hardware implementation. The software for that is TensorFlow. And then FPGAs and ASICs, it's kind of a flashback for me to DSPs and the like. But in reality, the way you can look at that is that this is hardware that allows you to do fast mathematical calculations. And each one of those is taking shortcuts, mathematical shortcuts, without really impacting the results so much. It seemed like the TPUs had very limited precision, I think, was a key there. I mean, rather than like 32 bits or 16 bits or 64 or whatever, it was almost on the order
Starting point is 00:10:35 of 8 or 16 bits. Yeah. So the goal of Tensor was to do a matrix operation, a 4x4 matrix, two 4x4s. But each of the elements or the numbers are floating point 16, and the outcome is a 32-bit floating point, right? So the message there is not to do 32-bit floating point calculations out of the gate, but to use 16-bit floating point and end up with 32-bit to do it faster. It's all about cutting corners without impacting the outcome.
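To make that mixed-precision idea concrete, here is a minimal sketch in NumPy. This is illustrative only, not TPU code, and the matrices are made up: the inputs are stored as 16-bit floats and the multiply-accumulate result is kept in 32-bit, which is essentially what the hardware does in a single fused step.

```python
import numpy as np

# Two small matrices stored in 16-bit floating point (the "inputs").
a = np.random.rand(4, 4).astype(np.float16)
b = np.random.rand(4, 4).astype(np.float16)

# Accumulate the products in 32-bit so rounding in the 16-bit inputs
# doesn't compound; TPUs and tensor cores do this as one fused operation.
c = np.matmul(a.astype(np.float32), b.astype(np.float32))

print(a.dtype, "x", b.dtype, "->", c.dtype)   # float16 x float16 -> float32
```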
Starting point is 00:11:16 And so that's really what's happening. And if you look at a physical piece of hardware, if you take a 1U server with two sockets, it might have like 40 CPU cores. If you take a 4U server with eight GPU cards in it, that 4U server can have 40,000 GPU cores. So much greater parallelism for the simple thing that we're doing. That's right. So taking advantage of parallelism. But it also comes with a whole different bag of problems, right? You have to deal with storage, you have to deal with processing capabilities, GPUs, TPUs, FPGAs, and then also the interconnect, right? If you have fast storage and you have fast GPUs,
Starting point is 00:12:14 you need some network that not only does the fast processing, but also keeps the latency low, right? So imagine that all those GPU cards have to talk to each other, you know, kind of an MPI-like workload. And in order to, you know, keep those things going in real time, you need to have the lowest latency you can have, right? So for DL, you need innovation in all those areas. But I must say that the open source community with the frameworks is kind of putting everything upside down, right?
Starting point is 00:12:51 So if we look at the traditional AI, everything was about the code and the rules, meaning that, you know, your source code was the IP. While today, the open source community is delivering the tools. So that's not your IP. Your data is the IP. Yeah. It's all about the data. Yeah. Okay.
Starting point is 00:13:13 So would it be, I mean, to oversimplify it, this GPU, TPU, FPGA compute core is so fast and so expensive, we want to build the rest of the system around it to make sure that it's constantly fed? So I got two questions here. So there's, I'll call it a training phase and there's an operational phase of any AI and it's probably some other phases, I don't know. But during the training phase, you're taking this data and you're passing it into this framework model configuration and using it to build the model intelligence, I'll call it. I'm not exactly certain what I'd call it. So that's happening during training. But once you deploy the model, let's say it's a, I don't know, self-driving car or something like that. So the model is sitting there probably in the car, I guess, because it needs to have real-time control. It's taking as input from data from the sensors, lidars, cameras, sound, you know, movement,
Starting point is 00:14:18 you know, also the vehicle speed and direction and those sorts of stuff. And then it's somehow taking all that information and saying, okay, this is where you want to go in this situation at this moment. And then the next moment is another set of data comes in, et cetera. Is that how a sort of thing works? Yes. So, so indeed. So AI really consists of two pieces. One is, is building the model, right? So you have to,
Starting point is 00:14:43 you have to have a base model in order to do anything. And as you mentioned, it's called training, right? So let's assume a typical example is if you have 10,000 pictures of something and you want to use that for AI to recognize if there is a cat or a dog in the picture, then what you would do is you would take 80% of your pictures and you would use them to create a model. And so that's the training phase. And the outcome of the training phase is a model. And that model is something you can use in production. The actual term that people typically use is inference. Inference is the process where you deploy your model and you validate it against new
Starting point is 00:15:34 input, which is people using the system. So in the car, you get that feed, and then there's a feedback loop. So once the model has predicted, you know, whatever came in, there is a feedback loop where you send that data back, add it to your training, and you can update your model. And it's kind of an infinite loop, right? So as long as you use the system, you can use that data to improve the model. Is that loop process, it's almost like a batch, you know, yet another training phase with all the data that you had before,
Starting point is 00:16:15 plus any new data you have verified and vetted and all that stuff. And you go through and do the training pass again, or is it real time? You're feeding that new data into the model and it's adjusting itself however. Yeah, it depends on the application. Training typically is batch and inference is typically real time. But, you know, give you an example, if Waze, you know, Waze, the app that most people now use to navigate, that is obviously real time because
Starting point is 00:16:44 you, if there's an accident somewhere, you don't want to know about it in an hour, you want to know it now, right? So there is a little bit of a real-time aspect and, you know, there are ways to deal with that. But then there are other products, let's say, if you use a speech recognition application for a bank, where updating the model is not something that has to happen right away, right?
Starting point is 00:17:09 So you might have an SLA and the SLA might say, I expect four-hour turnaround for new models, right? And then the user uses the system and then it goes along. But in general, training is batch and inference is really real-time. So you could have like a four-hour model update frequency. I mean, I always thought, you know, the DevOps guys doing new code every day is pretty bizarre. But changing the model every four hours is a reasonable thing? Yes, it's reasonable in the sense, well, if the application and your service allows it. But it's all automated, right?
Starting point is 00:17:45 There's no human interaction, right? The choice there is what kind of service do you deliver? How fast do you need it? And how do you handle it, right? But it's all automated. It's just GPU or CPU time. And typically people have a training cluster on one side and then they use on the side like this closed loop or adaptation, if you wish. You know, the term to change your model is typically referred to as adaptation.
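As a rough sketch of that batch-training versus real-time-inference split, here is what it might look like in Keras, one of the frameworks mentioned earlier. The data, shapes and model are made-up stand-ins for the 10,000-picture cat/dog example; nothing here is specific to any vendor's stack.

```python
import numpy as np
import tensorflow as tf

# Stand-in data for the cat/dog example: random 32x32 images, labels 0 or 1.
images = np.random.rand(10000, 32, 32, 3).astype("float32")
labels = np.random.randint(0, 2, size=10000)

split = int(0.8 * len(images))                    # the 80% used to train
x_train, y_train = images[:split], labels[:split]
x_val, y_val = images[split:], labels[split:]     # held back for validation

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training: the batch phase, run offline against the historical data.
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=2, batch_size=64)

# Inference: the real-time phase, scoring one new input as it arrives.
new_picture = np.random.rand(1, 32, 32, 3).astype("float32")
print("P(dog) =", float(model.predict(new_picture)[0, 0]))
```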
Starting point is 00:18:18 We adapt the model, right? And kind of personalize it to you. Because that's really what you want, right? If you build an application for a service, let's say, you know, Amazon, not that I worked on Amazon, but I presume that Amazon, when you log in for the first time, has no idea what you want, but they give you a basic line of products on the front page. And then as you use the system, yeah, the recommendations will adjust just for you, right? And how fast? Soon enough you'll be like everybody else, and as soon as you buy something
Starting point is 00:19:00 you'll go to facebook and see ads for the thing you just bought. Yeah, there's some articles where Amazon is thinking of shipping stuff to you before you actually decided to purchase them. Oh, that would be interesting. Let's not go there. You know, as a geek, I got a couple of nuts and bolts questions. So when you talk about the model, that's a data structure of some sort, right? It's a combination of data and code. Yeah.
Starting point is 00:19:29 I look at this as a neural net with weights that have been finally adjusted to support whatever you're trying to predict, I think. That's right. So training is both updating the data structure and essentially self-modifying code. It's code generation. I would call it code generation. I would say the training is more like, you know, you go through the process of updating those weights based on some model architecture, I'll call it. And then there's a process where the model architecture is actually tweaked based upon its accuracy and those sorts of things.
Starting point is 00:20:10 But that's done, that seems like it's done more like once, the model tweaking and architecture tweaking, but the learning can happen multiple, multiple times, or training rather, or adaptation, I should say. Yeah, I mean, so the way it works, I mean, Ray kind of, you know, went the technical route here, but when you look at neural networking, it's indeed about weights, right? So you have different kinds of inputs and then you have to decide how much weight you're going to put on a particular input, right? And changing those weights
Starting point is 00:20:47 is what the neural network will do. And by changing the weights, the accuracy will change. But every time you add new data, you know, you have to redo those weights. A model is where you figured out, or you think you figured out, the weights and you kind of use those weights statically until you kind of adapt your model and then you update
Starting point is 00:21:14 the weights to what you just learned, right? But it's really a lot of math, a lot of weights. Everything can change and it can change automatically. Okay, so I get that part, but I'm still stuck on how do these frameworks talk to storage? Because I know media and entertainment people talk to files or objects and databases want to do small block IOs. What are these applications doing? Yeah. So let's talk a little bit about workloads and how the data kind of flows through a system because it's kind of different from what you traditionally would have done in the past.
Starting point is 00:22:08 So the first thing you have to deal with is different types of storage, because at first you need to take care of ingesting your data, right? So before you can do any training, you have to ingest that data. So where are you going to store that data and where is that data coming from? Right. And as Ray already mentioned, you could collect all your data centralized or it could come from IoT devices. But the bottom line is, you have to deal with data ingestion.
Starting point is 00:22:37 So any storage requirements for that are heavily write-oriented, right? So you're going to ingest a lot of data. So there the focus is a lot of writing. After you've ingested the data, you have to prepare your data. So we talked a little bit about data quality before and some pre-processing, and that's a storage device or storage architecture where there's a lot of read-write going on, followed by the actual training. So the model training, as you can suspect,
Starting point is 00:23:07 is heavily read-write. And then moving away from training and going to inference, it's a lot of reading. So if you look at what kind of storage do I use, do I use block, do I use file? And do I use object? In reality, all of the above could actually work. It depends on where you are in your cycle. Yeah, but the all of the above model would be ingest here, copy it there, run the next step, push it to the third place to do the inference.
Starting point is 00:23:45 Yes, and it's not a storage device, right? It's typically a solution that consists or an architecture that consists of different types of storage and your data moves through various stages through that as you're processing it or preparing it or engineering it, I guess I call it. Yeah. Yeah. So in the early days, you know, think about traditional AI and a little bit machine learning. What people would do is they would create file systems
Starting point is 00:24:14 with different data profiles. And based on the data profiles, they would go, this is for data ingest, this is for data preparation, and this is for data ingest, this is for data preparation, and this is for training. And because training is processing a lot of data in a timely fashion, you can expect that anything that hits or gets close to a CPU or GPU, that's where you expect storage device that is high performance and very low latency. So, I mean, a lot of the stuff I've toyed with, mind you, I'm only toying with machine learning. Usually it just reads in the data and converts it to
Starting point is 00:24:51 what I'll call an in-memory data structure. And that works for small files and stuff like that. But for some of these massive data sets that they're feeding into these machine learning things, I mean, that doesn't work, right? I mean, you actually have to do reads and writes actually directly to some storage files or something, I guess, right? Yes, that's right. And so there are many, many ways to do it.
Starting point is 00:25:16 So the traditional way of just putting a SAN out there and just hitting the SAN doesn't work anymore. So what you need to do is, you know, you have to deal with data gravity and data locality. And so the decision then is, hopefully, ideally, you would bring your data as close as to your CPU or GPUs, which would mean in that box. In that final form that you need it to be in and stuff like that. Yeah, yeah. In the final form, yes.
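A schematic of the stage-to-storage mapping being described here, with the I/O profiles pulled from the discussion above. The tiers named are hypothetical examples of how data gets staged closer to the GPUs, not a recommendation for any particular product.

```python
# Each phase of the pipeline has a different I/O profile, so each typically
# lands on a different class of storage, with data staged closer to the GPUs
# as it approaches training.
PIPELINE_STORAGE = {
    # stage          (dominant I/O pattern,   example tier - hypothetical)
    "ingest":        ("write-heavy",          "object store / data lake"),
    "preparation":   ("mixed read/write",     "scale-out file system"),
    "training":      ("heavy read/write",     "local NVMe / flash next to the GPUs"),
    "inference":     ("read-mostly",          "small model repository"),
}

for stage, (pattern, tier) in PIPELINE_STORAGE.items():
    print(f"{stage:12s} {pattern:20s} -> {tier}")
```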
Starting point is 00:25:44 The training phase, and it goes all over the place, right? So, direct attached, I've seen direct attached, I've seen people loading up with flash drives, I've seen other people use NVMe drives, particularly for the extreme low latency. I've seen people create high-performance file systems across those servers with GPUs so that they can kind of, you know, create data locality. Almost a cache for the data from the file system kind of thing. Yeah, it has to be tiered, right? Because imagine that you're sitting on 50 petabytes of data. 50 petabytes, okay.
Starting point is 00:26:32 50 petabytes, yeah. I mean, there's a lot of people that have 50 petabytes today. I think Howard's got it in the back of his lab, don't you, Howard? Not quite. But I do remember not all that long ago hearing vendors go, and we have three customers, each with a petabyte. Yeah, that's beyond that now, Howard. Okay, I got you. Go ahead, Fred. I'm Frederick. I'm sorry. Yeah, no problem. I mean, if you have 50 petabytes and you know that for the next three days, you're going to process half a petabyte of that storage, it doesn't make any sense to put the 50 petabytes all on high availability storage, right?
Starting point is 00:27:14 That's very expensive. So you come up with a tier and you kind of, you know, talked about a cache, you know, slash tier zero model where you move your data as close as possible. And then you probably have a second tier, maybe Flash or at least maybe still SAS, who knows. That gives you the ability to move that data in and out really quickly. And then you have a large pool of data where the data is kind of at rest, but you're waiting for some kind of an orchestrator that moves that data up and down, right? And you can see that with storage vendors. You know, we talk about AI, using AI with storage solutions, but the storage vendors themselves are also building AI in their storage devices because they realize that there is a need
Starting point is 00:28:06 for such a thing. So they're also trying to help you out by pre-caching, pre-fetching. Their caching algorithms are getting more and more sophisticated. Yeah, we're still waiting for the day when the storage array recognizes that it's the 30th of the month and month-end close starts tomorrow, so I better promote all that data today. I think they're out there, Howard. Well, I mean, they don't do it today because the amount of data they need to manage that long, you know, a year-long time horizon
Starting point is 00:28:41 and know this thing that happens once a month is about to repeat, they haven't quite made it to that yet, but I see it coming. But if you purely look at storage, it's very complex because I think, Howard, you said it earlier, right? So one of the most expensive components nowadays around machine learning, deep learning, is those GPUs. Those GPUs are not cheap, right? So if you have a whole army of those, you have to make sure that your GPUs, all those 40,000 cores per 4U server. Yeah, I paid a lot for them. I'm going to keep them busy.
Starting point is 00:29:23 I use crypto mining to actually keep mine busy. That's why they're so expensive. It's all your fault, right? Probably. And others like me, yes, they are. Actually, I think they've fallen since the Bitcoin crash. But I look at this and I've got multiple copies of data, and I'm thinking, wow, I really want to throw NVMe over Fabrics at this. I can replace some of those copies, or move the data from the training cluster to the inference cluster with an NVMe namespace and not actually move the data. I don't know if I can interject here, but it seems like the data you're using during inferencing slash deployment of the model and the data that you're using during the training phase might be two different sets of data. I don't know.
Starting point is 00:30:20 Frederick, you want to comment on that? Yes, they are. Definitely. Frederick, you want to comment on that? Yes, they are, definitely. I mean, let's assume the 50 petabyte example is you use that for training, but when you deliver your model, your model is a fraction of that, right? And because you kind of use the neural networking to deduct a model and the weight. Based on the sensor information coming in and data coming in as during almost real time, right? Kind of things, right? Yes. I mean, really what you're doing when a user is using the model is statistically comparing what they're saying
Starting point is 00:30:57 or what you're doing with the model you have. And that has to be small. You don't need a large computer environment to do inference. I mean, that would defeat the purpose. But you also have to, I don't know, archive the data as it's coming off and the classification slash prediction or whatever the inference actually did. That all has to be archived someplace. So you are, you know, I don't know, capturing the data. Yeah, well, so some jamoke walks into the casino.
Starting point is 00:31:28 You take his picture. You run it. You save it in the database for training in the next training cycle. You send it to the inference engine to see if he's allowed to gamble in your casino or not. The only result out of the inference is the guy's a wise guy, throw him out. Yeah. And so you record the inference, you record the image, and then once a week I plug the new images in. Every four hours, you're saying. You know, I'm in the
Starting point is 00:32:01 casino business i don't have that strict an SLA. Major banks, maybe. Yeah, okay. Okay. Interesting. Yeah, you're getting, I don't know what you call it, real-time sensor information in. The model is making some inference out, and that data has to be recorded so you can run the adaption cycle again. So a lot of that data that I'm keeping that has been used to train the model won't get used frequently because every four hours or every day I'm going to feed the new data in and refine the model. And I'm only going to go back to all the other data. Yeah, it's data. It's the data with the results, right? So if you, let's say that you're doing, again, the example I used before, image recognition of cats and dogs, right? So you have a model that recognizes cats and dogs.
Starting point is 00:32:59 Somebody feeds it a picture which shows a cat and the system says, hey, it's a dog. And the user says, not really. And so you feed that the data back as well as the metadata about the inference, which is, hey, this was not recognized as a cat. Right. So again, to oversimplify, I've got a JPEG and it's got metadata that says it's a dog. And we sent the JPEG through and the system said it's a cat. When I retrain, I add metadata that says you thought this was a cat last time. Yes. So the adaption cycle is...
Starting point is 00:33:42 So here's the question, Frederick. Does the adaption cycle just go and process the new data and new inferencing and the new metadata? Or does it go through a whole complete pass across the training data plus the new information? Typically, it just takes the base model and modifies the base model. Okay, so a training instance with the new data. Yes. I mean, remember that when you deploy an application for the first time, you have a base model that works for everybody.
Starting point is 00:34:18 What you want to do is to customize it just for you, right? And so whatever you do with the system, the system will adapt that model specifically for you. And so you're not rerunning the whole thing. You're just changing the weights, you know, if we're talking technically here. You're just changing the weights such that it will be more in favor of what you said. And then depending on if the system believes that this is also applicable to other people, it might go back to the bigger pool.
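A sketch of what that adaptation step could look like, continuing the earlier hypothetical Keras example: load the existing base model and nudge its weights with only the new, corrected feedback data rather than rerunning training over the whole historical set. The file names and shapes are made up.

```python
import numpy as np
import tensorflow as tf

# Load the base model produced by the original batch training run (hypothetical path).
model = tf.keras.models.load_model("base_cat_dog_model.keras")

# New feedback only: pictures the deployed model got wrong, with corrected labels.
new_images = np.random.rand(128, 32, 32, 3).astype("float32")
corrected_labels = np.random.randint(0, 2, size=128)

# A short, low-learning-rate pass adjusts the existing weights toward the
# corrections instead of retraining from scratch on the whole data set.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(new_images, corrected_labels, epochs=1, batch_size=32)

model.save("adapted_cat_dog_model.keras")   # rolled out at the next update window
```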
Starting point is 00:34:48 And then next time when they use the bigger pool to rerun larger models, they have that new data. But it's not like, you know, in the 50 petabyte example, it's not like every four hours they process 50 petabytes, right, that they process. I was thinking that would be quite an interesting scenario. That would be a lot of bandwidth. Okay. Now I see what you're saying. Yeah, but 30 of those 50 petabytes are essentially cold data relative. So storage-wise, you know, having something that provides both economical, large amounts
Starting point is 00:35:22 of storage as well as low latency, high-performance storage, and tiering between those two, either automatically or programmatically. Those are the sorts of things you'd think would be useful for at least the training side of this coin. Would you agree with that? Yes. Training is where all the data is,
Starting point is 00:35:43 and that's where data management comes in and where you deal with data volume, right? Inference is more of a scalability thing from a unit perspective, right? So you have one unit to do inference with a model and then you have, you know, let's say a thousand instances if you want to have a thousand people using your system at the same time, right? Yeah, but going back to the storage and how to use it with AI, I think I mentioned before that GPUs are very expensive, and in order to take full advantage of those GPUs, you have to make sure that you feed those GPUs with enough data and keep them fed with data.
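One common way to attack that feeding problem on the software side is an input pipeline that overlaps storage reads and batching with the GPU compute. A minimal sketch using tf.data, with a hypothetical file path:

```python
import tensorflow as tf

# Read many files in parallel and stage the next batches while the GPU is
# busy with the current one, so the accelerator never waits on storage.
files = tf.data.Dataset.list_files("/mnt/training-data/*.tfrecord")  # hypothetical path

dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)
```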
Starting point is 00:36:31 And it's really, really difficult to do. And so what you will see is that a lot of storage vendors will come up with a solution that includes the three infrastructure components and then one or more frameworks. For example, you will see most vendors provide a solution that includes Mellanox for the interconnects and uses the DGX1 from NVIDIA. So the DGX1 is a box with a bunch of GPUs that is fully optimized. I wouldn't say it's plug and play,
Starting point is 00:37:09 but at least it eliminates the fact that a lot of people need to learn a lot about GPUs in order to get it to work. And then the storage vendor will plug in their storage device. There's many vendors who come up with this architecture.
Starting point is 00:37:30 And then on top of that, they will deliver one or more platforms like frameworks, I should say, like Keras or PyTorch or anything like that, such that they can deliver this as a solution to the customer and the customer will know that the ratio and the performance and the latency between the Mellanox, the DGX1,
Starting point is 00:37:55 and their storage solution is as optimal as possible to start with, because getting those things to work together is not an easy feat. And the DGX1 has storage as well, I mean NVMe direct access storage, or, um, it's a little bit of caching. Its goal is to provide enough flexibility so it can boot, and it delivers fast communication in between the GPU cards. There's a technology from NVIDIA called NVLink. As you can suspect, those cards typically go through the PCI bus. But if you want to have GPU cards communicate over the PCI bus, eventually that's going to become a bottleneck. So what NVIDIA decided to do is to come up with a protocol
Starting point is 00:38:54 where the GPU cards amongst themselves can talk at a higher speed such that they don't take over the PCI bus. And it does come with some storage, some RAM. It comes with networking as well. But it doesn't have enough storage to do a large amount of training, right? And I have to specify that the solutions that are being presented today are almost all aimed at training, because that's where the heavy-duty work is. So you would not use something like this for inference unless you have a model that is also heavy and requires a lot of GPU processing. You will see today that a lot of the AI applications do their training with GPUs or TPUs, but in production, they might use CPUs.
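That last point, train on the accelerator but serve on CPUs, looks roughly like this in TensorFlow. A sketch with a trivial model; the device strings are the standard TensorFlow names.

```python
import tensorflow as tf

# Pick the accelerator for training if one is present, otherwise fall back.
train_device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(train_device):                      # heavy batch training
    model = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(x_train, y_train, ...) would run here against the training set

with tf.device("/CPU:0"):                          # lightweight production inference
    sample = tf.random.uniform((1, 8))
    print(model(sample).numpy())
```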
Starting point is 00:39:56 That's interesting. Okay, I got it. Well, gosh, guys, this has been great. Howard, any last questions for Frederick? Well, actually actually just one. So these, I mean, we've seen these machine learning stacks from Pure and from NetApp that I remember. IBM's got one. I'm sure EMC's probably got one by this time. It's starting to sound like converged infrastructure for ML, which just makes sense.
Starting point is 00:40:28 Yes. And that's exactly it. I mean, the solution I described with Mellanox and NVIDIA DGX1, it's exactly that, right? I think it's a little bit of converged, but also kind of a starter kit, right? Because people don't know how to start and selling those three components separately, you know, storage, compute, and network is very, very difficult. So the converged approach is an economical kind of starter kit. Just like people didn't know how to build a data center for VMware, so VCE sold them a Vblock. So my question would be, as I look more and more at this AI stuff, it seems its applicability is so
Starting point is 00:41:10 wide. It's almost as if just about any corporate entity and any organization of any size whatsoever could take advantage of this sort of thing. And, you know, here I am, a one-person organization. I was able to do some modeling with, you know, blog popularity prediction and stuff like that. And I didn't get perfect accuracy, but I only have, you know, less than a thousand posts and stuff like that. But even at this point, I can use it and actually take advantage of some of it. Yeah. I think you said could take advantage. I should replace could with should.
Starting point is 00:41:48 I mean, in the world we live in now, it's a competitive advantage. Ignoring AI today is just waiting for one of your competitors to figure out, based on their data, how better to serve their customers. Yeah. Yeah. Has been for a long time. I remember doing the first data warehouse projects at the casinos. It's like, wait, two weeks from Wednesday, we have extra hotel rooms.
Starting point is 00:42:17 Who gambles two weeks from Wednesday? Yeah. Find those guys and see if you can't rattle their cage and get them to come. Oh, you call them and offer them a free room and a lobster dinner. Yeah, yeah. I understand. Literally. And that was AI decades ago, effectively.
Starting point is 00:42:35 Well, it wasn't AI because we had to know what we were looking for and then look in the data warehouse. The AI part would be figuring it out ahead of time. Yeah, automatically. All right. I don't know. Frederick, anything you'd like to say to our listening audience? I think everybody should try out AI. It doesn't take a lot to try AI. I mean, even in the public cloud, it's easy to try it out.
Starting point is 00:43:12 I think if you have data or you think you can take advantage of AI, you know, just try it out. Don't be scared. Nowadays, it's relatively easy to get things going. You know, you don't need a PhD anymore to do some AI. I did it on my laptop. That's crazy. Well, okay. Gents, this has been great. Thank you very much, Frederick,
Starting point is 00:43:29 for being on our show today. Well, thanks for having me. And Frederick, you're available for people with deep pockets who need help with this, right? Always. Please tell them where. Yeah. So my website is highfens.com.
Starting point is 00:43:44 So next time we'll talk to another system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. And please review us on iTunes and Google Play as this will help us get the word out. That's it for now. Bye, Howard. Bye, Ray. And bye, Frederick.
Starting point is 00:43:59 Bye-bye. Until next time.
