Podcast Archive - StorageReview.com - Podcast #148: LinkedIn Live – AI is a Data Problem

Starting point is 00:00:03 Hey, everyone, and welcome to this live session we've got going on today with our friends from Dell, Solidigm and Oregon State. We've got an expert panel together today, and we're going to talk about how all this technology is coming together to create real AI outcomes, not just in the scientific research realm, which we'll get into, but student outcomes at Oregon State and how these components all come together to make phenomenal solutions. Let's just go around the horn real fast and do some introductions. Chris, why don't you start? Yeah, I'm Chris Solomon. I'm the director of research and academic computing for the College of Earth, Ocean, and Atmospheric Sciences at Oregon State University. All right, Alan.

Starting point is 00:00:46 Hey, Alan Benegner. I work at the Solidigm Corporation. I'm the director of SOC planning and pathfinding. All right, and Seamus. Seamus Jones, Director of Tech Marketing Engineering for Computing and sustainability at Dell. All right, and as promised in the run-up for this event, there will be no slides, there's no sales pitch.

Starting point is 00:01:07 This is a technical session, and if you have questions or comments, put them in the chat, and we will bring them up and answer them live. I want to make sure this is as interactive as possible to make sure that you get value out of this session. Chris, why don't you start off and tell us a little bit about some of the projects you've got going on at Oregon State

Starting point is 00:01:26 because I think your mission, both from a scientific perspective, and then what you're doing with student evaluation, is really interesting. And you guys, being an institution that's out there in higher education, are evangelizing a lot of this technology. So the work that you guys have done is pretty critical there to help put some reality around AI outcomes. Yeah, thank you.

Starting point is 00:01:48 I've been here 27 years, and the position I'm in allows me to straddle both the research side and the academic side and help us kind of move down the pathway of new technology. And so we have a lot of different projects. I'm both a researcher and an administrator. And so some of the big projects that we do are helping us monitor the planet with plankton and understand ocean health and climate change and things like this. And it takes big data to do some of these.

Starting point is 00:02:18 And we're monitoring the forests for endangered species and stuff. But when we look at the academic side, we're also trying to roll technology towards the students to change the way that we're interacting with them, to kind of meet them where they are so we can be more of a mentor and less of just a disseminator of information. How do you balance those requirements? Because those are all very different things, listening for birds in a forest and helping student evaluations and helping enable your professorial team to do more there. How do you figure all that out? Yeah, I mean, in the end, a lot of it is data. Okay, so we live in the data world. And I've said this over and over

Starting point is 00:02:58 again that AI is a data problem and science is a data problem. And so these two are really well married together. And when we started looking at how LLMs come in, we're really seeing the time of science coming forward to leverage technology in ways we've never seen before. And so one place where we all really do come together, regardless of scientific domain or academic domain, is compute. And we use the same type of compute, but we use it in vastly different ways. And so that's the important piece, is that it's generally the same technology: CPU, GPU, storage. It's how we're putting those pieces together and lining them out to create a solution. And that's really what we work with companies to help us identify, is pathways around developing solutions with technology

Starting point is 00:03:48 that solve problems in specific areas. And so that's... Chris, can I ask you a quick question about that? I guess something that comes to my mind that I've seen at a lot of other universities and just customers in general: the training aspect versus inferencing. You know, training is taking up only a small portion of resource in some of these universities. But the inferencing piece, which is what you're talking about, I mean, it's vastly larger, I mean, by factors of 50-plus, right? I'm wondering, like, tell us a little bit about that. If we just look at the plankton, for example, which is a massive project where we take ships out into the ocean and put 8K video cameras into the water.

Starting point is 00:04:34 And we're filming that. And it's generating 100 terabytes every 10 days of actual data that I have to then process, which turns into about 300 to 400 terabytes for one project. That one project, we trained the data over the course of weeks or so. But to do the actual inference just one time was 11.8 billion images that I needed to go through. And I needed to get through it in a temporal aspect that was very rapid. Because if it takes me months and months and months and months, the information is irrelevant for the planet to interact or to make changes

Starting point is 00:05:09 and for us to make changes and to understand what we need to be doing. And so what we've got to do is use technology to isolate and get the data through in a timeline that's meaningful for the planet. And so, yes, we are impacted right now with inference workloads, but we also have to still retrain. And so we take that inference workload and we backfeed it back into our training models to make them more accurate. The other thing I need people to understand is that we still run a lot of simulations. Some of the biggest compute in the world with simulations over time.
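
As a rough sketch of the arithmetic behind those numbers: the 11.8 billion images and the 100 terabytes per 10 days come from the conversation above, while the deadlines, function names, and the idea of a single sustained rate are illustrative assumptions, not details of the actual Oregon State pipeline.

```python
# Back-of-the-envelope throughput for a single inference pass.
# The image count and capture volume are quoted in the conversation;
# the deadlines below are hypothetical.

SECONDS_PER_DAY = 24 * 60 * 60

TOTAL_IMAGES = 11_800_000_000  # "11.8 billion images"


def required_images_per_second(total_images: int, deadline_days: float) -> float:
    """Sustained rate needed to finish one inference pass by the deadline."""
    if deadline_days <= 0:
        raise ValueError("deadline_days must be positive")
    return total_images / (deadline_days * SECONDS_PER_DAY)


def capture_rate_tb_per_day(terabytes: float, days: float) -> float:
    """Average raw capture rate for one experiment."""
    if days <= 0:
        raise ValueError("days must be positive")
    return terabytes / days


if __name__ == "__main__":
    print(f"capture: {capture_rate_tb_per_day(100, 10):.1f} TB/day")
    for days in (7, 30, 90):  # hypothetical deadlines
        rate = required_images_per_second(TOTAL_IMAGES, days)
        print(f"{days:>3} days -> {rate:,.0f} images/s sustained")
```

Under these assumptions, even a 30-day deadline works out to roughly 4,500 images per second, sustained around the clock, which is why a pass that drags on for months stops being useful.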

Starting point is 00:05:41 And these simulations actually inform the training. And so we can't actually train without doing some simulations to actually inform how the training needs to go. And those simulations also take big compute. So yes, the training actually gets to be one of the smaller compute steps, but on both sides of it, whether I'm doing simulations to build the data around the training, or I'm actually applying the trained model into the inference world, you're looking at massive, massive, massive compute and massive data. I feel like, Brian, that's something that, like, a year ago, or maybe even two years ago,

Starting point is 00:06:17 like the main focus was, we've got to train these frontier models, right? And it was all about, okay, how do we focus on a huge cluster that's going to be deployable within the main data center? And you're going to need the guts of a quarter of a megawatt of power to be able to put that in place and then get that model trained up. But now we're seeing, you know, what Chris is talking about, things are happening at the edge where we're actually doing this inferencing framework on products that are non-traditional, that are like, I mean, Chris has found some really unique use cases for some of the Dell products, like the 7745, you know, the XE7745 with 8 GPUs. Super exciting. Honestly, I never thought of that being deployed, like, on ships and different use cases, you know? It's important to realize that this is where we were looking at high density or large capacity SSDs because I can't take my data center out into the ocean.

Starting point is 00:07:21 And I still need to have a large amount of space. We're talking about, for one experiment, capturing 100 terabytes, okay? And I have 15 experiments going on on the ship at the same time, and I still have ship operations and all these other pieces. And so what we were really looking for was a platform that was reconfigurable on the fly or able to take lots of different configurations for us to set it either out into the ocean or into the forest or into a data center. And so when we looked at the piece of equipment that we own, which is one of those XE7745s, I can put 16 GPUs in it, or I can put 8 GPUs in it. I can load large capacity Solidigm into it.

Starting point is 00:07:59 And so really what it was, was it was a moldable platform for me to accomplish any kind of research or work I needed to do. And when we're talking about needing to put a petabyte out onto the ship into a single piece of equipment, I can do that right now with the Solidigm drives that we've got, with the 122s. And that was one of the key pieces I needed people to understand, is that I can slide 122 terabyte drives into these chassis and bring online a very redundant space that allows me to work at breakneck speeds. I mean, the speeds that we were seeing for reads were in the 57 gig a second, and the writes were in the 17 gig a second. And so the machine supports this massive bandwidth to talk to that data store.
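
To put those figures side by side, here is a minimal sketch of the capacity and bandwidth arithmetic. It assumes decimal units (1 PB = 1000 TB), that "gig a second" means gigabytes per second, and that the quoted rates are sustained; redundancy overhead is not modeled, and the function names are illustrative only.

```python
import math

TB_PER_PB = 1000  # decimal units, as drive capacities are usually quoted
GB_PER_TB = 1000


def drives_needed(target_pb: float, drive_tb: float) -> int:
    """Raw drive count to reach the target capacity (no redundancy overhead)."""
    if drive_tb <= 0:
        raise ValueError("drive_tb must be positive")
    return math.ceil(target_pb * TB_PER_PB / drive_tb)


def hours_to_stream(capacity_pb: float, gb_per_second: float) -> float:
    """Time to read or write the full capacity once at a sustained rate."""
    if gb_per_second <= 0:
        raise ValueError("gb_per_second must be positive")
    total_gb = capacity_pb * TB_PER_PB * GB_PER_TB
    return total_gb / gb_per_second / 3600


if __name__ == "__main__":
    print("122 TB drives for 1 PB raw:", drives_needed(1, 122))
    print(f"read 1 PB at 57 GB/s:  {hours_to_stream(1, 57):.1f} h")
    print(f"write 1 PB at 17 GB/s: {hours_to_stream(1, 17):.1f} h")
```

Under these assumptions, a raw petabyte takes nine 122 TB drives before any redundancy, and a full read pass at the quoted rate takes a little under five hours.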

Starting point is 00:08:44 AI, again, is a data problem. It's not a compute problem. It is a data problem. And as we keep shoving more and more data at it, we need more compute to handle that. Remember, it's kind of like I tell Brian, it's a fork, knife, and spoon. I'm going to need a bigger fork, knife, and spoon. The food's coming out that fast. You're going to make Alan a little too excited, I think, talking about AI as a data problem.

Starting point is 00:09:08 I mean, Solidigm had these high capacity drives, the 15s, 30s, 60s, 122s now. And, Alan, the timing for the inference inflection that Seamus is talking about couldn't have been better from a use case for these high-cap drives. It's a wild time for you guys right now, I'm sure. Yeah, I would, you know, here's some interesting tidbits for you. As we double capacity every generation, going from 122, hopefully to 245 next year, the use cases that we expected are starting to change quite a bit. And I think Chris has been pioneering what I would say are the multimodal use cases more than most folks. So if you think about collecting all of these images, and some of the environments that he has to put this equipment into, you know, we were designing it for data centers. And I think that, you know, being on a

Starting point is 00:10:04 ship is pretty exciting to watch one of our drives go to. But, you know, the real capability that Chris is talking about here is being able to collect all of this data and filter it at the edge. And so what they can do is kind of a double filter setup, right? Because like Chris said, AI is kind of a data reduction exercise. So we can take all the sensor data, reduce it, bring it into a data center, reduce it again, train on it, model it. And so it's kind of exciting to see some of the new use cases that are popping up that are definitely driving multimodal.

Starting point is 00:10:41 Yeah. When you look at it, that temporal aspect is also in there. We're able to process now while we're traveling out or coming back from things. And so we're actually using that time to actually process data. We're also using the drives on the ship to help us move the ship around to get better data. Before, when we stick the cameras into the water, we don't know if we're getting good data or not. We're just collecting. And now we're able to do that semi-real time, which has been the goal we've been moving to for several years now.

Starting point is 00:11:08 You'll see that out there. And so that way we can make sure that we can actually stop, move to a different, what we call transect, and do a different transect and actually collect data in a different spot. So that way we're actually ensuring that we're getting data. Plankton don't swim. They float in the currents. And we've kind of just got to go out there and find where they are and then basically quantify them. Okay. So this ability to dynamically move the ship around ensures I'm not wasting a million dollars while I'm running that ship. Okay. I'm coming back with actually meaningful data. Right. Hey, Seamus, you hit on something before I want to revisit about the change

Starting point is 00:11:47 and what AI means when you talk about it to your customers. In the early days, I think you're right. There was a mad rush. When those first eight-way platforms came out, the socketed, what was probably A100 at the time, that was a big generational shift in terms of memory footprint, what those things could do, and now they look kind of quaint by today's standards. But at that time, we saw a lot of enterprises, and I'm sure you did too, that were worried about being left behind. And they made a big investment in hardware, but they didn't have data scientists. They didn't have AI/ML guys. They didn't necessarily know how to leverage that equipment the right way.

Starting point is 00:12:26 And I think you could make an argument now that we've got hindsight that perhaps that was a hair early to jump in, while many were super effective for sure. But what is shipping now, and how AI has morphed for the masses: I'm not talking about the hyperscalers or the big training loads on the NVL72s, but where the bulk of those systems are going in the enterprise, it's almost all inference. And the flexibility in your platforms, especially the edge card ones that we're talking about here, 7740,

Starting point is 00:12:58 the Intel version, 7745, the AMD version. I mean, that's got to be, has to have been a fun transition to watch to really democratize what you can do with these inference systems now that anyone can pick them up. I mean, there's really two things that I've seen that have changed dynamically in the last even six months. Like, I know we sat down, all of us here sat down at Supercompute, and there was a lot happening at that time.

Starting point is 00:13:25 But even in the last six months, there's been massive changes around the way that customers are deploying GPUs, right? Applications are now becoming AI aware. So you no longer have to build specific applications that will take advantage of models or model frameworks or inferencing. Some applications do have AI awareness. And it's amazing that, I mean, things like Oracle, things like SAP are now using and utilizing GPUs to make their application even more performant. So while, yes, inferencing is a large portion of GPU use cases, what we're actually seeing, too, is that

Starting point is 00:14:10 even traditional core applications are becoming aware and are using GPUs. So we're having to see a lot more use cases, even right down into traditional rack and edge products, like R770, like your 2U, two-socket platforms. You know, every customer that's coming through the door mostly wants to see, how do I get the most efficient GPU into that platform? So that way, if down the road I do want to turn that into an inference box or use it for specific key applications, I can. We spoke quickly about a mixture of experts.

Starting point is 00:14:47 I know Alan had brought that up, and in us working with several software partners that are, like, developing some of these use cases, like Metrum AI, I know they worked with Chris and team. We're able to see how these agentic workflows are dynamically changing the token count that's being applied to these inferencing platforms, as in it is escalating on a level I've never seen before. If you were to take a standard deployment,

Starting point is 00:15:23 and okay, if I have a certain number of users, I would look at maybe concurrency and time to first token, and I'd look at latency of the usage within each of the platforms. With the agentic workflows, you're seeing these spikes just go through the roof because they're multiplying out by 8, 10, 15 times the amount of input tokens into a model for these inferencing frameworks. And as a consequence, the system needs to be able to deal with that spike. It's really important that you do a few things, but one key thing in my mind is that you're building and scoping for that type of growth. So that way you're not just planning

Starting point is 00:16:10 for your need today. You've got to really plan and deal with some expertise. I know, you know, we were talking about two years ago. It's hard to plan for this. Now we're starting to get foresight because of some of these universities and people that are pushing the boundaries

Starting point is 00:16:27 that we're able to say, look, I know that your specific need today based on your workload and use case is this, but AI is like Gremlins. Once you, you know, feed them after dark, they're going to multiply. These agents go crazy. So it's one of those things where, you know, there's no getting around the growth that's happening. You're going to pull out an 80s movie reference to make the analogy work?
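
The scoping point above can be sketched as simple arithmetic. The 8, 10, and 15 times multipliers on input tokens are the ones quoted in the conversation; the user count, request rate, and tokens per request are hypothetical placeholders, and the function name is illustrative rather than part of any sizing tool.

```python
def input_tokens_per_second(concurrent_users: int,
                            requests_per_user_per_min: float,
                            input_tokens_per_request: int,
                            agentic_multiplier: float = 1.0) -> float:
    """Estimated input-token load an inference platform has to absorb.

    agentic_multiplier models agents fanning one user request out into
    many model calls; 1.0 means a plain, non-agentic deployment.
    """
    if agentic_multiplier < 1.0:
        raise ValueError("agentic_multiplier must be >= 1.0")
    per_minute = (concurrent_users
                  * requests_per_user_per_min
                  * input_tokens_per_request)
    return per_minute * agentic_multiplier / 60


if __name__ == "__main__":
    # Hypothetical baseline: 50 users, 2 requests a minute, 1500 input tokens each.
    baseline = input_tokens_per_second(50, 2, 1500)
    print(f"baseline: {baseline:,.0f} input tokens/s")
    for multiplier in (8, 10, 15):  # multipliers quoted in the conversation
        load = input_tokens_per_second(50, 2, 1500, multiplier)
        print(f"x{multiplier:<2} agentic: {load:,.0f} input tokens/s")
```

Sizing only for the baseline row is the "planning for your need today" case; the other rows are the spike the platform would need headroom for.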

Starting point is 00:16:54 It was the bus thing that we were talking about, because really the changes to the bus architecture were crucial for us to be able to do what he's talking about. There was an actual fundamental change when we went from PCIe Gen 3 into PCIe Gen 4 that allowed us to interact with those cards like an accelerator. And it allowed us to also break up our workloads across the cards. Before that, we had to just be all or nothing out to the GPU. And now we can kind of do both. And if we look at it really, really properly and we take some of those new CPU processors, like the new-gen ones and stuff like this that we've been testing, you can show that you can run inference workloads on them pretty well. And so you can actually double your hardware capabilities by still running inference workloads sometimes on CPU while you're still training on GPU. And so people need to realize that we're having a collaboration happening between CPU and

Starting point is 00:17:49 GPU. We watched this happen years and years ago between the original processor and the math co-processor and ultimately they came to being on the same die. And we're watching that same process happen right now, whereas Nvidia is now putting the GPU and the CPU on the same board and almost the same die, we're going to see another evolutionary step to where the accelerator is treated like an accelerator on the same die. So that will actually give us even more throughput and more bandwidth. And we will start leveraging them together. We already have groups who have started to develop applications where they take one little step and go out to a GPU and come back because they no longer have to send everything out. The bus now accommodates us interacting at that level. And that's what I think

Starting point is 00:18:31 Seamus is trying to get people to understand, is that for agents to be able to interact at the level that we're talking about, that bus, which was based out of the Coherent Accelerator Processor Interface, really changed the way GPUs and CPUs work together. Can I ask one really quick for Chris? Like, you know, we kind of, you know, Seamus did a really good job of, like, showing, you know, the servers that they make are very hardened and useful at the edge. And you're deploying it in some very unique ways. But I get this feeling that you have some stuff that you're not doing that you'd like to have hardware capable of. Are there any wishes on the wish list?

Starting point is 00:19:14 I mean, ultimately, what I need to do is I need to basically take all of my data. My data stores need to have RAG models put into them so that I can make my data come to life. And I need to be able to then present a pathway for my researchers to say, go grab all the data that's in this area of the world from this time, because really what's happening as we create these massive data stores is data is getting lost, and the only people who know that data is there are the people who put it there. And what I really need to do is make that scientific data and stuff accessible to the world. And to do that

Starting point is 00:19:49 properly, we need to start doing RAG models and things like this across the data systems that use data that's not part of the file system: not the time the file was created and not the size of the file. I really don't care about any of that, okay? What I care about is metadata that the scientific community can inject around that file space and around the data in those files to say this file is valuable because of this, this, this, and this. And so that way I can now come in and I can make massive use of massive datasets that we're not able to take advantage of right now. If we just looked across the genomics group and I went over to their file systems and I looked and I said, how many copies of the Arabidopsis genome are just sitting here? Because it's a model organism.

Starting point is 00:20:32 There's going to be thousands of copies of it out there, okay? And that's because we're not using the file space appropriately. And people aren't able to query the file space and say, where is the Arabidopsis genome already pulled down? Okay. And so these are problems. And what we really want to do is make that data more useful. Again, this is a data problem. And the more that we can get out of our data, the more value we're going to get out of our computers.

Starting point is 00:20:55 You know, everything comes around in cycles, right? And the way you're describing the data tagging, it's like the M&E space at NAB a decade ago, where they're talking about, we've got all these file management systems to help you know where your videos are and what they are and all these things. But the enterprise, I mean, this is not just a scientific problem. The enterprise still continues to struggle with knowing what data they have and then trusting the guardrails that they set up if they're going to bring in a RAG to be able to query that data. I mean, Alan, I'm sure as a data guy, you're seeing this, and everything Chris is saying is surely resonating with you in terms of, you know, Solidigm's job is to help you store it, but at a certain point, you've got to understand

Starting point is 00:21:37 what you have so that you can, in the commercial world, enterprise-ize it, right? I have a great joke for you. I was in New York at a conference about a year ago, and I think probably 150 people were in the audience that day, and I happened to be on a panel of CPU experts, so as kind of the SSD or data person, it came around to me, and, you know, we had this great conversation about how to train a model and how to set up the compute and everything. And so I, you know, looking at the, and this is mostly an enterprise IT conference. And I asked the question to the audience. I was like, hey, so I'm the data guy.

Starting point is 00:22:13 Like, how many people in the audience think their data is clean enough to run through a model right now? And only one person raised their hand. Guess who that was? It was Gary Grider at Los Alamos National. So I was like, that doesn't count. And here's the problem that we're at, and you're describing it pretty well. Yeah.

Starting point is 00:22:34 I mean, in the end, we have an amazing ability to store things as humans. I don't care if it's your garage. I don't care if it's your hard drive on your desktop, and I don't care if it's a disk array on a server. We will store the crap out of stuff, okay? And so what we need to start doing, just like in our garages,

Starting point is 00:22:50 is figure out where everything is and how to make use of it. Otherwise, it's useless data. And we don't want to do that, okay? We don't want to become data rich and information poor. And the reason that's going to happen is not being able to put these LLMs interacting with that data at the highest possible level. Well, Chris, now I'm going to aggravate Alan. You're not actually advocating for deleting anything, are you?

Starting point is 00:23:19 It's data management, though, right? I mean, okay, that's fine. I do have to clear off experiments. Let's talk about that. So my plankton one, where I did that 11.8 billion. And at the end of it, you know, we have to clear that off. I have the original videos. We could rerun it at any time.

Starting point is 00:23:39 And I keep the downstream output. But that's because I need to do another experiment. I don't have infinite research dollars to just buy infinite space from Alan. I want to. But they don't give me infinite research dollars. And so I actually do have to clear off the inference workloads. And Seamus has to appreciate that. Same thing happens on the educational side.

Starting point is 00:23:59 Let's talk about that. So the students, one of the reasons that we loved the machine that we were using is Metrum AI, phenomenal group to work with. We worked with them to bring this new AI evaluation tool along for students. And it uses the same platform that we've been talking about, that 7745 that we have here, and we're running it on that same platform. But the most important thing was that we needed to manage student data, which involved FERPA. And FERPA is a compliance piece that follows along with all of the different types of information that we have.

Starting point is 00:24:32 And this is where student data has to be protected. And so for us, being able to put large capacity drives on a single solution allowed us to create a teaching piece of equipment that could store all the data, manage all the data appropriately, and keep it secure, private, on-premise. So a lot of the students didn't want necessarily their stuff going up into the cloud. And it allowed us to exercise a piece of equipment in a multitude of ways. And so we were able to then isolate sections of the piece of equipment to do different aspects of the agentic workflows to create a robust solution to the students that could interact with a large data repository that was on the same machine.

Starting point is 00:25:13 Chris, how do you take that unstructured data and create rule sets to be able to manage that over a long period of time? Because it's easy enough to set policies and detail for structured data. But when it's like audio files and, you know, because with that use case, it's super interesting in that, you know, technically professors, what, professors were taking students' end of year, or excuse me, thesis statements and things like that, that were actually, yeah, that they were actually putting them in place as an oral exam. And so then that way they can confirm that they actually have a transfer of learning instead of doing a paper that could be half written by AI. Half? I think that's being generous. Yeah, so we've recognized that the students are really good

Starting point is 00:26:08 with using a camera and recording themselves. So really, this evaluation tool that Metrum came up with, along with Oregon State, so we worked with them. We meet the students where they are, and it really reduces their activation energy to actually communicate their knowledge. And really, that's what we're trying to do.

Starting point is 00:26:28 We're trying to show that they have knowledge and have gained that knowledge. We don't care how that testing occurs. Now, we're also trying to not impact the professor. So if a professor had to sit there and do 300 oral evaluations for every assignment. Now, we're not just doing this for a thesis. We're doing this for assignments, okay? And so students can just record themselves answering questions for their assignment, upload that into the system, and the evaluation tool evaluates that and tells the professor

Starting point is 00:26:56 where things are and the professor can then, you know, basically push that forward. So the students are getting their information back faster. We're not impacting the professor at a higher level in ways that they can't get the work done. We're able to put more students in seats getting the same information or better information to them. And we're acting more like a mentor because of these agentic workflows that we're doing. How large are some of those 101 classes? Like those are like 300 plus students, right? 300.

Starting point is 00:27:25 Why not? Yeah, I mean, that is a lot of, that is a lot of 20-minute presentations to go through. You're right. Now, I'm taking not just their end of the class evaluation or thesis, it's actually class assignments throughout that. So we have 300 students submitting an assignment.

Starting point is 00:27:44 Then I have another assignment. Then I have another assignment. And we have multiple classes. And so the data store on that system has to be very large, okay? And it has to be able to be very fast to process. It has to be ingesting while it's processing, while it's sending out other outputs and stuff. And so that's why we needed very large capacity drives, and we needed it to be local, both for compliance, speed, and all those other pieces, and redundant.

Starting point is 00:28:08 And we also need to hold it for a certain period of time after the class is done. Okay, so people need to appreciate that I can't just clear that off immediately. I will down the road, but not until we get a certain distance past the class, because students have an opportunity to, you know, do an incomplete or do some other things to finish out their class a little bit later if some problem came up, okay? And so we need to make sure that we're still retaining that for almost a year, I believe, afterwards. And so when we look at this, that drive space has to help all these classes operate, store it for a year, help more classes operate. It's really impressive to watch us be able to put petabytes onto a single machine

Starting point is 00:28:51 without any effort that's fully redundant and meets all of those compliance and storage needs. That's really important. Hey, Seamus, let me ask you something too about something Chris said about the cloud, how they've got some reasons why they don't want to use the cloud for this specific project.

Starting point is 00:29:07 And the cloud's still going to be the gateway for a lot of people to experience or experiment with AI because it's just, it's there, it's accessible, it's instant. But I will say, Dell Tech World's coming up here in about two and a half weeks. Amazing show, by the way. For anyone that's going, we'll have a bunch of people out there, come say hi. But the last two years on the keynote stage, Michael's been very clear that for the enterprise

Starting point is 00:29:36 inference workloads, those have come back or are coming back on-prem after there was maybe an over-rotation in some cases into the cloud. You're a hardware guy, so I mean, you like to move boxes, but what are you hearing about that specifically from your customers? Yeah, there's several reasons why hardware's become cool again, if you will. I mean, that framework has come back. I mean, the first one was something that Chris actually mentioned is the fact that having that data set on-prem is critical, right?

Starting point is 00:30:14 For one, for security and compliance, right? So a lot of customers are worried, whether they're based in Europe or in APJ, because if you're using a cloud service provider like AWS or others that are in the market space, and they have data centers in the U.S., even though that data might not be held in the U.S., the U.S. government has access to that data set no matter where it's located. And so especially for European firms, I've even heard of some companies in France, for example, that have been not using Zoom for their meetings because Zoom's a U.S.-based company. And if you record the meeting, then that could be, you know, that data set could be taken.

Starting point is 00:31:03 So they're using a proprietary, actually there's a French company that is a Dell customer that we provide a lot of hardware for, but there's a lot of different options there. The other piece is that what we've seen, especially within, I mean, the challenges that customers have on these, both mixture of experts and agentic workflows, is that the token counts go crazy. And even on a personal, like,

Starting point is 00:31:29 if anybody's even played with OpenClaw or, you know, Claude or anything, you realize very quickly that you can only afford to put a certain amount of tokens into the cloud. And boy, wouldn't it be nice if, for example, I could iterate, you know, on my code development and things. Because every time I make a small change in code and then I iterate on that, I have to put a series of tokens into the cloud to make sure that it works in the framework. So you can see. We need to talk about tokenomics for a second. Okay.

Starting point is 00:32:10 Yeah. So tokenomics is actually a thing that the university cares a lot about. And how are we going to enable all our faculty, staff, and students to have a token allocation or a budget to be able to get work done? And so part of that actually is on-prem. Believe it or not, you need to maintain some level of on-prem to reduce your tokenomics problem. Okay. So I can do a whole bunch of that fun. I don't know what I'm doing.

Starting point is 00:32:39 I'm just going to go over here and I'm going to waste some tokens on-prem at a much different cost measure than I'm going to be doing that in the cloud. And then when I go to the cloud, I'm going to be very specific because I've already kind of figured out what I need and where I'm trying to go. And I'm going to be very specific and get exactly what I want for the tokens that I'm paying for up there. And so when we look at that, buying equipment on-prem has been a tremendous value for our groups to be able to use open source LLMs like gpt-oss-120b and all of these, and Qwen and stuff like this, and OpenCode, even around Claude, to be able to accomplish a lot of tasks that we don't need to pay tokens for.

Starting point is 00:33:19 They're still low enough that they're able to be done. And so that takes off that lower level, creating much better tokenomics for the stuff that we really need the higher level stuff for. There's two things that are happening there, Chris. One is that we're seeing customers, even if they're using cloud infrastructure, they're putting a system, let's say, in front of their query requests to optimize the queries to reduce tokens. It's really a token, it's like a standard R770 that's a token optimizer. And it can drastically, it can cut what you're spending by about eight times. Because what it does is, using some open source tools, you can reduce that token count. What were you going to say, Alan?

Starting point is 00:34:06 I was going to add, like, you know, we've actually had crazy requests where somebody's doing a lot of caching, so they're doing a lot of key-value caching. And today you do that in HBM, and there's a few ways that you can overflow that into solid state drives. And, you know, everybody thought that you would want that to be really fast. But I think what we're seeing with some of the models that we're playing with is the larger that cache can be, the faster your model gets to go, because the more intelligence you have just sitting there stored, waiting, so you don't have to recompute it, right? And so I think, you know, the progression of some of these things, you know, that Seamus is talking about,

Starting point is 00:34:45 and maybe to Brian's earlier point, and definitely to where you're going, Chris, is that that little short-term memory piece, right, starts to become as important as that storage piece that you're pulling that data from. So we're watching, you know, tokenomics go through the roof, and people are trying to balance it with, hey, what do I not have to recompute? What can I just store because I've done it once already? Correct. And that's a big deal for us because we have a lot of these chatbots that we've put forward around some of our big research projects, but the same questions come in over and over again from different people. And I don't want to burn costs around those things. I basically can spit back out the answer at no cost.
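The "same questions come in over and over" point can be sketched as a small answer cache sitting in front of a chatbot. This is only an illustration, not the system Oregon State runs: the `AnswerCache` class, the `ask_model` callable, and the normalization rule are all assumptions made up for the example.

```python
import hashlib
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different phrasings match."""
    return " ".join(question.lower().split())


class AnswerCache:
    """Hypothetical wrapper that answers repeated questions from a local store."""

    def __init__(self, ask_model: Callable[[str], str]) -> None:
        self._ask_model = ask_model      # the expensive model call (assumed)
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, question: str) -> str:
        key = hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()
        if key in self._store:
            # Seen before: return the stored answer, no tokens spent.
            self.hits += 1
            return self._store[key]
        # First time: pay for the model call once, then remember the answer.
        self.misses += 1
        answer = self._ask_model(question)
        self._store[key] = answer
        return answer
```

An exact-match cache like this only catches near-identical wording; catching paraphrases would need something like embedding similarity, which is outside this sketch.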

Starting point is 00:35:25 You talked about the models that are out there now that the public consumes, Claude, ChatGPT, whatever. And I've heard more lately, just in the last four or five weeks, about end users running up against the end of their subscription credit and then getting rate limited while it waits to refresh. And those models aren't great at telling you until right at the end. They'll tell you you're almost out of tokens and you'll do one more question. And now you're out of tokens. Swipe your credit card to keep the game going. But end users have no concept of what a token is. They don't know that the question they're asking to go research is hard or easy or anything.
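One way to picture a token allowance that warns earlier than "right at the end" is a budget tracker with several warning thresholds. This is a rough sketch under stated assumptions: the `TokenBudget` class, the threshold values, and the message wording are invented for illustration and do not describe how any provider or the university actually meters usage.

```python
from typing import List, Sequence


class TokenBudget:
    """Hypothetical per-user allowance that warns well before it runs out."""

    def __init__(self, allowance: int,
                 warn_at: Sequence[float] = (0.5, 0.8, 0.95)) -> None:
        if allowance <= 0:
            raise ValueError("allowance must be positive")
        self.allowance = allowance
        self.used = 0
        self._pending = sorted(warn_at)  # thresholds not yet announced

    @property
    def remaining(self) -> int:
        return max(self.allowance - self.used, 0)

    def spend(self, tokens: int) -> List[str]:
        """Record usage and return any warnings newly triggered by it."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.used += tokens
        fraction = self.used / self.allowance
        warnings: List[str] = []
        # Announce each threshold once, in ascending order.
        while self._pending and fraction >= self._pending[0]:
            level = self._pending.pop(0)
            warnings.append(
                f"{round(level * 100)}% of allowance used, "
                f"{self.remaining} tokens left"
            )
        return warnings
```

Surfacing the remaining balance at 50% and 80%, rather than only at the limit, is the cell-phone-minutes style of feedback the panel goes on to describe.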

Starting point is 00:36:07 So I'm wondering from your perspective, how are enterprises or institutions like Chris's, and Chris, you can answer too, thinking about how much user education there has to be? I mean, you say, Chris, you give these guys a token allowance, but who knows what it's worth? I think we're at that starting point where everybody doesn't really understand it yet. And it's kind of like companies are going to treat this like our cell phones and minutes back in the day. People are going to learn really quickly where they want to use their minutes and where they don't want to use their minutes. Okay. And so we're at that bridge moment where we're bridging ourselves into this. And yes, I think the university has to go through and reevaluate

Starting point is 00:36:49 fees, student fees and stuff like this. So that way we can take fees that we're not using properly anymore and use them for creating this other pathway in the same cost models. But yeah, I think that we're at that paradigm change. It's a funny one.

Starting point is 00:37:06 It's a funny one, Brian, because you know what? It's a completely different dynamic when you're looking at cloud offerings versus on-prem. Right? And what Chris is talking about, yeah, absolutely. Like trying to make sure that, look,

Starting point is 00:37:23 I'm using the most intelligent, largest, you know, 7 or 4 billion parameter models on things that are my most difficult problems, right? And data analysis and framework. But then these things that are, let's say, some 2 billion parameter models, you can deploy on CPU, on CPU with AMX, on the AMD portfolio as well. So there are options to be able to deploy almost like a

Starting point is 00:38:01 towards certain agents or certain agendas and they can be solving very specific problems. And actually, in some instances, they're actually getting better results because they're not sorting through the whole corpus of data that's out there in the universe to solve a very simple, direct question. And so what we're seeing is that there's much better performance if you have some of these specialized frameworks in place, but it does take some strategic planning on-prem. If you were to do that in a cloud infrastructure, your costs

Starting point is 00:38:36 would go nuts, right? Because you just find GPU after GPU to be able to deploy. Just think about a researcher trying to use a cloud infrastructure or a cost model. I don't know what they're going to type. I don't know where they're going to go next. I don't know the scope of this. It's research, guys, research. Again, we don't know what we're doing. We're just doing. And so in the end, it's hard for us to understand what the costs are and put that into a grant. Having something local that we can just sit there locally and go question, question, question, and then say, if this question is not being answered right, take that to the larger one. Yeah. Yeah, there's multiple times that even within our models that we've deployed for some of our

Starting point is 00:39:16 workflows where I'm not sure if I need a more intelligent model, I'll actually deploy it on a smaller model that might be low cost or no cost for us. And then if I don't get the answer, like, guess what? I'll ask that same prompt to a more intelligent model. So I'm actually causing, yes, I'm causing more tokens. So what? It's on-prem. I know that there's a threshold. I need a model to tell me what model to use. Yeah. We're going to create an agentic workflow. And our operating systems are going to turn into LLMs, to where you're coming in, and it's going to do stuff locally, and you're going to say that's not a good enough answer, and then it's going to expand out. Okay? And that's what our subscriptions are going to do for us, just like Netflix and things like

Starting point is 00:39:59 that. And so, really, what the on-prem needs is an ability to burst into large amounts of space as a RAM disk. What we really want to do is turn those SSDs into RAM disks so that I can basically just stage all of the data over there and basically just be querying the crap out of it, all my big... And I can put bigger models on smaller GPUs. Okay, there I said it. I'm done. I mean, we're seeing this intelligence being embedded in all the storage products now. I mean, we just looked at some of the AIOps functionality in PowerStore.
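The pattern Chris and Seamus describe a moment earlier (ask the small, cheap on-prem model first, and only take the prompt to a larger model when the answer is not good enough) is often called a model cascade. Below is a minimal sketch of that idea. The `Tier` type, the `good_enough` check, and the cost units are all hypothetical, and in practice deciding whether an answer is good enough is the hard part.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    generate: Callable[[str], str]  # hypothetical model call
    cost_per_call: float            # illustrative units, not real pricing


def cascade(prompt: str, tiers: List[Tier],
            good_enough: Callable[[str], bool]) -> Tuple[str, str, float]:
    """Try tiers from cheapest to largest; return (tier name, answer, cost spent)."""
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    answer = ""
    for tier in tiers:
        answer = tier.generate(prompt)
        spent += tier.cost_per_call
        if good_enough(answer):
            # Stop at the first tier whose answer passes the check.
            return tier.name, answer, spent
    # Nothing passed the check: fall back to the last (largest) tier's answer.
    return tiers[-1].name, answer, spent
```

As Seamus notes, escalation means some prompts are processed twice, but when the first tier runs on hardware you already own, the extra tokens cost far less than sending everything to the largest model.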

Starting point is 00:40:31 They're doing stuff like this, really advanced, on PowerProtect. Like, there's a lot going on there to help make those systems more intelligent. But more importantly, it's about optimizing the workflows for the admins that have to work against those, right? So that when you log in to the system you're administering, you've got a panel that pops up and says, hey, while you were gone, I was running in the background and noticed these eight to ten anomalies, I already ranked them for you. And here's even some suggestions on how to remediate these issues. I mean, that level of embeddedness with these smaller models that are targeted, like, Alan, I think you were talking about, being

Starting point is 00:41:09 specific models for specific tasks, I mean, it's absolutely amazing how much is being bubbled up there. On campus, we are already running that exact same system on-prem to monitor all the classrooms and all the tech in all the classrooms. So I can tell when a piece of tech in a classroom is going down, because I can't have that classroom down; classes are negatively affected. And so we have people who are actually just sitting there watching with AI agents saying, hey, something's happening over there. And it's literally watching, hey, these three things can't talk, something up higher is having a problem. Okay. And so the agents are starting to be able to associate those pieces. And that has to be on-prem. I'm not going to have somebody from the outside

Starting point is 00:41:50 watching over all of my stuff on the campus. We were talking about hardware before, and we didn't really get into the GPUs at all that Chris has been using in this latest work. But those RTX Pro 6000s have really changed the game. And I know you support even the, what, 4500 now? I lose track of the model number. Yeah, the RTX Pro 4500. How fundamental, from a hardware standpoint, do you think that change has been? From what was, I guess, L40S, the prior stepping stone, to this feels like a dramatic change. Yeah, the progression from, like, that A100 and L40S and L4 back a few years ago, or a year ago, I mean, I feel like that's a decade's worth of progress that's been made in about a year and a half or two years. And really what we're seeing is it's a

Starting point is 00:42:45 funny one because instead of us looking at, you know, there is obviously, like, the GB300 and some of these very powerful eight-way GPUs that are available for the neocloud providers, and large rack-scale systems of NVL72 and things like that. The big thing that I'm seeing, though, in these inferencing clusters, is power. So the power consumption that a normal data center has... When I say data center, it doesn't even have to be a data center. It could be, like, let's say a manufacturing company, they've got two server racks that might be able to push 12 to 15 kilowatts per rack, right? You're not going to be able to buy some of these GB300, large liquid-cooled solutions that are pushing the edge. And as a consequence, we're seeing innovation saying, okay, we need more powerful GPUs

Starting point is 00:43:43 that are drawing less and less power, right? And more efficiency with the CPU-to-GPU correspondence. And so what you're going to see, for example, AMD on their next generation of product, they've already made announcements that their next generation of product is going to have even better correspondence between the CPU and GPU communication, so, like, two times the bandwidth.

Starting point is 00:44:11 And as a consequence, we're developing newer products that we're going to be announcing at Dell World in a few weeks that are going to be able to take advantage of that new bandwidth, and the lower power GPUs mean that you can then scale those workloads as you need. So you don't necessarily buy full systems today. You can, I'm sure Alan will be happy about this, you can buy systems that are maxed out on drive counts, right? And enough memory for your KV cache, like we were talking about. And then you can scale your GPUs anywhere from one to four to eight GPUs and beyond. So it's... We need to realize that storage is not consuming the power it used to if we go with the new model of high-capacity SSDs.

Starting point is 00:45:01 Okay. So we're putting all that power back into the process and where it belongs. Now, we need to realize that it doesn't matter what new technology or graphics, I mean, I don't care if it's Grace Hopper. I don't care if it's a bunch of RTX 6000s. The moment I start to light them up, they make a lot of noise if they're air-cooled. I'm not going to lie. These things are freaking loud, bro.

Starting point is 00:45:20 They sound like they're going to take off. I feel like they've got to strap the damn thing down. Okay? Yeah. We brought liquid cooling into our data center because it gives us back performance. I get better performance on the CPUs and the GPUs; whether you believe it or not, it's absolutely true. And number two, the noise is dramatically different. I could put a liquid-cooled workstation with liquid-cooled GPUs and CPUs into a desk, into an office with people working, and they won't even know.

Starting point is 00:46:02 Okay. But if you turn on one of these big machines, I don't care if it's a Grace Hopper or whatever, they will scream bloody murder when you hit them. Yeah, those are scoped at anywhere from 75 to 95 decibels, which OSHA, you know, OSHA regulations are,

Starting point is 00:46:20 look, you need ear protection anywhere over 70. So it's more when we're hitting them hard. And they're fine to use until you literally hammer them, every GPU is going and every piece of, and then they're screaming. They're at full go. So I'm a,

Starting point is 00:46:36 I'm all about liquid cooling. Alan, you guys have done a lot of work on that with cold plates for SSDs. I know the SSDs aren't the hottest thing in there anymore, but for these systems that are full liquid where they're removing all the fans, you've got

Starting point is 00:46:52 no choice, right? It's kind of funny. We got the request because we were the last air-cooled piece in the server. And most of the GPU servers, and everybody, to Chris's point, doesn't want to hear the fans anymore because they get so loud, you need hearing protection. And especially if we're trying to wedge

Starting point is 00:47:08 these things into places that maybe they don't belong. And I'm sure Seamus has probably got some really good stories about where he's found, you know, hardware that they've made. But I would tell you that, you know, for us, it was a pretty simple transaction. You know, our drive comes as a PCB with a little case around it. So we just thermally coupled the material and added a spring-loaded liquid cold plate. So when you slot it into the server, the plate sits right up against it, and it cools it all the way around the case because of the way we designed it.

Starting point is 00:47:38 We've also had some drives that we've been working on warrantying for immersion cooling. And so as you get through all of the different versions of cooling and the power usage effectiveness ratios, we're at, you know, air was, you know, probably 2x, counting the watts to cool it on top of what you were using to run it. I've seen liquid plate down to probably 1.2, so 0.2 extra, you know, 20% extra power to cool it, and I've seen, you know, immersion cooling down to

Starting point is 00:48:13 I think, 0.05, so 5% extra power. And so, you know, it all kind of relates at the end of the day, where, you know, if we're trying to shrink the power on storage as much as possible by increasing density, and then at the same time we're trying to give as much power back to the compute side

Starting point is 00:48:31 and keeping that cool, right, there's a nice delicate balance, but I think we've got some paths that we can proceed on even in the future. This is why our university brought in liquid cooling. So I now have a six-inch chilled water pipe coming in and a manifold that allows us to deploy in-rack CDUs. Because instead of bringing in four more air conditioners that each would consume 75 KVA, we basically were able to keep 300 KVA going towards compute with the liquid cooling. And now I can buy more hardware, get more work done,

Starting point is 00:49:04 not just sit there and run electricity and stuff like this to cool things. Chris, this is a good point. And I'm curious your perspective here. So you guys brought in the chilled water. We've done a lot of work around this too. But the number of CDUs that are out there now, all the big infrastructure guys are buying up all the CDU guys and cold plate guys because they didn't have that as part of their portfolio.
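The cooling ratios Alan quoted a moment ago can be turned into a quick back-of-the-envelope calculation. Power usage effectiveness (PUE) is total facility power divided by IT power, so a PUE of 1.2 means 20% extra on top of the IT load. The sketch below uses his rough figures (about 2.0 for air, 1.2 for cold plate, and 1.05 for immersion, reading his "0.05" as the overhead) and treats all non-IT overhead as cooling, which is a simplification. The function names and the 1,000 kW facility size are illustrative assumptions.

```python
def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power (treated here as cooling) for a given IT load, PUE = total / IT."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue - 1.0)


def it_power_available_kw(facility_kw: float, pue: float) -> float:
    """How much of a fixed facility feed is left for compute at a given PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return facility_kw / pue


if __name__ == "__main__":
    facility_kw = 1000.0  # illustrative fixed feed
    for label, pue in [("air", 2.0), ("cold plate", 1.2), ("immersion", 1.05)]:
        usable = it_power_available_kw(facility_kw, pue)
        print(f"{label:>10}: PUE {pue:.2f} leaves {usable:.0f} kW for compute")
```

The same logic sits behind Chris's numbers: four air conditioners at 75 KVA each is 300 KVA that can stay with compute when liquid cooling replaces them.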

Starting point is 00:49:27 They all relied on airflow, right? And so now there are, gosh, probably a dozen CDU vendors out there. There's all different kinds of technology for the cold plates, managing all of those thermals. There's two-phase. There's immersion, as Alan said. When you decided that, as a university, we want to get more efficient here, how do you sort through all of that to find the right technology set for your deployment? So I obviously used my background in physics. Thanks for me.

Starting point is 00:50:03 Most people won't have that, but okay. You know, that helps. But at the same time, I recognize that one of our major goals was to save electricity, okay, because I only had so much electricity. I had a megawatt coming into my data center, and I really needed that megawatt to go towards answering questions. And so I really focused in on two-phase initially, and I was looking at two-phase solutions that could be retrofitted onto hardware I already had and would meet, you know, standards with Dell, Supermicro, all of the big company names, okay?

Starting point is 00:50:31 And so we did work with a company called ZutaCore on that. And ZutaCore is here at Oregon State running in my data center. We do love it. It is a two-phase solution. And I get 70 KVA of cooling for a 5-15R plug, which is an amazing return. I want to be very clear. And yes, Dell did have it as a SKU, and we can put it onto our Dells, and you could buy it with Dells and stuff like this. And so it is a tremendously robust solution.

Starting point is 00:50:59 And we even figured out that even if there were problems, the system kept itself up, even when the chilled water was down for an hour or two, as a bridge. And it was just so much better than some of our air handlers. It was such an impressive upgrade, and it really changed the way we could do stuff. I also got better performance. So GPUs, for example, an A100, we showed could actually outperform a Grace Hopper that was air-cooled simply because it was liquid-cooled. And so we bought this hardware. It has a certain level of performance in it,

Starting point is 00:51:29 and we're losing that based upon our air cooling. It's important to realize that Dell and Supermicro and these companies know this. And they can go through and they can change the BIOS when they put liquid cooling onto machines to get you back your performance. And that's the number one reason that you should be doing this, performance and saving on power. That's why we did it. Did you retrofit an existing data center or did you actually build a new facility? No, we retrofitted an existing data center. Right. Yeah.

Starting point is 00:52:00 My university, after we showed them what we could be doing with the data, put the money on the table and we got there. We did $3.8 million last summer. We upgraded the entire data center. And by December of this past year, the whole system was online and I was able to move the ZutaCore CDU into the data center, light it up and watch it work. And it's fully functional right now running Dell hardware. Yeah.

Starting point is 00:52:25 I think there's a, so, you made an interesting point, Chris. Brian, I don't know if you remember, we were somewhere in Maryland. We were talking to somebody who has had liquid cooling in their data center for maybe the past three or four years, so they were sort of a pioneer. But if I recall, Brian, the owner of the data center was overclocking everything. If you remember, it was like 40%. They could get away with it and get more efficiency out of it. And even when we took some PowerEdge servers, Chris, and put the CoolIT cold plates

Starting point is 00:52:59 on it and ran it through a small CDU, we too saw a performance increase. I mean, they don't advertise that as much as they probably could. That's what I want people to understand, is that Dell does not actually go out there, shame on them for not telling you about this. I'm going to shame Dell. Dell does not go out there and tell you that if you buy into this other system, they're going to change the BIOS and they're going to make this thing run like lickety-split.

Starting point is 00:53:26 Okay. And we did show it, because Dell's an amazing partner. They came through and they helped us change the BIOS to see what it would really look like and do the full test. And yes, they have engineered those products perfectly to operate at exactly what they're supposed to operate at. But if you go for the next level, you get back the return of the money you've put in. There's one more thing too, though, Chris, because it's not an all or nothing thing. You can just cool your CPUs and leave the fans in the system. But now the fans go from 100%. You still need to have CRAC units in your data center to support that.

Starting point is 00:53:59 You don't save as much power on the other side. It does go down. The fan consumption is such a big part of server energy use that most people don't understand. It's a huge chunk, 25% in some cases. We did this test. My test also was around sound. Remember, sound was important to us. And I put a rack of Dell hardware into somebody's lab running ZutaCore.

Starting point is 00:54:21 And they still had people in the lab working. We had Dell help us turn all the fans down and everything to make it run just like it's supposed to, and it was so quiet, we actually had people working in that lab. They didn't even notice the rack was in there. Wow. That's strong. We're coming up towards the hour, and I don't want to keep everyone past that time. But I do have one other question that I'd like to hear all three of your thoughts on.

Starting point is 00:54:49 I get worried sometimes, and we've talked about this a little bit today, that AI for enterprises and colleges is something that can be intimidating, that it's a big investment. They're not sure of the outcome. We don't really know how to enable the right kind of research or whatever. And Chris, the work that you guys have done at Oregon State is industry leading. I don't think there's any doubt about that. But what I worry about is the next set of smaller universities or smaller organizations that may be just intimidated by how do we get started. So to close us out, if you guys would kind of go around the horn starting with Chris, and then Alan and Seamus. I'd really just like your thoughts on how do we get more of this stuff to more people at a scale that they need to take advantage? And that might just be a GPU. It might be.

Starting point is 00:55:41 It was actually bringing it on-prem. So we need to realize the paradigm shift has already occurred, okay? And what we're doing is reacting to it. And what we need to do is change our reaction pace to be a little bit more rapid. And so what we recognize is that when I put on-prem resources, and people could feel like they could do things without being watched and without being recorded and without cost, all of a sudden they would go over there and they would play. And that's what I need them to do.

Starting point is 00:56:09 I need them just to treat it like a sandbox and go out there and play at no risk, no cost, and no one watching. So they feel unencumbered and feel like they can understand it, put things in and see what happens and comes back. And that's what we're trying to do right now, is just lower the activation energy for people to involve themselves, to remove the bias. Alan, what are your thoughts on getting this out to more of education? So I work with a really good team of folks, who you are aware of, Brian, that have a program where, you know, we actually try to seed drives into new areas and proliferate things. But I would tell you that without a partner like Seamus and the folks at Dell that can do

Starting point is 00:56:54 the integration and give you the right recipe and help with the right AI models and those things. And Chris is kind of a pioneer, so he doesn't need a lot of handholding at all, ever. But if we take this recipe and we try to repeat this at other places, you really need a key partner like Dell to help out with that, in my mind, because the level of integration that we're talking about just in this call is eye-watering. Yeah, we're leaning heavily into Dell on this.

Starting point is 00:57:25 Yeah. I think that's my answer. I mean, I love it, guys. Yeah, no pressure on me. I love it. I guess the big thing is that, Brian, what you're talking about, you know, Chris's account team probably talks to him every week, right? And he has a direct line to our lab, you know, and also to you guys. The biggest thing, though, is that what you're talking about, Brian, are these other universities and learning centers that, you know, might not have that center of expertise like the Chris Sullivan on their staff, right? They might not have that type of framework in place. And as a consequence, one thing that we

Starting point is 00:58:11 offer, that Dell offers, and we want customers, not just the university but all customers, to understand the benefit of on-prem infrastructure. So, you know, what we do is we do these things called Demo Depot, where we actually have hardware that we can make available to universities as a loaner. If they go to Dell Demodepo.com, they can see what the inventory is of systems with GPUs, some without GPUs, some liquid cooled, some not. And those are systems that my team puts out there to customers on a 30, 60 or 90-day trial so that they can see, look, would we get an impact with this or not?

Starting point is 00:58:51 It's obviously, it's one of those things where we ask that it's returned, right? It's not for them to keep it. I like to purchase them too, though. That's true. That's true. I would say it like a shot fired at Chris. He's not saying when he wants it back. Yeah.

Starting point is 00:59:08 But it gives people a really tangible idea of, look, how exactly can we use this in our exact environment? It's not this theoretical. Well, maybe we could do. No, it's like, let's get something on site. That way you can actually test it. Usually what we do is we'll come out. I'll send one of my team members out there and they'll do some demos and things like that.

Starting point is 00:59:29 Make sure that it gets set up well in your environment and that you're getting the best experience possible, right? That, combined with your normal account teams, means that, look, we can look at costings and cost structures and when you actually want to buy the product. But what we want to make sure is that, look, you understand what the full capabilities are, both of traditional rack and GPU-accelerated systems, and you're taking full advantage of AI as a sovereign AI, as an on-prem deployment. I would give one more plug to the Dell team as we go here. The hardware is critical, no doubt. The other thing is the software. And what Dell's done with the Enterprise Hub on Hugging Face is pretty amazing,

Starting point is 01:00:14 where they've pre-validated these models. They're easy to consume and deploy on the Dell hardware. And that's another piece, I think, Seamus, that really puts a lovely bow on that Dell package. Not just that level of software, they partner with Metrum. So the Metrum project that we did was actually a Dell Metrum project. And then when we look at future support, we are talking, and we're in the middle of that

Starting point is 01:00:38 conversation right now with Dell and Metrum about how to support that software long term. So Dell is actually helping groups bring in technologies to accompany the hardware, not just that level of software, but others. Exactly. There's a lot of AI ISVs out there at the moment. I mean, Metrum is an excellent service provider, and they have such great expertise, but there's a lot of different ISVs, everything from Run:ai to KX for vector databases to, I mean, the list goes on and on and on, that we're partnering with to try and understand. And then we're seeing the best of breed of these ISVs, like Metrum, that just have some excellent expertise to be able to use. We can pull that together. We came up with an idea here, and Dell and Metrum helped us bring that to life. And I think that that's an exact model that people need to realize.

Starting point is 01:01:35 You can have an idea like we did and bring that to Dell or a group like Metrum, and they'll bring it to life with you. Yeah, it's a tremendous value. And I think it's a really good point in terms of how we can adopt these things: if you've got a vision and put together the right team of partners, you can execute on any of those. And to the audience that watched this live, I appreciate you joining in. Chris, Alan, Seamus, thank you so much for doing this. I knew I'd have a hard time getting this done in under an hour, but we're just over.

Starting point is 01:02:08 So good job, guys, on all of that. And for everyone else, again, thanks for tuning in. We appreciate your support.
