Utilizing Tech - Season 8: AI at the Edge Presented by Solidigm - 08x12: Revolutionizing Data Infrastructure for AI with WEKA

Episode Date: June 16, 2025

For more episodes: https://www.utilizingtech.com/

Storage software running on modern hardware can deliver incredible performance and capability to support AI applications. This episode of Utilizing Tech wraps up our season with a discussion of WEKA's data platform for AI with Alan McSeveney, Scott Shadley of Solidigm, and host Stephen Foskett. Modern hardware is capable of incredible performance, but bottlenecks remain. The limiting factor for AI processors is memory capacity: GPUs are hungry for data and must be refreshed from storage quickly enough to keep them running at scale. Storage can also be used to share data between GPUs across the data center and to cache working data to accelerate calculation. The secret to scalability, from storage to applications to AI, is distribution and parallel processing. Modern software runs at incredible scale, and all elements of the stack must match. Technologies like Kubernetes allow applications to use huge clusters of workers, all contributing to scale and performance. WEKA runs this way, matching the GPU clusters and web applications we rely on today.

Guest: Alan McSeveney, Field CTO of Media and Entertainment, WEKA

Hosts: Stephen Foskett, President of the Tech Field Day Business Unit and Organizer of the Tech Field Day Event Series; Jeniece Wnorowski, Head of Influencer Marketing at Solidigm; Scott Shadley, Leadership Narrative Director and Evangelist at Solidigm

Follow Tech Field Day on LinkedIn, on X/Twitter, on Bluesky, and on Mastodon. Visit the Tech Field Day website for more information on upcoming events. For more episodes of Utilizing Tech, head to the dedicated website and follow the show on X/Twitter, on Bluesky, and on Mastodon.

Transcript
Starting point is 00:00:00 Storage software running on modern hardware can deliver incredible performance and capability to support AI applications. This episode of Utilizing Tech wraps up our season with a discussion of Weka's data platform for AI, with Alan McSeveney of Weka, as well as Scott Shadley of Solidigm, and myself. Learn how modern hardware is transforming storage for AI in this episode. Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season is presented by Solidigm and focuses on AI at the edge and related technologies.
Starting point is 00:00:51 I'm your host, Stephen Foskett, organizer of the Tech Field Day event series, and joining me from Solidigm for this final episode of our season, once again, is my co-host and old friend, Scott Shadley. Welcome to the show, Scott. Hey, Stephen. It's been great having fun doing this season with you, and it's sad and also great to know that we've made it through another season of these wonderful episodes and some amazing conversations over the last few episodes. And working with you guys has always been so much fun. Yeah, it's been really great.
Starting point is 00:01:13 It's been great welcoming you as a co-host. I knew you could do it. Glad to have you. Yeah, I like to talk about things, that kind of stuff. So when it comes to talking tech, it's a lot of fun. And the guests and yourself make it all the more entertaining. Well, that's what I was just going to say. I mean, the guests are incredible. You know, we get so much great insight from them and just so much perspective on how this AI thing
Starting point is 00:01:37 is being implemented around the world. You know, I think that people have this feeling that AI is somehow kind of a big iron thing, that it's some supercomputer in a big data center that's sucking down gigawatts of power. And it is, but it's more than that. AI is being implemented outside the data center, in smaller environments, at the edge, in, let's say, interesting venues, entertainment venues, all sorts of things. Exactly.
Starting point is 00:02:08 It's not just the home of big iron, right? Yeah, that's the wonderful thing about this: it's kind of the convergence of a whole bunch of different technologies at once, and the ability to generate data the way that we can and then actually do something with it in a more meaningful way, as we talked about in a couple of previous episodes, where people are going back in time and bringing that forward with the technologies we have available today. So today's a fun one too, because we have a literal convert joining us, from being a customer to an employee, and that's kind of fun as we bring Alan from Weka along.
Starting point is 00:02:46 Hi, I'm Alan McSeveney. I'm the field CTO for media and entertainment and related AI at Weka. I joined only recently, in December, but I was a customer of Weka for four years prior to that at a cloud-based visual effects studio. Most of my career has been related to creative industries. Actually, I was a professional musician for the first 10 years of my career, and I really enjoyed marrying creativity and technology and pushing the boundaries of what could be done with technology, back when audio and music were quite a challenging thing you could
Starting point is 00:03:32 do in a computer, as opposed to now, when you could run an entire studio on a laptop. But I transitioned out of there into the visual effects world and large-scale playback systems, and that's been an immensely rewarding career. Just using technology and being able to push boundaries has really been a place where I'm happy. Well, it's interesting. In audio, audiovisual, yeah, old Atari ST guy here, so I know a little thing about using personal computers for music. What happened there was basically that specialized hardware gave way to software running on commoditized
Starting point is 00:04:16 hardware. And Scott and I, in our careers in enterprise storage, have seen the same thing happen, where what was once the domain of literally special boards, special processors, special everything has now become the domain of software. And that's really the story of Weka too. My understanding is that essentially the founding team and the origin of the product was: what if this was done in software, and what if we took advantage of the latest, you know, incredible advances in more commodity hardware, especially NVMe, but also all the things that you can do now on the processor.
Starting point is 00:04:58 And it worked, you know. I mean, this has been something where the software-based storage solution from Weka has literally become the bedrock of AI, right? Yeah, I mean, the marriage of three things really allowed Weka to become not just a product but a vision in the first place: someone soldering an SSD onto a PCIe card, someone coming up with the concept of containerization, and then the network teams out there far surpassing everyone's expectations and blowing away what anybody considered could be achieved with network performance. So you put those three things together, and now you can access NVMe storage across an array of servers, all over the network, while orchestrating all of this through a container.
Starting point is 00:05:56 That is essentially what Weka is. It gives you a huge, expandable data platform that is all NVMe-based and addressable over the network, and that will at times not only outperform local NVMe within a client, but sometimes even DRAM. It's a creative innovation where you guys are literally transforming that ecosystem, turning the underlying hardware that comes from someone like ourselves into something that people just see as right next door. It's kind of unique that the software platform and the data platform you guys have has that ability to transition data from point A to point B as if it were just sitting there, and not have to worry
Starting point is 00:06:42 about all that transition time, because to your point about networking, we keep seeing networks go up and down and faster and slower. And as we start getting further away from the main processing, that ability to see that localized data is something unique that you guys are working on. Yeah, I mean, there's always a bottleneck somewhere, right? You know, every so often it gets moved somewhere else. But network performance today is astonishing. You know, we're seeing Ethernet networks up to 400 gigabit.
Starting point is 00:07:17 We're able to push data into a single host at hundreds of gigabytes a second. It's not something I thought we would see so quickly, but this is where we are today.
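To put those figures in perspective, here is a quick back-of-the-envelope check in Python (the link counts below are illustrative assumptions, not WEKA specifications):

    # One 400 Gb Ethernet port moves at most 400 / 8 = 50 GB/s of payload,
    # so "hundreds of gigabytes a second" into one host implies several
    # such links (or NICs) driven in parallel, before protocol overhead.
    LINK_GBPS = 400                      # per-port line rate, gigabits/s
    link_gbytes = LINK_GBPS / 8          # = 50 gigabytes/s per link

    for links in (1, 2, 4, 8):
        print(f"{links} x {LINK_GBPS} GbE ~= {links * link_gbytes:.0f} GB/s aggregate")
    # 8 x 400 GbE ~= 400 GB/s aggregate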
Starting point is 00:08:00 I think you're right. The idea of having data local to whatever your processes are, whether it's on a laptop or on whatever compute you're using, and then expecting to have to transfer that someplace, with some penalty or some time spent or some transfer process that would just make you sigh: that is most of our collective experience and history. Today, that is just far from the case. Actually, having all data centralized and accessible over the network can be more performant than having it local. And that's one of the challenges, I think, when it comes to AI. Because this hardware is incredibly capable.
Starting point is 00:08:23 But I don't know that every system can take advantage of those capabilities in order to kind of move the bottlenecks out of the way. I mean, the entire history of technology is all about moving bottlenecks. We eliminated this one, and then it pops up over here. We eliminated that one, and it pops up over there. And if you look at the classic computer system hierarchy, with processors and memory and storage, storage for a long time was just the ultimate bottleneck. With SSDs that has been dramatically reduced, as you say, with things like NVMe, and now
Starting point is 00:09:02 with Ethernet networking, not to mention proprietary networks, it's been reduced further. But the demand for data from these AI processors is just absolutely off the charts. It's insatiable. One of the things we've heard about repeatedly this season, and over the last year on Utilizing Tech, is basically the need to feed the beast. If you are not keeping your expensive GPUs fed, then you're essentially wasting money every minute, every hour that they're not working at maximum capacity. That's pretty much what companies are looking at Weka to solve with
Starting point is 00:09:39 software, right? Yeah, I mean, the limiting factor with a large data center, enterprise-scale GPU today is memory capacity. The amount of parallel compute available in a cutting-edge GPU is mind-blowing, but the memory footprint on each card is just not where the processes that are running today need it to be. Our aim is to be able to augment that memory with a place where essentially you can tier the data that should be in memory off to Weka at such a data rate
Starting point is 00:10:28 that it can also be retrieved so fast that it becomes practical to scale your memory footprint into petabytes of space. Obviously we're talking tiers of performance here, but when we're able to outperform in some cases what DRAM could provide to GPU memory at that type of scale, hundreds of petabytes if you like, it really starts to be a paradigm change. Consider the amount of time spent, for example, in LLM processes during pre-fill, calculating key values and creating KV cache data. A lot of the time this will either just be cached to DRAM or to local NVMe as KV cache, but then it can only be used by other GPUs inside the same server, or within the same NVLink domain. Now we can drop all of that KV cache out to Weka and make it available not just to other GPUs in the same server,
Starting point is 00:11:47 but to every single GPU in the entire data center. And at rates where, for example, a pre-fill of around 105,000 tokens takes around 25 seconds to calculate on a GPU, we've been able to take that same KV cache and place it back into GPU memory in around half a second. So we're talking a more than 40x, almost 50x speedup in some cases. And every time a query is performed on an LLM and this KV cache is generated,
Starting point is 00:12:27 we can just keep that for as long as that model is around, so it never has to be recalculated again. The more that queries are common across multiple processes or customers, the more we just don't ever have to calculate it again. And then the GPU can get on with the meaty part, which is decoding.
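As a rough illustration of the reuse pattern Alan describes, here is a minimal Python sketch of prefix-keyed KV-cache sharing. Every name here (SharedKVStore, run_prefill, run_decode) is a hypothetical stand-in, not WEKA's or any inference engine's actual API; the timings in the comments are the ones quoted in the episode.

    import hashlib

    class SharedKVStore:
        """Stand-in for a shared, network-attached cache, e.g. a fast
        file system mount visible to every GPU server (hypothetical)."""
        def __init__(self):
            self._store = {}              # prefix hash -> serialized KV data

        def get(self, key):
            return self._store.get(key)

        def put(self, key, kv_blob):
            self._store[key] = kv_blob

    def prefix_key(prompt_tokens):
        # Key the cache on a hash of the prompt prefix, so any later query
        # repeating the same prefix can skip pre-fill entirely.
        return hashlib.sha256(repr(prompt_tokens).encode()).hexdigest()

    def generate(prompt_tokens, store, run_prefill, run_decode):
        key = prefix_key(prompt_tokens)
        kv = store.get(key)
        if kv is None:
            kv = run_prefill(prompt_tokens)  # expensive: ~25 s per ~105k tokens
            store.put(key, kv)               # persist for every other GPU
        # On a hit, reloading took ~0.5 s in the episode's example,
        # i.e. roughly a 25 / 0.5 = 50x speedup over recomputing pre-fill.
        return run_decode(kv)

    store = SharedKVStore()
    fake_prefill = lambda toks: {"kv": len(toks)}   # dummy stand-ins
    fake_decode = lambda kv: f"decoded with {kv['kv']} cached tokens"
    print(generate([1, 2, 3], store, fake_prefill, fake_decode))  # miss: pre-fill runs
    print(generate([1, 2, 3], store, fake_prefill, fake_decode))  # hit: cache reused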
Starting point is 00:12:54 Yeah, you bring up an interesting point about the idea of tiering, right? Because, if you think of the hype cycle and all that kind of stuff, I brought this up with some of the other examples we've gone through this season. But we're at the point now where people realize they need more of something, and that something is really being able to offload and shift the performance tier into an aspect of a larger footprint. Like, for example, the massive drives that we can provide give you those petabytes of
Starting point is 00:13:18 storage that can look like that fast memory. Because if you overload the memory, again, you're moving from bottleneck to bottleneck to bottleneck; eliminating those bottlenecks is really kind of the key here. And, you know, fast delivery, to your point. It's really cool that you guys have had the ability to highlight the performance characteristics, in real time, of what you're up to, with some of the work you've done in some of the recent venues that have come to light. So it's really interesting to see how you've been able to transform the idea that it's really more about the data
Starting point is 00:13:49 and not where the data is sitting, and to let the user maximize their hardware configuration by way of what you can do with your software. Yeah, I mean, probably the most prominent place that Weka can be seen in action would be the Las Vegas Sphere. It feeds data to that screen.
Starting point is 00:14:12 It's involved in rendering. It's involved in carving. At this point, it's touching pretty much every aspect of content creation and delivery to the screen. And we've seen enough interest in this that other large-scale venues are looking to do the same. Just being able to deliver this type of performance over a network... I mean, I don't want to say that we don't have any competition, but there's nothing else right now that is able to hit these numbers and is also built to scale up to the correct size, not just in performance, but in capacity.
Starting point is 00:15:04 This is always the trade-off, right? You look at systems that historically have been extremely performant. They're usually direct-attached. If you want them to be a little bigger, you would switch to something like SAN. That was never quite as performant as direct-attached storage, but it could be much bigger.
Starting point is 00:15:25 And then you could go bigger still, and reduce some of the complexity, by deploying NAS, which would be slower still but could go larger. And then if you wanted stupid scale, you could go to object, and your performance goes through the floor. So the place where Weka sits really is beating direct-attached storage performance while also scaling all the way up to object.
Starting point is 00:15:53 And customers that have these massive high-performance requirements at large scale are coming and trying our product and just can't really find anything else that can hit those metrics. So I'm sure people will catch up, but today it's a good place to be. If I can provide a little background there from a long-time storage nerd,
Starting point is 00:16:19 you know, it's funny when people talk about what you just talked about, SAN and NAS and object. The scalability and performance of those is really a function not of the intrinsic nature of the storage or the storage protocol. It's about, basically, the modernization of the delivery mechanism and the software that's constructed to deliver those.
Starting point is 00:16:50 I think that's sort of the insight that some of these companies have had recently: you know, the reason that direct-attached storage was high-performance was because it was dedicated, and the reason that object storage scaled so well was because it was distributed. And the idea that you could build a massive-scale solution that would kind of combine the best of all possible worlds with software is really the reason that so many of these modern systems are able to scale.
Starting point is 00:17:19 Frankly, it reminds me a lot of Kubernetes and the cloud, and frankly, of AI itself. I mean, the reason that AI processing is so incredibly power-consuming and high-performing is because of this whole idea of distributing it: breaking it up into small tasks and distributing it massively among multiple nodes in parallel.
Starting point is 00:17:40 That's exactly what's been going on in the leading software for storage. And that's the reason I think that Weka is able to scale the way it does. It's not because of some specialized little trick in there. It's simply because the system scales to just incredible levels, just like the cloud does, just like AI does. And I think that that makes it uniquely suited for this AI application because it's such a scalable platform, because everything is just completely distributed.
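As a conceptual sketch of that distribution principle (illustrative only, not WEKA code): a large read can be split into chunks striped across many nodes and fetched in parallel, so aggregate throughput grows with the node count rather than being capped by any single machine.

    from concurrent.futures import ThreadPoolExecutor

    NODES = [f"node-{i}" for i in range(16)]   # assumed 16 storage nodes
    CHUNK = 1 << 20                            # 1 MiB stripe unit

    def read_chunk(node, offset, length):
        # Placeholder for a network read from one node; in a real parallel
        # file system this would be an RDMA or NVMe-oF request.
        return bytes(length)

    def parallel_read(offset, length):
        # Stripe the byte range round-robin across nodes, fetch concurrently.
        chunks = [(NODES[((offset + o) // CHUNK) % len(NODES)], offset + o,
                   min(CHUNK, length - o)) for o in range(0, length, CHUNK)]
        with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
            parts = pool.map(lambda c: read_chunk(*c), chunks)
        return b"".join(parts)

    data = parallel_read(0, 64 * CHUNK)        # 64 MiB spread over 16 nodes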
Starting point is 00:18:11 There isn't some monolith somewhere that says this is only how fast it can run. Everything is distributed. Everything is running in software. I think that's how people assume things work, but not everything works that way, and yours certainly does. Yeah, I mean, you touched on one technology that has been instrumental in achieving planetary scale in anything, and that's Kubernetes. The ability to orchestrate containerization in a way where, as long as you can provide the resources behind
Starting point is 00:18:48 it, you can scale horizontally in an extremely resilient and redundant fashion is phenomenal. So Weka recently released its own Weka operator, where you can actually provision an entire Weka cluster, or multiple Weka clusters, deployed fully in Kubernetes. So if a Kubernetes-based shop has compute that already has NVMe available in it, and that NVMe is all siloed per server, installing Weka via Kubernetes now allows you to bundle all of this NVMe into one giant file system that is available to every single server.
Starting point is 00:19:38 And if you're in a multi-tenancy environment, you could actually compose more than one cluster in this environment, shared across that infrastructure, with each customer having its own entire dedicated cluster, with cluster admin privileges per customer. I don't know another product that's doing that today. It's pretty wild.
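To make the operator pattern concrete, here is a minimal sketch using the Kubernetes Python client. The WekaCluster group, version, kind, and spec fields are assumptions for illustration only; WEKA's actual operator CRD will differ, so treat this as the general shape of operator-driven provisioning, not a working recipe.

    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    def provision_cluster(name, namespace, drive_containers):
        """Create one tenant-scoped cluster as a custom resource, pooling
        each node's local NVMe into one shared file system."""
        body = {
            "apiVersion": "weka.example.com/v1alpha1",  # assumed group/version
            "kind": "WekaCluster",                      # assumed kind
            "metadata": {"name": name, "namespace": namespace},
            "spec": {"driveContainers": drive_containers},  # assumed field
        }
        return api.create_namespaced_custom_object(
            group="weka.example.com", version="v1alpha1",
            namespace=namespace, plural="wekaclusters", body=body)

    # One dedicated cluster per tenant, each in its own namespace:
    for tenant in ("studio-a", "studio-b"):
        provision_cluster(f"{tenant}-cluster", tenant, drive_containers=6)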
Starting point is 00:20:12 I mean, normally you would treat storage like pets, and everything else could be treated like cattle. But today, actually, we've been able to run storage in a cattle ranch. It's pretty interesting. I really do appreciate that you're putting some focus on storage. I mean, we've been the overlooked pet for quite some time. I'm not sure if I like being a pet or on the cattle ranch; maybe I'm the dog managing the cattle. I like that idea.
Starting point is 00:20:39 But it's interesting, because you talk about this ability to shift access to information across multiple points of physical location, which plays well into our conversations this season around the edge. How far away some of those platforms can be from the user or the operator, right? Because we all have different definitions of the word edge, and we've talked about those definitions all season. But realistically, I mean, how far out are you seeing the future of what
Starting point is 00:21:08 would be classified as your ability to reach closer and closer to where the data generation point is? Are there certain platforms or solutions that you're investigating or already working on? Yeah, it's funny. I think the definition of edge moves probably more often than the bottleneck moves, right? And one person's edge infrastructure could be larger than another's core infrastructure. Certainly in some organizations, the amount of edge infrastructure they have can vastly
Starting point is 00:21:44 outweigh what they have as core. But I think as we see more robotics appearing in the world, to feed data to all of these compute processes that are literally out there in the world, not sitting in a data center, it's going to become more important. That will require different stages, different tiers, and fantastic data movement between all these tiers. So yeah, edge networking is going to be really important: edge data centers, edge storage within them, data tiering from there back to much larger data centers.
Starting point is 00:22:40 All the orchestration of this is an intense focus for Weka. And, you know, we want to make sure that as that whole world gets more complex, we stay at the forefront of it. And I think the nature of the solution kind of matches the needs there too. Because it's built up of this sort of parallel architecture, you can scale up and scale down very effectively. So you can use it at smaller scale, well, comparatively smaller scale,
Starting point is 00:23:13 for AI processing outside the data center, and then you can ramp it right up to massive scale, and you can use your tools to enable data to make that leap from location to location and from size to size. And I think that, again, matches the way that people wish software worked, but it doesn't always work that way. Yeah, I mean, we have customers right now, for example, running autonomous vehicles all around the world, who are generating massive amounts of metrics, and trying to send all of that home is
Starting point is 00:23:54 not very efficient. So deploying Weka in many, many different data centers all around the world, as close to these vehicles as possible, to collect all their metrics, clean them and reduce their size, and then send all of that back to a core for additional research, training, and modeling, is a place where Weka has been really successful.
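As a conceptual sketch of that collect, clean, reduce, and ship-home flow (function names and the output path are illustrative assumptions, not WEKA's API):

    import gzip
    import json
    from pathlib import Path

    def clean(records):
        # Drop malformed samples at the edge, before they cost WAN bandwidth.
        return [r for r in records if "timestamp" in r and "sensor_id" in r]

    def reduce_size(records):
        # Compress locally so only a fraction of the raw bytes travel home.
        return gzip.compress(json.dumps(records).encode("utf-8"))

    def ship_to_core(blob, outbox=Path("outbound")):
        # Stand-in for tiering the reduced blob back to the core data center
        # for further training and modeling; in practice this would be a
        # replication or object-upload step rather than a local file write.
        outbox.mkdir(exist_ok=True)
        (outbox / "metrics.json.gz").write_bytes(blob)

    def edge_ingest(raw_records):
        ship_to_core(reduce_size(clean(raw_records)))

    edge_ingest([{"timestamp": 1718500000, "sensor_id": "cam-07", "speed": 14.2}])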
Starting point is 00:24:39 As that type of autonomy moves into additional places within the edge, with more robotics, I think we're going to see a lot more of that. Another place where we've been successful is within media and entertainment, where the edge can serve to provide tool sets to talent all around the world, because talent is something you can't really scale; you have to find talent where it resides. Many companies have to set up infrastructure where you might make a toolset available to people for either a permanent or a temporary amount of time. But they need huge performance within the compute, within the storage, within rendering.
Starting point is 00:25:15 And all of this has to be able to seamlessly communicate with all of the other parties that are participating in the same project workflow, for example. So we've had great success there, particularly in cloud, where we've watched customers deploy temporary setups in different countries where you wouldn't normally even have any footprint, employ talent in that area, and tear it all down when the project's finished, while bringing up more someplace else for another project.
Starting point is 00:25:49 That's been a bit of a game changer, actually. When people think of AI, especially nowadays, I think a lot of them are just focused on chatbots, and chatbots, and more chatbots. But of course there's a lot more being done with this technology, whether it is using AI and ML in different ways or using HPC for other related applications. I know that you all are involved in some of that. Can you tell us a little bit about some other applications for this technology? Yeah, I mean, for example, we have a few companies in health and life sciences who are trying to solve some of the hardest problems in the world, problems that really matter to
Starting point is 00:26:36 people. Memorial Sloan Kettering, for example, deployed Weka to help speed up their modeling in the pursuit of solving many cancers. And they have managed to massively contract the time it takes for a model to converge, and massively reduce their energy footprint at the same time, just by being able to achieve more miles per gallon on the exact same hardware in a shorter time. So this is going to be a game changer if Memorial Sloan Kettering can actually achieve what they think they can in the next few years, which really excites me. I mean, it's fun to work on media and entertainment. It's fun to work on cars, you know, many things.
Starting point is 00:27:28 But when you actually see life-changing work being done, it's quite humbling. Absolutely. And it's always fun to hear about technology that, as I said, isn't just the same old thing people are thinking of, and about using AI in new and exciting ways. Thanks so much for this incredible conversation. Scott, this is our last episode of the season.
Starting point is 00:27:55 We have been thrilled to have Solidigm co-hosting two seasons of Utilizing Tech now. And if our listeners go to utilizingtech.com, they'll find both of those seasons, along with six other seasons before them. I guess before we go, Scott, sum up a little bit about season eight: AI, AI data infrastructure, AI at the edge. How exactly should people be thinking about data infrastructure and storage for AI?
Starting point is 00:28:21 Yeah, I appreciate that, and it has been a lot of fun this season. And I know Jeniece, my co-worker, has had fun over a couple of seasons as well. From our perspective, and from my personal perspective, AI is a shiny object, right? It is something that is very real and very true. But the fact that we're combining AI with where we're generating the data, and we're generating so much data nowadays, it's unique to think that people don't tend to realize that you have to put that data somewhere. A lot of this season, whether intentional or not, has been focused on the advent and benefit of storage.
Starting point is 00:29:03 And so it's kind of cool, as a long-time storage guy, to see the value and the benefits of what we see in our daily lives coming through to everyone else as something of value. Because you spend so much time working on data, and that data always seems to be the star of the show in certain different processing and things like that. But as you saw through the season, if you go back and look at it,
Starting point is 00:29:26 we've talked about a whole bunch of different ways of looking at managing data, focusing on data, and all of that revolves around where the data sits. And the data doesn't always just sit in the CPU or DRAM, which are wonderful toys and tools; it has to have a long-term placement too. So I see that as one of the bigger takeaways of this whole season: it's cool to know that storage is really getting a play in the space, and all these companies are doing so many cool innovations
Starting point is 00:29:54 to, again, shift those bottlenecks, and to talk about it, whether it's the industry standards bodies, the software platforms, the hardware platforms, or a combination of all of that. So it's been a great season. I've had a lot of fun, and I've learned a lot myself. Well, thanks a lot. Yeah, it has been a great season for me as well. Obviously, an old-time storage nerd here. It's fun to see where this industry is headed, and it is fun to see just how all of those things that we wished we could do have
Starting point is 00:30:19 in many cases come true with modern software. So, just incredible overall. Alan, again, thank you for joining us and representing Weka here on Utilizing Tech. As we wrap up this episode, where can people connect with you and continue this conversation? Actually, this week, the 18th and 19th, Weka will be presenting at the three big AI conferences, in San Francisco, London, and Singapore. So if you can make it down to those, please come and hear what we're all about.
Starting point is 00:30:51 Great. Scott, I guess going forward, where can people continue speaking with you and your colleagues and learning more about Solidigm? Yeah, for Solidigm, it's pretty straightforward: solidigm.com/ai. And you can also find me on LinkedIn, Bluesky, and X, formerly known as Twitter, at SMShadley. I tend to spend a lot of time having fun sharing insights and just being a little bit social. Yep.
Starting point is 00:31:20 And you will find me as SFoskett on most of the socials, including Bluesky and Mastodon, and of course on LinkedIn. Thanks for listening to this episode of Utilizing Tech. You can find this podcast in your favorite podcast application, as well as on YouTube. As mentioned, this is the last episode of season eight. Yes, that's right, there are eight seasons of this, and you can go back and listen to them all the way back to the pre-ChatGPT era. If you enjoyed this discussion, please do leave us a rating and a review; it's really nice to see those. This podcast was brought to you by Solidigm this season, as well as Tech Field Day,
Starting point is 00:31:56 which is now part of the Futurum Group. For show notes and more episodes, go to UtilizingTech.com or find us on X/Twitter, Bluesky, or Mastodon at Utilizing Tech. Thanks for listening, and we will catch you next season on Utilizing Tech.
