Utilizing Tech - Season 8: AI Data Infrastructure Presented by Solidigm - 08x09: Building Data Driven AI Applications with Metrum AI

Episode Date: May 26, 2025

As enterprises roll out production applications using AI model inferencing, they are finding that they are limited by the amount of memory that can be addressed by a GPU. This episode of Utilizing Tech features Steen Graham, founder of Metrum AI, discussing modern RAG and agentic AI applications with Ace Stryker and Stephen Foskett. Achieving the promise of AI requires access to data, and the memory required to deliver this is increasingly a focus of AI infrastructure providers. Technologies like DiskANN allow workloads to be offloaded to solid-state drives rather than system memory, and this surprisingly results in better performance. Another idea is to offload a large AI model to SSDs and deploy larger models on lower-cost GPUs, and this is showing a great deal of promise. Agentic AI in particular can be run in an asynchronous model, enabling it to take advantage of lower-spec hardware, including older GPUs and accelerators, reduced RAM capacity and performance, and even all-CPU infrastructure. All of this suggests that AI can be run with less financial and power resources than generally assumed.

Guest: Steen Graham is the Founder and CEO of Metrum AI. You can connect with Steen on LinkedIn and learn more about Metrum AI on their website.

Guest Host: Ace Stryker is the Director of Product Marketing at Solidigm. You can connect with Ace on LinkedIn and learn more about Solidigm and their AI efforts on their dedicated AI landing page, or watch their AI Field Day presentations from the recent event.

Hosts: Stephen Foskett, President of the Tech Field Day Business Unit and Organizer of the Tech Field Day Event Series; Jeniece Wnorowski, Head of Influencer Marketing at Solidigm; Scott Shadley, Leadership Narrative Director and Evangelist at Solidigm.

Follow Tech Field Day on LinkedIn, on X/Twitter, on Bluesky, and on Mastodon. Visit the Tech Field Day website for more information on upcoming events. For more episodes of Utilizing Tech, head to the dedicated website and follow the show on X/Twitter, on Bluesky, and on Mastodon.

Transcript
Starting point is 00:00:00 As enterprises roll out production applications using AI model inferencing, they are finding that they are limited by the amount of memory that can be addressed by a GPU and a lot of the other architectural considerations. This episode of Utilizing Tech features Steen Graham, founder of Metrum AI, discussing modern RAG and agentic AI applications with Ace Stryker and myself. Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season is presented by Solidigm and focuses on AI at the edge and other advanced enterprise IT topics. I'm your host, Stephen Foskett, organizer of the Tech Field Day event series, and joining me today as my co-host from Solidigm is Ace
Starting point is 00:00:45 Stryker, somebody you may recognize from our last season of Utilizing Tech. Welcome to the show, Ace. Thank you, Steve. And I'm very excited to be back. And we're here in beautiful Sunnyvale, California today. Thanks for having me. Yeah, it's pretty cool that we were able to get together in
Starting point is 00:01:00 person to record this episode. We're actually here for our AI Infrastructure Field Day event, and Solidigm is going to be presenting this afternoon, and so we thought that it would be fun to record an episode of Utilizing Tech right here at the show. Talk to us a little bit about what we're going to be thinking about today. Sure thing, yeah. Well, the topic du jour is AI, as it has been for our last several conversations. What we're getting into today is really around the inference side of the AI equation. So we've been spending a lot of calories there lately, talking with partners, customers, understanding emerging use cases.
Starting point is 00:01:44 As we've said, there is no AI without data. There is no data without infrastructure. And that's where Solidigm comes in. What we're learning is that the sheer magnitude of data on the inference side of things is just blowing up. That's not just our point of view. I mean, there are analyst reports from McKinsey and Tech Insights and others that'll kind of reinforce
Starting point is 00:02:10 that notion, but what we're seeing is people use these models so much and there's so much data involved in going in and out of these models during inference that it's really putting a strain on infrastructure and it's driving requirements higher and higher at a very fast rate. And so I'm looking forward to getting into that topic a little bit today with our guest.
Starting point is 00:02:31 Yeah, and importantly, memory constraints play a huge part. So one of the things you're gonna be talking about today on both the podcast and as well as at the field day event is how that can be reduced through some clever engineering. And speaking of clever engineering, that's why we've got Steen Graham here. Steen, welcome to the show. It's nice to have you.
Starting point is 00:02:54 Well, thanks for having me. And Steen Graham, CEO of Metrum AI. We do a lot of work around building AI agents and also model and hardware performance evaluation as well. So talk to us a little bit about, well I guess let's just go right into it. Many companies are trying to deploy AI based applications. Many companies are trying to build applications that incorporate enterprise data. Retrieval augmented generation or RAG has been a huge topic. But what we're finding is that that can place some pretty extreme stresses on infrastructure
Starting point is 00:03:31 and require quite a lot of memory in order to implement in the real world, right? Absolutely. And I think that's a really important point. And I think what we've been working on with the team at Solidigm is how do we kind of look at those scenarios and pave a pathway instead of going full GPU-centric data center with all your infrastructure?
Starting point is 00:04:16 How do we kind of give them a pathway, an affordable, TCO-optimized pathway, into deploying those latest AI agents and agentic RAG stacks for their use cases. Hey, Steen, just to start, no doubt some of our audience is familiar with retrieval augmented generation and what that is. But can you give us a quick summary? How is that different from just selecting a foundation
Starting point is 00:04:40 model off the shelf and plugging it in and starting to feed inputs into it? What does RAG buy you, and why are folks so interested in it? So I think the simplified view of things is several years ago with the advent of transformer-based large language models, you would just use a model, serve it in a chat bot, and then the model would hallucinate. And so obviously for enterprise quality needs,
Starting point is 00:05:06 that doesn't work. And so the next iteration of innovation was, how do we actually put your data close to that large language model? And notably, sometimes we use a vector database, or a graph database in some scenarios as well, and graph can be super useful for different use cases. But you just basically take all your existing data, and
Starting point is 00:05:33 usually it would be for a particular use case and the tribal knowledge associated with how that use case is solved, and you're pairing that vector database with the language model. So now the language model, before it generates an answer, is querying high-fidelity, accurate, domain-specific information. And that's really kind of the simplified view of the RAG environment today. Now what you'll hear in the year 2025 is everybody will say, this is the year of AI agents.
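To make the retrieval flow Steen just described concrete, here is a minimal sketch of the query path: embed the question, pull the closest domain documents from a vector index, and ground the model's answer in them. The corpus, the embedding function, and the generate call are hypothetical stand-ins, not any specific product's API.

```python
import numpy as np

# Toy corpus standing in for curated, domain-specific company documents.
doc_texts = [
    "Warranty claims over $500 require regional manager approval.",
    "Field technicians log repairs in the service CRM within 24 hours.",
]
rng = np.random.default_rng(0)
doc_vecs = rng.random((len(doc_texts), 384)).astype(np.float32)   # stand-in embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def embed(text: str) -> np.ndarray:
    """Placeholder for a real sentence-embedding model."""
    v = rng.random(384).astype(np.float32)
    return v / np.linalg.norm(v)

def retrieve(question: str, k: int = 2) -> list[str]:
    scores = doc_vecs @ embed(question)          # cosine similarity on unit vectors
    return [doc_texts[i] for i in np.argsort(-scores)[:k]]

def generate(prompt: str) -> str:
    """Placeholder for the LLM call; a real system would invoke a served model."""
    return f"[answer grounded in {prompt.count(chr(10)) + 1} prompt lines]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    return generate(f"Answer only from this context:\n{context}\n\nQ: {question}")

print(answer("Who approves a $700 warranty claim?"))
```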
Starting point is 00:06:05 I've definitely been hearing that, yes. I think AI agents are an extension of that evolution where we're actually now allowing the AI to query and use tools from the company: API calls to the company's internal CRM system, or their HR system, or their supply chain system, or their JIRA tickets. And so now we can do things like take that domain-specific information that RAG already has, add an agentic framework on top of it, and then extend that so you can actually create a digital worker that can get the job done while a human's not in the loop. And that's where people are really trying to look for,
Starting point is 00:06:47 where's my 10x ROI with AI? Turns out it doesn't happen when a person's co-chatting in a chatbot. And even RAG-based environments, traditional RAG environments, have had that kind of chatbot-type interface with your own data. So you're chatting with your own company's data for higher fidelity outcomes, no hallucinations, but you're still
Starting point is 00:07:08 not getting the scalability of a digital worker that the AI agents will provide. And that's, I mean, I guess if you want to talk metaphorically, I mean, RAG is a great idea. But essentially, I mean, metaphorically, it's a librarian with a really great card catalog who can look up things and make sure that things are contained within the data set and that they're the right things and so on.
Starting point is 00:07:31 You can validate data. There's a lot of things to love about it, but the problem is it needs to have that really big card catalog or vector database, and it needs to have this huge data set, and that can take up a lot of space. And that's been something that's, I think, held this back. Even though it sounds great,
Starting point is 00:07:48 how do you have that encompass your company's entire corpus of data? How do you? Yeah. Well, I think, I mean, most companies sit on terabytes or petabytes of data. So step one is just basically organizing that data and the high fidelity data.
Starting point is 00:08:08 And over the last 10 years, we've gone on a data journey. So for the companies that have transitioned to leadership data lakes, they're in a good position to be able to start vectorizing that data. And most of the database companies and data lake platforms are now offering the opportunity to kind of vectorize data as well. So there's a lot of opportunity and pre-work
Starting point is 00:08:31 that's already been done to put the AI models in a position to succeed. But we're still kind of probably in the position where you do want to nail a particular use case. So curating that data set for the domain-specific problem you're trying to solve, who's the business unit owner that wants to solve that particular problem, is still incredibly important.
Starting point is 00:08:50 So you don't want to just dump everything into a vector DB and start querying right away. You're probably not going to get the highest fidelity results. And a lot of corporate data is greatly outdated as well. So there's obviously some curation you want to do to set yourself up for success. But there's a lot of pre-work already being done today to make that possible.
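A rough sketch of the curation step described above, assuming a simple freshness cutoff and a single target business unit (both illustrative choices): keep only in-scope, recent documents and split them into chunks ready to be embedded into the vector database.

```python
from datetime import datetime, timedelta

def curate_and_chunk(docs, owner="field_service", max_age_days=730, chunk_chars=1000):
    """Keep recent, in-scope documents and split them into chunks ready for embedding."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    chunks = []
    for doc in docs:
        # Skip stale data and documents outside the target business unit.
        if doc["last_updated"] < cutoff or doc["business_unit"] != owner:
            continue
        text = doc["text"]
        for start in range(0, len(text), chunk_chars):
            chunks.append({
                "text": text[start:start + chunk_chars],
                "source": doc["source"],
                "last_updated": doc["last_updated"].isoformat(),
            })
    return chunks   # next step: embed each chunk and load it into the vector database

example = [{
    "text": "Return policy: items may be returned within 30 days of purchase...",
    "source": "policies/returns.md",
    "business_unit": "field_service",
    "last_updated": datetime(2025, 1, 15),
}]
print(len(curate_and_chunk(example)))
```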
Starting point is 00:09:12 But that is the most important part of the journey, having your data organized and ready to go. There's no question there's a lot of data involved in the stuff we're talking about. I'm curious when you look at the actual architecture that these models are running on and that this RAG data is sitting on, can you give us a sense of, is this stuff typically done in memory?
Starting point is 00:09:40 Is there a lot of storage involvement in real time, for example, when an enterprise is running an inference workload and consulting some kind of RAG-connected external data source? Yeah, I think maybe just looking at the models themselves, and where people are battling right now with GPUs, because we're centering the data center and all our applications around GPUs, because they're the bottleneck.
Starting point is 00:10:06 So any rational throughput analysis always says the most expensive component is where you want to have the bottleneck. And so there's a lot of pressure on the GPUs right now. The tradeoff that the companies making GPUs have is that it's really, really costly to put a bunch of memory in the GPUs. Simultaneously, the larger the model, roughly, the better the performance, the more state-of-the-art the model. So there's this really big challenge in the GPU memory
Starting point is 00:10:37 around fitting big, big large models in the GPU memory. And so that's kind of the number one bottleneck that we all face in the market today. Now, when you start pairing that with a RAG-based architecture, which usually we're running the vector DB on the CPU, and then we're bringing in all that vectorized data, the memory hierarchy kind of levels out a little bit more like a traditional memory hierarchy that you would anticipate as well.
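Some back-of-the-envelope arithmetic shows why this memory hierarchy matters. Assuming FP16 weights and a 70B-class model shape (the layer and head counts below are assumptions, not figures from the conversation), weights alone dwarf a single GPU's memory, and long contexts add a large KV cache on top.

```python
def weight_gib(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate model weight footprint in GiB (FP16 = 2 bytes per parameter)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache in GiB: 2 (keys and values) x layers x kv_heads x head_dim x tokens."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value / 2**30

# A 70B-parameter model in FP16 needs roughly 130 GiB for weights alone,
# which is why it cannot fit in a single 48 GB GPU.
print(f"{weight_gib(70):.0f} GiB of weights")

# An assumed 70B-class shape (80 layers, 8 KV heads, head dim 128) at a
# 128k-token context adds roughly 39 GiB of KV cache on top of the weights.
print(f"{kv_cache_gib(80, 8, 128, 128_000):.0f} GiB of KV cache")
```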
Starting point is 00:11:07 But yeah, the GPU constraints are happening, and then you're pushing the workload now into more of a full application, systematic software workload that's driving more traditional storage and memory use as well when you shift to a RAG-based architecture. And AI agents are similarly extensible to that, where you're running a lot of application logic for that domain-specific use case, API calls that are all happening in more traditional compute
Starting point is 00:11:35 infrastructure. That doesn't need to happen on the GPU. GPU is just focusing on serving that model performantly. And when you say GPU, I'll just point out, too, that when it comes to Edge especially, but even increasingly in data center AI and cloud AI, it's other types of accelerators, too. I mean, there are definitely acceleration engines out there
Starting point is 00:11:58 that, in many cases, can provide better service than just a standard GPU. But they have the same constraints that you're talking about. In fact, in many cases, those accelerators have even greater memory constraints. Yeah, you absolutely nailed it if you look at the companies that are doing a lot of innovation in AI accelerators.
Starting point is 00:12:19 I use GPUs to cover the AI accelerator world almost interchangeably. But those companies, in many cases, probably made decisions to save on BOM costs and not have memory in their systems. That's driven them to make decisions on how they serve models. And then they have to parallelize model serving for these large models as well.
Starting point is 00:12:44 And some of them wrote great systematic software to do just that. But this is like a big challenge in the world today, especially as you apply larger models, but also as you apply chain-of-thought reasoning. We start to massively increase the amount of inference calls we're doing by giving the model the ability to kind of think through things more. Now you're really driving a significant workload and ultimately a high memory footprint too. Not to mention people are announcing nearly unlimited context windows, like 1 million token context windows and all,
Starting point is 00:13:20 and wanting to kind of make sure we sustain that over time. Which, you know, the context window versus RAG scenario is another trade-off too, Ace, because with these massive context windows, you're almost getting RAG in the LLM application as well. But the fidelity of an enterprise application, I think, still likes the separation of a RAG environment. A lot of those context windows are teed up more for consumer-based applications at this point in time. But it definitely muddies the waters quite a bit. Yeah, one of the things we've been hearing about a little more often in
Starting point is 00:13:55 call it the last nine months, is approaches for grappling with the increasing amount of data involved in inference. We've seen storage vendors come to market and talk about approaches, for example, for offloading some of your RAG data and accessing that directly from storage. NVIDIA was just at GTC talking about the key-value cache and approaches for placing that in storage as opposed to in memory, especially as you involve more complex models or longer interactions between models.
Starting point is 00:14:33 And that just grows and grows and grows. Is that a feasible approach? Is that something that folks should be thinking about as a realistic solution to the problem of memory constraints as more and more data gets pulled into the pipeline? Yeah, absolutely. I think kind of one of the gifts that we have that I think
Starting point is 00:14:53 is underutilized today in the AI world, probably because we're all focused on this GPU memory constraint, is DiskANN. And what that allows us to do, with the DiskANN-based optimizations, is offload workloads onto solid-state drives that traditionally would be run in memory. And while it takes a bit longer to index, the net result is your queries per second and performance improve dramatically.
Starting point is 00:15:22 So that's, I think, a little hidden hack that you can use to lower the memory footprint. That's a little. I want to dig into that one because that seems a little counterintuitive, right? When you tell someone they can read some data from storage as opposed to from memory, and you're actually seeing higher queries per second when you take that approach. Is that right? Yeah, that sounds wild. Yeah, yeah. Well, I mean, I think the work that's been done
Starting point is 00:15:51 within the DiskANN working group, and obviously we're spending a little time on the indexing side doing some pre-processing and some optimizations there. But once you've kind of made that trade-off when you index, that kind of one-time-ish trade-off when you're indexing, then you've got a little bit of the algorithm-based optimization that will give you that queries-per-second performance. And it's not like a 2x-type differentiator, but it's like same-level performance, plus or minus. It can even, in some data sets, be dramatically more.
Starting point is 00:16:25 And then you're looking at same level recall accuracy. So just at a trade-off of indexing time. And so it's indexing time, not even capacity. Because I was thinking that it would be a trade-off with capacity as well. Because the nice thing about storage is that you can have a lot more capacity than you can have with memory. Yeah, that's a fair point.
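For readers who want to experiment with the DiskANN pattern discussed here, the general shape of the workflow is: pay a one-time cost to build an SSD-resident graph index, then serve searches with only a small in-memory cache. The sketch below assumes Microsoft's diskannpy bindings and an NVMe mount at an assumed path; keyword arguments and defaults vary between releases, so treat it as an outline rather than copy-paste code.

```python
import numpy as np
import diskannpy  # Microsoft's DiskANN Python bindings; keyword arguments vary by release

vectors = np.random.default_rng(0).random((1_000_000, 384), dtype=np.float32)

# One-time, slower step: build the graph index on an NVMe SSD instead of in RAM.
diskannpy.build_disk_index(
    data=vectors,
    distance_metric="l2",
    index_directory="/nvme/rag_index",   # assumed mount point on a solid-state drive
    complexity=64,                        # build-time candidate list size
    graph_degree=32,
    search_memory_maximum=4.0,            # GB of RAM the searcher may use
    build_memory_maximum=32.0,            # GB of RAM the builder may use
    num_threads=0,                        # 0 = use all available cores
)

# Query time: only a small node cache sits in memory; the graph and vectors stay on the SSD.
index = diskannpy.StaticDiskIndex(
    index_directory="/nvme/rag_index",
    num_threads=0,
    num_nodes_to_cache=100_000,
)
neighbors, distances = index.search(vectors[0], k_neighbors=10, complexity=64)
print(neighbors)
```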
Starting point is 00:16:43 You're definitely using more capacity with that implementation as well. But that's capacity that, as long as you're using state-of-the-art PCIe-based drives, is capacity you usually have in the system initially as well, that you're using for other applications. Yeah, and that's where I want to go, too. So the side effect of making things,
Starting point is 00:17:08 and again, even if it wasn't faster, even if it was just not slower, that's still groundbreaking. The side effect is that you have much more capacity. So you can deploy applications with much, much more data to support them than you could in memory, even if it was not the same level of performance. Right. I mean, even if memory blew it away,
Starting point is 00:17:35 you'd still run out of memory pretty quickly. And I mean, we've talked about Solidigm. You guys have very big drives, you know. I mean, I remember the announcement of the 60 terabyte drives and of the 120 terabyte drives. I don't think we're talking about having 120 terabytes available to a RAG application right now, but are we? Well, I think it's definitely, you know, in the scope. It really, I think it really depends on how much high value data an enterprise has. If they've got 100 terabytes of high value data that's for
Starting point is 00:18:12 a domain specific application that improves the quality of the output, it's definitely in the scope of deployability today. That's wild. Of course, it doesn't just have to be one drive. I'm a storage nerd. I mean, absolutely. I mean, most storage systems use multiple drives. But just the fact that we have that kind of capacity that could be made available to these applications is really, really shattering.
Starting point is 00:18:35 Because there's just no situation in which you could have that kind of RAM at an affordable price point. If you really wanted to deploy an enterprise application with many terabytes of data, you just couldn't affordably. Yeah. Yeah, so speaking of affordability, one other cool thing we've been having fun with recently is, because of that kind of core problem that we've always seen about GPU memory footprint, we thought it would be really interesting to see if we can offload the actual large language model onto the SSD. So this is actually very unique, and
Starting point is 00:19:14 you know what we've done. And I've heard about people investigating that, that's a really cool idea. Yeah and there's some tools and technologies, in this case we're using a feature in DeepSpeed, which in many cases we use for training applications. But DeepSpeed has some capabilities around model offloading. And so what we've done recently is we've taken a 70 billion parameter model, which
Starting point is 00:19:38 doesn't fit in, like, an L40S-based NVIDIA GPU. And we've actually offloaded the model onto solid-state drives. And while you don't get the same performance, you otherwise couldn't deploy that model at all. And so it gives you kind of model capability based on offloading. So for people that haven't refreshed all their infrastructure, or they're waiting to get the
Starting point is 00:20:05 latest and greatest GPUs, you can actually use this technique to use a bigger model on a lower-cost GPU by SSD offloading. So that's kind of a cool innovation, and I think just like DiskANN has evolved over time and performance has improved over time, I think we'll see a level of innovation and performance improvement in SSD offloading as well that's going to warrant paying attention to, especially as we're increasing the amount of chain-of-thought reasoning in applications and all these inference calls are exploding right now.
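One way to express the SSD offload technique described above is DeepSpeed's ZeRO stage-3 parameter offload, which can spill model weights to NVMe. The configuration below is a hedged sketch with placeholder paths and buffer sizes; it is not necessarily the exact setup Metrum AI used.

```python
# Illustrative DeepSpeed ZeRO stage-3 configuration that spills model parameters to NVMe.
# Paths, buffer counts, and sizes are placeholder values, not tuned recommendations.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",   # assumed mount point for the SSD tier
            "pin_memory": True,
            "buffer_count": 5,
            "buffer_size": 100_000_000,
        },
    },
    "aio": {                               # async I/O settings used for the NVMe reads
        "block_size": 1_048_576,
        "queue_depth": 16,
        "thread_count": 2,
        "single_submit": False,
        "overlap_events": True,
    },
}
# The model would then be wrapped with deepspeed.initialize(model=model, config=ds_config)
# so that parameters are fetched from the drive on demand as each layer runs.
```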
Starting point is 00:20:40 So at some point you have to look at affordability, and that's a great way to hit a totally different level of entry point on pricing. The model weight offload thing is really compelling to me. It's a really interesting idea, and I wonder, my gut sense is that that may be interesting to folks particularly who have interest or needs to deploy AI solutions at the edge.
Starting point is 00:21:10 Because in a lot of cases, we have more severe power constraints, space constraints. You may not be able to put the latest and greatest GPUs in your edge servers, right? Do you see that as a potential play for this? Where, hey, you can now run a 70 billion parameter model on a GPU running at potentially much less power than the GPU would have otherwise needed,
Starting point is 00:21:32 and now we can take that AI to new edge environments? Yeah, it definitely meets that criteria. When you look at the edge, you think about, OK, we're constrained from power footprint. Usually there's a big latency requirement at the edge. But the existing infrastructure at the edge, it's legacy. Typically, Edge has a little bit more legacy infrastructure.
Starting point is 00:21:54 So it kind of checks all those boxes as far as trade-offs you'd want to make at the edge. Now, it might not be a 70 billion parameter. That might be a technique you use on a 7 billion parameter. Or if you're deploying on some really legacy infrastructure at the edge, it might be a 700 billion parameter, that might be a technique you use on a 7 billion parameter, or if you're deploying on some really legacy infrastructure at the Edge, it might be a 700 billion parameter model. So I think that it scales down to kind of the right footprint for the Edge as well. I wouldn't ignore kind of the enterprise cloud applications here as well, because what's
Starting point is 00:22:22 happening with the transition from chatbots to RAG to AI agents over time is AI agents are running autonomously of human intervention. Now you can always, you know, human-in-the-loop it. But what we want our AI agents to do is, we want them to be our digital workers that are working for us while we're asleep or hanging out with our family and enjoying life.
Starting point is 00:22:44 And then we want to come back in the morning the next day and see the output, all the reports and documents that the AI agent conducted for us while we were enjoying some great sleep and some great family time. And that can be done on a batch-based processing node. So we don't need to get the most high performance GPU in that scenario,
Starting point is 00:23:05 we can kind of, you know, MacGyver whatever we have available today, and leverage that and deploy it as well for batch-based workloads. So I think AI agents offer us a great opportunity to do some trade-offs in latency. So, like, real-time tokens per second is a little less important for AI agents,
Starting point is 00:23:24 depending on the particular workload. Yeah, we've been hearing that as well at Futurum in some of the research that we're doing. In fact, we're starting to see people talk about using CPUs, especially for agentic AI, for the same reason, because it's sort of an asynchronous workload. Also, because there's a proliferation of CPU cores. The CPU cores have a lot of specialized functions.
Starting point is 00:23:47 In many cases, they're actually getting specialized AI instructions. And because they have greater addressable memory, in many cases, than GPUs or accelerators do. So CPUs can look increasingly attractive for this. And with many of the things that you're talking about, I could see a lot of that going hand-in-hand with this CPU trend as well, wanting to use more storage instead of memory
Starting point is 00:24:12 to reduce the overall bill of materials to deploy some of these agentic applications. Because essentially, if you take this to its logical conclusion, we could see systems running agentic applications on conventional servers with a reasonable amount of memory, a reasonable CPU, and a reasonable amount of storage, thanks to the fact that we now have the capability to use that. Is this a vision that you would share?
Starting point is 00:24:40 I think, yeah, absolutely. I think there's a lot of opportunity to take that historical data center architecture and run AI agents on it, whether it's CPU. We actually have a number of AI agents that run 100% on CPU and no GPU required. Now, that being said, I think some of the older GPUs, fabulous performance still for those type of workloads
Starting point is 00:25:02 as well. So I wouldn't start transitioning 100% to CPU in all case scenarios. But for those batch-based workloads where you're fine with a little bit more latency, it definitely works. So those existing data centers, I don't think, need to be totally retrofitted today in all scenarios for a GPU-centric architecture.
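A minimal sketch of the overnight, batch-style agent pattern being discussed: jobs queue up during the day, and a worker loop drains them on whatever hardware is available, tolerating high per-call latency because no human is waiting. The queue contents and the run_agent_step function are hypothetical stand-ins.

```python
import queue
import time

jobs = queue.Queue()
for ticket in ("TICKET-101", "TICKET-102", "TICKET-103"):   # work queued up during the day
    jobs.put(ticket)

def run_agent_step(ticket: str) -> str:
    """Stand-in for one agent turn: retrieve context, call the model, invoke tools."""
    time.sleep(0.1)   # seconds per call matter little when nobody is waiting on the answer
    return f"draft report for {ticket}"

def overnight_worker() -> list[str]:
    reports = []
    while not jobs.empty():
        ticket = jobs.get()
        reports.append(run_agent_step(ticket))   # CPU-only or older-GPU inference is fine here
        jobs.task_done()
    return reports   # a human reviews the batch of reports in the morning

print(overnight_worker())
```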
Starting point is 00:25:21 We can make use of them for deploying AI and AI agents. One more question from me, Steen. I'm curious, since you're our expert on agents here, and we haven't had one before on the podcast, I want to pick your brain on this. Let's say you've got a model. You've connected it to some RAG data. And what it can do without agentifying it,
Starting point is 00:25:45 is that the right term, without giving it agency, is it can give you insights and advice, right? And then when you give it agency, you're now connecting it to other tools and systems and allowing it to take actions on your behalf. Does the act of giving that model agency have significant repercussions in terms of the amount of data generated or used. Like there's a lot of data clearly
Starting point is 00:26:11 involved in training a model, there's a lot of data in RAG potentially, and then whatever systems you're connecting the model to have their own data sets which presumably existed before the connection. But is there a big impact to incremental data simply by virtue of making a model agentic? Is that something you guys have looked at? Yeah, I mean, absolutely. I mean, we see this in our AI agents that have chain of thought reasoning.
Starting point is 00:26:36 And the more autonomy you give these agents, I mean, the human is the bottleneck in the scenario. So the better you design the workflow, the more API calls that that agent can do, the more tools they can do, the more autonomy you can give it and problems it can solve. It just massively explodes the level of data being created. I don't want to characterize that data as synthetic data,
Starting point is 00:27:02 but I would say the kind of non-human-generated data footprint is massive. I think that's a very interesting transition point
Starting point is 00:27:20 based on giving the AI autonomy as well, and also the advent of modern robotic tools, which are very data intensive. So I think that transition point is going to be a very interesting one. It's great for the storage business, so Ace should be happy. It's also valuable synthetic data. This is what we're struggling with in AI right now:
Starting point is 00:27:41 there's the open kind of web data we've certainly run out of as far as training models. So then there's a path for synthetic data. There's more human labeling, more reinforcement learning. And now we've got this whole category of AI agents doing chain-of-thought reasoning with their data footprint. And so all of that kind of non-human-generated data is going to be really important for the fidelity in the future of AI
Starting point is 00:28:07 models. Because if that's high quality data, if we're generating high quality workflows, then that'll be super helpful. Obviously, if the workflows don't work, or the AI model's failing, then maybe that will devalue that type of data relative to human generated data at this time. So it's really exciting to watch that play out. Yeah. Yeah. It really is pretty interesting what's happening here. I'm actually excited because generally people,
Starting point is 00:28:36 there's a thought that AI applications require absolutely cutting-edge, high-end hardware that's really expensive, that consumes tons and tons of power; that, you know, basically there's a lot of negatives around AI. And many of these negatives are true, but the industry is absolutely working to address those challenges and those criticisms, and in many cases we're going to see applications being deployed on much more restricted or modest hardware at a much lower price point. And I think that all of this means that this technology can have a bigger impact than we
Starting point is 00:29:18 might have assumed simply because it doesn't necessarily require the biggest, baddest, hottest hardware to run on. You can run it more approachably on more modest hardware. So all of these things, I think, go in that direction. And I think that that's a positive for all of us. So thank you so much for this conversation. I guess what last thing would you want to leave our audience with?
Starting point is 00:29:46 What's your summary of this message? For me, I think there's a lot of ways to deploy AI. And I think the innovations we've seen in the last year on affordable deployment of AI are probably 10x what we saw in the last 10 years. I mean, it's just an incredible pace of change on driving affordable models. And I think what's most important is designing the right business workflow for these autonomous workers or AI agents.
Starting point is 00:30:18 And then figuring out the deployment methodology, there's so much innovation happening on affordable off-the-shelf hardware, or even affordably deploying state-of-the-art hardware, that I wouldn't let that get in the way of your company's innovation. Excellent. Well, thank you so much for joining us. It's been great having you. Before we go, where can people continue the conversation with you? Yeah, you can find me on LinkedIn, Steen Graham, or at metrum.ai. Excellent. And Ace, it's been nice seeing you again.
Starting point is 00:30:50 As I said, you were one of the co-hosts last season. So check out Utilizing Tech Season 7. Where else can people catch up with you? Where have you presented recently? Or where are you going to be? Boy, oh boy, there's a lot going on at Solidigm these days. Certainly you can check out our LinkedIn. You can check out our AI landing page. That's solidigm.com slash AI, where
Starting point is 00:31:09 we hope to be featuring some of Metrum AI's excellent work in the near future. We'll be at conferences all summer long, so keep an eye out at all the big ones. For now, my head is spinning with all the implications of what Steen is talking about. And so I need to go have a lie down and kind of chew on some of this stuff.
Starting point is 00:31:28 But really, really appreciate you being here, Steen. I've learned a lot, and thank you. Yeah, and me as well. And I will point out that by the time you watch this episode, the Solidigm and Metrum AI presentation will be published on YouTube. Just go to YouTube and search for Solidigm and Tech Field Day and you'll find that.
Starting point is 00:31:49 That was part of our AI Infrastructure Field Day event, which happened in April. We're also going to be doing an AI Field Day event, a Cloud Field Day event, and we just announced another AI Infrastructure event later in the year as well. So check out the Tech Field Day website for more information about that. So thank you very much for listening to this episode of Utilizing Tech. You can find this podcast in your favorite podcast application. Just search for
Starting point is 00:32:13 Utilizing Tech, or you can find us on YouTube. If you enjoyed the discussion, please do give us a rating, a review, or a comment. We'd love to hear from you. This podcast was brought to you by Solidigm as well as Tech Field Day, which is part of the Futurum Group. For show notes and more episodes, we have a dedicated website, utilizingtech.com, and we are present on the socials. You'll find us on X/Twitter, Bluesky, and Mastodon. Just search for Utilizing Tech.
Starting point is 00:32:39 Thanks for listening, and we will see you next week.
