In The Arena by TechArena - Metrum AI’s Steen Graham on Multi-Agent AI and Smarter Storage
Episode Date: January 5, 2026
From OCP Summit, Metrum AI CEO Steen Graham unpacks multi-agent infrastructure, SSD-accelerated RAG, and the memory-to-storage shift, plus a 2026 roadmap to boost GPU utilization, uptime, and time-to-value.
Transcript
Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allison Klein.
Now, let's step into the arena.
Welcome in the arena. My name is Allison Klein. We're coming to you from OCP Summit in San Jose, California.
And I am so excited because we are with Steen Graham, a friend of mine and CEO of Metrum AI.
Welcome to the program, Steen.
Thanks for having me.
Congrats on all the success of tech arena.
I see you guys everywhere now.
Yeah, that's every event I go to.
I'm like probably steamrolling your LinkedIn right now, Steen.
Yeah.
Steen, you've been on the program before.
We're just going to start with.
Metrum is like all over the place.
Why don't you tell me a little bit about Metrum and what you're doing at OCP Summit?
Yeah, Metrum has a couple of key products that we work on.
And one of them is our Metrum Insights platform, which is a great no-code environment for evaluating the latest
and greatest AI hardware, pipelines, and workloads for performance and accuracy.
And just this week, we actually had a launch where we launched with Nvidia, the Spark platform.
Oh, my gosh.
Yeah, that's cool.
Usually we do server class infrastructure, but we've had the opportunity to work with
NVIDIA on their GPU product lines for pro applications, enterprise applications, and they're
fantastic.
Yeah.
And so, yeah, we were really excited to announce our support for Spark this week.
That's really cool.
I was thinking about you last night, and we both come from a heritage of working at Intel,
and I know that you spend a lot of time in IoT and Edge AI.
How has that shaped your approach at Metrum as you've grown a practice around AI across the
broadest of landscapes? Yeah. So where it really drove my approach is not just working with
NVIDIA and Spark, because that kind of brings me back to the edge AI days because there's so much
power and a small form factor there. But really what we learned with edge AI was it's all about
solving industry-specific customer problems. And so when we roll out our agent platforms,
which is another line of business for us,
we really focus on industry-specific agents
and how do we build industry-specific agents
that solve those domain-specific problems in the market?
And, you know, part of the reason I'm at OCP
is we have an effort around infrastructure agents
where one of the biggest challenges right now
is just keeping your infrastructure up and running,
your data center infrastructure.
The world's making hundreds of billions of dollars
in investments in infrastructure.
The ROI is all driven by the ultimate utility of that. It's just like semiconductor manufacturing, where we would
stress over yields. And, you know, when I started at Intel 20-plus years ago, we were just
furiously focused on, you know, can we get the yields from 97% to 97.4% because that's millions of
dollars. Yeah, that's money. Same thing as the industry builds out GPU-centric data centers. You need
to make sure they have uptime. And it's hard to get employees to monitor data center infrastructure
24-7, root-cause issues as fast as AI can, and remediate those issues. And so we're able to do that
with our infrastructure agent portfolio. This is kind of the reason why I wanted to talk to you,
other than the fact that it's always lovely to talk to you, is like you are seeing adoption curves
take place in real time across industries. Where are you seeing the hot buttons across enterprise
today in terms of the types of use cases or the types of verticals that are deploying?
Yeah, I mean, there's the common hot problem of just security and privacy.
And then, juxtaposed with that, everybody also wants to be state of the art and affordable.
And so those three things together are really the hot buttons that I'm seeing in the industry,
because everybody knows there's an affordability opportunity if they go on-prem.
Everybody knows that the safest form of privacy is to own their own infrastructure.
But if you want the latest tools at the pace of the industry, that's tough.
Yeah, it's extremely difficult.
And the labor force that we had 20, 30 years ago to rack and stack servers, we all told them to go to the cloud.
You know, go to the cloud.
Now it's very difficult to say, even for the neocloud providers, as they upstream their work, they have a labor force challenge.
Yeah.
And so, you know, ironically, that can be solved with AI for sure.
Now, I know that one of the things that's happening within those contexts is smaller model applications
that can make some more standard approaches to hardware possible so that they're not having to go
into the esoteric realm of some of the neoclouds and hypers.
How do you see that changing the game?
Yeah, it's a very kind of interesting environment with the small language models.
They can be 100x plus more affordable to deploy.
They can sit on your PC class infrastructure or your own.
The challenge that you find is like there is accuracy loss.
So what's an acceptable accuracy loss?
And for us, same thing.
I look at, you know, you go run, like, DeepSeek 671B,
and you can take a distilled Llama version at 8B.
Yeah, sure.
And you can get a lot of that core DeepSeek model.
But you can see, like, a five-plus-point accuracy loss across a range of benchmarks.
Right.
And then do you want to deploy that to your customers?
Right.
And the answer for me is no, unless the customer has a very simple app,
but we're focused on high fidelity enterprise applications.
And so it's usually a no, but there are some kind of in-between approaches
that you can take on the models.
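The tradeoff Steen describes, distilled-model savings versus benchmark accuracy loss, can be sketched as a simple gating check. The benchmark names and scores below are made-up illustrative numbers, not real DeepSeek or Llama results:

```python
# Hypothetical sketch: deciding whether a distilled model's accuracy loss
# versus the full model is acceptable. All numbers are illustrative.
full_model = {"benchmark_a": 85.1, "benchmark_b": 92.3, "benchmark_c": 81.7}
distilled = {"benchmark_a": 79.4, "benchmark_b": 86.0, "benchmark_c": 77.2}

def mean_accuracy_loss(base, candidate):
    """Average point drop of candidate vs. base across shared benchmarks."""
    deltas = [base[k] - candidate[k] for k in base]
    return sum(deltas) / len(deltas)

loss = mean_accuracy_loss(full_model, distilled)

# A five-plus-point drop, as in the example above, would fail a
# five-point gate for high-fidelity enterprise applications.
ACCEPTABLE_LOSS = 5.0
deployable = loss <= ACCEPTABLE_LOSS
```

Whether the gate is five points, one point, or zero is exactly the "acceptable accuracy loss" judgment call being described.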
Do you think that we're going to get to a point where we see those smaller models get higher quality,
or is this just something that will always exist in terms of model size and accuracy?
I don't think there's any theoretical technology gap in enabling the accuracy of the smaller models.
So I think we're going to continue to see fast-paced model acceleration.
But there is going to be some trade-offs in performance.
Whether that closes to zero over time, I really doubt it.
Okay.
On the time scale that's worth looking at.
Yeah.
Like a five-year time scale.
Yeah.
I don't think it's going to be zero in five years.
Now, you have a program called the Know Your AI
platform. Tell me about that.
Yeah. So one of the things that we do in our Metrum insights is you kind of have to know your
AI before you deploy your AI. And we use this product internally as well, where, you know,
what we really want to do is we wanted to evaluate the model performance on real world workloads
with as good or real world data sets as we can source. It could be a knowledge base from a client,
sure, you know, or otherwise. But you have to essentially evaluate the AI continuously. We deploy
Know Your AI in our evaluation module, and then we also deploy it in our production modules
because you don't want to just test it once. You have to continually monitor the AI for performance
and accuracy loss over time. And so it's just a core tenet that's foundational. I was talking to
a friend of mine who was saying that, yeah, her company, who's a leader in terms of adoption of
enterprise AI, trained their own LLM, and then forgot this part. And she was telling me about the
accuracy challenges three years on in terms of the LLM just not maintaining the accuracy that was
needed for enterprise applications. And I think that you're really hitting on something so
critical. Yeah. It's no matter how good of a job you do up front, you've got to continually
evaluate accuracy. Right. And so we're not just looking at the models. We're actually looking at
the accuracy of the agents on human level tasks as well that are domain specific. Yeah.
So if we have a debug engineer, we score the engineer on each and every task that they do.
Yeah.
So we've got a ranking system.
And their agent has a manager, by the way; their manager agent evaluates their scoring.
And then eventually a human is looped into the total eval platform.
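A minimal sketch of that scoring flow: each task the agent performs gets a score, a manager pass reviews the scores, and anything below threshold is escalated to a human. All class and function names here are hypothetical, not Metrum's actual API:

```python
# Hypothetical sketch of per-task agent scoring with a manager review pass
# that flags low scores for a human instead of silently accepting them.
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    task: str
    score: float          # 0.0 - 1.0, assigned by an automatic grader
    manager_approved: bool = False

@dataclass
class AgentEval:
    agent_name: str
    results: list = field(default_factory=list)

    def record(self, task, score):
        self.results.append(TaskResult(task, score))

    def manager_review(self, threshold=0.7):
        # The "manager agent" pass: approve strong scores, return the
        # weak ones so a human can be looped into the eval.
        flagged = []
        for r in self.results:
            if r.score >= threshold:
                r.manager_approved = True
            else:
                flagged.append(r.task)
        return flagged

eval_run = AgentEval("debug-engineer")
eval_run.record("triage kernel panic", 0.92)
eval_run.record("root-cause NIC flaps", 0.55)
needs_human = eval_run.manager_review()
```

The point of the two-level structure is that the automatic grade is itself evaluated before anything reaches production.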
I'm glad you said agent because that was my next question, agentic computing.
And obviously, agentic opens up a wealth of different use cases in the enterprise.
How do you see enterprises looking at agents? Do you think that there's a high trust at this point?
Where are we in terms of agentic deployment?
Yeah. Well, I think enterprises definitely see it as a massive productivity vector.
And for many enterprises, as well, it can impact top line revenue.
So you can't not be evaluating agents right now.
There's just too much detrimental risk to your business.
You know, like Andy Grove famously said, you can be the subject
of a strategic inflection point or the cause of one.
So your business can get disrupted or you can be the one driving the inflection.
So you have to look at agents.
Unequivocally, a huge value prop.
Now, as far as like trust, I think a lot of people have had challenges with prototype
to production as we're talking about.
And so I think that the agentic platforms are still earning trust.
Yep.
And I mean, there's been a bunch of big failures, too, platforms
that came out and over-promised and under-delivered.
So there's a lot of trust to be earned left on the table.
Yeah, I think that this is going to be something that we're going to be watching really closely
is I had a conversation with one of the executives from Walmart when we were at AI Infra,
and I know you were there too, about agents.
And it was so interesting to see how advanced they were with the amount of agents that
they've developed, but still very early stages of deployment, a lot of prototyping,
not necessarily broadly deployed across worker classes or functions.
I think this is going to be a key topic of 2026.
I want to shift for a second and say,
you mentioned the NVIDIA partnership.
You're doing so much great work with so many people across the industry.
Talk a little bit about the partners that you've got
and how you've engaged to deliver value across different types of semiconductors,
different types of platforms, etc.
Yeah, we've been lucky to work with quite a few.
For example, we were at Advancing AI Day.
For AMD, we've had the opportunity to work on early access to the Dell PowerEdge
MI300X systems, and now the MI355X systems.
And we built seven industry-specific agents to showcase on Instinct.
And we have no challenge developing state-of-the-art AI agents on the Instinct product line.
I know everybody's a little bit focused on NVIDIA, which has such a great product and a great footprint.
But from a developer perspective, you know, Instinct is great to work on and really fun.
Yeah, we've worked on a number of compelling initiatives.
We'll be at Supercomputing showing some great stuff off with the team at Solidigm.
And we're really, really excited about that as well.
That's cool. That's awesome.
Some of the stuff we're working on with them is really interesting, because as a developer you think, okay, why are we talking about solid state drives?
Because we're so GPU memory constrained right now.
Right.
And you can do some really cool things, like offload model weights to solid state drives.
You can use DiskANN to get really optimal performance for, like, RAG-type, agentic RAG-type workloads.
So there's a lot of cool stuff we've been doing with the team at Solidigm as well.
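The weight-offload idea can be illustrated with a memory-mapped file: the matrix lives on disk (ideally a fast SSD), and only the rows a computation touches get paged in. This is a toy standard-library sketch of the concept, not how Solidigm's or DiskANN's actual machinery works:

```python
# Toy sketch of offloading "model weights" to disk and streaming rows in
# through a memory map, so the full matrix never has to live in memory.
import mmap
import os
import struct
import tempfile

# Write a 3x4 float32 weight matrix to disk once.
rows, cols = 3, 4
weights = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]
path = os.path.join(tempfile.mkdtemp(), "layer0.bin")
with open(path, "wb") as f:
    for row in weights:
        f.write(struct.pack(f"{cols}f", *row))

def matvec_from_disk(path, vec, rows, cols):
    """Multiply the on-disk matrix by vec, reading one row at a time
    through a memory map instead of loading the whole matrix."""
    out = []
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        row_bytes = cols * 4  # float32 rows
        for r in range(rows):
            row = struct.unpack(f"{cols}f",
                                mm[r * row_bytes:(r + 1) * row_bytes])
            out.append(sum(w * x for w, x in zip(row, vec)))
    return out

result = matvec_from_disk(path, [1.0, 1.0, 1.0, 1.0], rows, cols)
```

Real weight-streaming inference engines and disk-based ANN indexes layer caching, batching, and SSD-aware layouts on top of this basic trade: capacity on storage, working set in memory.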
Everybody I'm talking to is saying storage architecture is so critical at this point.
And thinking about the tradeoffs between memory and storage and the price points of memory,
whatever you can move to storage, you should
be moving to storage, just for that reason. I'm glad that you guys are engaging there.
I can't wait to see it at Supercomputing. But I do want to go back to the Instinct thing because
I was at that event and we wrote about it. You know, you built some multi-model solutions for
healthcare, for insurance. It was really incredible what you showed. Can you unpack that a little bit?
Yeah, the Instinct-based agents that we developed. So, like, we have an insurance use case that we
developed. And what's really cool about that use case is, you know,
everybody thinks about speed, and everybody's focused on tokens per second as the key output.
Really, you need to deliver business value.
With that particular use case, what we're trying to solve there is
the challenge of people getting paid for insurance claims as soon as possible.
Yeah.
Like, the most common damage in home insurance claims is roof damage.
And so in that process, you know, somebody's got to get up on the roof.
Usually they do an audio recording, take photos.
Right.
Then they've got to come back down, write a report on the photos, take their audio notes and transcribe them.
And what's cool about Instinct is we can pack in a vision language model, we can pack in audio transcription models, we can pack in an LLM reasoning model for report generation.
So we can use that full memory footprint.
So it's not just, like, the speed of output of tokens, it's the speed of output of the report.
And so the report gets done faster for the insurance
company. There are a lot of regulatory requirements to just get the paperwork done so people can get
paid. Yeah, that's amazing. And so like that type of application is really cool. We also worked on
telecommunications infrastructure agents where, you know, nobody wants dropped calls or
data drops. The carriers are starting to get more and more requirements from enterprise customers
because all enterprises are running online as well. And so agents that take in the telemetry streams
off the base stations and take remediation actions, generate Jira tickets or ServiceNow
tickets, whatever the ticketing system is.
That's cool.
That's cool.
Yeah, a very cool use case.
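A skeletal version of that telemetry-to-ticket loop might look like the following; the field names, thresholds, and actions are invented for illustration, not drawn from any carrier's actual system:

```python
# Hypothetical sketch: an infrastructure agent inspects one base-station
# telemetry sample, tries a safe remediation, or files a ticket.
def handle_telemetry(sample, error_threshold=0.05):
    """Return the action an agent would take for one telemetry sample."""
    if sample["error_rate"] <= error_threshold:
        return {"action": "none"}
    if sample.get("restart_safe", False):
        # A known-safe remediation the agent can apply autonomously.
        return {"action": "remediate", "step": "restart_radio_unit"}
    # No safe fix: escalate into the ticketing system (Jira, ServiceNow, ...).
    return {"action": "ticket",
            "summary": f"High error rate on {sample['station_id']}"}

actions = [handle_telemetry(s) for s in [
    {"station_id": "bs-01", "error_rate": 0.01},
    {"station_id": "bs-02", "error_rate": 0.12, "restart_safe": True},
    {"station_id": "bs-03", "error_rate": 0.20},
]]
```

The same none/remediate/escalate split is what lets an agent handle the 24-7 monitoring load while humans only see the cases that actually need them.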
Yeah, I mean, we've done a lot of great stuff, so I could go on and on.
But it's really the memory footprint that the Instinct product line has that allows you to
stuff in more models and get a better total cost of ownership for those types of multi-model
applications.
So it's a very cool product.
We live in interesting times.
I think about you a lot because we have unique vantage points.
Metrum, tech arena, both doing different things, but engaging with the entire industry at this incredible time of innovation.
I have my last question, I guess, for you.
Yeah.
Is where are you going next with Metrum, Steen?
And what can we look forward to from you in 2026?
Wow.
2026 seems so far away right now.
I know.
But we've got a number of upcoming releases
on our multi-agent infrastructure platform.
And we're just tremendously excited about helping neoclouds,
anybody setting up GPU-centric clusters,
save themselves a ton of time on debug,
on setting up the infrastructure,
and ultimately making sure that they can get high utilization
and generate revenue, you know,
and uptime off of those investments they're making
as the industry rolls out hundreds of billions of dollars.
Yeah, I know, infrastructure.
Exactly.
You want to make sure that it's available as soon
as possible. Oh, yeah. Oh, yeah. Normally we would say humanly possible, but in this case,
we'll say as soon as possible, period. Yeah, exactly. We're going to use AI and humans to make
it happen. Well, Steen, it's so fun to watch you soar and Metrum soar. Thank you so much for
spending time with us on Tech Arena. Thanks for having me, and thanks for all the great
content you guys are putting out in the market, because it's keeping me up to date.
That's awesome. Thanks for joining Tech Arena. Subscribe and engage at our
website, techarena.ai. All content is copyright by Tech Arena.
