In The Arena by TechArena - AMI on Open Firmware for the AI-Native Data Center
Episode Date: December 9, 2025
AMI CEO Sanjoy Maity joins In the Arena to unpack the company's shift to open source firmware, OCP contributions, OpenBMC hardening, and the rack-scale future: cooling, power, telemetry, and RAS built for AI.
Transcript
Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Alison Klein.
Now, let's step into the arena.
Welcome in the arena. My name's Alison Klein, and I am so delighted. It's a real treat.
I have AMI CEO Sanjoy Maity with me.
Sanjoy, welcome to the program. It's the first time you're on with me.
Thank you. Thank you very much for inviting me.
So, Sanjoy, before we get started, can you just introduce a little bit about AMI?
You played a pivotal role in firmware innovation over the last 40 years, which is incredible.
Since you took the helm, how has the company's strategy evolved, particularly around embracing open source solutions?
Yes, we have been in this business for 40 years. That's not a small time period. So we started our journey around 1985, and that is the time a lot of things changed in the firmware area. Hardware innovations were very different in the '80s and '90s. As you know, the PCs and servers and the cloud all evolved in the last three decades. We always innovated. We were a very innovative company, and innovation is in our DNA.
So we always created firmware that helps a couple of things in the industry, the first being scalability from day one. The industry started with many different flavors of hardware, which we used to call chipsets. Now it has turned out to be only two or three chipset vendors, but there were like 20 different companies in the past. What we did all the time was provide a common, uniform code base for the ecosystem, where any computer manufacturer can scale their operations. And we provided the highest-quality product all the time, a secured product all the time, and a uniform code base to maintain and manage.
Before I took over in 2019, AMI was making class-leading firmware, but it was more of a proprietary firmware based on some of the open source and open architecture in the industry; our firmware IPs and everything were more closed source and proprietary. We recognized that the industry is moving fast. The AI innovations, the server industry, cloud technologies, they first require a high pace of innovation from everybody, and then productization comes into the picture. But innovation, transparency, and scalability become the big problem. So what we did, we embraced open source. We started contributing to open source. And today, over the last five to six years, our journey has completely changed 180 degrees. All our products are open source based, and we work with all the industry influencers, large companies, CSPs, and we provide them open source based solutions today. So that's how our journey evolved. Today we have a completely open source based solution.
That's incredible.
Now, we're heading into a really important time for open source with the Open Compute Summit coming up next week.
How do you see AMI's role in OCP and in open source technology innovation generally, from the hardware perspective?
And what role does AMI play in open ecosystems today?
Very good question.
And listen, our involvement with OCP has a purpose.
So I will start with OCP's mission statement.
OCP was created to accelerate scalable designs that the ecosystem can embrace, as open source, mostly on the hardware side of it.
And we see some challenges over there. Scalability, of course, is the first challenge. Second is standardization. Third is basically commoditization. Fourth, I would say, security. And finally, sustaining this whole thing.
Now, AMI recognized these challenges, and AMI wanted to prove that OCP's mission can be accomplished by our contribution to OCP.
Now, there is a broader spectrum of open source firmware, which is open BIOS, or EDK2, or UEFI that we know.
And there is also OpenBMC, which is the main topic I will touch on more, because that is the reality today.
That is a Linux Foundation project, a broad public open source project where everybody contributes. There are probably 100 companies or individuals contributing to it, which is growing, which is very good.
But the problem comes in the scalability part of it, because there are so many varieties of component manufacturers, and OCP also has multiple specifications. One notable one is the modular system specification.
There can be multiple CPU architectures: it could be x86, could be Arm, could be RISC-V. There can be multiple component manufacturers, especially for the manageability controllers, coming from companies like ASPEED, Nuvoton, Axiado, and maybe other companies are making them too.
Then there are a lot of hardware combinations that people are producing.
And when somebody takes the code base from OpenBMC, they try to manage it and create the product, and then contribute back. So what is happening at the end is that the public, broad pool of open source is getting fragmented. When it gets fragmented, it's a big problem, because you have OCP-accepted hardware, and the question is which source base you should use: one you can maintain for 10 to 15 years for sustainability purposes, and you have to patch continuously, you have to secure the system
continuously. So we recognize that problem. So what we do, we actually invest a significant
amount of time and effort. So we did that. We merge test, and not only that, now OCP has
a guideline for security, audit purposes, or practices called OCP safe. We actually run and we
invest that time and we make sure that it is OCP safe compliant by one of the four
approved auditing company by OCP. And then we give it back to the community, to the OCP community,
which is more interested to use a common source space, uniform source base for developing
the product, which is the ODIM ecosystem. So we contributed our buyers, BMC, as well as our
security code based to OCP in OCP's GitHub.
Besides that, we are also involved with other steering committees where standardization or specifications are being developed. One is the modular system. Another is rack management. Another is open silicon firmware. And there are other manageability solutions where we are also participating. We are bringing all of them together to make sure that these specifications are implemented in a modular way in the code base that we are contributing to OCP. That means we are also contributing to the broad public open source, and we take that community code back, merge, test, and validate it, creating a production-worthy, supported code base, and provide it to OCP. And that is basically for accountability purposes, with an SLA to our customers.
This is really foundational work, and it's impressive, the scope and scale that you're engaging in. Can you give some examples of successful collaborations, either ones you drove within the OCP community or the other standards bodies that you referenced, that illustrate the potential of open source firmware?
Absolutely. We worked with a company like Jabil, and
our goal was to show how modular and how uniform a code base can be, from our point of view.
So we took the OCP version of the firmware and the tool chain that we built around it; those are our IP.
And we could change an Intel-based processor module to an AMD-based processor module on the fly.
We could change the firmware, and the rest of the system is the same, and it is up and running, and it booted the system.
So that proved how modular it can be, and how easy and scalable that solution can be for the ODM ecosystem.
Yeah, I mean, some folks will listen to this and say, okay, that seems simple. But we all know, if you've managed hardware, these are things that will leave you dead in the water if they're not working correctly.
When you look at some of the challenges, and we talked about code forking and getting fragmented, can you talk a little bit about the technical and organizational challenges in making firmware more open and interoperable across these diverse platforms that are coming, you know, in an AI-driven world?
That's absolutely a massive challenge that we see every day.
AI infrastructure especially is a very disaggregated architecture, because AI demands and tasks, or the workloads, are different from normal server compute. So it is a disaggregated solution. For example, the compute, the GPU, the networking, the cooling, and the power: everything is disaggregated. Every component has firmware to manage it. And they are all using the common OpenBMC code base, the base code base, to manage each component. And guess what? Each piece may come from different ODMs and different companies altogether, and then it is all put together to create the actual server, the actual AI infrastructure. So it is coming from a cross-vendor solution.
If you don't know the code base, you are losing visibility of the software bill of materials. You have no auditing capability. Within the data center, you definitely cannot say that all firmware modules across your data center have the same security patches or the same security measures. These are the challenges that we continuously see. Plus, the obvious bug fixes will not be there.
So this is our effort: we are always merging, and every quarter we upstream to the OCP GitHub and tell the OCP community, basically those who are interested in and relevant to making hardware, that this is the common, uniform code base: download it and build on top of it. And we have very good success stories on that. The common base is used for our own product also, which is based on open source, and we deliver to very large Tier 1 CSPs in this country, as well as all the ODMs in Taiwan. So it is definitely a production-worthy one, and we know that it can happen. That is our effort today.
Now, you talked earlier about how innovation needs to move faster.
What innovations in the firmware space excite you most, especially those that could be accelerated
through the open source collaboration that you're talking about?
So one of the things, definitely, I would say, is that sustainability is always an important topic, whether it is environmental or in delivering the product. The challenge is not just to build the product; it is to build, secure, scale, and deliver. And putting it all together, the innovations should be in the architecture, by design. So it has to be modular, and it should work that way.
Not only the modularity, but how we build the firmware, how we test the firmware from a self-test point of view, how we sign the firmware, how we create the software bill of materials, and how we deploy it: all are important, and all are implemented within the firmware. That is basically how we build the firmware. Then the second part is how the firmware innovates feature-wise, and that is where the AI world, the AI infrastructure world, demands a lot of innovation within the firmware. In terms of monitoring, and the reliability, availability, and serviceability side of it: monitoring the RAS functionality and elevating RAS to the data-center-level standard is very important. Those are all innovative features that we build, continuously, within the firmware.
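As one small illustration of the build-sign-deploy discipline mentioned above, a deployment step might verify a firmware image against the digest recorded in its build manifest before flashing. This is a minimal sketch with made-up data, not AMI's actual pipeline; real platform root-of-trust checks use cryptographically signed manifests and hardware-anchored keys, not a bare hash compare:

```python
# Illustrative only: check a firmware image's SHA-256 digest against the
# digest its build manifest recorded, rejecting any tampered image.
import hashlib

def verify_image(image_bytes, manifest):
    digest = hashlib.sha256(image_bytes).hexdigest()
    return digest == manifest["sha256"]

image = b"\x7fFWIMG-demo-payload"
manifest = {"component": "bmc", "sha256": hashlib.sha256(image).hexdigest()}
print(verify_image(image, manifest))                 # untampered image passes
print(verify_image(image + b"tampered", manifest))   # modified image fails
```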
Now, I did want to ask you about a couple of different things around AI's influence on firmware delivery. I spent a lot of time in the silicon arena. What role do you see for firmware in AI silicon and in platform development?
Great. So I will divide it into two parts. The first aspect is that AI silicon is now heading towards custom silicon designs. We saw a lot of news, and finally very exciting news from NVIDIA and Intel a couple of weeks ago that both companies will work together, where the GPU technology comes from NVIDIA and it will be used together with x86 technology.
This is one example, but Arm has shown this over the last three to four years: you can take IPs from different companies and build a custom silicon that has dedicated, target-specific AI hardware.
Now, in this, there are two types of firmware that we see, and we are playing a major role there.
One is the silicon firmware, which runs inside the silicon. There are different IPs coming from different companies, and they are packaged to create a custom silicon for AI purposes. There are probably 10 or 15 different types of firmware running inside the silicon. Those are important because these IPs are coming from different vendors. Ten years ago, Intel, AMD, or the ecosystem processor companies controlled the entire processor technology; they delivered the final product. But in this case, it is coming from multiple companies. So when somebody is designing the processor or the silicon, it is very important that the silicon firmware enables those IPs, working with those individual companies, and makes sure that the silicon works. These basically follow chiplet technology; each comes as a chiplet. So there is chiplet firmware, and in the future we believe there will be more specifications and standards coming for how to interact with chiplets and how to manage chiplets on silicon. It will evolve tremendously in the future. We are part of the Arm chiplet specification effort, and we are working with them on that.
And the second side is the platform firmware. I talked about the silicon firmware; the second part is the platform firmware, which is the boot firmware, which is BIOS; the manageability firmware we call BMC, the baseboard management controller; and the security firmware, which is the platform root of trust. All three are very important in the case of AI. First, booting the system properly is important, with attestation, root of trust validation, all these things, and secure boot is important.
The manageability part is probably the most important in an AI data center.
Just to compare: there was a study by Meta, probably a year and a half ago, that looked at typical GPU failures in a data center. It's not because of the GPU silicon, but the associated high-speed connectivity and associated glue logic. In general, the average failure rate of a GPU in a data center, what they call AFR, annual failure rate, is 11%. Previously, the component with the highest AFR was the spinning drive, at less than 1%. So compare that context: less than 1% versus 11%.
So we are innovating many things in our firmware. In terms of telemetry, we are even using AI tools for that: how can we predict GPU failure and minimize it? That is one of the areas where I think firmware has to innovate more in the future.
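A toy version of the telemetry idea: flag a GPU whose correctable-error counts spike far above their recent baseline, one commonly cited precursor signal. This is a generic rolling z-score sketch with invented numbers, not AMI's AI tooling:

```python
# Toy sketch: flag telemetry samples that jump far above the rolling baseline,
# e.g. a burst of correctable ECC errors preceding a GPU failure.
from statistics import mean, stdev

def flag_anomalies(samples, window=8, threshold=4.0):
    """Return indices where a sample exceeds baseline mean + threshold*stdev."""
    flagged = []
    for i in range(window, len(samples)):
        base = samples[i - window:i]
        mu, sigma = mean(base), stdev(base)
        if sigma > 0 and samples[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged

ecc_errors = [2, 3, 2, 4, 3, 2, 3, 3, 2, 40, 3]  # sudden burst at index 9
print(flag_anomalies(ecc_errors))  # -> [9]
```

A production predictor would fuse many signals (link retrains, thermals, power excursions), but the shape is the same: baseline, deviation, alert.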
That's fascinating.
And as you were talking, one thing that I thought about is AI's design point is moving from a single system to rack scale and even data center scale.
How do you see the role of firmware expanding within that process?
This is a great question.
So this has a tremendous amount of dependency on the firmware.
So now, when we go from one server to rack-scale servers, clustering comes into the picture. You cannot just build the server; it is on-demand, dynamic composition and decomposition of the rack. Then multiple racks create a pod, and multiple pods create the entire data center. So that's the whole vision.
Now, within a rack, if you think of the firmware, its responsibility is also to have the capability of taking instructions from the upper-level manager, the rack manager or data center manager, whatever we call it, and to compose and decompose smaller clusters within the rack.
So that is a big area where firmware has to play a major role, and it is dynamic and on demand.
So that's where the rack-scale impact is.
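Compose/decompose flows like the one described here are commonly expressed through Redfish-style APIs. The sketch below only builds the shape of such a "compose a cluster from resource blocks" request; the paths and property names are simplified illustrations, not quotes from the Redfish specification:

```python
# Illustrative only: the shape of a Redfish-style compose request a rack
# manager might send to carve a small cluster out of rack resources.
# Block IDs and endpoint paths here are invented for the example.
import json

def compose_request(name, gpu_blocks, compute_blocks):
    return {
        "Name": name,
        "ResourceBlocks": [
            {"@odata.id": f"/redfish/v1/CompositionService/ResourceBlocks/{b}"}
            for b in compute_blocks + gpu_blocks
        ],
    }

payload = compose_request("training-cluster-01",
                          gpu_blocks=["GPU-3", "GPU-4"],
                          compute_blocks=["CPU-1"])
print(json.dumps(payload, indent=2))
```

Decomposition is the inverse: the manager deletes the composed system and the blocks return to the free pool for the next workload.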
Now, when we talk about rack-scale AI servers, one of the two major challenges we have is cooling, because it is now generating massive heat; it is now a one-megawatt rack. At OCP Dublin, they just announced that there is a specification from Google for a one-megawatt rack. As a matter of fact, just 10 years ago, I thought a megawatt was for an entire data center. Now it is only one rack, correct?
There are three different areas of cooling technology where we believe firmware will play a major role. One is the server side of it: how to cool the chips. Based on the cooling technology and cooling mechanism, the GPUs, CPUs, and all these components will be throttled properly, so the data center can achieve the best performance.
Second part is the
cooling or coolant distribution unit, which is not part of the actual compute rack.
It is a site rack that provides the cooling and cooling distribution.
Over there also, there is a major role for firmware, and that will communicate with the
main rack that will also control the flow of the liquid, that will also detect that
leak of the liquid, and those kind of things.
And third is more coming into the facility side of it, which is all this cooling,
needs to be cooled down quickly and then coming back.
So there are major firmware which will also play major role.
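The CDU firmware role described above, regulating coolant flow against a temperature setpoint and cutting off on a leak, can be sketched as a single control step. This is a hypothetical illustration with made-up gains and limits, not any vendor's actual control logic:

```python
# Hypothetical sketch: one step of a CDU control loop. A proportional term
# raises pump flow as inlet temperature exceeds the setpoint; a detected
# leak overrides everything and stops the pump.
def cdu_step(inlet_temp_c, setpoint_c, leak_detected, kp=5.0,
             min_flow=10.0, max_flow=100.0):
    """Return (pump flow as % of max, optional alert string)."""
    if leak_detected:
        return 0.0, "ALERT: coolant leak - pump stopped, notify rack manager"
    error = inlet_temp_c - setpoint_c           # positive = running hot
    flow = min_flow + kp * max(error, 0.0)      # push more coolant when hot
    return min(flow, max_flow), None

print(cdu_step(52.0, 45.0, leak_detected=False))  # hotter than setpoint
print(cdu_step(52.0, 45.0, leak_detected=True))   # leak overrides everything
```

Real CDU controllers add integral/derivative terms, redundancy, and reporting of all of this as telemetry to the rack manager, which is exactly the firmware-to-manager communication described above.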
That's really cool.
And thinking about the intelligence that firmware will deliver is really interesting
when you consider the brute force liquid cooling that's going on today.
You know, one thing that I wanted to ask you about, Sanjoy: you've charted a new course for AMI in your tenure. As you see this market transformation happening in real time, how do you see AMI moving forward in this space?
So the good news is that we always know that we have to be innovative
and we have to be a thought leader in this area.
We invest a lot in partnerships with companies, and we invest a lot in the community and platforms like OCP, especially contributing to the standardization and the specifications that are evolving around it.
As I mentioned, we will continue to innovate and contribute to cooling technologies and open chiplet firmware technologies.
And we believe that these are the areas we will evolve in the future.
Yeah, that's really awesome.
And I guess I wouldn't be a good interviewer if I didn't ask: do you see AMI having a scope beyond firmware in the future?
We already are. In the AI world, we started our journey with firmware; we do have that, and we are the leader today in the firmware world. But we see that it is not enough. Because when we are talking about rack scale, when you are talking about cooling technologies, when you are talking about power (and I have not talked about power and carbon footprint, forecasting, and all those areas), we do have software today which does that, and we are bringing it; we have not launched it fully. We also have AI-based analysis of the telemetry coming from the firmware: it analyzes it and gives meaningful, actionable information to the data center operator. Otherwise, the firmware data is very raw data to them. So these are areas where AMI already has products that we are bringing. We do have a solution for composing and decomposing the rack based on the demand and the workload. So we are beyond firmware; we basically have software, running in-band software, and all those things.
That's so fascinating.
And I love to hear the trajectory of the company and really how it paces with what is happening in data centers in terms of the transformation.
Sanjoy, I'm sure that we've piqued a lot of interest from our audience today. Where would you send them to find out more about all the things you talked about and to engage your team?
So I would first say, for all the workgroups we are working with at OCP: if someone is an OCP contributor or member, they will definitely find us over there, see our work and everything, and can connect. Our website is a good place to start; there are a lot of materials and information there, as well as contacts. I personally am available on LinkedIn; I am always available there if any question or anything comes up. But those are the two places. I would start with OCP, where everybody can see our work and then get engaged. And I would encourage everybody to contribute as well and build a good community.
Well, Sanjoy, I know that you're a busy guy. I appreciate your time today to share this incredible vision that you're bringing with the AMI team to the marketplace. Thank you so much for spending time with Tech Arena.
Thank you, Alison. Very nice talking to you. And thank you for having me.
Thanks for joining Tech Arena.
Subscribe and engage at our website, techarena.ai.
All content is copyright Tech Arena.
