Computer Architecture Podcast - Ep 22: Measuring Datacenter Efficiency and Visioning the Future of Computer Architecture with Dr. Babak Falsafi, EPFL
Episode Date: December 10, 2025
Dr. Babak Falsafi is a Professor at EPFL, the founding president of the Swiss Data Center Efficiency Association, and the founder of EcoCloud, an academic consortium focused on sustainable IT. His contributions to computer architecture include the invention of spatial and temporal memory streaming (SMS prefetchers) found in Arm cores and laying the groundwork for fence speculation by defining memory ordering requirements in modern CPUs. He is a key figure in cloud-native server design, with his work forming the foundation for the first-generation Cavium ARM server CPUs. He is a former chair of the SIGARCH Executive Committee, a recipient of the Alfred P. Sloan Research Fellowship, and a Fellow of both the ACM and IEEE.
Transcript
Hi, and welcome to the Computer Architecture podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts. I'm Suvinay Subramanian.
And I'm Lisa Hsu.
On this episode, we welcomed as our guest Dr. Babak Falsafi, who is a professor at EPFL and the founding president of the Swiss Data Center Efficiency Association,
which promotes best practices in sustainable data center operation with quantifiable energy efficiency and emissions,
with online calculators and a label.
His contributions to computer architecture
includes spatial memory streaming
or SMS prefetchers,
which have been documented to be in Arm cores
since Cortex A53.
He has shown that consistency models
are neither necessary nor sufficient
to eliminate memory ordering related stalls.
These results laid the foundation
for fence speculation in modern cores.
For the past two decades,
he has been working on cloud-native server design.
He is a recipient of an Alfred P. Sloan Research Fellowship and a fellow of the ACM and the IEEE.
Babak joined us to discuss the art of understanding electricity efficiency in the data center by measuring where the energy flows
and how this measurement is hampered by a lack of standardization.
We also discussed broader topics like what the vision should be for our field in the coming years
after such dramatic developments in computing have happened since the last visioning workshop.
We want to particularly note that Babak was the chair of the SIGARCH EC when we started this podcast and was an early supporter of this project. And we want to thank him for his continued support. And with that, let's get to the interview.
A quick disclaimer that all views shared on this show are the opinions of individuals and do not reflect the views of the organizations they work for.
So Babak, welcome to the podcast.
We're really happy to have you here.
And people know our first question is always,
what is getting you up in the mornings these days?
Thank you for having me.
I get up in the morning,
I have my espresso with artisanal coffee,
and usually play duolingo.
I'm learning Italian,
and it's the fastest language so far I've learned.
Wow.
That's cool. So first, I guess you're in Europe, so having your espresso is like a necessary thing.
Okay, and then the Italian piece. So you said it's the fastest language you've ever learned. What other
languages have you tried, and what's with Italian? Like, what's striking about it?
So I speak Farsi. I can't say my Farsi is as good as it used to be; my native language
is probably English now. I speak Greek. My kids speak Greek, my wife speaks Greek.
I speak French.
That's the language of teaching here.
It's also the language here in Lausanne.
German, I spent a few years in Germany before coming to the U.S.
And Italian, because we're right next to the border, and it's a great place to hang out.
It turns out that if you speak Italian, it completely changes the world when you go to a restaurant and order food and wine.
I see, I see.
Okay, so you've gone full euro.
I mean, I remember going on a business trip.
when I was at Qualcomm, we had to go to Europe. And so our counterparts there mostly were
European from various countries. And we had a few Europeans on our side, the Qualcomm side,
plus like a few regular old Americans. And we just had, it was the dinner, you know,
post-meeting day dinner. And we were discussing how many languages do you speak? How many languages
do you speak? Every European spoke at least three languages. And most of the Americans, with
exception of myself, spoke one, maybe one and a half. And it was just absurd. Why can't we speak
more languages here in the States? How many languages do you speak, Suvinay? Probably five or six.
Oh my goodness. Are they mostly dialects of one language, or? Yeah, various Indian languages
because I grew up in multiple states in India. So I picked up four languages there and then
English. Amazing. I speak English, Mandarin, and
Spanish, all reasonably passable, I would say, and I'm trying to learn Portuguese, which is
actually hard because Portuguese is so similar to Spanish that there are some verbs that are the
same and some verbs that are completely different. You just have to remember which ones are which.
But yeah, well, cool. So that's getting you up in the mornings. And then what is happening after
that after you get to EPFL? Are you working in the office? Are you working from home?
I think after COVID, we're spending a little bit of time at home, but I do go in.
I go in usually three times a week.
I travel a lot, too.
So when I'm here, I go in a few times a week, and otherwise I travel.
I spend a lot of my time at industry conferences in the data center sector.
I go to academic conferences just the way I used to.
I went to MICRO, but I also spend a lot of time at industry conferences these days.
Gotcha, gotcha.
And so a lot of your focus these days is on these sort of data
center industry conferences. So tell us what's going on there. What are you focusing on?
Well, we're trying to bring a little bit of clarity about this electricity that comes in.
Where does it actually go? And we'd like to find ways in which we can help operators measure them
as precisely as possible, and then find the bottlenecks in the energy flows. DC energy
flows. This is the part where you have your cooling, your UPS system, everything that houses
and supports the servers. That's more of a well-established area in terms of metrics and
methodologies. And then IT, when you bring the 20 kilowatts into the rack, where does that
electricity go? That's up in the air right now. We're trying to find ways to measure and report
that. Got it. Yeah. I mean, I think energy in data centers is one of the centerpieces
today. Just in the US, I think we're expected to double the data center energy footprint
by 2030, to like 70 gigawatts or something in that order. I'm sure there are similar growth
trajectories in Europe and other parts of the world too. So one of the statements that you've said
is when you talk about data center energy and you're talking about measuring the energy flows,
we should not be guessing what is actually happening, but we should measure them very carefully
and understand what the metrics are and get clarity behind where are we spending our
energy, our dollars, and also carbon dioxide emissions and so on.
Can you double click on that?
Tell us a little bit about what are we guessing right now, where is there a lack of
clarity, and what can we do to make the state of the ecosystem better?
Well, first of all, I'd like to also follow up a little bit on the background that you gave
because there is a huge shift in energy consumption in data centers.
It has started growing partially because there's just sort of normal, exponential growth in the space, but also because of AI.
This is having an impact on the energy market.
So it's super important for everyone to find a proper way of measuring and provisioning for energy in the data center market.
Now, what are the challenges there?
There are metrics that have been around for a couple of decades now for DC energy flows. That's
power usage effectiveness, PUE. It basically accounts for how much of the electricity is going into
the infrastructure that houses the IT equipment. And there are, of course, changes that are coming,
because we also recycle heat from data centers, there are on-premise renewables. We need
to account for these as well, and that impacts the metrics used in DC energy flows. And then,
Well, the good news is that DC energy flows are getting better.
So now we're getting PUEs from some of our clients at SDEA,
the Swiss Datacenter Efficiency Association.
They're getting less than 1.2.
That means most of the electricity is going to IT.
And there, we don't have proper counters in the servers to figure out,
when you bring the electricity down, is it actually used properly?
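As a rough sketch of the PUE arithmetic being described, here is a minimal Python example with made-up numbers for a hypothetical facility; none of these figures come from the episode.

    # Hypothetical facility, illustrative numbers only.
    total_facility_power_kw = 1200.0   # everything entering the building
    it_power_kw = 1000.0               # power actually delivered to the racks

    pue = total_facility_power_kw / it_power_kw            # power usage effectiveness
    overhead = 1.0 - it_power_kw / total_facility_power_kw

    print(f"PUE = {pue:.2f}")                               # 1.20
    print(f"Cooling/UPS/other overhead = {overhead:.0%}")   # ~17%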
I see.
So your focus is... So when I was at Microsoft, we thought about data centers quite a bit, and as a whole.
So we did talk about things like the heating and cooling and that sort of infrastructure
as a sort of first class component to the data center architecture.
And I think what I'm hearing you saying is that specifically within this, there is the compute part.
And of course, you want as much of the energy as possible going to the thing that you built
the data center for the compute. And then within that, if we were to double click on that
presumably dominant source of the consumption, we don't have good details on what's inside. And that's
perfect for computer architects to really dive into. I see. And so then to go back to Suvinay's
question, like, what specifically are we really guessing on in there and what will we like
to get much more clarity about? Well, first of all, there are no standards today in industry for
IT. So just creating standards is already a good thing. Some of the
emerging standards are based on utilization. Utilization is super important. The reason is that there's idle power,
and if you can utilize your servers more, you use the electricity that's coming in a lot more
efficiently. And there are ways to measure that today, because the load and the server's power
consumption are correlated.
So you can measure the load, and based on that, then figure out to first order
how well you're using electricity.
So that's one thing.
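As a rough illustration of the load-power correlation being described, here is a minimal sketch using a simple linear server power model; the idle and peak wattages are assumed placeholders, not figures from the episode.

    # Load and server power consumption are correlated; a common first-order
    # model is linear between idle and peak power.
    P_IDLE_W = 200.0   # assumed idle draw of one server
    P_PEAK_W = 500.0   # assumed draw at 100% utilization

    def estimated_server_power(utilization: float) -> float:
        """Estimate wall power (watts) from utilization in [0, 1]."""
        return P_IDLE_W + (P_PEAK_W - P_IDLE_W) * utilization

    for u in (0.1, 0.5, 0.7):
        # At low utilization, most of the draw is idle power doing no useful work.
        print(f"utilization {u:.0%}: about {estimated_server_power(u):.0f} W")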
There are other sources of electricity loss in the rack itself.
In the server itself, electricity goes into the power supply units.
Power supply units have industry ratings already.
So if you're Titanium-rated, you're only losing about 3 to 5%.
So that's a little loss.
You can measure that again based on the load.
Then you have the fan power.
Fan power is a problem because, well, first of all,
a lot of operators want to operate at higher temperatures
so the fan power is higher.
So it's not to be ignored.
That could also account for 10, 15% in some extreme cases,
20% electricity loss.
And then with liquid cooling, liquid cooling is slowly coming in.
today with our PUE metrics,
fan power is accounted for
as IT power. With liquid cooling, there's no fan power.
So that makes PUE not an apples-to-apples metric
when you're hosting both liquid-cooled and air-cooled servers.
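To make that accounting concrete, here is a small sketch of the rack-level breakdown; the 20 kW rack is from the conversation, while the PSU and fan fractions simply use the illustrative ranges quoted above, not measured data.

    RACK_POWER_KW = 20.0   # electricity brought into the rack
    PSU_LOSS = 0.04        # roughly 3-5% loss for a Titanium-class supply
    FAN_FRACTION = 0.10    # roughly 10-15% (20% in extreme cases) for air cooling

    psu_loss_kw = RACK_POWER_KW * PSU_LOSS
    fan_kw = RACK_POWER_KW * FAN_FRACTION
    silicon_kw = RACK_POWER_KW - psu_loss_kw - fan_kw

    print(f"PSU loss {psu_loss_kw:.1f} kW, fans {fan_kw:.1f} kW, "
          f"compute/memory/network {silicon_kw:.1f} kW")

    # Note: with today's PUE accounting, the fan power above still counts as
    # "IT" power, which is why an air-cooled rack and a liquid-cooled rack
    # are not an apples-to-apples comparison on PUE alone.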
So that's the first. And then beyond that, we can double click into,
let's look at when you're actually using the server and a
program is running. How efficiently are we computing? I see. Okay. So then to get a more precise
picture, and just to be clear for all of our listeners, too, there's one way to look at it. If you look
at an entire data center, all the electricity going into the data center is for the data center.
So you could say in some, like at a very coarse lens, 100% you're using it. Then you double
click in and say, like, well, actually, let's look inside the data center. Some percentage of it is
going to cooling, some percentage is going into whatever. You need to have lights for the
bathrooms, for the technicians or whatever. So now you have some amount of cost. So that's not directly
going into compute. So now you're going into the racks. And you're saying in the racks,
you could say all electricity going into a rack is being used for compute, except now you actually
have to double click and say, like, no, we're using some for fans, we're losing
some just in the transformers. And so now we have to actually kind of figure out how much
electricity is going to compute. And then once you go beyond this level of peripherals,
what I'm really curious about is how you might think about electricity with respect to
like compute inside of what a computer architecture student might think of with respect to
like Patterson and Hennessy book or something like that. Like is it that you needed to be going
through an ALU to count or like what if it's just being used in a driver to push data along a long
wire or read something from memory or DRAM refresh.
Like, how are you thinking about how much electricity counts, I suppose, as like actually
being used for what you wanted to do?
So this is a very good question.
And to the first order again, as you double click, you go in, it's important to look at how
much electricity is going to compute, how much electricity is going into your storage and
memory and how much electricity is going into your network.
And this with AI actually gets a lot more interesting, because interconnects in
data centers didn't use to account for a big chunk of electricity consumption.
But now with AI, interconnects are a lot more interesting; a lot of electricity is going
into the data movement.
So that would be the next level of accounting, which is how much are you computing,
how much are you spending in your memory and storage and how much of it is going into movement?
For cloud racks, not for AI training racks, there are already back-of-the-envelope
sorts of breakdowns of how much of your electricity is going to the CPU versus memory
versus the network. We can't measure that today, but we can already start with a set of
breakdowns that are aggregate averages over what operators report.
That's the state of the art today, basically.
That's where we are.
So you look at utilization in your compute, your CPUs, your GPUs.
In the first couple of generations of data centers, the first 10 years,
I think, since the famous Google paper around 2000,
utilization used to be low because of SLOs.
But then in the next decade, we started consolidating using software technologies, because building was a lot more expensive than trying to consolidate.
And so we developed software consolidation technologies to allow utilization to go up from 15 to 20% to 60%, maybe in some cases 70%.
There are no real numbers available for this.
So we have these numbers from the famous data center book by Barroso and Hölzle.
And so that's where we are with hyperscalers today.
There is another market for data centers, which is outside of the U.S. and China,
and that's a co-locator market.
The co-locator market is real estate companies building data center campuses,
and they rent space to IT customers.
The IT customers at co-locators have really low utilization.
Their utilizations are still 10%, or even below 10%.
So I also spend time, I'm going to IT conferences and talking to IT operators about
utilization.
Gotcha.
Interesting.
And so what would you say to someone who, you basically just sort of classified into
co-locators, for people who are renting, and hyperscalers,
like two totally different ways of using a lot
of racks. And so if you were, say, a hyperscaler, like, what is the one sort of way of
designing a data center that would be, like, very bad for this sort of electricity utilization
that might be sort of hidden right now because we don't really measure anything? Like, is there
something that jumps out as like, this is actually a no-go? I think for hypers, it's sort of
a somewhat solved problem, because whatever number of servers they have to buy
and the electricity they're spending, that's already factored into the economic model.
So they try to operate so that, for a given service level objective,
they can deliver basically maximum throughput. They want to reduce the number of servers,
they want to minimize the investment in infrastructure,
and they also want to reduce the amount of electricity that's used.
Electricity today in the cloud world is still a small fraction, for the CPUs and your regular
typical servers, a small fraction of the overall budget.
But in the case of AI, it's a bigger thing.
But due to the economic model, they try to reduce
that.
But if you look at your typical IT customers, there are co-locators in the U.S.
as well.
There are multinational co-locators like Digital Realty, Equinix, Stack Infrastructure.
Their IT customers, they're just basically,
if somebody writes a check and says, look, go buy servers and bring this thing up,
they're not looking at, can I do this with a third of the servers?
I would save money on the servers.
I would save money on the electricity that goes into them.
I would also save embodied emissions on the servers themselves.
Got it.
Yeah, I want to expand on this distinction between hyperscalers
and co-located clouds or neoclouds and so on.
And you specifically touched upon for standard data centers,
like CPUs and classic data center workloads.
We have about a few decades of understanding
and we can sort of annotate these details.
We have napkin math and we can work towards a system
where you get more clarity on the metrics.
If you switch gears to AI,
we are relatively in the early innings
of build-out of infrastructure for these services.
And, of course, hypers have been,
at the forefront of this in the last few years,
but increasingly over the last year or so,
we've been seeing a lot of neoclouds emerge as well, right?
So we have CoreWeave, you have a bunch of others.
So given your experience in standard data center, server racks, and so on,
what would be our best-case guidance as we go out building out the AI infrastructure?
Like what kind of metrics should people look at if you could sort of advise
like the neoclouds or the co-located clouds that are building out the AI infrastructure
what sort of visibility would you like to bring in?
And you touched upon utilization, right?
Like, yes, in the CPU world, we started off at very low utilization,
and then gradually as your software packing schemes got better,
you could co-locate workloads, variety of policies,
the utilization gradually went up.
But LLMs, we certainly know that,
especially for serving inference,
depending on the breakdown of pre-fill decode,
we can have very, very low utilization that could make your PUE numbers look good,
but maybe the utilization is still pretty bad.
So how would you go about, given your experience and learnings in the CPU side of the world,
as we move into the AI infrastructure side, where you don't just have CPUs, you have interconnects,
you have accelerators and more, what kind of visibility, clarity metrics would you like to see,
especially given that we're spending huge amounts of money, huge amounts of power into this new space?
That's a very good question.
First of all, in the cloud, not the GPUs but the CPUs, even though hyperscalers
run their first-party workloads at high utilization,
whatever containers they rent, if they rent them out to customers, those customers are not using
all the resources all the time. Of course, the cloud operator would like to
increase that utilization as much as possible, because it would factor into the economic model
if they can do more with less resources. But today we don't have the technologies to allow, for
example, if you rent out a 4-gigabyte container, and the customer is not using all the 4
gigabytes, how do you give them the illusion that they're using the 4 gigabytes but not actually
allocate 4 gigabytes? This is a great opportunity for computer architecture students to look into
technologies that allow you, with hardware-software co-design and emerging fabrics, to make that
available. When you look at AI, GPUs have the same problem. So there's a paper from
Microsoft Asia, at a conference last year,
2024, where they look at deployed AI models, and they say that the GPUs
have less than 50% utilization. And there are a lot
of reasons why this is the case. You have the
Python stack. There are many places where
there are bottlenecks, for example, CPU-GPU coordination,
movement of data. They do a great job of breaking this down.
So, yes, we see this also with GPUs.
With GPUs, of course,
GPUs are more expensive.
They also use a lot more power,
so utilization is more of an issue.
But finding ways to properly quantify utilization,
and then finally,
solutions that would increase utilization is a really good way to move forward.
Yeah.
Plus one on that, I think there was a recent paper from Alibaba as well at SOSP this year,
where they talked about, you know,
their pooling system to improve GPU
utilizations, and they had some pretty eye-popping
numbers in their paper. But, yeah, this is
certainly an avenue that's under-explored in the
AI space. So I can also, sort of, this is going towards
research now, but we have a project here at
EPFL. We're starting on redefining
the future of a cloud rack,
and there are emerging fabrics.
So this is a great opportunity for
computer architecture students, perhaps working
with other people, like OS people,
maybe PL people, to redefine
the future of a rack.
And in a future rack
with these emerging fabrics,
NVLink from Nvidia,
NeuronLink,
CXL to some extent,
we have the raw hardware capabilities.
You have these 42 units inside a rack.
Right now,
these 42 units are
almost exact copies of each other.
The silicon is completely fragmented
because you have workloads,
and the workloads have completely diverse
requirements.
So instead of
creating this copy of the same server multiple times in the rack, you could disaggregate
the rack and connect it with a high-end fabric, one of these emerging fabrics.
And this is, I call it Rack Duo, this is hardware-software co-design and a great research opportunity
for computer architecture students.
The disaggregation increases utilization.
So it's an obvious way of addressing the problem.
Yes, this is making me think of something that you had said earlier, and they're kind of like
tangentially related. So earlier you were saying something about bottlenecks on
electricity utilization. And then for computer architecture students and people, we tend to think
of bottlenecks in terms of, okay, we have this data that we're trying to, or this computation
that we're trying to get through, and there's something that's holding up the whole process.
And in this case, if you had really, really effective fabric and you could disaggregate all
resources, then you could compose and decompose and all that stuff. And you could stop this
computational bottleneck, which conceivably could help with electrical utilization. Is this what you
meant before by electricity bottlenecks? Or is there something else? Because when you first
said it back then, I was like, in my mind, I pictured like electricity going down a wire and somehow
getting jammed up. And I was like, I don't understand what that means. So maybe you could talk about
that. So in the DC energy flow, and these are people, these are building operators. These are
like electrical and mechanical people, right? They're the ones who basically measure, they run and
measure their entire DC energy flow. DC energy flow is from the time electricity comes into
the building until it ends up in the racks. These guys are in charge of that. Once it goes into
the rack, it's not theirs, right? They worry about temperature, and that's about it, right? But all
of this electricity is coming in.
And so their job is basically to, I can give you an example of a hyperscaler.
Hyperscalers try to minimize the electricity that's lost as it comes in,
to basically get a PUE value which is as close as one can get to one, right?
So PUEs used to be at around two; then, some years ago, hyperscalers started announcing:
our PUEs are 1.05, where really we lose very little
and the electricity gets delivered to the rack itself.
That's what I mean by the energy bottleneck.
Like, taking that electricity coming in from the outside of the campus
until it's delivered to the rack, you want to maximize
the electricity that you deliver to the rack from what's coming in and minimize
the loss. Those are the energy bottlenecks for the DC operators.
Once it goes into the IT, that's IT people, right?
The IT people, they have to basically figure out.
Now, the hyperscalers, first-party workloads,
and even third-party workloads,
they control the temperature.
They control everything, right?
With co-locators, it's up to the contract.
The IT customer says, look, we want to keep the temperature
between 22 and 26 degrees from the bottom to the top of the rack.
Others have already shown you can operate at higher temperatures,
but some IT customers have this sort of industry-standard mentality that we need to keep the servers cool.
We don't want them to fail.
Many are running at much higher temperatures.
So that's the energy loss I'm talking about: figuring out how you minimize it and find the bottleneck in your energy flow,
so you maximize the electricity you're delivering to the racks.
I see.
Okay.
Yeah.
So mentally, the word bottleneck, it's like not exactly
the same from, you know, computer architecture benchmarking to this.
It's almost like redirecting it and shunting it off to be used for something else.
Yeah, I pictured it as a slowdown.
And I was like, I don't know how you slow down electricity, but now it makes more sense.
It's a redirect to do it for something else to keep the programs running, but it ultimately is being diverted for some other purpose besides compute.
It's a loss. It's not a bottleneck.
Gotcha.
It's how much electricity you drop
on the way to the rack.
Yeah, makes sense.
Makes sense.
Okay.
Okay.
That helps because, yeah,
initially I was like,
huh, a bottleneck?
What's an energy bottleneck?
Well, we had customers who were running at a PUE of 1.6,
and we have an online calculator for DC energy flows.
They can go and enter their numbers,
whatever they measure.
And with the calculator, it becomes obvious where they're losing electricity.
And they use the online calculator to actually eliminate some of those losses,
and they can achieve much better PUEs.
Got it.
And is this calculator part of your Swiss Datacenter Efficiency
Association tool, basically?
So, yeah, the Swiss Datacenter Efficiency Association is
a non-profit; it's partially subsidized by the Federal Office of Energy in Switzerland.
We have three online calculators.
One of them is for DC energy flows.
There are already six pre-configured energy flows,
and customers can also have their own custom energy flows,
and they can submit them and we can then help them calculate their efficiency.
And there is an IT calculator.
The IT calculator is mostly a set of KPIs that we've come up with,
which we believe are the first step to quantify the efficiency of
the IT stack.
You're not measuring electricity here,
you're measuring utilization,
you're reporting your energy loss in the PSUs through industry standards,
you're also reporting how well you've consolidated your workload.
So we're mostly looking at a set of KPIs we've come up with,
but it generates an index on the same scale as PUE.
So the IT customer can basically report their own energy efficiency for IT,
or they can combine their energy efficiency with the DC operator's
and create one geometric mean index
on the same scale.
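A minimal sketch of combining a DC-side index and an IT-side index into one geometric-mean index on the same scale, as described; the exact KPI definitions are SDEA's, so the function and numbers here are purely illustrative.

    from math import sqrt

    def combined_index(dc_index: float, it_index: float) -> float:
        """Geometric mean of a DC-side and an IT-side efficiency index,
        both on a PUE-like scale where 1.0 is ideal."""
        return sqrt(dc_index * it_index)

    print(round(combined_index(1.2, 1.5), 2))   # 1.34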
And then we have an emissions calculator,
which basically looks at the source of emissions,
the source of electricity,
how much of it is based on coal,
how much of it is based on hydro.
And based on that, again,
we report an index similar to CUE,
the carbon usage effectiveness.
And depending on how much CO2 emissions you have per kilowatt-hour,
you can sort of identify your pollution problem.
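A hedged sketch of the carbon usage effectiveness (CUE) idea behind the emissions calculator: grid carbon intensity scaled by how much facility energy is consumed per unit of IT energy. The intensity figure is a placeholder, not Swiss grid data.

    GRID_INTENSITY_KG_PER_KWH = 0.10   # assumed CO2 intensity of the electricity mix
    pue = 1.2                          # facility energy per unit of IT energy

    cue = GRID_INTENSITY_KG_PER_KWH * pue
    print(f"CUE = {cue:.2f} kg CO2 per kWh of IT energy")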
Yeah, this sounds like a great resource. I think it'll add a lot, just delineating
that these are the factors that go into various efficiency metrics and losses. It'll be super
helpful to people who are not familiar with these things. And as you said, coming up with a set
of KPIs that they can track, these are the higher-order bits that you need to sort of pay attention
to. I think it would be incredibly valuable. Yeah. These KPIs are also KPIs that you could
already instrument in software.
So we have a partner company in Germany that instruments servers, and they create a dashboard
for our KPIs.
So these are not things that are difficult to measure.
You can measure them with open source tools or you can use products that are available
in the market.
Wonderful.
And so you get like a layered approach.
Like you get the PUE at the data center level.
You get the IT energy efficiency metrics as well.
And then you also have the carbon dioxide emissions.
So you get all three of them visible to you,
and these are all available by instrumenting the software there.
I wonder, this is a little bit out of left field,
but when I was at Microsoft, we thought about some of these sorts of things.
And one thing, at every layer, you have all sorts of margins baked in to the design,
because every layer is responsible for its own layer,
and you want to be careful.
And so then when you actually add up all the margins across,
you have like a whole lot of extra margin for everything with respect to the design that could
potentially, if you did some sort of cross-layer optimization, might be lower. And one thing that
jumps out to me is redundancy, right? So with respect to these sorts of data centers,
nobody ever has only a single data center, all their stuff in a single data center. It's just
a different scale than before. I wonder if there's any thought to sort of maybe almost like
RAID-ifying data centers,
where back in the day, you had one drive and you were worried about losing it, so then you had more.
But then that didn't seem efficient, so you do something like RAID.
Now, I think a lot of times we have three data centers, maybe more, with a lot of redundancy.
And that is a lot of margin, like right off the top.
Are there thoughts on improving at that level, or are we diving straight into the computer architecture, mesh fabrics, like,
composability-type things?
It depends on who you're talking to,
but you're absolutely right.
The IT operators today at co-locators,
the customers that are co-locating,
they do care about
the reliability of a service,
the way RAID did, but they don't think about the cost.
RAID was a combination of reliability and cost.
And IT customers,
absolutely, if they
thought about how do I provide the same service
with same level of reliability with a lot less resources,
then you can get away with fewer servers,
you can get away with also less electricity.
So that's a good point.
I think we can all help at various levels of the stack.
A lot of the waste is at the higher levels of the stack, right now.
And I think, like I said,
hyperscalers are really good at squeezing most of that.
out. Once you look at what the hyperscalers are doing, I think you're getting to the point
where basically you're squeezing everything out of the server today. And the biggest problem
beyond that is that that server is just not designed correctly to run your services. You basically
took the personal computer of the 90s, which is exactly what I had at my desk as a master's student in
Wisconsin, with the operating system that runs on it. And now you're saying, okay, I'm going to take
this thing and make it a building block.
This is super cheap, right?
So in 2024, the worldwide investment in data centers was $300 billion.
The cloud revenue was $700 billion.
So there's a lot of money in that. Now, AI is different because there's so much money
behind AI, and you can directly go to much more expensive technology.
I think this is going to come for the cloud. We're at a point where we've really
squeezed everything out of the server. The
hardware is running at a nanosecond timescale, and it's a personal computer; storage and the operating system are at the millisecond timescale, and everything in the data center is at the microsecond timescale.
So there's a complete mismatch with the building block of what we're building, not just the hardware, but also the software, the operating system, and what we're doing in the data center.
So I think for computer architecture students, for our community, there's a great opportunity to look at, okay, now if we were going to do this
from scratch, how would we build this for the microsecond timescale?
Gotcha. And do you still spend a lot of time advising students these days? Do you have
students working on these sorts of problems? Or are you mostly advising large entities and corporations
and associations, I suppose? So I spent a reasonable amount of my time talking to industry
mostly for best practices.
And then I do have.
I have a team back at EPFL.
We still do research.
We're doing fundamental research.
We're trying to look at some of these questions about how do you build cloud native servers.
How do you do it at the CPU level?
How do you do it at the memory level?
How do you look at beyond a single node?
What are the things we need to do to bring the resources together,
to use them more efficiently, so that's the disaggregated racks.
And the team is quite interesting because we used to do computer architecture,
so everybody was sort of a hardware person.
But now we're sort of bringing circuits, architecture, operating systems, databases, networking together, right?
So we have people that are working together on these topics to go forward.
Because if you just, if we do what we used to do in computer science, which is everybody was
working at their layer of the stack, we're not going to be able to solve this efficiency problem.
I think you've made a few pertinent points, which is that the prior design paradigm has sort of run
its course, especially if you look at the hypers, we have sort of squeezed every last inch of
efficiency, but we are moving towards a new paradigm. It looks like we need a clean slate approach
if we have to reap additional benefits. So there's a need for a new way of designing our computer
systems, and it needs to bring together multiple layers of the stack. It can't be like you
focus on just the architecture layer or just the operating system layer. You need to once again
have people talking to each other. Could you sketch, like, what's your vision? What are the design
principles that you would like to see embodied in this next iteration of computer systems
designed for the new data center? What are the design principles or pillars that you have,
that your research group is exploring towards this particular frontier? So I think we touched
upon looking at the new rack form factor.
So that's at the sort of larger scale of resources.
A lot of that is related to, like I said,
co-design.
Co-design is related to proper contracts
between software and hardware.
And some of these contracts are completely new.
For example, at the rack level,
if you want to have a memory access
that goes from one node over a fast fabric
to another node, that would have
latencies that are orders of magnitude
larger than what CPUs can handle today.
So you need to find a way
to be able to
have an agreement with the operating system and say,
look, if this thing is taking longer
than a certain amount of time, we need to be able to take
this context and put it away
and eventually bring it back,
because the hardware is not going to be waiting there.
Otherwise the hardware is basically blocked completely.
Some of these were already revisited many, many years ago.
I remember, for example, the MIT Alewife back in the CC-NUMA days.
They were doing cache coherence in software.
So in the common case, some of it was done in hardware,
but if something took longer, they had to, for example,
search a shared list to figure out where the block is.
It would trap to software.
So there were already hardware software contracts that were defined to do this.
So at the rack level, I think we need to look at what the proper hardware-software co-design is,
what these new contracts are.
These are not contracts we dealt with before.
And then use those contracts, right, to properly support them in hardware.
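A toy sketch of the kind of contract being described: if a remote access over the rack fabric is expected to exceed some latency budget, the context is put aside and resumed later instead of blocking the hardware. The names, threshold, and coroutine framing are hypothetical illustrations, not the actual mechanism.

    import asyncio

    LATENCY_BUDGET_US = 2.0   # assumed time the hardware is willing to wait in place

    async def remote_load(addr: int, expected_latency_us: float) -> bytes:
        """Hypothetical remote memory access over a rack-scale fabric."""
        if expected_latency_us <= LATENCY_BUDGET_US:
            return b"data"   # short enough: handle it like a normal (slow) load
        # Too long: yield this context so other work runs, resume on completion.
        await asyncio.sleep(expected_latency_us / 1_000_000)
        return b"data"

    async def worker():
        data = await remote_load(0x1000, expected_latency_us=10.0)
        print(f"got {len(data)} bytes after the context was put aside and brought back")

    asyncio.run(worker())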
Today, I think Google has already reported
you spend 10 to 20% of your cycles just context switching,
a lot of OS calls; they referred to it as the datacenter tax.
There's a lot of this: in the search for maximizing utilization of a socket,
we're running tens of thousands of threads.
We're consolidating on a socket.
Now, what happens there is that we're basically spending a lot of time talking to the operating system.
There are a lot of opportunities for identifying operating system services that should be
millisecond-timescale services, but there are a lot of things we can
do in hardware and software at the user level that do not need to be at the millisecond timescale.
We have an example of this here, where we show that you could run single-address-space
function-as-a-service.
We actually perform memory isolation properly, in hardware and software,
with a user-level library, without talking to the operating system, right?
As you go down to the node, when you look at efficiency, we need proper metrics to
establish how efficiently our silicon is designed and how it's operating.
And we're looking at CPUs today.
CPUs historically, again, they've been desktop cores.
And these desktop cores were basically put together on a piece of silicon.
And with that, we used a lot of SRAM just to keep it cool.
We want to stay within the power density of air cooling.
And these desktop cores are not the proper cores for the workloads that are running
in the data center, but that was the cheapest way to build them.
The core-to-SRAM ratio has gone from what used to be 2 to 1 to now 1 to 3 in the latest
chips.
Chips are mostly SRAM, because the cores are basically sucking all the power.
You increase your silicon area so that you can cool it down with air.
And we need to revisit this and
say, are we building the right cores?
Are we running them at the right frequencies?
Maybe we can properly size these cores
and figure out what frequency we do want them at,
so that we can flip this core-to-cache ratio
back to mostly logic,
so mostly electricity is going into running the program.
So there is a lot of opportunity here
to sort of properly figure out how to design
and operate even CPUs.
And you can do the other components in the server as well.
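A back-of-the-envelope sketch of the sizing argument: in a toy model where per-core power scales roughly with the cube of frequency (voltage scaling with frequency), lower-frequency cores let many more of them fit in the same socket power budget, which is how the core-to-cache ratio could flip back toward logic. All constants are invented for illustration.

    BASE_FREQ_GHZ = 3.5
    BASE_CORE_POWER_W = 10.0   # assumed per-core power at the base frequency
    SOCKET_BUDGET_W = 200.0    # assumed socket power budget

    def core_power(freq_ghz: float) -> float:
        # Toy model: voltage tracks frequency, so power ~ frequency cubed.
        s = freq_ghz / BASE_FREQ_GHZ
        return BASE_CORE_POWER_W * s ** 3

    base_cores = SOCKET_BUDGET_W // core_power(BASE_FREQ_GHZ)
    base_throughput = base_cores * BASE_FREQ_GHZ
    for f in (3.5, 2.5, 2.0):
        cores = SOCKET_BUDGET_W // core_power(f)
        print(f"{f} GHz: {int(cores)} cores, "
              f"relative throughput {cores * f / base_throughput:.2f}x")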
That's very interesting.
I'd never heard about this core-to-cache ratio,
particularly in the context of just having area for having the cores be the hotspots
and the caches sort of enable some area for cooling.
I wonder, so what you had said earlier is that there's potential for,
there has been some data to show that they're spending 10 to
20% of time in context switching.
That almost makes you think that you could potentially,
like, there's got to be some sort of algebraic line that you could just draw to say,
well, if I slowed things down so that I don't have to wait so much that I don't have to
context switch, it actually ends up being faster to accomplish my full task by not having
this overhead.
And so is this the kind of question that you're looking at right now with your students?
You need, because you have
multi-tenancy, you need isolation.
So you need protection,
but you don't want to get
the operating system involved, right?
Because the operating system is the operating system
of the 90s.
And in some sense,
it's like having an abstraction
which we can bring
from the networking community, right?
You have, in the data center,
you need a control plane and a data plane.
The control plane is your CPU and your OS
basically establishing who can talk to who, making sure that it's isolated.
And the data plane is basically where the data moves and the work gets done.
And so, yeah, there are examples of basically what are the services you could do
without having to get the operating system involved, without even doing the context switch.
But I think we need to look at this basically in a more fundamental way,
not just look at specific use cases and bring in a little bit of hardware-software
co-design.
We need to really look at what is it that the services are doing with hardware, and how do we
properly support that in our hardware design?
Yeah.
I guess what I was asking was, like, one of the reasons why you might context switch
is maybe this mismatch. It's almost like an impedance mismatch that you were talking about,
where the cores are operating very fast, and then memory and storage are slower and
slower, and then the operating system, all this stuff is slow. So once you get to a point where
you run up against the roadblock, as of now, or the current paradigm is, the best thing to do
is just switch. And that context switch, as you're saying, is coming with a potentially 10 to 20
percent cost. So then you can imagine that if you did reduce the impedance mismatch, maybe by slowing
down the cores, as you suggested. I think I heard you say that, like maybe,
we slow down the cores. So you reduce the impedance mismatch, and maybe
if you reduce that, then you don't have to pay this 10 to 20 percent tax, and at some point
it actually ends up being faster. So I think that's what I thought I heard you say, and then I was just
confirming: is that the kind of thing that you're looking at?
So it's important to think about where it is. So this goes
back to 15 years ago, when we did
what ended up being the Cavium
ThunderX. We actually looked at
what the workloads were doing, and
we weren't looking at SLO back then.
There was no SLO. It was just the workloads,
right? And we were saying,
look, we have this area budget, we have
this power budget. If we can properly design
the cores, we can reclaim a lot
of this dead silicon. And we
did. We looked at building the first
64-bit out-of-order Arm. We took a
Cortex-A15, which was 32-bit, together
with Arm, turned it into a 64-bit
Arm core, and showed that with the area and power budget of that Arm core, we can convert most
of the silicon to cores. And the reason is that the chip only needs the instruction working set
to support the cores. The data is off-chip anyway. And the reason is that back then you had
tens of gigabytes of data off chip. Now you have hundreds of gigabytes of data. The disparity between
on-chip memory and off-chip memory is several orders of magnitude. So you might as well properly
provision your cores' power and area, reclaim all the SRAM, and make sure you just keep
your instruction working set on chip. And by doing this, you will basically deliver, with the
same amount of silicon and electricity, an order-of-magnitude increase in throughput. That's basically
what Cavium did with the ThunderX. Back then, Arm servers didn't take off,
because the entire Linux ecosystem for Arm was missing.
You had HP, Qualcomm, Samsung, AMD,
a long list of companies that dived into Arm
and then quickly left it.
Cavium eventually pivoted to HPC,
because they couldn't sell servers to the cloud
with the Linux ecosystem back then.
And then Amazon and Huawei eventually built
an entire Linux ecosystem around Arm,
and now, this year, Arm servers that are shipping
are over 25% of the market.
So this is reclaiming that silicon,
doing something better with that silicon.
And I think there's a lot that can be done.
And so now with SLO,
we don't really have a proper way of establishing
how well we're doing.
So there has been a lot of back and forth
about wimpy versus brawny cores.
I know Urs Hölzle also wrote a blog about this.
If you read the blog, it basically is an endorsement for better single-thread performance.
It's not really a quantification of the fact that single-thread performance is needed.
And the reason he's basically making that case, back 15 years ago, is saying, look, we're building these data centers.
A lot of our expenditure is in supporting software developers to debug,
performance-debug, our software stacks.
And that's expensive.
We would much rather go to hardware, which is cheaper for us, and go with single-thread performance.
That means that if you have these high single-thread-performance cores, you're going to need more servers, and they're willing to spend that.
But if you actually sit down and quantify the SLO and figure out how much you need, you can operate the chips with much lower core complexity and lower frequency as well.
And that's something that we need to do.
And that's ongoing work in my group right now.
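A hedged sketch of what quantifying the SLO could look like in practice: pick the lowest operating frequency whose measured tail latency still meets the target. The latency table and numbers are invented placeholders, not results from the group.

    SLO_P99_MS = 10.0   # assumed tail-latency target for the service

    # Hypothetical measured p99 latency (ms) at each core frequency (GHz).
    measured_p99 = {3.5: 4.0, 3.0: 5.5, 2.5: 7.5, 2.0: 9.5, 1.5: 13.0}

    def lowest_frequency_meeting_slo(p99_by_freq: dict, slo_ms: float) -> float:
        ok = [f for f, latency in p99_by_freq.items() if latency <= slo_ms]
        return min(ok) if ok else max(p99_by_freq)   # fall back to the fastest setting

    print(lowest_frequency_meeting_slo(measured_p99, SLO_P99_MS))   # 2.0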
Right.
Yeah, I think this has been a fascinating debate over the last 15 years or so,
at least, on Wimpy v. Brony Coors, like, how do you size these various components in the system and so on?
I think, pardon the pun, it's a fascinating threat to chase down.
But in the interest of time, I want to maybe switch gears a little bit.
You've been an invaluable steward for the computer architecture community,
you've served of the chair of Cigarch and so on.
maybe I'll just pick up on one, since we're talking about canonical debates in the computer
architecture community. Looking forward, you've seen the community evolve, tackle like various
challenges, go through various phase transitions as well. Given the current era that we are in,
what do you think will be the canonical debate for this era of computer architecture?
Well, in my specific area, data centers are out of energy. So we need to figure out how we're going
to address this, because it's already impacting residential electricity costs, it's impacting
the grids. The load on the grids is so high that they're getting these third
harmonics and people's light bulbs are flickering at home, right? So we need to
really tackle this energy problem in the data center space.
I think in the mobile space, it's been the same.
Mobile has always been behind desktops.
Desktop was the volume sort of product for decades, thanks to Moore's Law.
And mobile was just sort of bringing technology from the desktop and reusing it.
The similar sort of workload-hardware mismatch we're seeing in the server space today,
we have in the mobile space.
I just went to a mobile technology conference, and
the designers were telling me that we have apps that have three-megabyte instruction working sets, right?
This is on a mobile SOC.
And so we need to, unfortunately, look at the mobile use case separately from the data center use case,
because these are both important and they don't actually necessarily have the same problem.
Some of the problems, like the instruction working sets, could be similar.
But others, for example, the future is going to be in sensory immersion,
with 6G and later generations of Wi-Fi.
And we need to look at the killer apps and figure out how we can support that.
Most of the mobile problem would be the form factor because we can't put batteries
on our head.
The batteries have to be on our body somewhere so we can carry it.
But whatever is coming from around our head has to be able to filter, compress, sense,
and eventually communicate information to somebody,
some other node that's doing all the computation
and eventually connect to the back end.
I think we are going to,
we're entering a bit of a divided world in computer architecture.
And these are both important sort of use cases of computing
and we need to basically have proper debates about them and support them.
Got it.
In your past stint as chair of SIGARCH,
and as part of other forums, we've had visioning workshops and the Architecture 2030 white paper.
We've shepherded multiple groups and worked with other leaders, professors in the field, and so on.
As we sort of head into the 2030s, do you think this is an opportune moment to sort of take stock since the last white paper, since the last decade,
which has been a flurry, a Cambrian explosion, of different kinds of ideas and new problems that have come to the fore as well?
Do you think it's a good time for the community to sort of reflect and think about where we want to be in the 2030s?
What are the most pressing problems that we should try and tackle?
And how do we set a North Star for the community as a whole on the important problems and challenges facing our times?
Yeah, this is absolutely an important question.
So we did this in 2016.
We had an Architecture 2030 workshop, and I think it's now about time we look at this again.
Actually, for our visioning workshops
at SIGARCH, we ask
organizers to write a report
about what came out of it, which is
important. So we can also reflect
on that report and see how much
of that actually happened, how much of it hasn't happened
yet. We still have a few more years before 2030, but we definitely
need to redo this. Two challenges
here for me: first of all, maybe the first
challenge is an opportunity. We need to reach out and
work with other communities, the other layers of the stack in computing.
We cannot just have a computer architecture workshop on the future of computer architecture,
because it's the future of computing.
Moore's law is sort of slowing down and the future is in co-design.
So that's one challenge, which I think would be an opportunity.
The second one is that we are quite broad in the technologies that we're
covering. There's a lot happening. And the challenge there is to find a way to harness all
this. The world is not the way it was when I was a PhD student in computer architecture. So that's a
challenge that we need to basically address in any visioning for the future of computing.
Yeah, so maybe we can also take a little time now to talk about how you became that PhD student
all those years ago.
We also like to ask
our guests about their
computational origin story, I suppose.
So maybe you can share
with us your origin story
as we, as Suvinay always says,
wind the clocks back.
Well, I come from a background
in the 70s where
I grew up. Everybody
was either a doctor or an engineer.
So electrical engineering was super hot.
So I wanted to be
in electrical engineering. And
And when I came to the U.S., my brother-in-law, who wasn't like community engineering, was also
getting a degree in computer science.
Computer science was hot back then.
It was this new field.
It wasn't as rigorous as computer engineering, but it has had its own sort of closer community
feeling because it was just growing.
It was becoming a field.
And so I decided to finish a degree in a community engineer and eventually also got a degree in
computer science. And beyond that, it was basically, I wanted to do computer engineering. And so
that's how I ended up going to Madison. I had offers from a few schools. At that time,
Madison had a really interesting energetic group of computer architects. Some of them had joined
recently at that time. And it looked like a good group of people could go work with.
Yeah. I mean, you came up in an era that was really exciting
for computer architecture.
It felt like there were some,
I was telling Suvinay,
it wasn't until I got maybe midway
through my career and looking back
is when you're a student,
when I was a student,
you're just trying to learn things as fast as you can
and read everything and just like,
I was still in that kind of like
read and absorb phase
rather than read
and really analyze and think
and whatever. So it got to me
a few years later when I was looking at it.
I was like, wow, there was some serious
really interesting fundamental debates that were going on in that era. And now we've gone through
a lot of churn, maybe like slightly winding back to what we were just talking about. We've gone
through a lot of churn. There's a lot of new stuff that has happened that has exploded on
the scene in the last decade. And there's been a lot of shifts in our community.
What do you, what do you imagine now as being one of, like, the fundamental debates that is going
on with us? I mean, there were definitely camps and team this and team that. These days,
it's, I guess when I was growing up, there was Team Edward and Team Jacob. And now maybe there's
team, what's that show that the kids are watching? I feel so old saying this. The Summer I Turned
Pretty, I think. I hear all these kids talking about team this or that. I forget the guys'
names. So for us now, like Team RISC, Team CISC, what are the teams now in your mind?
I think a lot of these things are circular and are coming back, right?
So I was part of this group of people, Stanford, Wisconsin, and MIT were building
multiprocessors, because at that time Stanford basically had announced that the R4000,
which is an in-order pipeline, was the fastest processor you would ever need.
You would never build anything else.
The wimpy-brawny argument goes back to the mid-80s.
They started building shared-memory multiprocessors.
We jumped on that wagon as well.
And in 2010, Horowitz wrote a paper with a bunch of collaborators at Stanford.
He said, look, if you run the SPEC benchmarks in whatever the technology node was at the time,
a two-way in-order core is by far optimal.
There's a Pareto-optimal frontier on core design,
the frequency and complexity, also sizing the components.
The two-way in-order core is always superior to out-of-order cores.
And back in the 80s, Mark was saying that nobody would ever build an out-of-order core.
And out-of-order happened because of Moore's law, and we scaled so much.
But if you go back and revisit that today, if you care about efficiency,
and it's about performance per watt, performance per square millimeter of silicon,
again, in-order may be okay, it may actually win.
And so some of these questions, some of these debates never go away.
And we're looking at that right now at EPFL.
Many of these ideas, sys versus risk, they come back.
We, sysp could do whatever risk could do internally and it could make advantage of those risk
ideas. Now we're going to post more. We have to build accelerators. We're back to
some of those CISC ideas. Yeah, and around and around they go, I guess. That's true.
I mean, I remember in high school, I said something about history repeating itself to a history
teacher. And he was like, tell me more about what you mean. And things do circle around.
There's a big, it's a big pendulum. So, so, yeah.
Actually, I would like to also say that recently I went back and looked at the first
couple of ASPLOS proceedings, and those were super interesting, because there was this huge
RISC versus CISC debate at that time.
And you could see and read what the CISC people were thinking and what the RISC
people wanted to do.
And a lot of those ideas are relevant today.
I completely agree with that.
And I think the systems community is rife with examples of these kinds of things:
RISC versus CISC complexity, disaggregation versus aggregation at various layers of the stack, right?
So these are constant debates. And sometimes it's not like what's the right answer for eternity.
It's like at this point in time, this set of tradeoffs makes a lot of sense. And then something changes,
either the technology changes, the application changes, or something else happens. And so you need to
revisit the assumptions or the conclusions of a particular era. And in a different era, you might have a
different conclusion, right, or a different set of ideas that sort of play well together.
Yeah. And I think, going back
to working together,
for a lot of these hardware-software contracts,
we need to work with software people.
I think it was a little bit embarrassing
to find out that one of the best paper awards
at one of our conferences came from the OS community.
There's a group of amazing researchers at Cambridge who
basically came out and said,
look, the contracts between memory consistency models
and interrupts, when they interact,
are not defined at all.
And they actually showcase a bunch of products,
both within the same ISA and across ISAs,
that show that they're not supported properly,
and they get different behavior.
And this was the best paper.
And I think that was a great best paper.
But if you look at accessed and dirty bits in virtual memory,
they're not well-defined.
x86 does a much better job
of defining them than Arm does.
So I think now, moving forward,
because this co-design is more important,
the contracts between software and hardware are super important.
They have to be spelled out properly.
Yeah, completely agree with that.
I think the opportunities are there.
It's certainly a very interesting time.
And I think it's up to us and the broader community
to reach out and form those connections
across multiple layers of the stack,
whether it's architecture and compilers,
architecture and technology,
or it's architecture operating systems
and all the way across the stack.
I think there are plenty of opportunities.
Very wise words indeed.
So on that note,
any other words of wisdom to our listeners
based on your long range of experience,
either technical or otherwise?
Well, this is the best time to be in computer architecture.
There are absolutely no silver bullets.
We talked about when I was a master's student
and it was multiprocessors versus out-of-order cores.
Those were the only two things you could do.
Nowadays, the sky is the limit in innovation,
and this is a great opportunity to be a computer architect.
I would have loved to be a grad student
in computer architecture right now.
Yeah, there's certainly a lot of problems out there to be, to be solved.
And I think that is great because that means that the students can,
everything's ripe for the picking.
They can decide what they're interested in and go for it.
Instead of picking, dog piling on the hot topic,
Kim Hazelwood talked about that on our very first episode.
Don't dog pile.
There's a lot.
There's a lot.
Go for what you like.
So.
And I think we do have a relatively young audience.
So I hope they all take this to heart.
Cool.
Well, this was really wonderful, Babak.
I think we had a really good conversation with you.
We talked about quite a lot of different things from maybe even like low-level silicon-type
ideas to very high-level data center, energy flow type ideas.
So, we really appreciate your time.
I think this is a really fun conversation.
Thank you for having me.
I had a great time talking to you.
Yeah, it was a fascinating conversation.
Thank you so much, Babak.
And for our listeners, thank you for being with us on the Computer Architecture podcast.
Till next time, it's goodbye from us.
