In The Arena by TechArena - AI, Data Centers, and Bare Metal Cloud: Insights from Phoenix NAP
Episode Date: September 16, 2024
During our latest Data Insights podcast, sponsored by Solidigm, Ian McClarty of Phoenix NAP shares how AI is shaping data centers, discusses the rise of Bare Metal Cloud solutions, and more.
Transcript
Welcome to the Tech Arena, my name's Allyson Klein, and this is another episode of
the Data Insight Series, which means my co-host, Janice Narowski, is back with us. Welcome back
to the program, Janice. How are you doing? Oh, thank you, Alison. I am doing great,
and it's such a pleasure to be back. So, Janice, what is the latest news
with Solidigm, and what have you been doing since the last time that we were on the podcast together?
Oh, gosh, you know, it's still all about AI.
And I love that there is just a ton of discussion around energy and power.
But we're also still hearing a lot about bare metal cloud solutions and how do I manage my cloud?
And AI is a hot topic there as well.
So I'm very excited to have Ian McClarty,
president of PhoenixNAP, talk with us today. Welcome to the program, Ian. It's great to have
you here. Thank you both for the invite. Ian, why don't you just go ahead and give us an
introduction to yourself, your background in tech, and PhoenixNAP. Yeah, thank you. I've been in the
tech industry for over 25 years, always with a data center focus, a hosting and infrastructure focus. And we got to a point where we had critical mass in the Phoenix metro. We then needed a large-scale colocation facility and ended up building our own. We were at that mass where it just made sense, and we wanted to do something different for the market. We really wanted to focus on connectivity. So our first foray was really into saying, what do we need to do as an organization to bring as much telecom as we can to the metro, both from the perspective of what we needed and
also from just servicing the market better. We saw some deficiencies in the telco hotels that
were in the valley at the time. Fast forward today, we have over 40 distinct internet providers,
a lot of international providers in there that are unique to us. We house a lot of the core
infrastructure that services the rest of the metro and its many expanding wings.
The Phoenix Metro market is actually the second largest data center market now in the world
and growing still. More capacity, demand, multi-gigawatt facilities being built out,
multiple campuses, hundreds of acres. And we really sit right in the middle of all this and
offer connectivity solutions to the Metro. Along the way, because of our infrastructure services,
we also focus heavily on composable infrastructure. So in our vision, we see a data center 2.0. We call it bare metal cloud, but it's a lot more than that at this point. It's a lot of composable infrastructure that we make very easy to absorb, and also to get dedicated infra, not just from a compute perspective, but also from a GPU perspective, a storage perspective, a network perspective. And it's all dedicated to the end client. And they can compose this infrastructure in multiple ways. They can do it through the interface, they can do it through the API, they can do it through third-party modules, and they can do it through code stacks of sorts as well. So there's a lot of different ways to interface with our BMC product line. And we're very proud of the growth and also the expansion it's had. And Solidigm has been a great partner to work with along the way when it comes to the multiple different types of storage solutions that we offer within the platform, whether they be inside the server or actually composable stacks of really fast file storage. And so we do offer a wide array of ways to compose even the pieces that we dedicate to you.
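To make the composability Ian describes concrete, here is a minimal sketch of what provisioning a dedicated server through a REST API could look like. The base URL, request fields, and token handling are illustrative assumptions for this example, not a verbatim copy of PhoenixNAP's Bare Metal Cloud API.

```python
# Illustrative sketch: provisioning a dedicated bare metal server via a REST API.
# The endpoint, payload fields, and auth flow are assumptions for this example,
# not the documented PhoenixNAP BMC API.
import requests

API_BASE = "https://api.example-bmc.com/v1"   # hypothetical base URL
TOKEN = "..."                                  # bearer token obtained out of band

def provision_server(hostname: str, server_type: str, location: str, os_image: str) -> dict:
    """Request a dedicated server and return the created resource as a dict."""
    payload = {
        "hostname": hostname,
        "type": server_type,     # e.g. a GPU- or storage-heavy instance class
        "location": location,    # e.g. "PHX"
        "os": os_image,          # e.g. "ubuntu/jammy"
    }
    resp = requests.post(
        f"{API_BASE}/servers",
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    server = provision_server("training-node-01", "gpu.large", "PHX", "ubuntu/jammy")
    print(server.get("id"), server.get("status"))
```

The same request could just as well be expressed through a UI, a Terraform-style module, or other infrastructure-as-code tooling, which is the point about multiple interfaces to the same dedicated infrastructure.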
Thank you, Ian. We too are big fans of partnering with you
guys. And just want to talk a little bit more about how, you know, PhoenixNAP obviously is a
leader in infrastructure as a service, including bare metal cloud service offerings as well.
But can you tell us your perspective on the service offerings and what customers are focused
on and tapping into today?
Yeah, great question.
When we look at our space and power business, it's very traditional enterprises, large-scale enterprises,
well-known brands.
When we look at our bare metal cloud offering, it's really next generation companies, software as a service companies that are building
out great platforms, great infrastructure stacks themselves from a software perspective and scale
up perspective. They typically come to us for a couple of different reasons. The first being
they start in public cloud and they need to scale out. And there's a cost basis or performance basis
that is usually too hard to overcome for them. Whether that be bandwidth, they need large-scale bandwidth,
and it's just hard to scale out cost-effectively in public cloud that way.
They need more dedicated infrastructure.
They're having problems with noisy neighbor issues.
They really need a sustainable workload on their actual hardware.
And so many times they come to us because of those needs.
They're like, look, we're seeing performance issues.
We're not being able to scale out.
We're having a cost basis issue here inside public cloud
that is just too hard to ignore at this point.
We need to really take the platform to the next level. And either it's costing too much to get the performance that we want, or we just can't get the performance that we desire.
And so we help them on multiple fronts to really take different types of infrastructure that we
provide. And they also have a lot of transparency into the infrastructure. So the troubleshooting,
the certification of that infrastructure is a lot easier for them. And once they really get
comfortable with how the instance types work, they really are able to now also have a lot more
say in the platform and be able to guide us and provide direct feedback and get the next generation
of their infra planned ahead as well. So we work very closely with clients on that front. We take a lot
of direct feedback, really listen to them and try to build the feature sets and also the functionality
around their needs. Ian, I'm so glad that you're talking about that next generation infrastructure
because it's something that's top of mind
for everybody right now,
especially because we're grappling with the desire
for more AI capability and infrastructure.
And data centers are really feeling pressured by that.
What are the challenges that you're seeing
when delivering within your data center power envelopes
to deliver this core capability to customers?
And what changes are you seeing if you take a look at this from a system to rack to facility
perspective? Yeah, great question. Multiple fronts here as well. When it comes to the power profile,
one thing that we found very early on is that we were very fortunate, because we're buying power
and infrastructure. Most operators are real estate companies.
We're not. We're a technology company.
And so our focus has really been around how to optimize infrastructure, how to deliver it in the fastest way possible, and how to make it so it's dedicated.
And one of the things that we looked at very early with AI was the power profile of the workload just sitting there, just idling out, just to turn it on.
And we found that half the power allocated to that workload was drawn when it wasn't even active, just powering up the systems.
So that means that just to turn on the units themselves, just to have them, because there's so much silicon, there's so much network, there's so much fiber there, there's a lot of storage typically in the larger arrays.
There's a power profile that is very heavy, and it's heavy at idle, which is an even bigger problem.
Typically, when you see traditional workloads at idle, they tend to be in the 10% range, or less than that,
maybe 5% sometimes.
And now we're seeing them at 50%.
So that creates a huge challenge from a cooling
and also from a hardware delivery perspective.
So that means that you're only able to use 50%
of that workload for actual processing,
for actual GPU compute.
And a lot of people don't understand that.
They focus more on trying to get the fastest model out
as soon as they can.
And so the hardware industry is really performance-driven
right now, not optimization-driven,
even though a lot of these systems do support
and they do have that built in.
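To put the idle-power observation above in rough numbers, here is a small illustrative calculation; the allocated power figure is hypothetical, not a PhoenixNAP measurement, and the idle fractions are the approximate percentages mentioned in the conversation.

```python
# Hypothetical illustration of the idle-power problem described above:
# if roughly half of a GPU cluster's allocated power is drawn at idle,
# only the remainder of the facility budget does useful work.

allocated_kw = 1_000          # power reserved for the AI deployment (hypothetical)
idle_fraction_ai = 0.50       # ~50% drawn just powering up the systems (per the discussion)
idle_fraction_classic = 0.10  # ~5-10% typical for traditional workloads at idle

def headroom(allocated: float, idle_fraction: float) -> float:
    """Power left over for actual processing once idle draw is subtracted."""
    return allocated * (1.0 - idle_fraction)

print(f"AI cluster:        {headroom(allocated_kw, idle_fraction_ai):.0f} kW of {allocated_kw} kW does real work")
print(f"Traditional stack: {headroom(allocated_kw, idle_fraction_classic):.0f} kW of {allocated_kw} kW does real work")
```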
As an operator, as a data center builder,
we're actually building a new data center next door.
I'm going to create a bigger campus
to service our existing customer base,
but also to better serve these GPU workloads as well.
It's going to be about four times bigger than what we have,
but we're having to take a lot more consideration
to not just the cooling aspect of the house
and higher densities in smaller footprints, but also weight is becoming an issue.
So we have to do a very different type of design from a slab perspective, no more raised flooring.
It just won't scale out anymore. And also the cooling now, there's more water requests coming
through. So water as a resource is coming through. Even though we built the new facility to utilize a
tenth of the water, and it's a minimum of three times bigger than what we have today, we're still going to require water on the data center floor itself, which brings a whole slew
of other concerns. Number one, there's no set standards yet from the different hardware
manufacturers on water delivery mechanisms. There's no consensus yet on what's the best
type of water delivery to the floor. Also, you have potential risk factors in there,
like the noisy neighbor issue, but think of it more from a water leak perspective. A lot of these
units are not built to be highly available or redundant.
Unfortunately, again, the market is really focused from a hardware manufacturer perspective on
performance, not on the highest availability factor and maintenance around these units.
And so when you're seeing units come in with pump houses that you can't even touch or get into,
unless you take the entire unit apart, that doesn't tell me it's mission critical ready.
It's really designed for performance. And if you're looking at the background
of where a lot of the hardware manufacturing comes from,
it comes from really the high-performance compute clusters,
right, the research and development,
not mission-critical facilities,
not mission-critical mindset.
So there's a gap there that has to transform
and has to season out.
And the market will catch up over the next two, three years,
I would say it's going to catch up to the point
where availability should be taken into account,
performance, optimization,
better power density and profiles.
Those concepts do not even exist in a lot of these units that are being delivered today.
Yeah, I think as we grapple with the desire for more AI capability, data centers are really feeling that pressure.
What are the challenges of delivering more within a data center's power envelope, and what changes are you seeing from system to rack to facility? Yeah, so as I stated, what we're seeing is a lot more requests
for liquid cooling. Air cooling has a finite physical limitation, and typically it's around the, say, 60 kW range,
plus or minus a little bit, depending on the data center environment. When you're looking at rack
units that are coming in at 100 kW, 120 kW, there's no way you can air cool them anymore. And so you have to really look at different technology stacks.
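A rough back-of-the-envelope airflow calculation shows why air cooling runs out of headroom at these densities. The rack powers below are illustrative, and the air properties are standard sea-level approximations.

```python
# Back-of-the-envelope: volumetric airflow needed to remove a rack's heat load
# with air, assuming a 15 degC inlet-to-outlet temperature rise. Rack powers are
# illustrative; air properties are standard approximations.

CP_AIR = 1005.0        # specific heat of air, J/(kg*K)
RHO_AIR = 1.2          # density of air, kg/m^3
M3S_TO_CFM = 2118.88   # cubic meters per second -> cubic feet per minute

def required_cfm(rack_kw: float, delta_t_c: float = 15.0) -> float:
    """Airflow (CFM) needed for the air to carry away rack_kw of heat at delta_t_c rise."""
    mass_flow = (rack_kw * 1000.0) / (CP_AIR * delta_t_c)   # kg/s
    return (mass_flow / RHO_AIR) * M3S_TO_CFM

for kw in (15, 60, 120):
    print(f"{kw:>3} kW rack -> ~{required_cfm(kw):,.0f} CFM through a single rack")
```

Pushing five-figure CFM volumes through a single rack is impractical with conventional containment, which is why the conversation shifts to other cooling approaches.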
Some of them tend to be more on dissipating heat as fast as possible. Others tend to be water
delivery or some kind of a cooling mechanism directly to the chip and other components. So
not just the CPU anymore, but also the GPU component, direct delivery to that, and also
the memory stacks as well. So even memory itself is actually getting water cooling down to that component level. Again, great for cooling purposes and
performance purposes, but it does create issues when it comes to potential leaks that will occur
at some point in time. You see these systems, they're meant to run 24-7, but at some point in
time, there will be a leak and they are extremely difficult to service themselves because of the
density of the units. They're so compact. If you look at the new NVIDIA clusters that are coming out with Blackwell, even the
fiber itself, the connectivity side of the house, it's not exactly serviceable.
They're basically like braided systems that are there together.
And so if you have one component go bad, you're basically losing capacity.
You're not able to maintain or service that.
It's built to run and operate.
And then once it gets to a threshold where there's so many problems with it, it gets
taken down at that point.
And that's a hard down.
That system is not going to come back quickly; it's going to be down for weeks,
being serviced in a very complex environment.
So that's one concern that we have is serviceability.
In our facility, we have a design philosophy for Mission Critical environments.
And Mission Critical really means that these systems cannot go down.
High availability, fault tolerance, increased maintainability.
These are concepts that are inherent in data center design and the way we also design our systems.
That has not necessarily carried over to AI yet.
And again, because the main goal of AI right now is performance.
Everybody's chasing AGI.
They're trying to get to the one application that you can't live without.
And it's almost like everybody's rushing towards that.
Forgetting about the other pieces that need to be addressed in later stages of optimization.
I'm glad that you've been talking about the fact that the focus is on performance
while we really need to focus on the whole picture
in terms of how to drive scale with efficiency.
How do you see the industry working together
to improve compute efficiency
and core infrastructure capability
while delivering to this performance?
Necessity, I think, is going to be it. I think we will get to a point. Right now, I just saw the latest JLL report as of this morning on the entire industry.
And they made a mention that in the near future,
and I'm talking about a couple of years,
11% or so of all power utilization in the U.S. is going to come from data centers.
11% plus.
That's significant enough that legislation is going to get involved soon.
Right now, again, there's a rush and there's movement.
Metros are running out of power.
So as a metro runs out of power, other burdens are being put onto the operators, where
they need to go to areas that are very rural, in the middle of nowhere.
That has its own set of challenges as well.
So as you think about that, think of like things in a big city that are not available
in rural areas, like 911, not necessarily available.
Life safety things, fire departments, you know, not necessarily available.
So that will put a massive strain on the industry and also on the metros and the rural areas.
There's going to be, and right now this is happening,
where you may be able to find, in air quotes,
cheap power,
but then you have to build transmission lines.
You have to build road infrastructure.
You have to support the local economy
to be able to put in a fire department
and to be able to put in a 911 dispatch.
These are basic life safety things
that data centers need.
Logistical issues, right?
Along the way, you need to have road structures that can hold semi-trucks for equipment coming in
and out. The data centers are not just built once and forgotten. They're a very live environment.
There's activity going on all the time. You have constant maintenance. You have constant equipment
being migrated in and out. Upgrades are happening all the time. So you need a really good logistical
system to be able to manage that. And if the infrastructure is not there, the
burden is put back on the operator.
And so that cheap power, at the end, if you want to do a TCO analysis, is not cheap power. It's highly expensive power, which is, again, driving a lot of the rate structures up, unfortunately.
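As a purely hypothetical illustration of that TCO point, a lower utility rate can be wiped out once transmission lines, roads, and life-safety infrastructure are amortized into the cost of power. Every figure below is invented for the example.

```python
# Hypothetical TCO sketch: "cheap" rural power vs. pricier metro power once
# one-off infrastructure (transmission lines, roads, life-safety services) is
# amortized over the facility's consumption. All figures are invented.

SITE_MW = 50
HOURS_PER_YEAR = 8760
UTILIZATION = 0.80
AMORTIZATION_YEARS = 10

def effective_rate(base_rate_kwh: float, infra_capex: float) -> float:
    """Blended $/kWh after spreading one-off infrastructure over ten years of load."""
    kwh_per_year = SITE_MW * 1000 * HOURS_PER_YEAR * UTILIZATION
    return base_rate_kwh + infra_capex / (kwh_per_year * AMORTIZATION_YEARS)

rural = effective_rate(base_rate_kwh=0.05, infra_capex=200_000_000)  # lines, roads, 911, fire
metro = effective_rate(base_rate_kwh=0.09, infra_capex=10_000_000)   # mostly grid interconnect

print(f"Rural 'cheap' power, all-in: ${rural:.3f}/kWh")
print(f"Metro power, all-in:         ${metro:.3f}/kWh")
```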
So the market's getting more expensive. And so, out of necessity, everybody's going to have to pitch in. The hardware manufacturer is going to have to rethink how the deployment stacks are happening; how, again, we're all following performance right now. But are there ways, you know, to educate the market better, saying,
hey, if this model runs over the weekend instead of trying to get it done in a matter of a couple hours, is that acceptable? If we can get the power profile now down to a third or a fourth of what it used to be by waiting a little bit longer for that model to come out, is that acceptable to your business, to your use case or your technology case? And so there's definitely a whole education of the market in general that needs to happen.
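A hypothetical example of that trade-off: the same training job, run on a quarter of the accelerators over a longer window instead of flat-out for a few hours, cuts the peak power draw to roughly a quarter. The numbers below are invented, and the model ignores scaling inefficiencies, idle overhead, and cooling.

```python
# Hypothetical trade-off: finish a fixed amount of GPU work fast at high peak power,
# or stretch it over a longer window at a fraction of the draw. Ignores scaling
# losses and idle overhead; the numbers are illustrative only.

GPU_WATTS = 700            # per-accelerator board power (illustrative)
JOB_GPU_HOURS = 1024       # total work to be done, in GPU-hours

def schedule(gpus: int) -> tuple[float, float]:
    """Return (wall-clock hours, peak power in kW) for running the job on `gpus` devices."""
    hours = JOB_GPU_HOURS / gpus
    peak_kw = gpus * GPU_WATTS / 1000
    return hours, peak_kw

for gpus, label in ((256, "rush it"), (64, "stretch it out")):
    hours, peak_kw = schedule(gpus)
    print(f"{label:>14}: {gpus} GPUs, ~{hours:.0f} h wall clock, ~{peak_kw:.0f} kW peak draw")
```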
And the manufacturers themselves, again, because of power profiles and
densities, they're going to have to go after an optimization methodology, right? They're really
going to have to think, how do I optimize and scale this thing out in a more cost effective
and prudent manner? Yeah, those are all amazing examples. I want to dive a little bit more into
though, what role PhoenixNAP is really taking to help drive advancements in this space?
And how does that involve engaging vendors?
Yeah, great question.
We are very active in vendor engagement, specifically with folks like Solidigm.
I'm very appreciative of the fact that folks like Solidigm that are really trying to take a leadership status in the industry listen to us.
There are advisory councils that are set up.
And so we are able to get direct feedback and talk about and raise some of our challenges.
And also, it's a good mindshare scenario.
And it's not just us doing it on our own.
It's typically a peer group of folks like us that are in the industry that are doing it together.
And so we all have similar challenges.
And manufacturers like Solidigm are key to that, that are listening.
And a manufacturer isn't just able to take that feedback and make it happen the next day.
No, it's a multi-year process.
And so if we don't start doing the work now, we're not going to be prepared in three years
when that necessity really hits.
And so vocalizing,
be more vocal to your manufacturer and to your vendor,
asking questions, asking probing questions.
Hey, I noticed that that cluster I just bought from you
has guarding controls.
Can you tell me more about that?
Can you explain to me how that works?
I will tell you from a sales process perspective,
a lot of these manufacturers don't even tell you that.
They don't even educate you
that you can put them in power profiles. The technology is
already built out. It's already there. And so there's also a lot of learning that has to happen
and a lot of optimization and education to the end user. There's also going to be a lot more,
I would say, with better tool sets, especially AI, right? So it's funny that this is going to
create challenges, but it's going to solve things. Better software, optimization of software itself.
The real nuance is in software development: get the code out as fast as you can to the
business so we can meet the requirement, right?
Get that business outcome done.
Nobody said anything about optimization there.
It's about getting a business outcome done.
I think there's going to be a whole market ecosystem of AI alone that's going to be able
to optimize existing code.
Historically, it's been a lot cheaper just to upgrade the hardware.
Hey, our database is not running fast.
Get some new hardware in place.
Get some new storage systems, right? Which there's always gonna be a necessity
for better performance. Don't get me wrong, but there's also, I think, going to be a growing
ecosystem of software optimization that's going to happen in the industry. And so once that starts
happening, that's going to be a better, a lower power profile. We'll still be able to meet the
business outcome needs, but with more optimization in mind from the get-go. When you think about the
path ahead, and obviously there's going to be a
lot of capacity required from customers, how do you see that capacity coming between Greenfield
and Brownfield? Yeah. And let's just step back for the audience, right? When somebody says a
Brownfield system, it's taking an existing building. When you say Greenfield, it's building
it from the ground up to be purpose-built. Very different philosophies.
And they both have their place.
Again, necessity, right?
So typically, brownfields are done in metros where there's density constraints.
So think of Singapore.
You're typically going to be looking at a lot of brownfield out there.
If you're lucky, you might find some land and be able to get a greenfield system going
for new purpose-built data systems, data centers.
So what you're doing is you're taking existing buildings or large warehouses that have large
power allocations.
Folks are always looking for that.
There is definitely a market for that and a need for that.
You work within the constraints of those four walls.
You're not necessarily able to do what's best from a data center perspective.
Now, when you talk about purpose-built buildings that are meant to be data centers, you are
not worrying about office space, parking lots.
You're worrying about power delivery, cooling, how to optimize that building the best way
you can so you can get the most power in there and the most energy efficiency.
And on top of that, the best cooling mechanisms you can so that you can cool down these units
to a respectable, manageable state.
And so it's a very different design philosophy.
Folks that are doing the gigawatt facilities, we were very fortunate that we were able to
acquire land that was a greenfield.
And again, go through a purpose-built design philosophy versus our existing building, which is more of a brownfield philosophy. We've been very fortunate
in that. What you're going to find is that the large-scale gigawatt projects, they're typically
going to be greenfield projects. Because again, they're buying acres and acres and they're planning
out systems to be not just one data center, but dozens of data centers together in a cluster.
They're servicing maybe one tenant, maybe 12 tenants. So it's a very different type of design
within that even ecosystem of a greenfield.
Because you can do more creative things by doing that.
You can say, hey, maybe our roads should be a little bit wider because we know we're going to have a lot of semi-truck traffic in here.
Hey, maybe our logistical system should be a little different or the way we do the actual delivery to the data center, right to the back of the house.
Because there's a lot of equipment delivery that happens on both sides.
I don't think people realize how much.
Last thing you want to do is have bottlenecks and traffic.
And so you're not servicing regular cars, you're servicing semi-trucks. So your whole philosophy is very different there as
well, how you deploy that. So the greenfield market is obviously the ideal state, but it's
not available everywhere. Some metros just don't have it. New York City, you're not doing greenfields
there. I mean, very unlikely. And even if you're fortunate enough to do a greenfield, you'll be
limited by all of the other burdens that are around you, your height restrictions. You are not
able to get your road infrastructure redone. So it's a very limited type of greenfield, versus more rural areas that are able to
tap into that greenfield and do purpose-built data center campuses, right?
At this point, we're talking about campuses, not just single buildings anymore.
Yeah. And speaking about all of that infrastructure, Ian, I like that you gave so many good
examples there. I'd love to just switch gears a little bit, but can you share some additional insights
on how your organization is really using
Solidigm SSDs in your particular data center operations?
Yeah, it's a core component to the delivery of bare metal. So every system that we have on the bare metal side has Solidigm drives in it. And we are very proud of the fact that we work with Solidigm. They have been a very easy company to work with. They listen to us from a feedback perspective, and they work with us when it comes to any kind of customer issue or concern. They give us good feedback and also engineering support, right?
A lot of the folks that we work with, they tend to be a little more advanced when it comes to doing their own performance profiling.
And maybe they're not doing it right.
Maybe they're doing benchmarking that is skewed a certain way.
And so being able to work closely with engineering shops on both sides, right?
On the manufacturing side and also with the client side, and being able to be
a kind of broker with those conversations is very helpful for us.
And so we get a lot of support from Solidigm there, and
we're appreciative of that.
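As an illustration of how benchmark configuration can skew results, the sketch below runs the same drive through two fio profiles whose reported numbers can differ by an order of magnitude. The device path, runtimes, and job shapes are hypothetical; fio itself and the flags shown are standard, and this is not a PhoenixNAP or Solidigm procedure.

```python
# Illustrative only: the same SSD benchmarked two ways. A queue-depth-1 random-read
# test and a queue-depth-32 multi-job test report very different IOPS, which is
# one way benchmarks end up "skewed". Device path and runtimes are hypothetical.
import json
import subprocess

DEVICE = "/dev/nvme0n1"   # hypothetical target device

def run_fio(name: str, iodepth: int, numjobs: int) -> float:
    """Run a 4 KiB random-read fio job and return aggregate read IOPS."""
    cmd = [
        "fio", f"--name={name}", f"--filename={DEVICE}",
        "--rw=randread", "--bs=4k", "--direct=1", "--ioengine=libaio",
        f"--iodepth={iodepth}", f"--numjobs={numjobs}",
        "--time_based", "--runtime=30", "--group_reporting", "--output-format=json",
    ]
    result = json.loads(subprocess.run(cmd, capture_output=True, check=True, text=True).stdout)
    return result["jobs"][0]["read"]["iops"]

print("QD1, 1 job  :", run_fio("latency_view", iodepth=1, numjobs=1), "IOPS")
print("QD32, 8 jobs:", run_fio("throughput_view", iodepth=32, numjobs=8), "IOPS")
```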
We also like the fact that it's a good balance between, again,
performance, availability.
So there's definitely mission criticality to the drives that we buy.
We want these systems to last.
And also our customers, once they certify on a platform, they like the platform.
They like to stick around.
They don't want to go through and make massive changes for that generation of certification.
The next generation of certification, different story.
Now they have learned their lessons and now they want to apply them.
So we have a constant feedback loop back to folks like Solidigm, and that's helpful as well.
That way our customers as a whole, in aggregate, get a voice, and we're able to supply that. And for us, it's been a very mission critical piece. And also some of the manufacturers
that we use that are not direct. So we do have other organizations that we work with that are
providing storage services that are maybe built for attached file storage, as an example, within
Bare Metal Cloud, that's composable infrastructure. Those vendors are also using Solidigm in the back
end, which is also cool. It standardizes our performance and our work there with Solidigm as a manufacturer.
Ian, it was great catching up with you today. And I've learned a ton about Bare Metal Cloud and how it's transforming in the AI era. Thank you so much for sharing your insights. If folks want to follow along with you and discuss PhoenixNAP's offerings with your team, where should they go for more information?
Yeah, pretty easy to get a hold of.
We have people available 24-7.
sales@phoenixnap.com is typically a good place to start.
I'm also on LinkedIn.
Really, I'm relatively easy to find.
So if somebody has good questions,
just feel free to chime in.
I'm pretty active there as well.
Awesome.
Thank you so much for being on the program today.
And Janice, that wraps another edition of Data Insights.
Thanks so much for being with me today. Amazing. Thank you so much, Allyson.
Thank you both.
Thanks for joining the Tech Arena. Subscribe and engage at our website,
thetecharena.net. All content is copyright by The Tech Arena.