In The Arena by TechArena - AI, Data Centers, and Bare Metal Cloud: Insights from Phoenix NAP

Episode Date: September 16, 2024

During our latest Data Insights podcast, sponsored by Solidigm, Ian McClarty of Phoenix NAP shares how AI is shaping data centers, discusses the rise of Bare Metal Cloud solutions, and more....

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena. My name's Allyson Klein, and this is another episode of the Data Insights series, which means my co-host, Jeniece Wnorowski, is back with us. Welcome back to the program, Jeniece. How are you doing? Oh, thank you, Allyson. I am doing great, and it's such a pleasure to be back. So, Jeniece, what is the latest news with Solidigm, and what have you been doing since the last time that we were on the podcast together? Oh, gosh, you know, it's still all about AI. And I love that there is just a ton of discussion around energy and power. But we're also still hearing a lot about bare metal cloud solutions and how do I manage my cloud?
Starting point is 00:01:00 And AI is a hot topic there as well. So I'm very excited to have Ian McClarty, president of PhoenixNAP, talk with us today. Welcome to the program, Ian. It's great to have you here. Thank you both for the invite. Ian, why don't you just go ahead and give us an introduction to yourself, your background in tech, and PhoenixNAP. Yeah, thank you. I've been in the tech industry for over 25 years, always with a data center focus, a hosting and infrastructure focus. And we got to a point where we had critical mass for the metro of Phoenix. We then needed a large-scale colocation facility and ended up building our own. We were at that mass where it just made sense. And we wanted to do something different for the market. We really wanted to focus on connectivity. So our first foray was really into saying, what do we need to do as an organization to bring as much telecom as we can to the metro, both from a need perspective and also just to service the market better. We saw some deficiencies in the telco hotels that were in the valley at the time. Fast forward to today: we have over 40 distinct internet providers,
Starting point is 00:01:58 a lot of international providers in there that are unique to us. We house a lot of the core infrastructure that services the rest of the metro's many expanding wings. The Phoenix metro market is actually the second largest data center market now in the world, and it's still growing: more capacity, more demand, multi-gigawatt facilities being built out, multiple campuses, hundreds of acres. And we really sit right in the middle of all this and offer connectivity solutions to the metro. Along the way, because of our infrastructure services, we also came to focus heavily on composable infrastructure. So in our vision, we see a data center 2.0.
Starting point is 00:02:26 We call it bare metal cloud, but it's a lot more than that at this point. It's composable infrastructure that we make very easy to absorb, and a way to get dedicated infra,
Starting point is 00:02:34 not just from a compute perspective, but also from a GPU, storage, and network perspective. And it's all dedicated to the end client. And they can compose
Starting point is 00:02:43 this infrastructure in multiple ways. They can do it through the interface, they can do it through the API, they can do it through third-party modules, and they can do it through code stacks of sorts as well. So there are a lot of different ways to interface with our BMC product line, and we're very proud of the growth and the expansion it's had. And Solidigm has been a great partner to work with along the way when it comes to the multiple different types of storage solutions that we offer within the platform, whether they're inside the server or composable stacks of really fast file storage. And so we do offer a wide array of ways to compose the pieces that we dedicate to you.
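To make that composability concrete, here is a minimal Python sketch of what driving a bare metal cloud through its API can look like. This is an illustration only: the base URL, endpoint path, field names, and instance/location values are hypothetical stand-ins, not PhoenixNAP's documented API contract.

```python
# Hypothetical sketch of composing dedicated infrastructure through a REST API.
# The URL, fields, and values below are illustrative assumptions, not the
# documented PhoenixNAP BMC API.
import requests

API_BASE = "https://api.example-bmc.com/v1"  # placeholder base URL
TOKEN = "YOUR_OAUTH_TOKEN"                   # bearer token obtained out of band

def provision_server(hostname: str, server_type: str, location: str) -> dict:
    """Request a dedicated server and return the provisioning response."""
    resp = requests.post(
        f"{API_BASE}/servers",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "hostname": hostname,
            "type": server_type,   # e.g. a GPU- or storage-heavy instance class
            "location": location,  # e.g. a Phoenix-metro site code
            "os": "ubuntu/jammy",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    server = provision_server("ml-node-01", "gpu.large", "PHX")
    print(server.get("id"), server.get("status"))
```

The same request could equally be expressed through a web interface, a Terraform-style module, or other infrastructure-as-code tooling, which is the range of interfaces Ian describes.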
Starting point is 00:03:13 Thank you, Ian. We too are big fans of partnering with you guys, and I just want to talk a little bit more about how PhoenixNAP is a leader in infrastructure as a service, including bare metal cloud service offerings. Can you tell us your perspective on the service offerings and what customers are focused on and tapping into today? Yeah, great question. When we look at our space-and-power business, it's very traditional enterprises, large-scale enterprises, well-known brands.
Starting point is 00:03:42 When we look at our bare metal cloud offering, it's really next-generation companies, software-as-a-service companies that are building out great platforms and great infrastructure stacks themselves, from a software and a scale-up perspective. They typically come to us for a couple of different reasons. The first is that they start in public cloud and need to scale out, and there's a cost basis or performance basis that is usually too hard to overcome for them. Whether that be bandwidth, where they need large-scale bandwidth and it's just hard to scale costs in public cloud that way; they need more dedicated infrastructure; they're having problems with noisy neighbor issues.
Starting point is 00:04:12 They really need a sustained workload on their actual hardware. And so many times they come to us because of those needs. They're like, look, we're seeing performance issues. We're not able to scale out. We're having a cost-basis issue here inside public cloud that is just too hard to ignore at this point. We need to really take the platform to the next level. Either it's costing too much to get the performance that we want, or we just can't get the performance that we desire. And so we help them on multiple fronts with the different types of infrastructure that we
Starting point is 00:04:36 provide. And they also have a lot of transparency into the infrastructure, so the troubleshooting and the certification of that infrastructure is a lot easier for them. And once they really get comfortable with how the instance types work, they're also able to have a lot more say in the platform, guide us, provide direct feedback, and plan ahead for the next generation of their infra as well. So we work very closely with clients on that front. We take a lot of direct feedback, really listen to them, and try to build the feature sets and the functionality around their needs. Ian, I'm so glad that you're talking about that next-generation infrastructure, because it's something that's top of mind
Starting point is 00:05:07 for everybody right now, especially because we're grappling with the desire for more AI capability and infrastructure, and data centers are really feeling pressured by that. What challenges are you seeing in delivering this core capability to customers within your data center power envelopes? And what changes are you seeing if you look at this from a system to rack to facility
Starting point is 00:05:33 perspective? Yeah, great question. Multiple fronts here as well. When it comes to the power profile, one thing we were very fortunate about is that we're buying power and infrastructure. Unlike most operators, we're not a real estate company; we're a technology company. And so our focus has really been around how to optimize infrastructure, how to deliver it in the fastest way possible, and how to make it dedicated. One of the things that we looked at very early with AI was the power profile of the workload just sitting there, idling, just turned on. And we found that half the power allocated to that workload came online before it was even active, just from powering up the systems. So just to turn on the units themselves, just to have them on, because there's so much silicon, so much network, so much fiber there, and typically a lot of storage in the larger arrays.
Starting point is 00:06:17 There's a power profile that is very heavy, and it's heavy at idle, which is an even bigger problem. Typically, traditional idle workloads tend to be in the 10% range or less, maybe 5% sometimes. And now we're seeing them at 50%. So that creates a huge challenge from a cooling and also from a hardware delivery perspective. It means that you're only able to use 50% of that power for actual processing,
Starting point is 00:06:36 for actual GPU compute. And a lot of people don't understand that. They focus more on trying to get the fastest model out as soon as they can. And so the hardware industry is really performance-driven right now, not optimization-driven, even though a lot of these systems do support power management and do have it built in.
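To put numbers on that idle-power point, here is a back-of-the-envelope Python sketch. All figures are hypothetical illustrations chosen to match the percentages Ian quotes, not measurements from any real deployment.

```python
# Back-of-the-envelope: how much of an allocated power envelope is left for
# useful work at different idle floors. All numbers are hypothetical.

def usable_fraction(allocated_kw: float, idle_kw: float) -> float:
    """Fraction of the allocated envelope available above the idle floor."""
    return (allocated_kw - idle_kw) / allocated_kw

rack_kw = 120.0  # hypothetical allocation for a dense GPU rack

for label, idle_pct in [("traditional workload", 0.10), ("AI cluster, per Ian", 0.50)]:
    idle_kw = rack_kw * idle_pct
    print(f"{label}: {idle_kw:.0f} kW drawn at idle, "
          f"{usable_fraction(rack_kw, idle_kw):.0%} of the envelope left for compute")
```

At a 10% idle floor, 90% of the envelope does useful work; at 50%, half of every provisioned kilowatt is spent before a single token is processed.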
Starting point is 00:06:49 As an operator and a data center builder, we're actually building a new data center next door to create a bigger campus, to service our existing customer base but also to serve this GPU workload better. It's going to be about four times bigger than what we have, but we're having to give a lot more consideration to not just the cooling aspect of the house
Starting point is 00:07:04 and higher densities in smaller footprints, but also to weight, which is becoming an issue. So we have to do a very different type of design from a slab perspective: no more raised flooring, it just won't scale anymore. And also the cooling: there are more water requests coming through now. So water as a resource is coming through, and even though we built the new facility to utilize a tenth of the water while being a minimum of three times bigger than what we have today, we're still going to require water on the data center floor itself, which brings a whole slew of other concerns. Number one, there are no set standards yet from the different hardware manufacturers on water delivery mechanisms; there's no consensus yet on what's the best type of water delivery to the floor. Also, you have potential risk factors in there,
Starting point is 00:07:41 like the noisy neighbor issue, but think of it more from a water leak perspective. A lot of these units are not built to be highly available or redundant. Unfortunately, again, the market from a hardware manufacturer perspective is really focused on performance, not on the highest availability factor and maintenance around these units. And so when you see units come in with pump houses that you can't even touch or get into unless you take the entire unit apart, that doesn't tell me it's mission-critical ready. It's really designed for performance. And if you look at the background of where a lot of this hardware manufacturing comes from,
Starting point is 00:08:07 it comes from the high-performance compute clusters, right? The research and development world. Not mission-critical facilities, not a mission-critical mindset. So there's a gap there that has to transform and has to season out. And the market will catch up over the next two, three years, I would say. It's going to catch up to the point
Starting point is 00:08:21 where availability should be taken into account: performance, optimization, better power density and profiles. Those concepts do not even exist in a lot of these units that are being delivered today. Yeah, I think as we grapple with the desire for more AI capability, data centers are really feeling that pressure. What challenges do you see in delivering more within a data center's power envelope, and what changes are you seeing from system to rack to facility? Yeah, so as I stated, what we're seeing is a lot more requests for liquid cooling. Air cooling has a finite physical limitation, and typically it's around the, say, 60 kW range, plus or minus a little bit, depending on the data center environment. When you're looking at rack
Starting point is 00:09:01 units that are coming in at 100 kW, 120 kW, there's no way you can air-cool them anymore. And so you have to really look at different technology stacks. Some of them tend to focus on dissipating heat as fast as possible; others tend to be water delivery or some kind of cooling mechanism directly to the chip and other components. So not just the CPU anymore, but direct delivery to the GPU component, and to the memory stacks as well. Even memory itself is actually getting water cooling down at that component level. Again, great for cooling and performance purposes, but it does create issues when it comes to the potential leaks that will occur at some point in time. These systems are meant to run 24/7, but at some point there will be a leak, and they are extremely difficult to service because of the
Starting point is 00:09:41 density of the units. They're so compact. If you look at the new NVIDIA clusters that are coming out with Blackwell, even the fiber itself, the connectivity side of the house, is not exactly serviceable. They're basically braided systems that are bound together. And so if you have one component go bad, you're basically losing capacity; you're not able to maintain or service that. It's built to run and operate, and then once it gets to a threshold where there are so many problems with it, it gets taken down at that point. And that's a hard down. That system is not going to be back the same day; it's going to be down for weeks, being serviced in a very complex environment. So one concern that we have is serviceability.
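For a sense of scale on why water on the floor is such a serious concern, here is a standard heat-transfer back-of-the-envelope in Python. The formula Q = m_dot * c * delta_T and water's specific heat are textbook values; the rack powers and the 10 K coolant temperature rise are hypothetical examples in the range Ian cites.

```python
# Rough direct-liquid-cooling flow estimate from Q = m_dot * c * delta_T.
# Rack powers and coolant temperature rise are hypothetical examples.

C_WATER = 4.186  # specific heat of water, kJ/(kg*K)

def water_flow_lpm(heat_kw: float, delta_t_k: float) -> float:
    """Liters per minute needed to absorb heat_kw at a delta_t_k temperature
    rise (water is ~1 kg per liter, so kg/s maps to L/s)."""
    kg_per_s = heat_kw / (C_WATER * delta_t_k)
    return kg_per_s * 60.0

for rack_kw in (60, 100, 120):  # air-cooling limit vs. the dense racks above
    print(f"{rack_kw} kW rack, 10 K rise: ~{water_flow_lpm(rack_kw, 10):.0f} L/min")
```

A 120 kW rack at a 10 K rise needs on the order of 170 liters of water per minute moving through it, which is why a single loose fitting is a flood rather than a drip.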
Starting point is 00:10:03 In our facility, we have a design philosophy for mission-critical environments. And mission-critical really means that these systems cannot go down: high availability, fault tolerance, increased maintainability. These are concepts that are inherent in data center design and in the way we design our systems.
Starting point is 00:10:25 That has not necessarily carried over to AI yet, and again, that's because the main goal of AI right now is performance. Everybody's chasing AGI. They're trying to get to the one application that you can't live without, and it's almost like everybody's rushing towards that, forgetting about the other pieces that need to be addressed in later stages of optimization. I'm glad that you've been talking about the fact that the focus is on performance, while we really need to focus on the whole picture
Starting point is 00:10:47 in terms of how to drive scale with efficiency. How do you see the industry working together to improve compute efficiency and core infrastructure capability while delivering on this performance? Necessity, I think, is going to be it. I think we will get to a point soon. I just saw the latest JLL report as of this morning
Starting point is 00:11:04 on the entire industry, and they made mention that in the near future, I'm talking about a couple of years, 11% or so of all power consumption in the U.S. is going to come from data centers. 11% plus. That's significant enough that legislation is going to get involved soon. Right now, again, there's a rush and there's movement. Metros are running out of power.
Starting point is 00:11:21 So as a metro runs out of power, other burdens are put onto the operators, such as needing to go to areas that are very rural, in the middle of nowhere. That has its own set of challenges as well. As you think about that, think of the things in a big city that are not available in rural areas. 911 dispatch? Not necessarily available. Life safety services, fire departments? Not necessarily available. So that will put a massive strain on the industry and also on the metros and the rural areas. There's going to be, and right now this is happening,
Starting point is 00:11:48 where you may be able to find, in air quotes, cheap power, but then you have to build transmission lines. You have to build road infrastructure. You have to support the local economy to be able to put in a fire department and a 911 dispatch. These are basic life safety things
Starting point is 00:12:02 that data centers need. Logistical issues, right? Along the way, you need to have road structures that can handle semi-trucks for equipment coming in and out. Data centers are not just built once and forgotten; they're a very live environment. There's activity going on all the time: constant maintenance, constant equipment being migrated in and out, upgrades happening all the time. So you need a really good logistical system to be able to manage that. And if the infrastructure is not there, the burden is put back on the operator.
Starting point is 00:12:26 And so that cheap power, at the end, if you do a TCO analysis, is not cheap power. It's highly expensive power, which is, again, driving a lot of the rate structures up, unfortunately.
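Ian's point lends itself to a simple worked example. The Python sketch below blends a one-time site buildout (transmission lines, roads, life-safety infrastructure) into an effective energy rate; every figure is a made-up placeholder, and it optimistically assumes full utilization for the facility's whole life.

```python
# Toy TCO comparison: metro power vs. "cheap" rural power once one-time site
# buildout is amortized over delivered energy. All figures are hypothetical,
# and full 24/7 utilization is assumed for simplicity.

def effective_rate(base_rate_kwh: float, buildout_capex: float,
                   load_mw: float, years: int) -> float:
    """Blended $/kWh after spreading one-time capex over total energy."""
    kwh_total = load_mw * 1_000 * 24 * 365 * years
    return base_rate_kwh + buildout_capex / kwh_total

metro = effective_rate(0.09, buildout_capex=0, load_mw=50, years=15)
rural = effective_rate(0.05, buildout_capex=400_000_000, load_mw=50, years=15)
print(f"metro: ${metro:.3f}/kWh vs. rural after buildout: ${rural:.3f}/kWh")
```

With these placeholder numbers, the nominally cheaper rural rate ends up costing more per delivered kilowatt-hour than metro power, which is exactly the TCO trap Ian describes.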
Starting point is 00:12:33 So the market's getting more expensive. And so, out of necessity, everybody's going to have to pitch in. Hardware manufacturers are going to have to rethink how the deployment stacks
Starting point is 00:12:40 are happening. Again, we're all chasing performance right now, but are there ways to educate the market better? Saying, hey, if this model runs over the weekend instead of trying to get it done
Starting point is 00:12:49 in a matter of a couple hours, is that acceptable? If we can get the power profile now down to a third or a fourth of what it used to be by waiting a little bit longer for that model to come out, is that acceptable to your business,
Starting point is 00:12:59 to your use case or your technology case? So there's definitely a whole education of the market in general that needs to happen. And the manufacturers themselves, again, because of power profiles and densities, are going to have to go after an optimization methodology, right? They're really going to have to think, how do I optimize and scale this thing out in a more cost-effective and prudent manner? Yeah, those are all amazing examples. I want to dive a little bit more into
Starting point is 00:13:19 this, though: what role is PhoenixNAP really taking to help drive advancements in this space? And how does that involve engaging vendors? Yeah, great question. We are very active in vendor engagement, specifically with folks like Solidigm. I'm very appreciative of the fact that folks like Solidigm, who are really trying to take a leadership position in the industry, listen to us. There are advisory councils that are set up, and so we are able to give direct feedback and raise some of our challenges. It's also a good mindshare scenario.
Starting point is 00:13:44 And it's not just us doing it on our own. It's typically a peer group of folks like us in the industry doing it together, and we all have similar challenges. Manufacturers like Solidigm that are listening are key to that. And a manufacturer isn't just able to take that feedback and make it happen the next day; no, it's a multi-year process. So if we don't start doing the work now, we're not going to be prepared in three years when that necessity really comes due.
Starting point is 00:14:08 So vocalize. Be more vocal with your manufacturer and your vendor. Ask questions, ask probing questions: hey, I noticed that cluster I just bought from you has governing controls. Can you tell me more about that? Can you explain to me how that works? I will tell you, from a sales process perspective,
Starting point is 00:14:21 a lot of these manufacturers don't even tell you that. They don't even educate you that you can put them into power profiles. The technology is already built out; it's already there. And so there's a lot of learning that has to happen, and a lot of optimization and education for the end user.
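Those controls are typically a command away. As one concrete example, NVIDIA's stock `nvidia-smi` tool exposes a settable board power cap; the Python wrapper below sketches the idea. The query and `-pl` options are standard nvidia-smi flags, but the 250 W target is a hypothetical example, setting a cap requires administrative privileges, and supported limits vary by GPU model.

```python
# Sketch: reading and capping GPU board power with the stock nvidia-smi CLI.
# The flags used are standard nvidia-smi options; the wattage target below is
# a hypothetical example, and setting a limit requires root privileges.
import subprocess

def gpu_power_readings() -> list[str]:
    """Return current draw and limit for each GPU, one CSV row per device."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,power.draw,power.limit",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()

def set_power_cap(gpu_index: int, watts: int) -> None:
    """Apply a board power limit to one GPU (requires admin privileges)."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                   check=True)

if __name__ == "__main__":
    for row in gpu_power_readings():
        print(row)
    # The weekend-run tradeoff Ian raises: trade runtime for a lower site
    # power profile by capping each device, e.g.:
    # set_power_cap(0, 250)
```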
Starting point is 00:14:47 There's also going to be a lot more help, I would say, from better tool sets, especially AI, right? It's funny: AI is going to create challenges, but it's also going to solve things. Better software, optimization of the software itself. The prevailing mindset in software development has been to get the code out as fast as you can to the business so we can meet the requirement, right? Get that business outcome done. Nobody said anything about optimization there; it's about getting a business outcome done. I think there's going to be a whole market ecosystem of AI alone that's able to optimize existing code. Historically, it's been a lot cheaper to just upgrade the hardware. Hey, our database is not running fast? Get some new hardware in place. Get some new storage systems, right?
Starting point is 00:15:03 There's always going to be a necessity for better performance, don't get me wrong. But there's also, I think, going to be a growing ecosystem of software optimization in the industry. And once that starts happening, that means a lower power profile. We'll still be able to meet the business outcome needs, but with more optimization in mind from the get-go. When you think about the path ahead, and obviously there's going to be a lot of capacity required from customers, how do you see that capacity coming, between greenfield
Starting point is 00:15:31 and brownfield? Yeah. And let's just step back for the audience, right? When somebody says brownfield, it's taking an existing building. When you say greenfield, it's building from the ground up, purpose-built. Very different philosophies, and they both have their place. Again, necessity, right? So typically, brownfields are done in metros where there are density constraints. Think of Singapore: you're typically going to be looking at a lot of brownfield out there.
Starting point is 00:15:55 If you're lucky, you might find some land and be able to get a greenfield going for a new purpose-built data center. With brownfield, what you're doing is taking existing buildings or large warehouses that have large power allocations. Folks are always looking for that; there is definitely a market and a need for it. But you work within the constraints of those four walls. You're not necessarily able to do what's best from a data center perspective.
Starting point is 00:16:15 Now, when you talk about purpose-built buildings that are meant to be data centers, you're not worrying about office space and parking lots. You're worrying about power delivery, cooling, how to optimize that building the best way you can so you can get the most power in there and the most energy efficiency, and on top of that, the best cooling mechanisms you can, so that you can cool these units down to a respectable, manageable state. It's a very different design philosophy. As for folks doing the gigawatt facilities: we were very fortunate that we were able to
Starting point is 00:16:39 acquire land that was a greenfield and, again, go through a purpose-built design philosophy, versus our existing building, which is more of a brownfield philosophy. We've been very fortunate in that. What you're going to find is that the large-scale gigawatt projects are typically going to be greenfield projects, because, again, they're buying acres and acres and planning out not just one data center, but dozens of data centers together in a cluster. They're servicing maybe one tenant, maybe 12 tenants. So it's a very different type of design even within that greenfield ecosystem, because you can do more creative things that way.
Starting point is 00:17:08 You can say, hey, maybe our roads should be a little bit wider because we know we're going to run a lot of semi-trucks in here. Hey, maybe our logistical system should be a little different, or the way we do the actual delivery to the data center, right to the back of the house, because there's a lot of equipment delivery that happens at these sites. The last thing you want is bottlenecks and traffic. And you're not servicing regular cars, you're servicing semi-trucks, so your whole philosophy there is very different as well, in how you deploy that. So the greenfield market is obviously the ideal state, but it's not available everywhere. Some metros just don't have it. New York City? You're not doing greenfields
Starting point is 00:17:35 there. I mean, very unlikely. And even if you're fortunate enough to do a greenfield, you'll be limited by all of the other burdens that are around you: height restrictions, not being able to get the road infrastructure redone. So it's a very limited type of greenfield, versus more rural areas, where you are able to tap into that greenfield and do purpose-built data center campuses, right? At this point, we're talking about campuses, not just single buildings anymore. Yeah. And speaking about all of that infrastructure, Ian, I like that you gave so many good examples there. I'd love to just switch gears a little bit. Can you share some additional insights on how your organization is really using
Starting point is 00:18:08 Solidigm SSDs in your particular data center operations? Yeah, it's a core component of our bare metal delivery. Every system that we have on the bare metal side has Solidigm drives in it. And we're very proud of the fact that we work with Solidigm; they have been a very easy company to work with.
Starting point is 00:18:24 They listen to us from a feedback perspective, work with us on any kind of customer issue or concern, and give us a good feedback loop with engineering, right? A lot of the folks that we work with tend to be a little more advanced when it comes to doing their own performance profiling. And maybe they're not doing it right; maybe they're doing benchmarking that is skewed a certain way. So being able to work closely with engineering shops on both sides, the manufacturer side and the client side, and being able to broker those conversations, is very helpful for us.
Starting point is 00:18:52 And so we get a lot of support from Solidigm there, and we're appreciative of that. We also like the fact that it's a good balance between, again, performance and availability. So there's definitely mission criticality to the drives that we buy. We want these systems to last. And our customers, once they certify on a platform, they like the platform. They like to stick around.
Starting point is 00:19:08 They don't want to go through and make massive changes within that generation of certification. The next generation of certification is a different story: now they've learned their lessons and want to apply them. So we have a constant feedback loop back to folks like Solidigm, and that's helpful as well. That way our customers as a whole, in aggregate, get a voice, and we're able to relay that. For us, it's been a very mission-critical piece. And it also extends to some of the manufacturers that we use that are not direct: we have other organizations we work with that provide storage services, maybe built for attached file storage, as an example, within Bare Metal Cloud's composable infrastructure. Those vendors are also using Solidigm on the back
Starting point is 00:19:43 end, which is also cool. It standardizes our performance and our work with Solidigm as a manufacturer. Ian, it was great catching up with you today. I've learned a ton about bare metal cloud and how it's transforming
Starting point is 00:19:54 in the AI era. Thank you so much for sharing your insights. If folks want to follow along with you and discuss PhoenixNAP's offerings with your team, where should they go for more information?
Starting point is 00:20:05 Yeah, we're pretty easy to get a hold of. We have people available 24/7. sales@phoenixnap.com is typically a good place to start. I'm also on LinkedIn, and I'm relatively easy to find. So if somebody has good questions, feel free to reach out. I'm pretty active there as well.
Starting point is 00:20:18 Awesome. Thank you so much for being on the program today. And Jeniece, that wraps another edition of Data Insights. Thanks so much for being with me today. Amazing. Thank you so much, Allyson. Thank you both. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by The Tech Arena.
