In The Arena by TechArena - The Future of Computing Infrastructure for the AI Era with Cloudflare’s Rebecca Weekly

Episode Date: November 28, 2023

TechArena host Allyson Klein chats with Cloudflare VP Rebecca Weekly at the OCP Summit on far-ranging topics including the demands of AI on infrastructure, how Summit announcements will shape the industry, and the importance of modular and circular infrastructure oversight.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein, and I'm coming to you from the Open Compute Summit in San Jose, California. And I am so delighted to be joined by Rebecca Weekly from Cloudflare. Welcome, Rebecca. Thank you so much.
Starting point is 00:00:41 It's so great to be here with you. So, Rebecca, this is your first OCP Summit after you stepped down as the chairperson. Yeah, exactly. How does it feel to be here? It is such a load off. There's so much less work I have to do, which is great. I just get to enjoy everybody.
Starting point is 00:00:59 I did. I was on the AI track content advisory board. So, I definitely got to do a little bit there. And I'm obviously very involved with the future technology symposium work. So still had a lot to do with some of the content, but none of the, you know, meeting with all the different advisors and members and stuff that you do on the board. Very refreshing. So I was following your formerly known as Twitter live stream of
Starting point is 00:01:27 OCP keynotes this morning to see what were the topics that were resonating with you. And I have a bunch of stuff I want to talk to you about. I feel like the big topic today is AI and how in the world are we going to power data centers to fuel this power-hungry workload? I've attended Andy Bechtolsheim's talk where he talked about liquid cooling. I've listened to Microsoft. I've listened to Google. I've been interviewing the sustainability team at OCP. What is the prevailing theory about how we're going to address this when we're looking at incredible power draw for RAP.
Starting point is 00:02:05 So I think I'll take one step back and then try and step into it. One is that AI is not one work. And I thought that was one of the most interesting sets of spider diagrams that came out from Meta's keynote this morning. And it's definitely what we've observed within Podler. Depending on the model,
Starting point is 00:02:23 depending on whether you're doing inference or training for that model or anything in between, fine-tuning, run, you're going to see very different computational challenges. And where the bottleneck is will move. And that makes it even harder, right? But that's also the best time to be in hardware because it's where you can actually drive real changes in the solution space.
Starting point is 00:02:44 So when we think about the major challenges, it's really because we're looking at 3% of data center space is not currently occupied with a build plan, meaning there's not a tenant for 3% of the available capacity across the world. That's all we got left. And of those data centers, on average, they're a third to a tenth of the power footprint that we will need for training clusters. Now, training clusters, again, depending on what we're training. But let's take transformers because it's kind of the, you know, whenever anybody shows that exponential graph, like moving off into the end of space and time they're usually talking
Starting point is 00:03:26 about transformer models because those have the highest parameter counts those are the ones that are taking all the memory you cannot fit in a single gpu i mean there's some obviously smaller ones like llama that can fit in seven billion parameters but the vast majority are much much larger and so we need massive clusters and this is where you've been hearing about InfiniBand clusters and all these other things I think one of the big sets of themes here is around universal Ethernet consortium that has just been established in the last few months to really start to drive Ethernet as primary. And you saw Google's announcement today on giving Falcone to the community. That is a huge step forward, I would argue, for the ecosystem to be able to drive interconnected
Starting point is 00:04:15 systems to get us from a couple hundred accelerated nodes operating together to actually millions of accelerated nodes operating together. That is a connectivity first problem. Some of the other things that may have been a little less sexy in all the keynotes was talking about the manageability, the serviceability of this system, silent data corruption, and how we handle reliability when we start to see bits flipping in massive systems that are not doing stochastic modeling. They're quite literally doing complex, very hard to trace analysis in federated learning models or something else, right?
Starting point is 00:04:55 As we have reliability issues in those domains, the scale of the challenge to identify what is wrong and correct the data and correct the model's output is much more complicated and that's where the bread and butter of OCP is reliability, security, and regardless of the system, regardless of the system. So there was a lot of interesting I think developments on Microsoft Keynote, Google Keynote in that domain that I think will be great for us as a community to be able to take advantage of. And then the big themes around optics and Ethernet so that we can get congestion management
Starting point is 00:05:32 control out of border execution support. I mean, if we can get consistent Ethernet implementation from the adapter cards out to the switches. Right. That's huge. It Right. That's huge. It is. It's huge. It sounds so silly, but like for a standard that's 50 years old,
Starting point is 00:05:49 we have a lot of non-standard behavior. Right. No, that's always been the case. Yeah. You know, one thing that I've heard about in one of the talks, I can't even remember, is just new math, new math models for AI. And this is something that I've heard about from different people in the industry that maybe some of the computational models
Starting point is 00:06:08 that we put together for AI are inefficient and need to be redressed. Has this been a focus of OCP? Do you see that in the industry? So yes, in the industry has it been a focus of OCP, not really the model development directly, right? We are not at our heart software developers for the ecosystem. We're much more the software layers from the kernel and below.
Starting point is 00:06:32 That's where we play, the network operating systems and domains like that, Redfish, in partnership with organizations like Linux Foundation and DMTF. So it's more about taking those systems and ensuring that they can run with standard packages and compliance and capabilities. Now, model development, where are there interesting things happening? There are so many things happening in the domain space of like XLA. So if you were at the AI track earlier, there was a great conversation with NVIDIA and with Google together talking about XLA. I never thought the day would come that NVIDIA personally would talk about hardware abstraction layer.
Starting point is 00:07:10 That wasn't CUDA. Right. But there they are talking about, you know, the performance that can be found using a more generic open source option that will allow you to target NVIDIA GPUs or anybody else's ASICs. And I think that's an important part of how we as a hardware and systems ecosystem can start to unlock model development on innovative solutions. So I think that's kind of the areas where I'm seeing this community engage versus, again, in places like Linux Foundation
Starting point is 00:07:44 or where we're starting to see massive explosions in the model development itself. It's always been such a research-oriented community. Right. And there's so much that has exploded with, you know, generative AI, especially the open sourcing of generative AI from people like Meta. Now, we see the supply constraints of NVIDIA GPUs. There was a talk earlier today that called for standardized accelerators.
Starting point is 00:08:10 So I thought that was an interesting statement. Standardized RAS. So reliability, availability, and serviceability, as well as DMTF from a Redfish perspective. How can we make sure we are getting systems control and reliability information from components on our ecosystem, especially RDMA-capable components on our ecosystem, without being locked into everyone's specific telemetry, everyone's specific. Yeah. Nobody's going to standardize.
Starting point is 00:08:38 Okay. Let's take a step back. We know that there's a problem with the supply chain constraint. There's a tremendous amount of work in the startup community. And I know that startups are part of OCP. Do you see viable alternatives to GPU technology coming to market anytime soon? And are you excited about seeing that kind of stuff? Oh, there's so many fun startups in the world, right? And I'm a hardware nerd at heart, so I will always be excited. So, I mean, you know, Cerberus was on one of the keynote stages today
Starting point is 00:09:11 being highlighted in Rolf's section on, you know, immersive cooling and 25-kilowatt, you know, systems that need very unique cooling solutions, which are the kinds of systems that are driving. So, again, where OCP is trying to help is in standard form factors like OAM, the open accelerator module, and systems that can take different solutions with consistent, you know, cooling options, et cetera, to lower the barriers to entry and to keep secure systems reliability for these new entrants. And I think that's an important role in fashion. Would I chuck it all and go join the latest AI startup
Starting point is 00:09:50 in the hardware domain? Not yet. I haven't seen that yet. But it's not because of a lack of passion for seeing interesting solutions here. Transformers are incredibly different. I am tried and true one of the biggest fans of running imprints on CPUs in the world. But if you want to run an LLM, you're not going to run it on a CPU. Right. Not well. And, you know, I would like to have results to my queries before I have
Starting point is 00:10:21 grandbabies. Right. Exactly. And that's not going to be an option, even for inference, when you're running just on a CPU. So there are places where we are seeing, obviously, for machine learning models and other use cases for inference, great opportunities for inference. But we are seeing facing accelerators, and it's a market that NVIDIA serves very well,
Starting point is 00:10:47 but competition is good for everyone. And they haven't had enough supply to support everyone. So I think that's really given, you know, the Gaudis and the Cerberuses and the Etched. And I mean, literally, you can throw a stone and hit a new AI hardware startup these days. TinStore, you know, there's so many out there bringing silicon to market. And it's exciting to see. It's exciting to watch. Now, you just mentioned a 25-kilowatt rack.
Starting point is 00:11:16 That's just their node. Yeah, yeah. So a node, excuse me. Andy was talking about 100 to 200-kilowatt racks. Yeah, yeah. And I've been talking to your sustainability initiative team members. This is going in an opposite direction of maybe where the energy efficiency team wants to go. Yeah. So, you know, I understand why liquid cooling, because that definitely helps. But what are we going to do to maintain
Starting point is 00:11:46 that tenant around sustainability and SCP with this AI demand being so strong? Gosh, I wish I had heard what they said. For me personally, I think there's kind of two ways of looking at it. So again, when we talked earlier about AI training, these are the systems that are the 100, 150 kilowatt racks, right? These are the multi-megawatt campuses, brand new builds, just totally different design points, different pushing the terabits on your optics. Like, it's a whole new world. It's a whole new game. But you train a model. I mean, you see GPT-4,
Starting point is 00:12:27 these are six month releases. Right. Six months. Of training. Of consistent training, of megawatts of power. Like it's, this is not everybody's game. This is never gonna be everyone's game. And where I'm most excited personally
Starting point is 00:12:44 is actually in the inference, in the use of these models. Very few people need to train foundational models. That is probably a statement that could be questioned by lots of humans, but it is my strong belief that the vast majority of us can use models that have been trained on data sets that are larger than we'll ever have access to and actually get incredible insights. And when you're thinking about models for threat detection, anomaly detection, speech recognition, image recognition, standard models that are out there, that are available, that are open source for people to use for their own applications are so plentiful. So obviously where Cloudflare has been investing in this domain is on inference. And what's happening in the inference ecosystem is constant innovation for getting the most per watt.
Starting point is 00:13:41 So unlike the training systems, which I would argue are kind of in the hype cycle, they're doing whatever they have to do to get it done, to train these models, to take the crown of time to convergence, time to win, whatever. In the inference space, it's the same thing we've always been playing with compute. How do we get time to first token as quickly, as low latency, and as low power as possible? Because that's something that's requiring us to operate at edges with humans, at endpoints on cell phones. That's where Lama CPP just increased the performance per watt by 10x from where it originated out of Meta and how they've embraced that moving forward.
Starting point is 00:14:23 So from an inference perspective, I see people playing with quantization, looking at model optimization for deployment for whatever accelerator or GPU is available. And I see all the wonderful innovation that's happening with ONNX, with XLA, with interim layers to make sure models can run on different targets as optimally as possible. And that is pretty exciting, personally, because we'll infer from a standard model probably a thousand times more often than we will ever train those models. And so from a sustainability perspective, and those on the podcast can't see me crossing my fingers, but I fundamentally believe our goal has to be to focus on how we can make sure inference, which is happening so much more often, is happening as optimally as possible.
Starting point is 00:15:08 So let's go back to OCP for a second. OCP has been having an incredible impact on trickling down industry standard hardware into broad market. And I think that this has been something that's been a trend of the last few years, that those designs are actually having an impact in telecom, in edge, etc. Do we see a bifurcation of training and inference where training will be models that are only used for those who can afford the GPU and the power bill, and inference will be the playground of that broad proliferation across different vendors and different operators. From an OCP perspective, I mean, obviously I've already put my perspective out there
Starting point is 00:15:55 from a cloud perspective, but let me try and wear my OCP hat correctly. I fundamentally believe that is true. I believe that is the outcome largely because of the data sets required to train foundational models. So we are seeing consortiums come together to develop foundational models for important use cases that are not necessarily commercially viable, but are interesting to the world. Right. But outside of those organizations, who has the data to build these models, to understand and index the world? I mean, these are data sets that have been built over the last 20 years.
Starting point is 00:16:46 Right. Of every Google search, of every social media graph, you know, what has occurred. It's quite a trough of information. And again, that doesn't mean, you know, we just saw the latest purchase by AWS of diagnostic. So there are people who are training foundational models as a core business that are being recognized.
Starting point is 00:17:11 So it's not to say that that's always true, but I wonder if that's the exception versus the rule because where the acquisition happened was to a company with a huge amount of data acquiring talent that had built foundational models. And that was really leapfrogging. So was that acquisition about talent and capabilities or was it about their model being so well-known? I think it's fundamentally wrong.
Starting point is 00:17:42 So I do believe it comes back to the day and there will be places and ways in which. But I'm happy to be wrong. I do also think you will find regulated markets. And right now, the premise for a lot of the regulated markets, HIPAA regulated or banks, is that they can leverage public cloud models. and that's the right choice. The first time there's a massive data breach, we may see a major retrenching of that. Right. That makes sense. But for now, it does seem like they're willing to trust federated models, enterprise licenses,
Starting point is 00:18:19 and that Google and Microsoft and AWS will be responsible with their data. So for now, it seems like that's the trend. So I want to shift gears for a second. Sustainability. I've been talking to the sustainability initiative team members at OCP, and they're doing fascinating work across three vectors. And the first vector is what I want to talk to you about the most,
Starting point is 00:18:41 which is the embedded carbon in silicon and transparency in that embedded carbon. So you and I both have a background working for a silicon supplier. Yeah. How do you think the silicon industry is going to respond to this challenge of actually publishing that? That is a very complex thing to ask for when you consider the complexity of silicon manufacturing and what was the conversation like in OCP when that was
Starting point is 00:19:10 the target? You know I think the conversation within OCP this was led by the market share kind of winners you know the hyperscalers need to understand not just their usage but actually their embodied carbon, right? What is in there? And they can only get that from their suppliers. So even at the keynote today, right, there was a green concrete conversation that Partha spoke about on Google on iNASA's trying to drive green concrete and more sustainable practices in building. There's a ton of work that's been done actually in the construction industry across the world to help people understand more about the supply chain and its embodied carbon so that they
Starting point is 00:19:55 can make better choices in their R&Ps. The same work is not consistently being done yet, I would argue, and that's really what OCP identified last year as a massive requirement. We need a database that is actually telling you this skew of this SSD from this vendor looks like this from an embodied carbon footprint. The company, at the end,
Starting point is 00:20:21 will have to do the full scope one, scope two, scope three emissions to understand their logistics their supply chain challenges and beyond but right now that that body of data doesn't exist and there's high-end consultancies and there are certain semiconductor companies that can sort of put together a model but it's not well vetted. It's not aligned as an industry. It's not actually an apples to apples comparison quite often. I applaud the hyperscalers who have put out data in terms of their transparency and their methodology, but it's not enough for anybody to make a truly informed decision. And I think consumers care that they can, you know, look at their feed,
Starting point is 00:21:09 but also understand that they're not doing it by polluting water in, you know, China, right? All of us have a responsibility as global citizens to ensure that we are delivering. So yeah, we need increased transparency of the common methodology steps that we're using to actually do this reporting so that consumers can make better choices. The other thing that was talked about a lot is circularity. And I love this, especially when you consider open compute standards on hardware. It opens up a secondary market that's super interesting, especially for workloads that may not need the latest and greatest gear. Put your cloud flare hat on for a second.
Starting point is 00:21:50 Do you see this as something that's a viable alternative for infrastructure build out in the coming year, in the coming couple of years? Where do you think this is going? Absolutely. So I've been very public about my commitment to modularity and to circularity. And both are important. So modularity, right, is how do we reduce the embodied carbon in each new server generation? So instead of what we've done for the last 20 plus years in this industry, where with every new server, it's a new rack, it's a new motherboard, it's all new components. How do we identify the subset of the system that is getting more efficient? The CPU, the memory subsystem, possibly the peripherals, certain peripherals, but not every component on the, like, your baseboard management controller
Starting point is 00:22:39 is not getting significantly more efficient per watt. Why do I need a new one? Why can't I reuse all of that portion of the chassis and just swap the compute module with the memory that is actually that much more efficient, giving me 30% of the improvement talent then? So first step is have modular sub-components well-deferring for interoperability and only swap what we need to swap.
Starting point is 00:23:03 That greatly reduces your embodied programming footprint just moving generationally. Five, open system firmware. Things like OpenVMC, things like Redfish. The more that we move towards open standards and manageability, the more we open up the ecosystem for second-line things. And that allows us to really look at markets, like you mentioned, where from an overall power or an overall compute requirement, let's say the entire country has a 10 gig uplink to the Ethernet. I'm not kidding that this is true, that we have places in this world that that is the entire connectivity to the country. They do not need a 64-core latest process generation,
Starting point is 00:23:52 128-core, 192-core, right? They're not pushing the envelope. They don't have the I.O. requirement to push the envelope. They actually need older generation things that are speaking at a 10 gig. Right. need older generation things that are speaking at a 10 gig, you know, right? So everything about not is going to speak to, Hey, we have those servers. They're from 10 years ago. Right.
Starting point is 00:24:14 Let's, let's ship them. Let's give them a second life. But if you're in a closed ecosystem, your firmware, your bios, your security, nobody can maintain that server for those people in those projects. And so all of the embodied carbon in that server goes into some waste receptacle versus having second life. So why we care about open system firmware, why we care about these things
Starting point is 00:24:39 is absolutely so that we can strip away the persistent elements, right? You're not going to probably want to send your SSDs into that second life, although there's some great conversations that are happening this week about how storage, specifically from a persistence perspective, can be second-life and reduce erased. But usually, most of us have a policy where we remove persistent information and second-life the server, if we we can into markets that still want those capabilities that still have
Starting point is 00:25:09 so I think it's a super exciting initiative I mean this is something that you know Ali Fenn and IT Renew really kind of took the torch up upon I want to say six or seven years ago right and the roadmaps are only more aggressive in terms of how much, and I'd love to see the stats. I don't have them in front of me right now, but in terms of how many different countries we've brought online with OCP gear
Starting point is 00:25:35 that has been second purposed in those markets. It's pretty exciting. I love that. Well, Rebecca, it's been a pleasure, as always, to catch up with you. You've given me a lot to think about. And thank you so much. Enjoy your week at OCP Summit.
Starting point is 00:25:50 I can't wait. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by The Tech Arena.
