Big Compute - New Architectures in HPC
Episode Date: February 26, 2019. Host Gabriel Broner interviews Mike Woodacre, HPE Fellow, to discuss the shift from CPUs to an emerging diversity of architectures. Hear about the evolution of CPUs, the advent of GPUs with increasing data parallelism, memory-driven computing, and the potential benefits of a cloud environment with access to multiple architectures.
Transcript
Hello, I am Gabriel Broner, and this is the Big Compute Podcast.
Today's topic is new architectures for HPC on-premise and in the cloud.
For a few decades, CPU speed doubled every 18 months following Moore's law. HPC users could
always count on the fact that a more powerful next-generation CPU would enable them to scale
their applications. More recently, we started hitting the limits of miniaturization and power,
and the exponential compute power increase over time has plateaued.
This has given rise to a variety of architectures, such as different CPUs from Intel, ARM, and AMD, Xeon Phi (KNL), NVIDIA GPUs, FPGAs, and TPUs.
HPC customers now wonder what architecture they should buy next. It also opens the question: if users can leverage different architectures for different applications, are they better served with a fixed on-premise system or with a cloud HPC environment that lets them access different architectures for different applications? To discuss new architectures for HPC, our guest today is Mike Woodacre.
Mike is a fellow at HPE.
Over the years, Mike has been a systems architect and chief engineer at MIPS, SGI, and HPE,
where he set direction for hardware architectures.
Welcome, Mike, to the Big Compute podcast.
Thanks, Gabriel. It's great to be here.
Very good, Mike, happy to have you. Can you tell us a little bit about yourself?
Sure, yeah. I grew up in the UK and always had an interest in technology. Growing up in the 1980s, with the rise of home computers, I started programming on a Sinclair ZX81, got excited about computing, and studied computer systems engineering.
And then my first job in the UK was with Inmos, working on the Transputer microprocessor, which was a really fun place to be, lots of
great innovation work going on.
And then after a few years there, I decided I wanted to broaden my horizons, combine exploring
the wider world with my career.
And so I relocated to Silicon Valley in the US and joined MIPS computer systems, working on the first 64-bit microprocessors.
And then through a series of takeovers and acquisitions, my career has gone through into
working for Silicon Graphics, then working with people from Cray, working with Rackable, and finally ending up at Hewlett Packard Enterprise today.
And I did have a brief move into the startup space with 3Leaf Systems in Silicon Valley as well.
So, Mike, you have fantastic experience,
and you've been with some of the major players in the industry.
It's great to have you here. I'd like to ask you a few questions. First, perhaps you can give us your views on the change that is happening from a mostly CPU world to a multi-architecture world.
So I think it's actually a really exciting time in the industry. You know, when I started out in the 80s, there was a huge range of system architectures.
You had RISC versus CISC, you know, digital signal processors.
And then over the years, that kind of richness slimmed down as advances in IC technology, you know, meant the highest-volume, general-purpose approaches, and in particular x86 CPUs, kind of won out as the way to drive both hardware and software costs down. And the world gravitated to this x86 space, and in the HPC area we sort of gravitated towards clusters using x86, with InfiniBand being the dominant interconnect.
While there were a few proprietary interconnects, generally more focused at the very high end of
HPC, the top of the top 500, the world has started to change again with the physical limits of the IC technology that led to that sort of dominance and consolidation onto x86.
With these changes, so the end of Dennard scaling: the power density, you know, used to stay constant as transistors were shrinking.
That changed.
So power has become this huge issue.
And so even though we could fit more transistors on the chip,
there were real challenges on how you keep feeding those cores,
the memory bandwidth challenges,
and the fact that you can't really power all of that silicon
on the chip all of the time.
So we're now sort of into a heterogeneous era again, where you start to have lots of specialist silicon starting to appear. And it's the combination of, you know,
the sort of exponential growth of data that we have to deal with in many different
areas and the fact that artificial intelligence, machine learning, deep learning has been a
way to start bridging how to analyze all that data.
That's really led to people looking at application-specific
use cases again, where you have sufficient business need to drive sufficient volume to
invest in those technologies to give you that insight from all the data that we're drowning
in.
But it's important also to keep things in perspective.
You've got to pick the right tool for the job
and make sure you have good operational efficiency,
good resource utilization.
So it's no good having this cool technology
if it's underutilized.
So you have to be making sure you understand
what your needs are.
And it's sort of very different, you know,
what some of the large cloud service providers need to solve their problems
from what other people need at the department or corporate level.
Sounds great, Mike.
So you welcome the richness of the new style of architectures that we have today
and the changes that we're facing because people will be able to use the right tool for the job, the way you say it.
Let me go to the Intel side first.
And Intel's CPUs have evolved from Ivy Bridge to Haswell, Broadwell, and Skylake.
Would you like to comment on what have been the changes in the Intel processor line during
these years?
Sure. So, I think in those recent number of years, we've seen changes. I guess probably the biggest
one is the continued growing number of cores. We know that the number of transistors has still been
going up. So, the way to sort of address the power issue is to go multi-core rather than one very
high frequency core. So, you know, the challenge there as you add more cores is how do you feed
them providing sufficient memory bandwidth, you know, with memory technology so that the cores
are not starved. And equally, you've got to provide sufficient interconnect bandwidth to basically allow you to communicate, whether it's to storage or to other nodes for inter-node communication.
And so as well as memory bandwidth being driven, we need to drive that I/O, with PCI Express being the majority of standard I/O channels today on x86.
So we've seen the growth of parallel data units, the SSE and AVX generations, driving the flops per core from 2 to 4 to 8, up to, with Skylake, 32 double-precision (64-bit) floating-point operations per cycle today, which is obviously key for HPC.
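To make those per-cycle figures concrete, here is a minimal sketch of the peak-FLOPS arithmetic. The core count and clock frequency are assumed values for illustration, not figures from the conversation.

```
/* Peak-FLOPS arithmetic for a single socket (illustrative only).
   Assumed figures: 28 cores, 32 double-precision flops per core per cycle,
   and a 2.0 GHz sustained clock. */
#include <stdio.h>

int main(void) {
    const double cores = 28.0;
    const double flops_per_cycle = 32.0;  /* e.g. two 8-wide FMA units: 2 x 8 x 2 */
    const double freq_ghz = 2.0;          /* assumed sustained vector clock */

    printf("Theoretical peak: %.0f GFLOP/s per socket\n",
           cores * flops_per_cycle * freq_ghz);
    return 0;
}
```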
But with this advancement, you know, there's still been an overall increase in the power envelope of the processor sockets, which has pushed up costs, both in the platforms and in operational costs. So, in general, Intel's done a great job of delivering performance at the socket level, but it's a growing challenge.
That sounds good.
So Intel has increased in terms of number of cores, memory bandwidth, and connectivity. We hear a lot about flops; even the Top500 list focuses a lot on flops, in running Linpack.
But give me a sense of memory bandwidth.
How has memory bandwidth changed
and how does memory bandwidth impact HPC applications?
So when you analyze HPC applications,
pretty soon you find that in many cases
the memory bandwidth is a key limiting factor.
The industry benchmark, the STREAM memory bandwidth benchmark, is really good at analyzing the memory performance of processors. And really, that STREAM benchmark can kind of give you a good sense of where you'll top out on quite a few applications. SPECfp rate would be another way of looking at how performance is limited. And again, with SPECfp rate, there are a number of areas where memory bandwidth is really the key limit.
And so, for example, today, if you look at the Intel Skylake processor, you can get SKUs from, I think, four cores, probably all the way up to 28 cores. Well, when you look at memory bandwidth, if you run STREAM, you'll top out at about a dozen cores. You'll saturate the memory bandwidth. So it's a real key decision point in buying a technology to make sure that you've got sufficient bandwidth.
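For readers who want to see what that measurement looks like, below is a minimal, single-threaded sketch of the triad kernel STREAM is built around. It is not the official STREAM benchmark; the array size and timing are simplified for illustration.

```
/* Minimal STREAM-triad-style sketch of measuring memory bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 26)   /* ~64Mi doubles per array, large enough to miss cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];              /* triad: 2 loads + 1 store */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = 3.0 * N * sizeof(double);   /* bytes moved per element */
    printf("Triad bandwidth: %.1f GB/s\n", bytes / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```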
And obviously there are other ways, you know, other than standard DDR memory channels,
you can have things like high bandwidth memory technologies
that we've seen adopted by some of the GPUs,
which can drive even higher bandwidth,
but you sort of have performance capacity trade-offs to make,
and those can also be hard to configure
so they can add costs to the deployment.
So, again, it's really important to look overall
at what applications are key to your use case
and pick the right bandwidth to fit that need.
That sounds great, Mike.
Thanks for explaining memory bandwidth
and how it impacts applications.
You touch on GPUs.
We've been talking mostly about CPUs and Intel.
GPUs seem to achieve success today in some areas,
for example, AI, machine learning, deep learning.
I wonder if you can talk to us about why is that and what are the kinds of applications that today benefit from GPU architectures?
So to me, GPUs in a way are the ultimate data-parallel devices. GPU stands for graphics processing unit, though people who come across them today may think of them as an AI unit.
The graphics was the place they started where you needed to stream data through the chip
in very much a single instruction, multiple data use case. And over time, what we see now is people are trying to extract information
from data using artificial intelligence, machine learning, deep learning.
And at the core of the deep learning algorithms is matrix multiplication.
And so that, again, maps really well to the GPU architecture.
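As a rough sketch of why that mapping works, the kernel below shows the matrix multiply at the heart of deep-learning training in its simplest form; every output element is independent, which is exactly the data parallelism a GPU exploits. The plain C loops are illustrative, not any particular library's implementation.

```
/* Naive matrix multiply: C = A * B, all matrices n x n, row-major. */
#include <stddef.h>

void matmul(const float *A, const float *B, float *C, size_t n) {
    /* On a GPU, each (i, j) output element would typically be computed by
       its own thread; here the loops simply make that independence explicit. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```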
So you can use that parallel performance of many multiplication units to perform training for these AI applications in reasonable timeframes, because you're able to process that training in hours and not weeks. So really, that data parallelism has let people start to make progress in algorithmically exploring large data sets.
We're trying to gain insights from things like image analysis.
Medical applications of image analysis are tremendous. And so the opportunity here for
the GPU and deep learning is really great.
Yeah, it sounds great. So basically, you could do AI deep learning with CPUs,
but it'll take much, much longer.
And the advantage of GPUs, as you were mentioning,
could be hours instead of weeks to do that.
That's great to hear.
Actually, one other comment on that.
I think the other key thing that NVIDIA did, first in HPC and now actually more in the AI space, is also help the tool chain. So in HPC with their CUDA work, that really made it easier for programmers and software to be developed. And then the frameworks that have been developed in the industry, you know, Caffe, TensorFlow, again, that's really enabled the power of the GPU to be put in the hands of the many. And that's really a key to drive technology adoption.
That's great. Mike, I wanted to ask you about
something that has been close to your life: you've been involved with large memory systems at SGI and now memory-driven
computing at HPE. What can you tell us about those architectures and their benefits?
So again, one of the key issues we face today is the amount of data we have, which continues
to grow exponentially. As we talked about earlier, the challenge with traditional CPUs is that they're just not able to keep up, in processing performance, at the rate the data grows. And people also want to get real-time insights from data. So by getting data into memory, you can actually process
at the speed of memory.
You're basically getting IO storage bottlenecks out of the way,
and you can process data orders of magnitude faster
than if you're having to access that data through a storage technology,
even something like today's fastest storage, flash or even phase-change-memory based; you're still faster with DRAM.
And so getting that data set into memory
lets you speed up the analysis work.
And you can do all sorts of things. You can just take your current pipeline of a typical workflow, say in HPC: preprocessing, simulation, analysis, visualization, do it again. You can basically put all of those applications on a single system and keep the data in memory, just through an in-memory file system, so you don't even need to change the applications.
You can get tremendous speed-ups.
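A minimal sketch of that idea, assuming a Linux-style tmpfs mount at /dev/shm (the path and data here are illustrative only): two stages of a workflow exchange their data through a RAM-backed file using ordinary file I/O, so neither application has to change.

```
/* Passing pipeline data through an in-memory file system instead of disk. */
#include <stdio.h>

int main(void) {
    const char *path = "/dev/shm/sim_output.bin";   /* tmpfs-backed, lives in RAM */

    /* "Simulation" stage: write results at memory speed. */
    FILE *out = fopen(path, "wb");
    if (!out) { perror("fopen"); return 1; }
    double sample[4] = {1.0, 2.0, 3.0, 4.0};
    fwrite(sample, sizeof(double), 4, out);
    fclose(out);

    /* "Analysis" stage: read them back without touching disk. */
    FILE *in = fopen(path, "rb");
    if (!in) { perror("fopen"); return 1; }
    double loaded[4];
    if (fread(loaded, sizeof(double), 4, in) != 4) { fclose(in); return 1; }
    fclose(in);

    printf("First value read from in-memory file: %f\n", loaded[0]);
    return 0;
}
```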
And then, beyond that simple way of using in-memory data, you can actually look at refactoring your algorithms, knowing that you can process data in memory and that you can have larger pools of memory. And here I'm talking about tens of terabytes and growing amounts of data.
So, for example, we've done some work around Monte Carlo simulation, where actually a lot of banks use large arrays of GPUs to do risk analysis. The team in our labs worked to refactor Monte Carlo so that you would do preprocessing to basically create large lookup tables. So when you're actually doing the real-time risk analysis, you could transform a calculation: instead of doing the base calculation from scratch every time, you could do a simple lookup and then a small offset calculation. And we've seen speedups on the order of 1,000x in that use case.
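Below is a hypothetical sketch of that precompute-then-lookup refactoring; the function names and the toy calculation are illustrative, not HPE's actual code. An expensive evaluation is tabulated once up front, and the real-time path replaces it with a table lookup plus a small interpolation offset.

```
/* Precompute a lookup table, then answer queries with lookup + small offset. */
#include <math.h>
#include <stdio.h>

#define TABLE_SIZE 1024
#define X_MAX 10.0

static double table[TABLE_SIZE + 1];

/* Stand-in for an expensive Monte Carlo style evaluation (illustrative). */
static double expensive_eval(double x) { return exp(-x) * sin(3.0 * x); }

/* Preprocessing: build the lookup table once. */
static void build_table(void) {
    for (int i = 0; i <= TABLE_SIZE; i++)
        table[i] = expensive_eval(i * X_MAX / TABLE_SIZE);
}

/* Real-time path: table lookup plus a small linear-interpolation offset. */
static double fast_eval(double x) {
    double pos = x / X_MAX * TABLE_SIZE;
    int i = (int)pos;
    if (i < 0) i = 0;
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;
    double frac = pos - i;
    return table[i] + frac * (table[i + 1] - table[i]);
}

int main(void) {
    build_table();
    printf("exact=%f approx=%f\n", expensive_eval(2.5), fast_eval(2.5));
    return 0;
}
```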
So memory-driven computing is really an enabler; it's another way you can address the challenges we have with large data.
And then we're doing that today on proprietary memory-semantic fabrics. But in the industry, the Gen-Z Consortium has been formed to provide an open industry memory-semantic fabric to drive memory-driven computing technologies and make it something that basically you can plug heterogeneous processing elements into.
You can plug all sorts of emerging memory technologies into.
But by having that single address space,
you can basically accelerate the ability of processing elements
to get to these large data sets.
So, yeah, it's an exciting space.
So if I summarize it: if everything is in memory, all my processing can happen directly without the bottlenecks of I/O, and I can even rethink my workflows to take advantage of everything being in memory. Is that fair?
Exactly, yeah.
That sounds very good. So let's go to the next phase. We've described a few architectures here: CPUs from Intel, NVIDIA GPUs, and now we're talking about memory-driven computing, right?
So I'm a user, and I have to decide what machine to buy next.
And what I always wonder is, will users benefit from buying a system and choosing an architecture, or will they benefit from using different architectures in the cloud, assuming all these become available in the cloud and they have instant access and can use them when they need them?
I see a possibility there. So I'd like to ask you, what do you think about the potential benefits
the user will have in terms of having all these architectures
available to them in the cloud?
So I think cloud is a really interesting space, and it's important to note that cloud is a very broad term. You've got public clouds, you've got private clouds, and the cloud model is something that's attracting a lot of interest from CIOs or management, you know, for business reasons. I'll come on to some of the technology things, but first, a key business reason is you can kind of change from a capital expenditure model to an operational expenditure model.
And there are ways that that can be done too through, you know, an on-premise deployment
as well as in the cloud.
But, you know, I think the business model is actually something people need to think about
when they consider cloud technology.
And again, operationally, security used to be a concern people would raise; I think with the cloud, that's pretty much in the background these days.
But operationally, CIOs do worry about security of systems,
so it's pretty nice for them to be able to go
and basically put that burden on someone else
with a cloud solution.
So I think the public cloud does provide really interesting
entry points to explore options as we've been discussing all of these new
heterogeneous processing technologies coming about. It's very
expensive if you want to explore them personally on-site yourself.
So being able to access them in the cloud is really interesting.
And the cloud is also great when you have burst capacity needs.
And again, the other issue in the HPC world for people coming into it is getting access to software.
So if you can get access to an HPC cloud environment
where your software applications are provided for you,
there's an easy way to get access to licensing for software.
That can really help people with the on-ramp to HPC.
So there's lots of benefits.
But the other thing I would say is, again,
you have to step back and decide what's your overall workflow.
It's great for exploring, but then you need to decide
what's going to make economic and productivity sense for you.
Because I think one of the other issues is the amounts of data
that people are processing or dealing with.
That can have a big impact on how you might think of using the cloud.
And so as an example, I was recently working with a startup
that was doing molecular dynamics simulations
using GPU nodes in the cloud.
They've been doing some great work, but they generated 40 terabytes of data and needed
it.
The key next phase was analyzing that data, and they just couldn't process that in a reasonable
time with the infrastructure that they had access to. And so then it becomes a bit of a painful process
to extract 40 terabytes of data, both in time and in money.
So I think it's important for people... it's great to go explore technologies, and cloud can be very useful if you have time-dependent needs for computing resources.
But if you've got a sustained need for doing whatever type of computing it is,
then there's a lot of end-to-end workflow and cost calculations
to take into account to decide what's going to work best
for your needs.
Your summary sounds like: just as there are different architectures that you could take advantage of, like GPUs, CPUs, etc., there may be different workflows that will benefit from running in the cloud and different workflows that will benefit from running on premise. And you can always use the cloud
to access things you don't have at home.
Maybe that's... I'm oversimplifying, but that's...
Yeah, I think it's easy to get excited
around specific technologies,
and there's some great things going on.
But you also have to step back and decide
when you want to go into production with workflows or use cases, you know, what's going to be the best
overall solution. So again, it's kind of the right tool for the job at hand and, you know, both
the cost of initial entry point and then ongoing costs of that solution.
Yeah, great conversation.
Mike, I really appreciate your answering the questions for us.
Anything you'd like to add before we close?
No, it's a great pleasure to chat with you, Gabriel.
Thank you.
Very good.
So I'd like to thank our guest,
Mike Woodacre, HPE Fellow,
for sharing his experience and wisdom
to help us understand this multi-architecture world
that is evolving in front of us.
Till next time, I am Gabriel Broner,
and this was the Big Compute Podcast.