Big Compute - New Architectures in HPC
Episode Date: February 26, 2019. Host Gabriel Broner interviews Mike Woodacre, HPE Fellow, to discuss the shift from CPUs to an emerging diversity of architectures. Hear about the evolution of CPUs, the advent of GPUs with increasing data parallelism, memory-driven computing, and the potential benefits of a cloud environment with access to multiple architectures.
Transcript
Hello, I am Gabriel Broner, and this is the Big Compute Podcast.
Today's topic is new architectures for HPC on-premise and in the cloud.
For a few decades, CPU speed doubled every 18 months following Moore's law. HPC users could
always count on the fact that a more powerful next-generation CPU would enable them to scale
their applications. More recently, we started hitting the limits of miniaturization and power,
and the exponential compute power increase over time has plateaued.
This has given rise to a variety of architectures, such as different CPUs from Intel, ARM, and AMD, Xeon Phi (KNL), NVIDIA GPUs, FPGAs, and TPUs.
HPC customers now wonder what architecture they should buy next. It also opens the question: if users can leverage different architectures for different applications, are they better served with a fixed on-premise system or with a cloud HPC environment that lets them access different architectures for different applications? To discuss new architectures for HPC, our guest today is Mike Woodacre.
Mike is a fellow at HPE.
Over the years, Mike has been a systems architect and chief engineer at MIPS, SGI, and HPE,
where he set direction for hardware architectures.
Welcome, Mike, to the Big Compute podcast.
Thanks, Gabriel. It's great to be here.
Very good, Mike, happy to have you. Can you tell us a little bit about yourself?
Sure, yeah. I grew up in the UK and always had an interest in technology. Growing up in the 1980s, with the rise of home computers, I started programming on a Sinclair ZX81, got excited about computing, and studied computer systems engineering.
And then my first job in the UK was with Inmos, working on the Transputer microprocessor, which was a really fun place to be, lots of
great innovation work going on.
And then after a few years there, I decided I wanted to broaden my horizons, combine exploring
the wider world with my career.
And so I relocated to Silicon Valley in the US and joined MIPS computer systems, working on the first 64-bit microprocessors.
And then through a series of takeovers and acquisitions, my career has gone through into
working for Silicon Graphics, then working with people from Cray, working with Rackable, and finally ending up at Hewlett Packard Enterprise today.
And I did have a brief move into the startup space with 3Leaf Systems in Silicon Valley as well.
So, Mike, you have fantastic experience,
and you've been with some of the major players in the industry.
It's great to have you here. I'd like to ask you a few questions. First, perhaps you can give us your views on the change that is happening from a mostly CPU world to a multi-architecture world.
So I think it's actually a really exciting time in the industry. You know, when I started out in the 80s, there was a huge range of system architectures.
You had RISC versus CISC, you know, digital signal processors.
And then over the years, that kind of richness slimmed down as advances in IC technology, you know, meant the highest-volume, general-purpose approaches, and in particular x86 CPUs, kind of won out as the way to drive both hardware and software costs down. And the world gravitated to this x86 space, and in the HPC area we sort of gravitated towards clusters using x86, with InfiniBand being the dominant interconnect.
While there were a few proprietary interconnects, generally more focused at the very high end of
HPC, the top of the top 500, the world has started to change again with the physical limits of the IC technology that led to that sort of dominance and consolidation onto x86.
With these changes, so the end of Dennard scaling: the power density, you know, used to stay constant as transistors were shrinking.
That changed.
So power has become this huge issue.
And so even though we could fit more transistors on the chip,
there were real challenges on how you keep feeding those cores,
the memory bandwidth challenges,
and the fact that you can't really power all of that silicon
on the chip all of the time.
So we're now sort of into a heterogeneous era again, where you start to have lots of specialist silicon starting to appear. And it's the combination of, you know,
the sort of exponential growth of data that we have to deal with in many different
areas and the fact that artificial intelligence, machine learning, deep learning has been a
way to start bridging how to analyze all that data.
That's really led to people looking at application-specific
use cases again, where you have sufficient business need to drive sufficient volume to
invest in those technologies to give you that insight from all the data that we're drowning
in.
But it's important also to keep things in perspective.
You've got to pick the right tool for the job
and make sure you have good operational efficiency,
good resource utilization.
So it's no good having this cool technology
if it's underutilized.
So you have to be making sure you understand
what your needs are.
And it's sort of very different, you know,
what some of the large cloud service providers need to solve their problems
from what other people need at the department or corporate level.
Sounds great, Mike.
So you welcome the richness of the new style of architectures that we have today
and the changes that we're facing because people will be able to use the right tool for the job, the way you say it.
Let me go to the Intel side first.
And Intel's CPUs have evolved from Ivy Bridge to Haswell, Broadwell, and Skylake.
Would you like to comment on what have been the changes in the Intel processor line during
these years?
Sure. So, I think in those recent number of years, we've seen changes. I guess probably the biggest
one is the continued growing number of cores. We know that the number of transistors has still been
going up. So, the way to sort of address the power issue is to go multi-core rather than one very
high frequency core. So, you know, the challenge there as you add more cores is how do you feed
them providing sufficient memory bandwidth, you know, with memory technology so that the cores
are not starved. And equally, you've got to provide sufficient interconnect bandwidth to basically allow you to communicate, whether it's to storage or to other nodes for inter-node communication.
And so as well as memory bandwidth being driven, we need to drive that I/O, with PCI Express being the majority of standard I/O channels today on x86.
So we've seen the growth of parallel data units, the SSE and AVX generations, driving the flops per core from 2 to 4 to 8, up to, with Skylake, 32 double-precision (64-bit) floating-point operations per cycle today, which is obviously key for HPC.
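To make those per-cycle figures concrete, here is a minimal sketch of the peak-FLOPS arithmetic. The core count and clock frequency are assumed values for illustration, not figures from the conversation.

```
/* Peak-FLOPS arithmetic for a single socket (illustrative only).
   Assumed figures: 28 cores, 32 double-precision flops per core per cycle,
   and a 2.0 GHz sustained clock. */
#include <stdio.h>

int main(void) {
    const double cores = 28.0;
    const double flops_per_cycle = 32.0;  /* e.g. two 8-wide FMA units: 2 x 8 x 2 */
    const double freq_ghz = 2.0;          /* assumed sustained vector clock */

    printf("Theoretical peak: %.0f GFLOP/s per socket\n",
           cores * flops_per_cycle * freq_ghz);
    return 0;
}
```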
But with this advancement, you know, there's still been an overall increase in the power envelope of the processor sockets, which has pushed up costs, both in the platforms and in operational costs. So, in general, Intel's done a great job of delivering performance at the socket level, but it's a growing challenge.
That sounds good.
So Intel has increased in terms of number of cores, memory bandwidth, and connectivity. We hear a lot about flops; even the Top500 list focuses a lot on flops, in running Linpack.
But give me a sense of memory bandwidth.
How has memory bandwidth changed
and how does memory bandwidth impact HPC applications?
So when you analyze HPC applications,
pretty soon you find that in many cases
the memory bandwidth is a key limiting factor.
The industry benchmark, the STREAM memory bandwidth benchmark, is really good at analyzing the memory performance of processors. And really, that STREAM benchmark can kind of give you a good sense of where you'll top out on quite a few applications. SPECfp rate would be another way of looking at how performance is limited. And again, with SPECfp rate, there are a number of areas where memory bandwidth is really the key limit.
And so, for example, today, if you look at the Intel Skylake processor, you can get SKUs from, I think, four cores, probably all the way up to 28 cores. Well, when you look at memory bandwidth, if you run STREAM, you'll top out at about a dozen cores. You'll saturate the memory bandwidth. So it's a real key decision point in buying a technology to make sure that you've got sufficient bandwidth.
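For readers who want to see what that measurement looks like, below is a minimal, single-threaded sketch of the triad kernel STREAM is built around. It is not the official STREAM benchmark; the array size and timing are simplified for illustration.

```
/* Minimal STREAM-triad-style sketch of measuring memory bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 26)   /* ~64Mi doubles per array, large enough to miss cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];              /* triad: 2 loads + 1 store */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = 3.0 * N * sizeof(double);   /* bytes moved per element */
    printf("Triad bandwidth: %.1f GB/s\n", bytes / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```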
And obviously there are other ways, you know, other than standard DDR memory channels,
you can have things like high bandwidth memory technologies
that we've seen adopted by some of the GPUs,
which can drive even higher bandwidth,
but you sort of have performance capacity trade-offs to make,
and those can also be hard to configure
so they can add costs to the deployment.
So, again, it's really important to look overall
at what applications are key to your use case
and pick the right bandwidth to fit that need.
That sounds great, Mike.
Thanks for explaining memory bandwidth
and how it impacts applications.
You touch on GPUs.
We've been talking mostly about CPUs and Intel.
GPUs seem to achieve success today in some areas,
for example, AI, machine learning, deep learning.
I wonder if you can talk to us about why is that and what are the kinds of applications that today benefit from GPU architectures?
So to me, GPUs in a way are the ultimate data-parallel devices. GPU stands for graphics processing unit, though people who come across them today may think of them as an AI unit.
The graphics was the place they started where you needed to stream data through the chip
in very much a single instruction, multiple data use case. And over time, what we see now is people are trying to extract information
from data using artificial intelligence, machine learning, deep learning.
And at the core of the deep learning algorithms is matrix multiplication.
And so that, again, maps really well to the GPU architecture.
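As a rough sketch of why that mapping works, the kernel below shows the matrix multiply at the heart of deep-learning training in its simplest form; every output element is independent, which is exactly the data parallelism a GPU exploits. The plain C loops are illustrative, not any particular library's implementation.

```
/* Naive matrix multiply: C = A * B, all matrices n x n, row-major. */
#include <stddef.h>

void matmul(const float *A, const float *B, float *C, size_t n) {
    /* On a GPU, each (i, j) output element would typically be computed by
       its own thread; here the loops simply make that independence explicit. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```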
So you can use that parallel performance of many multiplication units to perform training for these AI applications in reasonable timeframes, because you're able to process that training in hours and not weeks. So really, that data parallelism has let people start to make progress in algorithmically exploring large data sets.
We're trying to gain insights from things like image analysis.
Medical applications of image analysis are tremendous. And so the opportunity here for
the GPU and deep learning is really great.
Yeah, it sounds great. So basically, you could do AI deep learning with CPUs,
but it'll take much, much longer.
And the advantage of GPUs, as you were mentioning,
could be hours instead of weeks to do that.
That's great to hear.
Actually, one other comment on that.
I think the other key thing that NVIDIA did, first in HPC and now actually more in the AI space, is also help the tool chain. So in HPC with their CUDA work, that really made it easier for programmers and software to be developed. And then the frameworks that have been developed in the industry, you know, Caffe, TensorFlow, again, that's really enabled the power of the GPU to be put in the hands of the many. And that's really a key to drive technology adoption.
That's great. Mike, I wanted to ask you about
something that has been close to your life: you've been involved with large memory systems at SGI and now memory-driven
computing at HPE. What can you tell us about those architectures and their benefits?
So again, one of the key issues we face today is the amount of data we have, which continues
to grow exponentially. As we talked about earlier, the challenge with traditional CPUs is that they're just not able to keep up, in processing performance, at the rate the data grows. And people also want to get real-time insights from data. So by getting data into memory, you can actually process
at the speed of memory.
You're basically getting IO storage bottlenecks out of the way,
and you can process data orders of magnitude faster
than if you're having to access that data through a storage technology,
even something like today's fastest storage, flash or even phase-change-memory based; you're still faster with DRAM.
And so getting that data set into memory
lets you speed up the analysis work.
And you can do all sorts of things. You can just take your current pipeline of a typical workflow, say in HPC: preprocessing, simulation, analysis, visualization, do it again. You can basically put all of those applications on a single system and keep the data in memory, just through an in-memory file system, so you don't even need to change the applications.
You can get tremendous speed-ups.
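A minimal sketch of that idea, assuming a Linux-style tmpfs mount at /dev/shm (the path and data here are illustrative only): two stages of a workflow exchange their data through a RAM-backed file using ordinary file I/O, so neither application has to change.

```
/* Passing pipeline data through an in-memory file system instead of disk. */
#include <stdio.h>

int main(void) {
    const char *path = "/dev/shm/sim_output.bin";   /* tmpfs-backed, lives in RAM */

    /* "Simulation" stage: write results at memory speed. */
    FILE *out = fopen(path, "wb");
    if (!out) { perror("fopen"); return 1; }
    double sample[4] = {1.0, 2.0, 3.0, 4.0};
    fwrite(sample, sizeof(double), 4, out);
    fclose(out);

    /* "Analysis" stage: read them back without touching disk. */
    FILE *in = fopen(path, "rb");
    if (!in) { perror("fopen"); return 1; }
    double loaded[4];
    if (fread(loaded, sizeof(double), 4, in) != 4) { fclose(in); return 1; }
    fclose(in);

    printf("First value read from in-memory file: %f\n", loaded[0]);
    return 0;
}
```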
And then, beyond that simple way of using in-memory data, you can actually look at refactoring your algorithms, knowing that you can process data in memory and that you can have larger pools of memory. And here I'm talking about tens of terabytes and growing amounts of data.
So, for example, we've done some work around Monte Carlo simulation, where actually a lot of banks use large arrays of GPUs to do risk analysis. The team in our labs worked to refactor Monte Carlo so that you would do preprocessing to basically create large lookup tables. So when you're actually doing the real-time risk analysis, you could transform a calculation: instead of doing the base calculation from scratch every time, you could do a simple lookup and then a small offset calculation. And we've seen speedups on the order of 1,000x in that use case.
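Below is a hypothetical sketch of that precompute-then-lookup refactoring; the function names and the toy calculation are illustrative, not HPE's actual code. An expensive evaluation is tabulated once up front, and the real-time path replaces it with a table lookup plus a small interpolation offset.

```
/* Precompute a lookup table, then answer queries with lookup + small offset. */
#include <math.h>
#include <stdio.h>

#define TABLE_SIZE 1024
#define X_MAX 10.0

static double table[TABLE_SIZE + 1];

/* Stand-in for an expensive Monte Carlo style evaluation (illustrative). */
static double expensive_eval(double x) { return exp(-x) * sin(3.0 * x); }

/* Preprocessing: build the lookup table once. */
static void build_table(void) {
    for (int i = 0; i <= TABLE_SIZE; i++)
        table[i] = expensive_eval(i * X_MAX / TABLE_SIZE);
}

/* Real-time path: table lookup plus a small linear-interpolation offset. */
static double fast_eval(double x) {
    double pos = x / X_MAX * TABLE_SIZE;
    int i = (int)pos;
    if (i < 0) i = 0;
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;
    double frac = pos - i;
    return table[i] + frac * (table[i + 1] - table[i]);
}

int main(void) {
    build_table();
    printf("exact=%f approx=%f\n", expensive_eval(2.5), fast_eval(2.5));
    return 0;
}
```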
So memory-driven computing is really an enabler; it's another way you can address the challenges we have with large data.
And then we're doing that today on proprietary memory-semantic fabrics. But in the industry, the Gen-Z Consortium has been formed to provide an open industry memory-semantic fabric to drive memory-driven computing technologies and make it something that basically you can plug heterogeneous processing elements into.
You can plug all sorts of emerging memory technologies into.
But by having that single address space,
you can basically accelerate the ability of processing elements
to get to these large data sets.
So, yeah, it's an exciting space.
So if I summarize it: if everything is in memory, all my processing can happen directly without the bottlenecks of I/O, and I can even rethink my workflows to take advantage of everything being in memory. Is that fair?
Exactly, yeah.
That sounds very good. So let's go to the next phase. We've described a few architectures here: CPUs from Intel, NVIDIA GPUs, and now we're talking about memory-driven computing, right?
So I'm a user, and I have to decide what machine to buy next.
And what I always wonder is, will users benefit from buying a system and choosing an architecture, or will they benefit from using different architectures in the cloud, assuming all these become available in the cloud and they have instant access and can use them when they need them?
I see a possibility there. So I'd like to ask you, what do you think about the potential benefits
the user will have in terms of having all these architectures
available to them in the cloud?
So I think cloud is a really interesting space, and it's important to note that cloud is a very broad term. You've got public clouds, you've got private clouds, and the cloud model is something that's attracting a lot of interest from CIOs or management, you know, for business reasons. I'll come on to some of the technology things, but first, a key business reason is you can kind of change from a capital expenditure model to an operational expenditure model.
And there are ways that that can be done too through, you know, an on-premise deployment
as well as in the cloud.
But, you know, I think the business model is actually something people need to think about
when they consider cloud technology.
And again, operationally, security used to be a concern people would raise; I think with the cloud, that's pretty much in the background these days.
But operationally, CIOs do worry about security of systems,
so it's pretty nice for them to be able to go
and basically put that burden on someone else
with a cloud solution.
So I think the public cloud does provide really interesting
entry points to explore options as we've been discussing all of these new
heterogeneous processing technologies coming about. It's very
expensive if you want to explore them personally on-site yourself.
So being able to access them in the cloud is really interesting.
And the cloud is also great when you have burst capacity needs.
And again, the other issue in the HPC world for people coming into it is getting access to software.
So if you can get access to an HPC cloud environment
where your software applications are provided for you,
there's an easy way to get access to licensing for software.
That can really help people with the on-ramp to HPC.
So there's lots of benefits.
But the other thing I would say is, again,
you have to step back and decide what's your overall workflow.
It's great for exploring, but then you need to decide
what's going to make economic and productivity sense for you.
Because I think one of the other issues is the amounts of data
that people are processing or dealing with.
That can have a big impact on how you might think of using the cloud.
And so as an example, I was recently working with a startup
that was doing molecular dynamics simulations
using GPU nodes in the cloud.
They've been doing some great work, but they generated 40 terabytes of data and needed
it.
The key next phase was analyzing that data, and they just couldn't process that in a reasonable
time with the infrastructure that they had access to. And so then it becomes a bit of a painful process
to extract 40 terabytes of data, both in time and in money.
So I think it's important for people... it's great to go explore technologies, and cloud can be very useful if you have time-dependent needs for computing resources.
But if you've got a sustained need for doing whatever type of computing it is,
then there's a lot of end-to-end workflow and cost calculations
to take into account to decide what's going to work best
for your needs.
Your summary sounds like: just as there are different architectures that you could take advantage of, like GPUs, CPUs, etc., there may be different workflows that will benefit from running in the cloud and different workflows that will benefit from running on premise. And you can always use the cloud
to access things you don't have at home.
Maybe that's... I'm oversimplifying, but that's...
Yeah, I think it's easy to get excited
around specific technologies,
and there's some great things going on.
But you also have to step back and decide
when you want to go into production with workflows or use cases, you know, what's going to be the best
overall solution. So again, it's kind of the right tool for the job at hand and, you know, both
the cost of initial entry point and then ongoing costs of that solution.
Yeah, great conversation.
Mike, I really appreciate your answering the questions for us.
Anything you'd like to add before we close?
No, it's a great pleasure to chat with you, Gabriel.
Thank you.
Very good.
So I'd like to thank our guest,
Mike Woodacre, HPE Fellow,
for sharing his experience and wisdom
to help us understand this multi-architecture world
that is evolving in front of us.
Till next time, I am Gabriel Broner,
and this was the Big Compute Podcast.