Big Compute - Rethinking HPC in Academia
Episode Date: March 4, 2019
Gabriel Broner hosts Marek Michalewicz, Director of ICM, the HPC center at the University of Warsaw, to discuss Rethinking HPC in Academia. With the advent of HPC cloud platforms, ...we may give every user access to systems on-premise, across multiple centers and in the cloud, to enable new research and accelerate time to research.
Transcript
Hello, I am Gabriel Broner, and this is the Big Compute podcast.
Today's topic is rethinking HPC in academia.
Traditionally, HPC in academia has been on-premise.
A system acquired by the institution is kept for five years.
Utilization is high, so user jobs wait in queues, and relative performance declines
over time. With the advent of HPC cloud platforms, we wonder if it's time to
rethink HPC for academia. Instead of one institution, one system, we enable access
to systems on-premise, across multiple centers, and in the cloud.
At the same level of spending, we may be able to accelerate time to science and enable new areas of research
by having access to multiple architectures, to the latest technologies, and by reducing time waiting in queues.
To discuss HPC in academia, our guest today is Marek Michalewicz.
Marek is the director of ICM, the High Performance Computing Center at the University of Warsaw.
With many years in the industry, Marek has headed academic and research HPC centers,
including the A*STAR Computational Resource Centre
in Singapore.
Welcome, Marek, to the Big Compute Podcast.
Good morning, Gabriel.
It's morning in Warsaw.
It's nighttime at your place.
Great to talk to you.
I'm very happy to answer your questions and to share some of my thinking on HPC in
cloud or the progression of academic HPC computing.
Marek, it's fantastic to have you here with your experience so look forward to having this
conversation with you. Maybe we can start from the beginning. What are your views
on HPC in academia today?
And what are the challenges we face? And maybe since you're in Warsaw in the cold morning,
we can start with the University of Warsaw and how you see things from there.
Well, I think that there are numerous challenges in the typical academic environment, in Poland or in other countries.
And the two biggest challenges are these: on one hand, there's an insatiable appetite
and requirements for computing and for storage, sometimes storage separate from computing.
And on the other hand, the funding cycle is not predictable.
So it's very difficult to do long-term plans for expansion,
for maintaining the quality of service,
when very often the sources of funding are ill-defined
and varied.
Sometimes they come from within the university,
sometimes from various ministries and bodies of the,
sometimes from external grants,
in our case, European Union grants.
But, and they are substantial of course, and they help us meet those needs that we
are charged to satisfy, but the planning is a very big thing.
Of course, in our industry, it's very difficult to predict expansion. So, for example, around me I see an incredible explosion
of interest in quantum computing.
Very new thing.
It's fueled by curiosity and excitement among young people.
And, of course, there is no readily available hardware or resources to
let young people explore and experiment with this. And it would be absolutely
fantastic if a great variety of modalities, different possible computing
platforms, were available for young people, for researchers, academics.
This is great to hear. So, on the one hand, there's challenges like funding. When are you
going to get money? On the other hand, there's the possibilities of interesting quantum computing,
and how are you going to access it? So, challenges and opportunities at the same time. Can you tell me a bit more about
the funding situation? Is it typical for universities to get funding cycles, or is it
grants that different research projects are going to receive? How are you going to
receive money, or is it very unpredictable for you?
It's not entirely unpredictable, because in the case of a center like
mine, ICM, we are funded. There are five high performance computing
academic centers in Poland. They are all in an equal league. Each one has got
really brand new data centers and a computing engine of the order of a petaflop, each one of them.
So the funding is provided through three-year cycle funding from the Ministry of Science and Higher Education.
So it goes directly from the ministry.
However, over the last three years this funding was not sufficient.
We experienced, we actually enjoyed a very fast development cycle. About four or five years ago
Poland was extremely fortunate to receive larger funding from the European Union that allowed us to expand in an unprecedented way.
But with the very fast growth comes the period
of sort of unmet ongoing needs.
So operationally, all five centers suffered.
And they, of course, deal with that problem in different ways. One center is extremely renowned and active in networking,
so of course they can manage somehow with a different source of income,
but others were not so fortunate.
So it was a difficult period of three years.
Right now we are on the verge of a new three-year funding period.
We'll see how it will go, but I believe that this is typical,
not only of Poland, but other countries too.
Of course, there are interesting mechanisms of funding
at the European Union level, and right now we are on the verge of embarking on the EuroHPC program.
But of course, EuroHPC, I believe, will be more appropriate
for extremely seasoned users.
And I always care about this group
of sort of newcomers.
I think we have to really think always
in an academic environment
of that specific group of people who have
not tried HPC yet.
And for that group, there are special needs.
They don't necessarily need to have a scale, but definitely they have to have a feel of
how it is to use resources that can expand sort of practically without limit.
That's great to hear.
Yeah, I think when you talk about the newcomers, it's fantastic that you're thinking about
them, many people coming into HPC, but without the 20 plus years of experience.
And I assume those are the people you want to be bringing along, growing the
community, educating. Are you seeing growth in the people who have not used HPC before coming to
HPC today?
Look, I see it because I want to see it, and actually I'm also actively looking
for people who have not experienced HPC.
And that's one of the reasons we have started training students for the Student Cluster Competition,
which is a fantastic kind of event run at SC, at ISC in Germany, and also in China by the ASC organization.
I started the student team in Singapore.
They are extremely successful to the extent that after a few years,
they have won the competition in America at SC last year.
That was two years ago.
It was absolutely brilliant.
And now when I moved to Warsaw, I started the Warsaw team.
And this year, they will go to their sixth final of the competition.
So now I have two teams that I started at various competitions,
competing against each other.
Great to see.
And those people come to HPC not knowing anything.
Of course, they are brilliant.
They're very, very talented young people.
And Gabriel, I tell you, it's great fun to see people who come.
They are curious.
They have bright minds.
And suddenly they get excited about HPC.
But that's just, you know, that's very select.
You know, they are the sort of almost, you know,
they become almost professional after a few years of training.
But what I'm thinking is that right now, we have of the order of a thousand registered users
on our HPC systems and some sort of semi-grid systems.
And within the university, we have 50,000 students,
of the order of 50,000.
It's the largest university in Poland.
And I don't see any reason why almost anybody at the university should
not have access to expandable resources. When I say expandable, it's resources that can
go from one core to thousands of cores, or tens of thousands of
cores or more, depending on the needs. And almost everybody,
nowadays everybody, has some computing needs. And I think
the university could provide that, not necessarily through its own resources, as we
know, of course, that lots of academics and students already use
commercial clouds.
Yeah, so you'd like to see not a thousand users but 50,000
users using high performance computing.
This is a fantastic goal, fantastic vision.
Gabriel, I would like to completely, completely obliterate, remove the obstacles to access to computing resources.
And when I say computers of any scale, of
course the requirement has to be justified, but the entry should not be
difficult.
Yeah, now this idea of democratizing access to everyone is
great to hear. I like your vision. I'm rediscovering the world by listening
to you and your students, and congratulations on the progress they're making. I look forward to
getting to the 50,000 students, and let's make it happen. Let me ask you, you said a bit earlier, there's
five centers in Poland. Can you tell us a bit more? We're not always in Poland and we don't know. Are these five centers different in terms of specialization? Are they similar?
Does each of them focus on a different area, or how does it work?
They are different. First of all, a slight sort of correction.
There are five centers that are directly funded through the Ministry of Science and Higher Education. There is a sixth center which is slightly off-site but of course belongs to the same category.
That's the center of the National Nuclear Research Institute.
And of course that center is mostly focused on nuclear research. They run a nuclear reactor, they are very closely connected
with very large scale European experiments, so that is a very focused facility. Of course
there is very good equipment there, the facility is great, it's not very far from Warsaw,
and we collaborate with them. There are five centers and
they more or less began their operation at about the same time. About 25 years
ago it all started. My center is in Warsaw, it's part of the
University of Warsaw, and we traditionally were focused on very large-scale computations, more of a capability type.
And traditionally, we had very interesting computing equipment.
There was a Cell computer at one stage, Blue Gene/P, Blue Gene/Q, a POWER7 machine, water cooled.
So some really, really interesting and more exotic type of computers.
Whereas other centers, especially the two major ones: one is Cyfronet in Krakow,
related to the Academy of Mining, a very highly regarded school.
They actually operate the largest
supercomputing equipment right now,
of the order of two and a half,
close to three petaflops.
Incidentally, it's an HP system,
liquid cooled, the Apollo kind.
And we also have Poznan Supercomputing and Networking Center.
That center is our leader in networking. Of course, they also have very substantial computing capacity, roughly equal to ours.
They focus more on capacity, and it's a cluster system, fat-tree connected.
For example, our system is a Cray XC40, so it's Aries. Again, different from most other centers
in Poland. So in terms of equipment, we differentiate slightly. Going back to Poznań, one of the most brilliant things
they have done, and they were the leaders of this initiative in Poland: about 10 years or
more ago they started a project called PIONIER, and they have built an optical network, an academic optical network that is fully owned by this organization, PIONIER.
It's shared by all the metropolitan centers and HPC centers.
So now in Poland, we have 7,500 kilometers of optical fiber,
and we don't have to pay commercial carriers for that.
It allows us also to do all sorts of experiments
and tests, and in the sense of networking, we really are world leaders. And this is of
course due to the work of predominantly Poznan Supercomputing and Networking Center, with
whom of course ICM collaborates very closely.
So, Marek, you're very familiar with these different centers and the capabilities they
have.
So, I think it's a great segue onto the question I also wanted to ask you, which is, how do
you view the possibility of all these centers becoming a pool for us to use? So if I'm a user,
I'm not just a user at the Warsaw center, but I'm a user of this community. And when I submit jobs,
the jobs are going to run in the best place to run a job. So we move from one system,
one center, to now I have access to all the systems, I have access
to a variety of architectures, I have access to systems that may have shorter waiting queues
than other systems.
I, as a user, benefit from that variety.
If I'm an academic, I'm a researcher, I get the advantage of the multiple architectures
and the new architectures, or even the reduced waiting queue.
How do you see that as a possibility?
What are your views on that?
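The idea in this question, that a job submitted to a shared pool runs "in the best place to run a job", can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not something described in the conversation: the `System` record, the center names, the wait estimates, and the selection rule (pick the eligible system with the shortest estimated queue wait).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class System:
    """A hypothetical entry in a shared pool of HPC systems."""
    name: str                 # illustrative label, not a real center
    architecture: str         # e.g. "cpu" or "gpu"
    max_cores: int            # largest job the system accepts
    est_wait_minutes: float   # assumed current queue-wait estimate


def pick_system(
    systems: Sequence[System],
    cores_needed: int,
    architecture: Optional[str] = None,
) -> Optional[System]:
    """Return the eligible system with the shortest estimated wait, or None."""
    eligible = [
        s for s in systems
        if s.max_cores >= cores_needed
        and (architecture is None or s.architecture == architecture)
    ]
    # min() with default=None handles the case where nothing is eligible.
    return min(eligible, key=lambda s: s.est_wait_minutes, default=None)


# Made-up pool: two on-premise centers and one cloud platform.
pool = [
    System("center-a", "cpu", 2000, 240.0),
    System("center-b", "cpu", 512, 30.0),
    System("cloud", "gpu", 64, 5.0),
]

choice = pick_system(pool, cores_needed=256)
print(choice.name if choice else "no eligible system")
```

A real broker would also have to weigh data location, cost, and software availability; this sketch only covers architecture, size, and waiting time, the three factors named in the question.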
Gabriel, actually, we are going directly into cloud,
HPC cloud solutions.
But the interesting thing is that sometimes it's very difficult to be original.
And in a sense, here, we are not original.
What we are talking about here is, in a certain sense,
a refresh of the technology,
an expansion of something that has already existed.
Because one of the very, very neat things
that they have introduced in Poland
is a solution called PL-Grid.
And actually, PL-Grid was driven by another center,
by the Cyfronet center,
and that's their huge achievement.
And of course, we are part of it.
So basically, we did have it,
but it was not as scalable
and not as flexible as what you can achieve
with cloud solutions, because
PL-Grid was in a certain sense something like the XSEDE program in America,
or, you know, a predecessor of XSEDE. But of course it's not based on... it's rather
rigid. It's basically a grid system with ... solutions, and some users, a large group of users in Poland,
academic users, were already using resources from various centers.
And we have one or two very substantial clusters that were explicitly, or exclusively, reserved
for this PL-Grid work.
And they were actually acquired through special funding from this very, very large
project. This project has been going on for three years, for three rounds; there were
three stages of the PL-Grid
project. But what I see is that this move to a much more sophisticated and flexible technology
that is allowed now through cloud, HPC cloud solutions and provisioning,
is a very natural progression, a very natural step.
I can't see a way out of it.
Okay, that sounds very interesting, because
we could say that grid has existed for some time. It's not massively adopted today.
The concept has existed. You're seeing positives now with HPC cloud platforms.
Are there any elements of that that you think make it more attractive
than the way we used to approach grid in the last 10 years or something like that? Are
you seeing some aspects of that that you particularly like? I'm curious.
Well, I always saw advantages of grid computing. And actually, all of those things, when I look back into how they developed,
they're a very natural progression.
And I remember back in the 90s when I was in Canberra,
one of my pals, Russ Standish at ANU,
he was using cycle harvesting from all sorts of resources at the
university, and then we had this Condor solution and similar solutions. So there
were solutions that were sort of addressing the problem of wasted
resources, an extremely huge waste of computing resources. Then we had this progression to things like grid.
You see it in places like Poland with the grid solution. But they are very rigid.
Still, the configuration of each system is different. So we had to wait
for the time when certain technical and business solutions were found. And I'll give you
examples. The first thing in the context of HPC is provisioning of topology and
network. That was not possible; you couldn't do it in an easy way.
Nowadays you can do it.
Then, in order to merge those things,
you have to have interplay
between cloud provisioning
and batch provisioning,
queuing systems and schedulers.
If you can merge them,
you can actually mix interactive
and batch processing. And of course grid
was also batch processing.
So it had these features of a typical
HPC environment and was rather
rigid, whereas with cloud
we can actually treat
supercomputing as your
desktop. And moving from batch to interactive, I think, makes a huge
difference. Then there are other very important things.
Nowadays, with the development of containers, Docker and Singularity and
whatnot, you can actually package not only your program,
not only your problem, but the whole environment.
And then you touch on the two most important things, which I think are also emerging.
Of course, it's been very well recognized forever: correctness and reproducibility.
Of course, you need to run it on a system that is stable,
but you also have to understand that your results are correct
and you can repeat them.
And you can repeat them not only on different pieces of hardware,
but also at different points in time.
So after a few years, you can go back to your computations
and have similar results.
So the next aspect that was causing a little bit of difficulty was data movement.
But nowadays this obstacle is also being removed.
And you have various concepts like in-memory computing,
and of course you also have huge pipelines,
and you can move data.
And also, if you make the computing resource ubiquitous,
then actually it doesn't really matter where you compute.
You actually should move your compute part to where your data is.
And if there is a proliferation, or there's a widely accepted cloud solution,
it doesn't really matter where you compute.
Then again, that will lead to a reduction of costs.
So all those things lead me to think that there's absolutely
no way out of these things. People have been talking for a long time about utility,
computing as a utility. And you see it's happening. And I have to really congratulate the people
who started UberCloud, for example.
UberCloud has just got some accolades and was recognized.
And you see other things in industry.
For example, the fact that IBM has acquired Red Hat
is interpreted by many analysts as a move to cloud and provisioning of that sort. And
then you will see convergence, and you already see it, that commercial
enterprise kind of cloud providers are slowly moving and encroaching on HPC
territory. So suddenly you can have Crays available as cloud instances,
or FPGAs or GPU-enabled hardware as cloud hardware. And that's actually
perfect. That's exactly what should be happening, the merging of the worlds.
This sounds like, I like your thinking.
I like how you go from, you know, we were always trying to get there,
like grid was trying to get there, but maybe it wasn't as flexible.
With the advent of cloud, now we have more flexibility
to enable this pool of systems to be together.
And in addition, you have this variety of architectures you can take advantage of.
So I think we always wanted to do this, but it's getting much, much closer with the cloud
platforms being developed today.
So I think your vision is one that maybe you see it happening, and you've been seeing it
happening for a while.
It's materializing slowly.
The question I would ask you now is, what challenges do you see
in terms of this transformation for HPC in academia?
Is everybody with you,
or are you fighting this battle a bit in a lonely way?
Not lonely.
There is a battle for sure,
and there will be a battle for sure. And I think
there are various kinds of obstacles. The main one is
psychology, the human factor.
People are actually very afraid
of losing control and losing their own territory.
And I've seen it: when I worked in Singapore,
my colleagues, excellent technical people, were so skeptical and critical about cloud.
Of course, the time was different. That was about 10 years ago. And surely
certain pieces of the solution were not there, like provisioning of the interconnect, InfiniBand for example.
But now those technical obstacles
are being conquered.
They are not a problem anymore.
But people are always, constantly,
you know,
afraid of losing their sort of
special position or their expertise.
But I never worry about it.
Because whatever way you provision resources, you still need to have expertise to guide users,
and there will be hordes of new users. So I think it would be a boom, you know, it
would be an excellent time for us to offer our expertise. It's not threatening at all for me. Of course, there are other
things like the way of thinking. People who live in the enterprise or commodity type of
environment don't necessarily fully understand, are not attuned to, the specific needs of HPC and supercomputing.
And so when you merge those two worlds,
people might not fully understand.
Of course, there are very smart people on both sides,
but there will be some differences.
And I can give you an example.
In many places, you have certain resources available
as cloud at various universities.
But usually those resources are managed, administered by a different group
of people who come from this commodity world way of thinking.
And when we start talking, you know, they are also afraid that we would be encroaching on their territory,
because basically it is about removing the barriers.
But for me, those barriers are for users, not for operators. So, of course, the difficulties are on the side of those who manage,
those who get funding, those who decide about what kind of resources
should be acquired or merged or accessible.
And so the greatest factor is always human psychology.
Not technology, not hardware, not space, and nothing else.
Of course, money too.
Yeah, of course.
There's always money in the equation.
Marek, I'm left with this great impression of your vision, which is we are moving to this new world where cloud technology and cloud platforms are going to enable this merging, or this ability to use multiple systems across academic institutions, to use multiple architectures as they become available, whatever they are.
We always tried to do this in previous times.
We tried with grid, but it wasn't as flexible.
It's happening now.
And maybe because it's happening, it represents change.
And some people are seeing this as encroaching on their territory.
But that will happen, and people will find new ways of
working, and we will all benefit from the changes coming along. So it's great
to hear your thinking in the process and try to learn as I hear this. Before
we close, I'd like to ask: anything you'd like to add to this?
Well, I think just to mention two extra things that
I was thinking about when collecting my thoughts before our
discussion. There are two
things. One is this human-in-the-loop thing that really interests
me, and things that are related, that is,
interactive programming and visualization. And also the ability
to test arbitrary kinds of hardware that might be very rare, or very expensive, or exotic,
with cloud and with sort of globally distributed resources.
Incidentally, we were working
on these globally distributed resources
by building the InfiniCortex project for three years,
from 2014 to 2016, with about 40 to 50 different organizations
in the world.
So that was moving into that, into sort of
basically breaking down all the barriers of distance, country borders,
and the divisions between, you know, continents.
If you sort of merge it, you can build one huge, humongous resource
that can be chopped in different ways.
Sometimes it could be used as one single humongous computer of unprecedented
scale.
On the other hand, it can be used by a great number of people for smaller tasks, because
not every task has to be huge. A very, very interesting scientific problem, academic problem, doesn't
have to be huge in size.
So I would like to very much thank our guest,
Marek Michalewicz,
director of ICM,
the High Performance Computing Center
at the University of Warsaw,
for sharing his experience and his vision
to help us understand
the future of HPC in academia.
Until next time,
I'm Gabriel Broner,
and this was the Big Compute Podcast.
Thank you.
Thanks. Thank you.