In The Arena by TechArena - Forecasting HPC, AI, and Next-Gen Weather Models with Peter Dueben
Episode Date: November 20, 2024
Peter Dueben of the European Centre for Medium-Range Weather Forecasts explores the role of HPC and AI in advancing weather modeling, tackling climate challenges, and scaling predictions to the kilometer level.
Transcript
Welcome to the Tech Arena,
featuring authentic discussions between
tech's leading innovators and our host, Alison Klein.
Now, let's step into the arena.
Welcome to the arena.
My name is Alison Klein. We are reporting on supercomputing
this week, and I am so delighted to be with Peter Dueben, head of the Earth System Modeling
section at the European Center for Medium-Range Weather Forecasts. Welcome to the program,
Peter. Thank you very much for having me. Now, as your title suggests, Peter, your team
is very involved in weather modeling.
Can you just give us a brief introduction on yourself as well as the work that you're doing in driving more accurate forecasts of the weather?
Yes, of course.
So we are building weather prediction systems.
And this basically means that we are building models that represent the Earth system. So we have all the different components represented: atmosphere, ocean, land surface,
cloud physics, and basically all the things that you could think of that somehow
influence the weather.
And then we build global models and these models will then basically predict the future
of the Earth system.
And in that sense, they can be used for weather predictions, just a couple of hours or days
into the future, but also for much longer-range predictions, for example seasonal predictions, all the
way to climate modeling as well. Now, obviously that modeling is based on what's happened in the
past and the world has really been turned upside down with climate change. How are you addressing
this in terms of changing systems and why is this research so important right now?
Yeah, it's a good question. I mean, climate change is there. We see that the weather regimes are also
changing. On the other hand, we're building physical systems. So the models basically
represent the whole world through the physical equations. So when the climate changes,
we would normally assume that the models can also cope with that and still represent it very well.
But a lot of our work is also to consider how these models perform, in particular
for extreme events, which will also change with climate change, and then to basically
continuously diagnose how well we are representing those events as they happen.
There is one caveat, though: over the last couple of years, machine learning models have
become more and more important.
And here it's actually much more critical to really ask the question whether they are as good as conventional physics-based models
regarding climate change, because these machine learning models are trained for a specific climate
of the past 40 or 50 years, and we don't know exactly whether they can extrapolate properly
into the future. So if we want to look into climate change with machine learning models,
we have to be much more careful.
But the physics-based models, we normally assume, are actually capable of representing those changes.
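To make that extrapolation concern concrete, here is a minimal sketch with synthetic data (not an ECMWF workflow): a tree-based regressor trained only on an earlier, cooler period cannot predict values outside its training range, so its error on a later, warmer period grows sharply.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a reanalysis record: 60 annual-mean temperatures
# with a warming trend. Purely illustrative, not real ECMWF data.
years = np.arange(1965, 2025)
temps = 14.0 + 0.02 * (years - 1965) + rng.normal(0.0, 0.1, years.size)

# Emulate the situation described above: train only on the earlier, cooler
# 40 years, then evaluate on the most recent (warmer) 20 years.
X = years.reshape(-1, 1).astype(float)
train, test = slice(0, 40), slice(40, None)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[train], temps[train])

rmse = lambda s: np.sqrt(np.mean((model.predict(X[s]) - temps[s]) ** 2))
# Tree ensembles cannot predict values outside the range they were trained
# on, so the out-of-sample error on the warmer period blows up.
print(f"in-sample RMSE:     {rmse(train):.3f} K")
print(f"out-of-sample RMSE: {rmse(test):.3f} K")
```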
Now, when you look at the work that you're doing, how do you see that influencing the broader scientific community?
And who's tapping your research for practical applications?
We are one of the big supercomputing applications, and we are also a very challenging
application in principle, because we have very large model systems and we come with a lot of data
movement and data management. So I view us a bit as a challenge to
supercomputing, to some extent, but also as a high-impact application where a
supercomputer can really make a difference. So the more power we have in computing, the higher
the resolution we can work with
and the better the prediction is going to be.
And it's certainly high impact
because I don't know how many of you
have checked the weather forecast this morning,
but I guess most of you.
So it's something where people are actually
also feeling the difference every day
if the prediction is getting worse or better.
And also there's a lot of industrial applications
as well behind this.
So if we get the weather forecast right, this really helps, for example,
renewable energy generation with solar power plants and wind farms. It
will help the transport sector to make sure that planes are taking off and landing
safely.
It will help industry in the sense that we will be able to make better statements about shipping
routes that will be open or closed, and all these kinds of questions.
So it's one of the big high-impact applications as well for supercomputing. And I would claim
that it's actually also a very nice challenge in the high-performance computing community.
Yeah. And you mentioned HPC. Obviously, the physics models and the new machine learning
models are tightly tied to HPC capability. What are the analytics that you're addressing
when it comes to these models and how
does HPC integrate into the problems you're trying to solve?
In general, it is clear that we had steady progress in high-performance computing,
the exponential growth of compute power. We were able to keep increasing the resolution of the
models we were working with, and therefore we could represent more and more features and
could also improve our weather predictions.
And we call this the quiet revolution of weather prediction: decades of steady
progress over time.
We now see that we're reaching the end of Moore's law,
and we're losing the sweet spot of basically just waiting a couple of years
to get more capacity.
And on the other hand, now machine learning is stepping in, which has been quite influential
in our community.
It also has shown us that we can really improve our predictions further by using large machine learning models.
And with that, we now basically face the normal problems and challenges of machine learning applications:
running on loads of GPUs, getting the datasets created in the proper way,
and doing the analysis to make sure that the models are not hallucinating or something like this, but actually producing good results.
And supercomputing is really the backbone of these developments, in particular also
for machine learning.
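A hedged sketch of the kind of sanity check alluded to above: verifying that a model's output stays within physically plausible bounds and actually beats a trivial persistence forecast. The bounds, arrays, and thresholds here are invented for illustration, not an operational ECMWF verification suite.

```python
import numpy as np

def sanity_check(forecast, analysis, previous_analysis, lo=180.0, hi=340.0):
    """Basic checks for a gridded temperature forecast in Kelvin (illustrative only)."""
    # 1. Physical plausibility: flag values outside a generous Earth-like range.
    if forecast.min() < lo or forecast.max() > hi:
        return False, "out-of-range values (possible hallucination)"
    # 2. Skill: the forecast should beat persistence (yesterday's analysis).
    rmse = np.sqrt(np.mean((forecast - analysis) ** 2))
    rmse_persist = np.sqrt(np.mean((previous_analysis - analysis) ** 2))
    if rmse >= rmse_persist:
        return False, f"no skill vs persistence ({rmse:.2f} K >= {rmse_persist:.2f} K)"
    return True, f"ok (RMSE {rmse:.2f} K vs persistence {rmse_persist:.2f} K)"

# Toy usage with random fields standing in for real gridded data.
rng = np.random.default_rng(1)
truth = 288.0 + rng.normal(0, 5, (181, 360))
ok, msg = sanity_check(truth + rng.normal(0, 1, truth.shape), truth,
                       truth + rng.normal(0, 2, truth.shape))
print(ok, msg)
```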
Now, obviously, there is tight coupling of understanding the models and their requirements
and building underlying infrastructure to support that.
Can you talk a little bit about how your group has approached that challenge?
For the physics models, it was really a big challenge
to get the models working well, for example, on GPUs, right? We are coming from the Fortran world
still. We have very large models, up to a million lines of code, for example, and to get those
models running efficiently on GPUs is non-trivial. And we've spent significant resources really to make this happen,
investing into domain-specific languages, for example,
to make sure that we can separate concerns between domain scientists and the computing scientists.
But I think we actually managed well in a sense that we are now able to run
those model simulations on GPUs on some of the fastest and biggest systems in the world.
It's a constant development, obviously.
We're also spending an increasing amount of resources and staff
to make our models really efficient on those supercomputers,
but we get the benefit. We're doing well.
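The actual tooling at ECMWF (very large Fortran codes, source-to-source translation) is far more involved, but a minimal Python sketch can illustrate the separation-of-concerns idea: the domain scientist writes the stencil once against a generic array API, and the backend choice (NumPy for CPU, CuPy for GPU, assumed available) is made elsewhere without touching the science code.

```python
import numpy as np

try:
    import cupy as cp                      # optional GPU backend (assumed installed)
    BACKENDS = {"cpu": np, "gpu": cp}
except ImportError:
    BACKENDS = {"cpu": np}

def horizontal_diffusion(field, xp, nu=0.1):
    """Domain science: a 5-point diffusion stencil, written once.

    `xp` is whichever array module the backend supplies, so the same
    scientific code runs on CPU (NumPy) or GPU (CuPy) unchanged.
    """
    lap = (xp.roll(field, 1, axis=0) + xp.roll(field, -1, axis=0)
           + xp.roll(field, 1, axis=1) + xp.roll(field, -1, axis=1)
           - 4.0 * field)
    return field + nu * lap

# Computing science: pick the backend without changing the stencil above.
xp = BACKENDS.get("gpu", BACKENDS["cpu"])
field = xp.asarray(np.random.rand(256, 256))
field = horizontal_diffusion(field, xp)
```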
Now, Supercomputing 2024 is upon us, and you're a speaker at the event.
What is your topic?
I'm basically going to talk about what we call the digital revolution.
Once the quiet revolution that I already mentioned stopped,
we had to revise our models and make them able to run on GPUs at scale, toward exascale
supercomputers as well. And this is allowing us now to run our models at what we call the
kilometer scale. So we used to run our models at something like 10 kilometer resolution,
and now we're pushing this to the one-kilometer level. And if you do so, you basically
resolve very important processes of the atmosphere in particular,
like individual thunderstorms,
but also the interactions between
small-scale topography features and the atmosphere.
And this will actually allow you to really improve
the quality of your prediction further.
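Some back-of-the-envelope arithmetic shows why that jump is so demanding: horizontal grid points grow with the inverse square of the spacing, and the time step must shrink roughly in proportion to it (a CFL-type constraint), so total cost grows close to cubically. A rough sketch with idealized numbers:

```python
# Rough cost scaling for a global model going from 10 km to 1 km grid
# spacing. All numbers are idealized, for illustration only.
EARTH_SURFACE_KM2 = 510e6                 # Earth's surface area

for dx in (10.0, 1.0):                    # horizontal grid spacing in km
    columns = EARTH_SURFACE_KM2 / dx**2   # horizontal grid columns: ~1/dx^2
    dt = 60.0 * dx                        # time step shrinks with dx (CFL-like)
    steps_per_day = 24 * 3600 / dt
    cost = columns * steps_per_day        # column-steps per simulated day
    print(f"{dx:4.0f} km: {columns:.1e} columns, dt={dt:.0f} s, "
          f"cost {cost:.1e} column-steps/day")
# Output shows roughly a factor of 1000 in cost between 10 km and 1 km.
```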
And that was made possible via larger-scale projects,
for example the Destination Earth project in Europe,
where we're building digital twins of the Earth
that are basically built around a kilometer-scale Earth system model. And the last topic is definitely
also going to be the machine learning revolution that happened two years ago, with the
first models showing that you can actually build competitive forecasts with pure machine
learning models when compared to physics-based models. And this has really changed the community
quite a lot. And we're still digesting what happened there. But we're also getting on very well in the sense that we now have the first
more or less operational weather forecast systems for global weather based on pure
machine learning models. And that's also going to be part of the talk.
Now, the world is talking about AI. You mentioned machine learning as something that you're
pursuing for your research. There's obviously a lot of talk about parallelism between HPC clusters and AI training models. Do you see
opportunity to tap AI more fully in your research? And how do you see these worlds colliding more
holistically? Yes. So first of all, machine learning is really here to stay. We now have the first machine learning model running operationally, or semi-operationally, at ECMWF, and it will continue to be there. And it will probably provide most of the useful weather predictions. These models are still small: we're basically talking about multi-node training and then inference on a single GPU.
But it is just a matter of time until we reach more and more complex configurations
for the training of those machine learning models.
They are currently growing exponentially in size,
and a good question is actually, where will this stop?
It's a bit unclear right now whether and when we will ever run
training from petascale datasets, for example, on exascale supercomputers, or whether that
is not really helping that much in the end.
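For a flavor of what more complex training configurations look like in practice, here is a minimal, generic data-parallel skeleton using PyTorch DistributedDataParallel. It is not ECMWF's code: the model, data, and sizes are all placeholders standing in for a large weather emulator.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    # One process per GPU; torchrun sets RANK/LOCAL_RANK/WORLD_SIZE.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder network standing in for a large weather emulator.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):                            # stand-in training loop
        x = torch.randn(32, 1024, device=local_rank)   # fake state samples
        loss = torch.nn.functional.mse_loss(model(x), x)
        opt.zero_grad()
        loss.backward()                                # gradients all-reduced by DDP
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    train()   # launch with: torchrun --nproc_per_node=<gpus> this_script.py
```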
So I think one of the questions for the community is how complex those machine
learning tools will become.
And this is, to be honest, also one of the questions that I'm looking forward to discussing
with everyone at Supercomputing, in the sense that we follow closely what other domains
have been doing in machine learning regarding tools and developments.
And now in particular for things like foundation models, which have been very influential in
other domains, the question is how much benefit we will also be able to get from this
in a physical domain like weather and climate.
And in particular, we will start a new project at ECMWF and in Europe, which is called
the WeatherGenerator.
We want to build a foundation model for weather and climate.
So we definitely want to address this challenge, really scale our machine learning problems to a very large size, and build generic tools that can serve more than one application,
not just forecasting, for example, in a single code base.
And that's an exciting development.
And no one really knows exactly where it's going to end.
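One common reading of "generic tools that can serve more than one application in a single code base" is a shared backbone with task-specific heads. The sketch below is a hypothetical illustration of that pattern, not the actual WeatherGenerator design; all names and sizes are invented.

```python
import torch
import torch.nn as nn

class WeatherBackbone(nn.Module):
    """Shared encoder over an atmospheric state vector (illustrative sizes)."""
    def __init__(self, state_dim=1024, hidden=2048):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        # Task-specific heads hang off the same learned representation.
        self.heads = nn.ModuleDict({
            "forecast": nn.Linear(hidden, state_dim),       # next-state prediction
            "downscale": nn.Linear(hidden, 4 * state_dim),  # finer-grid output
        })

    def forward(self, state, task):
        return self.heads[task](self.encoder(state))

model = WeatherBackbone()
x = torch.randn(8, 1024)
print(model(x, "forecast").shape, model(x, "downscale").shape)
```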
When you look at the research that you're undertaking, you've talked about tens of kilometers to single kilometers. What's the holy grail within this space in terms of
the next major breakthrough from a scientific perspective?
Yeah, so for physics-based modeling, that's definitely the kilometer scale. And we're
actually doing very well to get there.
So with these big projects such as Destination Earth, we actually have pushed the limit quite a lot.
And recently, we have been able to, for example, run decade-long simulations at kilometer scale resolution.
We are able to run on a daily basis kilometer scale forecasts and so on.
There has been quite a push to reach this "holy grail" of physics-based modeling.
For machine learning, it's still a little bit different,
in the sense that I think we still don't really know
where we're heading in the future, to some extent.
Will we have a very high-resolution machine learning model,
or will we rather have a coarser-resolution model
that is customized for different applications
by additional networks or something like this?
So it's a bit more difficult to kind of find the holy grail there,
but it's going to be exciting anyway. That's fantastic. One final question is,
you have a session at SC 2024 and folks should absolutely check that out. But if folks are
listening online and they want to connect with you and your research team to learn more about
what you're delivering or potentially collaborate, where should they go for more information?
Yeah, they can definitely go, for example, to the ECMWF webpage, ecmwf.int. We have blogs there and news items in particular, and we are quite active there on the
machine learning side. You will also find, for example, a news item on the WeatherGenerator
project that I just mentioned. You are absolutely welcome to get into contact with me directly.
And as you say, I would be very thrilled to see you all
at the Supercomputing Conference,
either at my talk or somewhere else.
Please get into contact.
Peter, thank you so much for spending some time with us today.
It's always exciting to talk to folks
who are on the front lines of scientific discovery,
and your work is very exciting and impactful for the world.
Thank you for taking your time for The Tech Arena.
Thank you very much for having me.
Thanks for joining The Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.