In The Arena by TechArena - Forecasting HPC, AI, and Next-Gen Weather Models with Peter Dueben

Episode Date: November 20, 2024

Peter Dueben of the European Centre for Medium-Range Weather Forecasts explores the role of HPC and AI in advancing weather modeling, tackling climate challenges, and scaling predictions to the kilometer level.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Alison Klein. Now, let's step into the arena. Welcome to the arena. My name is Alison Klein. We are reporting on supercomputing this week, and I am so delighted to be with Peter Dueben, head of the Earth System Modeling section at the European Centre for Medium-Range Weather Forecasts. Welcome to the program,
Starting point is 00:00:39 Peter. Thank you very much for having me. Now, as your title suggests, Peter, your team is very involved in weather modeling. Can you just give us a brief introduction on yourself as well as the work that you're doing in driving more accurate forecasts of the weather? Yes, of course. So we are building weather prediction systems. And this basically means that we are building models that represent the Earth system. So we have all the different components represented with atmosphere and ocean and land surface and cloud physics and all the things that you could think of basically that somehow influence the weather.
Starting point is 00:01:13 And then we build global models and these models will then basically predict the future of the Earth system. And in that sense, they can be used for weather predictions, just a couple of hours or days into the future, but also for very long-time predictions in terms of, for example, seasonal predictions and all the way to climate modeling as well. Now, obviously that modeling is based on what's happened in the past and the world has really been turned upside down with climate change. How are you addressing this in terms of changing systems and why is this research so important right now? Yeah, it's a good question. I mean, climate change is there. We see that the weather regimes are also
Starting point is 00:01:50 changing. On the other hand, we're building physical systems. So the systems basically represent the whole world and the physical equations. So when there's a change in climate, we normally would assume that the models can also cope with that and still represent it very well. But a lot of our work is also to consider how these models are working, in particular in extreme events that will also change with climate change, and then to basically always diagnose how well we are representing the events as they happen. There's one more thing to it, which is that for a couple of years now, machine learning models have been becoming more and more important.
Starting point is 00:02:23 And here it's actually much more critical to really ask the question of whether they are as good as conventional physics-based models regarding climate change, because these machine learning models are trained for a specific climate of the past 40 or 50 years, and we don't know exactly whether they can extrapolate properly into the future. So if we want to look into climate change with machine learning models, we have to be much more careful. But the physics-based models, we normally assume, are actually capable of representing those changes. Now, when you look at the work that you're doing, how do you see that influencing the broader scientific community? And who's tapping your research for practical applications?
Starting point is 00:03:04 We are one of the big supercomputing applications, and we also have a very challenging application in principle, because we have very large model systems and we also come with a lot of data movement and kind of data management. So I view us basically as a bit of a challenge to supercomputing to some extent: a high-impact application that is really worth it, where a supercomputer can really make a difference. So the more power we have in computing, the higher the resolution we can work with and the better the prediction is going to be. And it's certainly high impact
Starting point is 00:03:29 because I don't know how many of you have checked the weather forecast this morning, but I guess most of you. So it's something where people are actually also feeling the difference every day if the prediction is getting worse or better. And also there's a lot of industrial applications as well behind this.
Starting point is 00:03:43 So if we get the weather forecast right, for example, this really helps renewable energy generation, for example, with solar power plants and wind farms. It will help the traffic sector to make sure that the planes are taking off and landing safely. It will help industry in the sense that we will be able to make more statements about shipping routes that will be open or closed and all these kinds of questions. So it's one of the big high-impact applications as well for supercomputing. And I would claim that it's actually also a very nice challenge in the high-performance computing community.
Starting point is 00:04:12 Yeah. And you mentioned HPC. Obviously, the physics models and the new machine learning models are tightly tied to HPC capability. What are the analytics that you're addressing when it comes to these models and how does HPC integrate into the problems you're trying to solve? In general, it is clear that while we had kind of the steady progress in high-performance computing, the exponential growth of compute power, we really were able to have an ever-increasing resolution of the models we were working with, and therefore we could represent more and more features and therefore we could also improve our weather predictions.
Starting point is 00:04:45 And we call this the quiet revolution of weather prediction, that we had like decades of steady progress in time. We now see that we're kind of reaching the end of Moore's law. And we also see that we are leaving this sweet spot of basically just waiting a couple of years to have more capacity. And on the other hand, now machine learning is stepping in, which has been quite influential in our community. It also has shown us that we can really improve our predictions further by using large machine learning models.
Starting point is 00:05:09 And then we basically now also switch to the normal problems and challenges of machine learning applications, like running on loads of GPUs, trying to get the data sets created in the proper way, making the analysis to make sure that the models are not hallucinating or something like this, but actually producing good results. And supercomputing is really the backbone of these developments, in particular also for machine learning. Now, obviously, there is tight coupling of understanding the models and their requirements and building underlying infrastructure to support that. Can you talk a little bit about how your group has approached that challenge? On the use of infrastructure for the physics models, it was really a big challenge
Starting point is 00:05:50 to get the models working well, for example, on GPUs, right? We are coming from the Fortran world still. We have very large models, up to a million lines of code, for example, and to get those models running efficiently on GPUs is non-trivial. And we've spent significant resources really to make this happen, investing into domain-specific languages, for example, to make sure that we can separate concerns between domain scientists and the computing scientists. But I think we actually managed well in a sense that we are now able to run those model simulations on GPUs on some of the fastest and biggest systems in the world. It's a constant development, obviously.
Starting point is 00:06:23 We're also spending an increasing amount of resources and staff to make our models really efficient on those supercomputers, but we get the benefit. We're doing well. Now, Supercomputing 2024 is upon us, and you're a speaker at the event. What is your topic? I'm basically going to talk about what we call the digital revolution. Once this kind of quiet revolution that I already mentioned stopped, we had to revise our models and make them able to run on GPUs at scale toward the exascale
Starting point is 00:06:50 supercomputers as well. And this is allowing us now to run our models at what we call the kilometer scale. So we used to run our models at something like 10 kilometer resolution, and now we're pushing this to one kilometer resolution levels. And if you do so, you basically resolve very important processes of the atmosphere in particular, like individual thunderstorms, but also the interactions between small-scale topography features and the atmosphere. And this will actually allow you to really improve
Starting point is 00:07:13 the quality of your prediction further. And that was possible via larger-scale projects. For example, the Destination Earth project in Europe, where we're kind of building digital twins of the Earth that are basically built around a kilometer-scale Earth system model. And the last topic is definitely also going to be the machine learning revolution that was happening two years ago, with kind of the first models showing that you can actually build competitive forecasts with pure machine learning models when compared to physics-based models. And this has really changed the community
Starting point is 00:07:42 quite a lot. And we're still digesting what happened there. But we're also getting on very well in the sense that we are now having the first more or less operational weather forecast systems for global weather based on pure machine learning models. And that's also going to be part of the talk. Now, the world is talking about AI. You mentioned machine learning as something that you're pursuing for your research. There's obviously a lot of talk about parallelism between HPC clusters and AI training models. Do you see opportunity to tap AI more fully in your research? And how do you see these worlds colliding more holistically? Yes. So first of all, machine learning is really here to stay. So we have now the first machine learning model running operationally or semi-operationally at ECMWF, and it will continue to be there. And it will provide probably most of the useful weather predictions in the small. So we're basically talking about like multi-node training and then inference on a single GPU. But this is just a matter of time until we reach more and more complex configurations
Starting point is 00:08:52 for the training of those machine learning models. I guess they are growing exponentially currently in size. And a good question is actually, where will this stop? It's a bit unclear right now whether and when we will ever run, for example, the training on petascale datasets on an exascale supercomputer, or whether this is not really helping that much in the end. So I think one of the questions for the community is how complex those machine learning tools will become.
Starting point is 00:09:16 And this is, to be honest, also one of the questions that I'm looking forward to discussing with everyone at Supercomputing, in the sense that we're following a lot what other domains have been doing in machine learning regarding the tools and the developments. And now in particular for things like foundation models that have been very influential in other domains, the question is how much we will be able to also get benefit from this in a physical domain like weather and climate. And in particular, we will start a new project at ECMWF and in Europe, which is called the WeatherGenerator.
Starting point is 00:09:44 We want to build a foundation model for weather and climate. So we definitely want to address this challenge and really scale our machine learning problems to a very large size, building basically generic tools that can take on more than one application, so more than just forecasting, for example, in a single code base. And that's an exciting development. And no one really knows exactly where it's going to end. When you look at the research that you're undertaking, you've talked about tens of kilometers to single kilometers. What's the holy grail within this space in terms of the next major breakthrough from a scientific perspective? Yeah, so for physics-based modeling, that's definitely the kilometer scale. And we're
Starting point is 00:10:24 actually doing very well to get there. So with these big projects such as Destination Earth, we actually have pushed the limit quite a lot. And recently, we have been able to, for example, run decade-long simulations at kilometer-scale resolution. We are able to run kilometer-scale forecasts on a daily basis, and so on. There has been quite a push to reach this quote-unquote holy grail of physics-based modeling. For machine learning, it's still a little bit different in the sense that I think we still don't really know where we're getting to in the future to some extent.
Starting point is 00:10:52 Will we have a very high resolution machine learning model or will we rather have a coarser resolution model that is customized for different applications by channel networks or something like this? So it's a bit more difficult to kind of find the holy grail there, but it's going to be exciting anyway. That's fantastic. One final question is, you have a session at SC 2024 and folks should absolutely check that out. But if folks are listening online and they want to connect with you and your research team to learn more about
Starting point is 00:11:20 what you're delivering or potentially collaborate, where should they go for more information? Yeah, they can definitely go, for example, onto the ECMWF webpage. So it would be ecmwf.int, and we have blogs there and we have news items. In particular, we are quite active there on the machine learning side. You will also find, for example, a news item on the WeatherGenerator project that I just mentioned. You are absolutely welcome to get into contact with me directly. And as you say, I would be very thrilled to see you all at the Supercomputing Conference, either at my talk or somewhere else. Please get into contact.
Starting point is 00:11:54 Peter, thank you so much for spending some time with us today. It's always exciting to talk to folks who are on the front lines of scientific discovery, and your work is very exciting and impactful for the world. Thank you for taking your time for The Tech Arena. Thank you very much for having me. Thanks for joining The Tech Arena. Subscribe and engage at our website, thetecharena.net.
Starting point is 00:12:19 All content is copyright by The Tech Arena.
