Griffin Mooers, Mike Pritchard, Tom Beucler, Prakhar Srivastava, Harshini Mangipudi, Liran Peng, Pierre Gentine, Stephan Mandt
Global Storm-Resolving Models (GSRMs) have gained widespread interest because of the unprecedented detail with which they resolve the global climate. However, it remains difficult to quantify objective differences in how GSRMs resolve complex atmospheric formations. This lack of comprehensive tools for comparing model similarities is a problem in many disparate fields that rely on simulations of complex data. To address this challenge, we develop methods to estimate distributional distances based on both nonlinear dimensionality reduction and vector quantization. Our approach automatically learns physically meaningful notions of similarity from low-dimensional latent data representations that the different models produce. This enables an intercomparison of nine GSRMs based on their high-dimensional simulation data (2D vertical velocity snapshots) and reveals that only six are similar in their representation of atmospheric dynamics. Furthermore, we uncover signatures of the convective response to global warming in a fully unsupervised way. Our study provides a path toward evaluating future high-resolution simulation data more objectively.
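A minimal sketch of the intercomparison idea, under assumptions (the `encode` function stands in for any trained dimensionality-reduction encoder, and the 1D Wasserstein distance per latent dimension is one simple choice of distributional distance, not necessarily the paper's):

```python
# Compare two simulations by the distance between their latent-code
# distributions; `encode` is a placeholder for a trained encoder.
import numpy as np
from scipy.stats import wasserstein_distance

def latent_distance(snapshots_a, snapshots_b, encode):
    """Average 1D Wasserstein distance across latent dimensions."""
    z_a = encode(snapshots_a)  # shape: (n_samples, latent_dim)
    z_b = encode(snapshots_b)
    return np.mean([
        wasserstein_distance(z_a[:, k], z_b[:, k])
        for k in range(z_a.shape[1])
    ])

# Stand-in linear "encoder" (PCA-like projection) for illustration:
rng = np.random.default_rng(0)
proj = rng.normal(size=(32 * 64, 8))   # flattened 2D snapshot -> 8D latent
encode = lambda x: x.reshape(len(x), -1) @ proj
print(latent_distance(rng.normal(size=(100, 32, 64)),
                      rng.normal(loc=0.5, size=(100, 32, 64)), encode))
```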
Arthur Grundner, Tom Beucler, Pierre Gentine, Fernando Iglesias-Suarez, Marco A. Giorgetta, Veronika Eyring
A promising approach to improve cloud parameterizations within climate models, and thus climate projections, is to use deep learning in combination with training data from storm-resolving model (SRM) simulations. The ICOsahedral Non-hydrostatic (ICON) modeling framework permits simulations ranging from numerical weather prediction to climate projections, making it an ideal target for developing neural network (NN)-based parameterizations of sub-grid scale processes. Within the ICON framework, we train NN-based cloud cover parameterizations with coarse-grained data based on realistic regional and global ICON SRM simulations. We set up three different types of NNs that differ in the degree of vertical locality they assume for diagnosing cloud cover from coarse-grained atmospheric state variables. The NNs accurately estimate sub-grid scale cloud cover from coarse-grained data with geographical characteristics similar to those of their training data. Additionally, globally trained NNs can reproduce sub-grid scale cloud cover of the regional SRM simulation. Using the game-theory-based interpretability library SHapley Additive exPlanations (SHAP), we identify an overemphasis on specific humidity and cloud ice as the reason why our column-based NN cannot perfectly generalize from the global to the regional coarse-grained SRM data. The interpretability tool also helps visualize similarities and differences in feature importance between regionally and globally trained column-based NNs, and reveals a local relationship between their cloud cover predictions and the thermodynamic environment. Our results show the potential of deep learning to derive accurate yet interpretable cloud cover parameterizations from global SRMs, and suggest that neighborhood-based models may be a good compromise between accuracy and generalizability.
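An illustrative sketch of the SHAP attribution step (the regressor, feature names, and synthetic data below are placeholders, not the paper's setup):

```python
# Attribute a cloud-cover regressor's predictions to its inputs with SHAP.
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                     # e.g., [q_v, q_i, T] per cell
y = 1 / (1 + np.exp(-(2 * X[:, 0] + X[:, 1])))    # synthetic "cloud cover"
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

# Model-agnostic Shapley estimates against a small background sample
explainer = shap.Explainer(model.predict, X[:100])
shap_values = explainer(X[:50])
print(shap_values.values.mean(axis=0))  # mean attribution per input feature
```

An overemphasis on a feature, as diagnosed in the abstract, would show up as a disproportionately large mean absolute attribution for that input.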
Harshini Mangipudi, Griffin Mooers, Mike Pritchard, Tom Beucler, Stephan Mandt
Understanding the details of small-scale convection and storm formation is crucial to accurately represent the larger-scale planetary dynamics. Presently, atmospheric scientists run high-resolution, storm-resolving simulations to capture these kilometer-scale weather details. However, because they contain abundant information, these simulations can be overwhelming to analyze using conventional approaches. This paper takes a data-driven approach and jointly embeds spatial arrays of vertical wind velocities, temperatures, and water vapor information as three "channels" of a variational autoencoder (VAE) architecture. Our "multi-channel VAE" results in more interpretable and robust latent structures than earlier work analyzing vertical velocities in isolation. Analyzing and clustering the VAE's latent space identifies weather patterns and their geographical manifestations in a fully unsupervised fashion. Our approach shows that VAEs can play essential roles in analyzing high-dimensional simulation data and extracting critical weather and climate characteristics.
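A minimal sketch of the multi-channel encoder idea, assuming illustrative input sizes (32 vertical levels by 128 columns) and an architecture that is not necessarily the paper's:

```python
# Vertical velocity, temperature, and water vapor enter as three image channels.
import torch
import torch.nn as nn

class MultiChannelVAEEncoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 64 * 8 * 32               # for 32 (levels) x 128 (columns) inputs
        self.mu = nn.Linear(feat, latent_dim)
        self.logvar = nn.Linear(feat, latent_dim)

    def forward(self, x):                # x: (batch, 3, 32, 128)
        h = self.conv(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return z, mu, logvar

x = torch.stack([torch.randn(4, 32, 128)] * 3, dim=1)  # (w, T, q) as channels
z, mu, logvar = MultiChannelVAEEncoder()(x)            # z: (4, 16)
```

Clustering algorithms (e.g., k-means) can then be applied directly to the latent codes `z` to identify weather regimes.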
Tom Beucler, Arthur Grundner, Sara Shamekh, Peter Ukkonen, Matthew Chantry, Ryan Lagerquist
The added value of machine learning for weather and climate applications is measurable through performance metrics, but explaining it remains challenging, particularly for large deep learning models. Inspired by climate model hierarchies, we propose that a full hierarchy of Pareto-optimal models, defined within an appropriately determined error-complexity plane, can guide model development and help understand the models' added value. We demonstrate the use of Pareto fronts in atmospheric physics through three sample applications, with hierarchies ranging from semi-empirical models with minimal parameters to deep learning algorithms. First, in cloud cover parameterization, we find that neural networks identify nonlinear relationships between cloud cover and its thermodynamic environment, and assimilate previously neglected features such as vertical gradients in relative humidity that improve the representation of low cloud cover. This added value is condensed into a ten-parameter equation that rivals deep learning models. Second, we establish a machine learning model hierarchy for emulating shortwave radiative transfer, distilling the importance of bidirectional vertical connectivity for accurately representing absorption and scattering, especially for multiple cloud layers. Third, we emphasize the importance of convective organization information when modeling the relationship between tropical precipitation and its surrounding environment. We discuss the added value of temporal memory when high-resolution spatial information is unavailable, with implications for precipitation parameterization. Therefore, by comparing data-driven models directly with existing schemes using Pareto optimality, we promote process understanding by hierarchically unveiling system complexity, with the hope of improving the trustworthiness of machine learning models in atmospheric applications.
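A short sketch of the core geometric operation, extracting the Pareto front in the error-complexity plane (the complexity measure, here a parameter count, is one possible choice):

```python
# Extract the Pareto front from (complexity, error) pairs.
import numpy as np

def pareto_front(complexity, error):
    """Indices of models not dominated in both complexity and error."""
    idx = np.argsort(complexity)          # sweep from simplest to most complex
    front, best_err = [], np.inf
    for i in idx:
        if error[i] < best_err:           # strictly improves on simpler models
            front.append(i)
            best_err = error[i]
    return front

complexity = np.array([2, 10, 1e4, 5e6, 1e7])    # e.g., trainable parameters
error      = np.array([0.9, 0.4, 0.35, 0.1, 0.12])
print(pareto_front(complexity, error))           # -> [0, 1, 2, 3]
```

Models off the front (like the last one above) add complexity without reducing error, which is exactly the comparison used to quantify added value.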
Jason C. Furtado, Maria J. Molina, Marybeth C. Arcodia, Weston Anderson, Tom Beucler, John A. Callahan, Laura M. Ciasto, Vittorio A. Gensini, Michelle L'Heureux, Kathleen Pegion, Jhayron S. Pérez-Carrasquilla, Maike Sonnewald, Ken Takahashi, Baoqiang Xiang, Brian G. Zimmerman
Artificial intelligence (AI) - and specifically machine learning (ML) - applications for climate prediction across timescales are proliferating quickly. The emergence of these methods prompts a revisit to the impact of data preprocessing, a topic familiar to the climate community from more traditional statistical models that work with relatively small sample sizes. Indeed, the skill and confidence in the forecasts produced by data-driven models are directly influenced by the quality of the datasets and how they are treated during model development, hence the colloquialism "garbage in, garbage out." As such, this article establishes protocols for the proper preprocessing of input data for AI/ML models designed for climate prediction (i.e., subseasonal to decadal and longer). The three aims are to: (1) educate researchers, developers, and end users on the effects that preprocessing has on climate predictions; (2) provide recommended practices for data preprocessing for such applications; and (3) empower end users to decipher whether the models they are using are properly designed for their objectives. Specific topics covered in this article include the creation of (standardized) anomalies, dealing with non-stationarity and the spatiotemporally correlated nature of climate data, and handling of extreme values and variables with potentially complex distributions. Case studies illustrate how using different preprocessing techniques can produce different predictions from the same model, which can create confusion and decrease confidence in the overall process. Ultimately, implementing the recommended practices set forth in this article will enhance the robustness and transparency of AI/ML in climate prediction studies.
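A brief sketch of one of the preprocessing steps named above, standardized anomalies, using the standard xarray groupby pattern (the synthetic monthly series is illustrative):

```python
# Remove the monthly climatology and scale by its standard deviation.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1990-01-01", "2019-12-01", freq="MS")
da = xr.DataArray(10 + np.random.randn(time.size),
                  coords={"time": time}, dims="time", name="t2m")

# In practice, compute the climatology on the training period only,
# then apply it to validation/test data to avoid leakage.
clim_mean = da.groupby("time.month").mean("time")
clim_std = da.groupby("time.month").std("time")
std_anom = (da.groupby("time.month") - clim_mean).groupby("time.month") / clim_std
```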
Erisa Ismaili, Robert C. Jnglin Wills, Tom Beucler
Machine learning (ML) can represent processes unresolved in coarse-resolution Earth system models (ESMs) by learning from high-resolution climate data. Such ML parameterization approaches have been primarily tested in idealized setups where they have focused on deep convection. It remains largely unexplored whether these approaches could be used in a more targeted fashion to learn vertical fluxes resulting from midlatitude mesoscale processes, such as slantwise convection and frontal dynamics in extratropical cyclones, which are not well represented in ESMs. To address this, we employ a variable-resolution CESM2 simulation with a refined area over the North Atlantic (14-km grid refinement) that resolves such midlatitude mesoscale processes. We train an artificial neural network to predict vertical profiles of mesoscale moisture, heat, and momentum fluxes from the perspective of a coarse-resolution (111-km grid) model. Our results show that a large number of features are required to achieve reasonable model performance when data come from the midlatitudes of real-geography atmospheric simulations, especially when coarse-grained vertical velocities, which we show are not representative of vertical velocities in a coarse-resolution model, are excluded as inputs. Feature importance analysis reveals the importance of vertically non-local information in temperature, moisture, and the meridional wind. We suggest that these non-local relationships capture the influence of cold air outbreaks and fronts on mesoscale fluxes. Our results demonstrate the importance of vertically non-local processes, clarify the regime-dependent predictability of mesoscale fluxes, and identify variables most informative for their parameterization, providing guidance for improving ESMs with ML and advancing our understanding of multi-scale interactions in the midlatitudes.
Tom Beucler, Michael Pritchard, Pierre Gentine, Stephan Rasp
Data-driven algorithms, in particular neural networks, can emulate the effect of sub-grid scale processes in coarse-resolution climate models if trained on high-resolution climate simulations. However, they may violate key physical constraints and lack the ability to generalize outside of their training set. Here, we show that physical constraints can be enforced in neural networks, either approximately by adapting the loss function or to within machine precision by adapting the architecture. As these physical constraints are insufficient to guarantee generalizability, we additionally propose to physically rescale the training and validation data to improve the ability of neural networks to generalize to unseen climates.
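A minimal sketch of the two enforcement strategies on a toy linear conservation constraint, sum of outputs equals a known budget c(x) (the toy constraint and layer sizes are illustrative, not the paper's exact configuration):

```python
# Soft: penalize violations in the loss. Hard: close the budget in the architecture.
import torch
import torch.nn as nn

def soft_constraint_loss(y_pred, y_true, c, weight=1.0):
    """Approximate enforcement: MSE plus a penalty on constraint violation."""
    mse = ((y_pred - y_true) ** 2).mean()
    violation = ((y_pred.sum(dim=1) - c) ** 2).mean()
    return mse + weight * violation

class HardConstraintNet(nn.Module):
    """Predicts n-1 outputs; the last is set so the constraint holds exactly."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(),
                                 nn.Linear(64, n_out - 1))

    def forward(self, x, c):
        free = self.net(x)
        last = c - free.sum(dim=1, keepdim=True)  # residual closes the budget
        return torch.cat([free, last], dim=1)     # exact to machine precision
```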
Tom Beucler, Pierre Gentine, Janni Yuval, Ankitesh Gupta, Liran Peng, Jerry Lin, Sungduk Yu, Stephan Rasp, Fiaz Ahmed, Paul A. O'Gorman, J. David Neelin, Nicholas J. Lutsko, Michael Pritchard
Projecting climate change is a generalization problem: we extrapolate the recent past using physical models across past, present, and future climates. Current climate models require representations of processes that occur at scales smaller than model grid size, which have been the main source of model projection uncertainty. Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on. To get the best of the physical and statistical worlds, we propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms, and show that it can maintain high offline accuracy across a wide range of climate conditions and configurations in three distinct atmospheric models. Our results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency, data efficiency, and generalizability across climate regimes.
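As an example of the kind of input transformation this framework relies on, the sketch below maps specific humidity, whose distribution shifts strongly under warming, to relative humidity, whose distribution is far more climate-invariant (constants follow a standard Tetens-type saturation fit; not necessarily the paper's exact formulation):

```python
import numpy as np

def relative_humidity(q, T, p):
    """q: specific humidity [kg/kg], T: temperature [K], p: pressure [Pa]."""
    e_sat = 610.94 * np.exp(17.625 * (T - 273.15) / (T - 30.11))  # sat. vapor [Pa]
    e = q * p / (0.622 + 0.378 * q)                               # vapor pressure
    return e / e_sat

print(relative_humidity(q=0.01, T=290.0, p=9e4))  # ~0.75, lower troposphere
```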
Saranya Ganesh S., Tom Beucler, Frederick Iat-Hin Tam, Milton S. Gomez, Jakob Runge, Andreas Gerhardus
Robust feature selection is vital for creating reliable and interpretable machine learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 and PCMCI, as implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (multiple linear regression, random forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
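A hedged sketch of the causal feature selection step with Tigramite (single-dataset shown for brevity; the Multidata variant pools an ensemble of time series; import paths follow Tigramite 5.x and may differ in older versions; variable names and thresholds are illustrative):

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))          # columns: candidate drivers + target
dataframe = pp.DataFrame(data, var_names=["shear", "sst", "rh", "intensity"])

pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.01)  # stricter alpha prunes
                                                     # more spurious links
# Keep only lagged variables with significant links into the target (index 3):
p = results["p_matrix"]                   # shape: (n_vars, n_vars, tau_max + 1)
causal_parents = np.argwhere(p[:, 3, 1:] < 0.01)     # (driver, lag-1) pairs
```

Only the selected `causal_parents` are then passed as inputs to the downstream regression models.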
Tom Beucler, Erwan Koch, Sven Kotlarski, David Leutwyler, Adrien Michel, Jonathan Koh
We review how machine learning has transformed our ability to model the Earth system, and how we expect recent breakthroughs to benefit end-users in Switzerland in the near future. Drawing from our review, we identify three recommendations. Recommendation 1: Develop Hybrid AI-Physical Models: Emphasize the integration of AI and physical modeling for improved reliability, especially for longer prediction horizons, acknowledging the delicate balance between knowledge-based and data-driven components required for optimal performance. Recommendation 2: Emphasize Robustness in AI Downscaling Approaches: Favor techniques that respect physical laws, preserve inter-variable dependencies and spatial structures, and accurately represent extremes at the local scale. Recommendation 3: Promote Inclusive Model Development: Ensure Earth System Model development is open and accessible to diverse stakeholders, enabling forecasters, the public, and AI/statistics experts to use, develop, and engage with the model and its predictions/projections.
Monika Feldmann, Tom Beucler, Milton Gomez, Olivia Martius
Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations of dynamic and thermodynamic variables and the vertical structure of the atmosphere. Advancing the assessment of AI models towards process-based evaluations lays the foundation for hazard-driven applications. We assess the forecast skill of three top-performing AI models for convective parameters at lead times of up to 10 days against reanalysis and ECMWF's operational numerical weather prediction model IFS. In a case study and seasonal analyses, we see the best performance by GraphCast and Pangu-Weather: these models match or even exceed the performance of IFS for instability and shear. This opens opportunities for fast and inexpensive predictions of severe weather environments.
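A short sketch of one shear diagnostic of the kind evaluated here: bulk wind shear as the magnitude of the vector wind difference between two levels (using 500 hPa as a rough proxy for the 6 km level is a simplifying assumption for single-level model output):

```python
import numpy as np

def bulk_shear(u_sfc, v_sfc, u_upper, v_upper):
    """Bulk shear magnitude [m/s] from two levels of u/v winds."""
    return np.hypot(u_upper - u_sfc, v_upper - v_sfc)

print(bulk_shear(u_sfc=2.0, v_sfc=1.0, u_upper=25.0, v_upper=-5.0))  # ~23.8 m/s
```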
Louise Largeau, Tom Beucler, David Leutwyler, Gregoire Mariethoz, Valerie Chavez-Demoulin, Erwan Koch
The coarse spatial resolution of gridded climate models, such as general circulation models, limits their direct use in projecting socially relevant variables like extreme precipitation. Most downscaling methods estimate the conditional distributions of extremes by generating large ensembles, complicating the assessment of robustness under distributional transformations, such as those induced by climate change. To better understand and potentially improve robustness, we propose super-resolving the parameters of the target variable's probability distribution directly using analytically tractable mappings. Within a perfect-model framework over Switzerland, we demonstrate that vector generalized linear and additive models can super-resolve the generalized extreme value distribution of summer hourly precipitation extremes from coarse precipitation fields and topography. We introduce the notion of a "robustness gap", defined as the difference in predictive error between present-trained and future-trained models, and use it to diagnose how model structure affects the generalization of each quantile to a pseudo-global warming scenario. By evaluating multiple model configurations, we also identify an upper limit on the super-resolution factor based on the spatial auto- and cross-correlation of precipitation and elevation, beyond which coarse precipitation loses predictive value. Our framework is broadly applicable to variables governed by parametric distributions and offers a model-agnostic diagnostic for understanding when and why empirical downscaling generalizes to climate change and extremes.
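A minimal Python sketch of the two ingredients, fitting a generalized extreme value (GEV) distribution to block maxima and linking a parameter linearly to coarse predictors (this is a crude stand-in for the paper's vector generalized linear/additive models, with synthetic data throughout):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
maxima = genextreme.rvs(c=-0.1, loc=10, scale=3, size=60)  # synthetic block maxima
shape, loc, scale = genextreme.fit(maxima)                 # ML estimates
# Note scipy's sign convention: its shape parameter c equals minus the GEV xi.

# A rough proxy for "super-resolving" the location parameter from coarse fields:
coarse_precip, elevation = rng.normal(size=(2, 60))
X = np.column_stack([np.ones(60), coarse_precip, elevation])
beta, *_ = np.linalg.lstsq(X, maxima, rcond=None)          # linear link on loc
```

The robustness gap would then be measured by comparing the predictive error of such a model trained on present-climate data against one trained on future-climate data.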
Miriam Simm, Corinna Hoose, Tom Beucler
Conformal prediction can yield statistically valid prediction intervals for any regression model, with no model modifications and small computational costs. To assess its practical value, we apply conformal methods to quantify uncertainty in machine learning emulators of six microphysical process rates. Microphysical process rates describe small-scale processes in atmospheric clouds such as precipitation formation and aerosol-cloud interactions, and help understand weather and climate. The emulators are trained on simulation output from the ICOsahedral Nonhydrostatic (ICON) model in a limited-area numerical weather prediction configuration. We compare split conformal prediction for deterministic emulators with conformalized quantile regression for quantile regression emulators. Both conformal prediction methods yield well-calibrated and sharp prediction intervals on average, but conformalized quantile regression provides more consistent intervals across several orders of magnitude, making it preferable for the uncertainty quantification of climate variables.
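A compact sketch of the two calibration rules compared here, following the standard split conformal and conformalized quantile regression (CQR) formulas (variable names are illustrative):

```python
import numpy as np

def split_conformal(residuals_cal, alpha=0.1):
    """Half-width giving (1 - alpha) coverage from calibration residuals."""
    n = len(residuals_cal)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(np.abs(residuals_cal), level)  # interval: pred +/- q

def cqr_margin(y_cal, q_lo_cal, q_hi_cal, alpha=0.1):
    """CQR correction added to a [q_lo, q_hi] prediction pair at test time."""
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level)  # interval: [q_lo - m, q_hi + m]
```

Because the CQR interval inherits its width from the quantile emulator, it can adapt across the orders of magnitude spanned by microphysical process rates, consistent with the comparison above.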
Milton S. Gomez, Tom Beucler
While extensive guidance exists for ensuring the reproducibility of one's own study, there is little discussion regarding the reproduction and replication of external studies within one's own research. To initiate this discussion, drawing lessons from our experience reproducing an operational product for predicting tropical cyclogenesis, we present a two-dimensional framework to offer guidance on reproduction and replication. Our framework, representing model fitting on one axis and its use in inference on the other, builds upon three key aspects: the dataset, the metrics, and the model itself. By assessing the trajectories of our studies on this 2D plane, we can better inform the claims made using our research. Additionally, we use this framework to contextualize the utility of benchmark datasets in the atmospheric sciences. Our two-dimensional framework provides a tool for researchers, especially early career researchers, to incorporate prior work in their own research and to inform the claims they can make in this context.
Tom Beucler, David Leutwyler, Julia Windmiller
On small scales, the tropical atmosphere tends to be either moist or very dry. This defines two states that, on large scales, are separated by a sharp margin, well-identified by the anti-mode of the bimodal tropical column water vapor distribution. Despite recent progress in understanding physical processes governing the spatio-temporal variability of tropical water vapor, the behavior of this margin remains elusive, and we lack a simple framework to understand the bimodality of tropical water vapor in observations. Motivated by the success of coarsening theory in explaining bimodal distributions, we leverage its methodology to relate the moisture field's spatial organization to its time-evolution. This results in a new diagnostic framework for the bimodality of tropical water vapor, from which we argue that the length of the margin separating moist from dry regions should evolve towards a minimum in equilibrium. As the spatial organization of moisture is closely related to the organization of tropical convection, we therefore introduce a new organization index (BLW) measuring the ratio of the margin's length to the circumference of a well-defined equilibrium shape. Using BLW, we assess the evolution of self-aggregation in idealized cloud-resolving simulations of radiative-convective equilibrium and contrast it to the time-evolution of the Atlantic Inter-Tropical Convergence Zone (ITCZ) in the ERA5 meteorological re-analysis product. We find that BLW successfully captures aspects of convective organization ignored by more traditional metrics, while offering a new perspective on the seasonal cycle of convective organization in the Atlantic ITCZ.
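A toy sketch of the geometric ingredient behind such an index: estimating the moist/dry margin length by counting interfaces between adjacent cells of a binary column-water-vapor mask (a grid-cell edge count, not the paper's exact estimator):

```python
import numpy as np

def margin_length(mask):
    """Number of cell edges separating moist (1) from dry (0) cells."""
    horiz = np.abs(np.diff(mask, axis=1)).sum()
    vert = np.abs(np.diff(mask, axis=0)).sum()
    return horiz + vert

mask = np.zeros((8, 8), dtype=int)
mask[2:6, 2:6] = 1                      # one compact moist patch
print(margin_length(mask))              # 16 edges: compact shapes minimize length
```

Normalizing this length by the circumference of an equilibrium shape of the same area yields a dimensionless organization measure in the spirit of BLW.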
Jerry Lin, Sungduk Yu, Liran Peng, Tom Beucler, Eliot Wong-Toi, Zeyuan Hu, Pierre Gentine, Margarita Geleta, Mike Pritchard
Machine-learning (ML) parameterizations of subgrid processes (here of turbulence, convection, and radiation) may one day replace conventional parameterizations by emulating high-resolution physics without the cost of explicit simulation. However, uncertainty about the relationship between offline and online performance (i.e., when integrated with a large-scale general circulation model (GCM)) hinders their development. Much of this uncertainty stems from limited sampling of the noisy, emergent effects of upstream ML design decisions on downstream online hybrid simulation. Our work rectifies the sampling issue via the construction of a semi-automated, end-to-end pipeline for $\mathcal{O}(100)$-member ensembles of hybrid simulations, revealing important nuances in how systematic reductions in offline error manifest in changes to online error and online stability. For example, removing dropout and switching from a Mean Squared Error (MSE) to a Mean Absolute Error (MAE) loss both reduce offline error, but they have opposite effects on online error and online stability. Other design decisions, like incorporating memory, converting moisture input from specific humidity to relative humidity, using batch normalization, and training on multiple climates do not come with any such compromises. Finally, we show that ensemble sizes of $\mathcal{O}(100)$ may be necessary to reliably detect causally relevant differences online. By enabling rapid online experimentation at scale, we can empirically settle debates regarding subgrid ML parameterization design that would have otherwise remained unresolved in the noise.
Antoine Leclerc, Erwan Koch, Monika Feldmann, Daniele Nerini, Tom Beucler
Issuing timely severe weather warnings helps mitigate potentially disastrous consequences. Recent advancements in Neural Weather Models (NWMs) offer a computationally inexpensive and fast approach for forecasting atmospheric environments on a 0.25° global grid. For thunderstorms, these environments can be empirically post-processed to predict wind gust distributions at specific locations. With the Pangu-Weather NWM, we apply a hierarchy of statistical and deep learning post-processing methods to forecast hourly wind gusts up to three days ahead. To ensure statistical robustness, we constrain our probabilistic forecasts using generalized extreme value distributions across five regions in Switzerland. Using a convolutional neural network to post-process the predicted atmospheric environment's spatial patterns yields the best results, outperforming direct forecasting approaches across lead times and wind gust speeds. Our results confirm the added value of NWMs for extreme wind forecasting, especially for designing more responsive early-warning systems.
Savannah L. Ferretti, Jerry Lin, Sara Shamekh, Jane W. Baldwin, Michael S. Pritchard, Tom Beucler
Machine learning models can represent climate processes that are nonlocal in horizontal space, height, and time, often by combining information across these dimensions in highly nonlinear ways. While this can improve predictive skill, it makes learned relationships difficult to interpret and prone to overfitting as the extent of nonlocal information grows. We address this challenge by introducing data-driven integration kernels, a framework that adds structure to nonlocal operator learning by explicitly separating nonlocal information aggregation from local nonlinear prediction. Each spatiotemporal predictor field is first integrated using learnable kernels (defined as continuous weighting functions over horizontal space, height, and/or time), after which a local nonlinear mapping is applied only to the resulting kernel-integrated features and optional local inputs. This design confines nonlinear interactions to a small set of integrated features and makes each kernel directly interpretable as a weighting pattern that reveals which horizontal locations, vertical levels, and past timesteps contribute most to the prediction. We demonstrate the framework for South Asian monsoon precipitation using a hierarchy of neural network models with increasing structure, including baseline, nonparametric kernel, and parametric kernel models. Across this hierarchy, kernel models achieve near-baseline performance with far fewer trainable parameters, indicating that much of the relevant nonlocal information can be captured through a small set of interpretable integrations when appropriate structural constraints are imposed.
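A minimal sketch of the kernel-integration design (shapes and layer sizes are illustrative, not the paper's architecture): learnable, normalized weights integrate a predictor over vertical levels, and only the resulting integrated features enter a local nonlinear mapping.

```python
import torch
import torch.nn as nn

class KernelIntegration(nn.Module):
    """Learnable weighting functions over levels; each kernel sums to one."""
    def __init__(self, n_levels, n_kernels=4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_kernels, n_levels))

    def forward(self, x):                      # x: (batch, n_levels)
        w = torch.softmax(self.logits, dim=1)  # normalized kernel weights
        return x @ w.T                         # (batch, n_kernels) features

n_levels = 30
model = nn.Sequential(KernelIntegration(n_levels),
                      nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
y = model(torch.randn(8, n_levels))            # nonlinearity sees only 4 features
# After training, softmax(model[0].logits) shows which levels each kernel weights,
# making the nonlocal aggregation directly interpretable.
```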
Griffin Mooers, Mike Pritchard, Tom Beucler, Jordan Ott, Galen Yacalis, Pierre Baldi, Pierre Gentine
We explore the potential of feed-forward deep neural networks (DNNs) for emulating cloud superparameterization in realistic geography, using offline fits to data from the Super Parameterized Community Atmospheric Model. To identify the network architecture of greatest skill, we formally optimize hyperparameters using ~250 trials. Our DNN explains over 70 percent of the temporal variance at the 15-minute sampling scale throughout the mid-to-upper troposphere. Comparing autocorrelation timescales against DNN skill suggests that the weaker fit in the tropical marine boundary layer is driven by the difficulty of emulating fast, stochastic convective signals. However, spectral analysis in the temporal domain indicates skillful emulation of signals on diurnal to synoptic scales. A close look at the diurnal cycle reveals correct emulation of land-sea contrasts and vertical structure in the heating and moistening fields, but some distortion of precipitation. Sensitivity tests targeting precipitation skill reveal complementary effects of adding positive constraints vs. hyperparameter tuning, motivating the use of both in the future. A first attempt to force an offline land model with DNN-emulated atmospheric fields produces reassuring results, further supporting the viability of neural network emulation in real-geography settings. Overall, the fit skill is competitive with recent attempts by sophisticated Residual and Convolutional Neural Network architectures trained on added information, including memory of past states. Our results confirm the parameterizability of superparameterized convection with continents through machine learning, and we highlight the advantages of casting this problem locally in space and time for accurate emulation and, hopefully, rapid implementation of hybrid climate models.
Helge Heuer, Tom Beucler, Mierk Schwabe, Julien Savre, Manuel Schlund, Veronika Eyring
Persistent systematic errors in Earth system models (ESMs) arise from difficulties in representing the full diversity of subgrid, multiscale atmospheric convection and turbulence. Machine learning (ML) parameterizations trained on short high-resolution simulations show strong potential to reduce these errors. However, stable long-term atmospheric simulations with hybrid (physics + ML) ESMs remain difficult, as neural networks (NNs) trained offline often destabilize online runs. Training convection parameterizations directly on coarse-grained data is challenging, notably because scales cannot be cleanly separated. This issue is mitigated using data from superparameterized simulations, which provide clearer scale separation. Yet, transferring a parameterization from one ESM to another remains difficult due to distribution shifts that induce large inference errors. Here, we present a proof-of-concept where a ClimSim-trained, physics-informed NN convection parameterization is successfully transferred to ICON-A. The scheme is (a) trained on adjusted ClimSim data with subtracted radiative tendencies, and (b) integrated into ICON-A. The NN parameterization predicts its own error, enabling mixing with a conventional convection scheme when confidence is low, thus making the hybrid AI-physics model tunable with respect to observations and reanalysis through mixing parameters. This improves process understanding by constraining convective tendencies across column water vapor, lower-tropospheric stability, and geographical conditions, yielding interpretable regime behavior. In AMIP-style setups, several hybrid configurations outperform the default convection scheme (e.g., improved precipitation statistics). With additive input noise during training, both hybrid and pure-ML schemes lead to stable simulations and remain physically consistent for at least 20 years.
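A toy sketch of the confidence-based blending described above (the exponential weighting form and names are assumptions for illustration; the actual mixing parameters are tuned against observations and reanalysis):

```python
# The NN's self-predicted error decides how much weight it gets relative to
# the conventional convection scheme.
import numpy as np

def blended_tendency(t_nn, t_conv, predicted_error, error_scale=1.0):
    """Convex combination; low predicted NN error -> more NN weight."""
    alpha = np.exp(-predicted_error / error_scale)   # confidence in (0, 1]
    return alpha * t_nn + (1 - alpha) * t_conv

t_nn, t_conv = np.array([1.2, -0.3]), np.array([0.8, -0.1])
print(blended_tendency(t_nn, t_conv, predicted_error=np.array([0.1, 2.0])))
```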