Milton Gomez, Marie McGraw, Saranya Ganesh S., Frederick Iat-Hin Tam, Ilia Azizi, Samuel Darmon, Monika Feldmann, Stella Bourdin, Louis Poulain--Auzéau, Suzana J. Camargo, Jonathan Lin, Dan Chavas, Chia-Ying Lee, Ritwik Gupta, Andrea Jenney, Tom Beucler
TCBench is a benchmark for evaluating global, short- to medium-range (1 to 5 days) forecasts of tropical cyclone (TC) track and intensity. To allow a fair and model-agnostic comparison, TCBench builds on the IBTrACS observational dataset and formulates TC forecasting as predicting the time evolution of an existing tropical system conditioned on its initial position and intensity. TCBench includes state-of-the-art dynamical models (TIGGE) and neural weather models (AIFS, Pangu-Weather, FourCastNet v2, GenCast). Where tracks are not readily available, baseline tracks are derived consistently from model outputs using the TempestExtremes library. For evaluation, TCBench provides deterministic and probabilistic storm-following metrics. On 2023 test cases, neural weather models skillfully forecast TC tracks, while skillful intensity forecasts require additional steps such as post-processing. Designed for accessibility, TCBench helps AI practitioners tackle domain-relevant TC challenges and equips tropical meteorologists with data-driven tools and workflows to improve prediction and TC process understanding. By lowering barriers to reproducible, process-aware evaluation of extreme events, TCBench aims to democratize data-driven TC forecasting.
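As an illustration of what a deterministic storm-following track metric can look like, the sketch below computes the great-circle distance between forecast and observed storm centers at matching lead times and averages it. This is not the TCBench API: the function names `great_circle_km` and `mean_track_error_km`, the `(lat, lon)` tuple layout, and the use of a spherical Earth with a 6371 km mean radius are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_track_error_km(forecast, observed):
    """Mean center-position error over matching lead times.

    forecast, observed: equal-length sequences of (lat, lon) in degrees,
    where index i of both sequences refers to the same valid time.
    """
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("tracks must be non-empty and of equal length")
    errors = [
        great_circle_km(f_lat, f_lon, o_lat, o_lon)
        for (f_lat, f_lon), (o_lat, o_lon) in zip(forecast, observed)
    ]
    return sum(errors) / len(errors)
```

In a storm-following setup, `observed` would hold best-track positions (for instance from IBTrACS) and `forecast` the positions extracted from a model at the same valid times; the haversine form also handles tracks that cross the antimeridian.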
Milton Gomez, Louis Poulain--Auzeau, Alexis Berne, Tom Beucler
Numerical Weather Prediction (NWP) models that integrate coupled physical equations forward in time are the traditional tools for simulating atmospheric processes and forecasting weather. With recent advancements in deep learning, AI-based Weather Prediction models that rely on neural network architectures, termed Neural Weather Models (NeWMs), have emerged as competent medium-range NWP emulators, with performances that compare favorably to state-of-the-art NWP models. However, they are commonly trained on reanalyses with limited spatial resolution (e.g., 0.25° horizontal grid spacing), which smooths out key features of weather systems. For example, tropical cyclones (TCs), which are among the most impactful weather events due to their devastating effects on human activities, are challenging to forecast, as extrema are smoothed in deterministic forecasts at 0.25° resolution. To address this, we use our best observational estimates of wind gusts and minimum sea level pressure to train a hierarchy of post-processing models on NeWM outputs. Applied to Pangu-Weather and FourCastNet v2, the post-processing models produce accurate and reliable forecasts of TC intensity up to five days ahead. Our post-processing algorithm is tracking-independent, preventing full misses, and we demonstrate that even linear models extract predictive information from NeWM outputs beyond what is encoded in their initial conditions. While spatial masking improves probabilistic forecast consistency, we do not find clear advantages of convolutional architectures over simple multilayer perceptrons for our NeWM post-processing purposes. Overall, by combining the efficiency of NeWMs with a lightweight, tracking-independent post-processing framework, our approach improves the accessibility of global TC intensity forecasts, marking a step toward their democratization.
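To make the idea of a linear post-processing step concrete, the sketch below fits a one-predictor least-squares correction that maps a raw model-derived intensity (for example, a maximum wind taken from a NeWM output field) onto an observed intensity. It is a deliberately reduced stand-in, not the models used in the work described above: the single scalar predictor, the function names `fit_linear` and `apply_linear`, and the closed-form ordinary least-squares fit are assumptions made for this example.

```python
def fit_linear(raw, observed):
    """Ordinary least squares for observed ~ intercept + slope * raw.

    raw: model-derived intensity values (hypothetical scalar predictor).
    observed: matching observational intensity estimates.
    Returns (intercept, slope).
    """
    n = len(raw)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two matched samples")
    mean_x = sum(raw) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def apply_linear(params, raw):
    """Apply a fitted (intercept, slope) correction to new raw values."""
    intercept, slope = params
    return [intercept + slope * x for x in raw]
```

A slope above one in such a fit would be consistent with the smoothing of extrema on a coarse grid, since the correction then stretches the raw values toward stronger intensities; richer predictors (full fields, multiple lead times) would need a multivariate model rather than this scalar form.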