Ashesh Chattopadhyay, Jaideep Pathak, Ebrahim Nabizadeh, Wahid Bhimji, Pedram Hassanzadeh
Recent years have seen a surge of interest in building deep learning-based, fully data-driven models for weather prediction. Such deep learning models, if trained on observations, can mitigate certain biases in current state-of-the-art weather models, some of which stem from inaccurate representation of subgrid-scale processes. However, these data-driven models, being over-parameterized, require a large amount of training data, which may not be available from reanalysis (observational) products. Moreover, an accurate, noise-free initial condition with which to start forecasting with a data-driven weather model is not available in realistic scenarios. Finally, deterministic data-driven forecasting models suffer from long-term instability and unphysical climate drift, which makes them unsuitable for computing climate statistics. Given these challenges, previous studies have tried to pre-train deep learning-based weather forecasting models on a large amount of imperfect long-term climate-model simulations and then re-train them on the available observational data. In this paper, we propose a convolutional variational autoencoder-based stochastic data-driven model that is pre-trained on an imperfect climate-model simulation of a two-layer quasi-geostrophic flow and re-trained, using transfer learning, on a small number of noisy observations from a perfect simulation. This re-trained model then performs stochastic forecasting from a noisy initial condition sampled from the perfect simulation. We show that our ensemble-based stochastic data-driven model outperforms a baseline deterministic encoder-decoder-based convolutional model in short-term skill while remaining stable in long-term climate simulations, yielding an accurate climatology.
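As a concrete illustration of the forecasting setup described above, the following is a minimal PyTorch sketch of a convolutional variational autoencoder used as a stochastic one-step forecaster: the encoder maps the current state to a latent Gaussian, and repeated sampling of the latent yields an ensemble of next states. The architecture, field size, latent dimension, and ensemble size are illustrative assumptions, not the exact model used in the paper.

```python
# Minimal sketch (not the authors' exact architecture): a convolutional VAE that
# maps the state at time t to a distribution over the state at time t + dt.
# Field size, channel counts, and latent dimension are illustrative assumptions.
import torch
import torch.nn as nn

class ConvVAEForecaster(nn.Module):
    def __init__(self, nx=64, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # nx -> nx/2
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # nx/2 -> nx/4
            nn.Flatten(),
        )
        feat = 64 * (nx // 4) * (nx // 4)
        self.to_mu = nn.Linear(feat, latent_dim)
        self.to_logvar = nn.Linear(feat, latent_dim)
        self.from_z = nn.Linear(latent_dim, feat)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (64, nx // 4, nx // 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.decoder(self.from_z(z)), mu, logvar

# Stochastic ensemble forecast from a single noisy initial condition:
model = ConvVAEForecaster()
x0 = torch.randn(1, 1, 64, 64)                            # stand-in for a noisy observed state
ensemble = torch.cat([model(x0)[0] for _ in range(20)])   # 20 stochastic next-step members
```

Pre-training on climate-model output and re-training on a few noisy observations would then amount to continuing optimization of the same weights on the smaller dataset.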
Ashesh Chattopadhyay, Ebrahim Nabizadeh, Eviatar Bach, Pedram Hassanzadeh
Data assimilation (DA) is a key component of many forecasting models in science and engineering. DA allows one to estimate better initial conditions using an imperfect dynamical model of the system and noisy/sparse observations available from the system. The ensemble Kalman filter (EnKF) is a DA algorithm widely used in applications involving high-dimensional nonlinear dynamical systems. However, EnKF requires evolving large ensembles of forecasts using the dynamical model of the system. This often becomes computationally intractable, especially when the number of states of the system is very large, e.g., for weather prediction. With small ensembles, the estimated background error covariance matrix in the EnKF algorithm suffers from sampling error, leading to an erroneous estimate of the analysis state (the initial condition for the next forecast cycle). In this work, we propose a hybrid ensemble Kalman filter (H-EnKF), which is applied to a two-layer quasi-geostrophic flow system as a test case. This framework uses a pre-trained deep learning-based data-driven surrogate that inexpensively generates and evolves a large data-driven ensemble of the system's states, allowing the background error covariance matrix to be computed with much smaller sampling error. The H-EnKF framework estimates a better initial condition without the need for any ad hoc localization strategies. H-EnKF can be extended to any ensemble-based DA algorithm, e.g., particle filters, which are currently difficult to use for high-dimensional systems.
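For reference, the core EnKF analysis step that H-EnKF accelerates can be sketched in a few lines of NumPy; here the large background ensemble is assumed to come from an inexpensive data-driven surrogate, and the observation operator, error covariance, and dimensions are illustrative placeholders rather than the paper's actual configuration.

```python
# Minimal stochastic-EnKF analysis sketch. The large background ensemble Xb is
# assumed to be generated cheaply by a data-driven surrogate; H, R, and the
# dimensions below are illustrative assumptions.
import numpy as np

def enkf_analysis(Xb, y, H, R, rng):
    """Xb: (n_state, n_ens) background ensemble; y: (n_obs,) observation."""
    n_state, n_ens = Xb.shape
    xb_mean = Xb.mean(axis=1, keepdims=True)
    A = Xb - xb_mean                                   # ensemble anomalies
    Pb = A @ A.T / (n_ens - 1)                         # background error covariance
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)     # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T  # perturbed obs
    return Xb + K @ (Y - H @ Xb)                       # analysis ensemble

rng = np.random.default_rng(0)
n_state, n_obs, n_ens = 200, 50, 1000                  # O(1000) surrogate-generated members
Xb = rng.standard_normal((n_state, n_ens))             # stand-in for surrogate forecasts
H = np.eye(n_obs, n_state)                             # observe the first 50 states
R = 0.1 * np.eye(n_obs)
y = rng.standard_normal(n_obs)
Xa = enkf_analysis(Xb, y, H, R, rng)
```

With O(1000) surrogate-generated members, the sampled covariance is far less noisy than with the small ensembles affordable with the dynamical model, which is the premise of the hybrid framework.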
Niloofar Asefi, Leonard Lupin-Jimenez, Tianning Wu, Ruoying He, Ashesh Chattopadhyay
Reconstructing ocean dynamics from observational data is fundamentally limited by the sparse, irregular, and Lagrangian nature of spatial sampling, particularly in subsurface and remote regions. This sparsity poses significant challenges for forecasting key phenomena such as eddy shedding and rogue waves. Traditional data assimilation methods and deep learning models often struggle to recover mesoscale turbulence under such constraints. We leverage a deep learning framework that combines neural operators with denoising diffusion probabilistic models (DDPMs) to reconstruct high-resolution ocean states from extremely sparse Lagrangian observations. By conditioning the generative model on neural operator outputs, the framework accurately captures small-scale, high-wavenumber dynamics even at $99\%$ sparsity (for synthetic data) and $99.9\%$ sparsity (for real satellite observations). We validate our method on benchmark systems, synthetic float observations, and real satellite data, demonstrating robust performance under severe spatial sampling limitations as compared to other deep learning baselines.
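One common way to condition a denoising diffusion model on a neural-operator output, consistent with the description above, is to concatenate the operator's reconstruction with the noisy sample along the channel dimension of the denoiser; the sketch below assumes this scheme, and the network, shapes, and variable names are placeholders rather than the exact architecture used here.

```python
# Sketch: conditioning a DDPM denoiser on a neural-operator reconstruction by
# channel concatenation. The tiny CNN stands in for whatever score/noise network
# is used; a real denoiser would also embed the diffusion time step.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, channels=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(                      # stand-in for a U-Net backbone
            nn.Conv2d(2 * channels, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_noisy, x_operator):
        # x_noisy: diffused ocean state; x_operator: neural-operator reconstruction
        return self.net(torch.cat([x_noisy, x_operator], dim=1))  # predicted noise

eps_hat = ConditionedDenoiser()(torch.randn(4, 1, 128, 128), torch.randn(4, 1, 128, 128))
```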
Ashesh Chattopadhyay, Adam Subel, Pedram Hassanzadeh
To make weather/climate modeling computationally affordable, small-scale processes are usually represented in terms of the large-scale, explicitly resolved processes using physics-based or semi-empirical parameterization schemes. Another approach, computationally more demanding but often more accurate, is super-parameterization (SP), which involves integrating the equations of small-scale processes on high-resolution grids embedded within the low-resolution grids of large-scale processes. Recently, studies have used machine learning (ML) to develop data-driven parameterization (DD-P) schemes. Here, we propose a new approach, data-driven SP (DD-SP), in which the equations of the small-scale processes are integrated data-drivenly using ML methods such as recurrent neural networks. Employing multi-scale Lorenz 96 systems as a testbed, we compare the cost and accuracy (in terms of both short-term prediction and long-term statistics) of parameterized low-resolution (LR), SP, DD-P, and DD-SP models. We show that, at the same computational cost, DD-SP substantially outperforms LR and is better than DD-P, particularly when scale separation is lacking. DD-SP is much cheaper than SP, yet its accuracy is the same in reproducing long-term statistics and often comparable in short-term forecasting. We also investigate generalization: when models trained on data from one system are applied to a system with different forcing (e.g., more chaotic), the models often do not generalize, particularly when short-term prediction accuracy is examined. However, we show that transfer learning, which involves re-training the data-driven model with a small amount of data from the new system, significantly improves generalization. Potential applications of DD-SP and transfer learning in climate/weather modeling and the expected challenges are discussed.
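The DD-SP idea described above can be sketched schematically as a coupled loop in which the resolved large-scale equation is stepped numerically while the embedded small-scale state is advanced by a learned recurrent model instead of a high-resolution solver. The two-tier Lorenz 96 slow tendency below is standard; the untrained GRU, its size, and the parameter values are placeholders for the trained emulator.

```python
# Schematic of DD-SP on a two-tier Lorenz 96 system: X is integrated with its
# physical tendency, while Y (normally solved on an embedded fine grid) is
# advanced by a recurrent emulator. The GRU here is an untrained placeholder.
import torch
import torch.nn as nn

K, J = 8, 32                      # slow variables, fast variables per slow variable
h, c, b, F = 1.0, 10.0, 10.0, 10.0
dt = 0.005

def dXdt(X, Y):
    # Standard Lorenz 96 slow tendency with coupling to the fast variables
    coupling = (h * c / b) * Y.reshape(K, J).sum(dim=1)
    return torch.roll(X, 1) * (torch.roll(X, -1) - torch.roll(X, 2)) - X + F - coupling

emulator = nn.GRU(input_size=K * J + K, hidden_size=256, batch_first=True)
readout = nn.Linear(256, K * J)

X, Y = torch.randn(K), 0.1 * torch.randn(K * J)
hidden = None
with torch.no_grad():
    for _ in range(100):
        X = X + dt * dXdt(X, Y)                        # physics-based step for X
        inp = torch.cat([Y, X]).view(1, 1, -1)         # condition Y on the current X
        out, hidden = emulator(inp, hidden)
        Y = readout(out).view(-1)                      # data-driven step for Y
```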
Ashesh Chattopadhyay, Pedram Hassanzadeh, Devika Subramanian
In this paper, the performance of three deep learning methods for predicting the short-term evolution and reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: an echo state network (a type of reservoir computing, RC-ESN), a deep feed-forward artificial neural network (ANN), and a recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training and testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of the numerical solver's time steps, equivalent to several Lyapunov timescales. RNN-LSTM and ANN show some prediction skill as well, with RNN-LSTM outperforming ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed.
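For readers unfamiliar with reservoir computing, the RC-ESN recipe compared above can be sketched compactly: a fixed random reservoir is driven by the input sequence, only a linear readout is trained (by ridge regression), and forecasts are produced by feeding the readout's output back as the next input. The reservoir size, spectral radius, and regularization below are illustrative, not the tuned values used in the study.

```python
# Minimal echo state network (RC-ESN) sketch: fixed random reservoir, linear
# readout trained by ridge regression. Hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 8, 500                      # input dimension (the X variables), reservoir size
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-1, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # rescale to spectral radius 0.9

def run_reservoir(U):
    """U: (T, n_in) input sequence -> (T, n_res) reservoir states."""
    r = np.zeros(n_res)
    states = []
    for u in U:
        r = np.tanh(W @ r + W_in @ u)
        states.append(r)
    return np.array(states)

# Train the readout to map the reservoir state at time t to the input at time t+1
U = rng.standard_normal((5000, n_in))              # stand-in for a training trajectory of X
R, Y = run_reservoir(U[:-1]), U[1:]
beta = 1e-4                                        # ridge regularization
W_out = Y.T @ R @ np.linalg.inv(R.T @ R + beta * np.eye(n_res))

# Autoregressive prediction: feed the readout's output back as the next input
r = R[-1]
preds = []
for _ in range(100):
    u = W_out @ r                                  # predicted next X
    preds.append(u)
    r = np.tanh(W @ r + W_in @ u)
```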
Leonard Lupin-Jimenez, Moein Darman, Subhashis Hazarika, Tianning Wu, Michael Gray, Ruoying He, Anthony Wong, Ashesh Chattopadhyay
Building on the success of AI-based atmospheric emulation, we propose an AI-based ocean emulation and downscaling framework focusing on the high-resolution regional ocean over the Gulf of Mexico. Regional ocean emulation presents unique challenges owing to the complex bathymetry and lateral boundary conditions, as well as fundamental limitations of deep learning-based frameworks, such as instability and hallucinations. In this paper, we develop a deep learning-based framework that autoregressively integrates ocean-surface variables over the Gulf of Mexico at $8$ km spatial resolution without unphysical drift over decadal time scales, and simultaneously downscales and bias-corrects them to $4$ km resolution using a physics-constrained generative model. The framework shows both short-term skill and accurate long-term statistics in terms of mean and variability.
Moein Darman, Pedram Hassanzadeh, Laure Zanna, Ashesh Chattopadhyay
Transfer learning (TL) is a powerful tool for enhancing the performance of neural networks (NNs) in applications such as weather and climate prediction and turbulence modeling. TL enables models to generalize to out-of-distribution data with minimal training data from the new system. In this study, we employ a 9-layer convolutional NN to predict the subgrid forcing in a two-layer ocean quasi-geostrophic system and examine which metrics best describe its performance and generalizability to unseen dynamical regimes. Fourier analysis of the NN kernels reveals that they learn low-pass, Gabor, and high-pass filters, regardless of whether the training data are isotropic or anisotropic. By analyzing the activation spectra, we identify why NNs fail to generalize without TL and how TL can overcome these limitations: the learned weights and biases from one dataset underestimate the out-of-distribution sample spectra as they pass through the network, leading to an underestimation of output spectra. By re-training only one layer with data from the target system, this underestimation is corrected, enabling the NN to produce predictions that match the target spectra. These findings are broadly applicable to data-driven parameterization of dynamical systems.
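The single-layer re-training described above can be sketched in PyTorch as follows: all layers of the trained network are frozen, one chosen convolutional layer is unfrozen, and only that layer is fine-tuned on a small amount of target-system data. The schematic 9-layer architecture, the choice of which layer to re-train, and the optimizer settings are illustrative assumptions.

```python
# Sketch of transfer learning by re-training a single layer: freeze everything,
# then unfreeze one chosen convolutional layer and fine-tune it on a small
# amount of target-system data. Architecture and layer choice are illustrative.
import torch
import torch.nn as nn

layers = []
channels = [2] + [64] * 8 + [2]                      # schematic 9-layer CNN
for i in range(9):
    layers += [nn.Conv2d(channels[i], channels[i + 1], 5, padding=2)]
    if i < 8:
        layers += [nn.ReLU()]
model = nn.Sequential(*layers)
# (weights trained on the source system would be loaded here)

for p in model.parameters():
    p.requires_grad = False                          # freeze all layers
for p in model[4].parameters():                      # unfreeze one layer to re-train
    p.requires_grad = True

opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
x_small = torch.randn(16, 2, 64, 64)                 # small target-system sample (stand-in)
y_small = torch.randn(16, 2, 64, 64)
loss = nn.functional.mse_loss(model(x_small), y_small)
loss.backward()
opt.step()
```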
Ashesh Chattopadhyay, Y. Qiang Sun, Pedram Hassanzadeh
Long-term stability and physical consistency are critical properties for AI-based weather models if they are going to be used for subseasonal-to-seasonal forecasts or beyond, e.g., climate change projection. However, current AI-based weather models can only provide accurate short-term forecasts, since they become unstable or physically inconsistent when time-integrated beyond a few weeks or months: they either exhibit numerical blow-up or hallucinate unrealistic dynamics of the atmospheric variables, akin to the current class of autoregressive large language models. The cause of these instabilities is unknown, and the methods used to extend their stability horizons are ad hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is \textit{spectral bias}, wherein \textit{any} deep learning architecture is biased toward learning only the large-scale dynamics and ignores the small scales completely. We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias, leading to unstable error propagation. Finally, using quasi-geostrophic flow and European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data as test cases, we bridge the gap between deep learning theory and numerical analysis to propose one mitigative solution to this unphysical behavior. We develop long-term, physically consistent data-driven models for the climate system and demonstrate accurate short-term forecasts as well as hundreds of years of time-integration with accurate mean and variability.
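A simple diagnostic in the spirit of the analysis above is to compare angle-averaged power spectra of a model's prediction and the truth; a predicted spectrum that falls off too steeply at high wavenumber is the signature of spectral bias. The sketch below uses random placeholder fields and an assumed isotropic binning; it is not the paper's exact diagnostic.

```python
# Diagnostic sketch: angle-averaged power spectra of truth vs. prediction.
# A spectrum that falls off too quickly at high wavenumber is the signature of
# the spectral bias discussed above. Fields here are random placeholders.
import numpy as np

def radial_power_spectrum(field):
    """field: (ny, nx) -> 1D angle-averaged power spectrum over integer |k| bins."""
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2 / (nx * ny)
    ky, kx = np.meshgrid(np.fft.fftfreq(ny) * ny, np.fft.fftfreq(nx) * nx, indexing="ij")
    k = np.hypot(kx, ky).astype(int)
    return np.bincount(k.ravel(), weights=power.ravel())[: min(nx, ny) // 2]

truth = np.random.randn(256, 256)        # stand-in for, e.g., a QG vorticity snapshot
pred = np.random.randn(256, 256)         # stand-in for a data-driven prediction
spec_true, spec_pred = radial_power_spectrum(truth), radial_power_spectrum(pred)
# Large spec_true/spec_pred ratios at high |k| indicate under-resolved small scales.
```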
Ashesh Chattopadhyay, Mustafa Mustafa, Pedram Hassanzadeh, Eviatar Bach, Karthik Kashinath
There is growing interest in data-driven weather prediction (DDWP), for example using convolutional neural networks such as U-NETs that are trained on data from models or reanalysis. Here, we propose 3 components to integrate with commonly used DDWP models in order to improve their physical consistency and forecast accuracy. These components are 1) a deep spatial transformer added to the latent space of the U-NETs to preserve a property called equivariance, which is related to correctly capturing rotations and scalings of features in spatio-temporal data, 2) a data-assimilation (DA) algorithm to ingest noisy observations and improve the initial conditions for next forecasts, and 3) a multi-time-step algorithm, which combines forecasts from DDWP models with different time steps through DA, improving the accuracy of forecasts at short intervals. To show the benefit/feasibility of each component, we use geopotential height at 500~hPa (Z500) from ERA5 reanalysis and examine the short-term forecast accuracy of specific setups of the DDWP framework. Results show that the equivariance-preserving networks (U-STNs) clearly outperform the U-NETs, for example improving the forecast skill by $45\%$. Using a sigma-point ensemble Kalman (SPEnKF) algorithm for DA and U-STN as the forward model, we show that stable, accurate DA cycles are achieved even with high observation noise. The DDWP+DA framework substantially benefits from large ($O(1000)$) ensembles that are inexpensively generated with the data-driven forward model in each DA cycle. The multi-time-step DDWP+DA framework also shows promises, e.g., it reduces the average error by factors of 2-3.
Anish Sambamurthy, Ashesh Chattopadhyay
Turbulent flows possess broadband, power-law spectra in which multiscale interactions couple high-wavenumber fluctuations to large-scale dynamics. Although diffusion-based generative models offer a principled probabilistic forecasting framework, we show that standard DDPMs induce a fundamental \emph{spectral collapse}: a Fourier-space analysis of the forward SDE reveals a closed-form, mode-wise signal-to-noise ratio (SNR) that decays monotonically in wavenumber $|k|$ for spectra $S(k)\!\propto\!|k|^{-\lambda}$, rendering high-wavenumber modes indistinguishable from noise and producing an intrinsic spectral bias. We reinterpret the noise schedule as a spectral regularizer and introduce power-law schedules $\beta(\tau)\!\propto\!\tau^{\gamma}$ that preserve fine-scale structure deeper into diffusion time, along with \emph{Lazy Diffusion}, a one-step distillation method that leverages the learned score geometry to bypass long reverse-time trajectories and prevent high-$k$ degradation. Applied to high-Reynolds-number 2D Kolmogorov turbulence and $1/12^\circ$ Gulf of Mexico ocean reanalysis, these methods resolve spectral collapse, stabilize long-horizon autoregression, and restore physically realistic inertial-range scaling. Together, they show that naïve Gaussian scheduling is structurally incompatible with power-law physics and that physics-aware diffusion processes can yield accurate, efficient, and fully probabilistic surrogates for multiscale dynamical systems.
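In generic DDPM notation (a sketch that may differ from the paper's exact formulation), the mode-wise SNR argument goes as follows: for the standard variance-preserving forward process $x_\tau=\sqrt{\bar\alpha_\tau}\,x_0+\sqrt{1-\bar\alpha_\tau}\,\varepsilon$, the Fourier transform acts mode by mode and the injected Gaussian noise is white, i.e., flat in $k$, so a signal with spectrum $S(k)\propto|k|^{-\lambda}$ has the per-mode ratio
\[
\mathrm{SNR}(k,\tau)=\frac{\bar\alpha_\tau\,S(k)}{(1-\bar\alpha_\tau)\,\sigma^2}\;\propto\;\frac{\bar\alpha_\tau}{1-\bar\alpha_\tau}\,|k|^{-\lambda},
\]
which decays monotonically in $|k|$ at every diffusion time $\tau$: high-wavenumber modes are lost to the noise long before the large scales are. A schedule $\beta(\tau)\propto\tau^{\gamma}$ that keeps $\bar\alpha_\tau$ close to one for longer therefore preserves fine-scale structure deeper into diffusion time.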
Ashesh Chattopadhyay, Ebrahim Nabizadeh, Pedram Hassanzadeh
Numerical weather prediction (NWP) models require ever-growing computing time and resources but still have difficulties with predicting weather extremes. Here we introduce a data-driven framework based on analog forecasting (prediction using past similar patterns) that employs a novel deep learning pattern-recognition technique (capsule neural networks, CapsNets) and an impact-based auto-labeling strategy. CapsNets are trained on mid-tropospheric large-scale circulation patterns (Z500) labeled $0-4$ depending on the existence and geographical region of surface temperature extremes over North America several days ahead. The trained networks predict the occurrence/region of cold or heat waves, using only Z500, with accuracies (recalls) of $69\%-45\%$ $(77\%-48\%)$ or $62\%-41\%$ $(73\%-47\%)$ $1-5$ days ahead. CapsNets outperform simpler techniques such as convolutional neural networks and logistic regression. Using both temperature and Z500, accuracies (recalls) with CapsNets increase to $\sim 80\%$ $(88\%)$, showing the promise of multi-modal data-driven frameworks for accurate and fast extreme weather prediction, which can augment NWP efforts in providing early warnings.
Ashesh Chattopadhyay, Michael Gray, Tianning Wu, Anna B. Lowe, Ruoying He
While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and a predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained on historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill, outperforming SSH predictions from an uncoupled, state-of-the-art dynamical ocean model forecast while reducing computation by a factor of 500,000. These results demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.
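A spectral regularizer of the kind mentioned above can be sketched as a penalty on the mismatch between predicted and target spectra above a cutoff wavenumber, added to the usual pointwise loss; the cutoff, weighting, and spectrum definition below are assumptions, not OceanNet's exact formulation.

```python
# Sketch of a spectral regularizer: penalize the mismatch between predicted and
# target spectra above a cutoff wavenumber, in addition to the usual pointwise
# loss. The cutoff and weight below are illustrative assumptions.
import torch

def mean_spectrum(x):
    """x: (batch, ny, nx) -> mean squared Fourier amplitude per kx (averaged over batch, ky)."""
    xhat = torch.fft.rfft2(x, norm="ortho")
    return (xhat.abs() ** 2).mean(dim=(0, 1))          # shape (nx//2 + 1,)

def loss_with_spectral_reg(pred, target, k_cut=32, weight=0.1):
    mse = torch.mean((pred - target) ** 2)
    spec_pred, spec_true = mean_spectrum(pred), mean_spectrum(target)
    spectral = torch.mean((spec_pred[k_cut:] - spec_true[k_cut:]) ** 2)
    return mse + weight * spectral

loss = loss_with_spectral_reg(torch.randn(8, 128, 128), torch.randn(8, 128, 128))
```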
Niloofar Asefi, Tianning Wu, Ruoying He, Ashesh Chattopadhyay
The ocean interior regulates Earth's climate but remains sparsely observed due to limited in situ measurements, while satellite observations are restricted to the surface. We present a depth-aware generative framework for reconstructing high-resolution three-dimensional ocean states from extremely sparse surface data. Our approach employs a conditional denoising diffusion probabilistic model (DDPM) trained on sea surface height and temperature observations with up to $99.9\%$ sparsity, without reliance on a background dynamical model. By incorporating continuous depth embeddings, the model learns a unified vertical representation of the ocean states and generalizes to previously unseen depths. Applied to the Gulf of Mexico, the framework accurately reconstructs subsurface temperature, salinity, and velocity fields across multiple depths. Evaluations using statistical metrics, spectral analysis, and heat transport diagnostics demonstrate recovery of both large-scale circulation and multiscale variability. These results establish generative diffusion models as a scalable approach for probabilistic ocean reconstruction in data-limited regimes, with implications for climate monitoring and forecasting.
Yifei Guan, Ashesh Chattopadhyay, Adam Subel, Pedram Hassanzadeh
There is a growing interest in developing data-driven subgrid-scale (SGS) models for large-eddy simulation (LES) using machine learning (ML). In a priori (offline) tests, some recent studies have found ML-based data-driven SGS models that are trained on high-fidelity data (e.g., from direct numerical simulation, DNS) to outperform baseline physics-based models and accurately capture the inter-scale transfers, both forward (diffusion) and backscatter. While promising, instabilities in a posteriori (online) tests and the inability to generalize to a different flow (e.g., with a higher Reynolds number, Re) remain major obstacles to broadening the applications of such data-driven SGS models. For example, many of the aforementioned studies have found instabilities that often required ad hoc remedies to stabilize the LES at the expense of reduced accuracy. Here, using 2D decaying turbulence as the testbed, we show that deep fully convolutional neural networks (CNNs) can accurately predict the SGS forcing terms and the inter-scale transfers in a priori tests and, if trained with enough samples, lead to stable and accurate a posteriori LES-CNN. Further analysis attributes the instabilities to the disproportionately lower accuracy of the CNNs in capturing backscattering when the training set is small. We also show that transfer learning, which involves re-training the CNN with a small amount of data (e.g., 1%) from the new flow, enables accurate and stable a posteriori LES-CNN for flows with 16x higher Re (as well as higher grid resolution if needed). These results show the promise of CNNs with transfer learning to provide stable, accurate, and generalizable LES for practical use.
Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar
FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at $0.25^{\circ}$ resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources and for predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.
Y. Qiang Sun, Hamid A. Pahlavan, Ashesh Chattopadhyay, Pedram Hassanzadeh, Sandro W. Lubis, M. Joan Alexander, Edwin Gerber, Aditi Sheshadri, Yifei Guan
Neural networks (NNs) are increasingly used for data-driven subgrid-scale parameterization in weather and climate models. While NNs are powerful tools for learning complex nonlinear relationships from data, there are several challenges in using them for parameterizations. Three of these challenges are 1) data imbalance related to learning rare (often large-amplitude) samples; 2) uncertainty quantification (UQ) of the predictions to provide an accuracy indicator; and 3) generalization to other climates, e.g., those with higher radiative forcing. Here, we examine the performance of methods for addressing these challenges using NN-based emulators of the Whole Atmosphere Community Climate Model (WACCM) physics-based gravity wave (GW) parameterizations as the test case. WACCM has complex, state-of-the-art parameterizations for orography-, convection-, and frontal-driven GWs. Convection- and orography-driven GWs have significant data imbalance due to the absence of convection or orography at many grid points. We address data imbalance using resampling and/or weighted loss functions, enabling the successful emulation of parameterizations for all three sources. We demonstrate that three UQ methods (Bayesian NNs, variational auto-encoders, and dropouts) provide ensemble spreads that correspond to accuracy during testing, offering criteria for when an NN gives inaccurate predictions. Finally, we show that the accuracy of these NNs decreases for a warmer climate (4XCO2). However, the generalization accuracy is significantly improved by applying transfer learning, e.g., re-training only one layer using ~1% new data from the warmer climate. The findings of this study offer insights for developing reliable and generalizable data-driven parameterizations for various processes, including (but not limited to) GWs.
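Two of the remedies discussed above can be sketched briefly: a magnitude-weighted loss to counter the imbalance toward rare, large-amplitude samples, and Monte Carlo dropout, in which dropout is left active at inference and the spread of repeated stochastic forward passes serves as an uncertainty indicator. The toy network, weighting rule, and sample counts below are illustrative assumptions.

```python
# Hedged sketches for two of the challenges above: (1) a magnitude-weighted MSE
# to emphasize rare large-amplitude samples, and (2) Monte Carlo dropout as a
# simple uncertainty indicator. Network, weighting, and sizes are illustrative.
import torch
import torch.nn as nn

def weighted_mse(pred, target, eps=1e-3):
    w = target.abs() + eps                      # up-weight rare, large-amplitude samples
    return torch.mean(w * (pred - target) ** 2) / torch.mean(w)

net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Dropout(0.1), nn.Linear(128, 40))
x, y = torch.randn(256, 40), torch.randn(256, 40)
loss = weighted_mse(net(x), y)

# Monte Carlo dropout: keep dropout active at test time and use the spread of
# repeated stochastic forward passes as an uncertainty estimate.
net.train()                                     # leaves dropout active
with torch.no_grad():
    samples = torch.stack([net(x) for _ in range(50)])
spread = samples.std(dim=0)                     # large spread ~ lower expected accuracy
```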
Subhashis Hazarika, Leonard Lupin-Jimenez, Rohit Vuppala, Ashesh Chattopadhyay, Hon Yung Wong
Efficient and sustainable maritime transport increasingly depends on reliable forecasting and adaptive routing, yet operational adoption remains difficult due to forecast latencies and the need for human judgment in rapid decision-making under changing ocean conditions. We introduce SWR-Viz, an AI-assisted visual analytics framework that combines a physics-informed Fourier Neural Operator wave forecast model with SIMROUTE-based routing and interactive emissions analytics. The framework generates near-term forecasts directly from current conditions, supports data assimilation with sparse observations, and enables rapid exploration of what-if routing scenarios. We evaluate the forecast models and SWR-Viz framework along key shipping corridors in the Japan Coast and Gulf of Mexico, showing both improved forecast stability and realistic routing outcomes comparable to ground-truth reanalysis wave products. Expert feedback highlights the usability of SWR-Viz, its ability to isolate voyage segments with high emission reduction potential, and its value as a practical decision-support system. More broadly, this work illustrates how lightweight AI forecasting can be integrated with interactive visual analytics to support human-centered decision-making in complex geospatial and environmental domains.
Rambod Mojgani, Ashesh Chattopadhyay, Pedram Hassanzadeh
Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representation of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the true state of the system, particularly for systems involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, the model error is first estimated from the differences between the observed states and the model-predicted states (the latter obtained from a number of one-time-step numerical integrations from the previous observed states). If the observations are noisy, a data assimilation (DA) technique, such as the ensemble Kalman filter (EnKF), is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine (RVM), a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, closed-form representation of the model error. Using the chaotic Kuramoto-Sivashinsky (KS) system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural and parametric model errors, representing different types of missing physics, using noise-free and noisy observations.
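The MEDIDA recipe described above can be sketched on a 1D periodic field as follows: the model-error tendency is estimated from the mismatch between an observed state and a one-time-step model prediction from the previous observed state, and that estimate is regressed onto a library of candidate terms. In the sketch below, sequentially thresholded least squares stands in for the relevance vector machine, and the placeholder solver, library, and threshold are assumptions rather than the paper's setup.

```python
# Schematic of MEDIDA on a 1D periodic field: (i) estimate the model-error
# tendency from the mismatch between an observed state and a one-step model
# prediction, (ii) regress that error onto a candidate-term library.
# Thresholded least squares stands in here for the RVM; everything is a toy.
import numpy as np

def spectral_derivative(u, order, L=2 * np.pi):
    k = 2j * np.pi * np.fft.fftfreq(u.size, d=L / u.size)
    return np.real(np.fft.ifft((k ** order) * np.fft.fft(u)))

def model_one_step(u, dt):                     # placeholder for the imperfect solver
    return u + dt * spectral_derivative(u, 2)

rng = np.random.default_rng(0)
dt, n = 1e-3, 256
u_prev = rng.standard_normal(n)                # observed (or analysis) state at time t
u_obs = model_one_step(u_prev, dt) + dt * (-spectral_derivative(u_prev, 4))  # synthetic truth at t+dt
error = (u_obs - model_one_step(u_prev, dt)) / dt          # estimated model-error tendency

# Candidate library: {u, u_x, u_xx, u_xxx, u_xxxx, u*u_x}
lib = np.column_stack([u_prev] + [spectral_derivative(u_prev, m) for m in (1, 2, 3, 4)]
                      + [u_prev * spectral_derivative(u_prev, 1)])
coef, *_ = np.linalg.lstsq(lib, error, rcond=None)
coef[np.abs(coef) < 0.1] = 0.0                 # sparsify (the RVM does this Bayesianly)
print(coef)                                    # recovers the missing -u_xxxx term
```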
Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik
We present a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator, LUCIE, that can be trained on as little as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. The long-term mean climatology of temperature, wind, precipitation, and humidity from LUCIE's simulations matches that of ERA5, as does the variability. We further demonstrate how well extreme weather events and their return periods can be estimated from a large ensemble of long-term simulations. We also discuss an improved training strategy with a hard-constrained first-order integrator to suppress autoregressive error growth, a novel spectral regularization strategy to better capture fine-scale dynamics, and an optimization algorithm that enables data-limited (as little as $2$ years of $6$-hourly data) training of the emulator without losing stability and physical consistency. Finally, we provide a scaling experiment comparing LUCIE's long-term bias as a function of the number of training samples. Importantly, LUCIE is an easy-to-use model that can be trained in just $2.4$ hours on a single A100 GPU, allowing for multiple experiments that can explore important scientific questions answerable with large ensembles of long-term simulations, e.g., the impact of different variables on the simulation, the dynamic response to external forcing, and the estimation of extreme weather events, among others.
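One plausible reading of the hard-constrained first-order integrator mentioned above (an assumption on our part, not LUCIE's documented design) is that the network only predicts a tendency, and the first-order (Euler) update $x_{t+1}=x_t+\Delta t\,f_\theta(x_t)$ is imposed architecturally rather than learned; the sketch below uses a placeholder backbone, channel count, and time step.

```python
# Hedged sketch of a hard-constrained first-order integrator: the network only
# predicts a tendency, and the Euler update x_{t+1} = x_t + dt * f(x_t) is
# imposed architecturally rather than learned. All settings are placeholders.
import torch
import torch.nn as nn

class EulerConstrainedStepper(nn.Module):
    def __init__(self, channels=4, dt=0.25):
        super().__init__()
        self.dt = dt
        self.tendency = nn.Sequential(                 # stand-in for the emulator backbone
            nn.Conv2d(channels, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.dt * self.tendency(x)          # hard first-order constraint

model = EulerConstrainedStepper()
x = torch.randn(1, 4, 48, 96)                          # stand-in for coarse ERA5-like fields
for _ in range(10):                                    # autoregressive rollout
    x = model(x)
```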
Patrick Wyrod, Ashesh Chattopadhyay, Daniele Venturi
Chaotic dynamical systems exhibit strong sensitivity to initial conditions and often contain unresolved multiscale processes, making deterministic forecasting fundamentally limited. Generative models offer an appealing alternative by learning distributions over plausible system evolutions; yet, most existing approaches focus on next-step conditional prediction rather than the structure of the underlying dynamics. In this work, we reframe forecasting as a fully generative problem by learning the joint probability distribution of lagged system states over short temporal windows and obtaining forecasts through marginalization. This new perspective allows the model to capture nonlinear temporal dependencies, represent multistep trajectory segments, and produce next-step predictions consistent with the learned joint distribution. We also introduce a general, model-agnostic training and inference framework for joint generative forecasting and show how it enables assessment of forecast robustness and reliability using three complementary uncertainty quantification metrics (ensemble variance, short-horizon autocorrelation, and cumulative Wasserstein drift), without access to ground truth. We evaluate the performance of the proposed method on two canonical chaotic dynamical systems, the Lorenz-63 system and the Kuramoto-Sivashinsky equation, and show that joint generative models yield improved short-term predictive skill, preserve attractor geometry, and achieve substantially more accurate long-range statistical behaviour than conventional conditional next-step models.
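A toy illustration of the joint-then-marginalize idea described above (with a Gaussian standing in for the generative model, which is an assumption and not the authors' method): fit a joint distribution to lagged pairs $(x_t, x_{t+1})$ and forecast by sampling $x_{t+1}$ conditioned on $x_t$, i.e., the marginal of the learned joint over the observed lag.

```python
# Toy illustration of forecasting from a learned joint distribution over lagged
# states: fit a Gaussian to pairs (x_t, x_{t+1}) and forecast by sampling the
# conditional of x_{t+1} given x_t. A Gaussian stands in for the generative model.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5000, 3
traj = np.empty((T, d))
traj[0] = rng.standard_normal(d)
A = np.array([[0.9, 0.2, 0.0], [-0.2, 0.9, 0.1], [0.0, -0.1, 0.95]])  # toy linear dynamics
for t in range(T - 1):
    traj[t + 1] = A @ traj[t] + 0.05 * rng.standard_normal(d)

pairs = np.hstack([traj[:-1], traj[1:]])               # samples of the joint (x_t, x_{t+1})
mu, cov = pairs.mean(axis=0), np.cov(pairs, rowvar=False)
m1, m2 = mu[:d], mu[d:]
S11, S12, S22 = cov[:d, :d], cov[:d, d:], cov[d:, d:]

def forecast_samples(x_t, n=100):
    """Sample x_{t+1} | x_t from the fitted joint Gaussian (marginalizing the joint)."""
    cond_mean = m2 + S12.T @ np.linalg.solve(S11, x_t - m1)
    cond_cov = S22 - S12.T @ np.linalg.solve(S11, S12)
    return rng.multivariate_normal(cond_mean, cond_cov, size=n)

ensemble = forecast_samples(traj[-1])                  # probabilistic next-step forecast
```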