CSE Community Seminar | April 24, 2026

Presenter: Yuan Yuan, graduate student, Aeronautics and Astronautics, MIT

Talk Title: Dimensionless learning based on information

Presenter: Ruizhe Huang, MechE-CSE SM student

Talk Title: Generative AI for weather data assimilation

Benchmarking Generative Models for Weather Data Assimilation on Real Station Observations

Ruizhe Huang, MechE-CSE SM student

Abstract: Weather forecasts begin with data assimilation: fusing sparse, noisy observations into a complete gridded atmospheric state. For decades the standard recipe has been variational methods such as 4D-Var (the algorithm behind ERA5), which rely on a model-forecast background and a hand-specified Gaussian error covariance, and which are expensive to run. Generative models offer a cheap alternative: replace the parametric prior with one learned directly from data. But no prior work has benchmarked generative and classical methods against each other on real station observations.
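As context for the variational baseline, the classical 3D-Var analysis can be sketched in a few lines. This is a generic textbook illustration, not the benchmark's configuration: the toy 1-D grid, background x_b, covariances B and R, and observation operator H are all assumptions chosen for the example.

```python
import numpy as np

# Minimal 3D-Var sketch on a toy 1-D grid (illustrative only).
n, m = 8, 3                          # grid points, observations
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, np.pi, n))
x_b = truth + 0.3 * rng.standard_normal(n)         # model-forecast background

# Hand-specified Gaussian background covariance with spatial correlation
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
B = 0.09 * np.exp(-dist / 2.0)
R = 0.01 * np.eye(m)                               # observation-error covariance

H = np.zeros((m, n)); H[[0, 1, 2], [1, 4, 6]] = 1  # observe 3 of 8 grid points
y = H @ truth + 0.1 * rng.standard_normal(m)       # sparse, noisy observations

# Analysis minimizing J(x) = (x - x_b)' B^-1 (x - x_b) + (y - Hx)' R^-1 (y - Hx)
K = B @ H.T @ np.linalg.solve(H @ B @ H.T + R, np.eye(m))  # gain matrix
x_a = x_b + K @ (y - H @ x_b)

print(np.abs(H @ x_b - y).max(), np.abs(H @ x_a - y).max())
```

The spatial correlation in B is what lets three point observations correct neighboring unobserved grid points; a flat diagonal B would only nudge the observed points themselves.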

We built the first such benchmark: 11,849 MADIS stations across the contiguous U.S., four surface variables, with flow-matching, diffusion, and classical methods compared under identical conditions. Starting from pure Gaussian noise, with no ERA5 input at inference, a learned generative model beats 3D-Var (35.7% vs. 33.3% RMSE reduction over ERA5), even though 3D-Var is handed the ERA5 field as input. Under sparser observation coverage, the generative advantage widens (+7.9 pp sparse vs. +2.4 pp dense): learned priors help most where observations are weakest.

Beyond the benchmark, we analyze mechanisms. We identify the Langevin corrector as a joint refinement step on the observations and the data manifold that stabilizes guided sampling, worth up to 8.5 pp of test improvement under sparse observations. Moreover, regional case studies show generative methods using observations to correct ERA5’s systematic biases, especially for wind. Finally, pixel-space Flow Guidance dominates latent alternatives on both accuracy and wall-clock time.
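A minimal sketch of the corrector idea, not the authors' implementation: a known Gaussian density stands in for a learned score model, sparse point observations guide the sampling, and Langevin steps refine the state jointly toward the observations and the prior. The dimensions, noise levels, and step sizes are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                       # state dimension (stand-in for a gridded field)
mu, sigma = 2.0, 1.0         # toy "data manifold": x ~ N(mu, sigma^2 I)
idx = np.array([3, 8])       # observed grid indices (sparse coverage)
y = np.array([2.5, 1.5])     # station observations at those indices
obs_sigma = 0.1              # assumed observation-noise std

def score(x):
    """Score of the toy prior, grad_x log p(x); a trained model would replace this."""
    return -(x - mu) / sigma**2

def guided_score(x):
    """Prior score plus observation-likelihood gradient at observed points."""
    g = score(x)
    g[idx] += (y - x[idx]) / obs_sigma**2
    return g

def langevin_corrector(x, n_steps=500, eps=5e-3):
    """Joint refinement: Langevin dynamics on the observation-conditioned density."""
    for _ in range(n_steps):
        x = x + eps * guided_score(x) + np.sqrt(2 * eps) * rng.standard_normal(d)
    return x

x = langevin_corrector(rng.standard_normal(d))  # start from pure Gaussian noise
print(np.abs(x[idx] - y).max())  # corrected state sits near the observations
```

Each step pulls the state toward both the prior density and the observations, which is the "joint refinement" behavior described above; unobserved grid points are filled in by the prior alone.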


Dimensionless learning based on information

Yuan Yuan, Graduate student, Aeronautics and Astronautics, MIT

Abstract: Dimensional analysis is one of the most fundamental tools for understanding physical systems. However, the construction of dimensionless variables, as guided by the Buckingham-𝜋 theorem, is not uniquely determined. Here, we introduce IT-𝜋, a model-free method that combines dimensionless learning with the principles of information theory. Grounded in the irreducible error theorem, IT-𝜋 identifies dimensionless variables with the highest predictive power by measuring their shared information content. The approach can rank variables by predictability, identify distinct physical regimes, uncover self-similar variables, determine the characteristic scales of the problem, and extract its dimensionless parameters. IT-𝜋 also provides a bound on the minimum predictive error achievable across all possible models, from simple linear regression to advanced deep learning techniques, naturally enabling a definition of model efficiency. We benchmark IT-𝜋 across different cases and demonstrate that it offers superior performance and capabilities compared to existing tools. Finally, we apply the method to dimensionless learning for supersonic turbulence, aerodynamic drag on both smooth and irregular surfaces, magnetohydrodynamic power generation, and laser-metal interaction.
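The ranking idea can be illustrated in miniature (this is a sketch of the general principle, not the IT-𝜋 method itself): score candidate dimensionless groups by their estimated mutual information with the output. The toy variables, the assumed relation y = f(Re), and the histogram estimator are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
U = rng.uniform(0.5, 5.0, n)     # velocity [m/s]
L = rng.uniform(0.1, 2.0, n)     # length [m]
nu = rng.uniform(1e-3, 1e-2, n)  # kinematic viscosity [m^2/s]
g = 9.81                         # gravity [m/s^2]

Re = U * L / nu                  # candidate group 1: Reynolds number
Fr = U / np.sqrt(g * L)          # candidate group 2: Froude number
y = np.log(Re) + 0.1 * rng.standard_normal(n)  # output depends only on Re

def mutual_info(a, b, bins=24):
    """Histogram estimate of I(a; b) in bits, using equal-count (quantile) bins."""
    qa = np.searchsorted(np.quantile(a, np.linspace(0, 1, bins + 1)[1:-1]), a)
    qb = np.searchsorted(np.quantile(b, np.linspace(0, 1, bins + 1)[1:-1]), b)
    joint = np.histogram2d(qa, qb, bins=bins)[0] / len(a)
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

mi_re, mi_fr = mutual_info(Re, y), mutual_info(Fr, y)
print(mi_re, mi_fr)  # the informative group Re scores higher
```

Because the output here is a function of Re alone, the Reynolds-number group shares far more information with it than the Froude-number group does, so an information-based criterion selects the right dimensionless variable without fitting any predictive model.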