MechE-CSE PhD Thesis Defense | Bianca Champenois
Bianca Champenois, MechE-CSE PhD Thesis Defense Announcement
Thesis Title: Advancing Geophysical Fluid Dynamics with Physics-Driven Machine Learning
Date: Tuesday, September 23, 2025
Time: 3 PM ET
Location: 37-212 / Zoom
Thesis Committee:
- Prof. Themistoklis Sapsis, Department of Mechanical Engineering, MIT
- Prof. Pierre Lermusiaux, Department of Mechanical Engineering, MIT
- Prof. Sherrie Wang, Department of Mechanical Engineering, MIT
Abstract:
Modeling turbulent and geophysical fluid dynamics remains a fundamental challenge in computational science and engineering. These systems are inherently multiscale, nonlinear, and chaotic, requiring high-resolution simulations that are often too expensive to run in practical operations or over long time horizons. At the same time, real-world observations, collected remotely or in situ, are often sparse, noisy, and irregular in space and time. Bridging the gap between these incomplete measurements and the governing physical processes is key to improving both scientific understanding and predictive capabilities. However, many existing approaches struggle with high computational costs, inaccurate representation of nonlinear dynamics, missing dynamical processes or variables, unreliable uncertainty quantification, and poor prediction of extreme events.
These challenges are especially urgent in the context of Earth’s changing climate. The ocean, in particular, absorbs large quantities of carbon dioxide and heat, buffering the effects of global warming but also contributing to ocean acidification and altered circulation. These changes in ocean and atmospheric dynamics can intensify extreme weather and disrupt ecosystems, underscoring the need for accurate and efficient prediction tools for climate resilience and marine resource management.
This thesis develops machine learning (ML) algorithms for geophysical fluid dynamics applied to ocean warming and acidification, ocean pollution dispersion, and extreme event quantification. The first two applications focus on learning parsimonious representations of high-fidelity ocean models and developing efficient ML-based methods for integrating observational data. The third application introduces an active sampling algorithm to identify within vast datasets the data points that are most useful for understanding properties of extreme events, including their prediction. Overall, this research introduces new mathematical and computational tools to leverage data — whether they be terabytes of high-dimensional physics-based numerical simulations or sparse real-time sensor measurements — for modeling geophysical fluid dynamics more accurately and with lower computational cost.
To enable high-resolution monitoring and prediction of coastal ocean acidification properties, the first part of this thesis develops a temporal convolutional neural network that infers depth-resolved fields of temperature and salinity from surface observations. The model is trained in the vertical PCA space on data from a regional reanalysis dataset, i.e., the output of a data-assimilating operational ocean model. During inference, surface observations consisting of both satellite fields and buoy point measurements are passed as input to the trained model. Predictions from the neural network then serve as inputs to empirical Bayesian regression models that estimate dissolved inorganic carbon and alkalinity. These estimates are used to produce 4D maps of aragonite saturation state with uncertainty quantification, validated against in situ measurements in the Massachusetts and Cape Cod Bays, a region with fishing and tourism industries affected by ocean acidification.
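As a rough illustration of what working in a vertical PCA space can look like, the sketch below compresses depth profiles into a few PCA coefficients and maps them back to full profiles. It is not the thesis implementation: the synthetic exponential temperature profiles, the 40 depth levels, and the choice of 5 modes are all assumptions made for the example, and the neural network that would predict the coefficients from surface data is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for reanalysis output (assumption): 500 temperature
# profiles on 40 depth levels, decaying exponentially from a surface value.
depth = np.linspace(0.0, 100.0, 40)                # metres
surface_temp = rng.uniform(10.0, 20.0, size=500)   # deg C
decay_scale = rng.uniform(10.0, 40.0, size=500)    # metres
profiles = 6.0 + (surface_temp[:, None] - 6.0) * np.exp(
    -depth[None, :] / decay_scale[:, None]
)

# PCA along the vertical axis: centre the profiles, then take the leading
# right singular vectors as the vertical basis.
mean_profile = profiles.mean(axis=0)
anomalies = profiles - mean_profile
_, singular_values, vt = np.linalg.svd(anomalies, full_matrices=False)

n_modes = 5                          # illustrative choice
modes = vt[:n_modes]                 # shape (n_modes, n_depth)
coeffs = anomalies @ modes.T         # shape (n_profiles, n_modes)

# Map the low-dimensional coefficients back to depth-resolved profiles.
reconstructed = mean_profile + coeffs @ modes
explained = (singular_values[:n_modes] ** 2).sum() / (singular_values ** 2).sum()
rmse = np.sqrt(np.mean((reconstructed - profiles) ** 2))
```

In a setup like this, a model would only need to predict the few entries of `coeffs` per location rather than every depth level, which is one reason a reduced vertical basis can make training cheaper.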
The second part of this thesis addresses the problem of reconstructing ocean flow from sparse Lagrangian drifter trajectories. A convolutional autoencoder is trained on a regional reanalysis dataset to learn an efficient representation in the form of a nonlinear reduced-order model. Subsequently, flow fields are reconstructed by optimizing the latent variables of the autoencoder with Bayesian optimization to match observed drifter trajectories. The proposed approach takes advantage of the temporal coherence of the observed Lagrangian trajectories, as opposed to instantaneously matching Eulerian velocities, which is the typical approach in data assimilation methods. This approach also provides an efficient means of determining both the minimum number of trajectories and the optimal drifter release points for accurate flow reconstruction.
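The idea of fitting latent variables to whole trajectories, rather than to instantaneous velocities, can be sketched on a toy problem. Everything here is an assumption made for illustration: the hypothetical `decode` function is a two-parameter analytic flow (uniform current plus solid-body rotation) standing in for a trained autoencoder decoder, drifters are advected with forward Euler, and a coarse grid search stands in for Bayesian optimization.

```python
import numpy as np

def decode(latent, xy):
    """Hypothetical decoder: latent = (uniform speed, rotation rate) -> velocity at xy."""
    u0, omega = latent
    u = u0 - omega * xy[:, 1]
    v = omega * xy[:, 0]
    return np.stack([u, v], axis=1)

def advect(latent, start, n_steps=50, dt=0.1):
    """Forward-Euler drifter paths; returns an array (n_steps + 1, n_drifters, 2)."""
    path = [start]
    for _ in range(n_steps):
        path.append(path[-1] + dt * decode(latent, path[-1]))
    return np.stack(path)

def trajectory_misfit(latent, start, observed):
    """Mean squared distance between simulated and observed drifter positions."""
    return np.mean((advect(latent, start) - observed) ** 2)

rng = np.random.default_rng(1)
start = rng.uniform(-1.0, 1.0, size=(3, 2))       # three drifter release points
true_latent = np.array([0.4, -0.3])
observed = advect(true_latent, start) + 0.01 * rng.standard_normal((51, 3, 2))

# Grid search over the latent space (a simple stand-in for Bayesian optimization).
grid = np.linspace(-1.0, 1.0, 41)
candidates = np.array([(u0, omega) for u0 in grid for omega in grid])
misfits = np.array([trajectory_misfit(c, start, observed) for c in candidates])
best_latent = candidates[np.argmin(misfits)]

# The reconstructed flow can then be evaluated anywhere, not just along the paths.
reconstructed_velocity = decode(best_latent, np.array([[0.0, 0.0], [0.5, 0.5]]))
```

Because the misfit accumulates position errors along each path, small errors in the latent variables grow over time and are easier to detect than they would be from a single velocity snapshot.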
The third part of this thesis explores an active sampling algorithm that selects data points from large, fixed datasets to improve the prediction of extreme events, which are typically underrepresented in training data. The algorithm uses a likelihood-weighted uncertainty sampling criterion to sequentially select samples that reduce model uncertainty and increase accuracy in the tails of the distribution. When applied to a very high-dimensional ML climate model, the likelihood-weighted sampling improves extreme weather event prediction and identifies the training points most valuable for capturing extremes.
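One common form of a likelihood-weighted uncertainty criterion scores each candidate input x by its predictive variance multiplied by the ratio p_x(x) / p_mu(mu(x)), where mu is the surrogate's mean prediction. Whether the thesis uses exactly this form is an assumption here. The sketch below applies it to a fixed pool with a toy cubic target; the small polynomial ensemble, the noise level, and the kernel density estimate are all illustrative stand-ins for a real ML model.

```python
import numpy as np

def gaussian_kde_1d(samples, query, bandwidth):
    """Plain Gaussian kernel density estimate of `samples`, evaluated at `query`."""
    z = (query[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)

# Fixed candidate pool: inputs are standard normal, so p_x is known in this toy case.
pool = rng.standard_normal(2000)
p_x = np.exp(-0.5 * pool**2) / np.sqrt(2 * np.pi)

# Small labelled set with noisy observations of a toy target, f(x) = x**3.
labeled_idx = rng.choice(len(pool), size=15, replace=False)
x_lab = pool[labeled_idx]
y_lab = x_lab**3 + 0.1 * rng.standard_normal(15)

# Hypothetical surrogate: an ensemble of cubic fits to noise-perturbed labels.
members = [np.polyfit(x_lab, y_lab + 0.1 * rng.standard_normal(15), 3) for _ in range(20)]
preds = np.stack([np.polyval(c, pool) for c in members])
mu, var = preds.mean(axis=0), preds.var(axis=0)

# Density of the predicted output; rare (extreme) outputs get a small p_mu.
bandwidth = 1.06 * mu.std() * len(mu) ** (-0.2)   # Silverman-style rule of thumb
p_mu = gaussian_kde_1d(mu, mu, bandwidth)

# Likelihood-weighted acquisition: uncertain points whose predicted output is rare.
acquisition = var * p_x / p_mu

# Select the next batch from the pool, skipping points that are already labelled.
scores = acquisition.copy()
scores[labeled_idx] = -np.inf
chosen = np.argsort(scores)[-5:]
```

In a sequential loop, the `chosen` points would be labelled, added to the training set, and the surrogate refit before the next round of selection.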
This thesis demonstrates that carefully designed ML algorithms that (i) incorporate dynamics from existing datasets produced by physics-based data-assimilation models and (ii) intelligently integrate real-time multi-modal observations result in high-accuracy, parsimonious models without the need for tuning or parameter calibration. Carefully selecting ML training data can also improve the prediction of extreme events. On the practical side, the resulting algorithms enable a real-time monitoring tool that has been experimentally validated in the Massachusetts and Cape Cod Bays, providing local stakeholders with a platform for monitoring, decision-making, and future planning.