CSE Community Seminar | October 10, 2025

Presenter: Daniel Sharp, PhD Student, Department of Aeronautics and Astronautics, MIT

Talk Title: Learning a quadrature rule by quantizing the measure: using data-driven gradient flows for MMD minimization

Presenter: Mohamed Elrefaie, PhD Student, Department of Mechanical Engineering, MIT

Talk Title: DrivAerNet++: Toward Scalable Datasets for Engineering Design—Challenges, Infrastructure, and Future Directions

Time: 12:00–1:00 PM

Location: 45-432

Talk 1 Abstract

Uncertainty quantification methods often require discretizing an integral, particularly when performing function approximation or estimating a quantity of interest. However, many traditional methods assume intrusive knowledge of the distribution with respect to which we integrate. In this talk, we discuss discretizations formed by summarizing Monte Carlo samples. For example, we may want to estimate the average of an expensive black-box model over thousands of data points while evaluating the model only ten times. We propose to summarize the dataset by finding weighted points that quantize the data distribution. We then present a method based on Wasserstein-Fisher-Rao gradient flows that finds this quantized distribution by minimizing its maximum mean discrepancy (MMD) relative to the larger dataset. We show that this characterization naturally minimizes the error in expectations, relative to the larger dataset, for a particular class of functions. We then demonstrate a preconditioning of this system that resembles a relaxation of the popular Lloyd's algorithm for clustering. Finally, we apply this approach to clustering, particularly for high-dimensional datasets and adversarial initializations.
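As a rough illustration of the objective named in the Talk 1 abstract, the sketch below computes the squared MMD between a dataset with uniform weights and a small set of weighted points. It uses the standard definition MMD² = E_PP[k] + E_QQ[k] − 2·E_PQ[k]. The Gaussian kernel, its bandwidth, and the function names are illustrative assumptions. This is not the speaker's Wasserstein-Fisher-Rao gradient-flow method, which would move and reweight the points to drive this quantity down.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    # Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 h^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def weighted_mmd_squared(data, points, weights, bandwidth=1.0):
    """Squared MMD between the empirical measure of `data` (uniform weights)
    and the weighted quantization sum_j weights[j] * delta(points[j])."""
    n = len(data)
    # Mean kernel value between pairs of data points
    pp = sum(gaussian_kernel(a, b, bandwidth) for a in data for b in data) / n ** 2
    # Weighted kernel value between pairs of quantization points
    qq = sum(wi * wj * gaussian_kernel(a, b, bandwidth)
             for a, wi in zip(points, weights)
             for b, wj in zip(points, weights))
    # Cross term between the data and the weighted points
    pq = sum(wj * gaussian_kernel(a, b, bandwidth)
             for a in data
             for b, wj in zip(points, weights)) / n
    # Clamp tiny negative values caused by floating-point rounding
    return max(pp + qq - 2.0 * pq, 0.0)

# Hypothetical 1-D dataset with two clusters
data = [(0.0,), (0.1,), (1.0,), (1.1,)]
one_point = weighted_mmd_squared(data, [(0.5,)], [1.0])
two_points = weighted_mmd_squared(data, [(0.05,), (1.05,)], [0.5, 0.5])
# The two-point summary, one point per cluster, gives the smaller value
print(one_point, two_points)
```

For kernels like this one, the MMD bounds the worst-case error in expectations over the unit ball of the kernel's reproducing kernel Hilbert space. This is one way to read the abstract's "particular class of functions." Placing one point per cluster, as Lloyd-style clustering would, yields a much smaller discrepancy than a single central point.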

Talk 2 Abstract

This talk presents DrivAerNet++, the largest and most comprehensive multimodal dataset for car design, comprising 8,000 diverse geometries modeled with high-fidelity CFD simulations. Each entry is provided with 3D meshes, parametric models, segmented components, aerodynamic coefficients, detailed flow fields, and several machine learning representations, covering multiple archetypes (fastback, notchback, estateback) and drivetrain configurations (ICE and EV). With more than 39 TB of open-source engineering data, DrivAerNet++ is positioned to fill a critical gap by enabling data-driven design optimization, generative modeling, surrogate training, CFD acceleration, and geometry-aware classification. The talk describes the dataset's design, validation, and benchmarking (e.g., aerodynamic drag prediction) and discusses the infrastructure and computational challenges of building large-scale engineering datasets. Emphasis is placed on the multimodal nature of DrivAerNet++, its application to training geometric deep learning models (including graph networks, transformers, and neural operators), and the role of HPC resources in scaling to foundation models for physics. The talk also highlights the dataset's role in supporting emerging multi-agent design systems, in which agents for styling, CAD, meshing, and simulation interact to close the loop between creativity and performance. Finally, early results on 3D-conditioned shape optimization are presented, demonstrating how geometric deep learning models trained on DrivAerNet++ can guide inverse design and accelerate engineering workflows. These results mark a step toward AI-native engineering design and scalable multiphysics foundation models.