CSE Community Seminar | April 24, 2026

Presenter: Evelyne Ringoot, Math-CSE PhD student

Talk Title: Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision

Presenter: Ruizhe Huang, MechE-CSE SM student

Talk Title: Generative AI for weather data assimilation

Time: 12:00–1:00 PM

Location: 45-432

Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision

Evelyne Ringoot, Math-CSE PhD student

Abstract: This talk presents a portable, GPU-accelerated implementation of a QR-based singular value computation algorithm in Julia, also presented at ICPP25. The singular value decomposition (SVD) is a fundamental numerical tool in scientific computing and machine learning, providing optimal low-rank matrix approximations. The implemented algorithm is based on the classic two-stage QR reduction, which successively reduces the matrix first to band form and then to bidiagonal form. Our implementation leverages Julia’s multiple dispatch and metaprogramming capabilities, integrating with the GPUArrays and KernelAbstractions frameworks to provide a unified, type- and hardware-agnostic function. It supports diverse GPU architectures and data types, and is, to our knowledge, the first GPU-accelerated singular value solver to support Apple Metal GPUs and half precision. We explore GPU kernel optimization through parameter tuning to enable efficient parallelism and improved memory locality. Performance results on multiple GPU backends and data types demonstrate that portability does not require sacrificing performance: the unified function outperforms most linear algebra libraries (MAGMA, SLATE, rocSOLVER, oneMKL) for matrix sizes larger than 1024 × 1024, and achieves 80–90% of the performance of cuSOLVER for large matrices, highlighting Julia’s suitability for high-performance linear algebra in heterogeneous environments.
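The "optimal low-rank matrix approximation" property mentioned in the abstract is the Eckart–Young theorem: truncating the SVD to its largest k singular values gives the best rank-k approximation in the Frobenius norm. The sketch below illustrates this with NumPy; it is purely illustrative and is not the talk's Julia GPU implementation.

```python
# Illustration of the Eckart-Young property of the SVD (not the talk's code):
# the truncated SVD is the optimal rank-k approximation, and its Frobenius
# error equals the norm of the discarded singular values.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))

# Thin SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 8
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

err = np.linalg.norm(A - A_k, "fro")   # achieved approximation error
tail = np.sqrt(np.sum(s[k:] ** 2))     # norm of discarded singular values
assert np.isclose(err, tail)
```

On a GPU, the expensive step is not this truncation but computing the singular values themselves, which is what the two-stage QR reduction in the talk accelerates.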


Generative AI for weather data assimilation

Ruizhe Huang, MechE-CSE SM student

Abstract: To anchor weather products in reality, data assimilation integrates observational data into physical simulations of the atmosphere. Traditional approaches do this by using numerical model forecasts as a prior, which is computationally expensive. Today, researchers are exploring deep generative models, such as diffusion models, as emulators that reconstruct full weather fields directly from sparse observations, but existing guidance-based approaches can be unstable or have not been evaluated under real-world conditions. We introduce GLaD-Flow (Guided Latent D-Flow), which combines guidance and D-Flow within the latent space. It first uses Latent D-Flow to optimize the initial latent noise against an observation loss, and then generates full fields with observation guidance starting from that optimized noise.
We conduct a comprehensive benchmark over the Continental United States (CONUS) by training the flow model on 2017–2022 and testing on 2023. We generate full ERA5-like fields for four surface variables (the two components of 10-meter wind, 2-meter temperature, and 2-meter dewpoint) from sparse ground station observations. We test the generalizability of our method by evaluating performance on held-out test weather stations. Our results show that GLaD-Flow reduces the Root Mean Square Error (RMSE) compared to ERA5 by over 31% on average across 1,778 test stations, while retaining ERA5 physics. We estimate that GLaD-Flow reduces ERA5 error by 22.0% at median-distance locations across the CONUS, demonstrating meaningful generalization beyond the immediate vicinity of observation stations. Our work demonstrates that unconditional generative models, particularly the GLaD-Flow framework, provide a promising tool for reducing the cost and improving the accuracy of weather products.
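The core D-Flow idea used by GLaD-Flow, optimizing the initial latent noise so that the generated field agrees with sparse observations, can be sketched in miniature. In the toy below, a fixed linear map stands in for the generative model, station locations are a random subset of grid points, and plain gradient descent minimizes the observation loss over the noise. All names here (W, obs_idx, obs_loss) are hypothetical illustrations, not the authors' code or model.

```python
# Toy sketch of the D-Flow idea (illustrative only): optimize the initial
# noise z so a fixed "generator" G(z) = W @ z matches sparse observations y.
import numpy as np

rng = np.random.default_rng(1)
d_latent, d_field, n_obs = 16, 100, 10

W = rng.standard_normal((d_field, d_latent)) / np.sqrt(d_latent)  # stand-in generator
obs_idx = rng.choice(d_field, size=n_obs, replace=False)          # sparse "station" locations
z_true = rng.standard_normal(d_latent)
y = (W @ z_true)[obs_idx]                                         # observations of the true field

def obs_loss(z):
    """Squared error between the generated field and the observations."""
    r = (W @ z)[obs_idx] - y
    return 0.5 * r @ r

def obs_grad(z):
    """Analytic gradient of obs_loss with respect to the latent noise z."""
    r = (W @ z)[obs_idx] - y
    return W[obs_idx].T @ r

z = rng.standard_normal(d_latent)  # random initial noise
lr = 0.2
for _ in range(5000):              # gradient descent on the noise itself
    z -= lr * obs_grad(z)

# The optimized noise now generates a field that fits the observations.
```

In the real system the generator is a latent flow model, so the gradient flows through an ODE solve rather than a matrix product, and the optimized noise then seeds a second, observation-guided generation pass.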