CSE Community Seminar | February 20, 2025
Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
Evelyne Ringoot, Math-CSE PhD student
Abstract: This talk presents a portable, GPU-accelerated implementation of a QR-based singular value computation algorithm in Julia, also presented at ICPP25. The singular value decomposition (SVD) is a fundamental numerical tool in scientific computing and machine learning, providing optimal low-rank matrix approximations. The implemented algorithm is based on the classic two-stage QR reduction, which reduces the matrix first to band form and then to bidiagonal form. Our implementation leverages Julia’s multiple dispatch and metaprogramming capabilities, integrating with the GPUArrays and KernelAbstractions frameworks to provide a unified, type- and hardware-agnostic function. It supports diverse GPU architectures and data types, and is, to our knowledge, the first GPU-accelerated singular value implementation to support Apple Metal GPUs and half precision. We explore GPU kernel optimization through parameter tuning to enable efficient parallelism and improved memory locality. Performance results on multiple GPU backends and data types demonstrate that portability does not require sacrificing performance: the unified function outperforms most linear algebra libraries (MAGMA, SLATE, rocSOLVER, oneMKL) for matrix sizes larger than 1024 × 1024, and achieves 80%-90% of the performance of cuSOLVER for large matrices, highlighting Julia’s suitability for high-performance linear algebra in heterogeneous environments.
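For readers unfamiliar with the reduction step mentioned in the abstract, the following is a minimal sketch of how orthogonal Householder reflections bring a dense matrix to upper bidiagonal form without changing its singular values. It is not the speaker's implementation: the talk describes a two-stage, GPU-based algorithm in Julia (band form first, then bidiagonal), whereas this sketch uses the simpler one-stage Golub-Kahan bidiagonalization in pure Python on the CPU. The function names householder_vector and bidiagonalize are hypothetical, and the matrix is assumed to have at least as many rows as columns.

```python
import math


def householder_vector(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1.

    Returns None when x is the zero vector (no reflection needed).
    """
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return None
    v = list(x)
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += math.copysign(norm_x, x[0])
    norm_v = math.sqrt(sum(t * t for t in v))
    return [t / norm_v for t in v]


def bidiagonalize(a):
    """One-stage Golub-Kahan reduction of an m x n matrix (m >= n).

    Illustrative assumption: this is NOT the two-stage algorithm from
    the talk. Returns a new matrix with the same singular values as `a`
    whose entries outside the diagonal and first superdiagonal are zero
    up to rounding error.
    """
    b = [row[:] for row in a]
    m, n = len(b), len(b[0])
    for k in range(n):
        # Left reflection: eliminate column k below the diagonal.
        v = householder_vector([b[i][k] for i in range(k, m)])
        if v is not None:
            for j in range(k, n):
                dot = sum(v[i - k] * b[i][j] for i in range(k, m))
                for i in range(k, m):
                    b[i][j] -= 2.0 * dot * v[i - k]
        # Right reflection: eliminate row k right of the superdiagonal.
        if k < n - 2:
            v = householder_vector([b[k][j] for j in range(k + 1, n)])
            if v is not None:
                for i in range(k, m):
                    dot = sum(v[j - k - 1] * b[i][j] for j in range(k + 1, n))
                    for j in range(k + 1, n):
                        b[i][j] -= 2.0 * dot * v[j - k - 1]
    return b
```

Because every step is an orthogonal transformation, the Frobenius norm (the square root of the sum of squared singular values) is preserved, which gives a cheap sanity check on the result. The singular values themselves would then be computed from the bidiagonal matrix by a separate iterative method, which this sketch does not include.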
Score-based generative emulation of impact-relevant earth system model outputs
Shahine Bouabid, Postdoctoral associate, MIT EAPS
Abstract: Policy targets evolve faster than the Coupled Model Intercomparison Project cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model output emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. The focus is on emulators designed to provide inputs to impact models. Using monthly ESM fields of near-surface temperature, precipitation, relative humidity, and wind speed, it is shown that deep generative models have the potential to model the joint distribution of variables relevant for impacts. The specific model proposed uses score-based diffusion on a spherical mesh and runs on a single mid-range graphics processing unit. A thorough suite of diagnostics is introduced to compare emulator outputs with their parent ESMs, including their probability densities, cross-variable correlations, time of emergence, and tail behavior. The emulator performance is evaluated across three distinct ESMs in both pre-industrial and forced regimes. The results show that the emulator produces distributions that closely match the ESM outputs and captures key forced responses. They also reveal important failure cases, notably for variables with a strong regime shift in the seasonal cycle. Although not a perfect match to the ESM, the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections. This suggests that the generative emulators can be useful in supporting impact assessment. Priorities for future development toward daily resolution, finer spatial scales, and bias-aware training are discussed. Link: https://arxiv.org/abs/2510.04358