Sebastian Friedemann PhD defense - Ensemble-based Data Assimilation for Large Scale Simulations [July 4, 2022]


Prediction of chaotic and non-linear systems like weather or the groundwater cycle
relies on a continuous fusion of sensor data (observations) with numerical models to
identify good system trajectories and to compensate for non-linear feedback effects.
Ensemble-based data assimilation (DA) is a major method for this purpose. It relies
on the propagation of an ensemble of perturbed model realizations (members) that
is enriched by the integration of observation data. Performing DA at large scale to
capture continental to global geospatial effects, while running at high resolution to
accurately predict impacts from small scales, is computationally demanding. This requires
supercomputers leveraging hundreds of thousands of compute nodes, interconnected
via high-speed networks. Efficiently scaling DA algorithms to such machines requires
carefully designed, highly parallelized workflows that avoid overloading shared resources.
Fault tolerance is important too, since the probability of hardware and numerical
faults increases with the amount of resources and the number of ensemble members.
Existing DA frameworks either use the file system as intermediate storage to provide a
fault-tolerant and elastic workflow, which, at large scale, is slowed down by file system
overload, or run large monolithic jobs that suffer from intrinsic load imbalance and are
very sensitive to numerical and hardware faults. This thesis elaborates a highly parallel,
load-balanced, elastic, and fault-tolerant solution that makes it possible to run statistical,
ensemble-based DA efficiently at large scale. We investigate two classes of DA algorithms, the
ensemble Kalman filter (EnKF) and the particle filter algorithm with sequential importance
resampling (SIR), and validate our framework under realistic conditions. Groundwater
sensor data is assimilated using a regional hydrological simulation leveraging the ParFlow
model. We efficiently run EnKF with up to 16,384 members on 16,240 compute cores
for this purpose. A comparison with an existing state-of-the-art solution on the same
domain, running 2,500 members on 20,000 cores, shows that our approach is about
50 % faster. We also present performance improvements when running the particle filter with
SIR at large scale. These experiments assimilate cloud coverage observations into
2,555 members, i.e., particles, running the Weather Research and Forecasting (WRF)
model over the European domain. To manage the many experiments performed on
various supercomputers, we developed a specific setup that we also present.
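
To make the two algorithm classes named above more concrete, the following is a minimal toy sketch, not the framework developed in the thesis. It assumes a scalar state that is observed directly with Gaussian observation error; the function names `enkf_update` and `sir_resample`, the ensemble size, and all numeric values are hypothetical and chosen only for illustration. The EnKF variant shown is the stochastic one with perturbed observations.

```python
import math
import random
import statistics


def enkf_update(members, obs, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1)."""
    prior_var = statistics.variance(members)   # sample variance of the forecast ensemble
    gain = prior_var / (prior_var + obs_var)   # scalar Kalman gain
    obs_std = math.sqrt(obs_var)
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_std) - x) for x in members]


def sir_resample(particles, obs, obs_var, rng):
    """SIR step: weight particles by a Gaussian likelihood, then resample with replacement."""
    log_w = [-0.5 * (obs - x) ** 2 / obs_var for x in particles]
    shift = max(log_w)                         # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return rng.choices(particles, weights=weights, k=len(particles))


# Toy setup (assumed values): 200 members drawn around 0, one observation at 1.5.
rng = random.Random(0)
forecast = [rng.gauss(0.0, 2.0) for _ in range(200)]
analysis = enkf_update(forecast, obs=1.5, obs_var=0.25, rng=rng)
resampled = sir_resample(forecast, obs=1.5, obs_var=0.25, rng=rng)
```

In both cases the updated ensemble is pulled toward the observation: the EnKF shifts every member by a gain-weighted innovation, while SIR keeps the original particle values and duplicates those that agree best with the observation.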

Keywords: Data Assimilation, Ensemble Based, In Situ Processing, EnKF, Particle
Filter, High Performance Computing




