High Performance Data Analytics for Numerical Simulations

Bruno RAFFIN / GRICAD

Large-scale numerical simulations produce an ever-growing amount of data, which poses a double challenge. First, these data volumes are becoming increasingly difficult to analyze with traditional tools. Second, moving the data from the simulation to disk, then later retrieving it from disk on the analysis machine, is becoming increasingly costly in terms of time and energy. This situation is expected to worsen, as supercomputer I/O and, more generally, data movement capabilities are progressing more slowly than compute capabilities. While the simulation used to be the center of attention, it is now time to focus on high performance data analysis. Integrating data analytics with large-scale simulations represents a new kind of workflow that needs adapted software solutions. In this talk we will survey two directions: Big Data-like solutions and in-situ analysis.

Big Data analytics solutions like Google MapReduce, Spark or Flink were developed to answer the need to analyze large amounts of data from the web, from social networks, or generated by business applications on cloud infrastructures. We will give an overview of some research work that either developed its own map/reduce stack for analyzing scientific data or relied on classical Big Data stacks, like the Velassco project.

In-situ analysis attempts to address more specifically the reduction of data movement and data storage. It proposes to start processing data as soon as the simulation makes it available in the memory of the compute nodes. Raw data produced by the simulation can start to be reduced before it leaves the compute nodes, thus saving on data movement and on the amount of data to store to disk. Part of the data analysis can be performed on the same supercomputer as the one booked for the simulation. The process can be massively parallelized, reading data from memory rather than from disk, which reduces the time needed to perform these tasks. We will give an overview of in-situ approaches with some examples.

The conclusion will summarize the main challenges that still need to be addressed before high performance data analysis tools become a commodity in the scientist's toolbox.
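The in-situ idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the talk: the toy "simulation" (a random perturbation of a list of cells) and the names `simulate_step`, `reduce_in_situ` and `run` are hypothetical, chosen only to show raw data being reduced in memory at each time step instead of being written out in full.

```python
import math
import random


def simulate_step(state, rng):
    """Toy stand-in for one solver time step (assumption): perturb each cell in place."""
    for i in range(len(state)):
        state[i] += rng.uniform(-1.0, 1.0)


def reduce_in_situ(state):
    """Reduce the raw field to four summary values while it is still in memory."""
    n = len(state)
    mean = sum(state) / n
    variance = sum((x - mean) ** 2 for x in state) / n
    return {
        "min": min(state),
        "max": max(state),
        "mean": mean,
        "std": math.sqrt(variance),
    }


def run(num_cells=10_000, num_steps=5, seed=0):
    """Run the toy simulation and keep only the per-step summaries."""
    rng = random.Random(seed)
    state = [0.0] * num_cells
    summaries = []
    for _ in range(num_steps):
        simulate_step(state, rng)
        # In-situ: analyze the field here, in memory, instead of dumping it to disk.
        summaries.append(reduce_in_situ(state))
    return summaries


summaries = run()
raw_values = 10_000 * 5  # values a full per-step dump would have written
kept_values = sum(len(s) for s in summaries)  # values actually retained
print(f"raw values produced: {raw_values}, values kept after reduction: {kept_values}")
```

In a real setting the reduction would run in parallel on the compute nodes and would typically be far richer than four statistics. The sketch only shows where in the simulation loop the analysis is placed.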

Keywords: gricad

Added by: Gricad Vidéos
Updated: 1 January 2021 00:00
Channel:
- Research
Type: Conferences
Main language: French
