Gaia AVU-GSR parallel solver towards exascale infrastructure
2021-10-26, 12:30–12:45, Grand Ballroom

The Gaia Astrometric Verification Unit – Global Sphere Reconstruction (AVU-GSR) Parallel Solver aims to find the astrometric parameters of ~1 billion stars in the Milky Way, the attitude and instrumental parameters of the Gaia satellite, and the global parameter gamma of the post-Newtonian formalism RAMOD. To perform this task, the code iteratively solves a system of linear equations, A x = b, where the coefficient matrix A is large (~10^10 x 10^8) and sparse.
To deal with this big data problem, the matrix A is compressed with an ad hoc algorithm so that only its non-zero elements are stored and used during computation, which reduces the matrix dimension from ~10^10 x 10^8 to ~10^10 x 10^1. To solve the system, the code uses a hybrid implementation of the iterative PC-LSQR algorithm, in which the computation for different horizontal portions (blocks of rows) of the coefficient matrix is assigned to different MPI processes. In the original code, the work on each matrix portion is further parallelized over OpenMP threads. To improve performance further, we ported the application to the GPU, replacing the OpenMP part with OpenACC. In this port, ~95% of the data is copied to the GPU once, at the beginning of the entire cycle of iterations, making the code compute bound rather than memory bound. In preliminary tests, the OpenACC code already runs faster than the OpenMP version, and we aim for a speedup of 2. Further optimizations, which involve running the GPU and CPU regions of the code asynchronously, are in progress to obtain higher gains. The code runs on multiple GPUs, one per MPI process, and was tested on a full node of the CINECA supercomputer Marconi100, with 4 V100 GPUs of 16 GB memory each, in view of a port to the pre-exascale system Leonardo, which will be installed at CINECA in 2022. In the next month, we are going to work on porting the code to CUDA.
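As a rough illustration of the two ideas above (storing only the non-zero elements of each row, and assigning blocks of rows to different workers), here is a minimal pure-Python sketch. It is not the AVU-GSR code: the storage layout, the function names (`compress_rows`, `matvec`, `rmatvec`, `split_rows`, and the `partitioned_*` helpers) and the sequential loop standing in for MPI processes are all assumptions made for illustration. It shows the two products an LSQR-type iteration needs, A x and A^T y, computed block by block.

```python
"""Illustrative sketch only: compressed row storage and row-partitioned products.

All names here are hypothetical; a plain loop over row blocks stands in for
the MPI processes described in the abstract.
"""


def compress_rows(dense_rows):
    """Keep only the non-zero entries of each row, with their column indices."""
    values, columns = [], []
    for row in dense_rows:
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0.0]
        columns.append([j for j, _ in nonzero])
        values.append([v for _, v in nonzero])
    return values, columns


def matvec(values, columns, x):
    """Compute y = A x using only the stored non-zero elements."""
    return [
        sum(v * x[j] for v, j in zip(row_vals, row_cols))
        for row_vals, row_cols in zip(values, columns)
    ]


def rmatvec(values, columns, y, n_cols):
    """Compute x = A^T y using only the stored non-zero elements."""
    out = [0.0] * n_cols
    for row_vals, row_cols, y_i in zip(values, columns, y):
        for v, j in zip(row_vals, row_cols):
            out[j] += v * y_i
    return out


def split_rows(n_rows, n_parts):
    """Split row indices into contiguous (start, stop) ranges, one per worker."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        stop = start + base + (1 if part < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partitioned_matvec(values, columns, x, n_parts):
    """A x by row blocks: each block yields its own slice of y independently."""
    y = []
    for start, stop in split_rows(len(values), n_parts):
        y.extend(matvec(values[start:stop], columns[start:stop], x))
    return y


def partitioned_rmatvec(values, columns, y, n_cols, n_parts):
    """A^T y by row blocks: partial results must be summed across blocks."""
    total = [0.0] * n_cols
    for start, stop in split_rows(len(values), n_parts):
        partial = rmatvec(
            values[start:stop], columns[start:stop], y[start:stop], n_cols
        )
        # In a distributed run this sum would be a reduction across processes.
        total = [t + p for t, p in zip(total, partial)]
    return total
```

In this layout the product A x needs no communication between row blocks, while A^T y requires summing one partial vector per block, which is why a row-wise split maps naturally onto one process (and, in the GPU version, one device) per block.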


Big data: How to deal with the 5 Vs (volume, velocity, variety, veracity, value)