### Valentina Cesare

**Biography**–

I am a research associate at the Astrophysical Observatory of Catania. I am currently working with Dr. Ugo Becciani on optimal methodologies for developing, supporting, and porting astrophysics applications related to the Gaia space mission to HPC, HTC, and GPU environments. As part of this work, I also administer the observatory's computer cluster.

I received my Ph.D. in Physics and Astrophysics in March 2021 from the Physics Department of the University of Turin, under the supervision of Prof. Antonaldo Diaferio. My Ph.D. thesis investigated the dynamics of disk and elliptical galaxies in Refracted Gravity, a theory of modified gravity.

**Profile Picture**– adass-xxxi-2021/question_uploads/Valentina_uHp4Wvc.jpg

**Affiliation**–

INAF, Osservatorio Astrofisico di Catania

**Position**–

Research associate

**Homepage**–

#### Sessions

The Gaia Astrometric Verification Unit – Global Sphere Reconstruction (AVU-GSR) Parallel Solver aims to find the astrometric parameters of ~1 billion stars in the Milky Way, the attitude and instrumental parameters of the Gaia satellite, and the global parameter gamma of the post-Newtonian formalism RAMOD. To perform this task, the code iteratively solves a system of linear equations, A x = b, where the coefficient matrix A is sparse and large, with dimensions of ~10^10 x 10^8.
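Because A has far more rows (~10^10) than columns (~10^8), the system is overdetermined and in general has no exact solution. Iterative solvers of the LSQR family therefore look for the least-squares solution. In standard notation (the symbol \hat{x} is introduced here only for illustration), this can be written as:

$$
\hat{x} = \arg\min_{x} \lVert A x - b \rVert_2^2
\quad\Longleftrightarrow\quad
A^{\mathsf T} A \, \hat{x} = A^{\mathsf T} b
$$

Each iteration of such a solver needs only products of A and of its transpose with a vector, which is why exploiting the sparsity of A matters.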

To deal with this big data problem, the matrix A is compressed with an ad-hoc algorithm, so that only its non-zero elements are considered during computation. This reduces the matrix dimensions from ~10^10 x 10^8 to ~10^10 x 10^1. To solve the system, the code exploits a hybrid implementation of the iterative PC-LSQR algorithm, in which the computation related to different horizontal portions of the coefficient matrix is assigned to different MPI processes. In the original code, the work on each matrix portion is further parallelized over OpenMP threads. To further improve performance, we ported the application to the GPU, replacing the OpenMP part with OpenACC. In this port, ~95% of the data is copied to the GPU once, at the beginning of the entire cycle of iterations, making the code compute-bound rather than memory-bound. In preliminary tests, the OpenACC code is already faster than the OpenMP version, and we aim to obtain a speedup of 2. Further optimizations, which involve the asynchronous execution of the GPU and CPU regions of the code, are in progress to obtain higher gains. The code runs on multiple GPUs, one per MPI process, and was tested on a full node of the CINECA supercomputer Marconi100, with 4 V100 GPUs having 16 GB of memory each, in view of a port to the pre-exascale system Leonardo, which will be installed at CINECA in 2022. In the next month, we are going to work on porting the code to CUDA.
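The two ideas above, keeping only the non-zero elements of each row and assigning horizontal blocks of rows to different processes, can be illustrated with a small sketch. This is not the AVU-GSR compression scheme or its MPI/OpenACC code: the function names (`compress`, `split_rows`, `block_matvec`, `block_rmatvec`, `demo`) are hypothetical, the matrix is a toy example, and the "processes" are simulated by an ordinary loop.

```python
from typing import List, Tuple

# One compressed row: (column index, value) pairs for its non-zero entries.
Row = List[Tuple[int, float]]


def compress(dense: List[List[float]]) -> List[Row]:
    """Keep only the non-zero entries of each row, with their column indices."""
    return [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]


def split_rows(n_rows: int, n_parts: int) -> List[range]:
    """Split the row indices into contiguous, nearly equal horizontal blocks."""
    base, extra = divmod(n_rows, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def block_matvec(rows: List[Row], block: range, x: List[float]) -> List[float]:
    """Rows of y = A x that belong to one block."""
    return [sum(v * x[j] for j, v in rows[i]) for i in block]


def block_rmatvec(rows: List[Row], block: range, y: List[float],
                  n_cols: int) -> List[float]:
    """Contribution of one block to A^T y (y is indexed by global row)."""
    out = [0.0] * n_cols
    for i in block:
        for j, v in rows[i]:
            out[j] += v * y[i]
    return out


def demo():
    dense = [
        [2.0, 0.0, 0.0, 1.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 0.0],
        [1.0, 0.0, 0.0, 5.0],
        [0.0, 6.0, 0.0, 0.0],
    ]
    rows = compress(dense)
    blocks = split_rows(len(rows), 2)  # two simulated "processes"
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.0] * len(rows)
    # A x: each block produces its own slice of the result.
    ax = [val for b in blocks for val in block_matvec(rows, b, x)]
    # A^T y: per-block partial results are summed, which is the role
    # a reduction across processes would play in a distributed code.
    partials = [block_rmatvec(rows, b, y, len(x)) for b in blocks]
    aty = [sum(col) for col in zip(*partials)]
    return ax, aty


if __name__ == "__main__":
    print(demo())
```

The product with A needs no communication between blocks, since each block owns its rows of the result, whereas the product with the transpose yields one partial vector per block that must be summed. In a distributed setting that sum would typically be a reduction across processes.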