BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.adass2021.ac.za//VZCWBR
BEGIN:VTIMEZONE
TZID:Africa/Johannesburg
BEGIN:STANDARD
DTSTART:20000101T000000
RRULE:FREQ=YEARLY;BYMONTH=1
TZNAME:SAST
TZOFFSETFROM:+0200
TZOFFSETTO:+0200
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-adass-xxxi-2021-VZCWBR@pretalx.adass2021.ac.za
DTSTART;TZID=Africa/Johannesburg:20211026T123000
DTEND;TZID=Africa/Johannesburg:20211026T124500
DESCRIPTION:The Gaia Astrometric Verification Unit – Global Sphere
  Reconstruction (AVU-GSR) Parallel Solver aims to find the astrometric
  parameters for ~1 billion stars in the Milky Way\, the attitude and the
  instrumental parameters of the Gaia satellite\, and the global
  parameter gamma of the post-Newtonian formalism RAMOD. To perform this
  task\, the code iteratively solves a system of linear equations\,
  A x = b\, where the coefficient matrix A is large\, with dimensions of
  ~10^10 x 10^8\, and sparse.\nTo deal with this big data problem\, the
  matrix A is compressed with an ad hoc algorithm so that only its
  nonzero elements are considered during computation. The matrix
  dimensions reduce from ~10^10 x 10^8 to ~10^10 x 10^1. To solve this
  system\, the code uses a hybrid implementation of the iterative
  PC-LSQR algorithm\, in which the computation related to different
  horizontal portions of the coefficient matrix is assigned to different
  MPI processes. In the original code\, each matrix portion is further
  parallelized over OpenMP threads. To further improve the code
  performance\, we ported the application to the GPU\, replacing the
  OpenMP part with OpenACC. In this port\, ~95% of the data is copied at
  the beginning of the entire cycle of iterations\, making the code
  compute bound rather than memory bound. In preliminary tests\, the
  OpenACC code already runs faster than the OpenMP version\, and we aim
  to obtain a speedup of 2. Further optimizations\, which involve
  asynchronous execution of the GPU and CPU regions of the code\, are in
  progress to obtain higher gains. The code runs on multiple GPUs\, one
  per MPI process\, and it was tested on a full node of the CINECA
  supercomputer Marconi100\, with 4 V100 GPUs having 16 GB of memory
  each\, in view of a port to the pre-exascale system Leonardo\, which
  will be installed at CINECA in 2022. In the next month\, we are going
  to work on porting the code to CUDA.
DTSTAMP:20220520T074309Z
LOCATION:Grand Ballroom
SUMMARY:Gaia AVU-GSR parallel solver towards exascale infrastructure -
  Valentina Cesare
URL:https://pretalx.adass2021.ac.za/adass-xxxi-2021/talk/VZCWBR/
END:VEVENT
END:VCALENDAR