A globally distributed and scalable data post-processing framework for WALLABY science
2021-10-28, 10:45–11:00, Grand Ballroom

WALLABY is the ASKAP all-sky HI survey. Its post-processing involves mosaicking spectral-line data cubes, source finding, cross-matching, computing moment maps for the detected galaxies, kinematic analysis, and ad-hoc interaction with the data. These processes are provided by a number of collaborating institutions that are distributed internationally and use different computing facilities. Over the course of the survey, these institutions will process petabytes of data. The existing post-processing approach is mostly manual and labour-intensive, and has demanded unnecessary logistical effort from scientists. A new framework is therefore required to efficiently process large volumes of ASKAP data across international borders for the full survey.

We have developed a new framework for distributed and scalable WALLABY data post-processing. The technology stack includes PostgreSQL relational databases, replicated across institutions with Bucardo, which provide a central location for WALLABY survey data products. Interactive web interfaces such as Django admin portals, Jupyter notebooks and VO services provide user access to the data. Computational pipelines, composed in Nextflow, allow parallel execution of containerised applications independently of the underlying HPC resources. In this talk we provide an overview of the framework and its system architecture, and describe how it will be used to help WALLABY scientists process ASKAP spectral-line data.
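The idea of a pipeline that is defined once and executed on whatever resources are available can be sketched in Python. This is a minimal illustration under stated assumptions, not the WALLABY implementation and not Nextflow itself: the stage functions `mosaic`, `source_find` and `moment_maps` are hypothetical placeholders for containerised applications, and a `concurrent.futures.Executor` stands in for the execution backend, loosely analogous to selecting a Nextflow executor (for example `local` or `slurm`) in configuration.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Iterable, List

# Hypothetical stand-ins for containerised applications (names are illustrative only).
def mosaic(cube: str) -> str:
    return f"{cube}.mosaic"

def source_find(mosaic_path: str) -> List[str]:
    # Placeholder: pretend every mosaic yields exactly two detections.
    return [f"{mosaic_path}:det{i}" for i in range(2)]

def moment_maps(detection: str) -> str:
    return f"{detection}.mom0"

def run_pipeline(cubes: Iterable[str], executor: Executor) -> List[str]:
    """Run mosaic -> source_find -> moment_maps, fanning each stage out in parallel.

    The pipeline definition does not depend on where tasks run: any
    concurrent.futures.Executor can be passed in. executor.map preserves
    input order, so results are deterministic.
    """
    mosaics = list(executor.map(mosaic, cubes))
    # Flatten the per-mosaic detection lists into one list of detections.
    detections = [d for group in executor.map(source_find, mosaics) for d in group]
    return list(executor.map(moment_maps, detections))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        for product in run_pipeline(["cubeA", "cubeB"], pool):
            print(product)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` changes where the work runs without touching `run_pipeline`. Unlike this sketch, which waits for each stage to finish before starting the next, Nextflow connects processes through asynchronous channels, so downstream tasks can start as soon as their individual inputs are ready.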


Solutions for workflow management and reproducibility, Other