ADASS XXXI

Austin Shen

Biography

Hi, my name is Austin and I'm a research software engineer working at CSIRO (Perth, WA) on the Australian SKA Regional Centre design study. I have a Master's degree in physics (astronomy and astrophysics) from UWA, and have spent time in industry working in software engineering and data science roles. My current work focuses on developing data post-processing pipelines for SKA precursor telescope science projects.

Affiliation

CSIRO

Position

Research Software Engineer

GitHub ID

axshen


Sessions

10-28
10:45
15min
A globally distributed and scalable data post-processing framework for WALLABY science
Austin Shen

WALLABY is the ASKAP all-sky HI survey. Its post-processing involves mosaicking spectral-line data cubes, source finding, cross-matching, computing moment maps for the detected galaxies, kinematics, and ad hoc interactions with the data. These processes are provided by a number of collaborating institutions, which are distributed internationally and use different computing facilities. Over the course of the survey, these institutions will process data on the scale of petabytes. The existing post-processing approach is mostly manual and labour-intensive, and has imposed unnecessary logistical effort on scientists. A new framework is therefore required to efficiently process large ASKAP datasets across international borders for the full survey.

We have developed a new framework for distributed and scalable WALLABY data post-processing. The technology stack includes PostgreSQL relational databases, replicated across institutions with Bucardo, which provide a central location for WALLABY survey data products. Interactive web interfaces such as Django admin portals, Jupyter notebooks and VO services provide user access to the data. Computational pipelines, composed in Nextflow, allow parallel execution of containerised applications independent of the underlying HPC resources. In this talk we provide an overview of the framework and its system architecture, and describe how it will be used to help WALLABY scientists process ASKAP spectral-line data.

Grand Ballroom