ADASS XXXI

Julia meets BIG DATA: JVO experience with distributed computing
2021-10-25, 10:30–10:45, Grand Ballroom

Each year the size of FITS data cubes produced by various telescopes keeps growing. At present the largest publicly available FITS file handled by the Japanese Virtual Observatory (JVO) is about 350 GB, which puts an enormous strain on the existing hardware infrastructure. We anticipate that ALMA FITS files will exceed 1 TB in the near future.

A simple solution to handling such large files is to buy a more powerful server each year. However, in a budget-constrained astronomy data centre the reasonable alternative is distributed computing on a commodity cluster. To keep up with the ever-rising file sizes, the JVO development team has been experimenting with various forms of distributed computing using several commodity desktop computers networked via 10 Gbps Ethernet. As part of a search for the most convenient and performant distributed computing paradigm, since late 2018 we have gone through three programming language changes: from Rust to C/C++17, then to a hybrid of CoArray Fortran 2018 augmented by pure C, and finally to Julia, which we settled on for its superior asynchronous distributed computing capabilities.

The talk discusses in detail the rationale behind these drastic programming language changes and showcases the latest cluster edition (still under development) of FITSWebQL v5, code-named FITSWEBQL SE (Supercomputer Edition). The purpose of the JVO FITSWebQL software is to provide an interactive preview of FITS files in a web browser, without the need to download the underlying FITS files. The latest Julia FITSWEBQLSE makes it possible to preview FITS files of over 350 GB interactively (in near real-time) from the comfort of a web browser.


Theme

Big data: How to deal with the 5 Vs (volume, velocity, variety, veracity, value)