QuartiCal - embarrassingly parallel calibration using Numba and Dask
2021-10-25, 10:00–10:15, Grand Ballroom

We live in the era of Big Data. Where once it was a looming threat, a problem for our future selves, it is now very much upon us. Existing radio interferometers, such as MeerKAT and LOFAR, already produce unprecedented quantities of visibility data, and even they will soon be dwarfed by the planned SKA and ngVLA. Despite the modernity of these instruments, they will still require extensive calibration to correct various science-limiting effects. Calibration therefore is, and will remain, an integral part of radio interferometric data reduction.

This has motivated the development of QuartiCal, a Python application that leverages two contemporary packages to make calibration scalable, distributable and fast. The first is Dask, a library for parallel and distributed computing with Python. It provides parallel, Big Data collections that extend familiar interfaces (e.g. NumPy, Pandas). Using these collections, it is possible to construct task graphs that can be understood by Dask’s dynamic task schedulers. This allows appropriately written code to scale from executing locally on a laptop to executing remotely on a compute cluster. The second is Numba, a just-in-time compiler for a subset of Python/NumPy that can provide C-like speed without forfeiting the expressiveness and dynamism of Python. It has been used extensively to optimize the computationally demanding components of the calibration algorithms.

QuartiCal convincingly outperforms its predecessor, CubiCal, in terms of both wall time and memory footprint. Finally, in testing QuartiCal, we have found that the Measurement Set (backed by the Casacore Table Data System) can limit parallel performance.


Big data: How to deal with the 5 Vs (volume, velocity, variety, veracity, value)