2021-10-27, 10:00–10:30, Grand Ballroom
In an era of astronomical surveys and experiments capable of generating tens to hundreds of petabytes of data a year, we face the question of how to maximize the scientific return of these billion-dollar investments. We design and build experiments with sufficient computational resources to acquire, process, and store the observational data they generate. We do not, however, apply the same thought and effort to developing the software and computational infrastructure necessary for large-scale scientific analyses of data sets that are often complex, noisy, and incomplete. In this talk I will discuss how we might address some of these challenges through a combination of emergent techniques and methodologies, together with lessons we can learn from other fields facing similar issues. Partial solutions will come from novel computational architectures (GPUs and FPGAs), analytics frameworks designed to run across thousands of processors (e.g. Spark and Dask), and algorithmic advances in machine learning and statistics (e.g. deep learning). Fully realizing the potential of large-scale survey astronomy will, however, also require changes in how we educate and train our community, how we share resources and expertise between institutions, and in particular how we integrate professional software engineering within our research infrastructure.
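The analytics frameworks mentioned above (Spark, Dask) scale a common pattern: partition a large, imperfect data set into chunks, run a per-chunk clean-and-reduce step in parallel, and combine the partial results. A minimal sketch of that pattern using only the Python standard library is below; the function names and the toy "survey" values are illustrative, not from any real pipeline, and a thread pool stands in for the distributed schedulers these frameworks actually use.

```python
# Sketch of chunked, parallel map/combine analysis -- the pattern that
# frameworks like Spark and Dask scale to thousands of processors.
# Toy data and function names are illustrative only.
from concurrent.futures import ThreadPoolExecutor


def clean_and_reduce(chunk):
    """Per-chunk step: drop missing (None) measurements, return partial sums."""
    good = [v for v in chunk if v is not None]
    return sum(good), len(good)


def chunked_mean(chunks):
    """Map clean_and_reduce over chunks in parallel, then combine partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(clean_and_reduce, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    # Three "partitions" of noisy, incomplete observations.
    chunks = [[1.0, 2.0, None], [3.0, 4.0], [None, 5.0]]
    print(chunked_mean(chunks))  # 3.0
```

At survey scale the same logic is expressed as, e.g., a Dask array reduction or a Spark RDD aggregation, with the scheduler handling data locality and fault tolerance that this sketch omits.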
Big data: How to deal with the 5 Vs (volume, velocity, variety, veracity, value)