Andy Connolly

I am a Professor in Astronomy and the Director of the eScience Institute, which acts as the hub of data science at the University of Washington. Prior to that, I was the founding Director of the DiRAC Institute, a data science institute for astronomy housed in the Department of Astronomy at the University of Washington. I have worked for a number of years on the design and construction of large astronomical surveys, including the Legacy Survey of Space and Time (LSST) that will be undertaken by the Rubin Observatory. My research draws on astrophysics, computer science, statistics, and other disciplines to develop scalable tools to characterize, search, and analyze the hundreds of petabytes of data generated by large astronomical surveys. To that end, with Zeljko Ivezic, Jake VanderPlas, and Alex Gray, I co-wrote "Statistics, Data Mining, and Machine Learning in Astronomy," which we use to teach students how to apply machine learning techniques to astronomical data sets. A particular interest is the study of the formation and evolution of galaxies and of cosmology (to understand the nature of dark energy and dark matter).

Affiliation

University of Washington


Director, eScience Institute


Scaling Science in the Era of Survey Astronomy
Andy Connolly

In an era of astronomical surveys and experiments capable of generating tens to hundreds of petabytes of data a year, we face the question of how to maximize the scientific potential of these billion-dollar investments. We design and build experiments with sufficient computational resources to acquire, process, and store the observational data they generate. We do not, however, apply the same thought and effort to developing the software and computational infrastructure necessary for large-scale scientific analyses of data sets that are often complex, noisy, and incomplete. In this talk I will discuss how we might address some of these challenges through a combination of emergent techniques and methodologies, together with lessons we can learn from other fields facing similar issues. Partial solutions will come from novel computational architectures (GPUs and FPGAs), analytics frameworks designed to run across thousands of processors (e.g. Spark and Dask), and algorithmic advances in machine learning and statistics (e.g. deep learning). Fully realizing the potential of large-scale survey astronomy will, however, also require changes in how we educate and train our community, how we share resources and expertise between institutions, and in particular how we integrate professional software engineering within our research infrastructure.
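The map-reduce pattern behind frameworks like Spark and Dask can be sketched with Python's standard library alone: split a catalog into chunks, compute a partial statistic on each chunk in parallel, then combine the partial results. This is a minimal illustration of the idea, not code from the talk; the synthetic "catalog" and the function names are assumptions, and real frameworks add distributed scheduling, fault tolerance, and out-of-core data handling on top of this pattern.

```python
# Minimal sketch of chunked, parallel analysis of a large catalog.
# Frameworks such as Dask and Spark apply this same map-reduce pattern
# across thousands of processors; here a process pool stands in for them.
from concurrent.futures import ProcessPoolExecutor


def chunk_mean(chunk):
    # "map" step: a cheap per-chunk partial statistic
    return sum(chunk) / len(chunk), len(chunk)


def parallel_mean(data, n_chunks=4):
    # Split the data into roughly equal chunks (in practice, each chunk
    # would be a slice of survey data read independently from storage).
    size = (len(data) + n_chunks - 1) // n_chunks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(chunk_mean, chunks))
    # "reduce" step: combine partial means, weighted by chunk size
    total = sum(mean * n for mean, n in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    data = list(range(1_000_000))  # synthetic stand-in for catalog values
    print(parallel_mean(data))  # 499999.5
```

Because the per-chunk work is independent, the same code scales from a laptop's cores to a cluster simply by swapping the executor for a distributed scheduler.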

Grand Ballroom