Three workflow add-ons to improve machine learning reproducibility in astronomy
2021-10-27, 09:00–09:15, Grand Ballroom

As machine learning is increasingly adopted into astronomy research practice, additional workflows must be added to keep results reproducible and replicable. However, because machine learning tools are often drawn from industry or computer science, how these imports can fit into traditional astronomy computational practices and enhance results is not well understood. In this talk, I start from the position that machine learning models can rarely be rerun after a year’s time, let alone after a longer time span. I ask the question: How can scientific results be verified, compared, or built upon if the artefacts needed to examine how those results were obtained have not been saved? With this in mind, I suggest three straightforward additions to typical machine learning workflows to improve reproducibility: (1) fixing the training data used to create the model, (2) saving the model according to best practices, including the parameters and hyperparameters used in the final tuning, and (3) describing additional artefacts from model evaluation that enable a further understanding of the outputs the model produced. I will conclude with the steps enacted in our team’s research project using machine learning to predict galactic redshifts, and present lessons learned from our experience reproducing machine learning models.
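
The three additions could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the workflow used in our project: the function names `fingerprint` and `build_manifest`, the manifest keys, the toy rows, the hyperparameter names, and the metric value are all hypothetical placeholders.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest of the training rows, independent of row order."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        # Serialise each row deterministically before hashing it.
        digest.update(json.dumps(row).encode("utf-8"))
    return digest.hexdigest()


def build_manifest(rows, hyperparameters, seed, metrics):
    """Collect the artefacts needed to re-examine a training run later."""
    return {
        # (1) Fix the training data: record exactly which rows were used.
        "training_data_sha256": fingerprint(rows),
        "n_training_rows": len(rows),
        # (2) Save the settings used in the final tuning.
        "hyperparameters": hyperparameters,
        "random_seed": seed,
        # (3) Keep evaluation artefacts alongside the model.
        "evaluation": metrics,
    }


# Hypothetical toy rows (magnitude, colour, redshift); not real data.
rows = [(21.3, 0.8, 0.42), (19.7, 1.1, 0.15)]

manifest = build_manifest(
    rows,
    hyperparameters={"n_estimators": 100, "max_depth": 8},  # placeholder values
    seed=42,
    metrics={"rmse": 0.05},  # placeholder value, not a reported result
)

# A stable, human-readable record to store next to the saved model file.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

Storing `manifest_json` next to the serialised model means a later reader can check that they hold the same training data and settings before attempting to rerun or compare against the model.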


Solutions for workflow management and reproducibility, Understanding and improving machine learning