The universe speaks for itself: from unsupervised physics to semantic source separation
2021-10-28, 12:30–12:45, Grand Ballroom

Machine learning has been widely applied to clearly defined problems in astronomy and astrophysics. However, deep learning and its conceptual differences from classical machine learning have been largely overlooked in these fields. The broad hypothesis behind our work is that letting the abundant real astrophysical data speak for itself, with minimal supervision and no labels, can reveal interesting patterns that may facilitate the discovery of novel physical relationships.
We train an encoder-decoder architecture on the self-supervised auxiliary task of reconstruction, which allows it to learn general representations without bias towards any specific task. By imposing weak disentanglement at the information bottleneck of the network, we implicitly encourage interpretability in the learned features.
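The training objective described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the architecture used in this work: the layer sizes, the function names (`init_params`, `encode`, `decode`, `objective`), and in particular the choice of an off-diagonal covariance penalty on the latent codes as a stand-in for "weak disentanglement", since the abstract does not say how the disentanglement is imposed. The code only evaluates the objective on random stand-in data; actual training would need an automatic differentiation framework.

```python
import numpy as np

def init_params(n_pixels, n_latent, n_hidden=64, seed=0):
    """Random weights for a tiny encoder-decoder (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_enc1": rng.normal(0.0, scale, (n_pixels, n_hidden)),
        "W_enc2": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "W_dec1": rng.normal(0.0, scale, (n_latent, n_hidden)),
        "W_dec2": rng.normal(0.0, scale, (n_hidden, n_pixels)),
    }

def encode(params, spectra):
    """Compress each spectrum to a low-dimensional code (the bottleneck)."""
    return np.tanh(spectra @ params["W_enc1"]) @ params["W_enc2"]

def decode(params, latents):
    """Reconstruct spectra from the bottleneck codes."""
    return np.tanh(latents @ params["W_dec1"]) @ params["W_dec2"]

def objective(params, spectra, weight=0.1):
    """Reconstruction error plus a weak decorrelation penalty on the codes.

    The penalty (sum of squared off-diagonal latent covariances) is an
    assumed proxy for weak disentanglement, not the authors' method.
    """
    z = encode(params, spectra)
    recon = decode(params, z)
    recon_loss = np.mean((recon - spectra) ** 2)

    # Covariance of the latent features across the batch.
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (z.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    decorr = np.sum(off_diag ** 2)

    return recon_loss + weight * decorr, recon_loss, decorr

# Random arrays stand in for a batch of 32 spectra with 50 pixels each.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(32, 50))
params = init_params(n_pixels=50, n_latent=4)
total, recon_loss, decorr = objective(params, spectra)
```

Keeping `weight` small mirrors the "weak" constraint: reconstruction dominates the objective, while the penalty only nudges the latent features towards being uncorrelated with one another.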
So far we have achieved interesting results along two avenues. First, our "AstroMachines" have learned to infer physical parameters such as radial velocity and effective temperature just by watching a large number of stellar spectra, without being asked to do so. Second, and more recently, we have observed semantic source separation abilities in the same architecture, and have reinforced them to "randomize out" telluric lines in stellar spectra, again in an unsupervised fashion.


Understanding and improving machine learning, Big data: How to deal with the 5 Vs (volume, velocity, variety, veracity, value)