Exploring the halo-galaxy connection with probabilistic approaches

Rodrigues, Natália V. N.; de Santi, Natalí S. M.; Abramo, Raul; Montero-Dorta, Antonio D.
Bibliographical reference

Astronomy and Astrophysics

Advertised on:
6
2025
Number of authors
4
IAC number of authors
1
Citations
0
Refereed citations
0
Description
Context. The connection between galaxies and their host dark matter halos encompasses a range of intricate and interrelated processes, playing a pivotal role in our understanding of galaxy formation and evolution. Traditionally, this link has been established through physical or empirical models. On the other hand, machine learning techniques are adaptable tools capable of handling high-dimensional data and grasping associations between numerous attributes. In particular, probabilistic models in machine learning capture the stochasticity inherent to these highly complex processes and relations.
Aims. We compare different probabilistic machine learning methods to model the uncertainty in the halo-galaxy connection and efficiently generate galaxy catalogs that faithfully resemble the reference sample by predicting joint distributions of central galaxy properties, namely stellar mass, color, specific star formation rate, and radius, conditioned on their host halo features.
Methods. The analysis is based on the IllustrisTNG300 magnetohydrodynamical simulation. The machine learning methods model the distributions in different ways. We compare a multilayer perceptron that predicts the parameters of a multivariate Gaussian distribution, a multilayer perceptron classifier, and the method of normalizing flows. The classifier predicts the parameters of a categorical distribution, which are defined in a high-dimensional parameter space through a Voronoi cell-based hierarchical scheme. The results are validated with metrics designed to test probability density distributions and the predictive power of the methods.
Results. We evaluate the models' performance under various sample selections based on halo properties. The three methods exhibit comparable results, with normalizing flows showing the best performance in most scenarios. The models not only reproduce the main features of the galaxy property distributions with high fidelity, but can also be used to reproduce the results obtained with traditional deterministic estimators. Our results also indicate that different halo and galaxy populations are subject to varying degrees of stochasticity, which has relevant implications for studies of large-scale structure.
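As an illustration of the first method named above (a network that predicts the parameters of a multivariate Gaussian), the following minimal sketch shows how a raw output vector could be turned into a mean and covariance for the four galaxy properties, scored with a negative log-likelihood, and sampled to build a mock catalog. The abstract does not state how the covariance is parameterized; the Cholesky-factor layout used here, the function names, and the output ordering are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

D = 4  # stellar mass, color, specific star formation rate, radius


def unpack_gaussian_params(raw, d=D):
    """Split a raw output vector into a mean and a Cholesky factor L.

    Assumed (hypothetical) layout: the first d entries are the mean, the
    remaining d*(d+1)/2 entries fill the lower triangle of L row by row.
    The diagonal is exponentiated so that L @ L.T is positive definite.
    """
    raw = np.asarray(raw, dtype=float)
    n_tril = d * (d + 1) // 2
    if raw.shape != (d + n_tril,):
        raise ValueError(f"expected {d + n_tril} outputs, got {raw.shape}")
    mean = raw[:d]
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = raw[d:]
    diag = np.diag_indices(d)
    L[diag] = np.exp(L[diag])
    return mean, L


def gaussian_nll(y, mean, L):
    """Negative log-likelihood of y under N(mean, L @ L.T)."""
    d = mean.shape[0]
    z = np.linalg.solve(L, y - mean)            # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma|
    return 0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def sample_galaxies(mean, L, n, rng):
    """Draw n property vectors for one halo from the predicted Gaussian."""
    eps = rng.standard_normal((n, mean.shape[0]))
    return mean + eps @ L.T


# Stand-in for a network output for a single halo (all zeros gives a
# standard normal: zero mean, identity covariance).
raw_output = np.zeros(D + D * (D + 1) // 2)
mean, L = unpack_gaussian_params(raw_output)
nll_at_mean = gaussian_nll(np.zeros(D), mean, L)
samples = sample_galaxies(mean, L, n=1000, rng=np.random.default_rng(0))
```

In a training loop, the negative log-likelihood averaged over halos would serve as the loss, and sampling from the predicted distribution for each halo is what allows a stochastic catalog to be generated rather than a single deterministic estimate per halo.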