## Predicting the thermodynamic stability of solids combining density functional theory and machine learning

J. Schmidt, J. Shi, P. Borlido, L. Chen, S. Botti, M. Marques

Chemistry of Materials, **29**, 5090–5103, (2017)

DOI: 10.1021/acs.chemmater.7b00156

Download: BibTeX

We perform a large-scale benchmark of machine learning methods for predicting the thermodynamic stability of solids. We start by constructing a data set that comprises density functional theory (DFT) calculations of around 250000 cubic perovskite systems. This includes all possible perovskite and antiperovskite crystals that can be generated with elements from hydrogen to bismuth, excluding rare gases and lanthanides. Incidentally, these calculations already reveal a large number of systems (around 500) that are thermodynamically stable but are not present in crystal structure databases. Moreover, some of these phases have unconventional compositions and define completely new families of perovskites. This data set is then used to train and test a series of machine learning algorithms to predict the energy distance to the convex hull of stability. In particular, we study the performance of ridge regression, random forests, extremely randomized trees (including adaptive boosting), and neural networks. We find that extremely randomized trees give the smallest mean absolute error of the distance to the convex hull (121 meV/atom) in the test set of 230000 perovskites, after being trained on 20000 samples. Surprisingly, the model already works when its only input features are the group and row in the periodic table of the three elements composing the perovskite. Moreover, we find that the prediction accuracy is not uniform across the periodic table, being worse for first-row elements and elements forming magnetic compounds. Our results suggest that machine learning can considerably speed up high-throughput DFT calculations (by at least a factor of 5) by restricting the space of relevant chemical compositions without degrading the accuracy.
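The minimal input described in the abstract, the group and row of each of the three elements, could be encoded roughly as in the sketch below. This is an illustration only and not the authors' code: the names `PERIODIC_POSITION`, `perovskite_features`, and `mean_absolute_error` are hypothetical, the A, B, X ordering of the feature vector is an assumption, and the lookup table covers just a handful of elements.

```python
# Hypothetical lookup table: element symbol -> (group, row) in the periodic table.
# Only a few elements are listed here for illustration.
PERIODIC_POSITION = {
    "N": (15, 2),
    "O": (16, 2),
    "Ca": (2, 4),
    "Ti": (4, 4),
    "Sr": (2, 5),
    "Ba": (2, 6),
}


def perovskite_features(a, b, x):
    """Return [group_A, row_A, group_B, row_B, group_X, row_X] for an ABX3 composition.

    The ordering of the six features is an assumption made for this sketch.
    """
    features = []
    for symbol in (a, b, x):
        group, row = PERIODIC_POSITION[symbol]
        features.extend([group, row])
    return features


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences (e.g. in meV/atom)."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Example: six-number feature vector for SrTiO3.
srtio3 = perovskite_features("Sr", "Ti", "O")
```

A vector like `srtio3` would be one row of the input matrix passed to a regressor such as extremely randomized trees, with the DFT distance to the convex hull as the target; `mean_absolute_error` is the metric the abstract quotes for the test set.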