Statistical approach for automated weighting of datasets: Application to heat capacity data

S. Zomorodpoosh, B. Bocklund, A. Obaied, R. Otis, Z. Liu, I. Roslyakova.

CALPHAD: Computer Coupling of Phase Diagrams and Thermochemistry, 71, 101994 (2020)

An essential step in CALPHAD modeling is assigning relative weights to different datasets, but there is no consensus on the best way to do this. Currently, weights for experimental or first-principles data are assigned manually, based on the knowledge and experience of the modeler. Because this manual treatment is subjective and time-consuming, the handling of such data is rapidly moving toward automated procedures built on statistical and data mining tools. In the present study, we propose an automated approach to determine the weight of each dataset based on the K-Fold Cross-Validation method, modified so that each fold is selected non-randomly and the folds contain unequal numbers of observations. Researchers can use this approach as a support tool to evaluate the reliability of each dataset involved in CALPHAD modeling and to quantify the impact of weighting through statistical analysis of the corresponding model. We demonstrate the efficacy of this method through the evaluation of heat capacity data of fcc nickel, hcp magnesium, and bcc iron.
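
The cross-validation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each dataset forms one non-random fold (so fold sizes differ), that a simple polynomial stands in for the heat capacity model, and that weights are taken as normalized inverse prediction errors. The function names, the synthetic data, and the weighting formula are all hypothetical choices made for illustration.

```python
import numpy as np


def leave_one_dataset_out_scores(datasets, degree=1):
    """Mean squared prediction error for each held-out dataset.

    datasets: list of (T, Cp) array pairs; each pair is treated as one fold.
    The polynomial model is an illustrative assumption, not the paper's model.
    """
    scores = []
    for k, (t_test, cp_test) in enumerate(datasets):
        # Train on every dataset except the k-th; folds differ in size.
        t_train = np.concatenate([t for i, (t, _) in enumerate(datasets) if i != k])
        cp_train = np.concatenate([c for i, (_, c) in enumerate(datasets) if i != k])
        coeffs = np.polyfit(t_train, cp_train, degree)
        residuals = cp_test - np.polyval(coeffs, t_test)
        scores.append(float(np.mean(residuals ** 2)))
    return np.array(scores)


def inverse_error_weights(scores):
    """Normalized inverse-error weights (an assumed weighting rule)."""
    inv = 1.0 / scores
    return inv / inv.sum()


def true_cp(t):
    # Synthetic "true" heat capacity curve, invented for this example.
    return 20.0 + 0.01 * t


# Three synthetic datasets of unequal size; the third has a systematic offset.
t_a = np.linspace(300.0, 1000.0, 15)
t_b = np.linspace(350.0, 950.0, 8)
t_c = np.linspace(400.0, 900.0, 5)
datasets = [
    (t_a, true_cp(t_a) + 0.05 * np.sin(t_a / 50.0)),
    (t_b, true_cp(t_b) - 0.05 * np.cos(t_b / 40.0)),
    (t_c, true_cp(t_c) + 2.0),
]

scores = leave_one_dataset_out_scores(datasets)
weights = inverse_error_weights(scores)
print("CV errors:", scores)
print("weights:  ", weights)
```

In this toy setup, the dataset that the others predict worst receives the smallest weight. The actual study may derive weights from the cross-validation results differently.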

DOI: https://doi.org/10.1016/j.calphad.2020.101994
