Ruhr-Universität Bochum
Text mining for insights in material science: A case study in electrocatalysis
- Date: 07.09.2023
- Place: FEMS EUROMAT, Frankfurt am Main, Germany
Abstract
The field of materials science relies heavily on data to understand the properties and behavior of materials. One important source of data is the scientific literature in text form. However, it is becoming increasingly hard for researchers to digest the vast amount of information it contains, so important clues to promising discovery directions and design principles may be missed. We present a method for preprocessing text data, including cleaning, tokenization, and stemming/lemmatization, to prepare a cleaned corpus for further analysis. Insights into materials design are then extracted, for example by creating simple word clouds. In addition, word embeddings are trained on the text data using word2vec. These embeddings allow mathematical operations on words, such as computing similarity measures between words (“entities”, i.e. certain materials), which makes them a powerful tool for extracting insights and knowledge from text data in materials science. This means that we can compare different materials based on their properties, synthesis methods, and other characteristics as presented in the literature, and identify similarities and differences between them. Additionally, the similarity measures can be used to group materials into clusters or categories, making the data easier to understand and analyze. The knowledge extraction strategy proposed in this work can be useful for creating a corpus of text data for text mining, for building predictive capabilities, and for providing overviews of the latest research in a field. Specifically, we show how to apply this method in the field of electrocatalysis, where it can help researchers discover new materials and design robust electrocatalysts based on existing published results.
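The pipeline outlined in the abstract (cleaning, tokenization, stemming, term counts for a word cloud, and similarity between word vectors) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the stop-word list, the crude plural-stripping `stem` helper, and the three toy sentences are invented placeholders, and simple co-occurrence count vectors stand in for trained word2vec embeddings so that the example runs without external packages.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "for", "of", "and", "in", "to"}


def stem(token):
    """Crude plural stripping; a real pipeline would use a proper
    stemmer or lemmatizer instead of this placeholder."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Clean, tokenize, remove stop words, and stem one piece of text."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # cleaning
    tokens = text.split()                              # tokenization
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def cooccurrence_vectors(sentences, window=2):
    """Build sparse context-count vectors. This is a simple stand-in for
    word2vec embeddings, not word2vec itself."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy sentences written for this example; they are not taken from any corpus.
raw_corpus = [
    "Platinum catalysts are active for the oxygen reduction reaction.",
    "Palladium catalysts are active for the oxygen reduction reaction.",
    "Graphite electrodes are used in batteries.",
]

sentences = [preprocess(s) for s in raw_corpus]

# Term frequencies: the counts a word cloud would visualize.
term_counts = Counter(t for tokens in sentences for t in tokens)

vectors = cooccurrence_vectors(sentences)
similar = cosine(vectors["platinum"], vectors["palladium"])
dissimilar = cosine(vectors["platinum"], vectors["graphite"])
print(term_counts.most_common(3))
print(similar > dissimilar)
```

In this toy setting, words that appear in the same contexts end up with similar vectors, which is the same intuition that word2vec exploits at scale. With a real corpus, the count vectors would typically be replaced by embeddings trained with a word2vec implementation, and the resulting similarity scores could feed a clustering step to group materials.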