ICAMS

conference

From text data to word embeddings in materials science

Lei Zhang, Ruhr-Universität Bochum, Bochum, Germany

Markus Stricker, Ruhr-Universität Bochum, Bochum, Germany

Time & Place
  • Date: 06.09.2023
  • Time:
  • Place: FEMS EUROMAT 2023, Frankfurt am Main, Germany

Abstract

Materials science relies heavily on data to understand the properties and behavior of materials. One important source of such data is the scientific literature in text form. However, it is becoming increasingly hard for researchers to digest the vast amount of information it contains, so important clues for promising discovery directions and design principles may be missed. We present a method for preprocessing text data, including cleaning, tokenization, and stemming/lemmatization, to prepare a cleaned corpus for further analysis. Initial insights into materials design can be extracted from this corpus with simple tools such as word clouds. Word embeddings are then obtained by training word2vec on the text data. The embeddings allow mathematical operations on words, such as similarity measures between words (“entities”, i.e. certain materials), which makes them a powerful tool for extracting insights and knowledge from text data in materials science. We can thus compare different materials based on their properties, synthesis methods, and other characteristics as presented in the literature, and identify similarities and differences between them. The similarity measures can also be used to group materials into clusters or categories, making the data easier to understand and analyze. The knowledge extraction strategy proposed here can be useful for creating a text corpus for text mining and prediction, as well as for providing overviews of the latest research in a field. Specifically, we show how to apply this method in the field of electrocatalysis, where it can help researchers discover new materials and design robust electrocatalysts based on existing published results.
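
The pipeline described in the abstract (cleaning, tokenization, stemming, then similarity between word vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free, it swaps word2vec for simple count-based co-occurrence vectors, and it uses a crude plural-stripping rule as a stand-in for real stemming/lemmatization. The toy sentences, the stopword list, the window size, and all function names are assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; these sentences are invented for illustration only.
DOCS = [
    "Platinum catalysts show high activity for the oxygen reduction reaction.",
    "Palladium catalysts show high activity for the oxygen reduction reaction.",
    "Graphite electrodes are cheap and stable supports.",
]

# Assumed minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "for", "and", "are", "a", "of", "in"}


def preprocess(text):
    """Clean, tokenize, drop stopwords, and apply a crude stemming rule."""
    tokens = re.findall(r"[a-z]+", text.lower())  # cleaning + tokenization
    stems = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        # Crude plural stripping, standing in for stemming/lemmatization.
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        stems.append(tok)
    return stems


def cooccurrence_vectors(corpus, window=2):
    """Count-based context vectors (a stand-in for word2vec embeddings)."""
    vectors = defaultdict(Counter)
    for tokens in corpus:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


corpus = [preprocess(doc) for doc in DOCS]
vectors = cooccurrence_vectors(corpus)

# Materials that appear in similar contexts get similar vectors.
sim_pt_pd = cosine(vectors["platinum"], vectors["palladium"])
sim_pt_c = cosine(vectors["platinum"], vectors["graphite"])
print(f"platinum vs palladium: {sim_pt_pd:.2f}")
print(f"platinum vs graphite:  {sim_pt_c:.2f}")
```

In this toy corpus, "platinum" and "palladium" share identical contexts and score higher than "platinum" and "graphite", which share none. With a real corpus, the co-occurrence step would be replaced by trained embeddings (for example the `Word2Vec` class in the gensim library), and the resulting similarity matrix could feed a clustering step to group materials.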

Ruhr-Universität Bochum
Universitätsstraße 150
44801 Bochum
