ICAMS

Iterative corpus refinement for materials property prediction based on scientific texts

L. Zhang, M. Stricker

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track, 89-103, (2026)

DOI: 10.1007/978-3-032-06118-8_6

The discovery and optimization of materials for specific applications is hampered by the practically infinite number of possible elemental combinations and associated properties, known as the "combinatorial explosion". By the nature of the problem, data are scarce, so every available data source should be used. Besides simulations and experimental results, the latent knowledge in scientific texts is not yet exploited to its full potential. We present an iterative framework that refines a given scientific corpus by strategically selecting the most diverse documents, training Word2Vec models, and monitoring the convergence of composition-property correlations in embedding space. We apply our approach to predict high-performing materials for the oxygen reduction (ORR), hydrogen evolution (HER), and oxygen evolution (OER) reactions across a large number of candidate compositions. Our method successfully predicts the highest-performing compositions within a large pool of candidates, as validated by laboratory measurements of their electrocatalytic performance. This work demonstrates and validates the potential of iterative corpus refinement to accelerate materials discovery and optimization, offering a scalable and efficient tool for screening large compositional spaces where reliable data are scarce or non-existent.
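The abstract describes the refinement loop only at a high level. The following is a minimal, dependency-free sketch of such a loop, not the authors' implementation, and it rests on several assumptions: documents are pre-tokenized lists of words; "most diverse" is approximated by greedy max-min selection on Jaccard distance; simple count-based co-occurrence vectors stand in for the Word2Vec models the paper actually trains; and "convergence" is taken to mean that the top-k candidate ranking (by cosine similarity to a property keyword) stops changing. All function names and parameters (`select_diverse`, `refine`, `batch`, `top_k`, `patience`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| on the token sets of two documents."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def select_diverse(pool, selected, k):
    """Greedy max-min selection (assumed diversity criterion): repeatedly take
    the pool document whose nearest already-selected document is farthest away."""
    pool = list(pool)
    chosen = []
    for _ in range(min(k, len(pool))):
        ref = selected + chosen
        best = max(pool, key=lambda d: min((jaccard_distance(d, s) for s in ref), default=1.0))
        chosen.append(best)
        pool.remove(best)
    return chosen, pool


def cooccurrence_vectors(corpus, window=2):
    """Count-based word vectors from co-occurrences within +/- `window` tokens.
    A stand-in for Word2Vec embeddings, used here only to avoid dependencies."""
    vecs = defaultdict(Counter)
    for doc in corpus:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    vecs[w][doc[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(vecs, candidates, property_word):
    """Sort candidate composition tokens by similarity to a property keyword."""
    prop = vecs.get(property_word, Counter())
    return sorted(candidates, key=lambda c: -cosine(vecs.get(c, Counter()), prop))


def refine(pool, candidates, property_word, batch=2, top_k=2, patience=2, max_iter=10):
    """Grow the corpus in diverse batches, re-embed, and stop once the top-k
    ranking has repeated unchanged `patience` times in a row (or the pool is empty)."""
    corpus, pool = [], list(pool)
    history, stable = [], 0
    for _ in range(max_iter):
        new, pool = select_diverse(pool, corpus, batch)
        if not new:
            break
        corpus += new
        top = rank_candidates(cooccurrence_vectors(corpus), candidates, property_word)[:top_k]
        stable = stable + 1 if history and top == history[-1] else 0
        history.append(top)
        if stable >= patience:
            break
    return corpus, history
```

In a real setting, `cooccurrence_vectors` would be replaced by training a Word2Vec model on the current corpus, and `candidates` would be the composition tokens to be screened; the stopping rule trades extra training rounds against the stability of the predicted ranking.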

@inproceedings{l.zhang20261,
  author    = {L. Zhang and M. Stricker},
  title     = {Iterative corpus refinement for materials property prediction based on scientific texts},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track},
  pages     = {89--103},
  year      = {2026},
  doi       = {10.1007/978-3-032-06118-8_6}
}
Logo RUB
  • Open positions
  • Travel information
  • Imprint
  • Privacy Policy
  • Sitemap
Ruhr-Universität Bochum
Universitätsstraße 150
44801 Bochum