AI MSE 2025
Lecture
19.11.2025 (CET)
Literature-Based Prediction of High-Performance Electrocatalysts
Dr.-Ing. Lei Zhang

Ruhr-Universität Bochum

Zhang, L. (Speaker)¹; Stricker, M.¹
¹Ruhr University Bochum
22 min

The discovery and optimization of high-performance materials for electrocatalysis is fundamental to advancing energy conversion technologies. However, the vast chemical design space, spanned by nearly infinite combinations of elements and processing conditions, poses a major challenge often referred to as the "combinatorial explosion". Traditional approaches, which rely heavily on simulations and experimental screening, are constrained by time, cost, and the scarcity of reliable data. An underutilized yet powerful resource is the latent knowledge in the scientific literature, which includes, for example, correlations between composition and material properties.


In this work, we present a literature-based framework that leverages natural language processing (NLP) techniques to predict promising electrocatalyst compositions. Using Word2Vec to model semantic relationships between material compositions and performance descriptors extracted from scientific abstracts, we identify candidates for key electrochemical reactions, including the oxygen reduction reaction (ORR), hydrogen evolution reaction (HER), and oxygen evolution reaction (OER). To enhance prediction quality and data efficiency, we employ an iterative corpus refinement strategy that prioritizes the most diverse and informative documents, allowing composition-property correlations to emerge more clearly in embedding space.
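As a rough illustration of how similarity in embedding space can rank candidates, the sketch below scores placeholder compositions by cosine similarity to a property term. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-dimensional vectors, the names `composition_A` to `composition_C`, and the helper `rank_by_similarity` are all hypothetical, and in practice the vectors would come from a Word2Vec model trained on the abstract corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption): real vectors would be learned
# from text and have far more dimensions.
embeddings = {
    "conductivity": [0.9, 0.1, 0.2],
    "composition_A": [0.8, 0.2, 0.1],
    "composition_B": [0.1, 0.9, 0.3],
    "composition_C": [0.5, 0.5, 0.5],
}


def rank_by_similarity(target, candidates, vectors):
    """Sort candidates by descending cosine similarity to the target term."""
    scores = {c: cosine(vectors[c], vectors[target]) for c in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_similarity(
    "conductivity",
    ["composition_A", "composition_B", "composition_C"],
    embeddings,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity is used here because it compares the direction of vectors rather than their length, which is the usual choice for word embeddings.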


In regions of sparse experimental or simulation data, we combine these text-derived embeddings with Pareto front analysis to isolate high-performance candidates based solely on their linguistic similarity to target properties such as "conductivity" or "dielectric". The resulting candidate predictions are experimentally validated and show excellent agreement with the best measured electrocatalytically active compositions. Our approach highlights the untapped potential of the scientific literature as a data source for predictive models in materials discovery and offers a scalable, data-efficient method for navigating large, unexplored compositional spaces.
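The Pareto front step can be sketched as follows, assuming each candidate carries two similarity scores (for instance, one per target property) and that both are to be maximized. The scores, the candidate names, and the function `pareto_front` are hypothetical placeholders chosen for illustration; the actual objectives and data used in the work are not specified here.

```python
def pareto_front(points):
    """Return the names of points not dominated by any other point.

    A point is dominated if another point is at least as good in every
    objective and strictly better in at least one (all objectives maximized).
    """
    front = []
    for name, scores in points.items():
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for other_name, other in points.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical similarity scores (assumption): (similarity to property 1,
# similarity to property 2) for each placeholder composition.
similarities = {
    "composition_A": (0.9, 0.2),
    "composition_B": (0.6, 0.6),
    "composition_C": (0.5, 0.5),
    "composition_D": (0.1, 0.8),
}

front = pareto_front(similarities)
print(front)
```

In this toy data, `composition_C` is dropped because `composition_B` scores higher on both objectives, while the remaining three each represent a different trade-off.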
