FEMS EUROMAT 2023
Lecture
07.09.2023 (CEST)
From text data to word embeddings in Materials Science
Dr.-Ing. Lei Zhang

Ruhr-Universität Bochum

Zhang, L. (Speaker)¹; Stricker, M.¹
¹Ruhr University Bochum

The field of materials science relies heavily on data to understand the properties and behavior of materials. One important source of data is the scientific literature in text form. However, it is becoming increasingly hard for researchers to digest the vast amount of information it contains, so important clues to promising discovery directions and design principles may be missed. We present a method for preprocessing text data, including cleaning, tokenization, and stemming or lemmatization, to prepare a cleaned corpus for further analysis. Insights into materials design are then extracted, for example by creating simple word clouds. Furthermore, word embeddings are obtained by training word2vec on the text data. Embeddings allow mathematical operations on words, such as similarity measures between words (“entities”, i.e. certain materials), which makes them a powerful tool for extracting insights and knowledge from text data in materials science. This means that we can compare different materials based on their properties, synthesis methods, and other characteristics as presented in the literature, and identify similarities and differences between them. Additionally, the similarity measures can be used to group materials into clusters or categories, making the data easier to understand and analyze. The knowledge extraction strategy proposed here can be useful for creating a corpus of text data for text mining and for building predictive capabilities, as well as for providing overviews of the latest research in a field. Specifically, we show how to apply this method in the field of electrocatalysis, where it can help researchers discover new materials and design robust electrocatalysts based on existing published results.
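
The pipeline described in the abstract (cleaning, tokenization, normalization, word vectors, similarity between material names) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the toy corpus, the stopword list, and the function names are hypothetical, crude plural stripping stands in for real stemming or lemmatization, and simple co-occurrence counts stand in for word2vec embeddings. It is not the authors' implementation; in practice the vectors would come from a trained word2vec model, for example from a library such as gensim.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; the sentences are illustrative, not data from the study.
RAW_DOCS = [
    "Pt nanoparticles are active catalysts for the oxygen reduction reaction.",
    "Pd nanoparticles are active catalysts for the oxygen reduction reaction!",
    "Graphene supports improve the stability of the electrodes.",
]

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "are", "for", "is", "of", "the", "to"}


def preprocess(text):
    """Clean, tokenize, and crudely normalize one document."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # cleaning: drop punctuation
    tokens = text.split()                              # tokenization on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Crude plural stripping, a stand-in for real stemming/lemmatization.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def cooccurrence_vectors(docs, window=2):
    """Count context words within `window` positions (stand-in for word2vec)."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


CORPUS = [preprocess(doc) for doc in RAW_DOCS]
VECTORS = cooccurrence_vectors(CORPUS)

if __name__ == "__main__":
    # Words that appear in similar contexts receive similar vectors.
    print("pt vs pd:", cosine(VECTORS["pt"], VECTORS["pd"]))
    print("pt vs graphene:", cosine(VECTORS["pt"], VECTORS["graphene"]))
```

In this toy corpus "pt" and "pd" occur in identical contexts, so they score as highly similar, while "pt" and "graphene" share no context words. The same cosine measure applied to word2vec vectors is what would allow materials to be compared and grouped into clusters.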

