Ruhr-Universität Bochum
Insight into material properties comes from two primary data sources: experimental measurements and computer simulations. The results of these experiments and simulations are often documented in publications, but in unstructured form, which makes it difficult to manage and retrieve pertinent information. As a result, much of the information presented in publications remains largely unused.
In response to this challenge, our strategy involves two main steps. First, we compile training data for a large language model by manually extracting information from unstructured literature sources. We use various natural language processing techniques to preprocess the data, identify entities, and extract relationships. Second, we train the language model on the gathered information so that it can apply what it has learned effectively to test data. This enables us to construct a comprehensive knowledge graph, a structured framework that links all pieces of information related to material properties, experiments, and simulations. Several publications were used within this framework to compile the data and build the knowledge graph for specific material properties and tests, including stacking fault energy (SFE) and creep tests.
Our goals are, first, to simplify the storage and retrieval of data and, second, to improve the use of existing materials data from the literature by making it easier to combine experimental and simulated data in a queryable knowledge graph.
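To make the idea of a queryable knowledge graph concrete, the following is a minimal sketch in plain Python, assuming that extracted entities and relationships are stored as subject-predicate-object triples. The class `KnowledgeGraph`, the helper `methods_for`, the predicate names, and the record and material identifiers (`measurement_1`, `Alloy-A`) are all hypothetical placeholders chosen for illustration; they are not taken from the publications used in this work and do not describe the actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One extracted relationship: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """Hypothetical in-memory triple store (illustration only)."""

    def __init__(self) -> None:
        self._triples = set()
        self._by_subject = defaultdict(set)  # index for subject lookups

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = Triple(subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list:
        """Return all triples matching the given fields; None is a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def methods_for(kg: KnowledgeGraph, material: str, prop: str) -> dict:
    """Map each record about (material, prop) to the method that produced it."""
    ids = {t.subject for t in kg.query(predicate="of_material", obj=material)}
    ids &= {t.subject for t in kg.query(predicate="property", obj=prop)}
    return {
        rid: kg.query(subject=rid, predicate="method")[0].obj
        for rid in sorted(ids)
    }


# Placeholder records: one experimental and one simulated entry for the
# same (hypothetical) material and property. No real values are included.
kg = KnowledgeGraph()
kg.add("measurement_1", "of_material", "Alloy-A")
kg.add("measurement_1", "property", "stacking_fault_energy")
kg.add("measurement_1", "method", "experiment")
kg.add("measurement_2", "of_material", "Alloy-A")
kg.add("measurement_2", "property", "stacking_fault_energy")
kg.add("measurement_2", "method", "simulation")

combined = methods_for(kg, "Alloy-A", "stacking_fault_energy")
```

In this sketch, a single query returns experimental and simulated records for the same material and property side by side, which is the kind of combination the second goal refers to. A production system would more likely use a dedicated graph database or RDF store; that choice is outside what this abstract specifies.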
© 2026