Helmholtz-Zentrum Hereon GmbH
Recent developments in data mining have received considerable attention from the materials science community because they can accelerate the design of new materials. Experimental datasets from materials research are usually published in the scientific literature. Mining this literature and evaluating the combined experimental datasets therefore enables the discovery of synergistic effects and meaningful insights. Relevant, ready-to-use databases containing such datasets are essential for data-driven materials modelling and for knowledge discovery in the scientific literature. Unfortunately, no existing tool or digital library, such as Scopus, ScienceDirect, or Wiley, provides such databases, and creating them demands highly specific searches. For example, the search engines these libraries provide cannot retrieve only the literature that contains experimental datasets on the minimum creep rate of gamma titanium aluminides.
This work presents a system that automatically ingests literature from digital libraries through federated search and then generates multimodal databases. These databases comprise collections of machine-readable normalized data, plain-text corpora, and visual elements. Furthermore, the system implements an information retrieval component for finding relevant literature that contains datasets. Besides phrase, faceted, full-text, conjunctive, and disjunctive search, the selection mechanism supports dataset-aware literature retrieval based on the metadata of visual elements. This metadata consists of the visual element type, determined from the element's content and characteristics, and the caption text. Moreover, the system helps create curated datasets that can be used to train domain-specific large language models and to model the mechanical behaviour of materials. The system is flexible enough to be applied to literature mining in other fields of research, which opens up further research opportunities. This talk highlights the features and applicability of the system in materials science using real-world examples.
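As a rough illustration of what dataset-aware retrieval over visual-element metadata could look like, the following Python sketch filters a small in-memory collection. It is not the system's implementation: the class names, the function `dataset_aware_search`, the visual element type labels, and the three sample records are all hypothetical, and simple substring matching stands in for a real full-text index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VisualElement:
    kind: str      # hypothetical type label, e.g. "line_plot", "table", "micrograph"
    caption: str   # caption text extracted alongside the element


@dataclass
class Paper:
    title: str
    text: str
    visuals: list[VisualElement] = field(default_factory=list)


def contains_all(haystack: str, terms) -> bool:
    """Conjunctive match: every term or phrase must occur (case-insensitive)."""
    low = haystack.lower()
    return all(t.lower() in low for t in terms)


def contains_any(haystack: str, terms) -> bool:
    """Disjunctive match: at least one term or phrase must occur."""
    low = haystack.lower()
    return any(t.lower() in low for t in terms)


def dataset_aware_search(papers, all_terms=(), any_terms=(),
                         visual_kinds=(), caption_terms=()):
    """Return papers whose full text matches the term filters and that hold
    at least one visual element of a requested kind with a matching caption."""
    hits = []
    for paper in papers:
        if all_terms and not contains_all(paper.text, all_terms):
            continue
        if any_terms and not contains_any(paper.text, any_terms):
            continue
        if visual_kinds or caption_terms:
            has_dataset = any(
                (not visual_kinds or v.kind in visual_kinds)
                and contains_all(v.caption, caption_terms)
                for v in paper.visuals
            )
            if not has_dataset:
                continue
        hits.append(paper)
    return hits


# Invented sample records, for illustration only.
papers = [
    Paper("A", "Creep tests on a gamma titanium aluminide alloy.",
          [VisualElement("line_plot", "Minimum creep rate versus applied stress")]),
    Paper("B", "A review of gamma titanium aluminide processing routes.",
          [VisualElement("micrograph", "Lamellar microstructure")]),
    Paper("C", "Creep of nickel-based superalloys.",
          [VisualElement("line_plot", "Minimum creep rate versus temperature")]),
]

hits = dataset_aware_search(
    papers,
    all_terms=["gamma titanium aluminide"],   # a multi-word term acts as a phrase
    visual_kinds={"line_plot", "table"},
    caption_terms=["minimum creep rate"],
)
titles = [p.title for p in hits]
print(titles)
```

In this sketch, a text-only query for the phrase would return both papers A and B. Requiring a plot or table whose caption mentions the minimum creep rate narrows the result to the paper that actually reports the data.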
Abstract
© 2025