Karlsruher Institut für Technologie (KIT)
Efficient retrieval and comparison of simulation and experimental data are essential for advancing materials science [1]. However, navigating the growing body of scientific knowledge and extracting the information relevant to a specific research question is difficult, because much of the multi-modal data is locked in scientific literature with limited accessibility and machine-readability. Open science initiatives have improved data availability, but extracting meaningful information from unstructured sources is still an open problem.
This contribution presents an automated workflow that transforms scientific literature into a structured, machine-readable format. Using natural language processing (NLP) and vision transformer (ViT) models, the method extracts and organizes information from text, figures, tables, equations, and metadata. The resulting database is further enriched with local data, which allows user-specific knowledge to be integrated. A Large Language Model (LLM) with Retrieval-Augmented Generation (RAG) is coupled to this database and serves as an efficient, domain-specific question-answering chatbot to accelerate scientific discovery. A use case in the microstructural analysis of face-centred cubic single crystals demonstrates the applicability of the workflow [2].
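The retrieval step of such a RAG setup can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model, the database schema, or the LLM, so the `Record` fields, the function names, and the example entries below are hypothetical, and a simple bag-of-words cosine similarity stands in for learned embeddings. The sketch only shows how extracted multi-modal items could be ranked against a question and assembled into a prompt; the call to an LLM is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """One extracted item. The field names are hypothetical."""
    source: str    # e.g. a citation key or a local file name
    modality: str  # "text", "figure", "table", "equation" or "metadata"
    content: str   # extracted text, caption, or serialized table/equation


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, records: list[Record], k: int = 2) -> list[Record]:
    """Return the k records most similar to the query."""
    q = tokenize(query)
    ranked = sorted(records, key=lambda r: cosine(q, tokenize(r.content)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[Record]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{r.source} | {r.modality}] {r.content}" for r in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Illustrative placeholder entries, not data from the cited work.
records = [
    Record("paper_A", "text",
           "Dislocation density evolution in face-centred cubic single crystals "
           "under tensile loading."),
    Record("paper_A", "figure",
           "Caption: stress strain curve of a copper single crystal."),
    Record("local_notes", "table",
           "Simulation parameters: box size, time step, temperature."),
]

query = "dislocation density in cubic single crystals"
hits = retrieve(query, records, k=2)
prompt = build_prompt(query, hits)  # this string would be passed to an LLM
```

Keeping the modality and source with each record lets the answer cite where a retrieved item came from, and local data can be added simply by appending further records to the same list.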
[1] K. Choudhary et al.; npj Computational Materials, 2022, 8.
[2] B. Katzer, S. Klinder, K. Schulz; Materials Today Communications, 2025, 45, 112186.