Optimizing Retrieval-Augmented Generation for Tabular Data in Material Property Databases

Lecture

19.11.2025 (CET)

Inés Pérez Couñago (M.Sc.)

AIMEN Technology Centre

Pérez Couñago, I. (Speaker)¹; Muíños-Landín, S.¹; Novas Domínguez, G.¹; Suárez Casabiell, L.¹

¹AIMEN Technology Center, O Porriño (Spain)

Duration: 17 min

Large Language Models (LLMs) facilitate the development of material property databases by automating data retrieval, enhancing repository consistency, and improving integration into engineering workflows. However, material data is often stored in tabular form. This poses a challenge for LLMs, which are trained primarily on sequential text and often struggle to interpret structured data accurately.
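
Because a model reads a table as one flat token sequence, a common preprocessing step is to linearize each row so that every cell stays attached to its column header. The sketch below is an assumed illustration of that general idea, not the preprocessing used in this work; the function name `linearize_table` and the example rows are hypothetical sample input.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]], caption: str = "") -> str:
    """Turn each table row into one self-contained line of 'column: value' pairs."""
    lines = []
    if caption:
        lines.append(f"Table: {caption}.")
    for i, row in enumerate(rows, start=1):
        # Pair every non-empty cell with its header so the context survives chunking for retrieval
        pairs = [f"{col}: {val}" for col, val in zip(header, row) if val.strip()]
        lines.append(f"Row {i}: " + "; ".join(pairs) + ".")
    return "\n".join(lines)


# Hypothetical sample table, used only to show the output format
header = ["Material", "Density", "Yield strength"]
rows = [["AA6061-T6", "2.70 g/cm3", "276 MPa"], ["Ti-6Al-4V", "4.43 g/cm3", "880 MPa"]]
print(linearize_table(header, rows, caption="Mechanical properties"))
```

Each resulting line can then be embedded and retrieved on its own, which avoids a retrieved chunk containing numbers without the column that gives them meaning.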

This work explores the use of Retrieval-Augmented Generation (RAG) for extracting material properties from tables, with a focus on both the preprocessing and postprocessing stages. Preprocessing optimizes tabular data for LLM accessibility, enabling more effective retrieval, while postprocessing refines the extracted information into well-structured value/unit pairs. This standardization enables unit conversion, which is a prerequisite for data analysis, and improves database integrity. The approach also enables a methodology for objectively evaluating extraction performance. A custom evaluation metric that weights the accuracy of both the retrieved values and their units is introduced to assess domain-specific challenges: model effectiveness in table reformatting and data retrieval, the impact of prompt refinements and embeddings, and the complexity of different material properties.
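
The postprocessing and evaluation steps can be sketched roughly as follows: parse a free-text answer into a value/unit pair, normalize units through a conversion table, and score a prediction against a reference with separate weights for value and unit. This is a hypothetical illustration, not the metric introduced in this work; the 0.7/0.3 weighting, the 1% relative tolerance, the small `TO_CANONICAL` table and all function names are assumptions.

```python
import math
import re
from typing import Optional, Tuple

# Hypothetical conversion table: unit -> (canonical unit, multiplicative factor)
TO_CANONICAL = {
    "MPa": ("MPa", 1.0),
    "GPa": ("MPa", 1000.0),
    "ksi": ("MPa", 6.894757),
    "kg/m3": ("kg/m3", 1.0),
    "g/cm3": ("kg/m3", 1000.0),
}

# A number (optionally signed or in scientific notation) followed by an optional unit token
VALUE_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*([A-Za-z%/^0-9]+)?\s*$")


def parse_value_unit(text: str) -> Optional[Tuple[float, Optional[str]]]:
    """Split a raw answer such as '276 MPa' into (276.0, 'MPa'); return None if it does not parse."""
    match = VALUE_UNIT.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def to_canonical(value: float, unit: str) -> Tuple[float, str]:
    """Convert a value to the canonical unit of its quantity (raises KeyError for unknown units)."""
    canonical_unit, factor = TO_CANONICAL[unit]
    return value * factor, canonical_unit


def weighted_score(pred: str, truth: str, w_value: float = 0.7, rel_tol: float = 0.01) -> float:
    """Hypothetical metric: w_value credit for a correct value, (1 - w_value) for a correct unit."""
    p, t = parse_value_unit(pred), parse_value_unit(truth)
    if p is None or t is None:
        return 0.0
    (pv, pu), (tv, tu) = p, t
    # Normalize both sides when both units are known, so '0.276 GPa' can match '276 MPa'
    if pu in TO_CANONICAL and tu in TO_CANONICAL:
        pv, pu = to_canonical(pv, pu)
        tv, tu = to_canonical(tv, tu)
    value_ok = math.isclose(pv, tv, rel_tol=rel_tol)
    unit_ok = pu == tu
    return w_value * float(value_ok) + (1.0 - w_value) * float(unit_ok)
```

Under these assumptions, `weighted_score("0.276 GPa", "276 MPa")` gives full credit because both sides normalize to 276 MPa, while a bare `"276"` earns only the value weight because its unit is missing. A production pipeline would more likely rely on a dedicated unit library than on a hand-written conversion table.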

Finally, the extracted knowledge is validated through outlier detection, ensuring that erroneous values are excluded from the generated database and enhancing its reliability and usability in engineering applications.
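
One standard way to implement such a check is the modified z-score based on the median absolute deviation (MAD), which, unlike a mean/standard-deviation z-score, is not pulled toward the very outlier it is meant to catch. The abstract does not state which detector is used, so this is an assumed technique: the 3.5 threshold follows the common Iglewicz-Hoaglin rule, and `mad_outliers` and the sample values are hypothetical. The sketch assumes the values have already been converted to one unit and grouped by material and property.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score |0.6745 * (x - median) / MAD| exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values coincide with the median, so MAD is zero: flag anything that differs from it
        return [v != med for v in values]
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical yield-strength values (MPa) extracted for one alloy; 2760 mimics a misplaced decimal
values = [276, 275, 280, 270, 2760]
flags = mad_outliers(values)
kept = [v for v, is_outlier in zip(values, flags) if not is_outlier]
print(kept)  # [276, 275, 280, 270]
```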

Similar Content

Leveraging Large Language Models in Polymer Informatics Data Collection

Barthelemy, V. (Speaker)¹; Holzbach, N.¹; van de Bunt, N.¹; Shahmohammadi, S.¹; Boersma, A.¹; Urbanus, J.H.¹

Development of neural network interatomic potentials for molecular dynamics: Application for martensitic nickel titanium

Jaros, P. (Speaker)¹; Behler, J.²; Sedlák, P.³; Šesták, P.³

Leveraging Knowledge Graphs for Extraction and Linkage of Information from Unstructured Data

Baghaee Ravari, S. (Speaker)¹; Hickel, T.²; Stricker, M.¹

An automated workflow in materials science for combining multi-modal simulation and experimental information using data mining and large language models

Katzer, B. (Speaker)¹; Klinder, S.¹; Schulz, K.¹

Latent space analysis of corrosion progression using CLIP-based semantic representations

Helwing, R. (Speaker)¹; Kosak, M.¹; Walther, F.¹

Automated Detection and Crystallographic Classification of Dislocations in ECCI Micrographs Using Deep Learning

Ruzaeva, K. (Speaker)¹; Medina, A.²; Lee, S.²; Kazimi, B.¹; Kirchlechner, C.²; Sandfeld, S.¹

From Accelerated Creep Tests to Performance Parameters: Digitalizing Materials Development with Dataspaces and Machine Learning Applications

Morand, L. (Speaker)¹; Büschelberger, M.²; Chen, F.³; Gopalakrishnan, A.¹; Habraken, A.M.³; Helm, D.¹; Kumaraswamy, K.¹; Liu, Y.⁴; Nahshon, Y.²; Romero, I.⁵; Schenk, C.⁴; Zapara, M.¹

Defocus estimation in light-optical microscopy and optimized image acquisition strategies based on deep learning

Krawczyk, P. (Speaker)¹; Jansche, A.¹; Bernthaler, T.¹; Schneider, G.¹

Literature-Based Prediction of High-Performance Electrocatalysts

Zhang, L. (Speaker)¹; Stricker, M.¹

A texture synthesis approach for generating synthetic microstructural images for training ML models in a low-data regime

Müller, M. (Speaker)¹; Britz, D.¹; Mücklich, F.²

Contextual Representations of Elements in Periodic Table with Fine-Tuned Language Models

Putri, S.F.H. (Speaker)¹; Ishii, F.¹

Uncertainty Propagation for Machine-learned Interatomic Potentials

Gaafer, H. (Speaker)¹; Janssen, J.¹; Bitzek, E.¹; Drautz, R.²; Neugebauer, J.¹

© 2026