AIMEN Technology Centre
Large Language Models (LLMs) facilitate the development of material property databases by automating data retrieval, enhancing repository consistency, and improving integration into engineering workflows. However, material data is often stored in tabular form, posing a challenge for LLMs, which are primarily trained on sequential text and struggle with accurately interpreting structured data.
This work explores the use of Retrieval-Augmented Generation (RAG) for extracting material properties from tables, with a focus on both the preprocessing and postprocessing stages. Preprocessing reformats tabular data so that it is more accessible to LLMs and can be retrieved more effectively, while postprocessing refines the extracted information into well-structured value/unit pairs. Standardizing these pairs enables unit conversion, which is a prerequisite for data analysis, and improves database integrity. It also makes it possible to evaluate extraction performance objectively. A custom evaluation metric, which weights the accuracy of both retrieved values and units, is introduced to assess domain-specific challenges. These include model effectiveness in table reformatting and data retrieval, the impact of prompt refinements and embeddings, and the complexity of different material properties.
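As an illustration of the postprocessing and scoring idea described above, the following minimal sketch parses a raw extracted string into a value/unit pair and scores it against a reference. The abstract does not give the metric's definition, so the function names, the 0.7/0.3 weighting, and the relative tolerance are all assumptions chosen for illustration and not the authors' method.

```python
import math
import re
from typing import Optional, Tuple

# Hypothetical pattern: a leading number (optionally in scientific notation)
# followed by an optional free-text unit.
_PATTERN = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(\S.*)?$")


def parse_value_unit(text: str) -> Optional[Tuple[float, str]]:
    """Split a raw extracted string such as '210 GPa' into (value, unit)."""
    match = _PATTERN.match(text)
    if match is None:
        return None  # no leading number found, e.g. 'n/a'
    value = float(match.group(1))
    unit = (match.group(2) or "").strip()
    return value, unit


def weighted_score(
    predicted: Optional[Tuple[float, str]],
    reference: Tuple[float, str],
    value_weight: float = 0.7,  # assumed weighting, not taken from the source
    rel_tol: float = 1e-3,      # assumed tolerance for numeric agreement
) -> float:
    """Score in [0, 1] that credits the value and the unit separately."""
    if predicted is None:
        return 0.0
    pred_value, pred_unit = predicted
    ref_value, ref_unit = reference
    value_ok = math.isclose(pred_value, ref_value, rel_tol=rel_tol)
    # Units are compared case-sensitively because 'MPa' and 'mPa' differ.
    unit_ok = pred_unit.replace(" ", "") == ref_unit.replace(" ", "")
    return value_weight * value_ok + (1.0 - value_weight) * unit_ok
```

In this sketch, an extraction with the correct number but the wrong unit receives partial credit, which reflects the idea of weighting values and units separately. A fuller version would compare the two pairs after converting them to a common unit.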
Finally, the extracted knowledge is validated through outlier detection, ensuring that erroneous values are excluded from the generated database and enhancing its reliability and usability in engineering applications.
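The abstract does not say which outlier detection method is used. As one possible illustration, the sketch below applies the common interquartile range (IQR) rule to values extracted for a single property. The function name and the factor `k = 1.5` are assumptions, not details from the source.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Requires at least two values; statistics.quantiles raises
    StatisticsError otherwise.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical Young's modulus values in GPa for one alloy; the last entry
# mimics an extraction error such as a misplaced decimal point.
extracted = [200, 205, 210, 208, 203, 207, 204, 2100]
suspicious = iqr_outliers(extracted)
```

Values flagged this way could be excluded from the database or sent for manual review. The rule assumes that all values have already been converted to a common unit.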
Abstract
© 2026