AI MSE 2025
Poster
Development of Predictive Models for Aqueous Solubility of Inorganic Materials Leveraging Structured Data Extraction Using LLMs

Sahith Reddy Namireddy (M.Sc.)

ExoMatter GmbH

Namireddy, S.R. (Speaker)¹; Caicedo-Dávila, S.¹
¹ExoMatter GmbH, Munich

Aqueous solubility (S) is a key property in many areas of materials science research, such as catalysts for water-splitting reactions and battery design. Measuring and predicting aqueous solubility remains a complex, long-standing challenge in chemistry. Accurate machine learning (ML) predictions of solubility could help address it and, in turn, allow the materials search space to be expanded so that better candidate materials can be identified. However, developing such models remains difficult owing to data sparsity and the scarcity of publicly available datasets, particularly for inorganic materials. Recently, researchers have used large language models (LLMs) to generate datasets through structured data extraction from large bodies of text. In this work, we used LLMs to extract material properties from PDF property handbooks. We used this extraction pipeline to generate the training set for an ML model that predicts the aqueous solubility of inorganic crystals at 20 °C. The extracted data were still sparse, but by employing data-efficient ML models we achieved satisfactory predictions of log(S), with a mean absolute error (MAE) of 0.68 on the training set and 1.57 on the test set.
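The abstract does not specify the extraction schema or the model, so the following is only a minimal Python sketch of the data path it outlines, under stated assumptions. It assumes an LLM was prompted to return a JSON list of records with the hypothetical fields `formula`, `solubility` and `temperature_c`. Malformed or non-positive entries are dropped, only entries at 20 °C are kept, and the target log(S) is taken as base 10 (the abstract does not state the base). MAE is then computed on that target. All names here are illustrative and are not ExoMatter's pipeline.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class SolubilityRecord:
    """One extracted entry; field names are hypothetical, not the authors' schema."""
    formula: str
    solubility: float      # S, assumed to be in one consistent unit across records
    temperature_c: float   # temperature in degrees Celsius


def parse_records(llm_output: str, target_temp_c: float = 20.0) -> list[SolubilityRecord]:
    """Validate a JSON list emitted by an LLM and keep usable entries at the target temperature."""
    records = []
    for item in json.loads(llm_output):
        try:
            rec = SolubilityRecord(
                formula=str(item["formula"]),
                solubility=float(item["solubility"]),
                temperature_c=float(item["temperature_c"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # drop malformed or incomplete entries rather than guessing values
        # log(S) is undefined for S <= 0, so such entries are discarded as well
        if rec.solubility > 0 and math.isclose(rec.temperature_c, target_temp_c):
            records.append(rec)
    return records


def log_s(records: list[SolubilityRecord]) -> list[float]:
    """Regression target: base-10 logarithm of S (the base is an assumption)."""
    return [math.log10(r.solubility) for r in records]


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """MAE between true and predicted log(S) values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fold_error(mae_log10: float) -> float:
    """Geometric-mean multiplicative error in S implied by an MAE in log10 units."""
    return 10 ** mae_log10
```

Because the error is measured in log units, it maps to a multiplicative error in S. Assuming base-10 logarithms, `fold_error(1.57)` is about 37 and `fold_error(0.68)` about 4.8, so the reported test MAE corresponds to a geometric-mean deviation of roughly a factor of 37 in predicted solubility.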

The developed models were embedded in the ExoMatter platform, our web-based solution for materials R&D, which allows our users to take a holistic approach to materials screening and candidate identification, combining physical, chemical, business and sustainability criteria.

