ExoMatter GmbH
Aqueous solubility (S) is a key property in many areas of materials science research, such as catalysts for water-splitting reactions and battery design. Measuring and predicting aqueous solubility remains a complex and persistent challenge in chemistry. Accurate machine learning (ML) predictions of solubility could address this challenge and, in turn, expand the materials search space for identifying better candidate materials. However, developing such models remains difficult owing to data sparsity and the scarcity of publicly available datasets, particularly for inorganic materials. Recently, researchers have used large language models (LLMs) to generate datasets through structured data extraction from large volumes of text. In this work, we used LLMs to extract material properties from PDF property handbooks. We used this extraction pipeline to generate the training set for an ML model that predicts the aqueous solubility of inorganic crystals at 20 °C. The extracted data remained sparse, but by employing data-efficient ML models, we achieved satisfactory predictions of log(S), with a mean absolute error (MAE) of 0.68 on the training set and 1.57 on the test set.
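The abstract reports errors on log(S) but does not state the logarithm base or the solubility units. Assuming base 10, the following is a minimal sketch of how such an MAE can be computed and how it can be read. Because |log10 S_true − log10 S_pred| is the base-10 log of a fold error, 10^MAE equals the geometric-mean fold error. Under this assumption, a test MAE of 1.57 corresponds to a geometric-mean fold error of about 37, and a training MAE of 0.68 to about 4.8. All function names are hypothetical and are not taken from the authors' pipeline.

```python
import math
from typing import Sequence


def log10_solubility(s: float) -> float:
    """Convert a positive solubility S to log10(S) (base 10 is an assumption)."""
    if s <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return math.log10(s)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE between equal-length sequences, e.g. true and predicted log10(S)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_fold_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Geometric mean of per-sample fold errors 10**|log10 S_true - log10 S_pred|."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    folds = [10 ** abs(t - p) for t, p in zip(y_true, y_pred)]
    return math.exp(sum(math.log(f) for f in folds) / len(folds))


def fold_error_from_mae(mae_log10: float) -> float:
    """For errors in log10 units, 10**MAE equals the geometric-mean fold error."""
    return 10 ** mae_log10


# MAE values reported in the abstract, interpreted as log10 units (assumption).
for split, mae in [("train", 0.68), ("test", 1.57)]:
    print(f"{split}: MAE {mae} -> geometric-mean fold error ~{fold_error_from_mae(mae):.1f}x")
```

A unit conversion shifts the true and predicted log values by the same constant. For that reason, the MAE in log space does not depend on which solubility units are used, as long as both sides share the same units.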
The developed models were embedded in the ExoMatter platform, our web-based solution for materials R&D. The platform allows our users to take a holistic approach to materials screening and candidate identification by combining physical, chemical, business, and sustainability criteria.
© 2026