MSE 2024
Lecture
24.09.2024
Navigating atomistic simulations data space with workflows and knowledge graphs

Dr.-Ing. Sarath Menon

Max-Planck-Institut für Nachhaltige Materialien GmbH

Menon, S. (Speaker)¹; Azócar Guzmán, A.²; Hofmann, V.²; Sandfeld, S.²; Hickel, T.³; Neugebauer, J.¹
¹Max-Planck-Institut für Eisenforschung GmbH, Düsseldorf; ²Forschungszentrum Jülich GmbH, Aachen; ³Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin
23 min, subtitles (CC)

Computational materials science workflows often involve many steps, diverse software and tools, different length and time scales, and a wide range of material compositions, structures and thermodynamic conditions, and therefore produce substantial amounts of complex data. At the atomic scale, simulation methods offer a physics-based route to generating data, which can in turn feed data-driven techniques. Workflow reproducibility, data reusability and meaningful interpretation all require well-described data and metadata at each step of the workflow, from the atomic structure to the computed material properties.

The first step of an atomistic workflow is often the generation of the input atomic structure. In this step, we automate the description of data and metadata with the software tool pyscal-rdf [1]. This open-source tool can generate atomic structures, including defect systems such as grain boundaries, and annotates them using the Computational Materials Sample Ontology (CMSO) [2].
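As a rough illustration of what an annotated structure can look like, the following Python sketch builds a cubic bcc supercell with the standard library and records its metadata as subject-predicate-object triples. It is not the pyscal-rdf API: the function names `bcc_positions` and `annotate_sample` and every `ex:` class or predicate are hypothetical placeholders standing in for the actual CMSO [2] vocabulary, and the Fe lattice parameter is only an example input.

```python
from itertools import product


def bcc_positions(a, reps):
    """Cartesian positions (in Å) of a reps x reps x reps cubic bcc supercell."""
    basis = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]  # fractional bcc basis
    positions = []
    for i, j, k in product(range(reps), repeat=3):
        for bx, by, bz in basis:
            positions.append(((i + bx) * a, (j + by) * a, (k + bz) * a))
    return positions


def annotate_sample(sample_id, element, a, reps):
    """Generate a bcc sample and describe it as (subject, predicate, object) triples.

    The ex: class and predicate names are hypothetical placeholders for CMSO terms.
    """
    positions = bcc_positions(a, reps)
    triples = [
        (sample_id, "rdf:type", "ex:AtomicScaleSample"),
        (sample_id, "ex:hasElement", element),
        (sample_id, "ex:hasCrystalStructure", "bcc"),
        (sample_id, "ex:hasLatticeParameter", a),
        # Derived from the generated structure, not entered by hand.
        (sample_id, "ex:hasNumberOfAtoms", len(positions)),
        (sample_id, "ex:hasSimulationCellLength", a * reps),
    ]
    return positions, triples


positions, triples = annotate_sample("sample_1", "Fe", a=2.87, reps=3)
for triple in triples:
    print(triple)
```

Generating the metadata in the same call that generates the positions means structural facts such as the atom count always agree with the structure they describe.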

Interoperability also requires a semantic description of the simulation workflow used to generate the data. To this end, we use the Atomistic Simulation Methods Ontology (ASMO) [3] to describe workflows involving diverse simulation codes and the properties they calculate. The data is stored in an application-level knowledge graph through tailored, semantically annotated jobs. Using pyiron [4] as an example workflow environment, we demonstrate the two-fold benefit of the knowledge graph: (i) data such as atomic structures, materials and simulation processes become fully queryable through an automated system, with detailed metadata that enhances reuse and improves interoperability; and (ii) data and metadata can be explored to identify new trends and to extract material properties that were not explicitly calculated.
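The queryability described in (i) can be sketched with a minimal in-memory triple store. The Python below matches a list of triple patterns, where strings starting with `?` are variables, in the spirit of a SPARQL basic graph pattern; a real knowledge graph would normally be queried with SPARQL itself. All identifiers and `ex:` terms are hypothetical stand-ins for CMSO and ASMO [3] terms, the energies are toy numbers rather than simulation results, and none of this reflects the pyiron [4] interface.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # Variable: bind it, or check consistency with an earlier binding.
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            # Constant term that does not match this triple.
            return None
    return new


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Toy graph: two samples, two calculations and their outputs (placeholder values).
graph = [
    ("sample_1", "ex:hasElement", "Fe"),
    ("sample_2", "ex:hasElement", "Al"),
    ("calc_1", "rdf:type", "ex:EnergyMinimization"),
    ("calc_1", "ex:hasInputSample", "sample_1"),
    ("calc_1", "ex:hasCalculatedProperty", "prop_1"),
    ("prop_1", "ex:hasValue", -8.0),
    ("calc_2", "rdf:type", "ex:EnergyMinimization"),
    ("calc_2", "ex:hasInputSample", "sample_2"),
    ("calc_2", "ex:hasCalculatedProperty", "prop_2"),
    ("prop_2", "ex:hasValue", -3.4),
]

# "Which values were computed by an energy minimization on an Fe sample?"
fe_results = query(graph, [
    ("?calc", "rdf:type", "ex:EnergyMinimization"),
    ("?calc", "ex:hasInputSample", "?sample"),
    ("?sample", "ex:hasElement", "Fe"),
    ("?calc", "ex:hasCalculatedProperty", "?prop"),
    ("?prop", "ex:hasValue", "?energy"),
])
print(fe_results)
```

Because the query follows links from calculation to input sample to material, the same data answers questions about samples, methods or properties without a separate index for each.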

In this work, we aim to extract thermodynamic properties of materials. This approach, which combines workflows with semantic technologies, can be a valuable tool for domain scientists in their everyday research, helping them leverage the advantages of a knowledge graph and taking a step towards FAIR data [5].
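One way a property that was never explicitly calculated can follow from stored results is sketched below. It assumes the graph holds the total energy of a perfect bulk cell of N atoms and of the same cell with one vacancy; the vacancy formation energy E_f = E_vac(N-1) - (N-1)/N * E_bulk(N) and the equilibrium vacancy fraction c_v = exp(-E_f / (k_B T)), with formation entropy neglected, then require no new simulation. This is a generic textbook example chosen for illustration, not necessarily the property or method used in this work, and the energies are placeholder values.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA 2018)


def vacancy_formation_energy(e_bulk, n_bulk, e_vac):
    """E_f = E_vac(N-1) - (N-1)/N * E_bulk(N), all energies in eV."""
    return e_vac - (n_bulk - 1) / n_bulk * e_bulk


def vacancy_concentration(e_f, temperature):
    """Equilibrium vacancy fraction exp(-E_f / (k_B T)), neglecting formation entropy."""
    return math.exp(-e_f / (K_B * temperature))


# Placeholder records, shaped like rows returned by a knowledge-graph query.
bulk = {"n_atoms": 54, "energy": -432.0}
vacancy = {"n_atoms": 53, "energy": -422.5}

# The vacancy cell must contain exactly one atom fewer than the bulk cell.
assert vacancy["n_atoms"] == bulk["n_atoms"] - 1

e_f = vacancy_formation_energy(bulk["energy"], bulk["n_atoms"], vacancy["energy"])
c_v = vacancy_concentration(e_f, temperature=1000.0)
print(f"E_f = {e_f:.3f} eV, c_v(1000 K) = {c_v:.2e}")
```

The check on atom counts is the kind of consistency test that well-described metadata makes possible: without the stored number of atoms per cell, the two energies could not be combined safely.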

References

[1] https://doi.org/10.5281/zenodo.8146527

[2] https://purls.helmholtz-metadaten.de/cmso/

[3] https://purls.helmholtz-metadaten.de/mmss/asmo

[4] https://doi.org/10.1016/j.commatsci.2018.07.043

[5] https://doi.org/10.1038/sdata.2016.18

