Helmholtz-Zentrum Hereon GmbH
Approaches driven by knowledge discovery in databases (KDD) and data mining (DM) have emerged as a new paradigm for accelerating the discovery and design of new materials. One specialized application of DM is literature mining, which can reveal new patterns in published scientific findings and can also be used to build experimental datasets. Literature mining thus creates substantial opportunities for the materials research community, because a combined evaluation of several experimental datasets may yield insights and conclusions that cannot be retrieved from any single dataset alone.
The two most crucial steps in the KDD process are data collection and data selection. The data collection step requires acquiring literature from various digital libraries, such as ScienceDirect and Wiley, through their APIs. Each of these digital libraries has its own search engine implementation with limited support for structured search. However, information needs are often highly specific and cannot always be met by these search engines, e.g., retrieving only the literature that reports creep results. This motivates the design and implementation of a system that facilitates 1) the collection of literature from digital libraries based on generic searches and 2) the selection of relevant literature from a domain-specific information retrieval system (DSIRS) based on specific needs. The DSIRS extends the notion of structured search to allow searches over figure captions. Moreover, the DSIRS leverages other information retrieval (IR) concepts, e.g., phrase search, full-text search, domain knowledge taxonomy-based semantic search, and matching-based summarization.
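The selection step can be illustrated with a minimal sketch of a fielded phrase search, restricted either to the full text or to the figure captions of collected articles. This is not the DSIRS implementation: the `Document` schema, the `search` function, and the two sample records are hypothetical and invented purely for illustration, and a real system would presumably rely on an indexed search engine rather than a linear scan.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """A collected article, reduced to a few searchable fields (hypothetical schema)."""
    doc_id: str
    title: str
    full_text: str
    figure_captions: list[str] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """Return True if the phrase tokens occur consecutively in the text."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return False
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def search(docs: list[Document], phrase: str, field_name: str = "full_text") -> list[str]:
    """Return the ids of documents whose chosen field contains the phrase."""
    hits = []
    for doc in docs:
        value = getattr(doc, field_name)
        # A field may hold one string (full text) or several (captions).
        texts = value if isinstance(value, list) else [value]
        if any(contains_phrase(t, phrase) for t in texts):
            hits.append(doc.doc_id)
    return hits


# Made-up sample records, not taken from any real publication.
corpus = [
    Document(
        doc_id="A",
        title="Creep of a magnesium alloy",
        full_text="Creep curves were measured at 200 C.",
        figure_captions=["Fig. 1: Creep curves at 200 C and 80 MPa."],
    ),
    Document(
        doc_id="B",
        title="Microstructure of a magnesium alloy",
        full_text="Creep curves are discussed in earlier work.",
        figure_captions=["Fig. 1: Optical micrograph of the as-cast alloy."],
    ),
]

full_text_hits = search(corpus, "creep curves", "full_text")
caption_hits = search(corpus, "creep curves", "figure_captions")
```

In this toy corpus both articles mention creep curves in their full text, but only article A shows them in a figure, so restricting the phrase search to captions narrows the result to the article that actually contains creep results.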
© 2025