Data Mining Based Searches using ICSD Database

One of the main topics of our group is the search for new materials, compounds, and structures. A large part of this research relies on the extensive sets of crystal structures deposited in large databases, which can be investigated using data mining-based methods.
Our group collaborates with the ICSD (Inorganic Crystal Structure Database), the world’s largest database for completely determined inorganic crystal structures. The ICSD contains an almost exhaustive list of known inorganic crystal structures published since 1913, and today it holds more than 300,000 crystal structures. Important crystal structure data are available, including unit cell, space group, and complete atomic parameters. In addition, about 80 % of the structures are allocated to about 9,000 structure types, abstracts are available for a quick grasp of the article content, powder diffraction data can be simulated, and keywords describing physical and chemical properties are provided.
The ICSD database contains the following types of crystal structures: Experimental inorganic structures, Experimental metal-organic structures, and Theoretical inorganic structures. The L-TIM is closely collaborating with the ICSD on the Theoretical inorganic structures. Each theoretical structure has been carefully evaluated, and the resulting theoretical CIF file has been extended and standardized for the first time as a result of our collaboration. Furthermore, the first classification of theoretical data in the ICSD has been presented, including additional categories used for comparison of experimental and theoretical information.

In addition, our group is interested in using the ICSD (and similar databases) for energy landscape exploration. Our guiding idea is to find all possible structure (proto)types that might be observable under experimental conditions. As a first step, we use data mining, a methodology whose overall goal is to extract information from a given data set and transform it into an easily analyzable “structure” for further use. At its simplest level, data mining is just a raw analysis step, but it can also be part of a more complex process, generally known as “knowledge discovery in databases” (KDD), which involves several important steps: selection, pre-processing, transformation, data mining, and interpretation/evaluation, followed by post-processing. Finally, we combine data mining-based explorations with quantum mechanics and other theoretical methods.
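The KDD steps named above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the records are a handful of hand-written textbook examples rather than ICSD entries, and the record fields (`formula_type`, `structure_type`, `space_group`) and all function names are hypothetical and do not reflect the ICSD's actual schema or interface. The sketch ranks structure prototypes by how often they occur for a given formula type, which is one simple way to shortlist candidate structure types.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureRecord:
    """Hypothetical, simplified record; not the real ICSD schema."""
    formula: str
    formula_type: str    # e.g. "AX" for 1:1 compounds (assumed label)
    structure_type: str  # empty string means "not assigned"
    space_group: int     # space group number


# Toy data written by hand for illustration only.
RECORDS = [
    StructureRecord("AlN", "AX", "wurtzite", 186),
    StructureRecord("ZnO", "AX", "wurtzite", 186),
    StructureRecord("GaN", "AX", "wurtzite", 186),
    StructureRecord("ZnS", "AX", "sphalerite", 216),
    StructureRecord("NaCl", "AX", "rock salt", 225),
    StructureRecord("MgO", "AX", "rock salt", 225),
    StructureRecord("TiO2", "AX2", "rutile", 136),
    StructureRecord("CsCl", "AX", "", 221),  # incomplete record
]


def select(records, formula_type):
    """Selection: keep only records of the requested formula type."""
    return [r for r in records if r.formula_type == formula_type]


def preprocess(records):
    """Pre-processing: drop records without an assigned structure type."""
    return [r for r in records if r.structure_type]


def transform(records):
    """Transformation: reduce each record to a (prototype, space group) key."""
    return [(r.structure_type, r.space_group) for r in records]


def mine(keys):
    """Data mining: count how often each prototype key occurs."""
    return Counter(keys)


def evaluate(counts, min_count):
    """Interpretation/evaluation: rank prototypes, keep the frequent ones."""
    return [key for key, n in counts.most_common() if n >= min_count]


def candidate_prototypes(records, formula_type, min_count=2):
    """Run the steps in order and return the ranked candidate prototypes."""
    selected = select(records, formula_type)
    cleaned = preprocess(selected)
    keys = transform(cleaned)
    return evaluate(mine(keys), min_count)


if __name__ == "__main__":
    for name, group in candidate_prototypes(RECORDS, "AX"):
        print(f"{name} (space group {group})")
```

In a workflow like the one described here, such a shortlist would only be a starting point: the candidate structure types would still have to be evaluated with quantum mechanical or other theoretical methods.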
We would like to thank FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Karlsruhe, Germany, for providing the ICSD database and scientific collaboration.
Recommended literature:
- D. Zagorac, H. Muller, S. Ruehl, J. Zagorac, S. Rehme, Recent developments in the Inorganic Crystal Structure Database: Theoretical crystal structure data and related features, Journal of Applied Crystallography 52 (2019) 918-925. DOI: https://doi.org/10.1107/S160057671900997X
- J. Zagorac, D. Zagorac, M. Rosić, J. C. Schön, B. Matović, Structure prediction of aluminum nitride combining data mining and quantum mechanics, CrystEngComm 19(35) (2017) 5259-5268. DOI: https://doi.org/10.1039/c7ce01039g
- J. Zagorac, D. Jovanović, T. Volkov-Husović, B. Matović, D. Zagorac, Structure prediction, high pressure effect and properties investigation of superhard B6O, Modelling and Simulation in Materials Science and Engineering 28(3) (2020) 035004. DOI: https://doi.org/10.1088/1361-651X/ab6ec8
- T. Škundrić, D. Zagorac, J. C. Schön, M. Pejić, B. Matović, Crystal structure prediction of the novel Cr2SiN4 compound via global optimization, data mining, and the PCAE method, Crystals 11(8) (2021) 891. DOI: https://doi.org/10.3390/cryst11080891
- D. Zagorac, J. Zagorac, M. Fonović, M. Pejić, J. C. Schön, Computational discovery of new modifications in scandium oxychloride (ScOCl) using a multi-methodological approach, Zeitschrift für Anorganische und Allgemeine Chemie 648(23) (2022) e202200198. DOI: https://doi.org/10.1002/zaac.202200198
