Enhancing dimensionality prediction in hybrid metal halides via feature engineering and class-imbalance mitigation
Authors
Abstract
We present a machine learning (ML) framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, that combines chemically informed feature engineering with class-imbalance handling. The work is motivated by the small size and strong class imbalance of experimentally available HMH datasets, which limit the applicability and reliability of conventional ML approaches. Our dataset of 494 HMH structures is unevenly distributed across the four dimensionality classes (0D, 1D, 2D, 3D), which makes the underrepresented classes difficult to learn. To mitigate this, we augmented the dataset to 1336 samples using the synthetic minority oversampling technique (SMOTE), improving the learning of underrepresented classes while preserving chemically meaningful feature relationships. We also developed interaction-based descriptors that capture coupled steric and polarity effects relevant to dimensionality, which standard single-parameter or composition-only descriptors do not readily capture. These descriptors are integrated into a multi-stage workflow that combines feature selection, ensemble stacking, and performance optimization. The approach significantly improves F1-scores for the underrepresented classes and achieves robust cross-validation performance across all dimensionalities. This work demonstrates a generalizable strategy for extracting reliable and interpretable structure–dimensionality relationships from limited experimental data, enabling pre-synthesis screening of organic cations and providing a practical blueprint for small-data ML in hybrid materials systems.
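To make the two central ideas concrete, the following is a minimal sketch of SMOTE-style oversampling followed by construction of an interaction descriptor. It is an illustration under stated assumptions, not the authors' implementation: the two features (a "steric" and a "polarity" value per cation), the toy numbers, the product form of the interaction term, the function names `smote` and `add_interaction`, and the choice to balance every class to the majority count are all hypothetical. (The reported 1336 samples is divisible by four, which would be consistent with balancing to 334 per class, but the abstract does not state this.)

```python
import math
import random
from collections import Counter


def smote(X, y, k=3, seed=0):
    """SMOTE-style oversampling: grow each minority class to the majority count.

    Each synthetic point is a random interpolation between a real sample and
    one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # no same-class neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours, excluding the base sample itself
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def add_interaction(rows, i, j):
    """Append the product of columns i and j as an extra descriptor (assumed form)."""
    return [row + [row[i] * row[j]] for row in rows]


# Toy data (hypothetical): columns are [steric, polarity]
X = [
    [1.0, 0.2], [1.1, 0.3], [0.9, 0.25], [1.2, 0.1], [1.05, 0.15], [0.95, 0.35],
    [2.0, 0.8], [2.1, 0.9], [1.9, 0.7], [2.2, 0.85],
    [3.0, 1.5], [3.2, 1.4], [2.9, 1.6],
    [4.0, 2.0], [4.1, 2.2],
]
y = ["0D"] * 6 + ["1D"] * 4 + ["2D"] * 3 + ["3D"] * 2

X_bal, y_bal = smote(X, y, k=3, seed=0)
# The interaction term is computed after oversampling so that the product
# relationship also holds exactly for the synthetic samples.
X_feat = add_interaction(X_bal, 0, 1)

print(Counter(y_bal))
```

In this sketch the interaction column is built after oversampling because linear interpolation of a precomputed product would not, in general, equal the product of the interpolated base features. Whether the original workflow orders these steps the same way is not stated in the abstract. In a cross-validated setting, oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the validation data.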
Journal: Machine Learning: Science and Technology