Enhancing Classification Accuracy through Cluster-Based Ensemble Learning and Adaptive Weighting

Mustafa Radif¹, Zainab Fahad alnaseri¹, Salam saad alkafagi², Ali Hakem Al-saeedi¹, Riyadh Rahef Nuiaa Alogaili³,*, Mazin Abed Mohammed⁴,⁵

¹College of Computer Science and Information Technology, University of Al-Qadisiyah, Diwaniyah, Iraq

²Babylon Education Directorate, Ministry of Education, Babil, Iraq

³College of Computer Science and Information Technology, Wasit University, Al-Kut, Wasit, Iraq

⁴Department of Artificial Intelligence, College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq

⁵College of Science, Al-Farabi University, Baghdad, Iraq

Emails: mustafa.radif@qu.edu.iq; zainab.alnaseri@qu.edu.iq; salam.s.alkafagi@gmail.com; riyadh@uowasit.edu.iq; mazinalshujeary@uoanbar.edu.iq

Abstract

As digital devices process ever-increasing volumes of complex data, ensuring accurate and efficient machine learning performance has become a significant challenge. Traditional ensemble learning methods often address this challenge through data sampling or partitioning; however, such approaches can introduce biases and fail to fully capture the underlying structure of the data. To address these limitations, this paper proposes a novel classification framework that integrates clustering with adaptive weighting strategies. The process begins by dividing the training data into clusters, each representing a specific subset of the overall data distribution. A separate machine learning model is then trained on each cluster, allowing each model to specialize in a different region of the data. When a test instance is classified, its relationship to each cluster is evaluated using two measures: the correlation coefficient, which assesses feature similarity, and the Mahalanobis distance, which measures the statistical proximity to the cluster center. These values are then used to generate optimized weights that determine the influence of each model on the final ensemble prediction. By aligning model contributions with the structural similarities between the test and training data, the proposed approach enhances both the reliability and precision of classification. Experimental results demonstrate that this cluster-aware ensemble consistently outperforms both baseline and advanced classifiers on benchmark datasets.

Keywords: Ensemble learning; Correlation-based model; Similarity; Data mining; Machine learning
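
The weighting step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names `cluster_weights` and `ensemble_predict`, the `ridge` regularization term, and in particular the rule that combines the correlation coefficient with the Mahalanobis distance into a single score are assumptions made for illustration, since the abstract does not specify how the weights are optimized. The clusters are assumed to be given already, and the per-cluster models are represented by fixed class-probability vectors rather than trained classifiers.

```python
import numpy as np


def cluster_weights(x, clusters, ridge=1e-6):
    """Return one normalized weight per cluster for the test instance x.

    ASSUMPTION: the score 0.5 * (1 + r) / (1 + d) is a hypothetical way to
    combine correlation r and Mahalanobis distance d; it is not taken
    from the paper.
    """
    scores = []
    for pts in clusters:
        mu = pts.mean(axis=0)  # cluster center
        # A small ridge keeps the covariance invertible for small clusters.
        cov = np.cov(pts, rowvar=False) + ridge * np.eye(pts.shape[1])
        diff = x - mu
        d = np.sqrt(diff @ np.linalg.solve(cov, diff))  # Mahalanobis distance
        r = np.corrcoef(x, mu)[0, 1]  # Pearson correlation of feature vectors
        r = 0.0 if np.isnan(r) else r  # constant vectors give an undefined r
        scores.append(0.5 * (1.0 + r) / (1.0 + d))
    scores = np.asarray(scores)
    total = scores.sum()
    if total == 0.0:
        # Fall back to uniform weights when no cluster resembles x.
        return np.full(len(scores), 1.0 / len(scores))
    return scores / total


def ensemble_predict(probas, weights):
    """Weighted average of per-model class probabilities; returns a class index."""
    combined = np.average(np.asarray(probas), axis=0, weights=weights)
    return int(np.argmax(combined))


# Two synthetic clusters with differently shaped centers (illustrative data).
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.5, size=(50, 3))
cluster_b = rng.normal(loc=[6.0, 4.0, 2.0], scale=0.5, size=(50, 3))

x_test = np.array([1.1, 2.0, 2.9])  # lies close to cluster_a
weights = cluster_weights(x_test, [cluster_a, cluster_b])

# Stand-ins for the outputs of the two cluster-specific models on x_test.
probas = [[0.9, 0.1], [0.2, 0.8]]
predicted = ensemble_predict(probas, weights)
print(weights, predicted)
```

In this example the test instance is both close to and positively correlated with the center of the first cluster, so the model associated with that cluster dominates the combined prediction. Any other monotone combination of the two measures could be substituted for the assumed scoring rule without changing the overall structure.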