Enhancing Classification Accuracy through Cluster-Based Ensemble Learning and Adaptive Weighting
Mustafa Radif1, Zainab Fahad Alnaseri1, Salam Saad Alkafagi2, Ali Hakem Al-Saeedi1,
Riyadh Rahef Nuiaa Alogaili3,*, Mazin Abed Mohammed4,5
1College of Computer Science and Information Technology, University of Al-Qadisiyah, Diwaniyah, Iraq
2Babylon Education Directorate, Ministry of Education, Babil, Iraq
3College of Computer Science and Information Technology, Wasit University, Al-Kut, Wasit, Iraq
4Department of Artificial Intelligence, College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq
5College of Science, Al-Farabi University, Baghdad, Iraq
Emails: mustafa.radif@qu.edu.iq; zainab.alnaseri@qu.edu.iq; salam.s.alkafagi@gmail.com; riyadh@uowasit.edu.iq; mazinalshujeary@uoanbar.edu.iq
Abstract
As digital devices process ever-increasing volumes of complex data, maintaining accurate and efficient machine learning performance has become a significant challenge. Traditional ensemble learning methods often address this through data sampling or partitioning; however, such approaches can introduce bias and fail to capture the underlying structure of the data. To address these limitations, this paper proposes a novel classification framework that integrates clustering with adaptive weighting. The training data are first divided into clusters, each representing a distinct region of the overall data distribution. A separate machine learning model is then trained on each cluster, so that each model specializes in a different region of the data. For a test instance, its relationship to each cluster is evaluated using two measures: the correlation coefficient, which assesses feature similarity, and the Mahalanobis distance, which measures statistical proximity to the cluster center. These values are then used to generate optimized weights that determine how much each model contributes to the final ensemble prediction. By aligning model contributions with the structural similarity between the test instance and the training data, the proposed approach improves both the reliability and the precision of classification. Experimental results show that this cluster-aware ensemble consistently outperforms both baseline and advanced classifiers on benchmark datasets.
Keywords: Ensemble learning; Correlation-based model; Similarity; Data mining; Machine learning
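
The weighting step outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the rule that combines the two measures (clipped correlation plus an inverse-distance term), the covariance regularization constant, the synthetic data, and the stub per-cluster models are all assumptions made only for illustration, since the abstract does not specify them.

```python
import numpy as np

def cluster_stats(X):
    """Return the centroid and a regularized inverse covariance of one cluster."""
    mu = X.mean(axis=0)
    # Small ridge term (assumed value) keeps the covariance invertible.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance from x to the cluster center mu."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def correlation(x, mu):
    """Pearson correlation between the test features and the cluster centroid."""
    return float(np.corrcoef(x, mu)[0, 1])

def adaptive_weights(x, stats):
    """Turn per-cluster similarity into normalized model weights.

    Assumed combination rule: a higher correlation and a smaller
    Mahalanobis distance both increase a cluster's score.
    """
    scores = np.array([
        max(correlation(x, mu), 0.0) + 1.0 / (1.0 + mahalanobis(x, mu, cov_inv))
        for mu, cov_inv in stats
    ])
    return scores / scores.sum()

def ensemble_predict(x, stats, models):
    """Weighted soft vote over the per-cluster models."""
    w = adaptive_weights(x, stats)
    probs = np.array([m(x) for m in models])  # shape: (n_clusters, n_classes)
    return int(np.argmax(w @ probs)), w

# Synthetic clusters standing in for the output of a clustering step.
rng = np.random.default_rng(0)
cluster_a = rng.normal([0.0, 1.0, 2.0], 0.3, size=(50, 3))
cluster_b = rng.normal([5.0, 3.0, 1.0], 0.3, size=(50, 3))
stats = [cluster_stats(cluster_a), cluster_stats(cluster_b)]

# Hypothetical stand-ins for models trained on each cluster; each returns
# class probabilities for a two-class problem.
models = [
    lambda x: np.array([0.9, 0.1]),
    lambda x: np.array([0.2, 0.8]),
]

x_test = np.array([0.1, 1.0, 2.1])  # lies close to cluster_a
label, w = ensemble_predict(x_test, stats, models)
print("weights:", w, "predicted class:", label)
```

Because the test instance lies close to the first cluster, the model trained on that cluster receives most of the weight and dominates the vote. Any monotone combination of the two measures could replace the assumed scoring rule.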