American Scientific Publishing Group

Fusion: Practice and Applications

ISSN
Online: 2692-4048 Print: 2770-0070
Frequency

Continuous publication

Publication Model

Open access · Articles freely available online · APC applies after acceptance

Full Length Article

Volume 19, Issue 2, pp. 367-378 • 2025

ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality

Mashail Althabiti 1*, Manal Abdullah 1, Omaima Almatrafi 1
1 Faculty of Computing and Information Technology, King Abdulaziz University, 21589 Jeddah, Saudi Arabia
* Corresponding Author.
Received: January 06, 2025 Revised: February 09, 2025 Accepted: March 08, 2025

Abstract

Multi-label data stream classification plays a crucial role in applications such as recommendation systems, real-time monitoring systems, smart cities, social media analysis, and healthcare. Such classifiers must handle data that is generated continuously, at a high rate, and in potentially unbounded volume. Besides carrying multiple labels per instance, data streams may evolve over time due to concept drift and may be biased toward particular classes due to class imbalance. This research introduces ML-kNN-H, a multi-label classification model based on Hoeffding's inequality. The proposed model aims to process multi-label data streams while handling concept drift and class imbalance. ML-kNN-H removes instances that introduce errors using a dynamic threshold computed from Hoeffding's inequality rather than a fixed value, which improves the model's efficiency and its applicability to different types of data streams. Several experiments were conducted to assess the model's performance in the presence of concept drift (both abrupt and gradual) and class imbalance. In particular, ML-kNN-H was evaluated against six kNN multi-label classifiers on ten synthetic and real-world datasets. The results indicate that ML-kNN-H outperformed the other classifiers on the benchmark datasets in terms of Subset Accuracy, Accuracy, Hamming Score, and F-score, but not in running time. Statistical analysis was also used to assess the significance of the differences between ML-kNN-H and the state-of-the-art classifiers.
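To make the idea of a dynamic threshold concrete, the following is a minimal sketch of the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), together with one way such a bound could drive instance removal. The abstract does not state the exact rule used by ML-kNN-H, so the function `should_remove`, its parameters, and the comparison against a window-wide error rate are assumptions made purely for illustration and should not be read as the authors' method.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound for the mean of n bounded observations.

    With probability at least 1 - delta, the observed mean of n independent
    values confined to an interval of width value_range exceeds its
    expectation by no more than the returned epsilon.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_remove(error_count: int, times_used: int,
                  window_error_rate: float, delta: float = 0.05) -> bool:
    """Hypothetical pruning rule (an assumption, not taken from the paper).

    Flags a stored instance when its own error rate exceeds the window-wide
    error rate by more than the Hoeffding bound. The threshold is dynamic:
    it shrinks as the instance is used more often.
    """
    if times_used == 0:
        return False  # no evidence yet, keep the instance
    instance_error_rate = error_count / times_used
    # Error rates lie in [0, 1], so the range of the variable is 1.0.
    epsilon = hoeffding_bound(1.0, delta, times_used)
    return instance_error_rate - window_error_rate > epsilon


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(hoeffding_bound(1.0, 0.05, n), 4))
    print(should_remove(error_count=9, times_used=10, window_error_rate=0.2))
```

Compared with a fixed cutoff, a bound of this form tolerates noisy error rates on rarely used instances and becomes stricter as evidence accumulates, which is one plausible reading of why a dynamic value suits streams with different characteristics.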

Keywords

K Nearest Neighbor; Multi-label Classification; Data Stream; Concept Drift; Hoeffding’s Inequality

Cite This Article

Althabiti, Mashail, Abdullah, Manal, Almatrafi, Omaima. "ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality." Fusion: Practice and Applications, vol. 19, no. 2, 2025, pp. 367-378. DOI: https://doi.org/10.54216/FPA.190226
Althabiti, M., Abdullah, M., Almatrafi, O. (2025). ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality. Fusion: Practice and Applications, 19(2), 367-378. DOI: https://doi.org/10.54216/FPA.190226
Althabiti, Mashail, Abdullah, Manal, Almatrafi, Omaima. "ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality." Fusion: Practice and Applications 19, no. 2 (2025): 367-378. DOI: https://doi.org/10.54216/FPA.190226
Althabiti, M., Abdullah, M., Almatrafi, O. (2025) 'ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality', Fusion: Practice and Applications, 19(2), pp. 367-378. DOI: https://doi.org/10.54216/FPA.190226
Althabiti M, Abdullah M, Almatrafi O. ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality. Fusion: Practice and Applications. 2025;19(2):367-378. DOI: https://doi.org/10.54216/FPA.190226
M. Althabiti, M. Abdullah, O. Almatrafi, "ML-kNN-H: A Multi-Label Classification Model based on Hoeffding’s Inequality," Fusion: Practice and Applications, vol. 19, no. 2, pp. 367-378, 2025. DOI: https://doi.org/10.54216/FPA.190226