Volume 19, Issue 2, PP: 367-378, 2025
Mashail Althabiti 1*, Manal Abdullah 2, Omaima Almatrafi 3
Doi: https://doi.org/10.54216/FPA.190226
Multi-label data stream classification plays a crucial role in various applications, including recommendation systems, real-time monitoring systems, smart cities, social media analysis, and healthcare. Its ability to classify constantly generated, potentially unbounded data at a high rate is of utmost importance. Besides accommodating multiple labels, data streams may evolve due to concept drift and become biased toward particular classes due to class imbalance. This research introduces a multi-label classification model based on the Hoeffding inequality (ML-kNN-H). The proposed model aims to process multi-label data streams while handling concept drift and class imbalance. ML-kNN-H removes instances that introduce errors based on a dynamic value computed from the Hoeffding inequality instead of a fixed value, thereby enhancing the model's efficiency and its applicability to different types of data streams. Several experiments have been conducted to assess the model's performance in the presence of concept drift (abrupt and gradual) and class imbalance. Specifically, it has been evaluated against six kNN multi-label classifiers on ten synthetic and real-world datasets. The results indicate that ML-kNN-H outperformed the other classifiers on the benchmark datasets in terms of Subset Accuracy, Accuracy, Hamming Score, and F-score, though not in running time. Statistical analysis has also been used to measure the significance of ML-kNN-H's improvements over the state-of-the-art classifiers.
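The abstract states that ML-kNN-H replaces a fixed removal threshold with a dynamic value computed from the Hoeffding inequality. A minimal sketch of how such a dynamic threshold could be computed and applied follows; the function names and the instance-removal decision rule are illustrative assumptions, not the paper's exact algorithm:

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability at least 1 - delta, the true mean
    of a random variable with range `value_range` lies within epsilon of
    the sample mean after n independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_remove(instance_error: float, window_mean_error: float,
                  n_observations: int, delta: float = 0.05) -> bool:
    """Hypothetical decision rule: drop a stored instance once its observed
    error rate exceeds the window's mean error by more than the Hoeffding
    bound, i.e. the gap is statistically significant at confidence 1 - delta."""
    eps = hoeffding_bound(1.0, delta, n_observations)  # error rates lie in [0, 1]
    return instance_error - window_mean_error > eps
```

Because the bound shrinks as `n_observations` grows, the threshold adapts to the stream: early on, only grossly erroneous instances are removed, while with more evidence smaller deviations suffice.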
K-Nearest Neighbor, Multi-label Classification, Data Stream, Concept Drift, Hoeffding's Inequality
[1] F. Herrera, F. Charte, A. J. Rivera, and M. J. del Jesus, Multilabel Classification: Problem Analysis, Metrics and Techniques, 1st ed. Springer Publishing Company, Incorporated, 2016.
[2] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Comput. Surv., vol. 46, no. 4, 2014, doi: 10.1145/2523813.
[3] M. Roseberry and A. Cano, “Multi-label kNN classifier with self-adjusting memory for drifting data streams,” in Proc. Mach. Learn. Res., 2018, pp. 23–37.
[4] F. Charte, A. J. Rivera, M. J. del Jesus, and F. Herrera, “Addressing imbalance in multilabel classification: Measures and random resampling algorithms,” Neurocomputing, vol. 163, pp. 3–16, Sep. 2015, doi: 10.1016/j.neucom.2014.08.091.
[5] S. Kumar, N. Kumar, A. Dev, and S. Naorem, “Movie genre classification using binary relevance, label powerset, and machine learning classifiers,” Multimed. Tools Appl., vol. 82, no. 1, pp. 945–968, 2023, doi: 10.1007/s11042-022-13211-5.
[6] E. Hallaji, R. Razavi-Far, and M. Saif, “Expanding analytical capabilities in intrusion detection through ensemble-based multi-label classification,” Comput. Secur., vol. 139, Apr. 2024, doi: 10.1016/j.cose.2024.103730.
[7] B. K. Mishra, D. Thakker, S. Mazumdar, D. Neagu, M. Gheorghe, and S. Simpson, “A novel application of deep learning with image cropping: A smart city use case for flood monitoring,” J. Reliab. Intell. Environ, vol. 6, no. 1, pp. 51–61, Mar. 2020, doi: 10.1007/s40860-020-00099-x.
[8] G. Tsoumakas and I. Katakis, “Multi-label classification: An overview,” Int. J. Data Warehous. Min., vol. 3, no. 3, pp. 1–13, 2007, doi: 10.4018/jdwm.2007070101.
[9] X. Zheng, “A survey on multi-label data stream classification,” IEEE Access, vol. 8, pp. 1249–1275, 2020, doi: 10.1109/ACCESS.2019.2962059.
[10] X. Zheng and P. Li, “An efficient framework for multi-label learning in non-stationary data stream,” in Proc. 12th IEEE Int. Conf. Big Knowl., ICBK 2021, IEEE, 2021, pp. 149–156, doi: 10.1109/ICKG52313.2021.00029.
[11] X. Wang, P. Kuntz, F. Meyer, and V. Lemaire, “Multi-label kNN classifier with online dual memory on data stream,” in IEEE Int. Conf. Data Min. Workshops, ICDMW, IEEE, 2021, pp. 405–413, doi: 10.1109/ICDMW53433.2021.00056.
[12] M. Roseberry, B. Krawczyk, and A. Cano, “Multi-label punitive kNN with self-adjusting memory for drifting data streams,” ACM Trans. Knowl. Discov. Data, vol. 13, no. 6, Oct. 2019, doi: 10.1145/3363573.
[13] M. Roseberry, S. Džeroski, A. Bifet, and A. Cano, “Aging and rejuvenating strategies for fading windows in multi-label classification on data streams,” in Proc. ACM Symp. Appl. Comput., ACM, Mar. 2023, pp. 390–397, doi: 10.1145/3555776.3577625.
[14] M. Roseberry, B. Krawczyk, Y. Djenouri, and A. Cano, “Self-adjusting k-nearest neighbors for continual learning from multi-label drifting data streams,” Neurocomputing, vol. 442, pp. 10–25, Jun. 2021, doi: 10.1016/j.neucom.2021.02.032.
[15] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “MOA: Massive Online Analysis,” J. Mach. Learn. Res., vol. 11, pp. 1601–1604, 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1859890.1859903.
[16] G. Alberghini, S. Barbon Junior, and A. Cano, “Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams,” Neurocomputing, vol. 481, pp. 228–248, Apr. 2022, doi: 10.1016/j.neucom.2022.01.075.
[17] A. Bifet and R. Gavaldà, “Learning from time-changing data with adaptive windowing,” in Proc. SIAM Int. Conf. Data Min., 2007, pp. 443–448, doi: 10.1137/1.9781611972771.42.
[18] M. Althabiti, M. Abdullah, and O. Almatrafi, “Multi-label classification for drift detection in IoT data streams,” Commun. Math. Appl., vol. 15, 2024.
[19] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” in The Collected Works of Wassily Hoeffding, N. I. Fisher and P. K. Sen, Eds. Springer, New York, 1994, pp. 409–426, doi: 10.1007/978-1-4612-0865-5_26.
[20] I. Frías-Blanco, J. Del Campo-Ávila, G. Ramos-Jiménez, R. Morales-Bueno, A. Ortiz-Díaz, and Y. Caballero-Mota, “Online and non-parametric drift detection methods based on Hoeffding’s bounds,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 3, pp. 810–823, Mar. 2015, doi: 10.1109/TKDE.2014.2345382.
[21] A. Pesaranghader and H. Viktor, “Fast Hoeffding drift detection method for evolving data streams,” in Mach. Learn. Knowl. Discov. Databases, Springer, Cham, 2016, pp. 96–111, doi: 10.1007/978-3-319-46227-1_7.
[22] X. Wang, P. Kuntz, F. Meyer, and V. Lemaire, “Multi-label kNN classifier with online dual memory on data stream,” in IEEE Int. Conf. Data Min. Workshops, ICDMW, IEEE, 2021, pp. 405–413, doi: 10.1109/ICDMW53433.2021.00056.
[23] “Multi-label classification dataset repository,” Knowl. Discov. Intell. Syst. KDIS, Univ. Córdoba. Accessed: Jul. 14, 2024. [Online]. Available: https://www.uco.es/kdis/mllresources/.
[24] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, 2006.