Volume 16, Issue 1, pp. 41-48, 2025 | Full Length Article
Dipti Theng 1*, K. K. Bhoyar 2, Prashant Pawade 3
DOI: https://doi.org/10.54216/JISIoT.160104
Selecting the most relevant feature subset for a task is essential for achieving high accuracy and reducing model training time. Ensemble learning has shown superior results in classification; hence, we propose an ensemble method for feature selection and present a stability analysis of the selected feature set. The research question is whether ensemble methods are effective at selecting informative features in a dataset and whether the selected features are more stable than those chosen by other feature selection methods. This paper presents a tree-based ensemble learning approach for feature selection. Our approach includes function perturbation with a voting ensemble, an ensemble with a fixed number of features, and an ensemble with a contiguous number of features. Ensemble learning is found to be superior to traditional feature selection algorithms. The ensemble learning algorithms are implemented on two high-dimensional microarray biomedical datasets. In our experimental study, the voting ensemble outperforms the other ensemble techniques, reducing the feature subset size while achieving higher accuracy. A stability analysis of all the algorithms shows that every ensemble technique has higher stability than the traditional feature selection methods. Thus, ensemble learning proves to be a superior technique for feature selection. Our results demonstrate that the proposed method is effective in identifying relevant and stable features and can improve the performance of machine learning models.
Feature selection, Ensemble technique, Stability, Microarray dataset, Biomarker selection
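The two ideas the abstract combines, voting over the feature subsets returned by several base selectors and measuring how stable those subsets are, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the voting rule or the stability metric, so the majority threshold `min_votes`, the use of average pairwise Jaccard similarity as the stability score, the function names, and the gene labels below are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def vote_features(subsets, min_votes):
    """Keep features picked by at least `min_votes` base selectors (assumed rule)."""
    votes = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def jaccard(a, b):
    """Jaccard similarity of two feature subsets; two empty subsets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(subsets):
    """Average pairwise Jaccard similarity across subsets (assumed stability metric)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets, e.g. from three tree-based selectors or three resampled runs.
selected = [
    ["g1", "g2", "g3", "g7"],
    ["g1", "g2", "g4", "g7"],
    ["g1", "g3", "g7", "g9"],
]

consensus = vote_features(selected, min_votes=2)
print(consensus)                      # ['g1', 'g2', 'g3', 'g7']
print(round(stability(selected), 3))  # 0.511
```

A score near 1 means the selectors agree on almost the same features across runs, while a score near 0 means the selected subsets barely overlap. Raising `min_votes` shrinks the consensus subset, which mirrors the trade-off between subset size and agreement that a voting ensemble has to balance.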