Full Length Article
Journal of Intelligent Systems and Internet of Things
Volume 11, Issue 1, pp. 44-54, 2024

Title

Forward feature selection: empirical analysis

Firuz Kamalov 1,*, Said Elnaffar 2, Aswani Cherukuri 3, Annapurna Jonnalagadda 4

1  Department of Electrical Engineering, Canadian University Dubai, Dubai, UAE
    (firuz@cud.ac.ae)

2  Department of Computer Science, Canadian University Dubai, Dubai, UAE
    (said.elnaffar@cud.ac.ae)

3  School of Information Systems, Vellore Institute of Technology, India
    (cherukuri@acm.org)

4  School of Computer Science and Engineering, Vellore Institute of Technology, India
    (jannapurna@gmail.com)


DOI: https://doi.org/10.54216/JISIoT.110105

Received: March 19, 2023 Revised: July 26, 2023 Accepted: November 27, 2023

Abstract :

Feature selection is an important preprocessing step in many data science and machine learning applications. Although several sophisticated feature selection algorithms exist, their benefits are sometimes overshadowed by their complexity and slow execution. In many cases, therefore, a simpler algorithm may be better suited. In this paper, we demonstrate that a rudimentary forward selection algorithm can achieve optimal performance at a low time complexity. Our study is based on an extensive empirical evaluation of the forward feature selection algorithm in the context of linear regression. Concretely, we compare the forward selection algorithm against the gold-standard exhaustive search algorithm on several datasets. The results show that the forward selection algorithm achieves high performance with relatively fast execution. Given its simplicity, accuracy, and speed, we recommend the forward feature selection algorithm as a primary feature selection method for most regression applications. Our results are particularly pertinent to big data and real-time analysis.
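
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the selection criterion (training residual sum of squares of an ordinary least-squares fit), the helper names `fit_rss`, `forward_selection`, and `exhaustive_search`, and the synthetic data are all assumptions made for illustration. Forward selection greedily adds the single feature that most reduces the error, while exhaustive search scores every subset of the requested size.

```python
import itertools
import numpy as np

def fit_rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, k):
    """Greedily add, one at a time, the feature that lowers the RSS the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = min(remaining, key=lambda j: fit_rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

def exhaustive_search(X, y, k):
    """Score every size-k subset and return the one with the lowest RSS."""
    subsets = itertools.combinations(range(X.shape[1]), k)
    return list(min(subsets, key=lambda c: fit_rss(X, y, list(c))))

# Synthetic data (an assumption for illustration): only features 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

fs = forward_selection(X, y, 2)
ex = exhaustive_search(X, y, 2)
print("forward:", sorted(fs), "exhaustive:", sorted(ex))
```

For p features and a target subset size k, this forward procedure fits on the order of k·p models, whereas the exhaustive search fits one model per size-k subset, which grows combinatorially with p. In practice a held-out or cross-validated score would usually replace the training RSS used here.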

Keywords :

data transformation; data mining; standardization

