Journal of Cybersecurity and Information Management

Journal DOI

https://doi.org/10.54216/JCIM

ISSN (Online): 2690-6775 | ISSN (Print): 2769-7851

Volume 16, Issue 1, PP: 53-67, 2025 | Full Length Article

Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS

Neha Sharma 1*, Abhishek Kajal 2

  • 1 Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar, 125001, Haryana, India - (nehasharma31066@gmail.com)
  • 2 Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar, 125001, Haryana, India - (drabhishekkajal@gmail.com)
  • Doi: https://doi.org/10.54216/JCIM.160105

    Received: November 14, 2024 Revised: January 16, 2025 Accepted: February 14, 2025
    Abstract

    Many current intrusion detection methods deployed for critical information infrastructure are evaluated on IDS datasets, yet they do not provide the desired protection against emerging cyber threats. Most machine learning and deep learning-based intrusion detection methods perform poorly on networks because the IDS datasets they rely on are highly imbalanced or noisy. This paper therefore implements a comprehensive framework that combines multiple machine learning (ML) and deep learning (DL) models with advanced feature engineering approaches. We examine how feature engineering and dimensionality reduction techniques, namely PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), and Autoencoders, affect training and test performance and execution time on the CICIDS2017 dataset, with the aim of reducing time complexity and improving intrusion detection. The framework also addresses class imbalance with SMOTE (Synthetic Minority Oversampling Technique), which generates synthetic samples for classes that have very few samples so that the classes are balanced and model performance improves. The comparative analysis uses metrics such as accuracy, training time, and memory usage for ML models such as Gradient Boosting, Logistic Regression (LR), and XGBoost, and for DL models. The DL model with LDA feature engineering achieved the highest test accuracy, 95.99%, and Gradient Boosting also performed strongly with a test accuracy of 90.8%. The DL model had higher memory usage, whereas the LR and XGBoost models were computationally efficient. Overall, LDA performed better with both ML and DL models than the other feature engineering techniques in improving intrusion detection efficiency.
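
The SMOTE step described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, which this page does not show: the function name `smote`, its parameters, and the toy data are all assumptions made for illustration. It shows only the core idea of interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea).
    Hypothetical helper for illustration; not taken from the paper."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, made up for illustration (not CICIDS2017 records).
majority = [[float(x), float(y)] for x in range(10) for y in range(5)]  # 50 rows
minority = [[20.0, 20.0], [21.0, 20.5], [20.5, 22.0], [22.0, 21.0]]     # 4 rows

# Oversample the minority class until both classes have the same size.
new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because each synthetic row lies on a line segment between two real minority rows, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the test set.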

    Keywords:

    PCA, LDA, t-SNE, Autoencoder, ML and DL, IDS

    Cite This Article As:
    Sharma, Neha, and Kajal, Abhishek. Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS. Journal of Cybersecurity and Information Management, vol. 16, no. 1, 2025, pp. 53-67. DOI: https://doi.org/10.54216/JCIM.160105
    Sharma, N., Kajal, A. (2025). Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS. Journal of Cybersecurity and Information Management, 16(1), 53-67. DOI: https://doi.org/10.54216/JCIM.160105
    Sharma, Neha, and Kajal, Abhishek. Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS. Journal of Cybersecurity and Information Management 16, no. 1 (2025): 53-67. DOI: https://doi.org/10.54216/JCIM.160105
    Sharma, N., Kajal, A. (2025). Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS. Journal of Cybersecurity and Information Management, 16(1), 53-67. DOI: https://doi.org/10.54216/JCIM.160105
    Sharma N., Kajal A. [2025]. Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS. Journal of Cybersecurity and Information Management. 16(1): 53-67. DOI: https://doi.org/10.54216/JCIM.160105
    Sharma, N., Kajal, A. "Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS," Journal of Cybersecurity and Information Management, vol. 16, no. 1, pp. 53-67, 2025. DOI: https://doi.org/10.54216/JCIM.160105