Implementing Comparative Analysis on Feature Engineering Techniques and Multi-Model Evaluation Framework for IDS

Journal of Cybersecurity and Information Management (JCIM), ISSN 2690-6775, 2769-7851, DOI 10.54216/JCIM, https://www.americaspg.com/journals/show/3590

Neha, Abhishek Kajal
Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar, 125001, Haryana, India

In recent years, most intrusion detection methods for critical information infrastructure have been tested on IDS datasets, yet they do not provide the desired protection against emerging cyber threats. Most machine learning and deep learning-based intrusion detection methods are also inefficient in network settings because the IDS datasets they are trained on are highly imbalanced or noisy. This paper therefore implements a comprehensive framework that combines multiple machine learning and deep learning models with advanced feature engineering approaches. Using the CICIDS2017 dataset, we experimentally examine how feature engineering and dimensionality reduction techniques affect model performance and the execution time of training and testing. The techniques studied are PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), and autoencoders, and the aim is to reduce time complexity and improve intrusion detection. The framework also addresses class imbalance with SMOTE (Synthetic Minority Oversampling Technique). SMOTE generates synthetic samples for classes that have very few samples, which balances the class distribution and improves model performance.
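SMOTE creates each synthetic sample in three steps. It picks a minority-class sample, chooses one of that sample's k nearest minority-class neighbours, and places a new point at a random position on the segment between the two. The minimal NumPy sketch below illustrates this idea under stated assumptions. It is not the authors' implementation, and the abstract does not say which library they used; imbalanced-learn's `SMOTE` class is a common choice in practice. The helper names `smote` and `balance_with_smote`, the default `k=5`, and the policy of oversampling every class up to the majority-class count are all assumptions made for illustration.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class samples
    and randomly chosen members of their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    k = min(k, n - 1)
    # Brute-force pairwise squared Euclidean distances within the class (O(n^2) memory).
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # For each synthetic point: a random base sample and one of its k neighbours.
    base = rng.integers(0, n, size=n_synthetic)
    nbr = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class (assumed policy)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, count in zip(classes, counts):
        if count < target:
            X_new = smote(X[y == c], target - count, k=k, seed=seed)
            X_parts.append(X_new)
            y_parts.append(np.full(len(X_new), c))
    # Original rows come first, synthetic rows are appended after them.
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy usage: 100 majority samples vs. 10 minority samples in 4 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(5, 1, (10, 4))])
y = np.array([0] * 100 + [1] * 10)
X_bal, y_bal = balance_with_smote(X, y)
print(X_bal.shape, np.bincount(y_bal))  # (200, 4) [100 100]
```

Apply oversampling of this kind to the training split only. If synthetic points derived from test samples enter evaluation, the reported accuracy becomes optimistic. The brute-force distance matrix is adequate for small minority classes, but a large class would need a tree-based or approximate neighbour search.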
The comparative analysis evaluates accuracy, training time, and memory usage for machine learning (ML) models, namely Gradient Boosting, Logistic Regression (LR), and XGBoost, as well as for deep learning (DL) models. The DL model with the LDA feature engineering approach achieved the highest test accuracy, 95.99%. Gradient Boosting also performed strongly, with a test accuracy of 90.8%. The DL model had higher memory usage, whereas the LR and XGBoost models were computationally efficient. Overall, LDA outperformed the other feature engineering techniques with both the ML and DL models and improved intrusion detection efficiency.

JCIM, 2025, pp. 53-67. DOI: 10.54216/JCIM.160105. https://www.americaspg.com/articleinfo/2/show/3590
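The abstract does not include the authors' code. The minimal NumPy sketch below uses synthetic data to show why a supervised reducer such as LDA can beat PCA. It also shows one way to record accuracy, training time, and peak memory together. All of the following are assumptions, not the paper's setup:

- The helper names `fit_pca`, `fit_lda`, `nearest_centroid`, `predict`, and `evaluate` are hypothetical.
- Fisher LDA is solved by whitening with the within-class scatter.
- A nearest-centroid classifier stands in for the Gradient Boosting, Logistic Regression, XGBoost, and DL models.
- `time.perf_counter` and `tracemalloc` are one possible way to measure time and memory. `tracemalloc` only sees allocations reported to Python's tracing, so GPU memory used by a DL framework would need that framework's own tools.

```python
import time
import tracemalloc

import numpy as np


def fit_pca(X, n_components):
    """PCA via SVD of the centred data; returns (mean, projection matrix)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_lda(X, y, n_components=None, ridge=1e-6):
    """Fisher LDA: directions maximising between-class over within-class scatter.
    At most (n_classes - 1) directions carry discriminative information."""
    classes = np.unique(y)
    n_components = n_components or len(classes) - 1
    mean = X.mean(axis=0)
    d = X.shape[1]
    s_w, s_b = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)
        s_w += (X_c - m_c).T @ (X_c - m_c)
        delta = (m_c - mean)[:, None]
        s_b += len(X_c) * (delta @ delta.T)
    s_w += ridge * np.eye(d)  # keeps S_w invertible if features are constant/collinear
    # Whiten with S_w^{-1/2} so S_b v = lambda S_w v becomes a symmetric eigenproblem.
    w, v = np.linalg.eigh(s_w)
    s_w_inv_sqrt = v @ np.diag(1.0 / np.sqrt(w)) @ v.T
    evals, evecs = np.linalg.eigh(s_w_inv_sqrt @ s_b @ s_w_inv_sqrt)
    top = np.argsort(evals)[::-1][:n_components]
    return mean, s_w_inv_sqrt @ evecs[:, top]


def nearest_centroid(Z_train, y_train):
    """Hypothetical stand-in classifier: one mean vector per class."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, Z):
    classes, centroids = model
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def evaluate(fit_reducer, X_tr, y_tr, X_te, y_te):
    """Return (test accuracy, training seconds, peak traced bytes during training)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    mean, proj = fit_reducer(X_tr, y_tr)
    model = nearest_centroid((X_tr - mean) @ proj, y_tr)
    train_seconds = time.perf_counter() - t0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    y_pred = predict(model, (X_te - mean) @ proj)
    return float(np.mean(y_pred == y_te)), train_seconds, peak_bytes


# Synthetic data: feature 0 separates 3 classes but has low variance;
# the other 9 features are high-variance noise (this favours LDA over PCA).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 300)
X = rng.normal(0.0, 5.0, size=(len(y), 10))
X[:, 0] = 2.0 * y + rng.normal(0.0, 0.5, size=len(y))
idx = rng.permutation(len(y))
tr, te = idx[:600], idx[600:]

results = {
    "PCA": evaluate(lambda A, b: fit_pca(A, 2), X[tr], y[tr], X[te], y[te]),
    "LDA": evaluate(lambda A, b: fit_lda(A, b), X[tr], y[tr], X[te], y[te]),
}
for name, (acc, secs, peak) in results.items():
    print(f"{name}: accuracy={acc:.3f}, train_time={secs:.4f}s, peak_memory={peak} bytes")
```

PCA keeps the highest-variance directions regardless of the labels, so here its two components capture mostly noise. LDA uses the labels and recovers the low-variance but informative feature. The numbers this toy produces say nothing about CICIDS2017 or the results reported above.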