Machine Learning-Driven Cyber Threat Prediction and Prevention: A Multi-Dataset Design and Comparative Evaluation

Krishneel Sundar; Pritika Reddy; Kaylash C. Chaudhary

doi:https://doi.org/10.54216/JCIM.180106

Krishneel Sundar ¹*, Pritika Reddy ², Kaylash C. Chaudhary ³

1 Department of Computing Science and Information Systems, Fiji National University, Nasinu, Fiji (krishneelsundar143@gmail.com)

2 Department of Computing Science and Information Systems, Fiji National University, Nasinu, Fiji (pritikareddy26@gmail.com)

3 Department of Information and Mathematical Sciences, The University of the South Pacific, Laucala Campus, Suva, Fiji (kaylash.chaudhary@usp.ac.fj)

Received: January 22, 2026 Revised: February 18, 2026 Accepted: March 30, 2026

Abstract

As technology advances, the frequency and variety of intrusions and other security threats in network environments continue to grow. Intrusion detection systems (IDSs) play a vital role in securing networks against unauthorized access and attacks on computer systems; however, traditional IDSs rely on signature-based detection and are therefore limited in their ability to recognize new, complex threats. Machine-learning approaches have emerged as a promising alternative for identifying previously unknown attacks. This study proposes a computationally efficient, generalizable machine-learning framework for robust cyber-threat prediction. Three benchmark datasets (HIKARI-2021, CIC-IDS2017, and KDDCup99) were used for full-pipeline evaluation, including preprocessing, feature selection, class-imbalance handling, hyperparameter optimization, and strict model validation. Eight classifiers were assessed, spanning traditional classifiers and more modern ensemble methods. The results show that tree-based models, particularly Random Forest and XGBoost, achieved near-perfect performance across all datasets, with accuracy values up to 0.999 and F1-scores between 0.99 and 0.999. Additionally, SHAP-based explainability analysis was applied to reveal the features that drove predictions, supporting interpretability and transparency. Compared with prior studies, the proposed framework consistently delivers improved, more stable detection performance. The findings indicate that optimized ML models combined with balanced datasets and rigorous validation protocols can significantly enhance intrusion detection reliability. Furthermore, this approach provides a practical and scalable solution for strengthening cybersecurity defenses against evolving and emerging cyber threats.
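The abstract reports results as accuracy and F1-score. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed for a binary attack/benign task. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 = attack, 0 = benign), and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels where 1 = attack, 0 = benign.

    Hypothetical helper for illustration; the label encoding is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 95 benign flows and 5 attacks.
y_true = [0] * 95 + [1] * 5
# A trivial detector that labels every flow as benign.
y_pred = [0] * 100

print(binary_metrics(y_true, y_pred))
```

On this made-up sample the trivial detector is right on 95 of 100 flows, yet it finds no attacks, so its recall and F1 are zero. This is the usual reason F1 is reported alongside accuracy when attack traffic is a small minority of the data.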

Keywords:

Network intrusion, Machine Learning, Intrusion Detection System, Ensemble Methods, Generalization, SHAP Explainability

Cite This Article As:

Sundar, Krishneel, Reddy, Pritika, and Chaudhary, Kaylash C. Machine Learning-Driven Cyber Threat Prediction and Prevention: A Multi-Dataset Design and Comparative Evaluation. Journal of Cybersecurity and Information Management, 2026, pp. 40–55. DOI: https://doi.org/10.54216/JCIM.180106

