Hybrid Ensemble Learning for Flow-Level IoT Traffic Classification
Using ACI Dataset: Towards Scalable and Real-Time Threat Detection
El-Sayed M. El-Kenawy1,2, Sini Raj Pulari3,∗, Shriram K Vasudevan4
1School of ICT, Faculty of Engineering, Design and Information and Communications Technology (EDICT),
Bahrain Polytechnic, PO Box 33349, Isa Town, Bahrain
2Applied Science Research Center, Applied Science Private University, Amman, Jordan
3Dept. of CSE, Vignan’s Foundation for Science, Technology and Research, Guntur, Andhra Pradesh, India
4Intel India Pvt. Ltd., Bengaluru, India
Emails: skenawy@ieee.org; sinikishan@gmail.com; shriram.kris.vasudevan@intel.com
Abstract
Internet of Things (IoT) devices, now widespread across consumer, industrial, and critical infrastructure
domains, have increased the volume, diversity, and frequency of network traffic. The growing scale of IoT
networks makes it difficult to secure the diverse data flows within them, threatening system performance and
manageability. Traditional traffic analysis based on signature matching and rule-based detection is
ineffective against new traffic patterns and system behaviors. Given this growth, intelligent data-driven
classification systems that can process IoT traffic quickly and at large operational scales have become
essential. This paper presents a flow-level, data-driven machine learning framework for IoT traffic
classification that uses features extracted from the Army Cyber Institute (ACI) IoT dataset. The dataset
encompasses statistical, temporal, and protocol-specific attributes for benign
and malicious network flows. Our methodology begins with a rigorous preprocessing stage comprising data
cleaning, normalization, label encoding, and feature correlation analysis, so that the learning algorithms
are trained on a high-quality, balanced dataset.
Several classification models were trained, including Linear Discriminant Analysis (LDA), Quadratic
Discriminant Analysis (QDA), Naive Bayes, and SGD classifiers, along with other statistical learners. Our
proposed hybrid ensemble combines a deep neural network, a Random Forest model, and an XGBoost classifier
through weighted voting to overcome the limitations of single classifiers. The ensemble is designed to make
the system more resilient while lowering bias and improving its ability to capture varied IoT traffic
patterns. The models were assessed with a comprehensive set of metrics: accuracy, precision, recall,
F1-score, Hamming loss, Matthews correlation coefficient (MCC), Cohen’s Kappa, balanced accuracy, and log
loss. These metrics track model performance from both global and detailed perspectives, which matters when
classes are imbalanced and when patterns are similar between legitimate
and malicious network traffic. Experimental results show that the ensemble outperforms the individual
classifiers on every evaluated metric. In complex network environments, model fusion performs particularly
well on traffic patterns that are hard to classify. The ensemble generalizes well and suits real-time IoT
deployments because it can adapt continuously while maintaining high accuracy. The proposed framework
contributes to research on intelligent IoT traffic analysis and demonstrates how combining deep learning
with traditional machine learning methods strengthens ensemble systems. The result is a scalable,
transparent, quantitative solution that can be deployed in advanced network security systems and traffic
monitoring applications across smart cities, industrial settings, and critical infrastructure.
Keywords: IoT Traffic Classification; Ensemble Learning; Deep Learning; Flow-Based Analysis
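To make the weighted-voting idea in the abstract concrete, the following minimal sketch fuses the class probabilities of three models by a weighted average and scores the fused predictions with the MCC. It is an illustration under stated assumptions, not the authors' implementation: soft (probability-averaging) voting is assumed because the abstract does not specify the voting scheme, and the probability values, the model weights, and the function names weighted_soft_vote and binary_mcc are all hypothetical.

```python
from math import sqrt


def weighted_soft_vote(prob_sets, weights):
    """Fuse per-model class probabilities with a weighted average.

    prob_sets: one entry per model; each entry holds one probability
               vector per flow (sample).
    weights:   one non-negative weight per model (hypothetical values).
    Returns the predicted class index for each flow.
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted mean of each class probability across the models.
        fused = [
            sum(w * probs[i][c] for probs, w in zip(prob_sets, weights)) / total
            for c in range(n_classes)
        ]
        # Pick the class with the largest fused probability.
        predictions.append(max(range(n_classes), key=lambda c: fused[c]))
    return predictions


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up probabilities [P(benign), P(malicious)] for four flows; these are
# illustrative numbers only, not results from the ACI IoT dataset.
nn_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]]   # neural network
rf_probs = [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.3, 0.7]]   # Random Forest
xgb_probs = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]  # XGBoost
weights = [0.5, 0.25, 0.25]  # hypothetical model weights
y_true = [0, 0, 1, 1]        # 0 = benign, 1 = malicious

fused_preds = weighted_soft_vote([nn_probs, rf_probs, xgb_probs], weights)
print("fused predictions:", fused_preds)
print("MCC:", binary_mcc(y_true, fused_preds))
```

In this toy input the neural network alone misclassifies the second and fourth flows, while the weighted average lets the other two models correct it. In practice the weights would be tuned on a validation split rather than fixed by hand.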