Hybrid Ensemble Learning for Flow-Level IoT Traffic Classification Using ACI Dataset: Towards Scalable and Real-Time Threat Detection

El-Sayed M. El-Kenawy1,2, Sini Raj Pulari3,∗, Shriram K Vasudevan4

1School of ICT, Faculty of Engineering, Design and Information and Communications Technology (EDICT), Bahrain Polytechnic, PO Box 33349, Isa Town, Bahrain

2Applied Science Research Center, Applied Science Private University, Amman, Jordan

3Dept. of CSE, Vignan’s Foundation for Science, Technology and Research, Guntur, Andhra Pradesh, India

4Intel India Pvt. Ltd., Bengaluru, India

Emails: skenawy@ieee.org; sinikishan@gmail.com; shriram.kris.vasudevan@intel.com

Abstract

The proliferation of Internet of Things (IoT) devices across consumer, industrial, and critical-infrastructure domains has sharply increased both the volume and the frequency of heterogeneous network traffic. As IoT networks scale, securing the diverse data flows within them becomes harder, threatening system performance and manageability. Traditional traffic analysis based on signature matching and rule-based detection is ineffective against novel traffic patterns and system behaviors. Given the rapid growth of IoT networks, intelligent data-driven classification systems that can process IoT traffic quickly and at large operational scale have become essential. This work presents a detailed flow-level machine learning framework for IoT traffic classification that uses features extracted from the Army Cyber Institute (ACI) IoT dataset, which covers statistical, temporal, and protocol-specific attributes of benign and malicious network flows. Our methodology begins with a rigorous data preprocessing stage, comprising data cleaning, normalization, label encoding, and feature correlation analysis, so that the learning algorithms are trained on a high-quality, balanced dataset.
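The preprocessing stage described above (cleaning, normalization, label encoding, and correlation-based feature filtering) can be illustrated with a minimal, dependency-free sketch. It assumes each flow is a dict of numeric features plus a "label" field. The column names (flow_duration, fwd_pkts, fwd_bytes), the 0.95 correlation threshold, and the choice of min-max scaling are illustrative assumptions, not details of the ACI dataset or of the authors' pipeline; class balancing is not shown.

```python
import math
from statistics import mean, pstdev

# Hypothetical flow records; the column names are illustrative, not the ACI schema.
SAMPLE_ROWS = [
    {"flow_duration": 1.2, "fwd_pkts": 10, "fwd_bytes": 1000, "label": "Benign"},
    {"flow_duration": 0.1, "fwd_pkts": 200, "fwd_bytes": 20000, "label": "DoS"},
    {"flow_duration": 3.5, "fwd_pkts": 4, "fwd_bytes": 400, "label": "Benign"},
    {"flow_duration": float("nan"), "fwd_pkts": 7, "fwd_bytes": 700, "label": "Benign"},
    {"flow_duration": 0.1, "fwd_pkts": 200, "fwd_bytes": 20000, "label": "DoS"},
    {"flow_duration": 0.3, "fwd_pkts": 150, "fwd_bytes": 15000, "label": "DoS"},
]


def clean(rows, features):
    """Drop rows with missing or non-finite feature values, then exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        values = tuple(row.get(f) for f in features)
        if any(v is None or not math.isfinite(v) for v in values):
            continue
        key = values + (row["label"],)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def drop_correlated(rows, features, threshold=0.95):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for f in features:
        col = [row[f] for row in rows]
        if all(abs(pearson(col, [row[k] for row in rows])) <= threshold for k in kept):
            kept.append(f)
    return kept


def min_max_scale(rows, features):
    """Rescale each feature to [0, 1]; a constant feature maps to 0.0."""
    # In practice, fit lo/hi on the training split only and reuse them on test data.
    scaled = [dict(row) for row in rows]
    for f in features:
        lo = min(row[f] for row in rows)
        hi = max(row[f] for row in rows)
        for row in scaled:
            row[f] = (row[f] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


def encode_labels(labels):
    """Map class names to integer indices in sorted order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes


def preprocess(rows, features, corr_threshold=0.95):
    rows = clean(rows, features)
    # Pearson r is unchanged by min-max scaling, so the filter can run before it.
    features = drop_correlated(rows, features, corr_threshold)
    rows = min_max_scale(rows, features)
    X = [[row[f] for f in features] for row in rows]
    y, classes = encode_labels([row["label"] for row in rows])
    return X, y, features, classes


if __name__ == "__main__":
    X, y, kept, classes = preprocess(SAMPLE_ROWS, ["flow_duration", "fwd_pkts", "fwd_bytes"])
    print(kept, classes, y)
```

With these sample rows, the NaN record and the duplicate DoS record are removed, and fwd_bytes is dropped because it is an exact multiple of fwd_pkts (r = 1).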

We trained several classification models, including Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Naive Bayes, Stochastic Gradient Descent (SGD) classifiers, and other statistical learners. To overcome the limitations of single classifiers, our proposed hybrid ensemble combines a deep neural network, a Random Forest, and an XGBoost classifier through weighted voting. The ensemble is designed to make the system more robust, reduce bias, and better capture the variety of IoT traffic patterns. All models were assessed with a comprehensive set of evaluation metrics: accuracy, precision, recall, F1-score, Hamming loss, Matthews correlation coefficient (MCC), Cohen's kappa, balanced accuracy, and log loss. Together, these metrics characterize performance both globally and per class, which is important given imbalanced classes and the similarity between legitimate and malicious traffic patterns. Experimental results show that the ensemble outperforms every individual classifier on all performance metrics.
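The abstract does not specify whether the weighted vote is hard (over predicted labels) or soft (over predicted probabilities), nor how the weights are chosen. The sketch below assumes soft voting with fixed, hand-set weights. The model names (dnn, random_forest, xgboost), their probability outputs, and the weights are hypothetical stand-ins for the per-class probabilities that trained models would produce, for example through a `predict_proba` method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-class probabilities for three flows and two classes
# (0 = benign, 1 = malicious), standing in for outputs of trained models.
EXAMPLE_PROBS = {
    "dnn": [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    "random_forest": [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7]],
    "xgboost": [[0.7, 0.3], [0.6, 0.4], [0.6, 0.4]],
}
EXAMPLE_WEIGHTS = {"dnn": 0.5, "random_forest": 0.25, "xgboost": 0.25}  # hypothetical


def weighted_soft_vote(
    model_probs: Dict[str, List[Sequence[float]]],
    weights: Dict[str, float],
) -> Tuple[List[List[float]], List[int]]:
    """Fuse per-model probability vectors by a normalized weighted average.

    Returns the fused probabilities and the argmax class per sample
    (ties resolve to the lowest class index).
    """
    names = list(model_probs)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(model_probs[names[0]])
    n_classes = len(model_probs[names[0]][0])

    fused = []
    for i in range(n_samples):
        row = [0.0] * n_classes
        for name in names:
            # Normalized weight: fused rows sum to 1 whenever every input row does.
            w = weights[name] / total
            for k, prob in enumerate(model_probs[name][i]):
                row[k] += w * prob
        fused.append(row)
    preds = [max(range(n_classes), key=row.__getitem__) for row in fused]
    return fused, preds


if __name__ == "__main__":
    fused, preds = weighted_soft_vote(EXAMPLE_PROBS, EXAMPLE_WEIGHTS)
    print(preds)  # [0, 0, 1]
```

In this toy example the DNN alone would assign the second flow to class 1, but the weighted average (0.525 vs. 0.475) assigns class 0, showing how fusion lets the other models outvote a single model's error.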
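The evaluation metrics listed above follow standard definitions and can be computed directly from integer-encoded labels and predicted probabilities. The dependency-free sketch below assumes single-label classification with classes indexed 0 to n_classes - 1, and uses macro averaging for precision, recall, and F1, since the abstract does not state the averaging scheme. For single-label data, Hamming loss reduces to the misclassification rate, and the multiclass MCC is computed from confusion-matrix marginals (Gorodkin's R_K statistic). The example labels and probabilities are hypothetical.

```python
import math


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def evaluate(y_true, y_pred, proba, n_classes, eps=1e-15):
    """Compute the listed metrics for single-label multiclass predictions.

    y_true, y_pred: integer class indices in [0, n_classes).
    proba: per-sample probability vectors, used only for log loss.
    """
    n = len(y_true)
    cm = confusion_matrix(y_true, y_pred, n_classes)
    classes = range(n_classes)
    correct = sum(cm[k][k] for k in classes)
    t = [sum(cm[k]) for k in classes]                      # true count per class
    p = [sum(cm[r][k] for r in classes) for k in classes]  # predicted count per class

    precision = [cm[k][k] / p[k] if p[k] else 0.0 for k in classes]
    recall = [cm[k][k] / t[k] if t[k] else 0.0 for k in classes]
    f1 = [2 * a * b / (a + b) if a + b else 0.0 for a, b in zip(precision, recall)]
    present = [k for k in classes if t[k]]  # classes that occur in y_true

    accuracy = correct / n
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = sum(p[k] * t[k] for k in classes) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    # Multiclass MCC from the confusion-matrix marginals.
    denom = math.sqrt((n ** 2 - sum(x * x for x in p)) * (n ** 2 - sum(x * x for x in t)))
    mcc = (correct * n - sum(p[k] * t[k] for k in classes)) / denom if denom else 0.0
    # Log loss with probabilities clipped away from 0 and 1.
    log_loss = -sum(
        math.log(min(max(proba[i][y], eps), 1 - eps)) for i, y in enumerate(y_true)
    ) / n

    return {
        "accuracy": accuracy,
        "precision_macro": sum(precision) / n_classes,
        "recall_macro": sum(recall) / n_classes,
        "f1_macro": sum(f1) / n_classes,
        "hamming_loss": 1 - accuracy,  # single-label case: share of misclassified samples
        "mcc": mcc,
        "cohen_kappa": kappa,
        "balanced_accuracy": sum(recall[k] for k in present) / len(present),
        "log_loss": log_loss,
    }


if __name__ == "__main__":
    metrics = evaluate(
        y_true=[0, 0, 1, 1],
        y_pred=[0, 0, 1, 0],
        proba=[[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
        n_classes=2,
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

For this toy input the sketch reports accuracy 0.75 but Cohen's kappa 0.5 and MCC of about 0.577, illustrating why chance-corrected metrics are reported alongside raw accuracy.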

These results indicate that, in complex network environments, model fusion is especially effective at identifying hard-to-classify traffic patterns. The ensemble also generalizes well and adapts continuously while maintaining high accuracy, which makes it well suited to real-time IoT deployments. The proposed framework contributes to research on intelligent IoT traffic analysis and demonstrates how deep learning and traditional machine learning methods can be combined to strengthen ensemble systems. It provides a scalable and transparent quantitative solution that can be deployed in advanced network security systems and traffic monitoring applications across smart cities, industrial settings, and critical infrastructure.

Keywords: IoT Traffic Classification; Ensemble Learning; Deep Learning; Flow-Based Analysis