Volume 12, Issue 2, pp. 44-64, 2024 | Full Length Article
Parh Yong Wong 1, Nayef A. M. Alduais 2*, Nurul Aswa Omar 3, Salama A. Mostafa 4, Abdul-Malik H. Y. Saad 5, Antar Shaddad H. Abdul-Qawy 6, Abdullah B. Nasser 7, Waheed Ali H. M. Ghanem 8
Doi: https://doi.org/10.54216/JISIoT.120204
With the advancement of ICST, data-driven technologies such as the Internet of Things (IoT) and smart technologies, including Smart Energy Management Systems (SEMS), have become a trend around the globe. Data quality is among the most vital issues to address for the successful application of IoT-based SEMS. Poor data in such large yet delicate systems can affect the quality of life (QoL) of millions and can even cause destruction and disruption to a country. This paper tackles the problem by searching for suitable outlier detection techniques among the many ML-based outlier detection methods that have been developed. Three methods are chosen and their performance is analyzed: K-Nearest Neighbour (KNN) + Mahalanobis Distance (MD), Minimum Covariance Determinant (MCD), and Local Outlier Factor (LOF). Three sensor-collected datasets related to SEMS, each with a different data type, are used in this research. They are pre-processed and split into training and testing datasets with manually injected outliers. The models are trained on the training datasets to learn the patterns in the data, and the trained models are then applied to the testing datasets, using the learned patterns to identify and label the outliers. All three models identify the outliers accurately, with average accuracies above 95%. However, the average execution time varies between models: the KNN+MD model is the slowest at 12.99 seconds, MCD takes 3.98 seconds, and the LOF model is the fastest at 0.60 seconds.
Internet of Things (IoT), Smart Energy Management System, Outlier Detection Techniques, Comparative Analysis, K-Nearest Neighbor (KNN), Minimum Covariance Determinant (MCD), Local Outlier Factor (LOF).
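The evaluation procedure summarised in the abstract (fit a detector on training data, apply it to a test split containing manually injected outliers, then record accuracy and execution time) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It re-implements a simplified Local Outlier Factor using only the standard library, and the synthetic temperature/humidity readings, the size of the injected shifts, `k=10`, the score threshold of 3.0, and the choice to inject outliers only into the test split are all assumptions made for illustration.

```python
import math
import random
import time


def k_nearest(points, query, k, skip_index=None):
    """Return the k smallest (distance, index) pairs from `query` to `points`."""
    pairs = [(math.dist(query, p), i) for i, p in enumerate(points) if i != skip_index]
    return sorted(pairs)[:k]


class SimpleLOF:
    """Toy Local Outlier Factor: fit on training data, then score unseen points."""

    def __init__(self, k=10, threshold=3.0):
        self.k = k
        self.threshold = threshold  # hypothetical cut-off, not taken from the paper

    def fit(self, train):
        self.train = train
        # Neighbours of each training point, excluding the point itself.
        self.neigh = [k_nearest(train, p, self.k, skip_index=i) for i, p in enumerate(train)]
        # k-distance: distance to the k-th nearest neighbour.
        self.k_dist = [n[-1][0] for n in self.neigh]
        self.lrd = [self._lrd(n) for n in self.neigh]
        return self

    def _lrd(self, neighbours):
        # Local reachability density = 1 / mean reachability distance,
        # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
        reach = [max(d, self.k_dist[j]) for d, j in neighbours]
        return 1.0 / max(sum(reach) / len(reach), 1e-12)

    def score(self, point):
        # LOF = mean density of the neighbours / density of the point itself.
        neighbours = k_nearest(self.train, point, self.k)
        mean_neighbour_lrd = sum(self.lrd[j] for _, j in neighbours) / len(neighbours)
        return mean_neighbour_lrd / self._lrd(neighbours)

    def predict(self, points):
        # 1 = outlier, 0 = normal.
        return [1 if self.score(p) > self.threshold else 0 for p in points]


random.seed(0)


def normal_reading():
    # Hypothetical (temperature, humidity) pair; both features share one scale.
    return (random.gauss(22.0, 1.0), random.gauss(45.0, 1.0))


train = [normal_reading() for _ in range(200)]
test_normal = [normal_reading() for _ in range(60)]
# Manually injected outliers: shift the first feature far outside the normal range.
test_outliers = [
    (x + random.choice((-1, 1)) * random.uniform(20.0, 30.0), y)
    for x, y in (normal_reading() for _ in range(6))
]
test = test_normal + test_outliers
labels = [0] * len(test_normal) + [1] * len(test_outliers)

# Time fitting and prediction together, as a stand-in for "execution time".
start = time.perf_counter()
model = SimpleLOF(k=10, threshold=3.0).fit(train)
predictions = model.predict(test)
elapsed = time.perf_counter() - start

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(f"accuracy={accuracy:.3f}, execution time={elapsed:.4f} s")
```

Timing with `time.perf_counter` around both fitting and prediction is one reasonable reading of "execution time". The figures this sketch prints depend on the synthetic data and are not comparable to the results reported above.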