Volume 15, Issue 2, pp. 285-292, 2025 | Full Length Article
Omar Dhafer Madeeh, Osamah M. Abduljabbar, Huda Mohammed Lateef
Doi: https://doi.org/10.54216/JCIM.150221
Protecting big data has become a vital necessity in cybersecurity, given the significant impact this data has on institutions and their clients. Such data serves as a basis for decision-making and policy guidance, so attacks that gain illicit access to it can cause serious losses by compromising its integrity, reliability, confidentiality, and availability. A second problem is the need to shorten the attack detection period, which is critical when classifying malicious and benign patterns. The Structured Query Language Injection Attack (SQLIA) is among the most common attacks targeting data and is the focus of the proposed model. This research aims to develop an approach that detects and distinguishes the patterns of payloads sent by the user. The proposed method trains a random forest model, a machine learning (ML) technique, using the Spark ML library, which integrates effectively with big data frameworks. This is accompanied by a comprehensive analysis of the effectiveness of ML techniques in monitoring and detecting SQLIA. The study was conducted using the SQL dataset available on the Kaggle platform and showed promising results: the proposed method achieved an accuracy of 98.12% and took 0.046 seconds to determine the class of an SQL request. These results indicate that using the Spark ML library with ML techniques contributes to higher accuracy and requires less time to identify the class of a submitted request, owing to its ability to distribute computation in memory.
Big data, Spark ML, SQL Injection, Random Forest
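The abstract describes classifying user payloads with a random forest and timing the decision. The paper does this with Spark ML on a Kaggle dataset; the sketch below is only a standard-library Python stand-in that illustrates the general idea (bootstrap sampling, random feature subsets, majority vote). The indicator patterns in `PATTERNS`, the toy payloads in `TRAIN`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import random
import re
import time
from collections import Counter

# Hypothetical indicator patterns; the paper does not publish its feature set.
PATTERNS = [r"'", r"--|#", r"\bor\b", r"\bunion\b", r"=", r";"]

# Tiny made-up training set: (payload, label), 1 = injection, 0 = benign.
TRAIN = [
    ("' OR 1=1 --", 1),
    ("admin' --", 1),
    ("1; DROP TABLE users", 1),
    ("' UNION SELECT password FROM users #", 1),
    ("alice", 0),
    ("blue running shoes", 0),
    ("order 1042", 0),
    ("john.smith@example.com", 0),
]


def featurize(payload):
    """Turn a payload string into a list of 0/1 pattern indicators."""
    text = payload.lower()
    return [1 if re.search(p, text) else 0 for p in PATTERNS]


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a node is a class label or (feature, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    n_features = len(rows[0])
    # Random feature subset per split (about sqrt of the feature count).
    candidates = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
    best = None
    for f in candidates:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this feature does not separate anything here
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(labels)
    f = best[1]
    zero = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    one = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    return (
        f,
        build_tree([x for x, _ in zero], [y for _, y in zero], rng, depth + 1, max_depth),
        build_tree([x for x, _ in one], [y for _, y in one], rng, depth + 1, max_depth),
    )


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict(forest, payload):
    """Classify a payload by majority vote over all trees."""
    x = featurize(payload)
    votes = []
    for node in forest:
        while isinstance(node, tuple):
            f, left, right = node
            node = right if x[f] else left
        votes.append(node)
    return majority(votes)


if __name__ == "__main__":
    X = [featurize(p) for p, _ in TRAIN]
    y = [label for _, label in TRAIN]
    forest = train_forest(X, y)
    start = time.perf_counter()
    label = predict(forest, "' OR 'a'='a")
    elapsed = time.perf_counter() - start
    print(f"predicted class {label} in {elapsed:.6f} s")
```

The timing printed here measures one in-process prediction on a toy model, so it is not comparable to the 0.046 seconds reported for the Spark-based system. A real pipeline would replace the hand-written patterns with learned text features and use a distributed implementation such as the one the authors describe.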