Volume 18, Issue 1, pp. 145-182, 2025 | Full Length Article
Firas Zawaideh 1, Qusay Bsoul 2, Ala Alzoubi 3, Nardine T. Botros 4, Moaz T. Fawzy 5, Diaa Salama AbdElminaam 6, Nour Mostafa 7*
DOI: https://doi.org/10.54216/FPA.180112
This paper presents an optimized framework for detecting SMS spam using advanced machine learning algorithms and natural language processing (NLP) techniques. Two datasets, the Filtering Mobile Phone Spam Dataset and the SMS Spam Collection Dataset, were utilized to evaluate the performance of various classifiers, including Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Classifier, Decision Trees, and AdaBoost. The methodology encompasses comprehensive data preprocessing steps, such as tokenization, stopword removal, and text normalization, followed by feature extraction using TF-IDF and Bag-of-Words models. The classifiers’ performances were evaluated using accuracy, precision, recall, and F1-score, alongside cross-validation techniques. Results indicate that Support Vector Classifier and AdaBoost consistently achieved superior accuracy in distinguishing between spam and ham messages. The study underscores the importance of data preprocessing and model optimization in enhancing spam detection accuracy, offering valuable insights for improving SMS filtering systems in cybersecurity applications.
SMS spam detection, Machine learning classifiers, Natural Language Processing (NLP), Feature extraction techniques, Naive Bayes classifier, Support Vector Classifier (SVC), Bag of Words (BoW) model, Spam vs Ham classification, Hyperparameter optimization
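To make the pipeline summarized in the abstract concrete, the following is a minimal sketch of one path through it: text normalization, tokenization, stopword removal, Bag-of-Words counting, a Multinomial Naive Bayes classifier, and precision/recall/F1 scoring. It is not the authors' implementation. It uses only the Python standard library, covers just one of the classifiers named above, and leaves out TF-IDF weighting and cross-validation. The function names, the short stopword list, the smoothing value `alpha=1.0`, and the toy messages are all assumptions made for illustration; none of them come from the paper or its datasets.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration; the paper does not specify one.
STOPWORDS = {"a", "an", "the", "to", "is", "at", "you", "your",
             "for", "and", "on", "in", "are", "we", "now"}


def preprocess(text):
    """Normalize to lowercase, tokenize on alphanumeric runs, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def train_multinomial_nb(messages, labels, alpha=1.0):
    """Fit per-class Bag-of-Words counts with add-alpha (Laplace) smoothing."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(messages, labels):
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for c, n_docs in class_docs.items():
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(n_docs / len(labels))
        model["log_like"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for tok in preprocess(text):
            if tok in model["vocab"]:  # out-of-vocabulary tokens are ignored
                score += model["log_like"][c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy messages invented for this sketch (not drawn from either dataset).
train_texts = [
    "Win a free prize now, claim your cash",
    "Free entry: text WIN to claim your prize",
    "Urgent! Your account won cash, call now",
    "Are we still meeting for lunch today",
    "Can you call me when you get home",
    "See you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_texts = ["Claim your free cash prize", "Lunch tomorrow at home"]
test_labels = ["spam", "ham"]

model = train_multinomial_nb(train_texts, train_labels)
predictions = [predict(model, t) for t in test_texts]
print(predictions)
print(precision_recall_f1(test_labels, predictions))
```

A six-message training set says nothing about real-world accuracy; the sketch only shows how the stages connect. In practice a library such as scikit-learn would typically supply the vectorizers, classifiers, and cross-validation utilities.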