Optimized Machine Learning Framework for SMS Spam Detection and Classification: A Comparative Evaluation
This paper presents an optimized framework for detecting SMS spam using machine learning algorithms and natural language processing (NLP) techniques. Two datasets, the Filtering Mobile Phone Spam Dataset and the SMS Spam Collection Dataset, were used to evaluate several classifiers, including Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Classifier, Decision Trees, and AdaBoost. The methodology combines data preprocessing (tokenization, stopword removal, and text normalization) with feature extraction using TF-IDF and Bag-of-Words models. Classifier performance was measured by accuracy, precision, recall, and F1-score, together with cross-validation. Results indicate that the Support Vector Classifier and AdaBoost consistently outperformed the other classifiers in accuracy when distinguishing spam from ham (legitimate) messages. The study underscores the importance of data preprocessing and model optimization for spam detection accuracy, with practical implications for SMS filtering systems in cybersecurity applications.
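The abstract names the pipeline stages without showing how they connect. The Python below is a minimal, dependency-free sketch, not the authors' implementation: it chains preprocessing (lowercasing, regex tokenization, stopword removal), Bag-of-Words counts and TF-IDF weights, a Multinomial Naive Bayes classifier with Laplace smoothing, and precision/recall/F1 with spam as the positive class. The stopword list, the toy messages, and every function name are hypothetical. The TF-IDF uses the unsmoothed idf = log(N / df); libraries such as scikit-learn apply a smoothed variant by default. The other classifiers in the study (KNN, SVC, Decision Trees, AdaBoost) and cross-validation are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# real pipelines usually rely on a fuller list such as NLTK's.
STOPWORDS = {"a", "an", "the", "to", "you", "your", "me", "we", "are",
             "is", "and", "or", "for", "of", "in", "on", "at", "now"}


def preprocess(text):
    """Normalize (lowercase), tokenize with a simple regex, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per token list, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # Bag-of-Words counts for this message
        total = len(doc) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


class MultinomialNB:
    """Multinomial Naive Bayes over raw token counts, with Laplace smoothing (alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, docs):
        def score(doc, c):
            # Log prior plus log likelihood of each known token (unseen tokens are skipped).
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return [max(self.classes, key=lambda c: score(doc, c)) for doc in docs]


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall, and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy messages, not drawn from either dataset in the study.
train_msgs = ["Free entry to win a cash prize, text WIN now",
              "URGENT: claim your free prize reward today",
              "Are we still meeting for lunch tomorrow?",
              "Can you send me the notes from class"]
train_labels = ["spam", "spam", "ham", "ham"]
test_msgs = ["Win a free cash reward", "Lunch tomorrow with the class?"]
test_labels = ["spam", "ham"]

train_docs = [preprocess(m) for m in train_msgs]
weights = tf_idf(train_docs)  # "free" occurs in two messages, so its IDF is below that of "cash"
model = MultinomialNB(alpha=1.0).fit(train_docs, train_labels)
preds = model.predict([preprocess(m) for m in test_msgs])
scores = precision_recall_f1(test_labels, preds)
print(preds, scores)  # → ['spam', 'ham'] (1.0, 1.0, 1.0)
```

With only two test messages the perfect scores say nothing about real-world performance; they simply confirm the pieces fit together. Swapping in another classifier or a k-fold split changes only the model and the data partitioning, while the preprocessing and metric functions stay the same.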
Volume 18, Issue 1