Volume 15, Issue 1, pp. 342-351, 2025 | Full Length Article
Muatamed Abed Hajer 1*, Mustafa K. Alasadi 2, Ali Obied 3
DOI: https://doi.org/10.54216/JCIM.150127
Unsolicited e-mails such as phishing and spam cause significant financial losses for businesses and individuals every year. Numerous methodologies and strategies have been devised for automated spam identification, yet none has demonstrated complete predictive precision. Among the proposed approaches, machine learning (ML) and deep learning (DL) algorithms have shown the most promising results. This article evaluates the efficacy of three transformer-based models, BERT, ALBERT, and RoBERTa, in classifying both textual and numerical data. The proposed models achieved higher accuracy and efficiency in classification tasks, a notable improvement over traditional models such as KNN, NB, BiLSTM, and LSTM. Notably, on several criteria the RoBERTa model achieved near-perfect accuracy, suggesting that it adapts well to a variety of datasets.
Spam E-mail, BERT, ALBERT, RoBERTa, Machine learning, Deep learning
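To make the comparison concrete, the sketch below shows one way a transformer-based spam classifier of the kind evaluated here could be fine-tuned with the Hugging Face Transformers library. It is a minimal illustration only: the checkpoint name, CSV file names, column names, and hyperparameters are assumptions, not the authors' reported configuration.

```python
# Minimal sketch (not the authors' exact pipeline): fine-tuning RoBERTa as a
# binary spam classifier with Hugging Face Transformers. Checkpoint, file
# names, column names, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Assumed CSV files with a "text" column (e-mail body) and an integer
# "label" column (0 = ham, 1 = spam).
data = load_dataset("csv", data_files={"train": "spam_train.csv",
                                       "test": "spam_test.csv"})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Truncate/pad each message to a fixed length so batches are uniform.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)  # two classes: ham vs. spam

args = TrainingArguments(
    output_dir="roberta-spam",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())  # reports evaluation loss on the held-out split
```

Swapping the checkpoint string for "bert-base-uncased" or "albert-base-v2" would give corresponding BERT and ALBERT baselines under the same training loop.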