Fusion: Practice and Applications

Journal DOI: https://doi.org/10.54216/FPA

ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 16, Issue 1, pp. 08-22, 2024 | Full Length Article

Improving Arabic Spam classification in social media using hyperparameters tuning and Particle Swarm Optimization

Amr Mohamed El Koshiry 1,*, Entesar H. Ibraheem Eliwa 2, Ahmed Omar 3

  • 1 Department of Curricula and Teaching Methods, College of Education, King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia; Faculty of Specific Education, Minia University, Egypt (aalkoshiry@kfu.edu.sa)
  • 2 Department of Mathematics and Statistics, College of Science, King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia; Department of Computer Science, Faculty of Science, Minia University, P.O. Box 91519, Minia, Egypt (eheliwa@kfu.edu.sa)
  • 3 Department of Computer Science, Faculty of Science, Minia University, P.O. Box 91519, Minia, Egypt (ahmed.omar@mu.edu.eg)
  • DOI: https://doi.org/10.54216/FPA.160101

    Received: July 21, 2023 Revised: November 19, 2023 Accepted: April 02, 2024
    Abstract

    Online social networks continue to evolve and serve many purposes, such as sharing educational content, chatting, gaining friends and followers, sharing news, and playing online games. However, the widespread flow of unwanted messages causes significant problems, including reduced user interaction time, the spread of extremist views, and lower information quality, especially in the educational field. Coordinated automated accounts (bots) on social networking sites are commonly used to spread unwanted messages, rumors, fake news, and false testimonies, either to a mass audience or to targeted users. Because users, especially in the educational field, receive many messages through social media, they often fail to recognize unwanted messages, which may contain harmful links, malicious programs, fake accounts, false reports, and misleading opinions. It is therefore vital to identify and classify spam texts to enhance the security of social media. This study builds an Arabic spam message dataset of 14,250 tweets extracted from Twitter. Our proposed methodology applies a new tag identification technique to the collected tweets. We then use widely used machine learning algorithms to build a model for classifying Arabic spam messages, applying hyperparameter tuning methods to obtain the most suitable parameters for each algorithm. In addition, we use particle swarm optimization to identify the most relevant features and improve classification performance. The results show that classification performance improves from 0.9822 to 0.98875, with a 50% reduction in the feature set. The key areas of investigation are Arabic spam messages, spam classification, hyperparameter tuning, and feature selection.
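The feature selection step described in the abstract can be illustrated with a minimal sketch of binary particle swarm optimization, in which each particle is a 0/1 mask over the feature set. The abstract does not state which PSO variant, swarm size, or fitness function the authors used, so everything below is an assumption: the function `binary_pso`, the stand-in objective `toy_fitness`, the set `INFORMATIVE`, and all parameter values are hypothetical. In a setting like the paper's, the fitness function would presumably be a classifier's validation score on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO sketch: maximise fitness(mask) over 0/1 feature masks."""
    rng = random.Random(seed)
    # Random initial masks, zero initial velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid does not saturate.
                v = max(-6.0, min(6.0, v))
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes P(bit = 1).
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in objective: reward selecting the "informative"
# features and penalise mask size. Replace with a real validation score.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


mask, score = binary_pso(toy_fitness, n_features=10)
selected = [j for j, bit in enumerate(mask) if bit]
```

The size penalty in `toy_fitness` is one simple way to push the swarm toward smaller feature subsets; whether the reported 50% reduction came from a similar penalty or from the accuracy objective alone is not stated in the abstract.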

    Keywords:

    Arabic Spam, Spam Classification, Hyperparameters Tuning, Feature Selection

    Cite This Article As:
    A. M. El Koshiry, E. H. I. Eliwa, and A. Omar, "Improving Arabic Spam classification in social media using hyperparameters tuning and Particle Swarm Optimization," Fusion: Practice and Applications, vol. 16, no. 1, pp. 08-22, 2024. DOI: https://doi.org/10.54216/FPA.160101