Optimized and Comprehensive Fake Review Detection based on Harris Hawks optimization integrated with Machine Learning Techniques

Journal of Cybersecurity and Information Management (JCIM), ISSN 2690-6775, 2769-7851, DOI 10.54216/JCIM, https://www.americaspg.com/journals/show/3068, 2019

Article DOI 10.54216/JCIM.150102, https://www.americaspg.com/articleinfo/2/show/3068, 2025 11 21

Hussien Attia, Department of Computer Sciences, College of Science for Women, University of Babylon, Babylon, Iraq
Yossra Hussain Ali, Department of Computer Sciences, University of Technology, Baghdad, Iraq

Fake review detection, also known as spam review detection, is an important task in natural language processing. It involves extracting useful information from text documents obtained from various sources. Methods ranging from simple rule-based and lexicon-based approaches to advanced machine learning algorithms have been used extensively, with a variety of classifiers, to detect fake reviews. Nevertheless, lexicon-based review classification still struggles to achieve high accuracy, mostly because it requires comprehensive domain-specific dictionaries. Machine learning-based review detection likewise suffers from limited accuracy caused by the uncertainty of features in social data. One effective way to address the accuracy problem is to choose the most informative features carefully and to minimize the number of features used. The objective of this paper is to select a small subset out of thousands of features for accurate spam review classification. A good feature selection method is therefore needed to improve both processing speed and predictive accuracy. In this paper, Harris Hawks Optimization (HHO) is used for feature selection in sentiment analysis tasks. The selected feature subsets were evaluated using SVM, XGBoost, and ETC classifiers. Experimental results on an airline tweet review dataset demonstrated strong sentiment classification performance, with an accuracy of 0.9435 for SVM, 0.9607 for XGBoost, and 0.9635 for ETC.
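The feature selection idea in the abstract can be illustrated with a small wrapper-style sketch: each candidate feature subset is a binary mask, and a mask is scored by the classification error on the kept features plus a small penalty for the number of features kept. Everything in the block is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier (a stand-in for SVM, XGBoost, or ETC), the weight `alpha`, the toy data, and all function names. The search itself is exhaustive enumeration, which is only feasible for a handful of features; it stands in for the HHO search and does not implement HHO.

```python
from itertools import product

def select(row, mask):
    """Keep only the features whose mask bit is 1."""
    return [v for v, keep in zip(row, mask) if keep]

def centroid(rows):
    """Component-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def error_rate(X, y, mask):
    """Training error of a nearest-centroid classifier on the masked features."""
    Xm = [select(row, mask) for row in X]
    centroids = {
        label: centroid([r for r, lab in zip(Xm, y) if lab == label])
        for label in sorted(set(y))
    }
    wrong = 0
    for row, label in zip(Xm, y):
        pred = min(centroids, key=lambda c: sq_dist(row, centroids[c]))
        wrong += pred != label
    return wrong / len(y)

def fitness(X, y, mask, alpha=0.01):
    """Lower is better: weighted sum of error and fraction of features kept.

    The weighting scheme and alpha value are illustrative assumptions.
    """
    if not any(mask):
        return 1.0  # an empty subset cannot classify anything
    return (1 - alpha) * error_rate(X, y, mask) + alpha * sum(mask) / len(mask)

def best_subset(X, y, alpha=0.01):
    """Exhaustive search over all masks (a stand-in for the HHO search)."""
    masks = product((0, 1), repeat=len(X[0]))
    return min(masks, key=lambda m: fitness(X, y, m, alpha))

# Toy data (hypothetical): only feature 0 separates the two classes.
X = [
    [0.1, 5.0, 1.0, 3.0],
    [0.2, 1.0, 4.0, 3.1],
    [0.0, 3.0, 2.0, 2.9],
    [0.9, 4.8, 1.2, 3.0],
    [1.0, 1.2, 3.9, 3.1],
    [0.8, 3.1, 2.1, 2.9],
]
y = [0, 0, 0, 1, 1, 1]

best_mask = best_subset(X, y)
print(best_mask, fitness(X, y, best_mask))  # best_mask is (1, 0, 0, 0) here
```

With thousands of features the 2^n masks cannot be enumerated, which is where a metaheuristic search such as HHO would replace `best_subset` while the fitness function keeps the same role.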