Fusion: Practice and Applications

Journal DOI

https://doi.org/10.54216/FPA


ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 17, Issue 2, pp. 161-172, 2025 | Full Length Article

A Comparative Analysis of Feature Extraction Techniques for Fake Reviews Detection

Zahraa Fadhel 1,*, Hussien Attia 2, Yossra Hussain Ali 3

  • 1 Department of Computer Sciences, College of Science for Women, University of Babylon, Babylon, Iraq - (zahraa.alkhafaji.jsci140@student.uobabylon.edu.iq)
  • 2 Department of Computer Sciences, College of Science for Women, University of Babylon, Babylon, Iraq - (wsci.husein.attia@uobabylon.edu.iq)
  • 3 Department of Computer Sciences, University of Technology, Baghdad, Iraq - (Yossra.H.Ali@uotechnology.edu.iq)
  • Doi: https://doi.org/10.54216/FPA.170212

    Received: January 29, 2024 Revised: April 25, 2024 Accepted: September 27, 2024
    Abstract

    The current Internet era is characterized by the widespread circulation of ideas and viewpoints among users across social media platforms such as microblogging sites, personal blogs, and review sites. Fake reviews have become a widespread problem on digital platforms, posing a major challenge for both consumers and businesses. Because the number of online reviews keeps growing, fraudulent reviews can no longer be identified manually, which makes artificial intelligence (AI) essential for detecting them. Feature extraction is a crucial stage in fake review detection, and effective feature engineering can significantly improve the accuracy of opinion classification. This paper compares five feature extraction methods for multi-class opinion classification on two Twitter datasets: airline reviews and Borderlands game reviews. FastText combined with an XGBoost classifier outperformed all other techniques, achieving 94.10% accuracy on the airline dataset and 100% accuracy on the Borderlands game-review dataset.
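The abstract credits FastText with the best results but does not say why. One commonly cited property, described by Bojanowski et al. (2017), is that FastText represents each word as a bag of character n-grams (they use n from 3 to 6, with `<` and `>` marking word boundaries), so misspelled or unseen words in noisy tweets still share features with their correct spellings. The stdlib-only Python sketch below illustrates that decomposition on a hypothetical example phrase. It is an assumption-laden toy: it trains no embeddings, does not call the `fasttext` library, is not the authors' FastText + XGBoost pipeline, and all function names are hypothetical.

```python
"""Toy comparison of word-level vs FastText-style subword features.

Assumption: illustrates only the character n-gram decomposition from
Bojanowski et al. (2017); it trains no embeddings, does not use the
`fasttext` library, and is not the authors' FastText + XGBoost pipeline.
All function names here are hypothetical.
"""


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> set[str]:
    """Character n-grams of `word` (lengths minn..maxn) after wrapping it in
    '<' and '>' boundary markers, plus the whole wrapped word itself."""
    wrapped = f"<{word}>"
    grams = {wrapped}
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def word_features(text: str) -> set[str]:
    """Word-level (bag-of-words style) features: lowercased whitespace tokens."""
    return set(text.lower().split())


def subword_features(text: str) -> set[str]:
    """Union of the character n-grams of every lowercased token."""
    feats: set[str] = set()
    for token in text.lower().split():
        feats |= char_ngrams(token)
    return feats


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    clean = "terrible flight delayed"
    noisy = "terible flite delayd"  # hypothetical misspellings, as in noisy tweets
    print("word overlap:   ", overlap(word_features(clean), word_features(noisy)))
    print("subword overlap:", round(overlap(subword_features(clean), subword_features(noisy)), 3))
```

The word-level sets share no tokens, so their overlap is 0.0, whereas the subword sets still share n-grams such as `<te`, `ter`, and `ble`. In a full pipeline these n-grams would be mapped to learned vectors (for example with the `fasttext` library or gensim's `FastText`) and combined into document vectors for a classifier such as XGBoost; that step is not shown and may differ from what the authors did.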

    Keywords:

    Feature extraction, Fake reviews, Natural language processing, FastText, XGBoost

    Cite This Article As:
    Fadhel, Zahraa, Hussien Attia, and Yossra Hussain Ali. "A Comparative Analysis of Feature Extraction Techniques for Fake Reviews Detection." Fusion: Practice and Applications, vol. 17, no. 2, 2025, pp. 161-172. DOI: https://doi.org/10.54216/FPA.170212