Segmentation Word to Improve Performance Sentiment Analysis for Indonesian Language

Siti Mujilahwati ¹; Noor Zuraidin M. Safar ²; Catur Supriyanto ³

DOI: https://doi.org/10.54216/FPA.150213

1 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Malaysia - (gi210037@student.uthm.edu.my)

2 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Malaysia - (zuraidin@uthm.edu.my)

3 Informatic Engineering Department, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia - (catur.supriyanto@dsn.dinus.ac.id)

Received: August 17, 2023; Revised: December 02, 2023; Accepted: April 17, 2024

Abstract

This study investigates whether adding a text segmentation step to pre-processing improves the accuracy of Indonesian sentiment analysis. Observation and analysis of social media comments showed that they frequently contain words run together without spaces. Separating such unspaced Indonesian text is one of the most important steps in building a high-quality Bag of Words, and the text segmentation algorithm developed here makes it possible. The segmentation process uses a matching model based on a standard Indonesian word dictionary. It was tested on Indonesian text data about COVID-19 management, where it increased the number of features by 3,036. The Bag of Words was then constructed using the Term Frequency-Inverse Document Frequency (TF-IDF) method. Finally, sentiment classification was tested with both deep learning and machine learning models to assess data quality and accuracy. The resulting accuracies for Deep Learning, Support Vector Machine, and Naive Bayes were 86.46%, 88.02%, and 86.19%, respectively.
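The dictionary-matching idea can be illustrated with a minimal sketch. The abstract does not specify the matching strategy, so the greedy forward longest-match rule used here is an assumption, and the function names and the five-word toy dictionary are hypothetical stand-ins for the standard Indonesian word dictionary the study mentions.

```python
def segment(text, dictionary, max_len=None):
    """Split an unspaced string into dictionary words (forward longest match).

    Assumption: greedy longest match; the paper's actual matching model
    is not described in the abstract.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    unknown = []  # characters not covered by any dictionary word
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match:
            if unknown:  # flush unmatched characters as one token
                words.append("".join(unknown))
                unknown = []
            words.append(match)
            i += len(match)
        else:
            unknown.append(text[i])
            i += 1
    if unknown:
        words.append("".join(unknown))
    return words


# Hypothetical toy dictionary; a real run would load a full word list.
TOY_DICTIONARY = {"tetap", "jaga", "kesehatan", "pakai", "masker"}


def segment_comment(comment, dictionary=TOY_DICTIONARY):
    """Lowercase a comment, split on whitespace, and segment each token."""
    tokens = []
    for token in comment.lower().split():
        tokens.extend(segment(token, dictionary))
    return tokens


print(segment_comment("Tetapjaga kesehatan pakaimasker"))
```

Greedy longest match is simple but can mis-split a string when a long dictionary word overlaps the start of the next word, so a production segmenter would likely need a tie-breaking or backtracking rule. Characters that match no dictionary entry are kept together as a single token rather than discarded.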

Keywords:

Segmentation Text, Sentiment Analysis, Indonesian Language, CNN, SVM, Naïve Bayes

Cite This Article As:

Mujilahwati, Siti; Zuraidin, Noor; Supriyanto, Catur. Segmentation Word to Improve Performance Sentiment Analysis for Indonesian Language. Fusion: Practice and Applications, 2024, pp. 145-154. DOI: https://doi.org/10.54216/FPA.150213

