Volume 18, Issue 1, pp. 249-260, 2025 | Full Length Article
Wessam Lahmod Nados 1,*, Behrooz Minaei Bidgoli 2, Sayyed Sauleh Eetemadi 3, Mohammad Ebrahim Shenasa 4, Seyyed Ali Hosseini 5
Doi: https://doi.org/10.54216/FPA.180117
This paper focuses on the training, evaluation, and development of named entity recognition (NER) models for Islamic hadiths in Arabic. Using the Hadith Noor dataset, the study segments the text into individual tokens and labels each token with the BIO (Beginning, Inside, Outside) tagging scheme. An examination of hadith lengths revealed a right-skewed distribution, indicating that shorter texts are more common: texts of fewer than 100 words were most prevalent, followed by texts of 100 to 200 words, while texts longer than 200 words were rare. The dataset identifies eight entity types, including names common among narrators and locations. Three models were trained on the Arabic text: AraBERT, LSTM, and the hybrid AraBERT-LSTM. The hybrid model achieved an accuracy of 0.981, outperforming the other two models and confirming its reliability for Arabic NER tasks, especially on Islamic hadiths. These results open the way for further research in Arabic natural language processing.
NER, Entity Recognition, Islamic Hadiths, Noor Al-Hadith dataset, BIO, Hybrid LSTM and AraBERT
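As a rough illustration of two ideas mentioned in the abstract, the sketch below assigns BIO tags to a tokenized phrase and groups texts into the three length ranges described (fewer than 100 words, 100 to 200 words, more than 200 words). It is a minimal sketch and not the authors' pipeline: the function names `bio_tags` and `length_bucket`, the labels `PER` and `LOC`, the transliterated sample tokens, and the sample word counts are all hypothetical, and the handling of texts of exactly 100 or 200 words is an assumption.

```python
from collections import Counter


def bio_tags(tokens, spans):
    """Return one BIO tag per token.

    spans is a list of (start, end, label) tuples with end exclusive.
    Spans are assumed not to overlap (an assumption of this sketch).
    """
    tags = ["O"] * len(tokens)           # every token starts as Outside
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return tags


def length_bucket(n_words):
    """Map a word count to one of the three length ranges."""
    if n_words < 100:
        return "<100"
    if n_words <= 200:                   # boundary handling is an assumption
        return "100-200"
    return ">200"


# Hypothetical transliterated tokens; PER and LOC are illustrative labels,
# not necessarily the entity types used in the dataset.
tokens = ["haddathana", "Muhammad", "ibn", "Yahya", "fi", "al-Kufa"]
spans = [(1, 4, "PER"), (5, 6, "LOC")]
tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")

# Hypothetical word counts, used only to show the bucketing.
bucket_counts = Counter(length_bucket(n) for n in [12, 85, 140, 230])
print(dict(bucket_counts))
```

Tagging the first token of an entity with `B-` and the rest with `I-` keeps adjacent entities of the same type separable, which is the usual reason for preferring BIO over a plain inside/outside scheme.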