Volume 20, Issue 1, PP: 12-23, 2025 | Full Length Article
Huda Ragheb Kadhim 1*, Rand Abdulwahid Albeer 2, Dhamyaa A. Nasrawi 3, Huda Hallawi 4*, Muthanna Medin Nasser 5, Ibrahim Haider Jabbar 6, Burhan Karar Abbas 7
Doi: https://doi.org/10.54216/FPA.200102
Data compression technologies play a major role in areas where efficient data storage and transmission are essential. Data compression is the science of reducing redundant data to a compact form, which is used to store files or information safely. Unicode, in turn, is a global standard for representing text and symbols in computers. The basic elements of the Unicode standard are code points, each of which represents a specific symbol. Unicode provides a unified way to map and manage these code points to ensure consistent representation and interpretation of text data across different systems, platforms, and languages. This paper proposes a method for compressing Arabic text based on Unicode ligatures, which typically join characters together. The method replaces two or more Unicode Arabic ligature characters with a single Unicode Arabic ligature based on their appearance in the Arabic text file, eliminating the need for coding or decoding. The sizes of the original and output text files are compared to obtain the compression percentage. The selected datasets are Modern Standard Arabic text (Arabic news) and Classical Arabic text (Arabic Holy and Honorific text), collected from Kaggle. The compression percentage depends on the frequency of ligature characters in Arabic documents. Unfortunately, the results were not promising: the method achieved only a small compression percentage (6.71% and 12.82% for Arabic news and Arabic Holy text, respectively). We think that the proposed method can be improved in the future by using a hybrid text compression technique and by considering other properties of Arabic Unicode. Programming can express competency concepts in a well-defined mathematical model for a particular.
Arabic ligatures characters Unicode, Compression, Decompression, Redundant data, Text compression
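The ligature substitution described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The ligature table below is a small hypothetical subset taken from the Unicode Arabic Presentation Forms blocks, the function names are invented for illustration, and measuring file size in UTF-8 bytes is an assumption, since the abstract does not state the encoding used.

```python
# Hypothetical ligature table (an assumption, not the paper's table):
# each key is a sequence of Arabic letters, each value is the single
# Unicode presentation-form code point that represents the same sequence.
LIGATURES = {
    "\u0627\u0644\u0644\u0647": "\uFDF2",  # alef + lam + lam + heh -> ALLAH ligature
    "\u0644\u0622": "\uFEF5",  # lam + alef with madda above
    "\u0644\u0623": "\uFEF7",  # lam + alef with hamza above
    "\u0644\u0625": "\uFEF9",  # lam + alef with hamza below
    "\u0644\u0627": "\uFEFB",  # lam + alef
}


def compress(text: str) -> str:
    """Replace each known letter sequence with its single ligature code point."""
    # Longest sequences first, so a long ligature is not broken up
    # by a shorter one that shares some of its letters.
    for seq in sorted(LIGATURES, key=len, reverse=True):
        text = text.replace(seq, LIGATURES[seq])
    return text


def decompress(text: str) -> str:
    """Expand each ligature code point back into its letter sequence."""
    # Lossless only if the original text contained no ligature code points.
    for seq, lig in LIGATURES.items():
        text = text.replace(lig, seq)
    return text


def compression_percentage(original: str, compressed: str,
                           encoding: str = "utf-8") -> float:
    """Percentage of bytes saved, measured in the given encoding (assumed UTF-8)."""
    before = len(original.encode(encoding))
    after = len(compressed.encode(encoding))
    return 100.0 * (before - after) / before


# Short sample phrase written with escapes so the source stays ASCII-only.
sample = ("\u0644\u0627 \u0625\u0644\u0647 "
          "\u0625\u0644\u0627 \u0627\u0644\u0644\u0647")
packed = compress(sample)
print(round(compression_percentage(sample, packed), 2))
```

In UTF-8, letters in the basic Arabic block take two bytes and presentation-form ligatures take three, so replacing a two-letter pair saves one byte out of four. This is one reason the gain depends heavily on how often ligature sequences occur in the text, as the abstract notes. The result would differ under UTF-16, where every code point used here takes two bytes.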