Lossless Compression without Coding and Decoding using Arabic Ligature Characters Unicode

Huda Ragheb Kadhim; Rand Abdulwahid Albeer; Dhamyaa A. Nasrawi; Huda Hallawi; Muthanna Medin Nasser; Ibrahim Haider Jabbar; Burhan Karar Abbas

doi:https://doi.org/10.54216/FPA.200102

Huda Ragheb Kadhim ¹*, Rand Abdulwahid Albeer ², Dhamyaa A. Nasrawi ³, Huda Hallawi ⁴*, Muthanna Medin Nasser ⁵, Ibrahim Haider Jabbar ⁶, Burhan Karar Abbas ⁷

1 College of Computer Science and Information Technology, University of Kerbala, Iraq - (huda.raghib@uokerbala.edu.iq)

2 College of Computer Science and Information Technology, University of Kerbala, Iraq - (rand.a@uokerbala.edu.iq)

3 College of Computer Science and Information Technology, University of Kerbala, Iraq - (dh.alnasrawy@uokerbala.edu.iq)

4 College of Computer Science and Information Technology, University of Kerbala, Iraq - (huda.f@uokerbala.edu.iq)

5 College of Computer Science and Information Technology, University of Kerbala, Iraq - (e17201231@s.uokerbala.edu.iq)

6 College of Computer Science and Information Technology, University of Kerbala, Iraq - (e17201209@s.uokerbala.edu.iq)

7 College of Computer Science and Information Technology, University of Kerbala, Iraq - (e17201243@s.uokerbala.edu.iq)

Received: December 14, 2024 Revised: February 01, 2025 Accepted: April 02, 2025

Abstract

Data compression technologies play a major role in areas where efficient data storage and transmission are essential. Data compression is the science of reducing redundant data to a compact form, which is used to store files or information safely. Unicode, in turn, is a global standard for representing text and symbols in computers. The basic elements of the Unicode standard are code points, each of which represents a specific symbol. Unicode provides a unified way to map and manage these code points to ensure consistent representation and interpretation of text data across different systems, platforms, and languages. This paper proposes a method for compressing Arabic text based on Unicode ligatures, which typically join characters together. The method replaces two or more Arabic Unicode characters that form a ligature with a single Unicode Arabic ligature character wherever they appear in the Arabic text file, eliminating the need for coding or decoding. The sizes of the original and output text files were compared to obtain the percentage of compression. The selected dataset comprises Modern Standard Arabic text, consisting of Arabic news, and Classical Arabic text, consisting of Arabic Holy and Honorific text collected from Kaggle. The percentage of compression depends on the frequency of ligature characters in Arabic documents. Unfortunately, the results were not promising: the method reduced file size by only a small percentage (6.71% and 12.82% for Arabic news and Arabic Holy text, respectively). We think that the proposed method can be improved in the future by using a hybrid text compression technique and by considering other properties of Arabic Unicode. Programming can express competency concepts in a well-defined mathematical model for a particular.
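The substitution idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-entry ligature table, the function names, and the size formula are not taken from the paper, which does not list its ligature set or implementation here. The sketch also ignores contextual forms (it always uses the isolated ligature form) and assumes the input text contains no ligature code points to begin with.

```python
# Illustrative sketch only: the two-entry table and all function names are
# assumptions, not the paper's actual ligature list or implementation.

# Each key is a sequence of Arabic letters; each value is one ligature code point.
LIGATURES = {
    # alef, lam, lam, heh -> ARABIC LIGATURE ALLAH ISOLATED FORM
    "\u0627\u0644\u0644\u0647": "\uFDF2",
    # lam, alef -> ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
    "\u0644\u0627": "\uFEFB",
}


def substitute_ligatures(text: str) -> str:
    """Replace each listed sequence with its ligature, longest sequence first."""
    for sequence in sorted(LIGATURES, key=len, reverse=True):
        text = text.replace(sequence, LIGATURES[sequence])
    return text


def expand_ligatures(text: str) -> str:
    """Inverse mapping, used here only to check that nothing was lost."""
    for sequence, ligature in LIGATURES.items():
        text = text.replace(ligature, sequence)
    return text


def compression_percentage(original: str, compressed: str,
                           encoding: str = "utf-8") -> float:
    """Size reduction in percent, measured on the encoded byte lengths."""
    original_size = len(original.encode(encoding))
    if original_size == 0:
        return 0.0
    compressed_size = len(compressed.encode(encoding))
    return 100.0 * (original_size - compressed_size) / original_size


if __name__ == "__main__":
    sample = "\u0644\u0627 \u0627\u0644\u0644\u0647"
    packed = substitute_ligatures(sample)
    print(compression_percentage(sample, packed))
```

The saving depends on the file encoding. In UTF-8, lam followed by alef occupies 4 bytes while U+FEFB occupies 3, so each replaced pair saves a single byte, which is consistent with the abstract's remark that the gain depends on how often ligatures occur. The output remains ordinary Unicode text, so `expand_ligatures` is not needed to read it; it is included only to verify the round trip.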

Keywords:

Arabic ligature characters Unicode, Compression, Decompression, Redundant data, Text compression

Cite This Article As:

Ragheb, Huda., Abdulwahid, Rand., A., Dhamyaa., Hallawi, Huda., Medin, Muthanna., Haider, Ibrahim., Karar, Burhan. Lossless Compression without Coding and Decoding using Arabic Ligature Characters Unicode. Fusion: Practice and Applications, vol. , no. , 2025, pp. 12-23. DOI: https://doi.org/10.54216/FPA.200102

Fusion: Practice and Applications
