Volume 21, Issue 1, pp. 165-175, 2026 | Full Length Article
Heba Adnan Raheem 1,*, Hiba Jabbar Aleqabie 2, Ameer Sameer Hamood Mohammed Ali 3
DOI: https://doi.org/10.54216/FPA.210112
Automatically generating accurate, descriptive image tags has attracted significant attention in recent years because of the exponential growth of image data. Traditional image tagging relies on manual annotation, which is time-consuming and subjective, and the growing volume of unannotated photographs makes it impractical at scale. Automated image description bridges the gap between visual content and human comprehension, which makes it vital for tasks such as information retrieval, editing, and accessibility. This paper presents a deep learning-based system that combines convolutional neural networks (CNNs) for feature extraction, recurrent neural networks (RNNs) for caption generation, and attention mechanisms that focus on salient image regions. The model uses a sequence-to-sequence architecture that generates coherent captions from pre-trained CNN features and attention-enhanced RNNs. Experiments on datasets such as Flickr8k and Flickr30k show improved performance, as measured by the BLEU, ROUGE, and CIDEr metrics. The approach offers a scalable solution for image captioning, with potential extensions to video analysis, richer language generation, and larger datasets.
Keywords: CNN, Deep learning, Feature extraction, Image processing, Tag generation
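To make the attention step in the abstract concrete, the following is a minimal sketch of a single decoding step in which a decoder state attends over per-region image features. It is not the authors' implementation: the abstract does not state which attention form is used, so dot-product scoring is an assumption here, and the names `attend`, `regions`, and `state` as well as the toy numbers are hypothetical stand-ins for CNN outputs and an RNN hidden state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Dot-product attention (an assumed scoring function).

    Scores each image region by its similarity to the decoder state,
    normalises the scores with softmax, and returns the weights together
    with the weighted sum of region features (the context vector).
    """
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical stand-ins for CNN outputs: three image regions, two features each.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical stand-in for the RNN hidden state at one decoding step.
state = [2.0, 0.0]

weights, context = attend(regions, state)
print(weights)
print(context)
```

In a full captioning model the context vector would typically be fed, together with the previous word embedding, into the RNN to predict the next word; regions that align with the current state (the first and third here) receive larger weights than the one that does not.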