Volume 13, Issue 2, pp. 136-144, 2023 | Full Length Article
Suha Dh. Athab 1*, Abdulamir A. Karim 2
DOI: https://doi.org/10.54216/FPA.130212
This paper presents a tagging model that uses the segmentation map as reference regions. The model combines an encoder-decoder architecture with a proposal layer and dense layers to tag and segment objects. A pre-trained VGG16 encoder extracts high-level features from the input image, and a decoder network reconstructs the image from them. A proposal layer generates a binary map indicating the presence or absence of an object at each image location. This map is integrated with the decoder output and refined by a convolutional layer to produce the final segmentation. Two dense layers predict object classes and bounding box coordinates. The model is trained with a custom loss function that combines categorical cross-entropy loss and mean squared error loss. Experimental results demonstrate the effectiveness of the proposed model in achieving accurate object tagging and segmentation.
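The combined loss mentioned above can be illustrated with a minimal sketch in plain Python. It assumes a one-hot class target, a four-value bounding box, and a simple weighted sum of the two terms. The function names and the `box_weight` parameter are hypothetical, since the abstract does not state how the terms are balanced or implemented.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    # Clamp probabilities away from zero so log() stays finite.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def mean_squared_error(box_true, box_pred):
    """Mean squared error over the bounding-box coordinates."""
    return sum((t - p) ** 2 for t, p in zip(box_true, box_pred)) / len(box_true)


def combined_loss(y_true, y_pred, box_true, box_pred, box_weight=1.0):
    """Classification loss plus weighted box-regression loss.

    box_weight is an assumed balancing factor, not taken from the paper.
    """
    class_term = categorical_cross_entropy(y_true, y_pred)
    box_term = mean_squared_error(box_true, box_pred)
    return class_term + box_weight * box_term


# Illustrative values: three classes, box given as normalized (x, y, w, h).
loss = combined_loss(
    y_true=[0, 1, 0],
    y_pred=[0.1, 0.8, 0.1],
    box_true=[0.5, 0.5, 0.2, 0.2],
    box_pred=[0.4, 0.5, 0.2, 0.3],
)
print(f"combined loss: {loss:.4f}")
```

In a deep learning framework the same idea would typically be written with the framework's built-in cross-entropy and mean squared error losses; the sketch only shows how the two terms add up for a single sample.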
Tagging, Encoder-decoder, Semantic segmentation, Object detection