Fusion: Practice and Applications

Journal DOI: https://doi.org/10.54216/FPA

ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 19, Issue 1, pp. 50-56, 2025 | Full Length Article

Gated Recurrent Fusion in Long Short-Term Memory Fusion

Anita Venugopal 1, Aditi Sharma 2*, Preetish Kakkar 3, Daya Nand 4, Arvind R. Yadav 5, Gaurav Kumar Ameta 6

  • 1 Dhofar University, Sultanate of Oman - (anita@du.edu.om)
  • 2 Department of Computer Science and Engineering, Symbiosis Institute of Technology, Pune, India; Symbiosis International (Deemed) University, Pune, India - (aditi.sharma@ieee.org)
  • 3 IEEE Senior Member, USA - (preetish.kakkar@gmail.com)
  • 4 University of Houston, Victoria, Texas, USA - (nandD@uhv.edu)
  • 5 E&I Engineering Department, Institute of Technology, Nirma University, Ahmedabad, India - (arvind.yadav.me@gmail.com)
  • 6 Department of Computer Science and Engineering, Parul Institute of Technology, Parul University, Vadodara, India - (gauravameta1@gmail.com)
  • DOI: https://doi.org/10.54216/FPA.190105

    Received: November 10, 2024 | Revised: January 12, 2025 | Accepted: February 11, 2025
    Abstract

    Fusion techniques for enhancing the efficiency of Long Short-Term Memory (LSTM) networks are gaining prominence across a variety of domains. Handling sequential data while integrating information from multiple sources is often challenging for LSTM-based models, and fusion methods that combine different models enhance an LSTM's ability to capture complex correlations in the data. This paper examines early, late, and hybrid fusion techniques, and presents fusion approaches that enable LSTM networks to efficiently handle complex multimodal data in self-navigating systems. The findings reveal that hybrid fusion techniques outperform traditional methods in accuracy and generalization across various tasks. The paper proposes the Gated Recurrent Fusion (GRF) approach and demonstrates its ability to handle multimodal and temporal inputs in a supervised recurrent setting. The findings report a 10% improvement in precision.
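    The abstract distinguishes three fusion placements and a gated variant; a small sketch makes the distinction concrete. The PyTorch code below is a hypothetical illustration only: the class name GatedRecurrentFusion, the layer sizes, and the convex-combination gate are assumptions for illustration, not the authors' published architecture, which the abstract does not detail. For contrast, early fusion would concatenate raw modality inputs before a single LSTM, and late fusion would run one LSTM per modality and merge the outputs; the learned gate here is one hybrid-style middle ground.

        # Minimal, hypothetical sketch of gated recurrent fusion (all names and
        # dimensions are assumptions, not the paper's published implementation).
        import torch
        import torch.nn as nn

        class GatedRecurrentFusion(nn.Module):
            """Fuses two modality streams with a learned gate, then models time with an LSTM."""
            def __init__(self, dim_a: int, dim_b: int, hidden: int):
                super().__init__()
                self.proj_a = nn.Linear(dim_a, hidden)     # project modality A (e.g., camera features)
                self.proj_b = nn.Linear(dim_b, hidden)     # project modality B (e.g., lidar features)
                self.gate = nn.Linear(2 * hidden, hidden)  # per-feature fusion gate
                self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

            def forward(self, xa: torch.Tensor, xb: torch.Tensor) -> torch.Tensor:
                # xa: (batch, time, dim_a), xb: (batch, time, dim_b)
                ha = torch.tanh(self.proj_a(xa))
                hb = torch.tanh(self.proj_b(xb))
                # Gate in [0, 1] decides, per feature and time step, which stream to trust.
                z = torch.sigmoid(self.gate(torch.cat([ha, hb], dim=-1)))
                fused = z * ha + (1.0 - z) * hb   # convex combination of the two modalities
                out, _ = self.lstm(fused)          # temporal modeling over the fused sequence
                return out                         # (batch, time, hidden)

        # Example: fuse two modalities over a 20-step sequence.
        model = GatedRecurrentFusion(dim_a=64, dim_b=32, hidden=128)
        y = model(torch.randn(4, 20, 64), torch.randn(4, 20, 32))
        print(y.shape)  # torch.Size([4, 20, 128])

    In this sketch the gate z re-weights the two streams at every time step, so the recurrent model can lean on one modality when the other is noisy; this is the kind of adaptive behavior the gated fusion idea is meant to provide.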

    Keywords: Fusion technique, LSTM, RNN, Scalability, Early fusion, Hybrid fusion, Multimodal data


    Cite This Article As:
    A. Venugopal, A. Sharma, P. Kakkar, D. Nand, A. R. Yadav, and G. K. Ameta, "Gated Recurrent Fusion in Long Short-Term Memory Fusion," Fusion: Practice and Applications, vol. 19, no. 1, pp. 50-56, 2025. DOI: https://doi.org/10.54216/FPA.190105