Volume 17, Issue 1, pp. 398-408, 2025 | Full Length Article
R. Poorni 1*, Chinnathambi Kamatchi 2, Y. Dharshan 3, K. Kowsalya 4, R. Vijay 5*, M. Balakrishnan 6
DOI: https://doi.org/10.54216/JISIoT.170128
Gesture recognition serves as a key enabler for natural and intuitive human–robot interaction (HRI) in smart automation and assistive systems. However, achieving real-time performance with high recognition accuracy remains a significant challenge due to dynamic background variation, occlusion, and complex spatio-temporal dependencies in gesture sequences. This paper presents a real-time attention-based CNN-RNN framework for robust gesture recognition and adaptive HRI in dynamic environments. The proposed system uses Convolutional Neural Networks (CNNs) to extract spatial features from sequential video frames and Bidirectional Recurrent Neural Networks (BiRNNs), integrated with an attention mechanism, to model temporal dependencies and focus on discriminative motion cues. The attention layer improves interpretability by prioritizing salient gestures and suppressing background noise. A hybrid optimization strategy, combining adaptive learning-rate scheduling with regularized dropout, ensures computational stability and generalization across gesture datasets. Experiments on the NVIDIA Dynamic Gesture (NvGesture) and ChaLearn IsoGD benchmarks demonstrate superior performance, achieving 97.8% accuracy and a real-time inference speed of 34 FPS, outperforming baseline CNN, 3D-CNN, and LSTM architectures. The proposed framework effectively balances accuracy, latency, and interpretability, making it suitable for real-world HRI applications including service robotics, industrial automation, and assistive technologies.
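To make the described pipeline concrete, the sketch below shows one plausible PyTorch realization: a per-frame CNN feature extractor, a bidirectional GRU over the frame-feature sequence, additive temporal attention, dropout regularization, and an adaptive learning-rate schedule. This is a minimal illustrative sketch, not the authors' implementation; all layer sizes, the GRU and backbone choices, the cosine-annealing scheduler, and every hyperparameter here are assumptions.

# Minimal sketch of an attention-based CNN-BiRNN gesture classifier.
# Assumes clips of 112x112 RGB frames; all sizes are illustrative.
import torch
import torch.nn as nn

class AttentiveCNNBiRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # Spatial feature extractor, applied to each frame independently.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (N, 128)
        )
        # Bidirectional RNN models temporal dependencies across frames.
        self.rnn = nn.GRU(128, hidden, batch_first=True, bidirectional=True)
        # Additive attention scores one weight per time step.
        self.attn = nn.Linear(2 * hidden, 1)
        self.dropout = nn.Dropout(0.5)                    # regularized dropout
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (b, t, 128)
        states, _ = self.rnn(feats)                            # (b, t, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)      # attend over time
        context = (weights * states).sum(dim=1)                # (b, 2*hidden)
        return self.head(self.dropout(context))

model = AttentiveCNNBiRNN(num_classes=25)
# Adaptive learning-rate scheduling; cosine annealing is one plausible choice.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)
logits = model(torch.randn(2, 16, 3, 112, 112))                # 16-frame clips

In training, sched.step() would be called once per epoch so the learning rate decays adaptively; the softmax attention weights can also be inspected per clip to see which frames the model treats as most discriminative.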
Gesture recognition, human–robot interaction (HRI), convolutional neural network (CNN), recurrent neural network (RNN), attention mechanism, bidirectional RNN, spatio-temporal modelling, real-time processing, deep learning, intelligent robotics