Volume 17, Issue 1, pp. 398-408, 2025 | Full Length Article
R. Poorni 1*, Chinnathambi Kamatchi 2, Y. Dharshan 3, K. Kowsalya 4, R. Vijay 5*, M. Balakrishnan 6
DOI: https://doi.org/10.54216/JISIoT.170128
Gesture recognition serves as a key enabler for natural and intuitive human–robot interaction (HRI) in smart automation and assistive systems. However, achieving real-time performance with high recognition accuracy remains a significant challenge due to dynamic background variation, occlusion, and complex spatio-temporal dependencies in gesture sequences. This paper presents a real-time attention-based CNN-RNN framework for robust gesture recognition and adaptive HRI in dynamic environments. The proposed system uses Convolutional Neural Networks (CNNs) to extract spatial features from sequential video frames and Bidirectional Recurrent Neural Networks (BiRNNs), integrated with an attention mechanism, to model temporal dependencies and focus on discriminative motion cues. The attention layer improves interpretability by prioritizing salient gestures and suppressing background noise. A hybrid optimization strategy, combining adaptive learning-rate scheduling with regularized dropout, ensures computational stability and generalization across gesture datasets. Experiments on the NVIDIA Dynamic Gesture (NvGesture) and ChaLearn IsoGD benchmarks demonstrate superior performance, achieving 97.8% accuracy and a real-time inference speed of 34 FPS, outperforming baseline CNN, 3D-CNN, and LSTM architectures. The proposed framework effectively balances accuracy, latency, and interpretability, making it suitable for real-world HRI applications including service robotics, industrial automation, and assistive technologies.
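To make the described pipeline concrete, the sketch below shows one plausible PyTorch realization: a per-frame CNN feature extractor, a bidirectional GRU over the frame-feature sequence, additive temporal attention, dropout regularization, and an adaptive learning-rate schedule. This is a minimal illustrative sketch, not the authors' implementation; all layer sizes, the GRU and backbone choices, the cosine-annealing scheduler, and every hyperparameter here are assumptions.

# Minimal sketch of an attention-based CNN-BiRNN gesture classifier.
# Assumes clips of 112x112 RGB frames; all sizes are illustrative.
import torch
import torch.nn as nn

class AttentiveCNNBiRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # Spatial feature extractor, applied to each frame independently.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (N, 128)
        )
        # Bidirectional RNN models temporal dependencies across frames.
        self.rnn = nn.GRU(128, hidden, batch_first=True, bidirectional=True)
        # Additive attention scores one weight per time step.
        self.attn = nn.Linear(2 * hidden, 1)
        self.dropout = nn.Dropout(0.5)                    # regularized dropout
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (b, t, 128)
        states, _ = self.rnn(feats)                            # (b, t, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)      # attend over time
        context = (weights * states).sum(dim=1)                # (b, 2*hidden)
        return self.head(self.dropout(context))

model = AttentiveCNNBiRNN(num_classes=25)
# Adaptive learning-rate scheduling; cosine annealing is one plausible choice.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)
logits = model(torch.randn(2, 16, 3, 112, 112))                # 16-frame clips

In training, sched.step() would be called once per epoch so the learning rate decays adaptively; the softmax attention weights can also be inspected per clip to see which frames the model treats as most discriminative.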
Gesture recognition, human–robot interaction (HRI), convolutional neural network (CNN), recurrent neural network (RNN), attention mechanism, bidirectional RNN, spatio-temporal modelling, real-time processing, deep learning, intelligent robotics