Volume 10, Issue 1, PP: 78-87, 2023 | Full Length Article
Abdelaziz A. Abdelhamid
DOI: https://doi.org/10.54216/FPA.100104
The severe circumstances caused by COVID-19 have made online education the best substitute for regular face-to-face education, allowing the education process to continue. Since the pandemic shutdown began more than a year ago, most schools have adopted online learning, which demonstrates the applicability of this teaching methodology. However, the efficiency of this method still needs to be improved to guarantee its effectiveness. Although face-to-face teaching has many advantages over online education, recent artificial intelligence techniques offer an opportunity to enhance online learning. From this perspective, we propose a framework to detect and recognize emotions in students' speech during virtual classes, keeping instructors updated on how students feel so that they can respond accordingly. Detecting emotions from speech is particularly helpful in cases where turning on cameras on the students' side could be embarrassing, a situation that is very common, especially in Middle Eastern schools. The proposed framework can also be applied to other similar scenarios, such as online meetings.
Speech emotions, Online learning, Machine learning.
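Since the abstract describes the framework only at a high level, the sketch below illustrates one common baseline for speech emotion recognition: pooled MFCC features fed to a linear SVM. This is a minimal sketch under stated assumptions, not the architecture proposed in the paper; the function names (extract_features, predict_emotion), the emotion label set, and the feature and classifier choices are all hypothetical, and it assumes librosa and scikit-learn are available.

```python
# Illustrative baseline for utterance-level speech emotion recognition.
# Assumes librosa (feature extraction) and scikit-learn (classification);
# the MFCC features and linear SVM are placeholders, not the paper's model.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

def extract_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Load one utterance and summarize it as mean/std of 13 MFCCs."""
    signal, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    # Pool frame-level features into a single fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train(paths: list[str], labels: list[str]):
    """Fit a feature scaler plus SVM on labeled training utterances."""
    X = np.stack([extract_features(p) for p in paths])
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X, labels)
    return model

def predict_emotion(model, wav_path: str) -> str:
    """Classify a single student utterance captured during a class."""
    return model.predict(extract_features(wav_path)[None, :])[0]
```

In a setting like the one the abstract describes, a classifier of this kind would run on short audio segments captured from the virtual classroom and surface the predicted labels to the instructor; deep models such as CNN or LSTM networks are a common substitute for the SVM when enough labeled data is available.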