Volume 17, Issue 1, pp. 89–105, 2025 | Full Length Article
Rajeswary Nair 1*, K. S. Kannan 2
DOI: https://doi.org/10.54216/JISIoT.170107
This work focuses on advanced stuttering detection and classification using artificial intelligence. An efficient classification of stuttering and its subclasses has many applications, including helping speech therapists determine the severity of stuttering, enabling early patient diagnosis, and facilitating communication with voice assistants. The first part of this work reviews the databases and features used for automated stuttering classification, along with both classical and deep learning methods. A Bayesian Bidirectional Long Short-Term Memory with Fully Convolutional Classifier (BaBi-LSTM) deep learning model is then evaluated on a publicly available stuttering dataset. The experiments assess the impact of individual signal features on the classification outcomes, including pitch-related variables, various 2D speech representations, and Mel-Frequency Cepstral Coefficients (MFCCs). The proposed technique proves the most successful, achieving a 95% F1-score across all classes. Deep learning algorithms outperform classical methods in detecting stuttering disorders; however, results differ among stuttering subtypes because of incomplete data and poor annotation quality. The study also examines how the number of dense layers, the size of the training set, and the ratio of the training/test split affect the effectiveness of stuttering-event recognition, offering insights for future improvements of the technique.
Stuttering, Prediction, Deep learning, Cepstral coefficients, Speech representation
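The MFCC features discussed in the abstract can be computed from first principles: frame the waveform, take the power spectrum, apply a mel filterbank, then a log and a discrete cosine transform. The following is a minimal sketch of that pipeline; the frame length, hop size, and filter counts are illustrative assumptions, not the configuration actually used in the paper.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_coeffs=13):
    # Slice the signal into overlapping Hann-windowed frames
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    # Log mel energies -> DCT, keeping the first n_coeffs coefficients
    energies = np.log(power @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm="ortho")[:, :n_coeffs]

# Example: one second of synthetic noise standing in for a speech excerpt
sig = np.random.default_rng(0).standard_normal(16000)
feats = mfcc(sig)
print(feats.shape)  # one 13-dimensional MFCC vector per frame
```

In practice a library such as librosa or python_speech_features would be used instead; the resulting per-frame coefficient matrix is the kind of 2D representation a BiLSTM-based classifier would consume.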