A Deep Learning-Based Guidance for Stuttering Prediction

 

Rajeswary Nair1,*, K. S. Kannan2

1Department of Computer Applications, Kalasalingam Academy of Research and Education, Srivilliputtur, India

2Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Srivilliputtur, India

Abstract

This work focuses on the detection and classification of stuttering using artificial intelligence. Efficient classification of stuttering and its subclasses has several uses, including helping speech therapists determine the degree of stuttering, supporting early patient diagnosis, and facilitating communication with voice assistants. The first part of this work reviews the databases and features used, along with the classical and deep learning methods applied to automated stuttering classification. A deep learning model, the Bayesian Bi-directional Long Short-Term Memory with Fully Convoluted Classifier (BaBi-LSTM), is then applied in conjunction with an available stuttering dataset. The experiments assess the impact of individual signal features on the classification results, including pitch-determining variables, different 2D speech representations, and Mel-Frequency Cepstral Coefficients (MFCCs). The proposed technique proves the most successful, obtaining a 95% F1 measure for the entire class. Deep learning algorithms outperform classical methods in detecting stuttering disorders. However, the results differ among stuttering subtypes because of incomplete data and poor annotation quality. To offer insights for future improvements, the study also examines how the number of dense layers, the size of the training dataset, and the split of data into training and evaluation sets affect the effectiveness of stuttering event recognition.
Email: rajeswarynr@gmail.com; saikannan2012@gmail.com


Received: November 24, 2024 Revised: January 16, 2025 Accepted: March 13, 2025

 

Keywords: Stuttering; Prediction; Deep learning; Cepstral coefficients; Speech representation