Volume 15, Issue 1, pp. 105-121, 2025 | Full Length Article
Hadeel Luhaib Fouad 1*, Husam Ali Abdulmohsin 2
DOI: https://doi.org/10.54216/JISIoT.150109
Speech-to-text conversion is a form of speech recognition that takes audio content as input and transcribes it into written words. With advancing technology and the availability of large data corpora, the importance of speech recognition has grown. Speech recognition technology is now widely used to operate tools, issue commands, or write without a keyboard, mouse, or buttons, since speaking is often easier and more convenient than working by hand. In this paper, a system capable of converting audio files to text has been developed. The proposed system consists of a set of algorithms for processing audio files, in which the Mel-frequency cepstral coefficients (MFCC) algorithm, combined with the standard deviation, is used to extract the features of an audio file and convert them into an image. The features are stored as images because deep learning algorithms can be trained more effectively on images than on CSV files. The second part of the proposed system is a deep learning model that combines two algorithms, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN), to predict words. The model consists of a set of layers that extract features from the images and select the best features, which are then used for training and classification by the proposed DNN model. Three types of datasets (Arabic, English, and Real) were used to test the proposed system on speech prediction, and its accuracy exceeded 95%.
Speech Recognition, Convolutional Neural Network, Deep Neural Network, MFCC
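The feature-to-image step described in the abstract can be illustrated with a minimal sketch. It assumes the MFCC matrix has already been computed (for example with an audio library) and is given as a list of frames. The function name `mfcc_to_image`, the choice to append the per-coefficient standard deviation as one extra row, and the min-max scaling to 8-bit grayscale are all assumptions made for illustration; the abstract does not specify how the two are combined.

```python
from statistics import pstdev


def mfcc_to_image(mfcc):
    """Turn an MFCC matrix (rows = frames, columns = coefficients)
    into an 8-bit grayscale image stored as a list of pixel rows.

    Assumption: the standard deviation of each coefficient over time
    is appended as one extra row. The paper may combine them differently.
    """
    if not mfcc or not mfcc[0]:
        raise ValueError("mfcc must be a non-empty 2-D list")
    n_coeffs = len(mfcc[0])
    if any(len(frame) != n_coeffs for frame in mfcc):
        raise ValueError("all frames must have the same number of coefficients")

    # Population standard deviation of each coefficient across all frames.
    std_row = [pstdev([frame[c] for frame in mfcc]) for c in range(n_coeffs)]
    features = [list(frame) for frame in mfcc] + [std_row]

    # Min-max scale every value into the 0..255 range of a grayscale pixel.
    lo = min(min(row) for row in features)
    hi = max(max(row) for row in features)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round(255 * (v - lo) / span) for v in row] for row in features]


# Toy input: 3 frames with 3 coefficients each (hypothetical values).
toy_mfcc = [
    [-4.0, 2.0, 0.0],
    [-2.0, 2.0, 1.0],
    [0.0, 2.0, 2.0],
]
image = mfcc_to_image(toy_mfcc)
for pixel_row in image:
    print(pixel_row)
```

The resulting pixel rows could then be written to an image file and fed to a CNN. Storing the standard deviation in the same image keeps both the frame-level and the summary features in a single input.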