Fusion: Practice and Applications

Journal DOI

https://doi.org/10.54216/FPA

ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 16, Issue 2, pp. 118-125, 2024 | Full Length Article

Speaker Identification in Crowd Speech Audio using Convolutional Neural Networks

Ghadeer Qasim Ali 1 , Husam Ali Abdulmohsin 2 *

  • 1 Computer Science Department, College of Science, University of Baghdad - (Ghadeer.Ali2201m@sc.uobaghdad.edu.iq)
  • 2 Computer Science Department, College of Science, University of Baghdad - (Husam.a@sc.uobaghdad.edu.iq)
  • Doi: https://doi.org/10.54216/FPA.160208

    Received: December 21, 2023 Revised: February 25, 2024 Accepted: June 02, 2024
    Abstract

    Crowd speaker identification is an advanced technology in the field of audio identification and personalized user experience that researchers have focused on extensively, yet high accuracy in crowd identification has not been achieved. This work aims to design and implement a novel crowd speech identification method that can identify speakers in a multi-speaker environment (two, three, four, and five speakers). The work is implemented in two phases. The first is the training phase, in which a Convolutional Neural Network (CNN) is trained and tested. In this phase, training is performed on data generated via the Combinatorial Cartesian Product approach, which relies on two primary processes: computation of the Cartesian product and combinatorial selection. The second is the prediction phase, whose aim is to evaluate the CNN trained in the first phase by testing it on crowd audio recordings different from the data the CNN was trained on. These new crowd recordings belong to the Ghadeer-Speech-Crowd-Corpus (GSCC) dataset, a new database designed through this work. Compared to state-of-the-art approaches for speaker identification in multi-speaker environments, the results are impressive, with a recognition rate of 99.5% for audio with three speakers, 98.5% for audio with four speakers, and 96.4% for audio with five speakers.
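    The combinatorial selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `generate_crowd_samples`, the dictionary input, and the truncate-and-average mixing strategy are all assumptions made for the example; the paper does not specify how the selected clips are mixed.

```python
import itertools
import numpy as np

def generate_crowd_samples(speaker_clips, group_size):
    """Combine single-speaker clips into synthetic crowd audio.

    speaker_clips: dict mapping speaker id -> 1-D numpy waveform
                   (all clips assumed to share one sample rate).
    group_size: number of simultaneous speakers (e.g. 2 to 5).
    Yields (speaker_ids, mixed_waveform) pairs.
    """
    ids = sorted(speaker_clips)
    # Combinatorial selection: every unordered group of `group_size` speakers.
    for group in itertools.combinations(ids, group_size):
        clips = [speaker_clips[s] for s in group]
        n = min(len(c) for c in clips)            # truncate to the shortest clip
        mix = np.sum([c[:n] for c in clips], axis=0)
        mix = mix / group_size                    # crude normalization to avoid clipping
        yield group, mix
```

    Under these assumptions, four speakers and a group size of two yield C(4,2) = 6 synthetic crowd samples; a full Cartesian-product variant over per-speaker utterance lists would enlarge this further by pairing every utterance of each selected speaker.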

    Keywords:

    Crowded speech identification, Combinatorial Cartesian Product, GSCC Dataset

    References

    [1]      A. Shakat, K. I. Arif, S. Hasan, Y. Dawood, and M. A. Mohammed, “YouTube keyword search engine using speech recognition,” Iraqi J. Sci., vol. 2021, pp. 167–173, 2021, doi: 10.24996/ijs.2021.SI.1.23.

    [2]      Samyuktha, S., Kavitha, D., Kshaya, V., Shalini, P., Ramya, R., "A Survey on Cyber Security Meets Artificial Intelligence: AI-Driven Cyber Security," Journal of Cognitive Human-Computer Interaction, vol. 2, no. 2, pp. 50-55, 2022, doi: 10.54216/JCHCI.020202.

    [3]      anthi, V., Kumar, A., "Enhancing Healthcare Monitoring through the Integration of IoT Networks and Machine Learning," International Journal of Wireless and Ad Hoc Communication, vol. 7, no. 1, pp. 28-39, 2023, doi: 10.54216/IJWAC.070103.

    [4]       H. A. Abdulmohsin, B. Al-Khateeb, S. S. Hasan, and R. Dwivedi, “Automatic illness prediction system through speech,” Comput. Electr. Eng., vol. 102, p. 108224, 2022.

    [5]       A. Mehrish, N. Majumder, R. Bharadwaj, R. Mihalcea, and S. Poria, “A review of deep learning techniques for speech processing,” Inf. Fusion, p. 101869, 2023.

    [6]       A. Tripathi, H. Lu, and H. Sak, “End-to-end multi-talker overlapping speech recognition.” Google Patents, Dec. 06, 2022.

    [7]      N. M. Jakovljevic, T. V Delic, S. V Etinski, D. M. Miskovic, and T. G. Loncar-Turukalo, “A multi-target speaker detection and identification system based on combination of plda and dnn,” in 2018 26th Telecommunications Forum (TELFOR), IEEE, 2018, pp. 1–4.

    [8]     M. Thakker, S. Vyas, P. Ved, and S. Shanthi Therese, “Speaker identification in a multi-speaker environment,” in Information and Communication Technology for Sustainable Development: Proceedings of ICT4SD 2016, Volume 2, Springer, 2018, pp. 239–244.

    [9]      N. Kanda, Y. Fujita, S. Horiguchi, R. Ikeshita, K. Nagamatsu, and S. Watanabe, “Acoustic modeling for distant multi-talker speech recognition with single-and multi-channel branches,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 6630–6634.

    [10]    V.-T. Tran and W.-H. Tsai, “Speaker identification in multi-talker overlapping speech using neural networks,” IEEE Access, vol. 8, pp. 134868–134879, 2020.

    [11]    M. Yousefi and J. H. L. Hansen, “Frame-based overlapping speech detection using convolutional neural networks,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 6744–6748.

    [12]   H. Sato, T. Ochiai, M. Delcroix, K. Kinoshita, T. Moriya, and N. Kamo, “Should we always separate?: Switching between enhanced and observed signals for overlapping speech recognition,” arXiv Prepr. arXiv2106.00949, 2021.

    [13]   Z. Li and J. Whitehill, “Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 7163–7167.

    [14]    Lhiani, D., Al-basheer, O., "Gorilla Troops Optimizer with Deep Learning-based Multi-Criteria Decision Making for Traffic Analysis in V2X Networks," International Journal of Advances in Applied Computational Intelligence, vol. 5, no. 2, pp. 60-72, 2024, doi: 10.54216/IJAACI.050205.

    [15]     L. Meng, J. Kang, M. Cui, H. Wu, X. Wu, and H. Meng, “Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator,” arXiv Prepr. arXiv2305.16263, 2023.

    [16]    H. A. Abdulmohsin, “Automatic Health Speech Prediction System Using Support Vector Machine,” in Proceedings of International Conference on Computing and Communication Networks: ICCCN 2021, Springer, 2022, pp. 165–175.

    Cite This Article As:
    Qasim, Ghadeer, and Ali, Husam. "Speaker Identification in Crowd Speech Audio using Convolutional Neural Networks," Fusion: Practice and Applications, vol. 16, no. 2, pp. 118-125, 2024. DOI: https://doi.org/10.54216/FPA.160208