Fusion: Practice and Applications
FPA
2692-4048
2770-0070
10.54216/FPA
https://www.americaspg.com/journals/show/3041
2018
Speaker Identification in Crowd Speech Audio using Convolutional Neural Networks
Computer Science Department, College of Science, University of Baghdad
Husam
Husam Ali
Abdulmohsin
Crowd speaker identification is an advanced area of audio identification and personalized user experience that researchers have studied extensively, yet high accuracy in crowd identification has remained out of reach. This work designs and implements a novel crowd speech identification method that identifies speakers in multi-speaker environments (two, three, four, and five speakers). The work is carried out in two phases. The first phase is the Convolutional Neural Network (CNN) training and testing phase, in which the network is trained on data generated via the Combinatorial Cartesian Product approach. This approach uses two primary processes: computation of the Cartesian product and combinatorial selection. The second phase is the prediction phase, which evaluates the CNN trained in the first phase on crowd audio that it was not trained on. This new crowd audio comes from the Ghadeer-Speech-Crowd-Corpus (GSCC) dataset, a new database designed in this work. Compared with state-of-the-art approaches to speaker identification in multi-speaker environments, the results are strong, with recognition rates of 99.5% for audio with three speakers, 98.5% for audio with four speakers, and 96.4% for audio with five speakers.
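The abstract names the two processes of the Combinatorial Cartesian Product approach but does not describe them in detail. The following is a minimal sketch of one plausible reading, not the authors' implementation: choose every group of k speakers (combinatorial selection), then pair every utterance of each chosen speaker with every utterance of the others (Cartesian product) and overlay them. The names `mix` and `crowd_samples`, the toy waveforms, and the averaging used to overlay clips are all assumptions made for illustration.

```python
from itertools import combinations, product


def mix(waveforms):
    """Overlay waveforms by averaging samples, truncated to the shortest clip.

    Averaging is an assumption; the abstract does not say how clips are combined.
    """
    n = min(len(w) for w in waveforms)
    return [sum(w[i] for w in waveforms) / len(waveforms) for i in range(n)]


def crowd_samples(utterances, k):
    """Yield (mixture, speaker_ids) for every k-speaker crowd.

    utterances: dict mapping a speaker id to a list of waveforms
    (each waveform is a list of floats). Hypothetical data layout.
    """
    # Combinatorial selection: every unordered group of k distinct speakers.
    for speakers in combinations(sorted(utterances), k):
        # Cartesian product: one clip from each selected speaker, all pairings.
        for clips in product(*(utterances[s] for s in speakers)):
            yield mix(clips), speakers


# Toy data standing in for real recordings.
utterances = {
    "spk_a": [[0.0, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]],
    "spk_b": [[0.2, 0.0, -0.4]],
    "spk_c": [[0.3, 0.3], [0.5, -0.5, 0.5]],
}

two_speaker = list(crowd_samples(utterances, 2))
three_speaker = list(crowd_samples(utterances, 3))
```

In this sketch each mixture is labelled with the tuple of speakers it contains, which could serve as a multi-label target for a classifier. The number of generated samples grows quickly with the number of speakers and of utterances per speaker, so a real pipeline would likely subsample the product.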
2024
118
125
10.54216/FPA.160208
https://www.americaspg.com/articleinfo/3/show/3041