Speaker Identification in Crowd Speech Audio using Convolutional Neural Networks

Fusion: Practice and Applications (FPA), ISSN 2692-4048, 2770-0070, DOI 10.54216/FPA, 2018
https://www.americaspg.com/journals/show/3041

Authors: Husam Husam, Husam Ali Abdulmohsin
Affiliation: Computer Science Department, College of Science, University of Baghdad

Published: 2024, pages 118-125
DOI: 10.54216/FPA.160208
https://www.americaspg.com/articleinfo/3/show/3041

Abstract: Crowd speaker identification is among the most advanced technologies in audio identification and personal user experience. Researchers have focused on it extensively, yet high accuracy in crowd identification has not been achieved. This work aims to design and implement a novel crowd speech identification method that can identify speakers in a multi-speaker environment (two, three, four, and five speakers). The work is implemented in two phases. The first is the training phase, in which a Convolutional Neural Network (CNN) is trained and tested on data generated via the Combinatorial Cartesian Product approach. This approach uses two primary processes: computation of the Cartesian product and combinatorial selection. The second is the prediction phase, which checks the CNN trained in the first phase by testing it on crowd audio it was not trained on. This new crowd audio comes from the Ghadeer-Speech-Crowd-Corpus (GSCC) dataset, a new database designed in this work. Compared with state-of-the-art approaches to speaker identification in multi-speaker environments, the method achieves recognition rates of 99.5% for audio with three speakers, 98.5% for audio with four speakers, and 96.4% for audio with five speakers.
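
The abstract names the two processes of the Combinatorial Cartesian Product approach (computation of the Cartesian product and combinatorial selection) but does not detail them. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes that combinatorial selection chooses which speakers appear together and that the Cartesian product pairs every clip of one chosen speaker with every clip of the others. The function name `crowd_mixtures`, the toy corpus, and the order in which the two steps are applied are all assumptions made for illustration.

```python
from itertools import combinations, product


def crowd_mixtures(utterances, n_speakers):
    """Yield (speakers, clips) for every candidate n_speakers-way mixture.

    `utterances` maps a speaker id to a list of clip ids (hypothetical layout).
    """
    # Combinatorial selection (assumed): pick which speakers talk together.
    for speakers in combinations(sorted(utterances), n_speakers):
        # Cartesian product (assumed): one clip per chosen speaker,
        # covering every cross-speaker pairing of clips.
        for clips in product(*(utterances[s] for s in speakers)):
            yield speakers, clips


# Hypothetical toy corpus, not data from the GSCC dataset.
corpus = {"spk_a": ["a1", "a2"], "spk_b": ["b1", "b2"], "spk_c": ["c1"]}

two_speaker = list(crowd_mixtures(corpus, 2))
three_speaker = list(crowd_mixtures(corpus, 3))
```

Each yielded tuple only names the clips to overlay. Mixing the waveforms and labeling the result for CNN training would be separate steps that the abstract does not specify.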