Extraction of signal features in Voice Signals to train Machine Learning-based Classifier algorithms for Emotion Detection

Simran Somani¹, Bhagyashree Shah², Bhisaji C. Surve³

¹Student of MBA TECH (IT), MPSTME, NMIMS University, Mumbai, India

²Student of MBA TECH (IT), MPSTME, NMIMS University, Mumbai, India

³Asst. Professor, Dept. of IT, MPSTME, NMIMS University, Mumbai, India

Emails: simran.somani@nmims.in; bhagyashree.shah@nmims.in; bhisaji.surve@nmims.edu

Abstract

This research aims to detect human emotions from speech signals by developing and implementing methods based on frequency-domain analysis. Several machine learning and deep learning models were trained, and their performance was analyzed. The findings show that each model achieves different accuracy for different emotions, but the deep learning-based model achieves the best weighted accuracy. The study provides insight into the feasibility and effectiveness of different methodologies and models for emotion detection from voice signals. The audio signals are analyzed to extract Mel-Frequency Cepstral Coefficients (MFCC), chroma, and Mel-scaled spectrogram features, which are then used to train the machine learning-based classifiers. Python libraries such as Librosa, scikit-learn, PyAudio, NumPy, and SoundFile are used to analyze voice modulations and identify emotions.

Keywords: MFCC; Emotion Detection; Machine Learning; Neural Network.
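
The feature extraction summarized in the abstract can be illustrated with a minimal sketch. It assumes that each frame-level feature matrix (MFCC, chroma, Mel spectrogram) is averaged over time and the results are concatenated into one fixed-length vector per recording. The abstract does not state the pooling method or the number of coefficients, so the time-averaging, the default n_mfcc=40, and the helper names pool_frames and extract_features are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def pool_frames(*frame_features):
    """Average each (n_features, n_frames) matrix over time, then concatenate.

    Assumption: mean pooling over frames; the abstract does not specify this.
    """
    return np.concatenate([np.mean(f, axis=1) for f in frame_features])


def extract_features(path, n_mfcc=40):
    """Hypothetical helper: build one feature vector from an audio file.

    n_mfcc=40 is an assumed value, not taken from the paper.
    """
    import librosa  # imported lazily so pool_frames works without librosa

    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)        # (12, frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)        # (128, frames)
    return pool_frames(mfcc, chroma, mel)
```

With librosa's default settings, this would yield a vector of 40 + 12 + 128 = 180 values per recording, which could then be stacked into a feature matrix and passed to any scikit-learn classifier.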