Deep Learning and Fourier Transform for Speaker Recognition (DLFSR)

Document Type: Original Article

Authors

1 Tamiyyah, Fayoum, Egypt

2 Kyman Faryes, Faculty of Engineering

3 Computers and Systems Engineering Department, Faculty of Engineering, Fayoum University, Fayoum, Egypt

Abstract

Automatic speaker recognition (ASR) and verification have gained visibility and significance in society as speech technology has matured. Speaker recognition has undergone a revolution due to deep learning techniques, specifically deep neural networks (DNNs). With models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), discriminative features can be learned directly from unprocessed speech signals without the need for manual feature extraction. End-to-end speaker recognition models are increasingly adopted because they perform well and map speech waveforms directly to speaker identities. Speaker recognition can identify and authenticate people based on their distinct vocal traits, and it finds applications in many areas, such as voice-based authentication of digital devices, forensic analysis of audio recordings, access control, and caller identification in phone-based customer support. In this study, we introduce a Deep Learning and Fourier Transform for Speaker Recognition (DLFSR) model based on the Short-Time Fourier Transform (STFT): the input speech is transformed into a spectrogram, and a convolutional neural network (CNN) is applied to the spectrogram images to extract features and classify the speaker. Training and validation are performed on the 16000pcm speaker recognition dataset. The model achieves excellent results, with 98.8% correct identification and classification.
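The pipeline described above (an STFT spectrogram followed by a CNN classifier) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of librosa and Keras, the STFT parameters (n_fft, hop_length), the layer sizes, and the speaker count are all placeholders introduced here for illustration.

    # Minimal sketch of an STFT-spectrogram + CNN speaker classifier.
    # librosa/Keras, the STFT settings, and the layer sizes are assumptions.
    import numpy as np
    import librosa
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def speech_to_spectrogram(wav_path, sr=16000, n_fft=512, hop_length=128):
        """Load a 16 kHz PCM recording and return a log-magnitude STFT spectrogram."""
        signal, _ = librosa.load(wav_path, sr=sr)
        stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
        spectrogram = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
        return spectrogram[..., np.newaxis]  # add a channel axis for the CNN

    def build_speaker_cnn(input_shape, num_speakers):
        """Small CNN that extracts features from spectrogram images and classifies the speaker."""
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(num_speakers, activation="softmax"),
        ])

    # Example usage (paths and speaker count are placeholders):
    # spec = speech_to_spectrogram("speaker_0001/sample.wav")
    # model = build_speaker_cnn(spec.shape, num_speakers=5)
    # model.compile(optimizer="adam",
    #               loss="sparse_categorical_crossentropy",
    #               metrics=["accuracy"])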
