We present a new architecture to address the challenges of speaker identification that arise when humans interact with social robots. Although deep learning systems have achieved impressive performance in many speech applications, limited speech data at the training stage and short, noisy utterances at the test stage remain open problems, and no optimal solution has been reported to date. The proposed design combines a generative model, the Gaussian mixture model (GMM), with discriminative support vector machine (SVM) classifiers, and uses both prosodic features and short-term spectral features to classify a speaker's gender and identity concurrently.
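The abstract does not say how the GMM and SVM outputs are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes already-trained models (the parameters below are made up), a diagonal-covariance GMM per speaker scored by average per-frame log-likelihood on short-term spectral frames, a linear SVM per speaker applied to an utterance-level vector (mean spectral frame plus prosodic features), a linear gender SVM on prosodic features, and a weighted-sum score fusion with a hypothetical weight `alpha`, restricted to enrolled speakers of the predicted gender. All class and function names (`DiagGMM`, `LinearSVM`, `SpeakerModel`, `identify`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM with pre-trained (here: made-up) parameters."""
    weights: list
    means: list
    variances: list

    def frame_log_likelihood(self, x):
        # log sum_k w_k * N(x | mu_k, diag(var_k)), computed with log-sum-exp
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)  # subtract the max for numerical stability
        return top + math.log(sum(math.exp(c - top) for c in comps))

    def score(self, frames):
        # Average over frames so utterances of different lengths are comparable
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


@dataclass
class LinearSVM:
    """Decision function w.x + b of an already-trained linear SVM."""
    w: list
    b: float

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b


@dataclass
class SpeakerModel:
    gender: str
    gmm: DiagGMM
    svm: LinearSVM


def identify(frames, prosody, gender_svm, speakers, alpha=0.5):
    """Return (gender, speaker); speaker is None if no one of that gender is enrolled."""
    # 1) Gender from prosodic features (assumed convention: positive margin -> "female")
    gender = "female" if gender_svm.decision(prosody) > 0 else "male"
    # 2) Utterance-level vector: mean spectral frame followed by prosodic features
    dim = len(frames[0])
    mean_frame = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    utt_vec = mean_frame + list(prosody)
    # 3) Weighted-sum fusion of GMM and SVM scores over speakers of that gender
    best, best_score = None, -math.inf
    for name, spk in speakers.items():
        if spk.gender != gender:
            continue
        s = alpha * spk.gmm.score(frames) + (1 - alpha) * spk.svm.decision(utt_vec)
        if s > best_score:
            best, best_score = name, s
    return gender, best


if __name__ == "__main__":
    # Toy models: 2-D spectral frames, 1-D prosodic vector (e.g. normalized mean F0)
    gender_svm = LinearSVM(w=[1.0], b=0.0)
    speakers = {
        "alice": SpeakerModel("female", DiagGMM([1.0], [[1.0, 1.0]], [[1.0, 1.0]]),
                              LinearSVM([1.0, 1.0, 0.0], -1.0)),
        "carol": SpeakerModel("female", DiagGMM([1.0], [[-1.0, -1.0]], [[1.0, 1.0]]),
                              LinearSVM([-1.0, -1.0, 0.0], -1.0)),
        "bob": SpeakerModel("male", DiagGMM([1.0], [[1.0, 1.0]], [[1.0, 1.0]]),
                            LinearSVM([1.0, 1.0, 0.0], -1.0)),
    }
    frames = [[1.0, 0.9], [1.1, 1.0]]
    print(identify(frames, [0.8], gender_svm, speakers))  # ('female', 'alice')
```

Averaging the GMM log-likelihood over frames keeps scores comparable across utterances of different lengths, which matters for the short test utterances the abstract highlights. GMM log-likelihoods and SVM margins live on different scales, so a practical system would likely normalize them (or learn the fusion weight) rather than use a fixed `alpha`.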