This paper addresses the detection of presentation attacks against unsupervised remote biometric speaker verification using a well-known challenge-response scheme. We propose a novel approach to training convolutional phoneme classifiers that maintains high phoneme recognition accuracy even for significantly simplified network architectures, enabling efficient utterance verification on resource-limited hardware such as mobile phones or embedded devices. We use deep convolutional neural networks operating on windows of speech mel-spectrograms for phoneme recognition, and we show that the performance of highly simplified neural architectures can be improved by modifying the principle underlying training set construction.
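To make the input representation concrete, the following is a minimal sketch of how a mel-spectrogram might be cut into fixed-width windows before being passed to a convolutional classifier. It is an illustration only and not the authors' pipeline: the function name `sliding_windows`, the window width, the hop, and the spectrogram dimensions are all assumptions, and random data stands in for a real mel-spectrogram.

```python
import numpy as np


def sliding_windows(mel_spec: np.ndarray, width: int, hop: int) -> np.ndarray:
    """Cut a (n_mels, n_frames) mel-spectrogram into overlapping windows.

    Returns an array of shape (n_windows, n_mels, width). Trailing frames
    that do not fill a whole window are dropped.
    """
    n_mels, n_frames = mel_spec.shape
    if n_frames < width:
        # Too short to yield even one window.
        return np.empty((0, n_mels, width), dtype=mel_spec.dtype)
    starts = range(0, n_frames - width + 1, hop)
    return np.stack([mel_spec[:, s:s + width] for s in starts])


# Hypothetical sizes: 40 mel bands, 100 frames, 15-frame windows, hop of 5.
# Random values stand in for a real mel-spectrogram.
mel_spec = np.random.default_rng(0).random((40, 100))
windows = sliding_windows(mel_spec, width=15, hop=5)
print(windows.shape)
```

Each window would then be treated as a single-channel image and classified independently, so the window width controls how much temporal context the network sees per phoneme decision.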