Dynamic time warping in phoneme modeling for fast pronunciation error detection.

Comput Biol Med

Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland.

Published: February 2016

The presented paper describes a novel approach to the detection of pronunciation errors. It makes use of the modeling of well-pronounced and mispronounced phonemes by means of the Dynamic Time Warping (DTW) algorithm. Four approaches that make use of the DTW phoneme modeling were developed to detect pronunciation errors: Variations of the Word Structure (VoWS), Normalized Phoneme Distances Thresholding (NPDT), Furthest Segment Search (FSS) and Normalized Furthest Segment Search (NFSS). The performance evaluation of each module was carried out using a speech database of correctly and incorrectly pronounced words in the Polish language, with up to 10 patterns of every trained word from a set of 12 words having different phonetic structures. The performance of DTW modeling was compared to Hidden Markov Models (HMM) that were used for the same four approaches (VoWS, NPDT, FSS, NFSS). The average error rate (AER) was the lowest for DTW with NPDT (AER=0.287) and scored better than HMM with FSS (AER=0.473), which was the best result for HMM. The DTW modeling was faster than HMM for all four approaches. This technique can be used for computer-assisted pronunciation training systems that can work with a relatively small training speech corpus (less than 20 patterns per word) to support speech therapy at home.

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.compbiomed.2015.12.004DOI Listing

Publication Analysis

Top Keywords

dynamic time
8
time warping
8
phoneme modeling
8
pronunciation errors
8
furthest segment
8
segment search
8
dtw modeling
8
hmm approaches
8
modeling
5
dtw
5

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!