Glove-TalkII: a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

IEEE Trans Neural Netw

Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.

Published: May 2010

Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural-sounding pitch variations than a text-to-speech synthesizer.
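As a rough illustration of the gating scheme described above, here is a minimal Python sketch: a scalar gate blends the outputs of a vowel network and a consonant network into the ten synthesizer control parameters. The network sizes, names, and randomly initialized weights (standing in for trained ones) are illustrative assumptions, not details from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 16, 10, 10   # hypothetical sizes; n_out = 10 synthesizer controls

    def mlp(x, W1, b1, W2, b2):
        # One-hidden-layer network with sigmoid hidden units and linear outputs.
        h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
        return h @ W2 + b2

    def init(width):
        # Random stand-ins for trained weights.
        return (rng.normal(scale=0.1, size=(n_in, n_hid)), np.zeros(n_hid),
                rng.normal(scale=0.1, size=(n_hid, width)), np.zeros(width))

    vowel_net, consonant_net, gate_net = init(n_out), init(n_out), init(1)

    def glove_to_controls(x):
        # The gate's output is squashed to [0, 1] and decides how much of the
        # vowel versus consonant output reaches the synthesizer.
        g = 1.0 / (1.0 + np.exp(-mlp(x, *gate_net)))
        return g * mlp(x, *vowel_net) + (1.0 - g) * mlp(x, *consonant_net)

    controls = glove_to_controls(rng.normal(size=n_in))  # ten control parameters

In the real system, per the abstract, the gating and consonant networks are trained on user examples while the vowel network encodes a fixed user-defined hand-position-to-vowel mapping; the blending idea is the same.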

Source
http://dx.doi.org/10.1109/72.655042

Publication Analysis

Top Keywords

parallel formant (12)
formant speech (12)
speech synthesizer (12)
hand gestures (8)
fundamental frequency (8)
input devices (8)
vowel consonant (8)
gating network (8)
examples user (8)
speech (5)

Similar Publications

Cortical tracking of speakers' spectral changes predicts selective listening.

Cereb Cortex

December 2024

Instituto de Investigaciones Biológicas Clemente Estable, Department of Integrative and Computational Neurosciences, Av. Italia 3318, Montevideo, 11.600, Uruguay.

A social scene is particularly informative when people are distinguishable. To understand somebody amid the chatter of a "cocktail party," we automatically index their voice. This ability is underpinned by parallel processing of the vocal spectral contours of speech sounds, but how this occurs in the cortex has not yet been established.

Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.

J Speech Lang Hear Res

October 2024

School of Communication Science and Disorders, Florida State University, Tallahassee, FL.

Purpose: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.

Method: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud under conversational, less clear, and more clear speaking conditions.

Biofeedback therapy relies mainly on the analysis of physiological features to improve an individual's affective state, yet there are few objective indicators for assessing symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions.

Acoustic cues play a major role in social interactions in many animal species. In addition to the semantic content of human speech, voice attributes, e.g.

MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation.

J Voice

June 2023

Escuela de Ing. Eléctrica, Electrónica y de Telecomunicaciones (E3T), Universidad Industrial de Santander, Bucaramanga, Colombia.

On the one hand, the relationship between formant frequencies and vocal tract length (VTL) has been studied intensively over the years. On the other hand, the connection between mel-frequency cepstral coefficients (MFCCs), which concisely codify the overall shape of a speaker's spectral envelope in just a few coefficients, and VTL has received only modest analysis and warrants further investigation. Thus, based on different statistical models, this article explores the advantages and disadvantages of the latter, relatively novel approach, in contrast to the former, which arises from more traditional studies.
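For contrast with the MFCC approach, here is a minimal sketch of the traditional formant-based estimate, assuming the classical uniform lossless-tube model of the vocal tract (closed at the glottis, open at the lips), in which the n-th formant of a tube of length L is F_n = (2n - 1)c / (4L). The formant values in the example are illustrative, not data from the article.

    # Formant-based VTL estimation under the uniform-tube assumption.
    C = 35000.0  # approximate speed of sound in warm, moist air, cm/s

    def vtl_from_formants(formants_hz):
        # Invert F_n = (2n - 1) * c / (4 * L) for each formant, then average.
        lengths = [(2 * n - 1) * C / (4.0 * f)
                   for n, f in enumerate(formants_hz, start=1)]
        return sum(lengths) / len(lengths)

    # Illustrative neutral-vowel formants (Hz) for an adult male speaker.
    print(vtl_from_formants([500.0, 1500.0, 2500.0]))  # ~17.5 cm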
