We present a novel approach to synthesizing accurate visible speech by searching for and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique that automatically maps the facial motions observed on the source face to the target face. To model the long-distance coarticulation effects in visible speech, a large-scale corpus covering the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm that locates the optimal sequence of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
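
The abstract does not spell out the search algorithm, so the following is only a minimal sketch of how a search over variable-length units could look, not the authors' method. It assumes a dynamic-programming (Viterbi-style) search in which the only cost is a join cost, taken here as the Euclidean distance between the lip pose at the end of one unit and the start of the next. The `Unit` class, its `start` and `end` feature tuples, the `max_len` limit, and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Unit:
    phones: tuple  # phoneme labels covered by this corpus segment
    start: tuple   # hypothetical lip-pose features at the first frame
    end: tuple     # hypothetical lip-pose features at the last frame


def select_units(target, corpus, max_len=3):
    """Return (cost, units) for the cheapest concatenation spelling `target`.

    `corpus` maps a tuple of phoneme labels to the list of recorded units
    with exactly those labels. The cost is the sum of join costs only.
    """
    n = len(target)
    # best[j] maps each unit ending at position j to (cost, previous unit).
    best = [dict() for _ in range(n + 1)]
    for i in range(n):
        if i > 0 and not best[i]:
            continue  # no concatenation reaches position i
        for length in range(1, min(max_len, n - i) + 1):
            key = tuple(target[i:i + length])
            for unit in corpus.get(key, []):
                if i == 0:
                    cand = (0.0, None)  # first unit: nothing to join to
                else:
                    # Cheapest predecessor, counting the join into `unit`.
                    cand = min(
                        ((c + dist(prev.end, unit.start), prev)
                         for prev, (c, _) in best[i].items()),
                        key=lambda t: t[0],
                    )
                j = i + length
                if unit not in best[j] or cand[0] < best[j][unit][0]:
                    best[j][unit] = cand
    if not best[n]:
        raise ValueError("corpus cannot cover the target sequence")
    # Backtrack from the cheapest unit that ends at the last position.
    unit = min(best[n], key=lambda u: best[n][u][0])
    total = best[n][unit][0]
    path, pos = [], n
    while unit is not None:
        path.append(unit)
        prev = best[pos][unit][1]
        pos -= len(unit.phones)
        unit = prev
    return total, path[::-1]


# Toy corpus with one-dimensional "lip pose" features (illustrative only).
toy_corpus = {
    ("h", "e"): [Unit(("h", "e"), (0.0,), (1.0,))],
    ("l", "o"): [Unit(("l", "o"), (1.0,), (0.5,)),
                 Unit(("l", "o"), (3.0,), (0.5,))],
    ("h",): [Unit(("h",), (0.0,), (4.0,))],
    ("e",): [Unit(("e",), (5.0,), (1.0,))],
}
cost, units = select_units(["h", "e", "l", "o"], toy_corpus)
print(cost, [u.phones for u in units])
```

Because every join adds a non-negative cost, longer units tend to win in this sketch since they need fewer joins. A fuller system would likely also score how well each unit's context matches the target, which is omitted here.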

Source
http://dx.doi.org/10.1109/TVCG.2006.18

Similar Publications

Objectives: The acoustic change complex (ACC) is a cortical auditory evoked potential that shows promise as an objective test of the neural capacity for speech and sound discrimination, particularly for difficult-to-test populations, for example, cognitively impaired adults. There is uncertainty, however, about how ACC responses correspond to behavioral measures. The objective of this study was to systematically review the literature, focusing on adult studies, to investigate the relationship between ACC responses and behavioral psychophysical measures.

Background: Reduction of spontaneous nystagmus by fixation, a characteristic feature of peripheral nystagmus, is important for differentiating between peripheral and central vestibular disorders. In the emergency room, Frenzel goggles are recommended to observe spontaneous nystagmus for the differential diagnosis of acute vestibular syndrome. We developed a portable loupe with a Fresnel lens to observe nystagmus.

The evolving trajectory of conjunction use in the ELT research articles.

Front Res Metr Anal

November 2024

English Language and Literature Department, Faculty of Language and Literature, Yazd University, Yazd, Iran.

The present study was a diachronic investigation of conjunction as a grammatical cohesive device in ELT research articles. A total of 100 research articles that concentrated on teaching writing skills in the EFL context and were published at the two extremes of the period, 1980-82 and 2020-22, were selected. Care was taken to choose papers written by expert English writers.

Article Synopsis
  • A 73-year-old patient with Parkinsonian-type multiple system atrophy (MSA-P) experienced motor symptoms, including gait disturbances and swallowing difficulties, prompting medical evaluation.
  • Brain imaging and diagnostic tests confirmed MSA-P, but traditional Parkinson’s treatments showed minimal effectiveness as the patient's condition worsened.
  • After undergoing Lee Silverman Voice Treatment (LSVT-LOUD) at age 75, the patient showed significant improvements in speech volume, swallowing abilities, and cognitive function, indicating potential benefits of LSVT-LOUD for patients with MSA-P.

Prosodic Preferences of Surface Electromyography-based Subvocal Speech for People With Laryngectomy.

J Voice

December 2024

Delsys, Inc., Natick, Massachusetts 01760; Altec, Inc., Natick, Massachusetts 01760; Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215.

Introduction: People who undergo a total laryngectomy lose their natural voice and depend on alaryngeal technologies for communication. However, these technologies are often difficult to use and lack prosody. Surface electromyographic-based silent speech interfaces are novel communication systems that overcome many of the shortcomings of traditional alaryngeal speech and have the potential to seamlessly incorporate individualized prosody.
