Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer.

Xiaofeng Liu, Fangxu Xing, Jerry L Prince, Maureen Stone, Georges El Fakhri, Jonghye Woo

Proc SPIE Int Soc Opt Eng

Gordon Center for Medical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114 USA.

Published: February 2023

Investigating the relationship between intelligible speech and the internal tissue point motion of the tongue and oropharyngeal muscle deformation, as measured from tagged MRI, can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. In addition, we incorporate a generative adversarial training approach into our framework to improve the synthesis quality of the generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
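To make the "2D spectrogram as a surrogate of the audio data" idea concrete, the following is a minimal sketch of how a one-dimensional waveform can be turned into a 2D time-frequency array. It is an illustration only, not the authors' preprocessing: the function names (`hann`, `magnitude_spectrogram`), the frame length, the hop size, and the toy 1 kHz tone are all assumptions chosen for readability, and a naive DFT is used in place of an FFT so that the example needs only the Python standard library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (hypothetical helper)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 DFT magnitudes.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute exp(-2*pi*j*m/N); the DFT kernel for (k, n) is twiddle[(k*n) % N].
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window one overlapping frame of the waveform.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT over the non-negative frequency bins, keeping magnitudes only.
        frames.append([
            abs(sum(chunk[n] * twiddle[(k * n) % frame_len] for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return frames


# Toy input (assumed values): a 1 kHz sine sampled at 8 kHz, 800 samples long.
SAMPLE_RATE = 8000
TONE_HZ = 1000
wave = [math.sin(2.0 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(800)]

# spec[t][k] is the magnitude at time frame t and frequency bin k.
spec = magnitude_spectrogram(wave)
```

With these assumed settings each frequency bin spans 8000 / 64 = 125 Hz, so the energy of the toy tone concentrates around bin 8 in every frame. A translation network of the kind described above would predict an array with this time-by-frequency layout from the motion fields, and a separate inversion step would be needed to recover a waveform from it.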

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669779
DOI: http://dx.doi.org/10.1117/12.2653345
