emg2vec: Self-supervised Pretraining in Electromyography-based Silent Speech Interfaces.

Qinhan Hou, Stefano van Gogh, Kevin Scheck, Zhao Ren, Tanja Schultz, Michael Wand, Jurgen Schmidhuber

Annu Int Conf IEEE Eng Med Biol Soc

Published: July 2024

Silent speech interfaces (SSIs) enable the generation of audible speech or readable text without vocalization. Electromyography (EMG) is one possible source signal for SSIs and is particularly well suited to individuals with vocal organ injuries. In this work, we propose emg2vec, a self-pretraining framework for EMG-based SSIs that covers both EMG-to-speech and EMG-to-text conversion. Our experiments show that, compared to training the models from scratch, self-pretraining reduces the downstream speech recognition word error rate (WER) by a relative 7.32% when the entire labeled dataset is used and by a relative 5.18% when only 20% of the labeled data is used for supervised training. Speech synthesis also improves, but only by 2.91% when 20% of the training data is used.
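The relative WER figures quoted above can be read with the help of a small sketch. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and the usual definition of relative improvement, (baseline - improved) / baseline. The function names and the example WER values are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction in percent (positive means the WER went down)."""
    return (baseline_wer - improved_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion in 4 words.
    print(word_error_rate("the cat sat down", "the cat sit"))

    # Hypothetical WERs, NOT the paper's values: 0.50 from scratch, 0.45 pretrained.
    print(round(relative_improvement(0.50, 0.45), 2))
```

A relative improvement is measured against the baseline WER, so the same percentage corresponds to a smaller absolute gain when the baseline WER is already low.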

DOI: http://dx.doi.org/10.1109/EMBC53108.2024.10781736


Similar Publications

The Effect of Rhinoplasty on the Acoustic Characteristics of Resonance and Sound Production.

Indian J Otolaryngol Head Neck Surg

January 2025

Sinus and Surgical Endoscopic Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.

Mehdi Bakhshaee, Amir Bahador Sadri, Davood Sobhani, Negar Morovatdar, Bashir Rasoulian

Rhinoplasty is the most common cosmetic surgery procedure in Iran. One of the complications of this procedure that has been less considered is the probable effect of rhinoplasty on voice. This study aimed to assess the influence of rhinoplasty on acoustic characteristics of resonance and sound production.


Pausing patterns in English school-age children with a history of late talking: Frequent pauses and prolonged response delays.

J Commun Disord

March 2025

Institute of Language Sciences, Shanghai International Studies University, China; Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, China; National Research Centre for Language and Well-Being, Shanghai, China.

Yanting Sun, Hongwei Ding

Introduction: This study explored silent pause patterns, their interaction with filled pauses, and response delays in five-year-old children who were previously identified as late talkers in their conversations with adults.

Methods: We analyzed 73 child-adult conversations (36 with a late-talking history, 37 typically developing) from the CHILDES Clinical English Ellis Weismer Corpus at age five across three temporal stages. Using Praat, we identified and classified silent pauses (> 250 ms) by duration and position and annotated them across three tiers: silent pause categories, pauses near filled pauses, and response delays.


Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion.

Annu Int Conf IEEE Eng Med Biol Soc

July 2024

Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand

Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness.


Impact of Transcutaneous Electrical Stimulation on Oral Moisture in Older Adults with and without Xerostomia: A Pilot Study.

Folia Phoniatr Logop

February 2025

Swallowing Physiology and Rehabilitation Research Laboratory, Speech Pathology and Audiology Program, Kent State University, Kent, Ohio, USA.

Ali Barikroo, Lauren Falter

Introduction: Xerostomia, or dry mouth, is a prevalent and distressing oral health condition in older adults that is associated with reduced swallow frequency, thereby increasing the risk of dysphagia and aspiration pneumonia in this cohort. This pseudo-experimental study investigated the association between transcutaneous electrical stimulation (TES) and changes in perceived oral moisture, as well as the function of major and minor salivary glands in two groups of older adults, including those with and without xerostomia.

Methods: Ten older adults with self-reported xerostomia and 7 control participants were exposed to two conditions: no TES and motor TES.
