Introduction: Language impairments often result from severe neurological disorders, motivating the development of neural prostheses that use electrophysiological signals to restore comprehensible language. Previous decoding efforts have focused primarily on signals from the cerebral cortex, overlooking the potential contributions of subcortical brain structures to speech decoding in brain-computer interfaces.

Methods: In this study, stereotactic electroencephalography (sEEG) was employed to investigate the role of subcortical structures in speech decoding. Two native Mandarin Chinese speakers undergoing sEEG implantation for epilepsy treatment participated. Participants read Chinese text, and power in the 1-30, 30-70, and 70-150 Hz frequency bands of the sEEG signals was extracted as the key feature set. A deep learning model based on long short-term memory (LSTM) networks assessed the contribution of different brain structures to speech decoding, predicting consonant articulatory place, articulatory manner, and tone within a single syllable.
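The abstract specifies only the band definitions and the LSTM backbone, but a minimal sketch of this kind of pipeline, in Python, might look as follows; the sampling rate, window length, hidden size, and class counts here are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of a band-power + LSTM decoding pipeline of the kind
# described above. Band edges follow the abstract; everything else
# (sampling rate, window length, hidden size, class counts) is an
# illustrative guess, not the authors' implementation.
import numpy as np
from scipy.signal import butter, filtfilt
import torch
import torch.nn as nn

FS = 1000                                    # assumed sEEG sampling rate (Hz)
BANDS = [(1, 30), (30, 70), (70, 150)]       # frequency bands from the abstract

def band_power_features(x, fs=FS, win=100):
    """x: (channels, samples) -> (windows, channels * len(BANDS)) features."""
    feats = []
    for lo, hi in BANDS:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        power = filtfilt(b, a, x, axis=1) ** 2
        n_win = power.shape[1] // win        # average power per window
        p = power[:, :n_win * win].reshape(x.shape[0], n_win, win).mean(axis=2)
        feats.append(p.T)                    # (windows, channels)
    return np.concatenate(feats, axis=1)

class SyllableDecoder(nn.Module):
    """LSTM over band-power sequences, one classification head per target."""
    def __init__(self, n_feat, n_place=7, n_manner=5, n_tone=4, hidden=128):
        super().__init__()                   # class counts are illustrative
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.heads = nn.ModuleDict({
            "place": nn.Linear(hidden, n_place),
            "manner": nn.Linear(hidden, n_manner),
            "tone": nn.Linear(hidden, n_tone),
        })

    def forward(self, x):                    # x: (batch, windows, n_feat)
        _, (h, _) = self.lstm(x)             # final hidden state of last layer
        return {k: head(h[-1]) for k, head in self.heads.items()}
```

The contribution of individual brain structures can then be assessed by restricting the input channels to one structure at a time and comparing held-out accuracy.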

Results: Cortical signals excelled at predicting articulatory place (86.5% accuracy), while cortical and subcortical signals performed comparably for articulatory manner (51.5% vs. 51.7% accuracy). Subcortical signals provided superior tone prediction (58.3% accuracy). The superior temporal gyrus was consistently relevant to decoding both consonants and tone. Combining cortical and subcortical inputs produced the highest prediction accuracy, especially for tone.

Discussion: This study underscores the essential roles of both cortical and subcortical structures in different aspects of speech decoding.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937352
DOI: http://dx.doi.org/10.3389/fnins.2024.1345308

Publication Analysis

Top Keywords

speech decoding (24), cortical subcortical (16), electrophysiological signals (8), articulatory place (8), subcortical signals (8), subcortical (7), signals (7), speech (6), decoding (6), cortical (5)

Similar Publications

Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT) non-invasively measure blood oxygen fluctuations related to brain activity at the brain surface, like fMRI, using lighter-weight equipment that circumvents the ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~13 mm) than sparse fNIRS (~30 mm) and therefore provide higher image quality, with spatial resolution roughly half that of fMRI, when using the several source-detector distances (13-40 mm) afforded by the HD-DOT grid.

An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation.

Sensors (Basel)

January 2025

SHCCIG Yubei Coal Industry Co., Ltd., Xi'an 710900, China.

The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as "Shapu", characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture.
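For orientation, the sketch below shows one Conformer block, the architecture this model reportedly builds on: two half-step feed-forward modules sandwiching self-attention and a convolution module, each wrapped in a residual connection. The dimensions, kernel size, and use of torch.nn.MultiheadAttention are illustrative choices, not the paper's configuration.

```python
# Hedged sketch of a single Conformer block; all sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConformerBlock(nn.Module):
    """FFN (half-step) -> self-attention -> convolution -> FFN (half-step),
    each module in a residual connection, with a final LayerNorm."""
    def __init__(self, d=256, heads=4, kernel=31, ff_mult=4):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, ff_mult * d),
                                 nn.SiLU(), nn.Linear(ff_mult * d, d))
        self.ff1, self.ff2 = ffn(), ffn()
        self.att_norm = nn.LayerNorm(d)
        self.att = nn.MultiheadAttention(d, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(d)
        self.pointwise1 = nn.Conv1d(d, 2 * d, 1)      # expands for the GLU gate
        self.depthwise = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.bn = nn.BatchNorm1d(d)
        self.pointwise2 = nn.Conv1d(d, d, 1)
        self.out_norm = nn.LayerNorm(d)

    def forward(self, x):                             # x: (batch, time, d)
        x = x + 0.5 * self.ff1(x)
        n = self.att_norm(x)
        a, _ = self.att(n, n, n, need_weights=False)
        x = x + a
        c = self.conv_norm(x).transpose(1, 2)         # (batch, d, time) for Conv1d
        c = F.glu(self.pointwise1(c), dim=1)
        c = self.pointwise2(F.silu(self.bn(self.depthwise(c))))
        x = x + c.transpose(1, 2)
        x = x + 0.5 * self.ff2(x)
        return self.out_norm(x)
```

The convolution module is what distinguishes the Conformer from a plain Transformer encoder, letting each block capture local acoustic patterns alongside global context.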

Significance: Decoding naturalistic content from brain activity has important neuroscience and clinical implications. Information about visual scenes and intelligible speech has been decoded from cortical activity using functional magnetic resonance imaging (fMRI) and electrocorticography, but widespread applications are limited by the logistics of these technologies.

Aim: High-density diffuse optical tomography (HD-DOT) offers image quality approaching that of fMRI but with the silent, open scanning environment afforded by optical methods, thus opening the door to more naturalistic research and applications.

Cognitive component of auditory attention to natural speech events.

Front Hum Neurosci

January 2025

Center for Ear-EEG, Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark.

Recent progress in auditory attention decoding (AAD) rests on algorithms that relate the audio envelope to the neurophysiological response. The most popular approach reconstructs the audio envelope from electroencephalogram (EEG) signals. These methods primarily rely on the exogenous response driven by the physical characteristics of the stimuli.
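A common concrete form of this envelope-reconstruction approach is a time-lagged linear backward model fit with ridge regression; the sketch below assumes that formulation, with an illustrative lag range and regularization strength, and assumes the EEG and envelope are resampled to a common rate.

```python
# Hedged sketch of a backward (stimulus-reconstruction) model: ridge
# regression from time-lagged EEG to the audio envelope. The lag range
# and regularization strength are illustrative, not taken from the paper.
import numpy as np

def lagged(eeg, lags):
    """eeg: (samples, channels) -> (samples, channels * len(lags)) design matrix."""
    T, C = eeg.shape
    X = np.zeros((T, C * len(lags)))
    for i, lag in enumerate(lags):          # lag >= 0: EEG samples after the stimulus
        shifted = np.roll(eeg, -lag, axis=0)
        if lag > 0:
            shifted[-lag:] = 0              # zero the wrapped-around tail
        X[:, i * C:(i + 1) * C] = shifted
    return X

def fit_backward_model(eeg, envelope, lags=range(26), lam=1e3):
    """Solve the regularized normal equations for the decoder weights."""
    X = lagged(eeg, list(lags))
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, w, lags=range(26)):
    return lagged(eeg, list(lags)) @ w
```

In an AAD setting, the reconstruction is then correlated with each candidate talker's envelope, and the talker with the higher correlation is taken to be the attended one.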

Tibetan-Chinese speech-to-speech translation based on discrete units.

Sci Rep

January 2025

Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.

Speech-to-speech translation (S2ST) has evolved from cascade systems that integrate automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) into end-to-end models, driven by advances in model performance and the expansion of cross-lingual speech datasets. Despite the paucity of research on Tibetan speech translation, this paper tackles direct Tibetan-to-Chinese speech-to-speech translation within a multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.
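In unit-based S2ST pipelines, "discrete units" are typically obtained by quantizing frame-level features from a self-supervised speech model with k-means, turning a waveform into a token sequence that a sequence-to-sequence model can predict. The sketch below assumes that standard recipe; the snippet does not specify the paper's SSL model or clustering setup.

```python
# Hypothetical unit-extraction step: k-means over SSL speech features.
import numpy as np
from sklearn.cluster import KMeans

def train_unit_quantizer(feature_batches, n_units=1000):
    """feature_batches: list of (frames, dim) arrays from an SSL speech model."""
    km = KMeans(n_clusters=n_units, n_init=4, random_state=0)
    km.fit(np.concatenate(feature_batches, axis=0))
    return km

def features_to_units(features, km):
    """Quantize frame features to unit IDs and collapse consecutive repeats,
    a common preprocessing step before sequence-to-sequence training."""
    units = km.predict(features)
    collapsed = [int(units[0])]
    for u in units[1:]:
        if u != collapsed[-1]:
            collapsed.append(int(u))
    return collapsed
```

A unit-based vocoder is then typically trained separately to synthesize the target-language waveform from the predicted unit sequence.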
