Introduction: Language impairments often result from severe neurological disorders, driving the development of neural prosthetics that use electrophysiological signals to restore comprehensible language. Previous decoding efforts have focused primarily on signals from the cerebral cortex, neglecting the potential contribution of subcortical brain structures to speech decoding in brain-computer interfaces.
Methods: In this study, stereotactic electroencephalography (sEEG) was employed to investigate the role of subcortical structures in speech decoding. Two native Mandarin Chinese speakers undergoing sEEG implantation for epilepsy treatment participated. Participants read Chinese text while sEEG was recorded, and signal power in the 1-30, 30-70, and 70-150 Hz frequency bands was extracted as the key feature set. A deep learning model based on long short-term memory (LSTM) assessed the contribution of different brain structures to speech decoding, predicting consonant articulatory place, articulatory manner, and tone within single syllables.
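As a concrete illustration of the described pipeline, the sketch below computes the three band powers with a band-pass filter plus Hilbert envelope and feeds them to an LSTM classifier; the sampling rate, channel count, hidden size, and class count are illustrative assumptions rather than the study's actual settings.

```python
# Minimal sketch of the band-power + LSTM decoding pipeline (assumed values).
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import butter, filtfilt, hilbert

FS = 1000                                # assumed sEEG sampling rate (Hz)
BANDS = [(1, 30), (30, 70), (70, 150)]   # the three bands named above

def band_power(x, lo, hi, fs=FS):
    """Band-limited power envelope: band-pass filter, then Hilbert magnitude."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, x, axis=-1), axis=-1)) ** 2

def features(seeg):
    """seeg: (channels, samples) -> stacked band powers (channels * 3, samples)."""
    return np.concatenate([band_power(seeg, lo, hi) for lo, hi in BANDS], axis=0)

class SpeechDecoder(nn.Module):
    """LSTM classifier predicting one label (e.g., tone) per syllable window."""
    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time, features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])          # logits: (batch, n_classes)

# Example: 32 channels -> 96 band-power features; 4 Mandarin tones.
model = SpeechDecoder(n_features=96, n_classes=4)
```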
Results: Cortical signals excelled in articulatory place prediction (86.5% accuracy), while cortical and subcortical signals performed similarly for articulatory manner (51.5% vs. 51.7% accuracy). Subcortical signals provided superior tone prediction (58.3% accuracy). The superior temporal gyrus was consistently relevant in speech decoding for consonants and tone. Combining cortical and subcortical inputs yielded the highest prediction accuracy, especially for tone.
Discussion: This study underscores the essential roles of both cortical and subcortical structures in different aspects of speech decoding.
Full text: PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937352 | DOI: http://dx.doi.org/10.3389/fnins.2024.1345308
Sci Rep
January 2025
Mallinckrodt Institute of Radiology, Washington University School of Medicine, 4515 McKinley Ave., St. Louis, MO, 63110, USA.
Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT), like fMRI, non-invasively measure blood-oxygen fluctuations related to brain activity at the brain surface, using lighter-weight equipment that circumvents the ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~13 mm) than sparse fNIRS (~30 mm) and therefore provide higher image quality, with spatial resolution roughly half that of fMRI, when using the several source-detector distances (13-40 mm) afforded by the HD-DOT grid.
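To make the multi-distance point concrete, the snippet below enumerates the source-detector separations available in a hypothetical regular optode layout with 13 mm nearest-neighbor spacing; the grid size and source/detector assignment are assumptions for illustration only.

```python
# Illustrative only: separations in a hypothetical 4 x 4 optode grid
# with 13 mm nearest-neighbor spacing, alternating sources and detectors.
import itertools
import numpy as np

SPACING = 13.0  # mm
positions = [(i * SPACING, j * SPACING) for i in range(4) for j in range(4)]
sources, detectors = positions[0::2], positions[1::2]

dists = sorted({round(np.hypot(sx - dx, sy - dy), 1)
                for (sx, sy), (dx, dy) in itertools.product(sources, detectors)})
print([d for d in dists if 13.0 <= d <= 40.0])  # several usable separations
```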
Sensors (Basel)
January 2025
SHCCIG Yubei Coal Industry Co., Ltd., Xi'an 710900, China.
The coal mining industry in Northern Shaanxi is robust, and the local dialect, known as "Shapu" and characterized by a distinct Northern Shaanxi accent, is in widespread use. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture.
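A minimal sketch of such an end-to-end recognizer is shown below, pairing torchaudio's Conformer encoder with a CTC output head; the feature dimension, layer counts, and vocabulary size are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: Conformer encoder + CTC head for dialect ASR (assumed sizes).
import torch
import torch.nn as nn
import torchaudio

class DialectASR(nn.Module):
    def __init__(self, n_mels=80, vocab_size=4000):
        super().__init__()
        self.encoder = torchaudio.models.Conformer(
            input_dim=n_mels, num_heads=4, ffn_dim=512,
            num_layers=8, depthwise_conv_kernel_size=31)
        self.ctc_head = nn.Linear(n_mels, vocab_size + 1)  # +1 for CTC blank

    def forward(self, feats, lengths):       # feats: (batch, time, n_mels)
        enc, enc_lengths = self.encoder(feats, lengths)
        return self.ctc_head(enc).log_softmax(-1), enc_lengths

model = DialectASR()
log_probs, out_lens = model(torch.randn(2, 200, 80), torch.tensor([200, 180]))
```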
Neurophotonics
January 2025
Washington University School of Medicine, Mallinckrodt Institute of Radiology, St. Louis, Missouri, United States.
Significance: Decoding naturalistic content from brain activity has important neuroscience and clinical implications. Information about visual scenes and intelligible speech has been decoded from cortical activity using functional magnetic resonance imaging (fMRI) and electrocorticography, but widespread applications are limited by the logistics of these technologies.
Aim: High-density diffuse optical tomography (HD-DOT) offers image quality approaching that of fMRI but with the silent, open scanning environment afforded by optical methods, thus opening the door to more naturalistic research and applications.
Front Hum Neurosci
January 2025
Center for Ear-EEG, Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark.
Recent progress in auditory attention decoding (AAD) rests on algorithms that relate the audio envelope to the neurophysiological response. The most popular approach reconstructs the audio envelope from electroencephalogram (EEG) signals. These methods primarily exploit the exogenous response driven by the physical characteristics of the stimuli.
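A minimal sketch of this envelope-reconstruction approach: a ridge-regression backward model maps time-lagged EEG to the attended envelope, and attention is assigned to the speaker whose envelope best correlates with the reconstruction. The lag count and regularization strength are illustrative assumptions.

```python
# Backward-model AAD sketch: ridge regression from lagged EEG to envelope.
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged EEG copies: (time, channels) -> (time, channels*n_lags)."""
    t, c = eeg.shape
    X = np.zeros((t, c * n_lags))
    for k in range(n_lags):
        X[k:, k * c:(k + 1) * c] = eeg[:t - k]
    return X

def train_decoder(eeg, envelope, n_lags=32, lam=1e3):
    """Closed-form ridge solution W = (X'X + lam*I)^-1 X'y."""
    X = lag_matrix(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, W, n_lags=32):
    """Label the attended stream by correlation with the reconstructed envelope."""
    rec = lag_matrix(eeg, n_lags) @ W
    return "A" if np.corrcoef(rec, env_a)[0, 1] > np.corrcoef(rec, env_b)[0, 1] else "B"
```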
Sci Rep
January 2025
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.
Speech-to-speech translation (S2ST) has evolved from cascade systems, which chain Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), to end-to-end models. This evolution has been driven by advances in model performance and the expansion of cross-lingual speech datasets. Despite the paucity of research on Tibetan speech translation, this paper tackles Tibetan-to-Chinese direct speech-to-speech translation within a multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.
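For orientation, the sketch below shows the usual backbone of such direct systems: a sequence-to-sequence model that maps source-speech features to discrete target-speech units (e.g., clusters of self-supervised representations), which a separate unit vocoder would render as audio. All module sizes and the unit inventory are assumptions; this is not the paper's model.

```python
# Schematic speech-to-unit translation model (assumed dimensions throughout).
import torch
import torch.nn as nn

class SpeechToUnit(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_units=1000):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)          # acoustic features -> model dim
        self.seq2seq = nn.Transformer(d_model=d_model, nhead=4,
                                      num_encoder_layers=6, num_decoder_layers=6,
                                      batch_first=True)
        self.unit_emb = nn.Embedding(n_units, d_model)  # previous target units
        self.head = nn.Linear(d_model, n_units)

    def forward(self, src_feats, tgt_units):
        # src_feats: (batch, time, n_mels); tgt_units: (batch, len) of unit ids
        out = self.seq2seq(self.proj(src_feats), self.unit_emb(tgt_units))
        return self.head(out)                           # logits over unit vocabulary

model = SpeechToUnit()
logits = model(torch.randn(2, 300, 80), torch.randint(0, 1000, (2, 60)))
```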