Advancing knowledge of the neural mechanisms of speech is critical for developing next-generation, faster brain-computer interfaces (BCIs) that assist speech communication in patients with severe neurological conditions (e.g., locked-in syndrome). Among current neuroimaging techniques, magnetoencephalography (MEG) provides a direct measure of the large-scale neural dynamics underlying cognitive processes, owing to its favorable spatiotemporal resolution. However, MEG-measured neural signals are small in magnitude relative to the background noise, so MEG usually suffers from a low signal-to-noise ratio (SNR) at the single-trial level. To overcome this limitation, it is common to record many trials of the same event or task and analyze the time-locked average signal, which can be very time-consuming. In this study, we investigated the number of MEG recording trials required for speech decoding with a machine learning algorithm. We used a wavelet filter to generate denoised neural features, which were then used to train an artificial neural network (ANN) for speech decoding. We found that wavelet-based denoising increased the SNR of the neural signal prior to analysis and enabled accurate speech decoding using as few as 40 single trials. This approach may also make it possible to reduce the number of MEG trials needed in other task-evoked studies.
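To illustrate why single-trial SNR is the bottleneck and how wavelet denoising can help, the sketch below compares a noisy synthetic trial, the same trial after wavelet-threshold denoising, and a conventional time-locked average. The abstract does not specify the wavelet family or thresholding rule, so this is a minimal sketch that assumes a Haar wavelet with soft thresholding at the universal threshold (a standard Donoho-Johnstone scheme). The evoked waveform, sampling rate, noise level, and 100-trial average are hypothetical, and the ANN decoding stage is not shown.

```python
import numpy as np

# Assumption: Haar wavelet + universal soft threshold. The original study's
# wavelet filter and parameters are not specified; all data here are synthetic.

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    approx = np.asarray(x, dtype=float)
    details = []  # finest scale first
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest scale up."""
    x = approx
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (x + d) / np.sqrt(2.0)
        out[1::2] = (x - d) / np.sqrt(2.0)
        x = out
    return x

def wavelet_denoise(x, levels=4):
    """Soft-threshold all detail coefficients; keep the coarse approximation."""
    approx, details = haar_dwt(x, levels)
    sigma = np.median(np.abs(details[0])) / 0.6745  # robust noise estimate (MAD)
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))     # universal threshold
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    return haar_idwt(approx, shrunk)

def snr_db(clean, estimate):
    """SNR of an estimate relative to the known clean signal, in dB."""
    return 10 * np.log10(np.sum(clean**2) / np.sum((estimate - clean) ** 2))

rng = np.random.default_rng(0)
fs = 256                                            # hypothetical sampling rate (Hz)
t = np.arange(512) / fs                             # 2 s epoch, 512 = 2**9 samples
evoked = np.exp(-((t - 0.8) ** 2) / (2 * 0.05**2))  # hypothetical evoked response
noise_sd = 0.5
n_avg = 100                                         # hypothetical number of averaged trials

single = evoked + rng.normal(scale=noise_sd, size=t.size)
denoised = wavelet_denoise(single)
averaged = (evoked + rng.normal(scale=noise_sd, size=(n_avg, t.size))).mean(axis=0)

snr_single = snr_db(evoked, single)
snr_denoised = snr_db(evoked, denoised)
snr_averaged = snr_db(evoked, averaged)
print(f"single trial:           {snr_single:6.1f} dB")
print(f"wavelet-denoised trial: {snr_denoised:6.1f} dB")
print(f"{n_avg}-trial average:      {snr_averaged:6.1f} dB")
```

Averaging N trials with independent noise reduces the noise variance by a factor of N (roughly 10·log10 N dB of SNR gain), which is why conventional designs need many trials; denoising instead tries to recover SNR from each trial on its own, so fewer trials may suffice for training a decoder.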
| Source | Link |
|---|---|
| PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6876632 |
| DOI | http://dx.doi.org/10.1007/978-3-030-05587-5_16 |
Sci Rep
January 2025
Mallinckrodt Institute of Radiology, Washington University School of Medicine, 4515 McKinley Ave., St. Louis, MO, 63110, USA.
Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT) non-invasively measure blood oxygenation fluctuations related to brain activity at the brain surface, as fMRI does, but with lighter equipment that circumvents the ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~13 mm) than sparse fNIRS (~30 mm) and therefore provide higher image quality, with spatial resolution ~1/2 that of fMRI, when using the multiple source-detector distances (13-40 mm) afforded by the HD-DOT grid.
Sensors (Basel)
January 2025
SHCCIG Yubei Coal Industry Co., Ltd., Xi'an 710900, China.
The coal mining industry in Northern Shaanxi is robust, and the local dialect, known as "Shapu" and characterized by a distinct Northern Shaanxi accent, is widely used. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the Northern Shaanxi dialect based on the Conformer architecture.
Neurophotonics
January 2025
Washington University School of Medicine, Mallinckrodt Institute of Radiology, St. Louis, Missouri, United States.
Significance: Decoding naturalistic content from brain activity has important neuroscience and clinical implications. Information about visual scenes and intelligible speech has been decoded from cortical activity using functional magnetic resonance imaging (fMRI) and electrocorticography, but widespread applications are limited by the logistics of these technologies.
Aim: High-density diffuse optical tomography (HD-DOT) offers image quality approaching that of fMRI but with the silent, open scanning environment afforded by optical methods, thus opening the door to more naturalistic research and applications.
Front Hum Neurosci
January 2025
Center for Ear-EEG, Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark.
The recent progress in auditory attention decoding (AAD) methods is based on algorithms that find a relation between the audio envelope and the neurophysiological response. The most popular approach is based on the reconstruction of the audio envelope from electroencephalogram (EEG) signals. These methods are primarily based on the exogenous response driven by the physical characteristics of the stimuli.
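The envelope-reconstruction (backward model) approach described above can be sketched as a linear decoder: time-lagged EEG channels are regressed onto the attended speech envelope with ridge regression, and attention is assigned to whichever talker's envelope correlates better with the reconstruction. This is a minimal sketch on synthetic data, not the method of the study above; the simulated EEG, the response delay, the lag range (`max_lag`), the ridge parameter (`lam`), and the helper names `lagged`, `fit_ridge`, and `smooth_env` are all hypothetical.

```python
import numpy as np

# Minimal backward-model (stimulus-reconstruction) AAD sketch on synthetic data.
# All signals, lags, and the ridge parameter are hypothetical.

def lagged(eeg, max_lag):
    """Design matrix whose row t holds eeg[t + k] for k = 0..max_lag."""
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for k in range(max_lag + 1):
        X[: n - k, k * c:(k + 1) * c] = eeg[k:]  # neural response lags the stimulus
    return X

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: (X^T X + lam I)^-1 X^T y."""
    gram = X.T @ X
    return np.linalg.solve(gram + lam * np.eye(gram.shape[0]), X.T @ y)

rng = np.random.default_rng(1)
n, channels, max_lag, delay = 4000, 8, 10, 5

def smooth_env(size):
    """Hypothetical speech-like envelope: smoothed rectified noise, z-scored."""
    env = np.convolve(np.abs(rng.normal(size=size)), np.ones(20) / 20, mode="same")
    return (env - env.mean()) / env.std()

att, unatt = smooth_env(n), smooth_env(n)
# Simulated EEG: strong delayed response to the attended talker,
# weak response to the unattended one, plus white noise.
eeg = (np.outer(np.roll(att, delay), rng.normal(size=channels))
       + 0.2 * np.outer(np.roll(unatt, delay), rng.normal(size=channels))
       + rng.normal(size=(n, channels)))

X = lagged(eeg, max_lag)
half = n // 2                                  # train on first half, test on second
mu = X[:half].mean(axis=0)
w = fit_ridge(X[:half] - mu, att[:half] - att[:half].mean(), lam=100.0)
recon = (X[half:] - mu) @ w                    # reconstructed envelope (test half)

r_att = np.corrcoef(recon, att[half:])[0, 1]
r_unatt = np.corrcoef(recon, unatt[half:])[0, 1]
decision = "attended" if r_att > r_unatt else "unattended"
print(f"r(attended)={r_att:.2f}  r(unattended)={r_unatt:.2f}  decoded: {decision}")
```

Because the decoder is trained only on the attended envelope, it relies on the exogenous, stimulus-driven response; real EEG would need longer training data, cross-validated `lam`, and shorter decision windows to be practical.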
Sci Rep
January 2025
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.
Speech-to-speech translation (S2ST) has evolved from cascade systems which integrate Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), to end-to-end models. This evolution has been driven by advancements in model performance and the expansion of cross-lingual speech datasets. Despite the paucity of research on Tibetan speech translation, this paper endeavors to tackle the challenge of Tibetan-to-Chinese direct speech-to-speech translation within the multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.