Decoding speech from brain activity can enable the development of brain-computer interfaces (BCIs) that restore naturalistic communication in paralyzed patients. Previous work has focused on developing decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial neural network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, in which speech was reconstructed from a compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed the sound spectrogram directly. We show that the novel approach provides more accurate reconstructions than the baselines. These results underscore the potential of GAN models for speech decoding in naturalistic noisy environments and for further advancing BCIs for naturalistic communication.

Clinical Relevance - This study presents a novel speech decoding paradigm that combines advances in deep learning, speech synthesis and neural engineering, and has the potential to advance the field of BCI for severely paralyzed individuals.
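The two reconstruction routes contrasted in the abstract can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the array sizes, the ridge-regression decoder, and the fixed random `generator` function (a stand-in for a trained GAN generator) are hypothetical and do not reproduce the study's models, data, or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes chosen for illustration only (not taken from the study).
n_samples, n_brain, n_latent, n_freq = 500, 64, 8, 40

# Hypothetical stand-in for a trained GAN generator:
# a fixed nonlinear map from a latent code to one spectrogram frame.
G = rng.normal(size=(n_latent, n_freq))

def generator(z):
    return np.tanh(z @ G)

# Synthetic latent codes, spectrogram frames, and noisy "brain" features.
z_true = rng.normal(size=(n_samples, n_latent))
spec = generator(z_true)
mixing = rng.normal(size=(n_latent, n_brain))
brain = z_true @ mixing + 0.5 * rng.normal(size=(n_samples, n_brain))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression weights mapping X to Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def mean_corr(a, b):
    """Pearson correlation per frequency bin, averaged over bins."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return float(r.mean())

train, test = slice(0, 400), slice(400, None)

# Route 1 (direct baseline): brain features -> spectrogram.
W_direct = ridge_fit(brain[train], spec[train])
spec_direct = brain[test] @ W_direct

# Route 2 (latent route): brain features -> latent code -> generator -> spectrogram.
W_latent = ridge_fit(brain[train], z_true[train])
spec_latent = generator(brain[test] @ W_latent)

print("direct baseline:", mean_corr(spec[test], spec_direct))
print("latent route:   ", mean_corr(spec[test], spec_latent))
```

Because the synthetic spectrogram is produced by the same generator used in the latent route, the printed scores say nothing about real recordings; the sketch only shows where the two pipelines differ structurally.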

Source
http://dx.doi.org/10.1109/EMBC48229.2022.9871301

Publication Analysis

Top Keywords

speech decoding: 20
naturalistic speech: 8
brain data: 8
speech: 8
naturalistic communication: 8
novel approach: 8
decoding: 6
naturalistic: 5
decoding intracranial: 4
brain: 4

Similar Publications

Listeners with hearing loss have trouble following a conversation in multitalker environments. While modern hearing aids can generally amplify speech, these devices cannot tune into a target speaker without first knowing which speaker the user aims to attend to. Brain-controlled hearing aids have been proposed using auditory attention decoding (AAD) methods, but current methods use the same model to compare the speech stimulus and neural response, regardless of the dynamic overlap between talkers, which is known to influence neural encoding.

Beyond Averaging: A Transformer Approach to Decoding Event Related Brain Potentials.

Neuroimage

January 2025

Department of Computer Science, University of Innsbruck, Technikerstrasse 21a, Innsbruck, 6020, Austria.

The objective of this study is to assess the potential of a transformer-based deep learning approach applied to event-related brain potentials (ERPs) derived from electroencephalographic (EEG) data. Traditional methods involve averaging the EEG signal of multiple trials to extract valuable neural signals from the high noise content of EEG data. However, this averaging technique may conceal relevant information.

Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT) non-invasively measure blood oxygen fluctuations related to brain activity, like fMRI, at the brain surface, using more lightweight equipment that circumvents ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~13 mm) than sparse fNIRS (~30 mm) and therefore provide higher image quality, with spatial resolution ~1/2 that of fMRI, when using the several source-detector distances (13-40 mm) afforded by the HD-DOT grid.

An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation.

Sensors (Basel)

January 2025

SHCCIG Yubei Coal Industry Co., Ltd., Xi'an 710900, China.

The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as "Shapu", characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture.

Significance: Decoding naturalistic content from brain activity has important neuroscience and clinical implications. Information about visual scenes and intelligible speech has been decoded from cortical activity using functional magnetic resonance imaging (fMRI) and electrocorticography, but widespread applications are limited by the logistics of these technologies.

Aim: High-density diffuse optical tomography (HD-DOT) offers image quality approaching that of fMRI but with the silent, open scanning environment afforded by optical methods, thus opening the door to more naturalistic research and applications.
