Hearing impairment is often characterized by poor speech-in-noise recognition. State-of-the-art laboratory-based noise-reduction technology can eliminate background sounds from a corrupted speech signal and improve intelligibility, but it can also hinder environmental sound recognition (ESR), which is essential for personal independence and safety. This paper presents a time-frequency mask, the ideal compressed mask (ICM), that aims to provide listeners with improved speech intelligibility without substantially reducing ESR.
View Article and Find Full Text PDFObjectives: This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.
Design: In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial.
Recent years have brought considerable advances to our ability to increase intelligibility through deep-learning-based noise reduction, especially for hearing-impaired (HI) listeners. In this study, intelligibility improvements resulting from a current algorithm are assessed. These benefits are compared to those resulting from the initial demonstration of deep-learning-based noise reduction for HI listeners ten years ago in Healy, Yoho, Wang, and Wang [(2013).
View Article and Find Full Text PDFPurpose: Background noise reduces speech intelligibility. Time-frequency (T-F) masking is an established signal processing technique that improves intelligibility of neurotypical speech in background noise. Here, we investigated a novel application of T-F masking, assessing its potential to improve intelligibility of neurologically degraded speech in background noise.
View Article and Find Full Text PDFPurpose: A dual-task paradigm was implemented to investigate how noise type and sentence context may interact with age and hearing loss to impact word recall during speech recognition.
Method: Three noise types with varying degrees of temporal/spectrotemporal modulation were used: speech-shaped noise, speech-modulated noise, and three-talker babble. Participant groups included younger listeners with normal hearing (NH), older listeners with near-normal hearing, and older listeners with sensorineural hearing loss.
The fundamental requirement for real-time operation of a speech-processing algorithm is causality-that it operate without utilizing future time frames. In the present study, the performance of a fully causal deep computational auditory scene analysis algorithm was assessed. Target sentences were isolated from complex interference consisting of an interfering talker and concurrent room reverberation.
View Article and Find Full Text PDFThe practical efficacy of deep learning based speaker separation and/or dereverberation hinges on its ability to generalize to conditions not employed during neural network training. The current study was designed to assess the ability to generalize across extremely different training versus test environments. Training and testing were performed using different languages having no known common ancestry and correspondingly large linguistic differences-English for training and Mandarin for testing.
View Article and Find Full Text PDFReal-time operation is critical for noise reduction in hearing technology. The essential requirement of real-time operation is causality-that an algorithm does not use future time-frame information and, instead, completes its operation by the end of the current time frame. This requirement is extended currently through the concept of "effectively causal," in which future time-frame information within the brief delay tolerance of the human speech-perception mechanism is used.
View Article and Find Full Text PDFAdverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated if the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern was varied.
View Article and Find Full Text PDFDeep learning based speech separation or noise reduction needs to generalize to voices not encountered during training and to operate under multiple corruptions. The current study provides such a demonstration for hearing-impaired (HI) listeners. Sentence intelligibility was assessed under conditions of a single interfering talker and substantial amounts of room reverberation.
View Article and Find Full Text PDFHearing-impaired listeners' intolerance to background noise during speech perception is well known. The current study employed speech materials free of ceiling effects to reveal the optimal trade-off between rejecting noise and retaining speech during time-frequency masking. This relative criterion value (-7 dB) was found to hold across noise types that differ in acoustic spectro-temporal complexity.
View Article and Find Full Text PDFFor deep learning based speech segregation to have translational significance as a noise-reduction tool, it must perform in a wide variety of acoustic environments. In the current study, performance was examined when target speech was subjected to interference from a single talker and room reverberation. Conditions were compared in which an algorithm was trained to remove both reverberation and interfering speech, or only interfering speech.
View Article and Find Full Text PDFJ Speech Lang Hear Res
November 2018
Purpose: The goal of this study was to examine the role of carrier cues in sound source segregation and the possibility to enhance the intelligibility of 2 sentences presented simultaneously. Dual-carrier (DC) processing (Apoux, Youngdahl, Yoho, & Healy, 2015) was used to introduce synthetic carrier cues in vocoded speech.
Method: Listeners with normal hearing heard sentences processed either with a DC or with a traditional single-carrier (SC) vocoder.
J Acoust Soc Am
September 2018
Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques.
View Article and Find Full Text PDFRecently, deep learning based speech segregation has been shown to improve human speech intelligibility in noisy environments. However, one important factor not yet considered is room reverberation, which characterizes typical daily environments. The combination of reverberation and background noise can severely degrade speech intelligibility for hearing-impaired (HI) listeners.
View Article and Find Full Text PDFSpeech recognition in fluctuating maskers is influenced by the spectro-temporal properties of the noise. Three experiments examined different temporal and spectro-temporal noise properties. Experiment 1 replicated previous work by highlighting maximum performance at a temporal gating rate of 4-8 Hz.
View Article and Find Full Text PDFThe degrading influence of noise on various critical bands of speech was assessed. A modified version of the compound method [Apoux and Healy (2012) J. Acoust.
View Article and Find Full Text PDFBand-importance functions created using the compound method [Apoux and Healy (2012). J. Acoust.
View Article and Find Full Text PDFJ Speech Lang Hear Res
February 2018
Purpose: Psychoacoustic data indicate that infants and children are less likely than adults to focus on a spectral region containing an anticipated signal and are more susceptible to remote masking of a signal. These detection tasks suggest that infants and children, unlike adults, do not listen selectively. However, less is known about children's ability to listen selectively during speech recognition.
View Article and Find Full Text PDFPurpose: Although there is evidence that emotional valence of stimuli impacts lexical processes, there is limited work investigating its specific impact on lexical retrieval. The current study aimed to determine the degree to which emotional valence of pictured stimuli impacts naming latencies in healthy younger and older adults.
Method: Eighteen healthy younger adults and 18 healthy older adults named positive, negative, and neutral images, and reaction time was measured.
Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. This listening situation represents a very different problem for both the human and machine listener, when compared to perceiving speech in other types of background noise.
View Article and Find Full Text PDFAnnu Int Conf IEEE Eng Med Biol Soc
August 2016
A primary complaint of hearing-impaired individuals involves poor speech understanding when background noise is present. Hearing aids and cochlear implants often allow good speech understanding in quiet backgrounds. But hearing-impaired individuals are highly noise intolerant, and existing devices are not very effective at combating background noise.
View Article and Find Full Text PDFJ Acoust Soc Am
October 2016
Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility.
View Article and Find Full Text PDFSupervised speech segregation has been recently shown to improve human speech intelligibility in noise, when trained and tested on similar noises. However, a major challenge involves the ability to generalize to entirely novel noises. Such generalization would enable hearing aid and cochlear implant users to improve speech intelligibility in unknown noisy environments.
View Article and Find Full Text PDFMachine learning algorithms to segregate speech from background noise hold considerable promise for alleviating limitations associated with hearing impairment. One of the most important considerations for implementing these algorithms into devices such as hearing aids and cochlear implants involves their ability to generalize to conditions not employed during the training stage. A major challenge involves the generalization to novel noise segments.
View Article and Find Full Text PDF