Objectives: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs.
Design: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susaʃi/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susaʃi/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck while /susaʃi/ was presented monaurally for 450 sweeps. The performance of three statistical indicators, the F-test, Hotelling's T², and phase coherence, was compared against behaviorally determined audibility (estimated SL; SL ≥0 dB = audible) using the area under the receiver operating characteristic (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli), and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value.
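As a point of reference for readers implementing such objective detection, the minimal Python sketch below illustrates how the three statistical indicators can be computed from a matrix of EEG sweeps at a known response frequency. It is not the authors' analysis pipeline: the sampling rate, response frequency, noise-bin count, and simulated data are all hypothetical placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fs = 1000                          # sampling rate in Hz (hypothetical)
f0 = 100.0                         # modulation/response frequency in Hz (hypothetical)
n_sweeps, n_samples = 450, 2050    # 450 sweeps of a 2.05-s epoch, as in the Design
t = np.arange(n_samples) / fs

# Simulated sweeps: a weak response at f0 buried in noise.
sweeps = 0.05 * np.sin(2 * np.pi * f0 * t) + rng.normal(0.0, 1.0, (n_sweeps, n_samples))

# Complex Fourier component of each sweep at the f0 bin.
bin_idx = int(round(f0 * n_samples / fs))
comp = np.fft.rfft(sweeps, axis=1)[:, bin_idx]

# 1) Phase coherence: length of the mean unit phasor across sweeps (0 = random phase).
pc = np.abs(np.mean(comp / np.abs(comp)))

# 2) Hotelling's T2: does the mean (real, imag) vector differ from zero?
X = np.column_stack([comp.real, comp.imag])
m = X.mean(axis=0)
t2 = n_sweeps * m @ np.linalg.inv(np.cov(X, rowvar=False)) @ m
f_equiv = (n_sweeps - 2) / (2 * (n_sweeps - 1)) * t2    # T2 -> F with 2 dimensions
p_t2 = stats.f.sf(f_equiv, 2, n_sweeps - 2)

# 3) F-test: power at f0 in the across-sweep average relative to neighboring noise bins.
avg_spec = np.fft.rfft(sweeps.mean(axis=0))
noise_bins = np.r_[bin_idx - 10:bin_idx, bin_idx + 1:bin_idx + 11]
f_ratio = np.abs(avg_spec[bin_idx]) ** 2 / np.mean(np.abs(avg_spec[noise_bins]) ** 2)
p_f = stats.f.sf(f_ratio, 2, 2 * len(noise_bins))       # 2 dof per spectral bin

print(f"phase coherence = {pc:.3f}, Hotelling T2 p = {p_t2:.2e}, F-test p = {p_f:.2e}")
```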
Results: The AUROCs of the three statistical indicators were similar; however, at a type I error rate of 5%, the sensitivities of Hotelling's T² (68.4%) and phase coherence (68.8%) were significantly higher than that of the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than that of Hotelling's T² (88.4%). When analyzed using Hotelling's T² as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) than most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid-, and high-frequency, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increasing stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged from 13.4 to 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli.
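The accuracy metrics reported above can be reproduced from paired detection and audibility outcomes. A minimal sketch follows, with made-up example data; scikit-learn's roc_auc_score computes the AUROC, with 1 − p serving as the detection score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-measurement outcomes: behavioral audibility (SL >= 0 dB)
# and EFR detection p values; p < 0.05 counts as a detection.
audible = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1], dtype=bool)
p_values = np.array([0.01, 0.03, 0.20, 0.04, 0.30, 0.60, 0.02, 0.001, 0.45, 0.08])
detected = p_values < 0.05

tp = np.sum(detected & audible)
tn = np.sum(~detected & ~audible)
fp = np.sum(detected & ~audible)
fn = np.sum(~detected & audible)

sensitivity = tp / (tp + fn)   # audible speech with a detectable EFR
specificity = tn / (tn + fp)   # inaudible speech with an undetectable EFR
ppv = tp / (tp + fp)           # detected EFRs elicited by audible stimuli
npv = tn / (tn + fn)           # undetected EFRs elicited by inaudible stimuli

# AUROC: how well the detection statistic ranks audible above inaudible stimuli.
auroc = roc_auc_score(audible, 1 - p_values)

print(f"sens={sensitivity:.2f} spec={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f} AUROC={auroc:.2f}")
```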
Conclusions: EFR-based inference of speech audibility requires consideration of the statistical indicator used for detection as well as the stimulus phoneme, frequency, and level.
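Finally, a minimum-SL analysis of the kind described in the Design can be sketched as a random-intercept mixed model in statsmodels. The data below are simulated, and the baseline SLs, p-value effect, and variance components are hypothetical placeholders, not estimates from the study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
stimuli = {"F1": 17.0, "F2plus": 10.0, "fricative": 3.0}   # hypothetical baseline SLs (dB)

rows = []
for pid in range(21):                       # 21 participants, as in the Design
    subj = rng.normal(0.0, 2.0)             # random participant offset
    for stim, base in stimuli.items():
        p_val = rng.uniform(0.001, 0.05)    # detection p value (hypothetical)
        sl = base + subj - 20.0 * p_val + rng.normal(0.0, 2.0)
        rows.append({"participant": pid, "stimulus": stim, "p_value": p_val, "sl": sl})
df = pd.DataFrame(rows)

# Random-intercept model: SL ~ stimulus + detection p value, grouped by participant.
result = smf.mixedlm("sl ~ stimulus + p_value", df, groups=df["participant"]).fit()
print(result.summary())
```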
Full text: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132745 (PMC)
DOI: http://dx.doi.org/10.1097/AUD.0000000000000892