Vocabulary size has been suggested as a useful measure of "verbal abilities" that correlates with speech recognition scores: knowing more words is linked to better speech recognition. It is less clear, however, how vocabulary knowledge translates into general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age. Age-related differences in linguistic measures may predict age-related differences in speech-recognition-in-noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, that is, the speed with which a given word can be searched for and accessed relative to the size of the mental lexicon. We tested speech recognition with a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs) in 22 younger (18-35 years) and 22 older (60-78 years) listeners with normal hearing, and assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores; working memory and hearing threshold were not. Interestingly, longer access times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is modulated mainly by the efficiency of lexical access. Older adults' poorer performance in the speech recognition task may therefore have arisen from reduced lexical access efficiency: although their average vocabulary size was similar to that of younger adults, they were slower in lexical access.
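The hierarchical-regression logic described above is easy to sketch. Below is a minimal, hypothetical Python illustration of entering vocabulary size and then lexical access time as successive predictor blocks and reading off each block's incremental R²; the data are simulated, and all variable names, units, and effect sizes are assumptions for illustration, not the study's values.

```python
# Hypothetical sketch of a hierarchical (blockwise) regression:
# predict sentence-in-noise scores from vocabulary size and lexical
# access time, entered in blocks. Data are simulated; names and
# effect sizes are illustrative assumptions, not the study's data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 22                                  # listeners per age group, as in the study
vocab = rng.normal(50, 8, n)            # receptive vocabulary score (arbitrary units)
access_ms = rng.normal(700, 90, n)      # lexical access time in ms
# Assumed generative model: better vocabulary and (counterintuitively,
# as the abstract reports) longer access times go with higher scores.
score = 40 + 0.5 * vocab + 0.02 * access_ms + rng.normal(0, 3, n)

def fit_blocks(block_names, blocks):
    """Fit nested OLS models, reporting the R² gained by each block."""
    X = np.ones((n, 1))                 # intercept-only baseline
    r2_prev = 0.0
    for name, cols in zip(block_names, blocks):
        X = np.column_stack([X] + cols)
        r2 = sm.OLS(score, X).fit().rsquared
        print(f"+ {name:<22} ΔR² = {r2 - r2_prev:.3f}")
        r2_prev = r2

fit_blocks(["vocabulary size", "lexical access time"],
           [[vocab], [access_ms]])
```

Read against the abstract, a large ΔR² for the access-time block would correspond to the pattern reported for every subset except the younger group at the easier SNR, where vocabulary size alone sufficed.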
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4930932 | PMC |
| http://dx.doi.org/10.3389/fpsyg.2016.00990 | DOI Listing |
J Acoust Soc Am
January 2025
USC Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089-1455, USA.
Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking.
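The dimension-extraction step lends itself to a short sketch. The following hypothetical Python illustration recovers two latent voice-quality factors from per-utterance acoustic measures via principal component analysis; PCA, the feature set, and the simulated data are assumptions about the general approach, not the paper's actual pipeline.

```python
# Hypothetical sketch: recovering low-dimensional voice-quality factors
# from per-utterance acoustic measures via PCA. Feature names and data
# are illustrative assumptions, not the study's measurements.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_utts = 200
size_f = rng.normal(size=n_utts)      # latent "speaker size" factor
arousal_f = rng.normal(size=n_utts)   # latent "arousal" factor
# Toy measures often tied to size (F0, formant dispersion) and
# arousal (intensity, spectral tilt); loadings are simulated.
X = np.column_stack([
    -2.0 * size_f + rng.normal(0, 0.5, n_utts),     # mean F0
    -1.5 * size_f + rng.normal(0, 0.5, n_utts),     # formant dispersion
     1.8 * arousal_f + rng.normal(0, 0.5, n_utts),  # intensity
     1.2 * arousal_f + rng.normal(0, 0.5, n_utts),  # spectral tilt
])
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print("variance explained:", pca.explained_variance_ratio_)
```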
Data Brief
February 2025
Department of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology (MIST), Dhaka 1216, Bangladesh.
The dataset represents a significant advancement in Bengali lip-reading and visual speech recognition research, poised to drive future applications and technological progress. Although Bengali is the seventh most spoken language globally, with approximately 265 million speakers, it has been largely overlooked by the visual speech recognition research community. The dataset fills this gap by offering a pioneering resource tailored for Bengali lip-reading, comprising visual data from 150 speakers across 54 classes encompassing Bengali phonemes, alphabets, and symbols.
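A corpus of this shape is straightforward to index programmatically. The sketch below assumes a hypothetical root/class/speaker/clip.mp4 directory layout; the layout, field names, and paths are illustrative, not the dataset's documented structure.

```python
# Hypothetical loader sketch for a lip-reading corpus organized as
# <root>/<class_label>/<speaker_id>/<clip>.mp4. The layout and names
# are assumptions for illustration only.
from pathlib import Path
from dataclasses import dataclass

@dataclass
class Clip:
    path: Path
    label: str       # one of the 54 classes (phonemes, alphabets, symbols)
    speaker: str     # one of the 150 speakers

def index_corpus(root: str) -> list[Clip]:
    """Walk the corpus and build an index of (video, label, speaker)."""
    clips = []
    for video in Path(root).glob("*/*/*.mp4"):
        label, speaker = video.parts[-3], video.parts[-2]
        clips.append(Clip(video, label, speaker))
    return clips

# Example: a speaker-disjoint split tests generalization to unseen faces.
# train = [c for c in index_corpus("bengali_lips") if c.speaker != "spk_001"]
```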
Int J Audiol
January 2025
Department of Otorhinolaryngology and Head & Neck Surgery, Leiden University Medical Center, Leiden, Netherlands.
Objective: Measuring listening effort using pupillometry is challenging in cochlear implant (CI) users. We assess three validated speech tests (Matrix, LIST, and DIN) to identify the optimal speech material for measuring peak-pupil-dilation (PPD) in CI users as a function of signal-to-noise ratio (SNR).
Design: Speech tests were administered in quiet and in two noisy conditions, namely at the speech recognition threshold (0 dB re SRT) …
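Peak pupil dilation itself is a simple computation once a trace is recorded: subtract a pre-stimulus baseline and take the maximum within an analysis window. The sketch below is a generic illustration; the baseline duration, window bounds, and sampling rate are assumptions, not the study's protocol.

```python
# Hypothetical sketch of peak-pupil-dilation (PPD): subtract a
# pre-stimulus baseline from the pupil trace, then take the maximum
# within the analysis window. All parameters are assumptions.
import numpy as np

def peak_pupil_dilation(trace: np.ndarray, fs: float,
                        baseline_s: float = 1.0,
                        window_s: tuple[float, float] = (0.0, 3.0)) -> float:
    """trace: pupil diameter samples; fs: sampling rate in Hz.
    The first `baseline_s` seconds are treated as pre-stimulus baseline."""
    n_base = int(baseline_s * fs)
    baseline = trace[:n_base].mean()
    start = n_base + int(window_s[0] * fs)
    stop = n_base + int(window_s[1] * fs)
    return float((trace[start:stop] - baseline).max())

# Example on a synthetic 5 s trace sampled at 60 Hz:
fs = 60.0
t = np.arange(0, 5, 1 / fs)
trace = 3.0 + 0.4 * np.exp(-((t - 2.5) ** 2) / 0.3)  # mm, dilation peak at 2.5 s
print(f"PPD ≈ {peak_pupil_dilation(trace, fs):.2f} mm")
```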
Sci Rep
January 2025
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.
Speech-to-speech translation (S2ST) has evolved from cascade systems, which chain Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), to end-to-end models. This evolution has been driven by advances in model performance and the expansion of cross-lingual speech datasets. Research on Tibetan speech translation remains scarce; this paper tackles direct Tibetan-to-Chinese speech-to-speech translation within a multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.
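The cascade baseline that end-to-end models replace is, structurally, function composition over three stages. The sketch below illustrates that structure with stub stages; the interfaces are assumptions for illustration, and no specific Tibetan or Chinese models are implied.

```python
# Hypothetical sketch of a cascade S2ST pipeline: ASR -> MT -> TTS
# as three composable stages. Stage interfaces are assumptions.
from typing import Callable

Audio = bytes  # placeholder type for raw audio

def cascade_s2st(asr: Callable[[Audio], str],
                 mt: Callable[[str], str],
                 tts: Callable[[str], Audio]) -> Callable[[Audio], Audio]:
    """Compose ASR, MT, and TTS into a speech-to-speech translator."""
    return lambda speech: tts(mt(asr(speech)))

# Example with stub stages:
translate = cascade_s2st(
    asr=lambda a: "tibetan transcript",
    mt=lambda s: "chinese translation",
    tts=lambda s: s.encode("utf-8"),
)
print(translate(b"\x00\x01"))
```

Composing stages this way keeps each component swappable, but recognition errors propagate into translation and synthesis, which is the usual motivation for training end-to-end S2ST models instead.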
Perspect ASHA Spec Interest Groups
December 2024
DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.
Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. This enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefit and how potential risk factors can be mitigated with intervention.