Successes and critical failures of neural networks in capturing human-like speech recognition.

Neural Netw

Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Department of Psychology, New York University, NY, United States; Max Planck NYU Center for Language, Music, and Emotion, Frankfurt, Germany and New York, NY, United States.

Published: May 2023

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.
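Comparisons like those described above are typically scored with the word error rate, the edit distance between a model's transcript and the reference, normalized by reference length. The abstract does not specify the metric, so this is a minimal illustrative sketch of that standard measure, not the authors' evaluation code:

```python
import numpy as np

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between prefixes.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)  # all-deletion column
    d[0, :] = np.arange(len(hyp) + 1)  # all-insertion row
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,        # deletion
                          d[i, j - 1] + 1,        # insertion
                          d[i - 1, j - 1] + sub)  # substitution or match
    return d[len(ref), len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat on"))  # → 0.333...
```

A manipulated stimulus (e.g., time-compressed or noise-vocoded speech) would be transcribed by the model, and the resulting error rate compared against human listeners' accuracy on the same material.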


Source
http://dx.doi.org/10.1016/j.neunet.2023.02.032


Similar Publications

This article presents a creative biography of Sergey Selivanovich Golovin, a prominent Russian ophthalmologist of the first quarter of the 20th century. The work is based on archival research and analysis of published materials, and characterizes the career of S.S. Golovin.


A non-local dual-stream fusion network for laryngoscope recognition.

Am J Otolaryngol

December 2024

Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin 300192, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China.

Purpose: To use deep learning technology to design and implement a model that can automatically classify laryngoscope images and assist doctors in diagnosing laryngeal diseases.

Materials And Methods: The experiment was based on 3057 images (normal, glottic cancer, granuloma, Reinke's Edema, vocal cord cyst, leukoplakia, nodules and polyps) from the dataset Laryngoscope8. A classification model based on deep neural networks was developed and tested.


Unlabelled: Central auditory disorders (CSD) are impairments in the processing of sound stimuli, including speech, above the cochlear nuclei of the brainstem, manifesting mainly as difficulties in speech recognition, especially in noisy environments. Children with this pathology are more likely to have behavioral problems; impaired auditory, linguistic, and cognitive development; and, in particular, difficulties with learning at school.

Objective: To analyze the literature data on the epidemiology of central auditory disorders in school-age children.


How Does Deep Neural Network-Based Noise Reduction in Hearing Aids Impact Cochlear Implant Candidacy?

Audiol Res

December 2024

Division of Audiology, Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Rochester, MN 55902, USA.

Background/objectives: Adult hearing-impaired patients qualifying for cochlear implants typically exhibit less than 60% sentence recognition under the best hearing aid conditions, either in quiet or noisy environments, with speech and noise presented through a single speaker. This study examines the influence of deep neural network-based (DNN-based) noise reduction on cochlear implant evaluation.

Methods: Speech perception was assessed using AzBio sentences in both quiet and noisy conditions (multi-talker babble) at 5 and 10 dB signal-to-noise ratios (SNRs) through one loudspeaker.
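Presenting babble at 5 and 10 dB SNR amounts to scaling the noise so that the speech-to-noise power ratio hits the target before mixing. A minimal sketch of that calibration (function name and signals are hypothetical stand-ins, not from the study):

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power from SNR_dB = 10 * log10(P_speech / P_noise).
    target_p_noise = p_speech / (10 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone as a stand-in
babble = rng.normal(size=16000)                              # noise stand-in
mixture, scaled = mix_at_snr(speech, babble, snr_db=5.0)

achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
print(round(achieved, 6))  # → 5.0
```

Lower SNR values make the babble relatively louder, which is why the 5 dB condition is the harder of the two listed.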


Hearing loss is a highly prevalent condition in the world population that entails emotional, social, and economic costs. In recent years it has been clearly recognized that the lack of physiological binaural hearing causes alterations in sound localization and reduced speech recognition in noise and reverberation. This study aims to explore the psycho-social profile of adult workers affected by single-sided deafness (SSD), without other major medical conditions or otological symptoms, through comparison to subjects with normal hearing.

