Lipreading Architecture Based on Multiple Convolutional Neural Networks for Sentence-Level Visual Speech Recognition.

Sanghun Jeon, Ahmed Elsharkawy, Mun Sang Kim

Sensors (Basel)

Center for Healthcare Robotics, Gwangju Institute of Science and Technology (GIST), School of Integrated Technology, Gwangju 61005, Korea.

Published: December 2021

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems still exist when using VSR systems. A major challenge is the distinction of words with similar pronunciation, called homophones; these lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as "a", "an", "eight", and "bin" because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs; a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN), which are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. The results of the standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, for the unseen-speaker dataset. Our proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.
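The decoding and scoring steps named in the abstract can be illustrated with a minimal sketch. It assumes a greedy (best-path) reading of connectionist temporal classification output, in which consecutive repeated labels are merged and blank symbols are dropped, and it computes character and word error rates as edit distance divided by reference length. The blank symbol `_`, the function names, and the sample sentences are hypothetical choices for illustration; they are not taken from the paper, and the authors' actual decoder and scoring scripts may differ.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blank symbols."""
    return "".join(label for label, _ in groupby(frame_labels) if label != blank)


def edit_distance(ref, hyp):
    """Levenshtein distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate; assumes the reference contains at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Made-up example: one short word is misrecognized in a six-word sentence.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
word_error = wer(reference, hypothesis)  # 1 wrong word out of 6
char_error = cer(reference, hypothesis)  # 1 wrong character out of 21
decoded = ctc_greedy_collapse("bb_ii_nn__")
```

A single wrong character in a one-letter word costs a full word error, which is one reason word error rate tends to react more strongly than character error rate to mistakes on very short words.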

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8747278	PMC
http://dx.doi.org/10.3390/s22010072	DOI Listing

Similar Publications

Understanding Parental Perspectives on Childhood Hearing Impairment and Timely Interventions.

Cureus

January 2025

College of Medicine, Department of Otolaryngology - Head and Neck Surgery, University of Jeddah, Jeddah, SAU.

Nada Alharbi Daniyah Baqalaqil Hams Alharthi Nouf Almalki Samar Altoukhi

Objectives: Hearing impairment during childhood is a widespread health issue. Prompt recognition and timely intervention are vital for the advancement of language skills. Insufficient parental knowledge can lead to a delay in diagnosing and treating a condition, which can have a negative impact on academic performance.

Hearing Performance and Soft-Tissue Outcomes of Minimally Invasive Ponto Surgery and Local Anesthesia in Children with Unilateral Craniofacial Malformation.

Int Arch Otorhinolaryngol

January 2025

School of Medical Sciences, Santa Casa de São Paulo, São Paulo, SP, Brazil.

Andrea Caruso Leone Arthur Menino Castilho Fabiana Danieli Daniela Bortoloti Calil Katia de Almeida

Minimally invasive Ponto surgery (MIPS) enables the installation of percutaneous bone-anchored hearing implants (BAHIs) with a drill guide through a hole punch incision. Despite being well established for adults, there is a lack of studies in the literature regarding its use in pediatric patients. The aim of the present study was to investigate the hearing performance and soft-tissue outcomes of the use of MIPS under local anesthesia in children with unilateral craniofacial malformation (UCM).

Effects of Interaural Latency and Frequency Mismatch on Speech Recognition for Bimodal Cochlear Implant Users.

Laryngoscope

January 2025

Department of Otolaryngology/Head & Neck Surgery, University of North Carolina School of Medicine, Chapel Hill, North Carolina, U.S.A.

Margaret T Dillon Emily Buss Margaret E Richter Kevin D Brown

Objectives: Bimodal cochlear implant (CI) users vary in speech recognition outcomes. This variability may be influenced partly by the CI and contralateral hearing aid (HA) programming procedures, which can result in mismatches in latency and frequency. We assessed the performance of bimodal listeners when latency mismatches were corrected and analyzed how frequency mismatches influenced outcomes.

Automatic speech recognition predicts contemporaneous earthquake fault displacement.

Nat Commun

January 2025

Los Alamos National Laboratory, EES-17 National Security Earth Science, Los Alamos, NM, 87545, USA.

Christopher W Johnson Kun Wang Paul A Johnson

Significant progress has been made in probing the state of an earthquake fault by applying machine learning to continuous seismic waveforms. The breakthroughs were originally obtained from laboratory shear experiments and numerical simulations of fault shear, then successfully extended to slow-slipping faults. Here we apply the Wav2Vec-2.

Accuracy and variability in clinical predictions of speech recognition outcomes for cochlear implant users.

Int J Audiol

January 2025

Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN, USA.

Valeriy Shafiro Michael S Harris Berenice Ramirez Liping Du Aaron C Moberly

Objectives: An improvement in speech perception is a major well-documented benefit of cochlear implantation (CI), which is commonly discussed with CI candidates to set expectations. However, a large variability exists in speech perception outcomes. We evaluated the accuracy of clinical predictions of post-CI speech perception scores.
