Lipreading Architecture Based on Multiple Convolutional Neural Networks for Sentence-Level Visual Speech Recognition.

Sensors (Basel)

Center for Healthcare Robotics, Gwangju Institute of Science and Technology (GIST), School of Integrated Technology, Gwangju 61005, Korea.

Published: December 2021

In visual speech recognition (VSR), speech is transcribed using only visual information, by interpreting tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of human lipreaders on benchmark datasets. However, several problems remain when using VSR systems. A major challenge is distinguishing words with similar pronunciation, called homophones, which lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as "a", "an", "eight", and "bin" because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs): a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN. These are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. Evaluated with the standard automatic speech recognition metrics, the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, on the unseen-speaker dataset. The proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.
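The character and word error rates mentioned in the abstract are conventionally computed as an edit distance between the reference transcript and the predicted one, divided by the reference length. The following is a minimal sketch of that conventional definition, not the authors' evaluation code; the function names `edit_distance`, `cer`, and `wer` and the sample sentences are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character-level edits / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


# Hypothetical example: the short word "a" is dropped from the prediction.
reference = "bin blue at a one now"
hypothesis = "bin blue at one now"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this made-up example a single missed short word costs one of six reference words, which illustrates why very short words weigh heavily on the word error rate.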

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8747278
DOI: http://dx.doi.org/10.3390/s22010072
