[Development and evaluation of a deep learning algorithm for German word recognition from lip movements].

HNO

Universitätsklinik und Poliklinik für Hals‑, Nasen‑, Ohrenheilkunde, Kopf- und Halschirurgie, Universitätsklinikum Halle (Saale), Martin-Luther-Universität Halle-Wittenberg, Ernst-Grube-Str. 40, 06120, Halle (Saale), Deutschland.

Published: June 2022

Background: Many people benefit from the additional visual information in a speaker's lip movements, but lip reading is very error prone. Lip-reading algorithms based on artificial neural networks significantly improve word recognition but are not yet available for the German language.

Materials and Methods: A total of 1806 video clips, each showing only one German-speaking person, were selected, split into word segments, and assigned to word classes using speech-recognition software. From 38,391 video segments with 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. A 3D Convolutional Neural Network model, a Gated Recurrent Units model, and a combination of both (GRUConv) were compared, as were different image sections and color spaces of the videos. Accuracy was determined over 5000 training epochs.
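The two preprocessing variants mentioned above (image section and color space) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not state which crop sizes or color spaces were used, so the frame representation (nested lists of RGB tuples), the function names `crop_frame` and `to_grayscale`, and the choice of grayscale with ITU-R BT.601 luma weights are hypothetical, not the authors' pipeline.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Frame = List[List[Pixel]]      # rows of pixels


def crop_frame(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Return the rectangular image section starting at (top, left).

    The box coordinates are hypothetical; in practice they would come
    from a face or lip detector, which is not shown here.
    """
    return [row[left:left + width] for row in frame[top:top + height]]


def to_grayscale(frame: Frame) -> List[List[int]]:
    """Convert an RGB frame to one-channel grayscale.

    Uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) as one
    example of a color-space change; other color spaces are possible.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]
```

Applying `crop_frame` with a box around the mouth and then `to_grayscale` to every frame of a word segment would yield one possible input variant for the kind of comparison described above.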

Results: Comparison of the color spaces revealed no relevant differences in correct classification rates, which ranged from 69% to 72%. Cropping to the lips achieved a significantly higher accuracy (70%) than cropping to the speaker's entire face (34%). With the GRUConv model, the maximum accuracies were 87% with known speakers and 63% in the validation with unknown speakers.
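The distinction between known and unknown speakers corresponds to holding out whole speakers from training. The following is a minimal sketch of such a speaker-disjoint split and of the accuracy measure, under stated assumptions: the sample format (dicts with a `"speaker"` key), the function names, the held-out fraction, and the seed are all hypothetical, since the abstract does not describe how the split was made.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, str]  # e.g. {"speaker": "s01", "word": "..."}


def split_by_speaker(
    samples: List[Sample], held_out_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that held-out speakers never appear in training."""
    speakers = sorted({s["speaker"] for s in samples})
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(speakers)
    n_held = max(1, round(len(speakers) * held_out_fraction))
    held_out = set(speakers[:n_held])
    train = [s for s in samples if s["speaker"] not in held_out]
    unknown = [s for s in samples if s["speaker"] in held_out]
    return train, unknown


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of segments whose predicted word class matches the label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Evaluating a trained model on the `unknown` list gives a speaker-independent accuracy, while evaluating on unseen segments from training speakers gives the known-speaker figure.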

Conclusion: This neural network for lip reading, the first developed for the German language, shows a very high level of accuracy, comparable to that of English-language algorithms. It also works with unknown speakers and can be generalized to more word classes.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160146
DOI: http://dx.doi.org/10.1007/s00106-021-01143-9
