In this paper, we propose a visual embedding approach to improve embedding-aware speech enhancement (EASE) by synchronizing visual lip frames at the phone and place-of-articulation levels. We first extract a visual embedding from lip frames using a pre-trained phone or articulation-place recognizer for visual-only EASE (VEASE). Next, we extract an audio-visual embedding from noisy speech and lip frames in an information-intersection manner, exploiting the complementarity of audio and visual features for multi-modal EASE (MEASE). Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that our proposed subword-based VEASE approach is more effective than conventional embedding at the word level. Moreover, visual embedding at the articulation-place level, which leverages the high correlation between place of articulation and lip shapes, performs even better than embedding at the phone level. Finally, the experiments establish that the proposed MEASE framework, incorporating both audio and visual embeddings, yields significantly better speech quality and intelligibility than the best visual-only and audio-only EASE systems.
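
To illustrate why articulation-place labels may be easier to read from lip frames than phone labels, the following minimal sketch collapses a frame-level phone sequence into place-of-articulation classes. The class inventory, the ARPAbet-style phone symbols, and the names `PLACE_OF_ARTICULATION`, `PHONE_TO_PLACE`, and `to_place_labels` are assumptions made for this example; they cover only a subset of consonants and are not the label set or code used in the paper.

```python
# Illustrative mapping from place of articulation to consonant phones.
# ASSUMPTION: this inventory is a textbook-style grouping chosen for the
# sketch; the paper's actual recognizer classes are not given in the abstract.
PLACE_OF_ARTICULATION = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "dental": ["th", "dh"],
    "alveolar": ["t", "d", "n", "s", "z", "l"],
    "postalveolar": ["sh", "zh", "ch", "jh"],
    "velar": ["k", "g", "ng"],
    "glottal": ["hh"],
}

# Invert the table so each phone points at its place class.
PHONE_TO_PLACE = {
    phone: place
    for place, phones in PLACE_OF_ARTICULATION.items()
    for phone in phones
}


def to_place_labels(phones):
    """Map frame-level phone labels to place labels.

    Phones outside the table (vowels, silence, etc.) map to "other".
    """
    return [PHONE_TO_PLACE.get(p, "other") for p in phones]


if __name__ == "__main__":
    frames = ["p", "b", "m", "f", "v", "sil"]
    print(to_place_labels(frames))
```

In this grouping, /p/, /b/, and /m/ share one bilabial class: they involve the same lip closure and are hard to tell apart visually, so a recognizer trained on place classes is not asked to separate targets that look alike on the lips.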

Source
http://dx.doi.org/10.1016/j.neunet.2021.06.003
