In this paper, we propose a visual embedding approach to improve embedding-aware speech enhancement (EASE) by synchronizing visual lip frames at the phone and place-of-articulation levels. We first extract a visual embedding from lip frames using a pre-trained phone or articulation-place recognizer for visual-only EASE (VEASE). Next, we extract an audio-visual embedding from noisy speech and lip frames in an information intersection manner, exploiting the complementarity of audio and visual features for multi-modal EASE (MEASE).
IEEE Trans Cybern
November 2015
With the continual development of digital capture technologies and social media services, vast numbers of media documents are captured and shared online to help attendees record their experiences during events. In this paper, we present a method that combines semantic inference and multimodal analysis to automatically find media content illustrating events, using an adaptive probabilistic hypergraph model. In this model, media items are the vertices of a weighted hypergraph, and the task of enriching media to illustrate events is formulated as a ranking problem.
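The ranking formulation can be illustrated with a minimal sketch. This is not the adaptive probabilistic model of the paper, whose details the abstract does not give; it assumes the standard normalized hypergraph ranking iteration f <- alpha * Theta * f + (1 - alpha) * y, and the names `hypergraph_rank`, `incidence`, `edge_weights`, `query`, and `alpha` are hypothetical, chosen only for illustration.

```python
import math

def hypergraph_rank(incidence, edge_weights, query, alpha=0.9, iters=200):
    """Score vertices by iterating f <- alpha * Theta f + (1 - alpha) * y.

    incidence[v][e] is 1 if vertex v (a media item) belongs to hyperedge e, else 0.
    Assumes every vertex lies in at least one hyperedge and no hyperedge is empty.
    Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (an assumed, standard normalization).
    """
    n = len(incidence)       # number of vertices
    m = len(edge_weights)    # number of hyperedges
    # Vertex degree: sum of the weights of the hyperedges containing the vertex.
    dv = [sum(edge_weights[e] * incidence[v][e] for e in range(m)) for v in range(n)]
    # Hyperedge degree: number of vertices in the hyperedge.
    de = [sum(incidence[v][e] for v in range(n)) for e in range(m)]

    def propagate(f):
        # Apply Theta to f without building the full matrix.
        g = [f[v] / math.sqrt(dv[v]) for v in range(n)]
        per_edge = [
            edge_weights[e] / de[e] * sum(incidence[v][e] * g[v] for v in range(n))
            for e in range(m)
        ]
        return [
            sum(incidence[v][e] * per_edge[e] for e in range(m)) / math.sqrt(dv[v])
            for v in range(n)
        ]

    f = list(query)
    for _ in range(iters):
        spread = propagate(f)
        f = [alpha * spread[v] + (1 - alpha) * query[v] for v in range(n)]
    return f

# Toy example: 4 media items, hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
incidence = [[1, 0], [1, 0], [1, 1], [0, 1]]
scores = hypergraph_rank(incidence, edge_weights=[1.0, 1.0], query=[1.0, 0.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
print(ranking)
```

In this toy setup the query item receives the highest score, and item 1, which shares a hyperedge with the query, outranks item 3, which is connected to it only through item 2.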
Acquiring a light field with high angular and spatial resolution at low cost is the goal of light field capture. Light field capture equipment is usually designed by combining or modifying traditional optical cameras; most such designs must trade off angular against spatial resolution, but adding a coded aperture avoids this trade-off by multiplexing information from different views. Building on the coded aperture, this paper proposes an improved light field camera model that uses two measurements and one mask.
IEEE Trans Cybern
February 2016
Quickly and accurately categorizing the millions of aerial images on Google Maps is a useful technique in pattern recognition. Existing methods cannot handle this task successfully for two reasons: 1) the topologies of aerial images are the key feature for distinguishing their categories, but they cannot be effectively encoded by a conventional visual codebook, and 2) it is challenging to build a real-time image categorization system, as some geo-aware apps update over 20 aerial images per second. To solve these problems, we propose an efficient aerial image categorization algorithm.