IEEE Trans Pattern Anal Mach Intell
November 2012
IEEE Trans Pattern Anal Mach Intell
June 2009
We address content-based retrieval of complete 3D object models using a probabilistic generative description of local shape properties. The proposed shape description framework characterizes a 3D object with sampled multivariate probability density functions of its local surface features. This density-based descriptor can be efficiently computed via kernel density estimation (KDE) coupled with the fast Gauss transform.
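As a rough illustration of the idea of a density-based descriptor, the sketch below evaluates a Gaussian KDE of an object's local features at a fixed set of target points and uses the sampled values as the descriptor. It is not the authors' implementation: the function names, the isotropic Gaussian kernel, the scalar bandwidth, and the L1 comparison are all assumptions, and the density is evaluated directly rather than with the fast Gauss transform mentioned in the abstract.

```python
import numpy as np


def density_descriptor(features, targets, bandwidth):
    """Sample a Gaussian KDE of local features at fixed target points.

    features:  (n, d) array of local surface features from one object
    targets:   (m, d) array of points at which the density is sampled
    bandwidth: scalar kernel width (isotropic Gaussian, an assumption)
    Returns an (m,) descriptor vector normalised to sum to 1.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n, d = features.shape
    # Squared distance between every target and every feature: shape (m, n).
    diff = targets[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Direct O(m * n) kernel sum; a fast Gauss transform would approximate this.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    density = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1) / (n * norm)
    return density / density.sum()


def descriptor_distance(a, b):
    """L1 distance between two descriptors (one possible choice of metric)."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())
```

Because every object is sampled at the same target points, two objects can be compared by a simple vector distance between their descriptors.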
IEEE Trans Pattern Anal Mach Intell
August 2008
We propose a new two-stage framework for joint analysis of the head gesture and speech prosody patterns of a speaker, aimed at automatic, realistic synthesis of head gestures from speech prosody. In the first stage, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of head gesture and speech prosody features separately to determine elementary head gesture and speech prosody patterns, respectively, for a particular speaker. In the second stage, joint analysis of correlations between these elementary head gesture and prosody patterns is performed using Multi-Stream HMMs to determine an audio-visual mapping model.
IEEE Trans Image Process
October 2006
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features.
IEEE Trans Image Process
October 2012
In this correspondence, the problem of directional and multiscale edge detection is considered. An orthogonal, linear-phase M-band wavelet transform is used to decompose the image into M×M channels. These channels are then combined such that each combination, which we refer to as a decomposition filter, results in zero-crossings at the locations of edges corresponding to different directions and resolutions, and inherently performs regularization against noise.