IEEE Trans Neural Netw Learn Syst
August 2022
Modeling the temporal behavior of data is of paramount importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, many real-world processes cannot be characterized by a single linear behavior.
IEEE Trans Pattern Anal Mach Intell
May 2021
In this article, we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature and roles of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status (either speaking or silent) of each tracked person over time. We propose to cast the problem at hand into a generative audio-visual fusion (or association) model formulated as a latent-variable temporal graphical model.
Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape this trend. There is a large number of deep models, where small changes in the network architecture or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to single out methods that significantly outperform others.
IEEE Trans Pattern Anal Mach Intell
November 2018
The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situation the participants look either at each other or at an object of interest; therefore their eyes are not always visible.
IEEE Trans Pattern Anal Mach Intell
June 2018
This paper addresses the problem of registering multiple point sets. Solutions to this problem are often approximated by repeatedly solving for pairwise registration, which results in an uneven treatment of the sets forming a pair: a model set and a data set. The main drawback of this strategy is that the model set may contain noise and outliers, which negatively affects the estimation of the registration parameters.
IEEE Trans Image Process
March 2017
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output.
IEEE Trans Pattern Anal Mach Intell
May 2018
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones.
IEEE Trans Pattern Anal Mach Intell
December 2016
Data clustering has received a lot of attention and numerous methods, algorithms and software packages are available. Among these techniques, parametric finite-mixture models play a central role due to their interesting mathematical properties and to the existence of maximum-likelihood estimators based on expectation-maximization (EM). In this paper we propose a new mixture model that associates a weight with each observed point.
IEEE Trans Pattern Anal Mach Intell
November 2015
This paper addresses the problem of range-stereo fusion, for the construction of high-resolution depth maps. In particular, we combine low-resolution depth data with high-resolution stereo data, in a maximum a posteriori (MAP) formulation. Unlike existing schemes that build on MRF optimizers, we infer the disparity map from a series of local energy minimization problems that are solved hierarchically, by growing sparse initial disparities obtained from the depth data.
Int J Neural Syst
February 2015
In this paper, we address the problems of modeling the acoustic space generated by a full-spectrum sound source and using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head.
The receptive fields of simple cells in the visual cortex can be understood as linear filters. These filters can be modeled by Gabor functions or Gaussian derivatives. Gabor functions can also be combined in an energy model of the complex cell response.
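As a rough illustration of the energy model mentioned above, the sketch below builds a quadrature pair of Gabor filters and sums their squared outputs. It is a minimal textbook-style example, not the formulation used in the article: the function names (`gabor_pair`, `simple_cell_response`, `complex_cell_energy`) and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def gabor_pair(size=31, wavelength=8.0, sigma=4.0, theta=0.0):
    """Return an even (cosine) and an odd (sine) Gabor filter in quadrature.

    All parameter values are illustrative assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinate along the direction of the sinusoidal carrier.
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * u / wavelength
    return envelope * np.cos(phase), envelope * np.sin(phase)


def simple_cell_response(patch, receptive_field):
    """Simple cell as a linear filter: inner product of patch and receptive field."""
    return float(np.sum(patch * receptive_field))


def complex_cell_energy(patch, even, odd):
    """Energy model: sum of the squared responses of the quadrature pair."""
    return (simple_cell_response(patch, even) ** 2
            + simple_cell_response(patch, odd) ** 2)


if __name__ == "__main__":
    even, odd = gabor_pair()
    xs = np.tile(np.arange(-15, 16, dtype=float), (31, 1))
    for shift in (0.0, 0.7, np.pi / 2):
        grating = np.cos(2.0 * np.pi * xs / 8.0 + shift)
        print(shift,
              simple_cell_response(grating, even),
              complex_cell_energy(grating, even, odd))
```

Under these assumptions the linear (simple-cell) response changes with the phase of the grating, whereas the energy stays nearly constant, which is the usual motivation for combining the two filters.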
The problem of multimodal clustering arises whenever the data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or compare them in some common space. A solution may consist in considering multiple clustering tasks independently for each modality.
IEEE Trans Pattern Anal Mach Intell
April 2011
Triangulated meshes have become ubiquitous discrete surface representations. In this paper, we address the problem of how to maintain the manifold properties of a surface while it undergoes strong deformations that may cause topological changes. We introduce a new self-intersection removal algorithm, TransforMesh, and propose a mesh evolution framework based on this algorithm.
IEEE Trans Pattern Anal Mach Intell
March 2011
This paper addresses the issue of matching rigid and articulated shapes through probabilistic point registration. The problem is recast into a missing data framework where unknown correspondences are handled via mixture models. Adopting a maximum likelihood principle, we introduce an innovative EM-like algorithm, namely, the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm.
The human visual system obeys Listing's law, which means that the cyclorotation of the eye (around the line of sight) can be predicted from the direction of the fixation point. It is shown here that Listing's law can conveniently be formulated in terms of rotation matrices. The function that defines the observed cyclorotation is derived in this representation.
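One common way to state Listing's law with rotation matrices is that the eye reaches any gaze direction from the primary direction by a single rotation whose axis is perpendicular to the primary direction. The sketch below builds that rotation with Rodrigues' formula. It is an illustration under this standard statement of the law, not the derivation given in the article; the function name `listing_rotation` and the choice of the z-axis as primary direction are assumptions.

```python
import numpy as np


def listing_rotation(gaze, primary=(0.0, 0.0, 1.0)):
    """Rotation matrix taking `primary` to `gaze` about an axis orthogonal to `primary`.

    The primary direction defaults to the z-axis (an assumption of this sketch).
    """
    p = np.asarray(primary, dtype=float)
    p = p / np.linalg.norm(p)
    g = np.asarray(gaze, dtype=float)
    g = g / np.linalg.norm(g)

    axis = np.cross(p, g)
    s = np.linalg.norm(axis)  # sine of the rotation angle
    c = float(np.dot(p, g))   # cosine of the rotation angle
    if s < 1e-12:
        if c > 0.0:
            return np.eye(3)  # gaze coincides with the primary direction
        raise ValueError("gaze opposite to the primary direction is not handled")

    k = axis / s
    # Skew-symmetric cross-product matrix of the unit axis k.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2.
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)


if __name__ == "__main__":
    R = listing_rotation([0.3, -0.2, 1.0])
    print(R)
```

Because the axis is constructed as a cross product with the primary direction, it has no component along that direction, so the torsional part of the eye's orientation is fixed entirely by the gaze direction.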
IEEE Trans Pattern Anal Mach Intell
January 2009
We address the problem of human motion tracking by registering a surface to 3-D data. We propose a method that iteratively computes two things: maximum-likelihood estimates of both the kinematic and free-motion parameters of an articulated object, and the probabilities that the data are assigned either to an object part or to an outlier cluster. We introduce a new metric between observed points and normals on one side, and a parameterized surface on the other side, the latter being defined as a blending over a set of ellipsoids.
The geometry of binocular projection is analyzed in relation to the primate visual system. An oculomotor parameterization that includes the classical vergence and version angles is defined. It is shown that the epipolar geometry of the system is constrained by binocular coordination of the eyes.