IEEE Trans Image Process
November 2017
Light source position (LSP) estimation is a difficult yet important problem in computer vision. A common approach for estimating the LSP assumes Lambert's law. However, in real-world scenes, Lambert's law does not hold for all types of surfaces.
Front Comput Neurosci
February 2015
The human visual system is assumed to transform low-level visual features to object and scene representations via features of intermediate complexity. How the brain computationally represents intermediate features is still unclear. To further elucidate this, we compared the biologically plausible HMAX model and the Bag of Words (BoW) model from computer vision.
A large variety of trackers has been proposed in the literature during the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem; it therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects.
State-of-the-art bottom-up saliency models often assign high saliency values at or near high-contrast edges, whereas people tend to look within the regions delineated by those edges, namely the objects. To resolve this inconsistency, in this work we estimate saliency at the level of coherent image regions. According to object-based attention theory, the human brain groups similar pixels into coherent regions, which are called proto-objects.
Reconstruction of 3D scene geometry is an important element for scene understanding, autonomous vehicle and robot navigation, image retrieval, and 3D television. We propose accounting for the inherent structure of the visual world when trying to solve the scene reconstruction problem. Consequently, we identify geometric scene categorization as the first step toward robust and efficient depth estimation from single images.
This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features.
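To make the contrast between hard and soft assignment concrete, here is a minimal sketch. It assumes a Gaussian kernel on Euclidean distance as the soft weighting; the function names `hard_assign` and `soft_assign` and the bandwidth parameter `sigma` are illustrative choices, not taken from the paper.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(features, codebook):
    """Classic codebook histogram: each feature votes for its nearest word."""
    hist = [0.0] * len(codebook)
    for f in features:
        nearest = min(range(len(codebook)), key=lambda i: _sq_dist(f, codebook[i]))
        hist[nearest] += 1.0
    return [h / len(features) for h in hist]


def soft_assign(features, codebook, sigma=1.0):
    """Soft histogram: each feature spreads one unit of mass over all words.

    The Gaussian kernel and `sigma` are assumptions made for this sketch.
    """
    hist = [0.0] * len(codebook)
    for f in features:
        # Work with log-weights and subtract the maximum for numerical stability.
        logs = [-_sq_dist(f, w) / (2.0 * sigma ** 2) for w in codebook]
        top = max(logs)
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        for i, w in enumerate(weights):
            hist[i] += w / total
    return [h / len(features) for h in hist]
```

A feature that lies exactly between two visual words contributes equally to both under `soft_assign`, whereas `hard_assign` has to pick one of them arbitrarily.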
The visual appearance of natural scenes is governed by a surprisingly simple hidden structure. The distributions of contrast values in natural images generally follow a Weibull distribution, with beta and gamma as free parameters. Beta and gamma seem to structure the space of natural images in an ecologically meaningful way, in particular with respect to the fragmentation and texture similarity within an image.
We propose a new method for contour tracking in video. The inverted distance transform of the edge map is used as an edge indicator function for contour detection. Using the concept of topographical distance, the watershed segmentation can be formulated as a minimization.
We derive the decomposition of the anisotropic Gaussian into a one-dimensional (1-D) Gauss filter in the x-direction followed by a 1-D filter in a nonorthogonal direction phi. Thus, the anisotropic Gaussian can also be decomposed by dimension. This appears to be extremely efficient from a computing perspective.
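The decomposition can be illustrated by matching covariances: a 1-D Gaussian along x followed by a 1-D Gaussian along direction phi must together reproduce the covariance of the oriented 2-D Gaussian. The sketch below works this out under assumed notation (`sigma_u` and `sigma_v` for the standard deviations along the principal axes, `theta` for the orientation); these symbols and function names are illustrative and not quoted from the paper.

```python
import math


def decompose_anisotropic_gaussian(sigma_u, sigma_v, theta):
    """Return (sigma_x, phi, sigma_phi) for an oriented 2-D Gaussian.

    The target covariance is R(theta) diag(sigma_u^2, sigma_v^2) R(theta)^T.
    Only its xy and yy entries are needed here.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    b = (sigma_u ** 2 - sigma_v ** 2) * cos_t * sin_t            # xy entry
    c = sigma_u ** 2 * sin_t ** 2 + sigma_v ** 2 * cos_t ** 2    # yy entry

    sigma_x = sigma_u * sigma_v / math.sqrt(c)
    phi = math.atan2(c, b)            # c > 0, so phi lies in (0, pi)
    sigma_phi = math.sqrt(c) / math.sin(phi)
    return sigma_x, phi, sigma_phi


def recombined_covariance(sigma_x, phi, sigma_phi):
    """Covariance (xx, xy, yy) of the x-filter followed by the phi-filter.

    Convolving Gaussians adds their covariances, so the two rank-one
    contributions are simply summed.
    """
    xx = sigma_x ** 2 + (sigma_phi * math.cos(phi)) ** 2
    xy = sigma_phi ** 2 * math.cos(phi) * math.sin(phi)
    yy = (sigma_phi * math.sin(phi)) ** 2
    return xx, xy, yy
```

For `theta = 0` the sketch returns `phi = pi/2` with `sigma_x = sigma_u` and `sigma_phi = sigma_v`, which is the familiar separable case along the image axes.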
IEEE Trans Pattern Anal Mach Intell
January 2007
In multitarget tracking, the main challenge is to maintain the correct identity of targets even under occlusions or when differences between the targets are small. The paper proposes a new approach to this problem by incorporating the context information. The context of a target in an image sequence has two components: the spatial context including the local background and nearby targets and the temporal context including all appearances of the targets that have been seen previously.
IEEE Trans Pattern Anal Mach Intell
October 2006
This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding.
IEEE Trans Pattern Anal Mach Intell
April 2006
This paper offers a sparse, multiscale representation of objects. It captures the object appearance by selection from a very large dictionary of Gaussian differential basis functions. The learning procedure results from the matching pursuit algorithm, while the recognition is based on polynomial approximation to the bases, turning image matching into a problem of polynomial evaluation.
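For readers unfamiliar with matching pursuit, the following is a minimal, generic sketch of the greedy selection loop. It assumes unit-norm atoms given as plain lists of floats; the paper's dictionary consists of Gaussian differential basis functions, which are not reproduced here, and the function name `matching_pursuit` is illustrative.

```python
def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse approximation of `signal` over `dictionary`.

    Assumes every atom has unit Euclidean norm, so the inner product with
    the residual is the optimal coefficient for that atom.
    Returns the chosen (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    selected = []
    for _ in range(n_atoms):
        # Correlate the residual with every atom in the dictionary.
        scores = [sum(r * d for r, d in zip(residual, atom)) for atom in dictionary]
        # Pick the atom that explains the most remaining energy.
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        selected.append((best, coef))
        # Remove that atom's contribution before the next iteration.
        residual = [r - coef * d for r, d in zip(residual, dictionary[best])]
    return selected, residual
```

Each iteration keeps only one index and one coefficient, which is what makes the resulting representation sparse even when the dictionary is very large.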
Luminance-based features are widely used as low-level input for computer vision applications, even when color data is available. The extension of feature detection to the color domain prevents information loss due to isoluminance and allows us to exploit the photometric information. To fully exploit the extra information in the color data, the vector nature of color data has to be taken into account and a sound framework is needed to combine feature and photometric invariance theory.
IEEE Trans Pattern Anal Mach Intell
August 2004
We propose a new method for object tracking in image sequences using template matching. To update the template, appearance features are smoothed temporally by robust Kalman filters, one for each pixel. The resistance of the resulting template to partial occlusions enables the accurate detection and handling of more severe occlusions.
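The idea of one temporal filter per template pixel can be sketched as follows. This is a simplified stand-in, not the paper's filter: it assumes a scalar constant-intensity Kalman model per pixel and makes it "robust" by simply rejecting measurements whose innovation exceeds a gate. The names `update_template`, `q`, `r`, and `gate` are hypothetical.

```python
def update_template(template, variance, frame, q=1.0, r=4.0, gate=3.0):
    """Update a template with one scalar Kalman filter per pixel.

    template, variance, frame: flat lists of equal length (one entry per pixel).
    q: process noise variance, r: measurement noise variance (both assumed).
    gate: measurements further than `gate` standard deviations from the
          prediction are treated as outliers (e.g. occlusion) and ignored.
    """
    new_template, new_variance = [], []
    for x, p, z in zip(template, variance, frame):
        p_pred = p + q                 # predict: intensity assumed constant
        innovation = z - x
        s = p_pred + r                 # innovation variance
        if innovation ** 2 > gate ** 2 * s:
            # Outlier pixel: keep the prediction, skip the measurement update.
            new_template.append(x)
            new_variance.append(p_pred)
        else:
            k = p_pred / s             # Kalman gain
            new_template.append(x + k * innovation)
            new_variance.append((1.0 - k) * p_pred)
    return new_template, new_variance
```

In this sketch, the fraction of gated pixels in a frame could serve as a rough occlusion indicator, although how occlusions are actually detected in the paper is not specified in the abstract.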
Segmentation of the spine directly from three-dimensional (3-D) image data is desirable to accurately capture its morphological properties. We describe a method that allows true 3-D spinal image segmentation using a deformable integral spine model. The method learns the appearance of vertebrae from multiple continuous features recorded along vertebra boundaries in a given training set of images.
We propose a method for concept-based medical image retrieval that is a superset of existing semantic-based image retrieval methods. We conceive of a concept as an incremental and interactive formalization of the user's conception of an object in an image. The premise is that such a concept is closely related to a user's specific preferences and subjectivity and thus makes it possible to deal with the complexity and content-dependency of medical image content.