Publications by authors named "Ales Leonardis"

Data distribution gaps often pose significant challenges to the use of deep segmentation models. However, retraining models for each distribution is expensive and time-consuming. In clinical contexts, device-embedded algorithms and networks, typically unretrainable and unaccessable post-manufacture, exacerbate this issue.

View Article and Find Full Text PDF

Depth information opens up new opportunities for video object segmentation (VOS) to be more accurate and robust in complex scenes. However, the RGBD VOS task is largely unexplored due to the expensive collection of RGBD data and time-consuming annotation of segmentation. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes.

View Article and Find Full Text PDF

High dynamic range (HDR) imaging is of fundamental importance in modern digital photography pipelines and used to produce a high-quality photograph with well exposed regions despite varying illumination across the image. This is typically achieved by merging multiple low dynamic range (LDR) images taken at different exposures. However, over-exposed regions and misalignment errors due to poorly compensated motion result in artefacts such as ghosting.

View Article and Find Full Text PDF

In this paper we consider recent advances in the use of deep convolutional neural networks to understanding biological vision. We focus on claims about the plausibility of feedforward deep convolutional neural networks (fDCNNs) as models of image classification in the biological system. Despite the putative similarity of these networks to some properties of the biological vision system, and the remarkable levels of performance accuracy of some fDCNNs, we argue that their plausibility as a framework for understanding image classification remains unclear.

View Article and Find Full Text PDF

Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool which produces input sequences optimized to trigger instabilities and that can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers.

View Article and Find Full Text PDF

Image demoireing is a multi-faceted image restoration task involving both moire pattern removal and color restoration. In this paper, we raise a general degradation model to describe an image contaminated by moire patterns, and propose a novel multi-scale bandpass convolutional neural network (MBCNN) for single image demoireing. For moire pattern removal, we propose a multi-block-size learnable bandpass filters (M-LBFs), based on a block-wise frequency domain transform, to learn the frequency domain priors of moire patterns.

View Article and Find Full Text PDF

Data-driven deep learning approaches to image registration can be less accurate than conventional iterative approaches, especially when training data is limited. To address this issue and meanwhile retain the fast inference speed of deep learning, we propose VR-Net, a novel cascaded variational network for unsupervised deformable image registration. Using a variable splitting optimization scheme, we first convert the image registration problem, established in a generic variational framework, into two sub-problems, one with a point-wise, closed-form solution and the other one being a denoising problem.

View Article and Find Full Text PDF

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch.

View Article and Find Full Text PDF

This work examines the differences between a human and a machine in object recognition tasks. The machine is useful as much as the output classification labels are correct and match the dataset-provided labels. However, very often a discrepancy occurs because the dataset label is different than the one expected by a human.

View Article and Find Full Text PDF

This paper presents a novel robust method for single target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object's color distribution is reasonably motion-invariant, this is not true for the target's depth distribution, which continually varies as the target moves relative to the camera. It is therefore nontrivial to design target models which can fully exploit (potentially very rich) depth information for target tracking.

View Article and Find Full Text PDF

The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of input data, transparency, which enables insights into the learned representation, as well as robustness and speed which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts.

View Article and Find Full Text PDF

The problem of visual tracking evaluation is sporting a large variety of performance measures, and largely suffers from lack of consensus about which measures should be used in experiments. This makes the cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased toward particular tracking aspects.

View Article and Find Full Text PDF

This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison.

View Article and Find Full Text PDF

Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology.

View Article and Find Full Text PDF

We propose a new method for a supervised online estimation of probabilistic discriminative models for classification tasks. The method estimates the class distributions from a stream of data in the form of Gaussian mixture models (GMMs). The reconstructive updates of the distributions are based on the recently proposed online kernel density estimator (oKDE).

View Article and Find Full Text PDF

This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target's global and local appearance by interlacing two layers. The local layer in this model is a set of local patches that geometrically constrain the changes in the target's appearance.

View Article and Find Full Text PDF

We propose a new dynamic model which can be used within blob trackers to track the target's center of gravity. A strong point of the model is that it is designed to track a variety of motions which are usually encountered in applications such as pedestrian tracking, hand tracking, and sports. We call the dynamic model a two-stage dynamic model due to its particular structure, which is a composition of two models: a liberal model and a conservative model.

View Article and Find Full Text PDF

In this paper, we propose a novel method for efficiently calculating the eigenvectors of uniformly rotated images of a set of templates. As we show, the images can be optimally approximated by a linear series of eigenvectors which can be calculated without actually decomposing the sample covariance matrix.

View Article and Find Full Text PDF

Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in the visual data. Discriminative methods, such as LDA, which, on the other hand, are better suited for classification tasks, are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: An approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods which enables one to work on subsets of pixels in images to efficiently detect and reject the outliers.

View Article and Find Full Text PDF

We propose a method for optimizing the complexity of Radial basis function (RBF) networks. The method involves two procedures: adaptation (training) and selection. The first procedure adaptively changes the locations and the width of the basis functions and trains the linear weights.

View Article and Find Full Text PDF