Publications by authors named "Mathieu Salzmann"

A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems: First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers. Some methods use polynomial reduction strategies to simplify the system, but this adds some phantom solutions.

In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches have shown remarkable success on synthetic datasets, we have observed them to fail in the presence of real-world data. We investigate the root causes of these failures and identify two main challenges: The sensitivity of the widely-used SVD-based loss function to the range of rotation between the two point clouds, and the difference in feature distributions between the source and target point clouds.
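
For readers unfamiliar with SVD-based alignment, the classical closed-form solution for the rigid transform between two point clouds with known correspondences can be sketched as follows. This is the textbook Kabsch/Procrustes procedure, not the loss function or network from the paper; the function name `estimate_rigid_transform` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def estimate_rigid_transform(source, target):
    """Least-squares rotation R and translation t with target ~ source @ R.T + t.

    Assumes row i of `source` corresponds to row i of `target` (hypothetical setup).
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (rotation, not reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

# Synthetic check: rotate a random cloud by 30 degrees about z and translate it.
rng = np.random.default_rng(0)
source = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([0.5, -1.0, 2.0])
target = source @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(source, target)
```

With noise-free correspondences, `R_est` and `t_est` match the ground truth up to floating-point error; the abstract's point is that losses built on this SVD step can behave poorly during training as the rotation range grows.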

Article Synopsis
  • Vehicles face various road obstacles that can't all be pre-recorded for detector training.
  • To tackle this, researchers choose specific image sections and fill them in with the surrounding road texture to hide the obstacles.
  • A specialized neural network is then used to identify differences between the original and modified images, indicating whether an obstacle has been removed.

While adversarial training and its variants have been shown to be the most effective algorithms for defending against adversarial attacks, their extremely slow training process makes them hard to scale to large datasets such as ImageNet. The key idea of recent works to accelerate adversarial training is to substitute multi-step attacks (e.g.

In this article, we propose an unsupervised feature extraction method to capture temporal information in monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant ones as negative pairs, as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying the contrastive loss only to the time-variant features and encouraging a gradual transition on them between nearby and distant frames, while also reconstructing the input, extracts rich temporal features that are well suited for human pose estimation.

Article Synopsis
  • Supervised object detection and segmentation methods are highly accurate but struggle to generalize to images that differ significantly from their training data.
  • The proposed self-supervised approach focuses on linking object segmentation with background reconstruction, allowing reconstruction of background regions while highlighting the moving objects.
  • A Monte Carlo-based training strategy is used to explore various object proposals, resulting in better performance for human detection and segmentation in non-standard images compared to existing self-supervised methods.

Training certifiable neural networks enables us to obtain models with robustness guarantees against adversarial attacks. In this work, we introduce a framework to obtain a provable adversarial-free region in the neighborhood of the input data by a polyhedral envelope, which yields more fine-grained certified robustness than existing methods. We further introduce polyhedral envelope regularization (PER) to encourage larger adversarial-free regions and thus improve the provable robustness of the models.

Weight sharing promises to make neural architecture search (NAS) tractable even on commodity hardware. Existing methods in this space rely on a diverse set of heuristics to design and train the shared-weight backbone network, a.k.

Counting People by Estimating People Flows.

IEEE Trans Pattern Anal Mach Intell

November 2022

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them.
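
The idea of inferring densities from flows rather than regressing them directly can be illustrated with a small sketch. The grid layout, the nine-direction neighbourhood, and the names `OFFSETS` and `densities_from_flows` are assumptions chosen for illustration; the abstract does not specify how the flows are parameterised.

```python
import numpy as np

# Hypothetical neighbourhood: the 8 adjacent cells plus "stay in place".
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def densities_from_flows(flows):
    """Derive per-cell people densities from flows between two frames.

    flows[k, y, x] is the number of people moving from cell (y, x) at time t-1
    to cell (y + dy, x + dx) at time t, where (dy, dx) = OFFSETS[k].
    Returns (density at t-1, density at t).
    """
    n, h, w = flows.shape
    assert n == len(OFFSETS)
    # Everyone in a cell at t-1 leaves along exactly one of the flow directions.
    density_prev = flows.sum(axis=0)
    # Everyone in a cell at t arrived along one of the flow directions.
    density_next = np.zeros((h, w))
    for k, (dy, dx) in enumerate(OFFSETS):
        src_y = slice(max(0, -dy), h - max(0, dy))
        src_x = slice(max(0, -dx), w - max(0, dx))
        dst_y = slice(max(0, dy), h - max(0, -dy))
        dst_x = slice(max(0, dx), w - max(0, -dx))
        # Flows whose destination falls outside the grid are dropped here.
        density_next[dst_y, dst_x] += flows[k, src_y, src_x]
    return density_prev, density_next

# One person in the centre of a 3x3 grid steps one cell to the right.
flows = np.zeros((len(OFFSETS), 3, 3))
flows[OFFSETS.index((0, 1)), 1, 1] = 1.0
density_prev, density_next = densities_from_flows(flows)
```

Because both densities are sums of the same flow values, the total count is conserved between frames except for people who cross the image border, which is the kind of temporal consistency that direct per-frame regression does not enforce.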

Robust Differentiable SVD.

IEEE Trans Pattern Anal Mach Intell

September 2022

Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other.
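
A small numerical sketch can make the source of this instability concrete. In the standard first-order perturbation formula for a symmetric matrix with distinct eigenvalues, the derivative of eigenvector i contains coefficients of the form 1/(λ_i − λ_j) for every j ≠ i. The helper below, whose name `eigengap_coefficients` is an assumption for illustration, computes these coefficients; it is not the remedy proposed in the paper.

```python
import numpy as np

def eigengap_coefficients(matrix):
    """Return C with C[i, j] = 1 / (lambda_i - lambda_j) for i != j, and 0 on the diagonal.

    These factors scale the analytic eigenvector gradients of a symmetric matrix.
    Exactly repeated eigenvalues would cause a division by zero.
    """
    eigenvalues, _ = np.linalg.eigh(matrix)
    diff = eigenvalues[:, None] - eigenvalues[None, :]
    coeffs = np.zeros_like(diff)
    off_diag = ~np.eye(len(eigenvalues), dtype=bool)
    coeffs[off_diag] = 1.0 / diff[off_diag]
    return coeffs

well_separated = np.diag([1.0, 2.0, 3.0])
nearly_repeated = np.diag([1.0, 1.0 + 1e-6, 3.0])

max_well = np.abs(eigengap_coefficients(well_separated)).max()
max_near = np.abs(eigengap_coefficients(nearly_repeated)).max()
```

Here `max_well` stays of order one, while `max_near` is of order one million, so any gradient flowing through the nearly repeated pair is amplified by roughly that factor.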

Article Synopsis
  • The paper discusses a two-stream deep learning model designed for 3D cloth draping on virtual human bodies, which is efficient and produces realistic results.
  • The model mimics traditional physics-based simulation methods but uses significantly less computation time, achieving this through specialized loss functions to enhance detail and collision awareness.
  • The research validates the model's effectiveness across different garment types and body shapes, demonstrating better performance compared to existing methods.

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-squares problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation.
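
The reduction from a homogeneous linear system to an eigenvector problem can be sketched in a few lines of NumPy. The example below fits a 2D line rather than an essential matrix to keep it short; the function name `solve_homogeneous_least_squares` and the toy data are assumptions for illustration, and the sketch does not address the differentiability issue the abstract raises.

```python
import numpy as np

def solve_homogeneous_least_squares(A):
    """Return the unit vector x minimising ||A x||.

    The minimiser is the eigenvector of A^T A with the smallest eigenvalue.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
    # eigh returns eigenvalues in ascending order, so column 0 is the solution.
    return eigenvectors[:, 0]

# Fit a*x + b*y + c = 0 to noise-free points on the line y = 2x + 1.
xs = np.linspace(-1.0, 1.0, 20)
ys = 2.0 * xs + 1.0
A = np.stack([xs, ys, np.ones_like(xs)], axis=1)

line = solve_homogeneous_least_squares(A)
residual = np.linalg.norm(A @ line)
```

The recovered `line` is proportional to (2, -1, 1), and its overall sign is arbitrary, which is one reason such solutions need care when used as network outputs.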

Article Synopsis
  • Obtaining annotations for training data in biomedical image analysis is challenging, hindering deep learning methods, but this study presents a solution using 2D annotations from Maximum Intensity Projections (MIP) instead of 3D.
  • Annotating 2D projections is about twice as fast as annotating the full 3D volumes according to user studies, significantly reducing annotation time.
  • The authors introduce a new loss function that enables 3D predictions from 2D annotations without altering existing deep network architectures, demonstrating that networks can achieve comparable performance with this approach in extensive experiments on various biomedical imaging datasets.

We present an Unsupervised Domain Adaptation strategy to compensate for domain shifts on Electron Microscopy volumes. Our method aggregates visual correspondences (motifs that are visually similar across different acquisitions) to infer changes to the parameters of pretrained models and enable them to operate on new data. In particular, we examine the annotations of an existing acquisition to determine pivot locations that characterize the reference segmentation, and use a patch matching algorithm to find their candidate visual correspondences in a new volume.

The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical. Domain Adaptation has therefore emerged as a solution to this problem; it leverages annotated data from a source domain, in which such data is abundant, to train a classifier to operate in a target domain, in which it is either sparse or lacking altogether.

Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable X is represented by l nodes (where l is the number of labels) arranged in a column. However, this method in general requires 2l^2 edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to its excessive memory requirements.

Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact on semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags.

Representing images and videos with Symmetric Positive Definite (SPD) matrices, and considering the Riemannian geometry of the resulting space, has been shown to yield high discriminative power in many visual recognition tasks. Unfortunately, computation on the Riemannian manifold of SPD matrices, especially of high-dimensional ones, comes at a high cost that limits the applicability of existing techniques. In this paper, we introduce algorithms able to handle high-dimensional SPD matrices by constructing a lower-dimensional SPD manifold.

In this paper, we develop an approach to exploiting kernel methods with manifold-valued data. In many computer vision problems, the data can be naturally represented as points on a Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, usual Euclidean computer vision and machine learning algorithms yield inferior results on such data.

This paper tackles the problem of reconstructing the shape of a smooth mirror surface from a single image. In particular, we consider the case where the camera is observing the reflection of a static reference target in the unknown mirror. We first study the reconstruction problem given dense correspondences between 3D points on the reference target and image locations.

Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process.

Most recent approaches to monocular nonrigid 3D shape recovery rely on exploiting point correspondences and work best when the whole surface is well textured. The alternative is to rely on either contours or shading information, which has only been demonstrated in very restrictive settings. Here, we propose a novel approach to monocular deformable shape recovery that can operate under complex lighting and handle partially textured surfaces.

Recovering the 3D shape of a nonrigid surface from a single viewpoint is known to be both ambiguous and challenging. Resolving the ambiguities typically requires prior knowledge about the most likely deformations that the surface may undergo. It often takes the form of a global deformation model that can be learned from training data.

Three-dimensional detection and shape recovery of a nonrigid surface from video sequences require deformation models to effectively take advantage of potentially noisy image data. Here, we introduce an approach to creating such models for deformable 3D surfaces. We exploit the fact that the shape of an inextensible triangulated mesh can be parameterized in terms of a small subset of the angles between its facets.
