In this paper, we propose an object alignment method that detects the landmarks of an object in 2D images. In the regression forests (RFs) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several landmarks. We propose to refine the votes before accumulating them into the Hough space, by sieving and/or aggregating. In order to filter out false positive votes, we pass them through several sieves, each associated with a discrete or continuous latent variable. The sieves filter out votes that are not consistent with the latent variable in question, which implicitly enforces global constraints. In order to aggregate the votes when necessary, we adjust on-the-fly a proximity threshold by applying a classifier on mid-level features extracted from voting maps for the object landmark in question. Moreover, our method is able to predict the unreliability of an individual object landmark. This information can be useful for subsequent object analysis such as object recognition. Our contributions are validated for two object alignment tasks, face alignment and car alignment, on data sets with challenging images collected in the wild, i.e., the Labeled Face in the Wild, the Annotated Facial Landmarks in the Wild, and the street scene car data set. We show that with the proposed approach, and without explicitly introducing shape models, we obtain performance superior or close to the state of the art for both tasks.
DOI: http://dx.doi.org/10.1109/TIP.2014.2383325
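The sieve-then-accumulate idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Vote` record, the use of a single discrete latent label per vote, the exact-match sieve, and the plain argmax over the Hough map are all simplifying assumptions made for illustration. The paper's learned sieves, continuous latent variables, and classifier-driven aggregation are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vote:
    """One patch's vote for a landmark location (hypothetical structure)."""
    x: int          # predicted landmark column
    y: int          # predicted landmark row
    weight: float   # confidence attached to the vote
    latent: int     # assumed discrete latent label carried by the vote


def sieve(votes: List[Vote], latent_estimate: int) -> List[Vote]:
    """Keep only votes consistent with the image-level latent estimate."""
    return [v for v in votes if v.latent == latent_estimate]


def accumulate(votes: List[Vote], width: int, height: int) -> List[List[float]]:
    """Sum vote weights into a 2D Hough map, ignoring out-of-bounds votes."""
    hough = [[0.0] * width for _ in range(height)]
    for v in votes:
        if 0 <= v.x < width and 0 <= v.y < height:
            hough[v.y][v.x] += v.weight
    return hough


def peak(hough: List[List[float]]) -> Tuple[Tuple[int, int], float]:
    """Return ((x, y), score) of the strongest cell in the Hough map."""
    best_xy, best_val = (0, 0), hough[0][0]
    for y, row in enumerate(hough):
        for x, val in enumerate(row):
            if val > best_val:
                best_xy, best_val = (x, y), val
    return best_xy, best_val


def localize(votes: List[Vote], latent_estimate: int,
             width: int, height: int) -> Tuple[Tuple[int, int], float]:
    """Sieve the votes first, then accumulate and take the Hough peak."""
    return peak(accumulate(sieve(votes, latent_estimate), width, height))
```

In this toy setting, a cluster of heavily weighted but latent-inconsistent votes can dominate the unsieved Hough map, while sieving on the latent label leaves only the consistent votes to form the peak.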
Sensors (Basel)
January 2025
Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China.
Accurate 6D object pose estimation is critical for autonomous docking. To address the inefficiencies and inaccuracies associated with maximal cliques-based pose estimation methods, we propose a fast 6D pose estimation algorithm that integrates feature space and space compatibility constraints. The algorithm reduces the graph size by employing Laplacian filtering to resample high-frequency signal nodes.
Sensors (Basel)
January 2025
School of Mechanical and Electrical Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China.
Unsupervised Domain Adaptation for Object Detection (UDA-OD) aims to adapt a model trained on a labeled source domain to an unlabeled target domain, addressing challenges posed by domain shifts. However, existing methods often face significant challenges, particularly in detecting small objects and over-relying on classification confidence for pseudo-label selection, which often leads to inaccurate bounding box localization. To address these issues, we propose a novel UDA-OD framework that leverages scale consistency (SC) and Temporal Ensemble Pseudo-Label Selection (TEPLS) to enhance cross-domain robustness and detection performance.
Sensors (Basel)
December 2024
Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan.
Precision depth estimation plays a key role in many applications, including 3D scene reconstruction, virtual reality, autonomous driving and human-computer interaction. Through recent advancements in deep learning technologies, monocular depth estimation, with its simplicity, has surpassed the traditional stereo camera systems, bringing new possibilities in 3D sensing. In this paper, by using a single camera, we propose an end-to-end supervised monocular depth estimation autoencoder, which contains an encoder with a structure with a mixed convolution neural network and vision transformers and an effective adaptive fusion decoder to obtain high-precision depth maps.
J Vis
January 2025
Magic Leap Switzerland GmbH, Zürich, Switzerland.
When rendering the visual scene for near-eye head-mounted displays, accurate knowledge of the geometry of the displays, scene objects, and eyes is required for the correct generation of the binocular images. Despite possible design and calibration efforts, these quantities are subject to positional and measurement errors, resulting in some misalignment of the images projected to each eye. Previous research investigated the effects in virtual reality (VR) setups that triggered such symptoms as eye strain and nausea.
Front Hum Neurosci
December 2024
Department of Neuroscience, Erasmus Medical Center, Rotterdam, Netherlands.
Introduction: Global Visual Selective Attention (VSA) is the ability to integrate multiple visual elements of a scene to achieve visual overview. This is essential for navigating crowded environments and recognizing objects or faces. Clinical pediatric research on global VSA deficits primarily focuses on autism spectrum disorder (ASD).