Feature description for local image patches is widely used in computer vision. While the conventional way to design a local descriptor relies on expert experience and knowledge, learning-based methods have become increasingly popular because of their good performance and data-driven nature. This paper proposes a novel data-driven method for designing binary feature descriptors, which we call the receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding the responses of a set of receptive fields, which are selected greedily from a large pool of candidates according to their distinctiveness and mutual correlations. Using two different kinds of receptive fields (namely, rectangular pooling areas and Gaussian pooling areas) for selection, we obtain two binary descriptors, RFDR and RFDG, accordingly. Image matching experiments on the well-known patch data set and the Oxford data set demonstrate that RFD significantly outperforms state-of-the-art binary descriptors and is comparable with the best float-valued descriptors at a fraction of the processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.
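The core construction described above — one bit per receptive field, set by comparing a pooled response against a learned threshold — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rectangular fields and thresholds here are hypothetical toy values, whereas RFD selects its fields greedily by distinctiveness and correlation and learns the thresholds from data.

```python
import numpy as np

def rectangular_pool(patch, field):
    """Mean intensity over one rectangular receptive field (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = field
    return patch[y0:y1, x0:x1].mean()

def binary_descriptor(patch, fields, thresholds):
    """One bit per receptive field: set when the pooled response
    exceeds that field's threshold (RFD-style binarization sketch)."""
    bits = np.empty(len(fields), dtype=np.uint8)
    for i, (field, t) in enumerate(zip(fields, thresholds)):
        bits[i] = 1 if rectangular_pool(patch, field) > t else 0
    return bits

# Toy usage with hypothetical fields/thresholds (not the learned ones from the paper)
patch = np.arange(64, dtype=float).reshape(8, 8)
fields = [(0, 4, 0, 4), (4, 8, 4, 8), (0, 8, 0, 4)]
thresholds = [20.0, 40.0, 30.0]
desc = binary_descriptor(patch, fields, thresholds)  # array of 0/1 bits
```

Because the output is a bit string, descriptor comparison reduces to a Hamming distance, which is the source of the speed advantage over float-valued descriptors noted in the abstract.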


Source
http://dx.doi.org/10.1109/TIP.2014.2317981


Similar Publications

Electrocardiogram (ECG) signals contain complex and diverse features, serving as a crucial basis for arrhythmia diagnosis. The subtle differences in characteristics among various types of arrhythmias, coupled with class imbalance issues in datasets, often hinder existing models from effectively capturing key information within these complex signals, leading to a bias towards normal classes. To address these challenges, this paper proposes a method for arrhythmia classification based on a multi-branch, multi-head attention temporal convolutional network (MB-MHA-TCN).


Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems.

Sensors (Basel)

December 2024

College of Engineering, Huaqiao University, Quanzhou 362021, China.

Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model's ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels.
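As a concrete illustration of the first building block named above, a depthwise separable convolution splits a standard convolution into a per-channel spatial filter followed by a 1x1 channel mixer, cutting parameters from roughly k*k*Cin*Cout to k*k*Cin + Cin*Cout. The NumPy sketch below is an assumption-laden toy (valid padding, stride 1), not the CFFGN implementation:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Toy depthwise separable convolution (valid padding, stride 1).

    x:           (C, H, W) input feature map
    depthwise_k: (C, k, k) one spatial kernel per input channel
    pointwise_w: (Cout, C) 1x1 weights mixing channels at each location
    """
    C, H, W = x.shape
    k = depthwise_k.shape[-1]
    Ho, Wo = H - k + 1, W - k + 1
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):                      # depthwise: filter each channel alone
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * depthwise_k[c])
    # pointwise: 1x1 convolution = channel mixing at every spatial position
    return np.tensordot(pointwise_w, dw, axes=([1], [0]))

# Toy usage: 2 input channels, one output channel, all-ones weights
x = np.ones((2, 4, 4))
dwk = np.ones((2, 3, 3))
pw = np.ones((1, 2))
out = depthwise_separable_conv(x, dwk, pw)  # shape (1, 2, 2); each value 18.0
```

For a 3x3 kernel with 2 input and 1 output channel this uses 2*9 + 2*1 = 20 weights versus 3*3*2*1 = 18 for a standard convolution; the savings grow large as channel counts increase, which is why the design suits lightweight real-time networks like the one described.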


Remote photo-plethysmography (rPPG) is a useful camera-based health monitoring method that can measure heart rhythm from facial videos. Many well-established deep learning models can provide highly accurate and robust results in measuring heart rate (HR) and heart rate variability (HRV). However, these methods are unable to effectively eliminate illumination variation and motion artifact disturbances, and their substantial computational resource requirements significantly limit their applicability in real-world scenarios.


To achieve infrared aircraft detection under interference conditions, this paper proposes an infrared aircraft detection algorithm based on a high-resolution, feature-enhanced semantic segmentation network. First, the designed location attention mechanism is used to enhance the current-level feature map by obtaining correlation weights between pixels at different positions. This map is then fused with the high-level feature map, which is rich in semantic features, to construct a location attention feature fusion network, thereby enhancing the representation capability of target features.


Background/objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critical in identifying fine-grained patterns, while CNNs have difficulty capturing long-range dependencies due to their local receptive fields, which makes it hard to fully capture the spatial relationships across lung regions.

Methods: In this paper, we proposed a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture.

