We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale "head" to a fine-scale "eye" when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. The PA-CNN localizes parts in two steps: a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to improved pinpointing of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird's head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box/part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance in three challenging fine-grained recognition tasks, i.e., CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
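The coarse-to-fine idea can be illustrated with a toy sketch. This is not the authors' PA-CNN: the PPN and PRN are learned convolutional networks, whereas the code below assumes a hand-made grid of response scores and uses hypothetical helper names (`spatial_softmax`, `attention_centroid`, `crop_box`, `progressive_localize`). It only shows the general pattern of turning scores into an attention map, reading off a part location, and repeating the step inside a smaller region.

```python
import math

def spatial_softmax(scores):
    """Turn an H x W grid of raw scores into an attention map that sums to 1."""
    peak = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attention_centroid(attn):
    """Attention-weighted mean (row, col): a soft estimate of the part location."""
    cy = sum(y * a for y, row in enumerate(attn) for a in row)
    cx = sum(x * a for row in attn for x, a in enumerate(row))
    return cy, cx

def crop_box(center, half_size, height, width):
    """Square box around center, clipped to the grid: (top, left, bottom, right)."""
    cy, cx = int(round(center[0])), int(round(center[1]))
    return (max(0, cy - half_size), max(0, cx - half_size),
            min(height, cy + half_size + 1), min(width, cx + half_size + 1))

def progressive_localize(scores, half_sizes):
    """Localize one part at successively finer scales (illustrative assumption).

    At each scale, attention is computed only inside the box found at the
    previous, coarser scale, and a smaller box is cropped around its centroid.
    """
    height, width = len(scores), len(scores[0])
    box = (0, 0, height, width)  # start from the whole grid
    boxes = []
    for half in half_sizes:
        top, left, bottom, right = box
        region = [row[left:right] for row in scores[top:bottom]]
        cy, cx = attention_centroid(spatial_softmax(region))
        box = crop_box((top + cy, left + cx), half, height, width)
        boxes.append(box)
    return boxes

# Hand-made 8 x 8 score grid with one strong response at row 5, column 2.
scores = [[0.0] * 8 for _ in range(8)]
scores[5][2] = 20.0
print(progressive_localize(scores, half_sizes=[2, 1]))
```

In this sketch the second box is nested inside the first, which loosely mirrors the "head" to "eye" hierarchy described in the abstract; in the actual method the refinement comes from jointly trained networks rather than a fixed centroid rule.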

DOI: http://dx.doi.org/10.1109/TIP.2019.2921876
