We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale "head" to a fine-scale "eye" when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. PA-CNN localizes parts in two steps: a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to more precise localization of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird's head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box or part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance on three challenging fine-grained recognition benchmarks: CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
Source (DOI listing): http://dx.doi.org/10.1109/TIP.2019.2921876
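The coarse-to-fine idea behind PA-CNN (locate a "head", then an "eye" inside it) can be illustrated with a minimal sketch. This is not PA-CNN itself: there is no learned PPN or PRN, and a single fixed attention map (a hypothetical input) stands in for the attention maps those networks would produce at each scale. At every scale the sketch finds the strongest response inside the current region and zooms into a smaller window around it, so each box is nested in the previous one. The helpers `peak`, `crop`, and `progressive_localize` are illustrative names, not part of any published code.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int]  # (top, left, size) in original-map coordinates


def peak(att: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest response; ties keep the first hit."""
    best = (0, 0)
    for r, row in enumerate(att):
        for c, v in enumerate(row):
            if v > att[best[0]][best[1]]:
                best = (r, c)
    return best


def crop(grid: Grid, center: Tuple[int, int], size: int) -> Tuple[Grid, Tuple[int, int]]:
    """Cut a size x size window around `center`, shifted to stay inside the grid.

    Returns the window and its top-left offset within `grid`.
    """
    h, w = len(grid), len(grid[0])
    if size > h or size > w:
        raise ValueError("window larger than region")
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return [row[left:left + size] for row in grid[top:top + size]], (top, left)


def progressive_localize(att: Grid, sizes: List[int]) -> List[Box]:
    """Coarse-to-fine zoom: one box per scale, each nested in the previous one.

    Hypothetical stand-in for stacked PPN/PRN stages: instead of a learned
    attention map per scale, the same fixed map is re-searched inside the
    current crop.
    """
    boxes: List[Box] = []
    region, oy, ox = att, 0, 0
    for size in sizes:
        r, c = peak(region)
        region, (ty, tx) = crop(region, (r, c), size)
        oy, ox = oy + ty, ox + tx  # accumulate offsets back to original coords
        boxes.append((oy, ox, size))
    return boxes


if __name__ == "__main__":
    att = [[0.0] * 8 for _ in range(8)]
    att[5][6] = 1.0  # toy "eye" response
    print(progressive_localize(att, [4, 2]))  # [(3, 4, 4), (4, 5, 2)]
```

For an 8×8 map whose strongest response sits at row 5, column 6, window sizes `[4, 2]` give a coarse box at (3, 4) of size 4 and a fine box at (4, 5) of size 2, both containing the peak. In the actual method each finer stage would compute its own attention on the cropped features rather than reuse one map.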
Health Inf Sci Syst
December 2025
School of Mathematics and Computing, University of Southern Queensland, 487-535 West Street, Toowoomba, QLD 4350 Australia.
Purpose: This paper aims to develop a three-dimensional (3D) Alzheimer's disease (AD) prediction method that improves on current predictive methods, which struggle to fully exploit structural magnetic resonance imaging (sMRI) data.
Methods: Traditional convolutional neural networks have difficulty focusing accurately on AD lesion structures. To address this issue, a 3D decoupling, self-attention network for AD prediction is proposed.
Med Image Anal
January 2025
Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China; Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Southern Medical University, Guangzhou, China. Electronic address:
Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implemented and systematically compared six FMs and six recent MIL methods, pairing different feature extractors with different aggregators across seven clinically relevant end-to-end prediction tasks using WSIs from 4044 patients with four different cancer types.
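The slide-level aggregation step that these pipelines vary can be sketched in a few lines. The code below is an illustration under stated assumptions, not one of the six MIL methods compared in the paper: it contrasts plain mean pooling with attention-based MIL pooling in the style of Ilse et al. (2018), where each patch embedding (for example, the output of a foundation model) receives a weight `a_k = softmax_k(w · tanh(V h_k))`. `V` and `w` are hypothetical parameters that would normally be learned; toy 2-D vectors stand in for real, much higher-dimensional patch embeddings.

```python
import math
from typing import List

Vec = List[float]


def mean_pool(patches: List[Vec]) -> Vec:
    """Slide embedding = element-wise average of the patch embeddings."""
    n = len(patches)
    return [sum(col) / n for col in zip(*patches)]


def attention_pool(patches: List[Vec], V: List[Vec], w: Vec) -> Vec:
    """Attention-based MIL pooling (illustrative; V and w are hypothetical).

    score_k = w . tanh(V h_k); weights = softmax(scores);
    slide embedding = sum_k weight_k * h_k.
    """
    scores = []
    for h in patches:
        # Project the patch embedding, squash with tanh, then score it.
        hidden = [math.tanh(sum(vi * hi for vi, hi in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    # Numerically stable softmax over patches.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patches[0])
    return [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0]]  # two toy patch embeddings
    print(mean_pool(patches))
    print(attention_pool(patches, [[10.0, 0.0]], [1.0]))
```

With an all-zero `V`, every patch receives the same score and attention pooling reduces to mean pooling; learning `V` and `w` is what lets such an aggregator emphasize the patches most relevant to the slide label.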
Med Biol Eng Comput
January 2025
Anhui BioX-Vision Biological Technology Co., Ltd, Hefei, 230031, Anhui, China.
The identification and categorization of circulating tumor cells (CTCs) in peripheral blood are imperative for advancing cancer diagnostics and prognostics. The complexity of the various CTC subtypes, coupled with the difficulty of building comprehensive datasets, has impeded progress in this specialized domain. To date, no methods have been dedicated exclusively to overcoming the classification challenges of CTCs.
Sci Rep
January 2025
Zhengzhou University of Light Industry, Zhengzhou, 450001, China.
Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve the generalization of VLMs, yet its reliance on fixed text templates as prior knowledge can limit its adaptability to fine-grained category distinctions.
Biol Open
January 2025
Faculty of Biology Medicine and Health, The University of Manchester, Manchester M13 9PT, UK.
In the developing mouse ventral spinal cord, HES5, a transcription factor downstream of Notch signalling, is expressed as evenly spaced clusters of high HES5-expressing neural progenitor cells along the dorsoventral axis. While Notch signalling requires direct membrane contact for its activation, we have previously shown mathematically that contact needs to extend beyond neighbouring cells for the HES5 pattern to emerge. However, the presence of cellular structures that could enable such long-distance signalling was unclear.