The Atlas Structure of Images.

IEEE Trans Pattern Anal Mach Intell

Published: January 2019

Many operations of vision require image regions to be isolated and inter-related. This is challenging when they are different in detail and extent. Practical methods of Computer Vision approach this through the tools of downsampling, pyramids, cropping and patches. In this paper we develop an ideal geometric structure for this, compatible with the existing scale space model of image measurement. Its elements are apertures which view the image like fuzzy-edged portholes of frosted glass. We establish containment and cause/effect relations between apertures, and show that these link them into cross-scale atlases. Atlases formed of Gaussian apertures are shown to be a continuous version of the image pyramid used in Computer Vision, and allow various types of image description to naturally be expressed within their framework. We show that views through Gaussian apertures are approximately equivalent to the jets of derivative of Gaussian filter responses that form part of standard Scale Space theory. This supports a view of the simple cells of mammalian V1 as implementing a system of local views of the retinal image of varying extent and resolution. As a worked example we develop a keypoint descriptor scheme that outperforms previous schemes that do not make use of learning.
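The abstract relates views through Gaussian apertures to jets of derivative-of-Gaussian filter responses. As a rough illustration only, the following sketch computes a zeroth- and first-order jet at one image point with a truncated, normalised Gaussian window. The function name `gaussian_jet`, the 3-sigma truncation, and the normalisation by the window sum are assumptions made for this example; they are not taken from the paper and do not reproduce its aperture or atlas construction.

```python
import math


def gaussian_jet(image, cx, cy, sigma):
    """Return (L, Lx, Ly): Gaussian-weighted mean and first-order
    derivative-of-Gaussian responses of `image` at pixel (cx, cy).

    `image` is a list of rows of floats. The window is truncated at
    3 * sigma and normalised by the sum of the weights that fall inside
    the image (illustrative choices, not the paper's definitions).
    """
    radius = int(math.ceil(3 * sigma))
    height, width = len(image), len(image[0])
    norm = 0.0
    l0 = lx = ly = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if not (0 <= y < height and 0 <= x < width):
                continue  # skip samples outside the image
            dx, dy = x - cx, y - cy
            w = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            value = image[y][x]
            norm += w
            l0 += w * value
            # Differentiating the Gaussian-smoothed image with respect to
            # the window centre gives the weight (offset / sigma^2) * w.
            lx += w * dx / (sigma * sigma) * value
            ly += w * dy / (sigma * sigma) * value
    return l0 / norm, lx / norm, ly / norm


# Usage: a horizontal ramp has unit slope in x and zero slope in y.
ramp = [[float(x) for x in range(21)] for _ in range(21)]
print(gaussian_jet(ramp, 10, 10, 2.0))
```

On the ramp, the x-derivative response comes out slightly below 1 because the window is truncated and sampled on a pixel grid. A larger truncation radius would reduce that bias at the cost of more computation.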

Source
http://dx.doi.org/10.1109/TPAMI.2017.2777856
