Current scene parsers effectively distill abstract relationships among refined instances, but they overlook the discrepancies that arise from variations in scene depth, which constrains their ability to imitate the intrinsic 3D perception of humans. In accordance with the principle of perspective, we advocate first grading the depth of a scene into several slices and then mining semantic correlations within a slice or across multiple slices. Two attention-based components, the Scene Depth Grading Module (SDGM) and the Edge-oriented Correlation Refining Module (EoCRM), form our framework, the Line-of-Sight Depth Network (LoSDN). SDGM grades the scene into several slices by computing depth attention tendencies from parameters with explicit physical meanings, e.g., albedo, occlusion, and specular embeddings. This process allocates multi-scale instances to each scene slice according to their line-of-sight extension distance, establishing a solid groundwork for ordered association mining in EoCRM. Since the primary step in distinguishing distant, faint targets is boundary delineation, EoCRM performs edge-wise saliency quantification and association mining. Quantitative and diagnostic experiments on the Cityscapes, ADE20K, and PASCAL Context datasets demonstrate the competitiveness of LoSDN and the individual contribution of each component. Visualizations show that our strategy offers clear benefits in detecting distant, faint targets.
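
To make the depth-grading idea concrete, below is a minimal, hypothetical sketch of how an SDGM-style module could soft-assign spatial locations to a small number of depth slices via attention over physically meaningful cues (albedo, occlusion, specular embeddings). All class, parameter, and tensor names are illustrative assumptions; the paper's actual layer design, cue encodings, and slice count are not specified in this abstract and may differ.

```python
# Hypothetical sketch of the depth-slicing idea: soft-assign each spatial
# location to one of K depth slices (near ... far) using attention between
# cue-augmented features and learnable per-slice queries. Names are illustrative.
import torch
import torch.nn as nn


class DepthSliceAttention(nn.Module):
    """Toy depth-grading head: per-pixel attention over K learnable slice queries."""

    def __init__(self, feat_dim: int = 256, num_slices: int = 4):
        super().__init__()
        # One learnable query per depth slice.
        self.slice_queries = nn.Parameter(torch.randn(num_slices, feat_dim))
        # Projects stacked physical cues (e.g., albedo, occlusion, specular)
        # into the same embedding space as the backbone features.
        self.cue_proj = nn.Linear(3, feat_dim)

    def forward(self, feats: torch.Tensor, cues: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; cues: (B, 3, H, W) physical cues.
        b, c, h, w = feats.shape
        x = feats.flatten(2).transpose(1, 2)                     # (B, HW, C)
        x = x + self.cue_proj(cues.flatten(2).transpose(1, 2))   # add cue embedding
        # Attention logits between every location and every slice query.
        logits = x @ self.slice_queries.t() / c ** 0.5           # (B, HW, K)
        assign = logits.softmax(dim=-1)                          # soft slice membership
        return assign.transpose(1, 2).reshape(b, -1, h, w)       # (B, K, H, W)


if __name__ == "__main__":
    sdgm = DepthSliceAttention(feat_dim=256, num_slices=4)
    feats = torch.randn(1, 256, 32, 64)
    cues = torch.randn(1, 3, 32, 64)
    print(sdgm(feats, cues).shape)  # torch.Size([1, 4, 32, 64])
```

A soft assignment of this kind keeps the grading differentiable, so slice membership can be learned jointly with a downstream correlation-mining stage rather than fixed by a hard depth threshold.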

Source: http://dx.doi.org/10.1109/TIP.2025.3540265


Similar Publications

Relating visual and pictorial space: Integration of binocular disparity and motion parallax.

J Vis, December 2024

BioMotionLab, Centre for Vision Research and Department of Biology, York University, Toronto, Ontario, Canada.

Traditionally, perceptual spaces are defined by the medium through which the visual environment is conveyed (e.g., in a physical environment, through a picture, or on a screen).

Article Synopsis
  • Advances in brain PET scanners have improved spatial resolution, but head movement remains a primary cause of image blur, necessitating real-time motion tracking.
  • A new electromagnetic motion tracking (EMMT) system has been developed to enable precise motion correction for PET-CT imaging.
  • The EMMT integrates with existing PET scanners and uses advanced sensors to track head movements in real time, significantly enhancing imaging performance and accuracy.

The aims of this paper are twofold: first, to discuss and analyze the concept of binocular disparity and second, to contrast the traditional "air theory" of three-dimensional vision with the much older "ground theory," first suggested by Ibn al-Haytham more than a thousand years ago. The origins of an "air theory" of perception can be traced back to Descartes and subsequently to the philosopher George Berkeley, who claimed that distance "could not be seen" because points lying along the same line of sight (in an empty space) would all project to the same location on the retina. However, Descartes was also aware that the angle of convergence of the two eyes could solve the problem of the "missing" information for the monocular observer and, since then, most visual scientists have assumed that eye vergence plays an important role both in judging absolute distance and for scaling retinal size and binocular disparities.

Article Synopsis
  • The text discusses the importance of split-second decision-making for NFL game officials and highlights key visual functions necessary for accurate officiating, such as visual acuity and unobstructed line of sight.
  • It reviews previous research showing that training in neuro-ophthalmic principles improves officials’ understanding and confidence in their decision-making.
  • Additionally, it explores the potential of virtual reality technology to create immersive training environments that simulate real NFL gameplay, allowing officials to practice important visual skills.
