The recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per-grid-cell object masks with fully convolutional networks, achieving performance comparable to the traditional two-stage Mask R-CNN while enjoying a much simpler architecture and higher efficiency. We observe that SOLO generates similar masks for an object at nearby grid cells, and that these neighboring predictions can complement each other, since some may better segment certain object parts; most of them, however, are directly discarded by non-maximum suppression. Motivated by this observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information while maintaining its architectural efficiency. The resulting model is named SODAR. Unlike the original per-grid-cell object masks, SODAR is implicitly supervised to learn mask representations that encode the geometric structure of nearby objects and complement adjacent representations with context. The aggregation method further includes two novel designs: 1) a mask interpolation mechanism that enables the model to generate far fewer mask representations by sharing neighboring representations among nearby grid cells, thus saving computation and memory; 2) a deformable neighbor sampling mechanism that allows the model to adaptively adjust neighbor sampling locations, thus gathering mask representations with more relevant context and achieving higher performance. SODAR significantly improves instance segmentation performance; e.g., it outperforms a SOLO model with a ResNet-101 backbone by 2.2 AP on the COCO test set, with only about 3% additional computation. We further show consistent performance gains with the SOLOv2 model.
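
The two sampling ideas mentioned above can be illustrated with a small, hedged sketch. This is not the authors' implementation: the function names (`bilinear_sample`, `gather_neighbors`, `aggregate_mean`), the 3x3 neighborhood, and the use of a plain mean in place of the learned aggregation are all assumptions made for illustration. It shows only how a representation might be read at a fractional grid location (as interpolation between stored representations would require) and how per-neighbor offsets could shift the sampling locations.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] holds one mask representation


def bilinear_sample(grid: Grid, y: float, x: float) -> Vector:
    """Bilinearly interpolate a representation at fractional location (y, x)."""
    rows, cols = len(grid), len(grid[0])
    # Clamp so that locations outside the grid reuse the border representations.
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * (1 - wx) * a + (1 - wy) * wx * b
        + wy * (1 - wx) * c + wy * wx * d
        for a, b, c, d in zip(
            grid[y0][x0], grid[y0][x1], grid[y1][x0], grid[y1][x1]
        )
    ]


def gather_neighbors(
    grid: Grid,
    row: int,
    col: int,
    offsets: Optional[Sequence[Tuple[float, float]]] = None,
) -> List[Vector]:
    """Sample the 3x3 neighborhood of (row, col).

    `offsets` (hypothetical) gives one (dy, dx) shift per neighbor, standing in
    for offsets that a deformable sampling module would predict.
    """
    base = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    if offsets is None:
        offsets = [(0.0, 0.0)] * len(base)
    return [
        bilinear_sample(grid, row + dy + oy, col + dx + ox)
        for (dy, dx), (oy, ox) in zip(base, offsets)
    ]


def aggregate_mean(neighbors: Sequence[Vector]) -> Vector:
    """Average the gathered representations (a stand-in for learned aggregation)."""
    n = len(neighbors)
    return [sum(v[k] for v in neighbors) / n for k in range(len(neighbors[0]))]


# Toy 3x3 grid of 1-dimensional representations with values 0..8.
toy_grid: Grid = [[[float(r * 3 + c)] for c in range(3)] for r in range(3)]
center = aggregate_mean(gather_neighbors(toy_grid, 1, 1))
halfway = bilinear_sample(toy_grid, 0.5, 0.5)
```

In this toy setup, sampling at a fractional location blends the four surrounding representations, and passing non-zero `offsets` moves each of the nine sampling points independently. A real model would predict those offsets and learn the aggregation rather than averaging.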

Source
http://dx.doi.org/10.1109/TIP.2021.3135717
