IEEE Trans Pattern Anal Mach Intell
December 2024
Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer.
View Article and Find Full Text PDFIn the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
View Article and Find Full Text PDFAttention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
June 2023
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
May 2023
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or single-scale features of multiple frames, where either temporal information or multi-scale information is ignored. To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
View Article and Find Full Text PDFModelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited to model such dependencies due to the naive structure in its building modules (e.g.
View Article and Find Full Text PDFGraph-based convolutional model such as non-local block has shown to be effective for strengthening the context modeling ability in convolutional neural networks (CNNs). However, its pixel-wise computational overhead is prohibitive which renders it unsuitable for high resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning.
View Article and Find Full Text PDFIn the stock market, return reversal occurs when investors sell overbought stocks and buy oversold stocks, reversing the stocks' price trends. In this paper, we develop a new method to identify key drivers of return reversal by incorporating a comprehensive set of factors derived from different economic theories into one unified dynamical Bayesian factor graph. We then use the model to depict factor relationships and their dynamics, from which we make some interesting discoveries about the mechanism behind return reversals.
View Article and Find Full Text PDF