Publications by authors named "Yunhai Tong"

Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer.

Article Synopsis
  • Referring Image Segmentation (RIS) traditionally outputs object masks based on text descriptions, but struggles with misleading descriptions that don't correspond to the image.
  • The authors introduce Robust Referring Image Segmentation (R-RIS), which accounts for both positive and negative sentence inputs to improve segmentation accuracy.
  • They also present a new transformer model, RefSegformer, and create datasets and metrics to evaluate this approach, achieving state-of-the-art results for both RIS and R-RIS.

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open-vocabulary settings were proposed due to the rapid progress of vision-language pre-training.

Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions.

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures.

Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or single-scale features of multiple frames, where either temporal information or multi-scale information is ignored. To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited in modelling such dependencies because of the naive structure of their building modules (e.g.

Graph-based convolutional models such as the non-local block have been shown to be effective for strengthening the context modeling ability of convolutional neural networks (CNNs). However, their pixel-wise computational overhead is prohibitive, which renders them unsuitable for high-resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning.

In the stock market, return reversal occurs when investors sell overbought stocks and buy oversold stocks, reversing the stocks' price trends. In this paper, we develop a new method to identify key drivers of return reversal by incorporating a comprehensive set of factors derived from different economic theories into one unified dynamical Bayesian factor graph. We then use the model to depict factor relationships and their dynamics, from which we make some interesting discoveries about the mechanism behind return reversals.

Article Synopsis
  • The electrocardiogram (ECG) is used to help doctors diagnose heart diseases and has a long history of development.
  • The article reviews important breakthroughs in ECG technology and its various uses, like identifying conditions such as long QT syndrome and heart attacks.
  • It also highlights current hot topics in ECG research and predicts future trends, focusing on making ECG tools better and using data to improve heart health diagnoses.