IEEE Trans Pattern Anal Mach Intell
December 2024
We propose integrally pre-trained transformer pyramid network (iTPN), towards jointly optimizing the network backbone and the neck, so that transfer gap between representation models and downstream tasks is minimal. iTPN is born with two elaborated designs: 1) The first pre-trained feature pyramid upon vision transformer (ViT). 2) Multi-stage supervision to the feature pyramid using masked feature modeling (MFM).
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
July 2024
Conventional neural architecture search (NAS) algorithms typically work on search spaces with short-distance node connections. We argue that such designs, though safe and stable, are obstacles to exploring more effective network architectures. In this brief, we explore the search algorithm upon a complicated search space with long-distance connections and show that existing weight-sharing search algorithms fail due to the existence of interleaved connections (ICs).
View Article and Find Full Text PDFThere are two mainstream approaches for object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches show competitive performance compared with top-down approaches and have higher recall rates.
View Article and Find Full Text PDFWeather forecasting is important for science and society. At present, the most accurate forecast system is the numerical weather prediction (NWP) method, which represents atmospheric states as discretized grids and numerically solves partial differential equations that describe the transition between those states. However, this procedure is computationally expensive.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
October 2023
Style-based GANs achieve state-of-the-art results for generating high-quality images, but lack explicit and precise control over camera poses. Recently proposed NeRF-based GANs have made great progress towards 3D-aware image generation. However, the methods either rely on convolution operators which are not rotationally invariant, or utilize complex yet suboptimal training procedures to integrate both NeRF and CNN sub-structures, yielding un-robust, low-quality images with a large computational burden.
View Article and Find Full Text PDFPre-training on large-scale datasets has played an increasingly significant role in computer vision and natural language processing recently. However, as there exist numerous application scenarios that have distinctive demands such as certain latency constraints and specialized data distributions, it is prohibitively expensive to take advantage of large-scale pre-training for per-task requirements. we focus on two fundamental perception tasks (object detection and semantic segmentation) and present a complete and flexible system named GAIA-Universe(GAIA), which could automatically and efficiently give birth to customized solutions according to heterogeneous downstream needs through data union and super-net training.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
October 2023
Few-shot class-incremental learning (FSCIL) faces the challenges of memorizing old class distributions and estimating new class distributions given few training samples. In this study, we propose a learnable distribution calibration (LDC) approach, to systematically solve these two challenges using a unified framework. LDC is built upon a parameterized calibration unit (PCU), which initializes biased distributions for all classes based on classifier vectors (memory-free) and a single covariance matrix.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
August 2023
In order to enable the model to generalize to unseen "action-objects" (compositional action), previous methods encode multiple pieces of information (i.e., the appearance, position, and identity of visual instances) independently and concatenate them for classification.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
August 2023
The rapid development of deep learning has made a great progress in image segmentation, one of the fundamental tasks of computer vision. However, the current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious. To alleviate this burden, the past years have witnessed an increasing attention in building label-efficient, deep-learning-based image segmentation algorithms.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
August 2023
With convolution operations, Convolutional Neural Networks (CNNs) are good at extracting local features but experience difficulty to capture global representations. With cascaded self-attention modules, vision transformers can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take both advantages of convolution operations and self-attention mechanisms for enhanced representation learning.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
July 2023
Batch normalization (BN) is a fundamental unit in modern deep neural networks. However, BN and its variants focus on normalization statistics but neglect the recovery step that uses linear transformation to improve the capacity of fitting complex data distributions. In this paper, we demonstrate that the recovery step can be improved by aggregating the neighborhood of each neuron rather than just considering a single neuron.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
March 2023
Self-supervised learning based on instance discrimination has shown remarkable progress. In particular, contrastive learning, which regards each image as well as its augmentations as an individual class and tries to distinguish them from all other images, has been verified effective for representation learning. However, conventional contrastive learning does not model the relation between semantically similar samples explicitly.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
September 2021
Differentiable architecture search (DARTS) enables effective neural architecture search (NAS) using gradient descent, but suffers from high memory and computational costs. In this paper, we propose a novel approach, namely Partially-Connected DARTS (PC-DARTS), to achieve efficient and stable neural architecture search by reducing the channel and spatial redundancies of the super-network. In the channel level, partial channel connection is presented to randomly sample a small subset of channels for operation selection to accelerate the search process and suppress the over-fitting of the super-network.
View Article and Find Full Text PDFIEEE Trans Image Process
February 2021
Natural language moment localization aims at localizing video clips according to a natural language description. The key to this challenging task lies in modeling the relationship between verbal descriptions and visual contents. Existing approaches often sample a number of clips from the video, and individually determine how each of them is related to the query sentence.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
June 2023
Group activity recognition (GAR) is a challenging task aimed at recognizing the behavior of a group of people. It is a complex inference process in which visual cues collected from individuals are integrated into the final prediction, being aware of the interaction between them. This paper goes one step further beyond the existing approaches by designing a Hierarchical Graph-based Cross Inference Network (HiGCIN), in which three levels of information, i.
View Article and Find Full Text PDFIEEE Trans Med Imaging
February 2020
We aim at segmenting a wide variety of organs, including tiny targets (e.g., adrenal gland), and neoplasms (e.
View Article and Find Full Text PDFIEEE Trans Image Process
November 2015
State-of-the-art web image search frameworks are often based on the bag-of-visual-words (BoVWs) model and the inverted index structure. Despite the simplicity, efficiency, and scalability, they often suffer from low precision and/or recall, due to the limited stability of local features and the considerable information loss on the quantization stage. To refine the quality of retrieved images, various postprocessing methods have been adopted after the initial search process.
View Article and Find Full Text PDFIEEE Trans Image Process
May 2014
In image classification tasks, one of the most successful algorithms is the bag-of-features (BoFs) model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still suffers from several drawbacks, including the limited semantic description of local descriptors, lack of robust structures upon single visual words, and missing of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection.
View Article and Find Full Text PDF