6,845 results match your criteria: "IEEE transactions on pattern analysis and machine intelligence[Journal]"

The Segment Anything Model (SAM), a profound vision foundation model pretrained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in segmentation across four hierarchies, including pixel-level text, word, text-line, and paragraph, while realizing layout analysis as well.

View Article and Find Full Text PDF

In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA or protein sequences data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects.

View Article and Find Full Text PDF

Information theory is an outstanding framework for measuring uncertainty, dependence, and relevance in data and systems. It has several desirable properties for real-world applications: naturally deals with multivariate data, can handle heterogeneous data, and the measures can be interpreted. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.

View Article and Find Full Text PDF

Deep networks on 3D point clouds have achieved remarkable success in 3D classification, but they are vulnerable to geometric variations resulting from inconsistent data acquisition procedures. This leads to challenging 3D domain generalization and adaptation tasks, aiming to tackle the challenge that the performance of a model trained on a source domain will degrade on an out-of-distribution target domain. In this paper, we introduce a novel Multi-Scale Part-based feature Representation, dubbed MSPR, as a generalizable representation for point cloud domain generalization and adaptation.

View Article and Find Full Text PDF

Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and only frames at corresponding temporal intervals can be reconstructed, while the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid "intensity + event" imaging scheme by incorporating an event camera into a video SCI setup.

View Article and Find Full Text PDF

We present a novel deep camera path optimization framework for minimum latency online video stabilization. Typically, a stabilization pipeline consists of three steps: motion estimation, path smoothing, and novel view synthesis. Most previous methods concentrate on motion estimation while path optimization receives less attention, particularly in the crucial online setting where future frames are inaccessible.

View Article and Find Full Text PDF

Insights on 'Complex-Valued Iris Recognition Network'.

IEEE Trans Pattern Anal Mach Intell

November 2024

Article Synopsis
  • - The paper discusses a new iris recognition algorithm published in TPAMI, which has some interesting aspects.
  • - However, there are several inconsistencies and errors identified in their methodology and findings.
  • - The authors emphasize the unfair comparison with current leading methods and aim to provide clarity for other researchers in the biometrics field.
View Article and Find Full Text PDF

Depicting novel classes with language descriptions by observing few-shot samples is inherent in human-learning systems. This lifelong learning capability helps to distinguish new knowledge from old ones through the increase of open-world learning, namely Few-Shot Class-Incremental Learning (FSCIL). Existing works to solve this problem mainly rely on the careful tuning of visual encoders, which shows an evident trade-off between the base knowledge and incremental ones.

View Article and Find Full Text PDF

In the Image Aesthetics Computing (IAC) field, most prior methods leveraged the off-the-shelf backbones pre-trained on the large-scale ImageNet database. While these pre-trained backbones have achieved notable success, they often overemphasize object-level semantics and fail to capture the high-level concepts of image aesthetics, which may only achieve suboptimal performances. To tackle this long-neglected problem, we propose a multi-modality multi-attribute contrastive pre-training framework, targeting at constructing an alternative to ImageNet-based pre-training for IAC.

View Article and Find Full Text PDF
Article Synopsis
  • This paper tackles the difficult issue of source-free unsupervised domain adaptation (SFUDA) specifically for semantic segmentation in pinhole and panoramic images, using a pre-trained model and unlabeled target images.
  • The authors highlight three main challenges: differences in field-of-view (FoV) between domains, style discrepancies, and distortions in panoramic images.
  • To address these, they propose 360SFUDA++, employing Tangent Projection for effective knowledge extraction, and introducing modules for reliable knowledge adaptation and feature alignment between the pinhole and panoramic domains, resulting in improved performance on various benchmarks.
View Article and Find Full Text PDF

Existing Masked Image Modeling methods apply fixed mask patterns to guide the self-supervised training. As those mask patterns resort to different criteria to depict image contents, sticking to a fixed pattern leads to a limited vision cues modeling capability. This paper introduces an evolved hierarchical masking method to pursue general visual cues modeling in self-supervised learning.

View Article and Find Full Text PDF

Regular unsupervised domain adaptive person re-identification (ReID) focuses on adapting a model from a source domain to a fixed target domain. However, an adapted ReID model can hardly retain previously-acquired knowledge and generalize to unseen data. In this paper, we propose a Dual-level Joint Adaptation and Anti-forgetting (DJAA) framework, which incrementally adapts a model to new domains without forgetting source domain and each adapted target domain.

View Article and Find Full Text PDF

Offset-based representation has emerged as a promising approach for modeling semantic relations between pixels and object motion, demonstrating efficacy across various computer vision tasks. In this paper, we introduce a novel one-stage multi-tasking network tailored to extend the offset-based approach to MOTS. Our proposed framework, named OffsetNet, is designed to concurrently address amodal bounding box detection, instance segmentation, and tracking.

View Article and Find Full Text PDF

Modeling count data using suitable statistical distributions has been instrumental for analyzing the patterns it conveys. However, failing to address critical aspects, like overdispersion, jeopardizes the effectiveness of such an analysis. In this paper, overdispersed count data is modeled using the Dirichlet Multinomial (DM) distribution by maximizing its likelihood using a fixed-point iteration algorithm.

View Article and Find Full Text PDF

Applying current machine learning algorithms in complex and open environments remains challenging, especially when different changing elements are coupled and the training data is scarce. For example, in the activity recognition task, the motion sensors may change position or fall off due to the intensity of the activity, leading to changes in feature space and finally resulting in label noise. Learning from such a problem where the dynamic features are coupled with noisy labels is crucial but rarely studied, particularly when the noisy samples in new feature space are limited.

View Article and Find Full Text PDF

Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012 - 2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning.

View Article and Find Full Text PDF

The Decoupling Concept Bottleneck Model.

IEEE Trans Pattern Anal Mach Intell

November 2024

The Concept Bottleneck Model (CBM) is an interpretable neural network that leverages high-level concepts to explain model decisions and conduct human-machine interaction. However, in real-world scenarios, the deficiency of informative concepts can impede the model's interpretability and subsequent interventions. This paper proves that insufficient concept information can lead to an inherent dilemma of concept and label distortions in CBM.

View Article and Find Full Text PDF

Restoration tasks in low-level vision aim to restore high-quality (HQ) data from their low-quality (LQ) observations. To circumvents the difficulty of acquiring paired data in real scenarios, unpaired approaches that aim to restore HQ data solely on unpaired data are drawing increasing interest. Since restoration tasks are tightly coupled with the degradation model, unknown and highly diverse degradations in real scenarios make learning from unpaired data quite challenging.

View Article and Find Full Text PDF

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions of seen primitives. Prior studies have attempted to either learn primitives individually (non-connected) or establish dependencies among them in the composition (fully-connected). In contrast, human comprehension of composition diverges from the aforementioned methods as humans possess the ability to make composition-aware adaptation for these primitives, instead of inferring them rigidly through the aforementioned methods.

View Article and Find Full Text PDF

Non-maximum suppression (NMS) is an essential post-processing step for object detection. The de-facto standard for NMS, namely GreedyNMS, is not parallelizable and could thus be the performance bottleneck in object detection pipelines. MaxpoolNMS is introduced as a fast and parallelizable alternative to GreedyNMS.

View Article and Find Full Text PDF

Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.

View Article and Find Full Text PDF

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, both attributes and objects share the visual representations learned above, leading the model to exploit spurious correlations and bias towards seen compositions.

View Article and Find Full Text PDF

The utilization of synthetic data for fingerprint recognition has garnered increased attention due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating impressions of the same finger with useful intra-class variations. To tackle this challenge, we present GenPrint, a framework to produce fingerprint images of various types while maintaining identity and offering humanly understandable control over different appearance factors such as fingerprint class, acquisition type, sensor device, and quality level.

View Article and Find Full Text PDF

The act of telling stories is a fundamental part of what it means to be human. This work introduces the concept of narrative information, which we define as the overlap in information space between a story and the items that compose the story. Using contrastive learning methods, we show how modern artificial neural networks can be leveraged to distill stories and extract a representation of the narrative information.

View Article and Find Full Text PDF