Publications by authors named "Xiatian Zhu"

Article Synopsis
  • Few-shot Semantic Segmentation (FSS) allows pretrained models to learn new object classes with minimal labeled training data, but current methods struggle with complex backgrounds, particularly in medical imaging.
  • The proposed Detail Self-refined Prototype Network (DSPNet) improves segmentation accuracy by creating detailed prototypes that better represent both foreground objects and background contexts.
  • Experimental results on challenging medical image datasets demonstrate that DSPNet outperforms existing state-of-the-art techniques, and the available code can be found at the provided GitHub link.
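DSPNet's detail self-refined prototypes are more elaborate than this, but the generic prototype mechanism it and many other FSS methods build on can be sketched as masked average pooling over support features followed by similarity matching against the query. The function names and toy features below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def masked_average_prototype(support_feats, support_mask):
    """Masked average pooling: mean of the support features inside the mask."""
    return support_feats[support_mask.astype(bool)].mean(axis=0)

def cosine_score_map(query_feats, prototype, eps=1e-8):
    """Cosine similarity between every query location and the prototype."""
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return q @ p  # (H, W) score map

# Toy feature map: foreground features point one way, background the other.
feats = np.tile(np.array([1.0, 0.0]), (4, 4, 1))
feats[2:, 2:] = np.array([0.0, 1.0])      # "background" corner
mask = np.zeros((4, 4))
mask[:2, :2] = 1                          # support foreground region
proto = masked_average_prototype(feats, mask)
scores = cosine_score_map(feats, proto)   # high where features match the prototype
```

A single averaged prototype washes out fine detail, which is precisely the weakness the synopsis says DSPNet's detailed foreground and background prototypes address.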

The new generation of organic light-emitting diode displays is designed to enable high dynamic range (HDR), going beyond the standard dynamic range (SDR) supported by traditional display devices. However, a large quantity of video content is still in SDR format. Further, most pre-existing videos are compressed to varying degrees to minimize storage and traffic demands.


The Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and Big Data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented toward multimodal data.


Background: To quantify the association between the free distal segment length of the internal carotid artery (FDS-ICA) and permanent cranial nerve injury (p-CNI) following carotid body tumor (CBT) resection.

Methods: This was a case-control study of 109 consecutive patients who underwent CBT resection between June 2015 and June 2020 at our single center.


State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms for learning from an abundance of unlabeled visual data.


Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited in modelling such dependencies due to the naive structure of their building modules (e.g.


Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution diverges from the target one. Mainstream UDA methods strive to learn domain-aligned features such that classifiers trained on the source features can be readily applied to the target ones. Although impressive results have been achieved, these methods have a potential risk of damaging the intrinsic data structures of target discrimination, raising an issue of generalization particularly for UDA tasks in an inductive setting.
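As a deliberately simplified illustration of the domain-alignment idea these methods pursue, the sketch below measures a first-order MMD-style statistic, the squared gap between source and target feature means, and shows that naive mean shifting drives it to zero. Real UDA methods align far richer statistics; all names and toy data here are illustrative:

```python
import numpy as np

def linear_mmd(source_feats, target_feats):
    """Squared distance between domain feature means: a first-order
    proxy for the alignment objectives mainstream UDA methods optimise."""
    gap = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(gap @ gap)

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 8))            # source-domain features
tgt = rng.normal(0.5, 1.0, size=(200, 8))            # shifted target domain
aligned = tgt - tgt.mean(axis=0) + src.mean(axis=0)  # crude mean alignment

before = linear_mmd(src, tgt)
after = linear_mmd(src, aligned)                     # numerically zero
```

Note that such shifting moves every target sample by the same vector regardless of its class, which is exactly the kind of aggressive alignment that can damage the target domain's intrinsic discriminative structure, the risk the abstract raises.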


Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in a practical re-id deployment, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera-pair. In this work, we present an unsupervised re-id deep learning approach.


Model learning from class-imbalanced training data is a long-standing and significant challenge for machine learning. In particular, existing deep learning methods mostly consider either class-balanced data or moderately imbalanced data in model training, ignoring the challenge of learning from significantly imbalanced training data. To address this problem, we formulate a class-imbalanced deep learning model based on batch-wise incremental minority (sparsely sampled) class rectification, using hard sample mining in the majority (frequently sampled) classes during model training.
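The paper's rectification loss is its own; as a hedged sketch of the general batch-wise hard-mining idea it describes, the toy function below mines, for each minority-class anchor, the hardest positive (farthest same-class sample) and the hardest majority-class negative (closest sample), then applies a hinge margin. All names, data, and the margin value are illustrative:

```python
import numpy as np

def minority_hard_mining_loss(feats, labels, minority_class, margin=0.5):
    """Triplet-style minority rectification on one batch: pull the hardest
    same-class positive closer than the hardest majority-class negative."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)  # pairwise dists
    idx = np.arange(len(labels))
    losses = []
    for i in idx[labels == minority_class]:
        pos = idx[(labels == minority_class) & (idx != i)]
        neg = idx[labels != minority_class]
        if len(pos) == 0 or len(neg) == 0:
            continue
        losses.append(max(0.0, d[i, pos].max() - d[i, neg].min() + margin))
    return float(np.mean(losses)) if losses else 0.0

labels = np.array([0, 0, 1, 1])  # class 0 plays the minority in this toy batch
well_separated = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
overlapping = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.0], [0.3, 0.0]])
loss_sep = minority_hard_mining_loss(well_separated, labels, minority_class=0)
loss_mix = minority_hard_mining_loss(overlapping, labels, minority_class=0)
```

The loss vanishes when minority samples already sit far from the majority classes and grows when they are crowded by them, so only the minority classes are incrementally rectified batch by batch.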


Most existing person re-identification (re-id) methods are unsuitable for real-world deployment for two reasons. In this work, we present a unified solution to address both problems. Specifically, we propose to construct an identity regression space (IRS) based on embedding different training person identities (classes) and formulate re-id as a regression problem solved by identity regression in the IRS.
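The IRS formulation itself is more involved, but the core move, regressing features onto identity codes rather than classifying them, can be sketched with plain ridge regression onto one-hot identity targets. Everything below (names, toy data, the cosine matcher) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def fit_identity_regression(X, labels, num_ids, lam=0.1):
    """Ridge-regress features onto one-hot identity codes:
    W = (X^T X + lam*I)^{-1} X^T Y."""
    Y = np.eye(num_ids)[labels]          # (n, num_ids) one-hot identity targets
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def match(W, probe, gallery):
    """Match probe to gallery by cosine similarity in the regression space."""
    p, G = probe @ W, gallery @ W
    p = p / np.linalg.norm(p)
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    return int(np.argmax(G @ p))

X = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],    # identity 0
              [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])   # identity 1
labels = np.array([0, 0, 1, 1])
W = fit_identity_regression(X, labels, num_ids=2)
gallery = np.array([[1.0, 0.05, 0.0], [0.05, 1.0, 0.0]])
probe = np.array([0.95, 0.0, 0.0])                 # unseen image of identity 0
```

A closed-form regression like this needs no iterative classifier retraining when identities are added, which hints at why a regression space is attractive for deployment.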


Existing person re-identification (re-id) methods typically assume that: 1) any probe person is guaranteed to appear in the gallery target population during deployment (i.e., closed-world) and 2) the probe set contains only a limited number of people (i.
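A minimal sketch of what relaxing assumption 1) entails, assuming a simple similarity threshold for rejecting probes that are absent from the gallery (the threshold value and all names are illustrative, not a method from the paper):

```python
import numpy as np

def open_world_match(probe, gallery, threshold):
    """Return the best-matching gallery index, or -1 when even the best
    similarity falls below the threshold (probe treated as absent)."""
    sims = gallery @ probe / (np.linalg.norm(gallery, axis=1)
                              * np.linalg.norm(probe))
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else -1

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
hit = open_world_match(np.array([0.9, 0.1]), gallery, threshold=0.8)   # in gallery
miss = open_world_match(np.array([0.7, 0.7]), gallery, threshold=0.8)  # rejected
```

A closed-world matcher would always return some gallery index; the reject option is what makes the setting open-world.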


The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learned are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.
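A toy sketch of the view-specific idea, assuming each camera gets its own linear transform that undoes that view's feature distortion before matching. The transforms below are hand-set for illustration; in the paper's setting they would be learned:

```python
import numpy as np

def view_specific_distance(x_a, x_b, W_a, W_b):
    """Distance after each image is mapped by its own camera's transform
    into a shared space (a view-generic model shares one transform)."""
    return float(np.linalg.norm(W_a @ x_a - W_b @ x_b))

# Toy distortion: camera b doubles the second feature; its view-specific
# transform undoes exactly that.
W_a = np.eye(2)
W_b = np.diag([1.0, 0.5])
person_in_view_a = np.array([1.0, 2.0])
person_in_view_b = np.array([1.0, 4.0])   # same person under view-b distortion

d_specific = view_specific_distance(person_in_view_a, person_in_view_b, W_a, W_b)
d_generic = view_specific_distance(person_in_view_a, person_in_view_b, W_a, np.eye(2))
```

The view-specific transform collapses the same-person distance to zero, while the view-generic identity transform leaves the full distortion in place, which is the contrast the abstract draws.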


Current person re-identification (ReID) methods typically rely on single-frame imagery features, ignoring the space-time information from image sequences that is often available in practical surveillance scenarios. Single-frame (single-shot) visual appearance matching is inherently limited for person ReID in public spaces, owing to the visual ambiguity and uncertainty arising from non-overlapping camera views, where changes in viewing conditions can cause significant variations in people's appearance. In this work, we present a novel model that automatically selects the most discriminative video fragments from noisy/incomplete image sequences of people, from which reliable space-time and appearance features can be computed, whilst simultaneously learning a video ranking function for person ReID.
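A hedged sketch of the selection-then-ranking pipeline described above, with the caveat that the paper learns fragment selection and the ranking function jointly, whereas here the fragment scores are simply given and the ranking is a plain nearest-neighbour sort (all names and toy data are illustrative):

```python
import numpy as np

def select_fragment(fragment_feats, scores):
    """Keep the fragment with the highest discriminativeness score."""
    return fragment_feats[int(np.argmax(scores))]

def rank_gallery(probe_feat, gallery_feats):
    """Order gallery entries by ascending distance to the probe feature."""
    return np.argsort(np.linalg.norm(gallery_feats - probe_feat, axis=1))

fragments = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]])  # one noisy sequence
scores = np.array([0.1, 0.9, 0.3])                          # fragment quality
probe = select_fragment(fragments, scores)
gallery = np.array([[1.1, 0.9], [5.0, 5.0]])
ranking = rank_gallery(probe, gallery)                      # best match first
```

Discarding the low-scoring fragments before matching is what shields the ranking from the noisy/incomplete parts of the sequence.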


While clustering is usually an unsupervised operation, there are circumstances where we have access to a prior belief that pairs of samples should (or should not) be assigned to the same cluster. Constrained clustering aims to exploit this prior belief as a constraint (or weak supervision) to influence cluster formation so as to obtain a data structure that more closely resembles human perception. Two important issues remain open: 1) how to exploit sparse constraints effectively and 2) how to handle ill-conditioned/noisy constraints generated by imperfect oracles.
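A minimal sketch of how a cannot-link constraint can influence cluster formation, in the style of a COP-k-means assignment step (the function and toy data are illustrative; the paper's models for sparse and noisy constraints are more sophisticated):

```python
import numpy as np

def constrained_assign(idx, X, centroids, assign, cannot_link):
    """Assign point `idx` the nearest centroid that does not clash with a
    cannot-link partner that has already been assigned."""
    conflicts = ({b for a, b in cannot_link if a == idx}
                 | {a for a, b in cannot_link if b == idx})
    order = np.argsort(np.linalg.norm(centroids - X[idx], axis=1))
    for c in order:
        if all(assign.get(p) != int(c) for p in conflicts):
            return int(c)
    return int(order[0])  # every option violates: fall back to nearest

X = np.array([[0.1], [0.2], [5.0]])
centroids = np.array([[0.0], [5.0]])
assign = {}
assign[0] = constrained_assign(0, X, centroids, assign, [(0, 1)])
assign[1] = constrained_assign(1, X, centroids, assign, [(0, 1)])
```

Point 1 is geometrically closest to the first centroid, but the cannot-link pair (0, 1) forces it into the other cluster: the constraint, not the distance, decides. A noisy constraint from an imperfect oracle would be obeyed just as blindly here, which is precisely the second open issue the abstract raises.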

