Publications by authors named "Jungong Han"

Prompt learning is a powerful technique that enables the transfer of Vision-Language Models (VLMs) like CLIP to downstream tasks. However, when prompt-based methods are fine-tuned solely on base classes, they often struggle to generalize to novel classes that lack visual samples during training, especially when training data are limited. To address this challenge, we propose Synth-CLIP, an approach that leverages synthetic data to enhance CLIP's generalization capability for base classes and its general capability for novel classes.

In many real-life applications, the types and numbers of inputs to a salient object detection (SOD) algorithm may change dynamically. However, existing SOD algorithms are mainly designed or trained for one particular type of input and fail to generalize to other types. Consequently, multiple SOD algorithms must be prepared in advance to handle different types of inputs, incurring substantial hardware and research costs.

For visible-infrared person re-identification (VI-ReID), current models that compensate for modality-specific information strive to generate missing-modality images from existing ones to bridge cross-modality discrepancies. Despite that, the generated images often suffer from low quality due to the significant modality gap and include interfering information, e.g.

Recently, fast Magnetic Resonance Imaging reconstruction technology has emerged as a promising way to improve the clinical diagnostic experience by significantly reducing scan times. While existing studies have used Generative Adversarial Networks to achieve impressive results in reconstructing MR images, they still suffer from challenges such as blurred zones/boundaries and abnormal spots caused by inevitable noise in the reconstruction process. To this end, we propose a novel deep framework termed Anisotropic Diffusion-Assisted Generative Adversarial Networks, which aims to maximally preserve valid high-frequency information and structural details while minimizing noise in reconstructed images by optimizing a joint loss function in a unified framework.

Few-Shot Class-Incremental Learning (FSCIL) aims at incrementally learning new knowledge from limited training examples without forgetting previous knowledge. However, we observe that existing methods face a challenge known as supervision collapse, where the model disproportionately emphasizes class-specific features of base classes to the detriment of novel class representations, leading to restricted cognitive capabilities. To alleviate this issue, we propose a new framework, Model aTtention Expansion for Few-Shot Class-Incremental Learning (MTE-FSCIL), aimed at expanding the model attention fields to improve transferability without compromising the discriminative capability for base classes.

In RGB-T tracking, there exist rich spatial relationships between the target and backgrounds within multi-modal data as well as sound consistencies of spatial relationships among successive frames, which are crucial for boosting the tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, hindering them from robust tracking and practical applications in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture for the construction of reliable multi-modal spatial context information and the effective propagation of temporal context information.

Deep neural networks tend to overfit when the training data are insufficient. In this paper, we introduce two metrics derived from the intra-class distribution of correctly and incorrectly predicted samples to provide a new perspective on the overfitting issue. Based on these metrics, we propose Tolerant Self-Distillation (TSD), a knowledge distillation approach that alleviates overfitting without pretraining a teacher model in advance.

Supervised learning-based image classification in computer vision relies on visual samples containing a large amount of labeled information. Considering that it is labor-intensive to collect and label images and construct datasets manually, Zero-Shot Learning (ZSL) achieves knowledge transfer from seen categories to unseen categories by mining auxiliary information, which reduces the dependence on labeled image samples and is one of the current research hotspots in computer vision. However, most ZSL methods fail to properly measure the relationships between classes, or do not consider the differences and similarities between classes at all.

Due to the costliness of labelled data in real-world applications, semi-supervised learning, underpinned by pseudo labelling, is an appealing solution. However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise the model generalisation while using them for training would exacerbate the issue of confirmation bias caused by the resulting inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction.
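
The trade-off described above can be illustrated with a minimal sketch of threshold-based pseudo labelling. This is a generic baseline under stated assumptions, not the method proposed in the paper: the function name `split_by_confidence`, the threshold value, and the toy probabilities are all hypothetical. Samples whose top predicted probability falls below the threshold are the "confusing" ones that a plain pipeline would discard.

```python
def split_by_confidence(probabilities, threshold=0.95):
    """Split unlabelled samples into confident pseudo-labels and confusing samples.

    `probabilities` is a list of per-class probability lists, one per sample.
    The threshold of 0.95 is an illustrative assumption, not a value from the paper.
    """
    confident, confusing = [], []
    for idx, probs in enumerate(probabilities):
        # Predicted class is the index of the largest probability.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            confident.append((idx, label))  # kept as a pseudo-labelled sample
        else:
            confusing.append(idx)  # a plain baseline would drop these
    return confident, confusing


# Hypothetical model outputs for three unlabelled samples.
toy_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.02, 0.97, 0.01],
]
confident, confusing = split_by_confidence(toy_probs)
```

Raising the threshold discards more samples, while lowering it admits more mislabelled ones; this is the tension the abstract refers to.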

Capsule networks (CapsNets) are known to be difficult to scale to deeper architectures, which are desirable for high performance in the deep learning era, because of their complex capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm based on residual pose routing. Specifically, the higher-layer capsule pose is obtained by an identity mapping on the adjacent lower-layer capsule pose.

Continual learning (CL) aims at studying how to learn new knowledge continuously from data streams without catastrophically forgetting the previous knowledge. One of the key problems is catastrophic forgetting, that is, the performance of the model on previous tasks declines significantly after learning the subsequent task. Several studies addressed it by replaying samples stored in the buffer when training new tasks.
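
As a rough illustration of buffer-based replay, the sketch below mixes samples from a fixed-size buffer into each new training batch. It is a generic sketch under stated assumptions, not the approach of this paper: the names `ReplayBuffer` and `mixed_batch` are hypothetical, and reservoir sampling is one assumed choice of buffer update policy among several.

```python
import random


class ReplayBuffer:
    """Fixed-size buffer updated by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of samples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Keep the new sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def sample(self, k):
        # Draw up to k stored samples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_batch, buffer, replay_size):
    """Combine current-task samples with replayed ones, then store the new ones."""
    batch = list(new_batch) + buffer.sample(replay_size)
    for s in new_batch:
        buffer.add(s)
    return batch


buf = ReplayBuffer(capacity=4)
first = mixed_batch([("task1", i) for i in range(10)], buf, replay_size=2)
second = mixed_batch([("task2", i) for i in range(3)], buf, replay_size=2)
```

In this toy run, `second` contains the three new `task2` samples plus two replayed `task1` samples, so earlier data keeps contributing to training on the later task.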

Background: Individual differences have been detected in individuals with opioid use disorders (OUD) in rehabilitation following protracted abstinence. Recent studies have suggested that prediction models based on neuroimaging data are effective for individual-level prognosis in substance use disorders (SUD).

Aims: This prospective cohort study aimed to assess neuroimaging biomarkers for individual response to protracted abstinence in opioid users using connectome-based predictive modelling (CPM).

Patients with mild traumatic brain injury have a diverse clinical presentation, and the underlying pathophysiology remains poorly understood. Magnetic resonance imaging is a non-invasive technique that has been widely utilized to investigate neurobiological markers after mild traumatic brain injury. This approach has emerged as a promising tool for investigating the pathogenesis of mild traumatic brain injury.

In Few-Shot Learning (FSL), the objective is to correctly recognize new samples from novel classes with only a few available samples per class. Existing methods in FSL primarily focus on learning transferable knowledge from base classes by maximizing the information between feature representations and their corresponding labels. However, this approach may suffer from the "supervision collapse" issue, which arises due to a bias towards the base classes.

The existence of redundancy in convolutional neural networks (CNNs) enables us to remove some filters/channels with acceptable performance drops. However, the training objective of CNNs usually minimizes an accuracy-related loss function without paying any attention to the redundancy, so the redundancy is distributed randomly across all the filters. As a result, removing any of them may trigger information loss and an accuracy drop, necessitating a fine-tuning step for recovery. In this article, we propose to manipulate the redundancy during training to facilitate network pruning.

Semantic segmentation models gain robustness against adverse illumination conditions by taking advantage of complementary information from visible and thermal infrared (RGB-T) images. Despite its importance, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies, unfortunately, overlook the modality discrepancies caused by inconsistent unimodal features obtained by two independent feature extractors, thus hindering the exploitation of cross-modal complementary information within the multimodal data.

Recently, Part-Object Relational (POR) saliency underpinned by the Capsule Network (CapsNet) has been demonstrated to be an effective modeling mechanism for improving saliency detection accuracy. However, it is widely known that current capsule routing operations have huge computational complexity, which seriously limits the usability of POR saliency models in real-time applications. To this end, this paper takes an early step towards fast POR saliency inference by proposing a novel disentangled part-object relational network.

Most existing RGB-D salient object detection (SOD) models adopt a two-stream structure to extract the information from the input RGB and depth images. Since they use two subnetworks for unimodal feature extraction and multiple multi-modal feature fusion modules for extracting cross-modal complementary information, these models require a huge number of parameters, thus hindering their real-life applications. To remedy this situation, we propose a novel middle-level feature fusion structure that enables the design of a lightweight RGB-D SOD model.

Deep learning-based semi-supervised learning (SSL) algorithms are promising for reducing the cost of manual annotation by clinicians, because they use unlabelled data when developing medical image segmentation tools. However, to date, most existing SSL algorithms treat the labelled images and unlabelled images separately and ignore the explicit connection between them; this disregards essential shared information and thus hinders further performance improvements. To mine the shared information between the labelled and unlabelled images, we introduce a class-specific representation extraction approach, in which a task-affinity module is specifically designed for representation extraction.

The performance of zero-shot learning (ZSL) can be improved progressively by learning better features and generating pseudosamples for unseen classes. Existing ZSL works typically learn feature extractors and generators independently, which may shift the unseen samples away from their real distribution and lead to the domain bias problem. In this article, to tackle this challenge, we propose a variational autoencoder (VAE)-based framework, that is, joint Attentive Region Embedding with Enhanced Semantics (AREES), which is tailored to advance zero-shot recognition.

Zero-shot learning (ZSL) aims to classify unseen samples based on the relationship between the learned visual features and semantic features. Traditional ZSL methods typically capture the underlying multimodal data structures by learning an embedding function between the visual space and the semantic space with the Euclidean metric. However, these models suffer from the hubness problem and the domain bias problem, which lead to unsatisfactory performance, especially in the generalized ZSL (GZSL) task.

Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework in which the contextual information is sequentially encoded using long short-term memory (LSTM), yet the forget gate mechanism of LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior work considers all regions of interest (RoIs) equally important, thus failing to focus on more informative regions.

Semantic information provides intra-class consistency and inter-class discriminability beyond visual concepts, and it has been employed in Few-Shot Learning (FSL) to achieve further gains. However, semantic information is only available for labeled samples and is absent for unlabeled samples, so the embeddings are rectified unilaterally by guiding the few labeled samples with semantics. Therefore, a cross-modal bias inevitably arises between semantic-guided samples and nonsemantic-guided samples, which results in an information asymmetry problem.

Multi-view clustering has become an active topic in artificial intelligence. Yet a similar investigation of graph-structured data clustering has been absent so far. To fill this gap, we present a Multi-View Graph embedding Clustering network (MVGC).

Accurate object detection requires correct classification and high-quality localization. Currently, most of the single shot detectors (SSDs) conduct simultaneous classification and regression using a fully convolutional network. Despite high efficiency, this structure has some inappropriate designs for accurate object detection.
