Publications by authors named "Hongkai Xiong"

Federated learning (FL) commonly encourages the clients to perform multiple local updates before the global aggregation, thus avoiding frequent model exchanges and relieving the communication bottleneck between the server and clients. Though empirically effective, the negative impact of multiple local updates on the stability of FL has not been thoroughly studied, which may result in globally unstable and slow convergence. Based on sensitivity analysis, we define in this paper a local-update stability index for general FL, as measured by the maximum inter-client model discrepancy after the multiple local updates that mainly stems from the data heterogeneity.
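
To make the idea of inter-client discrepancy concrete, the following toy sketch runs one round of averaging-based FL on a scalar model. Everything here is an assumption for illustration: the quadratic client losses, the names `local_update` and `fl_round`, and the use of a simple max-minus-min spread as the discrepancy. It is not the stability index defined in the paper.

```python
def local_update(w, target, lr, steps):
    """Run `steps` gradient steps on the toy client loss 0.5 * (w - target)**2."""
    for _ in range(steps):
        w -= lr * (w - target)  # gradient of the toy loss is (w - target)
    return w


def fl_round(w_global, targets, lr, steps):
    """One round: every client starts from w_global, trains locally, then the server averages."""
    local_models = [local_update(w_global, t, lr, steps) for t in targets]
    # Toy discrepancy: largest gap between any two client models after local training.
    discrepancy = max(local_models) - min(local_models)
    w_new = sum(local_models) / len(local_models)
    return w_new, discrepancy


# Two clients with heterogeneous data (different optima at -1 and +1).
_, gap_1_step = fl_round(0.0, [-1.0, 1.0], lr=0.5, steps=1)
_, gap_3_steps = fl_round(0.0, [-1.0, 1.0], lr=0.5, steps=3)
print(gap_1_step, gap_3_steps)  # 1.0 1.75
```

In this toy setting, more local steps let each client drift further toward its own optimum, so the gap between client models grows before aggregation.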

Object pose estimation constitutes a critical area within the domain of 3D vision. While contemporary state-of-the-art methods that leverage real-world pose annotations have demonstrated commendable performance, the procurement of such real training data incurs substantial costs. This paper focuses on a specific setting wherein only 3D CAD models are utilized as a priori knowledge, devoid of any background or clutter information.

Source-free domain adaptation (SFDA) shows the potential to improve the generalizability of deep learning-based face anti-spoofing (FAS) while preserving the privacy and security of sensitive human faces. However, existing SFDA methods are significantly degraded without accessing source data due to the inability to mitigate domain and identity bias in FAS. In this paper, we propose a novel Source-free Domain Adaptation framework for FAS (SDA-FAS) that systematically addresses the challenges of source model pre-training, source knowledge adaptation, and target data exploration under the source-free setting.

By introducing randomness into the environments, domain randomization (DR) imposes diversity on the policy training of deep reinforcement learning, and thus improves its capability of generalization. The randomization of environments, however, introduces another source of variability for the estimate of policy gradients, in addition to the already high variance incurred by trajectory sampling. Therefore, with standard state-dependent baselines, the policy gradient methods may still suffer from high variance, causing low sample efficiency during training with DR.

3-D point clouds facilitate 3-D visual applications with detailed information about objects and scenes but pose enormous challenges for designing efficient compression technologies. The irregular signal statistics and high-order geometric structures of 3-D point clouds cannot be fully exploited by existing sparse representation and deep learning based point cloud attribute compression schemes and graph dictionary learning paradigms. In this paper, we propose a novel p-Laplacian embedding graph dictionary learning framework that jointly exploits the varying signal statistics and high-order geometric structures for 3-D point cloud attribute compression.

This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives.

Batch normalization (BN) is a fundamental unit in modern deep neural networks. However, BN and its variants focus on normalization statistics but neglect the recovery step that uses a linear transformation to improve the capacity of fitting complex data distributions. In this paper, we demonstrate that the recovery step can be improved by aggregating the neighborhood of each neuron rather than just considering a single neuron.
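
As a reminder of the two steps the abstract distinguishes, here is a minimal sketch of standard BN for a single neuron over a batch: normalization with batch statistics, followed by the per-neuron linear recovery step. The function name `batch_norm` and the default `eps` are assumptions for illustration, and this shows only the conventional single-neuron recovery, not the neighborhood-aggregating variant proposed in the paper.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard BN for one neuron: normalize over the batch, then apply the recovery step."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased batch variance
    # Normalization step: zero mean and (approximately) unit variance over the batch.
    normalized = [(v - mean) / (var + eps) ** 0.5 for v in values]
    # Recovery step: learnable scale (gamma) and shift (beta) for this single neuron.
    return [gamma * v + beta for v in normalized]


out = batch_norm([1.0, 2.0, 3.0], gamma=2.0, beta=1.0)
print(out[1])  # 1.0, because the middle value equals the batch mean
```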

The Cox proportional hazards model is a popular semi-parametric model for survival analysis. In this paper, we aim at developing a federated algorithm for the Cox proportional hazards model over vertically partitioned data (i.e.

Photonic neural networks perform brain-inspired computations using photons instead of electrons to achieve substantially improved computing performance. However, existing architectures can only handle data with regular structures but fail to generalize to graph-structured data beyond Euclidean space. Here, we propose the diffractive graph neural network (DGNN), an all-optical graph representation learning architecture based on the diffractive photonic computing units (DPUs) and on-chip optical devices to address this limitation.

Message passing has evolved as an effective tool for designing graph neural networks (GNNs). However, most existing methods for message passing simply sum or average all the neighboring features to update node representations. They are restricted by two problems: 1) lack of interpretability to identify node features significant to the prediction of GNNs and 2) feature overmixing that leads to the oversmoothing issue in capturing long-range dependencies and inability to handle graphs under heterophily or low homophily.
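
The averaging scheme that the abstract criticizes can be sketched in a few lines. The function name `mean_aggregate`, the dictionary-based graph layout, and the choice to include each node's own feature in the average are assumptions made for this illustration; it shows the baseline behavior, not the method proposed in the paper.

```python
def mean_aggregate(features, neighbors):
    """One message-passing round: each node's new feature is the mean over itself and its neighbors."""
    updated = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbors[node]]
        updated[node] = [
            sum(vec[d] for vec in group) / len(group) for d in range(len(feat))
        ]
    return updated


# Path graph 0 - 1 - 2 with one-dimensional node features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [0.0], 1: [3.0], 2: [6.0]}
print(mean_aggregate(features, neighbors))  # {0: [1.5], 1: [3.0], 2: [4.5]}
```

After a single round the spread between the end nodes drops from 6.0 to 3.0; repeating the round keeps pulling all features toward a common value, which is the feature overmixing behind oversmoothing.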

Self-supervised learning based on instance discrimination has shown remarkable progress. In particular, contrastive learning, which regards each image as well as its augmentations as an individual class and tries to distinguish them from all other images, has proven effective for representation learning. However, conventional contrastive learning does not model the relation between semantically similar samples explicitly.
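
A common way to implement this kind of instance discrimination is an InfoNCE-style loss, sketched below. That the work uses this particular loss is an assumption, as are the names `cosine` and `info_nce` and the temperature of 0.1; the sketch only illustrates how an augmented view (the positive) is pulled toward its query while other images (the negatives) are pushed away.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive among all candidates."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
confused = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < confused)  # True
```

Note that every other image is treated as a negative, including semantically similar ones, which is the limitation the abstract points out.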

It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters.
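
For reference, classical ISTA for the problem min_x 0.5 * ||Ax - y||^2 + lam * ||x||_1 can be sketched as follows. The function names, the fixed iteration count, and the use of the squared Frobenius norm as a conservative step-size bound are assumptions for illustration. This is the plain iterative algorithm; in an unfolded network each loop iteration would become a layer whose step size and threshold are learned, which is not shown here.

```python
def soft_threshold(v, tau):
    """Soft-thresholding (shrinkage) operator: move v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def ista(A, y, lam, n_iter=200):
    """Plain ISTA on a dense matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# With A = I the minimizer is the soft-thresholded observation: approximately [2.0, 0.0].
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```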

Endobronchial ultrasound (EBUS) elastography videos have shown great potential to supplement intrathoracic lymph node diagnosis. However, it is laborious and subjective for specialists to select representative frames from the tedious videos and make a diagnosis, and no framework exists for automatic representative frame selection and diagnosis. To this end, we propose a novel deep learning framework that achieves reliable diagnosis by explicitly selecting sparse representative frames and guaranteeing the invariance of diagnostic results to the permutations of video frames.

Background: Endobronchial ultrasound (EBUS) strain elastography can help distinguish benign from malignant intrathoracic lymph nodes (LNs) by reflecting the relative stiffness of tissues. Because interpretation is highly subjective, it is difficult to realize the full diagnostic efficiency of strain elastography. This study aims to use machine learning to automatically select high-quality and stable representative images from EBUS strain elastography videos.

In this paper, we propose the K-Shot Contrastive Learning (KSCL) of visual features by applying multiple augmentations to investigate the sample variations within individual instances. It aims to combine the advantages of inter-instance discrimination by learning discriminative features to distinguish between different instances, as well as intra-instance variations by matching queries against the variants of augmented samples over instances. Particularly, for each instance, it constructs an instance subspace to model the configuration of how the significant factors of variations in K-shot augmentations can be combined to form the variants of augmentations.

Model quantization is essential to deploy deep convolutional neural networks (DCNNs) on resource-constrained devices. In this article, we propose a general bitwidth assignment algorithm based on theoretical analysis for efficient layerwise weight and activation quantization of DCNNs. The proposed algorithm develops a prediction model to explicitly estimate the loss of classification accuracy caused by weight quantization with a geometrical approach.

With the advent of data science, the analysis of network or graph data has become a very timely research problem. A variety of recent works have been proposed to generalize neural networks to graphs, either from a spectral graph theory or a spatial perspective. The majority of these works, however, focus on adapting the convolution operator to graph representation.

Differentiable architecture search (DARTS) enables effective neural architecture search (NAS) using gradient descent, but suffers from high memory and computational costs. In this paper, we propose a novel approach, namely Partially-Connected DARTS (PC-DARTS), to achieve efficient and stable neural architecture search by reducing the channel and spatial redundancies of the super-network. At the channel level, partial channel connection is presented to randomly sample a small subset of channels for operation selection to accelerate the search process and suppress the over-fitting of the super-network.

Background and Objectives: Along with the rapid improvement of imaging technology, convex probe endobronchial ultrasound (CP-EBUS) sonographic features play an increasingly important role in the diagnosis of intrathoracic lymph nodes (LNs). Conventional qualitative and quantitative methods for EBUS multimodal imaging are time-consuming and rely heavily on the experience of endoscopists. With the development of deep-learning (DL) models, there is great promise in the diagnostic field of medical imaging.

This paper introduces a new model for Weakly Supervised Object Localization (WSOL) problems where only image-level supervision is provided. The key to solving such problems is to infer the object locations accurately. Previous methods usually model the missing object locations as latent variables, and alternate between updating their estimates and learning a detector accordingly.

The task of reidentifying groups of people under different camera views is an important yet less-studied problem. Group reidentification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single-object Re-ID problems, such as viewpoint and human pose variations, but also suffers from changes in group layout and group membership. In this paper, we propose a novel concept of group granularity by characterizing a group image by multigrained objects: individual people and subgroups of two and three people within a group.

Dimension reduction is widely regarded as an effective way to decrease the computation, storage and communication loads of data-driven intelligent systems, leading to a growing demand for statistical methods that allow analysis (e.g., clustering) of compressed data.

The cognitive radio technique allows secondary users (SUs) to share the spectrum with primary users (PUs) in an exclusive or opportunistic manner. This paper studies spectrum pricing conducted by spectrum owners, that is, primary operators (POs), and SU decision-making strategies for three kinds of duopoly markets. The single-band exclusive use market considers two POs with each providing a single band dedicated to SUs.

Recent methods for object co-segmentation focus on discovering a single co-occurring relation of candidate regions representing the foreground of multiple images. However, region extraction based only on low and middle level information often occupies a large area of background without the help of semantic context. In addition, seeking a single matching solution very likely leads to discovering only local parts of common objects.

Dictionary learning has emerged as a promising alternative to the conventional hybrid coding framework. However, the rigid structure of sequential training and prediction degrades its performance in scalable video coding. This paper proposes a progressive dictionary learning framework with a hierarchical predictive structure for scalable video coding, especially in the low-bitrate region.
