Publications by authors named "Yuankai Qi"

Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions in an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the testing and training phases leads to the model being sub-optimal in inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM).

Video captioning aims to generate natural language descriptions for a given video clip. Existing methods mainly focus on end-to-end representation learning via word-by-word comparison between predicted captions and ground-truth texts. Although significant progress has been made, such supervised approaches neglect semantic alignment between visual and linguistic entities, which may negatively affect the generated captions.

Visual attention advances object detection by attending neural networks to object representations. While existing methods incorporate empirical modules to empower network attention, we rethink attentive object detection from the network learning perspective in this work. We propose a NEural Attention Learning approach (NEAL) which consists of two parts.

Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN). However, these methods neglect the importance of historical contexts or ignore predicting future actions during pre-training, limiting the learning of visual-textual correspondence and the capability of decision-making. To address these problems, we present a history-enhanced and order-aware pre-training with the complementing fine-tuning paradigm (HOP+) for VLN.

Introduction: Digital health is rapidly expanding due to surging healthcare costs, deteriorating health outcomes, and the growing prevalence and accessibility of mobile health (mHealth) and wearable technology. Data from Biometric Monitoring Technologies (BioMeTs), including mHealth and wearables, can be transformed into digital biomarkers that act as indicators of health outcomes and can be used to diagnose and monitor a number of chronic diseases and conditions. Digital biomarker development faces many challenges, including a lack of regulatory oversight, limited funding opportunities, general mistrust of sharing personal data, and a shortage of open-source data and code.

Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in various applications, such as video surveillance, human emotion detection and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking.

The dynamic time warping (DTW) algorithm is widely used in pattern matching and sequence alignment tasks, including speech recognition and time series clustering. However, DTW algorithms perform poorly when aligning sequences of uneven sampling frequencies. This makes it difficult to apply DTW to practical problems, such as aligning signals that are recorded simultaneously by sensors with different, uneven, and dynamic sampling frequencies.
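
For orientation, the following is a minimal sketch of the classic DTW recurrence that the abstract takes as its starting point; it is not the method proposed in the paper. The function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration. Because this baseline compares only sample values and ignores timestamps, it has no notion of the sampling frequency of either sequence.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (illustrative baseline only)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment cost of a[:i] against b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: absolute difference of the two samples.
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence repeats one sample; DTW absorbs the repeat.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```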

Epithelial-mesenchymal transition (EMT) is one of the most important mechanisms in the initiation and promotion of cancer cell metastasis. The phosphoinositide 3-kinase (PI3K) signaling pathway has been demonstrated to be involved in TGF-β-induced EMT, but the complexity of the TGF-β signaling network makes it challenging to dissect the role of PI3K in regulating the EMT process. Here, we applied an optogenetically controlled PI3K module (named 'Opto-PI3K'), which is based on CRY2 and the N-terminal of CIB1 (CIBN), to rapidly and reversibly control endogenous PI3K activity in cancer cells with light.

Convolutional Neural Networks (CNNs) have been applied to visual tracking with demonstrated success in recent years. Most CNN-based trackers utilize hierarchical features extracted from a certain layer to represent the target. However, features from a certain layer are not always effective for distinguishing the target object from the backgrounds especially in the presence of complicated interfering factors (e.

Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode these using a dictionary consisting of patches sampled from target template images. The discriminative strength of existing methods based on local sparse coding is limited as spatial structure constraints among the template patches are not exploited.