Publications by authors named "Junyu Gao"

Article Synopsis
  • Video captioning automatically generates descriptive text based on videos and is crucial for various applications, but previous research has mainly focused on generating captions without properly aligning visual and textual elements.
  • The proposed model integrates both feature extraction and caption generation in a single framework, using a center-enhanced strategy for improved semantic feature alignment through incremental clustering.
  • Experimental results show that this end-to-end model significantly outperforms existing methods, leading to higher quality captions on popular datasets like MSVD and MSR-VTT.

Despite the great progress of unsupervised domain adaptation (UDA) with deep neural networks, current UDA models are opaque and cannot provide promising explanations, which limits their application in scenarios that require safe and controllable model decisions. At present, a surge of work focuses on designing deep interpretable methods that rely on adequate data annotations, and only a few methods consider the distributional shift problem. Most existing interpretable UDA methods are post-hoc, so they cannot facilitate the model learning process for performance enhancement.

Article Synopsis
  • Structural variants (SVs) are crucial for genetic research, but current detection methods have high false positive rates, necessitating improved filtering techniques.
  • The CSV-Filter tool, developed using deep learning, enhances SV detection by employing a unique grayscale image encoding and image augmentation to better identify true SVs while reducing false positives.
  • CSV-Filter not only outperforms existing short-read filtering tools like DeepSVFilter but also supports long reads, making it a versatile option for effective SV analysis.

While photocatalytic technology has brought additional opportunities for the green conversion and sustainable development of ammonium-based nitrogen fertilizers, the low activation efficiency of molecular N₂ has impeded its further application. To address this concern, we designed an amorphous molybdenum hydroxide anchored on ultrathin magnesium-aluminum layered double hydroxide (Mo@MgAl-LDH) nanosheets to promote the photofixation of N₂ to NH₃. With the aid of the designed amorphous Mo(V) species, the pristine MgAl-LDH exhibited a considerable performance of nitrogen photofixation under visible light irradiation (NH₃ production rate of 114.


With the increasing availability of cameras in vehicles, obtaining license plate (LP) information via on-board cameras has become feasible in traffic scenarios. LPs play a pivotal role in vehicle identification, making automatic LP detection (ALPD) a crucial area within traffic analysis. Recent advancements in deep learning have spurred a surge of studies in ALPD.


Although polylactic acid (PLA) represents a pivotal biodegradable polymer, its biodegradability has inadvertently overshadowed the development of effective recycling techniques, leading to the potential wastage of carbon resources. The photoreforming-recycling approach for PLA exhibits significant potential in terms of concepts and methods. However, the reaction faces enormous challenges due to the limited selectivity of organic oxidation products as well as the increased costs and challenging separation of organic products associated with alkali-solution-assisted prehydrolysis.


With the explosive growth of videos, the weakly-supervised temporal action localization (WS-TAL) task has become a promising research direction in pattern analysis and machine learning. WS-TAL aims to detect and localize action instances with only video-level labels during training. Modern approaches have achieved impressive progress via powerful deep neural networks.


Weakly-supervised temporal action localization (WTAL) aims to localize the action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly arises from background noise and neglect of non-salient action snippets. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WTAL, called Uncertainty-aware Dual-Evidential Learning (UDEL), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal with the guidance of epistemic and aleatoric uncertainties, of which the former comes from models lacking knowledge, while the latter comes from the inherent properties of samples themselves.
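
The evidential idea mentioned above can be illustrated with a minimal sketch. It follows the commonly used subjective-logic formulation of EDL for single-label classification, which is an assumption here: it is not the UDEL method, it does not handle the multi-label setting, and it covers only the epistemic (lack-of-knowledge) uncertainty, not the aleatoric one. The function name `edl_uncertainty` and the choice of ReLU to produce evidence are hypothetical, chosen for illustration.

```python
def edl_uncertainty(logits):
    """Turn raw class logits into Dirichlet-based beliefs and an uncertainty mass.

    Illustrative sketch of standard single-label EDL, not the UDEL model.
    """
    # Evidence must be non-negative; ReLU is one common choice (assumption).
    evidence = [max(0.0, z) for z in logits]
    # Dirichlet concentration parameters: alpha_k = evidence_k + 1.
    alpha = [e + 1.0 for e in evidence]
    # Dirichlet strength: the total of all concentration parameters.
    strength = sum(alpha)
    num_classes = len(logits)
    # Belief mass per class; beliefs and uncertainty together sum to 1.
    belief = [e / strength for e in evidence]
    # Uncertainty mass: large when little evidence has been collected.
    uncertainty = num_classes / strength
    # Expected class probabilities under the Dirichlet distribution.
    expected_prob = [a / strength for a in alpha]
    return belief, uncertainty, expected_prob


if __name__ == "__main__":
    # No evidence for any class: the uncertainty mass is maximal.
    print(edl_uncertainty([0.0, 0.0, 0.0]))
    # Strong evidence for the first class: the uncertainty mass shrinks.
    print(edl_uncertainty([9.0, 0.0, 0.0]))
```

In this formulation, a snippet with no supporting evidence for any class receives an uncertainty mass of 1, which is the kind of signal an uncertainty-aware method could use to separate ambiguous background from action.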


Drawing inspiration from the enzyme nitrogenase in nature, researchers are increasingly delving into semiconductor photocatalytic nitrogen fixation due to its similar surface catalytic processes. Herein, we report a facile and efficient approach to regulating ZnO/ZnCr₂O₄ photocatalysts with ZnCr-layered double hydroxide (ZnCr-LDH) as precursors. By optimizing the composition ratio of Zn/Cr in ZnCr-LDH to tune interfaces, we can achieve an enhanced nitrogen photofixation performance (an ammonia evolution rate of 31.


Weakly-supervised temporal action localization (WSTAL) aims to automatically identify and localize action instances in untrimmed videos with only video-level labels as supervision. In this task, there exist two challenges: (1) how to accurately discover the action categories in an untrimmed video (what to discover); (2) how to elaborately focus on the integral temporal interval of each action instance (where to focus). Empirically, to discover the action categories, discriminative semantic information should be extracted, while robust temporal contextual information is beneficial for complete action localization.


Crowd localization aims to predict the head position of each instance in crowd scenarios. Because the distance from pedestrians to the camera varies, there are tremendous gaps among the scales of instances within an image, which is called the intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic.


Point-level weakly-supervised temporal action localization (P-WSTAL) aims to localize the temporal extents of action instances and identify the corresponding categories, with only a single point label per action instance available for training. Due to the sparse frame-level annotations, most existing models follow the localization-by-classification pipeline. However, there are two major issues in this pipeline: large intra-action variation due to the task gap between classification and localization, and noisy classification learning caused by unreliable pseudo training samples.


Video crowd localization is a crucial yet challenging task, which aims to estimate exact locations of human heads in the given crowded videos. To model spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, our GNA can also capture the scale variation of human heads well using the equipped multi-focus mechanism.


We target the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning. It aims to localize objects described in the sentence to visual regions in the video, which is a fundamental capability needed in pattern analysis and machine learning. Despite recent progress, existing methods all suffer from the severe problem of spurious association, which harms the grounding performance.


Recently, crowd counting using supervised learning has achieved remarkable improvement. Nevertheless, most counters rely on a large amount of manually labeled data. With the release of synthetic crowd data, a potential alternative is to transfer knowledge from synthetic data to real data without any manual labels.


This study aimed to investigate the mental health status of nurses from low-risk areas of the novel coronavirus (COVID-19) pandemic, its potential influencing factors, and the main stressors under normalized prevention and control in China. A mobile phone app-based survey was conducted among registered nurses in Jiangsu province via a region-stratified sampling method. The questionnaire consisted of items on the demographic characteristics of the nursing staff and the Depression, Anxiety, Stress Scale-21 (DASS-21), along with questions for self-assessment of stressors associated with COVID-19.


Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domain. Recently, typical methods attempt to extract domain-invariant features via image translation and adversarial learning.


Many CNN-based segmentation methods have recently been applied to lane marking detection and have achieved excellent results owing to their strong ability to model semantic information. Although the accuracy of lane line prediction keeps improving, the localization ability for lane markings remains relatively weak, especially when the lane marking point is far away. Traditional lane detection methods usually utilize highly specialized handcrafted features and carefully designed postprocessing to detect the lanes.


With the development of deep neural networks, the performance of crowd counting and pixel-wise density estimation continues to improve. Despite this, two challenging problems remain in this field: 1) current supervised learning needs a large amount of training data, but collecting and annotating it is difficult, and 2) existing methods cannot generalize well to unseen domains. A recently released synthetic crowd dataset alleviates these two problems.


Recently, crowd counting has drawn much attention on account of its significance in congestion control, public safety, and ecological surveys. Although performance has improved dramatically thanks to the development of deep learning, these networks have also become larger and more complex. Moreover, a large model also takes more time to train for better performance.


In the last decade, crowd counting and localization have attracted much attention from researchers due to their widespread applications, including crowd monitoring, public safety, and space design. Many convolutional neural networks (CNNs) have been designed to tackle this task. However, currently released datasets are so small-scale that they cannot meet the needs of supervised CNN-based algorithms.


With the explosive growth of video categories, zero-shot learning (ZSL) in video classification has become a promising research direction in pattern analysis and machine learning. Based on some auxiliary information such as word embeddings and attributes, the key to a robust ZSL method is to transfer the learned knowledge from seen classes to unseen classes, which requires relationship modeling between these concepts (e.g.


Semantic segmentation, a pixel-level vision task, has developed rapidly thanks to convolutional neural networks (CNNs). Training CNNs requires a large amount of labeled data, but manually annotating data is difficult. To reduce manual labor, some synthetic datasets have been released in recent years.


Most existing trackers are either sampling-based or regression-based methods. Sampling-based methods estimate the target state by sampling many target candidates. Although these methods achieve significant performance, they often suffer from a high computational burden.


Most existing part-based tracking methods are part-to-part trackers, which usually have two separate steps: part matching and target localization. Unlike existing methods, in this paper we propose a novel part-to-target (P2T) tracker in a unified fashion by inferring the target location from parts directly. To achieve this goal, we propose a novel deep regression model for part-to-target regression in an end-to-end framework via Convolutional Neural Networks.
