IEEE Trans Neural Netw Learn Syst
February 2024
Text-based person retrieval is the process of searching a massive visual resource library for images of a particular pedestrian, based on a textual query. Existing approaches often suffer from color (CLR) over-reliance, which can lead to suboptimal retrieval performance by distracting the model from other important visual cues such as texture and structure information. To handle this problem, we propose a novel framework to Excavate All-round Information Beyond Color for the task of text-based person retrieval, which is therefore termed EAIBC.
Cross-modal human pose estimation has a wide range of applications. Traditional image-based pose estimation does not work well in poor light or darkness. Therefore, signals from other sensing modalities, such as LiDAR or radio frequency (RF), are now being used to estimate human pose.
Recently, deep neural network-based image compressed sensing methods have achieved impressive success in reconstruction quality. However, these methods (1) are limited in their sampling patterns and (2) usually suffer from high computational complexity. To this end, a fast multi-scale generative adversarial network (FMSGAN) is implemented in this paper.
Metal-organic frameworks (MOFs) have become an active topic because of their excellent carbon capture and storage (CCS) properties. However, it is quite challenging to identify MOFs with superior performance within a massive combinatorial search space. To this end, we propose a deep-learning-based end-to-end prediction model to rapidly and accurately predict the CO₂ working capacity and CO₂/N₂ selectivity of a given MOF under low-pressure conditions.
IEEE Trans Neural Netw Learn Syst
December 2023
Modeling the spatiotemporal relationship (STR) of traffic data is important yet challenging for existing graph networks. These methods usually either capture features separately in the temporal and spatial dimensions or represent the spatiotemporal data with multiple local spatial-temporal graphs. The first kind of method struggles to capture potential temporal-spatial relationships, while the second is limited in long-term feature extraction because of its local receptive field.
In the skeleton-based human action recognition domain, spatial-temporal graph convolution networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution.
IEEE Trans Cybern
December 2021
Generating action proposals in untrimmed videos is a challenging task, since video sequences usually contain a lot of irrelevant content and the duration of an action instance is arbitrary. The quality of action proposals is key to action detection performance. Previous methods mainly rely on sliding windows or anchor boxes to cover all ground-truth actions, but this is infeasible and computationally inefficient.