Concrete is the most widely used, highest-volume basic material in the world today. Enhancing its toughness, including tensile strength and deformation resistance, can increase structural load-bearing capacity, reduce cracking, and decrease the amount of concrete and steel required in engineering projects. These advances are crucial for the safety, durability, energy efficiency, and emission reduction of structural engineering.
IEEE Trans Pattern Anal Mach Intell
October 2024
In this paper, we propose the Vision-Audio-Language Omni-peRception pretraining model (VALOR) for multimodal understanding and generation. Unlike widely studied vision-language pretraining models, VALOR jointly models the relationships among vision, audio, and language in an end-to-end manner. It consists of three separate encoders for single-modality representations and a decoder for multimodal conditional text generation.
IEEE Trans Image Process
August 2024
Fine-grained visual classification aims to distinguish similar sub-categories, which is challenging because of large variations within the same sub-category and high visual similarity between different sub-categories. Recently, methods that extract semantic parts of discriminative regions have attracted increasing attention. However, most existing methods extract part features through rectangular bounding boxes produced by an object detection module or an attention mechanism, which makes it difficult to capture the rich shape information of objects.
Biological materials with hierarchically ordered architectures have inspired advanced composites that combine mechanical properties that are usually mutually exclusive, but efficient topology optimization and large-scale manufacturing remain challenging. This work proposes a scalable bottom-up approach to fabricate a novel nacre-like cement-resin composite with a gradient brick-and-mortar (BM) structure, and demonstrates a machine learning-assisted method to optimize the gradient structure. The fabricated gradient composite exhibits an extraordinary combination of high flexural strength, toughness, and impact resistance.
IEEE Trans Neural Netw Learn Syst
April 2024
Knowledge distillation-based anomaly detection (KDAD) methods rely on the teacher-student paradigm to detect and segment anomalous regions by contrasting the features extracted by the two networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can easily replicate the teacher network's representations, and 2) the teacher network's features serve solely as a "reference standard" and are not fully leveraged. To address these limitations, we depart from the established paradigm and propose an approach called asymmetric distillation postsegmentation (ADPS).
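The conventional KDAD scoring that this abstract contrasts against can be sketched as a per-location anomaly score equal to one minus the cosine similarity between teacher and student feature vectors. This is a minimal sketch of that generic paradigm under assumed (C, H, W) feature shapes, not the ADPS method; the function name `kd_anomaly_map` is hypothetical.

```python
import numpy as np

def kd_anomaly_map(teacher_feat: np.ndarray, student_feat: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-location anomaly score: 1 - cosine similarity of teacher and student features.

    teacher_feat, student_feat: (C, H, W) feature maps from the same layer (assumed shapes).
    Returns an (H, W) map in [0, 2]; larger values mean stronger teacher-student disagreement.
    """
    dot = np.sum(teacher_feat * student_feat, axis=0)  # channel-wise dot product per pixel
    norms = np.linalg.norm(teacher_feat, axis=0) * np.linalg.norm(student_feat, axis=0)
    return 1.0 - dot / (norms + eps)  # eps avoids division by zero on all-zero features
```

A student trained only on normal data should match the teacher there and diverge on anomalies, so high values flag anomalous regions; the first limitation in the abstract arises when the student also reproduces the teacher on anomalies, which flattens this map.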
IEEE Trans Pattern Anal Mach Intell
August 2024
Effectively exploring the colors of exemplars and propagating them to colorize each frame is vital for exemplar-based video colorization. In this article, we present BiSTNet, which explores the colors of exemplars and uses them to aid video colorization through bidirectional temporal feature fusion guided by a semantic image prior. We first establish the semantic correspondence between each frame and the exemplars in deep feature space to extract color information from the exemplars.
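Correspondence-based color propagation of the kind described here is often implemented as attention: each frame pixel takes a softmax-weighted average of exemplar colors, weighted by feature similarity. The sketch below assumes flattened (pixels, channels) features and Lab a/b chrominance; `propagate_colors` and the `temperature` value are illustrative assumptions, not BiSTNet's published design.

```python
import numpy as np

def propagate_colors(frame_feat, exemplar_feat, exemplar_ab, temperature=0.01):
    """Transfer exemplar colors to frame pixels via softmax-normalized feature similarity.

    frame_feat: (N, C) deep features of N frame pixels.
    exemplar_feat: (M, C) deep features of M exemplar pixels.
    exemplar_ab: (M, 2) chrominance (Lab a/b) of the exemplar pixels.
    Returns (N, 2): each frame pixel's similarity-weighted average of exemplar colors.
    """
    f = frame_feat / np.linalg.norm(frame_feat, axis=1, keepdims=True)
    e = exemplar_feat / np.linalg.norm(exemplar_feat, axis=1, keepdims=True)
    sim = f @ e.T / temperature  # (N, M) scaled cosine similarity
    sim -= sim.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over exemplar pixels
    return weights @ exemplar_ab
```

A small temperature makes the weights nearly one-hot (copy the color of the best-matching exemplar pixel); a larger one blends colors from several matches.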
IEEE Trans Image Process
February 2024
Image-level labels have prevailed in weakly supervised semantic segmentation because they are easy to obtain. Since image-level labels only indicate the presence or absence of specific object categories, visualization-based techniques have been widely adopted to provide object location cues. Because class activation maps (CAMs) can only locate the most discriminative parts of objects, recent approaches usually adopt an expansion strategy to enlarge the activation area for more complete object localization.
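In the standard CAM formulation, a class activation map is the classifier-weighted sum of the last convolutional feature maps. The following minimal sketch assumes a network ending in global average pooling followed by a linear layer; the shapes and the function name are illustrative.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM for one class: weighted sum of the final convolutional feature maps.

    features: (K, H, W) feature maps before global average pooling.
    fc_weights: (num_classes, K) weights of the linear classifier after pooling.
    Returns an (H, W) map rescaled to [0, 1].
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)  # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Thresholding the normalized map (for example at 0.5) keeps only the strongest responses, which is why CAM seeds tend to cover just the most discriminative part and why expansion strategies are needed.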
IEEE Trans Image Process
December 2023
Recognizing actions performed on unseen objects, known as Compositional Action Recognition (CAR), has attracted increasing attention in recent years. The main challenge is to overcome the distribution shift of "action-objects" pairs between the training and testing sets. Previous works for CAR usually introduce extra information (e.
IEEE Trans Pattern Anal Mach Intell
May 2024
Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from the context is important for distinguishing the target from other objects, particularly when the target belongs to the same category as other objects. However, most previous methods underestimate such information.
IEEE Trans Pattern Anal Mach Intell
April 2024
Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this article, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume.
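The idea of filtering a concatenation volume with weights derived from correlation can be illustrated with a simplified NumPy sketch. Here a sigmoid of the per-disparity feature correlation stands in for ACV's learned attention weights; that substitution, the (C, H, W) input shapes, and the function name are assumptions, so this shows the data flow rather than the published network.

```python
import numpy as np

def attention_concat_volume(left, right, max_disp):
    """Simplified attention-filtered concatenation volume.

    left, right: (C, H, W) feature maps from a rectified stereo pair.
    Returns a (2C, D, H, W) volume: each disparity slice of the concatenation
    volume is scaled by a weight derived from left-right feature correlation.
    Assumption: sigmoid(correlation) replaces ACV's learned attention branch.
    """
    c, h, w = left.shape
    concat = np.zeros((2 * c, max_disp, h, w))
    corr = np.full((max_disp, h, w), -np.inf)  # unmatched columns get weight 0
    for d in range(max_disp):
        # Left pixel x corresponds to right pixel x - d; columns x < d have no match.
        concat[:c, d, :, d:] = left[:, :, d:]
        concat[c:, d, :, d:] = right[:, :, : w - d]
        corr[d, :, d:] = np.mean(left[:, :, d:] * right[:, :, : w - d], axis=0)
    attention = 1.0 / (1.0 + np.exp(-corr))  # sigmoid; -inf correlation maps to 0
    return attention[None] * concat  # broadcast (1, D, H, W) over the 2C channels
```

In the actual method the weights come from a trained network, but the key operation is the same elementwise re-weighting of the concatenation volume before cost aggregation.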
Alite dissolution plays a crucial role in cement hydration. However, quantitative investigations into alite powder dissolution are limited, especially regarding the influence of chemical admixtures. This study investigates the impact of particle size, temperature, saturation level, and mixing speed on alite powder dissolution rate, considering the real-time evolution of specific surface area during the alite powder dissolution process.
IEEE Trans Neural Netw Learn Syst
November 2023
This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space.
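Retrieval in Hamming space, where the abstract says RCSH preserves learned relationships, can be sketched by binarizing embeddings with the sign function and ranking database items by Hamming distance. This shows only the generic hashing and ranking step under assumed inputs; it does not implement RCSH's prototype-based training, and the function names are hypothetical.

```python
import numpy as np

def to_hash_codes(features):
    """Binarize real-valued embeddings into {-1, +1} codes with the sign function."""
    return np.where(features >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query (ascending).

    For {-1, +1} codes of length b, Hamming distance = (b - <q, x>) / 2.
    Returns (ranking indices, distances).
    """
    b = query_code.shape[0]
    # Cast to int32 so the dot product cannot overflow int8.
    dist = (b - database_codes.astype(np.int32) @ query_code.astype(np.int32)) // 2
    return np.argsort(dist, kind="stable"), dist
```

The inner-product identity is why hashing is fast: distances reduce to integer dot products (or XOR and popcount on packed bits) instead of floating-point comparisons.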
IEEE Trans Image Process
November 2023
Text-Image Person Re-identification (TIReID) aims to retrieve the image corresponding to a given text query from a pool of candidate images. Existing methods employ prior knowledge from single-modality pre-training to facilitate learning, but lack multi-modal correspondence information. Vision-Language Pre-training, such as CLIP (Contrastive Language-Image Pretraining), can address this limitation.
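CLIP-style pre-training aligns the two modalities with a symmetric contrastive (InfoNCE) loss over a batch of matched image-text pairs. A minimal NumPy sketch of that loss follows; the temperature of 0.07 and the function names are illustrative choices, and the sketch is not the TIReID method itself.

```python
import numpy as np

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over N matched image-text pairs (row i matches row i)."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); diagonal entries are the true pairs
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits.T))
```

Minimizing this loss pulls each image toward its own caption and away from the other captions in the batch, which is the cross-modal correspondence that single-modality pre-training lacks.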
IEEE Trans Neural Netw Learn Syst
September 2023
Text-based person search (TBPS) is a challenging task that aims to retrieve, from an image gallery, pedestrian images of the identity described by a query text. In recent years, TBPS has made remarkable progress, and state-of-the-art (SOTA) methods achieve superior performance by learning local fine-grained correspondence between images and texts. However, most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities, which is unreliable because these parts lack contextual information or may introduce noise.
Polymers are known to effectively improve the toughness of inorganic matrices; however, the mechanism at the molecular level remains unclear. In this study, we used molecular dynamics simulations to unravel the effects and mechanisms of polyacrylic acid (PAA) with different molecular chain lengths on toughening calcium silicate hydrate (CSH), the basic building block of cement-based materials. Our simulation results indicate that an optimal polymer chain length yields the largest toughening effect on the matrix, leading to up to 60.
The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a variety of applications. However, current methods focus excessively on inter-layer feature interactions while disregarding the importance of intra-layer feature regulation. Although some attempts have been made to learn a compact intra-layer feature representation with attention mechanisms or vision transformers, they overlook the corner regions that are crucial for dense prediction tasks.
IEEE Trans Neural Netw Learn Syst
October 2024
Thanks to their easily obtained annotations and satisfactory performance, weakly supervised semantic segmentation (WSSS) approaches have been extensively studied. Recently, single-stage WSSS (SS-WSSS) has been revived to alleviate the expensive computational costs and complicated training procedures of multistage WSSS. However, the results of such an immature model suffer from background incompleteness and object incompleteness.
IEEE Trans Image Process
May 2023
Weakly supervised semantic segmentation (WSSS) models relying on class activation maps (CAMs) have achieved desirable performance compared with their non-CAM-based counterparts. However, to make the WSSS task feasible, we need to generate pseudo labels by expanding the seeds from CAMs, which is complex and time-consuming and thus hinders the design of efficient end-to-end (single-stage) WSSS approaches. To tackle this dilemma, we resort to off-the-shelf, readily accessible saliency maps to obtain pseudo labels directly, given the image-level class labels.
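A hypothetical rule for deriving pseudo labels from a saliency map and image-level labels is sketched below: salient pixels inherit the class when the image contains exactly one class, and are marked as ignored otherwise. The threshold, the ignore value of 255, and the single-class rule are assumptions for illustration; the abstract does not specify how the method resolves multi-class images.

```python
import numpy as np

IGNORE = 255  # assumed label value excluded from the segmentation loss

def saliency_pseudo_label(saliency, image_classes, threshold=0.5):
    """Hypothetical saliency-to-pseudo-label rule.

    saliency: (H, W) array in [0, 1].
    image_classes: foreground class ids present in the image (0 is background).
    Non-salient pixels become background; salient pixels get the class id if the
    image has exactly one class, otherwise IGNORE (saliency cannot separate classes).
    """
    label = np.zeros(saliency.shape, dtype=np.uint8)
    if len(image_classes) == 0:
        return label  # background-only image
    fg = saliency >= threshold
    label[fg] = image_classes[0] if len(image_classes) == 1 else IGNORE
    return label
```

Because the rule needs only a threshold, it produces labels in one pass, which is the efficiency argument for replacing iterative CAM seed expansion.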
Nanofibrous composite membranes consisting of polyvinyl alcohol (PVA), sodium alginate (SA), chitosan-nano zinc oxide nanoparticles (CS-Nano-ZnO) and curcumin (Cur) were prepared by ultrasonic processing and electrospinning. When the ultrasonic power was set to 100 W, the prepared CS-Nano-ZnO had a minimum size (404.67 ± 42.
IEEE Trans Pattern Anal Mach Intell
August 2023
To enable the model to generalize to unseen "action-objects" (compositional actions), previous methods encode multiple pieces of information (i.e., the appearance, position, and identity of visual instances) independently and concatenate them for classification.
IEEE Trans Pattern Anal Mach Intell
August 2023
We present compact and effective deep convolutional neural networks (CNNs) for video deblurring that exploit properties of videos. Motivated by the non-uniform blur property, namely that not all pixels of the frames are blurry, we develop a CNN that integrates a temporal sharpness prior (TSP) for removing blur in videos. The TSP exploits sharp pixels from adjacent frames to help the CNN restore frames better.
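The intuition behind a temporal sharpness prior can be sketched as a per-pixel weight that is close to 1 where aligned neighboring frames agree with the current frame and decays where they differ. This simplified version assumes the frames are already aligned and uses an exponential of the summed squared differences; it approximates the idea in the abstract and is not the paper's exact definition. The function name is hypothetical.

```python
import numpy as np

def temporal_sharpness(center, warped_neighbors):
    """Simplified per-pixel temporal sharpness weight in (0, 1].

    center: (H, W) grayscale frame.
    warped_neighbors: list of (H, W) adjacent frames already aligned to `center`
    (alignment, e.g. by optical flow, is assumed and not performed here).
    """
    sq_diff = sum((n - center) ** 2 for n in warped_neighbors)  # summed over neighbors
    return np.exp(-0.5 * sq_diff)  # agreement -> near 1, disagreement -> near 0
```

A restoration network can then use these weights to trust pixels that look consistent, and presumably sharp, across frames more than pixels that vary with motion blur.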
IEEE Trans Neural Netw Learn Syst
February 2023
Deep learning-based models have been shown to outperform humans in many computer vision tasks when massive labeled training data are available. However, humans can easily recognize images of novel categories after seeing only a few examples of them. Few-shot learning has therefore emerged to enable machines to learn from extremely limited labeled examples.
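One common few-shot technique, prototypical classification, illustrates how a model can label new categories from a handful of examples: average the embeddings of each class's support examples into a prototype and assign each query to the nearest prototype. The sketch assumes embeddings are already computed by some encoder; it is a representative method, not one described in this abstract.

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb):
    """Nearest-prototype classification for one few-shot episode.

    support_emb: (S, D) embeddings of the few labeled examples.
    support_labels: (S,) integer class ids.
    query_emb: (Q, D) embeddings to classify.
    """
    classes = np.unique(support_labels)
    # Prototype = mean embedding of each class's support examples.
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    dists = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)  # (Q, K)
    return classes[np.argmin(dists, axis=1)]
```

Nothing here is trained per episode, so new categories can be recognized as soon as a few labeled examples are embedded.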
IEEE Trans Neural Netw Learn Syst
July 2024
Better supervised classification of hyperspectral images (HSIs) generally requires spectral-spatial feature learning that fully exploits complex pixel- and superpixel-level interdependencies when only a small number of labeled samples is available. Limited by local regular convolutions, convolutional neural networks (CNNs) can only exploit information from the short-range Euclidean neighbors of a target, which hinders the effectiveness of their feature representations. In contrast, graph convolutional networks (GCNs) can learn long-range dependencies between non-Euclidean neighbors but usually require as input a full graph constructed from a whole HSI, so GCNs must be trained in a full-batch manner with tremendous computational cost.
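The full-graph requirement follows from the standard GCN propagation rule, which multiplies node features by a normalized N × N adjacency matrix. A minimal sketch of one such layer (the Kipf and Welling formulation) follows; treating HSI pixels or superpixels as graph nodes is an assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: (N, N) symmetric 0/1 matrix over N nodes.
    features: (N, F_in) node features; weights: (F_in, F_out).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(norm_adj @ features @ weights, 0)  # ReLU
```

Because `norm_adj` spans every node, each layer touches the whole graph, which is what forces full-batch training when the graph is built from an entire HSI.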
IEEE Trans Pattern Anal Mach Intell
June 2023
In semi-supervised skeleton-based action recognition, obtaining more discriminative information from both labeled and unlabeled data is a challenging problem. As the current mainstream approach, contrastive learning can learn richer representations from augmented data and can be considered a pretext task for action recognition. However, such a method still faces three main limitations: 1) it usually learns global-granularity features that do not reflect local motion information well.