IEEE Trans Pattern Anal Mach Intell
April 2025
We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation.
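The hypergraph message propagation described above can be illustrated with a minimal NumPy sketch. It assumes a generic HGNN-style convolution, X' = Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X Θ with uniform hyperedge weights, and a hypothetical epsilon-ball rule for building hyperedges from feature distances. It is not the HGC-SCS implementation, and `build_hyperedges`, `hypergraph_conv` and `eps` are illustrative names.

```python
import numpy as np

def build_hyperedges(features: np.ndarray, eps: float) -> np.ndarray:
    """Incidence matrix H of shape (N vertices, N hyperedges).

    Hyperedge j joins every vertex whose feature lies within Euclidean
    distance eps of vertex j (a hypothetical epsilon-ball rule).
    """
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return (dists <= eps).astype(float)

def hypergraph_conv(x: np.ndarray, h: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """One vertex -> hyperedge -> vertex propagation step with uniform hyperedge weights."""
    dv = h.sum(axis=1)  # vertex degrees (>= 1: each vertex is in its own hyperedge)
    de = h.sum(axis=0)  # hyperedge degrees (>= 1: each hyperedge contains its centre)
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    de_inv = np.diag(1.0 / de)
    propagation = dv_inv_sqrt @ h @ de_inv @ h.T @ dv_inv_sqrt
    return propagation @ x @ theta

# Usage: 6 feature vectors (e.g. flattened feature-map pixels) with 4 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
H = build_hyperedges(feats, eps=2.0)
out = hypergraph_conv(feats, H, np.eye(4))
print(out.shape)  # (6, 4)
```

With `eps` at zero each vertex sits alone in its own hyperedge and the step reduces to the identity; larger values let one hyperedge tie many vertices together at once, which is what makes the correlation "high-order" rather than pairwise.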
Objective: To address the challenges of modeling and fusing high-order correlations between functional and structural brain networks.
Method: This paper proposes a hypergraph transformer method for modeling high-order correlations between functional and structural brain networks. By utilizing hypergraphs, we can effectively capture the high-order correlations within brain networks.
IEEE J Biomed Health Inform
February 2025
Elastography ultrasound imaging is increasingly important in the diagnosis of thyroid cancer and other diseases, but its reliance on specialized equipment and techniques limits widespread adoption. This paper proposes a novel multimodal ultrasound diagnostic pipeline that expands the application of elastography ultrasound by translating B-ultrasound (BUS) images into elastography ultrasound (EUS) images. Additionally, because existing image-to-image translation methods struggle to model inter-sample variations and to capture regional-scale structural consistency, we propose a BUS-to-EUS translation method based on hierarchical structural consistency.
Vis Comput Ind Biomed Art
July 2024
Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameters in existing deep-learning-based pneumonia-classification methods.
IEEE Trans Image Process
June 2024
Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing human intention. While many fruitful efforts have been made in human motion prediction, most approaches focus on pose-driven prediction and infer human motion in isolation from the contextual environment, thus neglecting how the body's location moves through the scene. However, real-world human movements are goal-directed and highly influenced by the spatial layout of the surrounding scene.
Crowd counting models in highly congested areas face two main challenges: weak localization ability and difficulty in differentiating foreground from background, both of which lead to inaccurate estimates. The reason is that objects in highly congested areas are usually small, and the high-level features extracted by convolutional neural networks are less discriminative for small objects. To address these problems, we propose a framework for learning discriminative features for crowd counting, composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM).
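To make the idea of a supervised pixel-level contrastive objective concrete, here is a minimal NumPy sketch of a SupCon-style loss over pixel embeddings with binary foreground/background labels. It is an assumed, generic formulation, not the CLM defined in the paper; `pixel_supcon_loss` and the temperature `tau` are illustrative.

```python
import numpy as np

def pixel_supcon_loss(emb: np.ndarray, labels: np.ndarray, tau: float = 0.1) -> float:
    """Supervised contrastive loss over P pixel embeddings.

    emb: (P, D) embeddings; labels: (P,) class per pixel, e.g. 1 = foreground, 0 = background.
    Pixels sharing a label are pulled together, all others pushed apart.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise each embedding
    sim = z @ z.T / tau
    not_self = ~np.eye(len(labels), dtype=bool)
    # Numerically stable log-softmax over all *other* pixels for each anchor.
    row_max = np.max(np.where(not_self, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(np.where(not_self, sim - row_max, -np.inf))  # self-pairs contribute 0
    log_prob = sim - row_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors without any positive pixel are skipped
    mean_log_pos = (log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_pos.mean())

labels = np.array([1, 1, 0, 0])
separated = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
print(pixel_supcon_loss(separated, labels) < pixel_supcon_loss(mixed, labels))  # True
```

Embeddings whose same-label pixels already cluster together get a much lower loss, which is the pressure that makes small-object features more discriminative from the background.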
IEEE Trans Pattern Anal Mach Intell
October 2024
IEEE Trans Pattern Anal Mach Intell
September 2024
Self-supervised representation learning for 3D point clouds has attracted increasing attention. However, existing methods in 3D computer vision generally use fixed embeddings to represent latent features and impose hard constraints that force the latent features of positive samples to converge, which limits the ability of feature extractors to generalize across data domains. To address this issue, we propose a Generative Variational-Contrastive Learning (GVC) model, in which a Gaussian distribution is used to construct a continuous, smoothed representation of the latent features.
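Representing a latent feature as a Gaussian rather than a fixed vector can be sketched as follows. This assumes a diagonal Gaussian per sample, the standard reparameterization trick for sampling, and a KL divergence as a soft distance between the distributions of two positive views; the actual GVC objective may differ, and the function names are illustrative.

```python
import numpy as np

def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw z = mu + sigma * eps with eps ~ N(0, I); in an autodiff framework this keeps
    the sample differentiable with respect to mu and log_var."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_diag_gaussians(mu1, log_var1, mu2, log_var2) -> float:
    """KL(N(mu1, diag(var1)) || N(mu2, diag(var2))), summed over latent dimensions."""
    var1, var2 = np.exp(log_var1), np.exp(log_var2)
    return float(0.5 * np.sum(log_var2 - log_var1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0))

# Usage: two augmented views of one point cloud, each encoded as a Gaussian.
rng = np.random.default_rng(0)
mu_a, log_var_a = np.zeros(8), np.zeros(8)
mu_b, log_var_b = np.full(8, 0.1), np.zeros(8)
z_a = reparameterize(mu_a, log_var_a, rng)  # a sampled latent for view A
print(kl_diag_gaussians(mu_a, log_var_a, mu_b, log_var_b))  # about 0.04
```

Comparing distributions instead of single points lets two positive views agree "softly": their means may differ slightly as long as the variances account for it.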
Predicting pedestrian trajectories in crowded scenes is indispensable for self-driving vehicles and autonomous mobile robots, because estimating the future locations of nearby pedestrians supports decisions that avoid collisions. The task is challenging because people walk in different ways, and the interactions between humans and objects in the environment, especially among humans themselves, are complex. Previous work focused on how to model human-human interactions but neglected the relative importance of those interactions.
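One simple way to express the relative importance of interactions is to weight each neighbor before aggregating its features. The sketch below assumes a hand-crafted softmax over negative distances (closer pedestrians matter more); learned models would typically replace this score with learned attention, and `aggregate_neighbors` and `scale` are hypothetical names.

```python
import numpy as np

def aggregate_neighbors(ego_pos: np.ndarray, neighbor_pos: np.ndarray,
                        neighbor_feat: np.ndarray, scale: float = 1.0):
    """Return per-neighbour importance weights and the weighted sum of neighbour features.

    ego_pos: (2,) position of the target pedestrian; neighbor_pos: (K, 2); neighbor_feat: (K, D).
    """
    dist = np.linalg.norm(neighbor_pos - ego_pos, axis=1)
    logits = -dist / scale
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights, weights @ neighbor_feat

ego = np.array([0.0, 0.0])
others = np.array([[0.5, 0.0], [3.0, 4.0], [-1.0, 1.0]])  # three nearby pedestrians
feats = np.eye(3)  # toy one-hot feature per neighbour
w, context = aggregate_neighbors(ego, others, feats)
print(w.argmax())  # 0, the closest pedestrian
```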
View Article and Find Full Text PDFIEEE Trans Vis Comput Graph
December 2024
In this article, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS). The strategy sets up generation objectives by grouping semantically close body joints of a detailed skeleton and replacing each such joint group with a single body-part node. Applied recursively, this operation abstracts a human pose into progressively coarser skeletons at multiple granularity levels.
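The recursive abstraction of a pose into coarser skeletons can be sketched as repeated pooling over joint groups. The 6-joint skeleton, the grouping, and the choice of averaging joint positions are all hypothetical; GUESS defines its own groups by semantic proximity, and its node representation may differ.

```python
import numpy as np

def coarsen_pose(pose: np.ndarray, groups: list[list[int]]) -> np.ndarray:
    """Replace each joint group with one node at the mean position of its joints.

    pose: (J, 3) joint coordinates; groups: a partition of the joint indices.
    """
    return np.stack([pose[g].mean(axis=0) for g in groups])

# Hypothetical 6-joint skeleton: 0 head, 1 neck, 2 left hand, 3 left elbow, 4 right hand, 5 right elbow
pose = np.array([[0.0, 1.8, 0.0], [0.0, 1.5, 0.0], [-0.6, 1.0, 0.0],
                 [-0.3, 1.3, 0.0], [0.6, 1.0, 0.0], [0.3, 1.3, 0.0]])
level1 = coarsen_pose(pose, [[0, 1], [2, 3], [4, 5]])  # 3 body-part nodes
level2 = coarsen_pose(level1, [[0, 1, 2]])            # a single whole-body node
print(level1.shape, level2.shape)  # (3, 3) (1, 3)
```

Generation can then run coarse-to-fine: synthesize motion for the single node first, then condition each finer level on the level above it.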
In modern medicine, medical imaging is an irreplaceable pillar of accurate diagnosis. Precise segmentation of medical images is especially important given the variability introduced by different practitioners. With the rapidly growing volume of medical imaging data, automated and efficient segmentation methods have become essential.
IEEE Trans Image Process
November 2023
Counting objects in crowded scenes remains a challenge in computer vision. Current deep-learning-based approaches often formulate it as a Gaussian density regression problem. Such brute-force regression, though effective, may not properly account for annotation displacement, which arises from the human annotation process and may lead to different distributions.
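The Gaussian density regression target mentioned above is commonly built by placing a normalized 2D Gaussian at each annotated point, so that the map sums to the object count. A minimal sketch, assuming a fixed kernel width `sigma` (many methods adapt it per head) and hypothetical point coordinates:

```python
import numpy as np

def density_map(points, height: int, width: int, sigma: float = 2.0) -> np.ndarray:
    """Sum of Gaussians, one per annotated point (x, y); each is normalised to contribute 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    dmap = np.zeros((height, width))
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()  # normalise on the pixel grid, even near image borders
    return dmap

heads = [(10.0, 12.0), (11.5, 13.0), (30.0, 5.0)]  # hypothetical (x, y) annotations
dmap = density_map(heads, height=32, width=48)
print(round(dmap.sum(), 6))  # 3.0
```

A displaced annotation shifts its Gaussian by the same offset, so the regression target inherits whatever positional noise the annotator introduced.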
IEEE Trans Pattern Anal Mach Intell
April 2024
The traditional 3D object retrieval (3DOR) task is defined under a closed-set setting, which assumes that all object categories encountered in the retrieval stage were seen in the training stage. Existing methods under this setting may learn only to discriminate among the seen categories rather than learning a generalized 3D object embedding. Consequently, 3DOR remains a challenging and open problem in real-world applications, where various unseen categories exist.
In the 3D skeleton-based action recognition task, learning rich spatial and temporal motion patterns from body joints are two foundational yet under-explored problems. In this paper, we propose two methods to address these problems: (I) a novel glimpse-focus action recognition strategy that jointly captures multi-range pose features from the whole body and from key body parts; (II) a powerful temporal feature extractor, JD-TC, that enriches trajectory features by inferring different inter-frame correlations for different joints. By coupling these two proposals, we develop a skeleton-based action recognition system that extracts rich pose and trajectory features from a skeleton sequence and outperforms previous state-of-the-art methods on three large-scale datasets.
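The idea of applying a different temporal operator to each joint can be sketched with one 1D kernel per joint. This fixed-kernel version is only an illustration under that assumption; JD-TC infers its inter-frame correlations from data, and `per_joint_temporal_conv` is a hypothetical name.

```python
import numpy as np

def per_joint_temporal_conv(seq: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Filter each joint's trajectory over time with that joint's own kernel.

    seq: (T, J) one coordinate per joint over T frames; kernels: (J, K) one kernel per joint.
    Output keeps length T ("same" mode, zero-padded at the ends).
    """
    return np.stack(
        [np.convolve(seq[:, j], kernels[j], mode="same") for j in range(seq.shape[1])],
        axis=1,
    )

t = np.arange(10, dtype=float)
seq = np.stack([np.sin(t), t], axis=1)  # 2 joints, 10 frames
kernels = np.array([[0.0, 1.0, 0.0],    # joint 0: identity (keeps the raw trajectory)
                    [0.5, 0.0, -0.5]])  # joint 1: central difference (np.convolve flips the kernel)
out = per_joint_temporal_conv(seq, kernels)
```

Joint 0 passes through unchanged while joint 1 becomes a velocity signal, showing how per-joint kernels extract different temporal relations from the same sequence.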
IEEE Trans Pattern Anal Mach Intell
December 2023
After decades of investigation, point cloud registration is still a challenging task in practice, especially when the correspondences are contaminated by a large number of outliers. A high outlier ratio rapidly decreases the probability of generating a hypothesis close to the true transformation, leading to registration failure. To tackle this problem, we propose a transformation estimation method, named Hunter, for robust point cloud registration with severe outliers.
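Why a large outlier ratio makes good hypotheses rare can be quantified with the standard RANSAC sampling argument: if a fraction w of correspondences are inliers and each rigid-transformation hypothesis is fitted from s = 3 correspondences, a sample is all-inlier with probability w^s, so reaching confidence p needs N = log(1 - p) / log(1 - w^s) samples. The sketch computes this; it illustrates the general problem, not how Hunter itself works.

```python
import math

def ransac_iterations(inlier_ratio: float, sample_size: int = 3, confidence: float = 0.99) -> int:
    """Number of random minimal samples needed so that, with probability `confidence`,
    at least one sample consists only of inliers: N = log(1 - p) / log(1 - w**s)."""
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

for w in (0.5, 0.2, 0.1, 0.05):
    print(f"inlier ratio {w:.2f}: {ransac_iterations(w)} iterations")
```

The count grows roughly as 1 / w^3: about 35 iterations at 50% inliers, but several thousand at 10%.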
Visual saliency refers to humans' ability to quickly focus on the important parts of their visual field; it is a crucial aspect of image processing, particularly in fields such as medical imaging and robotics. Understanding and simulating this mechanism is crucial for solving complex visual problems. In this paper, we propose a salient object detection method based on boundary enhancement, which is applicable to data from both 2D and 3D sensors.
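Boundary-enhanced saliency methods typically need a boundary target derived from the ground-truth mask. A minimal NumPy sketch, assuming a binary mask and a 4-connected definition of the boundary (foreground pixels with at least one background neighbor); how the paper actually builds or uses its boundary signal is not specified here, and `mask_boundary` is an illustrative name.

```python
import numpy as np

def mask_boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel survives erosion only if it and its up/down/left/right neighbours are foreground.
    eroded = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
              & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~eroded

mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1                # a 3x3 salient object
boundary = mask_boundary(mask)    # the 8-pixel ring around the object's centre
```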
IEEE Trans Pattern Anal Mach Intell
December 2023
Recent years have witnessed remarkable achievements in video-based action recognition. Unlike traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only per-pixel brightness changes rather than absolute brightness values. However, little effort has been devoted to event-based action recognition, and large-scale public datasets are scarce.
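Because an event camera outputs a stream of (x, y, t, polarity) tuples instead of frames, a common first step is to accumulate events into a per-pixel count image. The sketch assumes polarity encoded as 0 (OFF) / 1 (ON); some datasets use -1/+1 instead, and `events_to_frame` is an illustrative name.

```python
import numpy as np

def events_to_frame(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """Accumulate events (x, y, t, polarity) into a 2-channel count image:
    channel 0 counts OFF (polarity 0) events, channel 1 counts ON (polarity 1) events."""
    frame = np.zeros((2, height, width), dtype=np.int64)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3].astype(int)
    np.add.at(frame, (p, y, x), 1)  # unbuffered add, so repeated pixels are all counted
    return frame

# columns: x, y, timestamp (s), polarity
events = np.array([[1, 0, 0.01, 1], [1, 0, 0.02, 1], [2, 1, 0.03, 0]])
frame = events_to_frame(events, height=2, width=3)
```

This discards the fine timing that makes event data attractive; richer encodings split the stream into several time bins before accumulating.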
Understanding and analyzing 2D/3D sensor data is crucial for a wide range of machine learning-based applications, including object detection, scene segmentation, and salient object detection. In this context, interactive object segmentation is a vital task in image editing and medical diagnosis, involving the accurate separation of the target object from its background based on user annotation information. However, existing interactive object segmentation methods struggle to effectively leverage such information to guide object-segmentation models.
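One common way to feed user annotations to a segmentation network is to encode positive and negative clicks as truncated distance maps stacked with the image channels. The sketch assumes clicks given as (x, y, is_positive) and a truncation `radius`; both the encoding and the names are illustrative, not the method of this paper.

```python
import numpy as np

def click_maps(clicks, height: int, width: int, radius: float = 5.0) -> np.ndarray:
    """Encode user clicks as two extra input channels (positive, negative).

    Each channel holds, per pixel, min(distance to nearest click of that type, radius) / radius,
    so values lie in [0, 1] and are 0 exactly at a click.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.ones((2, height, width))
    for x, y, is_positive in clicks:
        d = np.sqrt((xs - x) ** 2 + (ys - y) ** 2)
        ch = 0 if is_positive else 1
        maps[ch] = np.minimum(maps[ch], np.minimum(d, radius) / radius)
    return maps

clicks = [(3, 4, True), (8, 8, False)]  # one foreground click, one background click
maps = click_maps(clicks, height=10, width=10)
```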
IEEE J Biomed Health Inform
October 2023
Bone age, as a measure of biological age (BA), plays an important role in a variety of fields, including forensics, orthodontics, sports, and immigration. Despite its significance, accurate estimation of BA remains a challenge because of the uncertainty error between BA and chronological age (CA) caused by individual diversity, and because multiple factors, such as sex and identified or measured anatomical structures, are difficult to integrate into the estimation process. To address these problems, we propose an uncertainty-aware and sex-prior guided biological age estimation method for orthopantomogram images (OPGs), named UASP-BAE, which models uncertainty errors while using sexual dimorphism as a guiding feature to enhance age-specific features, aiming to improve the accuracy of BA estimation.
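Modeling the uncertainty between predicted and chronological age is often done with a heteroscedastic Gaussian negative log-likelihood, where the network predicts both an age and a log-variance. This is a generic sketch under that assumption, not UASP-BAE's loss; the sex-prior guidance is not shown, and `gaussian_nll` is a hypothetical name.

```python
import numpy as np

def gaussian_nll(pred_age, log_var, true_age) -> float:
    """Mean of 0.5 * (exp(-s) * (y - mu)^2 + s) with s = log(sigma^2), constant term dropped.

    A larger predicted variance down-weights the squared error, but the +s term
    penalises claiming high uncertainty everywhere.
    """
    pred_age, log_var, true_age = map(np.asarray, (pred_age, log_var, true_age))
    return float(np.mean(0.5 * (np.exp(-log_var) * (true_age - pred_age) ** 2 + log_var)))

# With an error of 2 years, the loss is lowest when the predicted variance is about 2**2 = 4.
for s in (0.0, np.log(4.0), np.log(16.0)):
    print(round(gaussian_nll(12.0, s, 14.0), 3))
```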
Introduction: The human brain processes shape and texture information separately through different neurons in the visual system. In intelligent computer-aided imaging diagnosis, pre-trained feature extractors are commonly used in medical image recognition methods; however, common pre-training datasets such as ImageNet tend to strengthen the model's texture representation while causing it to ignore many shape features. Weak shape feature representation is a disadvantage for medical image analysis tasks that focus on shape features.
Automated analysis of the vessel structure in intravascular optical coherence tomography (IVOCT) images is critical for assessing vessel health and monitoring the progression of coronary artery disease. However, deep-learning-based methods usually require large, well-annotated datasets, which are difficult to obtain in medical image analysis. Hence, an automatic layer segmentation method based on meta-learning is proposed, which can simultaneously extract the surfaces of the lumen, intima, media, and adventitia using only a handful of annotated samples.
Introduction: In the clinical setting, automatic detection of epileptic seizures is increasingly important, since it could significantly reduce the burden of caring for patients with intractable epilepsy. Electroencephalography (EEG) signals record the brain's electrical activity and contain rich information about brain dysfunction. Although EEG is a non-invasive and inexpensive tool for detecting epileptic seizures, visual evaluation of EEG recordings is labor-intensive and subjective and requires significant improvement.
Sex estimation is an important part of individual identification in forensic applications. Morphological sex estimation methods predominantly focus on anatomical measurements. Because sex-chromosome genes are closely related to facial characteristics, the morphology of craniofacial hard tissues exhibits sexual dimorphism.