Speech emotion recognition (SER) is not only a ubiquitous aspect of everyday communication, but also a central focus in the field of human-computer interaction. However, SER faces several challenges, including difficulties in detecting subtle emotional nuances and the complicated task of recognizing speech emotions in noisy environments. To effectively address these challenges, we introduce a Transformer-based model called MelTrans, which is designed to distill critical clues from speech data by learning core features and long-range dependencies.
IEEE Trans Neural Netw Learn Syst
September 2024
Filter pruning has gained widespread adoption for the purpose of compressing and speeding up convolutional neural networks (CNNs). However, the existing approaches are still far from practical applications due to biased filter selection and heavy computation cost. This article introduces a new filter pruning method that selects filters in an interpretable, multiperspective, and lightweight manner.
IEEE Trans Image Process
October 2024
Few-shot object detection (FSOD) identifies objects from extremely few annotated samples. Most recent FSOD methods apply the two-stage learning paradigm, which transfers the knowledge learned from abundant base classes to assist the few-shot detectors by learning global features. However, these approaches seldom consider the localization of objects from local to global.
IEEE Trans Neural Netw Learn Syst
February 2024
Unsupervised graph-structure learning (GSL), which aims to learn an effective graph structure applicable to arbitrary downstream tasks from the data itself without any label guidance, has recently received increasing attention in various real applications. Although several existing unsupervised GSL methods have achieved superior performance in different graph analytical tasks, how to use the popular graph masked autoencoder to acquire sufficient supervision information from the data itself, and thereby improve the learned graph structure, has not been effectively explored so far. To tackle this issue, we present a multilevel contrastive graph masked autoencoder (MCGMAE) for unsupervised GSL.
IEEE Trans Neural Netw Learn Syst
June 2023
Sparse additive machines (SAMs) have shown competitive performance on variable selection and classification in high-dimensional data due to their representation flexibility and interpretability. However, existing methods often employ unbounded or nonsmooth functions as surrogates of the 0-1 classification loss, which may degrade performance on data with outliers. To alleviate this problem, we propose a robust classification method, named SAM with the correntropy-induced loss (CSAM), by integrating the correntropy-induced loss (C-loss), the data-dependent hypothesis space, and the weighted ℓ_{q,1}-norm regularizer (q ≥ 1) into additive machines.
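To make the contrast between bounded and unbounded surrogates concrete, here is a minimal sketch of one common form of the C-loss next to the hinge loss. The formula, the normalizing constant `beta`, and the default `sigma` are assumptions drawn from the general correntropy literature; the abstract does not state the exact form used in CSAM.

```python
import math


def c_loss(y, fx, sigma=1.0):
    """One common form of the correntropy-induced loss (assumed, not taken from the abstract).

    y is a label in {-1, +1}, fx is a real-valued classifier score.
    beta normalizes the loss so that it equals 1 when the margin y * fx is 0.
    """
    beta = 1.0 / (1.0 - math.exp(-1.0 / (2.0 * sigma ** 2)))
    margin = y * fx
    return beta * (1.0 - math.exp(-((1.0 - margin) ** 2) / (2.0 * sigma ** 2)))


def hinge_loss(y, fx):
    """Standard hinge loss, an unbounded and nonsmooth surrogate of the 0-1 loss."""
    return max(0.0, 1.0 - y * fx)


if __name__ == "__main__":
    # A badly misclassified point (for example an outlier) with margin -1000.
    print("hinge :", hinge_loss(1.0, -1000.0))
    print("C-loss:", c_loss(1.0, -1000.0))
```

Under this form the C-loss never exceeds `beta`, so a single extreme outlier contributes a capped penalty, whereas the hinge loss grows linearly with the size of the violation.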
IEEE Trans Pattern Anal Mach Intell
November 2023
Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is typically represented by attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant and sufficient visual-semantic interaction for advancing ZSL. Existing attention-based models learn only inferior region features in a single image because they rely solely on unidirectional attention, which ignores the transferable and discriminative attribute localization of visual features that represents the key semantic knowledge for effective knowledge transfer in ZSL.
Background: In brain tumor magnetic resonance image (MRI) segmentation, although 3D convolutional neural networks (CNNs) have achieved state-of-the-art results, the class and hard-voxel imbalances in 3D images have not been well addressed. Voxel-independent losses depend on the setting of class weights to handle class imbalance, and these weights are hard to set so that each class is treated equally. Region-related losses cannot dynamically focus on hard voxels and are not robust to misclassification of small structures.
IEEE Trans Neural Netw Learn Syst
April 2024
Despite the great success of the existing work in fine-grained visual categorization (FGVC), there are still several unsolved challenges, e.g., poor interpretation and vagueness contribution.
IEEE Trans Neural Netw Learn Syst
April 2024
Zero-shot learning (ZSL) tackles the unseen class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a direct embedding is adopted for associating the visual and semantic domains in ZSL. However, most existing ZSL methods focus on learning the embedding from implicit global features or image regions to the semantic space.
IEEE Trans Neural Netw Learn Syst
October 2023
Recent weakly supervised semantic segmentation methods generate pseudolabels to recover the lost position information in weak labels for training the segmentation network. Unfortunately, those pseudolabels often contain mislabeled regions and inaccurate boundaries due to the incomplete recovery of position information. It turns out that the result of semantic segmentation becomes determinate to a certain degree.
IEEE Trans Cybern
July 2023
The divide-and-conquer strategy is a very effective method of dealing with big data. Noisy samples in big data usually have a great impact on algorithmic performance. In this article, we introduce Markov sampling and different weights for distributed learning with the classical support vector machine (cSVM).
Purpose: In neonatal brain magnetic resonance image (MRI) segmentation, the model we trained on the training set (source domain) often performs poorly in clinical practice (target domain). As the label of target-domain images is unavailable, this cross-domain segmentation needs unsupervised domain adaptation (UDA) to make the model adapt to the target domain. However, the shape and intensity distribution of neonatal brain MRI images across the domains are largely different from adults'.
Ultrasound images are widely used for diagnosis of congenital abnormalities of the kidney and urinary tract (CAKUT). Since a typical clinical ultrasound image captures 2D information of a specific view plane of the kidney, and images of the same kidney on different planes have varied appearances, it is challenging to develop a computer-aided diagnosis tool that is robust to ultrasound images in different views. To overcome this problem, we develop a multi-instance deep learning method for distinguishing children with CAKUT from controls based on their clinical ultrasound images, aiming to automatically diagnose CAKUT in children from ultrasound imaging data.
Objective: To reliably and quickly diagnose children with posterior urethral valves (PUV), we developed a multi-instance deep learning method to automate image analysis.
Methods: We built a robust pattern classifier to distinguish 86 children with PUV from 71 children with mild unilateral hydronephrosis based on ultrasound images (3504 in sagittal view and 2558 in transverse view) obtained during routine clinical care.
Results: The multi-instance deep learning classifier performed better than classifiers built on either single sagittal images or single transverse images.
IEEE Trans Neural Netw Learn Syst
March 2021
Low-rank Multiview Subspace Learning (LMvSL) has shown great potential in cross-view classification in recent years. Despite their empirical success, existing LMvSL-based methods cannot handle view discrepancy and discriminancy well at the same time, which leads to performance degradation when there is a large discrepancy among multiview data. To circumvent this drawback, motivated by block-diagonal representation learning, we propose structured low-rank matrix recovery (SLMR), a method that effectively removes view discrepancy and improves discriminancy through the recovery of a structured low-rank matrix.
Uncertain Safe Util Machine Learn Med Imaging Clin Image Based Proced (2019)
October 2019
Ultrasound imaging (US) is commonly used in nephrology for diagnostic studies of the kidneys and lower urinary tract. However, it remains challenging to automate the disease diagnosis based on clinical 2D US images since they provide partial anatomic information of the kidney and the 2D images of the same kidney may have heterogeneous appearance. To overcome this challenge, we develop a novel multi-instance deep learning method to build a robust classifier by treating multiple 2D US images of each individual subject as multiple instances of one bag.
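The bag-of-instances idea can be sketched as follows: each subject is a bag of per-image scores, and a pooling step turns them into a single subject-level decision. This is only an illustration; the pooling functions, the threshold, and the names `bag_score` and `classify_subjects` are hypothetical, and the abstract does not say how the published method aggregates instances.

```python
def bag_score(instance_scores, pooling="mean"):
    """Aggregate per-image scores of one subject (one bag) into a single score.

    The pooling choice is an assumption made for illustration.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    if pooling == "mean":
        return sum(instance_scores) / len(instance_scores)
    if pooling == "max":
        return max(instance_scores)
    raise ValueError(f"unknown pooling: {pooling}")


def classify_subjects(bags, threshold=0.5, pooling="mean"):
    """Map each subject id to True (positive) or False based on its pooled bag score."""
    return {
        subject: bag_score(scores, pooling) >= threshold
        for subject, scores in bags.items()
    }


if __name__ == "__main__":
    # Hypothetical per-image scores for two subjects with different numbers of images.
    bags = {"subject_a": [0.9, 0.2, 0.7], "subject_b": [0.1, 0.3]}
    print(classify_subjects(bags))
```

Because the decision is made per bag, subjects may contribute different numbers of images, and no single view has to carry the full anatomical information.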
Proc IEEE Int Symp Biomed Imaging
April 2019
It remains challenging to automatically segment kidneys in clinical ultrasound images due to the kidneys' varied shapes and image intensity distributions, although semi-automatic methods have achieved promising performance. In this study, we developed a novel boundary distance regression deep neural network to segment the kidneys, informed by the fact that the kidney boundaries are relatively consistent across images in terms of their appearance. Particularly, we first use deep neural networks pre-trained for classification of natural images to extract high-level image features from ultrasound images, then these feature maps are used as input to learn kidney boundary distance maps using a boundary distance regression network, and finally the predicted boundary distance maps are classified as kidney pixels or non-kidney pixels using a pixel classification network in an end-to-end learning fashion.
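As an illustration of what a boundary distance map is, the sketch below computes one from a small binary mask: every pixel stores its Euclidean distance to the nearest boundary pixel of the object. This shows only a possible regression target, not the network; the 4-neighbourhood boundary definition and the function names are assumptions, since the abstract does not define the map precisely.

```python
import math


def boundary_pixels(mask):
    """Return foreground pixels that touch the background or the image edge (4-neighbourhood)."""
    height, width = len(mask), len(mask[0])
    border = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    border.append((r, c))
                    break
    return border


def boundary_distance_map(mask):
    """For every pixel, compute the Euclidean distance to the nearest boundary pixel.

    Brute force, intended for small illustrative masks only.
    """
    border = boundary_pixels(mask)
    if not border:
        raise ValueError("mask contains no foreground pixels")
    height, width = len(mask), len(mask[0])
    return [
        [min(math.hypot(r - br, c - bc) for br, bc in border) for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    # A 3x3 square object centred in a 5x5 image.
    mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
    for row in boundary_distance_map(mask):
        print(" ".join(f"{value:.2f}" for value in row))
```

A map like this is zero on the object boundary and grows smoothly away from it, which is one reason a distance map can be a more regular regression target than a hard binary mask.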
It remains challenging to automatically segment kidneys in clinical ultrasound (US) images due to the kidneys' varied shapes and image intensity distributions, although semi-automatic methods have achieved promising performance. In this study, we propose subsequent boundary distance regression and pixel classification networks to segment the kidneys automatically. Particularly, we first use deep neural networks pre-trained for classification of natural images to extract high-level image features from US images.
IEEE Trans Pattern Anal Mach Intell
January 2021
As data volumes grow, imbalanced data has become increasingly common. When the imbalance ratio (IR) of the data is high, most existing imbalanced learning methods suffer a serious decline in classification performance. In this paper, we systematically investigate the highly imbalanced data classification problem, and propose an uncorrelated cost-sensitive multiset learning (UCML) approach for it.
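For reference, the imbalance ratio can be computed as below, assuming the common convention that IR is the size of the largest class divided by the size of the smallest class. The abstract does not spell out its definition, so this convention and the function name `imbalance_ratio` are assumptions.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count (assumed convention)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical data set: 95 majority samples and 5 minority samples.
    labels = [0] * 95 + [1] * 5
    print(imbalance_ratio(labels))
```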
We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to tackle this problem is the multiview subspace learning (MvSL) that aims to learn a latent subspace shared by multiview data. Despite promising results obtained on some applications, the performance of existing methods deteriorates dramatically when the multiview data is sampled from nonlinear manifolds or suffers from heavy outliers.
Video-based person re-identification (re-id) is an important application in practice. Since large variations exist between different pedestrian videos, as well as within each video, it's challenging to conduct re-identification between pedestrian videos. In this paper, we propose a simultaneous intra-video and inter-video distance learning (SI2DL) approach for video-based person re-id.
Purpose: Low signal-to-noise-ratio and limited scan time of diffusion magnetic resonance imaging (dMRI) in current clinical settings impede obtaining images with high spatial and angular resolution (HSAR) for a reliable fiber reconstruction with fine anatomical details. To overcome this problem, we propose a joint space-angle regularization approach to reconstruct HSAR diffusion signals from a single 4D low resolution (LR) dMRI, which is down-sampled in both 3D-space and q-space.
Methods: Different from the existing works which combine multiple 4D LR diffusion images acquired using specific acquisition protocols, the proposed method reconstructs HSAR dMRI from only a single 4D dMRI by exploring and integrating two key priors, that is, the nonlocal self-similarity in the spatial domain as a prior to increase spatial resolution and ridgelet approximations in the diffusion domain as another prior to increase the angular resolution of dMRI.
Support vector machine (SVM) is one of the most widely used learning algorithms for classification problems. Although SVM performs well in practical applications, it has high algorithmic complexity when the number of training samples is large. In this paper, we introduce an SVM classification (SVMC) algorithm based on -times Markov sampling and present numerical studies on the learning performance of SVMC with -times Markov sampling for benchmark data sets.
In cognitive radio networks, self-interested secondary users (SUs) desire to maximize their own throughput. They compete with each other for transmit time once the absence of primary users (PUs) is detected. To satisfy the requirement of PU protection, on the other hand, they have to form some coalitions and cooperate to conduct spectrum sensing.
Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although robustness to local corruptions has been improved, prevailing trackers are still sensitive to large-scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique that depends on a subspace learning-based appearance model.