Attention-based convolutional neural network (CNN) models are increasingly being adopted for speaker and language recognition (SR/LR) tasks. These include time, frequency, spatial and channel attention, which can focus on useful time frames, frequency bands, regions or channels while extracting features. However, these traditional attention methods lack the exploration of complex information and multi-scale long-range speech feature interactions, which can benefit SR/LR tasks. To address these issues, this paper first proposes mixed-order attention (MOA) for low frame-level speech features to capture the finest-grained multi-order information at higher resolution. We then combine this with a non-local attention (NLA) mechanism and a dilated residual structure to balance fine-grained local detail with convolution over multi-scale long-range time/frequency regions in feature space. The proposed dilated mixed-order non-local attention network (D-MONA) exploits the detail available from first- and second-order feature attention analysis, but achieves this over a much wider context than purely local attention. Experiments are conducted on three datasets: two SR tasks, VoxCeleb and CN-Celeb, and one LR task, NIST LRE 07. For SR, D-MONA improves on ResNet-34 results by at least 29% on VoxCeleb1 and 15% on CN-Celeb. For the LR task, it achieves large improvements over ResNet-34 of 21% for the challenging 3 s utterance condition, 59% for the 10 s condition and 67% for the 30 s condition. It also outperforms the state-of-the-art deep bottleneck feature-DNN (DBF-DNN) x-vector system at all utterance durations.
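The non-local attention component described above lets every time-frequency position attend to every other position, rather than only a local convolutional neighborhood. The abstract does not give implementation details, so the following is only a minimal NumPy sketch of a generic embedded-Gaussian non-local block (in the style of Wang et al.'s non-local neural networks) applied to a (channels, time, frequency) feature map; the layer sizes, weight shapes and function names are illustrative assumptions, not the authors' D-MONA code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local attention over all time-frequency positions.

    x: (C, T, F) feature map. The weight matrices play the role of 1x1
    convolutions, implemented as plain matrix products over channels.
    (Hypothetical sketch -- not the paper's actual implementation.)
    """
    C, T, F = x.shape
    flat = x.reshape(C, T * F)                # (C, N): N = T*F positions
    theta = w_theta @ flat                    # (C2, N) query embeddings
    phi = w_phi @ flat                        # (C2, N) key embeddings
    g = w_g @ flat                            # (C2, N) value embeddings
    attn = softmax(theta.T @ phi, axis=-1)    # (N, N) pairwise affinities
    y = w_z @ (g @ attn.T)                    # aggregate values, project back to C
    return x + y.reshape(C, T, F)             # residual connection

# Toy dimensions: 8 channels, 6 time frames, 5 frequency bins,
# with a 4-channel bottleneck for the attention embeddings.
C, C2, T, F = 8, 4, 6, 5
x = rng.standard_normal((C, T, F))
w_theta, w_phi, w_g = (rng.standard_normal((C2, C)) * 0.1 for _ in range(3))
w_z = rng.standard_normal((C, C2)) * 0.1
out = non_local_block(x, w_theta, w_phi, w_g, w_z)
print(out.shape)  # (8, 6, 5)
```

Because the affinity matrix couples all N = T x F positions, each output position mixes information from the entire utterance's time/frequency plane, which is the long-range behavior the abstract contrasts with purely local attention.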
DOI: http://dx.doi.org/10.1016/j.neunet.2021.03.014
J Chem Phys
December 2024
Computational Science Research Center, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea.
Graph neural network interatomic potentials (GNN-IPs) are gaining significant attention due to their capability of learning from large datasets. Specifically, universal interatomic potentials based on GNN, usually trained with crystalline geometries, often exhibit remarkable extrapolative behavior toward untrained domains, such as surfaces and amorphous configurations. However, the origin of this extrapolation capability is not well understood.
Sensors (Basel)
November 2024
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.
For binocular stereo matching techniques, the most advanced method currently is using an iterative structure based on GRUs. Methods in this class have shown high performance on both high-resolution images and standard benchmarks. However, simply replacing cost aggregation with a GRU iterative method leads to the original cost volume for disparity calculation lacking non-local geometric and contextual information.
Sensors (Basel)
November 2024
College of Information Science and Engineering, Ritsumeikan University, Osaka 603-8577, Japan.
Hyperspectral image (HSI) reconstruction is a critical and indispensable step in coded aperture snapshot spectral imaging (CASSI) systems and directly affects our ability to capture high-quality images in dynamic environments. Recent research has increasingly focused on deep unfolding frameworks for HSI reconstruction, showing notable progress. However, these approaches have to break the optimization task into two sub-problems, solving them iteratively over multiple stages, which leads to large models and high computational overheads.
Magn Reson Imaging
February 2025
Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, Zhejiang 321004, China.
Purpose: This study introduces GraFMRI, a novel framework designed to address the challenges of reconstructing high-quality MRI images from undersampled k-space data. Traditional methods often suffer from noise amplification and loss of structural detail, leading to suboptimal image quality. GraFMRI leverages Graph Neural Networks (GNNs) to transform multi-modal MRI data (T1, T2, PD) into a graph-based representation, enabling the model to capture intricate spatial relationships and inter-modality dependencies.
Achieving high-fidelity image transmission through turbid media is a significant challenge facing both the AI and photonic/optical communities. While this capability holds promise for a variety of applications, including data transfer, neural endoscopy, and multi-mode optical fiber-based imaging, conventional deep learning methods struggle to capture the nuances of light propagation, leading to weak generalization and limited reconstruction performance. To address this limitation, we investigated the non-locality present in the reconstructed images and discovered that conventional deep learning methods rely on specific features extracted from the training dataset rather than meticulously reconstructing each pixel.