Recent studies have shown that integrating inertial measurement unit (IMU) signals with surface electromyography (sEMG) can greatly improve hand gesture recognition (HGR) performance in applications such as prosthetic control and rehabilitation training. However, current deep learning models for multimodal HGR encounter difficulties in invasive modal fusion, complex feature extraction from heterogeneous signals, and limited inter-subject generalization. To address these challenges, this study aims to develop an end-to-end, inter-subject transferable model that utilizes non-invasively fused sEMG and acceleration (ACC) data.

The proposed non-invasive modal fusion transformer (NIMFT) model uses 1D-convolutional neural network (CNN)-based patch embedding for local information extraction and employs a multi-head cross-attention (MCA) mechanism to non-invasively integrate sEMG and ACC signals, stabilizing the variability induced by sEMG. The architecture undergoes detailed ablation studies after hyperparameter tuning. Transfer learning is employed by fine-tuning a pre-trained model on new subjects, and a comparative analysis is performed between the fine-tuned and subject-specific models. Additionally, the performance of NIMFT is compared with state-of-the-art fusion models.

The NIMFT model achieved recognition accuracies of 93.91%, 91.02%, and 95.56% on the three action sets in the Ninapro DB2 dataset. The proposed embedding method and MCA outperformed the traditional invasive modal fusion transformer by 2.01% (embedding) and 1.23% (fusion), respectively. Compared with subject-specific models, the fine-tuned model exhibited the highest average accuracy improvement of 2.26%, achieving a final accuracy of 96.13%.
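As a rough illustration of the cross-attention fusion described above (not the paper's actual NIMFT implementation; the single-head simplification, token counts, and weight shapes are all assumptions), here is a minimal NumPy sketch in which sEMG patch embeddings act as queries over ACC patch embeddings:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(emg_tokens, acc_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: sEMG tokens attend to ACC tokens.

    emg_tokens: (T_emg, d) -- queries come from the sEMG stream
    acc_tokens: (T_acc, d) -- keys/values come from the ACC stream
    """
    Q = emg_tokens @ Wq                        # (T_emg, d)
    K = acc_tokens @ Wk                        # (T_acc, d)
    V = acc_tokens @ Wv                        # (T_acc, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (T_emg, T_acc)
    weights = softmax(scores, axis=-1)         # each sEMG token's attention over ACC tokens
    return weights @ V                         # (T_emg, d): ACC-informed sEMG features

rng = np.random.default_rng(0)
d = 8
emg = rng.standard_normal((16, d))             # 16 hypothetical sEMG patch embeddings
acc = rng.standard_normal((16, d))             # 16 hypothetical ACC patch embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = cross_attention(emg, acc, Wq, Wk, Wv)
print(fused.shape)
```

In a multi-head version the embedding dimension is split across several such heads, each with its own projection matrices, and their outputs are concatenated; the "non-invasive" aspect in the abstract refers to keeping the two modality streams separate rather than concatenating raw tokens before attention.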
Moreover, the NIMFT model demonstrated superiority in accuracy, recall, precision, and F1-score compared with the latest modal fusion models of similar scale.

NIMFT is a novel end-to-end HGR model that utilizes a non-invasive MCA mechanism to integrate long-range intermodal information effectively. Compared with recent modal fusion models, it demonstrates superior performance in inter-subject experiments, and through transfer learning it offers higher training efficiency and accuracy than subject-specific approaches.
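The fine-tuning idea the abstract compares against subject-specific training can be sketched in miniature: keep the representation learned from source subjects frozen and update only a small head on the new subject's data. The model below (a frozen tanh feature layer plus a softmax head), the layer sizes, and the toy data are all invented for illustration and do not reflect NIMFT's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pretrained model: a frozen feature layer (learned on source
# subjects) plus a classification head that is fine-tuned on the new subject.
W_feat = rng.standard_normal((8, 16)) * 0.1    # frozen
W_head = rng.standard_normal((16, 4)) * 0.1    # trainable

# Toy "new subject" data: 32 windows, 8 input features, 4 gesture classes.
X = rng.standard_normal((32, 8))
y = rng.integers(0, 4, size=32)
Y = np.eye(4)[y]                               # one-hot labels

def predict_proba(X):
    H = np.tanh(X @ W_feat)                    # frozen features
    logits = H @ W_head
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    return H, P / P.sum(axis=1, keepdims=True)

losses = []
lr = 0.5
for _ in range(200):
    H, P = predict_proba(X)
    losses.append(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))
    grad = H.T @ (P - Y) / len(X)              # gradient w.r.t. the head only
    W_head -= lr * grad                        # W_feat stays frozen

print(f"cross-entropy: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because only the head is updated, far fewer parameters are trained per new subject than in a subject-specific model, which is the source of the training-efficiency advantage the abstract reports.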
DOI: http://dx.doi.org/10.1088/1741-2552/ad39a5
Int J Surg
January 2025
Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Background: Integrating comprehensive information on hepatocellular carcinoma (HCC) is essential to improve its early detection. We aimed to develop a model with multi-modal features (MMF) using artificial intelligence (AI) approaches to enhance the performance of HCC detection.
Materials And Methods: A total of 1,092 participants were enrolled from 16 centers.
Water Res
January 2025
School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea. Electronic address:
Algal blooms in freshwater, which are exacerbated by urbanization and climate change, pose significant challenges in the water treatment process. These blooms affect water quality and treatment efficiency. Effective identification of algal proliferation based on the dominant species is important to ensure safe drinking water and a clean water supply.
Sci Rep
January 2025
School of Electronic and Information Engineering, Changsha Institute of Technology, Changsha, 410200, China.
To address the limitations of the flipped classroom in personalized teaching and interactivity, this paper designs a new virtual reality (VR)-based flipped-classroom model for colleges and universities by incorporating the Contrastive Language-Image Pre-Training (CLIP) algorithm. Through cross-modal data fusion, the model tightly couples students' operational behavior with teaching content and improves teaching outcomes through an intelligent feedback mechanism. The test data shows that the similarity between video and image modes reaches 0.
Sensors (Basel)
January 2025
College of Communication Engineering, Jilin University, Changchun 130012, China.
A moving ground-target recognition system can monitor suspicious pedestrian and vehicle activity in key areas. Currently, most target recognition systems are based on devices such as fiber optics, radar, and vibration sensors. A system based on vibration sensors has the advantages of small size, low power consumption, strong concealment, and easy installation.
Sensors (Basel)
January 2025
The 54th Research Institute, China Electronics Technology Group Corporation, College of Signal and Information Processing, Shijiazhuang 050081, China.
Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential.