Transferable non-invasive modal fusion-transformer (NIMFT) for end-to-end hand gesture recognition.

J Neural Eng

School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, People's Republic of China.

Published: April 2024

Recent studies have shown that integrating inertial measurement unit (IMU) signals with surface electromyographic (sEMG) signals can greatly improve hand gesture recognition (HGR) performance in applications such as prosthetic control and rehabilitation training. However, current deep learning models for multimodal HGR encounter difficulties in invasive modal fusion, complex feature extraction from heterogeneous signals, and limited inter-subject model generalization. To address these challenges, this study aims to develop an end-to-end and inter-subject transferable model that utilizes non-invasively fused sEMG and acceleration (ACC) data.

The proposed non-invasive modal fusion-transformer (NIMFT) model utilizes 1D-convolutional neural network-based patch embedding for local information extraction and employs a multi-head cross-attention (MCA) mechanism to non-invasively integrate sEMG and ACC signals, stabilizing the variability induced by sEMG. The proposed architecture undergoes detailed ablation studies after hyperparameter tuning. Transfer learning is employed by fine-tuning a pre-trained model on a new subject, and a comparative analysis is performed between the fine-tuned and subject-specific models. Additionally, the performance of NIMFT is compared to state-of-the-art fusion models.

The NIMFT model achieved recognition accuracies of 93.91%, 91.02%, and 95.56% on the three action sets in the Ninapro DB2 dataset. The proposed embedding method and MCA outperformed the traditional invasive modal fusion transformer by 2.01% (embedding) and 1.23% (fusion), respectively. In comparison to subject-specific models, the fine-tuned model exhibited the highest average accuracy improvement of 2.26%, achieving a final accuracy of 96.13%. Moreover, the NIMFT model demonstrated superiority in terms of accuracy, recall, precision, and F1-score compared to the latest modal fusion models of similar model scale.

NIMFT is a novel end-to-end HGR model that uses a non-invasive MCA mechanism to integrate long-range intermodal information effectively. Compared to recent modal fusion models, it demonstrates superior performance in inter-subject experiments, and through transfer learning it offers higher training efficiency and accuracy than subject-specific approaches.
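The two building blocks named in the abstract, 1D-convolutional patch embedding and multi-head cross-attention between sEMG and ACC tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel counts, window length, kernel size, model width, number of heads, random weights, the choice of sEMG tokens as queries with ACC tokens as keys and values, and the residual addition are all assumptions made for illustration.

```python
import numpy as np

def patch_embed(x, weight, bias, stride):
    """Strided 1D convolution used as a patch embedding.
    x: (channels, time); weight: (d_model, channels, kernel).
    Returns tokens of shape (n_patches, d_model)."""
    d_model, _, kernel = weight.shape
    n_patches = (x.shape[1] - kernel) // stride + 1
    tokens = np.empty((n_patches, d_model))
    for i in range(n_patches):
        window = x[:, i * stride : i * stride + kernel]  # (channels, kernel)
        tokens[i] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return tokens

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(q_tokens, kv_tokens, wq, wk, wv, wo, n_heads):
    """Queries come from one modality, keys and values from the other.
    q_tokens: (Tq, d); kv_tokens: (Tk, d).
    Returns (output (Tq, d), attention weights (n_heads, Tq, Tk))."""
    d_model = q_tokens.shape[1]
    d_head = d_model // n_heads

    def split(t):  # (T, d) -> (n_heads, T, d_head)
        return t.reshape(t.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q_tokens @ wq), split(kv_tokens @ wk), split(kv_tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # scaled dot product
    attn = softmax(scores, axis=-1)
    context = attn @ v                                    # (n_heads, Tq, d_head)
    merged = context.transpose(1, 0, 2).reshape(-1, d_model)
    return merged @ wo, attn

# Illustrative shapes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d_model, n_heads, kernel, stride = 32, 4, 20, 20
semg = rng.standard_normal((12, 200))  # hypothetical sEMG window: 12 channels
acc = rng.standard_normal((36, 200))   # hypothetical ACC window: 36 channels

semg_tokens = patch_embed(semg, 0.1 * rng.standard_normal((d_model, 12, kernel)),
                          np.zeros(d_model), stride)
acc_tokens = patch_embed(acc, 0.1 * rng.standard_normal((d_model, 36, kernel)),
                         np.zeros(d_model), stride)

wq, wk, wv, wo = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))
attended, attn = multi_head_cross_attention(semg_tokens, acc_tokens,
                                            wq, wk, wv, wo, n_heads)
fused = semg_tokens + attended  # residual connection (assumed)
print(fused.shape, attn.shape)
```

In this sketch each sEMG token attends over every ACC token in the window, which is one way cross-attention can carry long-range intermodal information; a trained model would learn the projection weights rather than sample them at random.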

Source
http://dx.doi.org/10.1088/1741-2552/ad39a5

Similar Publications

Background: Integrating comprehensive information on hepatocellular carcinoma (HCC) is essential to improve its early detection. We aimed to develop a model with multi-modal features (MMF) using artificial intelligence (AI) approaches to enhance the performance of HCC detection.

Materials And Methods: A total of 1,092 participants were enrolled from 16 centers.

Multi-modal learning-based algae phyla identification using image and particle modalities.

Water Res

January 2025

School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea.

Algal blooms in freshwater, which are exacerbated by urbanization and climate change, pose significant challenges in the water treatment process. These blooms affect water quality and treatment efficiency. Effective identification of algal proliferation based on the dominant species is important to ensure safe drinking water and a clean water supply.

To address the limitations of the flipped classroom in personalized teaching and interactive effectiveness, this paper designs a new Virtual Reality (VR)-based flipped classroom model for colleges and universities that incorporates the Contrastive Language-Image Pre-Training (CLIP) algorithm. Through cross-modal data fusion, the model closely links students' operation behavior with teaching content and improves teaching effectiveness through an intelligent feedback mechanism. The test data show that the similarity between video and image modes reaches 0.

Ground-Target Recognition Method Based on Transfer Learning.

Sensors (Basel)

January 2025

College of Communication Engineering, Jilin University, Changchun 130012, China.

A moving ground-target recognition system can monitor suspicious activities of pedestrians and vehicles in key areas. Currently, most target recognition systems are based on devices such as fiber optics, radar, and vibration sensors. A system based on vibration sensors has the advantages of small size, low power consumption, strong concealment, and easy installation.

Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection.

Sensors (Basel)

January 2025

The 54th Research Institute, China Electronics Technology Group Corporation, College of Signal and Information Processing, Shijiazhuang 050081, China.

Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential.
