The Transformer model has received significant attention in Human Activity Recognition (HAR) due to its self-attention mechanism that captures long dependencies in time series. However, for Inertial Measurement Unit (IMU) sensor time-series signals, the Transformer model does not effectively utilize the a priori information of strong complex temporal correlations. Therefore, we proposed using multi-layer convolutional layers as a Convolutional Feature Extractor Block (CFEB). CFEB enables the Transformer model to leverage both local and global time series features for activity classification. Meanwhile, the absolute position embedding (APE) in existing Transformer models cannot accurately represent the distance relationship between individuals at different time points. To further explore positional correlations in temporal signals, this paper introduces the Vector-based Relative Position Embedding (vRPE), aiming to provide more relative temporal position information within sensor signals for the Transformer model. Combining these innovations, we conduct extensive experiments on three HAR benchmark datasets: KU-HAR, UniMiB SHAR, and USC-HAD. Experimental results demonstrate that our proposed enhancement scheme substantially elevates the performance of the Transformer model in HAR.

Download full-text PDF

Source
http://dx.doi.org/10.3390/s25020301DOI Listing

Publication Analysis

Top Keywords

transformer model
24
position embedding
12
convolutional feature
8
feature extractor
8
extractor block
8
vector-based relative
8
relative position
8
human activity
8
activity recognition
8
time series
8

Similar Publications

Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by the 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system using Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.

View Article and Find Full Text PDF

Although the Transformer architecture has established itself as the industry standard for jobs involving natural language processing, it still has few uses in computer vision. In vision, attention is used in conjunction with convolutional networks or to replace individual convolutional network elements while preserving the overall network design. Differences between the two domains, such as significant variations in the scale of visual things and the higher granularity of pixels in images compared to words in the text, make it difficult to transfer Transformer from language to vision.

View Article and Find Full Text PDF

The current work introduces the hybrid ensemble framework for the detection and segmentation of colorectal cancer. This framework will incorporate both supervised classification and unsupervised clustering methods to present more understandable and accurate diagnostic results. The method entails several steps with CNN models: ADa-22 and AD-22, transformer networks, and an SVM classifier, all inbuilt.

View Article and Find Full Text PDF

Vision transformer-based multimodal fusion network for classification of tumor malignancy on breast ultrasound: A retrospective multicenter study.

Int J Med Inform

January 2025

School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, PR China. Electronic address:

Background: In the context of routine breast cancer diagnosis, the precise discrimination between benign and malignant breast masses holds utmost significance. Notably, few prior investigations have concurrently explored the integration of imaging histology features, deep learning characteristics, and clinical parameters. The primary objective of this retrospective study was to pioneer a multimodal feature fusion model tailored for the prediction of breast tumor malignancy, harnessing the potential of ultrasound images.

View Article and Find Full Text PDF

Background And Objectives: Hypertensive Retinopathy (HR) is a retinal manifestation resulting from persistently elevated blood pressure. Severity grading of HR is essential for patient risk stratification, effective management, progression monitoring, timely intervention, and minimizing the risk of vision impairment. Computer-aided diagnosis and artificial intelligence (AI) systems play vital roles in the diagnosis and grading of HR.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!