The Transformer model has received significant attention in Human Activity Recognition (HAR) because its self-attention mechanism captures long-range dependencies in time series. However, for Inertial Measurement Unit (IMU) sensor signals, the Transformer does not effectively exploit the prior knowledge that such signals exhibit strong, complex temporal correlations. We therefore propose a Convolutional Feature Extractor Block (CFEB) built from multiple convolutional layers, which enables the Transformer to leverage both local and global time-series features for activity classification. Meanwhile, the absolute position embedding (APE) used in existing Transformer models cannot accurately represent the distance relationship between different time points. To further exploit positional correlations in temporal signals, this paper introduces a Vector-based Relative Position Embedding (vRPE), which supplies the Transformer with richer relative temporal position information within sensor signals. Combining these innovations, we conduct extensive experiments on three HAR benchmark datasets: KU-HAR, UniMiB SHAR, and USC-HAD. Experimental results demonstrate that our proposed enhancements substantially improve the performance of the Transformer model in HAR.
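A minimal PyTorch sketch of the pipeline described in this abstract, assuming a two-layer 1D convolutional stem standing in for the CFEB and a learned relative-position bias standing in for the paper's vRPE; all layer sizes, channel counts, and the 18-class output are hypothetical placeholders, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Multi-layer 1D conv stem: extracts local temporal features from raw IMU channels."""
    def __init__(self, in_channels=6, d_model=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, d_model, kernel_size=5, padding=2), nn.ReLU(),
        )
    def forward(self, x):                     # x: (batch, channels, time)
        return self.net(x).transpose(1, 2)    # -> (batch, time, d_model)

class RelPosSelfAttention(nn.Module):
    """Single-head self-attention with a learned relative-position bias (stand-in for vRPE)."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.max_len = max_len
        self.scale = d_model ** -0.5
    def forward(self, x):                     # x: (batch, time, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale         # (b, t, t)
        idx = torch.arange(t)
        rel = idx[None, :] - idx[:, None] + self.max_len - 1  # relative offsets -> bias indices
        attn = attn + self.rel_bias[rel]                      # add relative-position bias
        return attn.softmax(dim=-1) @ v

class HARTransformer(nn.Module):
    def __init__(self, in_channels=6, d_model=128, num_classes=18):
        super().__init__()
        self.cfeb = ConvFeatureExtractor(in_channels, d_model)
        self.attn = RelPosSelfAttention(d_model)
        self.head = nn.Linear(d_model, num_classes)
    def forward(self, x):                     # x: raw IMU windows, (batch, channels, time)
        h = self.attn(self.cfeb(x))
        return self.head(h.mean(dim=1))       # temporal mean pooling -> class logits

logits = HARTransformer()(torch.randn(4, 6, 128))   # e.g. fixed-length IMU windows
```

The conv stem encodes local temporal patterns before attention mixes them globally, which is the division of labor the abstract attributes to CFEB plus self-attention.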
DOI: http://dx.doi.org/10.3390/s25020301
NPJ Digit Med
January 2025
Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea.
Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high inter-scorer variability. While machine-learning models have improved PSG scoring, their clinical adoption is hindered by their 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system based on a Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.
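For illustration only, a toy sketch of the ViT-style staging idea (not the authors' SleepXViT implementation): a PSG epoch rendered as a 2D time-frequency image is split into patches, embedded, encoded with a Transformer, and classified into the five standard sleep stages. Image size, patch size, and model width are hypothetical.

```python
import torch
import torch.nn as nn

class TinyViTStager(nn.Module):
    """Toy ViT: patchify a spectrogram-like PSG image and classify into 5 sleep stages."""
    def __init__(self, img_size=64, patch=8, d_model=128, n_stages=5):
        super().__init__()
        self.patchify = nn.Conv2d(1, d_model, kernel_size=patch, stride=patch)  # patch embedding
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_stages)
    def forward(self, x):                               # x: (batch, 1, 64, 64) time-frequency image
        p = self.patchify(x).flatten(2).transpose(1, 2) # (batch, n_patches, d_model)
        cls = self.cls.expand(x.size(0), -1, -1)
        h = self.encoder(torch.cat([cls, p], dim=1) + self.pos)
        return self.head(h[:, 0])                       # classify from the [CLS] token

stage_logits = TinyViTStager()(torch.randn(2, 1, 64, 64))   # -> (2, 5): Wake/N1/N2/N3/REM
```

Because the model attends over image patches, its attention maps can be inspected per epoch, which is the kind of 'visual scoring' interpretability the abstract emphasizes.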
Sci Rep
January 2025
Department of Electrical Power, Adama Science and Technology University, Adama, 1888, Ethiopia.
Although the Transformer architecture has become the de facto standard for natural language processing tasks, its applications in computer vision remain limited. In vision, attention is either used in conjunction with convolutional networks or substituted for individual components of convolutional networks while keeping the overall network design intact. Differences between the two domains, such as large variations in the scale of visual entities and the much higher resolution of pixels in images compared to words in text, make it difficult to transfer the Transformer from language to vision.
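To make the "attention alongside convolutions" pattern concrete, here is a minimal, hypothetical sketch (not any specific published model): a small convolutional stem produces a feature map whose spatial positions are then treated as tokens for multi-head self-attention.

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Hybrid block: convolutional stem for local features, self-attention over spatial tokens."""
    def __init__(self, in_ch=3, d_model=64, heads=4, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(                       # downsample and embed local patterns
            nn.Conv2d(in_ch, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(d_model, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(d_model, num_classes)
    def forward(self, x):                                # x: (batch, 3, H, W)
        f = self.stem(x)                                 # (batch, d_model, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)            # spatial positions as tokens
        mixed, _ = self.attn(tokens, tokens, tokens)     # global mixing across positions
        return self.head(mixed.mean(dim=1))              # pooled logits

out = ConvAttnBlock()(torch.randn(2, 3, 32, 32))         # -> (2, 10)
```

Downsampling in the conv stem also keeps the token count manageable, which is one way hybrid designs cope with the high pixel granularity mentioned above.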
Sci Rep
January 2025
Ministry of Higher Education, Mataria Technical College, Cairo, 11718, Egypt.
The current work introduces a hybrid ensemble framework for the detection and segmentation of colorectal cancer. The framework combines supervised classification with unsupervised clustering to produce more interpretable and accurate diagnostic results. The method proceeds in several steps, built around the CNN models ADa-22 and AD-22, transformer networks, and an SVM classifier, all integrated into the framework.
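As a hedged illustration of the CNN-features-plus-SVM part of such a pipeline only: the sketch below uses small placeholder CNNs (the actual ADa-22 and AD-22 architectures are not reproduced here), concatenates their deep features, and fits an SVM on the fused representation. Data, shapes, and labels are synthetic.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

def make_cnn(out_dim=32):
    # Placeholder feature extractor; stands in for ADa-22 / AD-22, whose details differ.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(16, out_dim),
    )

cnn_a, cnn_b = make_cnn(), make_cnn()

def fused_features(images):                       # images: (n, 3, H, W) tensor
    with torch.no_grad():
        return torch.cat([cnn_a(images), cnn_b(images)], dim=1).numpy()

# Toy data: 40 synthetic images with binary labels (lesion / no lesion).
images = torch.randn(40, 3, 64, 64)
labels = np.random.randint(0, 2, size=40)

svm = SVC(kernel="rbf").fit(fused_features(images), labels)   # SVM on fused deep features
print(svm.predict(fused_features(images[:5])))
```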
Int J Med Inform
January 2025
School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, PR China.
Background: In routine breast cancer diagnosis, precise discrimination between benign and malignant breast masses is of utmost importance. Notably, few prior investigations have concurrently explored the integration of imaging histology features, deep learning features, and clinical parameters. The primary objective of this retrospective study was to develop a multimodal feature-fusion model for predicting breast tumor malignancy from ultrasound images.
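A minimal sketch of simple feature-level fusion, under the assumption (with hypothetical dimensions and synthetic data) that imaging-histology features, deep ultrasound features, and clinical parameters are already available as fixed-length vectors; this is not the study's actual fusion model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200                                           # synthetic cohort size
radiomics = rng.normal(size=(n, 30))              # handcrafted imaging-histology features
deep_feats = rng.normal(size=(n, 64))             # deep features from an ultrasound CNN
clinical = rng.normal(size=(n, 5))                # e.g. age, lesion size (hypothetical)
y = rng.integers(0, 2, size=n)                    # benign (0) vs malignant (1)

X = np.concatenate([radiomics, deep_feats, clinical], axis=1)   # simple early fusion
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
print(model.predict_proba(X[:3])[:, 1])           # predicted malignancy probabilities
```

Concatenation followed by a linear classifier is the simplest fusion baseline; published models typically replace it with learned fusion layers.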
Comput Methods Programs Biomed
January 2025
Regional Institute of Ophthalmology, Indira Gandhi Institute of Medical Sciences, Patna, 800025, Bihar, India.
Background And Objectives: Hypertensive Retinopathy (HR) is a retinal manifestation resulting from persistently elevated blood pressure. Severity grading of HR is essential for patient risk stratification, effective management, progression monitoring, timely intervention, and minimizing the risk of vision impairment. Computer-aided diagnosis and artificial intelligence (AI) systems play vital roles in the diagnosis and grading of HR.