Transfer learning has attracted considerable attention in medical image analysis because few annotated 3-D medical datasets are available for training data-driven deep learning models in practice. We propose Medical Transformer, a novel transfer learning framework that models 3-D volumetric images as a sequence of 2-D image slices. To strengthen the high-level 3-D representation with spatial relations, we use a multiview approach that leverages information from the three planes of the 3-D volume while keeping training parameter-efficient. To build a source model that is generally applicable to various tasks, we pretrain the model with self-supervised learning (SSL), using masked encoding vector prediction as a proxy task on a large-scale dataset of normal, healthy brain magnetic resonance imaging (MRI) scans. We evaluate the pretrained model on three downstream tasks that are widely studied in brain MRI research: 1) brain disease diagnosis; 2) brain age prediction; and 3) brain tumor segmentation. Experimental results demonstrate that Medical Transformer outperforms state-of-the-art (SOTA) transfer learning methods while reducing the number of parameters by up to approximately 92% for the classification and regression tasks and 97% for the segmentation task. It also achieves good performance when only part of the training samples is used.
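The slice-sequence and multiview ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the axis ordering, the toy volume, the helper names (`make_volume`, `slices_along`, `mask_positions`), and the masking ratio are all assumptions made for illustration. The method described masks encoding vectors produced by an encoder, whereas this sketch only selects which sequence positions would be hidden.

```python
import random


def make_volume(depth, height, width):
    """Build a toy 3-D volume as nested lists, indexed volume[z][y][x]."""
    return [[[z * 100 + y * 10 + x for x in range(width)]
             for y in range(height)]
            for z in range(depth)]


def slices_along(volume, axis):
    """Return the volume as a sequence of 2-D slices taken along one axis.

    axis 0 iterates over z, axis 1 over y, axis 2 over x. Which anatomical
    plane each axis corresponds to depends on scan orientation (assumption).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[row[:] for row in volume[z]] for z in range(depth)]
    if axis == 1:
        return [[volume[z][y][:] for z in range(depth)]
                for y in range(height)]
    if axis == 2:
        return [[[volume[z][y][x] for y in range(height)]
                 for z in range(depth)]
                for x in range(width)]
    raise ValueError("axis must be 0, 1, or 2")


def mask_positions(sequence_length, mask_ratio, seed=0):
    """Pick sequence positions to hide for a masked-prediction proxy task.

    The ratio and the uniform random sampling are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(sequence_length * mask_ratio))
    return sorted(rng.sample(range(sequence_length), n_masked))


# One slice sequence per viewing axis: the "multiview" input.
volume = make_volume(4, 5, 6)
views = {axis: slices_along(volume, axis) for axis in (0, 1, 2)}

# Positions in the axis-0 sequence that a model would be asked to predict.
masked = mask_positions(len(views[0]), mask_ratio=0.25)
```

In a real pipeline each 2-D slice would first pass through an encoder, and the model would be trained to predict the encoded vectors at the masked positions from the remaining ones.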


Source
http://dx.doi.org/10.1109/TNNLS.2023.3308712

Publication Analysis

Top Keywords

medical transformer: 12
transfer learning: 12
brain mri: 8
brain: 6
medical: 5
learning: 5
transformer universal: 4
universal encoder: 4
3-d: 4
encoder 3-d: 4

Similar Publications

While radiation hazards induced by cone-beam computed tomography (CBCT) in image-guided radiotherapy (IGRT) can be reduced by sparse-view sampling, the image quality is inevitably degraded. We propose a deep learning-based multi-view projection synthesis (DLMPS) approach to improve the quality of sparse-view low-dose CBCT images. In the proposed DLMPS approach, linear interpolation was first applied to sparse-view projections and the projections were rearranged into sinograms; these sinograms were processed with a sinogram restoration model and then rearranged back into projections.


Lead-Grouped Multi-Stage Learning for Myocardial Infarction Localization.

Methods

January 2025

School of Design, Hunan University, Changsha, 410082, China.

The electrocardiogram (ECG) is a ubiquitous medical diagnostic tool employed to localize myocardial infarction (MI) that is characterized by abnormal waveform patterns on the ECG. MI is a serious cardiovascular disease, and accurate, timely diagnosis is crucial for preventing severe outcomes. Current ECG analysis methods mainly rely on intra- and inter-lead feature extraction, but most models overlook the medical knowledge relevant to disease diagnosis.


A novel hybrid ViT-LSTM model with explainable AI for brain stroke detection and classification in CT images: A case study of Rajshahi region.

Comput Biol Med

January 2025

Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh; Bio-Imaging Research Laboratory, Islamic University, Kushtia, 7003, Bangladesh.

Computed tomography (CT) scans play a key role in the diagnosis of stroke, a leading cause of morbidity and mortality worldwide. However, interpreting these scans is often challenging, necessitating automated solutions for timely and accurate diagnosis. This research proposed a novel hybrid model that integrates a Vision Transformer (ViT) and a Long Short Term Memory (LSTM) to accurately detect and classify stroke characteristics using CT images.


Purpose: The purpose of this study was to develop a deep learning approach that restores artifact-laden optical coherence tomography (OCT) scans and predicts functional loss on the 24-2 Humphrey Visual Field (HVF) test.

Methods: This cross-sectional, retrospective study used 1674 visual field (VF)-OCT pairs from 951 eyes for training and 429 pairs from 345 eyes for testing. Peripapillary retinal nerve fiber layer (RNFL) thickness map artifacts were corrected using a generative diffusion model.


Aims: This study evaluates the performance of OpenAI's latest large language model (LLM), Chat Generative Pre-trained Transformer-4o, on the Adult Clinical Cardiology Self-Assessment Program (ACCSAP).

Methods And Results: Chat Generative Pre-trained Transformer-4o was tested on 639 ACCSAP questions, excluding 45 questions containing video clips, resulting in 594 questions for analysis. The questions included a mix of text-based and static image-based [electrocardiogram (ECG), angiogram, computed tomography (CT) scan, and echocardiogram] formats.

