GSB: Group superposition binarization for vision transformer with limited training samples.

Neural Netw

State Key Laboratory of Internet of Things for Smart City (SKL-IOTSC), University of Macau, 999078, Macao Special Administrative Region of China; Department of Computer and Information Science (CIS), University of Macau, 999078, Macao Special Administrative Region of China; Department of Electromechanical Engineering (EME), University of Macau, 999078, Macao Special Administrative Region of China.

Published: April 2024

AI Article Synopsis

  • Vision Transformers (ViT) are effective in computer vision but struggle with overfitting and require significant computing resources, making them hard to deploy on less powerful devices.
  • Model binarization is proposed as a solution, simplifying computations by using binary operations instead of complex calculations, which can help reduce model size and computational demands.
  • The study introduces a new binarization approach called Group Superposition Binarization (GSB), addressing challenges in maintaining accuracy during the binarization of ViTs, and incorporates techniques like knowledge distillation to enhance performance and mitigate overfitting.
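To make the "binary operations instead of complex calculations" point concrete, the following is a minimal sketch of the standard XNOR-and-popcount trick for the dot product of two {-1, +1} vectors. It is a generic illustration, not code from the GSB repository; the helper names `pack_signs` and `binary_dot` are hypothetical.

```python
def pack_signs(v):
    """Pack a {-1, +1} vector into an integer: bit i is set when v[i] == +1."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n


# Hypothetical example vectors, chosen only for illustration.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree, but the first is computed without any multiplications. On real hardware the same idea is applied to machine words with native popcount instructions.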

Article Abstract

Vision Transformer (ViT) has performed remarkably in various computer vision tasks. Nonetheless, because of its massive number of parameters, ViT usually suffers from serious overfitting when the number of training samples is relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice to solve the above problems. Compared with its full-precision counterpart, a binarized model replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially addresses the problems of computational complexity and model size, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that the existing binarization technology designed for Convolutional Neural Networks (CNN) does not transfer well to a ViT's binarization task. We also find that the accuracy decline of the binary ViT model is mainly due to the information loss of the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to improve the performance of the binarized model, we have investigated the gradient calculation procedure in the binarization process and derived more appropriate gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization can limit the parameter search space during parameter updates while training a model. Therefore, the binarization process can actually play an implicit regularization role and help solve the problem of overfitting in the case of insufficient training data. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among the binary quantization schemes and exceeds its full-precision counterpart on some indicators. Code and models are available at: https://github.com/IMRL/GSB-Vision-Transformer.
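The "gradient mismatch" mentioned in the abstract arises because the sign function has zero gradient almost everywhere, so training has to substitute an approximate gradient. The sketch below shows the common baseline from CNN binarization: sign binarization with a mean-absolute-value scale and a clipped straight-through estimator. It is not the GSB formulation or the gradient equations derived in the paper; the function names and the clip threshold of 1.0 are assumptions made for illustration.

```python
import numpy as np


def binarize(w):
    """Approximate w by alpha * b with b in {-1, +1}.

    alpha = mean(|w|) is the scale that minimizes the squared error
    ||w - alpha * sign(w)||^2 for a fixed sign pattern.
    """
    alpha = np.mean(np.abs(w))
    b = np.where(w >= 0, 1.0, -1.0)
    return alpha, b


def ste_grad(w, grad_out, clip=1.0):
    """Clipped straight-through estimator (an assumed baseline, not GSB).

    The true derivative of sign(w) is zero almost everywhere, so the
    backward pass treats sign as the identity inside |w| <= clip and
    blocks the gradient outside that range.
    """
    return grad_out * (np.abs(w) <= clip)


# Hypothetical weights, chosen only for illustration.
w = np.array([0.3, -1.5, 0.8, -0.1])
alpha, b = binarize(w)
grad_w = ste_grad(w, np.ones_like(w))
```

The gap between this surrogate gradient and the true one is the mismatch that improved gradient rules try to reduce.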

Source
http://dx.doi.org/10.1016/j.neunet.2024.106133

Publication Analysis

Top Keywords

  • model binarization: 24
  • binarization: 12
  • training samples: 12
  • model: 12
  • group superposition: 8
  • superposition binarization: 8
  • vision transformer: 8
  • full-precision model: 8
  • vit model: 8
  • gradient calculation: 8
