Vision Transformer (ViT) has performed remarkably well in various computer vision tasks. Nonetheless, because of its massive number of parameters, ViT usually suffers from serious overfitting when the number of training samples is relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice to solve the above problems. Compared with its full-precision counterpart, a binarized model replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision parameters and activations with 1-bit ones, which potentially addresses computational complexity and model size, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that existing binarization techniques designed for Convolutional Neural Networks (CNNs) do not transfer well to the binarization of ViT. We also find that the accuracy drop of the binary ViT model is mainly due to information loss in the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. To further improve the performance of the binarized model, we investigate the gradient calculation procedure in the binarization process and derive more appropriate gradient calculation equations for GSB to reduce the influence of gradient mismatch. Knowledge distillation is then introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization limits the parameter search space during parameter updates while training a model. Therefore, the binarization process can play an implicit regularization role and help mitigate overfitting when training data are insufficient. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among binary quantization schemes and exceeds its full-precision counterpart on some indicators. Code and models are available at: https://github.com/IMRL/GSB-Vision-Transformer.
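To make the claim about bit-wise operations concrete, the following is a minimal sketch of generic sign binarization and an XNOR-popcount dot product. It is not the GSB method described in the abstract; the function names `binarize`, `pack_bits`, and `xnor_popcount_dot` are hypothetical and chosen only for illustration, and the sketch assumes vectors are binarized to {-1, +1} with zero mapped to +1.

```python
import numpy as np

def binarize(x):
    """Map a real-valued array to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(np.asarray(x) >= 0, 1, -1).astype(np.int8)

def pack_bits(b):
    """Encode a {-1, +1} vector as a Python int, one bit per element (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(b):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors computed from their bit encodings."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree; popcount counts them.
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1 to the dot product.
    return 2 * agree - n

# Compare the bit-wise result with an ordinary integer dot product.
rng = np.random.default_rng(0)
w = rng.standard_normal(64)   # stand-in for full-precision weights
x = rng.standard_normal(64)   # stand-in for full-precision activations
wb, xb = binarize(w), binarize(x)
reference = int(np.dot(wb.astype(np.int32), xb.astype(np.int32)))
fast = xnor_popcount_dot(pack_bits(wb), pack_bits(xb), 64)
```

Under these assumptions the bit-wise result matches the integer dot product of the binarized vectors, while each element needs one bit of storage instead of a 32-bit float.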
DOI: http://dx.doi.org/10.1016/j.neunet.2024.106133
PLoS One
January 2025
Department of Political Science, Middlebury College, Middlebury, Vermont, United States of America.
Assessing whether texts are positive or negative (sentiment analysis) has wide-ranging applications across many disciplines. Automated approaches make it possible to code near unlimited quantities of texts rapidly, replicably, and with high accuracy. Compared to machine learning and large language model (LLM) approaches, lexicon-based methods may sacrifice some performance, but in exchange they provide generalizability and domain independence, while crucially offering the possibility of identifying gradations in sentiment.
Cardiovasc Diagn Ther
December 2024
East Slovak Institute of Cardiovascular Diseases and School of Medicine, Pavol Jozef Safarik University, Kosice, Slovakia.
Background: Echocardiography is widely used to assess aortic stenosis (AS) but can yield inconsistent results, leading to uncertainty about AS severity and the need for further diagnostics. This retrospective study aimed to evaluate a novel echocardiography-based marker, the signal intensity coefficient (SIC), for its potential in accurately identifying and quantifying calcium in AS, enhancing noninvasive diagnostic methods.
Methods: Between May 2022 and October 2023, 112 cases of AS that were previously considered severe by echocardiography were retrospectively evaluated, as well as a group of 50 cases of mild or moderate AS, both at the Eastern Slovak Institute of Cardiovascular Diseases in Kosice, Slovakia.
Sensors (Basel)
December 2024
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.
The Transformer is a powerful model widely used in artificial intelligence applications. Its complex structure and extremely high computational requirements make it unsuitable for embedded intelligent sensors with limited computational resources. Binary quantization requires less memory and computes faster; however, it has seldom been studied for lightweight transformers.
Bioengineering (Basel)
December 2024
Department of Orthopaedic Surgery, Institute of Medical Science, Gyeongsang National University College of Medicine and Gyeongsang National University Hospital, Jinju 52727, Republic of Korea.
Metastatic spine cancer can cause pain and neurological issues, making it challenging to distinguish from spinal compression fractures using magnetic resonance imaging (MRI). To improve diagnostic accuracy, this study developed artificial intelligence (AI) models to differentiate between metastatic spine cancer and spinal compression fractures in MRI images. MRI data from Gyeongsang National University Hospital, collected from January 2019 to April 2022, were processed using Otsu's binarization and Canny edge detection algorithms.
Sci Rep
January 2025
Department of Ophthalmology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
To assess the choroidal vessels in healthy eyes using a novel three-dimensional (3D) deep learning approach. In this cross-sectional retrospective study, swept-source OCT 6 × 6 mm scans were obtained on a Plex Elite 9000 device. Automated segmentation of the choroidal layer was achieved using a deep-learning ResUNet model along with a volumetric smoothing approach.