Model quantization reduces model size and computational latency, and it has been successfully applied in many applications on mobile phones, embedded devices, and smart chips. Mixed-precision quantization assigns a different bit precision to each layer according to that layer's sensitivity, which can achieve strong performance. However, it is difficult to quickly determine the quantization bit precision of each layer in a deep neural network under given constraints (for example, hardware resources, energy consumption, model size, and computational latency). In this article, a novel sequential single-path search (SSPS) method for mixed-precision model quantization is proposed, in which the given constraints are introduced to guide the search process. A single-path search cell is proposed and used to build a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to the selection certainties to exponentially reduce the search space and speed up the convergence of the search process. Experiments show that our method can efficiently search for mixed-precision models for different architectures (for example, ResNet-20, ResNet-18, ResNet-34, ResNet-50, and MobileNet-V2) and datasets (for example, CIFAR-10, ImageNet, and COCO) under given constraints, and our experimental results verify that the models found by SSPS significantly outperform their uniform-precision counterparts.
DOI: http://dx.doi.org/10.1109/TCYB.2022.3164285
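The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' SSPS algorithm. It assumes each layer holds one learnable logit per candidate bit width, takes the largest softmax probability as the "selection certainty", and fixes layers one at a time starting with the most certain. The function names, the candidate set `[2, 4, 8]`, and the example logits and parameter counts are all hypothetical.

```python
import math


def quantize_uniform(values, bits):
    """Symmetric uniform quantization of floats to a signed grid (bits >= 2)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    levels = 2 ** (bits - 1) - 1          # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model_size_bits(param_counts, bit_widths):
    """Weight storage in bits when layer i uses bit_widths[i] bits per weight."""
    return sum(n * b for n, b in zip(param_counts, bit_widths))


def sequential_select(logits_per_layer, candidates):
    """Fix one layer at a time, most certain layer first.

    Assumption: certainty = largest softmax probability of that layer.
    In a real search the supernet would keep training between decisions;
    here the logits are treated as already trained.
    """
    probs = [softmax(layer) for layer in logits_per_layer]
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]), reverse=True)
    chosen = {}
    for i in order:
        best = max(range(len(candidates)), key=lambda k: probs[i][k])
        chosen[i] = candidates[best]
    return [chosen[i] for i in range(len(probs))], order


# Hypothetical three-layer example.
candidates = [2, 4, 8]
logits = [[0.1, 0.2, 3.0], [1.0, 1.1, 0.9], [2.0, 0.0, 0.0]]
param_counts = [1000, 5000, 2000]

bit_widths, order = sequential_select(logits, candidates)
print("decision order:", order)
print("bit widths per layer:", bit_widths)
print("mixed-precision size (bits):", model_size_bits(param_counts, bit_widths))
print("uniform 8-bit size (bits):", model_size_bits(param_counts, [8] * 3))

# Every fixed layer removes one factor of len(candidates) from the search space.
for fixed in range(len(logits) + 1):
    print(fixed, "layers fixed ->", len(candidates) ** (len(logits) - fixed), "configurations left")
```

With K candidate precisions and L layers there are K^L joint configurations, so each layer that is fixed divides the remaining space by K. This is one way to read the abstract's claim that sequential decisions shrink the search space exponentially. A constraint such as a model-size budget could be checked with `model_size_bits` before a choice is accepted.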
IEEE Trans Pattern Anal Mach Intell
May 2024
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: the massive computational resource consumption and the lack of intrinsic inductive bias for modeling local visual patterns. To solve both issues, we devise a simple yet effective method named Single-Path Vision Transformer pruning (SPViT), to efficiently and automatically compress the pre-trained ViTs into compact models with proper locality added.
IEEE Trans Neural Netw Learn Syst
November 2024
Modeling the architecture search process on a supernet and applying a differentiable method to find the importance of each architecture choice are among the leading tools for differentiable architecture search (DARTS). One fundamental problem in DARTS is how to discretize or select a single-path architecture from the pretrained one-shot architecture. Previous approaches mainly exploit heuristic or progressive search methods for discretization and selection, which are not efficient and are easily trapped in local optima.
IEEE Trans Pattern Anal Mach Intell
October 2023
Network pruning and quantization have proven to be effective ways to compress deep models. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy may overlook the fact that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance.
Sensors (Basel)
March 2022
Research Institute of Electronics, Shizuoka University, Shizuoka 422-8011, Japan.
Multi-path interference causes depth errors in indirect time-of-flight (ToF) cameras. In this paper, resolving multi-path interference caused by surface reflections using a multi-tap macro-pixel computational CMOS image sensor is demonstrated. The imaging area is implemented by an array of macro-pixels composed of four subpixels embodied by a four-tap lateral electric field charge modulator (LEFM).