Over the past few years, 2-D convolutional neural networks (CNNs) have demonstrated great success in a wide range of 2-D computer vision applications, such as image classification and object detection. At the same time, 3-D CNNs, a variant of 2-D CNNs, have shown an excellent ability to analyze 3-D data, such as video and geometric data. However, the heavy algorithmic complexity of 2-D and 3-D CNNs imposes a substantial overhead on the speed of these networks, which limits their deployment in real-life applications. Although various domain-specific accelerators have been proposed to address this challenge, most of them focus only on accelerating 2-D CNNs, without considering their computational efficiency on 3-D CNNs. In this article, we propose a unified hardware architecture to accelerate both 2-D and 3-D CNNs with high hardware efficiency. Our experiments demonstrate that the proposed accelerator can achieve up to 92.4% and 85.2% multiply-accumulate efficiency on 2-D and 3-D CNNs, respectively. To improve the hardware performance, we propose a hardware-friendly quantization approach called static block floating point (BFP), which eliminates the frequent representation conversions required in traditional dynamic BFP arithmetic. Compared with integer linear quantization using a zero-point, static BFP quantization can decrease the logic resource consumption of the convolutional kernel design by nearly 50% on a field-programmable gate array (FPGA). Without time-consuming retraining, the proposed static BFP quantization can quantize the precision to an 8-bit mantissa with negligible accuracy loss. As different CNNs on our reconfigurable system require different hardware and software parameters to achieve optimal hardware performance and accuracy, we also propose an automatic tool for parameter optimization.
Based on our hardware design and optimization, we demonstrate that the proposed accelerator can achieve 3.8-5.6 times higher energy efficiency than a graphics processing unit (GPU) implementation. Compared with state-of-the-art FPGA-based accelerators, our design achieves higher generality and up to 1.4-2.2 times higher resource efficiency on both 2-D and 3-D CNNs.
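To make the block floating point idea concrete, here is a minimal Python sketch of generic BFP quantization with an 8-bit mantissa. It is an illustration under stated assumptions, not the article's implementation: the function names (`shared_exponent`, `bfp_quantize`, `bfp_dequantize`, `bfp_dot`) are hypothetical, and the "static" behavior is modeled simply as choosing the shared exponent once from calibration values and reusing it, so that no exponent has to be recomputed at run time.

```python
import math

def shared_exponent(values):
    """Return the smallest exponent e such that every |v| < 2**e.

    Assumption: in a static scheme this is computed once, offline,
    from calibration data, and then reused unchanged.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1
    return math.frexp(max_abs)[1]

def bfp_quantize(values, exponent, mantissa_bits=8):
    """Map floats to signed integer mantissas that share one exponent."""
    frac_bits = mantissa_bits - 1  # one bit is used for the sign
    scale = 2.0 ** (frac_bits - exponent)
    lo, hi = -(1 << frac_bits), (1 << frac_bits) - 1
    # Values outside the calibrated range saturate instead of
    # triggering a new exponent.
    return [min(hi, max(lo, round(v * scale))) for v in values]

def bfp_dequantize(mantissas, exponent, mantissa_bits=8):
    """Recover approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

def bfp_dot(qa, ea, qb, eb, mantissa_bits=8):
    """Dot product using integer multiply-accumulates only.

    The two block exponents are combined once, after accumulation.
    """
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * 2.0 ** (ea + eb - 2 * (mantissa_bits - 1))

# Hypothetical calibration block; the exponent is fixed from it.
calibration = [0.5, -0.25, 0.75, 0.1]
exp = shared_exponent(calibration)
mantissas = bfp_quantize(calibration, exp)
restored = bfp_dequantize(mantissas, exp)
```

In this sketch every multiply-accumulate inside `bfp_dot` operates on plain integers, and the exponents enter only through one final scaling, which is the general reason BFP arithmetic maps well onto integer hardware.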
DOI: http://dx.doi.org/10.1109/TNNLS.2021.3116302
View (Beijing)
October 2024
Department of Bioengineering, UCLA, California, 90095, USA.
Light-sheet fluorescence microscopy (LSFM) introduces fast scanning of biological phenomena with deep photon penetration and minimal phototoxicity. This advancement represents a significant shift in 3-D imaging of large-scale biological tissues and 4-D (space + time) imaging of small live animals. The large data associated with LSFM requires efficient imaging acquisition and analysis with the use of artificial intelligence (AI)/machine learning (ML) algorithms.
J Imaging Inform Med
December 2024
Department of Radiology, Mayo Clinic, Rochester, MN, USA.
IEEE Trans Neural Netw Learn Syst
February 2024
Future frame prediction is a challenging task in computer vision with practical applications in areas such as video generation, autonomous driving, and robotics. Traditional recurrent neural networks have limited effectiveness in capturing long-range dependencies between frames, and combining convolutional neural networks (CNNs) with recurrent networks has limitations in modeling complex dependencies. Generative adversarial networks have shown promising results, but they are computationally expensive and suffer from instability during training.
IEEE Trans Ultrason Ferroelectr Freq Control
March 2024
This study presents a deep-learning (DL) methodology using 3-D convolutional neural networks (CNNs) to detect defects in carbon fiber-reinforced polymer (CFRP) composites through volumetric ultrasonic testing (UT) data. Acquiring large amounts of ultrasonic training data experimentally is expensive and time-consuming. To address this issue, a synthetic data generation method was extended to incorporate volumetric data.
Front Plant Sci
December 2023
School of Information Engineering, Henan Institute of Science and Technology, Xinxiang, China.
Introduction: Efficient and accurate varietal classification of wheat grains is crucial for maintaining varietal purity and reducing susceptibility to pests and diseases, thereby enhancing crop yield. Traditional manual and machine learning methods for wheat grain identification often suffer from inefficiencies and the use of large models. In this study, we propose a novel classification and recognition model called SCGNet, designed for rapid and efficient wheat grain classification.