Accelerating Inference of Convolutional Neural Networks Using In-memory Computing.

Martino Dazzi, Abu Sebastian, Luca Benini, Evangelos Eleftheriou

Front Comput Neurosci

IBM Research Europe, Rüschlikon, Zurich, Switzerland.

Published: August 2021

In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as a promising approach for energy-efficient, high-throughput hardware for deep learning applications. One prominent application of IMC is performing matrix-vector multiplication in O(1) time complexity by mapping the synaptic weights of a neural-network layer to the devices of an IMC core. However, because its pattern of execution differs significantly from that of previous computational paradigms, IMC requires rethinking the architectural choices made when designing deep-learning hardware. In this work, we focus on application-specific IMC hardware for inference of Convolutional Neural Networks (CNNs), and provide methodologies for implementing the various architectural components of the IMC core. Specifically, we present methods for mapping synaptic weights and activations onto the memory structures and give evidence of the various trade-offs therein, such as the one between on-chip memory requirements and execution latency. Lastly, we show how to employ these methods to implement a pipelined dataflow that offers throughput and latency beyond the state of the art for image classification tasks.
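
To make the weight-mapping idea concrete, the following is a minimal sketch of one common way a convolutional layer can be laid out on a crossbar: each kernel is flattened into one column, and each input patch becomes one matrix-vector multiplication. This is an illustration under stated assumptions, not the mapping proposed in the paper. All function names are hypothetical, the convolution uses stride 1 with no padding, and the crossbar is modeled in plain Python, so the multiplication that an IMC core performs in a single step is computed here with ordinary loops.

```python
def conv_weights_to_crossbar(kernels):
    """Flatten kernels[c_out][c_in][ky][kx] into matrix[row][col].

    Each output channel occupies one column; rows follow (c_in, ky, kx) order.
    Hypothetical helper: one possible layout, assumed for illustration.
    """
    flat_kernels = [[w for ch in kernel for row in ch for w in row]
                    for kernel in kernels]
    n_rows = len(flat_kernels[0])
    return [[flat[r] for flat in flat_kernels] for r in range(n_rows)]


def extract_patch(image, y, x, k):
    """Flatten the k x k patch at (y, x) of image[c_in][H][W] in (c_in, ky, kx) order."""
    return [image[c][y + dy][x + dx]
            for c in range(len(image))
            for dy in range(k)
            for dx in range(k)]


def crossbar_mvm(matrix, vector):
    """Software model of one crossbar step: all columns accumulate their dot product."""
    n_cols = len(matrix[0])
    return [sum(vector[r] * matrix[r][c] for r in range(len(vector)))
            for c in range(n_cols)]


def conv2d_on_crossbar(image, kernels):
    """Stride-1, unpadded convolution as a sequence of crossbar MVM steps.

    Returns (output[c_out][y][x], number of MVM steps issued).
    """
    k = len(kernels[0][0])
    height, width = len(image[0]), len(image[0][0])
    out_h, out_w = height - k + 1, width - k + 1
    matrix = conv_weights_to_crossbar(kernels)
    output = [[[0] * out_w for _ in range(out_h)] for _ in kernels]
    steps = 0
    for y in range(out_h):
        for x in range(out_w):
            result = crossbar_mvm(matrix, extract_patch(image, y, x, k))
            steps += 1  # one MVM per output pixel, covering all output channels
            for c, value in enumerate(result):
                output[c][y][x] = value
    return output, steps


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 input channel, 3 x 3
    kernels = [[[[1, 0], [0, 1]]],                # output channel 0
               [[[1, 1], [1, 1]]]]                # output channel 1
    output, steps = conv2d_on_crossbar(image, kernels)
    print(output, steps)
```

In this layout the number of steps equals the number of output pixels regardless of how many output channels the layer has, since every column of the crossbar is evaluated in the same step. The cost is that activations must be buffered and rearranged into patches before each step, which is one place where a trade-off between on-chip memory and latency can arise.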

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369825
DOI: http://dx.doi.org/10.3389/fncom.2021.674154

