LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.

Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan

Front Artif Intell

Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States.

Published: October 2024

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.
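To make the replication trade-off in the abstract concrete, here is a minimal sketch of why replicating slow layers helps when layer processing times are non-uniform and area is limited. It is not the LRMP search itself, which the abstract describes as a combination of reinforcement learning and mixed integer linear programming; a simple greedy heuristic is swapped in purely for illustration. The function names, the tile-based area model, and the idealized assumption that r copies of a layer cut its processing time to t / r are all hypothetical and do not come from the paper.

```python
def greedy_replication(latency, tiles, budget):
    """Toy heuristic (not LRMP): keep adding a copy of the slowest stage.

    latency[i] -- assumed processing time of layer i with a single copy
    tiles[i]   -- assumed area (in tiles) of one copy of layer i
    budget     -- total tiles available on the hypothetical accelerator
    """
    reps = [1] * len(latency)
    used = sum(tiles)
    if used > budget:
        raise ValueError("network does not fit in the budget unreplicated")
    while True:
        # Idealized model: r copies of a layer divide its time by r.
        stage = [t / r for t, r in zip(latency, reps)]
        bottleneck = max(range(len(stage)), key=stage.__getitem__)
        if used + tiles[bottleneck] > budget:
            break  # the slowest layer can no longer be replicated
        reps[bottleneck] += 1
        used += tiles[bottleneck]
    return reps, used


def bottleneck_time(latency, reps):
    """Slowest stage time, which bounds throughput in a layer pipeline."""
    return max(t / r for t, r in zip(latency, reps))


if __name__ == "__main__":
    latency = [1600, 400, 100]  # made-up, highly non-uniform layer times
    budget = 12

    # Made-up footprint at a uniform weight precision.
    reps, used = greedy_replication(latency, [1, 2, 4], budget)
    print(reps, used, bottleneck_time(latency, reps))

    # Assumption: lower precision halves the footprint of layers 2 and 3,
    # freeing tiles that can be spent on further replication.
    reps, used = greedy_replication(latency, [1, 1, 2], budget)
    print(reps, used, bottleneck_time(latency, reps))
```

In this toy setting the second call reaches a lower bottleneck time than the first within the same budget, which hints at why replication and per-layer precision are worth choosing jointly rather than one after the other. The numbers are invented and say nothing about the accuracy cost of quantization, which the actual method has to account for.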

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753
DOI: http://dx.doi.org/10.3389/frai.2024.1268317
