Large language models (LLMs), with their remarkable generative capacities, have greatly impacted a range of fields, but they face scalability challenges due to their large parameter counts, which result in high costs for training and inference. The trend towards ever-larger models exacerbates these challenges, particularly in terms of memory footprint, latency and energy consumption. Here we explore the deployment of mixture-of-experts (MoE) networks, which use conditional computation to keep computational demands low despite having many parameters, on three-dimensional (3D) non-volatile memory (NVM)-based analog in-memory computing (AIMC) hardware.
Analog in-memory computing, a promising approach for energy-efficient acceleration of deep learning workloads, computes matrix-vector multiplications only approximately, owing to nonidealities that are often non-deterministic or nonlinear. This can adversely affect the achievable inference accuracy. Here, we develop a hardware-aware retraining approach to systematically examine the accuracy of analog in-memory computing across multiple network topologies, and investigate sensitivity and robustness to a broad set of nonidealities.
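As a minimal illustration (not drawn from the paper itself), the approximate matrix-vector multiply at the heart of analog in-memory computing can be sketched as a noisy, range-limited product. The additive Gaussian weight noise and output clipping below are assumed stand-ins for device nonidealities, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def analog_mvm(weights, x, noise_std=0.05, clip=1.0):
    """Hypothetical model of an analog matrix-vector multiply:
    weights are perturbed by additive Gaussian noise (a simple
    stand-in for non-deterministic device nonidealities) and the
    outputs are clipped to emulate a limited output range."""
    noisy_w = weights + rng.normal(0.0, noise_std, size=weights.shape)
    y = noisy_w @ x
    return np.clip(y, -clip, clip)

# Compare the ideal digital result with the approximate analog one.
W = rng.uniform(-0.1, 0.1, size=(4, 8))
x = rng.uniform(-1.0, 1.0, size=8)
exact = W @ x
approx = analog_mvm(W, x)
print(np.abs(exact - approx).max())  # small, noise-dependent error
```

Hardware-aware retraining, in this picture, means training the network with such perturbations injected so that the learned weights are robust to them at inference time.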
Models of artificial intelligence (AI) with billions of parameters can achieve high accuracy across a range of tasks, but they exacerbate the poor energy efficiency of conventional general-purpose processors such as graphics processing units or central processing units. Analog in-memory computing (analog-AI) can provide better energy efficiency by performing matrix-vector multiplications in parallel on 'memory tiles'. However, analog-AI has yet to demonstrate software-equivalent (SW) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles.
Atlantic salmon (Salmo salar) in the Northeastern US and Eastern Canada has high economic value for the sport fishing and aquaculture industries. Large differences exist between the genomes of Atlantic salmon of European origin and North American (N.A.
Analogue memory-based deep neural networks provide energy-efficiency and per-area throughput gains relative to state-of-the-art digital counterparts such as graphics processing units. Recent advances focus largely on hardware-aware algorithmic training and improvements to circuits, architectures and memory devices. Optimal translation of software-trained weights into analogue hardware weights, given the plethora of complex memory non-idealities, represents an equally important task.