Publication
E\PCOS 2023
Invited talk

Deep neural network inference with a 64-core in-memory compute chip based on phase-change memory

Abstract

The need to repeatedly shuttle synaptic weight values between memory and processing units has been a key source of energy inefficiency in hardware implementations of artificial neural networks [1]. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds great promise for overcoming this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload [2-4]. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and on-chip communication as a critical next step towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable from an ease-of-deployment perspective to achieve high MVM and inference accuracy without re-tuning the chip after programming the weights. To address these challenges, we designed and fabricated a multi-core AIMC chip in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM) (see Fig. 1) [5]. The fully integrated chip features 64 AIMC cores of size 256×256, interconnected via an on-chip communication network. Each core has 256 integrated analog-to-digital converters, its own programming circuitry, and a local digital processing unit that performs affine scaling and ReLU activation. One global digital processing unit in the middle of the chip implements long short-term memory (LSTM) activation functions and cell-state computation. In this talk, I will present our latest efforts in employing this chip for deep neural network inference. First, the PCM technology and the computational unit-cell we use will be described. Next, experimental inference results on ResNet and LSTM networks will be presented, with all the computations associated with the weight layers and the activation functions implemented on-chip.
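The per-core data path described above (analog MVM over stored weights, per-column ADC read-out, then a local digital affine-scaling and ReLU stage) can be sketched conceptually in plain Python. This is a simplified illustration, not a model of the chip's actual circuits: the programming-noise level, ADC model, and scaling parameters below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = 256  # one AIMC core holds a 256x256 weight array

def program_weights(w_target, noise_std=0.005):
    """Store weights as PCM conductances; model programming error
    as additive Gaussian noise (an assumed, simplified noise model)."""
    return w_target + noise_std * rng.standard_normal(w_target.shape)

def adc(y_analog, bits=8):
    """Quantize the 256 column outputs (one ADC per column)."""
    full_scale = float(np.max(np.abs(y_analog))) or 1.0
    levels = 2 ** (bits - 1) - 1
    return np.round(y_analog / full_scale * levels) / levels * full_scale

def core_forward(w_prog, x, scale=1.0, offset=0.0):
    """One core's pipeline: analog MVM -> ADC -> affine scale -> ReLU."""
    y = adc(w_prog @ x)          # MVM performed in place on stored weights
    y = scale * y + offset       # affine scaling in the local digital unit
    return np.maximum(y, 0.0)    # ReLU activation in the local digital unit

w = rng.standard_normal((SIZE, SIZE)) / np.sqrt(SIZE)
x = rng.standard_normal(SIZE)
y_ideal = np.maximum(w @ x, 0.0)
y_aimc = core_forward(program_weights(w), x)
# Relative deviation of the noisy analog result from the ideal digital one.
err = np.linalg.norm(y_aimc - y_ideal) / np.linalg.norm(y_ideal)
```

The sketch makes the key accuracy trade-off visible: `err` grows with the assumed programming-noise level and shrinks with ADC resolution, which is why achieving high inference accuracy without post-programming re-tuning is non-trivial.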
Finally, I will present our open-source toolkit (https://analog-ai.mybluemix.net/) [6] to simulate inference and training of neural networks with AIMC.
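The toolkit provides drop-in analog layers for existing networks; its core idea of hardware-aware inference evaluation can be illustrated (in plain NumPy, independent of the toolkit's actual API) as a Monte-Carlo experiment: re-run a trained network many times with fresh samples of weight programming noise and examine the output spread. The network sizes and noise level below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def forward(weights, x):
    """A tiny two-layer MLP standing in for a trained network."""
    return weights[1] @ relu(weights[0] @ x)

def noisy_copy(weights, noise_std):
    """Model PCM programming noise as additive Gaussian noise per weight
    (a simplified stand-in for the noise models a simulator would use)."""
    return [w + noise_std * rng.standard_normal(w.shape) for w in weights]

weights = [rng.standard_normal((32, 16)) / 4.0,
           rng.standard_normal((4, 32)) / 4.0]
x = rng.standard_normal(16)
y_ref = forward(weights, x)  # ideal (noise-free) output

# Spread of outputs over 100 independent noisy programmings of the network.
errors = [np.linalg.norm(forward(noisy_copy(weights, 0.01), x) - y_ref)
          for _ in range(100)]
mean_err = float(np.mean(errors))
```

Repeating such an experiment over a full test set gives the distribution of inference accuracy one can expect from analog hardware, which is precisely the kind of question the simulation toolkit is designed to answer at scale.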
