Analog In-Memory Computing for Deep Neural Network Acceleration
Abstract
Multiply-accumulate (MAC) operations are at the core of Deep Neural Network (DNN) workloads. In-Memory Computing (IMC) enables hardware accelerators that achieve high-throughput, energy-efficient MAC operations, thus tackling the exploding computational cost of ever-growing DNNs. In particular, non-volatile memory (NVM)-based analog accelerators realize massively parallel computation by leveraging Ohm’s law and Kirchhoff’s current law on arrays of resistive memory devices. Provided that weights are accurately programmed onto NVM devices and MAC operations are sufficiently linear, competitive end-to-end DNN accuracies can be achieved with this approach. In this presentation, we describe an analog IMC chip consisting of more than 35 million Phase-Change Memory devices, analog peripheral circuitry, and massively parallel routing to accelerate communication between inputs, outputs, and analog cores. We demonstrate the speed and power advantages of analog computing on multiple DNN inference benchmarks, with tasks ranging from image classification to natural language processing, and show that high accuracy can be retained through a careful combination of materials, circuit, architecture, and operational choices.
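The analog MAC principle sketched above can be illustrated with a small idealized simulation: each weight is stored as a device conductance, inputs are applied as voltages, Ohm's law gives each device's current, and Kirchhoff's current law sums the currents along a shared output line. All values below are hypothetical and the model is ideal (no programming error or nonlinearity), so it is a conceptual sketch, not a model of the chip described here.

```python
import numpy as np

# Idealized analog crossbar MAC (hypothetical sizes and values).
# G[i, j]: conductance of the device at row i, column j (siemens).
# V[j]:    voltage applied on input line j (volts).
# Ohm's law:       each device sources a current I_ij = G[i, j] * V[j].
# Kirchhoff's law: currents on output line i sum, so
#                  I[i] = sum_j G[i, j] * V[j]  -- a MAC per output line.

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1e-6, size=(4, 8))   # device conductances (S)
V = rng.uniform(0.0, 0.2, size=8)         # input voltages (V)

# The whole array computes a matrix-vector product in one step.
I = G @ V

# Same result accumulated explicitly, device by device.
I_explicit = np.array([sum(G[i, j] * V[j] for j in range(8))
                       for i in range(4)])
assert np.allclose(I, I_explicit)
```

Because every device contributes its current simultaneously, the full matrix-vector product is obtained in a single analog step, which is the source of the parallelism the abstract refers to.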