C.A. Micchelli, W.L. Miranker
Journal of the ACM
The increasing complexity of modern neural network architectures has led to a substantial rise in energy consumption during both training and inference, especially on conventional CMOS hardware based on the von Neumann architecture. The continuous exchange of data between memory and processing units is a major bottleneck that limits both the efficiency and the speed of artificial neural network (ANN) computations. To address these challenges, specialized neuromorphic architectures built on analog memristor crossbar arrays have emerged as a promising alternative. They offer improved energy efficiency and computational speed for AI workloads by performing some arithmetic and logic operations directly where the data is stored [1].
Recent implementations of the analog in-memory computing (AIMC) paradigm have primarily focused on accelerating the inference step (i.e., the forward pass) of digitally trained deep neural networks (DNNs), even though the training phase is orders of magnitude more demanding in terms of time and energy costs. This is because analog training acceleration imposes even more stringent requirements on memristive devices. In addition to performing inference, the learning phase requires handling error backpropagation, gradient computation, and weight updates. Promising memristive technologies that could address these challenges include redox-based resistive switching memory (ReRAM) [2] and electrochemical random access memory (ECRAM) [3]. However, a unified analog in-memory technology platform—capable of on-chip training, weight retention, and long-term inference acceleration—has yet to be demonstrated.
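The three operations that training adds on top of inference can be sketched on an idealized weight array. The following is a minimal digital illustration in plain Python, not the device-level scheme of the work: the function names (`forward`, `backward`, `update`), the learning rate, the array size, and the toy regression task are all assumptions made for the example. The forward pass reads the array row-wise, the backward pass reads the same array column-wise (the transpose), and the weight update is a rank-one outer product of the error and the input.

```python
import random

def forward(W, x):
    """Forward pass: y = W x (row-wise read of the array)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def backward(W, delta):
    """Backward pass: W^T delta (the same array read along its columns)."""
    rows, cols = len(W), len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(rows)) for j in range(cols)]

def update(W, delta, x, lr):
    """Rank-one update: W <- W - lr * outer(delta, x), applied in place."""
    for i, d in enumerate(delta):
        for j, xj in enumerate(x):
            W[i][j] -= lr * d * xj

# Toy task (hypothetical): learn a fixed 3x4 linear map from random inputs.
rng = random.Random(0)
W_true = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
W = [[0.0] * 4 for _ in range(3)]
losses = []
for step in range(200):
    x = [rng.uniform(-1, 1) for _ in range(4)]
    target = forward(W_true, x)
    y = forward(W, x)
    delta = [yi - ti for yi, ti in zip(y, target)]  # output error
    losses.append(sum(d * d for d in delta))        # squared error
    update(W, delta, x, lr=0.1)                     # gradient step

print("mean loss, first 10 steps:", sum(losses[:10]) / 10)
print("mean loss, last 10 steps: ", sum(losses[-10:]) / 10)
```

In a single-layer example like this one, `backward` is not needed for the update itself; it is included because a multi-layer network would use it to propagate the error to the preceding layer. On a crossbar, each of the three operations acts on the whole array in parallel, which is why analog training places stricter demands on the devices than inference alone.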
This work fills this gap by demonstrating an all-in-one AI accelerator based on Conductive Metal-Oxide (CMO)/HfOx ReRAM technology, which executes forward and backward passes, weight updates, and gradient computations directly on a unified analog in-memory platform with O(1) time complexity [4]. The CMO/HfOx ReRAM devices are integrated in the back end of line (BEOL) of an NMOS transistor platform in a scalable 1T1R array architecture. The highly reproducible forming step demonstrates compatibility with NMOS transistors rated for 3.3 V operation, while the uniform quasi-static cycling characteristics, achieved with voltage amplitudes below ±1.5 V, exhibit a significant conductance window and a low off-state. The multi-bit capability of more than 32 states (5 bits) and the record-low programming noise, ranging from 10 nS to 100 nS, will be presented. Inference performance is validated through matrix-vector multiplication simulations on a 64×64 array, improving the root-mean-square error by a factor of 20 at 1 second and by a factor of 3 at 10 years after programming, compared to the state of the art. Training accuracy closely matching the software equivalent is achieved across different datasets using the same technology.
The CMO/HfOx ReRAM technology lays the foundation for efficient analog systems accelerating both inference and training in deep neural networks.
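The kind of matrix-vector multiplication evaluation described above can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the simulation used in the work: weights are normalized to [-1, 1], snapped to 32 evenly spaced levels (5 bits), and perturbed by Gaussian programming noise whose standard deviation `sigma` is given in normalized weight units. The `sigma` values, the helper names, and the uniform random test data are hypothetical, and conductance drift over time is not modeled.

```python
import math
import random

def quantize(w, n_states=32, w_max=1.0):
    """Snap a weight in [-w_max, w_max] to the nearest of n_states even levels."""
    step = 2.0 * w_max / (n_states - 1)
    clipped = max(-w_max, min(w_max, w))
    return round((clipped + w_max) / step) * step - w_max

def program_array(weights, sigma, rng, n_states=32):
    """Quantize each weight, then add Gaussian programming noise (std sigma)."""
    return [[quantize(w, n_states) + rng.gauss(0.0, sigma) for w in row]
            for row in weights]

def mvm(matrix, vector):
    """Ideal matrix-vector product y = M v."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rmse(a, b):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

rng = random.Random(0)
n = 64  # array size taken from the text; everything else is assumed
W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]
ideal = mvm(W, x)

results = {}
for sigma in (0.0, 0.01, 0.05):  # hypothetical noise levels
    programmed = program_array(W, sigma, rng)
    results[sigma] = rmse(ideal, mvm(programmed, x))
    print(f"sigma={sigma:<5} RMSE={results[sigma]:.4f}")
```

With `sigma = 0.0` the remaining error comes from the 32-level quantization alone, so comparing it against the noisy cases separates the contribution of the limited number of states from that of the programming noise.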
References:
[1] A. Sebastian, et al. (2017). "Temporal correlation detection using computational phase-change memory". Nat Commun 8, 1115. https://doi.org/10.1038/s41467-017-01481-9
[2] F. Zahoor, et al. (2020). "Resistive random access memory (RRAM): an overview of materials, switching mechanism, performance, multilevel cell (MLC) storage, modeling, and applications". https://doi.org/10.1186/s11671-020-03299-9
[3] J. Tang, et al. (2019). In Technical Digest - International Electron Devices Meeting (IEDM), volume 2018-December, ISSN 0163-1918. https://doi.org/10.1109/IEDM.2018.8614551
[4] D. F. Falcone, et al. (2025). "All-in-One Analog AI Accelerator: On-Chip Training and Inference with Conductive-Metal-Oxide/HfOx ReRAM Devices". Pre-print, arXiv:2502.04524. https://arxiv.org/abs/2502.04524