
Analog-AI Hardware Accelerators for Low-Latency Transformer-Based Language Models (Invited)

Abstract

Analog Non-Volatile Memory-based accelerators offer high-throughput and energy-efficient Multiply-Accumulate operations for the large Fully-Connected layers that dominate Transformer-based Large Language Models (LLMs). We describe recent chip-demo and architectural efforts, quantify the unique benefits of Fully- (rather than Partially-) Weight-Stationary systems, and discuss factors affecting the latency of token processing and token generation.
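
As a rough illustration of the weight-stationary idea referenced above (not code from the paper), the following NumPy sketch models a Fully-Connected layer whose weights are programmed once onto an analog tile as conductances; each token's activations are then applied as row voltages and the column currents yield the Multiply-Accumulate result in a single step. The tile size, noise levels, and function names are illustrative assumptions.

```python
# Sketch (assumed, illustrative): weight-stationary analog matrix-vector multiply.
import numpy as np

rng = np.random.default_rng(0)

def program_tile(weights, write_noise_std=0.01):
    """Program a weight matrix onto an analog tile once (weight-stationary).
    Programming error is modeled as additive Gaussian noise (assumed model)."""
    return weights + write_noise_std * rng.standard_normal(weights.shape)

def analog_mvm(tile, activations, read_noise_std=0.005):
    """One Multiply-Accumulate pass: activations drive the rows, columns
    integrate currents -> y = W @ x, plus a simple read-noise term."""
    ideal = tile @ activations
    noise = read_noise_std * np.abs(ideal).mean() * rng.standard_normal(ideal.shape)
    return ideal + noise

# Example: a 512x512 Fully-Connected layer processed token by token.
W = rng.standard_normal((512, 512)) / np.sqrt(512)
tile = program_tile(W)            # weights written once, reused for every token
for _ in range(4):                # token-generation loop: no weight movement per token
    x = rng.standard_normal(512)
    y = analog_mvm(tile, x)
```

In a Fully-Weight-Stationary system, every such layer has a dedicated tile, so no weight data moves during token processing or generation; a Partially-Weight-Stationary system would have to re-program or swap tiles, which is one of the latency factors the paper discusses.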