Invited talk

Analog-AI Hardware Accelerators for Low-Latency Transformer-based Language Models (Invited)

Abstract

Analog Non-Volatile Memory-based accelerators offer high-throughput and energy-efficient Multiply-Accumulate operations for the large Fully-Connected layers that dominate Transformer-based Large Language Models (LLMs). We describe recent chip-demo and architectural efforts, quantify the unique benefits of Fully- (rather than Partially-) Weight-Stationary systems, and discuss factors affecting the latency of token processing and token generation.
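The sketch below is a minimal, purely illustrative model of the fully weight-stationary idea referenced in the abstract: an FC-layer weight matrix is programmed once into NVM conductances, and every subsequent matrix-vector multiply is computed in place on the stored weights. The function names (`program_tile`, `analog_mvm`), the Gaussian write/read noise model, and the layer dimensions are assumptions for illustration, not details taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def program_tile(weights, g_max=1.0, write_noise=0.02):
    """Map a weight matrix to conductances once (weight-stationary programming)."""
    scale = g_max / np.max(np.abs(weights))
    conductances = weights * scale
    # Hypothetical programming error on the stored conductances.
    conductances += write_noise * g_max * rng.standard_normal(weights.shape)
    return conductances, scale

def analog_mvm(conductances, scale, x, read_noise=0.01):
    """One in-memory multiply-accumulate: per-column current summation."""
    y = conductances.T @ x                        # Ohm's law + Kirchhoff current summation
    y += read_noise * np.max(np.abs(y)) * rng.standard_normal(y.shape)
    return y / scale                              # undo the programming scale

# Illustrative FC layer: weights stay resident on the tile and are reused per token.
W = rng.standard_normal((1024, 4096)) * 0.02
G, s = program_tile(W)                            # programmed once
for _ in range(8):                                # one MVM per generated token
    x = rng.standard_normal(1024)
    y = analog_mvm(G, s, x)
```

Because the weights never move after programming, the per-token cost is dominated by activation I/O and analog conversion rather than by weight transfers, which is the latency advantage the abstract attributes to fully weight-stationary systems.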