Workshop

Architectural Benchmarking of Compute-in-Memory Systems

Abstract

Deep Neural Networks (DNNs) have demonstrated unparalleled capabilities in recent years across several applications, such as image processing, natural language understanding, and content generation. As DNNs have evolved over time – from convolutional and recurrent neural networks to transformers and beyond – precision requirements, performance bottlenecks, and hardware design considerations have changed with the DNN characteristics. While Compute-in-Memory (CIM) is a promising approach for accelerating the workhorse Multiply-Accumulate (MAC) operations of DNNs, architecting future DNN systems goes well beyond CIM tile design, as macro-level efficiency does not necessarily translate into system-level efficiency. Amdahl’s law cannot be ignored: as MAC operations are accelerated, auxiliary operations such as attention and LayerNorm grow in relative importance and can nullify tile-level efficiency gains. The von Neumann bottleneck could make this worse, as increasing DNN model sizes may preclude full weight stationarity and force weight movement. In this workshop, we focus on application benchmarking for CIM systems, translating application requirements into circuit, device, architecture, and manufacturing requirements. Topics of interest include pipeline design and scheduling for CIM, data-transport topologies, architectural tools, and 3D approaches to address weight capacity requirements.
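The Amdahl's-law argument above can be made concrete with a minimal sketch. The numbers here (an 80% MAC share of runtime and a 20x CIM macro speedup) are illustrative assumptions, not workshop data:

```python
# Illustrative Amdahl's-law calculation: even a large CIM speedup on MAC
# operations yields limited end-to-end gains when auxiliary operations
# (attention, LayerNorm, etc.) remain unaccelerated.
# The 0.80 MAC fraction and 20x tile speedup below are assumed values.

def overall_speedup(mac_fraction: float, mac_speedup: float) -> float:
    """End-to-end speedup when only the MAC fraction of runtime is accelerated."""
    return 1.0 / ((1.0 - mac_fraction) + mac_fraction / mac_speedup)

print(overall_speedup(0.80, 20.0))  # ~4.17x system speedup despite a 20x macro
```

Pushing the macro speedup toward infinity only raises the system speedup toward 1/(1 - mac_fraction), i.e. 5x in this example, which is why auxiliary operations and data transport dominate the architectural picture.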