Deep In-Memory Architectures for Machine Learning: Accuracy Versus Efficiency Trade-Offs
Abstract
In-memory architectures, and in particular the deep in-memory architecture (DIMA), have emerged as an attractive alternative to the traditional von Neumann (digital) architecture for realizing energy- and latency-efficient machine learning systems in silicon. Multiple DIMA integrated circuit (IC) prototypes have demonstrated energy-delay product (EDP) gains of up to 100× over a digital architecture. These EDP gains were achieved with minimal or sometimes no loss in decision-making accuracy, which is surprising given DIMA's intrinsically analog mixed-signal nature. This paper establishes models and methods to understand the fundamental energy-delay and accuracy trade-offs underlying DIMA by: 1) presenting silicon-validated energy, delay, and accuracy models; and 2) employing these models to quantify DIMA's decision-level accuracy and to identify the most effective design parameters for maximizing its EDP gains at a given level of accuracy. For example, it is shown that: 1) DIMA has the potential to realize EDP gains of 21× to 1365×; 2) its energy-per-decision is approximately 10× lower than that of the digital architecture at the same decision-making accuracy under most conditions; 3) its accuracy can always be improved by increasing the input vector dimension and/or by increasing the bitline swing; and 4) unlike the digital architecture, there are quantifiable conditions under which DIMA's accuracy is fundamentally limited by noise.
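As a point of reference for the EDP figures quoted above, the following minimal sketch shows how an EDP gain is computed from per-decision energy and delay, taking EDP as the product of the two. The function names and all numerical values are hypothetical placeholders chosen for illustration; they are not measurements or model outputs from this paper.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product (joule-seconds) of a single decision."""
    return energy_j * delay_s


def edp_gain(e_ref: float, t_ref: float, e_new: float, t_new: float) -> float:
    """Ratio of the reference EDP to the new architecture's EDP (>1 means a gain)."""
    return edp(e_ref, t_ref) / edp(e_new, t_new)


# Hypothetical per-decision figures, for illustration only.
digital_energy_j = 1.0e-9   # assumed energy of the digital reference
digital_delay_s = 1.0e-6    # assumed delay of the digital reference
dima_energy_j = 1.0e-10     # assumed: 10x lower energy
dima_delay_s = 2.0e-7       # assumed: 5x lower delay

gain = edp_gain(digital_energy_j, digital_delay_s, dima_energy_j, dima_delay_s)
print(f"EDP gain: {gain:.1f}x")
```

Because EDP is a product, energy and delay improvements compound: under these assumed numbers, a 10× energy reduction and a 5× delay reduction together give an EDP gain of about 50×.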