Publication
IEEE Micro
Paper

DNNDaSher: A Compiler Framework for Dataflow Compatible End-to-End Acceleration on IBM AIU

Abstract

The Artificial Intelligence Unit (AIU) is a specialized accelerator card from IBM offering state-of-the-art compute capabilities (hundreds of tera-operations) through dataflow-driven compute arrays attached to a multilevel hierarchy of distributed memory elements. When mapping entire AI models, functional correctness hinges on maintaining dataflow compatibility between producer-consumer operations, i.e., the element organization with which a tensor is produced in memory must match the organization expected by its consumer(s). This paper presents a key component in the AIU's compiler stack, DNN Data-Shuffler (DNNDaSher), a systematic framework that analyzes such dataflow incompatibilities and invokes an intermediate operation to shuffle tensor elements within and/or across memory elements to resolve the discrepancy. It targets opportunities to eliminate shuffles and to increase the granularity of memory accesses. Compared to well-optimized baseline implementations of four Convolutional Neural Networks and Transformer benchmarks, DNNDaSher achieves a 1.27× to 4.12× (average 2.3×) end-to-end latency improvement based on measured execution cycles on the AIU.
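The producer-consumer compatibility check described in the abstract can be illustrated with a minimal sketch. This is not DNNDaSher's implementation or the AIU's memory model: it assumes a single flat memory and describes a layout simply as the order in which tensor dimensions are iterated. The helper names (`flatten`, `shuffle`, `resolve`) and the dimension labels are hypothetical, chosen only for illustration.

```python
from itertools import product

def flatten(value_at, shape, order):
    """Write a logical tensor to flat memory, iterating dims in `order` (outermost first)."""
    ranges = [range(shape[d]) for d in order]
    return [value_at(dict(zip(order, idx))) for idx in product(*ranges)]

def shuffle(flat, shape, src_order, dst_order):
    """Reorder elements stored in `src_order` layout into `dst_order` layout."""
    # Compute the stride of each dimension in the source layout.
    strides, stride = {}, 1
    for d in reversed(src_order):
        strides[d] = stride
        stride *= shape[d]
    out = []
    for idx in product(*[range(shape[d]) for d in dst_order]):
        coord = dict(zip(dst_order, idx))
        out.append(flat[sum(coord[d] * strides[d] for d in shape)])
    return out

def resolve(flat, shape, produced, expected):
    """Insert a shuffle only when the producer and consumer layouts disagree."""
    if produced == expected:
        return flat, False          # compatible: no shuffle needed
    return shuffle(flat, shape, produced, expected), True

# Hypothetical example: producer writes (h, w, c); consumer expects (c, h, w).
shape = {"h": 2, "w": 3, "c": 2}
value_at = lambda c: c["h"] * 100 + c["w"] * 10 + c["c"]
produced_flat = flatten(value_at, shape, ("h", "w", "c"))
fixed, inserted = resolve(produced_flat, shape, ("h", "w", "c"), ("c", "h", "w"))
same, skipped = resolve(produced_flat, shape, ("h", "w", "c"), ("h", "w", "c"))
```

In this toy setting, skipping the shuffle when the layouts already match corresponds loosely to the abstract's goal of eliminating shuffles; the distributed, multilevel memory hierarchy of the real hardware is not modeled here.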
