Shubham Jain, Hsinyu Tsai, et al.
IEEE Transactions on VLSI Systems
The Artificial Intelligence Unit (AIU) is a specialized accelerator card from IBM that offers state-of-the-art compute capabilities (hundreds of tera-operations) through dataflow-driven compute arrays attached to a multilevel hierarchy of distributed memory elements. When mapping entire AI models, functional correctness hinges on maintaining dataflow compatibility between producer and consumer operations, i.e., the element organization with which a tensor is produced in memory must match the organization expected by its consumer(s). This paper presents a key component of the AIU's compiler stack, the DNN Data-Shuffler (DNNDaSher), a systematic framework that analyzes such dataflow incompatibilities and invokes an intermediate operation to shuffle tensor elements within and/or across memory elements to resolve the discrepancy. The framework also targets opportunities to eliminate shuffles and to increase the granularity of memory accesses. Compared to well-optimized baseline implementations of four Convolutional Neural Networks and Transformer benchmarks, DNNDaSher achieves a 1.27× to 4.12× (average 2.3×) end-to-end latency improvement based on measured execution cycles on the AIU.
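The producer-consumer compatibility check described in the abstract can be illustrated with a minimal sketch. It is not DNNDaSher's actual algorithm and does not model the AIU memory hierarchy. The `Op` class, the `insert_shuffles` and `permutation` helpers, and the dimension names are all hypothetical, and a layout is assumed to be simply an ordering of tensor dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Layout = Tuple[str, ...]  # assumption: a layout is just a dimension order


@dataclass
class Op:
    """Hypothetical operation record used only for this illustration."""
    name: str
    in_layout: Optional[Layout]  # order the op expects on its input (None for a source)
    out_layout: Layout           # order in which the op writes its output to memory


def permutation(src: Layout, dst: Layout) -> Tuple[int, ...]:
    """Axis permutation that reorders a tensor stored as `src` into `dst`."""
    return tuple(src.index(d) for d in dst)


def insert_shuffles(chain: List[Op]) -> List[Op]:
    """Walk a linear producer-consumer chain and insert a shuffle op
    wherever the produced layout differs from the expected layout."""
    result = [chain[0]]
    for consumer in chain[1:]:
        producer = result[-1]
        if consumer.in_layout != producer.out_layout:
            # Mismatch: an intermediate shuffle reconciles the two layouts.
            result.append(Op(
                name=f"shuffle_{producer.name}_to_{consumer.name}",
                in_layout=producer.out_layout,
                out_layout=consumer.in_layout,
            ))
        # Matching layouts need no shuffle at all.
        result.append(consumer)
    return result


nchw: Layout = ("N", "C", "H", "W")
nhwc: Layout = ("N", "H", "W", "C")
chain = [
    Op("conv", None, nchw),
    Op("relu", nchw, nchw),    # compatible with conv: no shuffle
    Op("matmul", nhwc, nhwc),  # incompatible with relu: shuffle needed
]
scheduled = insert_shuffles(chain)
print([op.name for op in scheduled])
print(permutation(nchw, nhwc))
```

In this toy chain only the `relu` to `matmul` edge receives a shuffle. The `conv` to `relu` edge already agrees on layout, which is the kind of case where a shuffle can be avoided entirely.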
Monodeep Kar, Joel Silberman, et al.
ISSCC 2024
Sanchari Sen, Swagath Venkataramani, et al.
ISLPED 2021
Sarada Krithivasan, Sanchari Sen, et al.
ISLPED 2019