Swagath Venkataramani, Jungwook Choi, et al.
IEEE Micro
This paper presents the design and implementation of a compiler for a deep neural network accelerator that provides high performance and energy efficiency. The compiler allows deep learning frameworks, such as TensorFlow, to exploit the accelerator hardware by automatically generating data-transfer code and outer loops around highly tuned, hand-crafted inner loops for a wide range of neural network parameters. In other words, our compiler significantly reduces the development effort for deep learning libraries without sacrificing their performance. We evaluated our prototype compiler and show that it can generate code for the five most critical deep learning operators with performance comparable to that of hand-tuned code.
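To make the division of labor concrete, the following is a minimal C sketch of the kind of code such a compiler might emit: generated outer tiling loops and data transfers wrapped around a hand-tuned inner-loop kernel. All names here (dma_in, dma_out, relu_inner_kernel, TILE) are illustrative assumptions, not the paper's actual API; memcpy stands in for the accelerator DMA, and a plain C loop stands in for the tuned kernel.

    #include <stddef.h>
    #include <string.h>

    #define TILE 4096  /* elements per tile; assumed to fit the on-chip scratchpad */

    /* Stand-in for the highly tuned, hand-crafted inner loop (hypothetical). */
    static void relu_inner_kernel(float *buf, size_t n) {
        for (size_t i = 0; i < n; ++i)
            buf[i] = buf[i] > 0.0f ? buf[i] : 0.0f;
    }

    /* memcpy stands in for the host-to-accelerator DMA transfers (hypothetical). */
    static void dma_in(float *scratch, const float *host, size_t n) {
        memcpy(scratch, host, n * sizeof *scratch);
    }
    static void dma_out(float *host, const float *scratch, size_t n) {
        memcpy(host, scratch, n * sizeof *host);
    }

    /* Compiler-generated wrapper: the outer loop tiles the input, moves each
       tile on chip, invokes the tuned inner loop, and copies the result back. */
    void relu_generated(float *host_data, size_t len, float *scratch) {
        for (size_t off = 0; off < len; off += TILE) {
            size_t n = (len - off < TILE) ? (len - off) : TILE;
            dma_in(scratch, host_data + off, n);
            relu_inner_kernel(scratch, n);
            dma_out(host_data + off, scratch, n);
        }
    }

In the real system the inner kernel and transfer primitives would target the accelerator rather than the host CPU; the point of the sketch is that only the outer loop structure and transfer code vary with the operator's parameters, which is what the compiler generates automatically.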
Kazuaki Ishizaki, Ken Mizuno, et al.
Annual Haifa Experimental Systems Conference 2010
Kazuaki Ishizaki, Shahrokh Daijavad, et al.
PADTAD 2011
Kazuaki Ishizaki
Data + AI Summit 2021