Reducing global reductions in large-scale distributed training
Guojing Cong, Chih-Chieh Yang, et al.
ICPP 2019
High productivity is critical to harnessing the power of high-performance computing systems for science and engineering problems, yet bridging the gap between hardware complexity and software limitations remains a challenge. Despite significant progress in programming languages, compilers, and performance tools, tuning an application is still largely a manual task done mostly by experts. In this paper, we propose a systematic approach to automated performance analysis and tuning that we expect to significantly improve the productivity of performance debugging. Our approach seeks to build a framework that combines expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. Once a diagnosis and tuning strategy has been developed within the framework, it can be stored in an open, extensible database and reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated and that the performance improvement is significant.