Hardware thread-level speculation performance analysis
Ying-Chieh Wang, I-Hsin Chung, et al.
HPCC-ICESS-CSS 2015
Barrier synchronization, an essential mechanism for a block of threads to guard data consistency, is usually regarded as a threat to performance. This study, however, provides a different viewpoint for barrier synchronization on GPUs: adding barrier synchronization, even when functionally unnecessary, can improve the performance of some memory-intensive applications. We explain this phenomenon using a memory contention model in which artificial barrier synchronization helps reduce memory contention and preserve data access locality. To make the finding practical, we identify a program pattern in which artificial barrier synchronization can be used to synchronize memory accesses when the data locality among threads is violated. Empirical results from three real-world applications demonstrate that artificial barrier synchronization can increase performance by 10 to 20 percent. © 2014 IEEE.
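The pattern described in the abstract can be illustrated with a minimal CUDA sketch. This is not code from the paper: the kernel `column_sum`, the helper `outputs_match`, the problem sizes, and the 256-thread block size are all hypothetical names and values chosen for illustration. Each thread sums one column of a row-major matrix, so the threads of a block read neighboring addresses only while they remain in the same row. The template flag adds a `__syncthreads()` per row that is not needed for correctness; it only keeps the warps of a block from drifting to distant rows. Whether this helps or hurts depends on the GPU and the access pattern, so both variants should be timed on the target hardware.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical example kernel: each thread sums one column of a row-major matrix.
// With UseArtificialBarrier = true, a functionally unnecessary barrier keeps all
// warps of a block on the same row, so their loads stay close together in memory.
template <bool UseArtificialBarrier>
__global__ void column_sum(const float* in, float* out, int width, int rows)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int r = 0; r < rows; ++r) {
        if (col < width)
            acc += in[static_cast<size_t>(r) * width + col];
        // The barrier sits outside the bounds check so that every thread of the
        // block reaches it, which __syncthreads() requires.
        if (UseArtificialBarrier)
            __syncthreads();
    }
    if (col < width)
        out[col] = acc;
}

// Hypothetical host helper: runs both variants and checks that the artificial
// barrier does not change the result.
bool outputs_match(int width, int rows)
{
    assert(width > 0 && rows > 0);
    const size_t n = static_cast<size_t>(width) * rows;
    std::vector<float> h_in(n);
    for (size_t i = 0; i < n; ++i)
        h_in[i] = static_cast<float>(i % 7);  // small integers, exact in float

    float *d_in = nullptr, *d_plain = nullptr, *d_barrier = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_plain, width * sizeof(float));
    cudaMalloc(&d_barrier, width * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (width + threads - 1) / threads;
    column_sum<false><<<blocks, threads>>>(d_in, d_plain, width, rows);
    column_sum<true><<<blocks, threads>>>(d_in, d_barrier, width, rows);

    std::vector<float> h_plain(width), h_barrier(width);
    cudaMemcpy(h_plain.data(), d_plain, width * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_barrier.data(), d_barrier, width * sizeof(float), cudaMemcpyDeviceToHost);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Both kernels accumulate in the same order, so results should be identical.
    for (int c = 0; ok && c < width; ++c)
        ok = (h_plain[c] == h_barrier[c]);

    cudaFree(d_in);
    cudaFree(d_plain);
    cudaFree(d_barrier);
    return ok;
}
```

Timing the two instantiations, for example with `cudaEventRecord` around each launch, is one way to check whether the extra barrier pays off for a given kernel; this sketch only confirms that the barrier leaves the output unchanged.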
Che-Rung Lee, Shih-Hsiang Lo, et al.
CCGrid 2012
Ying-Chieh Wang, Che-Rung Lee, et al.
IPDPSW 2014
Che-Rung Lee, I-Hsin Chung, et al.
IPDPS 2010