Accelerating business analytics applications
Valentina Salapura, Tejas Karkhanis, et al.
HPCA 2012
This article introduces YaConv, a new algorithm that computes convolution using GEMM microkernels from a Basic Linear Algebra Subprograms (BLAS) library and is efficient on multiple CPU architectures. Previous approaches either create a copy of each image element for each filter element or reload these elements into cache for each GEMM call, which leaves redundant copies of the image elements in cache. YaConv instead loads each image element into the cache once and maximizes its reuse. The output image is computed by scattering the results of the GEMM microkernel calls to the correct locations in the output image. The main advantage of the new algorithm, and the reason it outperforms the existing im2col approach on several architectures, is a more efficient use of the memory hierarchy. An experimental evaluation on convolutional layers from PyTorch, along with a parameterized study, indicates an average 24% speedup over im2col convolution. The increased performance results from a 3× reduction in L3 cache accesses and 2× fewer branch instructions.
Mark S. Squillante, Yanyong Zhang, et al.
SIGMETRICS 2002
Hao Yu, I-Hsin Chung, et al.
ACM/IEEE SC 2006
Bronis R. de Supinski, Martin Schulz, et al.
International Journal of High Performance Computing Applications