PATer: A hardware prefetching automatic tuner on IBM POWER8 processor
Abstract
Hardware prefetching on IBM's latest POWER8 processor is able to improve performance of many applications significantly, but it can also cause performance loss for others. The IBM POWER8 processor provides one of the most sophisticated hardware prefetching designs which supports 225 different configurations. Obviously, it is a big challenge to find the optimal or near-optimal hardware prefetching configuration for a specific application. We present a dynamic prefetching tuning scheme in this paper, named prefetch automatic tuner (PATer). PATer uses a prediction model based on machine learning to dynamically tune the prefetch configuration based on the values of hardware performance monitoring counters (PMCs). By developing a two-phase prefetching selection algorithm and a prediction accuracy optimization algorithm in this tool, we identify a set of selected key hardware prefetch configurations that matter mostly to performance as well as a set of PMCs that maximize the machine learning prediction accuracy. We show that PATer is able to accelerate the execution of diverse workloads up to 1.4×.