Energy Efficiency Boost in the AI-Infused POWER10 Processor
Brian Thompto, Dq Nguyen, et al.
ISCA 2021
In this paper, we describe the IBM POWER8™ cache, interconnect, memory, and input/output subsystems, collectively referred to as the "nest." This paper focuses on the enhancements made to the nest to achieve balanced and scalable designs, ranging from small 12-core single-socket systems up to large 16-processor-socket, 192-core enterprise rack servers. A key aspect of the design has been increasing the end-to-end data and coherence bandwidth of the system, now featuring more than twice the bandwidth of the POWER7® processor. The paper describes the new memory-buffer chip, called Centaur, providing up to 128 MB of eDRAM (embedded dynamic random-access memory) buffer cache per processor, along with an improved DRAM (dynamic random-access memory) scheduler with support for prefetch and write optimizations, providing industry-leading memory bandwidth combined with low memory latency. It also describes new coherence-transport enhancements and the transition to directly integrated PCIe® (PCI Express®) support, as well as additions to the cache subsystem to support higher levels of virtualization and scalability, including snoop filtering and cache sharing.