Olivier Maher, N. Harnack, et al.
DRC 2023
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic-array/SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14-nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62 V and 1.4 TFLOPS/W at 0.54 V.
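The abstract above centers on compact DLFloat16 FPUs. As a rough, non-authoritative illustration of what a DLFloat-style 16-bit format implies numerically, the Python sketch below rounds fp32 values onto an assumed 1-sign/6-exponent/9-fraction-bit grid; the function name, the bit split, and the clamp range are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def round_to_dlfloat16(x):
    """Quantize fp32 values onto a DLFloat16-like grid.

    Assumed layout: 1 sign, 6 exponent, 9 fraction bits. This is only a
    rounding model (nearest value on the coarser significand grid plus a
    clamp to an assumed maximum magnitude); it does not reproduce the
    format's NaN/Inf encoding or the core's FMA datapath.
    """
    x = np.asarray(x, dtype=np.float32)
    frac_bits = 9
    _, e = np.frexp(x)                        # x = m * 2**e with 0.5 <= |m| < 1
    step = np.exp2(e - 1 - frac_bits)         # grid spacing within x's binade
    q = np.round(x / step) * step             # round to nearest grid point
    max_mag = (2.0 - 2.0 ** -frac_bits) * 2.0 ** 32   # assumed largest finite value
    return np.clip(q, -max_mag, max_mag).astype(np.float32)

print(round_to_dlfloat16([0.1, 3.14159, 1.0e6]))
```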
Sae Kyu Lee, Ankur Agrawal, et al.
IEEE JSSC
Tommaso Stecconi, Roberto Guido, et al.
Advanced Electronic Materials
Max Bloomfield, Amogh Wasti, et al.
ITherm 2025