BRAVO: Balanced Reliability-Aware Voltage Optimization
Abstract
Defining a processor micro-architecture for a targeted productspace involves multi-dimensional optimization across performance, power and reliability axes. A key decision in sucha definition process is the circuit-and technology-driven parameterof the nominal (voltage, frequency) operating point. This is a challenging task, since optimizing individually orpair-wise amongst these metrics usually results in a designthat falls short of the specification in at least one of the threedimensions. Aided by academic research, industry has nowadopted early-stage definition methodologies that considerboth energy-and performance-related metrics. Reliabilityrelatedenhancements, on the other hand, tend to get factoredin via a separate thread of activity. This task is typically pursuedwithout thorough pre-silicon quantifications of the energyor even the performance cost. In the late-CMOS designera, reliability needs to move from a post-silicon afterthoughtor validation-only effort to a pre-silicon definitionprocess. In this paper, we present BRAVO, a methodologyfor such reliability-aware design space exploration. BRAVOis supported by a multi-core simulation framework that integratesperformance, power and reliability modeling capability. Errors induced by both soft and hard fault incidence arecaptured within the reliability models. We introduce the notionof the Balanced Reliability Metric (BRM), that we useto evaluate overall reliability of the processor across soft andhard error incidences. We demonstrate up to 79% improvementin reliability in terms of this metric, for only a 6% dropin overall energy efficiency over design points that maximizeenergy efficiency. We also demonstrate several real-life usecaseapplications of BRAVO in an industrial setting.