(Invited) cross-layer resilience: Challenges, insights, and the road ahead
Abstract
Resilience to errors in the underlying hardware is a key design objective for a large class of computing systems, from embedded systems all the way to the cloud. Sources of hardware errors include radiation, circuit aging, variability induced by manufacturing and operating conditions, manufacturing test escapes, and early-life failures. Many publications have suggested that cross-layer resilience, where multiple error resilience techniques from different layers of the system stack cooperate to achieve cost-effective resilience, is essential for designing cost-effective resilient digital systems. This paper presents a comprehensive overview of crosslayer resilience by addressing fundamental cross-layer resilience questions, by summarizing insights derived from recent advances in cross-layer resilience research, and by discussing future crosslayer resilience challenges.