Kaoutar El Maghraoui, Gokul Kandiraju, et al.
WOSP/SIPEW 2010
Recovering from OS crashes has traditionally been done using reboot or checkpoint-restart mechanisms. Such techniques either fail to preserve the state before the crash happens or require modifications to applications. To eliminate these problems, we present a novel OS-hypervisor infrastructure for automated OS crash diagnosis and recovery in virtual servers. Our approach uses a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. Upon an OS crash, the hypervisor automatically loads this repair-image to perform diagnosis and repair. The offending process is then quarantined, and the fixed OS automatically resumes running without a reboot. Our experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce the downtime and maintenance costs in data centers. This is the first design and implementation of an OS-hypervisor combo capable of automatically resurrecting a crashed commercial server-OS. © 2012 IEEE.
Priya Nagpurkar, William Horn, et al.
IISWC 2008
Joefon Jann, Niteesh Dubey, et al.
IBM J. Res. Dev
Gabriel Tanase, Gheorghe Almási, et al.
SBAC-PAD 2012