Partha Pal, Paul Rubel, et al.
Software - Practice and Experience
A large number of distributed checkpointing protocols have appeared in the literature. However, deciding which protocol performs best in a given environment requires an objective measure for comparing them, since a protocol that is best in one environment may not be best in another. This paper presents such a measure, called the overhead ratio, for evaluating distributed checkpointing protocols. The measure extends previous evaluation schemes by incorporating several additional parameters that are inherent in distributed environments. In particular, it takes into account the rollback propagation of the protocol, which affects the length of the recovery process and therefore the expected program run-time in executions that involve failures and recoveries. Using the overhead ratio as an evaluation technique, the paper also analyses several known protocols and compares their overhead ratios. © 2007 Elsevier Ltd. All rights reserved.
Adnan Agbaria, Gidon Gershinsky, et al.
PerCom 2009
Ohad Eytan, Danny Harnik, et al.
HotStorage 2020
Iman Saleh, Adnan Agbaria, et al.
DIWANS 2006