A checkpointing strategy for scalable recovery on distributed parallel systems. Vijay K. Naik, Samuel P. Midkiff, et al. 1997. ACM/IEEE SC 1997.