The Qx-coder
M.J. Slattery, Joan L. Mitchell
IBM J. Res. Dev.
In response to strong customer demand for advance notice of unplanned outages, techniques were developed that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reaches a critical level, and automatically perform proactive software rejuvenation of an application, process group, or entire operating system. The resulting techniques are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures.
Zohar Feldman, Avishai Mandelbaum
WSC 2010
Thomas R. Puzak, A. Hartstein, et al.
CF 2007
Joel L. Wolf, Mark S. Squillante, et al.
IEEE Transactions on Knowledge and Data Engineering