Architecture of a fault-tolerant cached RAID controller
Jai Menon, Jim Cortney
ISCA 1993
A software RAID is a RAID implemented purely in software running on a host computer. One problem with software RAIDs is that they do not have access to special hardware such as NVRAM. Thus, a software RAID may need to check every parity group of an array for consistency following a host crash or power failure. This process of checking parity groups is called recovery, and it results in long delays when the software RAID is restarted. In this paper, we review two algorithms that reduce this recovery time for software RAIDs: the PGS Bitmap Algorithm we proposed in [5] and the List Algorithm proposed in [1]. We compare the performance of the two algorithms using trace-driven simulations. Our results show that the PGS Bitmap Algorithm can reduce recovery time by a factor of 12 with a response time penalty of less than 1%, or by a factor of 50 with a response time penalty of less than 2%, and a memory requirement of around 9 Kbytes. The List Algorithm can reduce recovery time by a factor of 50 but cannot achieve a response time penalty of less than 16%.
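The abstract does not spell out how either algorithm works internally, so the following is only a minimal sketch of why recovery is slow and why tracking recently written regions helps. It assumes XOR parity (as in RAID level 5), where a parity group is consistent when the XOR of its data blocks equals its parity block. The names `full_recovery`, `bitmap_recovery`, `dirty_sets`, and `groups_per_set` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def is_consistent(data_blocks, parity_block):
    """A parity group is consistent if XOR(data blocks) == parity block."""
    return xor_blocks(data_blocks) == parity_block


def full_recovery(groups):
    """Baseline recovery: check every parity group in the array.

    `groups` is a list of (data_blocks, parity_block) pairs.
    Returns the indices of inconsistent parity groups.
    """
    return [i for i, (data, parity) in enumerate(groups)
            if not is_consistent(data, parity)]


def bitmap_recovery(groups, dirty_sets, groups_per_set):
    """Assumed bitmap-style recovery: check only marked sets of groups.

    `dirty_sets` holds the indices of sets of parity groups that were
    marked as possibly in flight before the crash (an assumption of this
    sketch); each set covers `groups_per_set` consecutive parity groups.
    """
    bad = []
    for s in sorted(dirty_sets):
        start = s * groups_per_set
        end = min(start + groups_per_set, len(groups))
        for i in range(start, end):
            data, parity = groups[i]
            if not is_consistent(data, parity):
                bad.append(i)
    return bad
```

In this sketch the baseline scan reads every group, while the bitmap variant reads only `len(dirty_sets) * groups_per_set` groups. The price is that the marks must be made durable before the corresponding writes proceed, which is one plausible source of the response time penalty the abstract reports.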