Augmenting overlay trees for failure resiliency
Abstract
Overlay trees typically use directed trees as efficient structures for disseminating information, but their single-path structure means that just one node failure results in the disconnection of all descendants, possibly a significant portion of the graph. The addition of extra "backup" links to a directed tree can provide alternate data paths that significantly reduce the number of nodes disconnected when some set of nodes are removed from the graph. We investigate several deterministic and randomized algorithms for adding such backup links to a directed tree and analyze the connectedness of the resulting graphs when nodes in the network fail with some random probability. We present closed-form approximations and simulation measurements for the connectivity of these augmented trees in networks ranging from hundreds to hundreds of thousands of nodes. We also identify and measure the costs of adding backup links, using simulations and real-world measurements from overlays constructed using PlanetLab latency data. We find that, with node failure rates up to 10%, deterministic backup link selection policies offer comparable resiliency to random backup links with significantly lower overhead and resource usage. © 2004 IEEE.