Performance implications of remote-only load balancing under adversarial traffic in dragonflies
Abstract
Dragonfly topologies are a recent network design considered one of the most promising interconnect options for Exascale systems. They offer low diameter and low network cost, but at the expense of path diversity, which makes them vulnerable to certain adversarial traffic patterns. Indirect routing can alleviate the performance degradation these workloads experience. However, the improvements achievable with the indirect routing approach popular today are bounded by limits inherent to the Dragonfly topological structure. In this work, we explore these limits by providing a theoretical justification of why adversarial traffic patterns, routed indirectly with an algorithm that perfectly distributes load across the inter-group (global) links, can still induce significant bottlenecks on the intra-group (local) links. We also estimate the performance impact of these imbalances and present a set of simulation-based benchmarks that confirm the theoretical predictions for practical Dragonfly systems. Copyright © 2014 ACM.
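The core argument (balanced global links, overloaded local links) can be illustrated with a small load-counting model. This is a sketch under stated assumptions, not the paper's analysis. It assumes a canonical Dragonfly with g = a*h + 1 groups, a fully connected set of a routers per group, h global ports and p terminals per router, and a hypothetical "consecutive" arrangement of global links. The traffic is a group-shift adversarial pattern in which every terminal sends to the same router index in group (G + shift) mod g. Routing is Valiant-style indirect routing through a uniformly chosen intermediate group, with each leg routed minimally. The function names (`global_router`, `indirect_shift_loads`, `max_loads`) are illustrative only. Loads are expected values per unit injection rate, computed exactly with `Fraction`.

```python
from collections import defaultdict
from fractions import Fraction


def global_router(a, h, g, src_group, dst_group):
    """Return the router in src_group that owns the global link to dst_group.

    Assumption (hypothetical arrangement): global port k (0 <= k < a*h) of
    group X leads to group (X + k + 1) mod g and sits on router k // h.
    """
    k = (dst_group - src_group - 1) % g
    return k // h


def add_local(loads, group, r_from, r_to, amount):
    # A local hop is only needed when the packet changes router in the group.
    if r_from != r_to:
        loads[("local", group, r_from, r_to)] += amount


def indirect_shift_loads(a, h, p, shift):
    """Expected load on every directed link for a group-shift pattern.

    Each of the p terminals on router r of group G injects one unit toward
    router r of group D = (G + shift) mod g. The traffic is split evenly over
    all intermediate groups I not in {G, D}; both legs (G -> I, I -> D) are
    routed minimally: local hop, global hop, local hop.
    """
    g = a * h + 1  # canonical size: one global link between every group pair
    loads = defaultdict(Fraction)
    for G in range(g):
        D = (G + shift) % g
        intermediates = [I for I in range(g) if I not in (G, D)]
        share = Fraction(p, len(intermediates))  # per router, per intermediate
        for r in range(a):
            for I in intermediates:
                # Leg 1: source router -> global link towards I.
                add_local(loads, G, r, global_router(a, h, g, G, I), share)
                loads[("global", G, I)] += share
                # Transit through I: landing router -> router facing D.
                entry_i = global_router(a, h, g, I, G)
                exit_i = global_router(a, h, g, I, D)
                add_local(loads, I, entry_i, exit_i, share)
                loads[("global", I, D)] += share
                # Leg 2: landing router in D -> destination router r.
                add_local(loads, D, global_router(a, h, g, D, I), r, share)
    return loads


def max_loads(loads):
    """Return (max global-link load, max local-link load)."""
    glob = [v for key, v in loads.items() if key[0] == "global"]
    loc = [v for key, v in loads.items() if key[0] == "local"]
    return max(glob), max(loc)


# Example: a = h = p = 4 (so g = 17) with a shift of h groups.
loads = indirect_shift_loads(a=4, h=4, p=4, shift=4)
max_global, max_local = max_loads(loads)
print("max global-link load:", max_global)
print("max local-link load: ", max_local)
```

In this toy configuration every global link that carries traffic has the same load (32/15 per unit injection rate), so the global level is perfectly balanced. Yet transit traffic alone puts 64/15 on the local link from router r to router r + 1 (for r = 0, ..., a - 2) in each intermediate group, because traffic entering all h global ports of router r leaves through router r + 1. Under these assumptions, the local links, not the global ones, bound throughput.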