Very efficient cyclic shifts on hypercubes
Abstract
Cyclic shifts are intrinsic operations in many parallel algorithms. Therefore, it is important to execute them efficiently. The authors present and analyze algorithms for the cyclic shift operation on n-dimensional (distributed memory) hypercubes. The first algorithm by S.L. Johnsson (1987) always uses shortest paths between hypercube nodes for routing. The authors prove that when using this algorithm, there is no node and link congestion in any communication step on synchronous hypercubes. Consequently, any cyclic shift can be implemented in n steps on such machines, without any local buffering of messages. They show that all previously known algorithms need local buffering, in the context of asynchronous hypercubes. In order to overcome this, they design an algorithm that always uses link-disjoint paths for routing. In this case, they prove that any cyclic shift can be realized by using at most 4/3n steps, without any local message buffers.