Smita Krishnaswamy, Stephen M. Plaza, et al.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
This paper describes a fault-tolerant communication scheme that facilitates near-optimal routing and broadcasting in hypercube computers subject to node failures. The concept of an unsafe node is introduced to identify fault-free nodes that may cause communication difficulties in faulty hypercubes. It is then shown that by only using “feasible” paths that try to avoid unsafe nodes, routing and broadcasting can be substantially simplified. A computationally efficient routing algorithm that uses local information is presented which can route a message via a path of length no greater than p + 2, where p is the minimum distance from the source to the destination, provided that not all nonfaulty nodes in the hypercube are unsafe. Broadcasting can be achieved under the same fault conditions with only one more time unit than the fault-free case. The problems posed by deadlock in faulty hypercubes are discussed, and deadlock-free implementations of the proposed communication schemes using store-and-forward, virtual cut-through, and wormhole techniques are presented. © 1992 IEEE
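As a rough illustration of the p + 2 bound quoted in the abstract, the sketch below assumes the standard labeling of an n-dimensional hypercube: each node is an n-bit integer, and two nodes are adjacent when their labels differ in exactly one bit. Under that assumption the minimum distance p is the Hamming distance between the two labels. The function names (`hamming_distance`, `neighbors`, `path_length_bound`) are hypothetical and not taken from the paper, and the sketch does not implement the routing algorithm or the unsafe-node test.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which node labels a and b differ.

    In a fault-free hypercube this is the minimum distance p
    between the two nodes.
    """
    return bin(a ^ b).count("1")


def neighbors(node: int, n: int) -> List[int]:
    """Labels of the n nodes adjacent to `node` in an n-dimensional hypercube."""
    # Flipping bit i moves along dimension i.
    return [node ^ (1 << i) for i in range(n)]


def path_length_bound(src: int, dst: int) -> int:
    """Upper bound p + 2 on the routed path length stated in the abstract.

    Assumption: the bound applies only under the abstract's condition that
    not all nonfaulty nodes are unsafe; this helper does not check it.
    """
    return hamming_distance(src, dst) + 2
```

For example, in a 4-dimensional hypercube, nodes 0b0000 and 0b1011 differ in three bits, so p = 3 and the stated bound on the routed path length is 5.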
Tze Chiang Lee, John P. Hayes
Journal of Parallel and Distributed Computing
Smita Krishnaswamy, Igor L. Markov, et al.
DAC 2009