Efficient topology reconstruction via machine learning based traffic patterns recognition in optically interconnected computing system
Abstract
Traffic flows in parallel computing systems are clustered and correlated, and they are latency-sensitive. Such flows have been abstracted as 'Coflow' so that they can be optimized as a whole. Concurrent Coflows on the network exhibit distinctive traffic patterns. Meanwhile, multiple optical interconnection network architectures have been proposed that enable traffic-adaptive topology reconstruction. However, existing topology reconstruction strategies are application-agnostic, and the network performance objectives they optimize do not match Coflow demands. To exploit the flexibility of optical topologies and improve the performance of parallel computing applications through Coflow acceleration, the traffic patterns should first be recognized and an adaptive topology then generated accordingly. To avoid additional complexity, this recognition should be completed without prior knowledge from the application layer. The topology should then be reconstructed to minimize the Coflow completion time. To implement these procedures, we propose a traffic pattern-aware topology reconstruction strategy. The strategy first combines a convolutional neural network (CNN) with spectral clustering to recognize traffic patterns, and then uses a genetic search algorithm to find a suitable topology. Large-scale simulations based on a real traffic trace from a Facebook computing application verify the efficiency of the strategy, which lowers the completion time of computing jobs. An experimental demonstration further confirms these conclusions.
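
To make the optimization objective concrete, the following minimal sketch estimates the completion time of a single Coflow on a given topology and compares two candidate topologies. It is an illustration under simplifying assumptions, not the evaluation model used in this work: every flow is assumed to use only the direct link between its endpoints, flows on the same link share it, and node pairs without a direct link fall back to a slower path whose capacity is given by the hypothetical parameter `fallback_capacity`. The function name, node names, and capacities are likewise hypothetical.

```python
from collections import defaultdict


def coflow_completion_time(flows, topology, fallback_capacity):
    """Bottleneck estimate of the completion time of one Coflow, in seconds.

    flows: list of (src, dst, size_bytes) tuples belonging to the same Coflow.
    topology: dict mapping (src, dst) to direct-link capacity in bytes/s.
    fallback_capacity: assumed capacity (bytes/s) for pairs with no direct link.
    """
    # Aggregate the bytes each node pair has to carry for this Coflow.
    load = defaultdict(float)
    for src, dst, size in flows:
        load[(src, dst)] += size

    # A Coflow completes only when its slowest node pair has finished.
    return max(
        (total / topology.get(pair, fallback_capacity)
         for pair, total in load.items()),
        default=0.0,
    )


# Hypothetical Coflow dominated by a heavy A -> B transfer.
coflow = [("A", "B", 8e9), ("A", "C", 1e9), ("B", "C", 2e9)]

# Two candidate topologies: a uniform one, and one that gives A -> B more capacity.
uniform = {("A", "B"): 1e9, ("B", "C"): 1e9}
adapted = {("A", "B"): 4e9, ("B", "C"): 1e9}

cct_uniform = coflow_completion_time(coflow, uniform, fallback_capacity=0.5e9)
cct_adapted = coflow_completion_time(coflow, adapted, fallback_capacity=0.5e9)
print(cct_uniform, cct_adapted)
```

In this toy case the adapted topology shortens the estimated completion time because it adds capacity where the Coflow's bottleneck lies, which is the effect a traffic pattern-aware reconstruction aims for.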