Task assignment optimization in geographically distributed data centers
Abstract
Recent advances in geo-distributed systems have made distributed data processing possible: tasks are decomposed into subtasks, deployed to multiple data centers, and run in parallel. Compared to conventional approaches that process every task in a single data center, which results in high latency and large data aggregation, geo-distributed cloud systems provide a highly available and more economical platform. However, distributed execution of an application (task) introduces extra cost and latency because data must be exchanged between data centers. In addition, task dependencies and diverse task constraints make it even more challenging to choose an appropriate task assignment strategy. In this paper, we study a task assignment problem in geographically distributed cloud systems. In light of the growing demand for big data processing and storage, we consider data-intensive tasks, where a task often requires significant computing resources and its input data are typically located in multiple data centers. By taking distributed input, task dependencies, heterogeneous pricing schemes, and resource constraints into account, we aim to optimize performance when deploying tasks in geographically distributed data centers. We present a heuristic algorithm that provides an approximate solution to the proposed NP-hard problem. We perform an extensive simulation study to evaluate the performance of our solution under various settings. The simulation results demonstrate that our approach can outperform state-of-the-art strategies and achieve a significant reduction in cost and latency.