Publication
SoCC 2024
Conference paper

Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition

Abstract

As cloud-native workflow orchestration tools become increasingly important for complex data science workloads, there is a growing need for more efficient scheduling. Existing cloud schedulers rely on basic heuristics and on user-specified task partitioning for parallel execution, leading to underutilization of cluster resources and prolonged job completion times. To address this, we propose a novel workflow scheduling algorithm that leverages workflow characteristics to enhance resource utilization and reduce weighted job completion time. The algorithm combines three sub-algorithms, each reflecting a distinct aspect of the scheduling strategy: 1) the Hybrid Maximum Children (MC)-Weighted Shortest Critical Path Time (WSCPT) rule, which alternates between two heuristics, MC and WSCPT, that prioritize jobs based on workflow structure and critical path, respectively, with the choice between them adjusted dynamically according to the cluster queue size; 2) Dynamic Resource Allocation (DRA), which dynamically adjusts the number of executors assigned to each workflow; and 3) Dynamic Task Partition (DTP), which autonomously determines the task parallelism level. We evaluated our algorithm through extensive experiments on various workflow types using a simulator that imitates Spark. Our algorithm outperformed other schedulers, including learning-based models, reducing a combined measure of average job completion time and makespan by 21-47% for unweighted workflows and reducing weighted job completion time by at least 50% for weighted workflows.
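
To make the idea of a queue-dependent hybrid priority rule concrete, the following is a minimal Python sketch. It is not the paper's implementation. The abstract does not give the switching threshold, the direction of the switch, or the exact definitions of the MC and WSCPT scores, so everything here is an assumption made for illustration: the `Workflow` fields, the `queue_threshold` parameter, the choice to use MC for short queues and WSCPT for long ones, and the scoring of WSCPT as critical path time divided by weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical summary of a queued workflow (all fields are assumptions)."""
    name: str
    weight: float              # relative importance of the workflow
    num_children: int          # structural score used by the MC heuristic
    critical_path_time: float  # estimated duration of the critical path


def pick_next(queue, queue_threshold=3):
    """Pick the next workflow with a hybrid rule that depends on queue size.

    Assumed policy: with a short queue, prefer the workflow with the most
    children (MC); with a long queue, prefer the smallest ratio of
    critical path time to weight (WSCPT).
    """
    if not queue:
        raise ValueError("queue is empty")
    if len(queue) <= queue_threshold:
        # MC: favor the workflow whose structure exposes the most children.
        return max(queue, key=lambda w: w.num_children)
    # WSCPT: favor short critical paths, scaled by workflow weight.
    return min(queue, key=lambda w: w.critical_path_time / w.weight)


queue = [
    Workflow("etl", weight=1.0, num_children=5, critical_path_time=10.0),
    Workflow("report", weight=2.0, num_children=1, critical_path_time=4.0),
]

short_queue_choice = pick_next(queue, queue_threshold=3)  # MC branch
long_queue_choice = pick_next(queue, queue_threshold=1)   # WSCPT branch
print(short_queue_choice.name, long_queue_choice.name)
```

With the two sample workflows, the MC branch selects `etl` because it has more children, while the WSCPT branch selects `report` because its ratio of critical path time to weight is smaller (2.0 versus 10.0). DRA and DTP are left out of the sketch, since the abstract does not describe how they choose executor counts or parallelism levels.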