Kaoutar El Maghraoui, Gokul Kandiraju, et al.
WOSP/SIPEW 2010
Federated learning enables collaborative training of a model while keeping the training data decentralized and private. However, in IoT systems, inherent heterogeneity in processing power, communication bandwidth, and task size can significantly hinder the efficient training of large models. Such heterogeneity causes large variations in client training time, lengthening overall training and wasting the resources of faster clients. To tackle these heterogeneity challenges, we propose Dynamic Tiering-based Federated Learning (DTFL), a novel system that leverages distributed optimization principles to improve edge learning performance. Based on clients' resources, DTFL dynamically offloads part of the global model to the server, alleviating resource constraints on slower clients and speeding up training. By leveraging Split Learning, DTFL offloads different portions of the global model to clients in different tiers and enables each client to update the models in parallel via local-loss-based training. This helps reduce the computation and communication demand on resource-constrained devices, mitigating the straggler problem. DTFL introduces a dynamic tier scheduler that uses tier profiling to estimate the expected training time of each client based on its historical training time, communication speed, and dataset size. The dynamic tier scheduler assigns clients to suitable tiers to minimize the overall training time in each round. We theoretically prove the convergence properties of DTFL and validate its effectiveness by training large models (ResNet-56 and ResNet-110) across varying numbers of clients (from 10 to 200) using popular image datasets (CIFAR-10, CIFAR-100, CINIC-10, and HAM10000) under both IID and non-IID settings. DTFL seamlessly integrates various privacy measures without sacrificing performance. Extensive experimental results show that, compared with state-of-the-art FL methods, DTFL can reduce training time by up to 80% while maintaining model accuracy.
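The tier-scheduling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the DTFL implementation: the cost model, the `ClientProfile` and `Tier` types, the `server_speedup` parameter, and every numeric field are hypothetical assumptions chosen for illustration. The sketch estimates each client's round time per tier from a historical per-sample training time, a measured bandwidth, and the local dataset size, then gives each client the tier with the lowest estimate. Under the simplifying assumption that clients do not contend for server capacity, this per-client choice also minimizes the slowest client's time, which is what bounds the round.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    # All fields are hypothetical stand-ins for what a tier profiler might record.
    secs_per_sample_full_model: float  # derived from historical training time
    bandwidth_mb_per_s: float          # measured communication speed
    num_samples: int                   # local dataset size


@dataclass
class Tier:
    # Assumed description of a tier: how much of the model stays on the client
    # and how much data is exchanged with the server per training sample.
    client_fraction: float
    mb_per_sample: float


def estimate_time(client, tier, server_speedup=10.0):
    """Estimated round time (seconds) for one client in one tier.

    Assumes compute scales linearly with the share of the model and that the
    server is `server_speedup` times faster than the client (an assumption).
    """
    full = client.secs_per_sample_full_model * client.num_samples
    client_compute = full * tier.client_fraction
    server_compute = full * (1.0 - tier.client_fraction) / server_speedup
    comm = tier.mb_per_sample * client.num_samples / client.bandwidth_mb_per_s
    return client_compute + server_compute + comm


def assign_tiers(clients, tiers):
    """Pick, for every client, the tier index with the lowest estimated time.

    Returns the assignment and the estimated round time, i.e. the time of the
    slowest client under that assignment.
    """
    assignment = {
        name: min(range(len(tiers)), key=lambda i: estimate_time(c, tiers[i]))
        for name, c in clients.items()
    }
    round_time = max(
        estimate_time(clients[name], tiers[i]) for name, i in assignment.items()
    )
    return assignment, round_time
```

In this toy model, a compute-bound client with a good link tends to land in a tier that offloads most of the model, while a fast client on a slow link keeps more of the model locally to avoid communication cost. Re-running `assign_tiers` each round with refreshed profiles is one simple way to make the assignment dynamic.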
Yao Qi, Raja Das, et al.
ISSTA 2009
S. Sattanathan, N.C. Narendra, et al.
CONTEXT 2005
Minkyong Kim, Zhen Liu, et al.
INFOCOM 2008