Estimating end-to-end performance by collaborative prediction with active sampling
Abstract
Accurately estimating end-to-end performance in distributed systems is essential both for monitoring compliance with service-level agreements (SLAs) and for performance optimization (e.g., choosing the highest-bandwidth server for a download request in a content-distribution system). Because exhaustive pairwise measurements are infeasible, a natural alternative is to predict unobserved end-to-end performance from available historical data, with minimal additional measurements. In this paper we present an approach to this problem based on Collaborative Prediction (CP), an estimation method designed to work with sparse data that has enjoyed much success in other domains (e.g., product recommendation systems) and that obviates the need for the landmark nodes commonly assumed in other approaches. Specifically, we use Max-Margin Matrix Factorization (MMMF), a linear factor model for CP that has outperformed state-of-the-art CP techniques. Moreover, our approach readily admits active sampling based on prediction confidence, and we further propose a novel active-sampling CP approach that yields even higher predictive accuracy while allowing a flexible trade-off between "exploration" (choosing suboptimal samples to improve estimation accuracy) and "exploitation" (choosing the node with the best estimated performance). We demonstrate successful empirical results on a variety of practical problems, including network latency prediction (NLANR-AMP, P2PSim and PlanetLab datasets) and bandwidth prediction in content-distribution systems (IBM's downloadGrid data). © 2007 IEEE.
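
The general idea of margin-based collaborative prediction with confidence-driven sampling can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: performance is thresholded into binary labels (+1 for acceptable, -1 for poor), the factor model is fit by plain stochastic gradient descent on a hinge loss with L2-regularized factors (a simple stand-in for an MMMF solver), and the absolute prediction score is used as a confidence proxy. The names `fit_factors`, `score` and `next_sample`, and all hyperparameter values, are hypothetical.

```python
import random


def fit_factors(observed, n_rows, n_cols, rank=2, reg=0.05, lr=0.05,
                epochs=300, seed=0):
    """Fit low-rank factors U, V by SGD on a hinge loss (illustrative only).

    observed: dict mapping (row, col) -> label in {-1, +1} (assumed encoding).
    The predicted score for (i, j) is the dot product of U[i] and V[j].
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), y in entries:
            s = sum(a * b for a, b in zip(U[i], V[j]))
            violated = y * s < 1.0  # hinge loss is active inside the margin
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                grad_u = reg * u - (y * v if violated else 0.0)
                grad_v = reg * v - (y * u if violated else 0.0)
                U[i][k] = u - lr * grad_u
                V[j][k] = v - lr * grad_v
    return U, V


def score(U, V, i, j):
    """Signed prediction score; its magnitude serves as a confidence proxy."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def next_sample(U, V, observed, mode="explore"):
    """Pick the next unobserved pair to measure, or None if none remain.

    "explore": the pair with the least confident prediction (smallest |score|).
    "exploit": the pair with the best predicted performance (largest score).
    """
    candidates = [(i, j) for i in range(len(U)) for j in range(len(V))
                  if (i, j) not in observed]
    if not candidates:
        return None
    if mode == "explore":
        return min(candidates, key=lambda ij: abs(score(U, V, *ij)))
    return max(candidates, key=lambda ij: score(U, V, *ij))
```

In a measurement loop one would call `fit_factors` on the entries measured so far, call `next_sample` to choose the next pair to probe, add the new label to `observed`, and refit. The hard switch between the two modes here is only a crude illustration of the exploration versus exploitation trade-off described above.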