Short paper

Optimizing Telco Networks with Foundation Models: A Scalable Approach for Tabular Data

Abstract

Foundation models have revolutionized natural language processing and computer vision, yet their potential for tabular data, particularly in telecommunications, remains underexplored. This paper demonstrates the application of foundation models to large-scale telco drive test data, achieving up to a 17-point improvement in $R^2$ on regression tasks over state-of-the-art methods such as XGBoost. A central focus is transforming numeric telco data into tokens, enabling meaningful embeddings. Unlike NLP and computer vision, the telco domain lacks pre-trained models, necessitating training from scratch to capture domain-specific patterns. We also detail data preprocessing and sequence-conversion techniques tailored to foundation models, as well as the trade-offs of various numeric binning methods (e.g., cut, qcut, Lloyd-Max quantization), which affect data balance and token frequency. Beyond KPI prediction, we demonstrate the foundation model's ability to serve as a network optimization simulator, offering significant advantages over manual tuning. Our results show that foundation models excel on large datasets with millions of rows and high categorical complexity, consistently outperforming XGBoost, which remains more effective on simpler datasets. Furthermore, foundation models achieve up to 75-point gains in $R^2$ on public tabular datasets, underscoring their versatility for complex, high-dimensional data challenges.
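The binning trade-off mentioned above can be illustrated with a minimal sketch: equal-width binning (`pd.cut`) produces interpretable bins but skewed token frequencies on non-uniform data, whereas equal-frequency binning (`pd.qcut`) balances token counts at the cost of uneven bin widths. The column name `rsrp_dbm`, the bin count, and the synthetic data are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: tokenizing a numeric telco KPI column via two binning schemes.
# "rsrp_dbm" (signal strength) and n_bins = 16 are hypothetical choices.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"rsrp_dbm": rng.normal(-95.0, 10.0, size=10_000)})
n_bins = 16

# Equal-width bins: tail bins get few samples, so some tokens become rare.
df["tok_cut"] = pd.cut(df["rsrp_dbm"], bins=n_bins, labels=False)

# Equal-frequency bins: every token appears roughly equally often,
# which helps keep the token vocabulary balanced for training.
df["tok_qcut"] = pd.qcut(df["rsrp_dbm"], q=n_bins, labels=False)

print("cut  token-count spread:", df["tok_cut"].value_counts().std())
print("qcut token-count spread:", df["tok_qcut"].value_counts().std())
```

On this synthetic (roughly Gaussian) column, `cut` yields a large spread in token counts while `qcut` yields a near-zero spread, mirroring the data-balance vs. token-frequency trade-off the abstract refers to.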

Related Work