Larry Carvalho, Anca Sailer, et al.
KubeCon EU 2025
Foundation models have revolutionized natural language processing and computer vision, yet their potential for tabular data, particularly in telecommunications, remains underexplored. This paper demonstrates the application of foundation models to large-scale telco drive test data, achieving up to a 17-point improvement in regression tasks over state-of-the-art methods such as XGBoost. A central focus is transforming numeric telco data into tokens, enabling meaningful embeddings. Unlike NLP and computer vision, the telco domain lacks pre-trained models, necessitating training from scratch to capture domain-specific patterns. We also detail data preprocessing and sequence conversion techniques tailored for foundation models, as well as the trade-offs of various numeric binning methods (e.g., cut, qcut, Lloyd-Max quantization) that affect data balance and token frequency. Beyond KPI prediction, we demonstrate the foundation model's ability to act as a network optimization simulator, offering significant advantages over manual tuning. Our results show that foundation models excel on large datasets with millions of rows and high categorical complexity, consistently outperforming XGBoost, which remains more effective on simpler datasets. Furthermore, foundation models achieve up to 75-point gains on public tabular datasets, underscoring their versatility for complex, high-dimensional data challenges.
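The binning trade-off the abstract mentions can be sketched with pandas: fixed-width binning (`cut`) and quantile binning (`qcut`) produce very different token-frequency balance on the same numeric column. The column name, bin count, and synthetic data below are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: turning a numeric KPI column into discrete tokens via
# two of the binning methods named in the abstract (cut vs. qcut).
# "rsrp_dbm", the bin count (8), and the synthetic distribution are
# illustrative assumptions, not the paper's actual setup.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rsrp = pd.Series(rng.normal(-95, 10, 1000), name="rsrp_dbm")  # synthetic drive-test KPI

# Fixed-width bins: equal value ranges, so token frequencies follow the
# data distribution and can be heavily unbalanced.
cut_tokens = pd.cut(rsrp, bins=8, labels=[f"CUT_{i}" for i in range(8)])

# Quantile bins: roughly equal token frequencies, at the cost of
# uneven value ranges per token.
qcut_tokens = pd.qcut(rsrp, q=8, labels=[f"QCUT_{i}" for i in range(8)])

# Spread of per-token counts: large for cut, near zero for qcut.
print(cut_tokens.value_counts().std())
print(qcut_tokens.value_counts().std())
```

Quantile binning keeps the token vocabulary balanced (helpful for embedding training), while fixed-width binning preserves interpretable value ranges; Lloyd-Max quantization sits between the two by minimizing within-bin distortion.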
Antonio Martınez Ibarra, Julian James Stephen, et al.
ICDCS 2025
Matthew Arnold, Jeffrey Boston, et al.
MLSys 2020
Genady Ya. Grabarnik, Filippo Poltronieri, et al.
CASCON 2023