Workshop paper

SafeCOMM: Investigating Safety Degradation in Fine-Tuned Telecom Large Language Models

Abstract

Fine-tuning large language models (LLMs) on telecom datasets is a common practice for adapting general-purpose models to the telecom domain. However, little attention has been paid to how this process can compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue by fine-tuning LLMs on three representative telecom datasets featured by the GenAINet initiative, and show that safety degradation occurs even with seemingly harmless telecom data. We further extend our analysis to publicly available TeleLLMs that are continually pre-trained on telecom corpora, revealing that their safety alignment is often severely lacking, primarily because safety-focused instruction tuning is omitted. To address these issues, we evaluate three safety realignment defenses (SafeInstruct, SafeLoRA, and SafeMERGE) using established red-teaming benchmarks. The results show that, across all settings, these defenses effectively restore safety without compromising downstream task performance, yielding Safe teleCOMMunication (SafeCOMM) models. Our work serves as both a diagnostic study and a practical guide for safety realignment in telecom-tuned LLMs, underscoring the importance of safety-aware instruction tuning and fine-tuning for real-world deployments of telecom LLMs.