Real-time container integrity monitoring for large-scale Kubernetes clusters
Abstract
Container integrity monitoring is a key requirement for regulatory compliance standards such as PCI-DSS, under which any unexpected change, such as a file update or a program execution, must be logged for later audit. System call monitoring provides comprehensive coverage of such change events on containers, but it may suffer from a large number of false alarms unless well-defined allowlist rules are prepared before a container is deployed. Defining such a comprehensive allowlist is not feasible, especially when managing diverse application workloads in a large-scale enterprise cluster. In this paper, we propose a new approach that effectively identifies real anomalies in system call events without relying on any predefined allowlist configuration. Our novel filtering algorithm, based on knowledge acquired autonomously from the Kubernetes cluster control plane, removes 99.999% of the noise and distills only abnormal events in real time. Furthermore, we define concrete criteria for highly scalable container integrity monitoring and verify that our implementation of the proposed filtering method scales well while maintaining its detection capability. Our experiment with real applications on around 3,800 containers demonstrates its effectiveness even on large-scale clusters, and we clarify how the detected events are triggered by user operations.