Talk

Taming the Wild West of Research Computing: How Policies Saved Us a Thousand Headaches

Abstract

Administering large Kubernetes clusters can be a daunting task, especially when many of the users are not experts. Resource contention (particularly for GPUs), large volumes of support requests, and the need for administrator intervention even in simple tasks are all extremely common.

This talk describes our experience in dealing with these problems in a research environment and explains how Kyverno, along with Argo CD and Kueue, helped us automate tasks and automatically enforce best practices, improving quality of life for both users and administrators.