IBM at KubeCon + CloudNativeCon NA 2025

About

IBM is proud to sponsor KubeCon + CloudNativeCon North America 2025.

The Cloud Native Computing Foundation’s flagship conference brings together adopters and technologists from leading open source and cloud native communities.

Be a part of the conversation as CNCF Graduated, Incubating, and Sandbox Projects unite for four days of collaboration, learning, and innovation to drive the future of cloud native computing.

Agenda

  • Description:

    Boaz Michaely, Red Hat & Adi Sosnovich, IBM Research

    Kubernetes networking by default is a malicious actor's heaven.

    Why? Because by default, any pod can send and receive traffic to and from any other pod, ignoring namespace and privilege boundaries. External traffic in both directions is allowed as well, as far as Kubernetes is concerned.

    Indeed, best practices rightfully dictate that this default be modified, using "Kubernetes Network Policies".

    Yet most teams find this too difficult to implement. Authoring NetworkPolicy YAML is very challenging. BaselineAdminNetworkPolicy and AdminNetworkPolicy (B/ANP) fill a gap for cluster administrators, but authoring these policies and understanding their impact is a new, additional challenge. Furthermore, policy authors may not know what the application's communication needs are.

    What if there were a way to automatically produce tight network policy rules, in YAML, and see the impact of applied B/ANP network policies?

    Join this session to see the magic yourself, and learn how you can leverage this technology today!
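
    For a concrete sense of the "default deny" baseline that tight, generated policies build on, here is a minimal sketch using the official Kubernetes Python client; the namespace name is hypothetical:

        # Apply a default-deny NetworkPolicy: with an empty pod selector and no
        # ingress/egress rules listed, all traffic to and from pods in the
        # namespace is blocked until explicit allow rules are added.
        from kubernetes import client, config

        config.load_kube_config()

        policy = client.V1NetworkPolicy(
            metadata=client.V1ObjectMeta(name="default-deny-all", namespace="demo"),
            spec=client.V1NetworkPolicySpec(
                pod_selector=client.V1LabelSelector(),  # empty selector = every pod
                policy_types=["Ingress", "Egress"],     # govern both directions
            ),
        )
        client.NetworkingV1Api().create_namespaced_network_policy("demo", policy)

    Generated allow rules would then be layered on top of this baseline, one per observed communication need.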

  • Description:

    Maia Iyer, Alan Cha & Mariusz Sabath, IBM Research; Anjali Telang & Andrew Block, Red Hat

    Agentic platforms are redefining how cloud-native applications interact, but behind every action lies a critical question: who is allowed to do what, and why? Emerging standards such as the Model Context Protocol (MCP) allow AI agents to easily connect with tools, but organizations looking to support agents must maintain security and transparency. They can do so by combining the power of OAuth 2.0 with strongly attested workload identity from SPIFFE.

    In this hands-on workshop, we'll dive into the mechanics of secure workload identity for agents and tools; no prior experience is required. Attendees will work directly with a complete agentic stack, including MCP for agentic tool-calling, and integrate it with cloud-native tools such as SPIRE for workload identity and Keycloak for user management. These existing technologies are key to enabling granular access control and rich audit trails across the full agentic flow. This workshop lays the foundations for building identity-first, zero-trust agentic platforms. A sketch of the token-exchange pattern involved appears below.
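
    The underlying pattern is standard OAuth 2.0 token exchange (RFC 8693): a workload proves who it is with a SPIFFE JWT-SVID and trades it for an access token. A minimal, hedged sketch in Python follows; the Keycloak URL, realm, and client ID are hypothetical, and the SVID is assumed to have already been fetched from the SPIRE Workload API:

        # Exchange a SPIFFE JWT-SVID for an OAuth 2.0 access token at Keycloak
        # using RFC 8693 token exchange (endpoint and client are placeholders).
        import requests

        TOKEN_URL = "https://keycloak.example.com/realms/agents/protocol/openid-connect/token"

        def exchange_svid_for_token(jwt_svid: str) -> str:
            resp = requests.post(
                TOKEN_URL,
                data={
                    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
                    "subject_token": jwt_svid,
                    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
                    "client_id": "mcp-tool-gateway",  # hypothetical client
                },
                timeout=10,
            )
            resp.raise_for_status()
            return resp.json()["access_token"]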

  • Description:

    Moderated by Katie Norton; Alex Zenla, Edera; Jason Hall, Chainguard; Jon Ceanfaglione, IBM

    After a decade of "microservices all the things," the industry is experiencing a fascinating recalibration. Organizations that rushed to decompose monoliths are now grappling with distributed system complexity, operational overhead, and the cognitive load on development teams. This panel explores how modern organizations are making more intentional architectural choices and evolving their approach to software consumption and deployment.

    This panel will cover:

    • From "microservices by default" to "complexity-aware" architectural decisions
    • The hidden costs of distributed systems: network calls, data consistency, observability overhead
    • Why some organizations are consolidating services or building "modular monoliths"
    • The rise of platform engineering as a response to operational complexity
    • Shifting from "move fast and break things" to sustainable velocity
  • Description:

    Blaine Gardner, IBM

    This session introduces the Rook project to attendees of all levels of experience. Rook is an open source cloud-native storage operator for Kubernetes, providing the platform, framework, and support for Ceph to natively integrate with Kubernetes. The session will walk through various scenarios to show how Rook configures Ceph to provide stable block, shared file system, and object storage for your production data. Rook was accepted as a graduated project by the Cloud Native Computing Foundation in October 2020.
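
    As a taste of what Rook-driven configuration looks like, here is a hedged sketch that creates a replicated Ceph block pool through the Kubernetes API. It assumes the Rook-Ceph operator is already running in the rook-ceph namespace; the pool name is hypothetical:

        # Create a CephBlockPool custom resource; the Rook operator reconciles
        # it into a Ceph pool backing block storage for Kubernetes volumes.
        from kubernetes import config, dynamic
        from kubernetes.client import ApiClient

        config.load_kube_config()
        dyn = dynamic.DynamicClient(ApiClient())

        block_pool = {
            "apiVersion": "ceph.rook.io/v1",
            "kind": "CephBlockPool",
            "metadata": {"name": "replicapool", "namespace": "rook-ceph"},
            "spec": {"replicated": {"size": 3}},  # three-way data replication
        }

        api = dyn.resources.get(api_version="ceph.rook.io/v1", kind="CephBlockPool")
        api.create(body=block_pool, namespace="rook-ceph")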

  • Description:

    Ryan Jarvinen, Red Hat & Daniel Oh, IBM 

    Running Java applications in Kubernetes brings a set of performance expectations: fast startup, low memory usage, and efficient container images. This session is a hands-on walkthrough of tools and techniques to help meet those goals. You'll learn how to use Jib to build lean container images, accelerate cold starts with GraalVM native image compilation, and improve runtime responsiveness with Class Data Sharing (CDS) and Coordinated Restore at Checkpoint (CRaC). We'll dive into real-world configuration examples, discuss trade-offs, and demonstrate how to combine these tools to boost performance in Kubernetes-native Java workloads.

  • Description:

    Sunyanan Choochotkaew, IBM Research & John Belamaric, Google

    Divvying up a network card using Kubernetes is really hard to do. If you need to spin up virtual interfaces on top of a NIC, limit their bandwidth, and hand them out to different Pods, you will have a rough time.

    Come find out how the Kubernetes project will make sharing network hardware just as easy as sharing node CPU and memory! And networking is just the initial use case - this functionality can work with any device. Being able to sub-divide devices will really improve utilization of your pricey hardware.

    In this talk, we detail a new way to request resources from attached devices like NICs, GPUs, and DPUs. Building on the recently released Dynamic Resource Allocation (DRA) feature, this work performs on-demand provisioning based on resource requests, allowing a physical device to be independently shared among Pods multiple times. It extends K8s multi-tenancy to the sub-device level. We’ll dive deep and explore real-world use cases, under-the-hood details, and future extensions.
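
    As a rough illustration of the model, here is a hedged sketch of a DRA ResourceClaim created with the Kubernetes Python client's dynamic API. The field names follow the resource.k8s.io v1beta1 API and have shifted across Kubernetes releases, so check your cluster's version; the device class name is hypothetical and would be published by the device's DRA driver:

        # Request a share of a device (here, a virtual NIC) via a ResourceClaim.
        from kubernetes import config, dynamic
        from kubernetes.client import ApiClient

        config.load_kube_config()
        dyn = dynamic.DynamicClient(ApiClient())

        claim = {
            "apiVersion": "resource.k8s.io/v1beta1",
            "kind": "ResourceClaim",
            "metadata": {"name": "vnic-claim", "namespace": "demo"},
            "spec": {
                "devices": {
                    "requests": [
                        # The driver advertises device classes; the scheduler
                        # allocates a matching (possibly sub-divided) device.
                        {"name": "vnic", "deviceClassName": "vnic.example.com"}
                    ]
                }
            },
        }

        api = dyn.resources.get(api_version="resource.k8s.io/v1beta1", kind="ResourceClaim")
        api.create(body=claim, namespace="demo")

    A Pod then references the claim in its spec.resourceClaims list, much as it would request CPU or memory.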

  • Description:

    Daniel Oh & Kevin Dubois, IBM

    This session delves into the critical aspects of developing production-ready Large Language Model (LLM) applications using Java. We'll explore how to leverage Java's strengths to build scalable and efficient LLM systems, addressing key challenges such as performance optimization, resource management, and seamless integration with existing infrastructures.

    Attendees will gain practical knowledge on handling massive datasets, optimizing model inference, and fine-tuning LLMs for optimal performance. We'll discuss strategies for ensuring the reliability and scalability of your LLM deployments, empowering you to create robust and high-performing AI applications. Whether you're a seasoned Java developer or new to the AI domain, this session will provide valuable insights and guidance for your LLM development journey, equipping you with the tools and knowledge to navigate the complexities of building production-grade LLM systems.

  • Description:

    Maroon Ayoub, IBM & Michey Mehta, Red Hat

    Kubernetes excels at stateless service routing - but modern AI workloads are not stateless. Generative workloads demand context-aware routing that maximizes performance while reducing costs.

    This talk explores layered routing strategies for stateful LLM workloads on Kubernetes - from round-robin to full KV-Cache-aware load balancing. We’ll explain when each level applies, and its effects on performance.

    Based on our experience developing llm-d - a framework using the K8s Gateway API Inference Extension, a collaboration between Google, IBM Research, and Red Hat - we’ll cover:

    • Why traditional Kubernetes routing falls short for generative AI
    • Routing patterns for long-context, sessionful traffic
    • Global cache indices and local offloading for smart routing
    • Benchmarks showing latency, cache hit rates, and GPU utilization
    • Practical ways to adopt cache-aware routing without major infra changes

    If you’re scaling multi-turn, agentic, or LLM-powered workloads, this session is for you.
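
    To make the core idea concrete, here is a conceptual sketch (not llm-d's implementation) of the simplest cache-aware pattern: pinning a session to a replica so its KV cache is likely to be reused on follow-up turns. The replica URLs are hypothetical, and a production router would also weight by load and actual cache state:

        # Stable-hash a session ID to a replica so multi-turn requests from the
        # same conversation keep hitting the same warm KV cache.
        import hashlib

        REPLICAS = [
            "http://llm-0.inference.svc:8000",
            "http://llm-1.inference.svc:8000",
            "http://llm-2.inference.svc:8000",
        ]

        def pick_replica(session_id: str) -> str:
            digest = hashlib.sha256(session_id.encode()).digest()
            return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

        print(pick_replica("user-42-conversation-7"))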

  • Description:

    Jing Chen, IBM Research; Junchen Jiang, University of Chicago; Ganesh Kudleppanavar, NVIDIA; Samuel Monson, Red Hat; Jason Kramberger, Google

    As organizations deploy LLMs as distributed stacks in production Kubernetes environments, optimizing inference performance is critical. This collaborative tutorial brings together experts from Google, NVIDIA, Red Hat, IBM, and the University of Chicago (LMCache) to provide practical benchmarking techniques for impactful LLM optimization strategies.

    Using identified use cases as examples, we'll show how to benchmark key optimization strategies: KV Cache offloading, autoscaling, prefix/session-aware routing, KV-Cache-aware routing, and xPyD for prefill/decode disaggregation. Attendees will learn a unified benchmarking approach integrating tools including vLLM, LMBenchmark, GuideLLM, GenAIperf, inference-perf, and fmperf. Through live demonstrations, participants will gain hands-on experience with production-tested methodologies reflecting real-world scenarios, equipping them to implement these approaches for data-driven LLM serving optimizations on Kubernetes.
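
    As a flavor of what these tools measure, here is a minimal hedged sketch of a latency benchmark against an OpenAI-compatible vLLM endpoint. The URL and model name are hypothetical; the dedicated tools above add load shaping, token accounting, and proper percentile reporting:

        # Measure end-to-end completion latency over a handful of requests.
        import time
        import requests

        ENDPOINT = "http://vllm.demo.svc:8000/v1/completions"

        latencies = []
        for _ in range(10):
            start = time.perf_counter()
            resp = requests.post(
                ENDPOINT,
                json={"model": "demo-model", "prompt": "Hello", "max_tokens": 64},
                timeout=60,
            )
            resp.raise_for_status()
            latencies.append(time.perf_counter() - start)

        latencies.sort()
        print(f"p50={latencies[len(latencies) // 2]:.3f}s "
              f"p90={latencies[int(len(latencies) * 0.9)]:.3f}s")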

  • Description:

    Prajakta Kashalkar-Joshi & Socheat Sou, IBM 

    When working on a cloud-native product that is an aggregate of separate products, each with its own release cadence, clear communication is critical to smooth integration. Ideally, each product team wants to know when the other related products have published a new release and whether it is ready to be integrated with their product. Having every team subscribe to every other team's notifications quickly becomes a logistical nightmare, since the number of subscriptions grows quadratically with the number of teams. Using an "integration repository" helps solve both the business and technical needs of product integration and currency. In this talk, learn how the Fusion DevOps team turned the pull request into a mechanism for streamlining team handoffs, notifying the appropriate focals, and clearly defining boundaries of responsibility between teams. A sketch of the pattern appears below.
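
    As one hedged sketch of the mechanism (the repository names and branch convention here are hypothetical, not the Fusion team's actual setup), a component's release pipeline could open a pull request against the shared integration repository via the GitHub REST API:

        # Open an integration PR announcing a freshly published release.
        # Assumes a branch "bump-<component>-<version>" with the updated
        # version pin was already pushed to the integration repository.
        import os
        import requests

        API = "https://api.github.com"
        HEADERS = {
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        }

        def open_integration_pr(component: str, version: str) -> str:
            resp = requests.post(
                f"{API}/repos/example-org/integration-repo/pulls",
                headers=HEADERS,
                json={
                    "title": f"Integrate {component} {version}",
                    "head": f"bump-{component}-{version}",
                    "base": "main",
                    "body": f"{component} {version} is published and ready "
                            f"for integration testing.",
                },
                timeout=10,
            )
            resp.raise_for_status()
            return resp.json()["html_url"]

    The pull request then carries the handoff: review assignment notifies the right focals, and merging records that the release was integrated.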
