IBM at 2025 Symposium on VLSI Technology and Circuits

About

The VLSI Symposium is a premier international conference on semiconductor technology and circuits, held from June 8 to 12, 2025 in Kyoto, Japan. It brings together technologists and circuit designers at a single venue, offering an opportunity to interact and collaborate on topics spanning process technology to system-on-chip design.

Why attend

We look forward to meeting you at the event and telling you more about our latest work. Our team will be presenting a series of papers in the main conference, along with short course and workshop lectures.

Agenda

  • Abstract: High Performance Computing (HPC) systems and AI accelerators, while differing in computing characteristics and applications, share a common need for higher performance, improved power efficiency, and flexible scalability. Chiplet and heterogeneous integration are gaining attention in both areas as technologies to meet these requirements. The chiplet approach partitions a design into functional blocks manufactured as separate chips, which improves yield and enables modular design. Heterogeneous integration, in turn, combines these separate chips in a single package at high density, enabling flexible functional configurations across heterogeneous devices and contributing to future system scalability. This presentation will begin with that background, then focus on the progress of 3D integration, the core of heterogeneous integration, and review the process, implementation issues, and experimental results of hybrid bonding technology, which enables higher-density, lower-power die-to-die interconnections. (An illustrative bond-pitch calculation appears after the agenda.)

    Speaker: Katsuyuki Sakuma

  • Abstract: To achieve system-level benefits, compute-in-memory (CIM) tiles need to be integrated into heterogeneous architectures alongside general-purpose and application-specific digital compute cores, together with a high-bandwidth, reconfigurable on-chip routing fabric that can deliver the right vectors to the right locations for just-in-time DNN compute. In the first part of my talk, I will review some of IBM’s work in developing weight-stationary analog compute cores, with a focus on the design choices and optimizations for high tile efficiency. I will then provide a brief introduction to heterogeneous architectures for CIM systems, followed by architectural studies of DNNs identifying the auxiliary operations that bottleneck performance. Finally, I will highlight the challenge of achieving true weight-stationarity in large models such as Mixture-of-Experts (MoE) Transformers, and the system-level benefits such an architecture can achieve. (A minimal weight-stationary sketch appears after the agenda.)

    Speaker: Pritish Narayanan

  • Abstract: The data-intensive, highly parallel compute demands of AI models have driven the integration of specialized Neural Processing Units (NPUs) into system-on-chip devices for edge AI applications. Analog In-Memory Computing (AIMC) offers a promising approach by co-locating memory and computation, enabling notable energy-efficiency improvements. This talk will present an embedded NPU architecture for deep learning inference, tailored to the stringent energy, area, and cost constraints of edge AI. The heterogeneous architecture combines digital and analog accelerator nodes to support diverse operation types and precision requirements. AIMC tiles leveraging Phase-Change Memory (PCM) perform energy-efficient matrix-vector multiplications while providing high non-volatile on-chip weight capacity. Complementing this, a digital data path and programmable software cluster provide flexibility and enable end-to-end inference across multiple precision levels. The discussion will also address the challenge of preserving high accuracy in AIMC-based acceleration, focusing on offline training techniques and efficient mapping strategies. (A toy model of AIMC non-idealities appears after the agenda.)

    Speaker: Irem Boybat

  • Abstract: The advent of large language models and generative AI has created enormous demand for hardware accelerators to perform AI training, fine-tuning, and inference. The design of such accelerators depends on holistic optimization of technology, circuits, and systems, but also fundamentally on the models and use cases the hardware needs to serve. Achieving the proper balance of compute versus communication to optimize latency and throughput in AI workloads will require tradeoffs across the hardware/software stack to reconcile the long development cycles needed to build chips and systems with the torrid pace of innovation in AI models and algorithms. This talk will provide an overview of the landscape for AI hardware accelerators and discuss research roadmaps to improve both compute efficiency and communication bandwidth, particularly as generative AI evolves toward agentic AI and smaller, fit-for-purpose models. (A roofline-style sketch appears after the agenda.)

    Speaker: Leland Chang
    Principal Research Staff Member & Sr Manager, AI Hardware Design, IBM Research
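
Illustrative sketches

For the 3D-integration talk, here is a back-of-the-envelope Python sketch of why bond pitch matters: a square pad array at pitch p gives 1/p^2 pads per unit area, so halving the pitch quadruples die-to-die connection density. The pitches, per-pad data rate, and signal-pad fraction below are hypothetical round numbers, not measured results.

    # Bond-pitch scaling for die-to-die interconnect (illustrative numbers only).
    def pads_per_mm2(pitch_um: float) -> float:
        """Areal pad density of a square bond-pad array at the given pitch."""
        return (1000.0 / pitch_um) ** 2

    def bandwidth_gbps_per_mm2(pitch_um: float, gbps_per_pad: float,
                               signal_fraction: float) -> float:
        """Aggregate bandwidth per mm^2 if only a fraction of pads carry signals."""
        return pads_per_mm2(pitch_um) * signal_fraction * gbps_per_pad

    # Hypothetical comparison: microbump pitch vs. hybrid-bond pitches.
    for label, pitch in [("40 um microbump", 40.0),
                         ("10 um hybrid bond", 10.0),
                         ("1 um hybrid bond", 1.0)]:
        bw = bandwidth_gbps_per_mm2(pitch, gbps_per_pad=2.0, signal_fraction=0.5)
        print(f"{label:>18}: {pads_per_mm2(pitch):>12,.0f} pads/mm^2, "
              f"~{bw:,.0f} Gb/s/mm^2")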
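
For the compute-in-memory talk, this is a minimal sketch of the weight-stationary idea under an idealized-tile assumption: weights are programmed into the array once, then many input vectors are streamed through with no weight reloads. The class and array sizes are illustrative, not IBM's design.

    import numpy as np

    class WeightStationaryTile:
        """Idealized CIM tile: weights stay put; activations stream through."""
        def __init__(self, weights: np.ndarray):
            self.g = weights.copy()   # one-time (expensive) weight programming
            self.weight_loads = 1

        def mvm(self, x: np.ndarray) -> np.ndarray:
            # Every matrix-vector multiply reuses the stationary weights.
            return self.g @ x

    rng = np.random.default_rng(0)
    tile = WeightStationaryTile(rng.standard_normal((512, 512)))
    xs = rng.standard_normal((512, 1000))            # 1000 input vectors
    ys = np.stack([tile.mvm(xs[:, i]) for i in range(xs.shape[1])], axis=1)
    print(ys.shape, "weight loads:", tile.weight_loads)   # 1000 MVMs, 1 load

One reason Mixture-of-Experts models strain this scheme is that only a few experts are active per token, yet true weight-stationarity requires every expert's weights to remain resident on-chip.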
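
For the edge-AI NPU talk, this toy model shows why preserving accuracy is a central AIMC concern: an analog matrix-vector multiply is perturbed by weight (conductance) noise and output quantization, which offline training and mapping techniques must absorb. The noise level and ADC resolution below are assumptions for illustration, not PCM measurements.

    import numpy as np

    rng = np.random.default_rng(0)

    def aimc_mvm(w, x, weight_noise_std=0.02, adc_bits=8):
        """Ideal MVM plus toy non-idealities: weight noise and a uniform ADC."""
        w_prog = w + (weight_noise_std * np.abs(w).max()
                      * rng.standard_normal(w.shape))   # programming noise
        y = w_prog @ x
        lo, hi = y.min(), y.max()                       # ADC full-scale range
        levels = 2 ** adc_bits - 1
        return np.round((y - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

    w = rng.standard_normal((256, 256))
    x = rng.standard_normal(256)
    rel_err = np.linalg.norm(aimc_mvm(w, x) - w @ x) / np.linalg.norm(w @ x)
    print(f"relative MVM error from the modeled non-idealities: {rel_err:.2%}")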
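
For the AI accelerator talk, the compute-versus-communication balance can be made concrete with roofline arithmetic. The accelerator figures below are hypothetical round numbers: the ridge point (peak compute divided by memory bandwidth) marks where a workload shifts from memory-bound to compute-bound, and generative-AI decode, at roughly 2 FLOPs per weight byte, sits far below it.

    # Roofline-style balance check (hypothetical accelerator specs).
    peak_tflops = 400.0   # peak compute, TFLOP/s (assumed)
    mem_bw_tbps = 3.0     # memory bandwidth, TB/s (assumed)

    # Arithmetic intensity at which the device stops being memory-bound:
    ridge_flops_per_byte = peak_tflops / mem_bw_tbps
    print(f"ridge point: {ridge_flops_per_byte:.0f} FLOPs/byte")

    # LLM decode is dominated by matrix-vector products: ~2 FLOPs per weight
    # element, i.e. ~2 FLOPs/byte with 8-bit weights.
    decode_intensity = 2.0
    attained = min(peak_tflops, decode_intensity * mem_bw_tbps)
    print(f"attainable during decode: {attained:.0f} TFLOP/s "
          f"({attained / peak_tflops:.1%} of peak)")

At these assumed specs, decode reaches only about 1.5% of peak compute, which is why communication bandwidth matters as much as raw compute efficiency.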
