Poster

Mixture of Shared Experts

Abstract

Memory consumption is a key bottleneck in deploying large Mixture-of-Experts (MoE) transformer models, particularly on edge and resource-constrained devices. While MoE architectures improve compute efficiency through sparse activation of experts, the total parameter count across all experts substantially increases the memory footprint. This work introduces MoSE (Mixture of Shared Experts), an exploratory study on reducing memory usage in MoE models through structured weight sharing among experts. Instead of maintaining fully independent expert parameters, MoSE emulates weight sharing by pairing experts according to similarity metrics and replacing each pair's parameters with their element-wise average, without modifying the underlying framework.
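
Below is a minimal sketch (not the authors' implementation) of the pairing-and-averaging idea described above, assuming each expert is a simple `torch.nn.Linear` layer with identical shapes; the function name `pair_and_average_experts` and the greedy cosine-similarity pairing are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pair_and_average_experts(experts: nn.ModuleList) -> list[tuple[int, int]]:
    """Greedily pair the most similar experts and overwrite each pair's
    parameters with their element-wise average, emulating weight sharing."""
    # Flatten each expert's weight matrix into a vector for comparison.
    flat = torch.stack([e.weight.detach().flatten() for e in experts])
    # Cosine similarity between every pair of experts.
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
    sim.fill_diagonal_(float("-inf"))  # exclude self-pairs

    unpaired = set(range(len(experts)))
    pairs = []
    while len(unpaired) >= 2:
        # Pick the most similar remaining pair.
        idx = list(unpaired)
        sub = sim[idx][:, idx]
        i_loc, j_loc = divmod(int(sub.argmax()), len(idx))
        i, j = idx[i_loc], idx[j_loc]
        # Replace both experts' parameters with their average.
        with torch.no_grad():
            avg_w = (experts[i].weight + experts[j].weight) / 2
            experts[i].weight.copy_(avg_w)
            experts[j].weight.copy_(avg_w)
            if experts[i].bias is not None:
                avg_b = (experts[i].bias + experts[j].bias) / 2
                experts[i].bias.copy_(avg_b)
                experts[j].bias.copy_(avg_b)
        pairs.append((i, j))
        unpaired -= {i, j}
    return pairs


# Usage: eight toy experts of identical shape.
experts = nn.ModuleList([nn.Linear(16, 16) for _ in range(8)])
print(pair_and_average_experts(experts))
```

Because the averaged copies are identical, each pair's parameters could subsequently be deduplicated in storage, which is the memory saving the weight-sharing simulation is meant to expose.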