Asymmetry in Low-Rank Adapters of Foundation Models

Jiacheng Zhu; Kristjan Greenewald; Kimia Nadjahi; Haitz Saez De Ocariz Borde; Rickard Gabrielsson; Leshem Choshen; Marzyeh Ghassemi; Mikhail Yurochkin; Justin Solomon

ICLR 2024

Workshop paper

07 May 2024

Abstract

Parameter-efficient fine-tuning adapts large pre-trained foundation models by updating only a subset of their parameters; within this class, Low-Rank Adaptation (LoRA) is particularly effective. Motivated by an investigation of the different roles the LoRA matrices play during fine-tuning, this paper characterizes and leverages an unexpected asymmetry in the importance of the low-rank adapter matrices. Specifically, when the parameter matrices of a neural network are updated by adding a product $BA$, we observe that $A$ and $B$ have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random, untrained $A$ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization error of low-rank adapters, showing that the parameter savings from training only $B$ improve the bound. We support our conclusions with experiments on RoBERTa, BART, LLaMA-2, and ViT.
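The division of labor described above can be illustrated with a minimal sketch of a single adapted linear layer, $y = Wx + BAx$, in which the pre-trained $W$ and a random $A$ stay frozen and only $B$ is trained. This is not the authors' implementation. The names `LoRALinear`, `sgd_step_B`, and `trainable_params` are hypothetical. The Gaussian initialization of $A$ with standard deviation $1/\sqrt{r}$ is an assumption, since the abstract does not specify one. The zero initialization of $B$ follows the common LoRA convention. Plain Python lists are used so the example does not depend on any deep-learning library.

```python
import random


def matvec(M, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W x + B (A x).

    W (d_out x d_in) is the frozen pre-trained weight, A (r x d_in) is a
    random matrix that is never updated, and B (d_out x r) is the only
    trainable part of the adapter.
    """

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W
        # Assumed initialization: i.i.d. Gaussian entries with std 1/sqrt(r).
        std = 1.0 / rank ** 0.5
        self.A = [[rng.gauss(0.0, std) for _ in range(d_in)] for _ in range(rank)]
        # Zero-initialized B makes the adapter a no-op before training.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """Return the layer output and the r-dimensional features A x."""
        z = matvec(self.A, x)       # A extracts features from the input
        base = matvec(self.W, x)    # frozen pre-trained path
        delta = matvec(self.B, z)   # B maps the features to an output update
        return [b + d for b, d in zip(base, delta)], z

    def sgd_step_B(self, x, target, lr):
        """One SGD step on 0.5 * ||y - target||^2 that updates B only.

        Returns the loss measured before the update.
        """
        y, z = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # dLoss/dB[i][k] = err[i] * z[k]; W and A receive no update.
        for i, e in enumerate(err):
            for k, zk in enumerate(z):
                self.B[i][k] -= lr * e * zk
        return 0.5 * sum(e * e for e in err)


def trainable_params(d_out, d_in, rank, train_A=True):
    """Trained adapter parameters: r*(d_in + d_out) for A and B, d_out*r for B only."""
    return rank * (d_out + d_in) if train_A else rank * d_out


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    layer = LoRALinear(W, rank=2, seed=0)
    x, target = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
    first = layer.sgd_step_B(x, target, lr=0.002)
    last = first
    for _ in range(300):
        last = layer.sgd_step_B(x, target, lr=0.002)
    print(f"loss before: {first:.4f}, after: {last:.4f}")
    print(trainable_params(768, 768, 8), trainable_params(768, 768, 8, train_A=False))
```

Because $A$ is fixed, only $B$ receives gradient updates. An adapter on a $d_{out} \times d_{in}$ weight therefore trains $d_{out} r$ parameters instead of $r(d_{in} + d_{out})$. For a square $768 \times 768$ weight with $r = 8$, that is 6,144 versus 12,288, which is the kind of parameter saving the generalization bound above refers to.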