A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts. Mohammed Nowaz Rabbani Chowdhury, Meng Wang, et al. ICML 2024.
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding. Hongkang Li, Meng Wang, et al. ICML 2024.
Asymmetry in Low-Rank Adapters of Foundation Models. Jiacheng Zhu, Kristjan Greenewald, et al. ICML 2024.
Humans Linguistically Align to their Conversational Partners, and Language Models Should Too. Rachel Ostrand, Sara Berger. ICML 2024.
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs. Swanand Ravindra Kadhe, Farhan Ahmed, et al. ICML 2024.
A Multi-View Mixture-of-Experts based on Language and Graphs for Molecular Properties Prediction. Victor Shirasuna, Eduardo Almeida Soares, et al. ICML 2024.