SCALING STICK-BREAKING ATTENTION: AN EFFICIENT IMPLEMENTATION AND IN-DEPTH STUDYShawn TanSonglin Yanget al.2025ICLR 2025
Gated Linear Attention Transformers with Hardware-Efficient TrainingSonglin YangBailin Wanget al.2024ICML 2024