Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization. Yu Feng, Wei Zhang, et al. 2023. Nature Machine Intelligence.
The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. Yu Feng, Yuhai Tu. 2021. PNAS.
Phases of learning dynamics in artificial neural networks in the absence or presence of mislabeled data. Yu Feng, Yuhai Tu. 2021. Machine Learning: Science and Tech.