Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering. Braedy Kuzma, Ivan Korostelev, et al. Software: Practice and Experience, 2023.