Workshop paper: Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. EuroSys 2024.
Conference paper: A distributed architecture for fast SGD sequence discriminative training of DNN acoustic models. George Saon. SLT 2014.
Paper: Basis scaling and double pruning for efficient inference in network-based transfer learning. Ken C.L. Wong, Satyananda Kashyap, et al. Pattern Recognition Letters.