Shuang Chen, Herbert Freeman
International Journal of Pattern Recognition and Artificial Intelligence
Large Language Models have transformed cloud computing, but their deployment poses a challenging trilemma among operational cost, energy consumption, and performance requirements. This keynote presents a novel open architecture that harmonizes multiple efficiency techniques to address these competing concerns. We examine critical optimization strategies, including quantization, batching, KV-caching, auto-scaling, model parallelism, and specialized hardware accelerators, analyzing their individual strengths and their compounding benefits when integrated into a cohesive system.
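To make one of the named techniques concrete, the toy sketch below illustrates the idea behind KV-caching in autoregressive decoding: the key/value projections of already-generated tokens are stored and reused, so each new token only pays for its own projection plus one attention pass over the cache. This is an illustrative assumption-based example, not the architecture described in the keynote; all class and variable names are hypothetical.

```python
# Minimal sketch of a single-head KV cache (illustrative only, not the
# keynote's system). Caching past K/V avoids recomputing attention inputs
# for every previously generated token at each decoding step.
import numpy as np

class KVCache:
    def __init__(self, d_model: int):
        self.keys = np.empty((0, d_model))    # cached keys for past tokens
        self.values = np.empty((0, d_model))  # cached values for past tokens

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Store only the newest token's key/value projections.
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Attend the new query over all cached positions.
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

# Usage: one append per decoding step; earlier K/V are reused, not recomputed.
d = 8
cache = KVCache(d)
rng = np.random.default_rng(0)
for step in range(4):
    k, v, q = rng.normal(size=(3, d))  # stand-ins for learned projections
    cache.append(k, v)
    out = cache.attend(q)              # context vector for the new token
    print(step, out.shape)
```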