Deferred prefill for throughput maximization in LLM inference
- Moonmoon Mohanty
- Gautham Bolar
- et al.
- 2025
- EuroMLSys 2025
I am a Senior Research Scientist at IBM Research - India, where I lead the Sustainable Computing team.
Quantification, assessment, and optimization of carbon emissions in hybrid multicloud environments