Papers

LLM Serving

Distributed Systems

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
Yongchao He, Bohan Zhao, Zheng Cao

Speculative Decoding

HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
Siran Liu, Yang Ye, Qianchao Zhu, Zheng Cao, Yongchao He

LLM Training

MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu

Systems and Infrastructure

A Generic Service to Provide In-Network Aggregation for Key-Value Streams, ASPLOS’23 (Distinguished Paper Award)
Yongchao He, Wenfei Wu, Yanfang Le, Ming Liu, ChonLam Lao