Papers
LLM Serving
Distributed Systems
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference, pdf
Yongchao He, Bohan Zhao, Zheng Cao
Speculative Decoding
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding, pdf
Siran Liu, Yang Ye, Qianchao Zhu, Zheng Cao, Yongchao He
LLM Training
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training, pdf
Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu
Systems and Infrastructure
A Generic Service to Provide In-Network Aggregation for Key-Value Streams, ASPLOS’23 (Distinguished Paper Award), pdf
Yongchao He, Wenfei Wu, Yanfang Le, Ming Liu, ChonLam Lao