Papers

LLM Serving

Distributed Systems
  • SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference, pdf

    Yongchao He, Bohan Zhao, Zheng Cao

Speculative Decoding
  • HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding, pdf

    Siran Liu, Yang Ye, Qianchao Zhu, Zheng Cao, Yongchao He

LLM Training

  • MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training, pdf

    Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu

Systems and Infrastructure

  • A generic service to provide in-network aggregation for key-value streams, ASPLOS’23 (Distinguished Paper Award), pdf

    Yongchao He, Wenfei Wu, Yanfang Le, Ming Liu, ChonLam Lao

Full List