Publications
A collection of my research work.
Legend: # Co-First Author, * Corresponding Author
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
2026 · Qingxiu Liu, Yongchao He*, Hanser Jiang, Zion Wang, Alan Zhao, Patrick P. C. Lee
Preprint
Umap: Revisiting Memory-mapped I/O on Distributed File Systems for Efficient Matrix Access
2026 · Yongchao He, Guangyan Zhang, Zane Cao, Wenfei Wu
OSDI 2026
ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
2026 · Liu Siran, Yongchao He*
Preprint
L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling
2025 · Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He*, Wenfei Wu*
Preprint
A Unified Sparse Attention via Multi-Granularity Compression
2025 · Siran Liu, Zane Cao, Yongchao He*
Preprint
SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
2025 · Bohan Zhao, Zane Cao, Yongchao He
Preprint
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
2025 · Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu
Technical Report
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
2025 · Yongchao He, Bohan Zhao, Zheng Cao
Preprint
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
2025 · Siran Liu, Yang Ye, Qianchao Zhu, Zane Cao, Yongchao He*
Preprint