Publications
A collection of my research work.
Legend: # Co-First Author, * Corresponding Author
SiDP: Memory-Efficient Data Parallelism for Offline LLM Inference
2026 Alan Zhao, Yongchao He
A Unified Sparse Attention via Multi-Granularity Compression
2026 Siran Liu, Zane Cao, Yongchao He*
ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
2026 Liu Siran, Yongchao He*
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
2026 Siran Liu, Yang Ye, Qianchao Zhu, Zane Cao, Yongchao He*
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
2026 Qingxiu Liu, Yongchao He*, Hanser Jiang, Zion Wang, Alan Zhao, Patrick P. C. Lee
Umap: Revisiting Memory-mapped I/O on Distributed File Systems for Efficient Matrix Access
2026 Yongchao He, Guangyan Zhang, Zane Cao, Wenfei Wu
L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling
2025 Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He*, Wenfei Wu*
SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
2025 Bohan Zhao, Zane Cao, Yongchao He
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
2025 Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
2025 Yongchao He, Bohan Zhao, Zheng Cao
RateSheriff: Multipath Flow-aware and Resource Efficient Rate Limiter Placement for Data Center Networks
2023 Songshi Dou, Yongchao He, Sen Liu, Wenfei Wu, Zehua Guo
A Generic Service to Provide In-Network Aggregation for Key-Value Streams
2023 Yongchao He, Wenfei Wu*, Yanfang Le, Ming Liu, ChonLam Lao
Consistent and Fine-Grained Rule Update with In-Network Control for Distributed Rate Limiting
2022 Yongchao He, Wenfei Wu*
SFP: Service Function Chain Provision on Programmable Switches for Cloud Tenants
2022 Hongyi Huang, Wenfei Wu*, Yongchao He, Zehua Guo
Scalable On-Switch Rate Limiters for the Cloud
2021 Yongchao He, Wenfei Wu*, Xuemin Wen, Haifeng Li, Yongqiang Yang
NFD: Using Behavior Models to Develop Cross-Platform Network Functions
2021 Hongyi Huang, Wenfei Wu*, Yongchao He, Bangwen Deng, Ying Zhang, Yongqiang Xiong, Guo Chen, Yong Cui, Peng Cheng
Fully Functional Rate Limiter Design on Programmable Hardware Switches
2019 Yongchao He, Wenfei Wu*
SpeedyBox: Low-Latency NFV Service Chains with Cross-NF Runtime Consolidation
2019 Yimin Jiang, Yong Cui, Wenfei Wu, Zhe Xu, Jiahan Gu, K. K. Ramakrishnan, Yongchao He, Xuehai Qian