Publications

A collection of my research work.

Legend: # Co-First Author, * Corresponding Author

FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving

2026

Qingxiu Liu, Yongchao He*, Hanser Jiang, Zion Wang, Alan Zhao, Patrick P. C. Lee

Preprint

Umap: Revisiting Memory-mapped I/O on Distributed File Systems for Efficient Matrix Access

2026

Yongchao He, Guangyan Zhang, Zane Cao, Wenfei Wu

OSDI 2026

ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification

2026

Liu Siran, Yongchao He*

Preprint

L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling

2025

Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He*, Wenfei Wu*

Preprint

A Unified Sparse Attention via Multi-Granularity Compression

2025

Siran Liu, Zane Cao, Yongchao He*

Preprint

SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving

2025

Bohan Zhao, Zane Cao, Yongchao He

Preprint

MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training

2025

Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu

Technical Report

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference

2025

Yongchao He, Bohan Zhao, Zheng Cao

Preprint

HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding

2025

Siran Liu, Yang Ye, Qianchao Zhu, Zane Cao, Yongchao He*

Preprint

RateSheriff: Multipath Flow-aware and Resource Efficient Rate Limiter Placement for Data Center Networks

2023

Songshi Dou, Yongchao He, Sen Liu, Wenfei Wu, Zehua Guo

IWQoS 2023

A Generic Service to Provide In-Network Aggregation for Key-Value Streams

2023

Yongchao He, Wenfei Wu*, Yanfang Le, Ming Liu, ChonLam Lao

ASPLOS 2023

Consistent and Fine-Grained Rule Update with In-Network Control for Distributed Rate Limiting

2022

Yongchao He, Wenfei Wu*

IWQoS 2022

SFP: Service Function Chain Provision on Programmable Switches for Cloud Tenants

2022

Hongyi Huang, Wenfei Wu*, Yongchao He, Zehua Guo

IPDPS 2022

Scalable On-Switch Rate Limiters for the Cloud

2021

Yongchao He, Wenfei Wu*, Xuemin Wen, Haifeng Li, Yongqiang Yang

INFOCOM 2021

NFD: Using Behavior Models to Develop Cross-Platform Network Functions

2021

Hongyi Huang, Wenfei Wu*, Yongchao He, Bangwen Deng, Ying Zhang, Yongqiang Xiong, Guo Chen, Yong Cui, Peng Cheng

INFOCOM 2021

Fully Functional Rate Limiter Design on Programmable Hardware Switches

2019

Yongchao He, Wenfei Wu*

SIGCOMM 2019

SpeedyBox: Low-Latency NFV Service Chains with Cross-NF Runtime Consolidation

2019

Yimin Jiang, Yong Cui, Wenfei Wu, Zhe Xu, Jiahan Gu, K. K. Ramakrishnan, Yongchao He, Xuehai Qian

ICDCS 2019