Publications
A collection of my research work.

L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling
Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He, Wenfei Wu
2025

A Unified Sparse Attention via Multi-Granularity Compression
Siran Liu, Zane Cao, Yongchao He
2025

SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
Bohan Zhao, Zane Cao, Yongchao He
2025

MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
Bohan Zhao, Guang Yang, Shuo Chen, Ruitao Liu, Tingrui Zhang, Yongchao He, Wei Xu
2025

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
Yongchao He, Bohan Zhao, Zheng Cao
2025

HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
Siran Liu, Yang Ye, Qianchao Zhu, Zane Cao, Yongchao He
2025
