About
Welcome to Yongchao He's homepage!
I’m a researcher and engineer working on LLM systems and AI infrastructure. I build systems that keep large models alive and sometimes make them faster, from distributed training to LLM inference, along with all the little optimizations that help big things scale.
My work sits at the intersection of distributed systems, runtime design, and large-scale LLM training and inference, with a focus on system-level abstractions, performance trade-offs, and system–model co-design.
I earned my Ph.D. from the Institute for Interdisciplinary Information Sciences (IIIS, 交叉信息研究院) at Tsinghua University, where I was advised by Wei Xu (徐葳) and Wenfei Wu (吴文斐).
I now lead a research and development group focused on AI infra, exploring how to make large-scale AI systems faster, cheaper, and a little less painful.
Occasional cat wrangler, full-time system tinkerer. Just another small node in the AI era.
Selected Publications
L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling
Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He, Wenfei Wu
A Unified Sparse Attention via Multi-Granularity Compression
Siran Liu, Zane Cao, Yongchao He
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
Yongchao He, Bohan Zhao, Zheng Cao
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
Siran Liu, Yang Ye, Qianchao Zhu, Zane Cao, Yongchao He
A Generic Service to Provide In-Network Aggregation for Key-Value Streams
Yongchao He, Wenfei Wu, Yanfang Le, Ming Liu, ChonLam Lao
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
News
I founded and lead the AI Infrastructure R&D group at UbiQuant. ⚙️
Selected as the first member of UbiQuant’s CTO List, a high-potential technical leadership program. 🏆
I started my position at UbiQuant under the Wutong Program (梧桐计划), a premier technical talent program. 🚀
I earned my Ph.D. in Computer Science 🎓
Our paper titled 'A Generic Service to Provide In-Network Aggregation for Key-Value Streams' received the Distinguished Paper Award at ASPLOS 2023. 🏆