Projects

Toolchain Built Around Megatron-LM for Distributed Training

MegatronApp is a toolchain built around the Megatron-LM training framework. It extends the framework with capabilities for practitioners, including performance tuning, slow-node detection, and training-process visualization.

InstantTensor

An Ultra-fast, Distributed Safetensors Loader

InstantTensor is an ultra-fast, distributed Safetensors loader designed to maximize I/O throughput when moving model weights from Safetensors files to GPU memory. It can be installed via pip and is natively supported in vLLM and SGLang.
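For context on what any Safetensors loader has to read, here is a minimal sketch of the file layout using only the Python standard library. It does not use or represent InstantTensor's API. It assumes the published Safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes), and the helper names are hypothetical.

```python
import json
import struct


def write_minimal_safetensors(path, name, values):
    """Write one 1-D float32 tensor in the Safetensors layout (illustrative helper)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {
            "dtype": "F32",
            "shape": [len(values)],
            # Offsets are relative to the start of the data section.
            "data_offsets": [0, len(data)],
        }
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON header
        f.write(data)                                  # raw tensor bytes


def read_safetensors_header(path):
    """Return (header_len, header_dict) without touching the tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header_len, header


def read_tensor_f32(path, name):
    """Read a single float32 tensor by seeking directly to its byte range."""
    header_len, header = read_safetensors_header(path)
    start, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(8 + header_len + start)
        raw = f.read(end - start)
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))
```

Because the header records each tensor's exact byte range, a loader can issue large, independent reads per tensor. That property is what makes parallel or distributed loading of this format practical in general. How InstantTensor itself schedules its I/O is not described here.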