- Ph.D. student at Sun Yat-sen University
- AI Infra, MLSys, Simulators, GPU architecture
- [2026/01/13] [LMSYS blog] EPD Disaggregation: Elastic Encoder Scaling for Vision-Language Models in SGLang
- [2025/06/27] [arXiv] [Code] gLLM has been accepted by SC'25. Congratulations!
- [2025/05/28] [arXiv] [Code] EFIM has been accepted by Euro-Par'25.
- [2025/04/27] [arXiv] [Code] We have released gLLM, an efficient pipeline-parallel inference engine for LLMs.
- [Model] Support Unlimited OCR
- Fix the order of _free_encoder_inputs
- Remove unused EVS functions in qwen3_vl.py
- Support online use_audio_in_video
- [Bugfix] Fix benchmark_moe.py
- [Benchmark] Refactor sample_requests in benchmark_throughput
- [Bugfix] fix automatic prefix args and add log info
- [Minor Fix] Fix comments in benchmark_serving
- [Minor Fix] Remove unused code in benchmark_prefix_caching.py





