Backend / AI Infrastructure Engineer
-
MLIS — Built a distributed AI platform with control/data plane separation, admission control, workload isolation, and resource-aware scheduling for production ML workloads.
-
AttnForge — Built and benchmarked custom CUDA attention kernels for LLM prefill and decode, integrated them into a lightweight inference runtime, and established a validated baseline against PyTorch SDPA on NVIDIA H100 at context lengths from 512 to 4096.
-
applied-ai-lab — Organized a hands-on applied AI seminar repository covering four topic areas: prompting, tools, agent workflows, and RAG patterns.
-
Ecommerce_Recommender_System — Built an e-commerce recommender system on large-scale user behavior data, combining feature engineering with multiple ranking/modeling approaches, including LR, GBDT, and wide/deep methods.

