Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.
runtime cuda kv-cache gpu-systems llm-inference speculative-decoding model-systems persistent-kernel mega-kernel
Updated Jun 13, 2026 - Python