sn74
Here are 5 public repositories matching this topic...
Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation
Updated Jun 22, 2026 - Python
Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090
Updated Jun 22, 2026 - C++
NCU-driven autonomous kernel optimization agent: profile → identify bottleneck → propose variant → compile → benchmark
Updated Jun 22, 2026 - Python
Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch
Updated Jun 22, 2026 - C++