InquiringMinds-AI

Popular repositories

llama.cpp (Public)

Forked from ggml-org/llama.cpp

LLM inference in C/C++

C++ 2
llama-droid (Public)

Run LLMs locally on Android with GPU acceleration. No root required.

Shell 1
claude-recall (Public)

Privacy-first persistent memory for Claude Code. Local-only, human-readable.

Shell
LiteRT-LM (Public)

Forked from google-ai-edge/LiteRT-LM

C++
dgx-spark-moe-triton-configs (Public)

Triton fused-MoE kernel configs tuned on NVIDIA DGX Spark (GB10), with reproducible Docker setup and sglang patches. Drop-in replacements for sglang/Triton SM12.1 deployments.

Python
longcat-next-multimodal (Public)

Meituan LongCat-Next (75B-A3B any-to-any multimodal) served on a single NVIDIA DGX Spark (GB10): every modality — text, image/audio/video understanding, image + voice-clone generation, tool calling…

Python