A LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.
-
Updated
Jun 30, 2025 - Python
A LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
Unified AI Gateway for 30+ LLMs (OpenAI, Anthropic, Bedrock, Azure etc) with Caching, Guardrails, A/B test & cost controls. Go-native Fastest & Scalable AI Gateway LiteLLM & Kong AI Gateway alternative.
mimir is a drop-in proxy that caches LLM API responses using semantic similarity, reducing costs and latency for repeated or similar queries.
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
Reliable and Efficient Semantic Prompt Caching with vCache
RAMen is a fast in-memory data store like Redis, but built for AI: drop-in Redis protocol, native vector search, semantic caching, and a built-in MCP server for agents. Single Go binary, BSD-3.
Redis integration for Google Agent Development Kit (ADK) - Memory, Sessions, Search Tools, MCP
Unified multi-layer caching library for AI/agent pipelines — LangChain, LangGraph, AutoGen, CrewAI, Agno, A2A
Redis Vector Library (RedisVL) -- the AI-native Java client for Redis.
Enterprise AI traffic gateway — unified compliance, routing across 20+ LLM providers, semantic cache, quotas, and audit. SDK / network / OS-layer intercept.
This is a RAG based chatbot in which semantic cache and guardrails have been incorporated.
This repository contains sample code demonstrating how to implement a verified semantic cache using Amazon Bedrock Knowledge Bases to prevent hallucinations in Large Language Model (LLM) responses while improving latency and reducing costs.
ToolOps is a framework-agnostic middleware SDK that treats every tool call as a first-class operation. By wrapping your tools in a single decorator, you instantly upgrade them with industrial-grade caching, resilience, and observability.
High-performance LLM query cache with semantic search. Reduce API costs 80% and latency from 8.5s to 1ms using Redis + Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
RouterArena #1 among known public baselines: 96.77% accuracy, $0.0768/1K, 1.0000 robustness. OpenAI-compatible LLM router across 47+ providers.
Local-first semantic cache for AI agents. A small C daemon + CLI that remembers what your agent learned across sessions. Plugs into Claude Code, Codex, Gemini CLI, and Claude Desktop / ChatGPT via MCP. No LLM calls, no SaaS, no API key.
AI real-estate automation platform: Telegram bot, RAG, apartment search, CRM workflows, voice agent, Langfuse observability, and Dockerized AI runtime.
OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.
Add a description, image, and links to the semantic-cache topic page so that developers can more easily learn about it.
To associate your repository with the semantic-cache topic, visit your repo's landing page and select "manage topics."