Backend resilience: LLM retry/backoff, OpenAI-compatible embeddings, connection guards #50

Open
shivswami wants to merge 1 commit into nikmcfly:main from shivswami:pr1-backend-resilience


What

Improves backend robustness for local deployments (Ollama / LMStudio / Neo4j in Docker) where services are flaky or use non-Ollama providers.

Why

  • LLM rate-limits / timeouts mid-extraction aborted the whole build with no retry.
  • Only Ollama's embedding format was supported; OpenAI-compatible local servers (e.g. LMStudio) couldn't be used.
  • A failed batch embedding was logged as a warning and silently produced empty vectors, so vector search quietly stopped working.
  • A build that extracted 0 entities (e.g. the LLM returned no JSON) still reported "completed".

Changes

  • llm_client.py: retry transient failures (429 / 5xx / timeout / connection) with exponential backoff (2/4/8…s, capped at 60s), honoring Retry-After; non-retryable errors fail fast. Configurable via LLM_MAX_RETRIES.
  • embedding_service.py: support OpenAI-compatible /v1/embeddings endpoints alongside Ollama; format auto-detected from EMBEDDING_BASE_URL.
  • neo4j_storage.py: health_check() + pre-flight _verify_connection() before create_graph / add_text; batch-embedding failures are now logged as errors (empty vectors silently break vector search).
  • graph.py: a build that extracts 0 entities now fails the task explicitly with an actionable message instead of showing "completed".
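
For reviewers, the retry policy described for llm_client.py can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: `TransientError`, `is_retryable`, `backoff_delay`, and `call_with_retry` are hypothetical names, Retry-After is assumed to arrive as a number of seconds, and the choice to take the longer of the backoff and the server hint (still capped at 60s) is a guess at what "honoring" means here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error carrying an HTTP status and optional Retry-After seconds."""

    def __init__(self, status: Optional[int] = None, retry_after: Optional[float] = None):
        super().__init__(f"request failed (status={status})")
        self.status = status            # None stands in for a timeout / connection error
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def is_retryable(status: Optional[int]) -> bool:
    # Retry on 429, any 5xx, or when there is no HTTP status at all.
    return status is None or status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[float] = None, cap: float = 60.0) -> float:
    # attempt is 1-based: 2s, 4s, 8s, ... capped at `cap`.
    delay = min(2.0 ** attempt, cap)
    if retry_after is not None:
        # Assumption: wait at least as long as the server asked, but never beyond the cap.
        delay = min(max(delay, retry_after), cap)
    return delay


def call_with_retry(fn: Callable[[], T], max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError as exc:
            # Non-retryable errors, or an exhausted retry budget, fail fast.
            if not is_retryable(exc.status) or attempt >= max_retries:
                raise
            attempt += 1
            sleep(backoff_delay(attempt, exc.retry_after))
```

Passing `sleep` as a parameter is only there so the loop can be exercised in a test without real waiting; `max_retries` plays the role that LLM_MAX_RETRIES plays in the PR.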

Notes / limitations

  • Embedding format detection is heuristic (port :11434 → Ollama; /v1 in path → OpenAI). A non-default Ollama port behind a /v1 proxy could misdetect.
  • _verify_connection adds one RETURN 1 round-trip before each add_text.
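
To make the detection caveat concrete, here is a minimal sketch of the heuristic as described above. `detect_embedding_format` is a hypothetical name, and both the precedence (port check before path check) and the fallback to Ollama when neither rule matches are assumptions rather than confirmed behavior of embedding_service.py. It also assumes EMBEDDING_BASE_URL includes a scheme such as `http://`.

```python
from urllib.parse import urlparse


def detect_embedding_format(base_url: str) -> str:
    """Guess the embedding API format from the base URL alone."""
    parsed = urlparse(base_url)
    if parsed.port == 11434:
        # Default Ollama port.
        return "ollama"
    if "/v1" in parsed.path:
        # OpenAI-compatible servers conventionally serve under /v1.
        return "openai"
    # Assumption: fall back to Ollama when nothing matches.
    return "ollama"


# Default Ollama and an OpenAI-compatible local server are told apart:
print(detect_embedding_format("http://localhost:11434"))
print(detect_embedding_format("http://localhost:1234/v1"))
# The misdetection case: Ollama on a non-default port behind a /v1 proxy
# is classified as OpenAI-compatible.
print(detect_embedding_format("http://proxy.local:8080/v1"))
```

An explicit override setting would remove the ambiguity, at the cost of one more configuration knob.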

Tested

  • Module imports verified (backend/venv).
  • [Runtime behavior against Ollama / LMStudio / Neo4j confirmed locally; happy to add repro steps.]

…guards

- llm_client: retry transient failures (429/5xx/timeout/connection) with
  exponential backoff honoring Retry-After; non-retryable errors fail fast
- embedding_service: support OpenAI-compatible /v1/embeddings servers
  (e.g. LMStudio) alongside Ollama, auto-detected from EMBEDDING_BASE_URL
- neo4j_storage: verify Neo4j reachable before create_graph/add_text;
  log batch-embedding failures as errors (empty vectors silently break
  vector search)
- graph build: fail the task explicitly when NER extracts 0 entities
  instead of reporting a misleading "completed" status
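
The zero-entity guard from the last item could look something like the sketch below. The task model in graph.py is not shown in this description, so `BuildResult` and `finish_build` are hypothetical stand-ins, and the message text is only an example of what "actionable" might mean.

```python
from dataclasses import dataclass


@dataclass
class BuildResult:
    status: str   # "completed" or "failed"
    message: str


def finish_build(entity_count: int, relation_count: int = 0) -> BuildResult:
    # A build with no entities is treated as a failure, not a success.
    if entity_count == 0:
        return BuildResult(
            status="failed",
            message=(
                "No entities were extracted. Check that the LLM is reachable "
                "and that it returns valid JSON."
            ),
        )
    return BuildResult(
        status="completed",
        message=f"Extracted {entity_count} entities and {relation_count} relations.",
    )
```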