Skip to content

Defer provider SDK imports and harden startup launchers#347

Open
namabeeru wants to merge 8 commits into
CraftOS-dev:improvement/fix-provider-sdk-importsfrom
namabeeru:codex/fix-provider-sdk-imports
Open

Defer provider SDK imports and harden startup launchers#347
namabeeru wants to merge 8 commits into
CraftOS-dev:improvement/fix-provider-sdk-importsfrom
namabeeru:codex/fix-provider-sdk-imports

Conversation

@namabeeru

@namabeeru namabeeru commented Jun 26, 2026

Copy link
Copy Markdown

Summary

  • Lazy-load OpenAI and Anthropic SDK imports in the model factory so CraftBot can import/start without provider SDKs before a selected provider path needs them.
  • Keep OpenAI-compatible providers on the existing OpenAI SDK implementation instead of adding a second HTTP client path; if that SDK is missing when such a provider is initialized, raise a provider-specific install error.
  • Add a shared runtime preflight (app.runtime_preflight) used by both run.py and direct app.main startup. It checks startup-critical imports in the exact Python runtime that will run the backend and reports wrong-interpreter installs before the traceback path.
  • Make craftbot.py start detect early child-process failure, remove stale PID files, print the relevant log tail instead of reporting false success, and return a nonzero CLI exit code on failed start / restart.
  • Harden launcher edge cases: avoid duplicate backend preflight after run.py already checked the runtime, ignore stale CRAFTBOT IS READY lines from previous logs, and make install flows fail instead of claiming success when the service does not start.
  • Make the macOS desktop shortcut for source installs open the running frontend only when both frontend and backend respond, or start CraftBot with the Python that created the shortcut when the service is not healthy.
  • Add focused regression tests for lazy provider SDK imports, missing-SDK provider errors, runtime dependency detection, provider SDK exclusion from preflight, app.main preflight ordering, service startup failure reporting, source shortcut generation, stale readiness logs, install failure reporting, and CLI failure exit codes.

Root cause

There are four related failure modes:

  1. agent_core.core.models.factory imported openai and anthropic at module load, so backend startup could fail before the selected provider mattered.
  2. CraftBot can be launched with a different Python than the one used during install. On systems with multiple Python installs, such as macOS with Homebrew Python, users can accidentally run CraftBot with an interpreter that did not receive install.py dependencies. The crash then appears deep inside backend imports after frontend startup has already begun.
  3. craftbot.py start wrote the child PID and printed CRAFTBOT STARTED immediately after spawning run.py, even if the child exited right away. That left users with a desktop shortcut to a server that was not actually running. Loop testing also found the failed start command still exited 0; this PR changes failed start / restart CLI calls to exit 1.
  4. In source mode on macOS, CraftBot.command only opened http://localhost:7925; it did not start the service if nothing was listening there, and it could also open a frontend-only/stale service without verifying the backend.

Why this approach

This avoids introducing a parallel OpenAI-compatible HTTP client. The project already declares and uses the OpenAI SDK for DeepSeek/OpenRouter/Grok/MiniMax/Moonshot request paths, so the safer open-source fix is to import it only when needed and report a clear install error if that selected path is unavailable.

The runtime preflight handles the broader Python mismatch directly. It checks packages that are imported during backend startup, but intentionally leaves provider SDKs out so provider-specific SDK failures happen only when the selected provider path actually needs that SDK.

The service-manager changes keep craftbot.py start, craftbot.py install, and the generated desktop shortcut honest: quick startup failures surface in the terminal with the last log lines, stale PID files are removed, failed CLI starts return nonzero, stale readiness markers are ignored, and the macOS source shortcut opens the existing local service only when both frontend and backend respond.

Validation

Focused checks:

  • /usr/local/bin/python3.10 -m pytest tests/test_craftbot_service.py tests/test_model_factory.py tests/test_run_dependency_check.py -q -> 21 passed.
  • /usr/local/bin/python3.10 -m py_compile craftbot.py tests/test_craftbot_service.py agent_core/core/models/factory.py run.py app/main.py app/runtime_preflight.py tests/test_model_factory.py tests/test_run_dependency_check.py
  • git diff --check

Loop checks:

  • Focused pytest loop: 10 iterations, all passed (21 passed each iteration).
  • python3 run.py --cli with Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.
  • python3 -m app.main --cli with Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.
  • python3 craftbot.py start with Homebrew Python 3.14: 10 iterations, all reported CraftBot failed to start, exited nonzero, did not print CRAFTBOT STARTED, and did not leave a stale PID.
  • Current Python runtime preflight: 10 iterations, all reported no missing imports under /usr/local/bin/python3.10.
  • Imported app.main with openai and anthropic deliberately blocked: 10 iterations, all imported successfully.

Manual smoke:

  • /usr/local/bin/python3.10 craftbot.py start --cli started successfully, craftbot.py status reported the PID running, and /usr/local/bin/python3.10 craftbot.py stop stopped it cleanly.

Broader baseline:

  • /usr/local/bin/python3.10 -m pytest tests -q -> 107 passed, 4 unrelated failures in tests/test_trigger_router_and_parking.py; the fake LLM there does not accept the newer prompt_name keyword.
  • /usr/local/bin/python3.10 -m pytest -q still fails during collection in generated/template living-ui test folders:
    • app/data/living_ui_modules/auth/backend/tests/test_auth.py imports models as a top-level module.
    • app/data/living_ui_template/backend/tests conflicts with the root tests.conftest import path.

@namabeeru namabeeru changed the title [codex] Handle missing OpenAI SDK for compatible providers Defer provider SDK imports and detect wrong Python runtime Jun 26, 2026
@namabeeru namabeeru marked this pull request as ready for review June 26, 2026 15:19
@namabeeru namabeeru marked this pull request as draft June 27, 2026 00:10
@namabeeru namabeeru changed the title Defer provider SDK imports and detect wrong Python runtime Defer provider SDK imports and harden startup launchers Jun 27, 2026
@namabeeru namabeeru marked this pull request as ready for review June 27, 2026 00:57
@namabeeru namabeeru marked this pull request as draft June 27, 2026 01:08
@namabeeru namabeeru marked this pull request as ready for review June 27, 2026 01:27
@zfoong zfoong requested a review from korivi-CraftOS June 27, 2026 08:03
@zfoong

zfoong commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

@namabeeru Can you point this PR to the improvement/fix-provider-sdk-imports branch?
@korivi-CraftOS Can you do a clean installation test on Mac?

@namabeeru namabeeru changed the base branch from main to improvement/fix-provider-sdk-imports June 28, 2026 01:30
@namabeeru

Copy link
Copy Markdown
Author

Done, this PR now targets improvement/fix-provider-sdk-imports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants