Defer provider SDK imports and harden startup launchers#347
Open
namabeeru wants to merge 8 commits into
Open
Conversation
Collaborator
|
@namabeeru Can you point this PR to the improvement/fix-provider-sdk-imports branch? |
Author
|
Done, this PR now targets improvement/fix-provider-sdk-imports. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
app.runtime_preflight) used by bothrun.pyand directapp.mainstartup. It checks startup-critical imports in the exact Python runtime that will run the backend and reports wrong-interpreter installs before the traceback path.craftbot.py startdetect early child-process failure, remove stale PID files, print the relevant log tail instead of reporting false success, and return a nonzero CLI exit code on failedstart/restart.run.pyalready checked the runtime, ignore staleCRAFTBOT IS READYlines from previous logs, and make install flows fail instead of claiming success when the service does not start.app.mainpreflight ordering, service startup failure reporting, source shortcut generation, stale readiness logs, install failure reporting, and CLI failure exit codes.Root cause
There are four related failure modes:
agent_core.core.models.factoryimportedopenaiandanthropicat module load, so backend startup could fail before the selected provider mattered.install.pydependencies. The crash then appears deep inside backend imports after frontend startup has already begun.craftbot.py startwrote the child PID and printedCRAFTBOT STARTEDimmediately after spawningrun.py, even if the child exited right away. That left users with a desktop shortcut to a server that was not actually running. Loop testing also found the failedstartcommand still exited0; this PR changes failedstart/restartCLI calls to exit1.CraftBot.commandonly openedhttp://localhost:7925; it did not start the service if nothing was listening there, and it could also open a frontend-only/stale service without verifying the backend.Why this approach
This avoids introducing a parallel OpenAI-compatible HTTP client. The project already declares and uses the OpenAI SDK for DeepSeek/OpenRouter/Grok/MiniMax/Moonshot request paths, so the safer open-source fix is to import it only when needed and report a clear install error if that selected path is unavailable.
The runtime preflight handles the broader Python mismatch directly. It checks packages that are imported during backend startup, but intentionally leaves provider SDKs out so provider-specific SDK failures happen only when the selected provider path actually needs that SDK.
The service-manager changes keep
craftbot.py start,craftbot.py install, and the generated desktop shortcut honest: quick startup failures surface in the terminal with the last log lines, stale PID files are removed, failed CLI starts return nonzero, stale readiness markers are ignored, and the macOS source shortcut opens the existing local service only when both frontend and backend respond.Validation
Focused checks:
/usr/local/bin/python3.10 -m pytest tests/test_craftbot_service.py tests/test_model_factory.py tests/test_run_dependency_check.py -q-> 21 passed./usr/local/bin/python3.10 -m py_compile craftbot.py tests/test_craftbot_service.py agent_core/core/models/factory.py run.py app/main.py app/runtime_preflight.py tests/test_model_factory.py tests/test_run_dependency_check.pygit diff --checkLoop checks:
21 passedeach iteration).python3 run.py --cliwith Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.python3 -m app.main --cliwith Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.python3 craftbot.py startwith Homebrew Python 3.14: 10 iterations, all reportedCraftBot failed to start, exited nonzero, did not printCRAFTBOT STARTED, and did not leave a stale PID./usr/local/bin/python3.10.app.mainwithopenaiandanthropicdeliberately blocked: 10 iterations, all imported successfully.Manual smoke:
/usr/local/bin/python3.10 craftbot.py start --clistarted successfully,craftbot.py statusreported the PID running, and/usr/local/bin/python3.10 craftbot.py stopstopped it cleanly.Broader baseline:
/usr/local/bin/python3.10 -m pytest tests -q-> 107 passed, 4 unrelated failures intests/test_trigger_router_and_parking.py; the fake LLM there does not accept the newerprompt_namekeyword./usr/local/bin/python3.10 -m pytest -qstill fails during collection in generated/template living-ui test folders:app/data/living_ui_modules/auth/backend/tests/test_auth.pyimportsmodelsas a top-level module.app/data/living_ui_template/backend/testsconflicts with the roottests.conftestimport path.