A reusable devcontainer + agentic workflow you drop into your own project. The
devcontainer image is published to GHCR and booted by a single self-contained
docker-compose.yml — no clone, no local build, no init.sh. Claude drives the
feature lifecycle from requirements to implementation; all shell execution is isolated
in a Docker sandbox, and GitHub issues hold the durable state.
Drop .devcontainer/docker-compose.yml into your
project (it points at the published image), set your GitHub + Claude credentials, then:
docker compose -f .devcontainer/docker-compose.yml up -d # standalone — no .env, no init.sh
# …drive the workflow with Claude:
# /grill-me-with-docs → /to-prd → /to-issues → /afk | /hitl …
docker compose -f .devcontainer/docker-compose.yml down

Every variable in the compose file carries a fallback (${PWD}, ${HOME}/.ssh, …), so a
bare docker compose up works with no generated .env. Grab a ready-made setup from
examples/, or see Outer orchestrator image
to pin a release instead of :latest.
Developing agentic-dev itself (the dogfood) uses
./up.sh . instead; it adds per-project container naming via init.sh. See Local development & dogfood.
You do not need VS Code — the workflow is headless (see Do I need VS Code?).
Don't want to build the images locally? Pull the prebuilt ones from GHCR instead — see Inner sandbox images and the ready-made setups in examples/.
host (you + Claude Code) Docker
┌──────────────────────────────┐ ┌──────────────────────┐
│ /grill /to-prd /to-issues │ │ service: devcontainer│
│ /afk /hitl ───────────────┐│ │ /workspaces/<project>│
│ /exec ││ docker │ (your project bound) │
│ mcp__docker__run_command ─┼┼─────────► vscode user │
│ (service=devcontainer) ││ compose │ docker CLI (DooD) │
└──────────────────────────────┘ exec └──────────────────────┘
│ │
└── GitHub issues = durable state ────┘
- Claude runs on the host and shells into the container with /exec → mcp__docker__run_command(service="devcontainer"), a thin docker compose exec wrapper. Nothing AI runs inside the container.
- GitHub issues are the durable state. The container is disposable; progress lives in issues, not in local files.
- The image bakes the workflow in. The published devcontainer installs the slash commands → ~/.claude/commands and engineering disciplines → ~/.claude/skills (ADR-0017). Today Claude still drives from the host (above), which needs the docker MCP wiring (see Sandbox wiring); baking the workflow in is groundwork toward driving it entirely from the in-container Claude.
The up.sh / down.sh / code.sh helpers and init.sh are for developing
agentic-dev itself and the per-project-subfolder model — they're not needed to use
the published image, where the standalone compose file is the whole story.
up.sh adds one thing the bare docker compose up doesn't: per-project container
naming (DEVCONTAINER_NAME) derived by init.sh, so several projects under this repo
can run side by side without colliding.
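As a rough illustration of the naming step, the sketch below derives a container name from a project folder. It is an assumption about the general shape of the derivation, not a copy of init.sh: the function name derive_container_name and the sanitising rules (lowercase, unsupported characters replaced with "-") are hypothetical.

```shell
# Hypothetical sketch: derive a per-project container name from a folder path.
# The real init.sh may apply different rules; this only illustrates the idea.
derive_container_name() {
  # Take the last path component, lowercase it, and replace every character
  # outside [a-z0-9_.-] with "-".
  basename "${1:-$PWD}" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9_.-]/-/g'
}

# Print the line a generated .env could carry for the current folder.
printf 'DEVCONTAINER_NAME=%s\n' "$(derive_container_name "$PWD")"
```

Because the name depends only on the folder, two projects in differently named folders end up with different container names, which is what lets them run side by side.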
| Command | What it does |
|---|---|
| ./up.sh <folder> | Spin up <folder>'s sandbox (the folder must hold its own .devcontainer/), bound at /workspaces/<folder> in a per-project container. Re-runs init.sh automatically; rebuilds the container (image layers stay cached). ./up.sh . brings up this repo. |
| ./down.sh <folder> | Tear down the sandbox for <folder>. |
| ./down.sh | Tear down all sandboxes started from this repo (safe: scoped by label, never touches other projects). |
| ./code.sh <folder> | Optional. Open VS Code attached to the running sandbox, at /workspaces/<folder>. |
| docker exec -it $(docker ps -q --filter label=devcontainer.local_folder=$PWD/<folder>) bash | Optional. Drop into a shell in the running sandbox without VS Code. |
up.sh is a thin wrapper; the raw equivalent is:
devcontainer up \
--workspace-folder "$(pwd)" \
--remove-existing-container

No --config is needed: the project root holds its own .devcontainer/, so the devcontainer CLI (and VS Code Reopen in Container) auto-discovers it. The initializeCommand runs the in-project .devcontainer/init.sh, which derives a per-project container name (DEVCONTAINER_NAME, e.g. agentic-dev) and writes .devcontainer/.env. This replaces the previous shared-.devcontainer/AGENTIC_DC_INIT model (see ADR-0012).
Do I need VS Code?

No. The agentic workflow is headless: Claude reaches the container from the host via the docker MCP sandbox (/exec), so the loop runs with nothing but the container up. There are two ways to be inside the container, both optional and only for when you want to look or edit interactively:
| You want to… | Do this |
|---|---|
| Run the agentic workflow | Just boot the sandbox (docker compose … up -d, or ./up.sh <folder> for the dogfood) — no editor needed |
| Edit/inspect in VS Code | ./code.sh <folder> (attaches at the right workspace folder), or in VS Code: Dev Containers: Attach to Running Container → pick the container → File ▸ Open Folder ▸ /workspaces/<folder> |
| Just a shell | docker exec -it <container> bash (see table above) |
Attaching VS Code to an already-running container is fine and does not disturb the workflow — it's the same container, just with an editor pointed at it.
End to end, from a clean checkout to merged work:
- One-time setup
  - Add mcp__docker__run_command to the allow list in .claude/settings.local.json (see Permissions).
  - Make sure the docker MCP server targets your project's compose project (see Sandbox wiring).
- Boot the sandbox in your project: docker compose -f .devcontainer/docker-compose.yml up -d. (Developing this repo? Use ./up.sh <folder>; see Local development & dogfood.)
- /grill-me-with-docs: Claude interviews you and reads any docs you point at, producing docs/grill-output.md.
- /to-prd: turns the interview into a structured docs/prd.md.
- /to-issues: breaks the PRD into vertical slices as GitHub issues. Unblocked issues get the ready-for-agent label.
- Implement (pick one):
  - /afk: autonomous. Spawns one sub-agent per ready-for-agent issue, implements with /tdd, commits, closes, and re-labels newly-unblocked dependents. No interruptions.
  - /hitl: same, but pauses for your approval between issues.
- Review & merge the resulting commits/PRs as usual.
- Tear down when finished: docker compose -f .devcontainer/docker-compose.yml down (or ./down.sh <folder>).
Each implementation sub-agent is constrained to: the issue body verbatim, /exec for all shell (never host Bash), scope limited to that issue's files, no pushing to main, no extra dependencies, and /tdd discipline.
| Command | Phase | Output |
|---|---|---|
| /grill-me-with-docs | Interview: ask questions, read provided docs | docs/grill-output.md |
| /to-prd | Structure the requirements | docs/prd.md |
| /to-issues | Break PRD into vertical slices | GitHub issues (label: ready-for-agent) |
| /afk | Autonomous implementation: sub-agent per issue, no interruptions | commits + closed issues |
| /hitl | Reviewed implementation: approve between each issue | commits + closed issues |
Supporting commands:
| Command | Purpose |
|---|---|
| /exec <cmd> | Run a shell command in the Docker sandbox (mcp__docker__run_command(service="devcontainer")). The only place that MCP tool is named; every other command routes through it. |
| /tdd | Red → green → refactor loop for each acceptance criterion. Used by the implementation agents. |
- /to-issues creates issues as vertical slices (end-to-end value, not layers) and applies ready-for-agent only to issues with no blockers.
- /afk and /hitl pick up only ready-for-agent issues.
- After an issue closes, its dependents are re-checked; if all their blockers are now closed, they get ready-for-agent.
- GitHub is the durable state: nothing is stored in /tmp or local files that would be lost on container restart.
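The re-check rule in the list above (a dependent becomes ready only once all of its blockers are closed) can be sketched as a small predicate. This is a minimal illustration, not the orchestrator's code: the function name is_unblocked and the space-separated argument format are assumptions.

```shell
# Hypothetical sketch of the "all blockers closed" check.
# $1: space-separated issue numbers that block the candidate issue.
# $2: space-separated issue numbers that are already closed.
is_unblocked() {
  for blocker in $1; do
    case " $2 " in
      *" $blocker "*) ;;   # this blocker is closed, keep checking
      *) return 1 ;;       # an open blocker remains
    esac
  done
  return 0                 # no blockers, or all of them are closed
}

# Example: an issue blocked by #12 and #15, with #12, #15 and #20 closed.
if is_unblocked "12 15" "12 15 20"; then
  echo "label: ready-for-agent"
fi
```

A caller could then apply the label with the GitHub CLI, for example gh issue edit 42 --add-label ready-for-agent (the issue number is illustrative); how /afk applies labels internally is not shown here.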
| File | Role |
|---|---|
| up.sh | Thin devcontainer up wrapper for a self-contained project (./up.sh .). |
| down.sh | Tear down sandboxes from this repo, scoped by label (./down.sh [folder]). |
| code.sh | Optional: attach VS Code to a running sandbox at its workspace folder. |
| .devcontainer/init.sh | Host initializeCommand: derives the per-project DEVCONTAINER_NAME and writes .devcontainer/.env. --dry-run prints the resolved names without side effects. |
| .devcontainer/Dockerfile | Ubuntu 24.04 devcontainer base; bakes in claude-persist-setup and the afk/hitl launchers. When built for publishing (BAKE_ORCHESTRATOR=1) it also bakes the orchestrator source + workflow commands + upstream skills (ADR-0016/0017). |
| .devcontainer/docker-compose.yml | Service devcontainer: the sandbox target, container_name: ${DEVCONTAINER_NAME}. Mounts workspace (consistency: cached), SSH (ro), Claude persist dir, Docker socket. Every var has a fallback, so it boots standalone without init.sh. |
| .devcontainer/devcontainer.json | claude-code + docker-outside-of-docker features; in-project initializeCommand → init.sh; postCreateCommand → claude-persist-setup. |
| .devcontainer/init.sh | Host-side: generates .env, pre-creates the persist dir. Optional: the compose fallbacks cover a bare docker compose up. |
| .devcontainer/claude-persist-setup | Symlinks ~/.claude.json + ~/.claude/ into the persist mount; installs the baked workflow commands → ~/.claude/commands and upstream skills → ~/.claude/skills. |
mcp__docker__run_command(service="devcontainer") is a plain docker compose exec wrapper — no AI model, it just runs a command in the running devcontainer service. Claude reaches it through /exec. This works from the host, so the sandbox itself does not require Docker installed inside the container.
Docker inside the container (for docker build, testcontainers, etc. invoked by the workflow) is provided separately by the docker-outside-of-docker feature, which installs the CLI and shares the host socket. Socket permissions for the non-root vscode user are fixed by the feature at startup — no group_add/DOCKER_GID needed.
For /exec to land in this repo's container, the docker MCP server must run docker compose exec against this project (service devcontainer). After ./up.sh <folder>, sanity-check with /exec whoami → expect vscode. If it can't find the service, point the MCP server's compose context at .devcontainer/docker-compose.yml.
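To make the wiring concrete, the sketch below shows roughly what a docker compose exec wrapper targeting the devcontainer service looks like. It assumes the compose file lives at .devcontainer/docker-compose.yml as described above; the function name sandbox_exec and its --print switch are hypothetical and are not part of the docker MCP server.

```shell
# Hypothetical sketch of a thin "docker compose exec" wrapper.
# sandbox_exec --print <cmd...>  prints the command line without running it.
# sandbox_exec <cmd...>          runs the command in the devcontainer service.
sandbox_exec() {
  if [ "$1" = "--print" ]; then
    shift
    printf 'docker compose -f %s exec -T %s %s\n' \
      ".devcontainer/docker-compose.yml" "devcontainer" "$*"
    return 0
  fi
  # -T disables pseudo-TTY allocation, which suits non-interactive callers.
  docker compose -f .devcontainer/docker-compose.yml exec -T devcontainer "$@"
}

# Show what the whoami sanity check would run.
sandbox_exec --print whoami
```

Running sandbox_exec whoami from the project root with the sandbox up should behave like the /exec whoami check described above.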
Config, MCP registrations, and memory persist on the host at ~/.devcontainer-claude/, surviving container rebuilds. .env reflects the last folder spun up (init.sh upserts the workspace paths on each up.sh).
Each implementation runs in a disposable inner container built from one of two
images: the default claude image (SANDCASTLE_IMAGE, default sandcastle:local)
and the local/Ollama image (SANDCASTLE_OPENCODE_IMAGE, default
sandcastle-opencode:local). You can either pull prebuilt images or build them
locally.
Pull prebuilt (no build step). CI publishes both images, multi-arch (amd64 + arm64), to GHCR (ADR-0014). Point a project at them with env vars — the orchestrator already honours these overrides, so nothing else changes:
export SANDCASTLE_IMAGE=ghcr.io/lsfera/agentic-dev/sandbox:latest
export SANDCASTLE_OPENCODE_IMAGE=ghcr.io/lsfera/agentic-dev/sandbox-opencode:latest

Set these in .sandcastle/orchestrator.env to make them stick. :latest tracks
the newest publish. The images are built by .github/workflows/publish-images.yml
on version tags (v*), on main, and via manual dispatch; each release also
publishes a matching :X.Y.Z image tag (the image tag drops the release's v,
per Docker convention). To pin a release instead of floating :latest, resolve
the current tag:
curl -s https://api.github.com/repos/lsfera/agentic-dev/releases | jq -r '.[0].tag_name | ltrimstr("v")'
# → e.g. 0.2.0 ; then use ...sandbox:0.2.0

Package visibility. The published packages (sandbox, sandbox-opencode, and devcontainer) must be public to pull anonymously. If they're private, run docker login ghcr.io on the host first (with a token that has read:packages), or make them public in the repo's Packages settings.
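Pinning both inner images to one release can be sketched as follows. The helper name pin_images is hypothetical; it only applies the tag rule stated above (the image tag is the release tag without its leading v) to the two image names from the export lines.

```shell
# Hypothetical helper: print export lines that pin both inner images
# to a single release. Accepts the release tag with or without a leading "v".
pin_images() {
  # ${1#v} strips one leading "v", so v0.2.0 and 0.2.0 both yield 0.2.0.
  printf 'export SANDCASTLE_IMAGE=ghcr.io/lsfera/agentic-dev/sandbox:%s\n' "${1#v}"
  printf 'export SANDCASTLE_OPENCODE_IMAGE=ghcr.io/lsfera/agentic-dev/sandbox-opencode:%s\n' "${1#v}"
}

# Example with an illustrative version number.
pin_images v0.2.0
```

The printed lines can be pasted into .sandcastle/orchestrator.env, assuming that file is sourced as shell.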
Build locally (the default). The defaults stay sandcastle:local /
sandcastle-opencode:local so source-of-truth and offline dev don't depend on a
pull. Build them from the .sandcastle context:
docker build -f .sandcastle/Dockerfile -t sandcastle:local .sandcastle
docker build -f .sandcastle/Dockerfile.opencode -t sandcastle-opencode:local .sandcastle

A project that needs an owned image (so the #40 orphan sweep won't let another
project reap its sandboxes) builds with --build-arg AGENTIC_PROJECT=<name>; the
published GHCR images are unowned/legacy.
The outer image — the devcontainer the orchestrator runs in — is also
published to GHCR, multi-arch, as ghcr.io/lsfera/agentic-dev/devcontainer
(ADR-0015), so adopters can skip the local devcontainer build (~2.3 GB, four
features). It is a devcontainer (base + features + Dockerfile), so it is built
with the devcontainer CLI, not a plain docker build; the workflow is
.github/workflows/publish-devcontainer.yml.
The published image also bakes the orchestrator source into
/opt/agentic-orchestrator (ADR-0016), so an adopter using it doesn't need to
vendor .sandcastle/*.ts at all — keep only your config in .sandcastle/
(orchestrator.env, .env, opencode.json) and run afk/hitl. The launchers
prefer a workspace .sandcastle/ when it carries the source (so this repo and any
vendoring project still run their own copy) and fall back to the baked one
otherwise; deps install into /opt on first use. Override the baked location with
AGENTIC_ORCHESTRATOR_HOME.
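The lookup order described above (workspace copy first, baked copy otherwise) might look like the sketch below. It is an assumption-laden illustration, not the launchers' actual logic: the function name resolve_orchestrator_home is hypothetical, and "carries the source" is approximated here as "contains at least one .ts file".

```shell
# Hypothetical sketch of the orchestrator lookup order.
# $1: workspace root (defaults to the current directory).
resolve_orchestrator_home() {
  set -- "${1:-$PWD}"
  # Assumption: a workspace .sandcastle/ "carries the source" when it
  # contains at least one TypeScript file.
  if ls "$1"/.sandcastle/*.ts >/dev/null 2>&1; then
    printf '%s\n' "$1/.sandcastle"
  else
    # Fall back to the baked copy; AGENTIC_ORCHESTRATOR_HOME overrides it.
    printf '%s\n' "${AGENTIC_ORCHESTRATOR_HOME:-/opt/agentic-orchestrator}"
  fi
}

resolve_orchestrator_home "$PWD"
```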
It also bakes the workflow itself (ADR-0017):
the user-invoked slash commands (/afk, /hitl, /exec, /to-prd, /to-issues,
/tdd, /grill-me-with-docs) at /opt/agentic-commands, plus four model-invoked
engineering disciplines (tdd, diagnosing-bugs, domain-modeling, codebase-design)
pulled from mattpocock/skills (MIT) at
/opt/agentic-skills. At container creation claude-persist-setup installs them into
~/.claude/commands and ~/.claude/skills. This is groundwork for running the
workflow from inside the container — the current flow runs Claude on the host (see
How it fits together), which uses the workspace
.claude/commands, so the baked copies are inert for the host flow today.
To consume it instead of building locally, either point the compose service at it:
# .devcontainer/docker-compose.yml
services:
  devcontainer:
    image: ghcr.io/lsfera/agentic-dev/devcontainer:latest # instead of build:

or keep build: and add the published image as a cache source so devcontainer up
is a registry cache hit rather than a full rebuild. This repo dogfoods the
published image — its docker-compose.yml uses image: (with build: kept as a
commented fallback); the workspace still carries .sandcastle/*.ts, so the
orchestrator runs from the workspace source while the devcontainer itself comes
from GHCR.
The same package-visibility note applies — pulling the
devcontainer image anonymously needs it to be public, else docker login ghcr.io.
The committed docker-compose.yml carries a fallback for every variable
(LOCAL_WORKSPACE_FOLDER:-${PWD}, SSH_DIR:-${HOME}/.ssh, …), so a bare
docker compose up from a project root works with no .env and no init.sh —
${PWD} preserves the path-matched mount (ADR-0011).
init.sh (run by up.sh / devcontainer up) stays the path for VS Code and
per-project container naming, and when it sets those vars they win; it is now an
optimization, not a requirement.
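The fallback syntax in the compose file follows the same unset-or-empty rule as shell parameter expansion, so the behaviour can be tried directly in a shell. The sketch below mirrors the two fallbacks quoted above; the function names are hypothetical and exist only for illustration.

```shell
# Sketch of the ${VAR:-fallback} rule: use VAR when it is set and non-empty,
# otherwise use the fallback. Function names are illustrative only.
resolve_workspace_folder() {
  printf '%s\n' "${LOCAL_WORKSPACE_FOLDER:-$PWD}"
}

resolve_ssh_dir() {
  printf '%s\n' "${SSH_DIR:-$HOME/.ssh}"
}

# With nothing exported these print $PWD and $HOME/.ssh; values written
# to .env by init.sh (or exported by hand) take precedence.
resolve_workspace_folder
resolve_ssh_dir
```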
By default each implementation sandbox runs Claude (claudeCode). You can instead point the implementer at a local Ollama model via opencode — no API cost, fully offline. Useful for cheap parallel work; quality depends on the local model, so it is not yet a drop-in for Claude on hard slices.
Prerequisites
- Ollama running on the host, bound to 0.0.0.0:11434 so containers can reach it at http://host.docker.internal:11434.
- A coding model pulled, e.g. ollama pull qwen3-coder:30b. Keep weights ~15–22 GB so RAM stays free for Docker + the sandbox stack.
One-time: get the opencode inner image
Pull the prebuilt sandbox-opencode image or build it locally — see
Inner sandbox images. The local build is:
docker build -f .sandcastle/Dockerfile.opencode -t sandcastle-opencode:local .sandcastle

This image ships the opencode CLI (pinned opencode-ai@1.17.9) instead of Claude Code. The provider config in .sandcastle/opencode.json points opencode at the host Ollama; on sandbox start an onSandboxReady hook copies it into opencode's global config dir (~/.config/opencode/). opencode resolves its provider from there, not the worktree cwd, so this step is what makes the model actually load.
Run the orchestrator on the local tier
The devcontainer image bakes in afk and hitl commands, so from anywhere inside a project you can just:
afk # autonomous, claude tier
afk local # default local model
afk local qwen2.5-coder:32b # pick a model
hitl local # review before merge

afk/hitl are thin launchers over .sandcastle/run.sh (which steers tier/model from arguments and handles the cd to the path-matched mount, npm install, and orchestrator.env sourcing). They resolve the project root and set LOCAL_WORKSPACE_FOLDER from .devcontainer/.env so the run launches from the host-path mount (ADR-0011) regardless of your working directory. The underlying script is equivalent:
.sandcastle/run.sh afk local qwen2.5-coder:32b

The same arguments work from the /afk and /hitl slash commands (e.g. /afk local qwen2.5-coder:32b). A bare model name is routed to the active tier and gets the ollama/ prefix added if missing. Under the hood the arguments resolve to these env vars (arguments override orchestrator.env); afk … --dry-run prints the resolved values without launching:
| Env var | Default | Purpose |
|---|---|---|
| AGENTIC_TIER | claude | Set to local (arg: local) to use opencode + host Ollama instead of Claude Code. |
| AGENTIC_LOCAL_MODEL | ollama/qwen3-coder:30b | opencode model ref (ollama/<model>); the model must exist in ollama list. |
| SANDCASTLE_OPENCODE_IMAGE | sandcastle-opencode:local | Inner image for the local tier. |
To add a model, pull it in Ollama and add it to the models map in .sandcastle/opencode.json, then pass it as the model argument (or via AGENTIC_LOCAL_MODEL). Verify reachability from a container with docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl -s http://host.docker.internal:11434/api/tags.
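The argument handling described in this section (arguments override the environment, and a bare model name gains the ollama/ prefix) can be sketched as below. This is a simplified illustration, not the contents of run.sh: the function names normalize_model and resolve_run are hypothetical, and only the local [model] argument form is covered.

```shell
# Hypothetical sketch: add the ollama/ prefix to a bare model name.
normalize_model() {
  case "$1" in
    ollama/*) printf '%s\n' "$1" ;;
    *)        printf 'ollama/%s\n' "$1" ;;
  esac
}

# Hypothetical sketch: resolve tier and model, arguments first, then the
# environment, then the documented defaults.
# $1: optional tier argument; $2: optional model argument.
resolve_run() {
  set -- "${1:-${AGENTIC_TIER:-claude}}" \
         "${2:-${AGENTIC_LOCAL_MODEL:-ollama/qwen3-coder:30b}}"
  printf 'AGENTIC_TIER=%s\n' "$1"
  if [ "$1" = "local" ]; then
    printf 'AGENTIC_LOCAL_MODEL=%s\n' "$(normalize_model "$2")"
  fi
}

resolve_run local qwen2.5-coder:32b
```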
Concrete, copy-pasteable setups by use case live in examples/ —
standard Claude + prebuilt image, offline Ollama tier, build-from-source,
version-pinned reproducible runs, and parallel/high-throughput. Each directory has
the config files to copy into .sandcastle/ plus a short README.
Add mcp__docker__run_command to the allow list in .claude/settings.local.json so /exec never prompts:
{
"permissions": {
"allow": ["mcp__docker__run_command"]
}
}

- consistency: cached on the workspace mount is a no-op on modern Docker Desktop (VirtioFS); it is kept for correct intent and for older osxfs. The real macOS perf levers are VirtioFS (default) and not bind-mounting heavy dirs (node_modules, .venv); use named volumes for those.
- Per-project, self-contained. Each project carries its own .devcontainer/, so up.sh, devcontainer up, and VS Code Reopen in Container all discover the config natively (no --config split, no AGENTIC_DC_INIT). The container is named per project (DEVCONTAINER_NAME, derived from the workspace folder), so two projects don't collide. Tradeoff: each project keeps its own copy of .devcontainer/, so improvements to the shared config must be propagated to each project (e.g. by re-copying or templating) rather than landing everywhere at once. See ADR-0012.
- The docker MCP map is per-project. /exec targets the sandbox via ALLOWED_CONTAINERS=devcontainer:<DEVCONTAINER_NAME>; for this repo that's agentic-dev. Point it at the project's container name (no longer the old fixed agentic-sandbox).
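One way to keep the MCP map in step with the generated name is to read DEVCONTAINER_NAME back from .devcontainer/.env. The sketch below assumes that file holds a plain DEVCONTAINER_NAME=<name> line; the function name allowed_containers_for is hypothetical.

```shell
# Hypothetical helper: build the ALLOWED_CONTAINERS value from the env file
# that init.sh writes. Assumes a plain "DEVCONTAINER_NAME=<name>" line.
# $1: path to the env file (defaults to .devcontainer/.env).
allowed_containers_for() {
  _dc_name=$(sed -n 's/^DEVCONTAINER_NAME=//p' "${1:-.devcontainer/.env}" 2>/dev/null | tail -n 1)
  if [ -z "$_dc_name" ]; then
    echo "DEVCONTAINER_NAME not found" >&2
    return 1
  fi
  printf 'ALLOWED_CONTAINERS=devcontainer:%s\n' "$_dc_name"
}

# Usage, from a project root after init.sh has run:
#   allowed_containers_for .devcontainer/.env
```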
MIT © 2026 Luca Giordano.
The orchestrator under .sandcastle/ is built on sandcastle by Matt Pocock — a TypeScript library for orchestrating sandboxed coding agents (sandcastle.run(), published as @ai-hero/sandcastle). It handles the disposable, git-isolated Docker sandbox each agent runs in; this project wraps it into the issue-driven /afk and /hitl workflow. Thanks to Matt and the sandcastle contributors.
The published image also bakes a few model-invoked engineering disciplines — tdd, diagnosing-bugs, domain-modeling, codebase-design — from mattpocock/skills (MIT, © Matt Pocock), installed into ~/.claude/skills. The user-invoked workflow commands (/afk, /hitl, /to-prd, …) are this project's own.