Forge

Forge is a lean software delivery framework for AI agents — from inception to production.

Where other agent frameworks hand you a coding assistant, Forge gives you an autonomous engineering organization: seven specialized agents, a shared delivery state machine, and a strict ATDD-first process built on battle-tested XP and Lean delivery practices.

Your agents don't just write code. They run discovery workshops, write INVEST-compliant user stories, make architecture decisions, implement vertical slices test-first, desk check each acceptance criterion, manage feature flags, and ship — continuously, to production, without you babysitting them.

The Big Idea

Most agent frameworks start at "write code". Forge starts at "what are we building and why".

Every story in Forge traces back to a customer pain identified in an empathy map. Every acceptance criterion is testable through the UI alone: no backdoors, no database queries, no internal system access. Every feature ships behind a feature flag on trunk. No feature is switched on in production until a QA agent and a PO agent have independently verified it against the original acceptance criteria.

This is how high-performing software teams actually work. Forge encodes it as agent skills.

The Seven Agents

| Agent | Owns |
| --- | --- |
| po-agent | Inception, story writing, backlog management, story acceptance |
| ux-agent | Empathy mapping, UX specs, frontend acceptance criteria |
| architect-agent | Architecture Decision Records, service boundaries, tech debt |
| developer-agent | ATDD loops, TDD inner loops, contract tests, feature flag code |
| qa-agent | Acceptance test authoring, desk checks, regression suite |
| devops-agent | CI/CD, environments, Unleash feature flags, deployments |
| secops-agent | Threat modeling, security ACs, SAST/DAST pipeline gates |

Each agent has a defined role, a default skill set, and a strict boundary. The developer agent doesn't make architecture decisions. The architect agent doesn't write production code. Roles are enforced, not suggested.

The Delivery Lifecycle

1. Inception

The po-agent and ux-agent facilitate a structured discovery process:

Lean Canvas — problem, solution, unique value proposition
Empathy Mapping — understand the customer before writing a single requirement
Trade-off Sliders — team-wide prioritization: quality vs. cost vs. UX vs. security (no ties allowed)
Event Storming — interactive back-and-forth conversation mapping domain events, commands, policies, and UI elements. UI stickies become user stories. Policies and commands become acceptance criteria. The final phase produces CONTEXT.md — the project's ubiquitous language.
Iteration Mapping — dependency graph → topological sort → iteration layers → Linear Projects + Cycles, automatically
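
The "no ties allowed" rule for the trade-off sliders can be checked mechanically. The following is a minimal sketch, assuming the slider output is a mapping from dimension name to an integer rank with 1 as the top priority. The real schema of project.constraints.yaml is not shown in this README, so `validate_sliders` and the rank values are hypothetical.

```python
from collections import Counter


def validate_sliders(ranks: dict[str, int]) -> list[str]:
    """Return dimensions ordered from highest to lowest priority.

    Hypothetical helper: raises ValueError if two dimensions share a rank,
    which is how this sketch interprets "no ties allowed".
    """
    counts = Counter(ranks.values())
    tied = sorted(name for name, rank in ranks.items() if counts[rank] > 1)
    if tied:
        raise ValueError(f"tied sliders: {', '.join(tied)}")
    # Rank 1 is treated as the top priority (an assumption of this sketch).
    return sorted(ranks, key=ranks.__getitem__)


if __name__ == "__main__":
    print(validate_sliders({"quality": 1, "security": 2, "ux": 3, "cost": 4}))
```

Rejecting ties up front means every later trade-off between two dimensions has exactly one answer.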

2. Story Refinement

Every story passes a four-gate review before it's playable:

PO drafts — persona, customer value, rough acceptance criteria
UX gate — is this valuable, and does it trace to an empathy map pain point?
Developer gate — is this technically feasible, is there a cost signal, and can it be estimated?
QA gate — is every AC testable through the UI alone, as an outside customer would test it?

Stories that fail any gate go back to the PO. Stories that pass land in their iteration's Linear Project as ready-for-dev.
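
The gate sequence can be pictured as an ordered list of checks where the first failure stops the review. This is a sketch under stated assumptions: the `Story` fields and the `refine` function are hypothetical, and each gate is reduced to a single boolean test, whereas the real gates are agent reviews.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Story:
    title: str
    pain_point: Optional[str]       # empathy-map pain the story traces to
    feasible: bool                  # developer's feasibility signal
    acs_ui_testable: list[bool]     # one entry per acceptance criterion
    status: str = "in-analysis"
    failed_gate: Optional[str] = None


# Each gate is a (name, predicate) pair, checked in order after the PO draft.
GATES: list[tuple[str, Callable[[Story], bool]]] = [
    ("ux", lambda s: s.pain_point is not None),
    ("developer", lambda s: s.feasible),
    ("qa", lambda s: bool(s.acs_ui_testable) and all(s.acs_ui_testable)),
]


def refine(story: Story) -> Story:
    """Run the gates in order; the first failure sends the story back to the PO."""
    for name, passes in GATES:
        if not passes(story):
            story.failed_gate = name  # stays in-analysis for the PO to rework
            return story
    story.status = "ready-for-dev"
    story.failed_gate = None
    return story
```

For example, a story whose second AC cannot be tested through the UI stays in in-analysis with `failed_gate == "qa"`.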

All stories follow the INVEST principle and Mike Cohn's User Stories Applied — no technical detail in stories. Backend shape, API design, database schema all live in ADRs, not stories.

3. Iteration Zero

Before story 1 starts:

architect-agent writes ADRs for iteration 1 stories
devops-agent provisions CI/CD pipeline, test environment, and Unleash feature flag server
qa-agent authors the acceptance test scaffold — first dummy Acceptance Test must be green on CI
secops-agent wires SAST/DAST and secret scanning into the pipeline

Iteration 1 cannot start until the acceptance test scaffold is green on CI and a test environment is live.

4. Iteration N — The ATDD (Acceptance Test-Driven Development) Loop

ATDD is a collaborative practice where acceptance criteria are expressed as executable tests before any implementation begins. The outer Acceptance Test is the sole definition of done — everything else serves it.

For each story, the developer agent runs a strict process:

Outer Acceptance Test → RED

  For each sub-slice in the AC:
    FE inner loop: component test RED → GREEN → REFACTOR
    BE inner loop: CDC contract test RED → GREEN → REFACTOR
    Sub-slice complete ✓  (never batch FE loops then BE loops)

Outer Acceptance Test → GREEN
Desk check with QA (local first, then deployed environment)

One sub-slice at a time. One AC at a time. No implementation code before the outer Acceptance Test is RED and visible. No moving to the next sub-slice until the current one is fully green.
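
The ordering rules above can be expressed as a small guard object. This is an illustrative sketch only: `AtddSession` and `AtddViolation` are hypothetical names, and the inner FE and BE loops are reduced to log entries rather than real test runs.

```python
class AtddViolation(Exception):
    """Raised when a step is attempted out of order."""


class AtddSession:
    """Tracks one acceptance criterion through the outer/inner loop order."""

    def __init__(self, sub_slices: list[str]) -> None:
        self.pending = list(sub_slices)
        self.outer = "not-run"      # not-run -> red -> green
        self.log: list[str] = []

    def outer_test(self, passed: bool) -> None:
        if self.outer == "not-run" and passed:
            raise AtddViolation("outer Acceptance Test must be seen RED first")
        if passed and self.pending:
            raise AtddViolation(f"GREEN reported with sub-slices pending: {self.pending}")
        self.outer = "green" if passed else "red"
        self.log.append(f"outer:{self.outer}")

    def complete_sub_slice(self, name: str) -> None:
        if self.outer != "red":
            raise AtddViolation("no implementation before the outer test is RED")
        if not self.pending or self.pending[0] != name:
            raise AtddViolation(f"finish {self.pending[:1]} before {name!r}")
        # FE loop then BE loop for this sub-slice, never batched across slices.
        self.log += [f"{name}:fe-green", f"{name}:be-green"]
        self.pending.pop(0)
```

A session created with `AtddSession(["form", "api"])` rejects `complete_sub_slice("form")` until `outer_test(False)` has recorded the RED run.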

5. The Kanban Stages

Linear is the delivery state machine. There are two boards:

Board 1 — Iteration Map (Linear Roadmap/Projects)

One Project per iteration. Stories are assigned to iterations based on a topological sort of their dependency graph: stories with no dependencies go into Iteration 1, stories depending on Iteration 1 go into Iteration 2, and so on. This board is the backlog. Stories with no iteration assignment are unscoped.

Board 2 — Delivery Board (Linear Cycle)

The active iteration's stories move through the delivery state machine:

in-analysis          → story refinement (all four gates)
ready-for-dev        → any developer agent can pull (first available wins)
in-dev               → ATDD loop + desk check per AC
ready-for-qa         → story deployed to test environment; QA agent pulls
in-qa                → QA full regression suite
ready-for-acceptance → PO agent pulls
in-acceptance        → PO smoke tests all ACs
ready-to-deploy      → HUMAN approves flag flip
(done)               → feature flag on, Linear card closed

Bug cards found in in-qa or in-acceptance go directly to ready-for-dev — no refinement needed.
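
The stage list reads naturally as a transition table. The sketch below encodes only what the list above states: forward moves between adjacent stages, human approval for the final move, and bug cards re-entering at ready-for-dev. The function names are hypothetical and nothing here calls Linear.

```python
STAGES = [
    "in-analysis", "ready-for-dev", "in-dev", "ready-for-qa", "in-qa",
    "ready-for-acceptance", "in-acceptance", "ready-to-deploy", "done",
]

# Forward moves: each stage may advance only to the one listed after it.
ALLOWED = {a: {b} for a, b in zip(STAGES, STAGES[1:])}
# Bug cards raised during QA or acceptance re-enter at ready-for-dev.
BUG_ENTRY_FROM = {"in-qa", "in-acceptance"}


def can_move(current: str, target: str, human_approved: bool = False) -> bool:
    """True if a card may move from current to target."""
    if (current, target) == ("ready-to-deploy", "done") and not human_approved:
        return False  # the flag flip is a human decision
    return target in ALLOWED.get(current, set())


def bug_card_stage(found_in: str) -> str:
    """Stage where a new bug card starts, skipping refinement."""
    if found_in not in BUG_ENTRY_FROM:
        raise ValueError(f"bug cards are raised from {sorted(BUG_ENTRY_FROM)}")
    return "ready-for-dev"
```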

Stories are pulled, not assigned. When a developer agent starts a session with no active story, it queries Linear for the oldest story in ready-for-dev and atomically moves it to in-dev with self-assignment in a single API call. If two agents race, only one wins — the other pulls the next available story. No orchestrator needed.
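
To see why an atomic claim removes the need for an orchestrator, consider an in-memory stand-in for the board. This is a sketch, not the Linear API: `Board` and `claim_oldest` are hypothetical, and a lock plays the role of the single API call that changes status and assignee together.

```python
import threading
from typing import Optional


class Board:
    """In-memory stand-in for the tracker; not the Linear API."""

    def __init__(self, ready: list[str]) -> None:
        self._lock = threading.Lock()
        self._ready = list(ready)          # oldest first
        self.in_dev: dict[str, str] = {}   # story -> agent

    def claim_oldest(self, agent: str) -> Optional[str]:
        # Status change and assignment happen under one lock, so a story
        # can never be claimed by two agents.
        with self._lock:
            if not self._ready:
                return None
            story = self._ready.pop(0)
            self.in_dev[story] = agent
            return story


if __name__ == "__main__":
    board = Board(["story-1", "story-2"])
    agents = ["dev-1", "dev-2", "dev-3"]
    threads = [threading.Thread(target=board.claim_oldest, args=(a,)) for a in agents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(board.in_dev)  # two stories claimed, each by exactly one agent
```

With three agents and two ready stories, one agent gets `None` and simply idles or retries later.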

Iteration completion. After completing a story, every agent checks: "are all stories in this iteration done?" If yes, the po-agent posts a completion notice on the Linear milestone and idles. A human PO reviews, then triggers the next iteration by activating the next Cycle. Iterations are variable duration — they end when done, not on a fixed date.

6. Iteration Scoping

Forge replaces velocity-based sprint planning with dependency-driven iteration scoping:

| Human team concept | Forge equivalent |
| --- | --- |
| Story points | AC count (1 AC ≈ 1 ATDD loop) |
| Team velocity | Max concurrent agents (human sets this) |
| Fixed-length sprint | Variable duration: iteration ends when all stories reach `done` |
| Planning poker | Architect agent flags stories with >5 ACs as split candidates |
| Sprint planning | Topological sort of dependency graph → iteration layers |

Parallel execution within an iteration is determined by story independence — stories in the same layer with no inter-story dependencies run simultaneously with separate developer agents.
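
The layering described above is a level-by-level topological sort. The following is a minimal sketch, assuming the dependency graph is a mapping from each story to the set of stories it depends on. `iteration_layers` and `split_candidates` are hypothetical names and do not reflect how the building-iteration-map skill is implemented.

```python
def iteration_layers(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group stories into layers; layer N depends only on earlier layers."""
    remaining = {story: set(deps) for story, deps in depends_on.items()}
    layers: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # A story is ready once every dependency sits in an earlier layer.
        ready = sorted(s for s, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among: {sorted(remaining)}")
        layers.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return layers


def split_candidates(ac_counts: dict[str, int], limit: int = 5) -> list[str]:
    """Stories with more than `limit` ACs, flagged for splitting."""
    return sorted(s for s, n in ac_counts.items() if n > limit)


if __name__ == "__main__":
    deps = {
        "login": set(),
        "search": set(),
        "profile": {"login"},
        "checkout": {"profile", "search"},
    }
    for number, layer in enumerate(iteration_layers(deps), start=1):
        print(f"Iteration {number}: {layer}")
```

Stories inside one layer have no dependencies on each other, so each can be pulled by a separate developer agent.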

7. Trunk-Based Continuous Deployment

No feature branches. Every push goes to production if CI is green — but unfinished stories are feature-flagged off via Unleash (self-hosted, open source). When a story is accepted, the human PO approves the flag flip and the feature is live.
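
The effect of shipping dark code can be shown with a toy flag store. This is a sketch under loud assumptions: `FlagStore` is an in-memory stand-in and not the Unleash client or REST API, and the flag name `story-42-express-checkout` is invented for illustration.

```python
class FlagStore:
    """In-memory stand-in for a flag server; not the Unleash client API."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def is_enabled(self, flag: str) -> bool:
        return flag in self._enabled   # unknown flags default to off

    def flip_on(self, flag: str, approved_by: str) -> None:
        # The sketch requires a named human approver for every flip.
        if not approved_by:
            raise PermissionError("flag flip needs human approval")
        self._enabled.add(flag)


def checkout_page(flags: FlagStore) -> str:
    # Both code paths are deployed from trunk; the flag picks one at runtime.
    if flags.is_enabled("story-42-express-checkout"):
        return "express checkout"
    return "standard checkout"
```

Until `flip_on` is called with an approver, `checkout_page` keeps returning the existing behavior even though the new code is already deployed.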

Ubiquitous Language

"With a ubiquitous language, conversations among developers and expressions of the code are all derived from the same domain model." — Eric Evans, Domain-Driven Design

Agents dropped into a project with no shared vocabulary use 20 words where 1 will do. Forge solves this by generating CONTEXT.md as the final phase of every event storming session.

CONTEXT.md defines:

Domain terms — canonical names for every entity, event, command, and policy discovered during event storming, with explicit "avoid" aliases
Bounded context boundaries — which terms belong to which service/domain
Agent communication protocol — the shared vocabulary of the delivery process itself (e.g. "outer Acceptance Test" not "E2E test", "desk check" not "demo", "sub-slice" not "partial implementation")
Flagged ambiguities — terms that look like synonyms but aren't

CONTEXT.md lives in the repo root and is a mandatory read for every agent at session start. Any agent that encounters a term not in CONTEXT.md must stop and propose an addition before using it. The po-agent owns it; all agents can propose updates.
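
A glossary with explicit "avoid" aliases lends itself to a simple lint. The sketch below assumes the glossary has already been parsed into a dictionary. The on-disk layout of CONTEXT.md is not specified here, so `GLOSSARY` and `find_avoided_terms` are hypothetical, with entries taken from the agent communication protocol examples above.

```python
import re

# Hypothetical in-memory form of the glossary: canonical term -> avoided aliases.
GLOSSARY = {
    "outer Acceptance Test": ["E2E test"],
    "desk check": ["demo"],
    "sub-slice": ["partial implementation"],
}


def find_avoided_terms(text: str, glossary: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (alias, canonical) pairs for every avoided alias found in text."""
    hits = []
    for canonical, aliases in glossary.items():
        for alias in aliases:
            # Word boundaries keep "demo" from matching "demonstrate".
            if re.search(rf"\b{re.escape(alias)}\b", text, flags=re.IGNORECASE):
                hits.append((alias, canonical))
    return hits
```

An agent could run such a check on its own draft output and replace each alias with the canonical term before posting.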

Agent Session Start Protocol

Every agent, every session, in this order:

1. Query Linear → do I have an active story in-progress?
   If yes  → resume it (re-run outer Acceptance Test first)
   If no   → pull oldest story from ready-for-dev (atomic claim)
2. Read CONTEXT.md → speak the project's language
3. Read project.constraints.yaml → know the priorities
4. Begin — the Linear stage determines what happens next

No handoff notes. No plan files. No conversation summaries. State lives in Linear and the repo — not in context windows.

Project Artifacts

Two files live in the repo root and are read by all agents:

CONTEXT.md                  # ubiquitous language — generated from event storming
project.constraints.yaml    # trade-off slider output — team priorities

Story content lives in Linear. A snapshot is committed to stories/ when a story moves to ready-for-dev — at that point the story is locked and the snapshot never drifts.

Skill Precedence

Forge uses an explicit three-level skill hierarchy to prevent the most common agent failure mode: rationalization over process.

L1 RIGID    → running-atdd-sessions, running-tdd-loops
              (these override everything; no exceptions)
L2 GUIDED   → writing-stories, facilitating-inception, deciding-architecture
              (structured process with human gates)
L3 MECH     → finishing-stories, managing-feature-flags
              (mechanical execution after L1/L2 complete)

An agent cannot rationalize "I'll just wire the handler first" — running-atdd-sessions is L1 and wins over plan files, conversation summaries, and implementation suggestions without exception.
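
Precedence resolution amounts to picking the directive with the lowest level number. This is a sketch, assuming conflicting instructions can be represented as records with a numeric level. `Directive`, `UNRANKED`, and `resolve` are hypothetical names, and giving non-skill sources a level below every skill is this sketch's reading of "wins over plan files, conversation summaries, and implementation suggestions".

```python
from typing import NamedTuple


class Directive(NamedTuple):
    source: str      # skill name or other source, e.g. "plan file"
    level: int       # 1 = RIGID, 2 = GUIDED, 3 = MECH
    action: str


UNRANKED = 99  # plan files, summaries, and suggestions carry no skill level


def resolve(directives: list[Directive]) -> Directive:
    """Pick the directive from the highest-precedence (lowest-numbered) level."""
    if not directives:
        raise ValueError("no directives to resolve")
    return min(directives, key=lambda d: d.level)


if __name__ == "__main__":
    winner = resolve([
        Directive("plan file", UNRANKED, "wire the handler first"),
        Directive("running-atdd-sessions", 1, "write the outer Acceptance Test"),
    ])
    print(winner.source, "->", winner.action)
```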

Verified by Loopkit

All 21 Forge skills are continuously verified by loopkit, a static analysis tool for agent skill contracts. Every SKILL.md, LOOP.md, and .loopkit.yaml passes with 0 errors and 0 warnings across all validators:

cargo install loopkit
loopkit . --verbose
# 21 skills checked. 0 error(s), 0 warning(s).

Loopkit validates state transitions, enforced states, handoff references, desk check patterns, bug feedback loops, progressive disclosure, naming conventions, terminology, and 10 other dimensions of skill quality.

Skills Library

Meta

using-forge — precedence rules, agent roles, session start protocol
resuming-sessions — query Linear + read CONTEXT.md before anything else

Discovery

facilitating-inception — lean canvas, empathy map, trade-off sliders, event storming, iteration map
facilitating-event-storming — interactive domain event discovery; final phase produces CONTEXT.md
establishing-ubiquitous-language — generates and maintains CONTEXT.md from event storming output
writing-stories — INVEST-compliant story writing with four-gate review
building-iteration-map — topological sort of dependencies → Linear Projects + Cycles via MCP

Architecture

deciding-architecture — ADR authoring, service boundary definition

Iteration Zero

bootstrapping-project — CI/CD, repos, environments, Unleash setup
validating-test-harness — gate skill: blocks iteration 1 until scaffold is green

Development (L1 Rigid)

running-atdd-sessions — outer Acceptance Test RED → sub-slice loops → GREEN
running-tdd-loops — FE component loop and BE CDC contract loop (CDC contracts live inside this loop)
managing-feature-flags — Unleash REST API integration (an L3 mechanical skill; its lifecycle protocol is called from the delivery loops)

Quality

running-desk-checks — per-AC verification, generates desk-check artifact, pauses for human
writing-acceptance-tests — Playwright/Cypress, UI-interactions only, no backdoors
running-regression-suite — full suite on test environment

Acceptance & Delivery

approving-stories — PO smoke test against ACs
finishing-stories — flag flip + Linear update + post-deploy security check
threat-modeling — security ACs injected per story from project.constraints.yaml
securing-pipeline — SAST/DAST/secret scanning gates

Installation

git clone https://github.com/loopworx/forge ~/.agents/forge
bash ~/.agents/forge/install.sh

This symlinks all 21 skills into ~/.agents/skills/ where OpenCode, Claude Code, and Hermes discover them automatically.

To verify:

cargo install loopkit
loopkit ~/.agents/forge --verbose
# 21 skills checked. 0 error(s), 0 warning(s).

Philosophy

Customer value first — every story traces to an empathy map pain point; no story is written without it
Shared language — agents and humans speak from the same domain model; CONTEXT.md is generated before the first story is written
INVEST or bust — stories with technical detail, untestable ACs, or no clear customer value never reach the backlog
Test-first is non-negotiable — the first file edit in any session is always a test file; implementation code is forbidden until the outer Acceptance Test is RED
Vertical slices — every story is deployable end-to-end; no frontend-only or backend-only stories
Pull, don't push — agents pull work when ready; no assignment, no queue management, no orchestrator
Trunk over branches — feature flags make branches unnecessary; continuous deployment makes them dangerous
Agents have roles — a developer agent that makes architecture decisions is a liability; role separation is enforced by skill design
State over memory — agents query Linear and read repo artifacts; delivery state survives context windows

Contributing

Forge skills follow Anthropic's agent skill best practices:

Description field is the primary selection mechanism — write it first
SKILL.md stays under 500 lines — use linked reference files for detail
Add deterministic contract tests for the skill's state machine and handoff routes
Skills are state machines with explicit gates, not knowledge documents

See the Forge skill authoring guide for the full process.

1. Fork the repository
2. Draft the description field
3. Write the skill body
4. Add or update the contract test harness to cover new states/transitions
5. Submit a PR with cargo test results from tools/contract-tests

License

MIT — see LICENSE file.

Forge is inspired by XP and Lean delivery practices, Eric Evans' Domain-Driven Design, Mike Cohn's User Stories Applied, and the hard lessons learned from watching AI agents skip the outer RED.
