DG-Anvil

DG-Anvil enforces contract-first, evidence-based development for Claude Code, ensuring verification before marking tasks as done.

<div align="center">

Contract-first, evidence-based development for Claude Code.

A plugin that turns "the agent thinks it's done" into "the agent has proven it's done."

Quick start - Installation - How it works - Commands - Architecture

</div>

What this is

DG-Anvil is a Claude Code plugin that enforces the Anvil loop as structural gates. The plugin makes specifications machine-readable before code runs, runs verification probes externally, and persists every failure as a Ledger lesson that future contracts inherit.

In one sentence: the agent cannot mark work as done; verification can.

Why it exists

Most agent loops fail the same way: the agent claims success, the user discovers it didn't actually work, and the lesson lives only in the user's frustration. DG-Anvil solves the three failure modes structurally:

| Failure mode | DG-Anvil's structural gate |
| --- | --- |
| "Looks done to me" | Four-level Verify probe (Exists, Substantive, Wired, Functional); all four must pass per criterion |
| "I'll remember next time" | Every failed run appends a non-null lesson to the global Ledger; future contracts inject matching lessons as counter-examples |
| "Trust me, I read the spec" | Contract is YAML, parsed and validated; criteria without all four verification slots are rejected at save time |

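The save-time rejection in the last row can be pictured as a small structural check. This is a minimal sketch, not DG-Anvil's validator: it assumes a parsed contract is a plain object with a `criteria` array and that each criterion carries lower-case `exists`, `substantive`, `wired`, and `functional` keys. Those field names and the `E_CRITERION_INCOMPLETE` code are hypothetical; the real grammar lives in cli/contract-schema.json.

```javascript
// Hypothetical slot names; the real schema lives in cli/contract-schema.json.
const LEVELS = ['exists', 'substantive', 'wired', 'functional'];

// Returns one structured problem per criterion that lacks a verification slot.
function findIncompleteCriteria(contract) {
  const problems = [];
  for (const criterion of contract.criteria ?? []) {
    const missing = LEVELS.filter((level) => {
      const slot = criterion[level];
      return slot === undefined || slot === null || slot === '';
    });
    if (missing.length > 0) {
      // E_CRITERION_INCOMPLETE is an illustrative code, not a registered one.
      problems.push({ code: 'E_CRITERION_INCOMPLETE', id: criterion.id, missing });
    }
  }
  return problems;
}

// Illustrative draft: C1 fills all four slots, C2 fills only two.
const draft = {
  criteria: [
    {
      id: 'C1',
      exists: 'src/limiter.js',
      substantive: 'exports limit()',
      wired: 'imported by server.js',
      functional: 'npm test',
    },
    { id: 'C2', exists: 'src/config.js', functional: 'npm test' },
  ],
};

console.log(findIncompleteCriteria(draft));
```

A contract would be saved only when the returned list is empty; here C2 is reported as missing its `substantive` and `wired` slots.
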
Two human touchpoints

The production loop has exactly two human touchpoints:

  1. Contract confirm. After /start <intent>, you review anvil/contract.yml and accept or reject. Binary; no list-of-N auto-pick.
  2. PR merge. /ship opens the PR via gh pr create. You review and merge; /ship does not merge for you.

Everything between those two gates runs autonomously.


Quick start

After installing, inside any project repository, in a Claude Code session:

/start add a rate limiter that caps inbound requests at 100 per minute per client

The contracting skill activates, queries the Ledger for matching prior lessons, drafts anvil/contract.yml with all four verification levels populated per criterion, and presents it for one-shot binary confirmation.
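
The Ledger lookup in that step could work roughly as follows. This is a sketch under stated assumptions: the lesson fields `contract_gap`, `evidence`, and `remediation` are named elsewhere in this README, but the keyword matcher, the helper names, and the sample entries are hypothetical and say nothing about how `anvil ledger query` actually matches.

```javascript
// Parse JSONL text into lesson objects, skipping blank lines.
function parseLedger(jsonl) {
  return jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical matcher: a lesson matches when any intent keyword
// (longer than three characters) appears in its contract_gap.
function matchLessons(lessons, intent) {
  const keywords = intent
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 3);
  return lessons.filter((lesson) =>
    keywords.some((word) => lesson.contract_gap.toLowerCase().includes(word))
  );
}

// Invented sample entries, one JSON object per line as in a .jsonl file.
const sampleLedger = [
  '{"contract_gap":"rate limiter had no per-client key","evidence":"test_shared_bucket failed","remediation":"add a criterion per client id"}',
  '{"contract_gap":"migration lacked rollback","evidence":"down() missing","remediation":"require a rollback probe"}',
].join('\n');

const matches = matchLessons(
  parseLedger(sampleLedger),
  'add a rate limiter that caps inbound requests'
);
console.log(matches.map((lesson) => lesson.remediation));
```

Only the first sample lesson matches the rate-limiter intent, so its remediation is what a new contract would inherit as a counter-example.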

After you accept, the loop runs autonomously through:

contracting -> planning -> executing -> verifying -> judging -> (pass | reset)

The loop repeats until every contract criterion is green. Then run:

/ship

/ship opens the PR via gh pr create. You review and merge.


Installation

DG-Anvil is both a Claude Code plugin and a single-plugin marketplace. Pick the path that matches how you want it to live in your environment.

Option A: Marketplace install (recommended)

The supported, upgrade-friendly path. Plugin lives under ~/.claude/plugins/ and is shared across every project.

/plugin marketplace add forsonny/DG-Anvil
/plugin install dg-anvil@dg-anvil

Verify:

/plugin list

You should see dg-anvil enabled. The five slash commands are now available everywhere, and hooks/session-start auto-loads using-anvil on every fresh session.

Upgrade later:

/plugin marketplace update dg-anvil
/plugin install dg-anvil@dg-anvil

Remove:

/plugin uninstall dg-anvil@dg-anvil
/plugin marketplace remove dg-anvil

Option B: Project-level install (single repo only)

Use this when you want DG-Anvil active in exactly one repository.

From your project root:

mkdir -p .claude/plugins
git clone https://github.com/forsonny/DG-Anvil.git .claude/plugins/dg-anvil

Then in Claude Code:

/plugin marketplace add ./.claude/plugins/dg-anvil
/plugin install dg-anvil@dg-anvil

The plugin is now active for this repository only. Upgrade with git -C .claude/plugins/dg-anvil pull. Add .claude/plugins/dg-anvil to your project's .gitignore if you do not want to commit the plugin source.

Option C: User-level install (no marketplace)

Skip the marketplace command and drop the plugin into Claude Code's user-scoped plugin directory:

mkdir -p ~/.claude/plugins
git clone https://github.com/forsonny/DG-Anvil.git ~/.claude/plugins/dg-anvil

Restart any active Claude Code session. Components are auto-discovered. Upgrade with git -C ~/.claude/plugins/dg-anvil pull.

Option D: One-off session

For a single Claude Code session without installing anything:

git clone https://github.com/forsonny/DG-Anvil.git /tmp/dg-anvil
claude --plugin-dir /tmp/dg-anvil

The plugin is loaded for that session only.

Requirements

| Requirement | Why |
| --- | --- |
| Node.js 20.0.0+ | Runs cli/anvil.js; uses Node's built-in node:test and coverage |
| git on PATH | Executor uses git worktree add per task |
| Bash (Unix) or Git for Windows | Polyglot hooks dispatch through hooks/run-hook.cmd |
| python + coverage.py (optional) | Verify Python contracts; missing tooling produces structured E_COVERAGE_UNAVAILABLE, never silent passes |
| go (optional) | Verify Go contracts |

DG-Anvil itself ships with zero runtime dependencies.


How it works

       user intent
           |
           v
   +---------------+        +-----------+
   |  contracting  |<------>|  Ledger   |   inject counter-examples
   +---------------+        +-----------+
           |
           v   [HUMAN: confirm contract]
   +---------------+
   |    planning   |   atomic task DAG, every task cites criterion ids
   +---------------+
           |
           v   per task, in parallel waves:
   +---------------+
   |   executing   |   fresh subagent, isolated git worktree
   +---------------+
           |
           v   read diff + tool output (never narration):
   +---------------+
   |   verifying   |   Exists -> Substantive -> Wired -> Functional
   +---------------+
           |
           v   if any level fails, dispatch the Court:
   +---------------+        +-----------+
   |    judging    |------->| resetting |---> append lesson, re-queue
   +---------------+        +-----------+
           |                       ^
           v all green             | loop_cap reached -> escalate
   +---------------+
   |     /ship     |   [HUMAN: review PR + merge]
   +---------------+
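
The per-task portion of the diagram can be sketched as a bounded retry loop. This is an illustration only: the injected `execute`, `verify`, and `judge` functions, the result shapes, and the choice to skip the Court when Verify is not green are all assumptions, not the behavior of cli/lib/executor.js or court.js.

```javascript
// Drives one task through execute -> verify -> judge until it passes
// or the loop cap is reached. The step functions are injected stand-ins.
function runTask(task, { execute, verify, judge, loopCap }) {
  const lessons = [];
  for (let attempt = 1; attempt <= loopCap; attempt += 1) {
    const evidence = execute(task, lessons); // fresh attempt, sees prior lessons
    const probe = verify(evidence); // reads evidence, never narration
    const verdict = probe.allGreen
      ? judge(evidence)
      : { pass: false, lesson: probe.lesson };
    if (verdict.pass) {
      return { status: 'pass', attempts: attempt, lessons };
    }
    lessons.push(verdict.lesson); // reset: record the lesson, then re-queue
  }
  return { status: 'escalated', attempts: loopCap, lessons };
}

// Toy run: the first attempt fails verification, the second passes.
let calls = 0;
const result = runTask(
  { id: 'T1' },
  {
    loopCap: 3,
    execute: () => ({ diff: `attempt ${++calls}` }),
    verify: (evidence) =>
      evidence.diff === 'attempt 1'
        ? { allGreen: false, lesson: 'limiter was never wired into the server' }
        : { allGreen: true },
    judge: () => ({ pass: true }),
  }
);
console.log(result.status, result.attempts, result.lessons.length);
```

The cap is what keeps a task that can never pass from looping forever: once `loopCap` attempts are spent, the sketch returns `escalated` rather than retrying.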

Five primitives

| Primitive | Where it lives | What it is |
| --- | --- | --- |
| Contract | anvil/contract.yml | YAML with four verification levels per criterion |
| Plan | anvil/plan.yml | Atomic task DAG with wave ordering; each task cites criterion ids |
| Loop | cli/lib/executor.js + verifier.js | Contract -> Plan -> Execute -> Verify -> Judge -> Pass or Reset |
| Ledger | ~/.anvil/ledger.jsonl | Durable lesson log; future contracts query and inject counter-examples |
| Court | cli/lib/court.js | Adversarial adjudicator. Plan, commit messages, prior verdicts are structurally withheld |
Eight skills

| Skill | Triggers |
| --- | --- |
| using-anvil | Loaded by session-start on every fresh session |
| contracting | /start <intent>; whenever no confirmed contract exists |
| planning | After contract confirmation |
| executing | Per task, once a wave unlocks |
| verifying | After each executing returns |
| judging | After Verify reports allGreen or suspicious |
| resetting | When Verify or Court fails; appends lesson, re-queues |
| authoring-skills | Meta-skill governing every change to any skill |

Five commands

| Command | Purpose |
| --- | --- |
| /start <intent> | Begin a new Anvil run. Drafts and confirms the contract. |
| /continue | Resume from anvil/state.json at the next incomplete task. |
| /ship | Whole-branch Court runs; on merge, opens the PR via gh. |
| /abort <reason> | Stop the run. Captures a final lesson if applicable. |
| /ledger query \| append \| audit | Interact with the global lesson Ledger. |

Commands reference

The anvil CLI is the canonical surface: each skill corresponds to a code path that the CLI composes.

anvil contract --validate <file>
anvil plan     --validate <file> --contract <file>
anvil run      --task <id> [--dispatcher anvil_subagent|stub]
anvil verify   --worktree <dir> --contract <file>
anvil judge    --task <id> --worktree <dir> --contract <file>
anvil ledger   query <pattern> | append --file <jsonl> | audit
anvil metrics  [--trace-path <file>] [--seeded-path <file>]
anvil ship
anvil escalation list | describe --task <id>
anvil cassette record --scenario <name> --out <path>
anvil contract-migrate --in <file> --out <file>
anvil plan-migrate     --in <file> --out <file>
anvil ledger-migrate   --in <file> --out <file>

Run anvil --help for the full list, or anvil --version for the version.

Every error is a structured {error, code, details} JSON object on stderr with a non-zero exit. There are no silent failures and no advisory warnings.
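
A sketch of what emitting such an error might look like in Node. Only the three keys `error`, `code`, and `details` and the `E_COVERAGE_UNAVAILABLE` code come from this README; the helper names, the message text, the `details` payload, and the assumption that `error` holds a human-readable message are illustrative.

```javascript
// Builds the {error, code, details} shape described above.
// Assumes `error` carries a human-readable message (not confirmed by the source).
function toStructuredError(code, message, details = {}) {
  return { error: message, code, details };
}

// Writes the error as one JSON line on stderr and marks the process as
// failed, so callers see both a machine-readable payload and a non-zero exit.
function fail(code, message, details) {
  const payload = toStructuredError(code, message, details);
  process.stderr.write(`${JSON.stringify(payload)}\n`);
  process.exitCode = 1;
  return payload;
}

// Shape only; calling fail() here would make this example exit non-zero.
const example = toStructuredError(
  'E_COVERAGE_UNAVAILABLE',
  'coverage tooling not found',
  { tool: 'coverage.py' }
);
console.log(JSON.stringify(example));
```

Because the payload is a single JSON line, a wrapping script can parse stderr and branch on `code` instead of matching message text.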


Architecture

DG-Anvil's design is documented in two places:

| Document | What it covers |
| --- | --- |
| dev/anvil-design/00_intro.md through 12_Bottom_Line.md | Canonical product design: thesis, anatomy, contract grammar, ledger, court, observability, scenarios |
| dev/dg-anvil/00_Architecture.md + dev/dg-anvil/plans/ | Build architecture and the five stage plans that produced the plugin |
| docs/failure-taxonomy.md | The 30-row failure taxonomy that every pressure test cites by row number |

See dev/README.md for a contributor-oriented overview of the design provenance.

Key invariants (enforced in code, not prose)

  • Single-writer Ledger. Only cli/lib/ledger-write.js may append to ~/.anvil/ledger.jsonl.
  • Evidence-only Court. cli/lib/court.js does not import plan.js, ledger.js, or ledger-write.js. Plan, commit messages, and prior verdicts are structurally withheld.
  • Fresh subagent per task. Executor carries no module-level state; every briefing is frozen with Object.freeze before dispatch.
  • Null-lesson prohibition. Lessons with empty contract_gap, evidence, or remediation are rejected at the write path. Failed resets escalate instead.
  • Trace fields are closed. The 17-field trace event schema is frozen; additions require an architecture document update.
  • Zero runtime dependencies. package.json declares no dependencies or devDependencies.
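
Two of these invariants are easy to picture in code. The sketch below is not taken from cli/lib/ledger-write.js or executor.js: the `E_NULL_LESSON` code, the helper names, the assumption that the three lesson fields are strings, and the choice of a recursive freeze (the source only says briefings are frozen with Object.freeze) are all illustrative.

```javascript
const REQUIRED_FIELDS = ['contract_gap', 'evidence', 'remediation'];

// Null-lesson prohibition: reject a lesson when any required field is
// missing, not a string, or blank.
function assertNonNullLesson(lesson) {
  const empty = REQUIRED_FIELDS.filter((field) => {
    const value = lesson[field];
    return typeof value !== 'string' || value.trim() === '';
  });
  if (empty.length > 0) {
    const err = new Error(`lesson rejected: empty ${empty.join(', ')}`);
    err.code = 'E_NULL_LESSON'; // hypothetical code
    err.details = { empty };
    throw err;
  }
  return lesson;
}

// Fresh subagent per task: freeze the briefing (recursively, in this
// sketch) so a dispatched task cannot mutate shared state.
function deepFreeze(value) {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value);
    Object.values(value).forEach(deepFreeze);
  }
  return value;
}

const briefing = deepFreeze({ task: 'T1', criteria: ['C1'] });
console.log(Object.isFrozen(briefing), Object.isFrozen(briefing.criteria));

try {
  assertNonNullLesson({
    contract_gap: 'no per-client key',
    evidence: '',
    remediation: 'add a criterion per client id',
  });
} catch (err) {
  console.log(err.code, err.details.empty);
}
```

Rejecting at the write path means an empty lesson never reaches the Ledger, which is why a reset that cannot produce a real lesson has to escalate.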

Project layout

dg-anvil/
  .claude-plugin/
    plugin.json           plugin manifest
    marketplace.json      single-plugin marketplace manifest
  cli/
    anvil.js              CLI entry; dispatch table for every subcommand
    contract-schema.json  JSON Schema, contract.yml frontmatter
    plan-schema.json      JSON Schema, plan.yml
    ledger-schema.json    JSON Schema, ledger.jsonl entries
    lib/                  contract, plan, ledger, ledger-write, executor,
                          verifier, worktree, court, metrics, trace, hooks,
                          subagent-bridge, escalation, errors, args, io, yaml
  commands/               start, continue, ship, abort, ledger
  hooks/                  session-start, pre-tool-use, post-tool-use,
                          user-prompt-submit, stop, hooks.json, run-hook.cmd
  skills/                 using-anvil, authoring-skills, contracting, planning,
                          executing, verifying, judging, resetting
  docs/                   failure-taxonomy.md, anvil_workflow.svg,
                          contract/plan/ledger fixtures
  tests/
    unit/                 per-module unit tests
    pressure/             RED-then-GREEN skill pressure tests
    loop/                 fixture-repo-node, fixture-repo-python,
                          fixture-repo-go, orchestrating loop.test.js

Development

git clone https://github.com/forsonny/DG-Anvil.git
cd DG-Anvil
npm test                                          # all unit tests
node --test tests/pressure/*.pressure.js          # all pressure tests
node --test tests/loop/loop.test.js               # v1 release shape checks

123 tests pass on a clean clone (101 unit + 11 pressure + 11 loop).

Contributing

Skill changes require a paired RED-then-GREEN pressure test that cites a failure-taxonomy row by number. Schema changes require a paired migration subcommand. New error codes must be registered in cli/lib/errors.js. The full discipline is in skills/authoring-skills/SKILL.md.

License

MIT.