Structured development workflows for Claude Code to enhance coding discipline and reduce errors.

forge

Structured development workflows for Claude Code. Stop coding without a plan, stop debugging in the dark, stop shipping unverified work.

The problem

LLMs coding without structure produce scope creep, untested code, lost context between sessions, and CLAUDE.md files that balloon into unreadable walls of text. forge enforces discipline through phased workflows, persistent context tracking, and TDD. Every phase has entry criteria, exit criteria, and a concrete deliverable.

Andrej Karpathy called it: LLMs will confidently generate plausible-looking code that is overcomplicated, undertested, and drifts from the original goal. forge is the guardrail layer that prevents that.

Inspiration

forge wouldn't exist without obra/superpowers by Jesse Vincent. obra was the first to prove that disciplined development workflows could be enforced through Claude Code skill instructions alone: phases, TDD, and guardrails composed into a coherent agent experience. forge takes that idea, strips external dependencies, and ships everything as a single self-contained package with context persisted on the local filesystem.

The 8 phases

brainstorm -> plan -> execute -> review -> verify -> finish
                        |                    ^
                      debug -----------------'

security audit (on-demand, anytime)

Brainstorm: Refine ideas through Socratic questioning. No code allowed.
Plan: Break the design into verifiable tasks with acceptance criteria.
Execute: Implement via TDD. Parallel agents dispatched for independent tasks.
Debug: Systematic root-cause analysis when something breaks.
Review: Code review before merge, even for self-written code.
Verify: Evidence before claims. Terminal output as proof.
Finish: Merge, cleanup, close session, archive tasks.
Security Audit: On-demand pre-deploy security review.
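This section does not say how forge decides which tasks are independent. One common approach assumes each planned task lists the task IDs it depends on. The standard library's graphlib can then group tasks into waves, and every task in a wave could be dispatched in parallel. The following is a hypothetical sketch: dispatch_waves and the task IDs are illustrative, not part of forge.

```python
from graphlib import TopologicalSorter

def dispatch_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave only depends on earlier waves."""
    sorter = TopologicalSorter(deps)  # maps task -> tasks it depends on
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        # A real dispatcher would hand each ready task to an agent here;
        # this sketch just marks the whole wave as finished.
        sorter.done(*ready)
    return waves

plan = {
    "T3-cli": {"T1-parser", "T2-writer"},
    "T2-writer": {"T1-parser"},
    "T4-docs": set(),
}
print(dispatch_waves(plan))  # [['T1-parser', 'T4-docs'], ['T2-writer'], ['T3-cli']]
```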

What you get

Phases that don't drift. Each phase has entry criteria, exit criteria, and a concrete deliverable. No vibes-based execution.
TDD that's actually enforced. Tests come first or the agent stops. The methodology file lists the common rationalizations for skipping test-first development and refutes each one.
Context that lives on disk. Your design, plan, decisions, and task history persist in ~/.forge/. Survives /compact, survives session restarts, ready to reload at any time.
Guardrails injected into every agent prompt. The 4 Karpathy-derived principles are part of every parallel task dispatch.
CLAUDE.md stays lean. A PreToolUse hook blocks writes that would push CLAUDE.md past 30 lines (the default limit).
Zero dependencies. Python 3.10+ standard library only. No PyYAML, no SSH, no external services.

Quick install

git clone https://github.com/gqueiroz13/forge.git
cd forge
python3 setup.py

Requires Python 3.10+. No other dependencies.

The setup script copies skill files to ~/.claude/skills/forge/, installs the CLAUDE.md protection hook, and creates the ~/.forge/ data directory.
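The three setup steps need only the standard library. The following is a minimal sketch, not forge's setup.py. It assumes the hook is registered in Claude Code's user-level ~/.claude/settings.json using the documented hooks layout. The install function, the hook_command argument, and the "Write|Edit" matcher are all hypothetical.

```python
import json
import shutil
from pathlib import Path

def install(repo_skills: Path, home: Path, hook_command: str) -> None:
    # 1. Copy skill files; dirs_exist_ok lets a re-run copy over an existing install.
    shutil.copytree(repo_skills, home / ".claude" / "skills" / "forge", dirs_exist_ok=True)

    # 2. Register the guard as a PreToolUse hook (assumed settings.json layout),
    #    keeping any settings and hooks that are already present.
    settings_path = home / ".claude" / "settings.json"
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    pre_tool_use = settings.setdefault("hooks", {}).setdefault("PreToolUse", [])
    entry = {"matcher": "Write|Edit", "hooks": [{"type": "command", "command": hook_command}]}
    if entry not in pre_tool_use:  # idempotent: re-running setup adds no duplicate
        pre_tool_use.append(entry)
    settings_path.write_text(json.dumps(settings, indent=2))

    # 3. Create the data directory.
    (home / ".forge" / "projects").mkdir(parents=True, exist_ok=True)
```

Making every step idempotent means a user who keeps the clone and runs git pull can re-run setup safely.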

After setup completes, the cloned repo is no longer needed. You can delete it, or keep it to git pull future updates and re-run python3 setup.py. forge itself lives entirely in ~/.claude/skills/forge/ and ~/.forge/.

Quick start

New project

/forge brainstorm "a CLI tool that converts CSV to JSON with streaming support"

forge will ask clarifying questions one at a time, challenge your assumptions, then produce a design document. Once you approve:

/forge plan       # Creates tasks with acceptance criteria
/forge execute    # Implements via TDD, dispatching parallel agents
/forge verify     # Requires terminal output as proof of correctness

Existing project

/forge adopt

forge scans your codebase, populates a project index, and starts a session. From there, use /forge brainstorm for new features or /forge execute to pick up pending tasks.

Resuming work (the most important command)

This is where forge earns its keep. Open Claude Code in any project, type:

/forge status

If no session is active, you get a list of every project, the count of pending tasks per project, and the most recent session. Pick one and run /forge start <project>.

If a session is active, you get the project name, session duration, the in-progress tasks (with IDs and titles), and the top of the backlog. The LLM now has full context: where you stopped, what is half-done, what is queued.

Combined with the lean CLAUDE.md policy, this is how forge solves the context-loss problem. Your project knowledge does not live in the conversation. It lives on disk in ~/.forge/. /compact cannot erase it. Closing Claude Code cannot erase it. Rebooting cannot erase it. /forge status is the bridge that brings it back into the conversation, every time.

You will use /forge status more often than any other command.

See it in action

examples/forge-itself/ contains real artifacts produced during the development of forge: the design document, the plan, code review and verification reports, selected task files, and the living index.md. Every file there was written by an LLM following the same phased workflow forge enforces.

Start with the README in that directory for orientation, or jump straight to index.md to see what a project's living reference looks like in practice.

Philosophy

Index as living reference

The project index (~/.forge/projects/<project>/index.md) is the source of truth for how a system works: architecture, key decisions, conventions, and boundaries. It is updated as the system evolves, and it is the one file you read to understand a project.

Tasks are ephemeral

Completed tasks do not accumulate. Each one is closed with either --discard (routine, operational work) or --into-index (architectural knowledge worth preserving). This keeps the task directory lean: knowledge that matters migrates to the index, and everything else is thrown away.

Evidence before claims

The verification phase requires terminal output as proof. Tests passing, endpoints responding, migrations running. "It should work" is not evidence. If you cannot show it, it is not done.
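One way to make "terminal output as proof" concrete is to record the exact command, its exit code, and its output together, so the claim can be checked later. The following is a minimal sketch. capture_evidence and its report layout are hypothetical, not forge's verification report format.

```python
import shlex
import subprocess
import sys
from datetime import datetime, timezone

def capture_evidence(cmd: list[str]) -> str:
    """Run a command and return a report: command, timestamp, exit code, output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    verdict = "PASS" if result.returncode == 0 else "FAIL"
    return (
        f"$ {shlex.join(cmd)}\n"
        f"# {stamp}  exit={result.returncode}  {verdict}\n"
        f"{result.stdout}{result.stderr}"
    )

print(capture_evidence([sys.executable, "-c", "print('3 passed')"]))
```

The verdict comes from the exit code, not from anyone's reading of the output, which is the point of "it should work" not counting.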

The 4 guardrails

Each guardrail targets a common failure mode of LLMs writing code.

Think Before Coding - State assumptions explicitly. If multiple interpretations exist, present them. Do not choose silently.
Simplicity First - Minimum code that solves the problem. No speculative abstractions, no gold-plating, no "flexibility" that was not requested.
Surgical Changes - Touch only what you must. Do not "improve" adjacent code. Every changed line must trace directly to the task.
Goal-Driven Execution - Transform tasks into verifiable objectives. Define success criteria, then loop until verified.
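These four principles are injected into every parallel task dispatch. One plausible way to do that is a fixed preamble prepended to each task prompt. The following is a hypothetical sketch: the exact wording, build_task_prompt, and the prompt layout are illustrative, not forge's actual template.

```python
GUARDRAILS = (
    "Think Before Coding: state assumptions explicitly; if multiple interpretations exist, present them.",
    "Simplicity First: write the minimum code that solves the problem; no speculative abstractions.",
    "Surgical Changes: touch only what you must; every changed line must trace to the task.",
    "Goal-Driven Execution: define verifiable success criteria, then loop until verified.",
)

def build_task_prompt(task_id: str, title: str, acceptance: list[str]) -> str:
    """Compose the prompt for one parallel task, guardrails first."""
    lines = ["Guardrails (apply to every change):"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(GUARDRAILS, start=1)]
    lines += ["", f"Task {task_id}: {title}", "Acceptance criteria:"]
    lines += [f"- {criterion}" for criterion in acceptance]
    return "\n".join(lines)

print(build_task_prompt("T2", "Stream rows to JSON", ["Memory use stays flat on large CSV input"]))
```

Putting the guardrails in code rather than relying on each agent to remember them means no dispatched task can start without them.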

CLAUDE.md policy

forge keeps CLAUDE.md lean: by default it is capped at 30 lines, and a PreToolUse hook blocks any write that would exceed the limit.

CLAUDE.md is a pointer, not a knowledge repository. It tells Claude Code where to find context. The actual knowledge lives in ~/.forge/projects/<project>/ where it can be structured, searched, and maintained.

Configurable in ~/.forge/config.yaml:

claude_md_guard:
  enabled: true
  max_lines: 30

Set enabled: false to disable the hook entirely, or raise max_lines if your project legitimately needs more.

Configuration

All configuration lives in ~/.forge/config.yaml:

# Agent dispatch strategy for parallel tasks
# "agent" = Claude Code Agent tool (default, no dependencies)
# "cmux"  = cmux terminal multiplexer (requires cmux + bundled cmux skill)
dispatch: agent

# CLAUDE.md protection hook
claude_md_guard:
  enabled: true
  max_lines: 30
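Because forge ships without PyYAML, this file has to be read with the standard library. The following is a minimal sketch that assumes config.yaml only uses the flat layout above: top-level scalars plus one level of two-space-indented keys. parse_config is hypothetical and is not necessarily how forge reads its config.

```python
def parse_config(text: str) -> dict:
    """Parse the two-level key: value subset used by ~/.forge/config.yaml.

    Not a YAML parser: handles comments, top-level keys, one level of
    two-space-indented nested keys, and bool/int/string scalars only.
    """
    def scalar(raw: str):
        raw = raw.strip().strip('"').strip("'")
        if raw in ("true", "false"):
            return raw == "true"
        return int(raw) if raw.isdigit() else raw

    config: dict = {}
    section = None  # the dict currently receiving indented keys
    for line in text.splitlines():
        content = line.split("#", 1)[0].rstrip()  # drop comments
        if not content.strip():
            continue
        key, _, value = content.strip().partition(":")
        if line.startswith("  ") and section is not None:
            section[key] = scalar(value)
        elif value.strip():
            config[key] = scalar(value)
            section = None
        else:
            section = config[key] = {}  # "key:" with no value opens a section
    return config
```

Restricting the format this tightly is what makes a stdlib-only parser viable. Anything richer, such as lists, multi-line strings, or a # inside a quoted value, would need a real YAML parser.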

cmux dispatch (opt-in)

forge bundles a cmux Claude Code skill but only installs it if you choose dispatch: cmux during setup. cmux dispatch requires:

The cmux terminal multiplexer installed and used as your active terminal
The bundled cmux skill installed at ~/.claude/skills/cmux/ (setup.py copies it for you when you opt in)

If you select cmux during setup but the cmux binary is not found in PATH, setup.py warns you and still installs the skill. Install cmux later, then re-run setup.py to confirm. To switch back to the Agent tool, edit ~/.forge/config.yaml and set dispatch: agent.
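The PATH check described above maps directly onto shutil.which from the standard library. The following is a minimal sketch. check_cmux_dispatch and its warning text are hypothetical, not copied from setup.py.

```python
import shutil

def check_cmux_dispatch(dispatch: str) -> list[str]:
    """Return setup warnings for the chosen dispatch mode (empty list = nothing to report)."""
    warnings = []
    if dispatch == "cmux" and shutil.which("cmux") is None:
        # Warn but do not abort: the skill is still installed so cmux can be added later.
        warnings.append(
            "cmux not found in PATH; the cmux skill will still be installed. "
            "Install cmux, then re-run setup.py."
        )
    return warnings
```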

Data directory

~/.forge/
  config.yaml
  projects/
    <project>/
      index.md           # Living reference
      tasks/             # Active tasks
      sessions/          # Session logs
      archive/           # Completed tasks

Each project gets its own directory. The index is the permanent record. Tasks come and go. Sessions are logged for continuity between conversations.
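Creating that layout for a new project takes only a few mkdir calls. The following is a minimal sketch. init_project and its one-line index.md seed are hypothetical, not what forge itself writes.

```python
from pathlib import Path

def init_project(forge_root: Path, name: str) -> Path:
    """Create projects/<name>/ with tasks/, sessions/, archive/ and index.md; safe to re-run."""
    project = forge_root / "projects" / name
    for sub in ("tasks", "sessions", "archive"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    index = project / "index.md"
    if not index.exists():  # never overwrite the living reference
        index.write_text(f"# {name}\n")
    return project
```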

Commands reference

| Command | Description |
|---|---|
| `/forge`, `/forge help` | Show available commands and current status |
| `/forge brainstorm <idea>` | Start brainstorming a new idea |
| `/forge plan` | Create implementation plan from approved design |
| `/forge execute` | Start executing the plan via TDD |
| `/forge debug` | Enter systematic debugging mode |
| `/forge review` | Run code review before merge |
| `/forge verify` | Run verification with evidence requirement |
| `/forge finish` | Merge, cleanup, close session, archive tasks |
| `/forge audit` | Run on-demand security audit |
| `/forge adopt` | Onboard an existing project into forge |
| `/forge status` | Show current project, phase, and pending tasks |

Contributing

Fork the repository
Create a feature branch
Write tests first (TDD is not optional)
Submit a pull request

forge eats its own dog food. Use /forge to develop forge.

License

MIT. See LICENSE.