forge
Structured development workflows for Claude Code to enhance coding discipline and reduce errors.
Structured development workflows for Claude Code. Stop coding without a plan, stop debugging in the dark, stop shipping unverified work.
The problem
LLMs coding without structure produce scope creep, untested code, lost context between sessions, and CLAUDE.md files that balloon into unreadable walls of text. forge enforces discipline through phased workflows, persistent context tracking, and TDD. Every phase has entry criteria, exit criteria, and a concrete deliverable.
Andrej Karpathy called it: LLMs will confidently generate plausible-looking code that is overcomplicated, undertested, and drifts from the original goal. forge is the guardrail layer that prevents that.
Inspiration
forge wouldn't exist without obra/superpowers by Jesse Vincent. obra was the first to prove that disciplined development workflows could be enforced through Claude Code skill instructions alone: phases, TDD, and guardrails composed into a coherent agent experience. forge takes that idea, strips external dependencies, and ships everything as a single self-contained package with context persisted on the local filesystem.
The 8 phases
brainstorm -> plan -> execute -> review -> verify -> finish
                        |                      ^
                        debug -----------------'
security audit (on-demand, anytime)
- Brainstorm: Refine ideas through Socratic questioning. No code allowed.
- Plan: Break the design into verifiable tasks with acceptance criteria.
- Execute: Implement via TDD. Parallel agents dispatched for independent tasks.
- Debug: Systematic root-cause analysis when something breaks.
- Review: Code review before merge, even for self-written code.
- Verify: Evidence before claims. Terminal output as proof.
- Finish: Merge, cleanup, close session, archive tasks.
- Security Audit: On-demand pre-deploy security review.
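As a rough illustration of the main pipeline order listed above, the sketch below models the six linear phases and the step from one to the next. It is not forge's implementation (forge is driven by skill instruction files); the names `Phase`, `PIPELINE`, and `next_phase` are hypothetical, and the on-demand debug and security audit phases are deliberately left out of the linear order.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The six linear phases from the workflow diagram (hypothetical model)."""
    BRAINSTORM = "brainstorm"
    PLAN = "plan"
    EXECUTE = "execute"
    REVIEW = "review"
    VERIFY = "verify"
    FINISH = "finish"


# Main pipeline order, taken from the diagram; debug and security audit
# are on-demand phases and are not part of this list.
PIPELINE = [
    Phase.BRAINSTORM,
    Phase.PLAN,
    Phase.EXECUTE,
    Phase.REVIEW,
    Phase.VERIFY,
    Phase.FINISH,
]


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after finish."""
    i = PIPELINE.index(current)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None
```

In a model like this, a phase's exit criteria would gate the call to `next_phase`; how forge actually checks those criteria is not shown here.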
What you get
- Phases that don't drift. Each phase has entry criteria, exit criteria, and a concrete deliverable. No vibes-based execution.
- TDD that's actually enforced. Tests come first or the agent stops. The methodology file lists every common rationalization and refutes it.
- Context that lives on disk. Your design, plan, decisions, and task history persist in ~/.forge/. Survives /compact, survives session restarts, ready to reload at any time.
- Guardrails injected into every agent prompt. The 4 Karpathy-derived principles are part of every parallel task dispatch.
- CLAUDE.md stays lean. A PreToolUse hook blocks bloat at 30 lines by default.
- Zero dependencies. Python 3.10+ standard library only. No PyYAML, no SSH, no external services.
Quick install
git clone https://github.com/gqueiroz13/forge.git
cd forge
python3 setup.py
Requires Python 3.10+. No other dependencies.
The setup script copies skill files to ~/.claude/skills/forge/, installs the
CLAUDE.md protection hook, and creates the ~/.forge/ data directory.
After setup completes, the cloned repo is no longer needed. You can delete it,
or keep it to git pull future updates and re-run python3 setup.py. forge
itself lives entirely in ~/.claude/skills/forge/ and ~/.forge/.
Quick start
New project
/forge brainstorm "a CLI tool that converts CSV to JSON with streaming support"
forge will ask clarifying questions one at a time, challenge your assumptions, then produce a design document. Once you approve:
/forge plan # Creates tasks with acceptance criteria
/forge execute # Implements via TDD, dispatching parallel agents
/forge verify # Requires terminal output as proof of correctness
Existing project
/forge adopt
forge scans your codebase, populates a project index, and starts a session.
From there, use /forge brainstorm for new features or /forge execute to
pick up pending tasks.
Resuming work (the most important command)
This is where forge earns its keep. Open Claude Code in any project, type:
/forge status
If no session is active, you get a list of every project, the count of pending
tasks per project, and the most recent session. Pick one and run
/forge start <project>.
If a session is active, you get the project name, session duration, the in-progress tasks (with IDs and titles), and the top of the backlog. The LLM now has full context: where you stopped, what is half-done, what is queued.
Combined with the lean CLAUDE.md policy, this is how forge solves the
context-loss problem. Your project knowledge does not live in the conversation.
It lives on disk in ~/.forge/. /compact cannot erase it. Closing Claude Code
cannot erase it. Rebooting cannot erase it. /forge status is the bridge that
brings it back into the conversation, every time.
You will use /forge status more often than any other command.
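The per-project summary that /forge status shows when no session is active could be computed roughly as follows. This is a sketch under assumptions, not forge's code: it assumes every entry inside a project's tasks/ directory is one pending task, and the function name `pending_tasks_per_project` is hypothetical.

```python
from pathlib import Path


def pending_tasks_per_project(forge_root: Path) -> dict[str, int]:
    """Count entries in each project's tasks/ directory.

    Assumption: one entry in tasks/ equals one pending task.
    """
    counts: dict[str, int] = {}
    projects_dir = forge_root / "projects"
    if not projects_dir.is_dir():
        return counts
    for project in sorted(projects_dir.iterdir()):
        if not project.is_dir():
            continue
        tasks_dir = project / "tasks"
        if tasks_dir.is_dir():
            counts[project.name] = sum(1 for _ in tasks_dir.iterdir())
        else:
            counts[project.name] = 0
    return counts
```

Calling it with `Path.home() / ".forge"` would read the data directory described in this README.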
See it in action
examples/forge-itself/ contains real artifacts
produced during the development of forge: the design document, the plan, code
review and verification reports, selected task files, and the living
index.md. Every file there was written by an LLM following the same phased
workflow forge enforces.
Start with the explanatory README for
orientation, or jump straight to index.md
to see what a project's living reference looks like in practice.
Philosophy
Index as living reference
The project index (~/.forge/projects/<project>/index.md) is the source of
truth for how a system works. Architecture, key decisions, conventions,
boundaries. It gets updated as the system evolves. It is the one file you
read to understand a project.
Tasks are ephemeral
Completed tasks do not accumulate. They go to --discard (routine, operational)
or --into-index (architectural, worth preserving). The task directory stays
lean. Knowledge that matters migrates to the index. Everything else is thrown
away.
Evidence before claims
The verification phase requires terminal output as proof. Tests passing, endpoints responding, migrations running. "It should work" is not evidence. If you cannot show it, it is not done.
The 4 guardrails
Derived from common LLM failure modes when programming.
- Think Before Coding: State assumptions explicitly. If multiple interpretations exist, present them. Do not choose silently.
- Simplicity First: Minimum code that solves the problem. No speculative abstractions, no gold-plating, no "flexibility" that was not requested.
- Surgical Changes: Touch only what you must. Do not "improve" adjacent code. Every changed line must trace directly to the task.
- Goal-Driven Execution: Transform tasks into verifiable objectives. Define success criteria, then loop until verified.
CLAUDE.md policy
forge enforces a lean CLAUDE.md. By default, 30 lines maximum, enforced by a PreToolUse hook that blocks writes exceeding the limit.
CLAUDE.md is a pointer, not a knowledge repository. It tells Claude Code where
to find context. The actual knowledge lives in ~/.forge/projects/<project>/
where it can be structured, searched, and maintained.
Configurable in ~/.forge/config.yaml:
claude_md_guard:
  enabled: true
  max_lines: 30
Set enabled: false to disable the hook entirely, or raise max_lines if your
project legitimately needs more.
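The core decision such a guard has to make is small. The sketch below shows one plausible form of the check, assuming a write is blocked only when it exceeds max_lines (so exactly 30 lines passes with the default). The function name `should_block_write` is hypothetical, and the wiring into Claude Code's PreToolUse hook is not shown.

```python
def should_block_write(content: str, max_lines: int = 30, enabled: bool = True) -> bool:
    """Return True when proposed CLAUDE.md content exceeds the line limit.

    Mirrors the two config keys described above: `enabled` and `max_lines`.
    """
    if not enabled:
        # Guard switched off: never block.
        return False
    return len(content.splitlines()) > max_lines
```

For example, `should_block_write("line\n" * 31)` is true, while the same call with `enabled=False` is false.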
Configuration
All configuration lives in ~/.forge/config.yaml:
# Agent dispatch strategy for parallel tasks
# "agent" = Claude Code Agent tool (default, no dependencies)
# "cmux" = cmux terminal multiplexer (requires cmux + bundled cmux skill)
dispatch: agent

# CLAUDE.md protection hook
claude_md_guard:
  enabled: true
  max_lines: 30
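Since forge avoids PyYAML, a config this small can be read with the standard library alone. The following is a minimal sketch of how that might look, not forge's actual parser: it handles only comments, top-level `key: value` pairs, and one level of indented nested keys, and the names `parse_simple_yaml` and `_scalar` are hypothetical.

```python
def parse_simple_yaml(text: str) -> dict:
    """Parse top-level `key: value` pairs and one level of nested mappings.

    Limitations of this sketch: no lists, no multi-line values, and a '#'
    inside a quoted value would be treated as a comment.
    """
    result: dict = {}
    section = None  # nested mapping currently being filled, if any
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip()
        indented = line[0] in " \t"
        if not indented:
            if value == "":
                # "key:" with no value opens a nested mapping.
                section = result[key] = {}
            else:
                section = None
                result[key] = _scalar(value)
        elif section is not None:
            section[key] = _scalar(value)
    return result


def _scalar(value: str):
    """Convert true/false and plain integers; return other values as strings."""
    if value in ("true", "false"):
        return value == "true"
    if value.isdigit():
        return int(value)
    return value.strip("\"'")
```

Applied to the configuration above, this yields a dict with a `dispatch` string and a nested `claude_md_guard` mapping.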
cmux dispatch (opt-in)
forge bundles a cmux Claude Code skill but only installs it if you choose
dispatch: cmux during setup. cmux dispatch requires:
- The cmux terminal multiplexer installed and used as your active terminal
- The bundled cmux skill installed at ~/.claude/skills/cmux/ (setup.py copies it for you when you opt in)
If you select cmux during setup but the cmux binary is not found in PATH,
setup.py warns you and still installs the skill. Install cmux later, then
re-run setup.py to confirm. To switch back to the Agent tool, edit
~/.forge/config.yaml and set dispatch: agent.
Data directory
~/.forge/
  config.yaml
  projects/
    <project>/
      index.md      # Living reference
      tasks/        # Active tasks
      sessions/     # Session logs
      archive/      # Completed tasks
Each project gets its own directory. The index is the permanent record. Tasks come and go. Sessions are logged for continuity between conversations.
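To make the layout concrete, here is a small sketch that creates the per-project structure with `pathlib`. It assumes tasks/, sessions/, and archive/ sit next to index.md inside each project directory; the function name `create_project_dirs` and the placeholder index content are hypothetical, not what forge's setup actually writes.

```python
from pathlib import Path


def create_project_dirs(forge_root: Path, project: str) -> Path:
    """Create projects/<project>/ with its index and subdirectories."""
    project_dir = forge_root / "projects" / project
    for sub in ("tasks", "sessions", "archive"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    index = project_dir / "index.md"
    if not index.exists():
        # Placeholder heading only; the real index content is up to forge.
        index.write_text(f"# {project}\n", encoding="utf-8")
    return project_dir
```

The function is idempotent: running it again leaves an existing index.md untouched.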
Commands reference
| Command | Description |
|---|---|
| /forge | Show available commands and current status |
| /forge help | Show available commands and current status |
| /forge brainstorm <idea> | Start brainstorming a new idea |
| /forge plan | Create implementation plan from approved design |
| /forge execute | Start executing the plan via TDD |
| /forge debug | Enter systematic debugging mode |
| /forge review | Run code review before merge |
| /forge verify | Run verification with evidence requirement |
| /forge finish | Merge, cleanup, close session, archive tasks |
| /forge audit | Run on-demand security audit |
| /forge adopt | Onboard an existing project into forge |
| /forge status | Show current project, phase, and pending tasks |
Contributing
- Fork the repository
- Create a feature branch
- Write tests first (TDD is not optional)
- Submit a pull request
forge eats its own dog food. Use /forge to develop forge.
License
MIT. See LICENSE.