Claude Forge Smith

Claude Forge Smith automates skill evolution for Claude Code using TDD principles.

English | 한국어

⚔️ Forge your skills into legendary weapons

TDD-powered automatic skill evolution for Claude Code


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔥 The Forging Process

Every legendary weapon starts as raw material. Through heat, strikes, and tempering, ordinary metal becomes extraordinary.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
graph LR
    A["⚙️ RAW<br/>SKILL"] -->|"🔥 HEAT"| B["🔍 ANALYZE<br/>Structure"]
    B -->|"🔨 STRIKE"| C["⚡ EVOLVE<br/>Refine"]
    C -->|"💧 TEMPER"| D["✅ VERIFY<br/>Tests"]
    D -->|"⚔️"| E["✨ LEGENDARY"]

    style A fill:#2D1810,stroke:#A0A0A0,stroke-width:2px,color:#A0A0A0
    style B fill:#1A0A00,stroke:#FF6B00,stroke-width:3px,color:#FFB800
    style C fill:#1A0A00,stroke:#FFB800,stroke-width:3px,color:#FFD700
    style D fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style E fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:4px

The Forge never rests — Each skill is heated in analysis, struck with improvements, tempered by tests, and emerges stronger.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Prerequisites

Before firing up the forge, ensure you have the required tools:

| Requirement | Version | Check |
|-------------|---------|-------|
| Bash | 4.0+ | `bash --version` |
| Git | 2.0+ | `git --version` |
| Python 3 | 3.6+ | `python3 --version` |
| bc | any | `which bc` |
| jq | 1.6+ | `jq --version` |
| Claude Code CLI | latest | `claude --version` |
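
The table above can be checked in one pass. The following is a minimal sketch, not part of Forge: it only confirms that each executable is on `PATH` and does not compare versions. The function name `missing_tools` and its `lookup` parameter are illustrative assumptions.

```python
import shutil

# Executable names taken from the "Check" column above; versions are not verified.
REQUIRED_TOOLS = ["bash", "git", "python3", "bc", "jq", "claude"]


def missing_tools(tools=REQUIRED_TOOLS, lookup=shutil.which):
    """Return the tools that `lookup` cannot find on PATH, in input order."""
    return [tool for tool in tools if lookup(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

The `lookup` parameter exists only so the check can be exercised without touching the real `PATH`.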

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CLAUDE_PLUGIN_ROOT` | (your plugin install directory) | Plugin installation path |
| `FORGE_EVALUATOR_CMD` | (built-in) | Custom evaluator script path |

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚡ Quick Start

# Install the forge
git clone https://github.com/quantsquirrel/claude-forge-smith.git \
  "$CLAUDE_PLUGIN_ROOT"

# Ignite the flames (run inside Claude Code)
/forge:forge --scan

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💎 Features

| 🔨 Forged in Fire | ⚡ Auto Evolution | 🛡️ Safe Trials | 📊 Triple Strike |
|-------------------|-------------------|-----------------|------------------|
| Every change tested | 3× evaluation consensus | Original preserved | 95% CI validation |

🔀 Dual Forging Paths (v1.0)

Skills can be forged through two methods depending on material quality:

| Path | Condition | Technique |
|------|-----------|-----------|
| ⚔️ TDD Forge | Test files exist | Statistical validation (95% CI) |
| 🔥 Pattern Forge | No tests | Usage patterns + heuristic analysis |

# Check forging method
source hooks/lib/storage-local.sh
get_upgrade_mode "my-skill"  # Returns: TDD_FIT or HEURISTIC
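
As a rough illustration of the rule in the table (tests present means TDD, otherwise heuristics), here is a minimal sketch. It is not the logic behind `get_upgrade_mode`: the function name `choose_upgrade_mode` and the file patterns used to recognise tests are assumptions made for this example.

```python
from pathlib import Path

# Assumption: a file counts as a test if its name matches one of these patterns.
TEST_PATTERNS = ("test_*", "*_test.*", "*.test.*")


def choose_upgrade_mode(skill_dir):
    """Return "TDD_FIT" if any test-like file exists under skill_dir, else "HEURISTIC"."""
    root = Path(skill_dir)
    for pattern in TEST_PATTERNS:
        # rglob searches the directory tree recursively; directories are ignored.
        if any(path.is_file() for path in root.rglob(pattern)):
            return "TDD_FIT"
    return "HEURISTIC"
```

The two return values mirror the ones documented for `get_upgrade_mode`; everything else is illustrative.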

📊 Forge Monitor (v1.0)

Track your weapons and see which need reforging:

/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]

Output:

╔══════════════════════════════════════════════════════════════════════╗
║                      🔥 Forge Monitor                                  ║
╠══════════════════════════════════════════════════════════════════════╣
║ Quality Analysis (quality-based, independent of usage)               ║
╠════════════════════════╤══════════╤═══════╤══════════╤═══════════════╣
║ Skill                  │ Type     │ Score │ Grade    │ Priority      ║
╠════════════════════════╪══════════╪═══════╪══════════╪═══════════════╣
║ omc:git-master         │ silent   │   45  │ C        │ [HIGH] ⚡     ║
║ forge:forge            │ explicit │   90  │ A        │ [READY] ✓     ║
╚════════════════════════╧══════════╧═══════╧══════════╧═══════════════╝

⚔️ Skill Type Detection (v1.0)

Skills are classified by how they're invoked:

| Type | Description | Quality Criteria |
|------|-------------|------------------|
| explicit | User invokes with `/command` | argument-hint, mode docs, examples |
| silent | Auto-triggered by context | trigger keywords, when-to-use, red-flags |

# Check skill type
source hooks/lib/storage-local.sh
get_skill_type "my-skill"  # Returns: explicit | silent
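
The per-type quality criteria can be read as a checklist. The sketch below assumes that reading; it is not how Forge scores skills, and `missing_criteria` and its `present` argument are hypothetical names. Only the criteria labels come from the table.

```python
# Criteria labels copied from the table above, keyed by skill type.
QUALITY_CRITERIA = {
    "explicit": ("argument-hint", "mode docs", "examples"),
    "silent": ("trigger keywords", "when-to-use", "red-flags"),
}


def missing_criteria(skill_type, present):
    """Return the criteria for skill_type that are absent from `present`, in table order."""
    return [item for item in QUALITY_CRITERIA[skill_type] if item not in present]
```

For example, `missing_criteria("explicit", {"examples"})` returns `["argument-hint", "mode docs"]`.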

📈 Quality-Based Recommendations (v1.0)

Core Principle: Usage ≠ Quality

The forge evaluates skills by structure, not popularity:

| Priority | Score | Action |
|----------|-------|--------|
| HIGH | < 40 | Immediate reforging needed |
| MED | 40-59 | Improvement recommended |
| LOW | 60-79 | Optional enhancement |
| READY | ≥ 80 | Quality assured |

# Get quality score
get_skill_quality_score "my-skill"
# Returns: JSON with score, breakdown, grade (A/B/C/D)
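
The score bands in the table translate directly into a lookup. This is a minimal sketch of that mapping, assuming the bands are half-open so that a fractional score such as 59.5 still falls in MED; the function name `priority_for_score` is hypothetical.

```python
def priority_for_score(score):
    """Map a 0-100 quality score to the priority bands in the table above."""
    if score < 40:
        return "HIGH"   # immediate reforging needed
    if score < 60:
        return "MED"    # improvement recommended
    if score < 80:
        return "LOW"    # optional enhancement
    return "READY"      # quality assured
```

For example, `priority_for_score(72)` returns `"LOW"` and `priority_for_score(90)` returns `"READY"`.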

🎖️ Legendary Grades (v1.0)

Exceptional weapons earn special marks:

| Enhancement | Bonus | Forged When |
|-------------|-------|-------------|
| Reforged | +1 | `upgraded: true` |
| Efficient | +0.5 | tokens/usage < 1500 |
| Hot Streak | +0.5 | positive trend |
| Tested | +0.5 | has test files |

S + Reforged + Efficient = ★★★ SSS LEGENDARY
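
The bonus column can be summed per skill. The sketch below shows only that arithmetic: the parameter names are hypothetical, `trend > 0` stands in for "positive trend", and how a bonus total maps onto grades such as SSS is not specified here, so it is left out.

```python
def enhancement_bonus(upgraded=False, tokens_per_usage=None, trend=0.0, has_tests=False):
    """Sum the bonuses from the table above for one skill.

    All parameter names are illustrative; only the thresholds and bonus
    values come from the table.
    """
    bonus = 0.0
    if upgraded:
        bonus += 1.0   # Reforged
    if tokens_per_usage is not None and tokens_per_usage < 1500:
        bonus += 0.5   # Efficient
    if trend > 0:
        bonus += 0.5   # Hot Streak
    if has_tests:
        bonus += 0.5   # Tested
    return bonus
```

A reforged, efficient skill, `enhancement_bonus(upgraded=True, tokens_per_usage=1200)`, scores a bonus of `1.5`.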

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🛡️ Trial Branch — The Safe Anvil

Master smiths never work directly on the masterpiece. They test on trial pieces first.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart TB
    subgraph MAIN["⚔️ main (Master Weapon)"]
        direction LR
        C1["v0.6<br/>71pts"]
        C2["v0.7<br/>90pts"]
        C1 -.-> C2
    end

    subgraph TRIAL["🔥 trial/skill-name (Testing Anvil)"]
        direction LR
        T1["🔨 Strike"]
        T2["🔨 Strike"]
        T3["🔨 Strike"]
        T4{"Worthy?"}
        T1 --> T2 --> T3 --> T4
    end

    C1 -->|"fork"| T1
    T4 -->|"✅ Stronger"| C2
    T4 -->|"❌ Brittle"| D["🗑️ Discard"]

    style C1 fill:#2D1810,stroke:#FFD700,stroke-width:2px,color:#FFD700
    style C2 fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style T1 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T2 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T3 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T4 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style D fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0

Safety First — The master weapon (main) is never touched until the trial proves worthy. Failed experiments are discarded, not merged.
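
The fork, strike, and merge-or-discard flow in the diagram corresponds to a handful of ordinary git commands. The sketch below only builds the command list and does not run git; the exact commands Forge issues are not documented here, so treat the sequence, the `--no-ff` merge, and the function name `trial_branch_commands` as assumptions. The `trial/<skill>` branch name and `main` come from the diagram.

```python
def trial_branch_commands(skill_name, worthy):
    """List the git commands for one trial: fork from main, then merge or discard.

    Returns argv lists instead of executing them, so the plan can be reviewed first.
    """
    branch = f"trial/{skill_name}"
    commands = [
        ["git", "checkout", "-b", branch, "main"],  # fork the trial from main
        # ... strikes (commits) happen on the trial branch here ...
        ["git", "checkout", "main"],                # return to the master weapon
    ]
    if worthy:
        commands.append(["git", "merge", "--no-ff", branch])  # promote the result
        commands.append(["git", "branch", "-d", branch])      # clean up merged branch
    else:
        commands.append(["git", "branch", "-D", branch])      # discard unmerged work
    return commands
```

Each entry can be passed to `subprocess.run(cmd, check=True)` once the plan looks right. Note that `main` is modified only in the `worthy` case.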

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔨 Triple Strike — The Smith's Consensus

A single hammer blow can deceive. Three strikes reveal the truth.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart LR
    subgraph STRIKE["🔨 Triple Strike Evaluation"]
        direction TB
        S1["🔨 Smith 1<br/>Score: 78"]
        S2["🔨 Smith 2<br/>Score: 81"]
        S3["🔨 Smith 3<br/>Score: 79"]
    end

    subgraph MEASURE["⚖️ Measure Quality"]
        direction TB
        M1["Mean: 79.3"]
        M2["95% Confidence"]
    end

    subgraph VERDICT["⚔️ Final Verdict"]
        V1{"Stronger than<br/>before?"}
        V1 -->|"YES"| ACCEPT["✅ REFORGE"]
        V1 -->|"NO"| REJECT["❌ DISCARD"]
    end

    STRIKE --> MEASURE --> VERDICT

    style S1 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S2 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S3 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style M1 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style M2 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style ACCEPT fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style REJECT fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0

Statistical Consensus — Three independent evaluations. Statistical confidence intervals. Only merge if the new version is provably superior.
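
To make the consensus step concrete, here is one way to turn three scores into a 95% confidence interval, using Student's t distribution with two degrees of freedom. How Forge computes its interval and what counts as "stronger" are not spelled out above, so both the t-based interval and the acceptance rule (lower bound above the previous score) are assumptions of this sketch.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 2 degrees of freedom (n = 3).
T_CRIT_DF2 = 4.303


def confidence_interval_95(scores):
    """Return (mean, lower, upper) for exactly three evaluation scores."""
    if len(scores) != 3:
        raise ValueError("this sketch hard-codes the t value for three scores")
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    margin = T_CRIT_DF2 * std_err
    return mean, mean - margin, mean + margin


def is_stronger(scores, baseline):
    """Accept only if the whole interval sits above the baseline score."""
    _, lower, _ = confidence_interval_95(scores)
    return lower > baseline


mean, lower, upper = confidence_interval_95([78, 81, 79])
print(f"mean={mean:.1f}, 95% CI=[{lower:.1f}, {upper:.1f}]")
# prints: mean=79.3, 95% CI=[75.5, 83.1]
```

With the scores from the diagram, `is_stronger([78, 81, 79], 71)` is `True`, while a baseline of 77 falls inside the interval and would be rejected. With only three samples the interval is wide, which is what makes the rule conservative.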

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Forging Results

Before: 71 points (raw, unrefined)
After: 90.33 points (tempered, legendary)

+27% improvement ((90.33 − 71) / 71 ≈ 27%): Forge reforged itself

The ultimate test: A tool that improves itself through its own process.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔒 Safety Mechanisms

Master smiths build in multiple safeguards:

| Safeguard | Protection |
|-----------|------------|
| 🔄 Rollback Ready | Original always preserved |
| 🔒 Isolated Trials | Test in separate branch |
| 📝 Full Logs | Every strike recorded |
| ⏱️ Iteration Limit | Maximum 6 attempts |
| ✅ Test Verification | All tests must pass |

No weapon leaves the forge untested. No master version is ever corrupted.
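
Taken together, the safeguards describe a bounded retry loop. The sketch below is one possible shape for it, not Forge's implementation: `improve`, `tests_pass`, and `score` are hypothetical callables supplied by the caller, and only the limit of six attempts comes from the table.

```python
MAX_ATTEMPTS = 6  # iteration limit from the table above


def forge(original, improve, tests_pass, score):
    """Try up to MAX_ATTEMPTS candidates; return the first that beats the original.

    Returns (skill, log). `skill` is the untouched original if no attempt
    both passes its tests and scores higher than the baseline.
    """
    baseline = score(original)
    log = []  # every strike is recorded as (attempt, passed, candidate_score)
    for attempt in range(1, MAX_ATTEMPTS + 1):
        candidate = improve(original, attempt)
        passed = tests_pass(candidate)
        candidate_score = score(candidate) if passed else None
        log.append((attempt, passed, candidate_score))
        if passed and candidate_score > baseline:
            return candidate, log
    return original, log  # rollback: the original is never modified
```

Because each candidate is derived from `original` and never written back, a failed run leaves the starting skill unchanged.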

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🚀 Commands

| Command | Action |
|---------|--------|
| `/forge:forge --scan` | 🔍 Scout for skills ready to reforge |
| `/forge:forge <skill>` | ⚡ Reforge a specific skill |
| `/forge:forge --history` | 📜 View forging chronicles |
| `/forge:forge --watch` | 👁️ Monitor the forge |
| `/forge:monitor` | 📊 Quality dashboard |
| `/forge:smelt` | 🔥 Skill creation with TDD methodology |

💡 Argument Hints (v1.0)

When typing a slash command, you'll see available modes:

/forge <skill-name> [--precision=high|-n5] - modes: TDD_FIT|HEURISTIC
/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]

Add argument-hint to your skill's frontmatter to enable this feature.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚙️ Configuration

Forge behavior can be customized via config/settings.env:

| Setting | Default | Description |
|---------|---------|-------------|
| `STORAGE_MODE` | `local` | Storage backend (currently only local supported) |
| `LOCAL_STORAGE_DIR` | `~/.claude/.skill-evaluator` | Local storage directory for skill data |
| `SKILL_EVAL_DEBUG` | `false` | Enable debug logging to stderr |

Example:

# Enable debug mode
export SKILL_EVAL_DEBUG=true

# Use custom storage location
export LOCAL_STORAGE_DIR="$HOME/.my-forge-data"
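
For scripts that need the same values, the defaults in the table can be combined with environment overrides as sketched below. This mirrors the table only; how Forge itself reads `config/settings.env` is not shown here, and `load_settings` is a hypothetical helper.

```python
import os

# Defaults copied from the settings table above.
DEFAULTS = {
    "STORAGE_MODE": "local",
    "LOCAL_STORAGE_DIR": "~/.claude/.skill-evaluator",
    "SKILL_EVAL_DEBUG": "false",
}


def load_settings(env=None):
    """Merge environment overrides onto the documented defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Expand a leading "~" so the path can be used directly.
    settings["LOCAL_STORAGE_DIR"] = os.path.expanduser(settings["LOCAL_STORAGE_DIR"])
    # Treat the flag as a boolean; anything other than "true" is off.
    settings["SKILL_EVAL_DEBUG"] = settings["SKILL_EVAL_DEBUG"].lower() == "true"
    return settings
```

Calling `load_settings()` with no argument reads the current process environment, so the two `export` lines above would be picked up.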

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔧 Troubleshooting

Common Issues

`bc: command not found`

# macOS
brew install bc

# Ubuntu/Debian
sudo apt-get install bc

# Fedora/RHEL
sudo dnf install bc

`jq: command not found`

# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Fedora/RHEL
sudo dnf install jq

`Permission denied` when running commands

# Make scripts executable
cd "$CLAUDE_PLUGIN_ROOT"
chmod +x hooks/*.sh
chmod +x bin/*

Plugin not detected by Claude Code

1. Check that the installation path matches `CLAUDE_PLUGIN_ROOT`.
2. Verify that `plugin.json` exists in the plugin root.
3. Restart the Claude Code CLI.
4. Run `/help` to see whether the Forge commands appear.

Forge evaluations fail silently

# Enable debug logging
export SKILL_EVAL_DEBUG=true

# Check storage directory exists
ls -la ~/.claude/.skill-evaluator

# Verify evaluator script is executable
ls -la "$CLAUDE_PLUGIN_ROOT/bin/skill-evaluator.py"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📚 The Theory Behind the Forge

- Gödel Machines (Schmidhuber 2007): self-referential systems that can improve their own code
- Dynamic Adaptation: incremental evolution with statistical validation
- TDD Safety Boundaries: tests prevent catastrophic self-modification
- Multi-Evaluator Consensus: multiple independent judges reduce bias

Read the full theory →

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Inspired by skill-up

⚒️ Forged with Claude Code · 🔥 MIT License · ⚔️ v1.0

This project is not affiliated with or endorsed by Anthropic. Claude and Claude Code are trademarks of Anthropic PBC.

