Claude Forge Smith

Claude Forge Smith automates skill evolution for Claude Code using TDD principles.


<div align="center"> <img src="docs/assets/forge.jpeg" alt="Forge" width="600"/> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=700&size=42&duration=3000&pause=1000&color=FFD700&center=true&vCenter=true&width=500&lines=forge"> <img alt="forge" src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=700&size=42&duration=3000&pause=1000&color=FF6B00&center=true&vCenter=true&width=500&lines=forge"> </picture>

⚔️ Forge your skills into legendary weapons


TDD-powered automatic skill evolution for Claude Code

</div>

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔥 The Forging Process

Every legendary weapon starts as raw material. Through heat, strikes, and tempering, ordinary metal becomes extraordinary.

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
graph LR
    A["⚙️ RAW<br/>SKILL"] -->|"🔥 HEAT"| B["🔍 ANALYZE<br/>Structure"]
    B -->|"🔨 STRIKE"| C["⚡ EVOLVE<br/>Refine"]
    C -->|"💧 TEMPER"| D["✅ VERIFY<br/>Tests"]
    D -->|"⚔️"| E["✨ LEGENDARY"]

    style A fill:#2D1810,stroke:#A0A0A0,stroke-width:2px,color:#A0A0A0
    style B fill:#1A0A00,stroke:#FF6B00,stroke-width:3px,color:#FFB800
    style C fill:#1A0A00,stroke:#FFB800,stroke-width:3px,color:#FFD700
    style D fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style E fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:4px
```

The Forge never rests: each skill is heated in analysis, struck with improvements, tempered by tests, and emerges stronger.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Prerequisites

Before firing up the forge, ensure you have the required tools:

| Requirement | Version | Check |
|---|---|---|
| Bash | 4.0+ | `bash --version` |
| Git | 2.0+ | `git --version` |
| Python 3 | 3.6+ | `python3 --version` |
| bc | any | `which bc` |
| jq | 1.6+ | `jq --version` |
| Claude Code CLI | latest | `claude --version` |
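A quick way to confirm the table before installing is a small pre-flight script. This is a hypothetical sketch (`require_bash`, `require_cmd` and `preflight` are invented names, not part of Forge): it checks that each tool is on `PATH` and enforces the Bash and Python minimums, but it does not compare Git or jq versions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the prerequisites table; not a Forge script.

# Fail unless the running bash has at least the given major version.
require_bash() {
  local min_major="$1"
  if (( BASH_VERSINFO[0] < min_major )); then
    echo "bash >= ${min_major}.0 required (found ${BASH_VERSION})" >&2
    return 1
  fi
}

# Fail unless a command is on PATH (presence only; versions are not compared).
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

# Run every check and report all problems instead of stopping at the first one.
preflight() {
  local status=0 cmd
  require_bash 4 || status=1
  for cmd in git python3 bc jq claude; do
    require_cmd "$cmd" || status=1
  done
  # Python 3.6+: sys.exit(True) exits with status 1 when the interpreter is too old.
  if command -v python3 >/dev/null 2>&1 &&
     ! python3 -c 'import sys; sys.exit(sys.version_info < (3, 6))'; then
    echo "python3 >= 3.6 required" >&2
    status=1
  fi
  return "$status"
}
```

Typical use is `preflight && echo "The forge is ready"`; every missing tool is reported on stderr so one run lists everything that still needs installing.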

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `CLAUDE_PLUGIN_ROOT` | (your plugin install directory) | Plugin installation path |
| `FORGE_EVALUATOR_CMD` | (built-in) | Custom evaluator script path |
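The two variables can be read as a fallback chain: a custom `FORGE_EVALUATOR_CMD` wins, otherwise the built-in evaluator under `CLAUDE_PLUGIN_ROOT` is used. This precedence is an assumption, `resolve_evaluator` is a hypothetical helper, and the built-in path `bin/skill-evaluator.py` is taken from the Troubleshooting section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the evaluator Forge would run, preferring the override.
resolve_evaluator() {
  if [ -n "${FORGE_EVALUATOR_CMD:-}" ]; then
    # Custom evaluator explicitly configured by the user
    echo "$FORGE_EVALUATOR_CMD"
  elif [ -n "${CLAUDE_PLUGIN_ROOT:-}" ]; then
    # Built-in evaluator shipped inside the plugin directory
    echo "$CLAUDE_PLUGIN_ROOT/bin/skill-evaluator.py"
  else
    echo "CLAUDE_PLUGIN_ROOT is not set" >&2
    return 1
  fi
}
```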

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚡ Quick Start

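```bash
# Install the forge (CLAUDE_PLUGIN_ROOT must already point to your plugin install directory)
git clone https://github.com/quantsquirrel/claude-forge-smith.git \
  "$CLAUDE_PLUGIN_ROOT"
```

Then ignite the flames from inside Claude Code:

```text
/forge:forge --scan
```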

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💎 Features

| 🔨 Forged in Fire | ⚡ Auto Evolution | 🛡️ Safe Trials | 📊 Triple Strike |
|:---:|:---:|:---:|:---:|
| Every change tested | 3× evaluation consensus | Original preserved | 95% CI validation |

🔀 Dual Forging Paths (v1.0)

Skills are forged by one of two methods, depending on whether the skill has test files:

| Path | Condition | Technique |
|---|---|---|
| ⚔️ TDD Forge | Test files exist | Statistical validation (95% CI) |
| 🔥 Pattern Forge | No tests | Usage patterns + heuristic analysis |

```bash
# Check forging method (paths are relative to the plugin root)
cd "$CLAUDE_PLUGIN_ROOT"
source hooks/lib/storage-local.sh
get_upgrade_mode "my-skill"  # Returns: TDD_FIT or HEURISTIC
```
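The decision between the two paths can be sketched as "does the skill directory contain any test files?". This is a hypothetical illustration, not the body of `get_upgrade_mode`: `detect_upgrade_mode` takes a directory rather than a skill name, and treating any file whose name contains `test` (or ends in `.bats`) as a test file is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical: pick the forging path for a skill directory.
detect_upgrade_mode() {
  local skill_dir="$1" hit
  if [ ! -d "$skill_dir" ]; then
    echo "no such skill directory: $skill_dir" >&2
    return 1
  fi
  # Any regular file that looks like a test is enough to choose the TDD path.
  hit=$(find "$skill_dir" -type f \( -name '*test*' -o -name '*.bats' \) | head -n 1)
  if [ -n "$hit" ]; then
    echo TDD_FIT     # tests exist: statistical validation is possible
  else
    echo HEURISTIC   # no tests: fall back to usage patterns and heuristics
  fi
}
```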

📊 Forge Monitor (v1.0)

Track your weapons and see which need reforging:

```text
/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]
```

Output:

```text
╔══════════════════════════════════════════════════════════════════════╗
║                      🔥 Forge Monitor                                ║
╠══════════════════════════════════════════════════════════════════════╣
║ Quality Analysis (quality-based, independent of usage)               ║
╠════════════════════════╤══════════╤═══════╤══════════╤═══════════════╣
║ Skill                  │ Type     │ Score │ Grade    │ Priority      ║
╠════════════════════════╪══════════╪═══════╪══════════╪═══════════════╣
║ omc:git-master         │ silent   │   45  │ C        │ [HIGH] ⚡     ║
║ forge:forge            │ explicit │   90  │ A        │ [READY] ✓     ║
╚════════════════════════╧══════════╧═══════╧══════════╧═══════════════╝
```

⚔️ Skill Type Detection (v1.0)

Skills are classified by how they're invoked:

| Type | Description | Quality Criteria |
|---|---|---|
| explicit | User invokes with `/command` | `argument-hint`, mode docs, examples |
| silent | Auto-triggered by context | Trigger keywords, when-to-use, red-flags |

```bash
# Check skill type (paths are relative to the plugin root)
cd "$CLAUDE_PLUGIN_ROOT"
source hooks/lib/storage-local.sh
get_skill_type "my-skill"  # Returns: explicit | silent
```

📈 Quality-Based Recommendations (v1.0)

Core Principle: Usage ≠ Quality

The forge evaluates skills by structure, not popularity:

| Priority | Score | Action |
|---|---|---|
| HIGH | < 40 | Immediate reforging needed |
| MED | 40-59 | Improvement recommended |
| LOW | 60-79 | Optional enhancement |
| READY | ≥ 80 | Quality assured |

```bash
# Get quality score
get_skill_quality_score "my-skill"
# Returns: JSON with score, breakdown, grade (A/B/C/D)
```
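The priority table is a plain threshold mapping. A minimal sketch, assuming integer scores; `priority_for_score` is a hypothetical helper that follows the table above, not a function shipped with Forge.

```bash
#!/usr/bin/env bash
# Hypothetical: map a 0-100 quality score to the priority bands in the table.
priority_for_score() {
  local score="$1"
  if [[ ! "$score" =~ ^[0-9]+$ ]]; then
    echo "score must be a non-negative integer: $score" >&2
    return 1
  fi
  score=$((10#$score))   # force base 10 so a value like "090" is not read as octal
  if   (( score < 40 )); then echo HIGH    # immediate reforging needed
  elif (( score < 60 )); then echo MED     # improvement recommended
  elif (( score < 80 )); then echo LOW     # optional enhancement
  else                        echo READY   # quality assured
  fi
}
```

If the JSON from `get_skill_quality_score` exposes the score under a `score` key (an assumption), the two could be chained with jq, e.g. `priority_for_score "$(get_skill_quality_score my-skill | jq -r .score)"`.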

🎖️ Legendary Grades (v1.0)

Exceptional weapons earn special marks:

| Enhancement | Bonus | Forged When |
|---|---|---|
| Reforged | +1 | `upgraded: true` |
| Efficient | +0.5 | tokens/usage < 1500 |
| Hot Streak | +0.5 | positive trend |
| Tested | +0.5 | has test files |

S + Reforged + Efficient = ★★★ SSS LEGENDARY
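The bonus column adds up per skill. A minimal sketch of that sum, assuming integer tokens-per-usage and a trend label of `positive`; `enhancement_bonus` is hypothetical, and how Forge turns the total into star ranks is not modeled here.

```bash
#!/usr/bin/env bash
# Hypothetical: enhancement_bonus <upgraded:true|false> <tokens_per_usage> <trend> <has_tests:true|false>
enhancement_bonus() {
  local upgraded="$1" tokens_per_usage="$2" trend="$3" has_tests="$4"
  local tenths=0   # count in tenths so bash integer arithmetic can hold +0.5
  [ "$upgraded" = true ]     && tenths=$((tenths + 10))   # Reforged   +1
  (( tokens_per_usage < 1500 )) && tenths=$((tenths + 5)) # Efficient  +0.5
  [ "$trend" = positive ]    && tenths=$((tenths + 5))    # Hot Streak +0.5
  [ "$has_tests" = true ]    && tenths=$((tenths + 5))    # Tested     +0.5
  printf '%d.%d\n' $((tenths / 10)) $((tenths % 10))
}
```

For a reforged, efficient skill (`enhancement_bonus true 1200 flat false`) this prints `1.5`.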

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🛡️ Trial Branch — The Safe Anvil

Master smiths never work directly on the masterpiece. They test on trial pieces first.

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart TB
    subgraph MAIN["⚔️ main (Master Weapon)"]
        direction LR
        C1["v0.6<br/>71pts"]
        C2["v0.7<br/>90pts"]
        C1 -.-> C2
    end

    subgraph TRIAL["🔥 trial/skill-name (Testing Anvil)"]
        direction LR
        T1["🔨 Strike"]
        T2["🔨 Strike"]
        T3["🔨 Strike"]
        T4{"Worthy?"}
        T1 --> T2 --> T3 --> T4
    end

    C1 -->|"fork"| T1
    T4 -->|"✅ Stronger"| C2
    T4 -->|"❌ Brittle"| D["🗑️ Discard"]

    style C1 fill:#2D1810,stroke:#FFD700,stroke-width:2px,color:#FFD700
    style C2 fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style T1 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T2 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T3 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T4 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style D fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0
```

Safety first: the master weapon (`main`) is never touched until the trial proves worthy. Failed experiments are discarded, not merged.
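The trial-branch flow maps onto a handful of plain git commands. This is a hypothetical illustration: the `trial/<skill>` naming comes from the diagram, but `start_trial` and `finish_trial` are invented helpers and Forge's own scripts may manage branches differently. `git checkout -b` is used instead of `git switch` (added in Git 2.23) to stay within the Git 2.0+ requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the "Safe Anvil" diagram; not Forge's actual scripts.

# Fork a trial branch off the current commit (the "fork" arrow).
start_trial() {
  local skill="$1"
  git checkout -q -b "trial/$skill"
}

# usage: finish_trial <skill> <accept|reject> [base-branch, default: main]
finish_trial() {
  local skill="$1" verdict="$2" base="${3:-main}"
  case "$verdict" in
    accept|reject) ;;
    *) echo "verdict must be accept or reject" >&2; return 1 ;;
  esac
  git checkout -q "$base" || return 1
  if [ "$verdict" = accept ]; then
    # Stronger: bring the trial's strikes into the base branch as one merge commit.
    git merge -q --no-ff -m "forge: reforge $skill" "trial/$skill" || return 1
  fi
  # Brittle (or already merged): drop the branch; -D also deletes unmerged work.
  git branch -q -D "trial/$skill"
}
```

Because every strike lives on `trial/<skill>`, rejecting a trial leaves `main` exactly as it was before the fork.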

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔨 Triple Strike — The Smith's Consensus

A single hammer blow can deceive. Three strikes reveal the truth.

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart LR
    subgraph STRIKE["🔨 Triple Strike Evaluation"]
        direction TB
        S1["🔨 Smith 1<br/>Score: 78"]
        S2["🔨 Smith 2<br/>Score: 81"]
        S3["🔨 Smith 3<br/>Score: 79"]
    end

    subgraph MEASURE["⚖️ Measure Quality"]
        direction TB
        M1["Mean: 79.3"]
        M2["95% Confidence"]
    end

    subgraph VERDICT["⚔️ Final Verdict"]
        V1{"Stronger than<br/>before?"}
        V1 -->|"YES"| ACCEPT["✅ REFORGE"]
        V1 -->|"NO"| REJECT["❌ DISCARD"]
    end

    STRIKE --> MEASURE --> VERDICT

    style S1 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S2 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S3 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style M1 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style M2 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style ACCEPT fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style REJECT fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0
```

Statistical consensus: three independent evaluations are combined into a 95% confidence interval, and the new version is merged only if it is statistically superior to the original, not merely higher on a single run.
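One way to turn three scores into a verdict is a two-sided 95% Student-t interval, accepting only when the interval's lower bound clears the previous score. Both the interval method and that decision rule are assumptions (the README states only that scores are combined at 95% confidence), and `ci95` / `strike_verdict` are hypothetical helpers. For the diagram's scores 78, 81 and 79 the mean is 79.33 with interval 75.54 to 83.13, which clears the v0.6 baseline of 71.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; ci95 and strike_verdict are illustrative names, not Forge APIs.

# Print "mean low high" for a two-sided 95% Student-t interval over 2-6 scores.
ci95() {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; sum += $1 }
    END {
      n = NR
      if (n < 2 || n > 6) { print "ci95: need 2 to 6 scores" > "/dev/stderr"; exit 1 }
      # t(0.975) critical values for 1..5 degrees of freedom
      split("12.706 4.303 3.182 2.776 2.571", t, " ")
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (x[i] - mean) * (x[i] - mean)
      half = t[n - 1] * sqrt(ss / (n - 1)) / sqrt(n)
      printf "%.2f %.2f %.2f\n", mean, mean - half, mean + half
    }'
}

# usage: strike_verdict <baseline> <score>... ; REFORGE only if the whole interval beats the baseline.
strike_verdict() {
  local baseline="$1" ci mean low high
  shift
  ci=$(ci95 "$@") || return 1
  read -r mean low high <<< "$ci"
  if awk -v lo="$low" -v base="$baseline" 'BEGIN { exit !(lo > base) }'; then
    echo "REFORGE (mean $mean, 95% CI $low-$high)"
  else
    echo "DISCARD (mean $mean, 95% CI $low-$high)"
  fi
}
```

Requiring the lower bound, not the mean, to beat the baseline is what makes a single lucky strike insufficient.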

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Forging Results

Before: 71 points (raw, unrefined)
After: 90.33 points (tempered, legendary)

+27% improvement: Forge reforged itself.

The ultimate test: a tool that improves itself through its own process.
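The headline figure is the relative gain over the starting score:

```math
\frac{90.33 - 71}{71} = \frac{19.33}{71} \approx 0.272 \approx 27\%
```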

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔒 Safety Mechanisms

Master smiths build in multiple safeguards:

| Safeguard | Protection |
|---|---|
| 🔄 Rollback Ready | Original always preserved |
| 🔒 Isolated Trials | Test in separate branch |
| 📝 Full Logs | Every strike recorded |
| ⏱️ Iteration Limit | Maximum 6 attempts |
| Test Verification | All tests must pass |

No weapon leaves the forge untested. No master version is ever corrupted.
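The iteration limit and the all-tests-must-pass rule combine into a bounded retry loop. A minimal sketch, assuming each attempt is a command that exits 0 when the skill's tests pass; `attempt_until_pass` is a hypothetical helper and `MAX_ATTEMPTS=6` simply mirrors the table.

```bash
#!/usr/bin/env bash
# Hypothetical bounded retry loop; not Forge's actual iteration logic.
MAX_ATTEMPTS=6   # "Maximum 6 attempts" from the safeguards table

# usage: attempt_until_pass <command> [args...]
attempt_until_pass() {
  local attempt
  for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
    if "$@"; then
      echo "passed on attempt $attempt"
      return 0
    fi
  done
  # Out of strikes: leave the original untouched and report failure.
  echo "gave up after $MAX_ATTEMPTS attempts" >&2
  return 1
}
```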

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🚀 Commands

| Command | Action |
|---|---|
| `/forge:forge --scan` | 🔍 Scout for skills ready to reforge |
| `/forge:forge <skill>` | ⚡ Reforge a specific skill |
| `/forge:forge --history` | 📜 View forging chronicles |
| `/forge:forge --watch` | 👁️ Monitor the forge |
| `/forge:monitor` | 📊 Quality dashboard |
| `/forge:smelt` | 🔥 Skill creation with TDD methodology |

💡 Argument Hints (v1.0)

When typing a slash command, you'll see available modes:

```text
/forge <skill-name> [--precision=high|-n5] - modes: TDD_FIT|HEURISTIC
/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]
```

Add an `argument-hint` field to your skill's frontmatter to enable this feature.
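A sketch of what such frontmatter might look like, plus a way to read the hint back from a script. The skill, its `name`/`description` values and the hint text are hypothetical; treat the exact layout as an assumption. The hint is quoted because it contains `: `, which would otherwise break YAML parsing.

```bash
#!/usr/bin/env bash
# Hypothetical example skill; names and hint text are illustrative only.
skill_dir=$(mktemp -d)
cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: my-skill
description: Example skill used to illustrate argument-hint
argument-hint: "<file> [--mode=fast|thorough] - modes: FAST|THOROUGH"
---
# my-skill
EOF

# Print the argument-hint value from a file's leading YAML frontmatter (hypothetical helper).
get_argument_hint() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # frontmatter must start on line 1
    in_fm && $0 == "---"   { exit }              # end of frontmatter: stop looking
    in_fm && /^argument-hint:/ {
      sub(/^argument-hint:[ \t]*/, "")           # drop the key
      gsub(/^"|"$/, "")                          # drop surrounding double quotes
      print
      exit
    }' "$1"
}

get_argument_hint "$skill_dir/SKILL.md"
```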

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚙️ Configuration

Forge behavior can be customized via config/settings.env:

| Setting | Default | Description |
|---|---|---|
| `STORAGE_MODE` | `local` | Storage backend (currently only `local` is supported) |
| `LOCAL_STORAGE_DIR` | `~/.claude/.skill-evaluator` | Local storage directory for skill data |
| `SKILL_EVAL_DEBUG` | `false` | Enable debug logging to stderr |

Example:

```bash
# Enable debug mode
export SKILL_EVAL_DEBUG=true

# Use custom storage location
export LOCAL_STORAGE_DIR="$HOME/.my-forge-data"
```
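How the defaults in the table might be applied can be sketched as follows, assuming `config/settings.env` is a shell-sourceable file of `KEY=value` lines whose values override the environment, with anything still unset falling back to the table. `load_forge_settings` and `forge_debug` are hypothetical helpers, not Forge functions.

```bash
#!/usr/bin/env bash
# Hypothetical settings loader; the file format and precedence are assumptions.
load_forge_settings() {
  local file="${1:-${CLAUDE_PLUGIN_ROOT:-.}/config/settings.env}"
  if [ -f "$file" ]; then
    # shellcheck source=/dev/null
    . "$file"
  fi
  # Fill in the documented defaults for anything still unset or empty.
  : "${STORAGE_MODE:=local}"
  : "${LOCAL_STORAGE_DIR:=$HOME/.claude/.skill-evaluator}"
  : "${SKILL_EVAL_DEBUG:=false}"
  if [ "$STORAGE_MODE" != local ]; then
    echo "unsupported STORAGE_MODE: $STORAGE_MODE (only 'local' is supported)" >&2
    return 1
  fi
}

# Debug messages go to stderr only when SKILL_EVAL_DEBUG=true.
forge_debug() {
  if [ "${SKILL_EVAL_DEBUG:-false}" = true ]; then
    echo "[forge debug] $*" >&2
  fi
}
```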

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔧 Troubleshooting

Common Issues

bc: command not found

```bash
# macOS
brew install bc

# Ubuntu/Debian
sudo apt-get install bc

# Fedora/RHEL
sudo dnf install bc
```

jq: command not found

```bash
# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Fedora/RHEL
sudo dnf install jq
```

Permission denied when running commands

```bash
# Make scripts executable
cd "$CLAUDE_PLUGIN_ROOT"
chmod +x hooks/*.sh
chmod +x bin/*
```

Plugin not detected by Claude Code

  1. Check installation path matches CLAUDE_PLUGIN_ROOT
  2. Verify plugin.json exists in the plugin root
  3. Restart Claude Code CLI
  4. Run /help to see if Forge commands appear
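Steps 1 and 2 can be scripted; steps 3 and 4 remain manual. A hypothetical sketch (`check_plugin_install` is an invented helper): besides the `plugin.json` in the plugin root mentioned above, it also accepts `.claude-plugin/plugin.json`, which is assumed here to be the usual Claude Code manifest location.

```bash
#!/usr/bin/env bash
# Hypothetical install check for the first two troubleshooting steps.
check_plugin_install() {
  local root="${CLAUDE_PLUGIN_ROOT:-}"
  if [ -z "$root" ]; then
    echo "CLAUDE_PLUGIN_ROOT is not set" >&2
    return 1
  fi
  if [ ! -d "$root" ]; then
    echo "CLAUDE_PLUGIN_ROOT does not exist: $root" >&2
    return 1
  fi
  # Step 2: look for the manifest in either assumed location.
  if [ -f "$root/plugin.json" ] || [ -f "$root/.claude-plugin/plugin.json" ]; then
    echo "plugin manifest found under $root"
  else
    echo "no plugin.json under $root" >&2
    return 1
  fi
}
```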

Forge evaluations fail silently

```bash
# Enable debug logging
export SKILL_EVAL_DEBUG=true

# Check storage directory exists (honors a custom LOCAL_STORAGE_DIR)
ls -la "${LOCAL_STORAGE_DIR:-$HOME/.claude/.skill-evaluator}"

# Verify evaluator script is executable
ls -la "$CLAUDE_PLUGIN_ROOT/bin/skill-evaluator.py"
```

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📚 The Theory Behind the Forge

- Gödel Machines (Schmidhuber 2007): self-referential systems that can improve their own code
- Dynamic Adaptation: incremental evolution with statistical validation
- TDD Safety Boundaries: tests prevent catastrophic self-modification
- Multi-Evaluator Consensus: multiple independent judges reduce bias

Read the full theory →

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

<div align="center">

Inspired by skill-up

⚒️ Forged with Claude Code · 🔥 MIT License · ⚔️ v1.0

This project is not affiliated with or endorsed by Anthropic. Claude and Claude Code are trademarks of Anthropic PBC.

</div>