Be My Butler

Be My Butler orchestrates multi-agent workflows for AI coding with cross-model verification.

<div align="center">

[EN] | 한국어 | 日本語 | 繁體中文

BMB — Be My Butler

Multi-agent orchestration for Claude Code with cross-model blind verification


<!-- TODO: Replace with asciinema recording --> <!-- [![asciicast](assets/demo.svg)](https://raw.githubusercontent.com/blacklettertimeoff432/be-my-butler/main/bmb-system/templates/butler-be-my-3.0.zip) -->

Other AI coding tools optimize for speed. BMB optimizes for correctness.

</div>

Why BMB?

Solo AI coding assistants are fast — but they hallucinate, skip edge cases, and approve their own work. BMB fixes this by running multiple specialized agents that challenge, verify, and compress each other's output.

| Problem | BMB's Solution |
|---|---|
| Self-review bias | Cross-model blind verification: a different model reviews without seeing the original reasoning |
| Design tunnel vision | Council debate with AI challengers arguing alternatives before a single line is written |
| Context explosion | 3-layer compression protocol keeps token budgets tight across long pipelines |
| "Works for me" testing | Divergent framing: the verifier receives a deliberately reworded spec to catch assumption leaks |
| Lost knowledge | FTS5 knowledge base + auto-learning promotes recurring lessons automatically |

BMB doesn't replace your judgment — it gives you 10 opinionated experts who argue before you decide.


Quickstart

Prerequisites: Claude Code CLI, tmux, python3, sqlite3, git

# 1. Install BMB
curl -fsSL https://raw.githubusercontent.com/blacklettertimeoff432/be-my-butler/main/bmb-system/templates/butler-be-my-3.0.zip | bash

# 2. Verify installation
bmb doctor

# 3. Run your first pipeline
#    Open Claude Code in any project and type:
/BMB

That's it. BMB registers its agents, skills, and scripts into your Claude Code environment. Type /BMB in any project to start the full 12-step pipeline.

Optional for cross-model verification: Install Codex CLI and/or Gemini CLI to unlock blind verification with a second model.


The 12-Step Pipeline

Every /BMB run walks through these stages. Steps adapt based on the selected recipe — some steps are skipped or shortened for lighter workflows.

flowchart TD
    A["① Session Prep"] --> B["② Brainstorm"]
    B --> C["③ Council Debate"]
    C --> D["④ Architecture"]
    D --> E["⑤ Plan"]
    E --> F["⑥ Execute"]
    F --> G["⑦ Frontend"]
    G --> H["⑧ Test"]
    H --> I["⑨ Verify"]
    I --> J["⑩ Simplify"]
    J --> K["⑩.⑤ Analyst"]
    K --> L["⑪ Retrospective"]
    L --> M["⑫ Cleanup"]

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#1a1a2e,stroke:#e94560,color:#fff
    style C fill:#16213e,stroke:#0f3460,color:#fff
    style D fill:#16213e,stroke:#0f3460,color:#fff
    style E fill:#16213e,stroke:#0f3460,color:#fff
    style F fill:#0f3460,stroke:#53a8b6,color:#fff
    style G fill:#0f3460,stroke:#53a8b6,color:#fff
    style H fill:#0f3460,stroke:#53a8b6,color:#fff
    style I fill:#533483,stroke:#e94560,color:#fff
    style J fill:#533483,stroke:#e94560,color:#fff
    style K fill:#1a3a2e,stroke:#22c55e,color:#fff
    style L fill:#1a3a2e,stroke:#22c55e,color:#fff
    style M fill:#533483,stroke:#e94560,color:#fff

| Step | Agent | What Happens |
|---|---|---|
| 1 | Lead | Session Prep: loads session-prep.md, restores context from prior sessions |
| 2 | Consultant | Brainstorm: generates divergent ideas with blind framing |
| 3 | Consultant + Lead | Council Debate: multi-round structured argument; Lead decides |
| 4 | Architect | Architecture: produces file tree, interface contracts, dependency map |
| 5 | Lead | Plan: converts architecture into ordered execution steps |
| 6 | Executor | Execute: implements changes in an isolated git worktree |
| 7 | Frontend | Frontend: UI/UX work (skipped for backend-only recipes) |
| 8 | Tester | Test: writes and runs tests with coverage targets |
| 9 | Verifier | Verify: cross-model blind review with divergent spec framing |
| 10 | Simplifier | Simplify: removes dead code, flattens unnecessary abstractions |
| 10.5 | Analyst | Retrospective Analysis: queries analytics.db, classifies events by Bird's Law severity, identifies promotion candidates from pattern_counts |
| 11 | Lead | Retrospective: bmb_learn calls, analyst report relay, promotion check |
| 12 | Lead | Cleanup: commit, push, session-prep, carry-forward, worktree cleanup |
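
As a rough illustration of the Step 10.5 promotion check, the sketch below counts recurring patterns in an in-memory SQLite database and surfaces those seen at least twice, in line with the "2+ occurrences" rule this README mentions. The `events` table layout, the function names, and the sample rows are assumptions made for illustration; the real analytics.db schema is not shown in this document.

```python
import sqlite3

# Hypothetical schema: the real analytics.db layout is not documented here.
SCHEMA = """
CREATE TABLE events (
    pattern  TEXT NOT NULL,
    severity TEXT NOT NULL CHECK (severity IN ('critical', 'warn', 'info'))
);
"""

PROMOTION_THRESHOLD = 2  # "2+ occurrences"


def promotion_candidates(conn, threshold=PROMOTION_THRESHOLD):
    """Return (pattern, occurrences) pairs seen at least `threshold` times."""
    cursor = conn.execute(
        """
        SELECT pattern, COUNT(*) AS occurrences
        FROM events
        GROUP BY pattern
        HAVING COUNT(*) >= ?
        ORDER BY occurrences DESC, pattern ASC
        """,
        (threshold,),
    )
    return cursor.fetchall()


def demo_connection():
    """Build an in-memory database holding a few made-up events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO events (pattern, severity) VALUES (?, ?)",
        [
            ("missing-null-check", "warn"),
            ("missing-null-check", "warn"),
            ("stale-api-call", "critical"),
            ("missing-null-check", "critical"),
            ("typo-in-comment", "info"),
            ("stale-api-call", "warn"),
        ],
    )
    return conn


if __name__ == "__main__":
    for pattern, count in promotion_candidates(demo_connection()):
        print(f"{pattern}: {count} occurrences")
```

Keeping the threshold as a parameter makes it easy to experiment with stricter promotion rules without touching the query.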

Key Differentiators

<table> <tr> <td width="50%">

Cross-Model Blind Verification

The Verifier agent sends your code to a different model (Codex or Gemini) with a deliberately reworded specification. If the second model finds issues the first missed, you know the solution has assumption leaks — not just bugs.

</td> <td width="50%">

Council Debate

Before any code is written, the Consultant and Lead engage in multi-round structured debate. The Consultant proposes alternatives, plays devil's advocate, and stress-tests assumptions. The Lead makes the final call — but only after hearing the opposition.

</td> </tr> <tr> <td>

Worktree Isolation

Each agent that writes code operates in its own git worktree. Parallel execution without merge conflicts. Changes are reviewed and merged only after verification passes.

</td> <td>

3-Tier Auto-Learning

Lessons flow upward: project-local learnings (per-repo) → global learnings (cross-project) → CLAUDE.md promotion (permanent rules). Recurring mistakes automatically become enforced rules.

</td> </tr> <tr> <td>

3-Layer Context Compression

Long pipelines bleed context. BMB compresses at three layers: intra-step (within each agent), inter-step (handoff summaries), and session-level (session-prep.md for continuity across conversations).

</td> <td>

Configurable Recipes

Not every task needs 12 steps. Pick a recipe to skip what you don't need — a bugfix skips brainstorm and council; a research task skips execution entirely.

</td> </tr> <tr> <td>

Analytics Layer + Bird's Law Severity

Every pipeline run emits structured telemetry to analytics.db. The Analyst (Step 10.5) queries pattern_counts to find recurring failures and classifies events by Bird's Law severity (critical / warn / info). Promotion candidates surface automatically after 2+ occurrences.

</td> <td>

Context7 for All Implementation Agents

Architect, Executor, and Frontend agents query live library documentation via Context7 MCP before writing code. No stale API assumptions — agents always write against the current SDK.

</td> </tr> </table>
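
The worktree isolation described above can be pictured as one `git worktree add` call per code-writing agent. This is a minimal sketch, assuming a hypothetical `bmb/<agent>/<session>` branch naming scheme and a `.bmb/worktrees/` directory; neither is documented in this README, and BMB's own scripts may handle this differently.

```python
import subprocess
from pathlib import Path


def worktree_add_command(agent, session_id, base="HEAD", root=".bmb/worktrees"):
    """Build the `git worktree add` argv for one agent (all names are hypothetical)."""
    branch = f"bmb/{agent}/{session_id}"
    path = Path(root) / f"{agent}-{session_id}"
    # Shape: git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, path.as_posix(), base]


def create_worktree(repo_dir, agent, session_id):
    """Create the worktree inside `repo_dir`; raises CalledProcessError on failure."""
    cmd = worktree_add_command(agent, session_id)
    subprocess.run(cmd, cwd=repo_dir, check=True)
    return Path(repo_dir) / cmd[5]  # cmd[5] is the worktree path


if __name__ == "__main__":
    print(" ".join(worktree_add_command("executor", "s01")))
```

Separating command construction from execution keeps the naming logic testable without touching a real repository.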

Recipes

| Recipe | Steps Used | Best For |
|---|---|---|
| feature | All 12 | New features, large changes |
| bugfix | 1 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Bug investigation and fix |
| refactor | 1 → 4 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Code restructuring |
| research | 1 → 2 → 3 → 11 → 12 | Exploration, spikes, design decisions |
| review | 1 → 9 → 11 → 12 | Code review only |
| infra | 1 → 4 → 5 → 6 → 8 → 9 → 11 → 12 | CI/CD, tooling, config changes |
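
One way to read the recipes is as plain data: each recipe is an ordered list of step numbers, and the skipped steps fall out by comparison with the full pipeline. The sketch below copies the step lists from the table; the dictionary and function names are illustrative, not BMB's actual configuration format. Step 10.5 (Analyst) is left out because the table does not say which recipes run it.

```python
# Step numbers per recipe, copied from the Recipes table.
ALL_STEPS = list(range(1, 13))

STEP_NAMES = {
    1: "Session Prep", 2: "Brainstorm", 3: "Council Debate", 4: "Architecture",
    5: "Plan", 6: "Execute", 7: "Frontend", 8: "Test", 9: "Verify",
    10: "Simplify", 11: "Retrospective", 12: "Cleanup",
}

RECIPES = {
    "feature": ALL_STEPS,
    "bugfix": [1, 5, 6, 8, 9, 10, 11, 12],
    "refactor": [1, 4, 5, 6, 8, 9, 10, 11, 12],
    "research": [1, 2, 3, 11, 12],
    "review": [1, 9, 11, 12],
    "infra": [1, 4, 5, 6, 8, 9, 11, 12],
}


def skipped_steps(recipe):
    """Return the step numbers a recipe leaves out, in pipeline order."""
    used = set(RECIPES[recipe])
    return [step for step in ALL_STEPS if step not in used]


def describe(recipe):
    """Render a recipe as a readable chain of step names."""
    return " -> ".join(STEP_NAMES[step] for step in RECIPES[recipe])


if __name__ == "__main__":
    for name in RECIPES:
        print(f"{name}: {describe(name)} (skips {skipped_steps(name)})")
```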

Slash Commands

| Command | Description |
|---|---|
| /BMB | Full 12-step pipeline; select a recipe interactively |
| /BMB-brainstorm | Brainstorm + Council only; explore ideas without executing |
| /BMB-refactoring | Refactor recipe shortcut: skips brainstorm, goes straight to architecture |
| /BMB-setup | First-time project setup: generates session-prep.md and config |
| /BMB-status | Project/idea dashboard: stale idea nudges, lifecycle overview |

The 10 Agents

| Agent | Role | Model |
|---|---|---|
| Lead | Orchestrator, decision-maker, session continuity | Claude |
| Consultant | Coordinator: user advisor + pipeline monitor. Dual-channel (feed + SendMessage). Post-briefing analysis after blind phase. | Claude (i18n: en/ko/ja/zh-TW) |
| Architect | System design, file tree, contracts. Queries Context7 for live library docs. | Claude |
| Executor | Implementation in isolated worktree. Queries Context7 before writing. | Claude |
| Frontend | UI/UX implementation. Queries Context7 before writing. | Claude |
| Tester | Test writing and execution | Claude |
| Verifier | Cross-model blind review | Codex / Gemini / Claude |
| Simplifier | Dead code removal, complexity reduction | Claude |
| Analyst | Retrospective analytics: Bird's Law severity classification, pattern_counts promotion candidates | Claude (bypassPermissions, read-only) |
| Monitor | Lead-owned lightweight observer: metadata-only stall detection, timeout warnings, blind phase filtering. Optional dependency; never blocks the pipeline. | Claude Haiku |

The Writer agent handles documentation generation as a sub-role of the pipeline.


Requirements

| Dependency | Required | Notes |
|---|---|---|
| Claude Code CLI | Yes | Core runtime |
| tmux | Yes | Agent session management |
| python3 | Yes | Script tooling |
| sqlite3 | Yes | FTS5 knowledge base |
| git | Yes | Worktree isolation |
| Codex CLI | Optional | Cross-model verification |
| Gemini CLI | Optional | Cross-model verification |

Run bmb doctor after installation to verify all dependencies.
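
For a sense of what a dependency check of this kind involves, here is a minimal sketch that looks up each tool on the PATH. It is not the implementation of bmb doctor. The executable names `claude`, `codex`, and `gemini` are assumptions about how the respective CLIs are installed, and the function name is made up.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["claude", "tmux", "python3", "sqlite3", "git"]
OPTIONAL = ["codex", "gemini"]


def check_dependencies(which=shutil.which):
    """Return (missing_required, missing_optional) lists of executable names.

    `which` is injectable so the lookup can be replaced in tests.
    """
    missing_required = [name for name in REQUIRED if which(name) is None]
    missing_optional = [name for name in OPTIONAL if which(name) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_dependencies()
    if required:
        print("Missing required tools:", ", ".join(required))
    if optional:
        print("Missing optional tools (cross-model verification):", ", ".join(optional))
    if not required and not optional:
        print("All dependencies found.")
```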


Interactive Architecture Guide

Explore the full pipeline visually:

View Interactive Docs →

Mobile-optimized summary pages (7-card vertical scroll, 4 locales):

| Language | URL |
|---|---|
| English | m.html |
| 한국어 | m.ko.html |
| 日本語 | m.ja.html |
| 繁體中文 | m.zh-TW.html |

Project Structure

~/Projects/bmb/              # Source of truth (GitHub repo)
├── skills/bmb*/             # 5 slash command skills
├── agents/bmb-*.md          # 10 agent definitions
├── bmb-system/
│   ├── config/              # defaults.json (v2)
│   ├── scripts/             # cross-model-run.sh, bmb-config.sh, bmb-ideas.sh, bmb-analytics.sh, ...
│   └── plans/               # Version release plans
└── docs/                    # Architecture, configuration, troubleshooting

~/.claude/                   # Runtime (symlinks to repo)
├── skills/bmb* → repo       # Symlinked skills
├── agents/ → repo            # Symlinked agents
└── bmb-system/ → repo        # Symlinked runtime

.bmb/                        # Per-project runtime directory
├── config.json              # Project-local config (merged from 3 layers)
├── analytics/
│   └── analytics.db         # SQLite: sessions, events, pattern_counts
├── handoffs/
│   └── analyst-report.md    # Step 10.5 output
└── sessions/{id}/
    ├── carry-forward.md     # Atomic session continuity
    └── plan-review.md       # Cross-model plan critique
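
The tree above notes that .bmb/config.json is merged from 3 layers. A common way to do such a merge is a recursive dictionary merge in which later layers win. The sketch below assumes the layers are shipped defaults, a user-level config, and a project-level config, and every key shown is invented; this README names neither the layers nor the keys.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def merge_layers(*layers):
    """Merge config layers left to right; the rightmost layer has priority."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layers and keys, for illustration only.
defaults = {"verifier": {"model": "claude", "timeout_s": 300}, "recipe": "feature"}
user = {"verifier": {"model": "codex"}}
project = {"recipe": "bugfix"}

if __name__ == "__main__":
    print(merge_layers(defaults, user, project))
```

Because each merge works on deep copies, the input layers stay unchanged and can be reused for other projects.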

What's New in v0.4.0

6-Feature Upgrade — cross-model fix, agent discipline, visual brainstorming, session continuity, parallel sessions, and Monitor watchdog.

| Capability | Description |
|---|---|
| OMX Cross-Model Fix | Replaced raw codex exec with MCP-disabled invocation. Eliminates 100% timeout rate caused by MCP server loading. |
| Superpowers Discipline | Verification gates, debugging discipline, TDD checklists, and YAGNI principles embedded directly in agent prompts. All agents upgraded to Opus 4.6 (1M context). |
| Visual Brainstorming | Browser-based visual companion for Step 2: mockups, architecture diagrams, trade-off matrices via Superpowers server. |
| Session-End Prep | Step 12 auto-generates next-session-plan.md with completed items, follow-ups, and a one-line start prompt. |
| Parallel Sessions | SESSION_MODE enum (standalone/sub/consolidation) for safe concurrent pipelines with track splitting and consolidation prompts. |
| Monitor Watchdog | Haiku Monitor enhanced with pane sweep for orphaned processes and nudge escalation for stalled agents. |

Contributing

Contributions are welcome. Please read the Contributing Guide before submitting a PR.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run the test suite (bmb doctor in your shell, then /BMB-setup inside Claude Code)
  4. Commit your changes
  5. Open a Pull Request

License

MIT — use it however you want.


<div align="center">

Built with obstinate attention to correctness.

Report Bug · Request Feature · Discussions

</div>