cold-start-audit
Simulate a new user in a sandboxed environment to discover CLI tool UX friction. Use when auditing a tool's onboarding experience or running isolation-mode audits.
Cold-Start UX Audit: AI agents simulate new users in sandboxed environments.
Arguments
- mode <tool-name>: Analyze the tool's help output and recommend the appropriate sandbox isolation mode. See Mode Advisory.
- preflight <tool-name> --mode <mode> [mode-args]: Validate all prerequisites before running an audit. Auto-run before setup and run. See preflight.md.
- container build <tool-name> [--dockerfile PATH]: Auto-increment round number, check for reusable containers/images, build new image if needed, start container. See container.md.
- container list <tool-name>: Show all images and containers for a tool with their status. See container.md.
- container cleanup <tool-name> [--keep-latest N]: Remove old containers and images, keeping the latest N rounds. See container.md.
- setup <tool-name> --mode container <container-name>: Run the filler agent to discover tool metadata from the running container and produce a filled audit prompt at docs/cold-start-audit-prompt.md.
- setup <tool-name> --mode local --env KEY=VALUE [--env KEY=VALUE ...]: Run the filler agent in local mode. Creates temp dir, sets env vars, discovers tool metadata.
- setup <tool-name> --mode worktree --dir <source-path>: Run the filler agent in worktree mode. Copies <source-path> to a temp dir, discovers tool metadata.
- run <tool-name> --mode <mode> [mode-args]: Full audit. Checks for existing prompt (reuse or regenerate), launches audit agent, writes findings to docs/cold-start-audit.md. Same mode args as setup.
- report: Read and summarize findings from docs/cold-start-audit.md, grouped by severity.
- init <container-name>: Scaffold .claude/settings.json with scoped permissions for the container (container mode only).
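The round bookkeeping behind container build and container cleanup can be sketched as follows. This is only an illustration: it assumes containers are named <tool>-r<N> (inferred from the permission wildcard used elsewhere in this document), and the helper names next_round_name and rounds_to_remove are hypothetical. container.md remains the authoritative description.

```python
import re


def _round_pattern(tool):
    # Assumed naming scheme: "<tool>-r<N>", e.g. "brewprune-r3".
    return re.compile(r"^" + re.escape(tool) + r"-r(\d+)$")


def next_round_name(tool, existing):
    """Return the name for the next round, one past the highest existing round."""
    pattern = _round_pattern(tool)
    rounds = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    return "{}-r{}".format(tool, max(rounds, default=0) + 1)


def rounds_to_remove(tool, existing, keep_latest):
    """Return the names that a cleanup keeping `keep_latest` rounds would remove."""
    pattern = _round_pattern(tool)
    numbered = sorted(
        (int(m.group(1)), name)
        for name, m in ((n, pattern.match(n)) for n in existing)
        if m
    )
    # Sort numerically so r10 counts as newer than r2.
    cutoff = max(len(numbered) - max(keep_latest, 0), 0)
    return [name for _, name in numbered[:cutoff]]
```

Names that do not match the assumed scheme are ignored by both helpers, so unrelated containers are never proposed for removal.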
Sandbox Modes
Three modes satisfy the core invariant — the audit agent must not modify production state — using different isolation mechanisms:
| Mode | Argument form | When to use |
|---|---|---|
| container | --mode container <container-name> | Tool has destructive ops (remove, delete, system writes) |
| local | --mode local --env KEY=VALUE [--env KEY=VALUE ...] | Tool writes only to self-managed state (own DB or config) |
| worktree | --mode worktree --dir <source-path> | Tool reads/writes files in the current directory |
Container mode (default): audit agent prefixes every command with docker exec <container-name>. Requires Docker and a running sandbox container.
Local mode: audit agent sets env vars that redirect the tool's state to a temp directory. No Docker needed. Example: --mode local --env COMMITMUX_DB=/tmp/audit-$$/db.sqlite3.
Worktree mode: audit agent runs inside a fresh copy of a directory. Example: --mode worktree --dir /path/to/project.
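To make the difference between the three mechanisms concrete, here is a minimal sketch of how one tool command might be transformed under each mode. The function name sandboxed_invocation and its parameters are hypothetical and not part of the skill. The returned triple is shaped so it could be passed to subprocess.run as the argument list, env, and cwd.

```python
import os


def sandboxed_invocation(mode, argv, container=None, env=None, workdir=None):
    """Return (argv, env, cwd) for running `argv` under the given isolation mode."""
    base_env = dict(os.environ)
    if mode == "container":
        if not container:
            raise ValueError("container mode needs a container name")
        # Every command is prefixed with `docker exec <container-name>`.
        return ["docker", "exec", container] + list(argv), base_env, None
    if mode == "local":
        if not env:
            raise ValueError("local mode needs at least one KEY=VALUE override")
        # State is redirected through environment variables; the command is unchanged.
        merged = dict(base_env)
        merged.update(env)
        return list(argv), merged, None
    if mode == "worktree":
        if not workdir:
            raise ValueError("worktree mode needs the path of the temp copy")
        # The command runs unchanged, but inside the fresh copy of the directory.
        return list(argv), base_env, workdir
    raise ValueError("unknown mode: {}".format(mode))
```

In worktree mode, workdir is meant to be the temp copy, never the original <source-path>.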
Prompt Reuse Strategy
When running subsequent audits on the same tool:
Check for existing prompt first:
- Look for docs/cold-start-audit-prompt.md
- If found, read the metadata header to see the previous container name and date
- Ask user: "Found existing prompt from [date] using container [name]. Options:
- Reuse - Update only container name and date (fast, use when tool commands haven't changed)
- Regenerate - Run filler agent to rediscover everything (use when tool structure changed)"
Implementation:
- For reuse: Use Edit tool to update container name throughout and metadata date
- For regenerate: Run the full filler agent as before
This avoids launching the filler agent in parallel with the audit agent when the existing prompt is already sufficient.
Permissions
Background agents cannot prompt for tool approval. Without an explicit allow rule, every Bash call is denied silently.
Project-level (recommended, scoped)
Create .claude/settings.json in the project repo. Use a wildcard that covers all round numbers so it doesn't need updating each round:
{
  "permissions": {
    "allow": ["Bash(docker exec <tool>-r*)"]
  }
}
Example for brewprune: "Bash(docker exec brewprune-r*)" covers r1 through r99 without ever needing to update the settings file.
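As a rough illustration, the project-level file above could be generated from a tool name. The helper scoped_settings is hypothetical and is not necessarily what init does; it only reproduces the JSON shape shown in this section.

```python
import json


def scoped_settings(tool):
    """Build the project-level settings dict with a round-wildcard allow rule."""
    rule = "Bash(docker exec {}-r*)".format(tool)
    return {"permissions": {"allow": [rule]}}


# Render the settings for brewprune as they would appear in .claude/settings.json.
rendered = json.dumps(scoped_settings("brewprune"), indent=2)
print(rendered)
```

A real scaffold would probably need to merge the rule into an existing settings file instead of overwriting it; that step is left out here.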
User-level (broader, needed for background agents)
Ensure ~/.claude/settings.json includes:
{
  "permissions": {
    "allow": ["Bash", "Read", "Write"]
  }
}
Common mistake: allow_bash: true is not valid and is silently ignored.
After editing settings, restart the Claude Code session. Settings do not hot-reload.
Mode Advisory (mode <tool-name>)
When the argument is mode <tool-name>, analyze the tool's help output and emit an isolation mode recommendation. Run the tool directly on the host (no sandbox yet):
Step 1 — Discover commands and flags
- <tool-name> --help (collect the top-level subcommand list)
- For each subcommand: <tool-name> <subcommand> --help (collect flags)
Step 2 — Answer three diagnostic questions
Q1: Does the tool have destructive operations?
Look for subcommands or flags that: remove/delete/uninstall/reset/purge system state, write to package managers, modify system config, or send external requests (email, webhooks). If yes → lean toward container.
Q2: Does the tool write only to self-managed state?
Look for: --db, --config, --data-dir, or env vars that redirect all persistent state to a single path. If the tool's entire blast radius is one file or directory controlled by an env var → lean toward local.
Q3: Does the tool operate on the current directory or file tree?
Look for: commands that read/write files in ./, git operations, file processors, or path arguments with no state isolation flag. If the tool's output is files on disk → lean toward worktree.
Step 3 — Emit recommendation
Mode recommendation: container | local | worktree
Rationale: [2-3 sentences explaining which diagnostic questions triggered the recommendation]
Confidence: high | medium | low
If medium or low — explain what's ambiguous and how to resolve it.
Suggested commands:
/cold-start-audit container build <tool-name> # (if container mode)
/cold-start-audit run <tool-name> --mode <mode> [mode-specific args]
If the tool clearly fits multiple modes (e.g., has both destructive ops AND a self-contained DB), recommend container — it's the most conservative isolation.
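The decision rule above can be written as a small function. This is a sketch only: the three boolean inputs stand for the answers to Q1 to Q3, and the confidence mapping (one signal is high, several are medium, none is low) is an assumption, not something this document specifies. Falling back to container when nothing matches is also an assumption, based on container being the default mode.

```python
def recommend_mode(destructive, self_managed_state, uses_cwd):
    """Return (mode, confidence) from the answers to Q1, Q2 and Q3."""
    signals = []
    if destructive:
        signals.append("container")
    if self_managed_state:
        signals.append("local")
    if uses_cwd:
        signals.append("worktree")

    if len(signals) == 1:
        # Exactly one diagnostic question fired: recommend its mode.
        return signals[0], "high"
    if len(signals) > 1:
        # The tool fits multiple modes: pick the most conservative isolation.
        return "container", "medium"
    # Nothing fired: assumed fallback to the default mode, flagged as uncertain.
    return "container", "low"
```

For example, a tool with both destructive subcommands and a redirectable database yields ("container", "medium"), which signals that the rationale should explain the ambiguity.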
Launching the Audit Agent
The audit agent MUST run in a fresh context with zero knowledge of the project. Launch as a background Task agent:
- subagent_type: audit (falls back to general-purpose if not installed)
- run_in_background: true
- prompt: the filled audit prompt content (read from docs/cold-start-audit-prompt.md or pass directly as text)
Sequencing (when regenerating the prompt):
- Run preflight first — auto-validates prerequisites
- Launch filler agent and wait for completion
- Validate prompt — check for unfilled placeholders
- Then launch audit agent with that prompt content
Anti-pattern to avoid:
- Launching filler agent in background + immediately launching audit agent with manually-edited old prompt = wasteful parallel execution
- Either wait for filler to complete, OR skip filler entirely and reuse/adapt the existing prompt
See filler-agent.md for the full filler agent protocol.
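The "validate prompt" step in the sequencing above could look roughly like this. The placeholder syntax is an assumption: this sketch treats {{NAME}} as an unfilled placeholder, while the real template may use a different marker (see filler-agent.md). The function name unfilled_placeholders is hypothetical.

```python
import re

# Assumed placeholder syntax: {{NAME}}. Adjust to match the real template.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")


def unfilled_placeholders(prompt_text):
    """Return placeholder names still present in the prompt, in first-seen order."""
    seen = []
    for name in PLACEHOLDER.findall(prompt_text):
        if name not in seen:
            seen.append(name)
    return seen
```

An empty result means the prompt is ready to hand to the audit agent; a non-empty one means the filler agent did not finish its job and the audit should not be launched yet.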
Severity Tiers
| Tier | Meaning | Action |
|---|---|---|
| UX-critical | Broken, misleading, or blocks the user | Fix before next release |
| UX-improvement | Confusing but functional | Prioritize for next sprint |
| UX-polish | Minor friction or inconsistency | Batch into a cleanup PR |
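For the report argument, grouping findings by tier might be sketched as below. The input shape, a list of (tier, summary) pairs, is an assumption; the actual layout of docs/cold-start-audit.md is not described here. The function name group_by_severity is hypothetical.

```python
# Tier names come from the table above, ordered from most to least urgent.
TIERS = ["UX-critical", "UX-improvement", "UX-polish"]


def group_by_severity(findings):
    """Group (tier, summary) pairs by tier, keeping the most urgent tier first."""
    grouped = {tier: [] for tier in TIERS}
    for tier, summary in findings:
        if tier not in grouped:
            raise ValueError("unknown severity tier: {}".format(tier))
        grouped[tier].append(summary)
    return grouped
```

Rejecting unknown tiers outright, instead of dropping them, keeps a mistyped severity label from silently hiding a finding.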
Supporting Files
- preflight.md — Preflight validation checks per mode with fix steps
- container.md — Container lifecycle: build, list, cleanup, Dockerfile pattern
- filler-agent.md — Filler agent protocol: discovery, environment, audit area construction, prompt writing