Validates completed work against defined acceptance criteria. Use after completing a task that has specific success criteria defined in issues, specs, or task descriptions.
Detects and prevents shortcut rationalization — when the agent is about to skip a step, simplify a requirement, or optimize away a constraint. Triggers proactively when rationalization patterns are noticed.
Provides a persistent cross-session task queue for work that should be done but isn't blocking current tasks. Enables background and future-session work tracking.
Safely integrates commits from parallel agent branches using sequential cherry-pick. Use after parallel work completes in isolated branches or worktrees.
Enforces tiered test coverage standards with three dimensions — line coverage by tier, spec-to-test traceability, and spec-to-code implementation mapping. Use before claiming code is complete.
Extracts and records architectural decisions from diffs, conversation context, and explicit choices. Supports automatic extraction from staged changes, deduplication against prior decisions, and persistent ADR-format logging with spec traceability.
Challenges recommendations, designs, or syntheses through adversarial review across five dimensions. Use before shipping any significant recommendation or design document.
Generates fuzz test scaffolding for parsers handling external input (YAML, JSON, config files, user input). Seeds the corpus from existing fixtures and runs an initial pass.
Closes production readiness gaps identified in a gap analysis document. Autonomously loops through dependency analysis, parallel agent dispatch, a test gate, and a re-audit, repeating until every target dimension reaches its target score.
Implements snapshot/golden file tests with temporal normalization, so that dates and timestamps in the output don't break the tests from one day to the next. Use when implementing tests that compare output against expected snapshots.
Generates a session continuity document at the end of a long session or when switching context on a multi-session project. Enables seamless resumption.
Analyzes large tasks for independent subtasks that can be safely parallelized. Produces a DAG-based dispatch plan with dependency ordering and maximum parallelism.
Extracts learnings from completed work, feedback, or failures. Updates a persistent learnings file with capped entries. Use after receiving feedback, fixing bugs, or completing complex tasks.
Validates PRDs, user stories, or feature requirements against SMART criteria. Use when reviewing requirements to catch vagueness before it causes expensive rework.
Dispatches state-of-the-art (SOTA) research before proposing any new feature, architecture, or technology choice. Use when facing design decisions, not when implementing existing specs.
Performs deployment with pre-flight checks, atomic replacement, post-deploy verification, and automatic rollback. Use before any deployment to staging or production.
Generates a behavioral scenario acceptance matrix for comprehensive test coverage planning. Maps user scenarios to acceptance criteria with stable IDs.
Performs smart context summarization when the context window is approaching capacity. Identifies what to keep, what to summarize, and creates a compact resume state.
Synchronizes specification documents and code in both directions. Detects spec drift from staged changes, updates the spec to reflect approved decisions, and ensures every commit represents a reconciled snapshot of spec, tests, and code.
Enforces structured output format for analyses, reports, and recommendations — Executive Summary, Detailed Analysis, Next Steps, Metrics. Prevents wall-of-text responses.
Routes users to the right vibe-engineering skill for their current task. Use when the user asks "what skill should I use?" or seems unsure which skill to invoke.
Dispatches research agents to find real projects, papers, and documented failures before proposing any feature, architecture, or technology choice. Triggers on creative/design tasks, not on implementation of already-designed specs.
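
As an illustration of the DAG-based dispatch plan mentioned for the parallelization skill above, the following is a minimal sketch, not the skill's actual implementation. It assumes subtasks and their dependencies are already known and given as a mapping from each task to the set of tasks it depends on. The function name `dispatch_plan` and the task names are hypothetical. It groups tasks into "waves": every task in a wave has all of its dependencies satisfied by earlier waves, so the tasks within one wave could be dispatched in parallel.

```python
from graphlib import CycleError, TopologicalSorter


def dispatch_plan(dependencies):
    """Group tasks into waves; tasks in the same wave have no unmet dependencies.

    `dependencies` maps each task name to the set of tasks it depends on.
    Raises CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph; raises CycleError on a cycle
    waves = []
    while sorter.is_active():
        # Everything returned here is unblocked right now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave finished so the next wave becomes ready.
        sorter.done(*ready)
    return waves


# Hypothetical subtasks for illustration only.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "docs": set(),
    "e2e-tests": {"api", "ui"},
}

if __name__ == "__main__":
    for number, wave in enumerate(dispatch_plan(tasks), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Grouping by wave is a simple way to expose parallelism, but it is only one option: a scheduler that starts each task as soon as its own dependencies finish (calling `done` per task rather than per wave) can achieve tighter overlap when task durations differ.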