

Use at the start of any LLM agent project, or when reconsidering an existing architecture. Guides decisions across four layers: workflow vs agent, single-agent vs multi-agent, tool-use vs specialized nodes, and retrieval strategy. Each decision has concrete tradeoffs and a recommended default.

Use when connecting an LLM agent to a full-stack application, external API, or third-party platform. Covers four integration patterns (REST, WebSocket/SSE, Webhook, Message Queue), interface design, reliability, security, and observability. Framework-agnostic — guides you to the right pattern for your situation, then gives concrete implementation direction for your chosen stack.
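As a hedged illustration of the webhook pattern named above (the handler and secret here are hypothetical, not part of the skill itself), incoming webhook payloads are typically authenticated with an HMAC signature over the raw body before any agent work is enqueued:

```python
import hmac
import hashlib

# Hypothetical shared secret; in practice this comes from the
# third-party platform's dashboard and lives in your environment config.
WEBHOOK_SECRET = b"replace-with-real-secret"

def verify_webhook(payload: bytes, signature_header: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw payload and compare it
    in constant time to the signature the sender attached."""
    expected = hmac.new(WEBHOOK_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def handle_webhook(payload: bytes, signature_header: str) -> dict:
    """Reject unsigned or tampered requests before touching the agent."""
    if not verify_webhook(payload, signature_header):
        return {"status": 401, "body": "invalid signature"}
    # At this point it is safe to enqueue the event for agent processing.
    return {"status": 200, "body": "accepted"}
```

The constant-time comparison matters: a naive `==` on signatures leaks timing information an attacker can exploit.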

Use when a user has an idea for a product or feature that might involve AI, but doesn't know where to start or how to design the system. Guides non-technical users through a conversational process to clarify their idea, decide where AI fits, and produce a system design they can understand — and that Claude can use to start building. One question at a time. Never assume technical knowledge.

Use when reviewing code — either Claude reviews your code and produces a structured report, or Claude guides you through reviewing someone else's code. Default mode: Claude performs the review and produces a report organized by severity. Second mode: guided self-review with a structured checklist and probing questions. Covers correctness, security, performance, maintainability, and AI-specific concerns for LLM applications.

Use when designing any LLM-as-Judge, Critic, or Evaluator node. Covers input structure, output schema, chain-of-thought ordering, single-pass vs multi-stage tradeoffs, and known failure modes. Prevents the most common design mistakes that cause Critic nodes to be unreliable.
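One design this entry alludes to is placing the chain-of-thought field before the verdict in the judge's output schema, so the model commits to reasoning before it scores rather than rationalizing afterward. A minimal parsing sketch, with illustrative field names that are assumptions rather than the skill's prescription:

```python
import json

# Illustrative schema: reasoning comes first so the verdict is
# conditioned on it, not rationalized after the fact.
JUDGE_SCHEMA_FIELDS = ["reasoning", "verdict", "confidence"]

def parse_judge_output(raw: str) -> dict:
    """Parse a judge response and enforce both field order and
    a closed verdict vocabulary, two common reliability guards."""
    data = json.loads(raw)
    keys = list(data.keys())
    if keys != JUDGE_SCHEMA_FIELDS:
        raise ValueError(f"expected fields {JUDGE_SCHEMA_FIELDS}, got {keys}")
    if data["verdict"] not in {"pass", "fail"}:
        raise ValueError("verdict must be 'pass' or 'fail'")
    return data
```

Rejecting out-of-order or free-text verdicts at parse time surfaces judge instability immediately instead of letting it leak into downstream metrics.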

Use when designing a database schema for a new system, or when an existing schema needs to be revised. Two modes: (1) general schema design for any application, (2) AI-specific schema design for systems with conversation history, embeddings, agent state, or LLM outputs. Guides through data modeling decisions, type choices, indexing, and AI-specific storage patterns.
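A minimal sketch of the conversation-history storage pattern this entry covers; table and column names are illustrative assumptions, not the skill's canonical schema:

```python
import sqlite3

# Two-table conversation history: one row per conversation,
# one row per message, with roles constrained to a closed set.
SCHEMA = """
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL CHECK (role IN ('system', 'user', 'assistant', 'tool')),
    content TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Fetching one conversation in message order is the hot path.
CREATE INDEX idx_messages_conversation ON messages (conversation_id, id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The composite index matches the dominant query (all messages for one conversation, in order), which is the kind of indexing decision the skill walks through.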

Use when deploying an application to production, setting up CI/CD, or moving from local development to a hosted environment. Guides through deployment target selection (PaaS, Docker+VPS, cloud platform), containerization, environment configuration, CI/CD pipeline design, monitoring, and rollback strategy. Covers both simple and complex deployment scenarios.

Use at the start of any implementation task on an LLM system. A 10-step process from evidence collection to git commit. Prevents the most common failure mode in LLM development: writing code based on assumptions instead of observed system behavior. Use for any change — prompt tuning, node logic, routing, architecture — not just experiments.

Use when designing or improving the action space, observation format, tool boundaries, and evaluation signals of an LLM agent. Covers the five layers of harness design: what tools the agent has, what it sees after acting, how each tool is scoped, how behavior is evaluated, and how to iterate when the agent misbehaves. Prevents the most common harness failures — agents that pick wrong tools, ignore critical information, or loop without progress.
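Tool scoping, one of the five layers listed, can be made concrete with a small sketch: a file-reading tool whose boundary is enforced in code rather than in the prompt. The helper name and sandbox-root convention are assumptions for illustration:

```python
from pathlib import Path

def scoped_read(root: str, rel_path: str) -> str:
    """A tool-boundary sketch: resolve the requested path and refuse
    anything that escapes the sandbox root, so the agent cannot read
    outside its scope no matter what path string it produces."""
    base = Path(root).resolve()
    target = (base / rel_path).resolve()
    if base != target and base not in target.parents:
        raise PermissionError(f"{rel_path} escapes the tool's scope")
    return target.read_text()
```

Enforcing the boundary in the harness, not the prompt, is what keeps a misbehaving agent contained rather than merely discouraged.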

Use when designing how an agent remembers and uses information across turns, sessions, or runs. Covers the four types of agent memory (in-context, episodic, semantic, procedural), when to use each, how to manage context window limits, retrieval strategies, and memory decay. Cross-references database-design skill for storage schema patterns.
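Recency-weighted retrieval over episodic memory, one of the four types listed, might look like the following sketch; the exponential-decay scoring formula is an assumption for illustration, not the skill's prescription:

```python
import math
from dataclasses import dataclass

@dataclass
class Episode:
    text: str
    relevance: float   # similarity to the current query, in [0, 1]
    age_turns: int     # how many turns ago this was stored

def score(ep: Episode, half_life: float = 10.0) -> float:
    """Combine relevance with exponential decay so stale memories fade:
    an episode's weight halves every `half_life` turns."""
    decay = math.exp(-math.log(2) * ep.age_turns / half_life)
    return ep.relevance * decay

def retrieve(memory: list[Episode], k: int = 3) -> list[Episode]:
    """Return the top-k episodes by decayed relevance score."""
    return sorted(memory, key=score, reverse=True)[:k]
```

Tuning `half_life` is the decay knob: long half-lives approximate semantic memory, short ones keep only the recent episodic window.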

Use when choosing a model for an LLM application, deciding between API and local deployment, or evaluating whether fine-tuning is needed. Guides through the deployment decision (API vs local), model selection based on task requirements and constraints, configuration setup, and the prompt engineering vs fine-tuning decision. Records the user's setup for future reference.

Use when facing a technical problem with multiple possible solutions, or when a previous approach failed and you need to reason about alternatives. Structures the space of options before committing to any one. Core output: a brainstorm log with options, tradeoffs, a chosen approach, and unresolved gaps that must be answered before implementation starts. Use before experiment-driven-development Step 4 when the design space is large or uncertain.

Use before making any prompt change in an LLM system. Covers the full cycle: pre-change documentation, change scoping, post-change validation, and rollback protocol. Prevents silent regressions caused by prompt changes with unexpected downstream effects.

Use before writing any new LLM component prompt, or when an existing prompt is producing unstable, inconsistent, or wrong outputs. Two modes: (1) design from scratch by defining component boundaries and structure before writing, (2) diagnose an existing prompt to find root cause of failure. Covers all dimensions of prompt quality: job definition, input structure, output schema, rules, few-shot examples, system/user split, token budget, and prompt decay over time.

Use when comparing two versions of an LLM system, debugging non-deterministic behavior, or establishing a baseline before making changes. Covers controlled comparison protocol, determinism verification, and bisect protocol for finding which change caused a regression.
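The bisect protocol mentioned here reduces to a binary search over an ordered list of changes, assuming a deterministic regression check and monotone goodness (every change before the culprit passes, every change from it onward fails). A sketch with hypothetical helper names:

```python
from typing import Callable

def first_bad_change(changes: list[str], is_good: Callable[[int], bool]) -> str:
    """Binary-search for the first change that introduced a regression.
    `is_good(i)` reports whether the system built at changes[:i+1] still
    passes; the last change is known-bad by precondition."""
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(mid):
            lo = mid + 1   # culprit is strictly after mid
        else:
            hi = mid       # mid is bad, culprit is at or before mid
    return changes[lo]
```

This finds the culprit in O(log n) runs instead of n, which is why the skill pairs it with determinism verification: a flaky check breaks the monotonicity the search relies on.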