

Optimize LLM token costs and latency for AI agents and LLM applications. Use this skill when the user wants to reduce API costs, control token usage, implement model routing (cheap vs expensive models), add token budget guardrails to LangGraph graphs, leverage Anthropic prompt caching, compress context windows, or implement batch processing strategies. Trigger whenever you see concerns about cost, "too expensive", "too slow", token limits, or scaling LLM workloads.
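Two of the patterns this skill covers, token-budget guardrails and cheap-vs-expensive model routing, can be sketched in a few lines. This is a minimal illustration, not any library's API: the model names and the 4-characters-per-token estimate are assumptions for the example.

```python
# Illustrative model names only -- substitute whatever your provider offers.
CHEAP_MODEL = "cheap-model"
EXPENSIVE_MODEL = "expensive-model"


def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)


class TokenBudget:
    """Tracks cumulative estimated token spend and refuses calls past the cap."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, prompt: str) -> None:
        cost = estimate_tokens(prompt)
        if self.used + cost > self.max_tokens:
            raise RuntimeError("token budget exceeded")
        self.used += cost


def route_model(prompt: str, complexity_threshold: int = 500) -> str:
    """Send short prompts to the cheap model, long ones to the expensive model."""
    return CHEAP_MODEL if estimate_tokens(prompt) < complexity_threshold else EXPENSIVE_MODEL
```

In a real agent the `charge` call would wrap every model invocation (e.g. inside a LangGraph node), and routing would typically key on task complexity rather than raw prompt length.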

FastAPI best practices and conventions. Use when building FastAPI applications and their Pydantic models. Keeps FastAPI code clean and aligned with the latest features and patterns as new versions ship, whether writing new code or refactoring and updating old code.

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results using Instructor, a battle-tested structured-output library.
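The core loop this entry describes, parse, validate, and retry with the validation error fed back to the model, can be shown without the library. This is a hand-rolled sketch of the pattern that Instructor automates with real Pydantic models; the `validate_user` schema and the injected `llm_call` are illustrative assumptions, not Instructor's API.

```python
import json


def validate_user(payload: dict) -> dict:
    """Minimal stand-in for a Pydantic model: check required fields and types."""
    if not isinstance(payload.get("name"), str):
        raise ValueError("name must be a string")
    if not isinstance(payload.get("age"), int):
        raise ValueError("age must be an integer")
    return {"name": payload["name"], "age": payload["age"]}


def extract_with_retry(llm_call, max_retries: int = 3) -> dict:
    """Call the model, parse JSON, validate; on failure, retry with the error."""
    error = None
    for _ in range(max_retries):
        raw = llm_call(error)  # a real loop would append `error` to the prompt
        try:
            return validate_user(json.loads(raw))
        except (json.JSONDecodeError, ValueError) as exc:
            error = str(exc)
    raise RuntimeError(f"extraction failed after {max_retries} attempts: {error}")
```

Feeding the validation error back into the next attempt is what makes the retry useful: the model sees exactly which field failed and why, instead of being asked to guess again blindly.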

Use this skill whenever building AI agents, multi-agent systems, or stateful LLM applications using LangChain and LangGraph. Make sure to consult this skill if the user asks about cyclic execution, checkpointing, memory (long or short term), Map-Reduce graph fan-out, human-in-the-loop (interrupts), callbacks, streaming events, the Functional API (@entrypoint/@task), durable execution, extended thinking, Anthropic server tools, prompt caching, or MCP connectors.

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

Master prompt engineering for AI agents and LLM applications, with deep expertise in Claude/Anthropic-specific techniques. Use when: designing system prompts for agents, writing tool descriptions, chain-of-thought prompting, few-shot examples, preventing hallucinations, Constitutional AI self-correction, LLM-as-a-Judge evaluation, A/B testing prompts, optimizing agent role definitions, structuring output formats, or debugging why an agent makes wrong decisions. Trigger even when the user says things like 'why does the agent keep doing X', 'how do I make Claude more accurate', or 'prompt isn't working'.
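One concrete technique from this list, few-shot examples, reduces to assembling the role, worked examples, and live query into a single prompt. A minimal sketch, assuming the common Anthropic-style convention of wrapping examples in XML tags; the exact tag names and delimiters here are illustrative choices, not a prescribed format.

```python
def build_few_shot_prompt(
    system_role: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Compose a role definition, worked examples, and the live query into one prompt."""
    parts = [system_role, ""]
    for user_msg, ideal_reply in examples:
        parts.append("<example>")
        parts.append(f"Input: {user_msg}")
        parts.append(f"Output: {ideal_reply}")
        parts.append("</example>")
    parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # end at the completion point so the model continues here
    return "\n".join(parts)
```

Keeping the `Input:`/`Output:` labels identical between the examples and the final query is the detail that makes few-shot prompts reliable: the model completes the pattern it was just shown.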

Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
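The retrieval step at the heart of RAG is similarity search over embeddings. A toy sketch: in a real system an embedding model produces the vectors and a vector database does the ranking; the hand-written two-dimensional vectors below stand in for embeddings purely for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query_vec: list[float],
    corpus: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[str]:
    """Rank (text, vector) pairs by similarity to the query; return the top-k texts."""
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(query_vec, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The retrieved texts are then injected into the prompt as grounding context, which is what turns a plain LLM call into a knowledge-grounded (RAG) one.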

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

Build tools that agents can use effectively, including architectural reduction patterns.