Agentic Validation Skills

Guidelines for agentic AI test automation based on the CIGE standard.

Agentic Validation Skills

Guidelines and skills for agentic AI test automation, based on the CIGE standard — a structured test case format designed for self-healing, validation-driven agentic test automation.

A CLAUDE.md file and Claude Code skill that improve how AI agents author, execute, and recover from agentic test cases.

The Problem

Traditional test formats (BDD, POM, AAA) were designed for deterministic systems with rigid step sequences. When an AI agent runs these tests:

  • Intent collapses into procedures — when the UI changes, the whole test breaks even if the goal hasn't changed
  • No guardrails — agents optimize for test completion, not safe test completion
  • Failure is binary — pass/fail with no classification, so agents can't decide whether to self-heal, escalate, or retry
  • Context is assumed — agents explore blindly instead of operating within a defined decision space
  • Token waste — loading the full test upfront dilutes attention and increases cost

The Solution: CIGE

The CIGE standard separates four concerns that traditional test formats conflate:

ComponentRoleStability
ContextSystem under test, environment, tools, preconditionsChanges with environment
IntentThe single outcome the test must confirmStable — survives system changes
GuardrailsExplicit constraints on agent behaviorStable — safety boundaries
ExecutionAdaptive guidance for reaching the goalMutable — self-heals with the system

Every agentic test case is expressed as:

{
  "Context": "E-commerce checkout service, staging environment, user pre-authenticated with test account, payment gateway mocked",
  "Intent": "User can successfully place an order and receive an order confirmation",
  "Guardrails": [
    "Do not submit orders with real payment credentials",
    "Do not mutate production order records",
    "Halt if checkout flow exits the staging domain"
  ],
  "Execution": [
    "Navigate to cart with at least one item",
    "Proceed through checkout flow to order summary",
    "Submit order using test payment credentials",
    "Verify: order confirmation page is displayed with a valid order ID"
  ]
}

The Four Principles

1. Establish Context Before Testing

Narrow the agent's decision space. Never test into the void.

Declare system, environment, tools, and state before writing a single execution step. Missing context causes wasted exploration and flaky results.

2. Anchor Every Test to Intent

Intent is the stable core. Never collapse it into steps.

Write intent as one outcome-focused sentence. It must survive UI changes, infrastructure migrations, and workflow refactors. The question is always: "Did the agent safely achieve the intended outcome?" — not "Did it follow every step?"

3. Define Guardrails Before Executing

Bounded autonomy is safe autonomy. No guardrails = no execution.

Declare what the agent must never do before any test runs. Guardrails are constraints, not suggestions. An agent that violates a guardrail to complete a test has not passed the test.

4. Treat Execution as Adaptive Guidance

Steps are starting points. Goal fidelity is the finish line.

Execution steps are guidance, not scripts. When the system changes, execution self-heals. But only after classifying the failure:

Failure TypeRecovery
Product DefectEscalate — preserve intent
Outdated Test LogicUpdate execution only — intent must not change
Infrastructure FailureRestore context — do not alter test definition

Never rewrite intent to make a failing test pass.

Reported Impact

From production use of the CIGE standard:

  • ~90% reduction in test maintenance effort
  • ~38% reduction in false positives
  • ~62% reduction in average test execution time

Install

Option A: Claude Code Plugin (recommended)

/plugin marketplace add vivekkrishna/agentic-validation-skills
/plugin install agentic-validation-skills

Once installed, invoke the skill with:

/agentic-validation-skills:cige-guidelines

Option B: CLAUDE.md (per-project)

New project:

curl -o CLAUDE.md https://raw.githubusercontent.com/vivekkrishna/agentic-validation-skills/main/CLAUDE.md

Existing project (append):

echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/vivekkrishna/agentic-validation-skills/main/CLAUDE.md >> CLAUDE.md

How to Know It's Working

These guidelines are working if you see:

  • Tests survive system changes — execution updates but intent never rewrites
  • Failures are classified, not just reported — agents know whether to self-heal, escalate, or retry
  • Guardrail violations surface before execution — not discovered mid-run
  • Context is always explicit — no blind exploration, no assumed state
  • Shorter, more focused test runs — progressive context disclosure reduces token waste

Customization

Merge CLAUDE.md with your project-specific test guidelines:

## Project-Specific Test Context

- Staging URL: https://staging.example.com
- Test account credentials are in Vault at secret/test-accounts
- Payment gateway mock is enabled by default in staging
- Do not test against the orders-v1 endpoint — deprecated, use orders-v2

Background

CIGE was proposed as an alternative to free-form agentic test definitions. The full rationale is in the original article: CIGE: An Agentic AI Test Case Standard Proposed.

The core insight: modern AI agents operate contextually, not mechanically — just like human testers. Test definitions should reflect that. Rigid step sequences served deterministic automation. Structured, intent-driven formats serve agentic automation.

License

MIT