Evidence-grounded decision and problem-solving council that combines user-provided materials with mandatory fresh internet research. Use when Codex must answer an assignment/question, compare options, choose a strategy, or critique a plan using multi-source evidence. Enforces role-based challenge, broad-then-specialized searches, explicit assumptions, confidence levels, clear citations, live intermediate updates, and saved round artifacts in the invoking repository.

$jury - evidence-first decision council

Mission

Turn a prompt plus provided materials and fresh external evidence into a concrete recommendation that can survive scrutiny.

Operating rules

Internet research is mandatory on every run unless the user explicitly forbids browsing.
Treat provided materials as core context, then validate and extend with fresh external sources.
Start research broad (landscape scan), then specialize by persona.
Before persona selection, complete multi-query broad discovery including explicit case/entity searches (defined in references/research_protocol.md).
Label every non-trivial statement as one of:
- Evidence (Local): directly supported by provided materials.
- Evidence (External): supported by internet sources.
- Inference: reasoned conclusion from evidence.
- Assumption: unverified input required to proceed.
Prioritize primary sources (official docs, regulator filings, standards bodies, vendor docs, peer-reviewed papers, first-party data).
Record publication date and URL for each external source.
Keep internal debate compact; expose phase-by-phase progress and round outputs.
If evidence remains insufficient after research, return a bounded answer and list minimum extra data needed.
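
The labeling and source-recording rules above can be sketched as a small data structure. This is only an illustration: the names Label, Statement, problems, and render are hypothetical and do not come from the skill's reference files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    """The four statement labels listed in the operating rules."""
    EVIDENCE_LOCAL = "Evidence (Local)"
    EVIDENCE_EXTERNAL = "Evidence (External)"
    INFERENCE = "Inference"
    ASSUMPTION = "Assumption"


@dataclass
class Statement:
    text: str
    label: Label
    url: Optional[str] = None        # required for external evidence
    published: Optional[str] = None  # publication date, required for external evidence

    def problems(self) -> list[str]:
        """Return rule violations; an empty list means the statement is well-formed."""
        issues = []
        if self.label is Label.EVIDENCE_EXTERNAL:
            if not self.url:
                issues.append("missing URL")
            if not self.published:
                issues.append("missing publication date")
        return issues

    def render(self) -> str:
        """Prefix the statement with its label (the bracket format is an assumption)."""
        return f"[{self.label.value}] {self.text}"
```

A statement labeled Evidence (External) without a URL or date reports both gaps, while an Inference or Assumption passes without a source.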

Required inputs

Collect only missing essentials:

Exact task prompt or decision question.
Materials (file paths, links, or excerpts).
Constraints (rubric, format, word limit, audience, deadline, risk tolerance).

If critical inputs are missing, ask up to 3 high-leverage clarifying questions.

Run artifacts and live updates

Read references/output_artifacts.md.

Create a run directory in the invoking repo:

If a git repository is available, use the repository root reported by git rev-parse --show-toplevel.
Otherwise use current working directory.
Path pattern: jury-runs/<YYYYMMDD-HHMMSS>-<short-slug>/
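
A minimal sketch of resolving the run directory, assuming Python is available to the agent. The helper names (resolve_base_dir, slugify, run_dir_path) and the slug rule (first five lowercase alphanumeric words of the prompt) are assumptions; only the git command, the fallback, and the path pattern come from the rules above.

```python
import re
import subprocess
from datetime import datetime
from pathlib import Path


def resolve_base_dir() -> Path:
    """Return the git repository root, or the current directory if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        )
        return Path(result.stdout.strip())
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not inside a repository, or git is not installed.
        return Path.cwd()


def slugify(text: str, max_words: int = 5) -> str:
    """Hypothetical slug rule: lowercase alphanumeric words joined by hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words]) or "run"


def run_dir_path(base: Path, prompt: str, now: datetime) -> Path:
    """Build jury-runs/<YYYYMMDD-HHMMSS>-<short-slug>/ under the base directory."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return base / "jury-runs" / f"{stamp}-{slugify(prompt)}"
```

Usage: run_dir_path(resolve_base_dir(), prompt, datetime.now()) yields the path to create before any phase output is written.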

Mandatory files per run:

00-context.md
01-landscape-scan.md
02-roster.md
03-search-log.md
04-evidence-map.md
round-1.md
round-2.md
round-3.md
final-report.md

Incremental persistence policy (mandatory):

Initialize all required files at run start with a header and timestamp.
Write phase outputs to disk immediately at the end of each phase before moving to the next.
For the search phase, append entries to 03-search-log.md as each persona query/source is completed.
For evidence mapping, append/update 04-evidence-map.md as each source is processed.
Do not batch-write round artifacts at the end of the run.
Before entering a new round, re-open prior round artifact(s) and reference them.
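
The persistence policy could be implemented along these lines. The header layout and the function names init_artifacts and append_entry are assumptions; references/output_artifacts.md may prescribe a different format.

```python
from datetime import datetime, timezone
from pathlib import Path

# File names taken from the mandatory file list.
REQUIRED_FILES = [
    "00-context.md", "01-landscape-scan.md", "02-roster.md",
    "03-search-log.md", "04-evidence-map.md",
    "round-1.md", "round-2.md", "round-3.md", "final-report.md",
]


def init_artifacts(run_dir: Path) -> None:
    """Create every required file at run start with a header and timestamp."""
    run_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    for name in REQUIRED_FILES:
        path = run_dir / name
        if not path.exists():  # never overwrite output from an earlier phase
            path.write_text(f"# {name}\n\nInitialized: {stamp}\n\n", encoding="utf-8")


def append_entry(run_dir: Path, name: str, text: str) -> None:
    """Append one entry immediately, so nothing is batch-written at the end."""
    with (run_dir / name).open("a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
```

Opening files in append mode keeps earlier entries intact, which is what makes re-opening a prior round artifact meaningful.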

Live update policy:

Post concise intermediate updates at each phase transition.
Include what was done, what changed, and what comes next.
Do not expose hidden chain-of-thought; expose decisions, evidence movement, and status.

Workflow

1) Classify the task

Classify into one primary mode:

Explain: answer or explain a question.
Decide: choose among alternatives.
Design: propose an implementation plan.
Critique: audit and improve an existing proposal.

State success criteria in 2-5 bullets before analysis. Immediately write task framing and success criteria to 00-context.md.

2) Define decision rubric

Before deep analysis, define:

Hard constraints (must-pass; disqualifying if violated)
Evaluation criteria (trade-off dimensions among feasible options)

For Decide and Design tasks, read references/decision_protocol.md and apply its:

feasibility gate
comparative scoring template
confidence calibration

Immediately append rubric details to 00-context.md.

3) Run a case-grounded, multi-query broad landscape scan (mandatory)

Read references/research_protocol.md.

Perform broad, non-persona-specific searches to map the landscape:

key market/state-of-practice context
major constraints and regulatory context
baseline benchmark ranges
explicit case/entity context (company + stated problem)

Write findings to 01-landscape-scan.md with:

query matrix (queries run, sources hit, result quality)
case/entity-specific findings and gaps
landscape synthesis with source links and dates

Hard gate before next step:

Complete minimum broad-search coverage from references/research_protocol.md.
Include explicit case/entity search attempts and outcomes.
Do not proceed to persona selection until this file is written and coverage is met.

4) Select a compact persona roster

Use:

references/persona_catalog.yaml for available roles.
references/selection_heuristics.md for how to select a compact roster.

Always include: Moderator, Skeptic, Methodologist. Add 2-6 domain personas. Ensure the roster jointly covers:

Value/outcome
Feasibility
Risk/safety
Measurement/validation

Output the roster in <= 12 lines, one persona per line: persona -> objective | failure mode

Save roster and rationale to 02-roster.md. Write this file before persona searches begin.
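
The roster rules in this step can be checked mechanically. The sketch below assumes each persona carries hypothetical coverage tags (value, feasibility, risk, measurement); the real catalog in references/persona_catalog.yaml may model coverage differently.

```python
from dataclasses import dataclass

CORE_ROLES = {"Moderator", "Skeptic", "Methodologist"}
REQUIRED_COVERAGE = {"value", "feasibility", "risk", "measurement"}


@dataclass(frozen=True)
class Persona:
    name: str
    objective: str
    failure_mode: str
    covers: frozenset  # hypothetical coverage tags

    def line(self) -> str:
        """Render one roster line: persona -> objective | failure mode."""
        return f"{self.name} -> {self.objective} | {self.failure_mode}"


def roster_problems(roster: list[Persona]) -> list[str]:
    """Return violations of the roster rules; an empty list means the roster passes."""
    problems = []
    names = {p.name for p in roster}
    missing = CORE_ROLES - names
    if missing:
        problems.append(f"missing core roles: {sorted(missing)}")
    domain_count = sum(1 for p in roster if p.name not in CORE_ROLES)
    if not 2 <= domain_count <= 6:
        problems.append(f"expected 2-6 domain personas, found {domain_count}")
    covered = set().union(*(p.covers for p in roster))
    uncovered = REQUIRED_COVERAGE - covered
    if uncovered:
        problems.append(f"uncovered dimensions: {sorted(uncovered)}")
    return problems
```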

5) Run mandatory persona search sprint

Read references/research_protocol.md.

Assign each selected persona a distinct search objective and 1-3 targeted queries. Rules:

Every selected persona contributes at least one unique search thread.
Do not duplicate intent across personas unless triangulating a contested claim.
Collect fresh sources first (recent, high-authority, primary where possible).
Keep a research log: persona, query, source, publish date, relevance note.

Append search activity incrementally to 03-search-log.md during the phase, not only at the end.
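
One way to represent research-log entries and confirm that every selected persona contributed a search thread. The table-row format and the names SearchEntry and personas_without_threads are assumptions, not part of references/research_protocol.md.

```python
from dataclasses import dataclass


@dataclass
class SearchEntry:
    """One research-log record with the fields named in the rules above."""
    persona: str
    query: str
    source: str
    publish_date: str
    relevance: str

    def to_markdown_row(self) -> str:
        """Render the entry as a table row suitable for appending to 03-search-log.md."""
        cells = [self.persona, self.query, self.source, self.publish_date, self.relevance]
        return "| " + " | ".join(cells) + " |"


def personas_without_threads(roster: list[str], log: list[SearchEntry]) -> list[str]:
    """List selected personas that have not yet logged any search."""
    seen = {entry.persona for entry in log}
    return [name for name in roster if name not in seen]
```

An empty result from personas_without_threads is a precondition for leaving the search sprint; it does not detect duplicated intent across personas, which still needs human judgment.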

6) Build a unified evidence map

For each local and external source, extract:

Core claims/facts.
Numbers, thresholds, and constraints.
Reliability caveats (sample size, date, assumptions, scope limits).
Citation pointer (local page/section OR external URL + date).

Append/update 04-evidence-map.md incrementally as evidence is normalized.
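
As an illustration of the citation-pointer rule (local page/section, or external URL plus date), an evidence-map entry might look like this. EvidenceEntry and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvidenceEntry:
    claim: str
    numbers: list[str] = field(default_factory=list)   # numbers, thresholds, constraints
    caveats: list[str] = field(default_factory=list)   # reliability caveats
    local_ref: Optional[str] = None                    # local page/section
    url: Optional[str] = None                          # external URL
    date: Optional[str] = None                         # external publication date

    def citation_pointer(self) -> str:
        """Return the pointer, or fail if neither form of citation is complete."""
        if self.local_ref:
            return f"local: {self.local_ref}"
        if self.url and self.date:
            return f"{self.url} ({self.date})"
        raise ValueError("entry needs a local page/section or an external URL plus date")
```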

7) Generate candidate positions and Round 1 output

Produce 2-4 materially different positions/options. For each option, include:

Best supporting evidence.
Critical assumptions.
Expected upside and downside.

Run a feasibility gate:

If an option fails any hard constraint, mark it Infeasible and do not recommend it.
Keep infeasible options visible only as rejected alternatives with reasons.
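
The gate above amounts to a filter that keeps rejected options visible with their reasons. This is a sketch under assumed data shapes (options as attribute dictionaries, constraints as predicates); the authoritative procedure is the one in references/decision_protocol.md.

```python
from typing import Callable

# A hard constraint is a named predicate over an option's attributes (assumed shape).
Constraint = Callable[[dict], bool]


def feasibility_gate(
    options: dict[str, dict],
    hard_constraints: dict[str, Constraint],
) -> tuple[list[str], dict[str, list[str]]]:
    """Split options into feasible names and rejected names with failure reasons."""
    feasible: list[str] = []
    rejected: dict[str, list[str]] = {}
    for name, attrs in options.items():
        failed = [label for label, check in hard_constraints.items() if not check(attrs)]
        if failed:
            rejected[name] = failed  # marked Infeasible, kept only as a rejected alternative
        else:
            feasible.append(name)
    return feasible, rejected


# Example with made-up options and constraints.
options = {
    "A": {"cost": 80, "compliant": True},
    "B": {"cost": 150, "compliant": True},
    "C": {"cost": 60, "compliant": False},
}
constraints = {
    "budget <= 100": lambda o: o["cost"] <= 100,
    "regulatory compliance": lambda o: o["compliant"],
}
feasible, rejected = feasibility_gate(options, constraints)
```

Here only option A passes; B and C remain in rejected with the names of the constraints they failed.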

If the task is Explain, include 2 competing interpretations before converging.

Write concise round output to round-1.md. Write this before starting Round 2.

8) Run adversarial cross-examination (Round 2)

Focus on top 3-5 disagreements. For each disagreement:

Skeptic asks the strongest falsification question.
Most relevant domain persona answers with evidence pointers.
Methodologist marks status:
- Resolved
- Partially resolved
- Underdetermined (plus minimum additional evidence needed)

Write concise round output to round-2.md. Write this before starting Round 3.
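
A cross-examination record could be modeled as below, enforcing that an Underdetermined status always names the minimum additional evidence needed. The class and field names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RESOLVED = "Resolved"
    PARTIALLY_RESOLVED = "Partially resolved"
    UNDERDETERMINED = "Underdetermined"


@dataclass
class Disagreement:
    topic: str
    skeptic_question: str          # strongest falsification question
    answer: str                    # reply from the most relevant domain persona
    evidence_pointers: list[str]   # citations backing the answer
    status: Status                 # set by the Methodologist
    evidence_needed: str = ""      # required when status is Underdetermined

    def __post_init__(self) -> None:
        if self.status is Status.UNDERDETERMINED and not self.evidence_needed:
            raise ValueError("Underdetermined requires the minimum additional evidence needed")
```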

9) Converge with a decision protocol (Round 3)

Produce a compact decision log table:

Claim/decision
Why it wins
Counterevidence/risk
Confidence (High/Medium/Low)
What would change the decision

When alternatives exist, score options against explicit criteria. Use weighted scoring only when materials or user constraints justify weights; otherwise use unweighted comparative scoring.
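
The difference between the two scoring modes can be shown with a short sketch. The 1-5 scale, the criteria names, and the numbers are made up for illustration; the actual template lives in references/decision_protocol.md.

```python
from typing import Optional


def score_options(
    scores: dict[str, dict[str, float]],
    weights: Optional[dict[str, float]] = None,
) -> dict[str, float]:
    """Average each option's criterion scores; apply weights only when supplied."""
    totals = {}
    for option, by_criterion in scores.items():
        if weights is None:
            # Unweighted comparative scoring: every criterion counts equally.
            totals[option] = sum(by_criterion.values()) / len(by_criterion)
        else:
            weight_sum = sum(weights[c] for c in by_criterion)
            totals[option] = sum(s * weights[c] for c, s in by_criterion.items()) / weight_sum
    return totals


# Illustrative scores on an assumed 1-5 scale.
scores = {
    "A": {"cost": 5, "risk": 2},
    "B": {"cost": 3, "risk": 5},
}
unweighted = score_options(scores)                        # B ranks first
weighted = score_options(scores, {"cost": 3, "risk": 1})  # A ranks first
```

With these numbers the ranking flips once cost is weighted three times as heavily as risk, which is why weights should be used only when the materials or user constraints justify them.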

Write concise round output to round-3.md. Write this before drafting the final report.

10) Deliver final output and report file

Use this structure unless user constraints override it:

Direct answer / recommendation
Why this is the best-supported choice
Evidence trail (local + external), with the most important citation pointers
Key assumptions and uncertainty
Risks and mitigations
Execution plan (phases, owner role, risk triggers, validation checkpoints)
Concrete next actions (tests, calculations, experiments)
Source list (URL, title/publisher, date, why it matters)
What cannot be concluded from provided materials and external evidence

Always save the complete deliverable to final-report.md.

Quality gate before sending

Every consequential claim has an evidence pointer or explicit assumption label.
External research was performed and logged for each selected persona.
Broad landscape scan was completed before persona specialization.
Broad scan met minimum query coverage and included explicit case/entity searches.
All required artifact files were written in the run directory.
Artifact files were written incrementally during execution, not batch-written at the end.
At least one strong counterargument is addressed.
Uncertainty is explicit; do not fake precision.
Confidence level follows explicit calibration from references/decision_protocol.md.
Output is actionable, not just descriptive.
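
Part of this gate is mechanically checkable. The sketch below covers only the artifact-presence item; it assumes the file names from the mandatory file list, uses the hypothetical helper name missing_or_empty, and cannot tell whether files were written incrementally.

```python
from pathlib import Path

REQUIRED_FILES = (
    "00-context.md", "01-landscape-scan.md", "02-roster.md",
    "03-search-log.md", "04-evidence-map.md",
    "round-1.md", "round-2.md", "round-3.md", "final-report.md",
)


def missing_or_empty(run_dir: Path) -> list[str]:
    """Return required artifact names that are absent or have no content."""
    problems = []
    for name in REQUIRED_FILES:
        path = run_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

An empty list is necessary but not sufficient for passing the gate; the remaining items require reading the artifacts themselves.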

Interaction policy

Do not invent sources or facts.
Do not claim certainty when evidence is incomplete.
If user instructions conflict with source evidence, explain the conflict and provide the best-supported path.