AITP-Research-Protocol
AITP Research Protocol transforms AI coding agents into disciplined theoretical-physics collaborators.
How it works
It starts when you describe what you want to study — in plain language, the way you would explain it to a colleague. AITP takes that description and turns it into a bounded research topic with a clear question, scope, and validation plan. You don't need to learn any special commands; just talk.
Once the topic is set up, your AI agent does the actual research work — gathering sources, reading papers, sketching derivations, or running benchmarks — inside a layered protocol that keeps every piece of evidence traceable. Exploratory notes and tentative claims stay clearly labeled as such. Nothing gets promoted to "trusted" status just because the agent sounds confident.
When the work reaches a natural checkpoint, AITP presents what it found, what gaps remain, and whether the results are ready for your review. You decide what gets promoted to reusable memory. This is the human gate — the agent executes, but you own the trust decisions.
Everything is durable. You can close your laptop, come back days later, say "continue this topic," and the agent picks up exactly where it left off — with full context of what was done, what was decided, and what is still open. For the full experience, see docs/USER_TOPIC_JOURNEY.md.
Protocol and Current Implementation
This repository contains both:
- the AITP research protocol
- the current public reference implementation of that protocol
The protocol is the durable contract: layers, trust boundaries, promotion rules, runtime artifacts, and the rule that evidence stays separate from conjecture.
The implementation is what you actually run today:
- the installable kernel under research/knowledge-hub/
- the aitp and aitp-mcp entrypoints
- the runtime scripts and topic-shell materialization
- the current Codex / OpenCode / Claude Code / OpenClaw front doors
The important boundary is:
- the protocol is the durable model
- the current Python kernel and front-door integrations are one implementation of that model
Another implementation would still count as AITP only if it preserves the same layer semantics, durable artifact model, evidence boundaries, and governed promotion into trusted memory.
Installation
Kernel Install
python -m pip install aitp-kernel
aitp --version
aitp doctor
Then install the platform adapter you use:
Codex (recommended)
aitp install-agent --agent codex --scope user
See .codex/INSTALL.md for details.
OpenCode
Add to your opencode.json:
{ "plugin": ["aitp@git+https://github.com/bhjia-phys/AITP-Research-Protocol.git"] }
See .opencode/INSTALL.md for details.
Claude Code
aitp install-agent --agent claude-code --scope user
This path now installs both the SessionStart bootstrap and the native Claude Code AITP MCP registration.
See docs/INSTALL_CLAUDE_CODE.md for details.
OpenClaw
aitp install-agent --agent openclaw --scope user
See docs/INSTALL_OPENCLAW.md for details.
For contributor/local-dev editable installs, Windows-specific instructions,
migration from older installs, and troubleshooting, see
docs/INSTALL.md.
What aitp doctor Means
aitp doctor is the authoritative check of the current install and bootstrap state.
Use it to confirm:
- the kernel is installed correctly
- the fixed protocol roots are present
- your selected front door is wired correctly
- the current machine has enough structure to run the bounded first-use path
Important boundary:
- aitp doctor proves front-door readiness
- it does not by itself prove deep-execution parity across every runtime
For machine-readable inspection, run:
aitp doctor --json
Current Use Paths
The current implementation supports three honest entry styles.
1. Agent-front-door usage
After installing the adapter for Codex, OpenCode, Claude Code, or OpenClaw, you can start from a natural-language request inside that agent surface.
That is the most protocol-native experience:
- describe the idea
- let AITP route it into a bounded topic
- continue through the topic shell and runtime protocol artifacts
2. Explicit topic bootstrap
If you want the most direct CLI path from idea to topic:
aitp bootstrap --topic "<topic>" --statement "<initial idea or question>"
aitp loop --topic-slug <topic_slug> --human-request "Continue with the next bounded step."
aitp status --topic-slug <topic_slug>
This is the current verified first-run path for the public kernel.
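The three-command path above can also be driven from a script. The following is a minimal sketch, not part of the AITP kernel: it only assembles the documented command lines and, by default, prints them instead of running them. The function names, the example topic, and the example slug are hypothetical; in particular, the real topic slug is whatever aitp bootstrap assigns, which this sketch does not try to predict.

```python
import shlex
import subprocess


def first_run_commands(topic, statement, topic_slug):
    """Build argv lists for the documented bootstrap -> loop -> status path."""
    return [
        ["aitp", "bootstrap", "--topic", topic, "--statement", statement],
        ["aitp", "loop", "--topic-slug", topic_slug,
         "--human-request", "Continue with the next bounded step."],
        ["aitp", "status", "--topic-slug", topic_slug],
    ]


def run_first_run(topic, statement, topic_slug, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in first_run_commands(topic, statement, topic_slug):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical topic, statement, and slug, used only for illustration.
    run_first_run(
        "TFIM benchmark",
        "How does the gap scale near criticality?",
        "tfim-benchmark",
    )
```

Passing argv lists rather than one shell string avoids quoting problems when the statement contains spaces or quotes.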
Useful follow-up reads:
- aitp replay-topic --topic-slug <topic_slug>
- aitp capability-audit --topic-slug <topic_slug>
- aitp paired-backend-audit --topic-slug <topic_slug>
- aitp h-plane-audit --topic-slug <topic_slug>
- aitp compile-source-catalog
- aitp trace-source-citations --canonical-source-id <canonical_source_id>
- aitp compile-source-family --source-type paper
- aitp export-source-bibtex --canonical-source-id <canonical_source_id>
- aitp import-bibtex-sources --topic-slug <topic_slug> --bibtex-path <path-to-bib-file>
3. Lightweight idea-first exploration
If the idea is still too loose for full topic bootstrap:
aitp explore "Sketch the idea before opening a full topic loop."
That writes a lightweight exploration carrier instead of the full topic shell.
When the idea becomes specific enough, promote it into normal topic work with
promote-exploration or start a full topic explicitly.
The Basic Workflow
- Topic bootstrap: You describe what you want to study. AITP sets up a research topic with a bounded question, scope, and validation contract. No special commands needed.
- Source acquisition (L0): AITP gathers papers, notes, and upstream references. Everything is traceable back to its origin.
- Analysis and exploration (L1, L3): The agent reads, annotates, sketches derivations, or runs benchmarks. Exploratory outputs are clearly labeled as candidates, not conclusions.
- Validation and trust audit (L4): When results look promising, AITP runs explicit checks (consistency, convergence, reproduction) before asking whether the work is ready.
- Promotion to reusable memory (L2): Only after your explicit approval does material move into long-term trusted memory. The agent cannot promote on its own.
Research Model
AITP keeps research state in layers instead of flattening everything into one chat transcript.
| Layer | Purpose | What goes here |
|---|---|---|
| L0 | Source acquisition | papers, notes, upstream code references |
| L1 | Provisional understanding | analysis notes, derivation sketches |
| L3 | Exploratory outputs | candidate claims, tentative material |
| L4 | Validation and trust audit | checks, benchmarks, human decisions |
| L2 | Long-term trusted memory | promoted knowledge, reusable workflows |
The default route is L0 → L1 → L3 → L4 → L2. Layer 2 is intentionally last — exploratory work does not become reusable memory just because the agent sounds confident.
flowchart LR
A[Research question] --> B[aitp bootstrap]
B --> C[L0 source traces]
C --> D[L1 provisional analysis]
D --> E[L3 candidate outputs]
E --> F[L4 validation and trust audit]
F --> G{Human approval gate}
G -->|approved| H[L2 reusable memory]
G -->|not yet| E
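The default route and its approval gate can be read as a small state machine. The sketch below is an illustrative model only, assuming nothing about how the kernel stores topic state: TopicState, ROUTE, and advance are hypothetical names, and the only behavior encoded is what the table and flowchart state (L0, L1, L3, L4 in order, then L2 on approval or back to L3 on "not yet").

```python
from dataclasses import dataclass, field

# Default route from the layer table; L2 is intentionally last.
ROUTE = ["L0", "L1", "L3", "L4", "L2"]


@dataclass
class TopicState:
    """Hypothetical in-memory model of where a topic sits on the route."""
    layer: str = "L0"
    history: list = field(default_factory=list)

    def advance(self, human_approved=False):
        """Move one step along ROUTE; approval is only consulted at L4."""
        if self.layer == "L2":
            return self.layer  # already promoted, nothing further to do
        if self.layer == "L4":
            # Human approval gate: without approval, work returns to L3.
            nxt = "L2" if human_approved else "L3"
        else:
            nxt = ROUTE[ROUTE.index(self.layer) + 1]
        self.history.append((self.layer, nxt))
        self.layer = nxt
        return nxt


if __name__ == "__main__":
    state = TopicState()
    for _ in range(3):
        state.advance()                   # L0 -> L1 -> L3 -> L4
    state.advance()                       # not approved: back to L3
    state.advance()                       # L3 -> L4 again
    state.advance(human_approved=True)    # approved: L4 -> L2
    print(state.history)
```

The point of the model is that no sequence of calls reaches L2 without human_approved=True being passed at L4, which mirrors the rule that the agent cannot promote on its own.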
What's Inside
Three Research Lanes
The same protocol kernel drives different categories of theoretical-physics work.
| Lane | Typical inputs | Validation | Trusted output |
|---|---|---|---|
| Formal theory and derivation | papers, definitions, prior claims | proof-gap analysis, consistency checks | semi-formal theory objects, Lean-ready packets |
| Toy-model numerics | model specs, observables, scripts | convergence checks, benchmarks | validated workflows, reusable operations |
| Code-backed algorithm development | upstream codebases, existing methods | reproduction, trust audit | trusted methods, backend writeback |
Capabilities
- Multi-topic runtime — Work on several research topics in one workspace. Switch between them with natural language.
- Cross-session memory — Every topic survives session resets. Resume days later with full context.
- Lean-ready export — Bridge validated theory results into Lean 4 declaration packets with proof-obligation sidecars.
- Bounded autonomous execution — Run multi-step research loops with explicit human gates at decision points (OpenClaw).
- L1 three-layer vault — Materialize raw/wiki/output intake vaults with explicit flowback receipts on the existing topic-shell path.
- Statement compilation before proof repair — Compile bounded theory statements into declaration skeletons and explicit proof-repair plans before Lean-facing export.
- L2 compiler helpers — Seed and inspect reusable knowledge views with commands like
  - aitp seed-l2-direction --direction tfim-benchmark-first
  - aitp consult-l2 --query-text "TFIM exact diagonalization benchmark workflow" --retrieval-profile l3_candidate_formation
  - aitp compile-l2-graph-report
  - aitp compile-l2-knowledge-report
Runtime Support
| Runtime | Install path | Role |
|---|---|---|
| Codex | .codex/INSTALL.md | Baseline — cleanest end-to-end experience |
| OpenCode | .opencode/INSTALL.md | Plugin-based natural-language routing |
| Claude Code | docs/INSTALL_CLAUDE_CODE.md | SessionStart bootstrap |
| OpenClaw | docs/INSTALL_OPENCLAW.md | Bounded autonomous research loops |
Run aitp doctor --json to check what is converged on your machine.
Current baseline: Codex.
Parity target: Claude Code and OpenCode.
Specialized lane: OpenClaw.
aitp doctor reports front-door readiness only. Deep-execution parity is a
separate surface.
Across Codex, Claude Code, and OpenCode, the front door now publishes the same
plain-language human-control posture and autonomous-continuation posture in
session_start.generated.md and runtime_protocol.generated.md.
When no real checkpoint is active, AITP should continue bounded work without
ritual reconfirmation; in verify + iterative_verify, the bounded L3-L4 loop
is allowed to keep cycling until success, a real blocker, or a real human
checkpoint appears.
The current bounded acceptance and audit scripts are:
- Runtime parity probes (Claude Code and OpenCode): python research/knowledge-hub/runtime/scripts/run_runtime_parity_acceptance.py --runtime <runtime> --json
- Cross-runtime closure report: python research/knowledge-hub/runtime/scripts/run_runtime_parity_audit.py --json
- L1 raw/wiki/output vault acceptance: python research/knowledge-hub/runtime/scripts/run_l1_vault_acceptance.py --json
- L1 assumption/reading-depth acceptance: python research/knowledge-hub/runtime/scripts/run_l1_assumption_depth_acceptance.py --json
- L1 method-specificity acceptance: python research/knowledge-hub/runtime/scripts/run_l1_method_specificity_acceptance.py --json
- L1 concept-graph intake acceptance: python research/knowledge-hub/runtime/scripts/run_l1_concept_graph_acceptance.py --json
- Layer 0 source-catalog acceptance: python research/knowledge-hub/runtime/scripts/run_source_catalog_acceptance.py --json
- Runtime transition/demotion acceptance: python research/knowledge-hub/runtime/scripts/run_transition_history_acceptance.py --json
- Promotion-gate human-modification acceptance: python research/knowledge-hub/runtime/scripts/run_human_modification_record_acceptance.py --json
- Competing-hypotheses acceptance: python research/knowledge-hub/runtime/scripts/run_competing_hypotheses_acceptance.py --json
- Statement-compilation acceptance: python research/knowledge-hub/runtime/scripts/run_statement_compilation_acceptance.py --json
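The acceptance scripts above that take only --json share one invocation shape, so they can be run in a loop from a repository checkout. This is a hedged sketch, not a shipped tool: the helper names are hypothetical, it assumes the current working directory is the repository root, and it assumes a non-zero exit code signals failure, which the scripts' own documentation should confirm. The runtime parity probe is left out because it needs an extra --runtime argument.

```python
import subprocess
import sys
from pathlib import Path

# Script directory and file names as listed in this README.
SCRIPTS_DIR = Path("research/knowledge-hub/runtime/scripts")

ACCEPTANCE_SCRIPTS = [
    "run_l1_vault_acceptance.py",
    "run_l1_assumption_depth_acceptance.py",
    "run_l1_method_specificity_acceptance.py",
    "run_l1_concept_graph_acceptance.py",
    "run_source_catalog_acceptance.py",
    "run_transition_history_acceptance.py",
    "run_human_modification_record_acceptance.py",
    "run_competing_hypotheses_acceptance.py",
    "run_statement_compilation_acceptance.py",
]


def acceptance_command(script_name):
    """Build the argv for one script, using the current interpreter."""
    return [sys.executable, str(SCRIPTS_DIR / script_name), "--json"]


def run_all(stop_on_failure=False):
    """Run each script and map its name to its exit code."""
    results = {}
    for name in ACCEPTANCE_SCRIPTS:
        proc = subprocess.run(
            acceptance_command(name), capture_output=True, text=True
        )
        results[name] = proc.returncode
        # Assumption: a non-zero exit code means the acceptance failed.
        if stop_on_failure and proc.returncode != 0:
            break
    return results
```

Using sys.executable instead of a bare python keeps the scripts on the same interpreter (and virtual environment) as the caller.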
The machine-readable install view exposes:
- runtime_convergence
- deep_execution_parity
- full_convergence_repair
- runtime_support_matrix.runtimes.<runtime>.remediation
- runtime_support_matrix.deep_execution_parity.runtimes.<runtime>.status
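The dotted names in the install view read naturally as key paths into the JSON printed by aitp doctor --json. The sketch below shows one way to follow such a path defensively. It assumes the command writes a single JSON object to stdout and that each dot separates nested object keys; neither the value types nor the exact runtime key spelling (the "codex" used here is a placeholder) are confirmed by this README.

```python
import json
import subprocess


def lookup(report, dotted_path, default=None):
    """Follow a dotted key path through nested dicts; return default if absent."""
    node = report
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def load_doctor_report():
    """Run `aitp doctor --json` and parse stdout (assumed to be pure JSON)."""
    result = subprocess.run(
        ["aitp", "doctor", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Placeholder report with made-up values, so the lookup can be tried
    # without an AITP install. Real output may be shaped differently.
    sample = {
        "runtime_support_matrix": {
            "runtimes": {"codex": {"remediation": "placeholder"}},
        },
    }
    print(lookup(sample, "runtime_support_matrix.runtimes.codex.remediation"))
    print(lookup(sample, "runtime_convergence", default="not reported"))
```

Returning a default for missing keys keeps a checking script from crashing when a field is absent on a given machine or kernel version.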
Windows local-checkout note:
- scripts\aitp-local.cmd doctor
- scripts\aitp-local.cmd bootstrap --topic "<topic>" --statement "<statement>"
Useful runtime audit entrypoints once a topic exists:
- aitp capability-audit --topic-slug <topic_slug>
- aitp paired-backend-audit --topic-slug <topic_slug>
- aitp h-plane-audit --topic-slug <topic_slug>
Philosophy
- Evidence before confidence — sources stay separate from speculation at every layer
- Bounded steps, not freestyle — every unit of work has a clear question and scope
- Humans own trust — nothing becomes reusable memory without explicit approval
- Durable by default — research state survives session resets and machine changes
- Light until it matters — ordinary work stays minimal; the runtime only expands when something important happens
Current Status
- Ships one public reference implementation of the AITP protocol under research/knowledge-hub
- Supports Codex, OpenCode, Claude Code, and OpenClaw front doors over the same kernel
- Has a verified first-run path: bootstrap -> loop -> status
- Has a verified lightweight idea-first path: explore -> promote-exploration
- Has an explicit human approval gate before L2 promotion
- Bridges into the Theoretical-Physics-Knowledge-Network formal-theory backend
What is mostly done:
- protocol surface coverage for the current L0-L4 kernel
- install/bootstrap/front-door adoption surface
- runtime control plane, H-plane, layer graph, and route-transition visibility
What is not yet fully proven:
- real-topic end-to-end research utility across L0 -> L1 -> L3 -> L4 -> L2
- deep-execution parity on every non-Codex runtime
- full maturity of the L2 knowledge surface and statement-compilation pipeline
Contributing
AITP stabilizes the research protocol, not one frozen implementation. Contributions that preserve the layer model, durable artifacts, evidence boundaries, and governed promotion gates are welcome.
See docs/CHARTER.md for what counts as disciplined AI-assisted theoretical-physics work.
See docs/AITP_GSD_WORKFLOW_CONTRACT.md
for the boundary between research-topic work in AITP and implementation work
in GSD.
License
MIT License — see LICENSE file for details.
Read Next
- docs/QUICKSTART.md: detailed walkthrough with a real topic
- docs/USER_TOPIC_JOURNEY.md: what AITP feels like in practice
- docs/INSTALL.md: all installation details and troubleshooting
- docs/PUBLISH_PYPI.md: public package build and release workflow
- docs/CHARTER.md: the full research charter
- docs/architecture.md: technical architecture
- docs/MULTI_TOPIC_RUNTIME.md: multi-topic runtime behavior
- docs/MIGRATE_MULTI_TOPIC.md: migration notes for multi-topic state
- research/knowledge-hub/L5_PUBLICATION_FACTORY_PROTOCOL.md: publication/output layer contract
- docs/AITP_GSD_WORKFLOW_CONTRACT.md: when to use AITP vs GSD
- docs/AITP_WORKFLOW_SHELL_AND_PROTOCOL_KERNEL.md: why the UX converges on Superpowers' install shape
- docs/roadmap.md: development roadmap