Vibe Science

Vibe Science is an integrity-first research runtime designed for AI-assisted scientific work.

<p align="center"> <img src="logos/hero-v7.0-trace-adapt.png" alt="Vibe Science v7.0 TRACE + TRACE+ADAPT V0" width="980"> </p> <p align="center"> <a href="https://doi.org/10.5281/zenodo.18665031"><img src="https://zenodo.org/badge/1148022920.svg" alt="DOI"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"></a> <img src="https://img.shields.io/badge/version-7.0.0-purple.svg" alt="Version"> <img src="https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg" alt="Node"> <img src="https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey.svg" alt="Platform"> </p>

Vibe Science

Integrity-first research runtime for Claude Code. Plugin-enforced checks, adversarial review, confounder discipline, cross-session state, and audit trails for AI-assisted scientific work.

Start Here

If you are landing on this repository for the first time, the shortest accurate summary is:

Vibe Science is an integrity-first research runtime for Claude Code.

It combines:

  • a plugin that persists state and structurally enforces selected checks
  • a scientific skill that defines the research methodology and adversarial review posture
  • a design lineage of blueprints and archives that explains how the system evolved

It is not a paper-writing bot, and it is not just a prompt pack.
The point is to make AI-assisted scientific work harder to rush, harder to fake, and easier to audit.

If You Only Read Three Things

GoalReadWhy
Try it quicklyQuick StartFastest way to run the plugin in Claude Code
Understand the systemThree-Layer ArchitectureExplains the split between constitution, skill, and plugin
Inspect technical depthARCHITECTURE.mdRuntime details, version history, and subsystem internals

Current Release — v7.0 TRACE

TRACE is the runtime-closure release. v6.0 NEXUS built the dual architecture; v7.0 makes the critical loops actually flow through the plugin.

The repo now also includes TRACE+ADAPT V0, a small deterministic adaptation layer that sits on top of TRACE without changing the scientific truth model.

Operational today in the repo:

  • 7 Claude Code lifecycle hooks
  • SQLite persistence for sessions, claim lifecycle, R2 reviews, serendipity seeds, gate checks, citations, observer alerts, patterns, and benchmarks
  • real DQ4, L-1+, L0, D1, and stop-time enforcement
  • citation extraction plus bounded DOI/PMID/arXiv verification
  • FTS5/BM25 retrieval with fallback tiers
  • SessionStart context recovery, observer alerts, cross-session patterns, and advisory harness hints

Verification status (repo state):

  • 170/170 end-to-end tests passing
  • smoke passing
  • readiness passing

Important scope note:

Not every methodological rule is hard-enforced in the plugin yet. The core value proposition is already real, but some methodology still lives at skill/review level rather than code-level enforcement.

TRACE+ADAPT V0

TRACE+ADAPT V0 is not a new release codename. It is a feature layer inside v7.0 TRACE.

What it adds:

  • deterministic carry-over hints injected at SessionStart
  • hints derived only from already-persisted runtime signals such as recurring gate failures and observer alerts
  • max 3 hints per session, with cooldown logic so stale patterns fade out
  • zero schema changes, zero LLM calls in hooks, zero protocol rewrites

Its boundary is strict:

  • the harness may adapt
  • the truth model may not

In practice, this means Vibe Science can improve how it reminds and steers the workflow, but it does not let the runtime redefine what counts as valid evidence, gate semantics, or scientific truth.

Why This Project Exists

Vibe Science was shaped by repeated real research failure modes, not prompt aesthetics.

Across 21 CRISPR research sprints, the system caught a result that looked extremely publishable:

  • odds ratio 2.30
  • p < 10^-100
  • clean narrative
  • wrong conclusion

After propensity matching, the sign reversed. Without an adversarial structure, that claim would likely have been written up as a finding.

That is the failure mode this project is built against: an AI agent can analyze data, read papers, and produce a convincing conclusion faster than it can verify whether that conclusion is actually robust.

This is also not just a local anecdote. Recent benchmark work points in the same direction: in MLR-Bench, current coding agents frequently produced fabricated or invalidated experimental results in a large fraction of cases. Vibe Science is an attempt to build against that structural problem, not around it.

What You'll See In Your First Session

When Vibe Science is active, your Claude Code session behaves differently. Here's what to expect:

What happensWhenWhat you'll see
Context injectionSession starts~700-850 tokens of prior state, alerts, calibration hints, pending seeds, patterns, and carry-over runtime discipline
Literature gateYou try to set a research directionBlocked if no prior literature search is logged
Confounder barrierYou write a claim to CLAIM-LEDGER.mdBlocked unless the claim carries the required confounder metadata; the full raw -> conditioned -> matched judgment still happens at methodology/review level
Citation verificationYou reference a DOI/PMID/arXivAuto-extracted, verified, persisted. Unresolved sources block L0
Promotion gateYou advance a claimBlocked at D1 if citations are not all VERIFIED
Stop blockingYou try to end the sessionBlocked if claims have not been reviewed, killed, or disputed
Spine loggingEvery tool useAuto-classified and logged to SQLite — no manual journaling needed
Integrity trackingInfrastructure failsSession marked INTEGRITY_DEGRADED; VIBE_SCIENCE_STRICT=1 makes failures loud

The plugin does not replace judgment. It removes some of the most dangerous shortcuts: promoting without verified sources, closing without adversarial review, losing context between sessions, and writing polished prose detached from structured evidence.

Quick Glossary

TermMeaning
OTAEObserve-Think-Act-Evaluate — the research loop that runs each cycle
R2Reviewer 2 — the adversarial reviewer agent whose job is to destroy weak claims
R3 / J0Judge Agent — meta-reviews R2's reviews (scores quality of the review itself)
SFISeeded Fault Injection — known faults injected before R2 reviews to test vigilance
BFPBlind-First Pass — R2 reviews claims without seeing the researcher's justification first
DQ1-DQ4Data Quality gates at extraction, training, calibration, and finding stages
L0 / L-1+Literature gates: L-1+ requires search before direction, L0 verifies citation sources
D1Claim promotion gate — requires all citations VERIFIED before a claim advances
SSOTSingle Source of Truth — numbers in prose must trace back to structured data files
SalvagenteItalian for "life preserver" — when R2 kills a claim, it must produce a serendipity seed
SPINEResearch Spine — structured audit trail of every action logged to SQLite
FTS5Full-Text Search 5 — SQLite's built-in search engine used for memory retrieval

What This Repository Contains

This repository is a layered system for running research-oriented AI work under stronger discipline:

  • Constitution layer: project-level laws and behavioral constraints in CLAUDE.md
  • Methodology layer: the skill in skills/vibe/, which defines how the research process should work
  • Enforcement layer: the plugin in plugin/, which persists state and structurally enforces selected checks inside Claude Code
  • Design lineage: blueprints and archived versions that explain why the system looks the way it does

That distinction matters because many capabilities live first as methodology, then later become plugin enforcement.


Three-Layer Architecture

Vibe Science is not a single file — it's three layers that reinforce each other:

┌─────────────────────────────────────────────────────────────────────────┐
│  LAYER 1: CLAUDE.md (Constitution)                                     │
│  12 immutable laws · role constraints · permission model                │
│  Loaded automatically by Claude Code at session start                   │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 2: SKILL (Methodology Brain)                                    │
│  OTAE-Tree loop · R2 Ensemble (7 modes) · 32 gates · 21 protocols     │
│  Brainstorm engine · Serendipity radar · Evidence engine               │
│  12 constitutional laws · 7 agent roles · 36 reference documents       │
│  → Guides WHAT the agent thinks                                        │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 3: PLUGIN (Enforcement Body)                                    │
│  7 lifecycle hooks · Gate engine · Permission engine                    │
│  SQLite persistence (16 tables + FTS5) · Research spine (auto-log)     │
│  R2 calibration hints · Pattern extraction + harness hints · Silent observer │
│  Context builder (~700-850 tokens) · Narrative engine                  │
│  → Enforces selected checks and persistence                            │
└─────────────────────────────────────────────────────────────────────────┘
LayerPurposeEnforcement level
CLAUDE.mdSets dispositional rules — what agents MUST and MUST NOT doPrompt-level
SkillDefines methodology — OTAE loop, R2 protocol, gates, evidence standardsPrompt/spec/schema-level
PluginPersists state and enforces selected behaviors inside Claude CodeStructural for implemented checks, not for every methodological rule

Why a Plugin, Not Just a Skill?

Versions v3.5 through v5.5 were prompt-only: the agent was told to run quality gates and use Reviewer 2. It worked — sometimes. But four subsystems were bypassable because they relied on voluntary compliance:

SubsystemAs Skill (v5.5)As Plugin (v6.0)
Reviewer 2"Please review this claim"Hook blocks clean session end if claims are still unreviewed
Quality Gates"Check gate DQ4 before proceeding"Exit code 2 = tool action blocked until gate passes
Research Logging"Write to PROGRESS.md"Auto-logged to SQLite after every tool use
Memory Recall"Read STATE.md at session start"Hook injects ~700-850 tokens of context automatically

The plugin wraps part of the skill in code-level enforcement and persistence: 7 lifecycle hooks, a gate engine, a permission engine, and SQLite state across sessions.


Key Features

Adversarial Review (Reviewer 2)

The methodology is built around Reviewer 2, an adversarial reviewer whose job is to destroy claims before they harden into conclusions. The skill defines 7 activation modes: INLINE, FORCED, BATCH, BRAINSTORM, SHADOW, VETO, and REDIRECT. The plugin then enforces some of the operational consequences of that posture, including stop-time checks on unreviewed claims.

The important boundary is this: the runtime can ingest review artifacts and enforce claim lifecycle consequences, but it does not yet semantically verify the full quality of every R2 mode or every review protocol declared in the methodology.

Quality Gates (32 gates, 8 schema-enforced)

The methodology defines 32 gates across literature, decision, tree, brainstorm, and data-quality stages. 8 of them are schema-backed. The plugin currently hard-enforces a valuable subset of these checks and logs gate outcomes to SQLite; the rest remain skill-level methodology and review discipline.

Today the hard-stop core in the plugin is centered on DQ4, L-1+, L0, and D1. Adjacent enforcement rails also exist for stop-time claim resolution, Salvagente, and TEAM permission boundaries.

Confounder Harness (LAW 9)

LAW 9 requires every quantitative claim to pass: raw -> conditioned -> matched. Sign change = ARTIFACT. Collapse >50% = CONFOUNDED. Survives all three = ROBUST. Today this discipline is central in the skill and partially enforced in the plugin through CLAIM-LEDGER constraints and gate logic.

The plugin's hard guarantee today is narrower than the full harness semantics: it enforces metadata and structural discipline on claim-like writes, while the actual ARTIFACT / CONFOUNDED / ROBUST judgment still depends on the methodology and adversarial review flow.

Seeded Fault Injection (SFI)

Seeded Fault Injection is a core review protocol: known faults are introduced before R2 review so vigilance can be tested rather than assumed. Runtime support today is mainly artifact ingestion, calibration, and downstream analysis of populated SFI fields, rather than full automatic enforcement of every SFI step.

Tree Search Over Hypotheses

OTAE-Tree architecture explores multiple hypotheses in parallel with 7 node types, 3 tree modes, and best-first selection. Minimum 3 draft nodes before any is promoted (LAW 8).

Cross-Session Learning and Adaptation

  • Pattern extraction: Gate failure clusters, repeated actions, and claim lifecycle patterns are extracted at session end
  • Instinct model: The full reference model is broader than the current runtime; the live path today is pattern carry-over plus deterministic advisory hints, rather than a fully autonomous instinct lifecycle
  • R2 calibration: Historical weakness tracking is computed at read time from ingested r2_reviews; it is useful, but it is not yet a closed-loop evaluator of review quality
  • Semantic recall: Present in the architecture, but still environment-dependent and not yet the whole story of memory retrieval
  • Harness adaptation (TRACE+ADAPT V0): recurring runtime failures can now surface as short advisory hints at SessionStart, with deterministic activation rules and cooldowns

Serendipity Radar

Every cycle scans for unexpected findings. Score >= 10 → QUEUE for triage. Score >= 15 → INTERRUPT current work. Killed claims produce serendipity seeds (Salvagente rule).


Plugin Subsystems (~7,800 LOC)

7 Lifecycle Hooks

HookTriggerWhat It Does
SessionStartSession opensAuto-setup, DB init, injects ~700-850 tokens (state, alerts, R2 calibration, patterns, seeds, harness hints)
UserPromptSubmitBefore each promptAgent role detection, prompt logging, semantic recall via vector search
PreToolUseBefore Write/Edit toolLAW 9: blocks CLAIM-LEDGER modifications without confounder_status
PostToolUseAfter every toolGate enforcement, permission checks, auto-logging to research spine, observer alerts
StopSession endingNarrative summary, blocks stop if unreviewed claims exist (LAW 4), STATE.md export
PreCompactBefore context compactionSnapshots current state to DB for post-compaction recovery (LAW 7)
SubagentStopSubagent finishesSalvagente Rule: killed claims must produce serendipity seed

SQLite Persistence (16 tables + retrieval virtual tables)

meta, sessions, spine_entries, claim_events, r2_reviews, serendipity_seeds, gate_checks, literature_searches, citation_checks, observer_alerts, calibration_log, prompt_log, memory_embeddings, embed_queue, research_patterns, benchmark_runs — plus memory_fts for TRACE retrieval and optional vec_memories when sqlite-vec is available.

Other Engines

  • Gate Engine: Enforces DQ4, L-1+, L0, D1, and claim-gate aggregation at PostToolUse. Exit code 2 = BLOCK. (DQ1-DQ3, DD0, and DC0 remain skill-level methodological checks until runtime producers exist.)
  • Permission Engine: TEAM mode with role-based access control (researcher, reviewer2, judge, serendipity, lead, experimenter).
  • Context Builder: Progressive disclosure with semantic recall (~700-850 tokens per session start, depending on carry-over).
  • Narrative Engine: Template-based session summaries (deterministic, no LLM).
  • R2 Calibration: Weakness tracking, SFI catch rates, J0 trends across sessions with temporal decay.
  • Pattern Extractor: Cross-session pattern detection with confidence scoring and auto-archiving.
  • Silent Observer: Periodic health checks (stale STATE.md, FINDINGS/JSON desync, orphaned data, design drift).

v6.0 Skill (36 Reference Documents)

The v6.0 skill lives in skills/vibe/ and contains the full methodology:

ComponentCountExamples
References36constitution, loop-otae, reviewer2-ensemble, hook-system, pattern-extraction, instinct-model, context-resilience, handoff-protocol, r2-calibration...
Python scripts6dq_gate.py, gate_check.py, spine_entry.py, sync_check.py, tree_health.py, observer.py
JSON schemas12brainstorm-quality, claim-promotion, review-completeness, serendipity-seed, data-quality-gate, finding-validation, spine-entry...
Asset files7fault-taxonomy.yaml, judge-rubric.yaml, templates.md, stage-prompts.md, metric-parser.md, node-schema.md, domain-config-example.yaml
Agent roles7researcher, r2-deep, r2-inline, observer, explorer, r3-judge, instinct-scanner

6 New References (v6.0 additions)

ReferenceWhat It Documents
hook-system.mdAll 7 hooks: triggers, I/O format, LAW enforcement mapping
pattern-extraction.mdCross-session patterns: gate failure clusters, repeated actions, claim lifecycles
r2-calibration.mdTemporal decay formula, weakness tracking, SFI catch rates, J0 trends
handoff-protocol.mdAgent-to-agent handoff: Context, Findings, Files Modified, Open Questions, Recommendations
instinct-model.mdLearned behaviors with confidence (0.3-0.9), auto-promotion, decay, scope (project/global)
context-resilience.mdLAW 7 implementation: PreCompact snapshots, STATE.md, DB recovery, progressive context building

Multi-Agent Architecture (7 Roles)

RoleModelDispositionKey Constraint
Researcherclaude-opus-4-6BUILDCannot declare "done" — only R2 can clear
R2-Deepclaude-opus-4-6DESTROYAssumes every claim is wrong. No congratulations.
R2-Inlineclaude-sonnet-4-6SKEPTIC7-point checklist on every finding
Observerclaude-haiku-4-5DETECTRead-only project health scanner
Explorerclaude-sonnet-4-6EXPLOREBranch artifacts only, no main claim ledger
R3-Judgeclaude-opus-4-6META-REVIEWReviews R2's reviews, not claims directly
Instinct Scannerclaude-haiku-4-5PATTERN-DETECTSession-end pattern extraction and instinct promotion

Separation of powers: R2 produces verdicts, the orchestrator writes to the claim ledger. R2 never writes to the claim ledger directly. R3 never modifies R2's report.


Requirements

RequirementVersionWhyCheck
Node.js>= 18.0.0Runtime for hooks and plugin scriptsnode --version
Claude Code>= 1.0.33Plugin hostclaude --version
GitanyClone the repogit --version
C++ Build ToolsRequired by better-sqlite3 (native SQLite binding)See below

C++ Build Tools by platform:

  • Windows: Visual Studio Build Tools with "Desktop development with C++" workload, or npm install -g windows-build-tools
  • macOS: xcode-select --install
  • Linux: sudo apt install build-essential (Debian/Ubuntu) or equivalent

Optional:

  • Python 3.8+ — for deterministic helper scripts used in audits and methodology support (stdlib only, no pip dependencies)

Quick Start

# 1. Clone and install
git clone https://github.com/th3vib3coder/vibe-science.git
cd vibe-science
npm install

# 2. Launch Claude Code with the plugin
claude --plugin-dir .

# 3. Start a research session
/vibe

On first startup, the SessionStart hook auto-creates ~/.vibe-science/, initializes the SQLite database (16 regular tables plus retrieval virtual tables), and injects structured research context (~700-850 tokens, depending on carry-over hints).


Installation Methods

Marketplace (Recommended)

/plugin marketplace add th3vib3coder/vibe-science
/plugin install vibe-science@vibe-science
# Restart Claude Code

--plugin-dir (Quick Test)

claude --plugin-dir /path/to/vibe-science

Global Settings (Permanent)

Add to ~/.claude/settings.json (or %USERPROFILE%\.claude\settings.json on Windows):

{
  "plugins": ["/absolute/path/to/vibe-science"]
}

Project-Level

Add to your project's .claude/settings.json to load only in that project.


What Does It Do

When active, every Claude Code session becomes a structured research session:

Research question → Brainstorm (Phase 0, 10-step ideation)
                         ↓
                    OTAE-Tree Loop (repeats):
                      OBSERVE  → Read current state + hook context injection
                      THINK    → Plan next action + check instincts
                      ACT      → Execute ONE action (auto-logged to spine)
                      EVALUATE → Extract claims, score confidence, check patterns
                         ↓
                    Reviewer 2 (adversarial review, 7 modes)
                         ↓
                    Only surviving claims advance
                         ↓
                    Stop hook: narrative summary + pattern extraction

What you'll notice:

  • Claims are expected to carry explicit IDs, evidence, status, and confidence
  • Reviewer 2 assumes every claim is wrong and demands evidence
  • Selected gates block or warn at runtime; the full 32-gate system lives in the methodology
  • Session state is persisted and can be exported back into durable artifacts such as STATE.md
  • Serendipity is tracked — unexpected findings are scored and preserved
  • SQLite keeps the audit trail across sessions
  • Pattern extraction and resumability are part of the real working core

Repository Structure

vibe-science/
├── .claude-plugin/              ← Plugin manifests
│   ├── plugin.json              ← Plugin metadata (v7.0.0)
│   └── marketplace.json         ← Marketplace config
│
├── skills/vibe/                 ← v7.0 TRACE skill/runtime methodology
│   ├── SKILL.md                 ← Full methodology (528 lines)
│   ├── AGENTS.md                ← 7 agent roles with YAML frontmatter
│   ├── references/              ← 36 reference documents
│   ├── scripts/                 ← 6 Python helper scripts for deterministic checks
│   ├── assets/
│   │   └── schemas/             ← 12 JSON validation schemas
│   └── agents/
│       └── claude-code.yaml     ← Model tier config
│
├── plugin/                      ← Enforcement engine (~7,800 LOC)
│   ├── scripts/                 ← 7 hook scripts + 2 utilities
│   │   ├── session-start.js     ← Context injection + auto-setup
│   │   ├── prompt-submit.js     ← Role detection + semantic recall
│   │   ├── post-tool-use.js     ← Gate enforcement + auto-logging
│   │   ├── pre-tool-use.js      ← CLAIM-LEDGER write guard (v6.0.1)
│   │   ├── stop.js              ← Narrative summary + stop blocking
│   │   ├── pre-compact.js       ← Context resilience snapshots
│   │   ├── subagent-stop.js     ← Salvagente Rule enforcement (v6.0.1)
│   │   ├── setup.js             ← DB initialization (utility)
│   │   └── worker-embed.js      ← Background embedding daemon (utility)
│   ├── lib/                     ← selected engine modules (18 total)
│   │   ├── db.js                ← SQLite operations
│   │   ├── gate-engine.js       ← Runtime gate helpers (DQ4, L0, D1, L-1+, claim gate checks)
│   │   ├── permission-engine.js ← Role-based access control
│   │   ├── context-builder.js   ← Progressive context disclosure
│   │   ├── harness-hints.js     ← Advisory carry-over hints at SessionStart
│   │   ├── narrative-engine.js  ← Template-based summaries
│   │   ├── r2-calibration.js    ← Temporal decay calibration
│   │   ├── pattern-extractor.js ← Cross-session pattern detection
│   │   ├── vec-search.js        ← Vector similarity search
│   │   └── ...                  ← Ingestion, parsing, benchmarking, and support modules
│   └── db/
│       ├── schema.sql           ← 16 regular tables + retrieval virtual tables
│       └── domain-config-template.json
│
├── commands/                    ← Slash commands (auto-discovered)
│   ├── start.md                 ← /start — conversational entry
│   ├── init.md                  ← /init — initialize RQ workspace
│   ├── loop.md                  ← /loop — run OTAE cycle
│   ├── search.md                ← /search — literature search
│   └── reviewer2.md             ← /reviewer2 — trigger R2 review
│
├── agents/reviewer2.md          ← R2 subagent definition
├── hooks/hooks.json             ← 7 hook definitions (plugin mode)
├── .claude/settings.json        ← Hook definitions (dev mode)
│
├── CLAUDE.md                    ← Project constitution (12 laws)
├── SKILL.md                     ← Legacy v5.5 methodology (1,368 lines)
├── ARCHITECTURE.md              ← Deep technical architecture
├── CHANGELOG.md                 ← Full version history
│
├── protocols/                   ← 21 methodology protocols
├── gates/gates.md               ← Gate specification
├── schemas/                     ← 12 JSON schemas
├── assets/                      ← Fault taxonomy, rubrics, templates
├── logos/                       ← SVG logos (v3.5 → v6.0)
│
└── archive/                     ← Historical versions + blueprints
    ├── v6.0-NEXUS-BLUEPRINT.md
    ├── v6.0.1-BEST-PRACTICES-BLUEPRINT.md
    ├── v5.5-ORO-BLUEPRINT.md
    ├── v5.0-IUDEX-BLUEPRINT.md
    ├── PHOTONICS-BLUEPRINT.md
    ├── vibe-science-v6.0-claude-code/  ← Archive copy of v6.0 skill
    ├── vibe-science-v5.5/
    ├── vibe-science-v5.0/
    ├── vibe-science-v5.0-codex/
    ├── vibe-science-v4.5/
    ├── vibe-science-v4.0/
    ├── vibe-science-v3.5/
    ├── vibe-science-photonics/
    └── vibe-science-legacy-pre-v5.0/

Using Without Claude Code

You can use the methodology with any LLM: upload skills/vibe/SKILL.md as a system prompt, plus the references/ directory. Note: without the plugin, gates are prompt-enforced only (voluntary compliance). The v5.5 SKILL.md at root (1,368 lines) is the legacy standalone version.


Version History

VersionCodenameDateKey InnovationBlueprint
v1.02025-01Core 6-phase loop, single R2 prompt, state files
v2.0NULLIS SECUNDUS2026-02-06R2 Ensemble (4 specialists), quantitative confidence (0-1), 12 gates
v3.0TERTIUM DATUR2026-02-07OTAE loop, serendipity engine, knowledge base, MCP integration
v3.5TERTIUM DATUR2026-02-07R2 double-pass, 3-level attack (Logic/Stats/Data), typed claims
v4.0ARBOR VITAE2026-02-12Tree search, 7 node types, 5-stage experiment manager, 23 gates, 10 laws
v4.5ARBOR VITAE (Pruned)2026-02-14Phase 0 brainstorm, R2 6 modes, -381 lines via progressive disclosure
v5.0IUDEX2026-02-16SFI, blind-first pass, R3 judge, schema-validated gates, circuit breakerIUDEX
v5.5ORO2026-02-19DQ1-DQ4 gates, DD0, DC0, R2 INLINE, SSOT rule (post-mortem driven)ORO
v6.0NEXUS2026-02-20Plugin architecture, 7 hooks, SQLite, cross-session learning, 7 agent rolesNEXUS
v7.0TRACE2026-03-24Lifecycle closure, citation gates, strict integrity, benchmark A/B compare, FTS5 retrievalTRACE

v7.0 TRACE — What Changed

The jump from v6.0 to v7.0 is operational, not cosmetic. NEXUS built the body; TRACE makes the critical subsystems circulate through the runtime and the database.

New in v7.0:

  • claim lifecycle is populated for real in claim_events
  • serendipity seeds and R2 review artifacts are ingested automatically
  • DOI / PMID / arXiv citations are extracted, persisted, verified, and gated (L0 / D1)
  • benchmark runner writes artifacts and records to benchmark_runs, with readiness A/B comparison
  • strict mode persists integrity degradation instead of failing open silently
  • retrieval no longer depends on dead semantic memory alone: memory_fts provides live BM25-ranked recall
  • schema migrations now carry the runtime from v6.x to TRACE (schema_version = 4)

Codex → Claude Code migration: v5.0 had a Codex-specific variant (archive/vibe-science-v5.0-codex/). v6.0 is Claude Code native — no Codex variant needed.

Detailed Changelog

For the complete changelog with every feature, fix, and breaking change across all versions, see CHANGELOG.md.


Troubleshooting

ProblemCauseFix
Plugin not foundWrong pathVerify .claude-plugin/plugin.json exists at the path
npm install fails (Windows)better-sqlite3 needs C++Install VS Build Tools with C++ workload
npm install fails (macOS)Missing Xcode toolsxcode-select --install
Hooks don't fireNot loaded as pluginUse --plugin-dir, marketplace, or settings.json
SQLite errorsDB corruptionDelete ~/.vibe-science/db/ and restart
Embedding worker failsMissing ONNX runtimeNon-critical — falls back to keyword search
"34 gates" in old docsPre-debug artifactCorrect count is 32 gates (8 schema-enforced). Fixed in v6.0.0.

Archive & Blueprints

Every major version has a blueprint documenting its design rationale, innovations, evidence base, and lineage:

BlueprintContent
v6.0-NEXUS-BLUEPRINT.md9 innovations, hook architecture, cross-session learning, lineage from v5.5
v6.0.1-BEST-PRACTICES-BLUEPRINT.mdBest practices upgrade from community analysis, Claude Code spec cross-reference
v5.5-ORO-BLUEPRINT.mdPost-mortem analysis, 12 mistakes → 7 new gates, evidence-driven development
v5.0-IUDEX-BLUEPRINT.mdVerification architecture, SFI, BFP, R3 judge, schema-validated gates
PHOTONICS-BLUEPRINT.mdDomain fork for photonics research

Each historical version is preserved intact in archive/vibe-science-v{X.Y}/ with its original SKILL.md, README, and CLAUDE.md.


Citation

Vibe Science Contributors (2026). Vibe Science: an AI-native research engine with adversarial review and serendipity tracking. GitHub: th3vib3coder/vibe-science · DOI: 10.5281/zenodo.18665031

@software{vibe_science_2026,
  title     = {Vibe Science: AI-native research with adversarial review and serendipity tracking},
  author    = {{Vibe Science Contributors}},
  year      = {2026},
  version   = {7.0.0},
  url       = {https://github.com/th3vib3coder/vibe-science},
  doi       = {10.5281/zenodo.18665031},
  license   = {Apache-2.0}
}

License

Apache 2.0 — see LICENSE.

Authors

Carmine Russo, Elisa Bertelli (MD)


Built with Claude Code · Powered by Claude Opus · Made with adversarial love