Eureka
Eureka enforces rigorous research workflows for coding agents, ensuring scientific integrity and reproducibility.
Eureka is a complete research rigor workflow for your coding agents, built on top of a set of composable "skills" and some initial instructions that make sure your agent uses them.
It is the research-workflow counterpart to Superpowers — same plugin architecture, same skill system, same "rigid discipline through mandatory skills" philosophy. Where Superpowers enforces test-driven development and code review for software, Eureka enforces hypothesis pre-registration, statistical rigor, claims auditing, and reproducibility for scientific work.
How it works
It starts from the moment you open a research conversation with your agent. As soon as it sees that you're about to run an experiment or make a scientific claim, it doesn't just jump into analysis. Instead, it steps back and asks you what you're really trying to test.
Once it has teased a research question out of the conversation, it walks you through nine mandatory questions (null hypothesis, falsifiability criterion, primary outcome, confounds and data leakage, statistical power, alternative explanations, prior work, contradictory evidence, and data provenance) and refuses to proceed until each one has a clear answer. One question at a time. Multiple choice when possible.
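The statistical-power question usually comes down to a sample-size calculation. The sketch below is an illustration, not part of Eureka. It uses the standard normal approximation for a two-sided, two-sample comparison of means with a standardized effect size (Cohen's d). The function name `n_per_group` is hypothetical, and an exact t-test calculation gives a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up.
    Exact t-based calculations give slightly larger values.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# A medium effect (d = 0.5) at alpha = 0.05 and 80% power:
print(n_per_group(0.5))  # 63 per group under the normal approximation
```

Answering the power question before data collection is what makes a later null result interpretable: an underpowered study that finds nothing says very little.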
After you've approved the design, the agent registers the hypothesis to version control before touching any data. Statistical test, significance threshold, data version hash, preprocessing pipeline version — all committed as a pre-registration record. Deviations from the pre-registered plan are allowed, but they must be explicitly labeled as exploratory.
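The data version hash is what ties a registration to one exact dataset. A minimal sketch, assuming a single data file and hypothetical field names (Eureka's own registration format may differ): hash the file, refuse to build the record if any pre-registered field is missing, and return a record you can commit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a data file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = {"h1", "h0", "statistical_test", "alpha", "preprocessing_version"}


def build_registration(data_file: Path, **plan) -> dict:
    """Assemble a pre-registration record, refusing to proceed if a field is missing."""
    missing = REQUIRED_FIELDS - set(plan)
    if missing:
        raise ValueError(f"registration incomplete, missing: {sorted(missing)}")
    return {**plan, "data_file": str(data_file), "data_sha256": sha256_of(data_file)}
```

The resulting dict can be serialized (for example with `json.dumps`) into the registration file and committed before any analysis runs. If the data file later changes, its hash no longer matches the committed one.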
When experiments fail or produce unexpected results, the agent runs a systematic troubleshooting process — reading logs, reproducing the issue, checking for data leakage, tracing the pipeline stage by stage — before proposing any explanation. No post-hoc rationalization.
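One of the cheapest leakage checks in that process is whether the same unit (subject, patient, document) appears in both the training and evaluation splits. A minimal sketch, assuming each split is a list of string identifiers; the function name is hypothetical and not part of Eureka.

```python
from collections.abc import Iterable


def leaked_ids(train_ids: Iterable[str], test_ids: Iterable[str]) -> set[str]:
    """Return identifiers present in both splits, a common source of leakage."""
    return set(train_ids) & set(test_ids)


train = ["subj-01", "subj-02", "subj-03"]
test = ["subj-03", "subj-04"]
overlap = leaked_ids(train, test)
if overlap:
    print(f"Leakage: {sorted(overlap)} appear in both splits")
```

A non-empty overlap is a concrete, checkable explanation for suspiciously good results, which is the kind of evidence the troubleshooting process asks for before any narrative explanation.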
Before you submit anything, the agent runs a multi-dimensional research review (7 independent dimensions scored out of 100), a full claims audit (every number in the manuscript traced to a source file, every figure verified as script-generated, all experiments — including negative ones — accounted for), and a fresh reproducibility check. If any gate fails, submission is blocked.
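The review gate is a conjunction, not an average: one weak dimension blocks submission even when the mean score looks healthy. A sketch of that logic, assuming a hypothetical per-dimension threshold of 70 (the real threshold is applied by the research-reviewer agent, not by this code). The dimension names are the seven listed for requesting-research-review.

```python
DIMENSIONS = (
    "Scientific Foundation",
    "Methodological Rigor",
    "Experimental Execution",
    "Results Quality",
    "Novelty",
    "Reproducibility",
    "Domain Standards",
)


def review_passes(scores: dict[str, int], threshold: int = 70) -> tuple[bool, list[str]]:
    """PASS only if every dimension is scored and meets the (hypothetical) threshold.

    Returns the verdict and the dimensions that failed or were not scored.
    """
    failing = [d for d in DIMENSIONS if scores.get(d, 0) < threshold]
    return (not failing, failing)


def submission_allowed(review_ok: bool, claims_audit_ok: bool, reproducible: bool) -> bool:
    """Any single failed gate blocks submission."""
    return review_ok and claims_audit_ok and reproducible
```

For example, six dimensions at 85 and Reproducibility at 60 average about 81, yet `review_passes` still returns `(False, ["Reproducibility"])`.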
There's a bunch more to it, but that's the core of the system. And because the skills trigger automatically, you don't need to do anything special. Your research agent just has Eureka.
Credit
Eureka is modeled directly on Superpowers by Jesse Vincent. The plugin architecture, session-start hook mechanism, rigid-vs-flexible skill distinction, rationalization tables, red-flag checklists, iron laws, and subagent review pattern are all borrowed from Superpowers. If you do software engineering as well as research, install both — they share a namespace convention and were designed to coexist.
Installation
Note: Installation differs by platform. Claude Code and Cursor have built-in plugin marketplaces. Codex and OpenCode require manual setup. Gemini CLI uses its own extension system.
Claude Code (via Plugin Marketplace)
In Claude Code, register this repository as a marketplace, then install:
```
/plugin marketplace add jeonnoin-alt/Eureka
/plugin install Eureka
```
The plugin name is case-sensitive and capitalized (Eureka) so that slash commands appear with a Eureka: namespace prefix (e.g. /Eureka:research-brainstorming).
Coming from v1.1.0 or v1.1.1? First clean up the old install:

```
/plugin marketplace remove eureka
/plugin marketplace add jeonnoin-alt/Eureka
/plugin install Eureka
```

v1.1.1 had a Windows install bug (EPERM rename) caused by a case collision between the marketplace name and the plugin name. v1.1.2 fixes this by renaming the marketplace to eureka-marketplace (the plugin itself stays Eureka).
Claude Code (Manual Installation)
The marketplace install above is the recommended path. If you need to install manually (air-gapped environment, modified fork, etc.), clone the repo and point Claude Code at it:
```bash
# Pick the current latest release tag
VERSION=$(git ls-remote --tags --refs https://github.com/jeonnoin-alt/Eureka | awk -F/ '{print $NF}' | sort -V | tail -1 | sed 's/^v//')
# Clone into the plugin cache path expected by Claude Code
git clone --branch "v${VERSION}" https://github.com/jeonnoin-alt/Eureka \
  ~/.claude/plugins/cache/eureka-marketplace/Eureka/${VERSION}
```
Then restart Claude Code. The SessionStart hook will automatically inject the using-eureka bootstrap skill at the start of every new session.
Note: The cache path uses eureka-marketplace/Eureka/<version>/ (marketplace name and plugin name are different strings by design — avoids a Windows case-insensitive filesystem collision, see v1.1.2 CHANGELOG). Do NOT hardcode 1.0.0 as the version — that path is for historical reference only.
Cursor
In Cursor Agent chat, install from the plugin marketplace:
```
/add-plugin eureka
```
Cursor uses .cursor-plugin/plugin.json to discover the skills, agents, and session-start hooks. The bootstrap injection mechanism is the same as Claude Code's.
Gemini CLI
```bash
gemini extensions install https://github.com/jeonnoin-alt/Eureka
```
Gemini CLI loads Eureka via gemini-extension.json and the GEMINI.md context file, which @-imports the using-eureka bootstrap skill at session start. Tool name mappings (e.g. Skill → activate_skill) are documented in skills/using-eureka/references/gemini-tools.md.
To update:
```bash
gemini extensions update eureka
```
Codex
Tell Codex:
Fetch and follow instructions from https://raw.githubusercontent.com/jeonnoin-alt/Eureka/main/.codex/INSTALL.md
Or install manually:
```bash
git clone https://github.com/jeonnoin-alt/Eureka ~/.codex/eureka
mkdir -p ~/.agents/skills
ln -s ~/.codex/eureka/skills ~/.agents/skills/eureka
```
Then restart Codex. To enable the research-reviewer subagent, add multi_agent = true under [features] in ~/.codex/config.toml. Full instructions: .codex/INSTALL.md.
OpenCode
Add Eureka to the plugin array in your opencode.json:
```json
{
  "plugin": ["eureka@git+https://github.com/jeonnoin-alt/Eureka.git"]
}
```
Restart OpenCode. The .opencode/plugins/eureka.js plugin auto-registers the skills directory and injects the bootstrap context via a system prompt transform. Full instructions: .opencode/INSTALL.md.
Verify Installation
Start a new Claude Code session and ask for something that should trigger a skill:
- "I want to run an experiment comparing Model A and Model B" → should invoke research-brainstorming
- "My baseline results look wrong" → should invoke systematic-troubleshooting
- "I finished the analysis, ready to write up" → should invoke requesting-research-review
The agent should automatically announce which skill it is using.
Quick Start
You just installed Eureka and want to try it on a real project. The shortest path:
1. Start with whats-next
You don't need to memorize the skill list. Just open a new session and say:
"I just installed Eureka. I'm working on [your project]. What should I do next?"
The whats-next skill scans your project state (git log, results/, existing docs), asks 2–3 diagnostic questions, identifies which research phase you're currently in, and recommends one specific next skill to invoke. It then hands off to that skill.
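The real whats-next skill combines git history, results/, existing docs, and your answers to its diagnostic questions. As a rough illustration of the artifact-scanning half only, the latest dated artifact under docs/eureka/ already narrows down the phase. The directory names come from this README's consumer-path table; the phase-to-skill mapping and all function names are assumptions, not Eureka's logic.

```python
from pathlib import Path

# Hypothetical mapping, ordered from the latest workflow stage to the earliest:
# (artifact directory under docs/eureka/, skill that plausibly comes next).
NEXT_AFTER = [
    ("verifications", "submission-readiness"),
    ("audits", "verification-before-publication"),
    ("reviews", "manuscript-writing"),
    ("plans", "requesting-research-review (once the planned experiments have run)"),
    ("registrations", "experiment-design"),
    ("designs", "hypothesis-first"),
]


def has_dated_artifact(directory: Path) -> bool:
    """True if the directory holds at least one YYYY-MM-DD-prefixed markdown file."""
    return directory.is_dir() and any(p.name[:4].isdigit() for p in directory.glob("*.md"))


def suggest_next_skill(project_root: Path) -> str:
    base = project_root / "docs" / "eureka"
    for subdir, next_skill in NEXT_AFTER:
        if has_dated_artifact(base / subdir):
            return next_skill
    return "research-brainstorming"  # no Eureka artifacts yet
```

Matching only dated filenames keeps a seeded INDEX.md in registrations/ from being mistaken for an actual registration.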
2. End the session with research-journal
When you stop for the day, say:
"Log this session before I stop."
The research-journal skill drafts a structured entry capturing the decisions you made, what failed, what's blocking you, and — most importantly — what your next session should start with. It saves to docs/eureka/journal/YYYY-MM-DD.md.
When you return days or weeks later, the next whats-next invocation reads that entry and picks up exactly where you left off. This is how Eureka stops you from losing context across gaps.
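To make the journal mechanics concrete, here is a minimal sketch of appending one entry to the day's file. The section headings are assumptions that mirror the four items above; the actual layout comes from docs/templates/research-journal-entry.md, and the function name is hypothetical.

```python
from datetime import date
from pathlib import Path


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def append_journal_entry(root: Path, day: date, decisions: list[str], failures: list[str],
                         blockers: list[str], next_step: str) -> Path:
    """Append one session entry to docs/eureka/journal/YYYY-MM-DD.md (hypothetical headings)."""
    path = root / "docs" / "eureka" / "journal" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = (
        "## Session\n\n"
        f"### Decisions\n{_bullets(decisions)}\n\n"
        f"### What failed\n{_bullets(failures)}\n\n"
        f"### Blockers\n{_bullets(blockers)}\n\n"
        f"### Start next session with\n{next_step}\n\n"
    )
    # Append rather than overwrite: a single day can hold several sessions.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return path
```

Opening the file in append mode is the important design choice: logging a second session on the same day adds to the record instead of erasing the first.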
3. Let everything else trigger automatically
You don't need to learn the remaining skills up front. They announce themselves and walk you through their own checklists when the moment comes. Keep doing your research; the discipline layer activates on its own.
Eureka does not require restructuring your existing project
It's strictly additive. New artifacts (design docs, hypothesis registrations, experiment plans, journal entries, audit reports) land under docs/eureka/... as you use the skills. Your existing notes/, experiments/, results/, and manuscript/ directories stay exactly where they are — Eureka reads them as context but never modifies them.
The Research Workflow
- research-ideation (optional) — Activates when the researcher has no specific question yet, only keywords, a dataset, or a vague interest. Generates 3-5 concrete research ideas with metadata (difficulty, data needs, duration, methodology), recommends one, and suggests handing off to research-brainstorming. Skip this step if you already have a question.
- research-brainstorming — Activates when a research question is detected. Explores the idea through nine mandatory questions (H0, falsifiability, primary outcome, confounds + data leakage, power, alternative explanation, prior work, contradictory evidence, data provenance). Presents the design in sections for validation. Saves a research design document.
- hypothesis-first — Activates after design approval. The scientific equivalent of test-driven development. Forces you to commit H1, H0, the exact statistical test, the significance threshold, the data version hash, and the preprocessing version to version control before any data is analyzed. No analysis without a registered hypothesis.
- experiment-design — Activates once a hypothesis is registered. Breaks the design into bite-sized experiment tasks with exact data paths, version hashes, configs, seeds, and commands. Every task produces committable output.
- systematic-troubleshooting — Activates when an experiment fails or produces unexpected results. A four-phase process (Investigate → Pattern Analysis → Hypothesis → Resolution) that blocks you from re-running with a different seed until the root cause is identified.
- requesting-research-review — Activates when an experiment phase is complete. Dispatches a research-reviewer subagent that scores the work across seven independent dimensions (Scientific Foundation, Methodological Rigor, Experimental Execution, Results Quality, Novelty, Reproducibility, Domain Standards). All seven must meet the threshold to PASS.
- manuscript-writing — Activates when writing the manuscript. Guides section-by-section writing with prerequisite gates (no Results before results exist, Abstract written last), citation discipline (every claim cited), number traceability (every value traced to a source file), and per-section subagent review via section-reviewer. Format-agnostic (LaTeX, Markdown, or other).
- claims-audit — Activates after all sections are written. Traces every quantitative claim to a source file, verifies every figure is script-generated, and confirms that all experiments (including negative ones) are reported. Unreported null results are flagged as publication bias.
- verification-before-publication — Activates before submission. Fresh verification of every claim, regeneration of every figure from scripts, an end-to-end reproducibility check (raw → processed → analysis → results with a single command), and confirmation that the research-reviewer score meets the pre-submission threshold.
- submission-readiness — Activates after verification passes. Presents four structured options: submit to the target journal, post a preprint first and then submit, continue refining, or pivot. Forces explicit documentation of the decision.
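The "exact data paths, version hashes, configs, seeds, and commands" that experiment-design pins down can be modeled as an immutable record that refuses to exist in an underspecified state. A sketch under assumptions: the class, its field names, and the checkbox rendering are hypothetical, not Eureka's plan format.

```python
import re
from dataclasses import dataclass

_SHA256 = re.compile(r"[0-9a-f]{64}")


@dataclass(frozen=True)
class ExperimentTask:
    """One bite-sized, fully pinned experiment task (field names are hypothetical)."""

    name: str
    data_path: str
    data_sha256: str  # should match the hash recorded in the registration
    config: str
    seed: int
    command: str
    output: str  # the committable artifact this task must produce

    def __post_init__(self) -> None:
        if not _SHA256.fullmatch(self.data_sha256):
            raise ValueError("data_sha256 must be a 64-character lowercase hex digest")
        if self.seed < 0:
            raise ValueError("seed must be non-negative")

    def as_checkbox(self) -> str:
        """Render the task as a checkbox line for a plan document."""
        return (f"- [ ] {self.name}: `{self.command}` "
                f"(config {self.config}, seed {self.seed}, "
                f"data {self.data_path} @ {self.data_sha256[:12]}) -> {self.output}")
```

Freezing the dataclass means a task's seed or data hash cannot be quietly edited after the plan is written; changing them requires creating a new task.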
The agent checks for relevant skills before any task. Mandatory workflows, not suggestions.
Two orthogonal skills run alongside the linear workflow above:
- whats-next — Triage / dispatcher. Runs when you're stuck or disoriented. Scans project state, diagnoses which phase you're in, and routes to the right specialist skill from the list above.
- research-journal — Narrative writer. Runs at session end, or after significant decisions and failures. Appends structured entries to docs/eureka/journal/YYYY-MM-DD.md so the next session has context instead of starting cold.
Why hypothesis-first comes before experiment-design
If you're coming from Superpowers, the skill ordering may look unfamiliar. Superpowers has a 3-step flow:

```text
brainstorming  →  writing-plans      →  executing-plans (TDD per task)
(design)          (bite-sized tasks)    (test → code → verify)
```

Eureka has a 4-step flow with an extra step inserted between design and planning:

```text
research-brainstorming  →  hypothesis-first    →  experiment-design  →  run experiments
(design)                   (pre-registration)     (bite-sized tasks)    (your work)
```
The new step is hypothesis-first, and it is not the equivalent of writing-plans. It is the equivalent of "write the failing test first" from Superpowers' test-driven-development skill. The scientific TDD cycle:
| TDD (software) | Scientific analog (hypothesis-first) |
|---|---|
| RED — write failing test | REGISTER — commit H1, H0, statistical test, threshold, data version (before seeing data) |
| GREEN — minimal code to pass | EXECUTE — run the pre-specified analysis exactly |
| REFACTOR — clean up | INTERPRET — compare to prediction, report all results including nulls |
The registration produced by hypothesis-first is analogous to a failing test — a committed prediction you make before you can possibly know the answer. That is what prevents HARKing and p-hacking.
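In code terms, the registration plays the role of the test and the analysis plays the role of the code under test. A minimal sketch, assuming a simple p-value decision rule; the class and function names are hypothetical, not Eureka's registration format. The threshold is fixed in an immutable object before any result exists, and the INTERPRET step can only compare against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after "registration"
class Registration:
    """Minimal stand-in for a committed registration (hypothetical fields)."""

    h1: str
    h0: str
    statistical_test: str
    alpha: float


def interpret(reg: Registration, p_value: float) -> str:
    """INTERPRET step: compare the observed p-value with the pre-registered alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be in [0, 1]")
    if p_value < reg.alpha:
        return f"Reject H0 ({reg.statistical_test}, p = {p_value:.3g} < {reg.alpha}); supports H1"
    return (f"Fail to reject H0 ({reg.statistical_test}, p = {p_value:.3g} >= {reg.alpha}); "
            "report as a null result")
```

Both branches return a reportable outcome, mirroring the rule that nulls are reported rather than dropped.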
experiment-design is Eureka's actual equivalent of Superpowers' writing-plans. It comes after hypothesis-first and produces the bite-sized, checkbox-tracked, version-hashed task list you run the experiments from.
Why register before planning (not planning before registering)? If you plan first, the temptation is to shape the hypothesis to fit convenient experiments — a subtle form of HARKing. Registering first locks the hypothesis, then the plan is forced to operationalize it honestly. The order is scientifically deliberate, not an accident of organization.
So if you have just finished research-brainstorming and are being transitioned to hypothesis-first, expect a registration document, not a task list. The task list comes in the next step via experiment-design.
Which skill do I need?
New to Eureka? Two skills have "research" in the name — here's how to tell them apart:
| I have... | Use this |
|---|---|
| A dataset but no idea what to do with it | research-ideation |
| A vague interest area ("something with EEG") | research-ideation |
| A specific question ("Does X cause Y?") | research-brainstorming |
| A hypothesis I want to test | research-brainstorming |
research-ideation generates many ideas. research-brainstorming takes one idea and makes it rigorous. Ideation is divergent; brainstorming is convergent. You can always start with ideation and let it hand you off to brainstorming when you're ready.
What's Inside
Skills Library
Triage
- whats-next — Scan project state, diagnose which phase you're in, route to the right specialist skill
Ideation
- research-ideation — Generate research ideas from keywords, datasets, and papers with difficulty/duration/methodology metadata
Design & Registration
- research-brainstorming — Nine-question Socratic design refinement
- hypothesis-first — Pre-register H1, H0, analysis plan, and data version before analysis
- experiment-design — Break approved designs into executable, versioned tasks
Execution & Debugging
- systematic-troubleshooting — Four-phase root cause investigation for failed experiments
Review
- requesting-research-review — Dispatch a seven-dimension scientific rigor review
- receiving-research-review — Respond to review feedback with technical rigor, not performative agreement
Writing
- manuscript-writing — Section-by-section writing with prerequisite gates, citation discipline, number traceability, per-section subagent review
- figure-design — Chart-type selection, typography, colorblind-safe palette, layout, and journal-specific export specs; dispatches a figure-reviewer subagent after every figure
Publication Gates
- claims-audit — Trace every number, verify every figure, report every experiment
- novelty-competitive-audit — Pre-submission external competitiveness check: preemption detection against recent literature, contribution altitude re-verification, differentiation test, PASS/CONCERN/BLOCK verdict (internal rigor gates don't catch "the field moved in the 6-12 months since you started")
- verification-before-publication — Fresh evidence for every claim before submission
- submission-readiness — Four-option decision gate for finished work
Continuity
- research-journal — Append structured narrative entries capturing decisions, failures, blockers, and next-session handoff
Meta
- using-eureka — Bootstrap skill that teaches the agent to auto-invoke Eureka skills
Agents
- research-reviewer — Senior research reviewer that scores work across seven dimensions and produces a Gap-to-Threshold analysis for failing dimensions. Dispatched via requesting-research-review. Includes per-sub-criterion scoring anchors for D1-D7 to improve inter-run reliability.
Subagents
Seven subagent prompts used by skills for fresh-eyes review (all use 3-tier severity — Advisory / Should-fix / Must-fix — and red-team mode by default):
- design-document-reviewer — per-design review (dispatched by research-brainstorming)
- registration-reviewer — pre-commit gate for registrations (dispatched by hypothesis-first)
- experiment-plan-reviewer — plan buildability + contingency-inheritance check (dispatched by experiment-design)
- section-reviewer — per-manuscript-section review (dispatched by manuscript-writing)
- figure-reviewer — per-figure design + legend review (dispatched by figure-design)
- novelty-audit-reviewer — pre-submission novelty check (dispatched by novelty-competitive-audit)
- traceability-auditor — computational subagent (v1.10.0): regex-extracts manuscript numbers, filesystem-scans results/, and produces a machine-readable diff (dispatched by claims-audit). First computational subagent pattern in Eureka.
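The traceability-auditor's core idea (extract numbers from the manuscript, scan results/, diff the two) can be sketched in a few lines. This is an illustration under assumptions, not the subagent's implementation: the regex and function names are hypothetical, and matching is exact-string only, so rounded values (0.84 vs 0.8421), unit conversions (12.5% vs 0.125), and figure numbers will be reported as untraced.

```python
import re
from pathlib import Path

# Integers, decimals, and percentages not glued to a preceding letter, digit, or dot,
# so "F1" and version strings like "v1.2" are skipped. Hypothetical pattern.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?%?")


def numbers_in(text: str) -> set[str]:
    return {m.group().rstrip("%") for m in NUMBER.finditer(text)}


def untraced_numbers(manuscript_text: str, results_dir: Path) -> list[str]:
    """Manuscript numbers that appear in no file under results_dir (exact match)."""
    traced: set[str] = set()
    for path in results_dir.rglob("*"):
        if path.is_file():
            traced |= numbers_in(path.read_text(encoding="utf-8", errors="ignore"))
    return sorted(numbers_in(manuscript_text) - traced)
```

For a results file containing `{"accuracy": 0.842, "n_subjects": 120}` and the sentence "Accuracy was 0.842 across 120 subjects (F1 = 0.77).", the only untraced number is `0.77`: the kind of diff a human then resolves.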
Reference Documents
- docs/references/statistical-guide.md — Test selection, effect size interpretation, multiple comparison correction
- docs/references/data-checklist.md — Data provenance, preprocessing, leakage taxonomy, missing value handling, split strategies
- docs/references/latex-guide.md — LaTeX conventions: main.tex template, section files, BibTeX, natbib citations, math notation, figures/tables, compile workflow
- docs/references/figure-guide.md — Figure design: chart-type flowchart, colorblind-safe palettes (hex codes), per-journal export specs, matplotlib style recipe, accessibility tools
- docs/references/narrative-guide.md — Manuscript framing: contribution altitude (method improvement / framework / phenomenon / falsification), story arc patterns, Discovery-Adjusted Framing (post-results narrative pivot with HARKing guardrails), negative-result reframing, Intro-Discussion symmetry, venue-specific altitude tuning for 7 journal families
- docs/references/novelty-audit-guide.md — Pre-submission novelty competitive audit: search strategy by field (medical/CS/neuroscience/physics/social/etc.), time-window guidance, 4-dimensional preemption assessment rubric, differentiation test templates with bad/good examples, PASS/CONCERN/BLOCK decision tree, action menu (narrow / venue change / expand evidence / re-frame altitude / abandon), common anti-patterns, search log template
- docs/references/registration-lifecycle.md — Registration lifecycle: active/amended/superseded/archived state machine, YAML frontmatter schema, filename convention, amendment vs supersede decision tree, docs/eureka/registrations/INDEX.md chain tracking, HARKing severity spectrum (6-tier), data-discovery feedback workflow, plan↔registration contingency inheritance rules
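A registration lifecycle with active, amended, superseded, and archived states is naturally enforced as a transition table. As an illustration only: the specific transitions allowed below are an assumption, and registration-lifecycle.md remains the authority on which moves are legal.

```python
# Hypothetical transition table; see docs/references/registration-lifecycle.md
# for the authoritative rules.
ALLOWED = {
    "active": {"amended", "superseded", "archived"},
    "amended": {"superseded", "archived"},
    "superseded": {"archived"},
    "archived": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal registration transition: {current} -> {new}")
    return new
```

Under this table no state leads back to active, so a superseded registration cannot be silently revived; a new registration has to be written instead.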
Templates
- docs/templates/research-design-doc.md — Output of research-brainstorming
- docs/templates/research-review-report.md — Output of research-reviewer
- docs/templates/research-journal-entry.md — Output of research-journal
- docs/templates/registrations-index-template.md — Seed file for docs/eureka/registrations/INDEX.md; copy into your project repo when starting registration tracking (maintained thereafter by hypothesis-first)
Consumer Paths (docs/eureka/ in your project)
Eureka skills produce durable artifacts in canonical paths under docs/eureka/ — this is a directory Eureka expects to exist in YOUR research project, not in Eureka's own plugin repo (where docs/eureka/ is .gitignored because it's reserved for consumers).
| Consumer path | Written by | Purpose |
|---|---|---|
| docs/eureka/designs/YYYY-MM-DD-<topic>-design.md | research-brainstorming | Approved research design documents |
| docs/eureka/registrations/YYYY-MM-DD-<topic>-registration.md | hypothesis-first | Pre-registered hypotheses (immutable post-commit) |
| docs/eureka/registrations/INDEX.md | hypothesis-first (maintained) | Machine-readable chain of active/superseded/amended registrations |
| docs/eureka/plans/YYYY-MM-DD-<topic>-experiments.md | experiment-design | Executable experiment plans |
| docs/eureka/audits/YYYY-MM-DD-claims-audit.md | claims-audit | Number-traceability + figure-integrity audit reports |
| docs/eureka/reviews/YYYY-MM-DD-review.md | research-reviewer agent | 7-dimension scored review reports |
| docs/eureka/novelty-audits/YYYY-MM-DD-novelty-audit.md | novelty-competitive-audit | Pre-submission novelty competitive audit reports |
| docs/eureka/verifications/YYYY-MM-DD-verification.md | verification-before-publication | Final pre-submission verification reports |
| docs/eureka/journal/YYYY-MM-DD.md | research-journal | Narrative session journal entries |
Why these paths are in YOUR repo, not Eureka's: these files are your project's state — they document decisions, hypotheses, and reviews for your research. Eureka ships the skills; you ship the artifacts. The plugin repo intentionally gitignores docs/eureka/ to avoid accidentally committing example artifacts that would confuse new users about what belongs to the plugin vs. their project.
Getting started: skills create these directories on demand if they are missing. To create them all up front, run mkdir -p docs/eureka/{designs,registrations,plans,audits,reviews,novelty-audits,verifications,journal} from your project root.
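If your own pipeline writes artifacts programmatically, the canonical paths in the consumer-path table reduce to a handful of filename patterns. A sketch under assumptions: the kind labels and function names are hypothetical and not part of Eureka; only the path patterns are transcribed from the table.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Filename patterns transcribed from the consumer-path table; kind labels are hypothetical.
PATTERNS = {
    "design": "designs/{day}-{topic}-design.md",
    "registration": "registrations/{day}-{topic}-registration.md",
    "plan": "plans/{day}-{topic}-experiments.md",
    "claims-audit": "audits/{day}-claims-audit.md",
    "review": "reviews/{day}-review.md",
    "novelty-audit": "novelty-audits/{day}-novelty-audit.md",
    "verification": "verifications/{day}-verification.md",
    "journal": "journal/{day}.md",
}


def artifact_path(project_root: Path, kind: str, day: date, topic: Optional[str] = None) -> Path:
    """Build the canonical path for one artifact; topic-based kinds require a topic slug."""
    pattern = PATTERNS[kind]
    if "{topic}" in pattern and not topic:
        raise ValueError(f"{kind} artifacts need a topic slug")
    # str.format ignores the unused `topic` keyword for patterns without {topic}.
    return project_root / "docs" / "eureka" / pattern.format(day=day.isoformat(), topic=topic)


def ensure_artifact_dirs(project_root: Path) -> None:
    """Create every artifact directory, like the mkdir -p one-liner."""
    for pattern in PATTERNS.values():
        (project_root / "docs" / "eureka" / pattern).parent.mkdir(parents=True, exist_ok=True)
```

For example, `artifact_path(Path("proj"), "design", date(2024, 5, 1), "eeg-alpha")` yields `proj/docs/eureka/designs/2024-05-01-eeg-alpha-design.md`.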
Philosophy
- Hypothesis before data — The scientific TDD. Register the prediction before you can possibly know the answer.
- Evidence before claims — No publication claim without fresh verification.
- Null results are results — Report all experiments, including the ones that failed to support your hypothesis. Selective reporting is p-hacking by omission.
- Systematic over ad-hoc — Root cause investigation beats re-running with a different seed.
- Reproducibility is not retrofittable — Data version, seeds, configs, and environment must be locked from day one.
Coexistence with Superpowers
Eureka and Superpowers are designed to work side by side. They share the same plugin architecture and session-start hook mechanism. Both hooks fire; both using-* bootstrap skills are injected at session start. Claude uses the appropriate namespace for the task:
| Task | Namespace |
|---|---|
| Writing or refactoring code | superpowers:* |
| Running or writing about experiments | eureka:* |
| Software design | superpowers:brainstorming |
| Research design | eureka:research-brainstorming |
| Code review | superpowers:code-reviewer |
| Scientific review | eureka:research-reviewer |
Use the namespace that matches the artifact: code → Superpowers, science → Eureka.
FAQ
I already have an ongoing project. Do I need to migrate or restructure anything?
No. Eureka is strictly additive. It never requires you to move, rename, or delete files. New artifacts (designs, registrations, plans, journal entries, audit reports) are saved under docs/eureka/... with specific filenames. Your existing project structure stays untouched.
I've already run experiments without pre-registration. Is that a problem?
No. Label those experiments as exploratory in your eventual writeup, which is scientifically honest, and apply hypothesis-first to the next confirmatory analysis you run. Retroactive pre-registration is not a thing; exploratory analysis followed by a confirmatory replication is the accepted workflow in the scientific literature, and Eureka models it directly.
I have a CLAUDE.md in my project. Does Eureka override my instructions?
No. Your instructions always win. The priority order is:
- Your explicit instructions (CLAUDE.md, direct requests) — highest
- Eureka skills — override default agent behavior
- Default system prompt — lowest
If your CLAUDE.md says "skip pre-registration for this sandbox project," Eureka honors that.
The mandatory questions feel like overkill for exploratory work.
Two options:
- Label the session as exploratory. hypothesis-first has an explicit Exploratory Track — you can proceed without full pre-registration as long as the exploratory label is preserved in any eventual writeup. The discipline exists to prevent confirmatory claims from being smuggled in through the exploratory door, not to block legitimate exploration.
- Add a standing rule to CLAUDE.md. Example: For scratch analyses under notebooks/exploratory/, skip research-brainstorming's mandatory questions.
Does Eureka respond to non-English trigger phrases?
The skill bodies and README are English-only. However, a few skills (currently research-journal and whats-next) include Korean trigger phrases in their description field — short phrases like "기록해둬" (log this), "이제 뭐 해야 하지?" (what should I do next?), "어디쯤이지" (where am I?) — so that Korean-speaking users can trigger the skill naturally in Korean. The agent's response is in whatever language the user is using. If you want additional language support (Japanese, Spanish, etc.), open an issue or PR — the mechanism is trivial (add phrases to the description frontmatter field).
How do I disable a specific skill temporarily?
Add an override to CLAUDE.md. Example:
Do not invoke eureka:claims-audit during drafting — only when the manuscript is complete.
The instruction-priority rule ensures this beats the skill's default behavior. Eureka skills are opinionated but never insubordinate.
Contributing
Skills live directly in this repository. To contribute a new skill or improve an existing one:
- Fork the repository
- Create a branch for your skill
- Follow the structure of existing SKILL.md files (see skills/hypothesis-first/SKILL.md as a canonical example)
- Every new discipline-enforcing skill should include an Iron Law, a rationalization table, and a red-flag checklist
- Submit a PR
License
MIT License — see LICENSE file for details.
Acknowledgments
- Jesse Vincent for Superpowers, whose architecture and discipline-through-skills philosophy this plugin directly borrows from.