Lunheng (论衡)
Lunheng is a multi-agent framework for comprehensive paper reviews, using specialized roles to enhance manuscript quality.
Multi-agent paper review framework, anchored to top-tier journal rubrics.
Pure Claude / Claude Code skill — no extra API key required.
"权衡论证, 校释虚妄" — 王充《论衡》(~80 CE)
"Weighing arguments, examining what is false" — Wang Chong, Lunheng
What is Lunheng?
Lunheng is a multi-agent paper review and improvement framework that turns Claude into a coordinated review committee. Six specialist agents (Architect, Evaluator-9, Writer, Refiner, Chief Editor, Aggregator) collaborate around a persistent visual contract AND a word-budget contract — shared JSON state that tracks figures, tables, terminology, cross-references, AND venue-appropriate length throughout the revision loop.
Distinguishing features:
- Anchored 1–10 rubric across 9 dimensions (8 original + D9 Narrative & Conciseness), mapped to actual NeurIPS 2025 / Nature / JACS reviewer practice (not generic categories).
- Venue-aware word budgeting: `journal` / `thesis` / `nature_sub` / `conference` profiles with enforced per-section character limits.
- Chief Editor role (v2.0): enforces budget and narrative flow; can override Writer additions; prevents the "answer-every-reviewer-in-text" anti-pattern.
- Every Evaluator proposes `cuts` paired with `fixes` (v2.0): no net additions without offsetting deletions.
- NeurIPS-style 16-item Reproducibility Checklist with chemistry/materials extensions.
- Pure Claude sub-agents — no third-party API key required.
- Public case study showing R0 → R3 trajectory on a real Q1-target manuscript.
What's new in v2.0 (2026-04-14)
v1.0 produced technically polished but overstuffed papers — reviewers rewarded additions (expand Limitations to 6 items, add \paragraph{Protocol} for every method detail, pad Broader Impact to 1000+ chars). The composite score climbed, but the paper read like an FAQ rather than a coherent narrative. v2.0 fixes this with four structural changes:
- Required `venue` parameter with hard word budgets (`journal` = 8-10k CN chars / 6-8k EN words; `nature_sub` = 2.5-3.5k; `thesis` = 15-30k)
- D9 Narrative & Conciseness as the 9th evaluation dimension: penalizes overshoot, excess `\paragraph` small headings, and FAQ-style fragmentation
- Chief Editor role inserted after Refiner: authority to override Writer; compresses to budget; moves method details to SI
- Mandatory `cuts` field in every Evaluator YAML: each fix must be paired with a comparable-size cut unless the paper is under budget
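The budget rule can be sketched in a few lines. This is a minimal illustration, not Lunheng's implementation: the function names and the whitespace-based word count are assumptions, and only the `journal` range for English words (6-8k) is taken from the list above.

```python
# Minimal sketch of the v2.0 budget rule; all names here are hypothetical.
# Only the journal range for English words (6-8k) comes from the text above.
JOURNAL_EN_WORDS = (6000, 8000)


def count_words(text: str) -> int:
    """Crude whitespace word count; a real counter would strip LaTeX markup."""
    return len(text.split())


def budget_status(word_count: int, budget: tuple) -> str:
    """Classify a manuscript length as 'under', 'within', or 'over' budget."""
    low, high = budget
    if word_count < low:
        return "under"
    if word_count > high:
        return "over"
    return "within"


def fix_requires_cut(word_count: int, budget: tuple) -> bool:
    """A fix must be paired with a cut unless the paper is under budget."""
    return budget_status(word_count, budget) != "under"


if __name__ == "__main__":
    for n in (5200, 7100, 9400):
        print(n, budget_status(n, JOURNAL_EN_WORDS), fix_requires_cut(n, JOURNAL_EN_WORDS))
```

Keeping the status check separate from the pairing rule makes it easy to swap in a different length counter (for example, characters for Chinese text) without touching the rule itself.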
See skills/lunheng/SKILL.md for full v2.0 workflow and anti-pattern catalogue.
What's new in v2.1 (2026-04-17)
v2.0 shipped a robust budget/narrative framework, but a real-paper audit (a DAC-ML manuscript targeting a Q1 journal) uncovered 17 "placeholder" errors where Writer agents had emitted specific numbers — subset sizes, enrichment factors, equations, units — that were never verified against source data. Representative cases:
- `B subset (n=2142)`: an experiment that was never run; the number came from `3555 × 60%` as a Writer shortcut for "the literature-only subset"
- `C subset (n=3591)`: a stray number from an early draft; the actual dataset had 3555 rows
- `Amine EF = 1.57`: Writer's pen-and-paper arithmetic (50.4% / 32.1%), while the real OOF value was `1.33`
- `CC backfill covers ~38%`: actual coverage was 2.6% (91 / 3555 rows)
- SI's Clausius–Clapeyron form written as equi-loading `ln P vs 1/T` when the code actually used van't Hoff equi-pressure `ln(q₁/q₂) vs 1/T`
- `N content (wt%)` in the main-text feature table, when the CSV column and SI both say `mmol/g`
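Two of these cases can be reproduced with plain arithmetic. The sketch below only recomputes ratios from the figures quoted in the list; the variable names are illustrative and not taken from the project.

```python
# Recompute two audit cases from the figures quoted above.
# Variable names are illustrative assumptions.

# Case: "Amine EF = 1.57" was a hand-computed ratio of two percentages.
writer_ef = round(50.4 / 32.1, 2)
reported_oof_ef = 1.33  # the value the audit found in the real outputs

# Case: "CC backfill covers ~38%" versus the actual row counts.
cc_rows, total_rows = 91, 3555
cc_coverage_pct = round(100 * cc_rows / total_rows, 1)

print(f"hand-computed EF: {writer_ef}, OOF EF: {reported_oof_ef}")
print(f"CC coverage: {cc_coverage_pct}% of {total_rows} rows")
```

The hand-computed ratio matches the 1.57 that appeared in prose, which is what made it look plausible even though it differs from the value computed by the pipeline.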
v2.1 adds numerical grounding as a Writer hard constraint (HARD CONSTRAINT §8):
- For every specific number / count / percentage / ratio, Writer must cite its source file path (e.g., `data/xxx.csv` row count, `output/yyy.json` field, computed from `zzz.npz`).
- If the source is inaccessible → use qualitative language ("majority", "small fraction") OR tag `[VERIFY: <description>]` for Refiner / Chief Editor to resolve.
- Forbidden: fabricating a specific count by multiplying total × rounded percentage (e.g., `3555 × 60% → 2142` presented in prose as if from a real experiment).
- Writer output block must include a traceability section listing each number's source file.
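A check in the spirit of this constraint might look like the sketch below. It is an assumption-laden illustration: the real constraint is enforced through the Writer prompt, and the function names, the number regex, and the dictionary-based traceability map are all hypothetical.

```python
import re

# Matches tags of the form [VERIFY: some description]
VERIFY_TAG = re.compile(r"\[VERIFY:\s*([^\]]+)\]")
# Matches integers, decimals, and percentages such as 3555, 1.33, 2.6%
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unresolved_verify_tags(prose: str) -> list[str]:
    """Return the descriptions of all [VERIFY: ...] tags still in the prose."""
    return [desc.strip() for desc in VERIFY_TAG.findall(prose)]


def untraced_numbers(prose: str, sources: dict[str, str]) -> list[str]:
    """Return numeric tokens that have no entry in the traceability map.

    `sources` is a hypothetical map from the number as written in the
    prose to the file it came from.
    """
    without_tags = VERIFY_TAG.sub("", prose)
    return [tok for tok in NUMBER.findall(without_tags) if tok not in sources]


if __name__ == "__main__":
    prose = ("The dataset has 3555 rows; CC backfill covers "
             "[VERIFY: CC coverage fraction] of them, and EF = 1.33.")
    sources = {"3555": "data/xxx.csv row count"}
    print(unresolved_verify_tags(prose))
    print(untraced_numbers(prose, sources))
```

A check like this only shows that a number has a declared source; confirming that the source actually contains the number still requires opening the file.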
This is a Writer-level patch — no workflow changes, no new agents, no new dimensions. v2.0 pipelines keep working; v2.1 just catches a category of fake-specific numbers that v1.0–v2.0 Writers had been generating without guard-rails.
See skills/lunheng/SKILL.md Writer HARD CONSTRAINTS §8 and Anti-patterns #6 for the exact prompt language.
Why "Lunheng"?
The name comes from Lunheng (《论衡》), Wang Chong's ~80 CE treatise — the first systematic Chinese work on weighing evidence and refuting unsupported claims. Lunheng's job is the same: weigh every claim against the evidence the paper provides, and surface what is unsupported.
Quick Start
Install as a Claude Code / OpenClaw skill
One-liner (any platform):
git clone https://github.com/Yugenee/Lunheng.git && cd Lunheng && python install.py
The installer copies both lunheng and lunheng-quick skills to ~/.claude/skills/. Run python install.py --uninstall to remove.
Windows users can also double-click install.bat. macOS / Linux users: bash install.sh.
Manual install (if you don't trust the script):
mkdir -p ~/.claude/skills
cp -r skills/lunheng ~/.claude/skills/
cp -r skills/lunheng-quick ~/.claude/skills/
Run — basic
In Claude Code (or OpenClaw), invoke:
/lunheng <YOUR_PAPER_DIR>/
<YOUR_PAPER_DIR> is the path to the directory that holds your manuscript — it is a placeholder, not a literal name. Replace it with your actual directory name. The path can be relative (resolved against the Claude Code working directory) or absolute.
Examples — three users with three different directory layouts:
/lunheng paper_final/ # relative path, dir named "paper_final"
/lunheng my_thesis/ # dir named "my_thesis"
/lunheng D:/research/2026-q1/ # absolute path on Windows
/lunheng ~/papers/dac_review/ # absolute path on macOS/Linux
For example, if your project layout is:
my_paper/
├── main.tex # ← the manuscript source
├── references.bib # ← bibliography
└── figures/ # ← figures referenced by main.tex
├── Fig1.pdf
└── Fig2.png
then call:
/lunheng my_paper/
Lunheng will auto-detect main.tex as the source, references.bib as the bibliography, and figures/ as the asset directory. Auto-detection priority: *.tex (containing \documentclass) → *.md → *.docx → *.pdf.
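The detection priority can be illustrated with a short sketch. This is not the skill's actual code: the function name, the top-level-only search, and the alphabetical tie-break among files of the same type are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def detect_main_file(paper_dir: str) -> Optional[Path]:
    """Pick a main file using the priority *.tex -> *.md -> *.docx -> *.pdf.

    A .tex file only qualifies if it contains \\documentclass.
    Only the top level of paper_dir is searched (an assumption).
    """
    root = Path(paper_dir)

    # 1. LaTeX sources that carry a preamble
    for tex in sorted(root.glob("*.tex")):
        if "\\documentclass" in tex.read_text(encoding="utf-8", errors="ignore"):
            return tex

    # 2-4. Fall back by extension, in priority order
    for pattern in ("*.md", "*.docx", "*.pdf"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]

    return None


if __name__ == "__main__":
    print(detect_main_file("my_paper"))
```

Requiring `\documentclass` keeps included fragments such as a standalone table file from being mistaken for the manuscript.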
What happens, step by step
- Step 0: Detect & init (≈ 5 sec). The skill detects the format and creates `<paper_dir>/lunheng_workspace/` to hold all intermediate artifacts.
- Step 1: Architect agent (≈ 1–3 min, 1 sub-agent). Reads the entire manuscript and produces:
  - `lunheng_workspace/blueprint.md`: section-by-section argument outline + 3 most severe structural issues
  - `lunheng_workspace/visual_contract.json`: registry of every figure/table/term that downstream agents must respect
- Step 2: Evaluator bench (Round 1) (≈ 3–5 min, 8 sub-agents in parallel). Eight specialized evaluators score each dimension D1–D8 with anchored 1–10 scores. Composite $R_1 = \frac{1}{8} \sum_{k=1}^{8} D_k$ is computed.
- Step 3: Generate-Evaluate-Adapt loop (≈ 5–10 min per round, only for `tex`/`md`)
  - If $R \geq 7.0$ → STOP (paper is at borderline-accept quality).
  - If $R < 7.0$ → dispatch Writer agents (one per low-scoring section) and a Refiner agent (global polish), then re-run the Evaluator bench. Repeat up to 3 times.
- Step 4: Compile & verify (≈ 30 sec, only for `tex`/`md`)
  - LaTeX: `xelatex → bibtex → xelatex × 2`, then `grep "undefined"` must return 0 matches
  - Markdown: optional `pandoc → pdf` for a visual check
- Step 5: Visual contract validation. Verifies every label in the contract is actually defined and referenced.
- Step 6: Document results. Writes `<paper_dir>/LUNHENG_LOG.md` with the complete score progression, per-round changes, and remaining issues.
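The stop rule from Step 3 can be written as a small control loop. The sketch below is an illustration under stated assumptions: `evaluate` and `revise` are hypothetical callables standing in for the Evaluator bench and the Writer/Refiner pass, and the composite is the plain mean of the dimension scores as in Step 2.

```python
from statistics import mean
from typing import Callable, Dict, List

STOP_THRESHOLD = 7.0   # default stop threshold from the steps above
MAX_ITERATIONS = 3     # "repeat up to 3 times"


def composite(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-dimension scores."""
    return mean(scores.values())


def review_loop(evaluate: Callable[[], Dict[str, float]],
                revise: Callable[[], None]) -> List[float]:
    """Run evaluate/revise rounds and return the composite score history."""
    history = [composite(evaluate())]          # Round 1 evaluation
    for _ in range(MAX_ITERATIONS):
        if history[-1] >= STOP_THRESHOLD:
            break                              # borderline-accept quality reached
        revise()                               # Writer + Refiner pass
        history.append(composite(evaluate()))  # re-run the Evaluator bench
    return history


if __name__ == "__main__":
    # Stub evaluator: every dimension gets the same score each round.
    rounds = iter([6.0, 6.5, 7.5, 9.0])
    stub = lambda: {f"D{k}": s for s in [next(rounds)] for k in range(1, 9)}
    print(review_loop(stub, lambda: None))
```

With the stub above, the loop stops after the third evaluation because the composite first crosses 7.0 there; the fourth score is never requested.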
Expected output files
After a successful run on a .tex paper:
my_paper/
├── main.tex # ← REVISED in-place by Writer/Refiner
├── main.pdf # ← REGENERATED by xelatex
├── LUNHENG_LOG.md # ← NEW: full per-round score log
└── lunheng_workspace/
├── main_round0.tex # snapshot of original
├── main_round0.pdf
├── blueprint.md # Architect's plan
├── visual_contract.json # final contract state
├── round1_evaluators/ # 8 raw evaluator outputs
└── round1_writer/ # Writer outputs (if loop ran)
For .docx / .pdf papers, main.tex is not modified; instead you get LUNHENG_EDIT_LIST.md with structured edit instructions to apply manually.
Run — common variations
| Goal | Command |
|---|---|
| Default review (auto-detect everything) | /lunheng paper/ |
| Stricter quality bar (e.g. for top-tier venue) | /lunheng paper/ — stop_threshold: 8.0 |
| Faster, only 2 rounds | /lunheng paper/ — max_iterations: 2 |
| Pause for your approval before each round | /lunheng paper/ — human_checkpoint: true |
| Auto-detect picks wrong file | /lunheng paper/ — main: my_paper.tex |
| Force Word mode (review-only) | /lunheng paper/ — format: docx |
| Force PDF mode (read-only review) | /lunheng paper/ — format: pdf |
| Emphasize Nature criteria | /lunheng paper/ — venue: nature |
| Emphasize NeurIPS criteria | /lunheng paper/ — venue: neurips |
Quick-polish mode
If your paper is structurally sound and you just want a writing/clarity sweep (no Architect, no Writer/Refiner split, single reviewer):
/lunheng-quick path/to/paper/
- 2 rounds, single reviewer per round
- ≈ 4 sub-agent calls total (vs. 13–25 for full `/lunheng`)
- Best when you want fast feedback before major revision
Cost & timing reference
Based on the DAC case study (28-page chemistry-ML paper):
| Mode | Sub-agent calls | Wall time | Tokens (Opus 4.6) |
|---|---|---|---|
| Round 0 baseline only | 8 evaluators | ~5 min | ~40k |
| Full first round (Architect + Evaluators) | 9 | ~10 min | ~80k |
| Full revision round (Writer × N + Refiner + Evaluators) | 10–18 | ~15 min | ~120k |
| Quick-polish (`lunheng-quick`) | 4 | ~3 min | ~30k |
Re-running on the same paper
You can run /lunheng repeatedly. Each call starts fresh:
- Archives the previous compiled PDF as `lunheng_workspace/main_round0.pdf`
- Re-detects format and main file (auto-detection runs again)
- Overwrites the previous workspace
This lets you iterate: get a review, manually apply some of the suggestions, then re-run to see the score change.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| `Unknown skill: lunheng` | Skill not installed | Run python install.py again, or check that ~/.claude/skills/lunheng/ exists |
| `Auto-detect found no main file` | <paper_dir> is empty or contains no recognized format | Pin with --main: filename.tex or check the path |
| Evaluators run but Writer/Refiner is skipped | Format is docx or pdf (review-only by design) | Use tex/md for full pipeline, or use the edit list |
| LaTeX compile fails after revision | Writer introduced a syntax issue | Restore from lunheng_workspace/main_round0.tex and re-run with --max_iterations: 0 to only review |
| Composite score doesn't improve between rounds | Some dimensions need experiments (e.g. external baselines) the agent can't fabricate | Lunheng will document the gap; you must run the actual experiments |
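Restoring the original manuscript after a failed compile is a single file copy. The helper below is a hypothetical convenience, not part of Lunheng; it assumes the default file names shown in the output tree (`main.tex` and `lunheng_workspace/main_round0.tex`).

```python
import shutil
from pathlib import Path


def restore_snapshot(paper_dir: str,
                     main_name: str = "main.tex",
                     snapshot_name: str = "main_round0.tex") -> Path:
    """Copy the round-0 snapshot back over the revised main file."""
    paper = Path(paper_dir)
    snapshot = paper / "lunheng_workspace" / snapshot_name
    if not snapshot.is_file():
        raise FileNotFoundError(f"no snapshot at {snapshot}")
    target = paper / main_name
    shutil.copy2(snapshot, target)  # overwrites the revised file in place
    return target


if __name__ == "__main__":
    print(restore_snapshot("my_paper"))
```

Copy the revised file somewhere else first if you want to keep any of the Writer's changes for manual merging.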
Evaluation Rubric
9 dimensions (v2.0 adds D9 Narrative & Conciseness), each scored 1–10 with anchor descriptions at four levels (1–3 / 4–6 / 7–8 / 9–10), plus an Overall verdict (1–6) and Confidence (1–5).
| Dimension | Aligned with |
|---|---|
| D1. Soundness | NeurIPS Soundness, Nature Technical Rigor |
| D2. Significance & Originality | NeurIPS Significance + Originality |
| D3. Clarity & Organization | NeurIPS Clarity, JACS Clarity |
| D4. Experimental Substance | NeurIPS Quality (claims supported) |
| D5. Reproducibility | NeurIPS 16-item Checklist |
| D6. Citation Quality | JACS Literature Appropriateness |
| D7. Visual & Tabular Communication | Nature Figure Quality |
| D8. Ethics, Limitations & Broader Impact | NeurIPS Limitations + Ethics |
📖 Full rubric with anchored examples → docs/EVALUATION_RUBRIC.md
📋 Reproducibility checklist → docs/REPRODUCIBILITY_CHECKLIST.md
🏗 Architecture & contract schema → docs/ARCHITECTURE.md
📄 Multi-format support (.tex / .md / .docx / .pdf) → docs/MULTI_FORMAT.md
Supported Input Formats
| Format | Auto-detect | Evaluators | Architect | Writer/Refiner | Output |
|---|---|---|---|---|---|
| `.tex` (LaTeX) | ✅ | ✅ | ✅ | ✅ in-place | revised .tex + LUNHENG_LOG.md |
| `.md` (Markdown) | ✅ | ✅ | ✅ | ✅ in-place | revised .md + LUNHENG_LOG.md |
| `.docx` (Word) | ✅ | ✅ | ✅ | edit-list only | LUNHENG_EDIT_LIST.md (apply manually) |
| `.pdf` (compiled) | ✅ | ✅ | ✅ | read-only | LUNHENG_REVIEW.md (scores + suggested fixes) |
LaTeX and Markdown get the full pipeline (multi-agent revision in-place). Word and PDF get review-only mode — a structured edit list the author applies. See docs/MULTI_FORMAT.md for full per-format behavior.
Real Case Study
Lunheng was applied to a Chinese-language chemistry-ML manuscript (DAC sorbents prediction, Q1 target) over three rounds:
| Round | Composite Score | What changed |
|---|---|---|
| R0 (baseline) | 6.89 | Original draft |
| R3 (after gap fixes) | 8.04 | Targeted fixes per Lunheng's per-dimension score_change_criteria outputs (bootstrap CIs, hyperparameter table, Data Availability section, Broader Impact, table column-width fixes) |
📊 Full per-dimension breakdown → examples/dac_paper_case_study.md
Architecture in 60 Seconds
                     Architect ──→ Blueprint + Visual Contract
                                        │
                 ┌──────────────────────┼──────────────────────┐
                 ▼                      ▼                      ▼
       Writer (per-section)     Refiner (global)        Evaluator Bench
       drafts under contract    polish + contract       (8 parallel agents,
                                update                  one per dimension)
                 │                      │                      │
                 └──────────────────────┴──────────────────────┘
                                        ▼
                             Aggregate R = Σ Dₖ / 8
                             R ≥ 7.0 → STOP
                             R < 7.0 → loop back
The Visual Contract is a JSON file persisting between agent calls. It carries:
- `global_visual_registry`: every figure/table with semantic description and obligated section
- `section_obligations`: which visuals MUST appear in each section
- `terminology_glossary`: canonical form for every key term
- `validation_rules`: unique-labels, all-visuals-referenced, etc.
This is the mechanism that prevents structural drift across multi-step revision. See docs/ARCHITECTURE.md for the full schema.
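The "defined and referenced" check can be sketched for a LaTeX source as follows. This is a minimal illustration, not the project's validator: the function name is hypothetical, the labels are passed in as a plain list because the contract's internal layout is documented elsewhere, and the set of reference commands recognized is an assumption.

```python
import re

LABEL = re.compile(r"\\label\{([^}]+)\}")
# Reference commands recognized here are an assumption; extend as needed.
REF = re.compile(r"\\(?:ref|autoref|cref|Cref|eqref)\{([^}]+)\}")


def check_labels(tex_source: str, labels: list[str]) -> dict[str, list[str]]:
    """Report contract labels that are never defined or never referenced."""
    defined = set(LABEL.findall(tex_source))
    referenced = set()
    for group in REF.findall(tex_source):
        # \cref{a,b} style commands may carry several labels
        referenced.update(part.strip() for part in group.split(","))
    return {
        "undefined": [name for name in labels if name not in defined],
        "unreferenced": [name for name in labels if name not in referenced],
    }


if __name__ == "__main__":
    tex = r"\label{fig:a} See Figure~\ref{fig:a}. \label{tab:b}"
    print(check_labels(tex, ["fig:a", "tab:b", "fig:c"]))
```

An empty list under both keys corresponds to a passing validation; anything else names the exact labels that drifted during revision.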
Installation Requirements
- Claude Code CLI ≥ 0.5 OR an equivalent Claude-agent harness
- A LaTeX distribution (MiKTeX / TeX Live) if you want PDF compilation
- Optional: `pandoc` if you also want to produce DOCX output
Ethical Use
Lunheng is for author-side self-review — writers polishing their own papers before submission.
It is NOT for:
- Producing fake peer reviews
- Generating fabricated experimental results
- Bypassing journal peer-review processes
- Reviewer impersonation
The framework explicitly refuses to invent experimental data; it only flags missing experiments and suggests what to add.
Roadmap
- v0.1: Skill files + 8-dim rubric + visual contract schema
- v0.2: Reproducibility checklist
- v0.3: DAC case study public artifact
- v0.4: Multi-format support (`.tex`/`.md`/`.docx`/`.pdf`)
- v0.5: Venue-specific rubric profiles (`--venue=neurips/nature/jacs`)
- v0.6: JSON schema validation for visual contract
- v0.7: Telemetry hooks for review-quality monitoring
- v1.0: Stable API + Python wrapper for non-Claude-Code users
Citation
If Lunheng helps your paper, please cite:
@software{lunheng2026,
author = {Ding, Yujie},
title = {Lunheng: Multi-agent paper review framework anchored to top-tier journal rubrics},
year = {2026},
url = {https://github.com/Yugenee/Lunheng}
}
License
MIT — free to use, modify, and redistribute. No warranty.
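Lunheng (论衡): Chinese version
A multi-agent academic paper review framework aligned with top-journal review standards.
Pure Claude / Claude Code skill; no extra API key required.
"权衡论证, 校释虚妄" ("Weighing arguments, examining what is false"), Wang Chong, Lunheng (《论衡》, ~80 CE)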
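What it is
Lunheng is a contract-constrained multi-agent framework for reviewing and improving papers; it turns Claude into a coordinated review committee. Four kinds of specialist agents (Architect / Writer / Refiner / Evaluator bench) work together around a shared, persistent visual contract: a JSON state file that records every figure, table, term, and cross-reference and keeps them consistent throughout the revision loop.
Features:
- Anchored 1–10 scoring across 9 dimensions (v2.0 adds D9 Narrative & Conciseness), aligned with actual NeurIPS 2025 / Nature / JACS review practice
- Venue-aware word budgeting (v2.0): four presets, `journal` / `thesis` / `nature_sub` / `conference`, that enforce total and per-section length limits for the target venue
- Chief Editor role (v2.0): compresses to the word budget after the Refiner and prevents the "answer every reviewer point in the main text" anti-pattern
- Numerical grounding (v2.1): every number / count / percentage the Writer produces must cite a source file path, or carry a `[VERIFY: ...]` tag for the Refiner to resolve; fabricating a specific count from total × rounded percentage is forbidden
- NeurIPS-style 16-item reproducibility checklist with chemistry/materials extensions
- Pure Claude sub-agents; no third-party API dependency
- Public real-world case study showing the full R0 → R3 trajectory on a Q1-target paper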
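What's new in v2.1 (2026-04-17)
After the v2.0 budget/narrative framework had been running stably for a while, an audit of a real paper (a DAC-ML manuscript) found 17 Writer placeholder errors. To make paragraphs look "specific", the Writer had produced numbers that looked concrete but were never checked, for example by multiplying a total by a rounded percentage: (1) a subset n=2142 that was never actually run, (2) a mentally computed EF=1.57 (real value 1.33), (3) a CC equation form that did not match the code, and (4) a wrong unit (N content in wt% when the actual unit was mmol/g).
v2.1 adds one hard constraint for the Writer:
- Every number, count, percentage, and ratio must cite a source file path (`data/*.csv`, `output/*.json`, `*.npz`)
- If the source file is inaccessible → use qualitative language ("majority" / "minority") or a `[VERIFY: ...]` tag for the Refiner to resolve
- Forbidden: mental-arithmetic fabrication of the form `total × rounded percentage → specific N`
- The Writer output block must list the source file path for every number
This is a Writer-level patch; it does not change the workflow or the number of agents. See skills/lunheng/SKILL.md, Writer HARD CONSTRAINTS §8.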
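Why "Lunheng"?
Wang Chong's Lunheng (《论衡》), completed around 80 CE, was the first work in the Chinese-speaking world to treat "weighing arguments, examining what is false" systematically. This framework does the same job: it weighs every claim in a paper against its evidence and identifies the parts that lack support.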
Quick Start
One-line install (cross-platform):
git clone https://github.com/Yugenee/Lunheng.git && cd Lunheng && python install.py
The install script copies the two skills, lunheng and lunheng-quick, to ~/.claude/skills/. To uninstall, run python install.py --uninstall.
Windows users can also double-click install.bat; on macOS/Linux, use bash install.sh.
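Basic invocation
In Claude Code (or OpenClaw):
/lunheng <YOUR_PAPER_DIR>/
<YOUR_PAPER_DIR> is the path to the directory that holds your paper; it is a placeholder, not a literal name. Replace it with your actual directory name. Relative paths (resolved against the Claude Code working directory) and absolute paths both work.
Examples: three users with different directory layouts:
/lunheng paper_final/ # relative path, directory named paper_final
/lunheng my_thesis/ # directory named my_thesis
/lunheng D:/research/2026-q1/ # absolute path on Windows
/lunheng ~/papers/dac_review/ # absolute path on macOS/Linux
For example, given this project layout:
my_paper/
├── main.tex # ← manuscript source
├── references.bib # ← bibliography
└── figures/ # ← figures referenced by main.tex
├── Fig1.pdf
└── Fig2.png
Calling /lunheng my_paper/ makes Lunheng auto-detect main.tex as the source file, references.bib as the bibliography, and figures/ as the figure directory. Auto-detection priority: *.tex (containing \documentclass) > *.md > *.docx > *.pdf.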
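Execution flow (detailed)
- Step 0: Detect + init (about 5 sec). Detects the format and creates `<paper_dir>/lunheng_workspace/` to hold all intermediate artifacts.
- Step 1: Architect (about 1–3 min, 1 sub-agent). Reads the whole manuscript and produces `blueprint.md` (argument outline + 3 major structural issues) and `visual_contract.json` (figure/table/term registry).
- Step 2: Round 1 Evaluator bench (about 3–5 min, 8 parallel sub-agents). Anchored scoring on D1–D8; composite $R_1 = \frac{1}{8} \sum D_k$.
- Step 3: Revision loop (5–10 min per round, `tex`/`md` only)
  - $R \geq 7.0$ → STOP
  - $R < 7.0$ → Writer (one per low-scoring section) + Refiner (global) + re-evaluation, up to 3 rounds
- Step 4: Compile & verify (about 30 sec, `tex`/`md` only). `xelatex → bibtex → xelatex × 2`; verify 0 undefined refs.
- Step 5: Visual contract validation: every label is defined and referenced.
- Step 6: Write the log to `<paper_dir>/LUNHENG_LOG.md`.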
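Output files
After a run on a .tex paper:
my_paper/
├── main.tex # ← revised in place by Writer/Refiner
├── main.pdf # ← regenerated by xelatex
├── LUNHENG_LOG.md # ← new: full score log
└── lunheng_workspace/
├── main_round0.tex # snapshot of the original
├── main_round0.pdf
├── blueprint.md # Architect's blueprint
├── visual_contract.json # final contract state
├── round1_evaluators/ # 8 raw evaluator outputs
└── round1_writer/ # Writer outputs (if the loop ran)
For .docx/.pdf inputs the original file is not modified; the output is LUNHENG_EDIT_LIST.md, which the author applies manually.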
Common invocation variants
| Goal | Command |
|---|---|
| Default review (auto-detect) | /lunheng paper/ |
| Stricter bar for top-tier venues | /lunheng paper/ — stop_threshold: 8.0 |
| Faster, only 2 rounds | /lunheng paper/ — max_iterations: 2 |
| Wait for your approval before each round | /lunheng paper/ — human_checkpoint: true |
| Pin the main file (when auto-detect picks the wrong one) | /lunheng paper/ — main: my_paper.tex |
| Force Word mode (review only, no edits) | /lunheng paper/ — format: docx |
| Force PDF mode (read-only review) | /lunheng paper/ — format: pdf |
| Emphasize Nature criteria | /lunheng paper/ — venue: nature |
| Emphasize NeurIPS criteria | /lunheng paper/ — venue: neurips |
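Quick-polish mode
For a paper whose structure is already settled and that only needs a wording pass:
/lunheng-quick path/to/paper/
- 2 rounds, single reviewer per round
- About 4 sub-agent calls in total (vs. 13–25 for `/lunheng`)
- Suited to getting fast feedback before a major revision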
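Cost & timing (based on the DAC case study)
| Mode | Sub-agent calls | Wall time | Tokens (Opus 4.6) |
|---|---|---|---|
| Round 0 baseline only | 8 evaluators | ~5 min | ~40k |
| Full first round (Architect + Evaluators) | 9 | ~10 min | ~80k |
| Full revision round (Writer + Refiner + Evaluators) | 10–18 | ~15 min | ~120k |
| Quick polish | 4 | ~3 min | ~30k |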
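Re-running
You can call /lunheng on the same paper repeatedly. Each call starts fresh: it archives the previous PDF, re-runs detection, and overwrites the workspace. This supports an iterative "review → edit by hand → review again" workflow.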
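Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `Unknown skill: lunheng` | Skill not installed | Run python install.py, or check ~/.claude/skills/lunheng/ |
| `Auto-detect found no main file` | Directory is empty or the format is not recognized | Pin the file with --main: filename.tex |
| Evaluators ran but the Writer was skipped | Format is docx/pdf (review-only by design) | Use tex/md for the full pipeline, or apply the edit list manually |
| LaTeX compile fails after revision | Writer introduced a syntax bug | Restore from lunheng_workspace/main_round0.tex and add --max_iterations: 0 to review only |
| Score does not improve across rounds | Some dimensions need real experiments that the agent cannot fabricate | Lunheng flags the gap; you must run the experiments yourself |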
8-dimension scoring rubric
Each dimension is scored 1–10, with anchor descriptions for all four score bands (1–3 / 4–6 / 7–8 / 9–10), plus an Overall verdict (1–6) and Confidence (1–5).
| Dimension | Aligned with |
|---|---|
| D1. Soundness | NeurIPS Soundness, Nature Technical Rigor |
| D2. Significance & Originality | NeurIPS Significance + Originality |
| D3. Clarity & Organization | NeurIPS Clarity, JACS Clarity |
| D4. Experimental Substance | NeurIPS Quality |
| D5. Reproducibility | NeurIPS 16-item Checklist |
| D6. Citation Quality | JACS Literature Appropriateness |
| D7. Visual & Tabular Communication | Nature Figure Quality |
| D8. Ethics / Limitations / Impact | NeurIPS Limitations + Ethics |
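Supported input formats
| Format | Auto-detect | Evaluators | Architect | Writer/Refiner | Output |
|---|---|---|---|---|---|
| `.tex` (LaTeX) | ✅ | ✅ | ✅ | ✅ in-place | revised .tex + LUNHENG_LOG.md |
| `.md` (Markdown) | ✅ | ✅ | ✅ | ✅ in-place | revised .md + LUNHENG_LOG.md |
| `.docx` (Word) | ✅ | ✅ | ✅ | edit-list only | LUNHENG_EDIT_LIST.md (apply manually) |
| `.pdf` (compiled) | ✅ | ✅ | ✅ | read-only | LUNHENG_REVIEW.md (scores + suggested fixes) |
LaTeX and Markdown get the full pipeline (multi-agent revision in place). Word and PDF get review-only mode: a structured list of suggestions that the author applies manually. See docs/MULTI_FORMAT.md.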
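Real case study
Lunheng was applied to a Chinese-language chemistry-ML paper (DAC sorbent prediction, targeting a Q1 journal):
| Round | Composite score | Key changes |
|---|---|---|
| R0 (baseline) | 6.89 | Original draft |
| R3 (after gap fixes) | 8.04 | Targeted fixes following Lunheng's per-dimension score_change_criteria (bootstrap CIs, hyperparameter table, data availability, Broader Impact, table column widths) |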
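Visual contract (core mechanism)
The Visual Contract is a JSON file that persists across all agent calls. It contains:
- `global_visual_registry`: semantic description and expected section for every figure/table
- `section_obligations`: which visual elements each section must contain
- `terminology_glossary`: canonical form of every key term
- `validation_rules`: unique-labels, all-visuals-referenced, etc.
This is the core mechanism that prevents "structural drift" during multi-step revision.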
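Ethical use statement
Lunheng is for author-side self-review: writers polishing their own papers before submission. It must not be used to forge peer reviews, generate fake experimental data, bypass peer review, or impersonate reviewers. The framework explicitly refuses to invent experimental data; it only points out missing experiments and suggests what to add.
License
MIT: free to use, modify, and redistribute. No warranty.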