lean-formalization

Lean 4 formalization workflow for Erdős problem proofs. Use when formalizing a solution in Lean 4, checking existing formalizations, or setting up the Lean environment for proof verification.

Lean 4 Formalization Workflow

A structured approach to formalizing Erdős problem proofs in Lean 4, from assessing formalization feasibility through verified compilation and upstream contribution.

Quick Start

from tools.query_db import ErdosDB

with ErdosDB() as db:
    p = db.get_problem("42")
    print(p.lean_url)  # Check formalization status
# Build the local Lean project
cd lean && lake build

Phase 1: Assessment

Before attempting formalization, determine what already exists.

1.1 Check the Problem's Lean Status

from tools.query_db import ErdosDB

with ErdosDB() as db:
    p = db.get_problem("{number}")
    lean_url = p.lean_url

    if lean_url is None:
        print("No Lean link — formalization from scratch")
    elif "/blob/" in lean_url or lean_url.endswith(".lean"):
        print(f"Specific .lean file: {lean_url}")
    else:
        print(f"Generic repo link: {lean_url}")

1.2 Three Categories

CategoryCountWhat It Means
Specific .lean URL~355DeepMind has a formal statement — read it first
Generic repo URL~824Repo exists but no specific file — likely no formalization
No lean_urlremainingNo formalization link at all

1.3 Decision Table

Lean StatusProof Grade (A-F)Action
Specific .lean URLA or BFormalize — statement exists, add proof
Specific .lean URLC or belowSkip formalization — proof not rigorous enough
Generic/No URLAConsider writing statement + proof from scratch
Generic/No URLB or belowSkip — not worth formalizing without existing statement

Phase 2: Environment Setup

2.1 Use the Local Lean Project

The repository includes a Lean 4 project at lean/:

cd lean
lake update   # First time: fetches mathlib (~5-10 min)
lake build    # Compile all files

See lean/README.md for full setup instructions.

2.2 Project Structure

lean/
  ├── lean-toolchain          # Lean version (must match formal-conjectures)
  ├── lakefile.toml            # Package config + mathlib dependency
  ├── ErdosProofs.lean         # Root import file
  └── ErdosProofs/
        ├── Template.lean      # Template for new proofs
        └── P{number}.lean     # Individual problem proofs

2.3 Optional: lean-lsp-mcp

If available, the lean-lsp-mcp MCP server provides interactive type-checking and goal state inspection directly in Claude Code.

Phase 3: Statement Formalization

3.1 Read the Existing Statement

If a specific .lean URL exists, fetch the formal statement from formal-conjectures:

# The lean_url typically points to:
# https://github.com/google-deepmind/formal-conjectures/blob/main/FormalConjectures/ErdosProblems/{file}.lean

Key things to extract:

  • The theorem or conjecture declaration
  • Type signatures (what's being quantified over)
  • Imported mathlib modules
  • Any custom definitions

3.2 Translate Definitions

Map informal math to Lean/mathlib types. See LEAN_PATTERNS.md for the full type mapping table.

Common translations:

  • "For all integers n" → ∀ n : ℤ
  • "There exists a prime p" → ∃ p : ℕ, Nat.Prime p ∧ ...
  • "The set of primes" → {p : ℕ | Nat.Prime p}
  • "Arithmetic progression" → ArithmeticProgression or explicit formulation

3.3 Verify Interpretation

Cross-check:

  1. Does the Lean statement match the LaTeX from the database?
  2. Do the quantifiers match? (∀ vs ∃, bounded vs unbounded)
  3. Are there implicit assumptions made explicit in the formalization?

Phase 4: Proof Translation

4.1 Start with sorry

theorem erdos_problem_42 : <statement> := by
  sorry

Verify the statement compiles before writing any proof.

4.2 Incremental Construction

Replace sorry one step at a time:

theorem erdos_problem_42 : <statement> := by
  intro n hn
  -- Step 1: establish base property
  have h1 : <intermediate> := by
    sorry
  -- Step 2: apply key lemma
  sorry

Build after each step to catch type errors early.

4.3 Common Tactic Patterns

See LEAN_PATTERNS.md for the full tactic cheat sheet. Quick reference:

Proof StepLean Tactic
Case splitrcases, obtain, by_cases
Inductioninduction n with, strong_induction_on
Rewritingrw, simp, ring
Contradictionby_contra, absurd, omega
Existentialuse, exact ⟨_, _⟩, refine ⟨?_, ?_⟩
Finishingexact?, apply?, omega, norm_num, decide

Phase 5: Iteration

5.1 Build-Check Loop

cd lean
lake build   # Check for errors

Common issues and fixes:

ErrorFix
unknown identifierMissing import — check import Mathlib.X.Y
type mismatchWrong type — check #check on terms
unsolved goalsProof incomplete — add more tactics
declaration uses sorryStill has sorry — keep going
failed to synthesizeMissing instance — add [inst : Class]

5.2 Debugging with #check and #print

#check Nat.Prime           -- See the type
#print Nat.Prime           -- See the definition
#check @Finset.sum         -- See implicit arguments
example : 2 + 2 = 4 := by norm_num  -- Quick sanity check

5.3 When to Bail Out

Stop formalizing if:

  • The proof relies on results not in mathlib (need custom theory development)
  • Type-level encoding of the problem is unclear or disputed
  • More than 3 build-fix cycles without progress on the core argument
  • The informal proof has a gap that becomes apparent during formalization

Document what you achieved and what blocked progress.

Phase 6: Verification & Submission

6.1 Local Verification

cd lean
lake build   # Must pass with NO sorry and NO warnings

Check that:

  • lake build succeeds
  • No sorry in the proof
  • No axiom declarations (unless justified)
  • Imports are minimal

6.2 Contribute Upstream

If the proof is for a problem in formal-conjectures:

# Fork and clone
git clone https://github.com/{you}/formal-conjectures
cd formal-conjectures
lake update && lake build  # Verify clean build

# Add proof file
# Edit FormalConjectures/ErdosProblems/{file}.lean

# Test
lake build

# Submit PR

6.3 Update Local Documentation

After successful formalization:

  1. Add the proof file to lean/ErdosProofs/P{number}.lean
  2. Add the import to lean/ErdosProofs.lean
  3. Note the formalization in your solution document
### Lean Formalization
- File: `lean/ErdosProofs/P{number}.lean`
- Status: Compiles with `lake build`, no `sorry`
- Upstream PR: [link if submitted]

Checklist Summary

□ Phase 1: Assessment
  □ Checked lean_url via SDK
  □ Categorized: specific file / generic link / none
  □ Decision: formalize or skip

□ Phase 2: Environment
  □ lean/ project builds (lake build)

□ Phase 3: Statement
  □ Read existing formal statement (if any)
  □ Type mappings verified
  □ Statement matches informal problem

□ Phase 4: Proof
  □ sorry-stub compiles
  □ Incremental construction
  □ All sorry removed

□ Phase 5: Iteration
  □ lake build passes
  □ Type errors resolved

□ Phase 6: Verification
  □ Clean build, no sorry, no axioms
  □ Upstream PR (if applicable)
  □ Local docs updated