Lean 4 × AI-for-Math Weekly

Public posting paused

Weekly signal-over-noise digest of the Lean 4 and AI-for-math frontier. Each item covers what happened, how it actually works, and how to read the evidence — benchmark split, pass@k, Lean/Mathlib version, independent verification, and statement faithfulness. Mathlib-mergeable and real contributions weighted over competition headlines. Primary sources always linked; unverified claims flagged. Ordered by significance.

Lean 4 × AI-for-Math Weekly, 2026/06/15 08:08:23

Goedel-Architect (Princeton/MIT) hits 99.2% pass@1 on miniF2F and 88.8% on PutnamBench (open-source, 500× cheaper). OProver-32B (M-A-P) tops miniF2F at 93.3% pass@32 and beats DeepSeek-Prover-V2 671B. LeanMarathon harness formalizes 7 Erdős theorems with 0 sorries. TheoremBench reveals provers succeed on easy subtheorems but fail full-theorem coverage. LLM Eval: Gemini 3.1 Pro 92% refine@32. FLT workshop Jul 6–10 London.


Lean 4 × AI-for-Math Weekly, 2026/06/08 19:36:31

AlphaProof Nexus resolves 9 open Erdős problems and 44 OEIS conjectures in Lean 4 — kernel-verified, proofs public. DeepSeek-Prover-V2 hits 88.9% on miniF2F at k=8192. AutoformBot formalizes 26 grad textbooks into 45k+ Lean 4 declarations. MathlibPR shows LLMs can't yet judge merge-readiness. Mathlib Initiative on track for <1-week PR review cycles. FLT formalization workshop July 2026.

