Automatic Textbook Formalization
Repository: https://github.com/facebookresearch/repoprover
Code for Automatic Textbook Formalization (Gloeckle, Rammal, Arnal, Munos, Cabannes, Synnaeve, Hayat, 2026).
RepoProver is a multi-agent scaffold for large-scale formalization of mathematics textbooks in Lean. It orchestrates multiple LLM agents that collaborate on a shared git repository with the Lean project: sketcher agents translate definitions and theorem statements, prover agents fill in proofs, and reviewer agents enforce quality via pull request reviews. Coordination happens through a lightweight file-system-based issue tracker and a merge queue that ensures the main branch always builds.
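The coordination pattern described above (agents proposing branches, a merge queue gating what lands on main) can be illustrated with a minimal Python sketch. The class names, fields, and build check below are hypothetical stand-ins, not RepoProver's actual API:

```python
# Minimal sketch of a merge queue that keeps `main` always green.
# All names here are hypothetical; the real RepoProver implementation differs.
from dataclasses import dataclass, field

@dataclass
class Branch:
    name: str
    builds: bool  # stand-in for `lake build` succeeding after rebase on main

@dataclass
class MergeQueue:
    main: list[str] = field(default_factory=list)  # branches merged so far

    def process(self, candidates: list[Branch]) -> list[str]:
        """Merge candidates one at a time, skipping any that fail to build."""
        rejected = []
        for branch in candidates:
            if branch.builds:  # in reality: rebase on main, run the Lean build
                self.main.append(branch.name)
            else:
                rejected.append(branch.name)  # e.g. handed back to a prover agent
        return rejected

queue = MergeQueue()
failed = queue.process([Branch("sketch/ch1", True), Branch("proof/thm2", False)])
print(queue.main)   # only branches that build reach main
print(failed)       # branches needing rework
```

The invariant this models is the one stated above: a candidate branch only reaches main if the project still builds with it, so main never breaks.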
This code produced an automatic formalization of the graduate textbook Algebraic Combinatorics by Darij Grinberg.
Setup
Requires Python 3.10+. Install in editable mode:
```bash
pip install -e .
```
Preparing a formalization project
RepoProver operates on a Lean project repository. Before running, set up the following:
- Create a Lean project with Mathlib and build it:
```bash
lake init MyProject math
lake update
lake build
```
- Add LaTeX source files under a tex/ directory inside the project, organized by topic:
```
MyProject/
├── lakefile.lean
├── lean-toolchain
├── lake-manifest.json
├── MyProject.lean        # root import file
├── MyProject/
│   └── tex/              # LaTeX source chapters
│       ├── all.tex       # full textbook source (optional)
│       ├── Topic1/
│       │   ├── Chapter1.tex
│       │   └── Chapter2.tex
│       └── Topic2/
│           └── ...
├── manifest.json         # chapter manifest (see below)
├── CONTENTS.md           # structure documentation (see below)
└── issues/               # issue tracker (see below)
```
The tex files should be split by chapter/section so each can be assigned to a sketcher agent independently. An all.tex with the full source can be included for reference. Note that tex files are read-only — agents can read them but never modify source material.
- Create a CONTENTS.md at the project root documenting the structure of tex sources and corresponding Lean files. The coordinator generates an initial version from the manifest, and agents update it as the Lean codebase evolves. It serves as the central reference for project structure, proof status and architecture notes.
- Create a manifest.json at the project root listing the chapters to formalize and their target theorems/definitions. Each chapter entry has:
  - `id`: unique identifier for the chapter
  - `title`: human-readable chapter title
  - `source_path`: path to the LaTeX source file (relative to project root)
  - `target_theorems`: list of theorem/definition IDs to formalize from this chapter

  See configs/example_manifest.json for a full example from the algebraic combinatorics case study.
- Create an empty issues/ directory at the project root. Agents use this as a lightweight file-system-based issue tracker: they create short YAML files here to flag blockers, request refactorings, or coordinate work across chapters.
- Initialize git (with branch name main) in the project if not already done; RepoProver uses git for version control, branching and merging.
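As a concrete illustration of the manifest schema described above, the snippet below builds a small manifest in Python. The chapter ID, title, and theorem IDs are invented examples, not taken from configs/example_manifest.json:

```python
import json

# Hypothetical manifest with a single chapter. Field names follow the
# schema described above; the concrete values are invented for illustration.
manifest = {
    "chapters": [
        {
            "id": "ch1-basics",
            "title": "Chapter 1: Basics",
            "source_path": "MyProject/tex/Topic1/Chapter1.tex",
            "target_theorems": ["def:double", "thm:double_add"],
        }
    ]
}

text = json.dumps(manifest, indent=2)
print(text)  # contents for manifest.json at the project root
```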
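The file-system issue tracker can be sketched the same way: each issue is a short YAML file dropped into issues/. The field names used below (title, kind, blocked_target, status) are a hypothetical schema for illustration, not the actual format:

```python
import os
import tempfile
import uuid

def file_issue(issues_dir: str, title: str, kind: str, target: str) -> str:
    """Write a small YAML issue file and return its path.

    The schema here is hypothetical; the point is that an issue is just a
    file, so any agent with access to the repository can create or read one.
    """
    os.makedirs(issues_dir, exist_ok=True)
    path = os.path.join(issues_dir, f"{uuid.uuid4().hex[:8]}.yaml")
    with open(path, "w") as f:
        f.write(f"title: {title}\n")
        f.write(f"kind: {kind}\n")
        f.write(f"blocked_target: {target}\n")
        f.write("status: open\n")
    return path

# Demo in a throwaway directory.
issues_dir = os.path.join(tempfile.mkdtemp(), "issues")
path = file_issue(issues_dir, "Missing auxiliary lemma", "blocker", "thm:double_add")
print(open(path).read())
```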
Usage
Running the coordinator
```bash
python -m repoprover run /path/to/lean/project --pool-size 10
```
This starts the main coordinator loop which launches sketcher, prover, maintainer and reviewer agents, manages the merge queue and tracks progress. The project state is saved in .repoprover/ inside the Lean project directory.
Use --clean to start from scratch, --verbose for debug logging.
Multi-node (SLURM)
For distributed runs across multiple machines, use the stool launcher:
```bash
python -m repoprover.stool --name myrun --project /path/to/lean/project
```
The stool launcher snapshots the repoprover code to a dump directory, symlinks the Lean project (avoiding slow copies of .lake/ and .git/) and submits a SLURM job. Rank 0 runs the coordinator in a background thread; all ranks (including rank 0) run as workers that pull tasks from the coordinator.
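The snapshot-and-symlink step can be sketched as follows. The function name and directory layout are hypothetical, shown only to illustrate why symlinking the project avoids copying heavy directories like .lake/ and .git/:

```python
import os
import shutil
import tempfile

def snapshot_run(code_dir: str, project_dir: str, dump_dir: str) -> None:
    """Snapshot the tool's code into dump_dir and symlink the Lean project.

    Symlinking (rather than copying) the project means heavy directories
    such as .lake/ and .git/ are never duplicated. This is a hypothetical
    sketch of what a launcher like stool does, not the actual implementation.
    """
    os.makedirs(dump_dir, exist_ok=True)
    shutil.copytree(code_dir, os.path.join(dump_dir, "repoprover"))
    os.symlink(project_dir, os.path.join(dump_dir, "project"))

# Demo with throwaway directories.
base = tempfile.mkdtemp()
code = os.path.join(base, "code")
os.makedirs(code)
with open(os.path.join(code, "main.py"), "w") as f:
    f.write("print('hello')\n")
proj = os.path.join(base, "proj")
os.makedirs(os.path.join(proj, ".lake"))  # heavy dir that is never copied
dump = os.path.join(base, "dump")
snapshot_run(code, proj, dump)
print(os.path.islink(os.path.join(dump, "project")))  # True on POSIX
```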
Options:
- `--launcher bash`: run directly if already inside an `salloc` session
- `--pool-size N`: number of Lean REPL instances per node (default: 10)
- `--nodes N`: number of SLURM nodes (default: 1)
- `--agents-per-target N`: max parallel agents per theorem/issue (default: 1)
- `--prs-to-issues`: convert pending PRs to issues when resuming a run
- `--clean`: wipe state and restart from scratch
- `--dirs-exists-ok`: reuse an existing dump directory
See configs/example.yaml for an example configuration.
Analysis scripts
```bash
# Token usage breakdown by agent type and outcome
python scripts/count_tokens.py /path/to/lean/project

# Agent efficiency plots over time
python scripts/plot_agent_efficiency.py /path/to/lean/project --out ./plots
```
Smoke test
A toy project is included under examples/toy_project/ for quick testing. The setup script copies the files to a working directory, initializes git, fetches Mathlib and builds the project:
```bash
bash examples/toy_project/setup.sh /tmp/repoprover-toy-test
```
Then run repoprover on it:
```bash
source .venv/bin/activate
python -m repoprover run /tmp/repoprover-toy-test --pool-size 2 --verbose
```

The toy project has one chapter with 4 trivial targets (a definition and 3 theorems about doubling natural numbers).
Trajectory viewer
To inspect agent trajectories from a run:
```bash
python -m repoprover.viewer --dir /path/to/lean/project/runs --port 8080
```
License
This project is licensed under the terms in LICENSE.