NOW LET US – AI RAG SaaS Studio TP.HCM

Automatic Textbook Formalization


RepoProver is a multi-agent scaffold that orchestrates LLM agents to formalize mathematics textbooks into Lean code, featuring a collaborative workflow of sketchers, provers, and reviewers.

RepoProver is a multi-agent scaffold for large-scale formalization of mathematics textbooks in Lean. It orchestrates multiple LLM agents that collaborate on a shared git repository with the Lean project: sketcher agents translate definitions and theorem statements, prover agents fill in proofs, and reviewer agents enforce quality via pull request reviews. Coordination happens through a lightweight file-system-based issue tracker and a merge queue that ensures the main branch always builds.

This code produced an automatic formalization of the graduate textbook Algebraic Combinatorics by Darij Grinberg.

Requires Python 3.10+. Install in editable mode:

pip install -e .

RepoProver operates on a Lean project repository. Before running, you need to set up:

- Create a Lean project with Mathlib and build it:

  lake init MyProject math
  lake update
  lake build

- Add LaTeX source files under a tex/ directory inside the project, organized by topic:

  MyProject/
  ├── lakefile.lean
  ├── lean-toolchain
  ├── lake-manifest.json
  ├── MyProject.lean          # root import file
  ├── MyProject/
  │   └── tex/                # LaTeX source chapters
  │       ├── all.tex         # full textbook source (optional)
  │       ├── Topic1/
  │       │   ├── Chapter1.tex
  │       │   └── Chapter2.tex
  │       └── Topic2/
  │           └── ...
  ├── manifest.json           # chapter manifest (see below)
  ├── CONTENTS.md             # structure documentation (see below)
  └── issues/                 # issue tracker (see below)

The tex files should be split by chapter/section so that each can be assigned to a sketcher agent independently. An all.tex with the full source can be included for reference. Note that the tex files are read-only: agents can read them but never modify the source material.

- Create a CONTENTS.md at the project root documenting the structure of the tex sources and the corresponding Lean files. The coordinator generates an initial version from the manifest, and agents update it as the Lean codebase evolves. It serves as the central reference for project structure, proof status, and architecture notes.

- Create a manifest.json at the project root listing the chapters to formalize and their target theorems/definitions. Each chapter entry has:

  id: unique identifier for the chapter
  title: human-readable chapter title
  source_path: path to the LaTeX source file (relative to project root)
  target_theorems: list of theorem/definition IDs to formalize from this chapter

See configs/example_manifest.json for a full example from the algebraic combinatorics case study.

- Create an empty issues/ directory at the project root. Agents use this as a lightweight file-system-based issue tracker: they create short YAML files here to flag blockers, request refactorings, or coordinate work across chapters.

- Initialize git (with branch name main) in the project if not already done. RepoProver uses git for version control, branching, and merging.
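The setup steps above can be checked before a run with a small pre-flight script. The following is only a sketch and is not part of RepoProver: the function names `check_layout` and `check_chapter_entries` are hypothetical, and because the top-level layout of manifest.json is not described here, the second function takes an already-extracted list of chapter entries (see configs/example_manifest.json for the authoritative format).

```python
from pathlib import Path

# Field names taken from the manifest description above.
REQUIRED_FIELDS = ("id", "title", "source_path", "target_theorems")


def check_layout(project_root):
    """Return the expected top-level items that are missing from the project."""
    root = Path(project_root)
    expected = ["lakefile.lean", "manifest.json", "CONTENTS.md", "issues", ".git"]
    return [name for name in expected if not (root / name).exists()]


def check_chapter_entries(entries, project_root):
    """Return a list of human-readable problems found in the chapter entries.

    `entries` is assumed to be a list of dicts, one per chapter; how they are
    nested inside manifest.json is left to the caller.
    """
    root = Path(project_root)
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        label = entry.get("id", f"entry #{index}")
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"{label}: missing field '{field}'")
        if "id" in entry:
            if entry["id"] in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(entry["id"])
        source = entry.get("source_path")
        # source_path is documented as relative to the project root.
        if source is not None and not (root / source).is_file():
            problems.append(f"{label}: source_path '{source}' not found")
        if not isinstance(entry.get("target_theorems", []), list):
            problems.append(f"{label}: target_theorems must be a list")
    return problems
```

An empty result from both functions means the documented files are in place and every chapter entry carries the four documented fields; it says nothing about whether the Lean project itself builds.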

python -m repoprover run /path/to/lean/project --pool-size 10

This starts the main coordinator loop, which launches sketcher, prover, maintainer, and reviewer agents, manages the merge queue, and tracks progress. The project state is saved in .repoprover/ inside the Lean project directory.

Use --clean to start from scratch and --verbose for debug logging.
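The merge queue is described as what keeps the main branch building at all times. As a rough illustration of that idea (not RepoProver's actual implementation), the sketch below models main as a list of change names and accepts a queued change only if a trial merge passes a build check. The function `drain_merge_queue` and the `builds` callback are hypothetical; in a real Lean project the check would be something like running `lake build` on the merged tree.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain_merge_queue(
    main: List[str],
    queue: Deque[Tuple[str, List[str]]],
    builds: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Merge queued changes one at a time, keeping only those that build.

    `main` stands in for the main branch, `queue` holds (pr_name, changes)
    pairs in arrival order, and `builds` is a hypothetical build check.
    """
    rejected = []
    while queue:
        pr_name, changes = queue.popleft()
        candidate = main + changes      # trial merge on top of current main
        if builds(candidate):
            main = candidate            # main only advances on a passing build
        else:
            rejected.append(pr_name)    # main is left untouched
    return main, rejected


if __name__ == "__main__":
    def toy_build(state: List[str]) -> bool:
        return "broken_proof" not in state

    pending = deque([
        ("pr1", ["lemma_a"]),
        ("pr2", ["broken_proof"]),
        ("pr3", ["lemma_b"]),
    ])
    final_main, rejected = drain_merge_queue(["base"], pending, toy_build)
    print(final_main)  # ['base', 'lemma_a', 'lemma_b']
    print(rejected)    # ['pr2']
```

Processing the queue strictly in sequence is what makes the guarantee hold: each change is tested against the main it will actually land on, not against a stale snapshot.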

For distributed runs across multiple machines, use the stool launcher:

python -m repoprover.stool --name myrun --project /path/to/lean/project

The stool launcher snapshots the repoprover code to a dump directory, symlinks the Lean project (avoiding slow copies of .lake/ and .git/), and submits a SLURM job. Rank 0 runs the coordinator in a background thread; all ranks (including rank 0) run as workers that pull tasks from the coordinator.

Options:

--launcher bash: run directly if already inside an salloc session
--pool-size N: number of Lean REPL instances per node (default: 10)
--nodes N: number of SLURM nodes (default: 1)
--agents-per-target N: max parallel agents per theorem/issue (default: 1)
--prs-to-issues: convert pending PRs to issues when resuming a run
--clean: wipe state and restart from scratch
--dirs-exists-ok: reuse an existing dump directory

See configs/example.yaml for an example configuration.
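The rank layout of a distributed run (a coordinator in a background thread, with every rank pulling tasks from it) can be pictured with a single-process toy model. This is only a sketch under simplifying assumptions: the real ranks are separate SLURM processes and how they talk to the coordinator is not described here, so an in-memory `queue.Queue` stands in for that channel, and `run_ranks` is a hypothetical name.

```python
import queue
import threading


def run_ranks(targets, num_ranks):
    """Toy model: one coordinator thread hands tasks to `num_ranks` workers."""
    tasks = queue.Queue()
    results = queue.Queue()

    def coordinator():
        # Hand out one task per target, then one stop marker per worker.
        for target in targets:
            tasks.put(target)
        for _ in range(num_ranks):
            tasks.put(None)

    def worker(rank):
        # Each rank pulls tasks until it receives a stop marker.
        while True:
            task = tasks.get()
            if task is None:
                break
            results.put((rank, task))

    # The coordinator runs in the background while the workers run,
    # mirroring rank 0 hosting the coordinator and also acting as a worker.
    coord = threading.Thread(target=coordinator, daemon=True)
    coord.start()
    workers = [
        threading.Thread(target=worker, args=(rank,)) for rank in range(num_ranks)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    coord.join()

    done = []
    while not results.empty():
        done.append(results.get())
    return done
```

Which rank handles which task varies between runs; the pull model only guarantees that each task is handled exactly once.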

# Token usage breakdown by agent type and outcome
python scripts/count_tokens.py /path/to/lean/project
# Agent efficiency plots over time
python scripts/plot_agent_efficiency.py /path/to/lean/project --out ./plots

A toy project is included under examples/toy_project/ for quick testing. The setup script copies the files to a working directory, initializes git, fetches Mathlib, and builds the project:

bash examples/toy_project/setup.sh /tmp/repoprover-toy-test

Then run repoprover on it:

source .venv/bin/activate
python -m repoprover run /tmp/repoprover-toy-test --pool-size 2 --verbose

The toy project has one chapter with 4 trivial targets (a definition and 3 theorems about doubling natural numbers).
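To give a sense of the scale involved, targets of this kind might look roughly like the Lean 4 sketch below. The names and statements are hypothetical and are not taken from examples/toy_project/; they only illustrate what one definition plus three theorems about doubling could be.

```lean
-- Hypothetical illustration; not the actual toy project targets.
namespace ToySketch

def double (n : Nat) : Nat := n + n

theorem double_zero : double 0 = 0 := rfl

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega

theorem double_add (m n : Nat) : double (m + n) = double m + double n := by
  unfold double
  omega

end ToySketch
```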

To inspect agent trajectories from a run:

python -m repoprover.viewer --dir /path/to/lean/project/runs --port 8080

This project is licensed under the terms in LICENSE.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
