NOW LET US – AI RAG SaaS Studio TP.HCM

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

Researchers have introduced LeanMarathon, a multi-agent harness for long-horizon mathematical autoformalization in Lean. Using an evolving blueprint and a two-stage orchestrator, the system formalized all seven target theorems across three autonomous runs, leaving no sorry in the final proofs.

Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at this https URL.
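
To make the "leaves upward in parallel rounds" idea concrete, here is a minimal Python sketch of leaf-first scheduling over a proof DAG. It is an illustration only, not LeanMarathon's orchestrator: the function name `discharge_rounds`, the toy `blueprint` dictionary, and the lemma names are all hypothetical, and the sketch assumes a static graph, whereas the abstract describes dynamic leaves that can change as the blueprint is repaired. CI gating and the agents themselves are not modeled.

```python
from typing import Dict, List, Set


def discharge_rounds(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proof-DAG nodes into rounds, leaves first.

    `deps` maps each statement to the statements it depends on.
    Every round contains the nodes whose dependencies are all proved,
    so the members of one round could be attempted in parallel.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    proved: Set[str] = set()
    rounds: List[List[str]] = []
    while remaining:
        # Current leaves: nodes with no unproved dependency left.
        leaves = sorted(n for n, d in remaining.items() if d <= proved)
        if not leaves:
            # Nothing is ready: the graph has a cycle or a missing node.
            raise ValueError("cycle or missing dependency in proof graph")
        rounds.append(leaves)
        proved.update(leaves)
        for n in leaves:
            del remaining[n]
    return rounds


# Hypothetical toy blueprint: one theorem resting on three lemmas.
blueprint = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"lemma_c"},
    "lemma_b": {"lemma_c"},
    "lemma_c": set(),
}
rounds = discharge_rounds(blueprint)
print(rounds)
# [['lemma_c'], ['lemma_a', 'lemma_b'], ['main_theorem']]
```

In this toy graph `lemma_a` and `lemma_b` land in the same round because neither depends on the other, which is the property that lets a single long run be split into smaller, independently recoverable units of work.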

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent
