Self-Supervised Theorem Discovery in a Formal Axiomatic System

Researchers have developed a self-supervised algorithm that lets an AI agent autonomously build a library of mathematical theorems, starting only from axioms and inference rules. The agent discovers tens of thousands of theorems and proves human-written benchmark problems. When its discovered theorems are supplied as prompt lemmas, they also improve the proof performance of large language models (LLMs).



Abstract: Recent artificial intelligence (AI) systems have shown remarkable progress in mathematical reasoning. Many existing approaches, including large language models (LLMs), draw on human prior knowledge in the form of mathematical text, code, or theorem libraries. Although these approaches are highly effective in practice, it remains an open question whether an agent can autonomously discover useful theorems without such human priors. We study this question in a formal axiomatic system by developing an agent that starts from axioms and inference rules alone and gradually grows a library of useful theorems. Concretely, we propose a self-supervised theorem-discovery algorithm that alternates between proof search and useful-theorem extraction, building a theorem library whose entries are reused as lemmas for subsequent proof search. Experiments show that the agent discovers tens of thousands of theorems and finds proofs for human-written benchmark problems, suggesting that its discoveries include theorems meaningful from a human mathematical perspective. Furthermore, the discovered theorems improve LLM proof performance when provided as prompt lemmas, indicating that they can serve as external knowledge for LLM reasoning. Our results provide evidence that useful theorems can emerge from proof search without relying on human-provided theorem libraries. More broadly, they suggest a path toward self-evolving AI systems for mathematics whose discoveries remain formally verifiable.
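The abstract describes the discovery loop only at a high level: alternate proof search with useful-theorem extraction, and reuse extracted theorems as lemmas in the next round of search. The Python sketch below illustrates that alternation under loudly stated assumptions. The "formal axiomatic system" is Hofstadter's MIU string-rewriting puzzle, not the system used in the paper. Proof search is a bounded breadth-first search. "Usefulness" is a hypothetical heuristic that keeps the k theorems with the longest proofs. All function names (successors, proof_search, extract_useful, discover) and parameters (rounds, depth, k, max_len) are illustrative and do not come from the paper.

```python
from collections import deque

# HYPOTHETICAL toy system (not the paper's): Hofstadter's MIU string-rewriting
# puzzle. Theorems are strings over {M, I, U}; "MI" is the only axiom.
AXIOMS = {"MI"}


def successors(s: str) -> set[str]:
    """Apply each MIU inference rule once, at every position where it fits."""
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                          # Rule 1: xI -> xIU
    if s.startswith("M"):
        out.add(s + s[1:])                        # Rule 2: Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])      # Rule 3: III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])            # Rule 4: UU -> (deleted)
    return out


def proof_search(library: dict[str, int], depth: int, max_len: int) -> dict[str, int]:
    """Bounded breadth-first search starting from every library entry.

    Axioms and previously extracted lemmas are both used as starting premises.
    Returns theorems not yet in the library, each mapped to the length of the
    shortest proof found from the axioms (the starting lemma's own proof
    length plus the steps taken in this search).
    """
    found: dict[str, int] = {}
    for start, base_cost in library.items():
        frontier = deque([(start, 0)])
        seen = {start}
        while frontier:
            s, d = frontier.popleft()
            if d == depth:
                continue  # search budget exhausted along this branch
            for t in successors(s):
                if len(t) > max_len or t in seen:
                    continue  # prune over-long strings and revisits
                seen.add(t)
                frontier.append((t, d + 1))
                if t not in library:
                    cost = base_cost + d + 1
                    found[t] = min(cost, found.get(t, cost))
    return found


def extract_useful(found: dict[str, int], k: int) -> dict[str, int]:
    """HYPOTHETICAL usefulness heuristic: keep the k theorems with the longest
    proofs, since reusing them as lemmas saves the most steps later."""
    ranked = sorted(found.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def discover(rounds: int = 4, depth: int = 3, k: int = 5,
             max_len: int = 12) -> dict[str, int]:
    """Alternate proof search and extraction, growing a theorem library."""
    library = {a: 0 for a in AXIOMS}  # theorem -> proof length from axioms
    for _ in range(rounds):
        found = proof_search(library, depth, max_len)
        if not found:
            break
        library.update(extract_useful(found, k))  # lemmas reused next round
    return library


if __name__ == "__main__":
    lib = discover()
    for theorem, length in sorted(lib.items(), key=lambda kv: kv[1]):
        print(f"{length:3d}  {theorem}")
```

Run as a script, the sketch prints the library ordered by proof length. Because lemmas become new starting points, each round reaches theorems whose full proofs exceed the per-round search depth, which is the effect the abstract attributes to lemma reuse. The abstract does not say how usefulness is scored, so the longest-proof rule is only a placeholder. A reuse-frequency score would be an equally plausible substitute. A handy sanity check is the MIU invariant: the number of I's is never a multiple of 3, so "MU" never enters the library.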

Source: arXiv cs.AI Recent

More in this category

agentic-systems

GPTNT: Benchmarking Real-Time Collaboration Between Multimodal Agents on Keep Talking And Nobody Explodes

Researchers introduce GPTNT, a new benchmark based on the game 'Keep Talking and Nobody Explodes' to evaluate real-time collaboration between multimodal AI agents. The study reveals that current state-of-the-art models fail to defuse a single bomb in real time, highlighting key weaknesses in AI coordination.

agentic-systems

The Two Genie Game: Adoption and Welfare in Audit-Grounded AI Governance

A game-theoretic study analyzes when harm-minimizing AI agents can displace approval-seeking RLHF agents in competitive markets, revealing that self-audited AI is not a silver bullet for preventing community harm.

agentic-systems

Recursive Self-Evolving Agents via Held-Out Selection

Researchers introduce RSEA, a Recursive Self-Evolving Agent that optimizes itself by rewriting natural-language artifacts without weight updates. By employing a strict held-out selection gate, RSEA ensures monotone-safe evolution, preventing the performance collapse common in unguarded self-improvement methods.

agentic-systems

Mechanistic Personality Analysis of LLMs: Steering Personality via Latent Feature Interventions

Researchers have proposed a new mechanistic interpretability approach that directly intervenes in the latent features of LLMs to steer their OCEAN personality traits. This method allows precise control over AI behavior without compromising the model's language processing performance.

agentic-systems

Primary ICD Category Prediction using LLM-based Probing

Researchers evaluated whether frozen medical large language model (LLM) representations can serve as a shared embedding space for multimodal primary diagnosis category prediction, achieving high accuracy and efficient cross-dataset transfer.

agentic-systems

An AI agent for treatment reasoning over a biomedical tool universe

Researchers have introduced ATHENA-R1, a breakthrough AI agent capable of advanced treatment reasoning across all FDA-approved drugs since 1939. Trained via reinforcement learning over 212 biomedical tools, it outperforms leading models like GPT-5 in complex medical decision-making.