NOW LET US – AI RAG SaaS Studio TP.HCM
AGENTIC-SYSTEMS · 1 min read

OPINE-World: Programmatic World Modeling with Ontology-error-Prioritized Interactive Exploration

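Researchers have introduced OPINE-World, an LLM agent that learns an object-centric programmatic world model online through interaction. Instead of training a data-hungry deep network, it writes the world model as code and steers exploration with "ontology error," a Bayesian measure of how adequately its hypothesized object types explain what it observes. On the ARC-AGI-3 benchmark it solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.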
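Neither the summary nor the abstract gives the exact form of ontology error beyond calling it a Bayesian measure of object-type adequacy. As a rough illustration under that reading only, the sketch below assumes each hypothesized object type keeps a count of transitions the current world model predicted correctly or incorrectly, treats the type's miss rate as a Beta-distributed unknown, and directs exploration at the type with the highest posterior mean miss rate. `TypeStats`, `ExplorationPrioritizer`, and the Beta(1, 1) prior are hypothetical choices, not OPINE-World's actual definition.

```python
from dataclasses import dataclass, field


@dataclass
class TypeStats:
    """Prediction outcomes for one hypothesized object type (hypothetical bookkeeping)."""
    hits: int = 0    # transitions where the current model predicted this type correctly
    misses: int = 0  # transitions where the current model mispredicted this type

    def ontology_error(self, alpha: float = 1.0, beta: float = 1.0) -> float:
        # Posterior mean of the miss rate: a Beta(alpha, beta) prior updated with the
        # observed misses and hits. An assumed stand-in, not the paper's formula.
        return (alpha + self.misses) / (alpha + beta + self.misses + self.hits)


@dataclass
class ExplorationPrioritizer:
    stats: dict[str, TypeStats] = field(default_factory=dict)  # type name -> outcomes

    def record(self, obj_type: str, predicted_correctly: bool) -> None:
        s = self.stats.setdefault(obj_type, TypeStats())
        if predicted_correctly:
            s.hits += 1
        else:
            s.misses += 1

    def next_target(self) -> str:
        # Steer exploration toward the object type the model currently explains worst.
        # Raises ValueError if no outcomes have been recorded yet.
        return max(self.stats, key=lambda t: self.stats[t].ontology_error())


if __name__ == "__main__":
    p = ExplorationPrioritizer()
    for _ in range(8):
        p.record("wall", True)  # walls always behave as the model predicts
    p.record("door", False)     # doors surprised the model once...
    p.record("door", True)      # ...and matched it once
    print(p.next_target())      # door (error 0.5 vs 0.1 for wall)
```

Using the posterior mean rather than the raw miss rate keeps a type observed only once from scoring an extreme 0 or 1; a real agent might prefer an upper credible bound so that rarely observed types are explored more eagerly.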

Abstract: Learning how an environment behaves from interaction is central to building agents that adapt to unfamiliar tasks. World models learned with deep networks are flexible but data-hungry and transfer poorly beyond their training distribution. Program-synthesized world models, written as source code by LLMs and refined through counterexample-guided inductive synthesis (CEGIS), are instead data-efficient and reusable, yet they have been demonstrated mainly on structured-state worlds with a given object vocabulary, and a single program search does not scale to pixel-rendered environments whose object structure must be hypothesized flexibly. We introduce OPINE-World, an LLM agent that learns an object-centric programmatic world model online from interaction. OPINE-World couples two cooperating agents in a loop of hypothesis and test, one acting in the environment and one synthesizing the model in code with replay verification and model-based planning, and it steers exploration with a Bayesian measure of object-type adequacy we call ontology error. We evaluate OPINE-World on ARC-AGI-3, a benchmark for skill-acquisition efficiency in which the object vocabulary, the goal, and the action semantics are withheld. OPINE-World solves 20 of 25 games without per-game training and reaches an action-efficiency score of 78.4 against the human baseline.
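The hypothesis-and-test loop in the abstract follows the CEGIS pattern: a candidate world model written as code is replayed against every recorded transition, and the first mismatch goes back to the synthesizer as a counterexample. The sketch below is a simplified illustration, not OPINE-World's implementation. It assumes states and actions are plain comparable values, that a model is a Python function `model(state, action)` returning the predicted next state, and it substitutes a hypothetical `propose` callback for the LLM that writes code in the paper.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

State = Any                            # hypothetical: any comparable state encoding
Transition = Tuple[State, str, State]  # (state, action, observed next state)
Model = Callable[[State, str], State]  # candidate world model written as code


def find_counterexample(model: Model, history: Sequence[Transition]) -> Optional[Transition]:
    """Replay every recorded transition and return the first one the model mispredicts."""
    for state, action, observed in history:
        try:
            predicted = model(state, action)
        except Exception:
            return (state, action, observed)  # a crashing program also fails verification
        if predicted != observed:
            return (state, action, observed)
    return None


def cegis(
    propose: Callable[[List[Transition]], Optional[Model]],
    history: Sequence[Transition],
    max_rounds: int = 10,
) -> Tuple[Optional[Model], List[Transition]]:
    """Alternate synthesis and replay verification until a model fits all history."""
    counterexamples: List[Transition] = []
    for _ in range(max_rounds):
        model = propose(counterexamples)  # stand-in for the LLM code synthesizer
        if model is None:
            break
        cex = find_counterexample(model, history)
        if cex is None:
            return model, counterexamples  # consistent with every observed transition
        counterexamples.append(cex)
    return None, counterexamples


if __name__ == "__main__":
    history = [(0, "right", 1), (1, "right", 2), (2, "left", 1)]

    def always_right(s: int, a: str) -> int:
        return s + 1

    def move(s: int, a: str) -> int:
        return s + 1 if a == "right" else s - 1

    # Toy proposer: the first guess ignores the action; after a counterexample, try `move`.
    model, cexs = cegis(lambda seen: move if seen else always_right, history)
    print(model.__name__, cexs)  # move [(2, 'left', 1)]
```

Replaying the full history, rather than only the newest transition, is what keeps an accepted program consistent with everything observed so far; the trade-off is that each verification pass grows linearly with the interaction log.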

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

More in this category

agentic-systems

Beyond Next-Token Prediction: An RLVR Proof of Concept for Tool-Use Agents on Atlassian Workflows

Researchers demonstrate how Reinforcement Learning with Verifiable Rewards (RLVR) can bridge the gap between next-token prediction and complex API execution, significantly boosting the performance of small language models on Atlassian workflows.

agentic-systems

EO-Agents: A Three-Agent LLM Pipeline for Earth Observation Hypothesis Generation

Researchers have developed EO-Agents, a three-agent LLM pipeline that uses NASA's knowledge graph to automate the generation of scientific hypotheses in Earth observation.

agentic-systems

Agent4cs: A Multi-agent System for Code Summarization in Large Hierarchical Codebases

Researchers have proposed Agent4cs, a multi-agent framework that summarizes large, hierarchical codebases from the bottom up. By leveraging specialized agents, Agent4cs outperforms traditional single-model baselines in semantic consistency and keyword coverage.

agentic-systems

Scaling Trends for Lie Detector Oversight in Preference Learning

A new study evaluates Scalable Oversight via Lie Detectors (SOLiD) on larger LLMs, showing that scaling reduces undetected deception to 14% and can eliminate the need for expensive human labelers during fine-tuning.

agentic-systems

The Agentic Garden of Forking Paths

A new study reveals that AI agents can produce divergent, opposing scientific conclusions from the same dataset simply by being assigned different personas. To address this challenge to scientific credibility, researchers propose 'Agentic Bootstrap' to map the entire distribution of possible analytical paths.

agentic-systems

World Feedback for Clinical Agents: Diagnosing RL in FHIR Environments

The paper diagnoses the challenges of applying Reinforcement Learning (RL) to clinical agents in FHIR environments, introducing MedAgentBench-v3 to address feedback flaws and proposing a hybrid SFT-RL approach.
