WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

Researchers introduce WorldLines, a new benchmark for long-horizon embodied household assistance, along with ObsMem, a memory framework designed to help agents maintain state-aware decisions in dynamic environments.

Computer Science > Artificial Intelligence

Abstract: To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, and object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
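
To make the notions of partial observability and overwritten world states concrete, the following is a minimal illustrative sketch and not the ObsMem implementation, which the abstract does not specify. It assumes a trail of state changes in which each entry records whether the agent observed it; the names `StateChange`, `StateTrail`, `true_state`, `believed_state`, and `is_stale` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateChange:
    step: int      # position of the change in the household trace
    obj: str       # object or device name (hypothetical identifiers)
    state: str     # state of the object after the change
    visible: bool  # True if the agent observed the change


@dataclass
class StateTrail:
    """Hypothetical trail of state changes, appended in any order."""
    changes: List[StateChange] = field(default_factory=list)

    def record(self, step: int, obj: str, state: str, visible: bool) -> None:
        self.changes.append(StateChange(step, obj, state, visible))

    def _latest(self, obj: str, visible_only: bool) -> Optional[str]:
        # Pick the change with the highest step among the matching entries.
        best: Optional[StateChange] = None
        for change in self.changes:
            if change.obj != obj:
                continue
            if visible_only and not change.visible:
                continue
            if best is None or change.step >= best.step:
                best = change
        return None if best is None else best.state

    def true_state(self, obj: str) -> Optional[str]:
        # Ground-truth state: every change counts, observed or not.
        return self._latest(obj, visible_only=False)

    def believed_state(self, obj: str) -> Optional[str]:
        # Agent's belief: only changes the agent actually observed count.
        return self._latest(obj, visible_only=True)

    def is_stale(self, obj: str) -> bool:
        # The belief is stale when an unobserved change overwrote the state.
        return self.true_state(obj) != self.believed_state(obj)


trail = StateTrail()
trail.record(1, "kitchen_light", "on", visible=True)
trail.record(2, "kitchen_light", "off", visible=False)  # changed out of view
print(trail.believed_state("kitchen_light"), trail.true_state("kitchen_light"))
```

In this toy trace the agent still believes the light is on although it was switched off out of view, which is the kind of mismatch a visibility-aware memory would need to flag before planning.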

Source: arXiv cs.AI Recent
