NOW LET US – AI RAG SaaS Studio TP.HCM

The Agentic Garden of Forking Paths


A new study finds that AI agents can reach divergent, often opposing, scientific conclusions from the same dataset and question simply by being assigned different personas. To address this challenge to scientific credibility, the researchers propose 'Agentic Bootstrap', which uses AI agents to sample plausible analysis paths and estimate how extreme a reported result is within that distribution.


Abstract: Empirical research rarely admits a unique analysis. Different analytical choices can lead to different conclusions from the same data, yet these hidden forking paths are difficult to observe. We show that AI agents capture much of the analytical variation among human researchers while making these paths explicit. Across four high-stakes domains, assigning different personas is sufficient for AI agents to report divergent, often opposing, conclusions from the same data and question, with findings systematically aligned with those beliefs. In a study in which 42 human research teams analyzed the same immigration dataset, AI agents reproduced 72% of the human ideological gap in reported effect estimates. Despite reaching opposing conclusions, it is difficult to identify clear issues in each analysis based on the final AI reports: 86% passed independent AI review and 78% passed majority human expert review. These findings suggest that the central challenge is often not flawed analyses, but selective exploration and reporting from a large space of methodologically defensible analyses. AI agents may amplify this longstanding problem by making such exploration inexpensive and scalable. To address this, we introduce the m-value (multiverse value), the probability that an analysis path would produce a claim at least as extreme as the reported one. We further introduce Agentic Bootstrap, which estimates the m-value by using AI agents to sample plausible analysis paths. Applied to the human immigration study, 13.5% of reported human analyses fell in the most extreme 5% of the analysis space (m<0.05). Scientific evidence should therefore be evaluated not only by a single reported analysis but also by its position within the distribution of analyses that could reasonably have been reported. Agentic Bootstrap makes this distribution observable and turns it into a criterion for scientific credibility.
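
To make the m-value concrete, here is a minimal sketch of how it could be estimated once a set of analysis paths has been sampled. It is not the authors' implementation. The function name `m_value`, the `center` parameter, the two-sided notion of "extreme" (absolute distance from a null effect of 0), and all numbers in `sampled_estimates` are assumptions made for illustration; in Agentic Bootstrap the sampled estimates would come from AI agents running plausible analyses, which this sketch replaces with a hard-coded list.

```python
from typing import Sequence


def m_value(reported: float, sampled: Sequence[float], center: float = 0.0) -> float:
    """Share of sampled analysis-path estimates at least as extreme as `reported`.

    "Extreme" is measured as absolute distance from `center`. This two-sided
    convention is an assumption of this sketch, not taken from the paper.
    """
    if not sampled:
        raise ValueError("need at least one sampled estimate")
    threshold = abs(reported - center)
    hits = sum(1 for s in sampled if abs(s - center) >= threshold)
    return hits / len(sampled)


# Hypothetical effect estimates from 20 sampled analysis paths (made-up data).
sampled_estimates = [
    -0.12, -0.08, -0.05, -0.03, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04,
    0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.31,
]

# A reported estimate of 0.30 is matched or exceeded by 1 of 20 paths.
print(m_value(0.30, sampled_estimates))  # 0.05

# A reported estimate of 0.10 is matched or exceeded by 7 of 20 paths.
print(m_value(0.10, sampled_estimates))  # 0.35
```

Under these assumptions, a small m-value means that few of the sampled, defensible analyses would have produced a claim as strong as the reported one. The quality of the estimate depends on how well the sampled paths cover the space of plausible analyses.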


© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

