AGENTIC-SYSTEMS | June 12, 2026 | 1 min read | 16 views

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

Arbor is a new multi-agent framework that uses structured tree search as a cognition layer, enabling autonomous agents to achieve up to a 193% throughput-latency Pareto improvement in LLM inference over vendor-optimized baselines.

Abstract: Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with stateless evaluation. Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration, and expanding as prior successes shift the bottleneck distribution.
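The idea of a search tree of scored hypotheses can be illustrated with a small sketch. This is not Arbor's implementation, which the abstract does not describe in detail. The `Hypothesis` class, its fields, and the `select_next` rule (expand the open leaf under the best-scoring ancestor, and skip subtrees whose root failed) are all hypothetical names and choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Hypothesis:
    """One tree node: a candidate optimization plus what was measured (hypothetical)."""
    description: str
    score: Optional[float] = None        # measured gain; None until evaluated
    failure_note: Optional[str] = None   # diagnostic recorded when an attempt fails
    children: list[Hypothesis] = field(default_factory=list)

    def add_child(self, description: str) -> Hypothesis:
        child = Hypothesis(description)
        self.children.append(child)
        return child

    def record(self, score: Optional[float], failure_note: Optional[str] = None) -> None:
        """Store a measurement, or a failure diagnostic if the attempt did not work."""
        self.score = score
        self.failure_note = failure_note


def open_leaves(node: Hypothesis, parent_score: float = 0.0) -> Iterator[tuple[float, Hypothesis]]:
    """Yield (ancestor score, leaf) for unevaluated leaves, skipping failed subtrees."""
    if node.failure_note is not None:
        return  # a recorded failure prunes everything below this node
    if node.score is None and not node.children:
        yield parent_score, node
        return
    inherited = node.score if node.score is not None else parent_score
    for child in node.children:
        yield from open_leaves(child, inherited)


def select_next(root: Hypothesis) -> Optional[Hypothesis]:
    """Pick the open leaf whose nearest evaluated ancestor scored highest."""
    candidates = list(open_leaves(root))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Illustrative usage with made-up hypotheses and scores.
root = Hypothesis("baseline")
root.record(0.0)
fused = root.add_child("fuse attention kernels")
fused.record(12.0)
batch = root.add_child("larger batch size")
batch.record(None, failure_note="out of memory")
tile = fused.add_child("tune tile size")
bigger = batch.add_child("even larger batch size")
quant = root.add_child("quantize KV cache")
next_node = select_next(root)
```

In this toy run `select_next` returns the "tune tile size" node, because its parent has the best score so far, and it never proposes "even larger batch size", because the failure recorded on its parent closes that branch. A real system would presumably use richer scoring and use the failure notes to generate new hypotheses instead of only pruning.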

We validate Arbor on full-stack LLM inference optimization, a domain where achieving peak performance has historically required coordinated effort from engineering teams across the application, framework, compiler, kernel, and hardware stack. Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that safeguards stability through root-cause analysis, introspection, and measurement validation, forming a checks-and-balances architecture in which neither agent can unilaterally drive the system. Agent capabilities are decomposed into hard skills (domain expertise) and soft skills (coordination protocols that determine how contributions compose), enabling fully autonomous multi-day campaigns. Arbor achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours. Arbor generalizes to multiple generations of hardware platforms, and run-to-run variance is within 2 percentage points, demonstrating that the method is hardware-agnostic and reproducible.
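For readers unfamiliar with the term, a throughput-latency Pareto comparison keeps only the configurations that no other configuration beats on both axes at once. The sketch below shows that standard dominance test. The abstract does not say how the 193% figure is computed, so this is not the paper's metric; the `Config` type and all numbers are invented for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Config(NamedTuple):
    """A measured serving configuration (hypothetical fields and units)."""
    name: str
    throughput: float  # higher is better, e.g. tokens per second
    latency: float     # lower is better, e.g. milliseconds per token


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.throughput >= b.throughput and a.latency <= b.latency
    strictly_better = a.throughput > b.throughput or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up measurements, for illustration only.
measured = [
    Config("baseline", throughput=100.0, latency=50.0),
    Config("a", throughput=150.0, latency=50.0),
    Config("b", throughput=120.0, latency=30.0),
    Config("c", throughput=110.0, latency=60.0),
]
front = pareto_front(measured)
front_names = [c.name for c in front]
```

Here `a` and `b` survive: `a` has the highest throughput and `b` the lowest latency, while `baseline` and `c` are each dominated by `a`. An improvement to the frontier means the optimized system's surviving points sit beyond the baseline's on this trade-off curve.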

Source: arXiv cs.AI Recent

More in this category

agentic-systems

Teaching LLMs to Update Beliefs for Efficient Long-Horizon Interaction

As LLMs tackle longer tasks, retaining complete interaction histories becomes contextually and computationally expensive. The ABBEL framework addresses this by isolating summaries into natural-language 'belief states' and supervising them via belief grading, recovering accuracy while significantly cutting memory usage.

agentic-systems

Stochastic Sampling is Epistemically Shallow: The Dimensionality Gap Between Temperature Variation and Model Diversity in LLMs

A new study reveals that stochastic sampling via temperature variation in a single LLM only provides per-question uncertainty, failing to capture complex cross-question epistemic structures compared to a diverse ensemble of distinct models.

agentic-systems

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

AINTMA introduces an agentic AI framework featuring six specialized autonomous agents to transform enterprise software test management. Evaluated across 12 projects over 18 months, the system cut test cycle times by 43% and reduced defect escape rates to 2.1%.

agentic-systems

Marking the Wrong Symptoms: Evaluating LLM Watermarks in Medical Texts

A new comprehensive study reveals that applying LLM watermarking schemes in the medical domain leads to severe performance degradation, inducing lexical corruption, hallucinations, and misinterpretations in clinical reasoning.

agentic-systems

ClickGuard: Detecting and Spoiling Clickbait News with Informativeness Measures and Large Language Models

Researchers have introduced ClickGuard, an AI-powered browser extension designed to detect and spoil clickbait news. Utilizing LLM embeddings and an XGBoost architecture, the tool achieves a 91% F1-score while providing concise article summaries.

agentic-systems

DecodeShare: Tracing the Shared Subspace of LLM Decode-Time Decisions

Researchers introduce DecodeShare, a protocol identifying a low-dimensional shared subspace in LLM decode-time hidden states that plays a crucial causal role in decision-making and activation steering.