AGENTIC-SYSTEMS | June 4, 2026 | 1 min read

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Researchers have proposed State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that lets web agents retrieve and reuse skills step by step based on the current webpage state, reaching average WebArena success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B.

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a set of skills is retrieved based on the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initially retrieved skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable actions, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively.
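To make the contrast between task-level and state-grounded retrieval concrete, here is a minimal sketch of the three components named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the names `Skill`, `extract_skills`, and `retrieve`, the `click(...)` action strings, and the toy trajectory are all hypothetical, and bag-of-words cosine similarity stands in for whatever retrieval model the paper actually uses. Each skill is described here by the state at which its window starts, which is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re


@dataclass
class Skill:
    description: str  # text side: matched against the retrieval query
    code: str         # code side: what the agent would execute


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract_skills(trajectory, window=2):
    """Slide a fixed-size window over a completed trajectory.

    Each step is a (state_summary, action_code) pair. Every window becomes
    one skill, described by the state it starts from (an assumption), so
    skills can be invoked from intermediate states, not only from the start.
    """
    skills = []
    for i in range(len(trajectory) - window + 1):
        steps = trajectory[i:i + window]
        description = steps[0][0]
        code = "\n".join(action for _, action in steps)
        skills.append(Skill(description, code))
    return skills


def retrieve(skills, task, state=None, k=1):
    """Rank skills by similarity to the task, plus the current state if given."""
    query = Counter(tokenize(task if state is None else f"{task} {state}"))
    ranked = sorted(
        skills,
        key=lambda s: cosine(query, Counter(tokenize(s.description))),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical completed trajectory from an earlier shopping task.
trajectory = [
    ("shopping home page with search box", "click('search_box')"),
    ("search results list for product", "click('first_result')"),
    ("product page with add to cart button", "click('add_to_cart')"),
    ("cart page with checkout button", "click('checkout')"),
]
skills = extract_skills(trajectory, window=2)

task = "buy a laptop from the shopping site"
# Task-level (static) retrieval: the query never sees the page.
static = retrieve(skills, task)
# State-grounded retrieval: the query is rebuilt at the current step.
dynamic = retrieve(skills, task, state="product page showing an add to cart button")

print(static[0].code)
print(dynamic[0].code)
```

In this toy run, task-only retrieval returns the skill that starts from the home page even though the agent is already on a product page, while adding the current state to the query returns the skill that starts with the add-to-cart step. A real system would re-run the state-grounded query at every step and would likely use learned embeddings rather than token overlap.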

Source: arXiv cs.AI Recent
