LabGuard: Grounding Natural-Language Laboratory Rules into Runtime Guards for Embodied Laboratory Agents

Researchers have introduced LabGuard, a language-to-execution safety suite that translates natural-language laboratory rules into executable runtime constraints. It cuts the rate of unsafe events for embodied laboratory agents from 39.5% to 23.8%.

Scientific embodied agents are increasingly capable of carrying out laboratory procedures, but executing these procedures safely in dynamic laboratory environments remains challenging. Current safety approaches often overlook the intermediate step of transforming laboratory natural language, including safety rules, manuals, protocols, and standard operating procedures, into machine-checkable runtime constraints. We introduce LabGuard (Laboratory Guard), a language-to-execution safety suite that grounds natural-language laboratory rules into executable specifications and deploys them as runtime guards. LabGuard includes three core components: LabGuard-IR, which defines a typed executable representation; LabGuard-Bench, which provides 812 supervised annotations expanded from 203 seed laboratory rules; and LabGuard-Grounder, which maps natural-language laboratory rules into LabGuard-IR. The resulting IR instances are handled by the LabGuard Pipeline, which compiles them into runtime monitors and applies them at the controller boundary. Experiments show that LabGuard generalizes to unseen laboratory-rule sources, achieves 79.4 task-scope F1, and reduces unsafe events from 39.5% to 23.8% after monitor compilation. In LabUtopia, its runtime monitors integrate with ACT, keeping interventions below 0.5% while preserving task success.
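The abstract describes a general pattern: a rule is captured as a typed record, compiled into a monitor, and checked before each action reaches the controller. The Python sketch below illustrates that pattern under stated assumptions. The `RuleIR` fields, `compile_monitor`, `guarded_step`, and the example rules are all hypothetical. They are not the LabGuard-IR schema or the LabGuard Pipeline API, neither of which the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical typed IR for one grounded rule (NOT the actual LabGuard-IR schema).
@dataclass(frozen=True)
class RuleIR:
    rule_id: str
    scope: str       # task scope the rule applies to, e.g. "heating"
    predicate: str   # name of an observed state predicate, e.g. "lid_closed"
    required: bool   # value the predicate must hold before the action runs
    action: str      # controller action the rule guards, e.g. "start_heater"

@dataclass
class Verdict:
    allowed: bool
    violated: List[str]

Monitor = Callable[[str, str, Dict[str, bool]], Verdict]

def compile_monitor(rules: List[RuleIR]) -> Monitor:
    """Hypothetical compiler: index rules by (scope, action) for fast runtime lookup."""
    index: Dict[Tuple[str, str], List[RuleIR]] = {}
    for rule in rules:
        index.setdefault((rule.scope, rule.action), []).append(rule)

    def monitor(scope: str, action: str, state: Dict[str, bool]) -> Verdict:
        # A predicate missing from the state yields None, which never equals
        # the required bool, so the check fails closed.
        violated = [
            r.rule_id
            for r in index.get((scope, action), [])
            if state.get(r.predicate) != r.required
        ]
        return Verdict(allowed=not violated, violated=violated)

    return monitor

def guarded_step(monitor: Monitor, scope: str, action: str,
                 state: Dict[str, bool], execute: Callable[[str], None]) -> Verdict:
    """Apply the monitor at the controller boundary: forward only allowed actions."""
    verdict = monitor(scope, action, state)
    if verdict.allowed:
        execute(action)
    return verdict

if __name__ == "__main__":
    rules = [
        RuleIR("R1", "heating", "lid_closed", True, "start_heater"),
        RuleIR("R2", "heating", "fume_hood_on", True, "start_heater"),
    ]
    monitor = compile_monitor(rules)
    executed: List[str] = []
    v = guarded_step(monitor, "heating", "start_heater",
                     {"lid_closed": True, "fume_hood_on": False}, executed.append)
    print(v.allowed, v.violated)  # False ['R2']
```

The sketch fails closed: if a predicate is absent from the observed state, the action is blocked. Actions with no matching rule pass through unchanged. The abstract does not say whether LabGuard's monitors make the same choices.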

Source: arXiv cs.AI Recent

More in this category

2026 BAIR Graduate Showcase

The Berkeley Artificial Intelligence Research (BAIR) Lab celebrates its outstanding class of 2026 Ph.D. graduates, whose pioneering research in robotics, LLMs, and AI safety is set to shape the future of technology.

MultiUAV-Plat: An LLM-Oriented Platform, Benchmark and Framework for Multi-UAV Collaborative Task Planning

Researchers have introduced MultiUAV-Plat, a simulation and benchmarking platform for LLM-based multi-UAV collaborative task planning. They also present the Agent4Drone framework, which they report significantly improves task success rates.

Beyond Compilation: Evaluating Faithful Natural-Language-to-Lean Statement Formalization

A new study reveals that successfully compiling AI-generated Lean statements does not guarantee semantic accuracy, exposing a significant gap and proposing a more rigorous evaluation framework.

BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation

A new evaluation suite called BayesBench reveals that while scaling LLMs improves their latent inference and evidence accumulation, a significant gap remains in translating these gains into rational downstream predictions.

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering

Researchers introduce HASTE, a hierarchical multi-agent system that organizes cross-competition knowledge into three scope tiers, allowing ML agents to accumulate and reuse skills, significantly reducing compute costs and improving performance.

How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

A recent experimental study explores how AI can optimize the discovery and reuse of simulation models using natural language queries. By evaluating data formats, embedding models, and retrieval strategies, the research establishes a baseline for AI-driven model composability and interoperability.