NOW LET US – AI RAG SaaS Studio TP.HCM

World Feedback for Clinical Agents: Diagnosing RL in FHIR Environments


The paper diagnoses the challenges of applying Reinforcement Learning (RL) to clinical agents in FHIR environments, introducing MedAgentBench-v3 to address feedback flaws and proposing a hybrid SFT-RL approach.


Abstract: Clinical protocol-execution tasks (checking a lab value, applying a threshold, placing a correctly structured FHIR order) are natural candidates for RL from world feedback: once clinical SMEs encode decision logic into a verifier, that verifier grades unlimited rollouts without per-episode annotation. But applying RL requires a sound feedback channel and sufficient base capability. We audit MedAgentBench v1/v2, find a 41.7% silent-finish ceiling that makes inaction the RL dominant strategy, and construct MedAgentBench-v3 (MAB-v3) (508 tasks, 8.9% ceiling). Training Qwen3-8B exposes two structural barriers: a capability ceiling (10/20 task types have 0% base performance, zero gradient) and a format-knowledge barrier (3/20 types require exact clinical codes undiscoverable by exploration). Pure RL reaches 18.2% pass@1 vs. 34.1% for rule-based SFT; the 15.9 pp gap is attributable entirely to these barriers. A decision/format-knowledge/lookup taxonomy predicts RL learnability and prescribes the fix: SFT to inject codes, RL to learn conditionals.
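Two ideas in the abstract can be illustrated with a small sketch: the silent-finish ceiling (the share of tasks a do-nothing agent passes) and why task types with 0% base performance yield no learning signal. The sketch is not the paper's code. The task format, `toy_verifier`, and the group-normalized advantage (in the style of GRPO) are assumptions made for illustration; the abstract does not name the RL algorithm or the verifier interface.

```python
from statistics import mean, pstdev


def silent_finish_ceiling(tasks, verifier):
    """Fraction of tasks that an agent passes by finishing without acting."""
    passed = sum(1 for task in tasks if verifier(task, []))
    return passed / len(tasks)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Assumption: a GRPO-style advantage, (r - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_verifier(task, actions):
    """Hypothetical verifier: pass iff an order was placed exactly when required."""
    placed = any(a.get("type") == "order" for a in actions)
    return placed == task["order_required"]


# Toy task set (made-up data): 2 of 5 tasks need no order at all.
tasks = [
    {"order_required": False},
    {"order_required": True},
    {"order_required": True},
    {"order_required": False},
    {"order_required": True},
]

ceiling = silent_finish_ceiling(tasks, toy_verifier)

# Every rollout fails: all advantages are zero, so there is nothing to reinforce.
no_signal = group_relative_advantages([0, 0, 0, 0])

# A single success is enough to produce non-zero advantages.
some_signal = group_relative_advantages([1, 0, 0, 0])

# The reported gap between rule-based SFT and pure RL, in percentage points.
gap_pp = round(34.1 - 18.2, 1)
```

In this toy setup a silent agent already passes 40% of tasks, so a policy optimizing the verifier's reward can score well by learning to do nothing. The all-zero reward group shows the capability-ceiling problem under the assumed advantage: when no rollout ever succeeds, every advantage is zero and the policy update carries no information.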

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

