NOW LET US – AI RAG SaaS Studio TP.HCM

Building reliable agentic AI systems

A case study on how Bayer and Thoughtworks built PRINCE, an agentic AI platform that transforms preclinical drug discovery data retrieval using Agentic RAG and Text-to-SQL.

A Case Study in building production-ready agentic AI systems

This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted platform developed by Bayer AG with Thoughtworks to address pharmaceutical industry challenges in drug development. PRINCE leverages Agentic Retrieval-Augmented Generation and Text-to-SQL to integrate decades of safety study reports. We describe PRINCE's evolution from keyword-based search to an intelligent research assistant capable of answering complex questions and drafting regulatory documents. We reflect on key engineering decisions through the lens of context engineering (how information was shaped and routed between specialized agents) and harness engineering (how orchestration, recovery, and observability were built around the models to maintain control and reliability). The system prioritizes trust through transparency, explainability, and human-in-the-loop integration. PRINCE demonstrates AI's transformative potential in pharmaceuticals, significantly improving data accessibility and research efficiency while ensuring governance and compliance.

16 June 2026

Contents

  • The Challenge: Navigating the Preclinical Data Maze
  • The Solution: PRINCE - An Evolutionary Platform
  • System Architecture: Engineering a Reliable Agentic RAG System
  • The Agentic RAG System
  • Building Trust in a Production LLM System
  • Engineering for Resilience: Error Handling and Recovery
  • Enhancing Data Quality: Named Entity Recognition and Annotation
  • The Journey Continues: Iterative Development
  • Conclusion

The Challenge: Navigating the Preclinical Data Maze

Preclinical drug discovery is inherently complex and data-intensive. Researchers face the significant challenge of efficiently accessing and analyzing vast volumes of information generated during this critical phase. Traditional keyword-based search methods, often reliant on rigid Boolean logic, frequently fall short when confronted with the nuanced and intricate nature of preclinical research questions.

The advent of Large Language Models (LLMs) has presented a transformative opportunity. By combining the generative power of LLMs with the precision of information retrieval systems, Retrieval-Augmented Generation (RAG) has emerged as a promising technique. This approach holds the potential to revolutionize preclinical data access, enabling researchers to pose complex questions in natural language and receive accurate, context-rich answers grounded in proprietary data.

Recognizing this potential early, Bayer committed to exploring how these technologies could address longstanding challenges in preclinical research.

In this post, we share that journey—how Bayer's early investment in generative AI has resulted in PRINCE, an agentic AI system built on Agentic RAG. This case study explores the technical architecture, engineering decisions, and lessons learned in transforming preclinical data retrieval from a challenging maze into an intuitive conversational experience.

Many of the engineering decisions behind PRINCE can now be understood through the lens of context engineering and harness engineering, although when the system was first designed we did not use these terms. Context engineering shaped what information each model received, what it did not receive, and how context moved between specialized steps such as research, reflection, and writing. Harness engineering shaped the scaffolding around the models: orchestration, tool boundaries, state persistence, retries, fallbacks, validation, reflection loops, observability, and human review.
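
To make the harness idea more concrete, here is a minimal Python sketch of one piece of such scaffolding: calling a primary model with retries, validating its output, and falling back to a secondary model. The names (`call_with_fallback`, `flaky_primary`, `stable_secondary`), the retry count, and the validation rule are illustrative assumptions, not PRINCE's actual implementation.

```python
from typing import Callable, Sequence

ModelFn = Callable[[str], str]


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain fails or returns invalid output."""


def call_with_fallback(
    prompt: str,
    models: Sequence[ModelFn],
    is_valid: Callable[[str], bool],
    max_attempts: int = 2,
) -> str:
    """Try each model in order, retrying each one up to max_attempts times."""
    errors: list[str] = []
    for model in models:
        for attempt in range(1, max_attempts + 1):
            try:
                answer = model(prompt)
            except Exception as exc:  # a real harness would catch narrower errors
                errors.append(f"{model.__name__} attempt {attempt}: {exc}")
                continue
            if is_valid(answer):
                return answer
            errors.append(f"{model.__name__} attempt {attempt}: invalid output")
    raise AllModelsFailed("; ".join(errors))


# Hypothetical stand-ins for real LLM clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def stable_secondary(prompt: str) -> str:
    return f"Answer to: {prompt}"


result = call_with_fallback(
    "Which studies report liver findings?",
    models=[flaky_primary, stable_secondary],
    is_valid=lambda text: bool(text.strip()),
)
print(result)
```

The collected `errors` list is what an observability layer would typically log, so that a reviewer can see which model failed and why before the fallback answered.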

While this post focuses on the technical architecture and engineering challenges, our paper published in Frontiers in Artificial Intelligence covers the product evolution and business impact in more detail.

The Solution: PRINCE - An Evolutionary Platform

To address these challenges, Bayer developed the Preclinical Information Center (PRINCE) platform. PRINCE was conceived as a unified gateway to preclinical data, initially focusing on consolidating previously siloed structured study metadata and exposing it in a "Searchable" manner. This initial phase allowed users to apply advanced filters and retrieve information primarily from structured study metadata.

However, a significant portion of Bayer's valuable preclinical knowledge resides within unstructured PDF study reports accumulated over decades. Due to numerous system migrations over the years, the structured metadata associated with these reports could be incomplete, missing, or even contain incorrect annotations. Crucially, the authoritative "gold standard" information was consistently present within the approved PDF study reports.

The emergence of Generative AI, particularly RAG, provided the key to unlocking this wealth of unstructured data. By integrating RAG capabilities, PRINCE began to shift the paradigm from a filter-based 'search' tool to a natural language 'ask' system, enabling researchers to query the content of these study reports directly.

This evolution reflects PRINCE's progression through three distinct phases:

  • Search: the initial phase focused on creating a unified gateway to thousands of nonclinical study reports, consolidating multiple in-house data silos from various preclinical domains into a searchable format, primarily leveraging structured metadata.
  • Ask: this phase introduced an AI-powered question-answering system utilizing Retrieval-Augmented Generation (RAG). This enabled researchers to derive insights directly from unstructured data, including scanned PDFs from historical reports, by posing questions in natural language.
  • Do: the current phase positions PRINCE as an active research assistant capable of executing complex tasks. This is achieved through the integration of multi-agent systems, allowing the platform to handle intricate queries, orchestrate workflows, and support activities like drafting regulatory documents.

This deliberate evolution from Search to Ask to Do represents a strategic response to the industry's need for greater efficiency and innovation in preclinical development. By providing researchers with increasingly powerful tools to access, analyze, and act upon preclinical data, PRINCE aims to enable faster data-driven decision-making, reduce the need for unnecessary experiments, and ultimately accelerate the development of safer, more effective therapies.
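
The "Ask" phase described above can be illustrated with a toy Python sketch of the retrieve-then-generate pattern: rank report chunks against the question, then build a prompt grounded in the top matches. Everything here is an assumption for illustration. The word-count "embedding" stands in for a real embedding model, the chunk ids and texts are invented placeholders rather than real study data, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: dict[str, str], chunk_ids: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid in chunk_ids)
    return (
        "Answer using only the context below and cite chunk ids.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Invented placeholder text, not real study data.
chunks = {
    "report-A:p3": "Liver enzyme elevation was observed in rats at the high dose.",
    "report-B:p7": "No cardiovascular findings were recorded in dogs.",
    "report-C:p2": "Body weight was unaffected in mice.",
}
question = "Was liver enzyme elevation observed in rats?"
top_ids = retrieve(question, chunks, k=1)
prompt = build_prompt(question, chunks, top_ids)
print(prompt)
```

Keeping the chunk ids in the prompt is one simple way to let the generated answer point back to its source passages.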

System Architecture: Engineering a Reliable Agentic RAG System

The system functions as an interactive conversational UI, powered by a robust backend infrastructure. Its architecture, designed for handling complex queries and delivering accurate, context-rich answers, is orchestrated using LangGraph and served via a FastAPI application.

Figure 1 provides the system context—UI, backend, data stores, LLM fallbacks, and observability—while Figure 2 zooms into how the system coordinates its specialized agents.

Figure 1: System context and supporting platforms.

  • User Request: the process begins when a user submits a request through the Conversational UI, which is built with React.
  • Orchestration: the user's request is routed to a LangGraph-based orchestration layer in the backend. This workflow engine coordinates a multi-stage process that progresses through clarifying user intent, thinking and planning, conducting research (using RAG and Text-to-SQL), validating data completion, and finally generating a response through the Writer agent. The workflow includes deliberate pause points and feedback loops to ensure data completeness before proceeding. (We explore the details of this agentic workflow in a dedicated section later.)
  • Data Retrieval and State Management: the Researcher agents interact with a comprehensive and distributed data ecosystem:
    • Vector representations of all study reports are stored in OpenSearch, forming the core knowledge base for information retrieval.
    • Curated structured data, resulting from various ETL and harmonization processes, is
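
The multi-stage orchestration described in the list above (clarify intent, plan, research, validate completeness, write) can be sketched as a small state machine. This is plain Python, not LangGraph's API, and every name in it (`State`, `NODES`, `MAX_ROUNDS`, the stubbed `research` step) is a hypothetical stand-in. It only illustrates the feedback loop from validation back to research; the pause points for human input are not modeled.

```python
from dataclasses import dataclass, field

MAX_ROUNDS = 3  # assumed cap so the feedback loop always terminates


@dataclass
class State:
    question: str
    plan: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)
    research_rounds: int = 0
    answer: str = ""
    trace: list[str] = field(default_factory=list)


def clarify(state: State) -> str:
    state.question = state.question.strip()
    return "plan"


def plan(state: State) -> str:
    # A real planner would derive these items from the question.
    state.plan = ["study_design", "key_findings"]
    return "research"


def research(state: State) -> str:
    # Stub: each round fills one missing plan item
    # (stand-in for RAG or Text-to-SQL calls).
    state.research_rounds += 1
    for item in state.plan:
        if item not in state.findings:
            state.findings[item] = f"stub result for {item}"
            break
    return "validate"


def validate(state: State) -> str:
    # Loop back to research while data is incomplete and rounds remain.
    missing = [item for item in state.plan if item not in state.findings]
    if missing and state.research_rounds < MAX_ROUNDS:
        return "research"
    return "write"


def write(state: State) -> str:
    state.answer = "; ".join(f"{k}: {v}" for k, v in state.findings.items())
    return "end"


NODES = {
    "clarify": clarify,
    "plan": plan,
    "research": research,
    "validate": validate,
    "write": write,
}


def run(question: str) -> State:
    state = State(question)
    node = "clarify"
    while node != "end":
        state.trace.append(node)  # record the path for observability
        node = NODES[node](state)
    return state


final = run("  Summarize study X  ")
print(final.trace)
print(final.answer)
```

Each node returns the name of the next node, so the validation step alone decides whether the workflow loops back or moves on to the writer.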

Source: Hacker News
