NOW LET US – AI RAG SaaS Studio TP.HCM

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Researchers have introduced DR-DCI, a novel framework that enables AI agents to efficiently search, filter, and compare data directly across massive document corpora of up to tens of millions of files.

Abstract: Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency.

We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution.

Experiments show that DR-DCI is both effective and efficient across scales. On BrowseComp-Plus, DR-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 points while reducing tool usage, wall time, and estimated cost. With workspace-preserving context reset, accuracy further improves to 73.3%. In corpus-scaling experiments, DR-DCI remains effective from 100K to 10M documents, whereas raw DCI becomes unstable and BM25 performs substantially worse. DR-DCI also scales to a 20M-scale file-per-document Wiki-18 QA setting, achieving an average score of 63.0 across six benchmarks and outperforming retrieval-based and trained search-agent baselines. Ablation analysis further shows that ranked previews and inter-document DCI are key to performance.
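To make the workspace idea concrete, here is a minimal Python sketch of retrieval used as an action that grows a local workspace, followed by a grep-like operation confined to that workspace. It is an illustration under stated assumptions, not the paper's implementation: the toy `CORPUS`, the term-overlap `retrieve` function (a simple stand-in for BM25 or ColBERT), and the `Workspace` class with its `expand` and `grep` methods are all hypothetical names, and documents are held in memory rather than as files on disk.

```python
import re

# Toy corpus (hypothetical); a real setting would hold millions of files on disk.
CORPUS = {
    "doc1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "doc2": "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "doc3": "The Eiffel Tower was completed in 1889 in Paris.",
    "doc4": "Pierre Curie shared the 1903 Nobel Prize in Physics.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, k):
    """Stand-in retriever (assumption): rank documents by how many of their
    tokens appear in the query, breaking ties by document id."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in CORPUS.items():
        score = sum(1 for tok in tokenize(text) if tok in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


class Workspace:
    """Local, growing subset of the corpus that the agent operates on."""

    def __init__(self):
        self.docs = {}

    def expand(self, query, k=2):
        """Retrieval as an action: pull the top-k documents into the
        workspace and return the ids that were newly added."""
        added = []
        for doc_id in retrieve(query, k):
            if doc_id not in self.docs:
                self.docs[doc_id] = CORPUS[doc_id]
                added.append(doc_id)
        return added

    def grep(self, pattern):
        """DCI-style operation: regex search over workspace documents only,
        never over the full corpus."""
        rx = re.compile(pattern)
        return sorted(d for d, text in self.docs.items() if rx.search(text))


ws = Workspace()
first = ws.expand("Curie Nobel Prize", k=3)   # pulls doc1, doc2, doc4
hits_1903 = ws.grep(r"\b1903\b")              # cross-document check inside the workspace
print(first, hits_1903)
```

The point of the split is that `grep` only ever scans what `expand` has pulled in, so the cost of the local operation tracks the workspace size rather than the corpus size. In this toy run the unrelated `doc3` never enters the workspace, and repeating the same expansion adds nothing new.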

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent
