
CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation


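Researchers have proposed CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation, where private documents stay on the device and public knowledge stays in the cloud. On Natural Questions and WikiText-2 it improves end-to-end throughput over baselines by 1.66× and 2.15×, respectively, and cuts per-token communication by more than two orders of magnitude, while keeping answer quality and perplexity comparable.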


Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, creating a document-isolated dual-end RAG setting. However, existing methods rely on frequent remote synchronization and dense evidence transfer, limiting throughput under realistic latency and bandwidth conditions. To address this issue, we propose CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation. CONCORD treats the cloud as an asynchronously arriving evidence source rather than a continuously synchronized co-generator. Specifically, we introduce waiting debt control to decide whether each decoding step should continue waiting for remote participation based on the observed return of waiting. We also design a certificate-guided minimal supplementation mechanism that requests only the remote evidence needed to determine the current greedy decision. Steps that consult the cloud preserve the same greedy token as dense dual-end aggregation, while the remaining steps commit locally without remote evidence. Experiments on Natural Questions and WikiText-2 show that CONCORD improves end-to-end throughput over baselines by 1.66× and 2.15×, respectively, while reducing per-token communication by over two orders of magnitude and maintaining comparable answer quality and perplexity.
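The abstract does not spell out how the certificate is built, so the following is only a minimal sketch of the general idea of requesting just enough remote evidence to fix the greedy token. It rests on assumptions that are not from the paper: device and cloud scores are combined by adding per-token logits, the cloud can stream its logits in descending order, and the last value received therefore bounds every value not yet sent. The names `dense_greedy`, `sparse_greedy`, `rank_remote`, and the `batch` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dense_greedy(local: Sequence[float], remote: Sequence[float]) -> int:
    """Reference: greedy token when every remote logit is transferred."""
    scores = [l + r for l, r in zip(local, remote)]
    # Highest score wins; ties go to the lowest token id.
    return max(range(len(scores)), key=lambda i: (scores[i], -i))


def rank_remote(remote: Sequence[float]) -> List[Tuple[int, float]]:
    """Cloud side (assumed): (token_id, logit) pairs in descending logit order."""
    return sorted(enumerate(remote), key=lambda pair: -pair[1])


def sparse_greedy(
    local: Sequence[float],
    remote_sorted: List[Tuple[int, float]],
    batch: int = 2,
) -> Tuple[int, int]:
    """Return (greedy token, number of remote entries requested).

    Reading a prefix of remote_sorted stands in for one network request.
    """
    vocab = len(local)
    revealed: Dict[int, float] = {}
    fetched = 0
    while True:
        # Simulated request for the next few remote entries.
        for tok, logit in remote_sorted[fetched:fetched + batch]:
            revealed[tok] = logit
        fetched = min(fetched + batch, vocab)

        # Best fully known candidate; ties go to the lowest token id.
        best = max(revealed, key=lambda t: (local[t] + revealed[t], -t))
        best_score = local[best] + revealed[best]
        if fetched == vocab:
            return best, fetched  # everything transferred: same as dense

        # Any unseen token's remote logit is at most the last one received.
        bound = remote_sorted[fetched - 1][1]
        unseen_cap = max(local[t] for t in range(vocab) if t not in revealed) + bound
        if best_score > unseen_cap:
            return best, fetched  # certificate: no unseen token can win


local_logits = [0.1, 2.0, 0.3, 1.5, 0.0, 0.2]
remote_logits = [0.5, 1.0, 3.0, 0.2, 0.1, 0.4]
token, used = sparse_greedy(local_logits, rank_remote(remote_logits))
print(token, used, dense_greedy(local_logits, remote_logits))
```

In this toy run the device stops after two of the six remote entries, because the best known token already beats the upper bound on every unseen one, and it commits the same token that the dense computation picks. When the bound never separates the candidates, the loop keeps requesting until all entries have arrived, which reduces to the dense case.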


© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent


