NOW LET US – AI RAG SaaS Studio TP.HCM

Fine-tuning forgets. RAG leaks context. Hypernetworks build the model your agent needs on demand.


Enterprise AI agents often stall in production due to the limitations of fine-tuning and RAG. Hypernetworks offer a breakthrough by generating small, task-specific models on demand, bypassing context limits and retraining costs.

Enterprise teams keep watching the same thing happen. An AI agent demos beautifully, goes to production, and stalls: it runs for a short stretch, then needs a human to top up its context and check its output, and the promised efficiency drains into supervision. The agent did the work; you did the watching. It’s one reason so many agent pilots never turn into production systems.

The pitch on the other side of that wall is the one every team wants to believe: an agent that runs a long job on its own, overnight if it has to, and leaves a person to validate only the last 10%. Whether that is achievable turns on a problem the orchestration conversation mostly skips. When AI firm Chroma tested 18 leading models, every one lost accuracy as its input grew, a property of how attention works, not a gap a stronger model closes. An agent fed more and more of your business as it runs does not get steadier. It gets shakier.

This is the layer beneath the orchestration race. Routing, durable execution and observability all assume each agent is already competent enough to coordinate in the first place. The deeper question is how long an agent can run before a human has to step in, and that comes down to where your company's knowledge lives relative to the model. Both standard fixes leave a human in the loop.

Why teaching a model your business keeps you in the loop

Frontier models keep getting more capable, and the gap does not close, because it is not a capability problem. It is about where your knowledge sits relative to the model, and enterprises have had two ways to place it there.

The first is fine-tuning, which bakes knowledge into the weights. It remains subject to catastrophic forgetting, a problem identified in the 1980s and still unresolved in 2026: teaching a model something new tends to erode what it already knew. Teams work around it by isolating each task in its own fine-tuned model or adapter, which produces a sprawling estate of models that raises cost and governance overhead. And a fine-tuned model is a snapshot, stale the day a policy changes, when the expensive, slow retraining cycle starts over.

The second is in-context learning, which skips retraining by placing the relevant policies in the prompt at run time. This is where context rot bites: the longer the prompt, the less reliably the model uses what is in it. Retrieval narrows what goes into the prompt, but a retrieval miss looks identical to a confident answer, and both cost and latency climb with every token added.

The two failures rhyme. With fine-tuning, the model can be confidently working from last quarter's policy. With in-context learning, it can be confidently working from a detail it lost in the middle of a long prompt. Either way the output looks equally assured, so you cannot tell which parts are wrong without checking all of them. That is why the human never gets to leave. Teams often run both at once, fine-tuning the stable knowledge and retrieving the rest. That softens each failure but removes neither: on any given output you still cannot be sure the model is both current and working from the right context, so you still check it.

A third path: generate the specialist model on demand

A third approach is moving from research into early product. Instead of retraining one model or stuffing its prompt, a generator builds a small, task-specific model on demand from your policies, at inference time. The generator is a hypernetwork: a network whose output is the weights of another network.
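To make "a network whose output is the weights of another network" concrete, here is a minimal sketch in plain Python. It assumes the generated weights take the form of a low-rank adapter (two small matrices, A and B, added to a frozen base layer), which is one common choice rather than the method of any system named in this article. `ToyHypernetwork`, `adapted_forward` and the task embeddings are hypothetical names, and the generator is untrained: a real one is trained so that the adapters it emits actually perform the task.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here in place of trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyHypernetwork:
    """Maps a task embedding to the weights of a low-rank adapter (A, B).

    Hypothetical and untrained: it only shows the shape of the idea.
    """

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One linear head per generated matrix; each head emits flattened weights.
        self.head_a = rand_matrix(rank * d_in, embed_dim, rng)
        self.head_b = rand_matrix(d_out * rank, embed_dim, rng)

    def generate(self, task_embedding):
        flat_a = matvec(self.head_a, task_embedding)
        flat_b = matvec(self.head_b, task_embedding)
        # Reshape the flat outputs into A (rank x d_in) and B (d_out x rank).
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def adapted_forward(base_w, a, b, x):
    """Compute y = (W + B A) x without materialising the product B A."""
    base = matvec(base_w, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]


rng = random.Random(1)
base_w = rand_matrix(4, 6, rng)  # frozen base layer: d_out=4, d_in=6
hyper = ToyHypernetwork(embed_dim=3, d_in=6, d_out=4, rank=2)
x = [1.0, 0.5, -0.5, 2.0, 0.0, 1.0]

# Two made-up task embeddings stand in for encoded policy text.
a_audit, b_audit = hyper.generate([1.0, 0.0, 0.0])
a_risk, b_risk = hyper.generate([0.0, 1.0, 0.0])
y_audit = adapted_forward(base_w, a_audit, b_audit, x)
y_risk = adapted_forward(base_w, a_risk, b_risk, x)
```

The base layer never changes. Only the small generated matrices differ between tasks, which is why one generator can stand in for many stored adapters.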

The idea was named in 2016; applying it to produce specialist language models from text or documents is recent and active. Sakana AI's Text-to-LoRA, presented at ICML 2025, generates a model adapter from a plain-language description in a single pass, and a 2026 system called SHINE calls hypernetwork adaptation a promising new frontier, precisely because it sidesteps both the retraining cost of fine-tuning and the context limits of prompting.

The point of generating adapters rather than training and storing them is to collapse a sprawling library of per-task LoRAs into one network that can produce them on demand, including for tasks it has not seen.

The elegant part is how this closes the loop on the problem above: the per-task adapter teams hand-build to dodge catastrophic forgetting is the same object a hypernetwork produces automatically. The model zoo stops being a governance headache and becomes a generated output.

The case for going small underneath all this was put most directly in a 2025 paper by Nvidia researchers: for the narrow, repetitive tasks that fill agent workflows, small models are capable enough and 10 to 30 times cheaper to run than frontier generalists. Nace.AI, a Palo Alto company that raised a $21.5 million seed round in May, is the clearest commercial instance. Its core technology, a generator it calls a MetaModel, produces parameter adaptations for a model at inference time from a company's policies, pointed at regulated work: audit, compliance, risk assessment. The company says its agents handle the bulk of a workflow while human experts validate the result, a split it markets as 90/10.

How the three approaches compare

| Criteria | Fine-tuning | In-context learning (RAG) | On-demand generated weights (Hypernetwork) |
| :--- | :--- | :--- | :--- |
| Where knowledge lives | In the model's weights | In the prompt, re-supplied each run | In on-demand generated weights |
| Cost to update | High: retrain | Low: edit the source | Low: regenerate |
| Staleness risk | High: a snapshot | Low | Low: regenerated from current policy |
| Run-time cost | Low | High, grows with context | Low at run time |
| Primary failure mode | Forgetting; model-zoo sprawl | Context rot; silent retrieval misses | Generator quality; calibration |
| Who captures learning | Whoever trains the model | Whoever holds the data store | Depends where generator and feedback live |

Why a hypernetwork-built model raises the autonomy ceiling

A model that is narrow, current and small has a smaller surface on which to be wrong. Fewer errors, confined to a known domain, mean fewer outputs an agent has to escalate to a person, which is the real basis for any high-autonomy claim. It is also where a number like 90/10 comes from: not a dial set in advance, but an outcome of how little the system needs to hand back. Reported autonomy shares are best read as measurements of an architecture, not as settings.
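One way to read "an outcome, not a dial" is to compute the autonomy share after the fact from what the agent actually escalated. The sketch below assumes each output carries a self-reported confidence and that anything under a threshold goes to a person. The `AgentOutput` type, the threshold and the sample run are all made up for illustration and do not describe any vendor's system.

```python
from dataclasses import dataclass


@dataclass
class AgentOutput:
    task_id: str
    confidence: float  # hypothetical: the model's own estimate of being right


def split_by_confidence(outputs, threshold):
    """Return (kept by the agent, escalated to a human)."""
    kept = [o for o in outputs if o.confidence >= threshold]
    escalated = [o for o in outputs if o.confidence < threshold]
    return kept, escalated


def autonomy_share(outputs, threshold):
    """Fraction of outputs the agent did not hand back."""
    if not outputs:
        raise ValueError("need at least one output to measure")
    kept, _ = split_by_confidence(outputs, threshold)
    return len(kept) / len(outputs)


# Illustrative run: ten outputs, one of them low-confidence.
run = [AgentOutput(f"task-{i}", 0.95) for i in range(9)]
run.append(AgentOutput("task-9", 0.40))

share = autonomy_share(run, threshold=0.8)
kept, escalated = split_by_confidence(run, threshold=0.8)
```

The share is only as meaningful as the confidence scores behind it: if the model is overconfident, the measured number flatters the system.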

Two design choices decide whether that autonomy is trustworthy or merely fast. The first is grounding: tying every output to its source so a reviewer can verify rather than redo. Research models built for exactly this, such as HalluGuard, label each claim as supported or not and cite the passage they relied on. Nace ships its agents with grounding models and reasoning traces for the same reason. A 10% review only means something if the human can confirm provenance in seconds.
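The shape of a grounded output can be sketched in a few lines: each claim gets a supported or unsupported label plus a pointer to the passage it rests on. The version below uses plain word overlap as the support test, which is a crude stand-in chosen for brevity and is not how HalluGuard or Nace's grounding models work. The `ground_claim` function, the `min_overlap` threshold and the sample policy text are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


def tokens(text):
    """Lowercased word and number tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class GroundedClaim:
    claim: str
    supported: bool
    source_index: Optional[int]  # index of the cited passage, if any
    score: float


def ground_claim(claim, passages, min_overlap=0.6):
    """Label a claim by its best lexical overlap with any source passage."""
    claim_tokens = tokens(claim)
    best_index, best_score = None, 0.0
    if claim_tokens:
        for i, passage in enumerate(passages):
            score = len(claim_tokens & tokens(passage)) / len(claim_tokens)
            if score > best_score:
                best_index, best_score = i, score
    supported = best_score >= min_overlap
    return GroundedClaim(claim, supported, best_index if supported else None, best_score)


# Made-up policy passages and agent claims.
passages = [
    "Invoices above 10000 USD require two approvals.",
    "Vendor records are reviewed every quarter.",
]
claims = [
    "Invoices above 10000 USD require two approvals",
    "Vendor records are deleted after five years",
]
results = [ground_claim(c, passages) for c in claims]
```

A reviewer looking at `results` can jump straight to the cited passage for the first claim and spend their time on the second, which is the point of grounding: verify rather than redo.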

The second is the feedback loop, and it forces a question every buyer should ask: when your experts validate the output, whose model improves, and where does it live? That decides whether the compounding asset belongs to the vendor or to you. Arrangements differ. Nace, for instance, uses an external network of certified experts for some engagements and, for direct enterprise deployments, the customer's own staff, with the resulting model kept inside the customer's cloud. Each choice routes the learning, and the ownership, somewhere different.

Where the third path breaks

The approach is still early, and a few questions will decide how far it goes. Calibration is the linchpin: the value rests on the model knowing when it is unsure. That question is genuinely unsettled: recent work on generating these adapters found that they do not automatically improve calibration over ordinary fine-tuning, with gains appearing only under specific constraints.
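Calibration can be checked with a simple measurement. The sketch below computes expected calibration error (ECE), a standard metric that buckets predictions by stated confidence and compares each bucket's average confidence with its actual accuracy. It is offered as one common way to measure the property, not as the metric the work mentioned above used, and the sample numbers are invented.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than overflowing.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Invented example: a model that says 0.9 but is right 6 times in 10...
overconfident = expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4)
# ...versus one that says 0.8 and is right 8 times in 10.
calibrated = expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2)
```

A low score means the model's stated confidence can be used to decide what to escalate. A high score means a confidence threshold will pass errors through with the same assurance as correct answers.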

The quality of the generated model also depends heavily on the policy data it is built from, which puts a premium on data curation. And scale is the open research frontier: the hypernetworks shown in published work so far have been small. This is where Nace's own work gets interesting: in our interview, the company said it has scaled its generator well beyond those published sizes and derived a

© 2026 Now Let Us. All rights reserved.

Source: VentureBeat
