NOW LET US – AI RAG SaaS Studio TP.HCM

Agents Just Passed Humans in Token Usage. And They Burn Far More Than Anyone Budgeted. A Deep Dive With OpenRouter’s COO


According to OpenRouter's latest data, agentic token usage has overtaken human usage. The shift is driving operating costs well beyond what most companies budgeted.

If you want a clean read on where AI is heading, look at the inference. Chris Clark, COO and co-founder of OpenRouter, runs the largest AI gateway in the world: around 70 model providers, hundreds of models, one integration that keeps working no matter which lab, cloud, or modality wins next. That vantage point comes with a number that lands hard. OpenRouter expects to process about 28 trillion tokens in a single week. That is roughly 1 percent of all global inference, and more than Salesforce has processed in the entire lifetime of the company.

Half that volume is US and half is the rest of the world, which makes their data a fair proxy for real global trends. And the trend that matters for every B2B founder is the one Clark put front and center.

Agentic usage overtook human usage

For two years the message was "go make AI work for your company," and it was hard. People chatted with models, dropped some custom data into a project, and got modest results. In the last few months that changed. Agents started working. You can ask an agent to do something and it gets done. In OpenRouter’s data, agentic token usage has now overtaken human usage, and you do not need the chart to feel it. Everyone building right now feels it.

The part most teams have not priced in is the cost of that shift. Agentic usage burns far more tokens than people anticipated. A human turn in a chat box is short. An agentic turn carries a heavy context load: tool call definitions, MCP gateway definitions, skill front matter, plus reasoning and tool calls going back and forth before the agent returns anything. The token bill for one agentic task can dwarf a hundred human chats. If your forecast still models AI spend like people typing into a box, your forecast is wrong.
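To see why the bill grows so fast, here is a rough back-of-the-envelope sketch. Every number in it is an illustrative assumption, not OpenRouter data, and the names (`AgentTurn`, `agent_task_tokens`, `chat_turn_tokens`) are hypothetical. It also assumes the whole context is resent on every step with no prompt caching, which overstates the cost for setups that do cache.

```python
from dataclasses import dataclass


@dataclass
class AgentTurn:
    """Token components of one agentic task (all values are made-up examples)."""
    system_and_tools: int      # tool definitions, MCP definitions, skill front matter
    steps: int                 # model round trips before the agent returns
    output_per_step: int       # reasoning plus tool-call tokens emitted per step
    tool_result_per_step: int  # tool output fed back into the context per step


def agent_task_tokens(turn: AgentTurn) -> int:
    """Total tokens for one task, assuming the full context is resent each step."""
    total = 0
    context = turn.system_and_tools
    for _ in range(turn.steps):
        total += context + turn.output_per_step  # input tokens plus output tokens
        context += turn.output_per_step + turn.tool_result_per_step
    return total


def chat_turn_tokens(prompt: int = 200, reply: int = 400) -> int:
    """Tokens for one short human chat turn (assumed sizes)."""
    return prompt + reply


turn = AgentTurn(system_and_tools=6000, steps=8,
                 output_per_step=300, tool_result_per_step=700)
ratio = agent_task_tokens(turn) / chat_turn_tokens()
print(agent_task_tokens(turn), chat_turn_tokens(), round(ratio))
```

With these assumed inputs, one eight-step task comes to 78,400 tokens against 600 for a chat turn, roughly 130 times more. The point is the shape of the curve, not the exact figure: the context grows every step and gets billed again every step.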

What agents need, point one: high-quality inference

Clark’s framing is that agents need three things to succeed, and the first is inference quality that holds up. The surprise is that the same model performs differently depending on who serves it. Artificial Analysis benchmarked one open-weight model across many providers and got variable results from identical weights. Same math, same numbers, different scores.

This is mostly not quantization, and it is not providers sabotaging each other. There is a large amount of software sitting between raw model weights and a valid API response, and plenty of room to misconfigure it, introduce bugs, or parse tool calls incorrectly. The takeaway for anyone building on top of a model: where you source your tokens changes the quality of what you get back, even when the model name on the label is the same.

Point two: agents live and die on tool calling

The second requirement is tool calling, and the data shows how central it has become. Looking at one frontier model family on OpenRouter, around 55 percent of requests asked for tools, the model used those tools 83 percent of the time, and 46 percent of completions finished because of a tool call. The line on that trend marches steadily upward, which maps to how agents work. They are not chatting. They are calling tools, reading results, and calling more tools.
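If you want the same three numbers for your own traffic, a minimal sketch follows. The log record shape (`tools_offered`, `tool_calls_made`, `finish_reason`) is a hypothetical one, and the denominators are an assumption about how such figures might be defined, since the talk does not spell them out. The `"tool_calls"` finish reason follows the convention of OpenAI-compatible chat completion APIs.

```python
def tool_metrics(records):
    """Summarize tool usage across a list of request log records.

    Each record is a dict with (assumed) keys:
      tools_offered   - bool, the request included tool definitions
      tool_calls_made - int, tool calls the model emitted in the response
      finish_reason   - str, e.g. "stop" or "tool_calls"
    """
    total = len(records)
    if total == 0:
        return {"requests_with_tools": 0.0,
                "tool_use_when_offered": 0.0,
                "finished_on_tool_call": 0.0}

    offered = [r for r in records if r["tools_offered"]]
    used = [r for r in offered if r["tool_calls_made"] > 0]
    finished = [r for r in records if r["finish_reason"] == "tool_calls"]

    return {
        # share of all requests that asked for tools
        "requests_with_tools": len(offered) / total,
        # share of tool-enabled requests where the model actually called one
        "tool_use_when_offered": len(used) / len(offered) if offered else 0.0,
        # share of all completions that ended on a tool call
        "finished_on_tool_call": len(finished) / total,
    }
```

Tracking these per model and per provider over time gives you your own version of the trend line.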

That makes tool calling the load-bearing piece of agentic performance. A model that reasons beautifully but botches its tool calls is useless inside an agent.

Point three: the calls have to succeed

A tool call is a JSON payload from the model that says: call this tool with these parameters. If the JSON is malformed, the tool name is invented, or the parameters are wrong, the call fails and the agent stalls. Success rates vary meaningfully by provider, driven by the same infrastructure quirks that move inference quality.
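The three failure modes above are easy to check before a call ever reaches your tool. The sketch below is one way to do it, under assumptions: the tool call arrives as a single JSON object with `name` and `arguments` keys, and the `TOOLS` registry with its `get_weather` entry is hypothetical. Real APIs differ in shape (some send `arguments` as a JSON string), and a production system would more likely validate against a full JSON Schema.

```python
import json

# Hypothetical tool registry: parameter name -> expected Python type.
TOOLS = {
    "get_weather": {
        "required": {"city": str},
        "optional": {"unit": str},
    },
}


def validate_tool_call(raw, tools=TOOLS):
    """Return (ok, reason) for a raw tool-call string from the model."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"malformed JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "tool call is not a JSON object"

    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return False, f"unknown tool: {name!r}"  # the model invented a tool name

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments is not a JSON object"

    spec = tools[name]
    for key in spec["required"]:
        if key not in args:
            return False, f"missing parameter: {key}"

    allowed = {**spec["required"], **spec["optional"]}
    for key, value in args.items():
        if key not in allowed:
            return False, f"unexpected parameter: {key}"
        if not isinstance(value, allowed[key]):
            return False, f"wrong type for {key}"
    return True, "ok"
```

Logging the `reason` per provider turns a vague sense that "agents are flaky" into a success rate you can compare.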

Clark made it concrete with a live demo, which is a brave thing to do on stage with a non-deterministic system. He fired 213 tool calls at an open-weight model running on one provider and got an error. Then he switched providers by editing a saved preset, same model, same API, same code, and reran. The errors disappeared, because the second provider had a cleaner implementation. OpenRouter monitors this across thousands of API endpoints in real time and routes agents around the providers that are failing, whether the failure is uptime, malformed tool calls, or something else. Same model slug in your code, fewer broken agents in production.
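The routing idea can be sketched in a few lines. This is not how OpenRouter implements it; it is a minimal illustration, assuming each provider is wrapped in a plain callable and that you supply your own validity check. The names `call_with_failover` and `parses_as_json` are hypothetical.

```python
import json


def parses_as_json(response):
    """Example validity check: the response body must be well-formed JSON."""
    try:
        json.loads(response)
    except (TypeError, ValueError):
        return False
    return True


def call_with_failover(providers, request, is_valid=parses_as_json):
    """Try providers in order and return (name, response) from the first good one.

    providers: list of (name, send) pairs, where send(request) returns a response
               or raises on an outage or timeout.
    is_valid:  callable that rejects responses such as malformed tool calls.
    """
    errors = []
    for name, send in providers:
        try:
            response = send(request)
        except Exception as exc:  # uptime failures, timeouts, and similar
            errors.append((name, f"exception: {exc}"))
            continue
        if is_valid(response):
            return name, response
        errors.append((name, "invalid response"))  # e.g. broken tool-call JSON
    raise RuntimeError(f"all providers failed: {errors}")
```

A real router would also reorder providers based on recent success rates rather than always starting from the top of the list. The calling code stays the same either way, which is the point of keeping one model slug.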

The gateway problem is now part of your architecture

Stack these three points and a clear conclusion falls out. The hard part of running agents in production is no longer just picking a good model. It is inference quality that varies by provider, tool-call success that varies by provider, and failover when a provider degrades. Those used to be someone else’s problem. With agents as your dominant token consumer, they are now your architecture, whether you build that layer yourself or buy it.

And the cost side deserves its own line on the P&L. The token blowups are not theoretical. Plenty of large companies have already burned through their annual AI budgets early, because they sized the bill for chat and got hit with agents. The fix is to forecast agentic spend for what it is, a multiple of human usage, not an extension of it.

Build for the agents you are about to ship

The chat era is the baseline you are leaving behind. The next year of your AI bill, your reliability problems, and your performance gains will all be driven by agents, and agents behave nothing like a person at a keyboard. Budget tokens like a real and growing line item. Assume your tool-call success rate, not just your model choice, decides whether your agents work. And treat the routing and failover layer as core infrastructure, not a nice-to-have.

The companies that figure this out will ship agents that quietly succeed in the background. The ones that do not will ship agents that fail in ways they cannot see, on a bill they did not plan for.

Top 5 takeaways

**Agentic token usage has overtaken human usage.** OpenRouter processes around 28 trillion tokens a week, roughly 1 percent of global inference, and agents are now the dominant consumer.

**Agents burn far more tokens than chat.** One agentic turn carries tool definitions, MCP gateway definitions, skill front matter, and reasoning loops, so a single task can dwarf a hundred human chats. Forecast it as a multiple of human usage, not an extension of it.

**The same model performs differently depending on who serves it.** Identical weights produce variable benchmark results across providers, driven by the software between the weights and the API, not quantization. Where you source tokens changes the quality you get back.

**Tool calling is the load-bearing piece.** On one frontier model family, 55 percent of requests asked for tools, the model used them 83 percent of the time, and 46 percent of completions finished on a tool call. Tool-call success rate, not just model choice, decides whether your agents work.

**Routing and failover are now core infrastructure.** Tool-call success varies by provider, so monitoring across endpoints and routing agents around failures in real time is part of your architecture, whether you build it or buy it.

© 2026 Now Let Us. All rights reserved.

Source: SaaStr
