Agents Just Passed Humans in Token Usage. And They Burn Far More Than Anyone Budgeted. A Deep Dive With OpenRouter’s COO

According to OpenRouter's latest data, AI agentic token usage has officially overtaken human usage. However, this shift is driving massive operational costs that far exceed initial corporate budgets.
If you want a clean read on where AI is heading, look at the inference. Chris Clark, COO and co-founder of OpenRouter, runs the largest AI gateway in the world: around 70 model providers, hundreds of models, one integration that keeps working no matter which lab, cloud, or modality wins next. That vantage point comes with a number that lands hard. OpenRouter expects to process about 28 trillion tokens in a single week. That is roughly 1 percent of all global inference, and more than Salesforce has processed in the entire lifetime of the company.
Half of that volume comes from the US and half from the rest of the world, which makes their data a fair proxy for real global trends. And the trend that matters for every B2B founder is the one Clark put front and center.
Agentic usage overtook human usage
For two years the message was "go make AI work for your company," and it was hard. People chatted with models, dropped some custom data into a project, and got modest results. In the last few months that changed. Agents started working. You can ask an agent to do something and it gets done. In OpenRouter’s data, agentic token usage is now overtaking human usage, and you do not need the chart to feel it. Everyone building right now feels it.
The part most teams have not priced in is the cost of that shift. Agentic usage burns far more tokens than people anticipated. A human turn in a chat box is short. An agentic turn carries a heavy context load: tool call definitions, MCP gateway definitions, skill front matter, plus reasoning and tool calls going back and forth before the agent returns anything. The token bill for one agentic task can dwarf a hundred human chats. If your forecast still models AI spend like people typing into a box, your forecast is wrong.
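To make the gap concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is a placeholder assumption rather than OpenRouter data, and `AgentTurnProfile`, `agent_task_tokens`, and `chat_turn_tokens` are hypothetical names invented for illustration. It assumes no prompt caching, so each step of the agent loop resends the fixed context plus everything produced so far.

```python
from dataclasses import dataclass


@dataclass
class AgentTurnProfile:
    """Hypothetical per-task token profile; every default is a placeholder."""
    tool_definitions: int = 3_000    # tool schemas sent with each request
    mcp_definitions: int = 2_000     # MCP gateway / tool descriptions
    skill_front_matter: int = 500    # skill metadata loaded into context
    user_prompt: int = 200           # the actual instruction
    reasoning_per_step: int = 800    # reasoning tokens generated per loop step
    tool_io_per_step: int = 1_200    # tool call plus tool result per loop step
    steps: int = 8                   # model round trips before the final answer


def agent_task_tokens(p: AgentTurnProfile) -> int:
    """Total tokens processed for one agentic task, assuming no prompt caching.

    Each step resends the fixed context plus the history accumulated so far,
    so the total grows faster than linearly in the number of steps.
    """
    fixed = (p.tool_definitions + p.mcp_definitions
             + p.skill_front_matter + p.user_prompt)
    per_step = p.reasoning_per_step + p.tool_io_per_step
    total = 0
    for step in range(p.steps):
        history = step * per_step          # earlier steps, resent as input
        total += fixed + history + per_step
    return total


def chat_turn_tokens(prompt: int = 150, reply: int = 350) -> int:
    """Tokens for one short human chat turn (placeholder sizes)."""
    return prompt + reply


if __name__ == "__main__":
    task = agent_task_tokens(AgentTurnProfile())
    chat = chat_turn_tokens()
    print(f"one agentic task: {task:,} tokens")
    print(f"one chat turn:    {chat:,} tokens")
    print(f"ratio:            {task / chat:.1f}x")
```

With these placeholder values, one task comes to 117,600 tokens against 500 for a chat turn. The exact ratio matters less than the shape: the fixed context is paid on every step, and the history compounds on top of it.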
What agents need, point one: high-quality inference
Clark’s framing is that agents need three things to succeed, and the first is inference quality that holds up. The surprise is that the same model performs differently depending on who serves it. Artificial Analysis benchmarked one open-weight model across many providers and got variable results from identical weights. Same math, same numbers, different scores.
This is mostly not quantization, and it is not providers sabotaging each other. There is a large amount of software sitting between raw model weights and a valid API response, and plenty of room to misconfigure it, introduce bugs, or parse tool calls incorrectly. The takeaway for anyone building on top of a model: where you source your tokens changes the quality of what you get back, even when the model name on the label is the same.
Point two: agents live and die on tool calling
The second requirement is tool calling, and the data shows how central it has become. Looking at one frontier model family on OpenRouter, around 55 percent of requests asked for tools, the model used those tools 83 percent of the time, and 46 percent of completions finished because of a tool call. The line on that trend marches steadily upward, which maps to how agents work. They are not chatting. They are calling tools, reading results, and calling more tools.
That makes tool calling the load-bearing piece of agentic performance. A model that reasons beautifully but botches its tool calls is useless inside an agent.
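If you want to track the same three ratios on your own traffic, a minimal sketch might look like the following. The record fields (`tools_offered`, `tool_calls_made`, `finish_reason`) are an assumed log shape, not OpenRouter's schema, and the denominators are one reasonable reading of the figures above: tool use is measured against requests that offered tools, the other two against all requests.

```python
from typing import Iterable, Mapping


def tool_usage_stats(records: Iterable[Mapping]) -> dict:
    """Compute three tool-calling ratios from hypothetical request log records.

    Assumed record fields (not a real log schema):
      tools_offered   - bool, the request included tool definitions
      tool_calls_made - int, tool calls the model emitted for this request
      finish_reason   - str, "tool_calls" when the completion ended on a tool call
    """
    rows = list(records)
    offered = [r for r in rows if r["tools_offered"]]
    used = [r for r in offered if r["tool_calls_made"] > 0]
    ended_on_tool = [r for r in rows if r["finish_reason"] == "tool_calls"]

    def share(part: int, whole: int) -> float:
        # Guard against empty logs so the function never divides by zero.
        return part / whole if whole else 0.0

    return {
        "requests_offering_tools": share(len(offered), len(rows)),
        "tool_use_when_offered": share(len(used), len(offered)),
        "completions_ending_on_tool_call": share(len(ended_on_tool), len(rows)),
    }


# Tiny made-up sample, only to show the expected input shape.
sample_log = [
    {"tools_offered": True, "tool_calls_made": 2, "finish_reason": "tool_calls"},
    {"tools_offered": True, "tool_calls_made": 0, "finish_reason": "stop"},
    {"tools_offered": False, "tool_calls_made": 0, "finish_reason": "stop"},
    {"tools_offered": True, "tool_calls_made": 1, "finish_reason": "tool_calls"},
]

if __name__ == "__main__":
    for name, value in tool_usage_stats(sample_log).items():
        print(f"{name}: {value:.0%}")
```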
Point three: the calls have to succeed
A tool call is a JSON payload from the model that says: call this tool with these parameters. If the JSON is malformed, the tool name is invented, or the parameters are wrong, the call fails and the agent stalls. Success rates vary meaningfully by provider, driven by the same infrastructure quirks that move inference quality.
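The three failure modes above can be checked before a call ever reaches your tools. The sketch below is a simplified illustration, not any provider's actual validation: the `TOOLS` registry, the tool names, and the flat `{"name": ..., "arguments": {...}}` payload shape are all assumptions made for the example, and real APIs often wrap the call differently.

```python
import json

# Hypothetical tool registry: tool name -> {parameter name: expected Python type}.
TOOLS = {
    "get_weather": {"city": str, "days": int},
    "search_docs": {"query": str},
}


def validate_tool_call(raw: str) -> list[str]:
    """Return a list of problems with a raw tool call; empty means it looks valid."""
    # Failure mode 1: the JSON itself is malformed.
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call is not a JSON object"]

    # Failure mode 2: the model invented a tool that does not exist.
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments is not a JSON object"]

    # Failure mode 3: parameters are missing, mistyped, or unexpected.
    schema = TOOLS[name]
    problems = []
    for param, expected in schema.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif type(args[param]) is not expected:
            problems.append(
                f"wrong type for {param}: expected {expected.__name__}, "
                f"got {type(args[param]).__name__}"
            )
    for param in args:
        if param not in schema:
            problems.append(f"unexpected parameter: {param}")
    return problems


if __name__ == "__main__":
    print(validate_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}'))
    print(validate_tool_call('{"name": "book_flight", "arguments": {}}'))
```

Counting how often this returns a non-empty list, per provider, gives you a crude tool-call success rate of your own.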
Clark made it concrete with a live demo, which is a brave thing to do on stage with a non-deterministic system. He fired 213 tool calls at an open-weight model running on one provider and got an error. Then he switched providers by editing a saved preset, same model, same API, same code, and reran. The errors disappeared, because the second provider had a cleaner implementation. OpenRouter monitors this across thousands of API endpoints in real time and routes agents around the providers that are failing, whether the failure is uptime, malformed tool calls, or something else. Same model slug in your code, fewer broken agents in production.
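The routing idea can be illustrated with a toy example. This is not how OpenRouter implements it. It is a minimal sketch of health-based failover, where `ProviderRouter`, the provider names, and the stub callables are all hypothetical. Each provider is just a function that either returns a result or raises, and the router prefers whichever has the best recent success rate.

```python
from collections import deque
from typing import Callable, Deque, Dict


class ProviderRouter:
    """Toy router: try providers in order of recent success rate."""

    def __init__(self, providers: Dict[str, Callable[[str], str]], window: int = 50):
        self.providers = providers
        # Rolling window of recent outcomes (True = success) per provider.
        self.history: Dict[str, Deque[bool]] = {
            name: deque(maxlen=window) for name in providers
        }

    def success_rate(self, name: str) -> float:
        outcomes = self.history[name]
        # Providers with no history yet get the benefit of the doubt.
        return sum(outcomes) / len(outcomes) if outcomes else 1.0

    def call(self, request: str) -> str:
        ranked = sorted(self.providers, key=self.success_rate, reverse=True)
        errors = []
        for name in ranked:
            try:
                result = self.providers[name](request)
            except Exception as exc:  # any failure counts: uptime, bad tool call, ...
                self.history[name].append(False)
                errors.append(f"{name}: {exc}")
                continue
            self.history[name].append(True)
            return result
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for two hosts of the same model.
def flaky_provider(request: str) -> str:
    raise ValueError("malformed tool call")


def clean_provider(request: str) -> str:
    return f"ok:{request}"


if __name__ == "__main__":
    router = ProviderRouter({"provider_a": flaky_provider, "provider_b": clean_provider})
    print(router.call("first"))   # provider_a fails, provider_b answers
    print(router.call("second"))  # provider_b is now ranked first
```

One deliberate simplification: a provider that has only failed is never retried unless every healthier one also fails. A production router would need some form of periodic re-probing so that recovered providers can earn traffic back.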
The gateway problem is now part of your architecture
Stack these three points and a clear conclusion falls out. The hard part of running agents in production is no longer just picking a good model. It is inference quality that varies by provider, tool-call success that varies by provider, and failover when a provider degrades. Those used to be someone else’s problem. With agents as your dominant token consumer, they are now your architecture, whether you build that layer yourself or buy it.
And the cost side deserves its own line on the P&L. The token blowups are not theoretical. Plenty of large companies have already burned through their annual AI budgets early, because they sized the bill for chat and got hit with agents. The fix is to forecast agentic spend for what it is, a multiple of human usage, not an extension of it.
Build for the agents you are about to ship
The chat era is the baseline you are leaving behind. The next year of your AI bill, your reliability problems, and your performance gains will all be driven by agents, and agents behave nothing like a person at a keyboard. Budget tokens like a real and growing line item. Assume your tool-call success rate, not just your model choice, decides whether your agents work. And treat the routing and failover layer as core infrastructure, not a nice-to-have.
The companies that figure this out will ship agents that quietly succeed in the background. The ones that do not will ship agents that fail in ways they cannot see, on a bill they did not plan for.
Top 5 takeaways
1. **Agentic token usage has overtaken human usage.** OpenRouter processes around 28 trillion tokens a week, roughly 1 percent of global inference, and agents are now the dominant consumer.
2. **Agents burn far more tokens than chat.** One agentic turn carries tool definitions, MCP gateway definitions, skill front matter, and reasoning loops, so a single task can dwarf a hundred human chats. Forecast it as a multiple of human usage, not an extension of it.
3. **The same model performs differently depending on who serves it.** Identical weights produce variable benchmark results across providers, driven by the software between the weights and the API, not quantization. Where you source tokens changes the quality you get back.
4. **Tool calling is the load-bearing piece.** On one frontier model family, 55 percent of requests asked for tools, the model used them 83 percent of the time, and 46 percent of completions finished on a tool call. Tool-call success rate, not just model choice, decides whether your agents work.
5. **Routing and failover are now core infrastructure.** Tool-call success varies by provider, so monitoring across endpoints and routing agents around failures in real time is part of your architecture, whether you build it or buy it.
Source: SaaStr













