The Hidden Tax of Noise: What Verbose Terminal Output Costs in the Age of AI Agents

By Jeremy Martinez · June 1, 2026 · 8 min read

An analysis of how much agentic AI systems spend on tokens — and how much of that spend is wasted on error dialogs, stack traces, and other low-value text.

When a developer asks a chatbot a question, the economics are simple: one prompt in, one answer out, a few cents at most. But the way teams actually use large language models (LLMs) in 2026 has shifted decisively from single questions to agents — systems that read a task, take an action, read the result, and loop, often for dozens of turns before they finish. That shift has quietly turned token consumption into a meaningful line item, and it has made the quality of the text agents read into a real cost driver.

This piece looks at three things: how many tokens modern agentic workflows actually burn, what those tokens cost at current API prices, and how much of the bill can be traced to processing noisy, low-information output like error dumps, stack traces, and build logs.

Why agents are uniquely expensive

The headline finding comes from a 2026 study by researchers at the University of Michigan and Stanford's Digital Economy Lab, “How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks.” Analyzing trajectories from eight frontier models on the SWE-bench Verified benchmark, the authors found that agentic tasks are “uniquely expensive, consuming 1000x more tokens than code reasoning and code chat,” and that the cost is driven by input tokens rather than output tokens.¹²

The reason is structural, not incidental. In a one-shot chat, the model reads your question once and answers. In an agentic loop, the model reads the task, takes an action, gets a result, and then has to re-read everything (the original prompt, every file it has opened, every command result, and the full conversation so far) before deciding its next action. Then it re-reads all of that plus the latest result before the action after that. The Stanford summary calls this “one big pricey context snowball.”¹

Cloud-cost analysts at Vantage describe the same mechanic in dollar terms: every API call re-sends the model its entire accumulated context as input tokens, including “every file the agent has read or retrieved, every edit it's made, every error message it's encountered, and the full conversation history up to that point.” Their illustration: a session that sends ~5,000 input tokens on turn one may be carrying 25,000–35,000 input tokens per turn by turn 30.³

Two consequences follow. First, total consumption grows faster than linearly with task length, because each turn pays to re-read a context that keeps getting bigger. Second — and this is the crux for anyone worried about waste — anything that lands in the context early gets paid for again on every subsequent turn. A noisy log read on turn 5 of a 30-turn task isn't billed once; it's billed roughly 25 more times as it rides along in the re-sent context.
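The snowball can be made concrete with a few lines of arithmetic. The sketch below assumes context grows linearly from about 5,000 input tokens on turn 1 to about 30,000 on turn 30 (the middle of the Vantage range); the linear shape and the function names are assumptions for illustration, not measurements.

```python
def per_turn_input(turn, turns=30, start=5_000, end=30_000):
    """Input tokens sent on one turn, assuming linear context growth."""
    step = (end - start) / (turns - 1)
    return start + step * (turn - 1)


def total_input(turns=30, start=5_000, end=30_000):
    """Total input tokens billed over the session (sum of every turn)."""
    return round(sum(per_turn_input(t, turns, start, end)
                     for t in range(1, turns + 1)))


def rereads(arrival_turn, turns=30):
    """Number of later turns that re-send text which entered on arrival_turn."""
    return turns - arrival_turn


if __name__ == "__main__":
    print("total input tokens:", total_input())
    print("flat-context baseline:", 30 * 5_000)
    print("re-reads of a turn-5 log:", rereads(5))
```

Under these assumptions the session bills about 525,000 input tokens, against 150,000 if every turn had stayed at 5,000, and a log that arrives on turn 5 is re-sent on 25 later turns.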

The scale is not hypothetical. An empirical study of more than 100 trillion tokens of real-world LLM traffic, published by a16z and OpenRouter in December 2025, documents how thoroughly multi-step, reasoning-heavy usage has come to dominate real deployments since reasoning models went mainstream.⁴

Table 1 — Token usage by workload

The table below puts the 1000× gap in context. Figures for chat, autocomplete, and one-shot reasoning are illustrative estimates based on typical context sizes; the agentic and SWE-bench ranges reflect the cited research.

Workload	Typical total tokens (per task)	Basis
Single chat question (“explain this function”)	500 – 2,000	Estimate (one round trip)
Autocomplete / short inline edit	1,000 – 5,000	Estimate
One-shot code reasoning (contained problem)	5,000 – 25,000	Estimate
Multi-turn agentic coding task (feature across several files, run tests, iterate)	100,000 – 1,000,000+	Derived from per-turn context growth³
Full SWE-bench-style agentic trajectory	Up to millions; ~1000× code chat/reasoning	Cited¹²

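The jump from the third row to the fourth is the entire story: the same underlying model, pointed at the same kind of problem, costs roughly one to two orders of magnitude more by these estimates once it runs as a loop instead of a single call, and the cited research puts the gap for full benchmark trajectories closer to three.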

Where the noise comes from

Not all of those input tokens carry information the model needs. A large share of what an agent reads from a terminal is boilerplate that contributes nothing to solving the task:

Stack traces and error dialogs. A single uncaught exception can print dozens of frames of internal library paths, most of which are irrelevant to the one line of user code that actually broke.
Install and build logs. A failed npm install, pip install, or container build routinely emits tens of thousands of characters of progress bars, deprecation warnings, and peer-dependency chatter before the one real error.
Test-runner output. A failing suite often reprints every passing test, full diffs, and setup/teardown logging around the handful of assertions that failed.
Redundant logging. Verbose or debug-level output, ANSI color codes, and repeated status lines inflate byte counts without adding signal.

In a human workflow this is merely annoying — you scroll past it. In an agentic workflow it is expensive, because the agent ingests it as input tokens and, per the context-snowball effect, keeps paying to re-read it.
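One way to picture what trimming this kind of output involves is a small filter applied before command output reaches the agent. The sketch below is a hypothetical helper, not part of any agent framework: it strips ANSI escape codes, collapses consecutive duplicate lines, and keeps only the head and tail of long output, on the assumption that the decisive error tends to sit near the end (which will not hold for every tool).

```python
import re

# ANSI CSI sequences: ESC [ parameters final-letter (colors, cursor movement).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def condense(output, head=20, tail=40):
    """Strip ANSI codes, collapse repeated lines, keep head and tail lines."""
    text = ANSI_CSI.sub("", output)
    lines = []
    for line in text.splitlines():
        line = line.rstrip()
        if lines and line == lines[-1]:
            continue  # skip a line identical to the one just kept
        lines.append(line)
    if len(lines) <= head + tail:
        return "\n".join(lines)
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    # Slice from an explicit index so tail=0 keeps nothing from the end.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


if __name__ == "__main__":
    noisy = "\x1b[32mok\x1b[0m\nok\nok\n" + "\n".join(
        f"line {i}" for i in range(100)) + "\nError: boom"
    print(condense(noisy, head=2, tail=1))
```

The head and tail sizes are arbitrary defaults; a real filter would need tuning per tool, since some build systems report the root cause early and bury it under later output.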

Table 2 — Current LLM API pricing

To convert tokens into dollars, the table below lists standard (non-batch) API list prices per 1 million tokens, as published by each provider as of June 2026. Output tokens are consistently several times more expensive than input tokens, but for agentic workloads the input column matters most, since input dominates the bill.¹

Provider / Model	Input ($ / 1M tokens)	Output ($ / 1M tokens)
OpenAI — GPT-5.5 (flagship)	$5.00	$30.00
OpenAI — GPT-5	$1.25	$10.00
Anthropic — Claude Opus 4.7	$5.00	$25.00
Anthropic — Claude Sonnet 4.6	$3.00	$15.00
Anthropic — Claude Haiku 4.5	$1.00	$5.00
Google — Gemini 2.5 Pro (≤200K context)	$1.25	$10.00
Google — Gemini 2.5 Pro (>200K context)	$2.50	$15.00
Google — Gemini 2.5 Flash	$0.30	$2.50

Sources: OpenAI, Anthropic, and Google Gemini pricing pages.⁵⁶⁷ Prices change frequently and exclude batch, cached-input, and priority modifiers.
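Converting a token count into dollars from Table 2 is simple arithmetic, sketched below. The prices are copied from the table; the task size (1,000,000 input tokens and 20,000 output tokens) is a hypothetical chosen only to show how the input column dominates, and the function name is an assumption for illustration.

```python
# (input, output) list prices in USD per 1M tokens, copied from Table 2.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Pro (<=200K)": (1.25, 10.00),
    "Gemini 2.5 Pro (>200K)": (2.50, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def task_cost(model, input_tokens, output_tokens):
    """List-price cost in USD, ignoring batch, caching, and priority modifiers."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic task: 1M input tokens, 20K output tokens.
    for model in PRICES:
        print(f"{model}: ${task_cost(model, 1_000_000, 20_000):.2f}")
```

For this hypothetical task on Claude Sonnet 4.6 the total is $3.30, of which $3.00 is input, even though each output token costs five times as much as an input token.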

Table 3 — The cost of noise (worked examples)

The following estimates quantify what verbose output costs once the context-snowball effect is included. They are illustrative, not measured, and depend on the stated assumptions.

Assumptions

~4 characters per token for English/log text, a standard rule of thumb.⁸
Blended input price of $3.50 per 1M tokens, a rough midpoint of the flagship input rates in Table 2.
Noise is carried for 8 subsequent turns before context is trimmed or summarized — a conservative figure given that real sessions run to 30+ turns.³

Scenario	Raw size	≈ Tokens	Useful signal	Wasted tokens (incl. re-reads ×8)	Wasted cost @ $3.50/1M
Verbose stack trace	~3,000 chars	~750	~75 tokens	(750−75) × 8 ≈ 5,400	≈ $0.019
Failed `npm install` / build log	~60,000 chars	~15,000	~400 tokens	(15,000−400) × 8 ≈ 116,800	≈ $0.41
Failing test suite (full reprint)	~25,000 chars	~6,250	~500 tokens	(6,250−500) × 8 ≈ 46,000	≈ $0.16
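The rows above follow directly from the stated assumptions, and the short sketch below reproduces them so the inputs can be varied. The constants mirror the assumptions list; the scenario sizes and useful-signal figures are the table's illustrative values, and the names are hypothetical.

```python
CHARS_PER_TOKEN = 4      # rule of thumb from the assumptions
PRICE_PER_M = 3.50       # blended input price, USD per 1M tokens
TURNS_BILLED = 8         # multiplier applied to the noise in Table 3

# scenario -> (raw size in characters, useful signal in tokens)
SCENARIOS = {
    "Verbose stack trace": (3_000, 75),
    "Failed npm install / build log": (60_000, 400),
    "Failing test suite": (25_000, 500),
}


def wasted(raw_chars, useful_tokens):
    """Return (wasted tokens, wasted USD) for one noisy output."""
    tokens = raw_chars / CHARS_PER_TOKEN
    wasted_tokens = (tokens - useful_tokens) * TURNS_BILLED
    return wasted_tokens, wasted_tokens * PRICE_PER_M / 1_000_000


if __name__ == "__main__":
    for name, (chars, useful) in SCENARIOS.items():
        tokens, usd = wasted(chars, useful)
        print(f"{name}: {tokens:,.0f} wasted tokens, ${usd:.3f}")
```

Raising TURNS_BILLED toward the 25 re-reads of a long session, or swapping in a flagship input price, scales every row proportionally.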

Scaling to a developer-month (illustrative). Suppose an agent assisting one developer hits roughly 20 noisy command outputs of the build-log magnitude above per active day, across 20 active days per month:

20 incidents/day × 20 days = 400 incidents/month
400 × $0.41 ≈ $164 / developer / month in wasted input tokens

For a 50-developer engineering org running agents at this intensity, that is on the order of $8,000/month — roughly $100,000/year — spent re-reading text the model never needed. These figures are deliberately rounded and should be read as an order-of-magnitude illustration, not a precise forecast; the real number for any team depends on model choice, how aggressively context is trimmed, and how noisy their toolchain is. But the direction is unambiguous: because input is re-sent every turn, trimming low-value output has a multiplied, not one-time, effect on cost.
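The scaling arithmetic is easy to rerun with a team's own numbers. The sketch below uses the article's illustrative inputs as defaults; the function name and parameters are assumptions for illustration.

```python
def monthly_waste(cost_per_incident=0.41, incidents_per_day=20,
                  active_days=20, developers=1):
    """Estimated USD per month spent re-reading noisy output."""
    return cost_per_incident * incidents_per_day * active_days * developers


if __name__ == "__main__":
    per_dev = monthly_waste()
    org = monthly_waste(developers=50)
    print(f"per developer: ${per_dev:,.0f}/month")
    print(f"50 developers: ${org:,.0f}/month, ${org * 12:,.0f}/year")
```

With the defaults this gives $164 per developer per month, $8,200 per month for 50 developers, and $98,400 per year, which the text rounds to $8,000 and $100,000.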

Why this is hard to see — and hard to predict

Part of what makes noise costs insidious is that they are nearly invisible at the moment they occur. You only see token usage after a task finishes, and the bill aggregates signal and noise into one number. The Stanford team found that agents cannot reliably predict their own token consumption before running, which they describe as “the fundamental bottleneck for result-based pricing for agents.”¹ If the systems spending the money can't forecast it, the humans paying the bill have even less visibility.

The takeaway is not that agents are too expensive to use — their productivity gains are precisely why adoption is climbing. It is that, in an architecture where every input token can be re-read many times over the life of a task, the text you feed an agent is not free. Verbose, low-signal terminal output is a tax that compounds with every turn, and at current prices and current usage patterns, that tax is large enough to measure.

Sources

1. Stanford Digital Economy Lab, “How are AI agents spending your tokens?” (May 5, 2026). digitaleconomy.stanford.edu
2. Bai, L., Huang, Z., Wang, X., Sun, J., Mihalcea, R., Brynjolfsson, E., Pentland, A., Pei, J. “How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks.” arXiv:2604.22750 (2026). arxiv.org/abs/2604.22750
3. Vantage, “Agentic Coding Costs.” vantage.sh/blog/agentic-coding-costs
4. Aubakirova, M., Atallah, A., Clark, C., Summerville, J., Midha, A. “State of AI: An Empirical 100 Trillion Token Study with OpenRouter” (a16z & OpenRouter, December 2025). openrouter.ai/state-of-ai
5. OpenAI API pricing. openai.com/api/pricing
6. Anthropic pricing. anthropic.com/pricing
7. Google Gemini API pricing. ai.google.dev/gemini-api/docs/pricing
8. OpenAI, “What are tokens and how to count them?” help.openai.com

Figures labeled “estimate” or “illustrative” are derived for explanatory purposes from the stated assumptions and are not measured values. API prices are as of June 2026 and subject to change.