Token Water-Filling Policies Dictate LLM Agent Workflow Viability
Paper derives water-filling token allocation and shadow-price characterizations for LLM agent reliability under cost and latency constraints.
The arXiv paper models sequential LLM agent pipelines via parametric exponential reliability functions linking token count to output quality under explicit latency and cost bounds (Yang et al., arXiv:2605.23929).
The central result is that optimal pipeline reliability equals the product of per-stage reliabilities once tokens have been water-filled across stages, that is, allocated until every stage that receives tokens yields the same marginal reliability gain per unit cost. Non-LLM modules impose hard latency floors, which shift the shadow prices upward.
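A minimal Python sketch of that water-filling rule, under an assumed functional form: stage i has reliability r_i(t) = 1 - a_i * exp(-b_i * t) at t tokens, a per-token cost c_i, and all stages share one cost budget. The functional form, the parameter names a, b, c, and the helper names are illustrative assumptions rather than the paper's notation, and latency constraints and non-LLM floors are left out. Because overall reliability is a product, equalizing marginal gains in overall reliability per unit cost is equivalent to equalizing marginal gains in log-reliability per unit cost; the shared value is the shadow price lam, located here by bisection.

```python
import math

def stage_tokens(lam, a, b, c):
    """Tokens at which d/dt log r(t) = lam * c for r(t) = 1 - a*exp(-b*t).

    Assumes 0 < a <= 1, b > 0, c > 0. Returns 0 when the stage is not
    worth funding at shadow price lam.
    """
    ratio = a * (b + lam * c) / (lam * c)
    if ratio <= 1.0:
        return 0.0
    return math.log(ratio) / b

def water_fill(stages, budget, iters=200):
    """stages: list of (a, b, c) tuples; budget: total cost allowed.

    Returns (lam, tokens) where lam is the shadow price of the budget.
    """
    def spend(lam):
        return sum(c * stage_tokens(lam, a, b, c) for a, b, c in stages)

    lo, hi = 1e-12, 1e12  # bracket for the shadow price
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect in log space
        if spend(mid) > budget:
            lo = mid  # over budget: the price must rise
        else:
            hi = mid
    tokens = [stage_tokens(hi, a, b, c) for a, b, c in stages]
    return hi, tokens

def reliability(stages, tokens):
    """End-to-end reliability as the product of per-stage reliabilities."""
    return math.prod(1 - a * math.exp(-b * t)
                     for (a, b, _c), t in zip(stages, tokens))

# Hypothetical two-stage pipeline: (a, b, cost per token)
stages = [(0.5, 0.01, 1.0), (0.8, 0.005, 2.0)]
lam, tokens = water_fill(stages, budget=1000.0)
print(lam, tokens, reliability(stages, tokens))
```

A tighter budget raises lam, and any stage whose first token cannot deliver that marginal gain is allocated nothing, which is the mechanism behind per-stage token caps.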
Related work on ReAct-style agents confirms that reasoning token overhead compounds across hops, producing super-linear cost growth not captured in single-call benchmarks (Yao et al., arXiv:2210.03629).
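One way to see the compounding, sketched in Python under a simplifying assumption that is not taken from either paper: every hop appends a fixed number of tokens (thought, action, observation) to the transcript, and the model re-reads the full transcript on each call. The function and parameter names are hypothetical.

```python
def cumulative_prompt_tokens(hops, base_prompt, tokens_per_hop):
    """Total prompt tokens processed over a multi-hop run.

    Assumes each call re-reads the whole transcript and each hop
    appends tokens_per_hop tokens to it.
    """
    total = 0
    context = base_prompt
    for _ in range(hops):
        total += context           # this call reads everything so far
        context += tokens_per_hop  # thought + action + observation appended
    return total

# Doubling the hop count more than doubles the tokens processed.
print(cumulative_prompt_tokens(4, 500, 300))
print(cumulative_prompt_tokens(8, 500, 300))
```

Under this assumption the total is hops * base_prompt + tokens_per_hop * hops * (hops - 1) / 2, so it grows quadratically in the number of hops, while a single-call benchmark only ever measures the first term.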
Production traces from AutoGen deployments further reveal that variance in LLM output length violates the paper's deterministic latency assumption, shrinking the feasible region so that achievable reliability falls below the theoretical optimum (Wu et al., arXiv:2308.08155).
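The effect of length variance on a latency budget can be illustrated with a small Python sketch. It assumes, purely for illustration, that a stage's output length is normally distributed, that decode time per token is constant, and that the deadline must be met with a chosen probability; none of these modeling choices or names come from the cited papers.

```python
from statistics import NormalDist

def max_mean_tokens(deadline_s, floor_s, sec_per_token, std_tokens, on_time_prob):
    """Largest mean output length that meets the deadline with on_time_prob.

    floor_s is fixed non-LLM latency. With std_tokens = 0 this reduces
    to the deterministic budget (deadline_s - floor_s) / sec_per_token.
    """
    z = NormalDist().inv_cdf(on_time_prob)  # safety margin in standard deviations
    return (deadline_s - floor_s) / sec_per_token - z * std_tokens

# Hypothetical stage: 10 s deadline, 2 s non-LLM floor, 20 ms per token.
deterministic = max_mean_tokens(10.0, 2.0, 0.02, std_tokens=0.0, on_time_prob=0.95)
with_variance = max_mean_tokens(10.0, 2.0, 0.02, std_tokens=50.0, on_time_prob=0.95)
print(deterministic, with_variance)
```

The gap between the two numbers is the token budget that has to be held in reserve against long outputs, and tokens held in reserve cannot be spent on reliability.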
AXIOM: Water-filling allocation under exponential reliability will force per-stage token caps, limiting agent depth in cost-sensitive production pipelines.
Sources (3)
- [1] Primary source: Yang et al., arXiv:2605.23929 (https://arxiv.org/abs/2605.23929)
- [2] Related source: Yao et al., arXiv:2210.03629 (https://arxiv.org/abs/2210.03629)
- [3] Related source: Wu et al., arXiv:2308.08155 (https://arxiv.org/abs/2308.08155)