OpenAI cuts GPT-5 input by 17%, expands cached input discount
Standard input drops from $3.00 to $2.50 per 1M. Cached input now $0.625 per 1M, a 50% reduction.
Tokentrendr monitors token pricing across top AI platforms and shows where costs are rising, falling, and creating savings opportunities.
The largest pricing changes in the selected range.
Filterable comparison across input, output, cached input, context window, and recent movement.
| Provider | Cached | Trend | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite budget | GGoogle | $0.075 | $0.300 | — | 1000K | 0.0% | View | ||
| Gemini 2.5 Flash fast | GGoogle | $0.150 | $0.600 | $0.037 | 1000K | -25.0% | View | ||
| Command R fast | CCohere | $0.150 | $0.600 | — | 128K | -25.0% | View | ||
| GPT-5 mini fast | OOpenAI | $0.250 | $2.00 | $0.050 | 400K | -28.6% | View | ||
| DeepSeek V3.2 frontier | DDeepSeek | $0.270 | $1.10 | $0.070 | 128K | -22.9% | View | ||
| Grok 4 mini fast | xxAI | $0.300 | $1.50 | — | 131K | -14.3% | View | ||
| Codestral 2 coding | MMistral | $0.400 | $1.20 | — | 256K | 0.0% | View | ||
| DeepSeek R1 reasoning | DDeepSeek | $0.550 | $2.19 | — | 64K | -45.0% | View | ||
| Llama 4 70B fast | MMeta | $0.590 | $0.790 | — | 128K | 0.0% | View | ||
| Claude 4.5 Haiku fast | AAnthropic | $0.800 | $4.00 | $0.080 | 200K | +14.3% | View | ||
| Sonar Large multimodal | PPerplexity | $1.00 | $1.00 | — | 127K | 0.0% | View | ||
| Gemini 2.5 Pro frontier | GGoogle | $1.25 | $10.00 | $0.310 | 2000K | -10.7% | View | ||
| Mistral Large 3 frontier | MMistral | $2.00 | $6.00 | — | 128K | -20.0% | View | ||
| GPT-5 frontier | OOpenAI | $2.50 | $10.00 | $0.625 | 400K | -16.7% | View | ||
| Command R+ frontier | CCohere | $2.50 | $10.00 | — | 128K | -16.7% | View | ||
| Llama 4 405B frontier | MMeta | $2.70 | $2.70 | — | 128K | -10.0% | View | ||
| Claude 4.5 Sonnet frontier | AAnthropic | $3.00 | $15.00 | $0.300 | 200K | -8.3% | View | ||
| Grok 4 frontier | xxAI | $3.00 | $15.00 | — | 256K | +7.1% | View | ||
| o4 reasoning | OOpenAI | $5.00 | $20.00 | — | 200K | 0.0% | View | ||
| Claude 4.5 Opus reasoning | AAnthropic | $15.00 | $75.00 | $1.50 | 200K | 0.0% | View |
Standard input drops from $3.00 to $2.50 per 1M. Cached input now $0.625 per 1M, a 50% reduction.
Long-context inputs become materially cheaper via sparse attention. R1 also receives a permanent price cut.
Cached input now 1/10th the price of standard input across Sonnet, Haiku, and Opus.
Standard input now $0.15 per 1M tokens. Cached input remains 25% of standard.
Haiku now supports a 200K context window. The price increase reflects the expanded capability.
xAI cites realtime context infrastructure costs. Grok 4 mini remains unchanged.
Long, repeated system or tool-definition blocks can cost 90% less when cached. Restructure prompts so static content comes first.
Most providers offer 50% off when results can return within 24 hours. Ideal for evals, backfills, and enrichment.
Classify intent with a small model, then escalate. A simple router can cut spend 60%+ on mixed workloads.
Output tokens are typically 4–5x the price of input. Set max_tokens, request concise schemas, and prefer structured outputs.
Drop turn-history older than what the model needs. Summarize older state into a short memory block.
Native structured outputs reduce retries and wasted tokens from malformed JSON.
Prices are normalized to USD per 1M tokens. We track input, output, cached input, and batch discounts from official pricing pages and provider changelogs. Movement is calculated against the previous published price.
Read the full methodology