Market preview ·

Track AI model pricing like a market.

Tokentrendr monitors token pricing across top AI platforms and shows where costs are rising, falling, and creating savings opportunities.

Green down = price drop, savings Red up = price increase
Market snapshotUSD / 1M tokens
OA
GPT-5
OpenAI
$2.50
-16.7%
AN
Claude 4.5 Sonnet
Anthropic
$3.00
-8.3%
GG
Gemini 2.5 Pro
Google
$1.25
-10.7%
xA
Grok 4
xAI
$3.00
+7.1%
MS
Mistral Large 3
Mistral
$2.00
-20.0%
DPSDeepSeek R1$0.550-45.0%OAIGPT-5 mini$0.250-28.6%GGLGemini 2.5 Flash$0.150-25.0%COHCommand R$0.150-25.0%DPSDeepSeek V3.2$0.270-22.9%MSTMistral Large 3$2.00-20.0%OAIGPT-5$2.50-16.7%COHCommand R+$2.50-16.7%ANTClaude 4.5 Haiku$0.800+14.3%xAIGrok 4 mini$0.300-14.3%GGLGemini 2.5 Pro$1.25-10.7%MTALlama 4 405B$2.70-10.0%DPSDeepSeek R1$0.550-45.0%OAIGPT-5 mini$0.250-28.6%GGLGemini 2.5 Flash$0.150-25.0%COHCommand R$0.150-25.0%DPSDeepSeek V3.2$0.270-22.9%MSTMistral Large 3$2.00-20.0%OAIGPT-5$2.50-16.7%COHCommand R+$2.50-16.7%ANTClaude 4.5 Haiku$0.800+14.3%xAIGrok 4 mini$0.300-14.3%GGLGemini 2.5 Pro$1.25-10.7%MTALlama 4 405B$2.70-10.0%
Market overview

Today on the board

Providers tracked
9
Models tracked
20
Biggest drop
-45.0%
DeepSeek R1
Biggest increase
+14.3%
Claude 4.5 Haiku
Average move
-11.1%
12 down · 2 up
Last updated
Pricing comparison

All models, all providers

Filterable comparison across input, output, cached input, context window, and recent movement.

Open full table
ProviderCachedTrend
Gemini 2.5 Flash-Lite
budget
GGoogle
$0.075$0.3001000K
0.0%View
Gemini 2.5 Flash
fast
GGoogle
$0.150$0.600$0.0371000K
-25.0%View
Command R
fast
CCohere
$0.150$0.600128K
-25.0%View
GPT-5 mini
fast
OOpenAI
$0.250$2.00$0.050400K
-28.6%View
DeepSeek V3.2
frontier
DDeepSeek
$0.270$1.10$0.070128K
-22.9%View
Grok 4 mini
fast
xxAI
$0.300$1.50131K
-14.3%View
Codestral 2
coding
MMistral
$0.400$1.20256K
0.0%View
DeepSeek R1
reasoning
DDeepSeek
$0.550$2.1964K
-45.0%View
Llama 4 70B
fast
MMeta
$0.590$0.790128K
0.0%View
Claude 4.5 Haiku
fast
AAnthropic
$0.800$4.00$0.080200K
+14.3%View
Sonar Large
multimodal
PPerplexity
$1.00$1.00127K
0.0%View
Gemini 2.5 Pro
frontier
GGoogle
$1.25$10.00$0.3102000K
-10.7%View
Mistral Large 3
frontier
MMistral
$2.00$6.00128K
-20.0%View
GPT-5
frontier
OOpenAI
$2.50$10.00$0.625400K
-16.7%View
Command R+
frontier
CCohere
$2.50$10.00128K
-16.7%View
Llama 4 405B
frontier
MMeta
$2.70$2.70128K
-10.0%View
Claude 4.5 Sonnet
frontier
AAnthropic
$3.00$15.00$0.300200K
-8.3%View
Grok 4
frontier
xxAI
$3.00$15.00256K
+7.1%View
o4
reasoning
OOpenAI
$5.00$20.00200K
0.0%View
Claude 4.5 Opus
reasoning
AAnthropic
$15.00$75.00$1.50200K
0.0%View
Recent updates

Pricing news

All news
OOpenAI
pricing

OpenAI cuts GPT-5 input by 17%, expands cached input discount

Standard input drops from $3.00 to $2.50 per 1M. Cached input now $0.625 per 1M, a 50% reduction.

2026-05-15OpenAI
DDeepSeek
pricing

DeepSeek V3.2 launches with sparse attention, 23% cheaper than V3.1

Long-context inputs become materially cheaper via sparse attention. R1 also receives a permanent price cut.

2026-05-15DeepSeek
AAnthropic
api

Anthropic expands prompt caching to all Claude 4.5 tiers

Cached input now 1/10th the price of standard input across Sonnet, Haiku, and Opus.

2026-05-14Anthropic
GGoogle
pricing

Gemini 2.5 Flash price drops 25% across Vertex and AI Studio

Standard input now $0.15 per 1M tokens. Cached input remains 25% of standard.

2026-05-12Google
AAnthropic
pricing

Claude 4.5 Haiku input rises 14% alongside context window upgrade

Haiku now supports a 200K context window. The price increase reflects the expanded capability.

2026-04-01Anthropic
xxAI
pricing

Grok 4 standard input increases 7%

xAI cites realtime context infrastructure costs. Grok 4 mini remains unchanged.

2026-04-12xAI
Savings library

Tips to cut token spend

Full library
CachingBeginnerHigh impact

Use prompt caching for repeated system prompts

Long, repeated system or tool-definition blocks can cost 90% less when cached. Restructure prompts so static content comes first.

OpenAIAnthropicGoogleDeepSeek
BatchingBeginnerHigh impact

Route async workloads to Batch APIs

Most providers offer 50% off when results can return within 24 hours. Ideal for evals, backfills, and enrichment.

OpenAIAnthropicGoogle
RoutingIntermediateHigh impact

Route by task complexity, not by default to your largest model

Classify intent with a small model, then escalate. A simple router can cut spend 60%+ on mixed workloads.

Prompt designBeginnerMedium impact

Cap output tokens deliberately

Output tokens are typically 4–5x the price of input. Set max_tokens, request concise schemas, and prefer structured outputs.

ContextIntermediateMedium impact

Prune context aggressively

Drop turn-history older than what the model needs. Summarize older state into a short memory block.

Prompt designBeginnerMedium impact

Use structured outputs over freeform JSON parsing

Native structured outputs reduce retries and wasted tokens from malformed JSON.

OpenAIAnthropicGoogle
Methodology

How Tokentrendr measures the market

Prices are normalized to USD per 1M tokens. We track input, output, cached input, and batch discounts from official pricing pages and provider changelogs. Movement is calculated against the previous published price.

Read the full methodology
  • Normalized unitsPer 1M tokens, USD
  • SourcesOfficial pricing pages and changelogs
  • Update cadenceEvery 6 hours via Firecrawl scrape
  • Direction colorsGreen = down (good), Red = up (bad)