LLM Pricing Trends
How AI API costs have evolved since 2023 — and what the trajectory means for teams building on top of LLMs.
Last updated: April 2026. Prices are per million tokens unless noted.
The price compression arc (2023–2026)
2023: The GPT-4 era — premium pricing
GPT-4 launched in March 2023 at $30/M input tokens and $60/M output tokens — prices that reflected both the genuine cost of serving a large model and a significant premium for its category-defining position. Claude 2 and early Gemini launched at comparable levels, establishing the going rate for frontier AI at $30–$60/M.
For most businesses, this meant LLM APIs were viable only for high-value, low-volume use cases. Bulk processing workloads were economically unworkable.
2024: The GPT-4o inflection — multimodal at half price
GPT-4o launched in May 2024 at $5/M input ($2.50 by end of year), delivering comparable quality to GPT-4 at a fraction of the cost. This created a new expectation: frontier-class models should cost single-digit dollars per million tokens, not tens of dollars.
The same period saw Claude 3 Haiku at $0.25/M and Gemini Flash at $0.075/M — models that demonstrated production-viable quality at near-commodity pricing. The "fast-tier" category emerged as a serious segment, not a compromise.
Llama 3 open-weight releases put additional pressure on proprietary pricing: developers could now run comparable models on their own infrastructure, and API providers (Together, Fireworks, Groq) offered hosted inference at $0.20–$0.80/M.
2025: Commoditization accelerates
DeepSeek V3 arrived in late December 2024, priced at $0.27/M input — a 671B-parameter model competitive with GPT-4o at less than 10% of the price. DeepSeek R1 followed in January 2025 for reasoning tasks at $0.55/M, undercutting OpenAI o1 by an order of magnitude. Both are open-weight.
Gemini 2.0 Flash launched at $0.10/M, and Llama 4 Scout at $0.18/M. The fast-tier floor effectively dropped from $0.15/M to $0.06–$0.10/M over the course of the year.
GPT-4.1 launched in April 2025 at $2/M input — well below GPT-4o's original launch price while being meaningfully more capable. OpenAI simultaneously cut GPT-4o prices in response to competitive pressure.
2026: The new landscape
As of April 2026, the market has stratified into clear tiers:
- Mini/nano tier: $0.035–$0.15/M input. Approaching commodity pricing for high-volume workloads.
- Fast tier: $0.06–$0.80/M input. Default choice for most production applications.
- Standard tier: $0.15–$3.00/M input. Balanced performance and cost for quality-sensitive tasks.
- Flagship tier: $1.25–$15.00/M input. Premium pricing persists for frontier capability, but the gap with standard tier has narrowed.
Claude Opus 4 remains at $15/M input — the highest price point on this site — reflecting Anthropic's positioning around capability and safety for enterprise buyers. Simultaneously, the cheapest viable option for many tasks is now under $0.10/M.
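To make that spread concrete, here is a minimal sketch of the per-million-token arithmetic behind these tiers. The input prices are rough midpoints of the ranges above (the flagship row uses Claude Opus 4 list pricing); the output prices and the workload size are illustrative assumptions, not quotes for any specific model.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  price_in: float, price_out: float) -> float:
    """Dollar cost of a workload, with prices in $ per million tokens."""
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

tiers = {
    "mini/nano": (0.10, 0.40),    # ($/M input, $/M output); output assumed at ~4x input
    "fast":      (0.30, 1.20),
    "standard":  (1.50, 6.00),
    "flagship":  (15.00, 75.00),  # Claude Opus 4 list prices
}

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
for name, (p_in, p_out) in tiers.items():
    cost = workload_cost(10_000_000, 2_000_000, p_in, p_out)
    print(f"{name:>9}: ${cost:,.2f}/month")
```

For that hypothetical 10M-in/2M-out monthly workload, the spread runs from $1.80 on the mini tier to $300 on the flagship tier, roughly the two-orders-of-magnitude gap described above.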
Provider strategy analysis
OpenAI: Volume-first pricing with aggressive discounts on mini/nano tiers. GPT-4.1 Nano at $0.10/M is a direct response to DeepSeek and Gemini Flash. OpenAI uses cheap entry tiers to capture developer mindshare, then upsells reasoning models (o3, o4-mini) at higher margins.
Anthropic: Premium positioning on safety, capability, and enterprise trust. Claude Opus remains the most expensive model on the market, targeting regulated industries and use cases where output quality justifies the cost. The Haiku tier is cost-competitive, but Anthropic does not compete at the absolute bottom of the market.
Google: Context window and multimodal differentiation. Gemini 2.5 Flash at $0.15/M with a 1M-token context window is not replicable at that price point by any other provider. Google can subsidize LLM pricing as a strategic asset rather than a standalone business.
DeepSeek: Cost-leader disruption. DeepSeek V3 and R1 forced an industry-wide repricing by proving that frontier-class models could be trained and served at dramatically lower cost. Both are open-weight, further pressuring proprietary providers.
Meta: Llama is an infrastructure play, not an API business. By releasing open weights, Meta commoditizes the model layer, which benefits its broader advertising and social platform business by driving AI development on Meta infrastructure. Llama 4 API pricing through third parties is among the lowest for its capability tier.
Where pricing is heading
Based on the trajectory since 2023, several forces continue to drive prices down:
- Open-weight competition: Every Llama or DeepSeek release sets a new floor for what proprietary providers can charge. Open-weight models are already within 20% of proprietary flagship quality on most benchmarks.
- Cheaper inference: Each new generation of inference hardware (custom silicon, better quantization, speculative decoding) reduces serving cost. These savings do not always reach developers immediately, but competitive pressure forces eventual pass-through.
- Architectural efficiency: Mixture-of-experts architectures (Llama 4, Mixtral, DeepSeek) deliver competitive quality with far fewer active parameters per inference. This trend has more room to run.
- Hyperscaler competition: AWS, Azure, and GCP all offer competing hosted models (Bedrock, Azure OpenAI, Vertex AI), and their hosted pricing effectively caps what first-party providers can charge.
The reasonable expectation is that current fast-tier prices ($0.10–$0.50/M) will become tomorrow's mini-tier prices within 18–24 months. Flagship capability will continue to shift down the price ladder as newer generations arrive.
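As a rough sanity check on that expectation, the sketch below fits a smooth exponential to two data points from the timeline above: frontier input pricing of $30/M (GPT-4, March 2023) and $2/M (GPT-4.1, April 2025). The smooth decline and the $0.30/M fast-tier midpoint are simplifying assumptions, not a forecast.

```python
# Implied annual price decline from two data points in the timeline above:
# $30/M (GPT-4, March 2023) down to $2/M (GPT-4.1, April 2025).
start_price, end_price, years = 30.0, 2.0, 2.0
annual_factor = (end_price / start_price) ** (1 / years)  # ~0.26x per year
print(f"Implied annual decline: ~{(1 - annual_factor) * 100:.0f}% per year")

# Apply that rate to an assumed fast-tier midpoint of $0.30/M input.
for months in (18, 24):
    projected = 0.30 * annual_factor ** (months / 12)
    print(f"Fast-tier midpoint after {months} months: ~${projected:.3f}/M")
```

At the implied ~74% annual decline, the projected fast-tier midpoint lands inside today's mini/nano band ($0.035–$0.15/M) at 18 months and below it at 24, consistent with the expectation above.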
Compare current prices across all providers on the browse page, or use the cost calculator to estimate what your workload would cost today.