The Cheapest LLM APIs in 2026: Full Price Comparison
Compare the most affordable LLM APIs across all major providers. From $0.03/1M tokens to budget tiers from OpenAI, Google, and Anthropic — find the lowest cost option for your workload.
The cheapest LLM APIs available in 2026
The lowest-cost capable LLM APIs in 2026 come primarily from Chinese providers. Doubao Seed 2.0 Mini (ByteDance) leads at $0.03 per million input tokens and $0.29 per million output tokens. OpenAI's open-weight gpt-oss-120b costs $0.04/$0.18 per million input/output tokens and is available on Groq and Together AI. Qwen3.5-Flash (Alibaba) and GPT-5 Nano (OpenAI) both sit at $0.05 per million input tokens, while Groq Llama 3.1 8B offers $0.05/$0.08 with ultra-fast inference.
The $0.06–$0.12 input-price range includes GLM-4.7 Flash (Zhipu, $0.06/$0.40), Gemma 4 26B ($0.06/$0.33), Hunyuan TurboS ($0.11/$0.28), and Groq Llama 4 Scout ($0.11/$0.34). These models are capable enough for most structured tasks and dramatically cheaper than flagship options.
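Because input and output prices differ so much between models, the cheapest input price is not always the cheapest bill. The sketch below ranks a few of the models above for a given workload. The prices are copied from this article; the function names and the example workload (1M requests, 1,000 input and 200 output tokens each) are illustrative assumptions, not figures from any provider.

```python
# Prices in USD per million tokens as (input, output), copied from the article text.
PRICES = {
    "Doubao Seed 2.0 Mini": (0.03, 0.29),
    "gpt-oss-120b": (0.04, 0.18),
    "Groq Llama 3.1 8B": (0.05, 0.08),
    "GLM-4.7 Flash": (0.06, 0.40),
    "Gemma 4 26B": (0.06, 0.33),
    "Hunyuan TurboS": (0.11, 0.28),
    "Groq Llama 4 Scout": (0.11, 0.34),
}


def workload_cost(input_price, output_price, requests, input_tokens, output_tokens):
    """Total USD cost for `requests` calls with the given per-call token counts."""
    input_cost = requests * input_tokens / 1_000_000 * input_price
    output_cost = requests * output_tokens / 1_000_000 * output_price
    return input_cost + output_cost


def rank_by_cost(prices, requests, input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    rows = [
        (name, workload_cost(inp, out, requests, input_tokens, output_tokens))
        for name, (inp, out) in prices.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical workload: 1M requests, 1,000 input and 200 output tokens each.
    for name, cost in rank_by_cost(PRICES, 1_000_000, 1_000, 200):
        print(f"{name:<22} ${cost:,.2f}")
```

For this assumed workload, Groq Llama 3.1 8B comes out cheapest even though Doubao Seed 2.0 Mini has the lower input price, because its output price is much lower. Change the token counts to match your own traffic before drawing conclusions.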
Budget tiers from Western providers
Among major Western providers, the budget landscape has expanded significantly. OpenAI's GPT-5 Nano ($0.05/$0.40) and GPT-5.4 Nano ($0.20/$1.25) are the cheapest options that use the current model generation. Google's Gemini 2.5 Flash-Lite ($0.10/$0.40) and Gemini 3.1 Flash-Lite ($0.25/$1.50) cover the low end of the Gemini family. Anthropic's cheapest current model is Claude Haiku 4.5 at $1.00/$5.00 — more expensive than competitors in this category, but competitive on reasoning quality per dollar.
For teams that need Western-provider compliance, data residency, or enterprise agreements, GPT-5.4 Nano and GPT-5 Nano are usually the best starting point. At the prices above, GPT-5.4 Nano costs one-quarter to one-fifth as much as Claude Haiku 4.5, and GPT-5 Nano roughly one-twelfth to one-twentieth. Both can handle most routing, extraction, and classification tasks.
When cheap is not the right choice
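Low-cost models save money per token but can increase total cost if they fail more often. Retries alone rarely close a 20x price gap: a task that needs three retries on a $0.05/1M model still costs less in tokens than a single successful call on a $1.00/1M model. Once you add the validation logic, latency, and human review that each failure triggers, though, the cheaper model can end up costing more. Evaluate cost per successful task, not cost per token alone.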
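One way to put a number on cost per successful task is to divide the cost of an attempt by the success rate and add whatever each failure costs you outside the API bill. This is a minimal sketch under simplifying assumptions: attempts are independent, you retry until one succeeds, and a single blended price per million tokens stands in for separate input and output prices. The function name, the `overhead_per_failure` parameter, and all example numbers are hypothetical.

```python
def cost_per_successful_task(price_per_million, tokens_per_attempt,
                             success_rate, overhead_per_failure=0.0):
    """Expected USD cost to get one successful result.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempt_cost = tokens_per_attempt / 1_000_000 * price_per_million
    expected_attempts = 1 / success_rate
    expected_failures = expected_attempts - 1
    return attempt_cost * expected_attempts + overhead_per_failure * expected_failures


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 tokens per attempt; the cheap model succeeds
    # one time in four, the expensive model always succeeds.
    cheap_tokens_only = cost_per_successful_task(0.05, 2_000, 0.25)
    cheap_with_overhead = cost_per_successful_task(0.05, 2_000, 0.25,
                                                   overhead_per_failure=0.001)
    expensive = cost_per_successful_task(1.00, 2_000, 1.0)
    print(f"cheap, tokens only:   ${cheap_tokens_only:.4f}")
    print(f"cheap, with overhead: ${cheap_with_overhead:.4f}")
    print(f"expensive:            ${expensive:.4f}")
```

With these assumed inputs the cheap model wins on tokens alone, and loses once each failure carries even a tenth of a cent of extra handling cost. The crossover point depends entirely on your own success rates and failure overhead, so measure both before choosing a tier.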
The cheapest tier is appropriate for: routing and intent detection, structured data extraction from templated inputs, simple classification and tagging, summarization of short documents, and background batch jobs with moderate quality requirements. Move to a higher tier when answer accuracy or reasoning depth changes the user outcome.
How to reduce costs without switching models
Prompt caching can cut input costs by 50–90 percent for workloads that repeat a large system prompt or knowledge base across many requests. Most major providers now offer caching at a significant discount. Batch APIs — available from OpenAI and Anthropic — reduce cost by 50 percent for workloads that tolerate asynchronous processing.
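The effect of these discounts on an input bill can be estimated with a few lines of arithmetic. The sketch below is a rough model, not any provider's billing formula: it assumes a fixed fraction of input tokens is served from cache at a flat discount, ignores cache-write surcharges and cache expiry, and assumes the batch discount multiplies on top of the caching discount, which may not hold for every provider. All parameter names and example values are hypothetical.

```python
def effective_input_cost(input_tokens, price_per_million,
                         cached_fraction=0.0, cache_discount=0.0,
                         batch_discount=0.0):
    """Estimated USD input cost after caching and batch discounts.

    cached_fraction: share of input tokens read from cache (0 to 1).
    cache_discount:  discount on cached tokens, e.g. 0.9 for 90 percent off.
    batch_discount:  discount for batch processing, e.g. 0.5 for 50 percent off.
    """
    for value in (cached_fraction, cache_discount, batch_discount):
        if not 0 <= value <= 1:
            raise ValueError("fractions and discounts must be in [0, 1]")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached tokens are billed at the discounted rate, the rest at full price.
    billable_tokens = uncached + cached * (1 - cache_discount)
    return billable_tokens / 1_000_000 * price_per_million * (1 - batch_discount)


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens at $1.00/1M, 80 percent cacheable.
    baseline = effective_input_cost(10_000_000, 1.00)
    cached = effective_input_cost(10_000_000, 1.00,
                                  cached_fraction=0.8, cache_discount=0.9)
    cached_and_batched = effective_input_cost(10_000_000, 1.00,
                                              cached_fraction=0.8,
                                              cache_discount=0.9,
                                              batch_discount=0.5)
    print(f"baseline:        ${baseline:.2f}")
    print(f"with caching:    ${cached:.2f}")
    print(f"caching + batch: ${cached_and_batched:.2f}")
```

Check each provider's pricing page for the actual cached-token rate, any charge for writing to the cache, and whether batch and caching discounts can be combined.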
Output length discipline is often overlooked. Setting a firm `max_tokens` limit, using structured output formats that eliminate filler text, and splitting multi-step tasks so only the final synthesis uses a long output can reduce output token counts by 20–50 percent. The calculator lets you model these assumptions directly.
Estimate your own workload
Use the calculator to compare your expected API bill with a purchased or rented GPU setup.