Cloud LLM API Pricing
Browse provider-specific LLM API pricing pages for OpenAI, Anthropic, Google Gemini, DeepSeek, Groq, Together AI, and other hosted model platforms. Each page lists model rates per 1M tokens and example monthly costs for common token volumes.
Pricing verified 2026-06-06. Always confirm final rates with the provider before making production decisions.
Core provider pricing pages
OpenAI
The most widely used LLM API. GPT-5.5 and GPT-5.4 are the current frontier tiers; o3-pro and o4-mini lead the reasoning tier. OSS 120B is an open-weight option at near-free pricing.
Anthropic
Claude 4 models excel at agentic coding, reasoning, and long documents. Claude Opus 4.8 is the latest flagship; Sonnet 4.6 offers the best balance of speed and intelligence, with a 1M-token context window.
Google Gemini
Gemini 3.5 Flash is the latest frontier model. The 2.5 series offers excellent price/performance with a 1M-token context window. Gemma 4 open-weight models are among the best open-source options.
Groq
Ultra-fast inference on dedicated LPU hardware — up to 1,000 tokens/sec. Best for latency-sensitive production apps. Predictable, linear pricing.
Together AI
Best prices for the latest open-source and frontier models. Supports Kimi K2.5, DeepSeek V4 Pro, Qwen3.5, and Llama 4 via serverless inference.
DeepSeek
Chinese AI lab delivering frontier open-source models at industry-leading prices. V4 Flash is the default chat tier; R1-0528 is the latest reasoning snapshot with improved performance.
More cloud LLM providers
Qwen (Alibaba)
Alibaba's Qwen series via DashScope API. Qwen3.7 Max is the latest flagship; Qwen3 Coder 480B leads on coding tasks; Qwen3.5-Flash is among the cheapest capable models globally.
Doubao (ByteDance)
ByteDance's Doubao model family, served via Volcano Engine API. Seed 2.0 Pro is the flagship with 256K context; Seed 2.0 Mini is one of the lowest-cost options for high-volume chat and classification.
Kimi (Moonshot AI)
Moonshot AI's Kimi series, known for exceptional long-context understanding and code generation. Kimi K2.6 is the latest flagship; K2.5 offers 262K context at lower cost.
Zhipu AI (GLM)
Tsinghua-affiliated AI lab. GLM-5.1 is the latest flagship; GLM-4.7 Flash delivers extreme cost efficiency for high-throughput workloads.
Hunyuan (Tencent)
Tencent Cloud's Hunyuan models. TurboS is the high-speed inference variant; T1 adds chain-of-thought reasoning. Both are priced in CNY and stay inexpensive after conversion to USD: TurboS offers 256K context at $0.11/$0.28 per 1M tokens (input/output).
How to compare provider pricing
Token prices are only one part of the final bill. Compare input and output rates, context length, model quality, retries, cache behavior, latency, and whether your workload can use a smaller model.
Estimate your monthly LLM API bill
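As a rough illustration of how per-1M-token rates turn into a monthly figure, the sketch below multiplies token volumes by input and output rates, then adjusts for retries and cached input. The function name, parameter names, workload volumes, retry rate, and cache discount are all assumptions made for this example, not values published by any provider; the only figures taken from this page are the TurboS rates, read here as input/output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Field names are illustrative, not a provider schema."""
    input_per_m: float
    output_per_m: float


def estimate_monthly_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    retry_rate: float = 0.0,
    cached_fraction: float = 0.0,
    cache_discount: float = 0.0,
) -> float:
    """Rough monthly bill in USD.

    retry_rate: assumed fraction of requests sent again (0.05 = 5% extra traffic).
    cached_fraction: assumed share of input tokens served from a prompt cache.
    cache_discount: assumed price reduction on cached input (0.5 = half price).
    """
    traffic = 1.0 + retry_rate
    # Blend full-price and discounted input tokens into one multiplier.
    input_multiplier = 1.0 - cached_fraction * cache_discount
    input_cost = input_tokens / 1_000_000 * rates.input_per_m * input_multiplier
    output_cost = output_tokens / 1_000_000 * rates.output_per_m
    return (input_cost + output_cost) * traffic


# Example rates from the Hunyuan TurboS entry above, read as input/output.
turbos = Rates(input_per_m=0.11, output_per_m=0.28)

# Hypothetical workload: 500M input and 100M output tokens per month.
base = estimate_monthly_cost(turbos, 500_000_000, 100_000_000)
with_retries = estimate_monthly_cost(
    turbos, 500_000_000, 100_000_000, retry_rate=0.05
)
print(f"base: ${base:.2f}, with 5% retries: ${with_retries:.2f}")
```

With these assumed volumes the base estimate comes to about $83 per month, and a 5% retry rate adds roughly $4. Real bills also depend on how each provider meters cached tokens, reasoning tokens, and minimum charges, so treat the result as a starting point for comparison only.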