LLM Cost Calculator

RAG Pipeline LLM API Cost Estimator

Estimate monthly LLM API costs for a Retrieval-Augmented Generation (RAG) pipeline. RAG pipelines consume more input tokens than plain chat because retrieved context is added to every prompt.

Recommended Setup

Model: Claude Haiku 4.5 (Anthropic)
Monthly tokens: 230M (200M in / 30M out)
Estimated monthly cost: $350
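
The estimate above can be reproduced with a simple per-token formula. The sketch below is an illustration, not the calculator's actual code: the function name `monthly_cost` is hypothetical, the $1 per 1M input price comes from the FAQ on this page, and the $5 per 1M output price is an assumption back-solved from the $350 total.

```python
def monthly_cost(input_tokens_m, output_tokens_m, input_price, output_price):
    """Return the monthly cost in USD.

    Token counts are in millions; prices are in USD per 1M tokens.
    """
    return input_tokens_m * input_price + output_tokens_m * output_price


# 200M input and 30M output tokens per month, as in the recommended setup.
# Input price: $1 per 1M (stated in the FAQ).
# Output price: $5 per 1M (ASSUMPTION, inferred from the $350 total).
cost = monthly_cost(200, 30, input_price=1.00, output_price=5.00)
print(f"Estimated monthly cost: ${cost:,.2f}")
```

Because input tokens dominate in RAG workloads, the input price tends to matter more here than it would for a chatbot with the same output volume.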

Cost Comparison: All Cloud Models

Based on 200M input + 30M output tokens/month

Model | Provider | Monthly cost
Gemma 3 4B | Google | $10.40
Gemma 3 12B | Google | $11.90
Llama 3.1 8B (Groq) | Groq | $12.40
gpt-oss-120b | OpenAI | $13.40
Doubao Seed 2.0 Mini | Doubao (ByteDance) | $14.70
Gemma 3n 4B | Google | $15.60
Gemma 3 27B | Google | $20.80
Gemma 4 26B A4B | Google | $21.90
GPT-5 Nano | OpenAI | $22.00
Qwen3.5-Flash | Qwen (Alibaba) | $22.00
GLM-4.7 Flash | Zhipu AI (GLM) | $24.00
DeepSeek V4 Flash | DeepSeek | $26.00
Qwen3 14B | Qwen (Alibaba) | $27.20
Doubao Pro 32K | Doubao (ByteDance) | $30.40
Hunyuan TurboS | Hunyuan (Tencent) | $30.40
Qwen3 30B A3B | Qwen (Alibaba) | $31.50
GPT-4.1 Nano | OpenAI | $32.00
Gemini 2.5 Flash-Lite | Google | $32.00
Llama 4 Scout (Groq) | Groq | $32.20
Qwen3 VL 32B Instruct | Qwen (Alibaba) | $32.60
Gemma 4 31B | Google | $34.80
Hunyuan T1 | Hunyuan (Tencent) | $44.80
Qwen3 Coder Next | Qwen (Alibaba) | $46.00
GPT-4o-mini | OpenAI | $48.00
GPT-OSS 120B (Groq) | Groq | $48.00
DeepSeek V3.2 | DeepSeek | $56.20
DeepSeek V3 | DeepSeek | $64.00
DeepSeek V3.1 | DeepSeek | $65.70
Qwen3 VL 235B A22B Instruct | Qwen (Alibaba) | $66.40
Qwen3.6 Flash | Qwen (Alibaba) | $71.90
Qwen2.5 VL 72B Instruct | Qwen (Alibaba) | $72.50
Qwen3.5-Plus | Qwen (Alibaba) | $75.40
Qwen3 32B (Groq) | Groq | $75.70
GPT-5.4 Nano | OpenAI | $77.50
Qwen2.5 72B Instruct | Qwen (Alibaba) | $84.00
DeepSeek V3 (Mar 2025) | DeepSeek | $87.00
Gemini 3.1 Flash-Lite | Google | $95.00
Qwen3 Coder 480B A35B | Qwen (Alibaba) | $98.00
GPT-5 Mini | OpenAI | $110
DeepSeek V4 Pro | DeepSeek | $114
Qwen3.6 Plus | Qwen (Alibaba) | $125
GPT-4.1 Mini | OpenAI | $128
Qwen3.7 Plus | Qwen (Alibaba) | $128
Gemini 2.5 Flash | Google | $135
Kimi K2.5 | Kimi (Moonshot AI) | $137
Llama 3.3 70B (Groq) | Groq | $142
Qwen3-Max | Qwen (Alibaba) | $145
Qwen2.5 Coder 32B Instruct | Qwen (Alibaba) | $162
R1 0528 | DeepSeek | $165
Doubao Seed 2.0 Pro | Doubao (ByteDance) | $165
Kimi K2 | Kimi (Moonshot AI) | $183
Kimi K2.5 (Together) | Together AI | $184
Kimi K2 Thinking | Kimi (Moonshot AI) | $195
Llama 3.3 70B (Together) | Together AI | $202
DeepSeek R1 | DeepSeek | $215
Qwen3 Coder Plus | Qwen (Alibaba) | $228
Qwen3.5 397B (Together) | Together AI | $228
Qwen3 Max Thinking | Qwen (Alibaba) | $273
Claude 3.5 Haiku | Anthropic | $280
GPT-5.4 Mini | OpenAI | $285
GLM-5 | Zhipu AI (GLM) | $296
Kimi K2.6 | Kimi (Moonshot AI) | $320
Claude Haiku 4.5 (★ recommended) | Anthropic | $350
o4-mini | OpenAI | $352
o3 Mini | OpenAI | $352
GLM-5-Turbo | Zhipu AI (GLM) | $360
Qwen3.7 Max | Qwen (Alibaba) | $363
GLM-5.1 | Zhipu AI (GLM) | $412
GPT-5 | OpenAI | $550
GPT-5 Codex | OpenAI | $550
Gemini 2.5 Pro | Google | $550
Moonshot V1 (128K) | Kimi (Moonshot AI) | $550
DeepSeek V4 Pro (Together) | Together AI | $552
Gemini 3.5 Flash | Google | $570
GPT-4.1 | OpenAI | $640
o3 | OpenAI | $640
o4 Mini Deep Research | OpenAI | $640
Gemini 3.1 Pro Preview | Google | $760
GPT-4o | OpenAI | $800
GPT-5.4 | OpenAI | $950
Claude Sonnet 4.6 | Anthropic | $1,050
Claude Sonnet 4.5 | Anthropic | $1,050
Claude Sonnet 4 | Anthropic | $1,050
Claude Opus 4.7 | Anthropic | $1,750
Claude Opus 4.6 | Anthropic | $1,750
Claude Opus 4.8 | Anthropic | $1,750
Claude Opus 4.5 | Anthropic | $1,750
GPT-5.5 | OpenAI | $1,900
o3 Deep Research | OpenAI | $3,200
Claude Opus 4.8 (Fast) | Anthropic | $3,500
o1 | OpenAI | $4,800
Claude Opus 4.1 | Anthropic | $5,250
Claude Opus 4 | Anthropic | $5,250
o3 Pro | OpenAI | $6,400
GPT-5 Pro | OpenAI | $6,600
Claude Opus 4.7 (Fast) | Anthropic | $10,500
Claude Opus 4.6 (Fast) | Anthropic | $10,500
GPT-5.5 Pro | OpenAI | $11,400
GPT-5.4 Pro | OpenAI | $11,400
o1-pro | OpenAI | $48,000

Frequently Asked Questions

Why are RAG pipeline token costs higher than regular chatbots?

RAG injects retrieved document chunks into every prompt, and each retrieval adds 500–2,000 tokens of context. A moderate RAG system serving 5,000 queries/day can consume 100M–500M input tokens per month.
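
To see where volumes of that size come from, the arithmetic can be sketched as below. This is a rough illustration under stated assumptions: the function `monthly_input_tokens` and its parameters are hypothetical, a month is taken as 30 days, and each query is assumed to trigger a single retrieval unless `retrievals_per_query` says otherwise.

```python
def monthly_input_tokens(queries_per_day, context_tokens,
                         retrievals_per_query=1, prompt_tokens=0, days=30):
    """Estimate monthly input tokens for a RAG workload.

    context_tokens: tokens added by one retrieval (500 to 2,000 per the text)
    prompt_tokens:  instructions plus the user question (ASSUMPTION, default 0)
    """
    per_query = context_tokens * retrievals_per_query + prompt_tokens
    return queries_per_day * per_query * days


low = monthly_input_tokens(5_000, 500)     # smallest retrieval size
high = monthly_input_tokens(5_000, 2_000)  # largest retrieval size
print(f"Retrieved context alone: {low / 1e6:.0f}M to {high / 1e6:.0f}M tokens/month")
```

With one retrieval per query, retrieved context alone accounts for 75M to 300M tokens per month. The higher end of the range quoted above presumably also includes system instructions, the user's question, or more than one retrieval per query.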

Which LLM is best for RAG pipelines?

Claude Haiku 4.5 ($1/1M input tokens) and GPT-5.4 Nano ($0.20/1M input) are popular choices. For RAG with very long contexts, Gemini 3.5 Flash ($1.50/1M input) supports 1M tokens per request and offers excellent price/performance. For budget-focused RAG, Gemini 2.5 Flash-Lite ($0.10/1M input) is the cheapest option with a 1M-token context.
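
Only input prices are quoted here, but the output price each total implies can be back-solved from the comparison table (200M input + 30M output tokens/month). The sketch below does that under the assumption that each table total is simply input cost plus output cost. The resulting output prices are inferred, not published figures, and `implied_output_price` is a hypothetical helper.

```python
INPUT_M, OUTPUT_M = 200, 30  # millions of tokens per month, from the table

# model: (input price per 1M from this FAQ, monthly total from the table)
models = {
    "Claude Haiku 4.5": (1.00, 350.00),
    "GPT-5.4 Nano": (0.20, 77.50),
    "Gemini 3.5 Flash": (1.50, 570.00),
    "Gemini 2.5 Flash-Lite": (0.10, 32.00),
}


def implied_output_price(input_price, monthly_total):
    """Back-solve the USD price per 1M output tokens from a monthly total."""
    input_cost = INPUT_M * input_price
    return (monthly_total - input_cost) / OUTPUT_M


for name, (input_price, total) in models.items():
    out_price = implied_output_price(input_price, total)
    input_share = INPUT_M * input_price / total
    print(f"{name}: ~${out_price:.2f}/1M output (inferred), "
          f"input is {input_share:.0%} of the bill")
```

The input share is worth checking for any candidate model, since a low input price matters most when retrieved context makes up the bulk of the tokens.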

Should I self-host the LLM for my RAG pipeline?

If your RAG pipeline consumes 500M+ tokens/month, a self-hosted A100 or two RTX 4090s may become cost-competitive. Use this calculator to find your break-even point.
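
A break-even point can be roughed out by dividing a fixed monthly self-hosting cost by the blended API price per token. The sketch below is illustrative only: `break_even_tokens_m` is a hypothetical helper, the blended price is derived from the recommended setup on this page ($350 for 230M tokens), and the $1,200/month self-hosting figure is a placeholder, not a quoted hardware price. It also assumes the self-hosted model delivers comparable quality and enough throughput.

```python
def break_even_tokens_m(self_host_monthly_usd, api_price_per_m):
    """Monthly volume (millions of tokens) above which self-hosting is cheaper.

    Treats self-hosting as a fixed monthly cost and the API as purely
    per-token, which is a simplification.
    """
    if api_price_per_m <= 0:
        raise ValueError("api_price_per_m must be positive")
    return self_host_monthly_usd / api_price_per_m


# Blended API price: $350 for 230M tokens (recommended setup on this page).
blended_price = 350 / 230

# HYPOTHETICAL placeholder for hardware amortization, power, and operations.
self_host_cost = 1_200.0

volume = break_even_tokens_m(self_host_cost, blended_price)
print(f"Break-even at roughly {volume:.0f}M tokens/month")
```

Replace the placeholder with your own hardware, power, and staffing costs. A cheaper API model pushes the break-even volume higher, while a more expensive one pulls it lower.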