LLM Cost Calculator

Document Summarization LLM Cost Estimator

How much does it cost to summarize legal contracts, research papers, or support tickets at scale? This calculator breaks down API costs for long-document summarization pipelines.

Recommended Setup

Model: Claude Haiku 4.5 (Anthropic)
Monthly tokens: 220M (200M input / 20M output)
Estimated monthly cost: $300
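The $300 estimate is input volume times the input rate plus output volume times the output rate. The FAQ below quotes Claude Haiku 4.5 at $1 per 1M input tokens. The output rate is not stated on this page, but the $300 total implies $5 per 1M output tokens, which matches Anthropic's published rate for Haiku 4.5. With volumes in millions of tokens and rates in USD per million, the arithmetic is:

```latex
\[
C = T_{\text{in}}\,p_{\text{in}} + T_{\text{out}}\,p_{\text{out}}
  = 200 \times \$1.00 + 20 \times \$5.00
  = \$200 + \$100 = \$300
\]
```

Every row in the comparison table applies the same formula with that model's own input and output rates.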

Cost Comparison: All Cloud Models

Based on 200M input + 20M output tokens/month

Model | Provider | Monthly cost (USD)
--- | --- | ---
Gemma 3 4B | Google | $9.60
Gemma 3 12B | Google | $10.60
gpt-oss-120b | OpenAI | $11.60
Llama 3.1 8B (Groq) | Groq | $11.60
Doubao Seed 2.0 Mini | Doubao (ByteDance) | $11.80
Gemma 3n 4B | Google | $14.40
GPT-5 Nano | OpenAI | $18.00
Qwen3.5-Flash | Qwen (Alibaba) | $18.00
Gemma 4 26B A4B | Google | $18.60
Gemma 3 27B | Google | $19.20
GLM-4.7 Flash | Zhipu AI (GLM) | $20.00
DeepSeek V4 Flash | DeepSeek | $24.00
Qwen3 14B | Qwen (Alibaba) | $24.80
Qwen3 30B A3B | Qwen (Alibaba) | $27.00
Doubao Pro 32K | Doubao (ByteDance) | $27.60
Hunyuan TurboS | Hunyuan (Tencent) | $27.60
GPT-4.1 Nano | OpenAI | $28.00
Gemini 2.5 Flash-Lite | Google | $28.00
Qwen3 VL 32B Instruct | Qwen (Alibaba) | $28.40
Llama 4 Scout (Groq) | Groq | $28.80
Gemma 4 31B | Google | $31.20
Qwen3 Coder Next | Qwen (Alibaba) | $38.00
Hunyuan T1 | Hunyuan (Tencent) | $39.20
GPT-4o-mini | OpenAI | $42.00
GPT-OSS 120B (Groq) | Groq | $42.00
DeepSeek V3.2 | DeepSeek | $52.80
DeepSeek V3 | DeepSeek | $56.00
Qwen3 VL 235B A22B Instruct | Qwen (Alibaba) | $57.60
DeepSeek V3.1 | DeepSeek | $57.80
Qwen3.6 Flash | Qwen (Alibaba) | $60.60
GPT-5.4 Nano | OpenAI | $65.00
Qwen2.5 VL 72B Instruct | Qwen (Alibaba) | $65.00
Qwen3.5-Plus | Qwen (Alibaba) | $67.60
Qwen3 32B (Groq) | Groq | $69.80
DeepSeek V3 (Mar 2025) | DeepSeek | $76.00
Gemini 3.1 Flash-Lite | Google | $80.00
Qwen3 Coder 480B A35B | Qwen (Alibaba) | $80.00
Qwen2.5 72B Instruct | Qwen (Alibaba) | $80.00
GPT-5 Mini | OpenAI | $90.00
Qwen3.6 Plus | Qwen (Alibaba) | $105
DeepSeek V4 Pro | DeepSeek | $105
Gemini 2.5 Flash | Google | $110
GPT-4.1 Mini | OpenAI | $112
Qwen3.7 Plus | Qwen (Alibaba) | $112
Kimi K2.5 | Kimi (Moonshot AI) | $118
Qwen3-Max | Qwen (Alibaba) | $126
Llama 3.3 70B (Groq) | Groq | $134
Doubao Seed 2.0 Pro | Doubao (ByteDance) | $141
R1 0528 | DeepSeek | $143
Qwen2.5 Coder 32B Instruct | Qwen (Alibaba) | $152
Kimi K2.5 (Together) | Together AI | $156
Kimi K2 | Kimi (Moonshot AI) | $160
Kimi K2 Thinking | Kimi (Moonshot AI) | $170
DeepSeek R1 | DeepSeek | $190
Qwen3.5 397B (Together) | Together AI | $192
Llama 3.3 70B (Together) | Together AI | $194
Qwen3 Coder Plus | Qwen (Alibaba) | $195
Qwen3 Max Thinking | Qwen (Alibaba) | $234
GPT-5.4 Mini | OpenAI | $240
Claude 3.5 Haiku | Anthropic | $240
GLM-5 | Zhipu AI (GLM) | $264
Kimi K2.6 | Kimi (Moonshot AI) | $280
Claude Haiku 4.5 (★ recommended) | Anthropic | $300
o4-mini | OpenAI | $308
o3 Mini | OpenAI | $308
GLM-5-Turbo | Zhipu AI (GLM) | $320
Qwen3.7 Max | Qwen (Alibaba) | $325
GLM-5.1 | Zhipu AI (GLM) | $368
GPT-5 | OpenAI | $450
GPT-5 Codex | OpenAI | $450
Gemini 2.5 Pro | Google | $450
Gemini 3.5 Flash | Google | $480
Moonshot V1 (128K) | Kimi (Moonshot AI) | $500
DeepSeek V4 Pro (Together) | Together AI | $508
GPT-4.1 | OpenAI | $560
o3 | OpenAI | $560
o4 Mini Deep Research | OpenAI | $560
Gemini 3.1 Pro Preview | Google | $640
GPT-4o | OpenAI | $700
GPT-5.4 | OpenAI | $800
Claude Sonnet 4.6 | Anthropic | $900
Claude Sonnet 4.5 | Anthropic | $900
Claude Sonnet 4 | Anthropic | $900
Claude Opus 4.7 | Anthropic | $1,500
Claude Opus 4.6 | Anthropic | $1,500
Claude Opus 4.8 | Anthropic | $1,500
Claude Opus 4.5 | Anthropic | $1,500
GPT-5.5 | OpenAI | $1,600
o3 Deep Research | OpenAI | $2,800
Claude Opus 4.8 (Fast) | Anthropic | $3,000
o1 | OpenAI | $4,200
Claude Opus 4.1 | Anthropic | $4,500
Claude Opus 4 | Anthropic | $4,500
GPT-5 Pro | OpenAI | $5,400
o3 Pro | OpenAI | $5,600
Claude Opus 4.7 (Fast) | Anthropic | $9,000
Claude Opus 4.6 (Fast) | Anthropic | $9,000
GPT-5.5 Pro | OpenAI | $9,600
GPT-5.4 Pro | OpenAI | $9,600
o1-pro | OpenAI | $42,000
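The table is a ranking of one formula (input volume × input rate + output volume × output rate) applied across models. A minimal sketch of that ranking for five rows follows. The per-1M-token rates in `PRICES` are an assumption on our part: they are the providers' list prices as we understand them, chosen because each pair reproduces its row's total at 200M input + 20M output tokens. Check current provider pricing pages before reusing them.

```python
from typing import NamedTuple


class Pricing(NamedTuple):
    """List price in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


# Assumed per-1M-token list prices; each pair reproduces its row in the table above.
PRICES: dict[str, Pricing] = {
    "GPT-4o-mini": Pricing(0.15, 0.60),
    "Gemini 2.5 Flash": Pricing(0.30, 2.50),
    "Claude Haiku 4.5": Pricing(1.00, 5.00),
    "GPT-4.1": Pricing(2.00, 8.00),
    "Claude Sonnet 4.5": Pricing(3.00, 15.00),
}


def monthly_cost(price: Pricing, input_m: float, output_m: float) -> float:
    """USD per month for input_m million input and output_m million output tokens."""
    return input_m * price.input_per_m + output_m * price.output_per_m


def rank_models(prices: dict[str, Pricing], input_m: float,
                output_m: float) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = [(name, round(monthly_cost(p, input_m, output_m), 2))
             for name, p in prices.items()]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Same workload as the table: 200M input + 20M output tokens per month.
    for name, cost in rank_models(PRICES, input_m=200, output_m=20):
        print(f"{name:<20} ${cost:,.2f}")
```

Because output tokens are priced several times higher than input tokens, a workload with longer summaries can reorder models that are close in input price; rerunning `rank_models` with your own volumes shows this directly.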

Frequently Asked Questions

How many tokens does a typical document summarization job use?

A 10-page PDF averages 4,000–8,000 input tokens, and a 1-page summary is roughly 300–600 output tokens. Summarizing 10,000 documents a month therefore uses 40–80M input tokens and 3–6M output tokens. Long legal contracts or research papers can exceed 50,000 input tokens per document.
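The monthly figures are the per-document ranges multiplied by document count. A minimal sketch of that scaling, using the answer's own numbers (the `TokenRange` type and function name are illustrative, not part of the calculator):

```python
from typing import NamedTuple


class TokenRange(NamedTuple):
    """Low/high estimate of a token count."""
    low: int
    high: int


def monthly_tokens(docs_per_month: int, per_doc: TokenRange) -> TokenRange:
    """Scale a per-document token range up to a monthly total."""
    return TokenRange(docs_per_month * per_doc.low, docs_per_month * per_doc.high)


DOCS_PER_MONTH = 10_000
INPUT_PER_DOC = TokenRange(4_000, 8_000)  # 10-page PDF, per the FAQ
OUTPUT_PER_DOC = TokenRange(300, 600)     # 1-page summary, per the FAQ

input_total = monthly_tokens(DOCS_PER_MONTH, INPUT_PER_DOC)
output_total = monthly_tokens(DOCS_PER_MONTH, OUTPUT_PER_DOC)

print(f"input:  {input_total.low / 1e6:.0f}M to {input_total.high / 1e6:.0f}M tokens")
print(f"output: {output_total.low / 1e6:.0f}M to {output_total.high / 1e6:.0f}M tokens")
```

Swapping in a per-document input range for long contracts shows how quickly input volume, rather than output, comes to dominate the bill.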

Which model handles long documents best?

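For very long documents (50k–200k tokens), Gemini 3.5 Flash (1M-token context, $1.50 per 1M input tokens) or Claude models with 200k-token context windows are good fits. The context window must hold the prompt and the generated summary as well as the document, so a document near 200k tokens needs Gemini's larger window; Gemini models support up to 1M tokens per request. For medium-length documents (under 32k tokens), Claude Haiku 4.5 ($1 per 1M input tokens) gives excellent quality at low cost.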
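The length guidance above can be expressed as a small routing function. This is a hypothetical sketch, not part of the calculator: the 200k and 1M windows and the 32k cutoff come from the answer, while the 4,000-token reserve for instructions plus summary is our assumption. The answer does not cover 32k–50k tokens, so this sketch assumes that band goes to the long-document tier.

```python
# Context limits and the medium-length cutoff as stated in the FAQ answer above.
CLAUDE_CONTEXT = 200_000
GEMINI_CONTEXT = 1_000_000
MEDIUM_CUTOFF = 32_000

# Assumption: tokens to leave free for instructions plus the generated summary.
RESERVED_TOKENS = 4_000


def pick_model(doc_tokens: int, reserved: int = RESERVED_TOKENS) -> str:
    """Hypothetical router that applies the FAQ's length guidance to one document."""
    needed = doc_tokens + reserved  # the window must hold input and output together
    if doc_tokens < MEDIUM_CUTOFF:
        return "Claude Haiku 4.5"
    if needed <= CLAUDE_CONTEXT:
        return "Claude (200k) or Gemini 3.5 Flash"
    if needed <= GEMINI_CONTEXT:
        return "Gemini 3.5 Flash"
    return "split into chunks"  # too long for any single request


for tokens in (8_000, 120_000, 198_000, 1_500_000):
    print(f"{tokens:>9,} tokens -> {pick_model(tokens)}")
```

Routing per document rather than per pipeline keeps short documents on the cheaper model even when a few long ones need the larger window.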

Is summarization a good use case for a local LLM?

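Yes. Summarization is a quality-tolerant batch task, so it suits local models well. Llama 3 70B runs on 2× RTX 4090s (48 GB of VRAM combined) with 4-bit quantization and produces high-quality summaries; Mistral Large (123B parameters) needs more aggressive quantization or CPU offloading to fit on the same pair of cards. At 200M tokens/month, a mid-priced cloud model costs roughly $200–400 (Claude Haiku 4.5 is $300 in the table above), versus ~$100–150/month in amortized GPU cost.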
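The amortized GPU figure depends on hardware price, depreciation period, and electricity use, none of which the answer specifies. The sketch below shows one way to compute it and a simple break-even point against an API bill. Every constant is a hypothetical placeholder except the $300 API figure, which is the Claude Haiku 4.5 row of the table.

```python
def amortized_gpu_monthly(hardware_usd: float, lifetime_months: int,
                          power_kw: float, hours_at_load: float,
                          usd_per_kwh: float) -> float:
    """Hardware cost spread evenly over its lifetime, plus monthly electricity."""
    electricity = power_kw * hours_at_load * usd_per_kwh
    return hardware_usd / lifetime_months + electricity


def break_even_months(hardware_usd: float, api_monthly_usd: float,
                      electricity_monthly_usd: float) -> float:
    """Months until buying hardware beats paying the API bill (inf if never)."""
    savings = api_monthly_usd - electricity_monthly_usd
    return hardware_usd / savings if savings > 0 else float("inf")


# Hypothetical inputs: replace with your own hardware quote and power tariff.
HARDWARE_USD = 3_600   # assumed price of 2x RTX 4090
LIFETIME_MONTHS = 36   # assumed depreciation period
POWER_KW = 0.9         # assumed draw of the whole machine under load
HOURS_AT_LOAD = 200    # assumed busy hours per month
USD_PER_KWH = 0.15     # assumed electricity tariff
API_MONTHLY_USD = 300  # Claude Haiku 4.5 at 200M in / 20M out (table above)

electricity = POWER_KW * HOURS_AT_LOAD * USD_PER_KWH
gpu = amortized_gpu_monthly(HARDWARE_USD, LIFETIME_MONTHS, POWER_KW,
                            HOURS_AT_LOAD, USD_PER_KWH)
months = break_even_months(HARDWARE_USD, API_MONTHLY_USD, electricity)
print(f"local: ${gpu:.0f}/month vs API: ${API_MONTHLY_USD}/month")
print(f"break-even after {months:.1f} months")
```

This comparison assumes the local machine can actually process the monthly volume within the assumed busy hours; measure your own throughput before relying on it.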