Document Summarization LLM Cost Estimator
How much does it cost to summarize legal contracts, research papers, or support tickets at scale? This calculator breaks down API costs for long-document summarization pipelines.
Cost Comparison: All Cloud Models
Based on 200M input + 20M output tokens/month
| Model | Provider | Monthly cost |
|---|---|---|
| Gemma 3 4B | Google | $9.60 |
| Gemma 3 12B | Google | $10.60 |
| gpt-oss-120b | OpenAI | $11.60 |
| Llama 3.1 8B (Groq) | Groq | $11.60 |
| Doubao Seed 2.0 Mini | Doubao (ByteDance) | $11.80 |
| Gemma 3n 4B | Google | $14.40 |
| GPT-5 Nano | OpenAI | $18.00 |
| Qwen3.5-Flash | Qwen (Alibaba) | $18.00 |
| Gemma 4 26B A4B | Google | $18.60 |
| Gemma 3 27B | Google | $19.20 |
| GLM-4.7 Flash | Zhipu AI (GLM) | $20.00 |
| DeepSeek V4 Flash | DeepSeek | $24.00 |
| Qwen3 14B | Qwen (Alibaba) | $24.80 |
| Qwen3 30B A3B | Qwen (Alibaba) | $27.00 |
| Doubao Pro 32K | Doubao (ByteDance) | $27.60 |
| Hunyuan TurboS | Hunyuan (Tencent) | $27.60 |
| GPT-4.1 Nano | OpenAI | $28.00 |
| Gemini 2.5 Flash-Lite | Google | $28.00 |
| Qwen3 VL 32B Instruct | Qwen (Alibaba) | $28.40 |
| Llama 4 Scout (Groq) | Groq | $28.80 |
| Gemma 4 31B | Google | $31.20 |
| Qwen3 Coder Next | Qwen (Alibaba) | $38.00 |
| Hunyuan T1 | Hunyuan (Tencent) | $39.20 |
| GPT-4o-mini | OpenAI | $42.00 |
| GPT-OSS 120B (Groq) | Groq | $42.00 |
| DeepSeek V3.2 | DeepSeek | $52.80 |
| DeepSeek V3 | DeepSeek | $56.00 |
| Qwen3 VL 235B A22B Instruct | Qwen (Alibaba) | $57.60 |
| DeepSeek V3.1 | DeepSeek | $57.80 |
| Qwen3.6 Flash | Qwen (Alibaba) | $60.60 |
| GPT-5.4 Nano | OpenAI | $65.00 |
| Qwen2.5 VL 72B Instruct | Qwen (Alibaba) | $65.00 |
| Qwen3.5-Plus | Qwen (Alibaba) | $67.60 |
| Qwen3 32B (Groq) | Groq | $69.80 |
| DeepSeek V3 (Mar 2025) | DeepSeek | $76.00 |
| Gemini 3.1 Flash-Lite | Google | $80.00 |
| Qwen3 Coder 480B A35B | Qwen (Alibaba) | $80.00 |
| Qwen2.5 72B Instruct | Qwen (Alibaba) | $80.00 |
| GPT-5 Mini | OpenAI | $90.00 |
| Qwen3.6 Plus | Qwen (Alibaba) | $105 |
| DeepSeek V4 Pro | DeepSeek | $105 |
| Gemini 2.5 Flash | Google | $110 |
| GPT-4.1 Mini | OpenAI | $112 |
| Qwen3.7 Plus | Qwen (Alibaba) | $112 |
| Kimi K2.5 | Kimi (Moonshot AI) | $118 |
| Qwen3-Max | Qwen (Alibaba) | $126 |
| Llama 3.3 70B (Groq) | Groq | $134 |
| Doubao Seed 2.0 Pro | Doubao (ByteDance) | $141 |
| R1 0528 | DeepSeek | $143 |
| Qwen2.5 Coder 32B Instruct | Qwen (Alibaba) | $152 |
| Kimi K2.5 (Together) | Together AI | $156 |
| Kimi K2 | Kimi (Moonshot AI) | $160 |
| Kimi K2 Thinking | Kimi (Moonshot AI) | $170 |
| DeepSeek R1 | DeepSeek | $190 |
| Qwen3.5 397B (Together) | Together AI | $192 |
| Llama 3.3 70B (Together) | Together AI | $194 |
| Qwen3 Coder Plus | Qwen (Alibaba) | $195 |
| Qwen3 Max Thinking | Qwen (Alibaba) | $234 |
| GPT-5.4 Mini | OpenAI | $240 |
| Claude 3.5 Haiku | Anthropic | $240 |
| GLM-5 | Zhipu AI (GLM) | $264 |
| Kimi K2.6 | Kimi (Moonshot AI) | $280 |
| Claude Haiku 4.5 (★ recommended) | Anthropic | $300 |
| o4-mini | OpenAI | $308 |
| o3 Mini | OpenAI | $308 |
| GLM-5-Turbo | Zhipu AI (GLM) | $320 |
| Qwen3.7 Max | Qwen (Alibaba) | $325 |
| GLM-5.1 | Zhipu AI (GLM) | $368 |
| GPT-5 | OpenAI | $450 |
| GPT-5 Codex | OpenAI | $450 |
| Gemini 2.5 Pro | Google | $450 |
| Gemini 3.5 Flash | Google | $480 |
| Moonshot V1 (128K) | Kimi (Moonshot AI) | $500 |
| DeepSeek V4 Pro (Together) | Together AI | $508 |
| GPT-4.1 | OpenAI | $560 |
| o3 | OpenAI | $560 |
| o4 Mini Deep Research | OpenAI | $560 |
| Gemini 3.1 Pro Preview | Google | $640 |
| GPT-4o | OpenAI | $700 |
| GPT-5.4 | OpenAI | $800 |
| Claude Sonnet 4.6 | Anthropic | $900 |
| Claude Sonnet 4.5 | Anthropic | $900 |
| Claude Sonnet 4 | Anthropic | $900 |
| Claude Opus 4.7 | Anthropic | $1,500 |
| Claude Opus 4.6 | Anthropic | $1,500 |
| Claude Opus 4.8 | Anthropic | $1,500 |
| Claude Opus 4.5 | Anthropic | $1,500 |
| GPT-5.5 | OpenAI | $1,600 |
| o3 Deep Research | OpenAI | $2,800 |
| Claude Opus 4.8 (Fast) | Anthropic | $3,000 |
| o1 | OpenAI | $4,200 |
| Claude Opus 4.1 | Anthropic | $4,500 |
| Claude Opus 4 | Anthropic | $4,500 |
| GPT-5 Pro | OpenAI | $5,400 |
| o3 Pro | OpenAI | $5,600 |
| Claude Opus 4.7 (Fast) | Anthropic | $9,000 |
| Claude Opus 4.6 (Fast) | Anthropic | $9,000 |
| GPT-5.5 Pro | OpenAI | $9,600 |
| GPT-5.4 Pro | OpenAI | $9,600 |
| o1-pro | OpenAI | $42,000 |
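The monthly figures above are consistent with a simple per-token calculation: input volume times input price, plus output volume times output price. Here is a minimal Python sketch of that arithmetic, assuming prices are quoted in USD per 1M tokens. The function name `monthly_cost` is hypothetical, and the $5/1M output price used for Claude Haiku 4.5 is an assumption that this page does not state (only the $1/1M figure appears here).

```python
def monthly_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the monthly API cost in USD.

    Prices are expressed in USD per 1M tokens.
    """
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Workload used for the comparison table: 200M input + 20M output tokens/month.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 20_000_000

# Claude Haiku 4.5: $1/1M input (from this page), $5/1M output (assumed).
haiku_cost = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 1.00, 5.00)
print(f"${haiku_cost:,.2f}")  # $300.00
```

Under those assumed prices the result matches the $300 table entry. Swap in your own volumes and a provider's current price sheet to estimate other rows.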
Frequently Asked Questions
How many tokens does a typical document summarization job use?
A 10-page PDF averages 4,000–8,000 input tokens. A 1-page summary output is ~300–600 tokens. Summarizing 10,000 documents/month therefore uses 40–80M input tokens and 3–6M output tokens. Long legal contracts or research papers can hit 50,000+ input tokens per document.
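The volume estimate above is a per-document range multiplied by the document count. A small Python sketch of that step, using the figures from this answer; the helper name `monthly_token_range` is hypothetical.

```python
def monthly_token_range(docs_per_month, tokens_per_doc_low, tokens_per_doc_high):
    """Return (low, high) total tokens per month for a per-document token range."""
    return docs_per_month * tokens_per_doc_low, docs_per_month * tokens_per_doc_high


DOCS_PER_MONTH = 10_000

# 10-page PDF: 4,000-8,000 input tokens; 1-page summary: 300-600 output tokens.
input_low, input_high = monthly_token_range(DOCS_PER_MONTH, 4_000, 8_000)
output_low, output_high = monthly_token_range(DOCS_PER_MONTH, 300, 600)

print(f"input:  {input_low / 1e6:.0f}M-{input_high / 1e6:.0f}M tokens")    # input:  40M-80M tokens
print(f"output: {output_low / 1e6:.0f}M-{output_high / 1e6:.0f}M tokens")  # output: 3M-6M tokens
```

Real token counts depend on the tokenizer and the document layout, so treat these as planning numbers and measure a sample of your own documents before budgeting.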
Which model handles long documents best?
For very long documents (50k–200k tokens), Gemini 3.5 Flash (1M-token context, $1.50 per 1M input tokens) or a Claude model with a 200k-token context window is a good fit. For medium-length documents (under 32k tokens), Claude Haiku 4.5 ($1 per 1M input tokens) gives excellent quality at low cost. When you need the largest context, Gemini models support 1M tokens per request.
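One way to apply this guidance in a pipeline is to route each document by its token count. The sketch below is illustrative only: the function name `pick_model` is hypothetical, the 32k–50k range (which this answer does not cover) is assumed to fall into the long-document tier, and the chunking fallback for inputs over 1M tokens is an added assumption. Real limits also need headroom for the prompt and the generated summary.

```python
def pick_model(doc_tokens):
    """Suggest a model tier for a document of the given token length.

    Thresholds follow the guidance above; the handling of 32k-50k tokens
    and of inputs over 1M tokens is an assumption, not a stated rule.
    """
    if doc_tokens <= 0:
        raise ValueError("doc_tokens must be positive")
    if doc_tokens < 32_000:
        return "Claude Haiku 4.5"
    if doc_tokens <= 200_000:
        return "Claude (200k context) or Gemini 3.5 Flash"
    if doc_tokens <= 1_000_000:
        return "Gemini 3.5 Flash"
    return "split into chunks"


for size in (6_000, 120_000, 600_000):
    print(size, "->", pick_model(size))
```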
Is summarization a good use case for a local LLM?
Yes. Summarization is a quality-tolerant batch task. Llama 3 70B or Mistral Large can run locally on 2× RTX 4090s (with quantization to fit in the cards' combined 48 GB of VRAM) and produce high-quality summaries. At 200M tokens/month, expect roughly $200–400 in cloud API costs versus ~$100–150 in amortized GPU cost.
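To check the amortized figure against your own hardware, a straight-line estimate is usually enough: purchase price spread over the expected lifetime, plus electricity. The Python sketch below assumes that simple model. The function name and every input value are hypothetical placeholders, not measurements or quotes.

```python
def amortized_monthly_cost(hardware_usd, lifetime_months, avg_power_kw,
                           hours_per_month, usd_per_kwh):
    """Monthly cost of owned hardware: straight-line depreciation plus electricity."""
    depreciation = hardware_usd / lifetime_months
    electricity = avg_power_kw * hours_per_month * usd_per_kwh
    return depreciation + electricity


# Placeholder inputs for illustration only; substitute your own figures.
local_cost = amortized_monthly_cost(
    hardware_usd=3_600,   # hypothetical purchase price of the GPU rig
    lifetime_months=36,   # hypothetical depreciation period
    avg_power_kw=0.5,     # hypothetical average power draw
    hours_per_month=200,  # hypothetical hours of batch processing
    usd_per_kwh=0.25,     # hypothetical electricity rate
)
print(f"${local_cost:.2f}/month")
```

This model leaves out operator time, cooling, and idle capacity, all of which can move the break-even point in either direction.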