Groq API Pricing 2025
Ultra-fast inference on dedicated LPU hardware — up to 1,000 tokens/sec. Best for latency-sensitive production apps. Predictable, linear pricing.
Groq Model Pricing
Prices in USD per 1M tokens
| Model | Notes | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|---|---|---|---|
| Llama 3.1 8B (Groq) | Ultra-low cost; 840 tokens/sec on Groq LPU | $0.05 | $0.08 | 128,000 |
| Llama 4 Scout (Groq) | Fast LLM inference on Groq LPU; 594 tokens/sec | $0.11 | $0.34 | 128,000 |
| GPT-OSS 120B (Groq) | OpenAI open-source model on Groq; 500 tokens/sec | $0.15 | $0.60 | 128,000 |
| Qwen3 32B (Groq) | Strong mid-size model with extended context; 662 tokens/sec | $0.29 | $0.59 | 131,072 |
| Llama 3.3 70B (Groq) | Reliable 70B on Groq LPU; 394 tokens/sec | $0.59 | $0.79 | 128,000 |
Estimated Monthly Cost (70% input / 30% output split)
| Model | 1M tokens/mo | 10M tokens/mo | 100M tokens/mo | 1B tokens/mo |
|---|---|---|---|---|
| Llama 3.1 8B (Groq) | $0.059 | $0.59 | $5.90 | $59.00 |
| Llama 4 Scout (Groq) | $0.179 | $1.79 | $17.90 | $179.00 |
| GPT-OSS 120B (Groq) | $0.285 | $2.85 | $28.50 | $285.00 |
| Qwen3 32B (Groq) | $0.380 | $3.80 | $38.00 | $380.00 |
| Llama 3.3 70B (Groq) | $0.650 | $6.50 | $65.00 | $650.00 |
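The estimates in this table follow from a weighted average of the input and output prices. The sketch below is one way to reproduce them, assuming the per-1M prices from the pricing table and the 70/30 split named in the heading. The names `PRICES_PER_MILLION`, `blended_price`, and `monthly_cost` are illustrative only and are not part of any Groq SDK.

```python
# Prices in USD per 1M tokens as (input, output), copied from the pricing table.
PRICES_PER_MILLION = {
    "Llama 3.1 8B": (0.05, 0.08),
    "Llama 4 Scout": (0.11, 0.34),
    "GPT-OSS 120B": (0.15, 0.60),
    "Qwen3 32B": (0.29, 0.59),
    "Llama 3.3 70B": (0.59, 0.79),
}


def blended_price(model: str, input_share: float = 0.70) -> float:
    """Blended USD price per 1M tokens for a given input/output split."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_price, output_price = PRICES_PER_MILLION[model]
    return input_share * input_price + (1.0 - input_share) * output_price


def monthly_cost(model: str, total_tokens: int, input_share: float = 0.70) -> float:
    """Estimated monthly USD cost for a total (input + output) token volume."""
    return blended_price(model, input_share) * total_tokens / 1_000_000


if __name__ == "__main__":
    # Print the 100M tokens/mo column for every model.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${monthly_cost(model, 100_000_000):.2f} per 100M tokens")
```

Changing `input_share` shows how sensitive the estimate is to the workload: output-heavy workloads cost more per token because output tokens cost more than input tokens for every model listed.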
Frequently Asked Questions
How much does the Groq LLM API cost?
Groq offers 5 models, with input prices from $0.05 to $0.59 per 1M tokens and output prices from $0.08 to $0.79 per 1M tokens. Inference runs on dedicated LPU hardware at up to 1,000 tokens/sec, which suits latency-sensitive production apps, and pricing scales linearly with token volume.
Is Groq cheaper than self-hosting?
For low-volume workloads (under 100M tokens/month), cloud APIs like Groq are almost always cheaper than purchasing and maintaining GPU hardware. Use our calculator to find the exact break-even point for your usage.
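As a rough illustration of the break-even idea, the sketch below divides a fixed monthly self-hosting cost by the blended API price per 1M tokens. It is a simplification under stated assumptions: the self-hosting figure is a hypothetical placeholder and not a measured cost, the model ignores per-token self-hosting costs such as power, and `break_even_tokens` is an illustrative helper and not the site's calculator.

```python
def break_even_tokens(fixed_monthly_cost_usd: float,
                      blended_price_per_million_usd: float) -> float:
    """Monthly token volume at which API spend equals a fixed self-hosting cost.

    Assumes self-hosting has a fixed monthly cost and no per-token cost.
    """
    if fixed_monthly_cost_usd < 0:
        raise ValueError("fixed_monthly_cost_usd must be non-negative")
    if blended_price_per_million_usd <= 0:
        raise ValueError("blended_price_per_million_usd must be positive")
    return fixed_monthly_cost_usd / blended_price_per_million_usd * 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL placeholder: replace with your own amortized hardware,
    # hosting, and maintenance cost per month.
    hypothetical_self_hosting_usd = 1_000.0

    # Blended price for Llama 3.3 70B at a 70/30 split, from the cost table.
    llama_70b_blended_usd = 0.650

    tokens = break_even_tokens(hypothetical_self_hosting_usd, llama_70b_blended_usd)
    print(f"Break-even at about {tokens / 1e9:.2f}B tokens per month")
```

Below the break-even volume the API is the cheaper option under these assumptions; above it, the fixed self-hosting cost is spread over enough tokens to win.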