# How to Choose the Right LLM
The right model depends on what you're building, not which model scores highest on benchmarks. This guide maps use cases to model tiers and highlights the tradeoffs.
## The three-axis framework
Every model selection balances three competing pressures. Identifying which axis matters most for your use case points directly to the right tier.

- **Latency:** How fast does the user need a response? Real-time chat and autocomplete have sub-500ms budgets; batch analysis can wait minutes or hours.
- **Quality:** How much does output quality affect your product? Classification labels can tolerate 2–5% error; medical or legal output cannot.
- **Cost:** What is your budget per 1,000 requests? High-volume background jobs need cheap tokens; low-volume human-in-the-loop tasks can afford more.
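The triage above can be sketched as a small decision function. This is illustrative only: the threshold values and the `pick_tier` helper are assumptions chosen to show the shape of the decision, not prescriptive rules.

```python
def pick_tier(latency_budget_ms: float, error_tolerance: float,
              budget_per_1k_usd: float) -> str:
    """Map the three axes to a model tier. Thresholds are illustrative."""
    if latency_budget_ms < 500 and budget_per_1k_usd < 1.0:
        return "mini"       # real-time and high-volume: cheapest tokens win
    if error_tolerance < 0.01:
        return "flagship"   # quality-critical: pay for the best output
    if latency_budget_ms < 2000:
        return "fast"       # interactive, but not instant
    return "standard"       # batch work with moderate quality needs
```

For example, an autocomplete feature with a 300ms budget, 5% error tolerance, and a $0.50 budget per 1,000 requests lands in the mini tier.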
## Model tiers explained

Models on this site fall into rough tiers, ordered roughly by capability and cost per token: mini, fast, standard, and up through reasoning and flagship models.
## Capability requirements by feature
Some features require specific model support regardless of tier. Check before committing to a model:
| Feature | Minimum tier | Notes |
|---|---|---|
| Vision / image input | Fast+ | Not all fast-tier models support it — check per-model |
| Function calling / tools | Mini+ | Widely supported, but reliability improves in standard tier |
| JSON mode / structured output | Mini+ | Universally supported across current generation models |
| Long context (>128K tokens) | Standard+ | Gemini and Claude offer up to 1M–2M tokens; most others 128K |
| Streaming | Mini+ | Supported by all models on this site |
| Multi-turn agents | Standard+ | Mini-tier models can fail on complex tool orchestration |
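A pre-flight check like the table above is easy to automate. The capability registry below is a hypothetical placeholder, not a live model list; in practice you would populate it from provider documentation.

```python
# Hypothetical capability registry; populate from provider docs in practice.
TIER_CAPABILITIES = {
    "mini":     {"function_calling", "json_mode", "streaming"},
    "fast":     {"function_calling", "json_mode", "streaming", "vision"},
    "standard": {"function_calling", "json_mode", "streaming", "vision",
                 "long_context", "multi_turn_agents"},
}

def supports(tier: str, features: set[str]) -> bool:
    """Return True if every required feature is available at this tier."""
    return features <= TIER_CAPABILITIES.get(tier, set())
```

For instance, `supports("mini", {"vision"})` returns `False`, flagging the mismatch before you commit to a model.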
## Recommendation matrix by use case

| Use case | Recommendation |
|---|---|
| Classification | Mini-tier models handle classification reliably at minimal cost; fine-tuning can reduce cost further. |
| Structured data extraction | JSON mode support is required; fast-tier models extract structured data accurately. |
| Summarization | Short-to-medium summaries: fast tier. Long documents or abstractive summarization: standard tier. |
| RAG | Retrieval accuracy depends heavily on chunking and reranking, not just model quality. Cohere Command R+ is purpose-built for RAG. |
| Coding | Claude Sonnet and GPT-4.1 lead on multi-file code editing; Codestral specializes in code completion at lower cost. |
| Complex reasoning | Reasoning-tier models use chain-of-thought internally; expect slower responses and higher cost per request. |
| Creative writing | Flagship models produce more coherent long documents; Anthropic models tend to be preferred for creative and nuanced writing. |
| Vision / multimodal | Flagship models from the major providers support vision natively, but check that your chosen model tier supports image inputs. |
| Agents / tool use | Reliable function calling requires a model that understands tool schemas; avoid mini tier for complex multi-step agents. |
| High-volume / low-cost | Prioritize tokens/second and cost per token; combine with caching and batching for maximum efficiency. |
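For high-volume jobs, caching repeated prompts is the simplest win. A minimal sketch, assuming a `call_model` function that you supply (the name is a placeholder, not a real client API):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    """Memoize completions by prompt hash so repeated inputs cost zero tokens."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only pay for novel prompts
    return _cache[key]
```

In production you would swap the dict for a shared store with an expiry policy, and many providers also offer server-side prompt caching at a discount.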
## Provider-level considerations
Ready to compare costs? Use the cost calculator to model your specific workload, or read the cost optimization guide for strategies to reduce spend once you've chosen a model.
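The math behind such a cost estimate is straightforward. A back-of-the-envelope sketch, where the token counts and per-million-token prices are placeholder inputs you substitute for your own workload:

```python
def cost_per_1k_requests(in_tokens: int, out_tokens: int,
                         price_in_per_m: float, price_out_per_m: float) -> float:
    """USD cost of 1,000 requests, given average token counts per request
    and input/output prices per million tokens."""
    per_request = (in_tokens * price_in_per_m
                   + out_tokens * price_out_per_m) / 1_000_000
    return per_request * 1000
```

With 500 input tokens and 200 output tokens per request at hypothetical prices of $0.15/M input and $0.60/M output, 1,000 requests cost about $0.20, which is the kind of number that makes mini-tier models attractive for background jobs.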