LLM API Quick Reference

Authentication patterns, example requests, and JSON response shapes for the major LLM provider APIs. All examples use curl and Python.

Common parameters

Most providers follow the OpenAI chat completions schema. Parameters below apply broadly; check provider docs for exact names and defaults.

model (string, required)
  The model identifier. Provider-specific format (e.g., gpt-4.1-mini, claude-sonnet-4-20250514).

messages (array, required)
  Array of message objects, each with "role" (system/user/assistant) and "content" fields.

max_tokens (integer, optional)
  Maximum tokens in the response. Set it explicitly to avoid unexpectedly long outputs. Anthropic's Messages API requires max_tokens.

temperature (float, optional)
  Sampling temperature. 0 is near-deterministic; higher values increase randomness. Ranges and defaults vary by provider (defaults are usually 0.7–1.0).

stream (boolean, optional)
  Stream partial response tokens as server-sent events. Reduces time-to-first-token for real-time UIs.

top_p (float, optional)
  Nucleus sampling probability mass. Adjust temperature or top_p, not both.

stop (string or array of strings, optional)
  One or more sequences at which the model stops generating. Useful for structured output parsing.

tools (array, optional)
  Function definitions the model can call. The format differs slightly between providers.
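Assembled into a request body, the common parameters look like this (a sketch; the model name, stop sequence, and values are illustrative):

```python
import json

# Build a chat-completions request body from the common parameters above.
payload = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "List three uses of prompt caching."},
    ],
    "max_tokens": 256,   # cap response length explicitly
    "temperature": 0.7,
    "stop": ["\n\n"],    # stop at the first blank line (illustrative)
}

body = json.dumps(payload)  # this JSON string is what goes on the wire
```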
OpenAI

Base URL
https://api.openai.com/v1
Auth header
Authorization: Bearer $OPENAI_API_KEY
Capabilities
JSON · Streaming · Tools
curl
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Example response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the KV cache of a repeated token prefix so subsequent requests pay a fraction of the normal input token cost."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 38,
    "total_tokens": 70
  }
}
Python
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

View all OpenAI models and pricing →
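Usage blocks like the one in the example response feed straight into cost tracking. A sketch with placeholder per-million-token prices (look up the real rates for your model):

```python
# Compute per-request cost from a chat.completion usage block.
# Prices below are hypothetical placeholders, not real rates.
usage = {"prompt_tokens": 32, "completion_tokens": 38, "total_tokens": 70}

PRICE_IN_PER_M = 0.40    # $ per 1M input tokens (placeholder)
PRICE_OUT_PER_M = 1.60   # $ per 1M output tokens (placeholder)

cost = (usage["prompt_tokens"] * PRICE_IN_PER_M
        + usage["completion_tokens"] * PRICE_OUT_PER_M) / 1_000_000
```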

Anthropic

Base URL
https://api.anthropic.com/v1
Auth header
x-api-key: $ANTHROPIC_API_KEY (plus anthropic-version: 2023-06-01)
Capabilities
JSON · Streaming · Tools
curl
curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ]
  }'
Example response
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "Prompt caching stores the key-value cache of a static prompt prefix so that repeated requests pay only 10% of the normal input token cost for cached tokens."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 32,
    "output_tokens": 38
  }
}
Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
)

print(message.content[0].text)

View all Anthropic models and pricing →
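Note that Anthropic returns content as a list of typed blocks rather than a single string, so robust extraction joins the text blocks (a sketch over a hand-built dict mirroring the example response shape):

```python
# Anthropic returns "content" as a list of typed blocks (text, tool_use, ...).
# Join the text blocks to get the assistant's reply as a single string.
response = {
    "content": [
        {"type": "text", "text": "Prompt caching stores the key-value cache "},
        {"type": "text", "text": "of a static prompt prefix."},
    ],
    "stop_reason": "end_turn",
}

text = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)
```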

Google Gemini

Base URL
https://generativelanguage.googleapis.com/v1beta
Auth header
x-goog-api-key: $GOOGLE_API_KEY (or ?key=$GOOGLE_API_KEY as a query parameter)
Capabilities
JSON · Streaming · Tools
curl
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Explain prompt caching in one sentence." }]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256,
      "temperature": 0.7
    },
    "systemInstruction": {
      "parts": [{ "text": "You are a helpful assistant." }]
    }
  }'
Example response
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Prompt caching stores the computation for a repeated prefix so subsequent API calls skip re-processing those tokens, reducing cost and latency."
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 29,
    "candidatesTokenCount": 35,
    "totalTokenCount": 64
  }
}
Python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    model_name="gemini-2.5-flash",
    system_instruction="You are a helpful assistant.",
)

response = model.generate_content(
    "Explain prompt caching in one sentence.",
    generation_config=genai.types.GenerationConfig(
        max_output_tokens=256,
        temperature=0.7,
    ),
)

print(response.text)

View all Google Gemini models and pricing →
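Gemini nests the reply under candidates → content → parts. A sketch of pulling the text out, using a hand-built dict mirroring the example response:

```python
# Gemini nests text under candidates -> content -> parts;
# a multi-part reply is joined in order.
response = {
    "candidates": [
        {
            "content": {
                "parts": [{"text": "Prompt caching skips re-processing repeated tokens."}],
                "role": "model",
            },
            "finishReason": "STOP",
        }
    ]
}

parts = response["candidates"][0]["content"]["parts"]
text = "".join(part["text"] for part in parts)
```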

Mistral

Base URL
https://api.mistral.ai/v1
Auth header
Authorization: Bearer $MISTRAL_API_KEY
Capabilities
JSON · Streaming · Tools
curl
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "mistral-small-latest",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Example response
{
  "id": "cmpl-e5cc70bb28c444948073e77776eb30ef",
  "object": "chat.completion",
  "model": "mistral-small-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the key-value attention states for a repeated prompt prefix, allowing subsequent requests to skip recomputing those tokens and reducing both cost and latency."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 42,
    "total_tokens": 71
  }
}
Python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

View all Mistral models and pricing →

DeepSeek

Base URL
https://api.deepseek.com
Auth header
Authorization: Bearer $DEEPSEEK_API_KEY
Capabilities
JSON · Streaming · Tools
curl
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Example response
{
  "id": "930c37e5-a7ab-4e42-8da3-4e7d4e2f7a22",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching reuses the computed key-value states for a static prompt prefix across requests, reducing token computation cost for repeated context."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 29,
    "completion_tokens": 34,
    "total_tokens": 63
  }
}
Python
from openai import OpenAI  # DeepSeek uses OpenAI-compatible API
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

View all DeepSeek models and pricing →
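DeepSeek's usage block splits prompt tokens into cache hits and misses, so the per-request cache hit rate is directly computable (a sketch over the example usage numbers above):

```python
# DeepSeek reports prompt_cache_hit_tokens / prompt_cache_miss_tokens,
# making the per-request prompt-cache hit rate directly computable.
usage = {
    "prompt_tokens": 29,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 29,
    "completion_tokens": 34,
}

hits = usage["prompt_cache_hit_tokens"]
misses = usage["prompt_cache_miss_tokens"]
hit_rate = hits / (hits + misses) if (hits + misses) else 0.0
```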

xAI (Grok)

Base URL
https://api.x.ai/v1
Auth header
Authorization: Bearer $XAI_API_KEY
Capabilities
JSON · Streaming · Tools
curl
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "model": "grok-3-mini",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Example response
{
  "id": "dc5fec56-6db8-40f2-8e27-f5faf4a8a2e6",
  "object": "chat.completion",
  "model": "grok-3-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the transformer key-value states for a fixed prompt prefix so that repeated API calls with the same prefix skip redundant computation."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 39,
    "total_tokens": 68
  }
}
Python
from openai import OpenAI  # xAI uses OpenAI-compatible API
import os

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

View all xAI (Grok) models and pricing →

OpenAI-compatible providers

Several providers implement the OpenAI chat completions schema. If you're using the OpenAI Python SDK, you can point it at these providers by setting base_url and api_key:

Provider                base_url                       API key env var
xAI (Grok)              https://api.x.ai/v1            XAI_API_KEY
DeepSeek                https://api.deepseek.com       DEEPSEEK_API_KEY
Together (Llama, etc.)  https://api.together.xyz/v1    TOGETHER_API_KEY
Mistral                 https://api.mistral.ai/v1      MISTRAL_API_KEY
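The table above can be captured as a small config map; building a client for any of these providers is then one lookup (a sketch; the provider keys and helper name are my own):

```python
import os

# base_url and API-key env var for each OpenAI-compatible provider.
PROVIDERS = {
    "xai":      {"base_url": "https://api.x.ai/v1",         "key_env": "XAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com",    "key_env": "DEEPSEEK_API_KEY"},
    "together": {"base_url": "https://api.together.xyz/v1", "key_env": "TOGETHER_API_KEY"},
    "mistral":  {"base_url": "https://api.mistral.ai/v1",   "key_env": "MISTRAL_API_KEY"},
}

def client_kwargs(provider: str) -> dict:
    """Kwargs for the OpenAI SDK, e.g. openai.OpenAI(**client_kwargs("deepseek"))."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["key_env"], ""),
    }
```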

See the cost optimization guide for strategies to reduce spend, or open the calculator to compare costs before choosing a provider.