LLM API Quick Reference
Authentication patterns, example requests, and JSON response shapes for the major LLM provider APIs. All examples use curl and Python.
Common parameters
Most providers follow the OpenAI chat completions schema. Parameters below apply broadly; check provider docs for exact names and defaults.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier. Provider-specific format (e.g., gpt-4.1-mini, claude-sonnet-4-20250514). |
| messages | array | Yes | Array of message objects with "role" (system/user/assistant) and "content" fields. |
| max_tokens | integer | No | Maximum tokens in the response. Set explicitly to avoid unexpectedly long outputs. Required by Anthropic. |
| temperature | float | No | Sampling temperature. 0 = near-deterministic; higher values increase randomness. Allowed range and default vary by provider (e.g., Anthropic accepts 0–1, OpenAI up to 2). |
| stream | boolean | No | Stream partial response tokens as server-sent events. Reduces time-to-first-token for real-time UIs. |
| top_p | float | No | Nucleus sampling probability mass. Use temperature OR top_p, not both. |
| stop | string or string[] | No | One or more sequences where the model will stop generating. Useful for structured output parsing. |
| tools | array | No | Function definitions the model can call. Format differs slightly between providers. |
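Taken together, the parameters above map onto a single OpenAI-style request body. A minimal sketch, assuming the OpenAI schema; the model name and the get_weather tool are illustrative placeholders, and note that Anthropic nests the tool's JSON schema under input_schema rather than function.parameters:

```python
import json

# Illustrative OpenAI-style request body exercising the common parameters.
# get_weather is a placeholder tool, not a real function you must define.
payload = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "max_tokens": 256,
    "temperature": 0.7,  # omit top_p when temperature is set
    "stop": ["\n\n"],    # stop generating at the first blank line
    "stream": False,     # True => partial tokens as server-sent events
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The body is plain JSON, so it serializes directly for curl or an SDK.
body = json.dumps(payload)
```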
OpenAI
- Base URL: https://api.openai.com/v1
- Auth header: Authorization: Bearer $OPENAI_API_KEY
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
```

Example response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the KV cache of a repeated token prefix so subsequent requests pay a fraction of the normal input token cost."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 38,
    "total_tokens": 70
  }
}
```

Python

```python
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Anthropic
- Base URL: https://api.anthropic.com/v1
- Auth header: x-api-key: $ANTHROPIC_API_KEY
- Required header: anthropic-version: 2023-06-01
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ]
  }'
```

Example response

```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "Prompt caching stores the key-value cache of a static prompt prefix so that repeated requests pay only 10% of the normal input token cost for cached tokens."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 32,
    "output_tokens": 38
  }
}
```

Python

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
)
print(message.content[0].text)
```

Google Gemini
- Base URL: https://generativelanguage.googleapis.com/v1beta
- Auth: API key passed as a query parameter (?key=$GOOGLE_API_KEY), not a header
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Explain prompt caching in one sentence." }]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256,
      "temperature": 0.7
    },
    "systemInstruction": {
      "parts": [{ "text": "You are a helpful assistant." }]
    }
  }'
```

Example response

```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Prompt caching stores the computation for a repeated prefix so subsequent API calls skip re-processing those tokens, reducing cost and latency."
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 29,
    "candidatesTokenCount": 35,
    "totalTokenCount": 64
  }
}
```

Python

```python
import os

import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    model_name="gemini-2.5-flash",
    system_instruction="You are a helpful assistant.",
)
response = model.generate_content(
    "Explain prompt caching in one sentence.",
    generation_config=genai.types.GenerationConfig(
        max_output_tokens=256,
        temperature=0.7,
    ),
)
print(response.text)
```

Mistral
- Base URL: https://api.mistral.ai/v1
- Auth header: Authorization: Bearer $MISTRAL_API_KEY
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "mistral-small-latest",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
```

Example response

```json
{
  "id": "cmpl-e5cc70bb28c444948073e77776eb30ef",
  "object": "chat.completion",
  "model": "mistral-small-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the key-value attention states for a repeated prompt prefix, allowing subsequent requests to skip recomputing those tokens and reducing both cost and latency."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 42,
    "total_tokens": 71
  }
}
```

Python

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

DeepSeek
- Base URL: https://api.deepseek.com
- Auth header: Authorization: Bearer $DEEPSEEK_API_KEY
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
```

Example response

```json
{
  "id": "930c37e5-a7ab-4e42-8da3-4e7d4e2f7a22",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching reuses the computed key-value states for a static prompt prefix across requests, reducing token computation cost for repeated context."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 29,
    "completion_tokens": 34,
    "total_tokens": 63
  }
}
```

Python

```python
import os

from openai import OpenAI  # DeepSeek uses an OpenAI-compatible API

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

xAI (Grok)
- Base URL: https://api.x.ai/v1
- Auth header: Authorization: Bearer $XAI_API_KEY
- Capabilities: JSON, Streaming, Tools
curl

```bash
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "model": "grok-3-mini",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain prompt caching in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
```

Example response

```json
{
  "id": "dc5fec56-6db8-40f2-8e27-f5faf4a8a2e6",
  "object": "chat.completion",
  "model": "grok-3-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prompt caching stores the transformer key-value states for a fixed prompt prefix so that repeated API calls with the same prefix skip redundant computation."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 39,
    "total_tokens": 68
  }
}
```

Python

```python
import os

from openai import OpenAI  # xAI uses an OpenAI-compatible API

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)
response = client.chat.completions.create(
    model="grok-3-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

OpenAI-compatible providers
Several providers implement the OpenAI chat completions schema. If you're using the OpenAI Python SDK, you can point it at these providers by setting base_url and api_key:
| Provider | base_url | API key env var |
|---|---|---|
| xAI (Grok) | https://api.x.ai/v1 | XAI_API_KEY |
| DeepSeek | https://api.deepseek.com | DEEPSEEK_API_KEY |
| Together (Llama, etc.) | https://api.together.xyz/v1 | TOGETHER_API_KEY |
| Mistral | https://api.mistral.ai/v1 | MISTRAL_API_KEY |
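Because these providers share one schema, switching between them reduces to two constructor arguments. A minimal sketch, assuming the OpenAI Python SDK; the client_kwargs helper is ours for illustration, not part of any SDK:

```python
import os

# base_url and API-key env var for each OpenAI-compatible provider,
# taken from the table above.
PROVIDERS = {
    "xai": ("https://api.x.ai/v1", "XAI_API_KEY"),
    "deepseek": ("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
    "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
    "mistral": ("https://api.mistral.ai/v1", "MISTRAL_API_KEY"),
}

def client_kwargs(provider: str) -> dict:
    """Constructor kwargs for openai.OpenAI(**client_kwargs(provider))."""
    base_url, env_var = PROVIDERS[provider]
    return {"base_url": base_url, "api_key": os.environ.get(env_var, "")}

# Usage (requires the openai package and the provider's API key):
#   client = openai.OpenAI(**client_kwargs("deepseek"))
#   client.chat.completions.create(model="deepseek-chat", messages=[...])
```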
See the cost optimization guide for strategies to reduce spend, or open the calculator to compare costs before choosing a provider.
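The example responses earlier in this reference come in three shapes: OpenAI-style choices (also used by Mistral, DeepSeek, and xAI), Anthropic content blocks, and Gemini candidates. A hedged sketch of a normalizer over parsed response dicts; it covers only the fields shown in the examples above, and real responses may carry fields it ignores:

```python
def extract_text(resp: dict) -> str:
    """Pull the assistant text out of a parsed JSON response body.

    Handles the three shapes shown in this reference: OpenAI-style
    (choices[0].message.content), Gemini (candidates[0].content.parts),
    and Anthropic (content blocks with type "text").
    """
    if "choices" in resp:  # OpenAI / Mistral / DeepSeek / xAI
        return resp["choices"][0]["message"]["content"]
    if "candidates" in resp:  # Gemini
        parts = resp["candidates"][0]["content"]["parts"]
        return "".join(p.get("text", "") for p in parts)
    if "content" in resp:  # Anthropic
        return "".join(b["text"] for b in resp["content"] if b["type"] == "text")
    raise ValueError("unrecognized response shape")
```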