# OpenAI proxy

`POST /v1/chat/completions` — speaks the OpenAI Chat Completions protocol. Mintoken accepts the same request body and returns the same response body. You keep using the OpenAI SDK.
## Endpoint

- Base URL: `https://api.mintoken.in`
- Path: `POST /v1/chat/completions`
## Basic example

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.mintoken.in/v1",
    api_key="mt_live_xxxxx",
    default_headers={"X-Provider-Key": "sk-proj-xxxxx"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a database expert."},
        {"role": "user", "content": "How does connection pooling work?"},
    ],
    max_tokens=400,
)
print(response.choices[0].message.content)
```
Use the `messages` array to inject a system prompt.

## Request headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer mt_live_…` — your mintoken API key. |
| `X-Provider-Key` | Yes | Your OpenAI key, written into the upstream request. |
| `X-Mintoken-Intensity` | No | Override the API key's default compression level: `lite`, `full`, or `ultra`. Ignored if the API key has smart detection enabled and no explicit override is set. |
| `Content-Type` | Yes | `application/json` |
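Taken together, the rows above assemble into a plain header dict. A minimal sketch — the `build_headers` name and the placeholder keys are illustrative, not part of the API:

```python
from typing import Optional

def build_headers(
    mintoken_key: str,
    provider_key: str,
    intensity: Optional[str] = None,
) -> dict:
    """Assemble the request headers described in the table above."""
    headers = {
        "Authorization": f"Bearer {mintoken_key}",  # mintoken API key
        "X-Provider-Key": provider_key,             # forwarded to OpenAI
        "Content-Type": "application/json",
    }
    if intensity is not None:
        # Optional per-request override of the key's default compression level.
        if intensity not in ("lite", "full", "ultra"):
            raise ValueError(f"unknown intensity: {intensity!r}")
        headers["X-Mintoken-Intensity"] = intensity
    return headers
```

Passing the result to any HTTP client (`requests`, `httpx`, `urllib`) produces the same request the curl example below sends.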
## Response headers

| Header | Description |
|---|---|
| `X-Mintoken-Intensity` | Which intensity actually ran on this request. Useful when smart detection chose something non-obvious. |
| `X-Mintoken-Duration-Ms` | Total milliseconds mintoken spent on the request, including the upstream round-trip. |
| `X-Mintoken-Tokens-Used` | Your monthly quota usage after this request. Only present when approaching your limit. |
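These diagnostic headers arrive as plain strings, and the quota header may be absent. A small helper can coerce them — a sketch, assuming the headers are exposed as a str-keyed mapping (as with `requests`' `response.headers` or `httpx`); `mintoken_stats` is an illustrative name:

```python
def mintoken_stats(headers) -> dict:
    """Extract mintoken's diagnostic response headers.

    X-Mintoken-Tokens-Used only appears when you are approaching
    your monthly limit, so it maps to None when missing.
    """
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "intensity": headers.get("X-Mintoken-Intensity"),
        "duration_ms": as_int("X-Mintoken-Duration-Ms"),
        "tokens_used": as_int("X-Mintoken-Tokens-Used"),
    }
```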
## Streaming

Pass `stream: true` exactly as you would with the OpenAI API. Mintoken forwards the SSE chunks as they arrive, adds the `X-Mintoken-Intensity` response header, and tracks usage after the stream closes.

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
## Overriding intensity per request

Each API key has a default intensity (set at creation or editable in the dashboard). You can override it on a specific request with a header or query param:

```bash
curl https://api.mintoken.in/v1/chat/completions \
  -H "Authorization: Bearer mt_live_xxxxx" \
  -H "X-Provider-Key: sk-proj-xxxxx" \
  -H "X-Mintoken-Intensity: ultra" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-4o-mini", "messages": [...] }'
```
If you want mintoken to pick automatically, enable smart detection on the key — see Smart detection.
## Supported models

Any model your OpenAI account has access to. Mintoken doesn't maintain an allowlist — the model string is forwarded as-is. At the time of writing, the common picks:

- `gpt-4o-mini` — best cost/quality for most workloads
- `gpt-4o` — highest quality, higher cost
- `gpt-4.1-mini` / `gpt-5-mini` — newer; check your account access
- `o1` / `o3` — reasoning models; mintoken compression applies to the visible output, not the reasoning traces
## Error responses
Upstream OpenAI errors (4xx and 5xx) are relayed back to you with the original status code and body. Mintoken-specific errors use the following status codes:
| Status | When |
|---|---|
| 401 | Missing / invalid Authorization header |
| 400 | Missing X-Provider-Key header |
| 429 | Monthly token quota exceeded. Response includes X-Mintoken-Tokens-Used and X-Mintoken-Tokens-Limit. |
| 502 | Upstream provider returned a non-JSON response, or the connection failed. |
| 504 | Upstream provider exceeded the 120-second timeout. |
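The table above can be mapped to readable error messages in client code. A minimal sketch — the `explain_status` helper is illustrative; the status codes and header names come from the tables in this section:

```python
def explain_status(code: int, headers: dict) -> str:
    """Map a mintoken error status to a human-readable message."""
    if code == 401:
        return "Missing or invalid Authorization header"
    if code == 400:
        return "Missing X-Provider-Key header"
    if code == 429:
        used = headers.get("X-Mintoken-Tokens-Used", "?")
        limit = headers.get("X-Mintoken-Tokens-Limit", "?")
        return f"Monthly token quota exceeded ({used}/{limit} tokens)"
    if code == 502:
        return "Upstream returned a non-JSON response or the connection failed"
    if code == 504:
        return "Upstream exceeded the 120-second timeout"
    # Anything else is an upstream OpenAI error relayed with its original body.
    return f"Upstream error {code}: see response body for OpenAI's message"
```

With the OpenAI SDK, these errors surface as `openai.APIStatusError` exceptions, which carry the `status_code` and the response headers for a helper like this to inspect.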