API reference

OpenAI proxy

POST /v1/chat/completions — speaks the OpenAI Chat Completions protocol. Mintoken accepts the same request body and returns the same response body. You keep using the OpenAI SDK.

Endpoint

Base URL: https://api.mintoken.in
Path: POST /v1/chat/completions

Basic example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.mintoken.in/v1",
    api_key="mt_live_xxxxx",
    default_headers={"X-Provider-Key": "sk-proj-xxxxx"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a database expert."},
        {"role": "user", "content": "How does connection pooling work?"},
    ],
    max_tokens=400,
)

print(response.choices[0].message.content)

Everything else works unchanged

Function calling, JSON mode, vision (image_url blocks), tool calls, response format, temperature, seed — the whole OpenAI surface is passthrough. Mintoken only touches the messages array, to inject a system prompt.
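As a concrete illustration, here is a request body exercising several of those passthrough features at once. Everything except messages reaches OpenAI unchanged; the tool name and parameters below are hypothetical, purely for the sketch:

```python
# Sketch of a passthrough request body. Only the messages array is
# modified by mintoken (to prepend its system prompt); every other
# field is forwarded to OpenAI as-is. get_weather is a made-up tool.
body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Reply in JSON: what's the weather in Pune?"},
    ],
    "temperature": 0.2,
    "seed": 42,
    "response_format": {"type": "json_object"},  # JSON mode, passed through
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
```

You would send this body exactly as you would to OpenAI directly, via the SDK or raw HTTP.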

Request headers

  • Authorization (required): Bearer mt_live_… — your mintoken API key.
  • X-Provider-Key (required): your OpenAI key, written into the upstream request.
  • X-Mintoken-Intensity (optional): override the API key's default compression level. One of lite, full, or ultra. When the key has smart detection enabled and no explicit override is set, the key's default is ignored and mintoken picks automatically.
  • Content-Type (required): application/json

Response headers

  • X-Mintoken-Intensity: which intensity actually ran on this request. Useful when smart detection chose something non-obvious.
  • X-Mintoken-Duration-Ms: total milliseconds mintoken spent on the request, including the upstream round-trip.
  • X-Mintoken-Tokens-Used: your monthly quota usage after this request. Only present when approaching your limit.
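A small illustrative helper for reading these headers, shown against a captured sample (the helper is ours, not part of mintoken; with the OpenAI Python SDK, raw response headers are reachable via the with_raw_response variant of a call):

```python
# Hypothetical helper: pull mintoken's diagnostics out of a response
# headers mapping. Header values arrive as strings, so numeric ones
# are parsed here.
def summarize_mintoken_headers(headers: dict) -> dict:
    summary = {
        "intensity": headers.get("X-Mintoken-Intensity"),
        "duration_ms": (int(headers["X-Mintoken-Duration-Ms"])
                        if "X-Mintoken-Duration-Ms" in headers else None),
    }
    # X-Mintoken-Tokens-Used only appears when you approach your quota.
    if "X-Mintoken-Tokens-Used" in headers:
        summary["tokens_used"] = int(headers["X-Mintoken-Tokens-Used"])
    return summary

sample = {"X-Mintoken-Intensity": "full", "X-Mintoken-Duration-Ms": "842"}
print(summarize_mintoken_headers(sample))
# → {'intensity': 'full', 'duration_ms': 842}
```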

Streaming

Pass stream: true exactly as you would with the OpenAI API. Mintoken forwards the SSE chunks as they arrive, adds the X-Mintoken-Intensity response header, and tracks usage after the stream closes.

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Token accounting on streams

Because the stream finishes at the provider's pace, we count tokens after the final chunk arrives. Your dashboard may show stream requests a few seconds behind non-stream requests. No data is lost.

Overriding intensity per request

Each API key has a default intensity (set at creation or editable in the dashboard). You can override it on a specific request with a header or query param:

curl https://api.mintoken.in/v1/chat/completions \
  -H "Authorization: Bearer mt_live_xxxxx" \
  -H "X-Provider-Key: sk-proj-xxxxx" \
  -H "X-Mintoken-Intensity: ultra" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-4o-mini", "messages": [...] }'

If you want mintoken to pick automatically, enable smart detection on the key — see Smart detection.

Supported models

Any model your OpenAI account has access to. Mintoken doesn't maintain an allowlist — the model string is forwarded as-is. At the time of writing, common picks:

  • gpt-4o-mini — best cost/quality for most workloads
  • gpt-4o — highest quality, higher cost
  • gpt-4.1-mini / gpt-5-mini — newer, check your account access
  • o1 / o3 — reasoning models; mintoken compression applies to the visible output, not the reasoning traces

Error responses

Upstream OpenAI errors (4xx and 5xx) are relayed back to you with the original status code and body. Mintoken-specific errors use the following status codes:

  • 401: missing or invalid Authorization header.
  • 400: missing X-Provider-Key header.
  • 429: monthly token quota exceeded. The response includes the X-Mintoken-Tokens-Used and X-Mintoken-Tokens-Limit headers.
  • 502: the upstream provider returned a non-JSON response, or the connection failed.
  • 504: the upstream provider exceeded the 120-second timeout.
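As a sketch of client-side handling, a helper like the following (ours, purely illustrative) maps mintoken's own status codes to actions; any other status is an upstream OpenAI error relayed with its original body:

```python
# Hypothetical error-handling sketch for mintoken-specific statuses.
# Upstream OpenAI errors (4xx/5xx) arrive with their original status
# code and body and should be handled as you would handle OpenAI's.
def classify_error(status: int, headers: dict) -> str:
    if status == 401:
        return "check Authorization (mt_live_... key)"
    if status == 400:
        return "missing X-Provider-Key header"
    if status == 429:
        used = headers.get("X-Mintoken-Tokens-Used")
        limit = headers.get("X-Mintoken-Tokens-Limit")
        return f"monthly quota exceeded ({used}/{limit} tokens)"
    if status in (502, 504):
        return "upstream provider problem - retry with backoff"
    return "upstream error relayed from OpenAI"
```

502 and 504 are generally safe to retry; 429 is not, since the quota resets monthly.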