# OpenAI proxy

`POST /v1/chat/completions` — speaks the OpenAI Chat Completions protocol. Mintoken accepts the same request body and returns the same response body. You keep using the OpenAI SDK.
## Endpoint

- Base URL: `https://api.mintoken.in`
- Path: `POST /v1/chat/completions`
## Basic example

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.mintoken.in/v1",
    api_key="mt_live_xxxxx",
    default_headers={"X-Provider-Key": "sk-proj-xxxxx"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a database expert."},
        {"role": "user", "content": "How does connection pooling work?"},
    ],
    max_tokens=400,
)
print(response.choices[0].message.content)
```
Use the `messages` array to inject a system prompt.

## Request headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer mt_live_…` — your mintoken API key. |
| `X-Provider-Key` | Yes | Your OpenAI key, written into the upstream request. |
| `X-Mintoken-Intensity` | No | Override the API key's default compression level: `lite`, `full`, or `ultra`. Ignored if the API key has smart detection enabled and no explicit override is set. |
| `Content-Type` | Yes | `application/json` |
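Taken together, the rows above assemble into a plain header dict. A minimal sketch — the `build_headers` name and the placeholder keys are illustrative, not part of the API:

```python
from typing import Optional

def build_headers(
    mintoken_key: str,
    provider_key: str,
    intensity: Optional[str] = None,
) -> dict:
    """Assemble the request headers described in the table above."""
    headers = {
        "Authorization": f"Bearer {mintoken_key}",  # mintoken API key
        "X-Provider-Key": provider_key,             # forwarded to OpenAI
        "Content-Type": "application/json",
    }
    if intensity is not None:
        # Optional per-request override of the key's default compression level.
        if intensity not in ("lite", "full", "ultra"):
            raise ValueError(f"unknown intensity: {intensity!r}")
        headers["X-Mintoken-Intensity"] = intensity
    return headers
```

Passing the result to any HTTP client (`requests`, `httpx`, `urllib`) produces the same request the curl example below sends.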
## Response headers

| Header | Description |
|---|---|
| `X-Mintoken-Intensity` | Which intensity actually ran on this request. Useful when smart detection chose something non-obvious. |
| `X-Mintoken-Duration-Ms` | Total milliseconds mintoken spent on the request, including the upstream round-trip. |
| `X-Mintoken-Tokens-Used` | Your monthly quota usage after this request. Only present when approaching your limit. |
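These diagnostic headers arrive as plain strings, and the quota header may be absent. A small helper can coerce them — a sketch, assuming the headers are exposed as a str-keyed mapping (as with `requests`' `response.headers` or `httpx`); `mintoken_stats` is an illustrative name:

```python
def mintoken_stats(headers) -> dict:
    """Extract mintoken's diagnostic response headers.

    X-Mintoken-Tokens-Used only appears when you are approaching
    your monthly limit, so it maps to None when missing.
    """
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "intensity": headers.get("X-Mintoken-Intensity"),
        "duration_ms": as_int("X-Mintoken-Duration-Ms"),
        "tokens_used": as_int("X-Mintoken-Tokens-Used"),
    }
```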
## Streaming

Pass `stream: true` exactly as you would with the OpenAI API. Mintoken forwards the SSE chunks as they arrive, adds the `X-Mintoken-Intensity` response header, and tracks usage after the stream closes.

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
## Overriding intensity per request

Each API key has a default intensity (set at creation or editable in the dashboard). You can override it on a specific request with a header or query param:

```bash
curl https://api.mintoken.in/v1/chat/completions \
  -H "Authorization: Bearer mt_live_xxxxx" \
  -H "X-Provider-Key: sk-proj-xxxxx" \
  -H "X-Mintoken-Intensity: ultra" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-4o-mini", "messages": [...] }'
```
If you want mintoken to pick automatically, enable smart detection on the key — see Smart detection.
## Supported models

Any model your OpenAI account has access to. Mintoken doesn't maintain an allowlist — the model string is forwarded as-is. At the time of writing, the common picks:

- `gpt-4o-mini` — best cost/quality for most workloads
- `gpt-4o` — highest quality, higher cost
- `gpt-4.1-mini` / `gpt-5-mini` — newer; check your account access
- `o1` / `o3` — reasoning models; mintoken compression applies to the visible output, not the reasoning traces
## Error responses
Upstream OpenAI errors (4xx and 5xx) are relayed back to you with the original status code and body. Mintoken-specific errors use the following status codes:
| Status | When |
|---|---|
| 401 | Missing / invalid Authorization header |
| 400 | Missing X-Provider-Key header |
| 429 | Monthly token quota exceeded. Response includes X-Mintoken-Tokens-Used and X-Mintoken-Tokens-Limit. |
| 502 | Upstream provider returned a non-JSON response, or the connection failed. |
| 504 | Upstream provider exceeded the 120-second timeout. |
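The table above can be mapped to readable error messages in client code. A minimal sketch — the `explain_status` helper is illustrative; the status codes and header names come from the tables in this section:

```python
def explain_status(code: int, headers: dict) -> str:
    """Map a mintoken error status to a human-readable message."""
    if code == 401:
        return "Missing or invalid Authorization header"
    if code == 400:
        return "Missing X-Provider-Key header"
    if code == 429:
        used = headers.get("X-Mintoken-Tokens-Used", "?")
        limit = headers.get("X-Mintoken-Tokens-Limit", "?")
        return f"Monthly token quota exceeded ({used}/{limit} tokens)"
    if code == 502:
        return "Upstream returned a non-JSON response or the connection failed"
    if code == 504:
        return "Upstream exceeded the 120-second timeout"
    # Anything else is an upstream OpenAI error relayed with its original body.
    return f"Upstream error {code}: see response body for OpenAI's message"
```

With the OpenAI SDK, these errors surface as `openai.APIStatusError` exceptions, which carry the `status_code` and the response headers for a helper like this to inspect.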