Documentation

Mintoken in five minutes.

Mintoken is a drop-in compression proxy for the OpenAI, Anthropic, and Google AI APIs. You change one line of code — a base_url — and every response comes back in roughly 65% fewer tokens, with the same accuracy.

What mintoken does

Most AI responses are full of filler — articles, hedging phrases, sentence padding. Mintoken sits between your app and the AI provider, injects a compression skill as a system prompt, and the model responds with the same information in dramatically fewer tokens. You pay less per call because you receive fewer tokens.

Everything else — streaming, function calling, vision, JSON mode — is passed through unchanged. Your app doesn't need to know mintoken exists beyond the URL it points at.

The one-line change

Wherever your code reads base_url="https://api.openai.com/v1", replace it with base_url="https://api.mintoken.in/v1". That's it.
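If you're calling the API without an SDK, the same swap applies to the URL you post to. A minimal sketch using only the Python standard library (the key, model, and message are placeholders):

```python
import json
import urllib.request

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a standard OpenAI-shaped chat request, routed through mintoken.
    The only difference from calling OpenAI directly is the host."""
    return urllib.request.Request(
        "https://api.mintoken.in/v1/chat/completions",  # was: api.openai.com
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-...", "gpt-4o",
    [{"role": "user", "content": "Summarize TCP slow start."}],
)
# Send it the usual way: urllib.request.urlopen(req)
```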

What mintoken does not do

  • Doesn't change the model. Your model parameter is forwarded as-is. You keep using GPT-4o, Claude Sonnet, Gemini 2.5, or whatever you're already on.
  • Doesn't hold your provider key. You send your OpenAI / Anthropic / Google key with every request (via a separate header). Mintoken forwards it, never persists it.
  • Doesn't compress security-critical content. Security warnings, legal text, compliance language — mintoken detects these and bypasses compression automatically.
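The two-key arrangement above can be sketched as follows. This is an illustration, not a reference: the provider-key header name ("X-Provider-Key" below) is a placeholder, and sending the mintoken key via the Authorization header is an assumption — check the proxy reference pages for the actual header names.

```python
import json
import urllib.request

MINTOKEN_KEY = "mt-..."   # authenticates you to mintoken (assumed Authorization header)
PROVIDER_KEY = "sk-..."   # your OpenAI key: forwarded to the provider, never persisted

req = urllib.request.Request(
    "https://api.mintoken.in/v1/chat/completions",
    data=json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "hi"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {MINTOKEN_KEY}",
        "X-Provider-Key": PROVIDER_KEY,  # placeholder name for the provider-key header
        "Content-Type": "application/json",
    },
    method="POST",
)
```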

Mental model

Mintoken is shaped exactly like the provider it proxies. An OpenAI endpoint on mintoken accepts the OpenAI request body, returns the OpenAI response body. Same for Anthropic. Same for Google. There is no mintoken-specific request format. This is a deliberate design choice: anything that works against the provider works against mintoken.

What mintoken adds on top:

  • Compression rules injected as a system prompt before your request hits the provider.
  • Usage tracking — every request gets logged (request count, tokens in, tokens out, duration).
  • Quota enforcement — plan-based monthly token limits, surfaced as 429s when exceeded.
  • Smart detection (optional) — classifies your prompt and picks an intensity level automatically.
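Because quota overruns surface as ordinary HTTP 429s, standard retry logic applies unchanged. A minimal backoff sketch — the wrapper takes any callable that issues the request, so it works with any SDK:

```python
import time
import urllib.error
import urllib.request

def call_with_backoff(send, retries: int = 3, base_delay: float = 1.0):
    """Call send() (any function that issues the HTTP request), retrying
    on HTTP 429 with exponential backoff."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise          # not a quota error, or out of retries
            time.sleep(delay)  # quota exceeded: wait, then retry
            delay *= 2

# Usage: wrap whatever performs the actual request, e.g.
# result = call_with_backoff(lambda: urllib.request.urlopen(req).read())
```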

Where next

If this is your first time here, go to Quickstart: you'll have a compressed response coming back from the API in a couple of minutes. If you already have a key and want the reference docs, jump to OpenAI proxy, Anthropic proxy, or Google Gemini proxy.