Question 1

Will my app break?

Accepted Answer

No. Mintoken is a transparent proxy that speaks the native OpenAI, Anthropic, and Google APIs. Your request/response JSON is identical — you change only the base URL. We run the full test suite of each provider's SDK against mintoken.

Question 2

Which models are supported?

Accepted Answer

Every OpenAI chat model (gpt-4o, gpt-4o-mini, o1, o3, gpt-5.x), every Anthropic model (Claude 3.5, 4.x), and every Google Gemini model. Streaming, function calling, vision inputs, JSON mode — all passthrough transparently.

Question 3

What's context compression and how is it different from output compression?

Accepted Answer

Output compression trims the response — fewer tokens come back from the model. Context compression trims the request — fewer tokens go to the model on every turn. For long agent sessions (Claude Code, AutoGPT-style workflows), the conversation history piles up and re-sends as cache reads on every message. We truncate huge tool results and deduplicate repeated file contents in older turns, so you stop paying to re-send the same context forever.

Question 4

What does context compression NOT touch?

Accepted Answer

The system prompt, your most recent 6 turns, code blocks, error messages, file paths, commands, and any tool_use/tool_result pair that's still in flight. Anything that the model genuinely needs verbatim stays verbatim. The science (Stanford's 'Lost in the Middle' paper) shows models barely attend to the middle of long contexts anyway — that's the slack we exploit.

Question 5

Doesn't Anthropic's prompt cache already solve this?

Accepted Answer

It makes re-reading cheaper (~10x cheaper per token) but doesn't reduce the number of tokens — your context keeps growing every turn. Caching is a discount on a bill that still scales with session length. Mintoken actually shrinks the bill. The two play well together: we shrink the prefix, Anthropic caches what's left.

Question 6

Is the compression lossy? Will accuracy suffer?

Accepted Answer

Output compression: zero accuracy loss measured on technical content. Filler dies, facts stay. Context compression Tier 1: heuristic only — we never paraphrase, only truncate stale tool outputs and deduplicate file reads. The compressor is opt-in and includes a safety net: if it would yield no savings, your original body is forwarded untouched.

Question 7

Is there a free tier?

Accepted Answer

Yes — 100k tokens/month, 2 API keys, basic analytics. No credit card required. Free tier users get the same compression quality as paid tiers — the only differences are quota and advanced features.

Question 8

How do you make money if the skill is open source?

Accepted Answer

The compression skill itself is free and MIT-licensed. The paid product is the hosted proxy: multi-provider routing, centralized analytics, team keys, smart auto-detection, custom vocab rules, SLAs. Think of it like Docker + Docker Hub — the tool is free, the managed service is paid.

Question 9

Can I self-host?

Accepted Answer

Enterprise plan includes a self-hosted Docker image with the same feature set. Full source code access. Typically used by fintech and healthcare customers needing air-gapped deployments.

Question 10

What about streaming responses?

Accepted Answer

Fully supported. SSE streams pass through unchanged — mintoken injects rules server-side before forwarding. Your streaming UI works without any change. Context compression also works with streaming requests.

Question 11

What if a request exceeds my plan's token limit?

Accepted Answer

Zero-spend protection: the request returns a 429 with clear headers (X-Mintoken-Tokens-Used, X-Mintoken-Tokens-Limit). No surprise overage bills. Upgrade in the dashboard to raise the ceiling.

Cut your AI bill,
not your brain.

One line of code. Two axes of savings.

Sign up, get a key

Change one line

Watch both sides drop

95% of your AI bill is cache reads.
We're the only proxy attacking that.

Truncate huge tool results

Deduplicate repeated reads

Recency window protected

Toggle it on. Watch your cache reads halve.

Show me the money.

Built for people who ship, not talk.

Bootstrapping? Mintoken is your CFO.

Not marketing numbers.
Measured ones.

Save more than you spend.

Free

Pro

Team

Enterprise

Questions, answered.

Start saving.
Like, right now.

Cut your AI bill,not your brain.