Mintoken CLI

A local-proxy CLI that sits between your AI coding tool — Claude Code, Codex, Cursor, Continue.dev, your own LangChain agent — and the upstream LLM. It dedupes repeated tool results, truncates the boring middle of giant logs, and typically saves 40-70% of input tokens on long agent sessions. Same model output, no SDK changes, one env var swap.

OAuth-safe
Unlike the cloud proxy at api.mintoken.dev, the CLI runs on localhost. Your Claude Code OAuth token (or any API key) is forwarded upstream unchanged and never leaves your machine. Compression happens in your own process, in memory, in milliseconds.

Why use the CLI vs the cloud proxy

Both offer the same compression brain. The difference is where the auth token lives:

  • Cloud proxy (api.mintoken.dev) — best for apps where you control the auth (your own backend, your own SaaS). The API key is your mt_live_… and you supply the upstream provider key per-request via the X-Provider-Key header.
  • Local CLI (this page) — best for tools that authenticate via OAuth or store credentials locally (Claude Code, Codex, Cursor). The proxy on localhost forwards their existing auth header unchanged. No new keys, no token-handling concerns.

Install

# Install from PyPI (when published)
pip install mintoken-cli

# Or install from source while we're shipping fast
git clone https://github.com/Vijay-2005/mintoken
cd mintoken/mintoken-cli
pip install -e .

Quickstart — Claude Code + Codex side by side

One CLI, one port, both providers. Run it once and point both tools at it.

# (optional) save your mintoken key so the dashboard tracks savings
mintoken login --key mt_live_xxxxx

# start the local proxy
mintoken proxy

# in another shell, point Claude Code at it
export ANTHROPIC_BASE_URL="http://127.0.0.1:8788"
claude

# or point Codex at it
export OPENAI_BASE_URL="http://127.0.0.1:8788/v1"
codex

On first start you'll see an orange-bordered banner with the listening URL, your selected compression level, and the masked mintoken key. Both providers are routed on a single port (default 8788): Claude Code hits /v1/messages, Codex hits /v1/chat/completions or /v1/responses, and the proxy forwards each request to the right upstream.

How it works

Every request your tool makes goes through localhost first:

  Claude Code  /  Codex  /  your app
            │
            │  BASE_URL → localhost
            ▼
  ┌─────────────────────────┐
  │   mintoken-cli          │
  │   on localhost          │
  │                         │
  │   • dedupes tool calls  │
  │   • truncates logs      │
  │   • reports stats *     │
  └────────────┬────────────┘
               │
               │  forwards with same auth header
               ▼
  api.anthropic.com  /  api.openai.com


  * optional → api.mintoken.dev for dashboard analytics

Compression runs before the upstream provider sees the request. Anthropic and OpenAI bill you for the smaller body — which directly translates to fewer tokens consumed against your Claude Code 5-hour window or your OpenAI per-token bill.

Compression levels

Pass --level to mintoken proxy or set it in ~/.mintoken/config.json.

  • light — engages above 8,000 input tokens. Truncates tool results larger than 4,000 tokens. Last 8 turns untouched. Conservative, almost zero risk.
  • standard (default) — engages above 4,000 input tokens. Truncates tool results larger than 2,000 tokens. Last 6 turns untouched. The production sweet spot.
  • aggressive — engages above 2,000 input tokens. Truncates tool results larger than 1,000 tokens. Last 4 turns untouched. Maximum savings; may compress content the model might still want.
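
The level is just a flag, so it's easy to switch per session:

# conservative while you evaluate the proxy
mintoken proxy --level light

# maximum savings for a long agent session
mintoken proxy --level aggressive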

Verify it's working

The proxy attaches three headers to every response it returns, so you can see whether compression engaged on each call without leaving your terminal:

HTTP/1.1 200 OK
content-type: application/json
x-mintoken-cli-tokens-before: 9027
x-mintoken-cli-tokens-after:  2947
x-mintoken-cli-tokens-saved:  6080

Or open your dashboard: savings flow there in real time once you've set --key. CLI traffic shows up under endpoint labels like cli-messages and cli-chat-completions.
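
If you'd rather check the headers without wiring up a tool, a quick curl against the proxy works too. The request below is illustrative; any valid request to a proxied endpoint will do, and your provider key is forwarded upstream unchanged:

curl -i http://127.0.0.1:8788/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

# look for x-mintoken-cli-tokens-before / -after / -saved in the response headers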

Use it with your own SDK code

Any code that talks to OpenAI or Anthropic via their official SDK can use the CLI proxy by changing one line:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8788/v1",   # ← local proxy
    api_key="sk-proj-...",                  # ← your real OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)

The proxy preserves the request and response shape exactly — your app behaves identically, just with a smaller bill.
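
The same one-line change works with the Anthropic Python SDK: point base_url at the proxy root (no /v1 suffix, matching ANTHROPIC_BASE_URL above) and keep your real key. A minimal sketch; the model name is only an example:

from anthropic import Anthropic

client = Anthropic(
    base_url="http://127.0.0.1:8788",   # ← local proxy, no /v1 suffix
    api_key="sk-ant-...",               # ← your real Anthropic key
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative; use whatever model you normally run
    max_tokens=1024,
    messages=[{"role": "user", "content": "hello"}],
)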

What never gets touched

  • The system prompt
  • The most recent N messages (recency window — protected per level)
  • Mid-flight tool_use / tool_result pairs (no orphan calls)
  • Conversations under the per-level minimum threshold

If compression somehow produces a bigger body than the original (rare edge case where the truncation marker overhead exceeds savings), the proxy returns the original untouched. Compression can never increase what you send upstream.

Privacy

  • Your auth token (OAuth or API key) is forwarded upstream unchanged through localhost. It never leaves your machine and is never inspected by the proxy.
  • Request bodies are compressed in memory. Prompts, code, file contents — never logged, never written to disk.
  • The only thing sent to api.mintoken.dev is a small metric record per request: model name, token counts, duration, compression result. No content. No headers. No prompts.
  • If you don't set a mintoken key, no telemetry is sent at all — fully local mode.

CLI reference

mintoken proxy

Start the local HTTP proxy.

  • --port, -p — local port. Default 8788.
  • --host — bind address. Default 127.0.0.1. Don't expose to 0.0.0.0 on a shared network.
  • --key — mintoken API key. Reads from MINTOKEN_KEY env var or ~/.mintoken/config.json if unset. Optional — without it the proxy runs in fully-local mode.
  • --level, -l — light · standard · aggressive. Default standard.
  • --anthropic-upstream — override the Anthropic base URL (e.g. point to a different region or self-hosted relay). Default https://api.anthropic.com.
  • --openai-upstream — same for OpenAI. Default https://api.openai.com.
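
A typical invocation combining a few of these flags:

# non-default port, aggressive compression, explicit key
mintoken proxy --port 9000 --level aggressive --key mt_live_xxxxx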

mintoken login

Save your mintoken key to ~/.mintoken/config.json so you don't have to pass --key on every run. Pass --key mt_live_… non-interactively, or omit for an interactive prompt.

mintoken status

Print current config (key prefix, level, upstream URLs). Useful to confirm which key the proxy will use before starting it.

mintoken version

Print the installed CLI version.

Environment variables

  • MINTOKEN_KEY — your mt_live_… key. Used for telemetry only.
  • MINTOKEN_TELEMETRY_URL — override the telemetry endpoint (used in self-hosted deployments). Default: https://api.mintoken.dev/v1/cli-telemetry.
  • ANTHROPIC_BASE_URL — set this to http://127.0.0.1:8788 to route Claude Code through the proxy.
  • OPENAI_BASE_URL — set this to http://127.0.0.1:8788/v1 to route Codex / OpenAI clients.
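
For a persistent setup, export the ones you use in your shell profile (the key is a placeholder):

export MINTOKEN_KEY="mt_live_xxxxx"
export ANTHROPIC_BASE_URL="http://127.0.0.1:8788"
export OPENAI_BASE_URL="http://127.0.0.1:8788/v1"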

Troubleshooting

My tool says "authentication failed"

The proxy forwards your auth header unchanged — if the upstream provider rejects it, that's an upstream problem, not a proxy problem. Confirm by setting ANTHROPIC_BASE_URL back to its default and retrying. If your tool works without the proxy but fails with it, open an issue and include the upstream response status code.
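
A quick way to bypass the proxy and confirm the upstream auth itself still works:

# route Claude Code straight to Anthropic again, then retry
unset ANTHROPIC_BASE_URL
claude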

My dashboard shows no CLI savings

  • Did you pass --key or run mintoken login? Without a key, telemetry stays local.
  • Is the dashboard hitting the right account? CLI telemetry is tagged to the user_id behind the mt_live_… key.
  • Are your conversations big enough to trigger compression? Below the per-level minimum the proxy is a transparent passthrough.
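
The quickest check for the first two points is mintoken status, which prints the key prefix and level the proxy will pick up:

# confirm which key (if any) telemetry will be tagged to
mintoken status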

Port 8788 is already in use

Pass --port 9000 (or any free port) and update your tool's BASE_URL env var to match. The proxy doesn't need a privileged port.
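
For example, moving everything to port 9000:

mintoken proxy --port 9000
export ANTHROPIC_BASE_URL="http://127.0.0.1:9000"
export OPENAI_BASE_URL="http://127.0.0.1:9000/v1"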

Open source

The CLI is MIT-licensed and lives in the same monorepo as the cloud proxy at github.com/Vijay-2005/mintoken under mintoken-cli/. PRs and issues welcome.