API reference

Streaming

Mintoken streams Server-Sent Events without buffering. Chunks arrive at the same pace they leave the upstream provider. Your streaming UI works unchanged.

How it works

When your request sets stream: true, mintoken opens an HTTP connection to the upstream provider and pipes the SSE body back to you as each chunk arrives. The compression rules are still injected as a system prompt before the request — they affect what the model generates, not how fast chunks flow.
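The flow above can be sketched as a tiny pass-through. This is not mintoken's actual code: the rule string, `inject_rules`, `proxy_stream`, and the upstream generator are all hypothetical stand-ins used to illustrate the shape of the pipeline.

```python
# Hypothetical rule text; mintoken's real compression rules are not shown here.
COMPRESSION_RULES = "Answer as tersely as possible."

def inject_rules(body: dict) -> dict:
    """Prepend the compression rules as a system message before forwarding."""
    messages = [{"role": "system", "content": COMPRESSION_RULES}] + body["messages"]
    return {**body, "messages": messages}

def proxy_stream(body, upstream):
    """Forward the modified request, then re-emit each SSE chunk as it arrives."""
    for chunk in upstream(inject_rules(body)):
        yield chunk  # no buffering: chunks flow at the upstream's pace

# Simulated upstream that streams three raw SSE chunks.
def fake_upstream(body):
    assert body["messages"][0]["role"] == "system"  # rules arrived first
    yield 'data: {"delta": "1"}\n\n'
    yield 'data: {"delta": "2"}\n\n'
    yield "data: [DONE]\n\n"

chunks = list(proxy_stream(
    {"messages": [{"role": "user", "content": "Count"}]}, fake_upstream))
```

The rules change only the request body; the streaming path itself is a plain generator hand-off.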

Examples

from openai import OpenAI

# Point the client at your mintoken endpoint; the base URL below is a placeholder.
client = OpenAI(base_url="https://your-mintoken-host/v1", api_key="...")

for chunk in client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
):
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Event shapes

  • OpenAI — standard data: {...} chunks, terminated by data: [DONE].
  • Anthropic — named events message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
  • Google — use the streaming endpoint suffix :streamGenerateContent upstream; mintoken mirrors the chunking behavior.
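The two wire shapes above can be handled by one small parser. This is a minimal sketch, not a library API: real SSE streams also carry comments, retry fields, and multi-line data, which this version ignores.

```python
def parse_sse(lines):
    """Yield (event, data) pairs; event is None for data-only (OpenAI-style) chunks."""
    event = None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            yield event, line[len("data: "):]
            event = None  # an event name applies only to the next data line

# OpenAI shape: anonymous data chunks, terminated by [DONE].
openai_events = list(parse_sse(['data: {"choices": []}', "data: [DONE]"]))

# Anthropic shape: named events, each followed by a data payload.
anthropic_events = list(parse_sse(["event: message_start",
                                   'data: {"type": "message_start"}']))
```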

Token counts are logged post-stream

Because the full completion is only known once the stream ends, mintoken counts tokens after the final chunk. Analytics for a stream request appear a second or two later than for a non-stream one, but no requests are lost.
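The accounting pattern looks roughly like this. It is a sketch only: `account_stream` is hypothetical, and the word-split count stands in for a real tokenizer.

```python
def account_stream(deltas):
    """Accumulate streamed text; count tokens only after the final chunk."""
    parts = []
    for delta in deltas:
        parts.append(delta)  # in reality each delta is forwarded to the client first
    completion = "".join(parts)
    # Naive word-based count as a stand-in for a real tokenizer.
    return completion, len(completion.split())

text, tokens = account_stream(["Hello", " ", "world"])
```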

Errors mid-stream

If the upstream provider returns an error mid-stream, mintoken forwards it to you as the provider emitted it (OpenAI sends an error event, Anthropic aborts the stream with a status), then closes the connection. Your SDK's error-handling path should trigger exactly as it would against the upstream provider directly.
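Client-side, that means an ordinary try/except around the stream loop. The sketch below simulates the failure with a hypothetical `UpstreamError`; in real code you would catch your SDK's own exception type (e.g. the OpenAI SDK's API error class).

```python
class UpstreamError(Exception):
    """Hypothetical stand-in for the SDK exception raised on a mid-stream error."""

def failing_stream():
    yield "chunk-1"
    raise UpstreamError("upstream aborted mid-stream")

received, error = [], None
try:
    for chunk in failing_stream():
        received.append(chunk)  # chunks before the failure are still delivered
except UpstreamError as exc:
    error = str(exc)  # same handling path as a direct upstream failure
```

Note that chunks emitted before the error still reach you; only the remainder of the stream is lost.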