Streaming
Mintoken streams Server-Sent Events without buffering. Chunks arrive at the same pace they leave the upstream provider. Your streaming UI works unchanged.
How it works
When your request sets stream: true, mintoken opens an HTTP connection to the upstream provider and pipes the SSE body back to you as each chunk arrives. The compression rules are still injected as a system prompt before the request — they affect what the model generates, not how fast chunks flow.
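The inject-then-pipe flow can be sketched as below. This is a minimal illustration, not mintoken's actual internals: `inject_rules` and `pipe_sse` are hypothetical names, and the rules string is a stand-in for the real compression rules.

```python
def inject_rules(payload: dict, rules: str) -> dict:
    """Prepend the compression rules as a system message before forwarding upstream."""
    messages = [{"role": "system", "content": rules}] + payload.get("messages", [])
    return {**payload, "messages": messages}

def pipe_sse(upstream_chunks):
    """Yield each upstream SSE chunk as it arrives, without buffering the stream."""
    for chunk in upstream_chunks:
        yield chunk

payload = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}
modified = inject_rules(payload, "Be terse.")
```

The request body is rewritten once, up front; the response path is a plain pass-through.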
Examples
from openai import OpenAI

# Point the SDK at your mintoken deployment; base_url and api_key are placeholders.
client = OpenAI(base_url="...", api_key="...")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
Event shapes
- OpenAI — standard `data: {...}` chunks, terminated by `data: [DONE]`.
- Anthropic — named events: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`.
- Google — use the streaming endpoint suffix `:streamGenerateContent` upstream; mintoken mirrors the chunking behavior.
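As a concrete illustration of the OpenAI shape, a minimal parser for the `data:` framing might look like this (the helper name and sample chunks are illustrative):

```python
import json

def parse_openai_sse(lines):
    """Parse OpenAI-style SSE lines into chunk dicts; stop at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and named-event lines
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end-of-stream sentinel, not JSON
        yield json.loads(data)

raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_openai_sse(raw))
# text == "Hello"
```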
Errors mid-stream
If the upstream provider returns an error mid-stream, mintoken forwards it to you as the provider emitted it (OpenAI sends an `error` event, Anthropic aborts the stream with a status), then closes the connection. Your SDK's error handling path should trigger as it would against the upstream provider directly.
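On the consumer side, this means catching the exception your SDK raises after any partial output has arrived. A minimal simulation of that pattern (the failing generator below is fabricated for illustration; real SDKs raise their own error types):

```python
def upstream():
    """Simulated stream that fails partway, as an aborted upstream connection would."""
    yield "partial "
    yield "output"
    raise RuntimeError("upstream aborted the stream")

received, error = [], None
try:
    for delta in upstream():
        received.append(delta)  # deltas seen before the failure are still usable
except RuntimeError as exc:
    error = str(exc)
```

Everything received before the error is intact, so you can surface the partial text alongside the failure.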