Streaming

FlowAPI supports Server-Sent Events (SSE) for chat completion streaming. Streaming is recommended for long outputs because it reduces perceived latency and helps avoid timeout issues on slow or verbose generations.

Enable Streaming

Set stream to true in the request body:

{
  "model": "deepseek-ai/DeepSeek-V3.2",
  "messages": [
    { "role": "user", "content": "Write a long tutorial about SSE." }
  ],
  "stream": true
}

HTTP Example

curl https://api.flowapi.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_FLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3.2",
    "messages": [
      { "role": "user", "content": "Write a haiku about distributed systems." }
    ],
    "stream": true
  }'

Python Example

from openai import OpenAI

# Point the OpenAI client at the FlowAPI endpoint.
client = OpenAI(
    api_key="YOUR_FLOW_API_KEY",
    base_url="https://api.flowapi.net/v1"
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    stream=True,
)

for chunk in stream:
    # Some chunks carry no choices (for example, a usage-only chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Print text as it arrives; content may be None on non-text chunks.
    if delta.content:
        print(delta.content, end="", flush=True)

Event Shape

Each streamed chunk usually has the following shape:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1713000001,
  "model": "deepseek-ai/DeepSeek-V3.2",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "Hello" },
      "finish_reason": null
    }
  ]
}

The final chunk usually sets finish_reason and may include usage details depending on the upstream provider.

Parsing Guidance

  • Ignore chunks with empty choices unless you specifically need final usage info.
  • Append delta.content in arrival order.
  • Watch finish_reason to know when generation is complete.
  • Handle network disconnects separately from model-side errors.
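
The guidance above can be sketched as a small parser for clients that read the raw SSE stream instead of using an SDK. This is a minimal sketch, not FlowAPI's reference implementation: the function name collect_stream is hypothetical, and the "[DONE]" end-of-stream sentinel is an assumption borrowed from OpenAI-style APIs that this page does not confirm.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str], Optional[dict]]:
    """Fold raw SSE lines into (text, finish_reason, usage)."""
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blank lines and other fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumption: an OpenAI-style "[DONE]" sentinel may mark the end of the stream.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # A chunk with empty choices may still carry final usage info.
        if chunk.get("usage"):
            usage = chunk["usage"]
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        delta = choice.get("delta") or {}
        content = delta.get("content")
        if content:
            # Append text in arrival order.
            parts.append(content)
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage
```

Because the function only consumes an iterable of lines, transport failures surface where the lines are produced. Wrapping that iteration in its own try/except keeps network disconnects apart from anything the model reports through finish_reason.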

Best Practices

  • Use streaming for long responses, reasoning-heavy tasks, or large documents.
  • Increase client-side timeout settings even when using SSE; a long generation can keep the connection open well past default limits.
  • Do not assume every chunk contains text; some may contain role markers or tool call data.
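
To illustrate the last point, here is a minimal sketch of merging deltas that may carry a role marker, text, tool call fragments, or nothing at all. The helper apply_delta and the state layout are hypothetical. The tool call shape (an index per call, with function.name and function.arguments arriving in pieces) is assumed from OpenAI-style APIs and may differ by upstream provider.

```python
from typing import Any, Dict


def apply_delta(state: Dict[str, Any], delta: Dict[str, Any]) -> None:
    """Merge one streamed delta (as a plain dict) into an accumulating message state."""
    # Role markers usually arrive once, often without any text.
    if delta.get("role"):
        state["role"] = delta["role"]
    # content may be missing, None, or an empty string; only append real text.
    if delta.get("content"):
        state["content"] = state.get("content", "") + delta["content"]
    # Assumption: OpenAI-style tool call fragments keyed by "index".
    for frag in delta.get("tool_calls") or []:
        calls = state.setdefault("tool_calls", {})
        call = calls.setdefault(frag.get("index", 0), {"name": "", "arguments": ""})
        fn = frag.get("function") or {}
        if fn.get("name"):
            call["name"] = fn["name"]
        if fn.get("arguments"):
            # Argument JSON arrives as string fragments; concatenate, parse later.
            call["arguments"] += fn["arguments"]
```

Every field is read with .get and a fallback, so a delta that lacks content or carries unfamiliar metadata is ignored instead of raising an exception.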

Some upstream providers may emit slightly different chunk timing or optional fields. Your client should be tolerant of missing content values and provider-specific metadata.