Chat Completions

POST /v1/chat/completions is the primary FlowAPI inference endpoint. It follows the familiar OpenAI-compatible chat format and is the recommended interface for most language and multimodal generation workflows.

Endpoint

POST https://api.flowapi.net/v1/chat/completions

Headers

Authorization: Bearer YOUR_FLOW_API_KEY
Content-Type: application/json

You can also provide X-Request-ID for tracing.
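As a rough illustration of the headers above, the following sketch builds the request with Python's standard library without sending it. The build_request helper and the use of a UUID as the X-Request-ID value are assumptions made for this example; any unique string should serve for tracing.

```python
import json
import urllib.request
import uuid

FLOWAPI_URL = "https://api.flowapi.net/v1/chat/completions"


def build_request(api_key, payload, request_id=None):
    """Build (but do not send) a POST request carrying the documented headers."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Optional tracing header; a random UUID is an assumed, not required, format.
        "X-Request-ID": request_id or str(uuid.uuid4()),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(FLOWAPI_URL, data=body, headers=headers, method="POST")


request = build_request(
    "YOUR_FLOW_API_KEY",
    {
        "model": "deepseek-ai/DeepSeek-V3.2",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    request_id="trace-001",
)
# To actually send it: urllib.request.urlopen(request)
```

Reusing the same X-Request-ID value in your own logs makes it easier to match a client-side call to a specific request.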

Request Body

The following Python example sends a request through the OpenAI SDK, pointed at the FlowAPI base URL.

from openai import OpenAI

# Point the OpenAI SDK at the FlowAPI base URL.
client = OpenAI(
    api_key="YOUR_FLOW_API_KEY",
    base_url="https://api.flowapi.net/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant.",
        },
        {
            "role": "user",
            "content": "Explain why embeddings are useful in RAG systems.",
        },
    ],
    temperature=0.7,
    top_p=0.7,
    max_tokens=1024,
    stream=False,
    # Provider-specific fields are not named arguments of the OpenAI SDK,
    # so they are added to the request body through extra_body.
    extra_body={
        "enable_thinking": False,
        "thinking_budget": 4096,
    },
)

# Print the assistant's reply from the first choice.
print(response.choices[0].message.content)

Parameters

model

string required

The model ID to call.

Example:

"deepseek-ai/DeepSeek-V3.2"

messages

array required

A list of messages comprising the conversation so far in OpenAI-compatible format.

Each message may use plain string content or multimodal content parts such as text and image_url when the selected model supports them.

Example:

[
  { "role": "system", "content": "You are a concise technical assistant." },
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "Explain what is shown in this image." },
      {
        "type": "image_url",
        "image_url": { "url": "https://example.com/image.png", "detail": "high" }
      }
    ]
  }
]
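When building multimodal messages in code, a small helper can keep the content-part structure consistent. The sketch below mirrors the example above; image_message is a hypothetical helper name, not part of FlowAPI or any SDK.

```python
def image_message(text, image_url, detail="high"):
    """Return a user message combining a text part and an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


messages = [
    # Plain string content works for text-only messages.
    {"role": "system", "content": "You are a concise technical assistant."},
    # List content carries mixed text and image parts.
    image_message("Explain what is shown in this image.", "https://example.com/image.png"),
]
```

The resulting list can be passed as messages, provided the selected model supports image input.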

stream

boolean

If true, tokens are returned as Server-Sent Events (SSE) as they become available.

Example:

false

max_tokens

integer

Maximum number of output tokens to generate.

Example:

1024

enable_thinking

boolean

Enables thinking mode for supported reasoning models.

This is a provider-specific passthrough field. It may be accepted by compatible upstream models and ignored by models that do not support thinking mode.

Example:

false

thinking_budget

integer default: 4096

Maximum number of tokens for chain-of-thought output. This field applies to supported reasoning models when thinking mode is enabled.

Required range: 128 <= x <= 32768

This is a provider-specific passthrough field. Support depends on the selected upstream model.

Example:

4096
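Because the OpenAI Python SDK has no named arguments for enable_thinking or thinking_budget, such fields are typically passed through its extra_body argument. The sketch below builds that dictionary and checks the documented range on the client side; thinking_options is a hypothetical helper, and omitting thinking_budget when thinking is disabled is a design choice of this example, not a documented requirement.

```python
MIN_BUDGET = 128
MAX_BUDGET = 32768


def thinking_options(enabled, budget=4096):
    """Build the provider-specific fields for a reasoning-capable model."""
    if not enabled:
        # The budget only applies when thinking mode is on, so leave it out.
        return {"enable_thinking": False}
    if not MIN_BUDGET <= budget <= MAX_BUDGET:
        raise ValueError(
            f"thinking_budget must be between {MIN_BUDGET} and {MAX_BUDGET}, got {budget}"
        )
    return {"enable_thinking": True, "thinking_budget": budget}


extra_body = thinking_options(True, budget=2048)
# Usage with the SDK: client.chat.completions.create(..., extra_body=extra_body)
```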

stop

string | string[]

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Example:

["</answer>"]
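A client-side check can catch an oversized stop list before the request is sent. This sketch assumes only the limit of 4 stated above; normalize_stop is a hypothetical helper, and how the server responds to more than 4 sequences is not specified here.

```python
MAX_STOP_SEQUENCES = 4


def normalize_stop(stop):
    """Accept a string or a list of strings and return a validated list."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > MAX_STOP_SEQUENCES:
        raise ValueError(f"at most {MAX_STOP_SEQUENCES} stop sequences are allowed")
    return sequences


single = normalize_stop("</answer>")
several = normalize_stop(["</answer>", "\n\n"])
```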

temperature

number

Controls sampling randomness. Lower values make the output more deterministic; higher values make it more varied.

Typical range: 0.0 to 2.0

Example:

0.7

top_p

number default: 0.7

Nucleus sampling threshold. At each decoding step, only the smallest set of tokens whose cumulative probability reaches top_p is considered.

Example:

0.7

frequency_penalty

number

Penalizes repeated tokens to reduce repetition in the output.

Example:

0.5

presence_penalty

number

Encourages the model to introduce new topics or less-repeated tokens.

Example:

0

response_format

object

An object specifying the format that the model must output.

For basic JSON mode, use { "type": "json_object" }.

Example:

{ "type": "json_object" }
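In JSON mode the message content is still delivered as a string, so it usually needs to be parsed on the client. The sketch below works on a hand-written sample response in the shape this page documents; parse_json_content and the sample payload are illustrative assumptions.

```python
import json


def parse_json_content(response):
    """Parse the first choice's content as JSON, with a clearer error on failure."""
    content = response["choices"][0]["message"]["content"]
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {content!r}") from exc


# Hand-written sample; a real response would come from the endpoint.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": '{"answer": "42", "confident": true}',
            },
            "finish_reason": "stop",
        }
    ]
}
parsed = parse_json_content(sample_response)
```

Parsing defensively is worthwhile because output cut off by max_tokens may not be valid JSON.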

n

integer

Number of completions to generate.

Example:

1

tools

object[]

A list of tools the model may call. Currently, function-style tools are the most portable option across OpenAI-compatible providers.

Example:

[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }
]
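When the model decides to call a tool, the client is expected to run it and return the result. The sketch below assumes the OpenAI-style response shape, in which the assistant message carries a tool_calls list with JSON-encoded arguments and results are sent back as role "tool" messages; this page does not document that shape, so treat it as an assumption. The get_weather implementation and the sample message are hypothetical.

```python
import json


def get_weather(city):
    # Hypothetical local implementation of the tool declared in the request.
    return f"Sunny in {city}"


TOOL_HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each requested tool and return the follow-up tool messages."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOL_HANDLERS[function["name"]]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": handler(**arguments),
            }
        )
    return results


# Hand-written sample assistant message in the assumed OpenAI-style shape.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}
tool_messages = run_tool_calls(assistant_message)
```

The returned messages would be appended to the conversation, after the assistant message, before the next request.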

Response Examples

Use the following examples to understand the most common response shapes returned by this endpoint.

{
  "id": "chatcmpl-flowapi-123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "deepseek-ai/DeepSeek-V3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Embeddings convert text into dense vectors, making semantic retrieval possible in RAG systems."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}
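For raw HTTP clients that receive this payload as a dictionary, the fields of interest can be pulled out as in the sketch below. The summarize helper is hypothetical, and the sample data is copied from the example response above.

```python
def summarize(response):
    """Extract the reply text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


sample_response = {
    "id": "chatcmpl-flowapi-123",
    "object": "chat.completion",
    "created": 1713000000,
    "model": "deepseek-ai/DeepSeek-V3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Embeddings convert text into dense vectors, "
                "making semantic retrieval possible in RAG systems.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42},
}
summary = summarize(sample_response)
```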

When stream=true, the endpoint returns SSE chunks with object: "chat.completion.chunk" and incremental delta payloads.

See Streaming for detailed SSE handling guidance.
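As a minimal sketch of consuming such a stream, the function below extracts content fragments from SSE lines. It assumes each event is a single "data:" line holding one JSON chunk and that the stream ends with the OpenAI-style "data: [DONE]" sentinel; neither detail is stated on this page. The sample lines are hand-written.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_stream = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Embeddings "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "help."}}]}',
    "",
    "data: [DONE]",
]
answer = "".join(iter_deltas(sample_stream))
```

With the OpenAI SDK and stream=True, the SDK handles this parsing itself and yields chunk objects directly.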

Notes

  • Model availability may change over time as models are added, adjusted, or retired.
  • Some advanced parameters are provider-specific and may be ignored by unsupported models.
  • Multimodal input is model-dependent. Use a model that explicitly supports vision or mixed content when sending image parts.
  • Request compatibility depends on the selected upstream provider and model. FlowAPI keeps the main request shape stable, while supported provider-specific fields may be forwarded when available.

If you want the most portable request format across providers, stick to model, messages, temperature, top_p, max_tokens, stream, and response_format. Use enable_thinking and thinking_budget only with models that explicitly support reasoning mode.
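One way to apply this advice in code is to separate the portable fields from everything else before building a request. The sketch below uses the field list from the paragraph above; split_portable is a hypothetical helper, and routing the remaining fields through the OpenAI SDK's extra_body argument is one possible convention.

```python
PORTABLE_FIELDS = {
    "model",
    "messages",
    "temperature",
    "top_p",
    "max_tokens",
    "stream",
    "response_format",
}


def split_portable(params):
    """Split request parameters into portable fields and provider-specific extras."""
    portable = {k: v for k, v in params.items() if k in PORTABLE_FIELDS}
    extras = {k: v for k, v in params.items() if k not in PORTABLE_FIELDS}
    return portable, extras


portable, extras = split_portable(
    {
        "model": "deepseek-ai/DeepSeek-V3.2",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "enable_thinking": True,
        "thinking_budget": 4096,
    }
)
# Usage with the SDK: client.chat.completions.create(**portable, extra_body=extras)
```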
