Chat Completions

POST /v1/chat/completions is the primary FlowAPI inference endpoint. It follows the familiar OpenAI-compatible chat format and is the recommended interface for most language and multimodal generation workflows.

Endpoint


POST https://api.flowapi.net/v1/chat/completions

Headers


Authorization: Bearer YOUR_FLOW_API_KEY
Content-Type: application/json

You can also provide X-Request-ID for tracing.
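For clients that do not use an SDK, the headers above can be assembled by hand. The following is a minimal sketch using only the Python standard library. The helper name `build_request` and the use of a UUID as the trace ID are assumptions made for illustration; the source does not specify a required format for X-Request-ID.

```python
import json
import uuid
import urllib.request

API_URL = "https://api.flowapi.net/v1/chat/completions"


def build_request(api_key, payload, request_id=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    # Assumption: any unique string is acceptable as a trace ID.
    request_id = request_id or str(uuid.uuid4())
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Request-ID": request_id,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


req = build_request(
    "YOUR_FLOW_API_KEY",
    {
        "model": "deepseek-ai/DeepSeek-V3.2",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    request_id="trace-001",
)
# Send with urllib.request.urlopen(req) once a real key is in place.
```

Logging the same request ID on the client side makes it easier to correlate a failed call with a support request.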

Request Body

The following Python example sends a request through the OpenAI SDK, pointed at the FlowAPI base URL.

from openai import OpenAI

# Point the OpenAI SDK at the FlowAPI base URL.
client = OpenAI(
    api_key="YOUR_FLOW_API_KEY",
    base_url="https://api.flowapi.net/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant.",
        },
        {
            "role": "user",
            "content": "Explain why embeddings are useful in RAG systems.",
        },
    ],
    temperature=0.7,
    top_p=0.7,
    max_tokens=1024,
    stream=False,
    # Provider-specific fields are not named arguments of the SDK method,
    # so they are sent through extra_body.
    extra_body={
        "enable_thinking": False,
        "thinking_budget": 4096,
    },
)

# Print the text of the first (and here only) completion.
print(response.choices[0].message.content)

Parameters

`model`

string required

The model ID to call.

Example:


"deepseek-ai/DeepSeek-V3.2"

`messages`

array required

A list of messages comprising the conversation so far in OpenAI-compatible format.

Each message may use plain string content or multimodal content parts such as text and image_url when the selected model supports them.

Example:


[
  {
    "role": "system",
    "content": "You are a concise technical assistant."
  },
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Explain what is shown in this image."
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://example.com/image.png",
          "detail": "high"
        }
      }
    ]
  }
]
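Because a message can carry either a plain string or a list of content parts, a small helper can keep call sites uniform. This is a sketch only; `user_message` is a hypothetical helper name, and the part layout simply mirrors the example above.

```python
def user_message(text, image_url=None, detail="high"):
    """Return a user message; content parts are used only when an image is attached."""
    if image_url is None:
        # Text-only messages can use plain string content.
        return {"role": "user", "content": text}
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    user_message("Explain what is shown in this image.", "https://example.com/image.png"),
]
```

Image parts should only be sent to a model that supports them.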

`stream`

boolean

If true, tokens are returned as Server-Sent Events (SSE) as they become available.

Example:


false

`max_tokens`

integer

Maximum number of output tokens to generate.

Example:

1024

`enable_thinking`

boolean

Enables thinking mode for supported reasoning models.

This is a provider-specific passthrough field. It may be accepted by compatible upstream models and ignored by models that do not support thinking mode.

Example:


false

`thinking_budget`

integer default: 4096

Maximum number of tokens for chain-of-thought output. This field applies to supported reasoning models when thinking mode is enabled.

Required range: 128 <= x <= 32768

This is a provider-specific passthrough field. Support depends on the selected upstream model.

Example:

4096
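The documented range can be checked on the client before a request is sent. The sketch below assumes a hypothetical helper, `thinking_fields`, that returns the two provider-specific fields as a dictionary. Whether the server rejects or clamps an out-of-range value is not stated in this reference, so the client-side error is an assumption.

```python
THINKING_BUDGET_MIN = 128
THINKING_BUDGET_MAX = 32768


def thinking_fields(enabled, budget=4096):
    """Return the provider-specific thinking fields, validating the documented range."""
    if not enabled:
        # thinking_budget only applies when thinking mode is enabled.
        return {"enable_thinking": False}
    if not THINKING_BUDGET_MIN <= budget <= THINKING_BUDGET_MAX:
        raise ValueError(
            f"thinking_budget must be between {THINKING_BUDGET_MIN} "
            f"and {THINKING_BUDGET_MAX}, got {budget}"
        )
    return {"enable_thinking": True, "thinking_budget": budget}


fields = thinking_fields(True, budget=2048)
```

With the OpenAI Python SDK, the resulting dictionary can be passed as `extra_body=fields`.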

`stop`

string | string[]

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Example:


["</answer>"]
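Since `stop` accepts either a single string or a list, it can help to normalize the value and enforce the four-sequence limit before sending. A minimal sketch, with `normalize_stop` as a hypothetical helper name:

```python
def normalize_stop(stop):
    """Accept a string or a list of strings and enforce the 4-sequence limit."""
    if stop is None:
        return None
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    if not all(isinstance(s, str) for s in sequences):
        raise TypeError("each stop sequence must be a string")
    return sequences


stop = normalize_stop("</answer>")
```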

`temperature`

number

Controls the randomness of the output. Lower values make responses more deterministic; higher values make them more varied.

Typical range: 0.0 to 2.0

Example:

0.7
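To build intuition for the parameter, the sketch below applies the standard softmax temperature scaling to a toy set of logits. It illustrates the general technique only; how each upstream model applies temperature internally is not specified here, and the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter, more varied sampling
```

At the low temperature nearly all probability mass sits on the highest-scoring token, while at the high temperature the three candidates are much closer together.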

`top_p`

number default: 0.7

Nucleus sampling threshold. The model samples only from the smallest set of most-likely tokens whose cumulative probability reaches top_p, so lower values narrow the set of candidate tokens.

Example:

0.7
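Nucleus sampling can be illustrated with a toy distribution. The sketch below shows the general technique, not FlowAPI's or any upstream model's actual decoder; the token probabilities are invented for the example.

```python
def nucleus(probs, top_p):
    """Return the smallest set of most-likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}  # made-up distribution
candidates = nucleus(probs, 0.7)
```

With `top_p=0.7`, only "the" and "a" remain as candidates, because together they already cover 0.8 of the probability mass.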

`frequency_penalty`

number

Penalizes tokens in proportion to how often they have already appeared, which reduces repetition in the output.

Example:

0.5

`presence_penalty`

number

Encourages the model to introduce new topics or less-repeated tokens.
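The difference between `frequency_penalty` and `presence_penalty` is easiest to see on toy numbers. The sketch below follows the adjustment commonly described for OpenAI-compatible APIs, where the frequency penalty scales with the token count and the presence penalty applies once per seen token. Whether a given upstream model uses exactly this formula is an assumption, and the logits are invented.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1.0 if count > 0 else 0.0
        # Frequency penalty grows with each repeat; presence penalty is flat.
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted


logits = {"cat": 2.0, "dog": 2.0, "bird": 2.0}  # made-up scores
adjusted = apply_penalties(
    logits, ["cat", "cat", "dog"], frequency_penalty=0.5, presence_penalty=0.25
)
```

Here "cat" (seen twice) is penalized more than "dog" (seen once), and "bird" (unseen) is left unchanged.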

`response_format`

object

An object specifying the format that the model must output.

For basic JSON mode, use { "type": "json_object" }.

Example:


{
  "type": "json_object"
}
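In JSON mode the message content is still delivered as a string, so the client has to parse it. The sketch below is defensive on the assumption that output may occasionally be invalid, for example when generation is cut off by `max_tokens`. `parse_json_content` is a hypothetical helper name.

```python
import json


def parse_json_content(content):
    """Parse message content produced in JSON mode; return None if it is not valid JSON."""
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a missing (None) content field.
        return None


parsed = parse_json_content('{"city": "Paris", "unit": "celsius"}')
```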

`n`

integer

Number of completions to generate.

`tools`

object[]

A list of tools the model may call. Currently, function-style tools are the most portable option across OpenAI-compatible providers.

Example:


[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {
            "type": "string"
          }
        },
        "required": ["city"]
      }
    }
  }
]
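When the model decides to call a tool, the client is responsible for running it and returning the result. The sketch below assumes the OpenAI-compatible shape for a tool call (an `id` plus a `function` object with a `name` and JSON-encoded `arguments`); this reference does not show that shape, so treat it as an assumption. The local `get_weather` implementation is a stub.

```python
import json


def get_weather(city):
    """Hypothetical local implementation of the get_weather tool (stubbed)."""
    return {"city": city, "forecast": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call and build the follow-up message carrying its result."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY[function["name"]]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = handler(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Assumed OpenAI-compatible tool call, as it might appear in message.tool_calls.
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
tool_message = run_tool_call(example_call)
```

The resulting message would be appended to `messages` and sent in a follow-up request so the model can produce its final answer.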

Response Examples

The following example shows the shape of a non-streaming response from this endpoint.

{
  "id": "chatcmpl-flowapi-123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "deepseek-ai/DeepSeek-V3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Embeddings convert text into dense vectors, making semantic retrieval possible in RAG systems."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}
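For clients that work with the raw JSON rather than an SDK object, the fields of interest can be pulled out as follows. This is a sketch against the example response above (with the content shortened); `summarize` is a hypothetical helper name.

```python
def summarize(response):
    """Extract the first completion's text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


# The example response from above, with the content shortened.
response = {
    "id": "chatcmpl-flowapi-123",
    "object": "chat.completion",
    "created": 1713000000,
    "model": "deepseek-ai/DeepSeek-V3.2",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Embeddings convert text into dense vectors."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42},
}
summary = summarize(response)
```

In this example `total_tokens` equals `prompt_tokens` plus `completion_tokens`.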

When stream=true, the endpoint returns SSE chunks with object: "chat.completion.chunk" and incremental delta payloads.

See Streaming for detailed SSE handling guidance.
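As a rough illustration of how the incremental delta payloads fit together, the sketch below concatenates the content of a few hand-written SSE lines. The `data:` prefix is standard SSE framing; the `[DONE]` end-of-stream sentinel follows the OpenAI convention and is an assumption here, as are the sample chunks themselves.

```python
import json


def collect_stream(lines):
    """Concatenate delta content from an iterable of SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumption: OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written sample chunks, not captured from a real response.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Embeddings "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "help retrieval."}}]}',
    "",
    "data: [DONE]",
]
text = collect_stream(sample)
```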

Notes

Model availability may change over time as models are added, adjusted, or retired.
Some advanced parameters are provider-specific and may be ignored by unsupported models.
Multimodal input is model-dependent. Use a model that explicitly supports vision or mixed content when sending image parts.
Request compatibility depends on the selected upstream provider and model. FlowAPI keeps the main request shape stable, while supported provider-specific fields may be forwarded when available.

If you want the most portable request format across providers, stick to model, messages, temperature, top_p, max_tokens, stream, and response_format. Use enable_thinking and thinking_budget only with models that explicitly support reasoning mode.
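One way to apply this advice is to separate the portable fields from everything else before building a request. A minimal sketch, with `split_portable` as a hypothetical helper name:

```python
PORTABLE_FIELDS = {
    "model",
    "messages",
    "temperature",
    "top_p",
    "max_tokens",
    "stream",
    "response_format",
}


def split_portable(payload):
    """Split a request payload into portable fields and everything else."""
    portable = {k: v for k, v in payload.items() if k in PORTABLE_FIELDS}
    extras = {k: v for k, v in payload.items() if k not in PORTABLE_FIELDS}
    return portable, extras


portable, extras = split_portable(
    {
        "model": "deepseek-ai/DeepSeek-V3.2",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "enable_thinking": True,
        "thinking_budget": 2048,
    }
)
```

With the OpenAI Python SDK, `portable` can be unpacked as keyword arguments, and provider-specific entries in `extras`, such as enable_thinking and thinking_budget, can be passed via `extra_body` when the selected model supports them.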