Chat Completions
POST /v1/chat/completions is the primary FlowAPI inference endpoint. It follows the familiar OpenAI-compatible chat format and is the recommended interface for most language and multimodal generation workflows.
Endpoint
POST https://api.flowapi.net/v1/chat/completions
Headers
Authorization: Bearer YOUR_FLOW_API_KEY
Content-Type: application/json
You can also provide X-Request-ID for tracing.
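As a rough illustration, the headers above could be assembled in Python with only the standard library. The helper names build_headers and build_request are hypothetical, and using a random UUID as the X-Request-ID value is an assumption; the text above only says that the header can be supplied for tracing.

```python
import json
import urllib.request
import uuid

ENDPOINT = "https://api.flowapi.net/v1/chat/completions"


def build_headers(api_key, request_id=None):
    """Return the request headers; a tracing ID is generated if none is given."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Assumption: any unique string is acceptable as a tracing ID.
        "X-Request-ID": request_id or str(uuid.uuid4()),
    }


def build_request(api_key, payload, request_id=None):
    """Prepare (but do not send) a POST request for the endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key, request_id),
        method="POST",
    )
```

The prepared request can then be sent with urllib.request.urlopen(build_request(...)). Logging the same request ID on the client side makes it easier to correlate a call with any server-side trace.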
Request Body
The following Python example sends a request through the OpenAI SDK.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_FLOWAPI_API_KEY",
    base_url="https://api.flowapi.net/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant.",
        },
        {
            "role": "user",
            "content": "Explain why embeddings are useful in RAG systems.",
        },
    ],
    temperature=0.7,
    top_p=0.7,
    max_tokens=1024,
    stream=False,
    # enable_thinking and thinking_budget are not arguments of the OpenAI
    # SDK's create() method, so these provider-specific fields go in extra_body.
    extra_body={
        "enable_thinking": False,
        "thinking_budget": 4096,
    },
)

print(response.choices[0].message.content)
Parameters
model
string required
The model ID to call.
Example:
"deepseek-ai/DeepSeek-V3.2"
messages
array required
A list of messages comprising the conversation so far in OpenAI-compatible format.
Each message may use plain string content or multimodal content parts such as text and image_url when the selected model supports them.
Example:
[
  {
    "role": "system",
    "content": "You are a concise technical assistant."
  },
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Explain what is shown in this image."
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://example.com/image.png",
          "detail": "high"
        }
      }
    ]
  }
]
stream
boolean
If true, tokens are returned as Server-Sent Events (SSE) as they become available.
Example:
false
max_tokens
integer
Maximum number of output tokens to generate.
Example:
1024
enable_thinking
boolean
Enables thinking mode for supported reasoning models.
This is a provider-specific passthrough field. It may be accepted by compatible upstream models and ignored by models that do not support thinking mode.
Example:
false
thinking_budget
integer default: 4096
Maximum number of tokens for chain-of-thought output. This field applies to supported reasoning models when thinking mode is enabled.
Required range: 128 <= x <= 32768
This is a provider-specific passthrough field. Support depends on the selected upstream model.
Example:
4096
stop
string | string[]
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Example:
["</answer>"]
temperature
number
Controls the randomness of sampling. Lower values make the output more deterministic; higher values make it more varied.
Typical range: 0.0 to 2.0
Example:
0.7
top_p
number default: 0.7
Nucleus sampling threshold. At each decoding step, only the smallest set of tokens whose cumulative probability reaches top_p is considered.
Example:
0.7
frequency_penalty
number
Penalizes repeated tokens to reduce repetition in the output.
Example:
0.5
presence_penalty
number
Encourages the model to introduce new topics or less-repeated tokens.
Example:
0
response_format
object
An object specifying the format that the model must output.
For basic JSON mode, use { "type": "json_object" }.
Example:
{
  "type": "json_object"
}
n
integer
Number of completions to generate.
Example:
1
tools
object[]
A list of tools the model may call. Currently, function-style tools are the most portable option across OpenAI-compatible providers.
Example:
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {
            "type": "string"
          }
        },
        "required": ["city"]
      }
    }
  }
]
Response Examples
The following example shows the response shape returned for a non-streaming request.
{
  "id": "chatcmpl-flowapi-123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "deepseek-ai/DeepSeek-V3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Embeddings convert text into dense vectors, making semantic retrieval possible in RAG systems."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}
When stream=true, the endpoint returns SSE chunks with object: "chat.completion.chunk" and incremental delta payloads.
See Streaming for detailed SSE handling guidance.
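To make the chunk format more concrete, here is a minimal sketch of reassembling the streamed text from raw SSE lines. It assumes the OpenAI-style wire format, in which each event is a line starting with "data:" followed by a JSON chunk and the stream ends with a "data: [DONE]" sentinel; this page does not confirm those details. The function name collect_stream_text is hypothetical.

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate delta content for choice 0 from an iterable of SSE lines."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        if chunk.get("object") != "chat.completion.chunk":
            continue
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first completion
            delta = choice.get("delta", {})
            # The first chunk often carries only the role, with no content.
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

In practice the lines would come from the HTTP response body, read incrementally; printing each delta as it arrives instead of joining at the end gives the usual typing effect.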
Notes
- Model availability may change over time as models are added, adjusted, or retired.
- Some advanced parameters are provider-specific and may be ignored by unsupported models.
- Multimodal input is model-dependent. Use a model that explicitly supports vision or mixed content when sending image parts.
- Request compatibility depends on the selected upstream provider and model. FlowAPI keeps the main request shape stable, while supported provider-specific fields may be forwarded when available.
If you want the most portable request format across providers, stick to model, messages, temperature, top_p, max_tokens, stream, and response_format. Use enable_thinking and thinking_budget only with models that explicitly support reasoning mode.
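The portability advice above can be sketched as a small filter applied before sending a request. The helper name portable_payload and the supports_thinking flag are hypothetical; whether a given model supports reasoning mode has to come from the model's own documentation. The range check mirrors the 128 to 32768 limit stated for thinking_budget.

```python
# Fields listed above as the most portable across providers.
PORTABLE_FIELDS = {
    "model", "messages", "temperature", "top_p",
    "max_tokens", "stream", "response_format",
}
# Provider-specific passthrough fields for reasoning models.
THINKING_FIELDS = {"enable_thinking", "thinking_budget"}


def portable_payload(params, supports_thinking=False):
    """Keep only portable fields, plus thinking fields when supported."""
    allowed = set(PORTABLE_FIELDS)
    if supports_thinking:
        allowed |= THINKING_FIELDS
    payload = {k: v for k, v in params.items() if k in allowed}

    budget = payload.get("thinking_budget")
    if budget is not None and not 128 <= budget <= 32768:
        raise ValueError("thinking_budget must be between 128 and 32768")
    return payload
```

For example, calling portable_payload on a parameter dict that includes n or stop drops those keys, and enable_thinking and thinking_budget survive only when supports_thinking is True.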