Flex processing
Beta
=======================
Optimize costs with flex processing.
Flex processing provides significantly lower costs for Chat Completions or Responses requests in exchange for slower response times and occasional resource unavailability. It is ideal for non-production or lower-priority tasks such as model evaluations, data enrichment, or asynchronous workloads.
Token inputs and outputs are priced at Batch API rates, with additional discounts from prompt caching.
Flex processing is in beta, and currently only available for o3 and o4-mini models.
API usage
Set the service_tier
parameter to flex
in your API request (Chat or Responses) to take advantage of Flex processing.
Flex processing example
import OpenAI from "openai";
const client = new OpenAI({
timeout: 15 * 1000 * 60, // Increase default timeout to 15 minutes
});
const response = await client.responses.create({
model: "o3",
instructions: "List and describe all the metaphors used in this book.",
input: "<very long text of book here>",
service_tier: "flex",
timeout: 15 * 1000 * 60, // Can override timeout per request
});
console.log(response.output_text);
from openai import OpenAI
client = OpenAI(
# increase default timeout to 15 minutes (from 10 minutes)
timeout=900.0
)
# you can override the max timeout per request as well
response = client.with_options(timeout=900.0).responses.create(
model="o3",
instructions="List and describe all the metaphors used in this book.",
input="<very long text of book here>",
service_tier="flex",
)
print(response.output_text)
curl https://api.openai.com/v1/responses \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "o3",
"instructions": "List and describe all the metaphors used in this book.",
"input": "<very long text of book here>",
"service_tier": "flex"
}'
API request timeouts
Due to slower processing speeds with Flex processing, request timeouts are more likely. Here are some considerations for handling timeouts:
- Default timeout: The default timeout is 10 minutes when making API requests with an official OpenAI SDK. You may need to increase this timeout for lengthy prompts or complex tasks.
- Configuring timeouts: Each SDK will provide a parameter to increase this timeout. In the Python and JavaScript SDKs, this is
timeout
as shown in the code samples above. - Automatic retries: The OpenAI SDKs automatically retry requests that result in a
408 Request Timeout
error code twice before throwing an exception.
Resource unavailable errors
Flex processing may sometimes lack sufficient resources to handle your requests, resulting in a 429 Resource Unavailable
error code. You will not be charged when this occurs.
When encountering Resource Unavailable errors, consider these strategies:
Retry requests with exponential backoff: This approach is suitable for workloads that can tolerate delays and aims to minimize costs. For implementation details, see this cookbook.
Fallback to standard request: Switching to the default tier is recommended if timely completion is important and occasional higher costs are acceptable. Set
service_tier
toauto
in your request to do this, or remove theservice_tier
parameter to use the default tier.