Table of Contents
- GPT-5 New Features Overview — The Big Picture
- What’s the Difference Between Verbosity and Freeform Function Calling
- GPT-5 reasoning_effort Settings: Which Value to Use
- What Changed in the GPT-5.1 Tool System
- When to Use the GPT-5.2 Compact Endpoint
- Migrating from GPT-4o to GPT-5: Should It Happen Now
- GPT-5 Model Selection and Cost Calculation
- GPT-5 New Features Summary and Next Steps
OpenAI’s GPT-5 input token pricing is roughly half that of GPT-4o. The model lineup splits into three variants — gpt-5, gpt-5-mini, and gpt-5-nano — with GPT-5.1 and GPT-5.2 following in quick succession. For anyone who hasn’t done a GPT-5 new features roundup yet, now is the right time. Verbosity parameters, Freeform Function Calling, Context-Free Grammar, and reasoning_effort tuning all landed at once, and GPT-4o has already been retired.
Here’s a Q&A-style breakdown of what changed at the API level.
GPT-5 New Features Overview — The Big Picture
“GPT-5 is out — but how many models were actually released?” is the most common question.
GPT-5 isn’t a single model; it’s a series. The model variants available via API are as follows.
| Model ID | Characteristics | Best For |
|---|---|---|
| gpt-5 | General-purpose, highest performance | Complex reasoning, long document analysis |
| gpt-5-mini | Low-cost, low-latency | Real-time chat, classification tasks |
| gpt-5-nano | Lightweight | Embedding preprocessing, simple transformations |
Four core features were introduced with GPT-5.0.
- Verbosity parameter: Controls response length with low/medium/high settings
- Freeform Function Calling: Passes raw text (Python, SQL, etc.) directly to custom tools without JSON wrapping
- Context-Free Grammar: Structurally constrains output syntax using Lark or Regex patterns
- Minimal Reasoning: Minimizes reasoning tokens with reasoning_effort="minimal"
GPT-5.1 (2025-11) and GPT-5.2 followed, expanding reasoning_effort options and adding agentic workflow features. The Responses API is the recommended interface; Chat Completions API still works, but some new parameters are Responses API–exclusive.
GPT-5.0 → GPT-5.1 (2025-11) → GPT-5.2 — each version focused on reasoning_effort expansion and tool improvements. GPT-5.3, 5.4, and 5.5 are being rolled out for the ChatGPT interface only; official API parameter changes for those versions haven’t been documented.
What’s the Difference Between Verbosity and Freeform Function Calling

“Both Verbosity and Function Calling shipped together — aren’t they both about output control?” comes up often. The purposes are different.
Verbosity — Controlling Response Length
Verbosity controls the volume of a model’s response. Setting it to low produces concise, to-the-point answers; high generates longer responses with background explanations and examples. The code is straightforward:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-mini",
    input="Write a poem about a boy and his dog",
    text={"verbosity": "high"},  # low | medium | high
)
```
Use cases are clear-cut. A customer-facing chatbot benefits from low for brief replies, while technical documentation drafting calls for high to produce detailed output. The default is medium, so it works without any explicit configuration.
Freeform Function Calling — Removing JSON Wrapping
Traditional Function Calling required wrapping all model output in a JSON schema. Generating a SQL query meant wrapping it as {"query": "SELECT * FROM users"}, and Python code had to be string-escaped — an awkward constraint.
Freeform Function Calling removes this limitation. Raw text in Python, SQL, Bash, and other languages can be passed directly to custom tools. The key advantage: JSON parsing errors are structurally eliminated when building code-generation agents.
Ideal for data analysis agents that generate and execute SQL directly, or CI/CD pipeline agents that run shell scripts without intermediary processing. Removing JSON wrapping also simplifies post-processing code.
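As a sketch of what this looks like in practice, the snippet below defines a freeform custom tool and a handler for its raw-text call. The "custom" type string and field names follow OpenAI's published custom-tool shape but should be verified against the current API reference; run_sql is an invented name for illustration.

```python
# Hypothetical freeform custom tool: "run_sql" is an invented name, and the
# "custom" type string should be checked against the current API reference.
sql_tool = {
    "type": "custom",
    "name": "run_sql",
    "description": "Executes a read-only SQL query and returns rows as text.",
}

# With freeform calling, the tool-call payload IS the raw query text:
raw_call = "SELECT id, email FROM users WHERE created_at > '2025-01-01'"

def handle_sql_call(raw_sql: str) -> str:
    # A real agent would execute this against a database; the point here is
    # that no json.loads() or unescaping step sits between model and executor.
    return raw_sql.strip()
```

The handler receives exactly what the model emitted, which is why JSON parsing errors disappear as a failure class.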
Context-Free Grammar — Enforcing Output Syntax
Context-Free Grammar (CFG) enforces output syntax through Lark grammars or Regex patterns. It enables more flexible output control than JSON Schema. For example, constraints like “output must be valid YAML” or “follow this specific DSL syntax” can be applied at the grammar level.
If Verbosity controls “how long,” CFG controls “in what format.” If Freeform Function Calling changes “where output goes,” CFG changes “the grammar of the output itself.” These three features operate at different layers and can be combined.
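A hedged sketch of a grammar-constrained tool follows: a small Lark grammar forcing "key: value" output lines, attached via a format field. The format/grammar field names are taken from OpenAI's CFG documentation but should be verified, and emit_config is an invented tool name.

```python
# Hypothetical CFG-constrained custom tool. The Lark grammar below only
# admits "key: value" lines; field names in "format" are assumptions to
# verify against the docs, and "emit_config" is invented for illustration.
CONFIG_GRAMMAR = r"""
start: line+
line: KEY ": " VALUE NEWLINE
KEY: /[a-z_]+/
VALUE: /[^\n]+/
NEWLINE: "\n"
"""

config_tool = {
    "type": "custom",
    "name": "emit_config",
    "description": "Emits configuration as key: value lines.",
    "format": {
        "type": "grammar",
        "syntax": "lark",          # or "regex" for pattern constraints
        "definition": CONFIG_GRAMMAR,
    },
}
```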
The GPT-5 new parameters and tools notebook contains runnable examples for Verbosity, Freeform Function Calling, and CFG.
GPT-5 reasoning_effort Settings: Which Value to Use
“How much faster is reasoning_effort at minimal? Is the quality acceptable?” is the second most frequent question.
GPT-5.0 reasoning_effort Options
GPT-5.0 supports reasoning_effort at four levels: minimal, low, medium, and high. Setting minimal reduces reasoning tokens, speeding up responses and lowering costs. However, quality degradation can occur with complex math problems or multi-step logical reasoning.
Recommended values by scenario:
| Scenario | Recommended reasoning_effort | Reason |
|---|---|---|
| Customer inquiry classification | minimal | Simple classification, no reasoning needed |
| Code review summary | low | Lightweight analysis |
| Bug root cause analysis | high | Multi-step reasoning required |
| Architecture design advice | high | Trade-off comparison needed |
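The table above can be turned into a small request builder. The reasoning={"effort": ...} shape matches the documented Responses API parameter; the scenario keys are illustrative.

```python
# Map the scenarios from the table above to reasoning_effort values, then
# build Responses API kwargs. Scenario keys are illustrative names.
EFFORT_BY_SCENARIO = {
    "inquiry_classification": "minimal",
    "code_review_summary": "low",
    "bug_root_cause": "high",
    "architecture_advice": "high",
}

def build_request(scenario: str, prompt: str) -> dict:
    effort = EFFORT_BY_SCENARIO.get(scenario, "medium")  # safe default
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},
    }
```

The resulting dict can be passed straight through as client.responses.create(**build_request(...)).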
The ‘none’ Mode Added in GPT-5.1
GPT-5.1 (2025-11) added a none mode to reasoning_effort. This mode uses zero reasoning tokens, enabling low-latency responses. An automatic token-consumption adjustment mechanism based on prompt difficulty was also introduced.
none sits one level below minimal. For simple text transformations, formatting, and classification where reasoning tokens aren’t needed, none delivers the best cost efficiency. On the other hand, using none for code generation or logical reasoning tasks causes a sharp drop in output quality.
The ‘xhigh’ Level Added in GPT-5.2
GPT-5.2 expanded in the opposite direction as well. The xhigh level enables deeper reasoning than high. Token efficiency improved over GPT-5.1, unnecessary verbosity decreased, and instruction adherence along with structured reasoning were strengthened.
reasoning_effort spectrum (as of GPT-5.2):
none → minimal → low → medium → high → xhigh
 ↑                                       ↑
 lowest latency and cost        highest reasoning quality
 (0 reasoning tokens)           (maximum token consumption)
xhigh consumes the most reasoning tokens. Applying xhigh to every request in production can cause costs to spike. A realistic strategy is to handle most requests at medium or below, reserving high or xhigh only for cases that clearly require complex reasoning.
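That strategy can be sketched as a simple gate: default to medium and escalate only on an explicit complexity flag. The flags themselves (how you detect triviality or complexity) are left as an assumption here.

```python
# Gate expensive reasoning levels behind explicit flags: default to medium,
# drop to none for trivial work, reserve xhigh for clearly complex requests.
def choose_effort(is_trivial: bool, needs_deep_reasoning: bool) -> str:
    if is_trivial:
        return "none"    # zero reasoning tokens (GPT-5.1+)
    if needs_deep_reasoning:
        return "xhigh"   # maximum reasoning depth (GPT-5.2)
    return "medium"      # sensible production default
```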
What Changed in the GPT-5.1 Tool System
“apply_patch changed in GPT-5.1 — do existing integrations need updating?” is another frequent question.
Changes to the apply_patch Tool
In GPT-5.1, the apply_patch tool switched from JSON-based to a named function call approach. The key result: patch failure rates dropped by 35%. Escape errors and indentation corruption that occurred when passing patch content as JSON were structurally reduced.
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.1",
    input=RESPONSE_INPUT,  # previously prepared conversation input
    tools=[{"type": "apply_patch"}],
)
```
Specifying "apply_patch" as the tool type causes the model to return code modifications in named function call format. Compared with function calling in the Chat Completions API, tool definitions in the Responses API are noticeably more concise.
New shell Tool
GPT-5.1 also introduced a shell tool. Built-in timeout and output length limits prevent issues like infinite loops or excessive output when an agent executes shell commands.
The implication of combining these two tools is clear. Starting with GPT-5.1, agentic workflows where the model directly performs code modifications (apply_patch) and command execution (shell) are supported at the API level.
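A hedged sketch of such a combined request is shown below. The "apply_patch" type string appears in the example above; "shell" as a type string is an assumption to confirm against the current tool list.

```python
# Hypothetical agentic request enabling both GPT-5.1 tools at once.
# "shell" as a type string is an assumption; verify against the docs.
agent_request = {
    "model": "gpt-5.1",
    "input": "Fix the failing test in tests/test_auth.py and re-run it.",
    "tools": [
        {"type": "apply_patch"},  # model proposes code modifications
        {"type": "shell"},        # model runs commands (timeout/output-capped)
    ],
}
```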
Improved Steerability for Persona, Tone, and Output Format
GPT-5.1 also improved steerability. The model was tuned to follow persona, tone, and output format instructions from system prompts more accurately. Cases where previous versions occasionally ignored system prompt instructions have become less frequent with GPT-5.1.
The GPT-5.1 prompting guide covers specific usage patterns for the apply_patch and shell tools.
apply_patch transitioned from JSON to named function calls (35% reduction in patch failures). The shell tool was newly added (with built-in timeout and output limits). Both tools are Responses API–exclusive.
When to Use the GPT-5.2 Compact Endpoint
“Agent sessions overflow the context window during long runs — did GPT-5.2 actually solve this?” Here’s the answer.
The /responses/compact Endpoint
GPT-5.2 introduced the /responses/compact endpoint. It performs loss-aware compaction of accumulated context in long-running agentic workflows. Rather than simply truncating old messages, it evaluates importance to preserve key information while reducing context length.
Context windows filling up after dozens of tool calls is a common production scenario. Previously, developers had to implement custom summarization logic or manually truncate older messages. /responses/compact handles this at the API level.
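The trigger logic is the part worth sketching; the endpoint's request body isn't documented here, so the payload below is a labeled placeholder.

```python
# When to trigger compaction: compact once context usage crosses a threshold.
# The /responses/compact request body is NOT documented here; the payload
# fields below are placeholders, not the real schema.
COMPACT_THRESHOLD = 0.8  # compact at ~80% of the context window

def should_compact(tokens_used: int, context_window: int) -> bool:
    return tokens_used / context_window >= COMPACT_THRESHOLD

def compact_payload(previous_response_id: str) -> dict:
    # Placeholder body: verify field names against the official docs.
    return {"previous_response_id": previous_response_id}
```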
GPT-5.2 Token Efficiency Improvements
GPT-5.2 improved token efficiency over GPT-5.1. Unnecessary verbosity (a separate concept from the Verbosity parameter) decreased, producing equivalent-quality responses with fewer tokens. Instruction adherence and structured reasoning were also strengthened, allowing the model to follow complex system prompts more accurately.
From a startup perspective, /responses/compact matters because it directly affects costs. Efficient context window management during long-running agent sessions significantly reduces input token expenses. The impact is especially noticeable for customer support agents or code review agents that maintain extended sessions.
The GPT-5.2 prompting guide details how the compact endpoint works and recommended patterns.
Agent context management flow:
[tool call 1] → [tool call 2] → ... → [tool call N]
        ↓
context window saturation
        ↓
call /responses/compact
        ↓
loss-aware compaction (key information preserved)
        ↓
[tool call N+1] continues
Migrating from GPT-4o to GPT-5: Should It Happen Now
The short answer leans toward "it's already overdue."
As of 2026-02-13, GPT-4o, GPT-4.1, and GPT-5 (Instant/Thinking) were retired from ChatGPT. Existing conversations were automatically migrated to GPT-5.3 Instant, GPT-5.4 Thinking, and GPT-5.4 Pro. The latest GPT-5.5 is being rolled out to Plus, Pro, Business, and Enterprise users.
If API calls still reference GPT-4o or GPT-4.1 model IDs, a migration plan is needed. The ChatGPT interface has already completed automatic transitions, but API users must update model IDs manually. The exact API end-of-support timeline hasn’t been specified in official documentation.
Migration Checklist
Here are the items to verify when transitioning from GPT-4o to GPT-5.
Step 1 — Update Model ID: Replace gpt-4o with gpt-5, gpt-5-mini, or gpt-5-nano based on cost and performance requirements.
Step 2 — Switch API Endpoint: To use new features (Verbosity, Freeform Function Calling, CFG, etc.), switching to the Responses API is recommended. Basic functionality works on Chat Completions API, but some new parameters are Responses API–exclusive.
Step 3 — Configure reasoning_effort: This parameter didn’t exist in GPT-4o, so appropriate values need to be set per request type. A default applies when unspecified, but explicit configuration is better for cost optimization.
Step 4 — Review Function Calling: Existing JSON-based Function Calling continues to work. However, adopting Freeform Function Calling removes JSON wrapping and simplifies code.
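The four steps condense into a before/after comparison. Parameter names follow the documented Responses API; the prompt is illustrative.

```python
# Before: a GPT-4o request against the Chat Completions API.
legacy_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# After: the same task on the Responses API with GPT-5.
migrated_request = {
    "model": "gpt-5-mini",                # Step 1: new model ID
    "input": "Summarize this ticket.",    # Step 2: Responses-style input
    "reasoning": {"effort": "minimal"},   # Step 3: explicit effort setting
    "text": {"verbosity": "low"},         # optional: keep replies concise
}
```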
GPT-5 Series Version Comparison
GPT-5.0 → Verbosity, Freeform FC, CFG, reasoning_effort
GPT-5.1 → reasoning_effort 'none', apply_patch improvements, shell tool
GPT-5.2 → reasoning_effort 'xhigh', /responses/compact
GPT-5.3+ → ChatGPT-only
Official API parameter changes for GPT-5.3, 5.4, and 5.5 haven’t been documented. What’s available in the ChatGPT interface and what’s available via API are separate concerns, so API developers should build against the GPT-5.2 spec as the safe baseline.
GPT-5 Model Selection and Cost Calculation
“Which one — gpt-5, gpt-5-mini, or gpt-5-nano? How big is the price difference?” is another common question.
The first thing that stands out is GPT-5 input token pricing at roughly half of GPT-4o. Performance went up while costs came down. That said, exact per-token pricing for the GPT-5 series hasn’t been verified from official sources.
Model selection comes down to task complexity, split into three tiers.
gpt-5 (general-purpose): Best for complex reasoning, long document processing, and multi-step agents. Delivers the highest performance but also carries the highest cost among the three.
gpt-5-mini (low-cost, low-latency): Suited for real-time chat, text classification, and simple code generation. For most production workloads, gpt-5-mini often offers the best performance-to-cost ratio.
gpt-5-nano (lightweight): Designed for embedding preprocessing, simple text transformations, and format conversions. The right choice when speed and cost take priority over performance.
A common pattern at startups is request routing. A lightweight classifier at the front determines request complexity, sending simple requests to gpt-5-nano, mid-complexity to gpt-5-mini, and only complex requests to gpt-5. Combining model selection with reasoning_effort settings enables finer-grained cost control.
| Combination | Reasoning Quality | Cost | Latency |
|---|---|---|---|
| gpt-5 + xhigh | Highest | Highest | High |
| gpt-5 + medium | High | Medium | Medium |
| gpt-5-mini + low | Medium | Low | Low |
| gpt-5-nano + none | Lowest | Lowest | Lowest |
The OpenAI GPT-5 model page lists context window sizes and supported features for each variant.
Pre-classifying request complexity and routing through gpt-5-nano → gpt-5-mini → gpt-5 can significantly reduce overall API costs. Setting reasoning_effort differently per request is also recommended.
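The routing pattern above can be sketched as a single function. The complexity score (0.0 to 1.0 from a front-end classifier) and the cutoffs are assumptions for illustration.

```python
# Route requests by a classifier-produced complexity score (0.0-1.0 assumed).
def route(complexity: float) -> tuple[str, str]:
    """Return (model, reasoning_effort) for a given complexity score."""
    if complexity < 0.3:
        return ("gpt-5-nano", "none")   # trivial transformations
    if complexity < 0.7:
        return ("gpt-5-mini", "low")    # everyday production traffic
    return ("gpt-5", "high")            # genuinely complex reasoning
```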
GPT-5 New Features Summary and Next Steps
The GPT-5 series isn’t just a performance upgrade — it fundamentally changes how the API is used. Controlling reasoning depth with reasoning_effort, eliminating JSON wrapping with Freeform Function Calling, and managing long-running agent context with the Compact endpoint define the design direction of the GPT-5 series.
GPT-5 input token pricing is half that of GPT-4o, while capabilities expanded significantly. GPT-4o and GPT-4.1 are already retired. Migration isn’t a matter of choice — it’s a matter of timeline.
The next optimization target is segmenting reasoning_effort by request type. Freeform Function Calling–based agent architectures introduce new design patterns once JSON wrapping is removed. A GPT-5.5 vs Claude model comparison is a separate topic worth examining for technology stack decisions.
Related Posts
- 7 Key Changes in the GPT-5 Release — Complete Model Lineup and SDK Breaking Changes – A version-by-version comparison of GPT-5 through GPT-5.5 covering model lineup, context windows, and SDK breaking changes…
- Complete GPT-5 API Usage Guide — Responses API Migration and 7-Step Parameter Tuning – OpenAI GPT-5 recommends the Responses API by default, with verbosity and reasoning_effort parameters and free…
- GPT-5 Codex CLI Usage in 7 Steps — Rust Installation, config.toml, and Model Selection – GPT-5 Codex transitioned from a TypeScript CLI to a Rust implementation. Covers installation, config.toml setup, sandbox policies…