Table of Contents
- GPT-5 New Features Overview — The Big Picture
- What’s the Difference Between Verbosity and Freeform Function Calling
- GPT-5 reasoning_effort Settings: Which Value to Use
- What Changed in the GPT-5.1 Tool System
- When to Use the GPT-5.2 Compact Endpoint
- Migrating from GPT-4o to GPT-5: Should It Happen Now
- GPT-5 Model Selection and Cost Calculation
- GPT-5 New Features Summary and Next Steps
OpenAI’s GPT-5 input token pricing is roughly half that of GPT-4o. The model lineup splits into three variants — gpt-5, gpt-5-mini, and gpt-5-nano — with GPT-5.1 and GPT-5.2 following in quick succession. For anyone who hasn’t done a GPT-5 new features roundup yet, now is the right time. Verbosity parameters, Freeform Function Calling, Context-Free Grammar, and reasoning_effort tuning all landed at once, and GPT-4o has already been retired.
Here’s a Q&A-style breakdown of what changed at the API level.
GPT-5 New Features Overview — The Big Picture
“GPT-5 is out — but how many models were actually released?” is the most common question.
GPT-5 isn’t a single model; it’s a series. The model variants available via API are as follows.
| Model ID | Characteristics | Best For |
|---|---|---|
| gpt-5 | General-purpose, highest performance | Complex reasoning, long document analysis |
| gpt-5-mini | Low-cost, low-latency | Real-time chat, classification tasks |
| gpt-5-nano | Lightweight | Embedding preprocessing, simple transformations |
Four core features were introduced with GPT-5.0.
- Verbosity parameter: Controls response length with low/medium/high settings
- Freeform Function Calling: Passes raw text (Python, SQL, etc.) directly to custom tools without JSON wrapping
- Context-Free Grammar: Structurally constrains output syntax using Lark or Regex patterns
- Minimal Reasoning: Minimizes reasoning tokens with reasoning_effort="minimal"
GPT-5.1 (2025-11) and GPT-5.2 followed, expanding reasoning_effort options and adding agentic workflow features. The Responses API is the recommended interface; Chat Completions API still works, but some new parameters are Responses API–exclusive.
GPT-5.0 → GPT-5.1 (2025-11) → GPT-5.2 — each version focused on reasoning_effort expansion and tool improvements. GPT-5.3, 5.4, and 5.5 are being rolled out for the ChatGPT interface only; official API parameter changes for those versions haven’t been documented.
What’s the Difference Between Verbosity and Freeform Function Calling

“Both Verbosity and Function Calling shipped together — aren’t they both about output control?” comes up often. The purposes are different.
Verbosity — Controlling Response Length
Verbosity controls the volume of a model’s response. Setting it to low produces concise, to-the-point answers; high generates longer responses with background explanations and examples. The code is straightforward:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-mini",
    input="Write a poem about a boy and his dog",
    text={"verbosity": "high"},  # low | medium | high
)
```
Use cases are clear-cut. A customer-facing chatbot benefits from low for brief replies, while technical documentation drafting calls for high to produce detailed output. The default is medium, so it works without any explicit configuration.
Freeform Function Calling — Removing JSON Wrapping
Traditional Function Calling required wrapping all model output in a JSON schema. Generating a SQL query meant wrapping it as {"query": "SELECT * FROM users"}, and Python code had to be string-escaped — an awkward constraint.
Freeform Function Calling removes this limitation. Raw text in Python, SQL, Bash, and other languages can be passed directly to custom tools. The key advantage: JSON parsing errors are structurally eliminated when building code-generation agents.
Ideal for data analysis agents that generate and execute SQL directly, or CI/CD pipeline agents that run shell scripts without intermediary processing. Removing JSON wrapping also simplifies post-processing code.
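As a sketch of what this looks like in practice, the snippet below defines a freeform custom tool and a handler for its raw-text call. The "custom" type string and field names follow OpenAI's published custom-tool shape but should be verified against the current API reference; run_sql is an invented name for illustration.

```python
# Hypothetical freeform custom tool: "run_sql" is an invented name, and the
# "custom" type string should be checked against the current API reference.
sql_tool = {
    "type": "custom",
    "name": "run_sql",
    "description": "Executes a read-only SQL query and returns rows as text.",
}

# With freeform calling, the tool-call payload IS the raw query text:
raw_call = "SELECT id, email FROM users WHERE created_at > '2025-01-01'"

def handle_sql_call(raw_sql: str) -> str:
    # A real agent would execute this against a database; the point here is
    # that no json.loads() or unescaping step sits between model and executor.
    return raw_sql.strip()
```

The handler receives exactly what the model emitted, which is why JSON parsing errors disappear as a failure class.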
Context-Free Grammar — Enforcing Output Syntax
Context-Free Grammar (CFG) enforces output syntax through Lark grammars or Regex patterns. It enables more flexible output control than JSON Schema. For example, constraints like “output must be valid YAML” or “follow this specific DSL syntax” can be applied at the grammar level.
If Verbosity controls “how long,” CFG controls “in what format.” If Freeform Function Calling changes “where output goes,” CFG changes “the grammar of the output itself.” These three features operate at different layers and can be combined.
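A hedged sketch of a grammar-constrained tool follows: a small Lark grammar forcing "key: value" output lines, attached via a format field. The format/grammar field names are taken from OpenAI's CFG documentation but should be verified, and emit_config is an invented tool name.

```python
# Hypothetical CFG-constrained custom tool. The Lark grammar below only
# admits "key: value" lines; field names in "format" are assumptions to
# verify against the docs, and "emit_config" is invented for illustration.
CONFIG_GRAMMAR = r"""
start: line+
line: KEY ": " VALUE NEWLINE
KEY: /[a-z_]+/
VALUE: /[^\n]+/
NEWLINE: "\n"
"""

config_tool = {
    "type": "custom",
    "name": "emit_config",
    "description": "Emits configuration as key: value lines.",
    "format": {
        "type": "grammar",
        "syntax": "lark",          # or "regex" for pattern constraints
        "definition": CONFIG_GRAMMAR,
    },
}
```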
The GPT-5 new parameters and tools notebook contains runnable examples for Verbosity, Freeform Function Calling, and CFG.
GPT-5 reasoning_effort Settings: Which Value to Use
“How much faster is reasoning_effort at minimal? Is the quality acceptable?” is the second most frequent question.
GPT-5.0 reasoning_effort Options
GPT-5.0 supports reasoning_effort at four levels: minimal, low, medium, and high. Setting minimal reduces reasoning tokens, speeding up responses and lowering costs. However, quality degradation can occur with complex math problems or multi-step logical reasoning.
Recommended values by scenario:
| Scenario | Recommended reasoning_effort | Reason |
|---|---|---|
| Customer inquiry classification | minimal | Simple classification, no reasoning needed |
| Code review summary | low | Lightweight analysis |
| Bug root cause analysis | high | Multi-step reasoning required |
| Architecture design advice | high | Trade-off comparison needed |
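The table above can be turned into a small request builder. The reasoning={"effort": ...} shape matches the documented Responses API parameter; the scenario keys are illustrative.

```python
# Map the scenarios from the table above to reasoning_effort values, then
# build Responses API kwargs. Scenario keys are illustrative names.
EFFORT_BY_SCENARIO = {
    "inquiry_classification": "minimal",
    "code_review_summary": "low",
    "bug_root_cause": "high",
    "architecture_advice": "high",
}

def build_request(scenario: str, prompt: str) -> dict:
    effort = EFFORT_BY_SCENARIO.get(scenario, "medium")  # safe default
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},
    }
```

The resulting dict can be passed straight through as client.responses.create(**build_request(...)).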
The ‘none’ Mode Added in GPT-5.1
GPT-5.1 (2025-11) added a none mode to reasoning_effort. This mode uses zero reasoning tokens, enabling low-latency responses. An automatic token-consumption adjustment mechanism based on prompt difficulty was also introduced.
none sits one level below minimal. For simple text transformations, formatting, and classification where reasoning tokens aren’t needed, none delivers the best cost efficiency. On the other hand, using none for code generation or logical reasoning tasks causes a sharp drop in output quality.
The ‘xhigh’ Level Added in GPT-5.2
GPT-5.2 expanded in the opposite direction as well. The xhigh level enables deeper reasoning than high. Token efficiency improved over GPT-5.1, unnecessary verbosity decreased, and instruction adherence along with structured reasoning were strengthened.
reasoning_effort spectrum (as of GPT-5.2):
none → minimal → low → medium → high → xhigh
 ↑                                       ↑
 lowest latency and cost        highest reasoning quality
 (0 reasoning tokens)           (maximum token consumption)
xhigh consumes the most reasoning tokens. Applying xhigh to every request in production can cause costs to spike. A realistic strategy is to handle most requests at medium or below, reserving high or xhigh only for cases that clearly require complex reasoning.
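That strategy can be sketched as a simple gate: default to medium and escalate only on an explicit complexity flag. The flags themselves (how you detect triviality or complexity) are left as an assumption here.

```python
# Gate expensive reasoning levels behind explicit flags: default to medium,
# drop to none for trivial work, reserve xhigh for clearly complex requests.
def choose_effort(is_trivial: bool, needs_deep_reasoning: bool) -> str:
    if is_trivial:
        return "none"    # zero reasoning tokens (GPT-5.1+)
    if needs_deep_reasoning:
        return "xhigh"   # maximum reasoning depth (GPT-5.2)
    return "medium"      # sensible production default
```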
What Changed in the GPT-5.1 Tool System
“apply_patch changed in GPT-5.1 — do existing integrations need updating?” is another frequent question.
Changes to the apply_patch Tool
In GPT-5.1, the apply_patch tool switched from JSON-based to a named function call approach. The key result: patch failure rates dropped by 35%. Escape errors and indentation corruption that occurred when passing patch content as JSON were structurally reduced.
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.1",
    input=RESPONSE_INPUT,  # previously prepared conversation input
    tools=[{"type": "apply_patch"}],
)
```
Specifying "apply_patch" as the tool type causes the model to return code modifications in named function call format. Compared with function calling in the Chat Completions API, tool definitions in the Responses API are noticeably more concise.
New shell Tool
GPT-5.1 also introduced a shell tool. Built-in timeout and output length limits prevent issues like infinite loops or excessive output when an agent executes shell commands.
The implication of combining these two tools is clear. Starting with GPT-5.1, agentic workflows where the model directly performs code modifications (apply_patch) and command execution (shell) are supported at the API level.
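A hedged sketch of such a combined request is shown below. The "apply_patch" type string appears in the example above; "shell" as a type string is an assumption to confirm against the current tool list.

```python
# Hypothetical agentic request enabling both GPT-5.1 tools at once.
# "shell" as a type string is an assumption; verify against the docs.
agent_request = {
    "model": "gpt-5.1",
    "input": "Fix the failing test in tests/test_auth.py and re-run it.",
    "tools": [
        {"type": "apply_patch"},  # model proposes code modifications
        {"type": "shell"},        # model runs commands (timeout/output-capped)
    ],
}
```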
Improved Steerability for Persona, Tone, and Output Format
GPT-5.1 also improved steerability. The model was tuned to follow persona, tone, and output format instructions from system prompts more accurately. Cases where previous versions occasionally ignored system prompt instructions have become less frequent with GPT-5.1.
The GPT-5.1 prompting guide covers specific usage patterns for the apply_patch and shell tools.
apply_patch transitioned from JSON to named function calls (35% reduction in patch failures). The shell tool was newly added (with built-in timeout and output limits). Both tools are Responses API–exclusive.
When to Use the GPT-5.2 Compact Endpoint
“Agent sessions overflow the context window during long runs — did GPT-5.2 actually solve this?” Here’s the answer.
The /responses/compact Endpoint
GPT-5.2 introduced the /responses/compact endpoint. It performs loss-aware compaction of accumulated context in long-running agentic workflows. Rather than simply truncating old messages, it evaluates importance to preserve key information while reducing context length.
Context windows filling up after dozens of tool calls is a common production scenario. Previously, developers had to implement custom summarization logic or manually truncate older messages. /responses/compact handles this at the API level.
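The trigger logic is the part worth sketching; the endpoint's request body isn't documented here, so the payload below is a labeled placeholder.

```python
# When to trigger compaction: compact once context usage crosses a threshold.
# The /responses/compact request body is NOT documented here; the payload
# fields below are placeholders, not the real schema.
COMPACT_THRESHOLD = 0.8  # compact at ~80% of the context window

def should_compact(tokens_used: int, context_window: int) -> bool:
    return tokens_used / context_window >= COMPACT_THRESHOLD

def compact_payload(previous_response_id: str) -> dict:
    # Placeholder body: verify field names against the official docs.
    return {"previous_response_id": previous_response_id}
```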
GPT-5.2 Token Efficiency Improvements
GPT-5.2 improved token efficiency over GPT-5.1. Unnecessary verbosity (a separate concept from the Verbosity parameter) decreased, producing equivalent-quality responses with fewer tokens. Instruction adherence and structured reasoning were also strengthened, allowing the model to follow complex system prompts more accurately.
From a startup perspective, /responses/compact matters because it directly affects costs. Efficient context window management during long-running agent sessions significantly reduces input token expenses. The impact is especially noticeable for customer support agents or code review agents that maintain extended sessions.
The GPT-5.2 prompting guide details how the compact endpoint works and recommended patterns.
Agent context management flow:
[tool call 1] → [tool call 2] → ... → [tool call N]
        ↓
context window saturation
        ↓
call /responses/compact
        ↓
loss-aware compaction (key information preserved)
        ↓
[tool call N+1] continues
Migrating from GPT-4o to GPT-5: Should It Happen Now
The short answer leans toward "it's already overdue."
As of 2026-02-13, GPT-4o, GPT-4.1, and GPT-5 (Instant/Thinking) were retired from ChatGPT. Existing conversations were automatically migrated to GPT-5.3 Instant, GPT-5.4 Thinking, and GPT-5.4 Pro. The latest GPT-5.5 is being rolled out to Plus, Pro, Business, and Enterprise users.
If API calls still reference GPT-4o or GPT-4.1 model IDs, a migration plan is needed. The ChatGPT interface has already completed automatic transitions, but API users must update model IDs manually. The exact API end-of-support timeline hasn’t been specified in official documentation.
Migration Checklist
Here are the items to verify when transitioning from GPT-4o to GPT-5.
Step 1 — Update Model ID: Replace gpt-4o with gpt-5, gpt-5-mini, or gpt-5-nano based on cost and performance requirements.
Step 2 — Switch API Endpoint: To use new features (Verbosity, Freeform Function Calling, CFG, etc.), switching to the Responses API is recommended. Basic functionality works on Chat Completions API, but some new parameters are Responses API–exclusive.
Step 3 — Configure reasoning_effort: This parameter didn’t exist in GPT-4o, so appropriate values need to be set per request type. A default applies when unspecified, but explicit configuration is better for cost optimization.
Step 4 — Review Function Calling: Existing JSON-based Function Calling continues to work. However, adopting Freeform Function Calling removes JSON wrapping and simplifies code.
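The four steps condense into a before/after comparison. Parameter names follow the documented Responses API; the prompt is illustrative.

```python
# Before: a GPT-4o request against the Chat Completions API.
legacy_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# After: the same task on the Responses API with GPT-5.
migrated_request = {
    "model": "gpt-5-mini",                # Step 1: new model ID
    "input": "Summarize this ticket.",    # Step 2: Responses-style input
    "reasoning": {"effort": "minimal"},   # Step 3: explicit effort setting
    "text": {"verbosity": "low"},         # optional: keep replies concise
}
```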
GPT-5 Series Version Comparison
GPT-5.0 → Verbosity, Freeform FC, CFG, reasoning_effort
GPT-5.1 → reasoning_effort 'none', apply_patch improvements, shell tool
GPT-5.2 → reasoning_effort 'xhigh', /responses/compact
GPT-5.3+ → ChatGPT-only
Official API parameter changes for GPT-5.3, 5.4, and 5.5 haven’t been documented. What’s available in the ChatGPT interface and what’s available via API are separate concerns, so API developers should build against the GPT-5.2 spec as the safe baseline.
GPT-5 Model Selection and Cost Calculation
“Which one — gpt-5, gpt-5-mini, or gpt-5-nano? How big is the price difference?” is another common question.
The first thing that stands out is GPT-5 input token pricing at roughly half of GPT-4o. Performance went up while costs came down. That said, exact per-token pricing for the GPT-5 series hasn’t been verified from official sources.
Model selection comes down to task complexity, split into three tiers.
gpt-5 (general-purpose): Best for complex reasoning, long document processing, and multi-step agents. Delivers the highest performance but also carries the highest cost among the three.
gpt-5-mini (low-cost, low-latency): Suited for real-time chat, text classification, and simple code generation. For most production workloads, gpt-5-mini often offers the best performance-to-cost ratio.
gpt-5-nano (lightweight): Designed for embedding preprocessing, simple text transformations, and format conversions. The right choice when speed and cost take priority over performance.
A common pattern at startups is request routing. A lightweight classifier at the front determines request complexity, sending simple requests to gpt-5-nano, mid-complexity to gpt-5-mini, and only complex requests to gpt-5. Combining model selection with reasoning_effort settings enables finer-grained cost control.
| Combination | Reasoning Quality | Cost | Latency |
|---|---|---|---|
| gpt-5 + xhigh | Highest | Highest | High |
| gpt-5 + medium | High | Medium | Medium |
| gpt-5-mini + low | Medium | Low | Low |
| gpt-5-nano + none | Lowest | Lowest | Lowest |
The OpenAI GPT-5 model page lists context window sizes and supported features for each variant.
Pre-classifying request complexity and routing through gpt-5-nano → gpt-5-mini → gpt-5 can significantly reduce overall API costs. Setting reasoning_effort differently per request is also recommended.
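The routing pattern above can be sketched as a single function. The complexity score (0.0 to 1.0 from a front-end classifier) and the cutoffs are assumptions for illustration.

```python
# Route requests by a classifier-produced complexity score (0.0-1.0 assumed).
def route(complexity: float) -> tuple[str, str]:
    """Return (model, reasoning_effort) for a given complexity score."""
    if complexity < 0.3:
        return ("gpt-5-nano", "none")   # trivial transformations
    if complexity < 0.7:
        return ("gpt-5-mini", "low")    # everyday production traffic
    return ("gpt-5", "high")            # genuinely complex reasoning
```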
GPT-5 New Features Summary and Next Steps
The GPT-5 series isn’t just a performance upgrade — it fundamentally changes how the API is used. Controlling reasoning depth with reasoning_effort, eliminating JSON wrapping with Freeform Function Calling, and managing long-running agent context with the Compact endpoint define the design direction of the GPT-5 series.
GPT-5 input token pricing is half that of GPT-4o, while capabilities expanded significantly. GPT-4o and GPT-4.1 are already retired. Migration isn’t a matter of choice — it’s a matter of timeline.
The next optimization target is segmenting reasoning_effort by request type. Freeform Function Calling–based agent architectures introduce new design patterns once JSON wrapping is removed. A GPT-5.5 vs Claude model comparison is a separate topic worth examining for technology stack decisions.
Related Posts
- 7 Key Changes in the GPT-5 Release — Complete Model Lineup and SDK Breaking Changes – A version-by-version comparison of GPT-5 through GPT-5.5 covering model lineup, context windows, and SDK breaking changes…
- Complete GPT-5 API Usage Guide — Responses API Migration and 7-Step Parameter Tuning – OpenAI GPT-5 recommends the Responses API by default, with verbosity and reasoning_effort parameters and free…
- GPT-5 Codex CLI Usage in 7 Steps — Rust Installation, config.toml, and Model Selection – GPT-5 Codex transitioned from a TypeScript CLI to a Rust implementation. Covers installation, config.toml setup, sandbox policies…