7 Key GPT-5 Release Changes — Model Lineup and SDK Breaking Changes

The bottom line: the key GPT-5 release changes boil down to three things. First, the context window expanded to 200K input and 100K output tokens. Second, the GPT-5.4 family split into three tiers — full-size, mini, and nano — opening up cost optimization options. Third, the openai-python SDK introduced a breaking change by removing the prompt_cache_key parameter, which means existing code may throw errors after upgrading. This article covers the changes from GPT-5 (2025-08-07 version) through GPT-5.5 (2026-04-23), organized by comparison criteria.

GPT-5 Release Background and Model Lineage

GPT-5 was released on August 7, 2025. The training data cutoff is October 2024, and the detailed model specs are available on the GPT-5 Azure Model Marketplace page. It natively supports multimodal (text + image) input, along with function calling (allowed tools and preamble), Structured Outputs, streaming, and reasoning effort settings (high/medium/low/minimal).

Over the following seven months, the GPT-5 series branched out rapidly. On March 5, 2026, gpt-5.4 appeared alongside openai-python v2.25.0. On March 17 of the same month, gpt-5.4-nano and gpt-5.4-mini were added. Then on April 23, 2026, GPT-5.5 (codename “Spud”) was unveiled with a context window expanded to 1M tokens.

The lineage runs GPT-5 (2025-08) → GPT-5.4 (2026-03-05) → GPT-5.4-mini/nano (2026-03-17) → GPT-5.5 (2026-04-23). At each step, the context window, supported features, and SDK compatibility differ — so the model version should be explicitly verified before deployment.

In a startup environment, model selection is directly tied to infrastructure costs. The fact that the GPT-5 series isn’t a single model but a purpose-segmented lineup is something that must factor into any technical decision.

Comparison Criteria for GPT-5 Release Changes

Comparing the key GPT-5 release changes requires clearly defined criteria. This article analyzes the changes along four axes.

Context Window and Token Limits

Input and output token limits vary significantly across models. The base GPT-5 model supports 200K input and 100K output tokens, while GPT-5 Pro handles 400K input and 272K output. GPT-5.5 reportedly supports a 1M-token context. The choice of model therefore fundamentally shapes prompt design strategy.

Supported Feature Scope

Not all GPT-5 series models support the same features. GPT-5 Pro is exclusively available through the Responses API and doesn’t support streaming or Code Interpreter. GPT-5.5 has the broadest feature set, including MCP (Model Context Protocol), web search, and hosted shell.

SDK Compatibility and Breaking Changes

Available models and parameters differ depending on the openai-python SDK version. The removal of prompt_cache_key in v2.25.0 is a change that directly impacts existing production code.

Model Lineup and Use-Case Segmentation

GPT-5.4 split into three tiers: full-size, mini, and nano. This structure enables cost-to-performance optimization, but it also increases the complexity of model selection decisions.

Context Window Changes Compared


The most notable change in the GPT-5 series is the stepwise expansion of the context window.

| Model | Input Tokens | Output Tokens | Notes |
|---|---|---|---|
| GPT-5 (2025-08) | 200K | 100K | Base multimodal |
| GPT-5 Pro | 400K | 272K | Responses API only |
| GPT-5.5 (2026-04) | 1M | Unconfirmed | Codename Spud |

The base GPT-5 model’s 200K input tokens were already a significant expansion over previous generations, but the jump to 400K with GPT-5 Pro and 1M with GPT-5.5 is dramatic. At 1M tokens, it becomes possible to fit an entire typical codebase into a single prompt.

GPT-5 Pro Limitations
GPT-5 Pro offers the most generous input/output token limits, but it doesn’t support streaming or Code Interpreter. This makes it unsuitable for real-time chat interfaces. Being Responses API-only also means it’s incompatible with existing Chat Completions API-based code.

For startups running RAG (Retrieval-Augmented Generation) pipelines, the context window size directly affects chunking strategy. GPT-5’s 200K is often sufficient, but for workloads requiring long context — such as legal documents or large-scale code analysis — GPT-5.5’s 1M tokens can process entire documents without chunking.
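As a quick pre-check for that decision, whether a document fits a model's input window can be estimated before any API call. A minimal sketch, assuming the common ~4 characters-per-token rule of thumb (use a real tokenizer for exact counts) and illustrative slug keys that may not match official model names:

```python
# Rough chunking pre-check. The 4-chars-per-token ratio is an
# approximation; the dict keys are illustrative labels, not verified
# API model slugs.
INPUT_LIMITS = {"gpt-5": 200_000, "gpt-5-pro": 400_000, "gpt-5.5": 1_000_000}

def needs_chunking(text: str, model: str, reserve: int = 8_000) -> bool:
    """True if the estimated token count exceeds the model's input
    window minus a reserve for system prompt and instructions."""
    estimated_tokens = len(text) // 4
    return estimated_tokens > INPUT_LIMITS[model] - reserve

doc = "x" * 1_200_000                  # roughly 300K estimated tokens
print(needs_chunking(doc, "gpt-5"))    # True: exceeds 200K
print(needs_chunking(doc, "gpt-5.5"))  # False: fits in 1M
```

Keeping a token reserve matters in practice: the raw document is never the whole prompt once instructions and retrieved context are added.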

That said, a larger context window isn’t always better. API call costs increase proportionally with token count, and the “lost in the middle” phenomenon — where models miss information in the middle of long contexts — may still apply. Official pricing for the GPT-5 series is currently not fully available in public documentation, so exact cost calculations should be verified directly on the OpenAI pricing page.

GPT-5.4 Family Lineup Analysis

In March 2026, the GPT-5.4 family arrived, marking the beginning of full-scale lineup segmentation for the GPT-5 series. As shown on the openai-python releases page, v2.29.0 (2026-03-17) added the gpt-5.4-nano and gpt-5.4-mini model slugs.

Three-Tier Lineup Structure

GPT-5.4 family
├── gpt-5.4        ← full-size (highest performance)
├── gpt-5.4-mini   ← mid-tier (performance-cost balance)
└── gpt-5.4-nano   ← lightweight (lowest cost, lowest latency)

This structure mirrors the tiering strategies of Google’s Gemini series (Pro, Flash, Nano) and Anthropic’s Claude series (Opus, Sonnet, Haiku). The full-size model targets complex reasoning and code generation, mini serves general conversational services, and nano handles lightweight tasks like classification and summarization.

Model Selection Matrix

Key factors for model selection in a startup environment:

| Criteria | gpt-5.4 | gpt-5.4-mini | gpt-5.4-nano |
|---|---|---|---|
| Reasoning Complexity | High | Medium | Low |
| Expected Latency | High | Medium | Low |
| Cost Efficiency | Low | Medium | High |
| Best For | Code generation, complex analysis | Chat, general QA | Classification, summarization, routing |

Official pricing is not currently available in public documentation, so the cost column above is an estimate based on general model tiering patterns. Exact figures should be verified on the OpenAI pricing page before adoption.

Leveraging the Router Pattern
In production, a router pattern that automatically selects a model based on request complexity can significantly reduce API call costs. Route simple intent classification to gpt-5.4-nano while sending complex code reviews to the full-size gpt-5.4 — cutting costs while maintaining quality.
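A minimal sketch of that router, with keyword heuristics and thresholds that are purely illustrative (a production router would typically use a cheap classifier call or embedding similarity instead):

```python
# Illustrative complexity-based router. Keywords and model slugs are
# assumptions for the sketch, not official guidance.
def route_model(task: str) -> str:
    """Pick a GPT-5.4 tier from a crude task-type heuristic."""
    lightweight = ("classify", "route", "extract", "tag")
    heavyweight = ("code review", "refactor", "multi-step", "proof")
    text = task.lower()
    if any(k in text for k in heavyweight):
        return "gpt-5.4"        # full-size: complex reasoning
    if any(k in text for k in lightweight):
        return "gpt-5.4-nano"   # lightweight: classification, routing
    return "gpt-5.4-mini"       # default: balanced tier

print(route_model("classify this ticket"))   # gpt-5.4-nano
print(route_model("code review for PR 42"))  # gpt-5.4
```

The routing decision itself can run on the cheapest tier, so the router adds only a lightweight step before the main request.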

openai-python SDK Breaking Changes in Detail


The most critical item in the GPT-5 release changes is the SDK-level breaking change. The changes introduced in openai-python v2.25.0 (2026-03-05) directly affect existing production code.

prompt_cache_key Parameter Removal

As stated in the openai-python v2.25.0 release notes, the prompt_cache_key parameter was removed from responses. Any code that used this parameter to manage prompt caching will throw errors after upgrading to v2.25.0 or later.

If prompt_cache_key was being passed in Responses API calls, it needs to be removed or replaced with whatever alternative caching mechanism OpenAI provides. A detailed migration guide for the replacement method is not currently available in public documentation.
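One defensive option while the migration path is unclear: centralize keyword construction so call sites keep working on both pre- and post-v2.25.0 SDKs. The helper name below is hypothetical, not an SDK API.

```python
# Hypothetical helper: drop the parameter removed in openai-python
# v2.25.0 so shared call sites run on both old and new SDK versions.
REMOVED_PARAMS = {"prompt_cache_key"}

def build_responses_kwargs(**kwargs):
    """Return kwargs safe to pass to client.responses.create on v2.25.0+."""
    return {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}

kwargs = build_responses_kwargs(
    model="gpt-5.4",
    input="Summarize the release notes.",
    prompt_cache_key="release-notes-v1",  # silently dropped for new SDKs
)
print("prompt_cache_key" in kwargs)  # False
```

This trades silent behavior change (caching hints are ignored) for not crashing, which is usually the right call during a staged rollout.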

phase Field Temporarily Removed Then Re-Added

In the same v2.25.0 release, the phase field was temporarily removed from message types and then re-added. Changes like this can cause unexpected errors in projects with strict type checking, such as Pydantic-based validation in Python or typed client code in TypeScript.

Pre-Upgrade Checklist for SDK
Before upgrading to v2.25.0 or later, search the codebase for any usage of `prompt_cache_key` and check for type definitions that depend on the `phase` field. If the SDK version is pinned in the CI pipeline, validate the upgrade in a test environment first.
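The prompt_cache_key search can be scripted; a small sketch that scans a source tree (the phase field is harder to grep for reliably since the word is common, so type-level review is the safer check there):

```python
from pathlib import Path

def find_prompt_cache_key_usage(root: str) -> list[str]:
    """Return paths of .py files under root that mention the parameter
    removed in openai-python v2.25.0."""
    return [
        str(p)
        for p in Path(root).rglob("*.py")
        if "prompt_cache_key" in p.read_text(encoding="utf-8", errors="ignore")
    ]
```

Running this before bumping the pinned SDK version turns the checklist item into a mechanical pass/fail check.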

New Features Added in v2.25.0

The release wasn’t all breaking changes. The same version added tool search and a new computer tool. Tool search lets the model search and select from the list of available tools on its own, while the computer tool enables the model to perform computer operations such as mouse clicks and keyboard input.

These new tools lay the groundwork for agent patterns where tool discovery and computer operations happen directly — without explicit function calling. Previously, function calling was the only way to interact with external systems. With tool search and computer tool, the model can now perform tasks more autonomously.
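Purely as an illustration of what such a request might look like: the tool type slugs ("tool_search", "computer") and their fields below are assumptions, since the release notes name the tools but their exact schema is not covered in the public documentation cited here.

```python
# Illustrative only: tool type names and fields are assumptions,
# not a verified schema. Check the SDK reference before use.
request = {
    "model": "gpt-5.4",
    "input": "Open the dashboard and read the error count.",
    "tools": [
        {"type": "tool_search"},   # assumed slug for the tool search tool
        {
            "type": "computer",    # assumed slug for the computer tool
            "display_width": 1280,
            "display_height": 800,
        },
    ],
}
# With the SDK, this would be passed as client.responses.create(**request)
```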

GPT-5 Pro and GPT-5.5 New Features

GPT-5 Pro

GPT-5 Pro is a high-performance model available exclusively through the Responses API. It supports 400K input and 272K output tokens, with Web Search, Function Calling, Vision, PDF Input, Prompt Caching, and Reasoning capabilities included. However, it does not support streaming or Code Interpreter.

Being Responses API-only means that systems built on the Chat Completions API need to change their API call approach entirely. The lack of streaming support is also a critical constraint for real-time chat services. GPT-5 Pro is positioned for workloads where response latency is acceptable — batch processing, document analysis, and large-scale data reasoning.
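A sketch of a batch-style call matching that positioning. The slug "gpt-5-pro" and the input shape are assumptions to verify against the current model list; the notable part is what is absent, namely any stream flag.

```python
# Hedged sketch of a GPT-5 Pro request via the Responses API.
# "gpt-5-pro" is an assumed slug; verify before use.
request = {
    "model": "gpt-5-pro",
    "input": [
        {
            "role": "user",
            "content": "Analyze this contract for liability clauses.",
        }
    ],
    # No "stream": True here: GPT-5 Pro does not support streaming,
    # and it is reachable only through the Responses API.
}
# With the SDK: client.responses.create(**request)
```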

| Feature | GPT-5 | GPT-5 Pro | GPT-5.5 |
|---|---|---|---|
| Input Tokens | 200K | 400K | 1M |
| Output Tokens | 100K | 272K | Unconfirmed |
| Streaming | Yes | No | Unconfirmed |
| Code Interpreter | Unconfirmed | No | Unconfirmed |
| Function Calling | Yes | Yes | Yes |
| Structured Outputs | Yes | Unconfirmed | Yes |
| Web Search | Unconfirmed | Yes | Yes |
| MCP Support | No | Unconfirmed | Yes |
| Prompt Caching | Yes | Yes | Yes |
| Computer Use | Unconfirmed | Unconfirmed | Yes |

GPT-5.5 — Codename Spud

GPT-5.5, unveiled on April 23, 2026, is the latest model in the series. Its standout feature is the 1M token context window. It natively supports image input, Structured Outputs, function calling, prompt caching, and Batch API, along with tool search, built-in computer use, hosted shell, MCP (Model Context Protocol), and web search.

MCP is a standardized protocol for models to access external data sources and tools. With MCP built into GPT-5.5, direct connection to MCP-compatible servers becomes possible without intermediate adapters like LangChain.

Reliability of GPT-5.5 Information
The detailed specs for GPT-5.5 come from community sources rather than official API documentation. Final specs should be re-verified in OpenAI’s official documentation before production adoption. Note that official documentation (platform.openai.com) currently has limited accessibility.

Hosted shell enables the model to execute commands in an actual shell environment. Once this feature stabilizes, it could see significant use in CI/CD pipeline automation, server diagnostics, and code execution-based verification.

GPT-5 Reasoning Effort Settings and Usage

The reasoning effort setting, newly introduced with GPT-5, is a parameter that controls the model’s depth of reasoning. It supports four levels — high, medium, low, and minimal — allowing selection based on task complexity.

Use Cases by Level

reasoning effort settings
┌──────────┬─────────────────────────────────────────────┐
│ high     │ complex math, multi-step logical reasoning  │
│ medium   │ general code generation, analysis           │
│ low      │ simple QA, summarization                    │
│ minimal  │ classification, keyword extraction, routing │
└──────────┴─────────────────────────────────────────────┘

Lowering reasoning effort speeds up responses and reduces token consumption, but accuracy may drop on complex problems. Conversely, setting it to high lets the model think more deeply, at the cost of increased response time and expense.

This setting is most effective when combined with the router pattern described earlier. For example, a two-stage architecture could use gpt-5.4-nano + minimal for initial classification, then route complex requests to gpt-5.4 + high. Official code examples for detailed usage and parameter passing of reasoning effort are currently not available in public documentation.
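Given that caveat, the snippet below is only a plausible shape: the reasoning={"effort": ...} structure mirrors the Responses API reasoning options, but parameter names should be re-verified once official examples appear. The task-to-effort mapping follows the levels above.

```python
# Illustrative mapping from task type to reasoning effort. The request
# shape is an assumption pending official examples.
EFFORT_BY_TASK = {
    "math_proof": "high",
    "code_generation": "medium",
    "simple_qa": "low",
    "classification": "minimal",
}

def build_request(task_type: str, prompt: str) -> dict:
    """Build a request dict with effort chosen by task type;
    unknown task types fall back to medium."""
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": EFFORT_BY_TASK.get(task_type, "medium")},
    }

print(build_request("classification", "Tag this ticket")["reasoning"])
```

Keeping the mapping in one table makes the later tuning loop (start at medium, adjust against accuracy metrics) a one-line change.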

Reasoning Effort Selection Guide
In production, a practical approach is to start with medium and adjust incrementally while monitoring accuracy metrics. Applying high to every request inflates costs unnecessarily, while applying minimal across the board risks quality issues.

GPT-5 Series Full Change Timeline and Migration Points

A chronological summary of the changes since the GPT-5 series launch:

flowchart TB
    A["GPT-5 released<br/>2025-08-07<br/>200K/100K tokens"] --> B["openai-python v2.25.0<br/>2026-03-05<br/>gpt-5.4 + breaking change"]
    B --> C["openai-python v2.29.0<br/>2026-03-17<br/>gpt-5.4-mini/nano added"]
    C --> D["GPT-5.5 unveiled<br/>2026-04-23<br/>1M tokens, MCP support"]

There are multiple points to watch when migrating from GPT-4o or GPT-4.1 to the GPT-5 series. However, the full list of breaking changes for the GPT-4o/4.1 to GPT-5 migration is not available in public documentation, so only the items confirmed through SDK release notes are covered here.

The confirmed key migration checklist items are as follows:

| Check Item | Details | Impact Scope |
|---|---|---|
| prompt_cache_key removal | Deleted from responses parameters in v2.25.0 | All caching logic |
| phase field change | Temporarily removed then re-added; type definition changes | TypeScript/Pydantic type validation |
| Responses API-only models | GPT-5 Pro does not support Chat Completions API | Requires API call method switch |
| Model slug changes | gpt-5 → gpt-5.4 → gpt-5.4-mini/nano | Model identifier string updates |

Official pricing figures (cost per token) for the GPT-5 series API are not currently available. In cost-sensitive startup environments, the latest rates should be verified on the OpenAI official pricing page before budgeting.

Once GPT-5.5’s MCP support stabilizes, it could significantly reshape agent architecture design. The current common pattern — managing tool connections through LangChain or custom orchestration layers — may simplify as model-level MCP support reduces the complexity of intermediate layers. Additionally, cost optimization strategies leveraging the GPT-5.4 family’s three-tier lineup, along with feature and pricing comparisons against competitors like Gemini and Claude, should be evaluated during model selection. Ultimately, understanding the GPT-5 release changes comes down to finding the right combination of model, SDK version, and API approach for a given workload.
