Table of Contents
- GPT-5 Release Background and Model Lineage
- Comparison Criteria for GPT-5 Release Changes
- Context Window Changes Compared
- GPT-5.4 Family Lineup Analysis
- openai-python SDK Breaking Changes in Detail
- GPT-5 Pro and GPT-5.5 New Features
- GPT-5 Reasoning Effort Settings and Usage
- GPT-5 Series Full Change Timeline and Migration Points
The bottom line: the key GPT-5 release changes boil down to three things. First, the context window expanded to 200K input and 100K output tokens. Second, the GPT-5.4 family split into three tiers — full-size, mini, and nano — opening up cost optimization options. Third, the openai-python SDK introduced a breaking change by removing the prompt_cache_key parameter, which means existing code may throw errors after upgrading. This article covers the changes from GPT-5 (2025-08-07 version) through GPT-5.5 (2026-04-23), organized by comparison criteria.
GPT-5 Release Background and Model Lineage
GPT-5 was released on August 7, 2025. The training data cutoff is October 2024, and the detailed model specs are available on the GPT-5 Azure Model Marketplace page. It natively supports multimodal (text + image) input, along with function calling (allowed tools and preamble), Structured Outputs, streaming, and reasoning effort settings (high/medium/low/minimal).
Over the following months, the GPT-5 series branched out rapidly. On March 5, 2026, gpt-5.4 appeared alongside openai-python v2.25.0, and on March 17 the gpt-5.4-nano and gpt-5.4-mini variants were added. Then on April 23, 2026, GPT-5.5 (codename “Spud”) was unveiled with a context window expanded to 1M tokens.
The lineage runs GPT-5 (2025-08) → GPT-5.4 (2026-03-05) → GPT-5.4-mini/nano (2026-03-17) → GPT-5.5 (2026-04-23). At each step, the context window, supported features, and SDK compatibility differ — so the model version should be explicitly verified before deployment.
In a startup environment, model selection is directly tied to infrastructure costs. The fact that the GPT-5 series isn’t a single model but a purpose-segmented lineup is something that must factor into any technical decision.
Comparison Criteria for GPT-5 Release Changes
Comparing the key GPT-5 release changes requires clearly defined criteria. This article analyzes the changes along four axes.
Context Window and Token Limits
Input and output token limits vary significantly across models. The base GPT-5 model supports 200K input and 100K output tokens, while GPT-5 Pro handles 400K input and 272K output. GPT-5.5 reportedly supports a 1M-token context. The choice of model fundamentally changes the prompt design strategy.
Supported Feature Scope
Not all GPT-5 series models support the same features. GPT-5 Pro is exclusively available through the Responses API and doesn’t support streaming or Code Interpreter. GPT-5.5 has the broadest feature set, including MCP (Model Context Protocol), web search, and hosted shell.
SDK Compatibility and Breaking Changes
Available models and parameters differ depending on the openai-python SDK version. The removal of prompt_cache_key in v2.25.0 is a change that directly impacts existing production code.
Model Lineup and Use-Case Segmentation
GPT-5.4 split into three tiers: full-size, mini, and nano. This structure enables cost-to-performance optimization, but it also increases the complexity of model selection decisions.
Context Window Changes Compared

The most notable change in the GPT-5 series is the stepwise expansion of the context window.
| Model | Input Tokens | Output Tokens | Notes |
|---|---|---|---|
| GPT-5 (2025-08) | 200K | 100K | Base multimodal |
| GPT-5 Pro | 400K | 272K | Responses API only |
| GPT-5.5 (2026-04) | 1M | Unconfirmed | Codename Spud |
The base GPT-5 model’s 200K input tokens were already a significant expansion over previous generations, but the jump to 400K with GPT-5 Pro and 1M with GPT-5.5 is dramatic. At 1M tokens, it becomes possible to fit an entire typical codebase into a single prompt.
GPT-5 Pro offers the most generous input/output token limits, but it doesn’t support streaming or Code Interpreter. This makes it unsuitable for real-time chat interfaces. Being Responses API-only also means it’s incompatible with existing Chat Completions API-based code.
For startups running RAG (Retrieval-Augmented Generation) pipelines, the context window size directly affects chunking strategy. GPT-5’s 200K is often sufficient, but for workloads requiring long context — such as legal documents or large-scale code analysis — GPT-5.5’s 1M tokens can process entire documents without chunking.
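As a rough illustration of that chunk-or-not decision, the sketch below estimates token count with a simple heuristic (roughly 4 characters per token for English text) and checks the result against GPT-5's 200K input window. The heuristic and the safety margin are assumptions for illustration; real counts require the model's actual tokenizer.

```python
# Chunk-or-not decision for a RAG pipeline.
# Assumption: ~4 characters per token for English text; real counts
# come from the model's tokenizer and will differ.

GPT5_INPUT_LIMIT = 200_000   # tokens (base GPT-5, per the table above)
SAFETY_MARGIN = 0.8          # leave headroom for system prompt and output

def estimate_tokens(text: str) -> int:
    """Crude heuristic: 1 token is roughly 4 characters of English text."""
    return len(text) // 4

def needs_chunking(document: str, limit: int = GPT5_INPUT_LIMIT) -> bool:
    """True if the document likely exceeds the usable input budget."""
    return estimate_tokens(document) > int(limit * SAFETY_MARGIN)

short_doc = "hello world " * 1_000   # ~12K chars, ~3K tokens
long_doc = "x" * 1_000_000           # ~250K tokens

print(needs_chunking(short_doc))  # False: fits in a single prompt
print(needs_chunking(long_doc))   # True: must be chunked for base GPT-5
```

For workloads that would route to GPT-5.5's 1M window, the same check with a higher limit decides whether chunking can be skipped entirely.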
That said, a larger context window isn’t always better. API call costs increase proportionally with token count, and the “lost in the middle” phenomenon — where models miss information in the middle of long contexts — may still apply. Official pricing for the GPT-5 series is currently not fully available in public documentation, so exact cost calculations should be verified directly on the OpenAI pricing page.
GPT-5.4 Family Lineup Analysis
In March 2026, the GPT-5.4 family arrived, marking the beginning of full-scale lineup segmentation for the GPT-5 series. As shown on the openai-python releases page, v2.29.0 (2026-03-17) added the gpt-5.4-nano and gpt-5.4-mini model slugs.
Three-Tier Lineup Structure
GPT-5.4 family
├── gpt-5.4 ← full-size (highest performance)
├── gpt-5.4-mini ← mid-tier (performance-cost balance)
└── gpt-5.4-nano ← lightweight (lowest cost, lowest latency)
This structure mirrors the tiering strategies of Google’s Gemini series (Pro, Flash, Nano) and Anthropic’s Claude series (Opus, Sonnet, Haiku). The full-size model targets complex reasoning and code generation, mini serves general conversational services, and nano handles lightweight tasks like classification and summarization.
Model Selection Matrix
Key factors for model selection in a startup environment:
| Criteria | gpt-5.4 | gpt-5.4-mini | gpt-5.4-nano |
|---|---|---|---|
| Reasoning Complexity | High | Medium | Low |
| Expected Latency | High | Medium | Low |
| Cost Efficiency | Low | Medium | High |
| Best For | Code generation, complex analysis | Chat, general QA | Classification, summarization, routing |
Official pricing is not currently available in public documentation, so the cost column above is an estimate based on general model tiering patterns. Exact figures should be verified on the OpenAI pricing page before adoption.
In production, a router pattern that automatically selects a model based on request complexity can significantly reduce API call costs. Route simple intent classification to gpt-5.4-nano while sending complex code reviews to the full-size gpt-5.4 — cutting costs while maintaining quality.
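A minimal sketch of that router pattern follows, using the gpt-5.4 family slugs from the release notes. The complexity scoring is a deliberately naive placeholder; production routers typically use a cheap classifier model or heuristics tuned to the actual workload.

```python
# Naive request router: map a rough complexity score to a GPT-5.4 tier.
# The keyword/length scoring heuristic is illustrative only.

def complexity_score(prompt: str) -> int:
    """Return 0 (simple), 1 (moderate), or 2 (complex)."""
    heavy_markers = ("review this code", "prove", "step by step", "refactor")
    if any(marker in prompt.lower() for marker in heavy_markers):
        return 2
    return 1 if len(prompt) > 500 else 0

def pick_model(prompt: str) -> str:
    """Route cheap tasks to nano, mid tasks to mini, hard tasks to full-size."""
    return {
        0: "gpt-5.4-nano",   # classification, routing, short lookups
        1: "gpt-5.4-mini",   # general chat, ordinary QA
        2: "gpt-5.4",        # code review, complex analysis
    }[complexity_score(prompt)]

print(pick_model("What is our refund policy?"))        # gpt-5.4-nano
print(pick_model("Please review this code for races")) # gpt-5.4
```

The returned slug would then be passed as the `model` argument of the actual API call.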
openai-python SDK Breaking Changes in Detail

The most critical item in the GPT-5 release changes is the SDK-level breaking change. The changes introduced in openai-python v2.25.0 (2026-03-05) directly affect existing production code.
prompt_cache_key Parameter Removal
As stated in the openai-python v2.25.0 release notes, the prompt_cache_key parameter was removed from responses. Any code that used this parameter to manage prompt caching will throw errors after upgrading to v2.25.0 or later.
If prompt_cache_key was being passed in Responses API calls, it needs to be removed or replaced with whatever alternative caching mechanism OpenAI provides. A detailed migration guide for the replacement method is not currently available in public documentation.
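One defensive migration approach is to strip the removed parameter before forwarding call arguments to the SDK. The helper below is a hypothetical shim, not an official OpenAI API; it only filters a kwargs dict, so legacy call sites keep working while prompt_cache_key usage is removed incrementally.

```python
# Hypothetical compatibility shim: drop parameters removed from the
# Responses API before forwarding kwargs to client.responses.create(...).

REMOVED_IN_V2_25 = {"prompt_cache_key"}  # removed in openai-python v2.25.0

def sanitize_responses_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs without parameters the SDK no longer accepts."""
    dropped = REMOVED_IN_V2_25 & kwargs.keys()
    if dropped:
        # Warn instead of failing so callers can be migrated one at a time.
        print(f"warning: dropping removed params: {sorted(dropped)}")
    return {k: v for k, v in kwargs.items() if k not in REMOVED_IN_V2_25}

legacy = {"model": "gpt-5", "input": "hi", "prompt_cache_key": "tenant-42"}
safe = sanitize_responses_kwargs(legacy)
print(safe)  # {'model': 'gpt-5', 'input': 'hi'}
```

The sanitized dict is what would actually be splatted into the SDK call, e.g. `client.responses.create(**safe)`.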
phase Field Temporarily Removed Then Re-Added
In the same v2.25.0 release, the phase field was temporarily removed from message types and then re-added. This kind of change can cause unexpected errors in TypeScript projects with strict type checking or Pydantic-based validation logic.
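The failure mode is easy to reproduce with any strict-schema validator. The stand-in below mimics strict validation (in the spirit of Pydantic's extra="forbid") with plain Python: if the allowed-field set was generated against a release where phase had been removed, payloads from a release where it was re-added start failing. The field names are illustrative.

```python
# Minimal stand-in for strict schema validation: any field missing from
# the schema raises instead of being silently ignored.

ALLOWED_MESSAGE_FIELDS = {"role", "content"}  # generated while "phase" was removed

def validate_message(payload: dict) -> dict:
    """Reject payloads carrying fields the schema does not know about."""
    unknown = set(payload) - ALLOWED_MESSAGE_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return payload

validate_message({"role": "assistant", "content": "ok"})  # passes

try:
    validate_message({"role": "assistant", "content": "ok", "phase": "final"})
except ValueError as err:
    print(err)  # unexpected fields: ['phase']
```

Regenerating type definitions against the SDK version actually deployed avoids this class of mismatch.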
Before upgrading to v2.25.0 or later, search the codebase for any usage of `prompt_cache_key` and check for type definitions that depend on the `phase` field. If the SDK version is pinned in the CI pipeline, validate the upgrade in a test environment first.
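A pre-upgrade audit can be as simple as scanning the tree for the affected identifiers. The sketch below scans a throwaway directory standing in for a real codebase (the path and file are placeholders) and lists every line that needs attention.

```python
# Pre-upgrade audit: list every line that still references the removed
# parameter or the affected field. The scanned directory is a stand-in
# for a real codebase.
import tempfile
from pathlib import Path

NEEDLES = ("prompt_cache_key", "phase")

def find_usages(root: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if any(needle in line for needle in NEEDLES):
                hits.append((path.name, lineno, line.strip()))
    return hits

# Demo against a throwaway file standing in for real application code.
root = tempfile.mkdtemp()
Path(root, "caller.py").write_text(
    'resp = client.responses.create(model="gpt-5", prompt_cache_key="k1")\n'
)
for name, lineno, line in find_usages(root):
    print(f"{name}:{lineno}: {line}")  # e.g. caller.py:1: resp = client...
```

Running the same scan over the real repository root (or an equivalent `grep -rn`) gives the worklist to clear before bumping the SDK version in CI.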
New Features Added in v2.25.0
The release wasn’t all breaking changes. The same version added a tool search tool and a new computer tool. Tool search lets the model autonomously search and select from the list of available tools, while the computer tool enables the model to perform computer operations (mouse clicks, keyboard input, etc.).
These new tools lay the groundwork for agent patterns where tool discovery and computer operations happen directly — without explicit function calling. Previously, function calling was the only way to interact with external systems. With tool search and computer tool, the model can now perform tasks more autonomously.
GPT-5 Pro and GPT-5.5 New Features
GPT-5 Pro
GPT-5 Pro is a high-performance model available exclusively through the Responses API. It supports 400K input and 272K output tokens, with Web Search, Function Calling, Vision, PDF Input, Prompt Caching, and Reasoning capabilities included. However, it does not support streaming or Code Interpreter.
Being Responses API-only means that systems built on the Chat Completions API need to change their API call approach entirely. The lack of streaming support is also a critical constraint for real-time chat services. GPT-5 Pro is positioned for workloads where response latency is acceptable — batch processing, document analysis, and large-scale data reasoning.
| Feature | GPT-5 | GPT-5 Pro | GPT-5.5 |
|---|---|---|---|
| Input Tokens | 200K | 400K | 1M |
| Output Tokens | 100K | 272K | Unconfirmed |
| Streaming | ✅ | ❌ | ✅ |
| Code Interpreter | Unconfirmed | ❌ | Unconfirmed |
| Function Calling | ✅ | ✅ | ✅ |
| Structured Outputs | ✅ | Unconfirmed | ✅ |
| Web Search | Unconfirmed | ✅ | ✅ |
| MCP Support | ❌ | Unconfirmed | ✅ |
| Prompt Caching | ✅ | ✅ | ✅ |
| Computer Use | Unconfirmed | Unconfirmed | ✅ |
GPT-5.5 — Codename Spud
GPT-5.5, unveiled on April 23, 2026, is the latest model in the series. Its standout feature is the 1M token context window. It natively supports image input, Structured Outputs, function calling, prompt caching, and Batch API, along with tool search, built-in computer use, hosted shell, MCP (Model Context Protocol), and web search.
MCP is a standardized protocol for models to access external data sources and tools. With MCP built into GPT-5.5, direct connection to MCP-compatible servers becomes possible without intermediate adapters like LangChain.
The detailed specs for GPT-5.5 come from community sources rather than official API documentation. Final specs should be re-verified in OpenAI’s official documentation before production adoption. Note that official documentation (platform.openai.com) currently has limited accessibility.
Hosted shell enables the model to execute commands in an actual shell environment. Once this feature stabilizes, it could see significant use in CI/CD pipeline automation, server diagnostics, and code execution-based verification.
GPT-5 Reasoning Effort Settings and Usage
The reasoning effort setting, newly introduced with GPT-5, is a parameter that controls the model’s depth of reasoning. It supports four levels — high, medium, low, and minimal — allowing selection based on task complexity.
Use Cases by Level
reasoning effort settings
┌─────────┬────────────────────────────────────────────┐
│ high    │ complex math, multi-step logical reasoning │
│ medium  │ typical code generation and analysis       │
│ low     │ simple QA, summarization                   │
│ minimal │ classification, keyword extraction, routing│
└─────────┴────────────────────────────────────────────┘
Lowering reasoning effort speeds up responses and reduces token consumption, but accuracy may drop on complex problems. Conversely, setting it to high lets the model think more deeply, at the cost of increased response time and expense.
This setting is most effective when combined with the router pattern described earlier. For example, a two-stage architecture could use gpt-5.4-nano + minimal for initial classification, then route complex requests to gpt-5.4 + high. Official code examples for detailed usage and parameter passing of reasoning effort are currently not available in public documentation.
In production, a practical approach is to start with medium and adjust incrementally while monitoring accuracy metrics. Applying high to every request inflates costs unnecessarily, while applying minimal across the board risks quality issues.
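The two-stage architecture described above can be sketched as request builders. Note that the reasoning={"effort": ...} parameter shape is an assumption carried over from earlier OpenAI reasoning models; as noted above, official GPT-5 code examples are not in public documentation, so verify the exact parameter name in the SDK before relying on it.

```python
# Two-stage routing sketch: cheap triage first, escalate only when needed.
# The reasoning={"effort": ...} shape is assumed, not officially confirmed
# for GPT-5.

def triage_request(prompt: str) -> dict:
    """Stage 1: classify cheaply with the smallest model and minimal effort."""
    return {
        "model": "gpt-5.4-nano",
        "input": prompt,
        "reasoning": {"effort": "minimal"},
    }

def escalate_request(prompt: str) -> dict:
    """Stage 2: send requests flagged as complex to the full-size model."""
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "reasoning": {"effort": "high"},
    }

def build_request(prompt: str, is_complex: bool) -> dict:
    return escalate_request(prompt) if is_complex else triage_request(prompt)

print(build_request("Summarize this ticket", is_complex=False)["model"])
# The resulting dict would be splatted into client.responses.create(**kwargs).
```

Starting both stages at medium and tuning up or down against accuracy metrics, as suggested above, only changes the "effort" values in these builders.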
GPT-5 Series Full Change Timeline and Migration Points
A chronological summary of the changes since the GPT-5 series launch:
flowchart TB
A["GPT-5 release<br/>2025-08-07<br/>200K/100K tokens"] --> B["openai-python v2.25.0<br/>2026-03-05<br/>gpt-5.4 + breaking change"]
B --> C["openai-python v2.29.0<br/>2026-03-17<br/>gpt-5.4-mini/nano added"]
C --> D["GPT-5.5 announced<br/>2026-04-23<br/>1M tokens, MCP support"]
There are multiple points to watch when migrating from GPT-4o or GPT-4.1 to the GPT-5 series. However, the full list of breaking changes for the GPT-4o/4.1 to GPT-5 migration is not available in public documentation, so only the items confirmed through SDK release notes are covered here.
The confirmed key migration checklist items are as follows:
| Check Item | Details | Impact Scope |
|---|---|---|
| prompt_cache_key removal | Deleted from responses parameters in v2.25.0 | All caching logic |
| phase field change | Temporarily removed then re-added; type definition changes | TypeScript/Pydantic type validation |
| Responses API-only models | GPT-5 Pro does not support Chat Completions API | Requires API call method switch |
| Model slug changes | gpt-5 → gpt-5.4 → gpt-5.4-mini/nano | Model identifier string updates |
Official pricing figures (cost per token) for the GPT-5 series API are not currently available. In cost-sensitive startup environments, the latest rates should be verified on the OpenAI official pricing page before budgeting.
Once GPT-5.5’s MCP support stabilizes, it could significantly reshape agent architecture design. The current common pattern — managing tool connections through LangChain or custom orchestration layers — may simplify as model-level MCP support reduces the complexity of intermediate layers. Additionally, cost optimization strategies leveraging the GPT-5.4 family’s three-tier lineup, along with feature and pricing comparisons against competitors like Gemini and Claude, should be evaluated during model selection. Ultimately, understanding the GPT-5 release changes comes down to finding the right combination of model, SDK version, and API approach for a given workload.