GPT-5.5 API Migration in 7 Steps — Breaking Changes and Docker Setup

The OpenAI GPT-5 series shipped 5 minor versions in the 6 months from October 2025 through April 2026. This period introduced breaking changes for developers, including a new reasoning effort default and altered image processing behavior, so a comprehensive review of the GPT-5 API changes is in order. GPT-5.5 in particular restructured pricing to $5/1M input tokens and $30/1M output tokens, and many early GPT-5 variant models are scheduled for deprecation on July 23, 2026. For services running OpenAI API calls on Docker, it’s time to audit environment variables, model slugs, and cost control logic.

GPT-5 Series Release Timeline

The GPT-5 series expanded capabilities incrementally over approximately 6 months. The key changes for each version are summarized chronologically in the table below.

| Date    | Version   | Key Change                                       |
|---------|-----------|--------------------------------------------------|
| 2025-10 | GPT-5 Pro | Initial release                                  |
| 2025-11 | GPT-5.1   | `none` reasoning effort introduced               |
| 2025-12 | GPT-5.2   | `xhigh` reasoning, vision enhancements           |
| 2026-03 | GPT-5.4   | Tool search, computer use, 1M context            |
| 2026-04 | GPT-5.5   | `medium` reasoning default, MCP, Skills support  |

The introduction of reasoning effort none in GPT-5.1 was significant from a cost optimization perspective. The ability to completely disable reasoning prevents unnecessary reasoning token consumption for simple text transformation or classification tasks.

GPT-5.4 expanded the context window to 1M tokens, and GPT-5.5 extended it further to 1,050,000 tokens. The full release notes for each version are available in the GPT-5 series changelog. With 5 releases in 6 months, services pinned to a specific version face cumulative breaking-change risk.

GPT-5 Series Version Management Strategy
The GPT-5 series has significant behavioral differences even between minor versions. In production environments, pin a specific slug like `gpt-5.5` and verify prompt compatibility in staging before transitioning to a new version.

GPT-5.5 Core Specs and Pricing Structure

GPT-5.5 was released on April 24, 2026, and is available through both the Chat Completions and Responses API. The key specifications for GPT-5.5 are outlined in the table below.

| Item                             | Value                  |
|----------------------------------|------------------------|
| Model slug                       | `gpt-5.5`              |
| Context window                   | 1,050,000 tokens       |
| Max output tokens                | 128,000 tokens         |
| Input price                      | $5 / 1M tokens         |
| Output price                     | $30 / 1M tokens        |
| Large-input surcharge threshold  | 272K+ input tokens     |
| Large-input surcharge multiplier | Input 2x, output 1.5x  |

The notable aspect of the pricing structure is the surcharge applied when input exceeds 272K tokens. Once input crosses 272K, the input price rises to $10/1M and output price to $45/1M. Services that process large documents or maintain long conversation histories need to monitor token usage to stay below this threshold.
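The surcharge rule can be made concrete with a small estimator. This is a sketch based on the prices quoted in this article, assuming the whole request is billed at the higher rate once input crosses 272K; verify against the official pricing page before relying on it.

```python
# Per-request cost estimation under the 272K surcharge rule described
# above. Prices and the flat-surcharge assumption follow this article.

SURCHARGE_THRESHOLD = 272_000  # input tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a single gpt-5.5 request's cost in USD."""
    if input_tokens > SURCHARGE_THRESHOLD:
        input_rate, output_rate = 10.0, 45.0   # surcharge tier, per 1M tokens
    else:
        input_rate, output_rate = 5.0, 30.0    # standard tier, per 1M tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 300K-token input with 10K output lands in the surcharge tier:
# 300_000 * $10/1M + 10_000 * $45/1M = $3.00 + $0.45 = $3.45
```

Feeding this estimator with actual usage counts from API responses makes it easy to see how close a workload runs to the threshold.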

The 128,000 token max output is advantageous for long code generation or document summarization, but since output token cost is 6x the input rate, controlling output length via text.verbosity settings becomes the key to cost management.

272K Token Surcharge Threshold Warning
A 1,050,000 token context window doesn’t mean long inputs are free. In the 272K+ tier, input costs 2x and output costs 1.5x, making prompt caching and input optimization essential.

GPT-5.5 API Breaking Changes for Developers

The most critical aspects of GPT-5.5 are two breaking changes that alter existing behavior, along with a shift in how prompts are interpreted.

reasoning effort Default Value Change

GPT-5.5 changed the reasoning effort default to medium. When transitioning from previous versions where high was the default, the same prompts may produce different reasoning depths. Complex code generation or mathematical reasoning tasks may experience quality degradation.

The shift to medium as default is presumably motivated by cost optimization, but services that previously relied on the default without explicitly specifying reasoning effort must now set it explicitly. This can be addressed by setting OPENAI_REASONING_EFFORT=high as a Docker environment variable or passing the parameter directly in API calls.

image_detail Behavior Change

When image_detail is unset or set to auto, GPT-5.5 now preserves the original image without resizing up to 10,240,000 pixels or 6,000px dimensions. Services handling high-resolution images may see input token counts increase significantly beyond expectations.

Services that previously relied on automatic resizing with the auto setting need to either add image preprocessing logic or explicitly set image_detail to low to control token consumption. This change directly impacts costs for services running image analysis pipelines.
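One way to stop depending on `auto` is to decide `image_detail` from the original image dimensions before the API call. The sketch below uses the limits quoted above; the 1M-pixel budget is an assumed service-level policy, not an OpenAI value, and the function name is illustrative.

```python
# Preprocessing guard for the behavior change above: since auto no longer
# downsizes images up to 10,240,000 pixels / 6,000px, explicitly pick
# image_detail based on the original size.

PIXEL_BUDGET = 1_000_000  # assumed per-image budget before dropping to "low"

def choose_image_detail(width: int, height: int) -> str:
    """Return "low" for images whose token cost would balloon, else "high"."""
    if width * height > PIXEL_BUDGET:
        return "low"
    return "high"
```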

Prompt Interpretation Change

GPT-5.5 tends to interpret prompts more literally. Without explicit success criteria and stop rules, results may differ from expectations. The GPT-5.5 migration guide provides prompt writing guidelines. Adding explicit “task completion conditions” and “error handling instructions” to existing prompts is recommended.

reasoning effort Environment Variable Separation Strategy
In Docker environments, separating reasoning effort into environment variables and applying different values per task type is effective. Set `high` for code generation, `none` for simple classification, and `medium` for general conversation to balance cost and quality.
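One way to wire up this per-task policy is to read the default from the `OPENAI_REASONING_EFFORT` environment variable and override it per task type. The task names below are illustrative.

```python
import os

# Per-task overrides following the policy above; anything unlisted falls
# back to the environment default.
TASK_EFFORT = {
    "codegen": "high",         # complex code generation
    "classification": "none",  # simple labeling, no reasoning tokens
    "chat": "medium",          # general conversation
}

def reasoning_effort(task: str) -> str:
    """Pick a reasoning effort for a task, falling back to the env default."""
    default = os.environ.get("OPENAI_REASONING_EFFORT", "medium")
    return TASK_EFFORT.get(task, default)
```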

Deprecated Models and GPT-5 API Transition Schedule

Many early GPT-5 series variant models are scheduled for deprecation. Services using these model slugs need to establish transition plans immediately.

2026-07-23 Deprecation Targets

The following models will reach end of service on July 23, 2026:

  • gpt-5-chat-latest
  • gpt-5-codex
  • gpt-5.1-codex
  • gpt-5.1-codex-mini
  • gpt-5.2-codex

If codex-series models are in use for code generation pipelines, transitioning to gpt-5.5 is required. API calls will return errors after deprecation, so transition testing in staging environments must be completed before the deprecation date.
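During the transition window, a guard at the call-site boundary can remap the deprecated slugs listed above so a missed config entry degrades to a warning instead of a hard API error after the cutoff. This is an illustrative helper, not part of the OpenAI SDK.

```python
# Deprecated slugs from this article's 2026-07-23 list, all mapped to
# gpt-5.5 as the recommended replacement.
DEPRECATED_SLUGS = {
    "gpt-5-chat-latest": "gpt-5.5",
    "gpt-5-codex": "gpt-5.5",
    "gpt-5.1-codex": "gpt-5.5",
    "gpt-5.1-codex-mini": "gpt-5.5",
    "gpt-5.2-codex": "gpt-5.5",
}

def resolve_model(slug: str) -> str:
    """Swap a deprecated slug for its replacement, passing others through."""
    if slug in DEPRECATED_SLUGS:
        replacement = DEPRECATED_SLUGS[slug]
        print(f"warning: {slug} is deprecated, using {replacement}")
        return replacement
    return slug
```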

GPT-4o Deprecation Schedule

gpt-4o-2024-05-13 will be terminated on October 23, 2026, with gpt-4.1 recommended as the replacement. Services still using GPT-4o should evaluate transitioning to either GPT-4.1 or GPT-5.5. The complete list of deprecation dates and replacement models is available in the OpenAI model deprecation schedule.

Deprecation Schedule Summary
─────────────────────────────────────
2026-07-23  gpt-5-chat-latest + 4 others terminated
2026-10-23  gpt-4o-2024-05-13 terminated
─────────────────────────────────────
Recommended replacements: gpt-5.5 or gpt-4.1

GPT-5.5 Migration Checklist

Items to verify when transitioning to GPT-5.5 are organized in checklist format. From a DevOps perspective, both infrastructure settings and application code require inspection.

Model Slug and API Method Changes

  • Change model slug to gpt-5.5
  • Responses API recommended over Chat Completions
  • Use Structured Outputs instead of in-prompt schema definitions
  • Place static content at the beginning of prompts for prompt caching optimization
  • Setting text.verbosity to low produces concise responses
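The checklist items above can be sketched as a single request shape. Parameter names follow the Responses API (`reasoning.effort`, `text.verbosity`); treat the exact shapes as assumptions to verify against the current API reference.

```python
# Minimal gpt-5.5 request sketch reflecting the checklist above.
request_kwargs = {
    "model": "gpt-5.5",
    "reasoning": {"effort": "high"},  # explicit, since the default is now medium
    "text": {"verbosity": "low"},     # shorter outputs: $30/1M makes this matter
    "input": [
        # Static content first so OpenAI's prompt caching can reuse the prefix.
        {"role": "system", "content": "You are a code review assistant."},
        {"role": "user", "content": "Review this diff: ..."},
    ],
}
# With the openai SDK this would be passed as
# client.responses.create(**request_kwargs).
```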

Environment Variable Checklist

Checklist
──────────────────────────────────────
[ ] OPENAI_MODEL=gpt-5.5 configured
[ ] OPENAI_REASONING_EFFORT explicitly set
[ ] IMAGE_DETAIL value specified (no auto dependency)
[ ] TEXT_VERBOSITY=low setting reviewed
[ ] Token usage monitoring threshold set at 272K
[ ] Deprecated model slugs removed
[ ] Staging environment prompt compatibility tested
[ ] Cost alert thresholds recalibrated
──────────────────────────────────────

Transitioning to the Responses API is the recommended approach for accessing GPT-5.5’s new features like Structured Outputs, MCP, and Skills. The existing Chat Completions API still works, but access to new features may be limited.

Prompt caching optimization has a direct impact on cost reduction. Placing static content like system prompts or common instructions at the beginning of prompts improves caching efficiency on OpenAI’s server side.
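Wrapping the static-first ordering in a small helper keeps every call site cache-friendly. The function below is an illustrative pattern, not an SDK API; the key property is that the static prefix is byte-identical across requests.

```python
def build_prompt(static_instructions: str, dynamic_context: str, question: str) -> list:
    """Order content static-first so repeated prefixes can be cached server-side."""
    return [
        # Identical across requests: eligible for prompt caching.
        {"role": "system", "content": static_instructions},
        # Varies per request: placed last so it never breaks the cached prefix.
        {"role": "user", "content": f"{dynamic_context}\n\n{question}"},
    ]
```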

Docker Setup for GPT-5.5 API Transition

Step-by-step migration configuration for Docker-based services calling the OpenAI API. The key principle is separating the model slug and breaking change parameters covered in the GPT-5.5 API breaking changes section into environment variables.

Dockerfile Configuration

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV OPENAI_MODEL=gpt-5.5
ENV OPENAI_REASONING_EFFORT=medium
ENV OPENAI_IMAGE_DETAIL=low
ENV OPENAI_TEXT_VERBOSITY=low
ENV TOKEN_THRESHOLD=272000

HEALTHCHECK --interval=30s --timeout=5s \
  CMD python healthcheck.py || exit 1

CMD ["python", "main.py"]

Setting environment variable defaults in the Dockerfile while allowing overrides from docker-compose.yml or runtime is the key design principle. Since OPENAI_REASONING_EFFORT defaults to medium in GPT-5.5, services that previously expected high must explicitly specify it.

docker-compose.yml Configuration

services:
  api-worker:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_MODEL=gpt-5.5
      - OPENAI_REASONING_EFFORT=high
      - OPENAI_IMAGE_DETAIL=low
      - OPENAI_TEXT_VERBOSITY=low
      - TOKEN_THRESHOLD=272000
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
    restart: unless-stopped

  token-monitor:
    build: ./monitor
    environment:
      - TOKEN_THRESHOLD=272000
      - ALERT_WEBHOOK=${ALERT_WEBHOOK}
    depends_on:
      - api-worker
    restart: unless-stopped

Separating the token-monitor service into its own container allows independent scaling of token usage monitoring. The 272K token surcharge threshold is injected as an environment variable, enabling alerts when the threshold is exceeded.
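The token-monitor container's core check can be sketched as follows, assuming per-request usage counts are available (e.g. from the API response's usage field). The webhook-posting part is elided; this only builds the alert decision.

```python
TOKEN_THRESHOLD = 272_000  # mirrors the TOKEN_THRESHOLD env var above

def check_usage(input_tokens: int):
    """Return an alert payload when a request enters the surcharge tier."""
    if input_tokens <= TOKEN_THRESHOLD:
        return None
    return {
        "severity": "warning",
        "message": f"input tokens {input_tokens} exceed {TOKEN_THRESHOLD}: surcharge tier",
    }
```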

Secret Management via .env File
`OPENAI_API_KEY` should not be written directly in docker-compose.yml. Inject it through a `.env` file or Docker Secrets. The `.env` file must be included in `.gitignore`.

GPT-5.5 API Monitoring and Cost Management

Continuous monitoring remains necessary after transitioning to GPT-5.5. Pricing structure changes and breaking changes may invalidate existing cost prediction models.

Token Usage Monitoring Setup

# prometheus-alerts.yml
groups:
  - name: openai-token-alerts
    rules:
      - alert: HighTokenUsage
        expr: openai_input_tokens > 272000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Input tokens exceeded 272K - entering surcharge tier"

When requests exceed 272K tokens, input price doubles and output price increases by 1.5x, making immediate awareness through alerts critical. Combining Prometheus with Alertmanager enables a real-time cost monitoring system.

Cost Comparison: Standard vs Surcharge Tier

| Tier                  | Input Price (1M tokens) | Output Price (1M tokens) | Notes                 |
|-----------------------|-------------------------|--------------------------|-----------------------|
| Standard (below 272K) | $5                      | $30                      | Base rate             |
| Surcharge (272K+)     | $10                     | $45                      | Input 2x, output 1.5x |

With 1,000 daily requests averaging 300K input tokens, daily costs approximately double compared to the standard tier. Strategies to keep input tokens below 272K are the core of cost optimization.
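The doubling claim works out as follows, comparing input spend alone at the two rates from this article's pricing (output held constant):

```python
# 1,000 daily requests averaging 300K input tokens, which is above the
# 272K threshold, so every request bills at the surcharge input rate.
requests_per_day = 1_000
input_tokens = 300_000  # average per request

standard_input_cost = requests_per_day * input_tokens * 5 / 1_000_000    # $1,500/day
surcharge_input_cost = requests_per_day * input_tokens * 10 / 1_000_000  # $3,000/day
# Input spend alone doubles; keeping requests under 272K avoids the jump.
```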

Input Token Reduction Strategies

Prompt caching optimization — placing static content at the beginning of prompts — increases OpenAI’s server-side cache hit rate, effectively reducing costs. For services with long conversation histories, summarizing older messages or managing context with a sliding window approach keeps usage below the 272K threshold.
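A sliding-window trim along these lines might look like the sketch below: drop the oldest non-system messages until an estimated token count fits under the 272K threshold. The 4-characters-per-token estimate is a rough heuristic, not the model's real tokenizer.

```python
THRESHOLD = 272_000  # surcharge threshold from this article

def estimate_tokens(messages: list) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return sum(len(m["content"]) // 4 for m in messages)

def trim_history(messages: list, budget: int = THRESHOLD) -> list:
    """Keep the system prompt, drop the oldest turns until under budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and estimate_tokens(system + turns) > budget:
        turns.pop(0)  # discard the oldest turn first
    return system + turns
```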

Setting text.verbosity to low also directly contributes to output token reduction. Since output pricing is $30/1M tokens (6x the input rate), reducing output length delivers the greatest cost savings.

GPT-5.5 Transition Considerations and Production Stabilization

Items Not Confirmed in Official Documentation

The following items are not documented in official sources as of May 2026:

  • No official Korean-language documentation for GPT-5.5 is available
  • No prompt compatibility benchmark comparing GPT-4o to GPT-5 series transitions has been published
  • Whether GPT-5.5 function calling response format has changed remains unconfirmed
  • GPT-5.5 fine-tuning support and pricing are also unconfirmed

Practical Transition Considerations

A/B testing existing prompts on GPT-5.5 in a staging environment to compare response quality and token usage should precede production deployment. Since GPT-5.5 interprets prompts more literally, behaviors previously assumed implicitly may differ. Vague instructions like “use your judgment” or “handle it appropriately” may produce unexpected results with GPT-5.5.

# docker-compose.staging.yml
services:
  api-worker-staging:
    build: .
    environment:
      - OPENAI_MODEL=gpt-5.5
      - OPENAI_REASONING_EFFORT=high
      - STAGE=staging
    ports:
      - "8081:8080"

Running a staging service as a separate container and mirroring identical production requests allows direct comparison of response differences. Changing only the model slug while keeping all other settings identical isolates and measures the impact of breaking changes.

Hardcoded Deprecated Model Slug Audit
A full codebase search is needed to verify that deprecated slugs like `gpt-5-codex` and `gpt-5.1-codex` aren’t hardcoded in source code or configuration files. Without an environment variable injection structure during Docker image builds, outages may occur on the deprecation date.
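A quick audit along these lines can be scripted; the sketch below walks a directory tree and flags files containing any of the deprecated slugs from this article's list. The file-extension filter is an assumption about where slugs typically hide.

```python
import pathlib

# Slugs from the 2026-07-23 deprecation list above. Note that substring
# matching will flag "gpt-5.1-codex" in files that contain
# "gpt-5.1-codex-mini" as well, which is acceptable for an audit.
DEPRECATED = ["gpt-5-chat-latest", "gpt-5-codex", "gpt-5.1-codex",
              "gpt-5.1-codex-mini", "gpt-5.2-codex"]

def find_deprecated_slugs(root: str) -> list:
    """Return (file, slug) pairs for every deprecated slug found under root."""
    hits = []
    for path in pathlib.Path(root).rglob("*"):
        if path.suffix not in {".py", ".yml", ".yaml", ".env", ".json"}:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        for slug in DEPRECATED:
            if slug in text:
                hits.append((str(path), slug))
    return hits
```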

Post-Transition Stabilization Checks

Intensive monitoring of response quality, token usage, and error rates for a minimum of 2 weeks after model transition is recommended. The actual service impact of reasoning effort default changes and image processing behavior changes varies depending on traffic patterns. Continuously checking the latest parameter information in the GPT-5.5 model specifications is also important.

Once GPT-5.5’s MCP and Skills support stabilizes, external tool integration architecture becomes a redesign candidate. Transitioning external API integrations previously implemented via function calling to MCP-based approaches enables more flexible tool chains. Analyzing quality-cost tradeoffs across GPT-5 reasoning effort settings, OpenAI Responses API usage patterns, and establishing migration timelines aligned with GPT-4o end-of-support dates are also worth exploring as follow-up topics.

GPT-5.5 API migration is not a simple model slug swap — it’s a comprehensive migration encompassing reasoning effort, image processing, and prompt interpretation changes. Separating Docker environment variables, building token monitoring systems, and running staging A/B tests are the keys to reflecting GPT-5 API developer changes while preventing production incidents.
