OpenAI Codex Agent Loop Structure Explained — 5-Step Plan-Execute-Verify

An AI coding agent isn’t a one-shot call that sends a prompt and receives a response. The core execution mechanism is an agent loop — a repeating cycle where the model makes decisions, invokes tools, and verifies results on its own. This analysis examines the while loop-based iterative execution mechanism, context management strategy, and error handling logic based on the Codex CLI agent-loop.ts source code, along with the background behind the TypeScript to Rust migration.

How the OpenAI Codex Agent Loop Structure Changed After the Rust Migration

The first thing to address: Codex CLI has been rewritten from its original TypeScript (ink-based) implementation to Rust. The codebase is now 96.2% Rust, and the primary motivation behind this transition was optimizing the efficiency of the agent loop — the harness that repeatedly calls the model. Eliminating GC (Garbage Collection) overhead reduced memory consumption and improved execution speed.

The code snippets analyzed here are based on the TypeScript version from the Codex CLI agent-loop.ts source code. The Rust version maintains the same while loop → API call → tool execution → result feedback pattern, but the runtime characteristics differ.

TypeScript→Rust Migration Notice
If you maintain a TypeScript-based fork or extension, note that the Rust migration completely changed the build chain and dependency structure. The ink-based UI layer has also been replaced, so plugin compatibility must be verified.
The Rust migration also strengthened the sandbox security model. macOS uses `sandbox-exec` with Seatbelt policies, while Linux uses Landlock/seccomp to isolate tool execution environments. Given that the agent loop repeatedly executes shell commands, sandboxing isn’t optional — it’s essential.

| Aspect | TypeScript (ink) Era | After Rust Migration |
| --- | --- | --- |
| Runtime | Node.js + GC | Native binary, no GC |
| Memory | GC overhead present | Direct memory management |
| Sandbox | Limited | macOS Seatbelt / Linux Landlock + seccomp |
| Installation | npm | npm, Homebrew |
| Extension Protocol | — | Newline-delimited JSON wire protocol, MCP servers |

While Loop-Based Iterative Execution — The Backbone of the Agent Loop

[Figure: the while-loop plan-execute-verify cycle]

The core of the OpenAI Codex agent loop structure is the while (turnInput.length > 0) loop. This loop continues iterating as long as there’s input to process, executing three steps per turn:

  1. Call the OpenAI Responses API (oai.responses.create) with stream: true
  2. Execute tool calls (shell functions) included in the response via handleFunctionCall()
  3. Return execution results as function_call_output and insert them into the next turn’s input (turnInput)

The loop terminates when there’s no more input to process.

```typescript
while (turnInput.length > 0) {
  // Send a streaming request to the OpenAI Responses API
  const responseCall = this.oai.responses.create({
    model,
    instructions,
    input: turnInput,
    stream: true,
    parallel_tool_calls: false, // execute tool calls one at a time
    tools, // the shell function definition
  });

  // Handle streamed output: run each tool call via handleFunctionCall()
  // and queue its function_call_output for the next turn (simplified here)
  const nextInput: Array<ResponseInputItem> = [];
  // ... stream handling elided ...
  turnInput = nextInput; // empty when no tool calls were made, so the loop exits
}
```

The `parallel_tool_calls: false` setting is worth noting. It means tool calls are executed sequentially rather than in parallel, a design choice that preserves ordering dependencies between shell commands (e.g., creating a file and then modifying it). If parallel execution is needed, this flag is the first customization point.

Mapping to the Plan-Execute-Verify Cycle

Viewing this while loop through the plan-execute-verify lens makes the structure clear.

  • Plan: The model determines the next tool call to execute via the Responses API response. The instructions and previous turns’ function_call_output serve as the basis for this decision.
  • Execute: handleFunctionCall() runs the shell command. Execution happens within the sandbox in an isolated environment.
  • Verify: The execution result feeds back into turnInput, allowing the model to assess success or failure. If additional work is needed, the next turn begins. If everything is complete, turnInput empties and the loop terminates.
Responses API vs Chat Completions API
Codex CLI uses the Responses API, not the Chat Completions API. The Responses API lets tool call results be returned directly as `function_call_output` items, which keeps the agent loop implementation concise. Server-side conversation management via `previous_response_id` is also a Responses API-specific feature.
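
To make the verify step concrete, here is a minimal sketch of how a tool result is fed back as next-turn input. The item shape follows the Responses API's `function_call_output` format; the wiring around it is simplified relative to agent-loop.ts.

```typescript
// Sketch: run the tool call in the sandbox, then queue its result
// as input for the next turn (simplified from agent-loop.ts).
const result = await handleFunctionCall(functionCall);

turnInput.push({
  type: "function_call_output",
  call_id: functionCall.call_id, // ties the output to the originating call
  output: result,                // always a string, even for structured data
});
```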

Context Management — Server-Side Storage vs Client-Side Transmission

As the agent loop progresses through multiple turns, conversation history accumulates. Codex CLI’s context management branches into two modes.

Server-Side Storage Mode (disableResponseStorage: false)

This mode sends only previous_response_id and the new input delta. Previous conversation content is stored on OpenAI’s servers, keeping each request payload small. Network costs are lower and token calculations are handled server-side, reducing the client burden.
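
A minimal sketch of a delta request in this mode (`previous_response_id` is a real Responses API parameter; the bookkeeping variables around it are simplified):

```typescript
// Server-side storage: send only this turn's new items plus a pointer
// to the stored conversation; the server reconstructs the rest.
const response = await oai.responses.create({
  model,
  previous_response_id: lastResponseId, // chains onto server-held history
  input: newItemsOnly,                  // just the delta for this turn
});
lastResponseId = response.id;           // remember for the next turn
```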

Client-Side Transmission Mode (disableResponseStorage: true)

This mode includes the entire conversation history (transcript: Array<ResponseInputItem>) excluding system messages in every request. Since no state is stored on the server, it’s advantageous from a privacy standpoint, but the request size grows linearly as turns accumulate.
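
The client-side counterpart, sketched under the same assumptions: the full transcript travels with every request, and `store: false` asks the server not to persist anything.

```typescript
// Client-side transmission: no server state, full history every turn.
transcript.push(...thisTurnItems);   // transcript excludes system messages
const response = await oai.responses.create({
  model,
  input: transcript, // entire conversation resent each request
  store: false,      // do not persist the response server-side
});
```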

| Mode | Request Size | Server State | Privacy | Suitable Environment |
| --- | --- | --- | --- | --- |
| Server-side storage | Delta only → small | Maintained | History stored on server | General development |
| Client-side transmission | Full history → proportional to turns | None | History exists only locally | Sensitive data, air-gapped environments |

In enterprise environments where security policies prohibit storing code on external servers, client-side mode is effectively the only option. Conversely, for environments with frequent long sessions, server-side mode is more efficient in terms of token usage.

Context Mode Selection Criteria
A single `disableResponseStorage` setting toggles the mode. Check team security policies first, measure average session length (turn count), then decide. For sessions that frequently exceed 20 turns, the token savings from server-side mode become noticeably significant.

Regarding context compaction, the existence of a /responses/compact endpoint has been mentioned, but detailed specifications aren’t available in the official documentation. This area requires monitoring for future OpenAI documentation updates.

Error Handling and Retry Mechanisms

Since the agent loop makes repeated API calls across multiple turns, it must handle various failure scenarios including network errors, server errors, and rate limits. Codex CLI’s error handling strategy operates on three tiers.

Retryable Transient Errors

Timeouts, 5xx server errors, and connection errors are retried with exponential backoff. The maximum retry count is fixed at MAX_RETRIES = 5.

```typescript
// Base wait before retrying a 429; overridable via environment variable
const RATE_LIMIT_RETRY_WAIT_MS = parseInt(
  process.env["OPENAI_RATE_LIMIT_RETRY_WAIT_MS"] || "2500", 10
);
// Upper bound on retry attempts for transient errors
const MAX_RETRIES = 5;
```

429 Rate Limit Handling

When a rate limit response is received, the wait time is RATE_LIMIT_RETRY_WAIT_MS (default 2500ms) × 2^(attempt-1). If the server suggests a wait time via the Retry-After header, that value takes precedence. The base wait time can be adjusted via the OPENAI_RATE_LIMIT_RETRY_WAIT_MS environment variable, allowing tuning based on API plan.
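
Putting the pieces together, a hedged sketch of the retry-delay computation described above (not the exact source; the `Retry-After` parsing in particular is simplified):

```typescript
// Wait time for attempt n (1-based): Retry-After wins if present,
// otherwise exponential backoff off the configurable base.
function retryDelayMs(attempt: number, retryAfterSec?: number): number {
  if (retryAfterSec !== undefined) {
    return retryAfterSec * 1000; // server-suggested wait takes precedence
  }
  // 2500ms → 5s → 10s → 20s → 40s across MAX_RETRIES = 5 attempts
  return RATE_LIMIT_RETRY_WAIT_MS * 2 ** (attempt - 1);
}
```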

Token Overflow — Unrecoverable Error

When a request exceeds the model’s context window, retrying is pointless. In this case, an error is displayed to the user and the request is immediately aborted. The entire agent loop terminates.

Rate Limit Wait Time Customization
Setting `OPENAI_RATE_LIMIT_RETRY_WAIT_MS` too low can trigger consecutive 429 responses, paradoxically increasing total wait time. The default 2500ms is a safe starting point for most API plans. With exponential backoff applied, the maximum wait across 5 retries reaches approximately 40 seconds.

The official documentation doesn’t specify a max iteration limit for the agent loop. Since the source code requires turnInput to be empty for the loop to terminate, there’s a theoretical possibility of infinite loops if the model keeps generating tool calls. In production environments, applying external timeouts or turn count limits is the safer approach.
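
One way to add such a guard externally, as a hypothetical wrapper with `MAX_TURNS` chosen arbitrarily:

```typescript
// Hypothetical safety cap: abort if the model keeps generating tool calls.
const MAX_TURNS = 25; // assumption: tune to your workload
let turn = 0;
while (turnInput.length > 0) {
  if (++turn > MAX_TURNS) {
    throw new Error(`Agent loop exceeded ${MAX_TURNS} turns; aborting`);
  }
  // ... normal plan-execute-verify turn ...
}
```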

Tool Execution and Sandbox Isolation

Actions determined by the model in the agent loop are executed as actual shell commands through handleFunctionCall(). The tool supported by Codex CLI is the shell function, which performs various tasks including file creation, modification, builds, and test execution.
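
For reference, a shell tool declaration in Responses API function format might look like the following. The parameter names (`command`, `workdir`, `timeout`) are illustrative; check the source for the exact schema.

```typescript
// Sketch of a shell tool definition passed via the `tools` parameter.
const tools = [
  {
    type: "function" as const,
    name: "shell",
    description: "Run a shell command and return its output",
    parameters: {
      type: "object",
      properties: {
        command: { type: "array", items: { type: "string" } }, // argv-style
        workdir: { type: "string" },  // illustrative field name
        timeout: { type: "number" },  // illustrative field name
      },
      required: ["command"],
    },
  },
];
```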

Sandbox Security Model

Tool execution is always isolated within a sandbox. The OS-specific implementations are:

  • macOS: sandbox-exec with Seatbelt policy files restricts filesystem and network access
  • Linux: Landlock LSM (Linux Security Module) restricts filesystem access, and seccomp filters system calls

This sandbox layer serves as the agent loop’s safety net. Even if the model generates unexpected commands, they’re blocked by sandbox policies.

Extension Protocol

Codex CLI supports a newline-delimited JSON wire protocol and MCP (Model Context Protocol) servers. To add custom tools beyond the default shell tool, implement an MCP server and connect it to the agent loop. Installation instructions and protocol details are available in the Codex CLI repository.

Communication between MCP servers and the agent loop uses stdin/stdout-based newline-delimited JSON streams. When the agent decides to invoke a tool, it sends a JSON-formatted request to the MCP server process, and the server returns the execution result in the same format. This simple protocol allows MCP servers to be written in any language — Python, Go, Ruby, and beyond.
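
As an illustration of that transport (a hedged sketch only; the real MCP protocol layers JSON-RPC on top of this stream, and the field names below are hypothetical):

```typescript
import * as readline from "node:readline";

// Hypothetical tool server: one JSON request per stdin line,
// one JSON response per stdout line.
const rl = readline.createInterface({ input: process.stdin });

rl.on("line", (line) => {
  const request = JSON.parse(line); // assumed shape: { id, tool, args }
  const output = `echo: ${JSON.stringify(request.args)}`; // stand-in for real work
  process.stdout.write(JSON.stringify({ id: request.id, output }) + "\n");
});
```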

The output field of tool execution results is a string type. Even when returning binary data or structured JSON, it must be serialized as a string. The agent loop inserts this string directly as the next turn’s function_call_output for the model to use as decision input. Long tool call results can rapidly consume the context window, so MCP server implementations should limit output size by design.
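
A simple way to honor that constraint is to serialize and cap the output before returning it. A hypothetical helper, with the limit chosen arbitrarily:

```typescript
// Hypothetical guard: tool output must be a string and should stay small.
const MAX_OUTPUT_CHARS = 10_000; // assumption: tune to your context budget

function toToolOutput(value: unknown): string {
  const s = typeof value === "string" ? value : JSON.stringify(value);
  return s.length <= MAX_OUTPUT_CHARS
    ? s
    : s.slice(0, MAX_OUTPUT_CHARS) + `\n…[truncated ${s.length - MAX_OUTPUT_CHARS} chars]`;
}
```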

project/
├── codex-cli/
│   └── src/
│       └── utils/
│           └── agent/
│               └── agent-loop.ts   ← Agent loop core logic
├── codex-rs/                       ← Rust rewrite
│   ├── core/                       ← Loop + context management
│   └── sandbox/                    ← OS-specific sandbox implementation
└── docs/

OpenAI Codex Agent Loop Structure Customization Points

Based on source code analysis, here are the actionable customization points.

1. parallel_tool_calls Flag

The default false executes tool calls sequentially. For independent file read operations with no ordering dependency, switching to true can improve per-turn processing speed. However, when dependencies exist — such as file write followed by read — race conditions can occur.

2. Rate Limit Wait Time

The OPENAI_RATE_LIMIT_RETRY_WAIT_MS environment variable adjusts the base wait time. For teams with heavy API usage, increasing this value can be an effective strategy for reducing 429 response frequency.

3. Context Mode Switching

The disableResponseStorage setting toggles between server-side and client-side context management. Choose based on security requirements and session length.

4. Sandbox Policy Adjustment

Modifying the Seatbelt policy files on macOS or Landlock rules on Linux adjusts the access scope for tool execution. Fine-grained controls are possible — allowing writes only to specific directories or completely blocking network access. However, loosening the sandbox increases security risk, so maintaining the principle of least privilege is advisable.

5. Tool Extension via MCP Servers

Beyond the default shell function, custom tools like database queries, HTTP requests, and file conversions can be implemented as MCP servers and connected to the agent loop. The design rationale behind the extension architecture is discussed in the Codex project’s Rust migration discussion.

| Customization Point | How to Change | Impact Scope | Caveats |
| --- | --- | --- | --- |
| parallel_tool_calls | API call parameter | Per-turn processing speed | Race conditions with dependent commands |
| Rate limit wait | Environment variable | Retry interval | Too low causes consecutive 429s |
| Context mode | disableResponseStorage | Request size, privacy | Client mode increases token cost for long sessions |
| Sandbox policy | OS-specific policy files | Tool execution permissions | Security risk if least privilege is violated |
| MCP tool extension | MCP server implementation | Tool variety | Protocol compatibility verification required |

Trade-offs to Consider in Production

Key factors to weigh when applying the agent loop structure to real projects.

Cost vs Accuracy

API call costs increase linearly as turn count grows. If the model generates correct code on the first attempt, a single turn suffices. But encountering build errors and repeating fix-retry cycles can stretch to 5–10 turns. In server-side context mode, input token costs are suppressed since only deltas are transmitted, but output tokens are fully billed per turn.

Speed vs Safety

Switching to parallel_tool_calls: true speeds up independent tasks through parallel execution. However, it’s difficult to fully identify implicit dependencies between agent-generated commands, so keeping the default false is the more reasonable choice when prioritizing safety.

Privacy vs Efficiency

Client-side context mode leaves no state on the server, making it favorable for security audits. But for sessions exceeding 30 turns, transmitting the full history with every request causes network bandwidth and token costs to spike. Establishing an upper bound on session length and switching to a new session when exceeded can be an effective operational pattern.

platform.openai.com Official Guide Inaccessible
As of 2026-04-23, the Codex guide documentation on platform.openai.com returns a 403 error. Due to the absence of official API-level documentation, the analysis in this article is based on GitHub source code and public discussions.

Summary and Next Steps

The OpenAI Codex agent loop structure is built on while (turnInput.length > 0) iterative execution, Responses API streaming calls, sequential tool execution, and exponential backoff error handling. Context management supports both server-side and client-side modes, and sandbox isolation ensures tool execution safety.

The TypeScript to Rust migration fundamentally improved the agent loop’s runtime efficiency, and MCP protocol support opens a path for tool extensibility.

The next area to watch is the Codex context compaction strategy. Context window management during long sessions directly impacts agent performance, and the official specification for this remains unreleased. Building a custom AI coding agent referencing the Codex CLI architecture is also worth considering. Implementing an agent loop directly on the OpenAI Responses API provides a deeper understanding of Codex CLI’s design decisions and enables designing an agent loop structure tailored to specific project requirements.
