Table of Contents
- How the OpenAI Codex Agent Loop Structure Changed After the Rust Migration
- While Loop-Based Iterative Execution — The Backbone of the Agent Loop
- Context Management — Server-Side Storage vs Client-Side Transmission
- Error Handling and Retry Mechanisms
- Tool Execution and Sandbox Isolation
- OpenAI Codex Agent Loop Structure Customization Points
- Trade-offs to Consider in Production
- Summary and Next Steps
An AI coding agent isn’t a one-shot call that sends a prompt and receives a response. The core execution mechanism is an agent loop — a repeating cycle where the model makes decisions, invokes tools, and verifies results on its own. This analysis examines the while loop-based iterative execution mechanism, context management strategy, and error handling logic based on the Codex CLI agent-loop.ts source code, along with the background behind the TypeScript to Rust migration.
How the OpenAI Codex Agent Loop Structure Changed After the Rust Migration
The first thing to address: Codex CLI has been rewritten from its original TypeScript (ink-based) implementation to Rust. The codebase is now 96.2% Rust, and the primary motivation behind this transition was optimizing the efficiency of the agent loop — the harness that repeatedly calls the model. Eliminating GC (Garbage Collection) overhead reduced memory consumption and improved execution speed.
The code snippets analyzed here are based on the TypeScript version from the Codex CLI agent-loop.ts source code. The Rust version maintains the same while loop → API call → tool execution → result feedback pattern, but the runtime characteristics differ.
If you maintain an existing TypeScript-based fork or extension, be aware that the Rust migration has completely changed the build chain and dependency structure. The ink-based UI layer has also been replaced, so plugin compatibility must be verified.
| Aspect | TypeScript (ink) Era | After Rust Migration |
|---|---|---|
| Runtime | Node.js + GC | Native binary, no GC |
| Memory | GC overhead present | Direct memory management |
| Sandbox | Limited | macOS Seatbelt / Linux Landlock+seccomp |
| Installation | npm | npm, Homebrew |
| Extension Protocol | — | Newline-delimited JSON wire protocol, MCP servers |
While Loop-Based Iterative Execution — The Backbone of the Agent Loop

The core of the OpenAI Codex agent loop structure is the `while (turnInput.length > 0)` loop. This loop continues iterating as long as there's input to process, executing three steps per turn:
- Call the OpenAI Responses API (`oai.responses.create`) with `stream: true`
- Execute tool calls (shell functions) included in the response via `handleFunctionCall()`
- Return execution results as `function_call_output` and insert them into the next turn's input (`turnInput`)
The loop terminates when there’s no more input to process.
```typescript
while (turnInput.length > 0) {
  // Send request to OpenAI Responses API
  const responseCall = this.oai.responses.create({
    model, instructions, input,
    stream: true,
    parallel_tool_calls: false,
    tools // shell function
  });
  // Handle response and tool calls
  turnInput = []; // Reset for next iteration
}
```
The parallel_tool_calls: false setting is worth noting. It means tool calls are executed sequentially rather than in parallel — a design choice that safely guarantees dependencies between shell commands (e.g., creating a file then modifying it). If parallel execution is needed, this flag becomes the first customization point.
Mapping to the Plan-Execute-Verify Cycle
Viewing this while loop through the plan-execute-verify lens makes the structure clear.
- Plan: The model determines the next tool call to execute via the Responses API response. The `instructions` and previous turns' `function_call_output` serve as the basis for this decision.
- Execute: `handleFunctionCall()` runs the shell command. Execution happens in an isolated environment within the sandbox.
- Verify: The execution result feeds back into `turnInput`, allowing the model to assess success or failure. If additional work is needed, the next turn begins; once everything is complete, `turnInput` empties and the loop terminates.
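The plan-execute-verify cycle above can be sketched as a runnable toy loop. Note this is a simplified illustration: `planNextCalls` and `runShell` are hypothetical stand-ins for the Responses API call and `handleFunctionCall()` in agent-loop.ts, with the model and shell mocked out.

```typescript
type FunctionCall = { name: string; args: string };
type FunctionCallOutput = { type: "function_call_output"; output: string };

// Plan: a mock "model" that requests one shell call on turn 0, then finishes.
function planNextCalls(input: FunctionCallOutput[], turn: number): FunctionCall[] {
  return turn === 0 ? [{ name: "shell", args: "echo hello" }] : [];
}

// Execute: a mock shell runner standing in for handleFunctionCall().
function runShell(call: FunctionCall): FunctionCallOutput {
  return { type: "function_call_output", output: `ran: ${call.args}` };
}

function agentLoop(): string[] {
  const log: string[] = [];
  let turnInput: FunctionCallOutput[] = [
    { type: "function_call_output", output: "user request" },
  ];
  let turn = 0;
  while (turnInput.length > 0) {
    const calls = planNextCalls(turnInput, turn); // Plan
    const results = calls.map(runShell);          // Execute (sequential)
    results.forEach((r) => log.push(r.output));
    turnInput = results;                          // Verify: feed results back as next input
    turn++;
  }
  return log;
}

console.log(agentLoop()); // the single tool result from turn 0
```

When `planNextCalls` returns an empty array, `turnInput` empties and the loop exits — the same termination condition as the real loop.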
Codex CLI uses the Responses API, not the Chat Completions API. The Responses API allows tool call results to be returned directly as `function_call_output` type, making agent loop implementation more concise. Server-side conversation management via `previous_response_id` is also a Responses API-specific feature.
Context Management — Server-Side Storage vs Client-Side Transmission
As the agent loop progresses through multiple turns, conversation history accumulates. Codex CLI’s context management branches into two modes.
Server-Side Storage Mode (disableResponseStorage: false)
This mode sends only previous_response_id and the new input delta. Previous conversation content is stored on OpenAI’s servers, keeping each request payload small. Network costs are lower and token calculations are handled server-side, reducing the client burden.
Client-Side Transmission Mode (disableResponseStorage: true)
This mode includes the entire conversation history (transcript: Array<ResponseInputItem>) excluding system messages in every request. Since no state is stored on the server, it’s advantageous from a privacy standpoint, but the request size grows linearly as turns accumulate.
| Mode | Request Size | Server State | Privacy | Suitable Environment |
|---|---|---|---|---|
| Server-side storage | Delta only → small | Maintained | History stored on server | General development |
| Client-side transmission | Full history → proportional to turns | None | History exists only locally | Sensitive data, air-gapped environments |
In enterprise environments where security policies prohibit storing code on external servers, client-side mode is effectively the only option. Conversely, for environments with frequent long sessions, server-side mode is more efficient in terms of token usage.
A single `disableResponseStorage` setting toggles the mode. Check team security policies first, measure average session length (turn count), then decide. For sessions that frequently exceed 20 turns, the token savings from server-side mode become noticeably significant.
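The difference between the two modes can be sketched as a payload-building function. The names here (`buildRequest`, the `Item` shape) are illustrative, not the actual agent-loop.ts API — only the `disableResponseStorage` / `previous_response_id` semantics come from the analysis above.

```typescript
type Item = { role: string; content: string };

interface RequestShape {
  previous_response_id?: string;
  input: Item[];
}

function buildRequest(
  disableResponseStorage: boolean,
  transcript: Item[],     // full local history (used in client-side mode)
  delta: Item[],          // only the new items for this turn
  lastResponseId: string
): RequestShape {
  if (disableResponseStorage) {
    // Client-side: resend the whole transcript plus new items every turn.
    return { input: [...transcript, ...delta] };
  }
  // Server-side: send only the delta, chained via previous_response_id.
  return { previous_response_id: lastResponseId, input: delta };
}
```

In server-side mode the request size stays constant per turn, while in client-side mode `transcript` grows with every turn — exactly the linear growth described above.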
Regarding context compaction, the existence of a /responses/compact endpoint has been mentioned, but detailed specifications aren’t available in the official documentation. This area requires monitoring for future OpenAI documentation updates.
Error Handling and Retry Mechanisms
Since the agent loop makes repeated API calls across multiple turns, it must handle various failure scenarios including network errors, server errors, and rate limits. Codex CLI’s error handling strategy operates on three tiers.
Retryable Transient Errors
Timeouts, 5xx server errors, and connection errors are retried with exponential backoff. The maximum retry count is fixed at MAX_RETRIES = 5.
```typescript
const RATE_LIMIT_RETRY_WAIT_MS = parseInt(
  process.env["OPENAI_RATE_LIMIT_RETRY_WAIT_MS"] || "2500", 10
);
const MAX_RETRIES = 5;
```
429 Rate Limit Handling
When a rate limit response is received, the wait time is RATE_LIMIT_RETRY_WAIT_MS (default 2500ms) × 2^(attempt-1). If the server suggests a wait time via the Retry-After header, that value takes precedence. The base wait time can be adjusted via the OPENAI_RATE_LIMIT_RETRY_WAIT_MS environment variable, allowing tuning based on API plan.
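The wait-time computation described above can be sketched as follows. The function name and the Retry-After plumbing are illustrative, not the exact agent-loop.ts implementation; only the base × 2^(attempt−1) formula and header precedence come from the analysis.

```typescript
function backoffMs(
  attempt: number,        // 1-based retry attempt
  retryAfterSec?: number, // parsed Retry-After header value, if the server sent one
  baseMs = 2500           // OPENAI_RATE_LIMIT_RETRY_WAIT_MS default
): number {
  if (retryAfterSec !== undefined) {
    return retryAfterSec * 1000; // server suggestion takes precedence
  }
  return baseMs * 2 ** (attempt - 1); // 2500, 5000, 10000, 20000, 40000 ...
}
```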
Token Overflow — Unrecoverable Error
When a request exceeds the model’s context window, retrying is pointless. In this case, an error is displayed to the user and the request is immediately aborted. The entire agent loop terminates.
Setting `OPENAI_RATE_LIMIT_RETRY_WAIT_MS` too low can trigger consecutive 429 responses, paradoxically increasing total wait time. The default 2500ms is a safe starting point for most API plans. With exponential backoff applied, the wait before the fifth retry alone reaches 40 seconds (2500ms × 2⁴), or roughly 77.5 seconds cumulatively across all 5 retries.
The official documentation doesn’t specify a max iteration limit for the agent loop. Since the source code requires turnInput to be empty for the loop to terminate, there’s a theoretical possibility of infinite loops if the model keeps generating tool calls. In production environments, applying external timeouts or turn count limits is the safer approach.
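One way to apply such a safeguard is to wrap the loop with turn-count and wall-clock bounds. This is a sketch of the external limit the paragraph recommends, not a Codex CLI feature; `runTurn` is a hypothetical stand-in for one API-call-plus-tool-execution turn, and the limit values are arbitrary.

```typescript
async function boundedAgentLoop(
  runTurn: (input: string[]) => Promise<string[]>,
  initialInput: string[],
  maxTurns = 25,
  maxMillis = 10 * 60 * 1000 // 10-minute wall-clock budget
): Promise<string> {
  const start = Date.now();
  let turnInput = initialInput;
  let turns = 0;
  while (turnInput.length > 0) {
    if (turns >= maxTurns) return "aborted: turn limit";
    if (Date.now() - start > maxMillis) return "aborted: timeout";
    turnInput = await runTurn(turnInput); // one full turn of the inner loop
    turns++;
  }
  return "done";
}
```

If the model keeps generating tool calls, `turnInput` never empties, but the wrapper still terminates at `maxTurns`.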
Tool Execution and Sandbox Isolation
Actions determined by the model in the agent loop are executed as actual shell commands through handleFunctionCall(). The tool supported by Codex CLI is the shell function, which performs various tasks including file creation, modification, builds, and test execution.
Sandbox Security Model
Tool execution is always isolated within a sandbox. The OS-specific implementations are:
- macOS: `sandbox-exec` with Seatbelt policy files restricts filesystem and network access
- Linux: Landlock LSM (Linux Security Module) restricts filesystem access, and seccomp filters system calls
This sandbox layer serves as the agent loop’s safety net. Even if the model generates unexpected commands, they’re blocked by sandbox policies.
Extension Protocol
Codex CLI supports newline-delimited JSON wire protocol and MCP (Model Context Protocol) servers. To add custom tools beyond the default shell tool, an MCP server can be implemented and connected to the agent loop. Installation instructions and protocol details are available in the Codex CLI repository.
Communication between MCP servers and the agent loop uses stdin/stdout-based newline-delimited JSON streams. When the agent decides to invoke a tool, it sends a JSON-formatted request to the MCP server process, and the server returns the execution result in the same format. This simple protocol allows MCP servers to be written in any language — Python, Go, Ruby, and beyond.
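The framing described above — one JSON object per line — can be sketched with a few helpers. The message shape here (`id`, `tool`, `args`, `output` fields) is illustrative, not the actual wire schema; a real server would wire these to `process.stdin`/`process.stdout`.

```typescript
type ToolRequest = { id: number; tool: string; args: Record<string, string> };
type ToolResult = { id: number; output: string };

// Serialize one message as a single newline-terminated line.
function encodeFrame(msg: ToolRequest | ToolResult): string {
  return JSON.stringify(msg) + "\n";
}

// Split a buffered stream chunk back into individual messages.
function decodeFrames(chunk: string): unknown[] {
  return chunk
    .split("\n")
    .filter((l) => l.trim().length > 0)
    .map((l) => JSON.parse(l));
}

// A toy tool handler: echoes its arguments back as the string output field.
function handleRequest(req: ToolRequest): ToolResult {
  return { id: req.id, output: JSON.stringify(req.args) };
}
```

Because each message is a single line of JSON, a server in any language only needs a line-buffered stdin reader and a JSON parser — which is what makes the protocol so easy to implement.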
The output field of tool execution results is a string type. Even when returning binary data or structured JSON, it must be serialized as a string. The agent loop inserts this string directly as the next turn’s function_call_output for the model to use as decision input. Long tool call results can rapidly consume the context window, so MCP server implementations should limit output size by design.
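A minimal sketch of the output-size cap recommended above — the character limit and the truncation marker are arbitrary choices, not part of any MCP specification:

```typescript
// Cap tool output before it is fed back into the context window.
function truncateOutput(output: string, maxChars = 10_000): string {
  if (output.length <= maxChars) return output;
  const head = output.slice(0, maxChars);
  return `${head}\n[output truncated: ${output.length - maxChars} characters omitted]`;
}
```

Keeping the marker in the returned string lets the model see that data was dropped and, if needed, request a narrower follow-up command.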
```
project/
├── codex-cli/
│   └── src/
│       └── utils/
│           └── agent/
│               └── agent-loop.ts   ← Agent loop core logic
├── codex-rs/                       ← Rust rewrite
│   ├── core/                       ← Loop + context management
│   └── sandbox/                    ← OS-specific sandbox implementation
└── docs/
```
OpenAI Codex Agent Loop Structure Customization Points
Based on source code analysis, here are the actionable customization points.
1. parallel_tool_calls Flag
The default false executes tool calls sequentially. For independent file read operations with no ordering dependency, switching to true can improve per-turn processing speed. However, when dependencies exist — such as file write followed by read — race conditions can occur.
2. Rate Limit Wait Time
The OPENAI_RATE_LIMIT_RETRY_WAIT_MS environment variable adjusts the base wait time. For teams with heavy API usage, increasing this value can be an effective strategy for reducing 429 response frequency.
3. Context Mode Switching
The disableResponseStorage setting toggles between server-side and client-side context management. Choose based on security requirements and session length.
4. Sandbox Policy Adjustment
Modifying the Seatbelt policy files on macOS or Landlock rules on Linux adjusts the access scope for tool execution. Fine-grained controls are possible — allowing writes only to specific directories or completely blocking network access. However, loosening the sandbox increases security risk, so maintaining the principle of least privilege is advisable.
5. Tool Extension via MCP Servers
Beyond the default shell function, custom tools like database queries, HTTP requests, and file conversions can be implemented as MCP servers and connected to the agent loop. The design rationale behind the extension architecture is discussed in the Codex project’s Rust migration discussion.
| Customization Point | How to Change | Impact Scope | Caveats |
|---|---|---|---|
| parallel_tool_calls | API call parameter | Per-turn processing speed | Race conditions with dependent commands |
| Rate Limit wait | Environment variable | Retry interval | Too low causes consecutive 429s |
| Context mode | disableResponseStorage | Request size, privacy | Client mode increases token cost for long sessions |
| Sandbox policy | OS-specific policy files | Tool execution permissions | Security risk if least privilege principle is violated |
| MCP tool extension | MCP server implementation | Tool variety | Protocol compatibility verification required |
Trade-offs to Consider in Production
Key factors to weigh when applying the agent loop structure to real projects.
Cost vs Accuracy
API call costs increase linearly as turn count grows. If the model generates correct code on the first attempt, a single turn suffices. But encountering build errors and repeating fix-retry cycles can stretch to 5–10 turns. In server-side context mode, input token costs are suppressed since only deltas are transmitted, but output tokens are fully billed per turn.
Speed vs Safety
Switching to parallel_tool_calls: true speeds up independent tasks through parallel execution. However, it’s difficult to fully identify implicit dependencies between agent-generated commands, so keeping the default false is the more reasonable choice when prioritizing safety.
Privacy vs Efficiency
Client-side context mode leaves no state on the server, making it favorable for security audits. But for sessions exceeding 30 turns, transmitting the full history with every request causes network bandwidth and token costs to spike. Establishing an upper bound on session length and switching to a new session when exceeded can be an effective operational pattern.
As of 2026-04-23, the Codex guide documentation on platform.openai.com returns a 403 error. Due to the absence of official API-level documentation, the analysis in this article is based on GitHub source code and public discussions.
Summary and Next Steps
The OpenAI Codex agent loop structure is built on `while (turnInput.length > 0)` iterative execution, Responses API streaming calls, sequential tool execution, and exponential backoff error handling. Context management supports both server-side and client-side modes, and sandbox isolation ensures tool execution safety.
The TypeScript to Rust migration fundamentally improved the agent loop’s runtime efficiency, and MCP protocol support opens a path for tool extensibility.
The next area to watch is the Codex context compaction strategy. Context window management during long sessions directly impacts agent performance, and the official specification for this remains unreleased. Building a custom AI coding agent referencing the Codex CLI architecture is also worth considering. Implementing an agent loop directly on the OpenAI Responses API provides a deeper understanding of Codex CLI’s design decisions and enables designing an agent loop structure tailored to specific project requirements.
Related Posts
- GPT-5 Codex CLI Usage in 7 Steps — Installation, Authentication, API Calls, and Agent Tasks – OpenAI Codex CLI installation, authentication, GPT-5-Codex model API calls, AGENTS.md agent task setup, config…
- OpenAI Codex Security Vulnerability Detection in 5 Steps: SQL Injection & XSS Auto-Scan Guide – Attaching OpenAI Codex to legacy PHP and Node.js codebases to automatically detect SQL injection and XSS vulnerabilities. Prompt setup…