OpenAI Codex Agent Loop Structure: 5-Step Analysis of agent-loop.ts

There’s a widespread perception that AI coding agents are simple wrappers that call an LLM API once and return the result. Examining the OpenAI Codex agent loop structure at the source code level reveals this perception to be fundamentally wrong. Codex CLI is built on a loop-based architecture where a while loop autonomously handles model calls, tool execution, context accumulation, and error retries.

One fact needs to be addressed upfront. Codex CLI was originally written in TypeScript (ink-based), but has since been fully rewritten in Rust. According to the Codex Rust transition discussion, 96.2% of the codebase now consists of Rust. The structural patterns analyzed in agent-loop.ts follow the same design principles in the Rust version, but the runtime characteristics have changed fundamentally. Applying the TypeScript code as-is may not work, and this article should be read with that premise in mind.

Why the Loop Matters in AI Coding Agents

Most LLM-based code operates on a single request-response round trip. Send a prompt, receive a response, display the result — done. Coding agents are different. Completing a single task requires a chain of actions: reading files, modifying code, running tests, analyzing errors, and modifying again.

Understanding how the OpenAI Codex agent loop structure solves this problem provides concrete design criteria for three core challenges when building custom agents: iteration control, context explosion, and error recovery.

Codex CLI Basics
Codex CLI is a lightweight coding agent that runs locally, installable via npm or Homebrew. It supports a newline-delimited JSON wire protocol and MCP (Model Context Protocol) servers for extensibility.

Understanding agent loop behavior requires going beyond the notion of “it repeats.” Each iteration demands precise knowledge of what goes in, what gets executed, and what carries over to the next iteration.

Structural Limitations of Previous Agent Loop Approaches

Various attempts at implementing LLM-based agent loops existed before OpenAI Codex. The most common approaches were as follows.

First, the naive recursive call approach. When an LLM response includes a “next step” instruction, the same function is called recursively. As call depth increases, the stack grows and fails unpredictably once the context window is exceeded.

Second, the fixed iteration count approach. A pattern like for i in range(10) hardcodes a maximum iteration count. The loop either terminates regardless of task completion, or makes unnecessary API calls for already-completed tasks.

Third, the event-driven approach. Each tool execution result is emitted as an event and processed by separate handlers. The agent’s “intent” and execution flow become decoupled, making debugging extremely difficult.

| Approach | Termination Condition | Context Management | Error Recovery | Debugging Difficulty |
| --- | --- | --- | --- | --- |
| Naive recursion | Recursive exit condition | Stack-based accumulation | Difficult | High |
| Fixed iteration count | Counter exhaustion | External variables | Limited | Medium |
| Event-driven | No events remaining | Event store | Handler-dependent | Very high |
| Codex while loop | Input array exhaustion | Server/client branching | Exponential backoff | Medium |

The key innovation of the Codex agent loop is resolving these limitations with the simple yet effective pattern of while (turnInput.length > 0).

The while Loop Structure of the Codex Agent Loop

In the Codex CLI agent-loop.ts source code, the core of the agent loop is the while (turnInput.length > 0) structure. Each turn calls the OpenAI Responses API (oai.responses.create) with stream: true, executes any tool calls included in the response, and feeds the results as input for the next turn.

while (turnInput.length > 0) {
  // Send this turn's accumulated input to the OpenAI Responses API
  const responseCall = this.oai.responses.create({
    model,
    instructions,
    input: turnInput,
    stream: true,
    parallel_tool_calls: false, // tool calls must run sequentially
    tools, // the shell function definition
  });
  // Reset first: tool call outputs produced while handling the stream are
  // pushed back onto turnInput and become the next iteration's input.
  // If the model returns no tool calls, the array stays empty and the loop exits.
  turnInput = [];
  // ... consume the stream, executing tool calls via handleFunctionCall() ...
}

Three aspects of this code deserve attention.

Termination Condition Design

The loop’s termination condition is turnInput.length > 0. When the model stops returning tool calls, no new function_call_output items are pushed back, the turnInput array stays empty after its per-turn reset, and the loop exits naturally. Termination is dynamic, driven by task completion rather than a fixed count. This pattern allows simple tasks to finish in 1–2 turns while complex refactoring can run for dozens of turns.
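As an illustration, a hypothetical trace of turnInput across turns might look like this (item shapes simplified):

// Turn 1: turnInput = [userInstruction]
//   -> model emits a shell call; its output refills turnInput
// Turn 2: turnInput = [functionCallOutput]
//   -> model replies with plain text only, no tool calls
// Turn 3: turnInput = [] -> while condition fails, the loop exits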

Tool Call Flow

The parallel_tool_calls: false setting is a deliberate design choice: by disabling parallel tool calls, each tool call’s result can shape the input of the next one. This preserves sequential dependencies. Reading a file determines where to edit, and the edit result determines which test to run.

When the model returns a shell function call during a turn, handleFunctionCall() executes it and converts the result into a function_call_output item that is appended to the next turn’s input array.
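A minimal sketch of that conversion is shown below. The names are simplified, and execCommand is a hypothetical stand-in for Codex’s sandboxed executor; the real handleFunctionCall also deals with approval policy and cancellation.

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical stand-in for Codex's sandboxed command executor
async function execCommand(
  command: string[],
): Promise<{ stdout: string; exitCode: number }> {
  try {
    const { stdout } = await execFileAsync(command[0], command.slice(1));
    return { stdout, exitCode: 0 };
  } catch (err: any) {
    return {
      stdout: String(err.stdout ?? err.message),
      exitCode: typeof err.code === "number" ? err.code : 1,
    };
  }
}

async function handleFunctionCall(item: {
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded by the model, e.g. {"command":["npm","test"]}
}) {
  const args = JSON.parse(item.arguments) as { command: string[] };
  const { stdout, exitCode } = await execCommand(args.command);
  // The wrapped result is what gets pushed onto the next turn's input,
  // which is exactly what keeps the while loop running
  return {
    type: "function_call_output",
    call_id: item.call_id,
    output: JSON.stringify({ output: stdout, metadata: { exit_code: exitCode } }),
  };
}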

What stream: true Means
Receiving responses in streaming mode provides feedback to the user as soon as the first token arrives. In an agent loop, this isn’t just a UX improvement — it also provides a control point where users can check intermediate results during long tasks and interrupt if necessary.
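Consuming the stream might look like the following sketch with the OpenAI Node SDK. The event names follow the Responses API streaming protocol; error handling and the remaining event types are omitted.

const stream = await responseCall; // the pending call from the loop above
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta); // surface tokens as they arrive
  } else if (
    event.type === "response.output_item.done" &&
    event.item.type === "function_call"
  ) {
    // A completed tool call: execute it and queue its output for the next turn
    turnInput.push(await handleFunctionCall(event.item));
  }
}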

The Role of the Input Array

turnInput isn’t a simple string — it’s an array of ResponseInputItem. The user’s initial instruction, model responses from previous turns, and tool call results all accumulate as elements in this array. By the time the loop terminates, this array represents the complete execution history of the task.
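For example, after one tool-using turn the accumulated items might look like this (shapes simplified for illustration):

const history: Array<Record<string, unknown>> = [
  // Turn 1 input: the user's instruction
  { role: "user", content: "run the test suite and fix any failure" },
  // Turn 1 output: the model's tool call
  { type: "function_call", call_id: "call_1", name: "shell",
    arguments: '{"command":["npm","test"]}' },
  // Turn 2 input: the tool's result
  { type: "function_call_output", call_id: "call_1",
    output: '{"output":"1 test failed: ...","metadata":{"exit_code":1}}' },
];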

Context Management Strategy in the OpenAI Codex Agent Loop

As the agent loop grows longer, context window management becomes the critical challenge. Codex CLI addresses this by branching into two modes.

Server-Side Storage Mode

This mode operates when disableResponseStorage: false. The server stores previous responses, and the client only needs to send previous_response_id plus the new input delta. Network transmission is minimized, and costs from retransmitting tokens are reduced.

The advantage is clear. Even at turn 20, the client payload only grows by the size of that turn’s new input. The server handles combining the full conversation history, reducing client-side memory pressure.
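A sketch of this pattern using the Responses API’s previous_response_id and store parameters follows. It is a minimal illustration, not Codex’s exact code, and uses a non-streaming call for brevity.

import OpenAI from "openai";

const oai = new OpenAI();
let lastResponseId: string | undefined;

async function runServerSideTurn(model: string, newItems: any[]) {
  const response = await oai.responses.create({
    model,
    input: newItems, // only this turn's delta crosses the network
    previous_response_id: lastResponseId, // the server stitches in prior turns
    store: true, // the server persists this response for the next turn
  });
  lastResponseId = response.id; // chain the next request onto this response
  return response;
}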

Client-Side Transmission Mode

When disableResponseStorage: true, nothing is stored on the server. Instead, the full conversation history (transcript: Array<ResponseInputItem>), excluding system messages, is included with every request.

This mode is useful when security or privacy requirements exist. Environments that don’t want the entire codebase stored on a server can opt for client-side mode. The trade-off is that request size grows linearly as turns increase, and the token limit is reached sooner than in server-side mode.
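The client-side counterpart resends the accumulated transcript on every request, as in this sketch (again simplified):

import OpenAI from "openai";

const oai = new OpenAI();
let transcript: any[] = []; // full history, excluding system messages

async function runClientSideTurn(model: string, newItems: any[]) {
  transcript = transcript.concat(newItems);
  return oai.responses.create({
    model,
    input: transcript, // the entire conversation travels with each request
    store: false, // nothing is persisted server-side
  });
}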

Context Compaction Spec Unverified
Whether Codex exposes a dedicated compaction endpoint for compressing context as it approaches token limits could not be confirmed; official sources do not document one explicitly.

Error Handling and Exponential Backoff Retry Mechanism

API call failures are inevitable in an agent loop. Network blips, server overload, and rate limit violations occur routinely. Codex CLI handles this with a layered retry strategy.

// Base wait for 429 retries; tunable via environment variable (default 2.5s)
const RATE_LIMIT_RETRY_WAIT_MS = parseInt(
  process.env["OPENAI_RATE_LIMIT_RETRY_WAIT_MS"] || "2500", 10
);
// Upper bound on retry attempts for a single API call
const MAX_RETRIES = 5;

The retry strategy branches into three paths based on error type.

First, transient errors (timeout, 5xx, connection errors) trigger exponential backoff retries. Wait time increases exponentially after each failure, up to a maximum of 5 attempts. This approach gives the server time to recover while preventing load amplification from excessive retries — a standard pattern.

Second, for 429 rate limit errors, the wait time follows whichever is longer: RATE_LIMIT_RETRY_WAIT_MS (default 2500ms) × 2^(attempt-1), or the duration suggested by the server’s Retry-After header. The ability to adjust the base wait time via the OPENAI_RATE_LIMIT_RETRY_WAIT_MS environment variable is a practical configuration option.

Third, token overflow errors are not retry candidates. The error is displayed to the user and the request is immediately aborted. Since the context already exceeds the token limit, repeating the same request won’t produce a different outcome.
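Combining the three paths yields something like the sketch below. The classification helpers are hypothetical names; the real code inspects the OpenAI SDK’s error classes, and the 1-second base for non-429 transient errors is an assumption.

// Hypothetical classifiers; the real code inspects SDK error types
const isRateLimit = (err: any): boolean => err?.status === 429;
const isTransient = (err: any): boolean =>
  err?.status >= 500 || ["ETIMEDOUT", "ECONNRESET"].includes(err?.code);
const isTokenOverflow = (err: any): boolean =>
  /context length|maximum.*tokens/i.test(err?.message ?? "");

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithRetries<T>(fn: () => Promise<T>): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Token overflow: retrying cannot change the outcome, fail fast
      if (isTokenOverflow(err)) throw err;
      if (attempt >= MAX_RETRIES || !(isTransient(err) || isRateLimit(err))) throw err;
      // 429s back off from the configurable base; other transient errors from 1s (assumed)
      const base = isRateLimit(err) ? RATE_LIMIT_RETRY_WAIT_MS : 1000;
      const backoff = base * 2 ** (attempt - 1);
      // Honor the server's Retry-After header when it asks for a longer wait
      const retryAfterMs = Number(err?.headers?.["retry-after"] ?? 0) * 1000;
      await sleep(Math.max(backoff, retryAfterMs));
    }
  }
}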

Design Rationale for MAX_RETRIES = 5
Five retry attempts combined with exponential backoff result in a total wait time of approximately 2.5 + 5 + 10 + 20 + 40 = 77.5 seconds (at default values). This is enough for most transient failures to recover, while notifying the user of failure within 2 minutes even during a complete service outage.

The noteworthy aspect of this error handling structure is its relationship with the agent loop itself. Retries occur at the API call level within the while loop. On successful retry, the current turn completes normally and the loop proceeds to the next turn. On retry failure, the entire loop halts and control returns to the user.

TypeScript to Rust: OpenAI Codex Agent Loop Runtime Transition

The rewrite of Codex CLI from TypeScript to Rust wasn’t a matter of language preference. As confirmed in the Codex CLI GitHub repository, 96.2% of the codebase is now Rust. The core motivation was optimizing the efficiency of the agent loop — the harness that repeatedly calls the model.

Running the agent loop in a TypeScript (Node.js) runtime introduces structural issues. Each turn involves parsing API responses, processing tool call results, and assembling the next input — all while GC (Garbage Collection) intervenes. As turns accumulate, ResponseInputItem objects pile up on the heap, and GC pauses affect response latency.

The Rust transition eliminated GC overhead entirely. Reduced memory consumption and improved execution speed are the direct results. For patterns involving repeated object creation and destruction like the agent loop, Rust’s ownership-based memory management provides a structural advantage.

Sandbox Security Isolation

Alongside the Rust transition, sandbox implementation was strengthened to native level. macOS uses sandbox-exec with Seatbelt policies, while Linux uses Landlock and seccomp to restrict the agent’s filesystem access and system calls.

The reason this sandbox structure is directly tied to the agent loop is clear. Every tool call (shell command) executed within the loop runs inside the sandbox, so even if the model attempts unintended file deletion or network access, it gets blocked at the OS level. During the TypeScript era, achieving this level of isolation required Docker or separate processes. Rust enables direct system call control.
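As a toy illustration of the macOS mechanism, the snippet below wraps a command with sandbox-exec and an inline Seatbelt policy. The policy string is deliberately simplistic and is not Codex’s shipped profile; note that sandbox-exec is deprecated by Apple, though still functional.

import { execFile } from "node:child_process";

// Toy Seatbelt policy: deny everything by default, then allow process
// execution and read-only filesystem access. Codex's real profile is
// considerably more fine-grained.
const TOY_POLICY = `(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
(allow file-read*)`;

// macOS only: -p applies an inline Seatbelt policy string
execFile("sandbox-exec", ["-p", TOY_POLICY, "ls", "-la"], (err, stdout) => {
  if (err) console.error("blocked or failed:", err.message);
  else console.log(stdout);
});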

TypeScript Code Incompatibility
The agent-loop.ts code snippets analyzed in this article represent the TypeScript version’s structure. The current Rust version implements the same design patterns in Rust, so forking the TypeScript code as-is will not be compatible with the current Codex CLI. The Rust version’s source code should be consulted separately.

Validating the Codex Agent Loop Design: Why This Structure Works

The effectiveness of the Codex agent loop structure can be validated from three perspectives.

First, termination guarantee. Since the loop terminates when the turnInput array is empty, termination is guaranteed as long as the model doesn’t generate tool calls indefinitely. That said, no official documentation on maximum iteration limits for the agent loop currently exists, leaving a dependency on model behavior.

Second, state consistency. By clearly branching context management into server-side and client-side modes, conversation history consistency is guaranteed in each mode. On the server side, previous_response_id forms a chain; on the client side, the transcript array holds the complete history.

Third, failure isolation. Error handling operates at the API call level, so a failure in one turn preserves the results of previous turns. The loop continues on successful retry and returns control to the user on failure, minimizing partial work loss.

These three points are precisely where previous approaches (recursive, fixed count, event-driven) fell short. The Codex design isn’t revolutionary per se — it’s better understood as a well-calibrated combination of patterns long proven in distributed systems (input-queue-based loops, exponential backoff, state separation) applied to the agent loop context.

Limitations and Unverified Areas of the OpenAI Codex Agent Loop Structure

Every design has limitations. The Codex agent loop structure is no exception.

First, the official Codex guide documentation on platform.openai.com is currently inaccessible (403). Without official documentation defining agent loop behavior at the API level, analysis must rely on source code. Source code reveals the “how” of implementation, but tends to communicate the “why” of design incompletely.

Second, there is no official documentation on maximum iteration limits for the agent loop. The while (turnInput.length > 0) structure theoretically allows infinite loops, and how safeguards against the model continuously generating tool calls are implemented at the source code level requires further investigation.

Third, the detailed spec for the context compaction mechanism is unverified. A feature for summarizing and compressing conversation history when token limits are reached during long agent loops is known to exist, but detailed specifications are not documented in official sources.

Despite these limitations, the Codex CLI agent loop structure holds value as a concrete reference implementation for designing LLM-based agents. The three axes — input-queue-based termination conditions, server/client context branching, and layered error handling — are patterns directly applicable to general-purpose agent loop design.

Digging deeper into the Codex CLI architecture, the next step is examining its integration with MCP (Model Context Protocol) servers. Understanding how the agent loop’s tool invocation mechanism extends through MCP opens the path to using Codex CLI not just as a coding tool but as a general-purpose agent platform. The OpenAI Responses API streaming protocol details and OS-specific differences in Codex sandbox security policies are also topics worth separate analysis as extensions of the OpenAI Codex agent loop structure.
