Coding your own agentic orchestration like OpenClaw
OpenClaw has become one of the most popular open-source AI assistants, and a big reason is its orchestration architecture. Not because it uses a fancy framework, but because it deliberately does not. No LangChain. No CrewAI. No LangGraph. Just a ReAct loop, a filesystem, sessions, and plain API calls. This post breaks down the core patterns behind OpenClaw's orchestration and shows you how to build each one yourself in vanilla Python.
Why Not LangGraph?
The first instinct when replicating something like OpenClaw is to reach for a framework. LangGraph is the obvious candidate: it is a graph-definition framework designed for workflows where you define nodes, edges, and state transitions ahead of time. But OpenClaw's orchestration is fundamentally non-deterministic. The agent decides at runtime which tools to call, whether to spawn a sub-agent, when to compact memory, and how to route messages across sessions. These decisions emerge from the LLM's reasoning, not from a predefined graph.
You could model this in LangGraph with conditional edges and dynamic routing, but you would be fighting the framework's assumptions. LangGraph wants you to declare your graph structure upfront. OpenClaw's design philosophy is the opposite: let the model decide the control flow at every step, and build the infrastructure to support that. So we will use vanilla Python, which is closer to what OpenClaw actually does under the hood (though OpenClaw itself is written in TypeScript).
Pattern 1: The ReAct Agent Loop
At the heart of OpenClaw's Pi agent runtime is a ReAct loop: Reason, Act, Observe, Repeat. The model receives the current conversation state, reasons about what to do, calls a tool (or responds directly), observes the tool result, and loops. This continues until the model decides to stop (no more tool calls) or a hard limit is hit. Here is the minimal version:
```python
import json
import subprocess

import openai

client = openai.OpenAI()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "bash",
            "description": "Run a shell command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The shell command to execute.",
                    }
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute path to the file.",
                    }
                },
                "required": ["path"],
            },
        },
    },
]


def execute_tool(name: str, args: dict) -> str:
    if name == "bash":
        result = subprocess.run(
            args["command"],
            shell=True,
            capture_output=True,
            text=True,
        )
        return (
            f"stdout: {result.stdout}\n"
            f"stderr: {result.stderr}\n"
            f"exit_code: {result.returncode}"
        )
    elif name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    return f"Unknown tool: {name}"


def react_loop(user_message: str, max_iterations: int = 10):
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant with access to tools.",
        },
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
        )
        choice = response.choices[0]
        messages.append(choice.message)

        # No tool calls = the model is done
        if not choice.message.tool_calls:
            return choice.message.content

        # Execute each tool call and feed results back
        for tool_call in choice.message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                }
            )
    return "Max iterations reached."
```

This is the core of every production agent harness. The model calls tools in a loop, and the harness feeds results back until the model produces a final answer. OpenClaw's Pi runtime adds streaming, block-level output, and RPC transport on top of this, but the fundamental loop is the same.
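Before wiring in a real model, it helps to trace the loop's message choreography with a stub. Everything below (`FakeToolCall`, `FakeMessage`, `fake_model`) is a hypothetical stand-in for the OpenAI SDK's message objects, just detailed enough to show how tool results get threaded back into the conversation:

```python
import json

# Hypothetical stand-ins for the SDK's message objects, not real API types.
class FakeToolCall:
    def __init__(self, id, name, arguments):
        self.id = id
        self.function = type("Fn", (), {"name": name, "arguments": arguments})()

class FakeMessage:
    def __init__(self, content=None, tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls

def fake_model(messages):
    # First turn: request a tool call. Once a tool result appears: answer.
    has_tool_result = any(
        m.get("role") == "tool" for m in messages if isinstance(m, dict)
    )
    if not has_tool_result:
        return FakeMessage(tool_calls=[
            FakeToolCall("call_1", "read_file", json.dumps({"path": "/tmp/x"}))
        ])
    return FakeMessage(content="done")

def trace_react(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(5):
        msg = fake_model(messages)
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content, messages
        for tc in msg.tool_calls:
            # Each tool result is fed back in, keyed by tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": f"(result of {tc.function.name})",
            })
    return None, messages

answer, transcript = trace_react("read that file")
# Transcript order: user -> assistant(tool_calls) -> tool -> assistant
```

The transcript shows the invariant the real loop relies on: every assistant message carrying tool calls is immediately followed by one tool message per call before the model is queried again.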
Pattern 2: Filesystem-First Memory
OpenClaw does not use a vector database. It does not use Redis. Its long-term memory is the filesystem. The workspace at ~/.openclaw/workspace contains injected prompt files (AGENTS.md, SOUL.md, TOOLS.md), skills, and any state the agent writes during execution. This is the durable layer. The conversation history (RAM) is the volatile working memory, and the context window is what the model actually sees. The harness orchestrates data flow between these three layers.
Here is how to implement the same pattern: a workspace directory that the agent reads at startup and writes to during execution, with a system prompt that loads from AGENTS.md:
```python
from pathlib import Path

WORKSPACE = Path.home() / ".myagent" / "workspace"
WORKSPACE.mkdir(parents=True, exist_ok=True)


def load_system_prompt() -> str:
    """Load system prompt from workspace files, just like OpenClaw."""
    parts = ["You are a helpful assistant."]

    agents_md = WORKSPACE / "AGENTS.md"
    if agents_md.exists():
        parts.append(f"\n## Agent Instructions\n{agents_md.read_text()}")

    soul_md = WORKSPACE / "SOUL.md"
    if soul_md.exists():
        parts.append(f"\n## Personality\n{soul_md.read_text()}")

    tools_md = WORKSPACE / "TOOLS.md"
    if tools_md.exists():
        parts.append(f"\n## Tool Guidelines\n{tools_md.read_text()}")

    # Load any skills
    skills_dir = WORKSPACE / "skills"
    if skills_dir.exists():
        for skill_dir in skills_dir.iterdir():
            skill_file = skill_dir / "SKILL.md"
            if skill_file.exists():
                parts.append(
                    f"\n## Skill: {skill_dir.name}\n{skill_file.read_text()}"
                )

    return "\n".join(parts)


def save_progress(key: str, value: str):
    """Flush state to disk before discarding from context."""
    progress_dir = WORKSPACE / "progress"
    progress_dir.mkdir(exist_ok=True)
    (progress_dir / f"{key}.md").write_text(value)


def load_progress(key: str) -> str | None:
    """Rehydrate state from disk."""
    path = WORKSPACE / "progress" / f"{key}.md"
    return path.read_text() if path.exists() else None
```

The key invariant from OpenClaw: memory is always flushed to disk before being discarded from context. When the context window fills up and the harness needs to compact, it writes a summary to disk first. When the agent resumes in a new session, it rehydrates by reading from the filesystem. No fancy retrieval system needed.
Pattern 3: Session-Based Multi-Agent Routing
OpenClaw's multi-agent coordination does not use a centralized orchestrator that decomposes tasks. Instead, it uses sessions. Each session is an isolated agent with its own conversation history, system prompt, and tool set. Agents communicate through session tools: sessions_list to discover peers, sessions_history to read transcripts, sessions_send to message another session, and sessions_spawn to create new agents. This is message-passing, not graph traversal.
Here is a minimal session manager that lets agents spawn sub-agents and communicate:
```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    name: str = "main"
    messages: list = field(default_factory=list)
    system_prompt: str = "You are a helpful assistant."
    tools: list = field(default_factory=list)


class SessionManager:
    def __init__(self):
        self.sessions: dict[str, Session] = {}

    def create(
        self,
        name: str,
        system_prompt: str,
        tools: list | None = None,
    ) -> Session:
        session = Session(
            name=name,
            system_prompt=system_prompt,
            tools=tools or [],
        )
        self.sessions[session.id] = session
        return session

    def list_sessions(self) -> list[dict]:
        return [
            {
                "id": session.id,
                "name": session.name,
                "messages": len(session.messages),
            }
            for session in self.sessions.values()
        ]

    def get_history(self, session_id: str) -> list[dict]:
        return self.sessions[session_id].messages

    def send_message(self, from_id: str, to_id: str, content: str) -> str:
        """Send a message from one session to another. Returns the response."""
        target = self.sessions[to_id]
        target.messages.append(
            {
                "role": "user",
                "content": f"[from:{from_id}] {content}",
            }
        )
        # Run the target session's react loop to get a response
        return react_loop_with_session(target)

    def spawn(self, name: str, task: str, system_prompt: str) -> str:
        """Spawn a sub-agent, run it to completion, return its output."""
        session = self.create(name=name, system_prompt=system_prompt)
        session.messages.append({"role": "user", "content": task})
        return react_loop_with_session(session)


def react_loop_with_session(session: Session, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": session.system_prompt},
        *session.messages,
    ]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS if TOOLS else None,
        )
        choice = response.choices[0]
        # Record the full assistant message (including tool_calls) in both
        # the working list and the session's durable history. Dropping the
        # tool_calls would make the API reject the tool results that follow,
        # and appending only to session.messages would resend a stale list
        # on the next iteration.
        assistant_msg = choice.message.model_dump(exclude_none=True)
        messages.append(assistant_msg)
        session.messages.append(assistant_msg)

        if not choice.message.tool_calls:
            return choice.message.content or ""

        for tool_call in choice.message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            tool_msg = {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            }
            messages.append(tool_msg)
            session.messages.append(tool_msg)
    return "Max iterations reached."
```

The important difference from something like CrewAI or AutoGen is that OpenClaw does not have a predefined team structure. The main agent decides whether to spawn a sub-agent at runtime, through tool calls. The sub-agent gets its own isolated context window, its own system prompt, and a restricted set of tools. This is the orchestrator-worker pattern, but dynamic: the model chooses when and what to delegate.
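For the model to drive delegation itself, the session operations have to be exposed as tools. A sketch of that wiring is below; the tool names mirror OpenClaw's sessions_* tools, but the schemas and the execute_session_tool dispatcher are illustrative, not OpenClaw's real definitions:

```python
import json

# Illustrative schemas: the names mirror OpenClaw's session tools,
# but the parameter shapes here are assumptions.
SESSION_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "sessions_spawn",
            "description": "Spawn an isolated sub-agent and run it to completion.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "task": {"type": "string"},
                    "system_prompt": {"type": "string"},
                },
                "required": ["name", "task", "system_prompt"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "sessions_list",
            "description": "List live sessions and their message counts.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

def execute_session_tool(manager, name: str, args: dict) -> str:
    # Dispatch a model-issued tool call onto the SessionManager.
    if name == "sessions_spawn":
        return manager.spawn(args["name"], args["task"], args["system_prompt"])
    if name == "sessions_list":
        return json.dumps(manager.list_sessions())
    return f"Unknown session tool: {name}"
```

Merging SESSION_TOOLS into the tool list of the main session (and leaving it out of sub-agents' lists) is what makes delegation a runtime decision of the model while keeping workers restricted.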
Pattern 4: Context Compaction
When conversations get long, the context window fills up. OpenClaw handles this with compaction: the /compact command triggers it manually, and the harness also runs it automatically when token usage exceeds a threshold (around 90% of the model's context limit). Compaction summarizes the conversation so far, replaces the message history with the summary, and continues. The key is that progress and important state are flushed to the filesystem before compaction, so nothing is lost permanently.
Here is a simple compaction implementation:
```python
import json

import tiktoken


def count_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    # Rough approximation: counts only the text content of each message.
    enc = tiktoken.encoding_for_model(model)
    return sum(
        len(enc.encode(message.get("content") or "")) for message in messages
    )


def compact_if_needed(
    session: Session,
    threshold: float = 0.9,
    max_tokens: int = 128_000,
):
    token_count = count_tokens(session.messages)
    if token_count < threshold * max_tokens:
        return  # Not yet at threshold

    # Save progress to disk before compacting
    save_progress(
        f"session_{session.id}_pre_compact",
        json.dumps(session.messages[-5:], default=str),
    )

    # Ask the model to summarize the conversation so far
    summary_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize this conversation concisely, preserving "
                    "all important decisions, facts, and open tasks."
                ),
            },
            *session.messages,
        ],
    )
    summary = summary_response.choices[0].message.content

    # Replace the history with the summary and continue
    session.messages = [
        {
            "role": "user",
            "content": f"[Compacted summary of earlier conversation]\n{summary}",
        }
    ]
```

The pattern is: detect threshold, save state to disk, summarize, replace history with summary. This is the same approach used by Anthropic's long-running agent pattern, where an initializer agent creates a progress file and a feature list, and the coding agent reads git logs and progress files at the start of each session.
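The detect, save, summarize, replace sequence can be exercised without an API call by injecting the summarizer as a plain function. In this sketch the summarize parameter stands in for the chat-completion call, and keeping the last few turns verbatim is a common refinement rather than something the code above does:

```python
def compact(messages: list[dict], summarize, keep_last: int = 2) -> list[dict]:
    # Summarize everything except the most recent turns, then splice the
    # summary back in as a single message ahead of those turns.
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(head)
    return [
        {"role": "user", "content": f"[Compacted summary]\n{summary}"},
        *tail,
    ]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact(history, summarize=lambda msgs: f"{len(msgs)} earlier turns")
# History shrinks from 10 messages to 3: one summary plus the last two turns.
```

Making the summarizer a parameter also makes the threshold logic trivially testable, which is hard to do when the model call is inlined.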
Putting It All Together
With these four patterns, you have the core of OpenClaw's orchestration in about 200 lines of Python. The ReAct loop handles tool-calling. The filesystem handles persistence. Sessions handle multi-agent coordination. Compaction handles long-running tasks. Everything else is engineering polish: streaming, retry logic, channel routing, typing indicators, and error recovery.
The deeper lesson here is a design philosophy: OpenClaw's power does not come from a sophisticated orchestration framework. It comes from the opposite. Simple, transparent patterns that the model can reason about. No hidden abstractions. No declarative graphs that obscure control flow. Just a loop, a filesystem, and sessions. The model is the orchestrator, and the harness gives it the tools to orchestrate well.
This is also why reaching for LangGraph is usually the wrong instinct. LangGraph encodes structure into the graph, but the whole point of agentic orchestration is that the agent controls the structure. When you build with vanilla code, you keep that flexibility. The harness becomes a thin layer of infrastructure, and the model makes the decisions. That is the OpenClaw way, and now you have the building blocks to do it yourself.