Codex Ran Inside Its Own Project for 4 Days Straight — The First Glimpse of Self-Building Software

By Sammy·2026-06-13·6 min read

There is a line between "AI helps you code" and "AI builds the project while you watch." Last week, Peter Steinberger crossed it.

Steinberger, sharing his experience via AI Hot on June 12, 2026, described what happened when he let OpenAI Codex run inside his own project, "crabbox", and then kept it running. For four days. In a non-stop loop.

The result is the clearest signal yet of what agent-driven development looks like when the guardrails come off and the agent is left to build, verify, correct, and rebuild without human intervention.

What Actually Happened

The setup was simple in concept but radical in execution. Steinberger pointed Codex at his own codebase — the very project it was meant to build — and let it iterate. The agent didn't just generate code. It:

Built across multiple code trees. Codex maintained several parallel workspaces, writing code into each one and coordinating between them.
Ran end-to-end verification on every build. After generating code, Codex tested it. If the build failed, it diagnosed the error, corrected the code, and tried again. No human in the loop.
Registered its own cloud services. Through browser and computer-use capabilities, Codex automatically signed up for the external services the project needed — databases, APIs, hosting — completing the provisioning chain that developers normally handle manually.

The developer's remaining responsibilities were telling: adding a credit card for service subscriptions and closing inappropriate content that the agent generated. Everything else — architecture, implementation, testing, deployment plumbing — was owned by the agent.

Why This Matters

This isn't another demo where an AI writes a to-do app from a single prompt. Steinberger's experiment reveals something deeper about where AI agent workflows are heading.

The Verification Loop Is the Breakthrough

The defining characteristic of this experiment wasn't the code generation — it was the self-correction loop. Codex didn't write code and stop. It wrote code, tested it, found bugs, fixed them, and tested again. The agent became a complete development lifecycle in one process:

Code → Build → Test → Fail → Diagnose → Fix → Rebuild → Pass → Continue

That loop is what makes this different from copilot-style completion. It's the difference between a drafting tool and a builder.
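To make the cycle above concrete, here is a minimal Python sketch of a bounded verify-and-fix loop. It is an illustration only, not how Codex or crabbox work internally: the names verification_loop, generate, verify, and fix are hypothetical, and the "agent" is a toy stand-in that repairs a one-line function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    passed: bool
    attempts: int
    trace: List[str] = field(default_factory=list)


def verification_loop(
    generate: Callable[[], str],
    verify: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_attempts: int = 5,
) -> LoopResult:
    """Generate once, then verify and fix until the check passes or the budget runs out."""
    trace = ["code"]
    candidate = generate()
    for attempt in range(1, max_attempts + 1):
        trace.append("build+test")
        error = verify(candidate)  # None means the candidate passed
        if error is None:
            trace.append("pass")
            return LoopResult(True, attempt, trace)
        trace.extend(["fail", "diagnose+fix"])
        candidate = fix(candidate, error)
    # Budget exhausted: the last fix was never verified, so report failure.
    return LoopResult(False, max_attempts, trace)


# Toy stand-ins for the agent (hypothetical, for illustration only).
def generate() -> str:
    return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft


def verify(source: str) -> Optional[str]:
    namespace: dict = {}
    try:
        exec(source, namespace)           # "build"
        got = namespace["add"](2, 3)      # "test"
    except Exception as exc:
        return f"build error: {exc}"
    return None if got == 5 else f"expected 5, got {got}"


def fix(source: str, error: str) -> str:
    return source.replace("a - b", "a + b")


result = verification_loop(generate, verify, fix)
print(result.attempts, result.trace)
# 2 ['code', 'build+test', 'fail', 'diagnose+fix', 'build+test', 'pass']
```

The max_attempts cap is a design choice in this sketch: a loop that runs unattended for days needs some budget so that an unfixable failure ends the run instead of spinning forever.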

Autonomy at Infrastructure Level

Codex registering its own cloud services is the detail that deserves more attention. Most agent demos stop at code generation because infrastructure provisioning is messy, permission-heavy, and full of authentication hurdles. Codex navigating browser interfaces, filling in forms, and activating API keys means the agent wasn't just writing software; it was also provisioning the services that software depends on.

The Human Role Shifts Radically

When the agent handles architecture, implementation, testing, and deployment, what's left for the developer? Steinberger's answer: the human becomes the boundary manager.

The developer's job becomes:

Set constraints — what should the project do? What shouldn't it do?
Provide resources — credit cards, accounts, permissions the agent can't self-provision
Review and curate — close inappropriate outputs, approve architectural decisions
Define taste — the agent can build something that works; the human decides if it's good

This is a fundamentally different relationship with the codebase. The developer stops being the builder and becomes the editor-in-chief.

What Agent Builders Can Learn From This

Steinberger's experiment offers concrete lessons for anyone building or using AI agents today.

Lesson 1: Loops Beat Prompts

A single prompt generates code. A loop generates a working system. The difference is iteration with verification. If you're using agents for development, design for cycles — not one-shot generation. Set up automated testing as the feedback mechanism, and let the agent run until the tests pass.
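One way to turn a test suite into a feedback mechanism is to run it as a subprocess and hand the agent a pass/fail flag plus the tail of the output. The sketch below assumes the project's tests can be started from a single command; Feedback and run_checks are hypothetical names, and the demo commands are trivial Python one-liners standing in for a real test runner.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    passed: bool
    output: str  # tail of stdout + stderr, suitable for passing back to an agent


def run_checks(command: List[str], timeout: float = 120.0, tail_chars: int = 2000) -> Feedback:
    """Run a test command and report whether it exited with status 0."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A hung test run counts as a failure the agent should hear about.
        return Feedback(False, f"timed out after {timeout} seconds")
    combined = proc.stdout + proc.stderr
    # Keep only the tail: failures are usually reported at the end of the output.
    return Feedback(proc.returncode == 0, combined[-tail_chars:])


# Stand-in commands; a real project would pass its own test command here.
failing = run_checks([sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"])
passing = run_checks([sys.executable, "-c", "print('ok')"])
print(failing.passed, passing.passed)  # False True
```

Relying on the exit status keeps the signal independent of any particular test framework, and the captured output is what gives the agent something to diagnose.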

Lesson 2: Give Agents Permission to Operate

The most impressive part of this experiment was the agent provisioning its own infrastructure. Many developers hesitate to give agents browser access or API keys. Steinberger's result suggests that controlled autonomy, where the agent can act but the human can intervene, unlocks capabilities that prompting alone does not.
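Controlled autonomy can be expressed as a small gate in front of the agent's actions: routine actions pass automatically, and everything else is routed to a human. The sketch below is a hypothetical illustration, not a description of Steinberger's setup; Action, Gate, and the action kinds are invented for the example, and the human decision is simulated with a lambda.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Action:
    kind: str    # hypothetical labels, e.g. "run_tests", "signup_service", "add_payment"
    detail: str


@dataclass
class Gate:
    auto_allowed: Set[str]                # action kinds the agent may take freely
    ask_human: Callable[[Action], bool]   # called for everything else
    log: List[str] = field(default_factory=list)

    def authorize(self, action: Action) -> bool:
        """Allow routine actions; defer anything else to the human."""
        if action.kind in self.auto_allowed:
            self.log.append(f"auto: {action.kind}")
            return True
        approved = self.ask_human(action)
        self.log.append(f"{'approved' if approved else 'denied'}: {action.kind}")
        return approved


# Simulated human: approves service signups, refuses payment changes.
gate = Gate(
    auto_allowed={"edit_code", "run_tests"},
    ask_human=lambda action: action.kind != "add_payment",
)

print(gate.authorize(Action("run_tests", "full suite")))         # True
print(gate.authorize(Action("signup_service", "database")))      # True
print(gate.authorize(Action("add_payment", "hosting plan")))     # False
print(gate.log)
```

Keeping a log of every decision, automatic or not, gives the human a record to review after a long unattended run.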

Lesson 3: Define the Human Boundary Clearly

The approach worked because Steinberger knew what he was responsible for. The agent ran free within technical boundaries. The human held the financial and ethical boundaries. That division of labor — not total automation, but responsible delegation — is the model that scales.

Practical Implications

What Steinberger demonstrated is not a product. It's a mode of working, and it may become standard faster than many developers expect.

In the near future, starting a project won't mean writing scaffolding code. It will mean describing the project to an agent, provisioning it with the resources it needs, and reviewing what it builds. The tutorials on ClawWorld are already moving in this direction — teaching developers how to work with agents rather than treating them as search engines for code.

The skills that matter are shifting:

From writing code → to writing specifications
From debugging → to designing verification loops
From deployment → to boundary management
From building → to curating

The Bigger Picture

Four days is a long time for an agent to run. But it's a short time in the history of software development. Steinberger's crabbox experiment is a glimpse of the near future: projects that build themselves, verify themselves, and deploy themselves — with the developer acting as the human firewall, not the factory floor.

The question is no longer whether agents can build software. It's whether developers are ready to redefine what their job is.

If you want to experience this new relationship with code firsthand — where you set the direction and the agent handles the loop — explore the ClawWorld tutorials and start building with agents that remember context, verify their work, and keep going until the build passes.

Source: Peter Steinberger via AI Hot, June 12, 2026.