โ† Back to blog

A DeepSeek Researcher Just Open-Sourced an AI That Runs Full Research Loops With Zero Human Input

โญ Featured

A DeepSeek Researcher Just Open-Sourced an AI That Runs Full Research Loops With Zero Human Input

DeepSeek researcher Deli Chen just open-sourced something that pushes the boundary of what AI agents can do on their own: AutoResearch, a protocol in which an AI agent autonomously completes the full reinforcement learning research loop (designing experiments, writing code, submitting GPU jobs, debugging, and drawing conclusions) on a 285-billion-parameter model, without any human intervention.

Alongside the protocol, Chen also published a survey paper on self-play, the technique that underpins the system (victorchen96.github.io/auto_research).

Why This Is Different From "AI Writing Code"

AI models writing code is no longer news. Every major coding agent can generate functions, fix bugs, and scaffold projects. AutoResearch does something categorically different: it runs the entire scientific workflow.

Writing code and completing a research loop are not the same thing. The difference is like knowing how to cook a dish versus running a restaurant kitchen that produces consistent output every night. The research loop involves formulating hypotheses, designing experiments, managing compute resources, interpreting unexpected results, and deciding what to try next.

AutoResearch closes that loop end to end. The agent decides what to investigate, writes the code, submits GPU jobs, reads the output, debugs failures, and, critically, draws conclusions and decides the next experiment. No human touches anything in between.
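To make the shape of such a loop concrete, here is a minimal Python sketch. It is not AutoResearch's code: `ResearchLoop`, `propose_learning_rate`, and `fake_experiment` are hypothetical names, and the "experiment" is a lookup table standing in for a real GPU job. The only point is that the planner sees the full history and decides for itself whether to continue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Result:
    hypothesis: str
    score: float


@dataclass
class ResearchLoop:
    # propose() sees every past result and returns the next hypothesis,
    # or None when it has reached a conclusion.
    propose: Callable[[List[Result]], Optional[str]]
    # run() executes one experiment and returns a score (higher is better).
    run: Callable[[str], float]
    history: List[Result] = field(default_factory=list)

    def run_until_done(self, max_steps: int = 10) -> List[Result]:
        for _ in range(max_steps):
            hypothesis = self.propose(self.history)
            if hypothesis is None:
                break
            self.history.append(Result(hypothesis, self.run(hypothesis)))
        return self.history


def propose_learning_rate(history: List[Result]) -> Optional[str]:
    """Toy planner: walk a fixed list and stop once the score gets worse."""
    candidates = ["lr=0.1", "lr=0.01", "lr=0.001"]
    if len(history) >= 2 and history[-1].score < history[-2].score:
        return None  # conclusion: the previous setting was better
    if len(history) == len(candidates):
        return None  # nothing left to try
    return candidates[len(history)]


def fake_experiment(hypothesis: str) -> float:
    """Stand-in for a real training job; the scores are made up."""
    return {"lr=0.1": 0.5, "lr=0.01": 0.8, "lr=0.001": 0.6}[hypothesis]


loop = ResearchLoop(propose=propose_learning_rate, run=fake_experiment)
history = loop.run_until_done()
best = max(history, key=lambda r: r.score)
print(best.hypothesis)
```

In a real system the planner would be a model call and the runner a job scheduler; the control flow stays the same.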

The system calls GRPO (Group Relative Policy Optimization) as one of its tools, treating reinforcement learning algorithms as functions an agent can invoke, not as separate projects requiring a human researcher.
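As a rough illustration of "an algorithm as a callable tool", the sketch below registers one piece of GRPO in a tool table: the group-relative advantage, which normalizes each sampled response's reward by the mean and standard deviation of its group. The registry (`TOOLS`, `call_tool`) is a hypothetical design, not AutoResearch's actual interface, and this is only the advantage step, not the full GRPO objective. It uses the population standard deviation; implementations differ on that detail.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List


def grpo_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantages: (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical tool table: the agent picks a tool by name and passes arguments.
TOOLS: Dict[str, Callable[..., List[float]]] = {
    "grpo_advantages": grpo_advantages,
}


def call_tool(name: str, **kwargs) -> List[float]:
    return TOOLS[name](**kwargs)


# Four sampled responses to one prompt, two rewarded and two not.
advantages = call_tool("grpo_advantages", rewards=[1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Rewarded responses get a positive advantage and unrewarded ones a negative advantage, with no separate value model involved.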

The Scaffolding Matters as Much as the Model

What makes AutoResearch work is not just the underlying model. It's the engineering scaffolding that wraps the model in a workflow with defined stages, checkpointing, and error recovery.

The protocol handles the real-world messiness of research: GPU jobs that fail, results that don't converge, experiments that need to be re-run with different parameters. An autonomous research agent is less about being smart enough to design the perfect experiment and more about being resilient enough to handle the imperfect ones.
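One way to picture defined stages with checkpointing is a runner that saves state after every stage, so a crashed run resumes instead of starting over. The sketch below is an assumption about how such scaffolding could look, not the AutoResearch protocol itself; `run_stages`, the stage names, and the simulated GPU failure are all illustrative.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]


def run_stages(stages: List[Stage], checkpoint_path: str) -> Dict:
    """Run stages in order, saving state after each one so a rerun resumes."""
    state: Dict = {"completed": [], "data": {}}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    for name, fn in stages:
        if name in state["completed"]:
            continue  # already finished in an earlier run
        state["data"] = fn(state["data"])
        state["completed"].append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state


calls = {"design": 0, "train": 0}


def design(data: Dict) -> Dict:
    calls["design"] += 1
    return {**data, "config": {"lr": 0.01}}


def train(data: Dict) -> Dict:
    calls["train"] += 1
    if calls["train"] == 1:
        raise RuntimeError("simulated GPU job failure")
    return {**data, "loss": 0.42}  # made-up number


stages = [("design", design), ("train", train)]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_stages(stages, path)
except RuntimeError:
    pass  # the first run dies in "train"; "design" is already checkpointed

final = run_stages(stages, path)  # skips "design", retries "train"
print(final["completed"])
```

A production version would write the checkpoint atomically and record why a stage failed, but the resume-from-last-good-stage idea is the same.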

This is the same principle that makes production AI agents useful in any domain: not perfection on the first try, but the ability to keep going, recover from failures, and produce useful output across many steps.

What Agents Can Learn From This Today

AutoResearch is a research artifact, not a product. You won't install it and replace your ML team. But it demonstrates patterns that apply to any agent workflow:

Closed loops beat open ones. An agent that formulates a hypothesis, executes, evaluates, and decides next steps on its own consistently outperforms one that needs a human to interpret each result and issue the next command. The value of autonomy isn't speed; it's that the agent accumulates context across steps that a human would lose between sessions.

Scaffolding is the product. The most interesting part of AutoResearch isn't the 285B model; it's the protocol that manages the research loop. The same applies to customer support agents, coding agents, and workflow automation. The model is the engine; the scaffolding is the vehicle.

Failure handling is a feature, not a bug. Research fails constantly. AutoResearch distinguishes between failures that mean "try again with different parameters" and failures that mean "this approach is wrong." Most agent workflows don't make that distinction yet. Adding it is how agents move from demos to production.
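The retry-versus-abandon distinction can be sketched in a few lines of Python. The exception classes and the `classify` rule below are hypothetical; how AutoResearch actually tells the two kinds of failure apart is not described here, and in practice the classifier would likely be a model judgment rather than a type check.

```python
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    RETRY = "try again with different parameters"
    ABANDON = "this approach is wrong"


class RetryableFailure(Exception):
    """Hypothetical: the run failed, but other parameters may work."""


class ApproachFailure(Exception):
    """Hypothetical: the run shows the approach itself is flawed."""


def classify(error: Exception) -> Verdict:
    # Unknown errors count as ABANDON so the loop never retries blindly.
    if isinstance(error, RetryableFailure):
        return Verdict.RETRY
    return Verdict.ABANDON


def explore(run: Callable[[float], float],
            learning_rates: List[float]) -> Optional[float]:
    """Return the first successful score, or None if the approach is dropped."""
    for lr in learning_rates:
        try:
            return run(lr)
        except Exception as error:
            if classify(error) is Verdict.ABANDON:
                return None
            # RETRY: move on to the next parameter setting
    return None  # every setting failed


calls: List[str] = []


def unstable_at_high_lr(lr: float) -> float:
    calls.append(f"unstable:{lr}")
    if lr > 0.05:
        raise RetryableFailure("loss diverged")
    return 0.9  # made-up score


def constant_reward(lr: float) -> float:
    calls.append(f"constant:{lr}")
    raise ApproachFailure("reward never changes, nothing to learn")


recovered = explore(unstable_at_high_lr, [0.1, 0.01])
dropped = explore(constant_reward, [0.1, 0.01])
print(recovered, dropped, calls)
```

The first approach recovers on its second setting; the second is dropped after a single run instead of burning compute on every remaining setting.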

What This Means If You Use ClawWorld

AutoResearch runs overnight on GPU clusters without anyone watching. ClawWorld runs the same kind of loop at the scale where it's useful today: not RL experiments on 285B models, but your actual workflows, such as responding to triggers, completing multi-step tasks, and producing output across the tools you already use.

The principle is identical: a defined goal, persistent context, and the ability to keep working without constant human prompting. AutoResearch demonstrates that autonomous agent loops work at the extreme end. ClawWorld makes that loop available for the tasks you'd otherwise do manually.

The difference between a chat interface and an agent has always been that the agent keeps going. AutoResearch is what "keeps going" looks like when the task is scientific research. ClawWorld is what it looks like when the task is yours.
