The Folder Is the Framework: Building Autonomous Multi-Agent Coding Loops

Inspired by Jake Van Clief’s ICM methodology.

I build autonomous loops out of multiple coding agents — they plan, build, review, and correct each other on repeat until the work clears a bar I set, with no babysitting and no orchestration framework. The ones I happen to use are Claude Code, Codex CLI, and Antigravity CLI. The engine that drives the loop is the project’s folder structure.

That’s the whole idea: engineer the loop into the structure, not into a control program. The folder tells each agent what to do, where to write, and when to stop. Here’s how it works.

Agents converge on an append-only collab.md; work flows one way through plan → IR → deliverables, looping until all three agree.

The problem

A single coding agent plans, writes, and reviews its own work — so it also signs off on its own mistakes. One model, one set of blind spots, nobody checking.

The usual fix is “use more agents,” and that’s where people reach for a framework: routers, message buses, role-playing swarms. I think that automates the wrong layer. It automates the conversation when the thing that needs structure is the work. So I didn’t build a framework. I built a workspace.

The one rule everything hangs on

When several agents share a folder, there’s exactly one way they reliably hurt each other: two of them writing the same file at once. Everything else is recoverable; concurrent writes are not.

So the whole system is built on one rule:

One writer per artifact. Everyone else talks in collab.md and never edits a file they don’t own.

The owner writes the deliverable directly. Reviewers route changes through collab.md as requests, and the owner applies them.

That single constraint is what makes three agents safe in one folder. Everything below just makes it easy to follow.
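To make the rule concrete, here is a minimal sketch of a write-permission check. The ownership map, the agent names, and the convention that a scratchpad's file name matches the agent's name are all assumptions for illustration; the workspace itself enforces the rule by convention, not by code.

```python
from pathlib import PurePosixPath

# Hypothetical ownership map: artifact path (relative to a feature folder) -> owner.
OWNERS = {
    "plan/plan.md": "claude",
    "IR/draft.md": "codex",
    "deliverables/report.md": "codex",
}


def may_write(agent: str, path: str, owners: dict[str, str] = OWNERS) -> bool:
    """Return True only if `agent` may write `path` directly."""
    p = PurePosixPath(path)
    # The shared channel is the one file everyone may append to.
    if p.as_posix() == "collab/collab.md":
        return True
    # Assumed convention: agents/<name>.md is the scratchpad of agent <name>.
    if p.parent.as_posix() == "agents":
        return p.stem == agent
    # Any other file is writable only by its registered owner;
    # a file with no registered owner is writable by nobody.
    return owners.get(p.as_posix()) == agent
```

A reviewer that gets False back has one move left: append a request entry to collab.md and let the owner apply the change.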

The workspace

A project is a master folder with one subfolder per feature. The master holds shared context; each feature is self-contained.

<project>/
  index.md        # generated map of features and their status
  home.md         # roster, roles, scoreboard, stop conditions
  changelog.md    # terse, append-only event log
  <feature>/
    feature.md      # the task + its definition of done
    context.md      # audience, tone, what good looks like
    references.md   # sources, labelled by confidence
    agents/         # one scratchpad per agent: claude.md, agent.md, gemini.md
    collab/collab.md
    plan/           # planning
    IR/             # work-in-progress
    deliverables/   # final, signed-off output only

Three things to notice. context.md and references.md are the project’s memory, so no agent gets re-briefed in a prompt. The control files are lowercase on purpose: agent tools auto-load uppercase memory files (CLAUDE.md in Claude Code, AGENTS.md in Codex), and lowercasing keeps every feature’s scratchpad from flooding the context. And each agent writes only its own files, plus its appended entries in collab.md.
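The layout above is small enough to create by hand. As a rough sketch, the following builds the same tree with empty files; it is not the /collab command's implementation, and the function and constant names are made up for this example.

```python
from pathlib import Path

# File and folder names are taken from the tree above.
MASTER_FILES = ["index.md", "home.md", "changelog.md"]
FEATURE_DIRS = ["agents", "collab", "plan", "IR", "deliverables"]
FEATURE_FILES = ["feature.md", "context.md", "references.md", "collab/collab.md"]


def scaffold(project: Path, features: list[str]) -> None:
    """Create the master folder and one self-contained subfolder per feature."""
    project.mkdir(parents=True, exist_ok=True)
    for name in MASTER_FILES:
        (project / name).touch(exist_ok=True)
    for feature in features:
        root = project / feature
        for folder in FEATURE_DIRS:
            (root / folder).mkdir(parents=True, exist_ok=True)
        for name in FEATURE_FILES:
            # touch() leaves existing content alone, so re-running is safe.
            (root / name).touch(exist_ok=True)
```

Calling scaffold(Path("demo"), ["login", "billing"]) would produce a master folder with two empty features ready to be filled in.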

The lifecycle

plan/ → IR/ → deliverables/ is a one-way pipeline. Planning lives in plan/, messy work lives in IR/, and nothing reaches deliverables/ until it’s done and signed off. Any agent can glance at a folder and know whether something is still cooking or finished.

Phase 1 plans, Phase 2 builds, and the loop repeats until the stop condition is met.
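Because the pipeline only moves forward, a feature's stage can be read off the folders. The sketch below assumes that a stage counts as reached once its folder contains at least one file; the stage labels are invented here.

```python
from pathlib import Path


def has_files(folder: Path) -> bool:
    """True if the folder exists and contains at least one file at any depth."""
    return folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))


def stage(feature: Path) -> str:
    """Report how far a feature has travelled through plan/ -> IR/ -> deliverables/."""
    # Check the furthest stage first, since earlier folders stay populated.
    if has_files(feature / "deliverables"):
        return "done"
    if has_files(feature / "IR"):
        return "in progress"
    if has_files(feature / "plan"):
        return "planning"
    return "not started"
```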

How the agents talk

collab.md is the only channel, and it’s deliberately boring: append-only, newest at the bottom, never rewritten. Every entry has the same shape:

### YYYY-MM-DD HH:MM - <agent> -> <agent(s)>
- type: comment | question | rebuttal | gap | request | scoreboard
- ref: <file + item ID, e.g. plan/plan.md D1>
- message: <one or two lines>

The important type is request: if an agent wants a change in a file it doesn’t own, it doesn’t edit — it asks, and the owner makes the change. There’s also changelog.md for the terse record of what happened. The rule that keeps them apart: talk to another agent in collab.md, record that something happened in changelog.md.
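The fixed entry shape makes collab.md easy to write and read by machine. Here is a minimal sketch of a formatter and parser for that shape; the function and class names are hypothetical, and it assumes each message fits on a single line.

```python
import re
from dataclasses import dataclass
from datetime import datetime

TYPES = {"comment", "question", "rebuttal", "gap", "request", "scoreboard"}
HEADER = re.compile(r"^### (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) - (.+?) -> (.+)$")


@dataclass
class Entry:
    when: str
    sender: str
    recipients: str
    type: str = ""
    ref: str = ""
    message: str = ""


def format_entry(sender, recipients, type_, ref, message, now=None) -> str:
    """Render one entry in the collab.md shape."""
    if type_ not in TYPES:
        raise ValueError(f"unknown entry type: {type_}")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    return (
        f"### {stamp} - {sender} -> {recipients}\n"
        f"- type: {type_}\n- ref: {ref}\n- message: {message}\n"
    )


def parse_entries(text: str) -> list[Entry]:
    """Parse collab.md text into entries, oldest first."""
    entries: list[Entry] = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            entries.append(Entry(*header.groups()))
        elif entries and line.startswith("- "):
            key, _, value = line[2:].partition(": ")
            if key in ("type", "ref", "message"):
                setattr(entries[-1], key, value.strip())
    return entries


def append_entry(path: str, entry_text: str) -> None:
    """Append only: open in 'a' mode so existing entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n" + entry_text)
```

Opening the file in append mode is the point of the design: no agent ever needs to read, modify, and rewrite the shared channel.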

Running a loop

Two ways to start one. Either run /loop in Claude Code and let it programmatically invoke the Codex and Antigravity CLIs over the workspace, or open each agent in its own terminal pane, run /goal, point them at the folder, and tell them not to end the loop until all three agree on the quality of the output.
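The stop condition, "all three agree," can be sketched as a check over each agent's most recent verdict. How agreement is actually recorded is not specified here, so the roster names and the "approve" / "revise" vocabulary below are assumptions for illustration only.

```python
# Hypothetical roster; the real one would live in home.md.
AGENTS = ("claude", "codex", "gemini")


def loop_may_stop(votes: list[tuple[str, str]], roster: tuple[str, ...] = AGENTS) -> bool:
    """Decide whether the loop may end.

    `votes` holds (agent, verdict) pairs in the order they were written,
    oldest first, for example extracted from scoreboard entries.
    """
    latest: dict[str, str] = {}
    for agent, verdict in votes:
        latest[agent] = verdict  # a later verdict replaces an earlier one
    # Every agent on the roster must have voted, and its last vote must approve.
    return all(latest.get(agent) == "approve" for agent in roster)
```

Counting only the latest verdict per agent means a reviewer can withdraw an earlier approval simply by posting a new entry.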

Why different models, not three of the same

This is the part that makes it worth the trouble. Three instances of one model share the same blind spots. Three different models catch each other.

On the project that hardened this setup — deploying a large language model across a small GPU cluster — the loop tried to declare victory early. The independent validator refused to sign off, ran a real check, and caught a launch-configuration defect the other two had already agreed on. One model’s blind spot was another’s obvious bug. That single catch paid for the whole process.

Setup

The system is two pieces: a small cross-platform skill (multi-agent-workspace) that teaches any agent to recognize one of these workspaces and obey the one rule, and a /collab command that scaffolds a workspace the skill recognizes.

/collab init builds the master plus its features in one shot, then regenerates the index.

Install the skill into each agent’s skills directory and the command into your interactive agent:

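# Path to the unpacked bundle; adjust to wherever you saved it.
SRC="/path/to/multi-agent-workspace-bundle"
# Create each tool's skills directory, plus Claude Code's commands directory.
mkdir -p ~/.claude/skills ~/.codex/skills ~/.gemini/config/skills ~/.claude/commands
# Copy the same skill into every agent.
cp -R "$SRC/multi-agent-workspace" ~/.claude/skills/
cp -R "$SRC/multi-agent-workspace" ~/.codex/skills/
cp -R "$SRC/multi-agent-workspace" ~/.gemini/config/skills/
# Install the /collab command for the interactive agent.
cp "$SRC/collab.md" ~/.claude/commands/collab.md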

The same SKILL.md goes into each agent, but the skills directory differs by tool, so check the paths above against your own installation.

What to know before you try it

This is glue, and glue needs maintenance: when a CLI changes, the wiring sometimes needs a tweak. The agents coordinate through files, not a live channel, so it’s polling, not messaging. And reproducing my exact setup means running multiple agents, which is a real barrier. None of that undercuts the protocol — it just sets expectations. If you try it and find a sharper version of the one-writer rule, I want to hear it.
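Polling an append-only file is cheap, because a reader only has to remember how far it has read. A minimal sketch, assuming UTF-8 text and a caller that stores the byte offset between polls (the function name is made up):

```python
from pathlib import Path


def read_new(path: Path, offset: int) -> tuple[str, int]:
    """Return the text appended since `offset` (in bytes) and the new offset."""
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume whole lines only, so a half-written entry waits for the next poll.
    end = data.rfind(b"\n") + 1
    return data[:end].decode("utf-8"), offset + end
```

An agent would call this on a timer, act on any complete entries it gets back, and keep the returned offset for the next round.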

Folder = memory. Prompt = direction. The folder is the framework.

References

Jake Van Clief, Interpretable Context Methodology: Folder Structure as Agentic Architecture — arxiv.org/abs/2603.16021. The folder = memory idea this builds on.
Google Cloud, Introducing the Open Knowledge Format (OKF) — cloud.google.com/blog/…/open-knowledge-format. A vendor-neutral standard for knowledge as a directory of markdown + YAML frontmatter — the same pattern, formalized.