$ CLI FLEET ORCHESTRATOR FOR CLAUDE CODE

Command a fleet of Claude coders. Every result, verified.

The Perfect Orchestrator turns one Claude Code session into the commander of N autonomous tmux workers. Every result gets adversarially verified by a different worker before it reaches you.

git clone https://github.com/daman8271/the-perfect-orchestrator && cd the-perfect-orchestrator && ./install.sh

MIT licensed
bash + tmux + files
zero daemons
no experimental flags
works with your Claude plan

THE PROBLEM

Every agent says “done.” Some of them are lying.

You fan out six agents and walk away. Twenty minutes later: six green checkmarks. Done, done, done — every one of them confident.

Then you look closer. One rubber-stamped your assumption instead of testing it. One declared victory on code it never ran. One reported “all tests pass” for a suite that exited before it started.

The more agents you run, the less of their work you can personally check. Autonomy without verification isn’t speed. It’s noise.

[W2] ✅ Done — all 14 tests passing.
[W4 verifier] REFUTED — test suite was never run. Exit code 127.

real pattern. independent eyes kill false positives.

HOW IT WORKS

One lead. N workers. A bus they can’t bluff.

01

SPAWN

Run orch spawn audit 6, and six full Claude Code sessions bloom in tmux panes. Real TUIs you can watch, not invisible API calls.

02

COMMAND

The lead writes each worker a brief file, dispatches, then watches: reading panes, nudging the ones that drift. The lead is alive, not a cron job.

03

VERIFY

Done-flags drop, results land. Then a different worker tries to tear each one apart. Survivors get reported. Liars get caught.

$ orch spawn audit 4 ~/prod-api
spawned orch-audit: 4 workers | workdir=/home/you/prod-api
$ orch send audit 1 --file shared/agent-1.task.md
-> W1: Read and follow the instructions… begin.
[bus] [W2] unsafe redirect in auth/login.js:114 → posted
[bus] [W3] 3 endpoints missing rate limits → posted
$ orch status audit
-- done flags --
agent-1.done agent-2.done agent-3.done
[bus] [W4 verifier] W2 finding CONFIRMED · W3 #2 REFUTED — false positive, killed
$ # survivors only. that's the point.

demo audit run, 2026-06-06 — a real fleet session replayed from the raw transcript. the verifier even corrects a finder’s miscount on camera.
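The "done-flags drop" step can be pictured as a small polling loop on the lead's side. This is a minimal sketch, not the project's actual code: it assumes the flags are files named agent-<n>.done in one shared directory (the names match the demo output, the directory layout is a guess), and count_done and wait_for_done are hypothetical helper names.

```shell
# Sketch only. ASSUMPTION: done-flags are files named agent-<n>.done in DIR.

# count_done DIR N: print how many of agent-1.done .. agent-N.done exist in DIR
count_done() {
  local dir="$1" n="$2" i=1 count=0
  while [ "$i" -le "$n" ]; do
    if [ -e "$dir/agent-$i.done" ]; then
      count=$((count + 1))
    fi
    i=$((i + 1))
  done
  echo "$count"
}

# wait_for_done DIR N [TIMEOUT_SECONDS]: return 0 once all N flags exist, 1 on timeout
wait_for_done() {
  local dir="$1" n="$2" timeout="${3:-600}" waited=0
  while [ "$(count_done "$dir" "$n")" -lt "$n" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A lead script could call wait_for_done shared 4 900 and only hand results to verifiers once it returns 0. A timeout is the cue to read the stalled pane instead of waiting longer.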

TRUST NOTHING

The only fleet where workers audit each other.

We surveyed every public Claude-fleet orchestrator we could find (and Anthropic’s experimental Agent Teams). None of them had adversarial result-verification. The closest thing, Agent Teams’ agent debate, argues during investigation, not over results. Here, adversarial verification is the core design rule: findings don’t count until a different worker has tried to kill them.

We ran this page through its own gauntlet: every claim was extracted, and a hostile panel tried to kill every load-bearing one.

15 sources read

74 claims extracted

24 confirmed of 25 panel-tested

1 refuted & deleted

the top 25 load-bearing claims went to 3-vote adversarial panels: 24 confirmed, 1 refuted; the rest logged as context.

this site was designed, written, and shipped by the very fleet it advertises.

read the full landscape survey →

the verification pass, visualized.

how it works, in 40 seconds. sound on, captions included.

THE HARNESS

Seven commands. Zero ceremony.

orch spawn <session> <N> [workdir]

Boot a fleet: N tiled Claude Code worker panes plus a shared workspace of bus, briefs, results, done-flags.

orch send <session> <n> <msg>

Type a single-line prompt into worker n. Newline submits; use --file for full briefs.

orch read <session> <n> [lines]

Capture a worker’s pane — see what it’s doing, catch stalls before they cost you.

orch status [session]

List fleets, or one fleet’s panes, done-flags, and bus tail.

orch kill <session> | --all

Tear down. Clean exit, no orphan panes.

orch doctor

Dependency check plus the allow-rules your lead session needs.

(+ orch worker — internal: what spawn runs inside each pane)
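The commands above compose into small scripts. Here is a hedged sketch of a dispatch loop built only on the orch send ... --file form shown in the demo. It assumes briefs are named agent-<n>.task.md inside one directory (the demo shows shared/agent-1.task.md; the rest of the layout is a guess), and dispatch_briefs is a hypothetical helper, not part of the CLI.

```shell
# Sketch only. ASSUMPTION: briefs are named agent-<n>.task.md inside BRIEF_DIR.

# dispatch_briefs SESSION N BRIEF_DIR: send worker i its brief, for i = 1..N
dispatch_briefs() {
  local session="$1" n="$2" brief_dir="$3" i=1
  # First pass: refuse to start if any brief is missing, so no worker runs unbriefed.
  while [ "$i" -le "$n" ]; do
    if [ ! -f "$brief_dir/agent-$i.task.md" ]; then
      echo "missing brief for worker $i: $brief_dir/agent-$i.task.md" >&2
      return 1
    fi
    i=$((i + 1))
  done
  # Second pass: dispatch each brief with the --file form from the demo transcript.
  i=1
  while [ "$i" -le "$n" ]; do
    orch send "$session" "$i" --file "$brief_dir/agent-$i.task.md"
    i=$((i + 1))
  done
}
```

Checking every brief before sending any keeps the fleet from starting half-briefed.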

LEARNED IN PRODUCTION

Patterns with scar tissue.

This page practices these patterns: a 4-worker fleet built v1 in one evening. An 8-agent panel then filed 38 findings, each adversarially verified before a fix shipped. The launch survey tested 25 claims; 24 confirmed, 1 refuted and deleted.

Find → verify

Some workers FIND, a different worker VERIFIES each finding. Independent eyes kill false positives.
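One simple way to guarantee "a different worker" is a round-robin pairing: worker i's finding goes to worker i+1, and the last wraps around to the first. This is an illustrative assumption, not how the lead necessarily assigns verifiers (the demo uses one dedicated verifier, W4). assign_verifiers is a hypothetical name.

```shell
# Sketch only. ASSUMPTION: round-robin pairing; the real lead may assign differently.

# assign_verifiers N: print "finder verifier" pairs, one per line, never pairing a worker with itself
assign_verifiers() {
  local n="$1" i=1
  if [ "$n" -lt 2 ]; then
    echo "need at least 2 workers so nobody verifies their own result" >&2
    return 1
  fi
  while [ "$i" -le "$n" ]; do
    # i % n + 1 maps 1->2, 2->3, ..., n->1
    echo "$i $((i % n + 1))"
    i=$((i + 1))
  done
}
```

With four workers this pairs 1 with 2, 2 with 3, 3 with 4, and 4 with 1.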

Brief neutrally

Don’t pre-declare the cause — ask “verify whether X is wrong”, never “fix the X bug”. Workers investigate instead of rubber-stamping. This has caught the lead being wrong, repeatedly.
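A neutral brief can be as small as a four-line template. The sketch below writes one to a file. The wording and the write_neutral_brief helper are illustrative assumptions, not the project's actual brief format.

```shell
# Sketch only. ASSUMPTION: this template wording is illustrative, not the project's format.

# write_neutral_brief FILE CLAIM: write a brief that asks for verification, not a fix
write_neutral_brief() {
  local file="$1" claim="$2"
  cat > "$file" <<EOF
Task: verify whether the following claim is true.
Claim: $claim
Do not assume the claim is correct. Reproduce it, or show evidence that it is false.
Report exactly what you ran and what you observed.
EOF
}
```

The brief states the claim as something to test, so a worker that finds the claim false has a legitimate result to report.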

One owner per file

Parallel edits collide. One owner per shared file; everyone else requests changes via the bus.

Commit local, lead pushes

Workers never push. They commit only their own files behind a lock; the lead reviews every commit before anything leaves the machine.
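The lock can be as plain as a directory. mkdir either creates the directory or fails, atomically, so only one worker at a time gets through. This is a sketch under stated assumptions: the lock path, the LOCK_TRIES variable, and the with_lock helper are all hypothetical, and the project's own lock may work differently.

```shell
# Sketch only. ASSUMPTION: a mkdir-based lock; LOCK_TRIES and with_lock are hypothetical names.

# with_lock LOCK_DIR COMMAND [ARGS...]: run COMMAND while holding LOCK_DIR, then release it
with_lock() {
  local lock="$1" max="${LOCK_TRIES:-30}" tries=0 status=0
  shift
  # Retry once per second until mkdir succeeds or we run out of attempts.
  until mkdir "$lock" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      echo "lock busy: $lock" >&2
      return 1
    fi
    sleep 1
  done
  # Run the command, remember its exit status, and always release the lock.
  "$@" || status=$?
  rmdir "$lock"
  return "$status"
}

# Example (hypothetical paths): commit only this worker's own file while holding the lock.
# with_lock shared/commit.lock git commit -m "W2: fix redirect" -- auth/login.js
```

The command's exit status is passed through, so a failed commit still surfaces as a failure after the lock is released.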

3× cross-verify

For high stakes: round one audits, round two adversarially refutes, round three confirms the survivors.
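The three rounds reduce to a filter: a finding survives only if round two failed to refute it and round three confirmed it. The sketch below assumes each round writes a plain file of "<finding-id> <VERDICT>" lines. That file format and the survivors helper are hypothetical; only the CONFIRMED and REFUTED verdict words come from the demo.

```shell
# Sketch only. ASSUMPTION: each round leaves a file of "<finding-id> <VERDICT>" lines.

# survivors ROUND2_FILE ROUND3_FILE: print ids not REFUTED in round two and CONFIRMED in round three
survivors() {
  awk '
    # While reading the round-two file, remember every refuted finding.
    FILENAME == ARGV[1] { if ($2 == "REFUTED") refuted[$1] = 1; next }
    # While reading the round-three file, keep confirmed findings that were never refuted.
    $2 == "CONFIRMED" && !($1 in refuted) { print $1 }
  ' "$1" "$2"
}
```

A finding that round three confirms but round two refuted is still dropped.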

THE LANDSCAPE

We checked everyone. Then built what was missing.

Capability	The Perfect Orchestrator	Tmux-Orchestrator	claude-squad	claude-tmux-orchestration	Claude Code Agent Teams
Lead is a live Claude session	✅	✅	❌ human-driven TUI	⚠️ external bash heartbeat	✅
Interactive TUI workers (watch them live)	✅	✅	✅	✅	⚠️ optional
File-based message bus	✅ `bus.md`	❌ send-keys only	❌	✅ JSON files	⚠️ mailbox (pushed)
Per-worker task brief files	✅	❌	❌	⚠️ injected prompts	✅ task list
Done-flag completion	✅	❌ poll panes	❌	✅	✅
Adversarial verification of results	✅ core design	❌	❌	❌	⚠️ debate-during-investigation only
Status (2026-06)	active	dormant since 2025-07	active (human multiplexer)	active	experimental, env-flagged

survey snapshot 2026-06-05 — full sources in docs/LANDSCAPE.md

$ whoami

You don’t babysit sessions anymore.

You give the order once.

Your terminal becomes mission control.

Made for vibe coders.

the people who ship at 3am with eight panes glowing.

60 SECONDS TO A FLEET

Start commanding.

Just want the /orch skill? Ten seconds, inside Claude Code:

claude plugin marketplace add daman8271/the-perfect-orchestrator && claude plugin install orchestrator@the-perfect-orchestrator

The full orch CLI takes the classic path:

git clone https://github.com/daman8271/the-perfect-orchestrator && cd the-perfect-orchestrator
./install.sh && orch doctor
# inside Claude Code: /orch — then describe the job

or read the docs →

if it earned it — a star helps others find this.

Questions you’re right to ask.

What does a fleet cost in tokens?

Workers are full Claude Code sessions on your existing plan: no API-key markup, no middleman. A 4-worker audit consumes roughly what 4 interactive sessions would, so size fleets to the work: spin up six workers for a cross-module sweep, not for a one-file fix.

Is verification automatic?

It’s step 8 of the lead’s protocol, not an optional extra: before reporting, the lead has a different worker try to refute each result, and signs off only on the survivors. The /orch skill bakes the loop in: spawn, brief, monitor, collect, verify, then report. A first-class orch verify command (auto-spawned refuters per done-flag) is on the roadmap.

Is it safe to run autonomous workers?

Workers run in auto mode with a generous allowlist: git, node, python, file edits, network fetches. That power is deliberate, and so are the limits: rm is not allowlisted, so destructive deletes still hit the permission classifier instead of running blind. Workers share your Claude credentials but get an isolated config dir — your hooks, history, and settings are untouched. Run fleets on a VPS or in a container, in repos you can git reset.

Why tmux instead of a headless SDK?

Observability is the feature. Every worker is a real TUI in a real pane. You can watch it think, catch it stalling, and nudge it mid-task. Headless API calls hide exactly the behavior you most need to see when you’re trusting agents with real work.

What about Anthropic’s Agent Teams?

Agent Teams is experimental and gated behind an env flag. The Perfect Orchestrator runs today on stock Claude Code with no experimental flags, and it adds the adversarial verification layer Agent Teams doesn’t have. A compatibility layer is on the roadmap.

When should I not use this?

A single quick task. Spawning a fleet to fix one bug is ceremony, not leverage — just do it yourself in one session. Reach for the fleet when the work genuinely parallelizes: audits, sweeps, multi-file migrations, anything with independent lanes.