A team built a system with 8 collaborating agents. It worked beautifully in the demo. In production, the agents stepped on each other's outputs, retried the same work in parallel, and produced inconsistent results across runs. Most of the team's debugging boiled down to "which agent did this, and why does its output disagree with another agent's?"
Multi-agent systems are orchestras. The orchestration is most of the engineering. Without an explicit orchestration layer, you don't have a multi-agent system; you have multiple agents in a fight.
Orchestration patterns
Three common patterns:
Pipeline. Agents in series. Output of A is input to B. Simple. Hard to mess up.
Hierarchy. A coordinator agent dispatches sub-agents and merges results. The coordinator is the source of truth.
Marketplace. Agents bid on tasks; coordinator selects. More complex; sometimes useful for diverse capabilities.
Pick one. Mix them only when the cost of complexity is justified.
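The pipeline pattern is the simplest to sketch. A minimal sketch, where each "agent" is reduced to a plain callable and `run_pipeline`, `agent_a`, and `agent_b` are hypothetical names, not from any specific framework:

```python
from typing import Callable

# An "agent" here is just a callable from text to text; in a real system
# it would wrap model calls and tools.
Agent = Callable[[str], str]

def run_pipeline(agents: list[Agent], task: str) -> str:
    """Run agents in series: the output of each is the input to the next."""
    result = task
    for agent in agents:
        result = agent(result)
    return result

# Toy agents standing in for model-backed ones.
agent_a = lambda text: text.upper()          # stand-in "researcher"
agent_b = lambda text: f"summary: {text}"    # stand-in "writer"

print(run_pipeline([agent_a, agent_b], "find recent papers"))
# prints "summary: FIND RECENT PAPERS"
```

The same loop is why pipelines are hard to mess up: there is exactly one data path, so there is nothing for agents to step on.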
Failure-isolation rules
Each agent's failure must not cascade:
- A failed sub-agent doesn't take down the coordinator.
- A failed sub-agent's output doesn't poison other sub-agents' work.
- Recovery happens at the coordinator level; failures are handled there, not propagated upward.
Without failure isolation, multi-agent systems are more fragile than single-agent ones. With it, they can be more robust.
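The rules above can be sketched at the coordinator. This is a minimal illustration, not a production supervisor; `coordinate` and the toy agents are hypothetical names:

```python
def coordinate(subagents: dict, task: str):
    """Dispatch each sub-agent; record failures instead of propagating them."""
    results, failures = {}, {}
    for name, agent in subagents.items():
        try:
            results[name] = agent(task)
        except Exception as exc:
            # Isolation: the coordinator survives, and the failed agent's
            # (missing) output never reaches the other agents' inputs.
            failures[name] = str(exc)
    return results, failures

def flaky(task):
    raise RuntimeError("tool timeout")

results, failures = coordinate(
    {"researcher": lambda t: f"notes on {t}", "critic": flaky},
    "topic X",
)
# results holds only the researcher's output; the critic's failure is
# quarantined in `failures` for the coordinator to decide on (retry, skip).
```

The key design choice is that failed outputs land in a separate structure, so no downstream agent can consume them by accident.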
Cost accounting
Multi-agent costs are multiplicative:
- Each agent has its own context.
- Each agent makes its own model calls.
- Total cost can be 5-10x a single-agent baseline.
Cost-aware design:
- Each sub-agent uses the cheapest model that does the job.
- Contexts are minimised; no copying of large state across agents.
- Caching where work is repeated.
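Cost accounting at the orchestration level can be as simple as a ledger. A sketch, assuming made-up per-1K-token prices and model tiers (`small`, `medium`, `large` are illustrative labels, not real provider names or rates):

```python
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices; real pricing varies by provider and model.
PRICE_PER_1K = {"small": 0.0005, "medium": 0.003, "large": 0.03}

@dataclass
class CostLedger:
    """Aggregates model-call costs per agent at the orchestration level."""
    entries: list = field(default_factory=list)

    def record(self, agent: str, model: str, tokens: int) -> float:
        cost = PRICE_PER_1K[model] * tokens / 1000
        self.entries.append((agent, model, tokens, cost))
        return cost

    def total(self) -> float:
        return sum(entry[3] for entry in self.entries)

ledger = CostLedger()
ledger.record("researcher", "small", 2000)   # cheapest model that does the job
ledger.record("writer", "large", 1000)       # quality-critical step
```

Recording the model tier alongside each call is what makes the "cheapest model that does the job" rule auditable rather than aspirational.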
Reviewer ritual
Multi-agent observability requires structured tracing:
- Each agent's actions identifiable in traces.
- Inter-agent communications logged.
- Coordinator decisions visible.
- Aggregate cost tracked at the orchestration level.
Without this, debugging multi-agent systems is detective work without forensics. With it, the team can answer "why did the system recommend X?" by following the trace.
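The tracing requirements above amount to structured log lines that share one trace ID across every agent. A minimal sketch (`trace_event` is a hypothetical helper, not a specific tracing library):

```python
import json
import time
import uuid

def trace_event(trace_id: str, agent: str, event: str, **fields) -> dict:
    """Emit one structured trace record; all agents in a run share trace_id."""
    record = {
        "trace_id": trace_id,   # joins all agents' actions into one run
        "agent": agent,         # which agent did this
        "event": event,
        "ts": time.time(),
        **fields,               # e.g. tool names, costs, coordinator decisions
    }
    print(json.dumps(record))
    return record

run_id = str(uuid.uuid4())
trace_event(run_id, "coordinator", "dispatch", target="researcher")
trace_event(run_id, "researcher", "tool_call", tool="search", cost_usd=0.004)
```

Because every record carries the same `trace_id`, "why did the system recommend X?" becomes a filter over one run rather than detective work.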
A real org
A research-synthesis system with three agents:
- Researcher. Fetches and summarises relevant material.
- Critic. Reads researcher's output, surfaces gaps and questionable claims.
- Writer. Reads researcher + critic, drafts the synthesis.
Coordinator orchestrates: researcher first, critic second, writer third. Each has scoped tools and minimal context overlap.
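The coordinator's job in this flow fits in a few lines. A sketch with toy callables in place of model-backed agents (`synthesize` is a hypothetical name; the wiring mirrors the order described above):

```python
def synthesize(task, researcher, critic, writer):
    """Coordinator for the three-agent flow: researcher, then critic, then writer."""
    notes = researcher(task)            # researcher sees only the task
    critique = critic(notes)            # critic sees only the researcher's output
    return writer(notes, critique)      # writer sees both, and nothing else

# Toy agents that show the data flow, not real model calls.
researcher = lambda task: f"notes({task})"
critic = lambda notes: f"gaps({notes})"
writer = lambda notes, critique: f"draft[{notes} | {critique}]"

print(synthesize("q", researcher, critic, writer))
# prints "draft[notes(q) | gaps(notes(q))]"
```

The scoping is the point: each agent's inputs are passed explicitly by the coordinator, which is what keeps context overlap minimal.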
The system is more capable than any single agent of the same model size. The cost is 3-5x. The team uses it for high-stakes research where the quality justifies the cost.
What we won't ship
Multi-agent systems without an explicit orchestration layer.
Agents that talk to each other freely without coordinator mediation.
Multi-agent systems without the eval discipline to validate them.
Multi-agent systems when a single agent of the same model can do the work.
Close
Multi-agent orchestration is engineering. The pattern is chosen explicitly. The failure isolation is engineered. The cost is accounted for. The traces are structured. Skip any of these and the multi-agent system is more brittle than the single-agent baseline it was supposed to improve on.
Related reading
- Sub-agents: when 1+1 actually equals 2 — building block.
- Plan vs. act — single-agent baseline.
- Your AI agent should plan like a kitchen brigade — the metaphor.
We build AI-enabled software and help businesses put AI to work. If you're orchestrating multi-agent systems, we'd love to hear about it. Get in touch.