Background
This was one of our most technically ambitious projects: building a multi-agent orchestration platform that could take high-level goals and decompose them into coordinated, autonomous task chains executed by specialised AI agents.
The use case: a content and research business that needed to automate complex multi-step workflows. Tasks like “research competitor X, extract their top 10 content pieces, identify content gaps, draft a brief for each gap, and post to Notion” used to take a human 3 to 4 hours.
The Architecture Challenge
Single-agent AI systems are relatively straightforward. Multi-agent systems, where multiple specialised agents collaborate, hand off tasks, and share context, introduce a different class of problems:
Problems we needed to solve:
- Task decomposition: How do you reliably break a high-level goal into executable sub-tasks?
- Agent routing: How do you decide which specialised agent handles which sub-task?
- Context passing: How do downstream agents receive the right context from upstream agents?
- Failure recovery: What happens when one agent in a chain fails or produces bad output?
- Observability: How does the operator know what’s happening inside a running agent chain?
The Solution
We built a central Orchestrator that sits above all specialised agents. It receives the high-level goal, uses an LLM call to produce a structured execution plan (JSON), and then dispatches tasks to the appropriate agents in the right sequence.
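The write-up doesn't reproduce the plan schema, but a minimal sketch helps make the idea concrete. The field names below (`steps`, `agent`, `depends_on`, `parallel`) are illustrative assumptions, not the production format:

```python
import json

# Hypothetical execution plan, as the orchestrator's LLM call might emit it.
# Field names are illustrative assumptions, not the exact production schema.
plan_json = """
{
  "goal": "Research [topic], identify 5 content gaps, draft briefs for each",
  "steps": [
    {"id": 1, "agent": "research",  "depends_on": []},
    {"id": 2, "agent": "analysis",  "depends_on": [1]},
    {"id": 3, "agent": "writer",    "depends_on": [2], "parallel": 5},
    {"id": 4, "agent": "formatter", "depends_on": [3]},
    {"id": 5, "agent": "publisher", "depends_on": [4]}
  ]
}
"""

plan = json.loads(plan_json)

# A quick structural check before dispatch: every dependency must refer
# to an earlier step, so the plan forms a valid execution order.
ids = {step["id"] for step in plan["steps"]}
for step in plan["steps"]:
    assert all(dep in ids and dep < step["id"] for dep in step["depends_on"])
```

Requesting structured JSON rather than free-form text is what makes the plan machine-dispatchable: the orchestrator can validate it before any agent runs.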
Agent Types Built
| Agent | Responsibility |
|---|---|
| Research Agent | Web search, content extraction, source validation |
| Analysis Agent | Pattern recognition, gap identification, summarisation |
| Writer Agent | Long-form content drafting from structured briefs |
| Formatter Agent | Markdown, HTML, Notion, or Slack output formatting |
| Publisher Agent | Push content to Notion, Airtable, Google Docs, or Slack |
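With five agent types behind one orchestrator, a shared contract keeps dispatch uniform. The real platform's classes aren't shown in the write-up; this is a minimal sketch of what such a contract could look like (all names are assumptions):

```python
from dataclasses import dataclass, field
from typing import Protocol

# Hypothetical shared result type: every agent returns the same shape,
# so the orchestrator can route outputs and track token costs uniformly.
@dataclass
class TaskResult:
    ok: bool
    output: dict = field(default_factory=dict)
    tokens_used: int = 0  # feeds the per-agent cost tracking

# Hypothetical agent interface: any specialised agent just needs a name
# and a run() method taking the task plus upstream context.
class Agent(Protocol):
    name: str
    def run(self, task: dict, context: dict) -> TaskResult: ...

# A trivial stand-in demonstrating the contract:
@dataclass
class EchoAgent:
    name: str = "echo"
    def run(self, task: dict, context: dict) -> TaskResult:
        return TaskResult(ok=True, output={"echoed": task})
```

A uniform interface like this is also what makes context passing tractable: the orchestrator hands each agent the prior step's `output` as `context` without caring which agent produced it.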
Orchestration Flow
```
Goal: "Research [topic], identify 5 content gaps, draft briefs for each"

Orchestrator → produces execution plan:

Step 1: Research Agent (searches, extracts, returns structured data)
Step 2: Analysis Agent (receives Step 1 output, identifies gaps)
Step 3 (parallel): Writer Agent × 5 (one per gap, runs concurrently)
Step 4: Formatter Agent (assembles all drafts)
Step 5: Publisher Agent (posts to Notion)
```
Step 3 runs in parallel: all 5 writer agents execute simultaneously, reducing total time by 4× compared to sequential execution.
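The fan-out pattern for Step 3 can be sketched in a few lines. The production system dispatches via Celery; here `ThreadPoolExecutor` stands in, and `draft_brief` is a placeholder for the real LLM-backed Writer Agent call:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for the Writer Agent's LLM call (assumption, not the real code).
def draft_brief(gap: str) -> str:
    return f"Brief for: {gap}"

def run_writers(gaps: list[str]) -> list[str]:
    # One worker per gap: all drafts run concurrently.
    # executor.map preserves input order, so drafts line up with gaps.
    with ThreadPoolExecutor(max_workers=len(gaps)) as pool:
        return list(pool.map(draft_brief, gaps))

drafts = run_writers([f"gap-{i}" for i in range(1, 6)])
```

Because each writer's wall-clock time is dominated by waiting on the LLM API, concurrent dispatch collapses five sequential waits into roughly one.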
Failure Recovery
Each agent step has a retry policy with exponential backoff. If an agent fails after 3 retries, the orchestrator:
- Logs the failure with full context
- Attempts a fallback strategy (different model, simpler prompt)
- If still failing, marks the step as degraded and continues with partial output
- Alerts the operator via webhook
This means a single agent failure doesn’t bring down the entire chain.
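The recovery policy above can be sketched as a wrapper around each step. The 3-retry limit, exponential backoff, fallback strategy, and "degraded" marker follow the text; the function names and return shape are assumptions:

```python
import time

# Sketch of the per-step retry policy: exponential backoff, then a
# fallback strategy, then a "degraded" marker so the chain can continue.
# run_step / fallback_step stand in for the real agent invocations.
def execute_with_recovery(run_step, fallback_step,
                          retries: int = 3, base_delay: float = 1.0) -> dict:
    for attempt in range(retries):
        try:
            return {"status": "ok", "output": run_step()}
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    try:
        # Fallback strategy: e.g. a different model or a simpler prompt.
        return {"status": "fallback", "output": fallback_step()}
    except Exception:
        # Mark degraded and continue with partial output; the real
        # orchestrator also logs full context and fires the operator webhook.
        return {"status": "degraded", "output": None}
```

The key design choice is that failure is a status on the step result, not an exception that propagates up and kills the chain.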
Observability Dashboard
One of the non-obvious requirements: the humans overseeing this system needed to see what was happening in real time. We built a live dashboard showing:
- Active agent chains with step-by-step status
- Token usage per agent per run (cost tracking)
- Success/failure rates per agent type over time
- Average chain completion time by goal type
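A dashboard like this is driven by per-step events on the event bus. As an illustrative sketch (the field names are assumptions, not the production schema), each agent step might emit a record like:

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical per-step event pushed onto the event bus for the live
# dashboard; field names are illustrative, not the production schema.
@dataclass
class StepEvent:
    chain_id: str
    step: int
    agent: str
    status: str        # "running" | "ok" | "degraded" | "failed"
    tokens_used: int   # drives per-agent cost tracking
    ts: float          # unix timestamp, for completion-time analytics

def emit(event: StepEvent) -> str:
    # In production this would POST to the webhook / event bus;
    # here we just serialise the payload.
    return json.dumps(asdict(event))

payload = emit(StepEvent("run-42", 1, "research", "ok", 1800, time.time()))
```

Aggregating these events in PostgreSQL is enough to derive every dashboard view listed above: status per chain, token cost per agent, success rates, and completion times.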
The Stack
- Orchestration: Custom Python service + LangGraph for complex agent graphs
- LLMs: Claude 3.5 Sonnet (orchestrator + writer) + Claude Haiku (research + formatting)
- Memory: Redis for short-term context passing between agents
- Queue: Celery + Redis for parallel agent dispatch
- Database: PostgreSQL for run history and analytics
- Dashboard: Next.js + Recharts
- Webhooks: Custom event bus for real-time status updates
Results
| Metric | Manual Process | AI Agent System |
|---|---|---|
| Time per full workflow | 3 to 4 hours | 8 to 12 minutes |
| Concurrent workflows | 1 (one human) | 200+ |
| Consistency | Variable | Highly consistent |
| Cost per workflow | $45 to $60 in labour | $0.80 to $1.20 in API costs |
The platform now runs hundreds of concurrent task chains. The team it replaced wasn’t eliminated. They were redirected to the tasks that actually required human judgement: strategy, client relationships, and quality review.
Build time: 6 weeks from architecture design to production deployment.