Background
This was one of our most technically ambitious projects: building a multi-agent orchestration platform that could take high-level goals and decompose them into coordinated, autonomous task chains executed by specialised AI agents.
The use case: a content and research business that needed to automate complex multi-step workflows. Tasks like “research competitor X, extract their top 10 content pieces, identify content gaps, draft a brief for each gap, and post to Notion” used to take a human 3 to 4 hours.
The Architecture Challenge
Single-agent AI systems are relatively straightforward. Multi-agent systems, where multiple specialised agents collaborate, hand off tasks, and share context, introduce a different class of problems:
Problems we needed to solve:
- Task decomposition: How do you reliably break a high-level goal into executable sub-tasks?
- Agent routing: How do you decide which specialised agent handles which sub-task?
- Context passing: How do downstream agents receive the right context from upstream agents?
- Failure recovery: What happens when one agent in a chain fails or produces bad output?
- Observability: How does the operator know what’s happening inside a running agent chain?
The Solution
We built a central Orchestrator that sits above all specialised agents. It receives the high-level goal, uses an LLM call to produce a structured execution plan (JSON), and then dispatches tasks to the appropriate agents in the right sequence.
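The write-up doesn't reproduce the plan schema, but a minimal sketch helps make the idea concrete. The field names below (`steps`, `agent`, `depends_on`, `parallel`) are illustrative assumptions, not the production format:

```python
import json

# Hypothetical execution plan, as the orchestrator's LLM call might emit it.
# Field names are illustrative assumptions, not the exact production schema.
plan_json = """
{
  "goal": "Research [topic], identify 5 content gaps, draft briefs for each",
  "steps": [
    {"id": 1, "agent": "research",  "depends_on": []},
    {"id": 2, "agent": "analysis",  "depends_on": [1]},
    {"id": 3, "agent": "writer",    "depends_on": [2], "parallel": 5},
    {"id": 4, "agent": "formatter", "depends_on": [3]},
    {"id": 5, "agent": "publisher", "depends_on": [4]}
  ]
}
"""

plan = json.loads(plan_json)

# A quick structural check before dispatch: every dependency must refer
# to an earlier step, so the plan forms a valid execution order.
ids = {step["id"] for step in plan["steps"]}
for step in plan["steps"]:
    assert all(dep in ids and dep < step["id"] for dep in step["depends_on"])
```

Requesting structured JSON rather than free-form text is what makes the plan machine-dispatchable: the orchestrator can validate it before any agent runs.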
Agent Types Built
| Agent | Responsibility |
|---|---|
| Research Agent | Web search, content extraction, source validation |
| Analysis Agent | Pattern recognition, gap identification, summarisation |
| Writer Agent | Long-form content drafting from structured briefs |
| Formatter Agent | Markdown, HTML, Notion, or Slack output formatting |
| Publisher Agent | Push content to Notion, Airtable, Google Docs, or Slack |
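With five agent types behind one orchestrator, a shared contract keeps dispatch uniform. The real platform's classes aren't shown in the write-up; this is a minimal sketch of what such a contract could look like (all names are assumptions):

```python
from dataclasses import dataclass, field
from typing import Protocol

# Hypothetical shared result type: every agent returns the same shape,
# so the orchestrator can route outputs and track token costs uniformly.
@dataclass
class TaskResult:
    ok: bool
    output: dict = field(default_factory=dict)
    tokens_used: int = 0  # feeds the per-agent cost tracking

# Hypothetical agent interface: any specialised agent just needs a name
# and a run() method taking the task plus upstream context.
class Agent(Protocol):
    name: str
    def run(self, task: dict, context: dict) -> TaskResult: ...

# A trivial stand-in demonstrating the contract:
@dataclass
class EchoAgent:
    name: str = "echo"
    def run(self, task: dict, context: dict) -> TaskResult:
        return TaskResult(ok=True, output={"echoed": task})
```

A uniform interface like this is also what makes context passing tractable: the orchestrator hands each agent the prior step's `output` as `context` without caring which agent produced it.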
Orchestration Flow
```
Goal: "Research [topic], identify 5 content gaps, draft briefs for each"

Orchestrator → produces execution plan:

Step 1: Research Agent (searches, extracts, returns structured data)
Step 2: Analysis Agent (receives Step 1 output, identifies gaps)
Step 3 (parallel): Writer Agent × 5 (one per gap, runs concurrently)
Step 4: Formatter Agent (assembles all drafts)
Step 5: Publisher Agent (posts to Notion)
```
Step 3 runs in parallel: all 5 writer agents execute simultaneously, reducing total time by 4× compared to sequential execution.
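The fan-out pattern for Step 3 can be sketched in a few lines. The production system dispatches via Celery; here `ThreadPoolExecutor` stands in, and `draft_brief` is a placeholder for the real LLM-backed Writer Agent call:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for the Writer Agent's LLM call (assumption, not the real code).
def draft_brief(gap: str) -> str:
    return f"Brief for: {gap}"

def run_writers(gaps: list[str]) -> list[str]:
    # One worker per gap: all drafts run concurrently.
    # executor.map preserves input order, so drafts line up with gaps.
    with ThreadPoolExecutor(max_workers=len(gaps)) as pool:
        return list(pool.map(draft_brief, gaps))

drafts = run_writers([f"gap-{i}" for i in range(1, 6)])
```

Because each writer's wall-clock time is dominated by waiting on the LLM API, concurrent dispatch collapses five sequential waits into roughly one.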
Failure Recovery
Each agent step has a retry policy with exponential backoff. If an agent fails after 3 retries, the orchestrator:
- Logs the failure with full context
- Attempts a fallback strategy (different model, simpler prompt)
- If still failing, marks the step as degraded and continues with partial output
- Alerts the operator via webhook
This means a single agent failure doesn’t bring down the entire chain.
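The recovery policy above can be sketched as a wrapper around each step. The 3-retry limit, exponential backoff, fallback strategy, and "degraded" marker follow the text; the function names and return shape are assumptions:

```python
import time

# Sketch of the per-step retry policy: exponential backoff, then a
# fallback strategy, then a "degraded" marker so the chain can continue.
# run_step / fallback_step stand in for the real agent invocations.
def execute_with_recovery(run_step, fallback_step,
                          retries: int = 3, base_delay: float = 1.0) -> dict:
    for attempt in range(retries):
        try:
            return {"status": "ok", "output": run_step()}
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    try:
        # Fallback strategy: e.g. a different model or a simpler prompt.
        return {"status": "fallback", "output": fallback_step()}
    except Exception:
        # Mark degraded and continue with partial output; the real
        # orchestrator also logs full context and fires the operator webhook.
        return {"status": "degraded", "output": None}
```

The key design choice is that failure is a status on the step result, not an exception that propagates up and kills the chain.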
Observability Dashboard
One of the non-obvious requirements: the humans overseeing this system needed to see what was happening in real time. We built a live dashboard showing:
- Active agent chains with step-by-step status
- Token usage per agent per run (cost tracking)
- Success/failure rates per agent type over time
- Average chain completion time by goal type
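A dashboard like this is driven by per-step events on the event bus. As an illustrative sketch (the field names are assumptions, not the production schema), each agent step might emit a record like:

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical per-step event pushed onto the event bus for the live
# dashboard; field names are illustrative, not the production schema.
@dataclass
class StepEvent:
    chain_id: str
    step: int
    agent: str
    status: str        # "running" | "ok" | "degraded" | "failed"
    tokens_used: int   # drives per-agent cost tracking
    ts: float          # unix timestamp, for completion-time analytics

def emit(event: StepEvent) -> str:
    # In production this would POST to the webhook / event bus;
    # here we just serialise the payload.
    return json.dumps(asdict(event))

payload = emit(StepEvent("run-42", 1, "research", "ok", 1800, time.time()))
```

Aggregating these events in PostgreSQL is enough to derive every dashboard view listed above: status per chain, token cost per agent, success rates, and completion times.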
The Stack
- Orchestration: Custom Python service + LangGraph for complex agent graphs
- LLMs: Claude 3.5 Sonnet (orchestrator + writer) + Claude Haiku (research + formatting)
- Memory: Redis for short-term context passing between agents
- Queue: Celery + Redis for parallel agent dispatch
- Database: PostgreSQL for run history and analytics
- Dashboard: Next.js + Recharts
- Webhooks: Custom event bus for real-time status updates
Results
| Metric | Manual Process | AI Agent System |
|---|---|---|
| Time per full workflow | 3 to 4 hours | 8 to 12 minutes |
| Concurrent workflows | 1 (one human) | 200+ |
| Consistency | Variable | Highly consistent |
| Cost per workflow | $45 to $60 in labour | $0.80 to $1.20 in API costs |
The platform now runs hundreds of concurrent task chains. The team it replaced wasn’t eliminated. They were redirected to the tasks that actually required human judgement: strategy, client relationships, and quality review.
Build time: 6 weeks from architecture design to production deployment.