iCentric Insights

Supervisor and Sub-Agents: How One Agent Learns to Delegate

As multi-agent AI frameworks go mainstream, UK teams face a critical architectural choice: hardcode task routing or let a supervisor agent delegate dynamically. Getting it wrong is already proving costly.

June 29, 2026
AI Agents, Software Architecture, Enterprise AI

There is a moment in every maturing technology cycle when the experimental becomes the operational. For multi-agent AI systems, that moment is now. Following OpenAI's release of the Swarm framework and a wave of Anthropic research on agent orchestration, development teams across the UK are no longer debating whether to build agentic pipelines — they are debating how. The architectural decision at the centre of that debate is deceptively simple: should a supervisor agent dynamically decide which sub-agents to spawn and delegate to, or should those routing decisions be hardcoded in advance? The wrong answer is already landing organisations with runaway API bills and automation workflows that fail silently in production.

This is not an abstract engineering puzzle. It has direct consequences for cost control, reliability, and the speed at which your organisation can adapt its AI capabilities to changing business requirements. Senior decision-makers who leave this choice entirely to engineering teams — without establishing clear governance criteria — are likely to find themselves firefighting expensive surprises rather than realising the productivity gains they were promised.

Understanding the Architecture: Supervisors, Sub-Agents, and the Delegation Problem

In a multi-agent system, a supervisor agent acts as an orchestrator. It receives a high-level task, reasons about what needs to be done, and either completes subtasks itself or delegates them to specialised sub-agents — each of which may call tools, query databases, browse the web, write code, or interact with external APIs. The supervisor may also spawn sub-agents dynamically at runtime, choosing which agent to invoke based on the nature of the incoming task rather than a fixed script.

The alternative is a statically routed pipeline: the developer defines in advance exactly which agent handles which type of task, in what sequence, and under what conditions. This is sometimes called a 'hardcoded' or 'deterministic' routing architecture. Both approaches have legitimate uses, but conflating them — or defaulting to dynamic delegation because it feels more intelligent — is where many teams are currently going wrong. The flexibility of dynamic spawning comes with a cost in tokens, latency, and observability that is easy to underestimate until you are reading a very large cloud invoice.
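The following minimal Python sketch shows the two approaches side by side. The agent names, the `ROUTES` table and the `choose_agent` callable are all hypothetical. In a real system `choose_agent` would wrap an LLM call. Here a keyword stub replaces it so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical sub-agents: each takes a task description and returns a result.
SubAgent = Callable[[str], str]
SUB_AGENTS: Dict[str, SubAgent] = {
    "billing": lambda task: f"billing handled: {task}",
    "technical": lambda task: f"technical handled: {task}",
}

# Static routing: the task-type -> agent mapping is fixed at design time.
ROUTES: Dict[str, str] = {"invoice": "billing", "outage": "technical"}

def route_static(task_type: str, task: str) -> str:
    # An unknown task type raises KeyError, so gaps in the table fail loudly.
    return SUB_AGENTS[ROUTES[task_type]](task)

# Dynamic delegation: a supervisor picks the agent at runtime.
def route_dynamic(task: str, choose_agent: Callable[[str, List[str]], str]) -> str:
    name = choose_agent(task, sorted(SUB_AGENTS))  # in practice, an LLM call
    if name not in SUB_AGENTS:
        raise ValueError(f"supervisor chose unknown agent {name!r}")
    return SUB_AGENTS[name](task)

def keyword_stub(task: str, options: List[str]) -> str:
    # Offline stand-in for the model's decision, so the sketch runs without an API key.
    return "billing" if "refund" in task.lower() else "technical"

print(route_static("invoice", "March invoice"))      # billing handled: March invoice
print(route_dynamic("Refund request", keyword_stub))  # billing handled: Refund request
```

The static path costs one dictionary lookup per request. The dynamic path adds a model call, with the tokens and latency that call brings. It also produces a decision that has to be logged if you want to explain it later.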

When Dynamic Delegation Earns Its Keep

Dynamic supervisor-driven delegation genuinely excels in scenarios where the nature of incoming tasks is highly variable and cannot be reliably anticipated at design time. Consider a legal services firm building an AI assistant that handles queries ranging from contract review to regulatory research to client correspondence drafting. The breadth and unpredictability of those tasks make it impractical to enumerate every routing rule. A well-designed supervisor agent can assess the incoming request, reason about which specialist sub-agent is most appropriate, and delegate accordingly, including handling edge cases that would fall through the gaps of any static ruleset.

Dynamic architectures also provide a meaningful advantage when your sub-agent roster is expected to grow over time. If your organisation is building a platform where new specialist agents will be added incrementally — say, as different business units onboard — a supervisor that can discover and delegate to new agents without requiring a redeployment of the core routing logic is a genuine architectural asset. The key word, however, is 'well-designed'. A supervisor that is given insufficient context, poorly scoped tools, or ambiguous instructions will hallucinate delegation decisions, call sub-agents unnecessarily, and compound errors across the pipeline.
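The growing-roster pattern can be sketched as a registry that new agents are added to at runtime. The `AgentRegistry` class and its methods are hypothetical and do not come from any particular framework. The sketch makes two points. First, the supervisor's view of available agents is rebuilt from the registry on each call. Second, a delegation to any name that is not registered is rejected rather than attempted, which guards against hallucinated delegation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class AgentRegistry:
    """Hypothetical registry that business units add sub-agents to at runtime."""
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    descriptions: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, description: str, handler: Callable[[str], str]) -> None:
        self.handlers[name] = handler
        self.descriptions[name] = description

    def catalogue(self) -> str:
        # Rebuilt on every call and injected into the supervisor's prompt, so a
        # newly registered agent is visible without redeploying routing code.
        return "\n".join(f"- {n}: {d}" for n, d in sorted(self.descriptions.items()))

    def delegate(self, agent_name: str, task: str) -> str:
        # Reject hallucinated delegation: only registered names can be invoked.
        if agent_name not in self.handlers:
            raise LookupError(f"no registered agent named {agent_name!r}")
        return self.handlers[agent_name](task)

registry = AgentRegistry()
registry.register("contracts", "Reviews commercial contracts", lambda t: f"reviewed: {t}")
registry.register("regulatory", "Researches regulatory questions", lambda t: f"researched: {t}")
print(registry.catalogue())
```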

When Hardcoded Routing Is the Right Call

For the majority of enterprise automation use cases, the task domain is far more constrained than teams tend to assume at the outset. A customer service triage workflow, a document processing pipeline, or a compliance checking routine typically involves a defined set of task types with predictable inputs and outputs. In these scenarios, hardcoded routing — where the developer explicitly defines which agent handles which task type — is not a limitation. It is a feature. It is auditable, cost-predictable, easier to test, and significantly simpler to debug when something goes wrong.
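The following sketch illustrates the "auditable and easier to test" point. The whole routing policy for a triage workflow can be expressed as plain data, and a one-line check can confirm that every task type has a route. The task types and agent names are hypothetical.

```python
from enum import Enum

class TaskType(Enum):
    # Hypothetical task types for a customer-service triage workflow.
    BILLING = "billing"
    DELIVERY = "delivery"
    COMPLAINT = "complaint"

# The entire routing policy is plain data: easy to review, diff and audit.
ROUTING_TABLE = {
    TaskType.BILLING: "billing_agent",
    TaskType.DELIVERY: "logistics_agent",
    TaskType.COMPLAINT: "complaints_agent",
}

def route(task_type: TaskType) -> str:
    return ROUTING_TABLE[task_type]

def unrouted_types() -> list:
    """Task types with no route; asserting this is empty is a cheap CI check."""
    return [t for t in TaskType if t not in ROUTING_TABLE]

assert unrouted_types() == []
print(route(TaskType.DELIVERY))  # logistics_agent
```

If someone adds a task type and forgets its route, the CI check fails before deployment rather than in production.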

Hardcoded architectures also make it substantially easier to enforce compliance requirements, which is a non-trivial consideration for UK organisations operating under FCA oversight, NHS data governance frameworks, or GDPR obligations. When a regulator asks you to demonstrate exactly how a decision was reached, a deterministic routing graph is far easier to explain than a supervisor agent that made a runtime judgement call based on a prompt. If your task domain is stable, your compliance requirements are strict, and your team's AI observability tooling is still maturing, start with deterministic routing and earn the right to introduce dynamic delegation gradually.
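One way to make a deterministic graph easy to explain to a regulator is to have the router return the rule that fired alongside the chosen agent. The following is a minimal sketch of that idea. The rule identifiers, request fields and agent names are hypothetical, and the output would be written to whatever audit store you already use.

```python
import json
from datetime import datetime, timezone
from typing import Callable, List, NamedTuple

class Rule(NamedTuple):
    rule_id: str
    matches: Callable[[dict], bool]
    agent: str

# Hypothetical ordered rules: the first match wins, so every decision is reproducible.
RULES: List[Rule] = [
    Rule("R1-vulnerable-customer", lambda r: r.get("vulnerable", False), "human_review"),
    Rule("R2-complaint", lambda r: r.get("type") == "complaint", "complaints_agent"),
    Rule("R3-default", lambda r: True, "general_agent"),
]

def route_with_audit(request: dict) -> dict:
    """Return the routing decision together with the rule that produced it."""
    for rule in RULES:
        if rule.matches(request):
            return {
                "request_id": request["id"],
                "rule_id": rule.rule_id,
                "agent": rule.agent,
                "decided_at": datetime.now(timezone.utc).isoformat(),
            }
    raise RuntimeError("no rule matched")  # unreachable while R3-default is last

print(json.dumps(route_with_audit({"id": "42", "type": "complaint"})))
```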

The Cost and Observability Problem Teams Are Underestimating

One of the least discussed risks of dynamic multi-agent architectures is the compounding token cost problem. Each time a supervisor reasons about delegation, it consumes tokens. Each sub-agent invocation consumes tokens. If a sub-agent returns an ambiguous result and the supervisor decides to re-delegate or retry, costs multiply further. In production workloads processing thousands of requests per day, poorly scoped supervisor agents can generate ten times the API spend of an equivalent deterministic pipeline — and do so unpredictably, making budget forecasting extremely difficult.
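The compounding effect, and one way to contain it, can be sketched with a per-request token budget. The token figures (800 per supervisor step, 1,500 per sub-agent call) and the `TokenBudget` class are illustrative assumptions, not measurements. Each retry pays again for both the supervisor's reasoning and the sub-agent call. Once the cap is crossed, the request stops instead of looping.

```python
class BudgetExceeded(Exception):
    pass

class TokenBudget:
    """Hypothetical per-request cap covering supervisor and sub-agent calls."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, step: str, tokens: int) -> None:
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"{step} took usage to {self.used} (cap {self.max_tokens})")

def run_with_retries(budget: TokenBudget, attempts: int,
                     supervisor_tokens: int = 800, agent_tokens: int = 1_500) -> int:
    # Every attempt pays again for supervisor reasoning AND the sub-agent call.
    for n in range(1, attempts + 1):
        budget.charge(f"supervisor#{n}", supervisor_tokens)
        budget.charge(f"sub-agent#{n}", agent_tokens)
    return budget.used

print(run_with_retries(TokenBudget(10_000), attempts=1))  # 2300
print(run_with_retries(TokenBudget(10_000), attempts=3))  # 6900
```

This sketch grows cost linearly with retries. In practice, retries often cost more than the first attempt because the conversation context accumulates, so treat the linear figure as a floor.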

Observability is the second underestimated challenge. In a statically routed system, logs and traces map cleanly to a known graph of steps. In a dynamic system, the path through the agent graph varies with every request. Without purpose-built tracing tooling (platforms such as LangSmith or Weights & Biases, or bespoke observability layers built on OpenTelemetry), engineering teams are effectively flying blind. Diagnosing why a particular request failed, looped, or produced an unexpected output becomes a forensic exercise rather than a routine operational task. UK teams adopting multi-agent architectures should treat observability infrastructure as a prerequisite, not an afterthought.

The practical advice here is not to choose one architecture religion and apply it universally. It is to apply the right architecture to the right problem — and to be honest with yourself about which category your current use case actually falls into. Start by mapping the variability of your task domain. If more than 80 per cent of your incoming tasks can be categorised into a finite set of types, build deterministic routing first. Introduce dynamic delegation only where the remaining variability genuinely justifies the cost and complexity overhead.
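The 80 per cent test can be applied to a hand-labelled sample of recent requests. In the following sketch, each sampled task has been given a category by a reviewer, or `None` if it fits no category. The function names and the example categories are hypothetical. The threshold simply mirrors the rule of thumb above.

```python
from typing import Iterable, Optional, Set

def deterministic_coverage(labels: Iterable[Optional[str]], known_types: Set[str]) -> float:
    """Share of sampled tasks that fall into a known, finite set of task types.
    Each label is the category a reviewer assigned to one sampled task; None means
    the task did not fit any category."""
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one sampled task")
    return sum(label in known_types for label in labels) / len(labels)

def recommendation(coverage: float, threshold: float = 0.8) -> str:
    # The 0.8 default mirrors the 80 per cent rule of thumb; tune it to your context.
    if coverage > threshold:
        return "build deterministic routing first"
    return "assess whether the unclassified remainder justifies dynamic delegation"

sample = ["billing"] * 8 + ["delivery", None]
cov = deterministic_coverage(sample, {"billing", "delivery"})
print(cov, recommendation(cov))
```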

Equally important is establishing governance around agent boundaries before deployment, not after. Define what each sub-agent is permitted to do, what tools it can access, and what constitutes a failure condition that should escalate to a human rather than trigger another delegation loop. The organisations getting the most reliable value from multi-agent systems right now are not the ones with the most sophisticated supervisors — they are the ones with the clearest operational boundaries. If you are currently designing or reviewing a multi-agent architecture and would like an independent assessment of where your routing strategy may be creating cost or reliability exposure, iCentric's engineering team is well placed to help.
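Agent boundaries are easier to enforce when they are written down as data and checked before every tool call, rather than left to the prompt. The following is a minimal sketch under that assumption. `AgentPolicy`, `authorise` and the tool names are hypothetical. A violation raises an exception that your pipeline would route to a human, instead of triggering another delegation.

```python
from dataclasses import dataclass
from typing import FrozenSet

class PolicyViolation(Exception):
    """Raised when an agent steps outside its agreed boundary; route to a human."""

@dataclass(frozen=True)
class AgentPolicy:
    # Hypothetical boundary definition agreed before deployment.
    name: str
    allowed_tools: FrozenSet[str]
    max_delegation_depth: int = 2

def authorise(policy: AgentPolicy, tool: str, depth: int) -> None:
    """Check a proposed tool call against the agent's policy before executing it."""
    if tool not in policy.allowed_tools:
        raise PolicyViolation(f"{policy.name} is not permitted to call {tool}")
    if depth > policy.max_delegation_depth:
        raise PolicyViolation(f"{policy.name} exceeded delegation depth {policy.max_delegation_depth}")

research = AgentPolicy("research", frozenset({"web_search", "read_document"}))
authorise(research, "web_search", depth=1)  # permitted: returns None
```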

What is the difference between OpenAI's Swarm framework and a traditional API integration?

OpenAI's Swarm is a lightweight framework designed to orchestrate multiple AI agents, enabling handoffs and delegation between them in a structured way. Unlike a traditional single API call, Swarm allows one agent to pass context and control to another agent mid-task, which suits complex, multi-step workflows. OpenAI describes Swarm as experimental and not intended for production use, and now points developers to the OpenAI Agents SDK as its production-ready successor. Either should be evaluated in controlled settings before production adoption.
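In Swarm, a handoff happens when one of an agent's functions returns another agent. The run loop then continues with the new agent and the same conversation history. The following stdlib-only sketch imitates that pattern to show the mechanics. Its `Agent` class and `run` function are illustrative stand-ins, not Swarm's own API, and the `max_handoffs` cap is an added safeguard.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

@dataclass
class Agent:
    """Illustrative stand-in; not the Swarm library's Agent class."""
    name: str
    # Returns either a final reply (str) or another Agent to hand off to.
    handle: Callable[[List[str]], Union[str, "Agent"]]

def run(agent: Agent, messages: List[str], max_handoffs: int = 5) -> Tuple[str, List[str]]:
    """Loop until an agent replies; the shared message history is the context
    that travels with each handoff."""
    path = [agent.name]
    while True:
        result = agent.handle(messages)
        if isinstance(result, str):
            messages.append(f"{agent.name}: {result}")
            return result, path
        if len(path) - 1 >= max_handoffs:
            raise RuntimeError(f"exceeded {max_handoffs} handoffs: {path}")
        agent = result  # handoff: control passes, history is kept
        path.append(agent.name)

refunds = Agent("refunds", lambda msgs: "Refund issued for order 123")
triage = Agent("triage", lambda msgs: refunds if "refund" in msgs[-1].lower() else "How can I help?")
print(run(triage, ["I would like a refund"]))  # ('Refund issued for order 123', ['triage', 'refunds'])
```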

How do we estimate API costs before committing to a dynamic multi-agent architecture?

The most reliable method is to instrument a representative sample of real production tasks and simulate the full agent interaction trace, counting every token consumed at each delegation step. Tools such as LangSmith or custom OpenTelemetry instrumentation can help you capture these traces. Before presenting a cost model to stakeholders, apply a contingency multiplier of two to three times (or more) to cover retry loops and ambiguous delegation decisions.
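The following sketch turns captured traces into a monthly estimate. All of its inputs are placeholders rather than real figures: the per-million-token prices, the request volume and the 2.5 contingency factor (the midpoint of the two-to-three-times range). Substitute your provider's current pricing and your own traces.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    # One LLM call captured from a trace (supervisor or sub-agent).
    input_tokens: int
    output_tokens: int

def trace_cost(steps: List[Step], price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one traced request; prices are per million tokens (placeholders)."""
    return sum(s.input_tokens * price_in_per_m + s.output_tokens * price_out_per_m
               for s in steps) / 1_000_000

def monthly_estimate(traces: List[List[Step]], requests_per_day: int,
                     price_in_per_m: float, price_out_per_m: float,
                     contingency: float = 2.5, days: int = 30) -> float:
    # Mean cost per traced request, scaled to volume, then padded for retries.
    mean = sum(trace_cost(t, price_in_per_m, price_out_per_m) for t in traces) / len(traces)
    return mean * requests_per_day * days * contingency

sample_trace = [Step(2_000, 300), Step(1_500, 500)]  # supervisor step + one sub-agent call
print(round(monthly_estimate([sample_trace], 1_000, 3.0, 15.0), 2))  # 1687.5
```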

Can a hybrid approach work — using both static routing and dynamic delegation in the same system?

Yes, and this is often the most pragmatic architecture for large organisations. A common pattern is to use deterministic routing for well-understood, high-volume task types and reserve dynamic supervisor delegation for an 'overflow' category of complex or novel tasks. The critical requirement is that the boundary between the two regimes is clearly defined and monitored, so that a shift in the incoming task mix does not quietly push volume, and therefore cost, into the dynamic tier.
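The following is a minimal sketch of the hybrid pattern with the monitoring built in. Known task types take the deterministic path. Everything else overflows to a supervisor callable. A counter tracks what share of traffic lands in the dynamic tier. The class, the 20 per cent alert threshold and the agent callables are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Optional

class HybridRouter:
    """Deterministic routes first; anything unmatched overflows to a dynamic supervisor."""

    def __init__(self, routes: Dict[str, Callable[[str], str]],
                 supervisor: Callable[[str], str], overflow_alert: float = 0.2) -> None:
        self.routes = routes
        self.supervisor = supervisor
        self.overflow_alert = overflow_alert  # hypothetical alert threshold
        self.counts: Counter = Counter()

    def handle(self, task_type: Optional[str], task: str) -> str:
        if task_type in self.routes:
            self.counts["deterministic"] += 1
            return self.routes[task_type](task)
        self.counts["dynamic"] += 1
        return self.supervisor(task)  # the expensive, variable-cost tier

    def overflow_share(self) -> float:
        total = sum(self.counts.values())
        return self.counts["dynamic"] / total if total else 0.0

    def needs_review(self) -> bool:
        # Alert when the dynamic tier's share drifts above the agreed threshold.
        return self.overflow_share() > self.overflow_alert
```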

What UK regulatory considerations apply specifically to dynamic agent delegation decisions?

The primary concern for regulated industries is explainability and audit trail completeness. The FCA's AI guidance and the ICO's framework on automated decision-making both require that organisations can demonstrate how a decision was reached and that appropriate human oversight exists. Dynamic delegation decisions made at runtime by a supervisor agent must be logged with sufficient granularity to reconstruct the reasoning path — this is not optional in regulated environments.
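The following sketch shows what "sufficient granularity" might look like for one runtime delegation decision, serialised as a JSON line for an append-only audit store. Every field name is a hypothetical example rather than a regulatory standard. Align the schema with your own compliance requirements.

```python
import json
from datetime import datetime, timezone
from typing import List

def delegation_record(request_id: str, model: str, prompt_version: str,
                      candidates: List[str], chosen: str, rationale: str) -> str:
    """Serialise one runtime delegation decision as a JSON line.
    All field names are hypothetical; align them with your audit requirements."""
    record = {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,                    # exact model and version used
        "prompt_version": prompt_version,  # lets you recover the instructions in force
        "candidates": sorted(candidates),  # what the supervisor could choose from
        "chosen_agent": chosen,
        # What the model said about its choice: useful evidence, but not proof of
        # the underlying reasoning, so pair it with human oversight.
        "stated_rationale": rationale,
    }
    return json.dumps(record, sort_keys=True)

line = delegation_record("req-001", "example-model-2026-01", "supervisor-prompt-v3",
                         ["regulatory", "contracts"], "regulatory",
                         "Question concerns FCA conduct rules")
print(line)
```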

How many sub-agents is too many for a supervisor to manage effectively?

There is no universal number, but practical experience suggests that supervisor agents begin to lose routing accuracy once they must choose among more than roughly eight to twelve distinct sub-agents without additional hierarchical structure. Beyond that threshold, consider introducing intermediate orchestrators (supervisors that manage clusters of related sub-agents) rather than forcing a single top-level agent to reason across an ever-growing tool surface.
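The following is a minimal sketch of intermediate orchestrators. It assumes each supervisor only ever chooses among its own children, so no single routing decision faces a long list of options. The `Orchestrator` class, the cluster names and the `keyword_router` stub (standing in for an LLM routing call) are hypothetical.

```python
from __future__ import annotations
from typing import Callable, Dict, List, Union

Router = Callable[[str, List[str]], str]  # (task, option names) -> chosen name

class Orchestrator:
    """A supervisor whose children are leaf agents or further Orchestrators."""

    def __init__(self, name: str, children: Dict[str, Union[Orchestrator, Callable[[str], str]]],
                 router: Router) -> None:
        self.name = name
        self.children = children
        self.router = router

    def run(self, task: str) -> str:
        choice = self.router(task, sorted(self.children))  # small, local choice
        child = self.children[choice]
        return child.run(task) if isinstance(child, Orchestrator) else child(task)

def keyword_router(mapping: Dict[str, str]) -> Router:
    # Offline stub standing in for an LLM routing call: keyword -> option name.
    def route(task: str, options: List[str]) -> str:
        for word, option in mapping.items():
            if word in task.lower() and option in options:
                return option
        return options[0]
    return route

legal = Orchestrator("legal", {"contracts": lambda t: "contracts agent",
                               "regulatory": lambda t: "regulatory agent"},
                     keyword_router({"fca": "regulatory"}))
finance = Orchestrator("finance", {"invoices": lambda t: "invoices agent",
                                   "payroll": lambda t: "payroll agent"},
                       keyword_router({"salary": "payroll"}))
top = Orchestrator("top", {"legal": legal, "finance": finance},
                   keyword_router({"fca": "legal", "contract": "legal",
                                   "salary": "finance", "invoice": "finance"}))
print(top.run("Check the FCA rules on this"))  # regulatory agent
```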

What happens when a sub-agent fails — does the supervisor agent handle retries automatically?

This depends entirely on how the supervisor is designed and prompted. Without explicit failure-handling logic, many supervisor agents will either retry indefinitely (creating cost loops) or silently return a degraded result without flagging the failure upstream. Best practice is to define explicit error states for each sub-agent, set maximum retry limits at the framework level, and ensure that unresolvable failures escalate to a human-in-the-loop checkpoint rather than propagating through the pipeline.
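The three practices above can be sketched together: explicit error states, a hard retry limit enforced in code rather than in the prompt, and escalation when a failure cannot be resolved. The `Outcome` states, `call_with_limit` and `EscalateToHuman` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Outcome(Enum):
    OK = "ok"
    RETRYABLE = "retryable"  # e.g. timeout or malformed output
    FATAL = "fatal"          # e.g. permission denied or invalid input

@dataclass
class Result:
    outcome: Outcome
    value: str = ""

class EscalateToHuman(Exception):
    """Signals a human-in-the-loop checkpoint instead of another retry."""

def call_with_limit(sub_agent: Callable[[str], Result], task: str, max_attempts: int = 3) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = sub_agent(task)
        if result.outcome is Outcome.OK:
            return result.value
        if result.outcome is Outcome.FATAL:
            break  # retrying cannot help, so stop immediately
    raise EscalateToHuman(f"{task!r} unresolved after {attempt} attempt(s): {result.outcome.value}")
```

Because the limit lives in the calling code, a supervisor cannot talk itself into retrying indefinitely, and a failure is never silently converted into a degraded result.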

Is it possible to use open-source models rather than OpenAI or Anthropic for supervisor agents in a multi-agent system?

Yes, open-source models such as Mistral, Llama, or Qwen can serve as supervisor agents, particularly where data sovereignty or cost control is a priority. However, the quality of routing decisions is highly sensitive to the model's instruction-following capability and context window size. Smaller open-source models often struggle with the complex multi-step reasoning required for reliable delegation, so thorough benchmarking against your specific task distribution is essential before a production commitment.
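A simple benchmark runs each candidate supervisor model over tasks labelled with the correct sub-agent. It then compares overall and per-agent accuracy, since a model can look acceptable overall while consistently misrouting one category. In the following sketch, `router` is a hypothetical callable wrapping whichever model you are testing. The stubs exist only so the example runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

def routing_accuracy(router: Callable[[str], str],
                     labelled: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Overall and per-agent accuracy on (task, correct sub-agent) pairs."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for task, expected in labelled:
        totals[expected] += 1
        if router(task) == expected:
            hits[expected] += 1
    per_agent = {agent: hits[agent] / totals[agent] for agent in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return {"overall": overall, "per_agent": per_agent}

labelled = [("refund please", "billing"), ("refund status", "billing"),
            ("server down", "technical"), ("login broken", "technical")]
always_billing = lambda task: "billing"  # stub for a weak candidate model
print(routing_accuracy(always_billing, labelled))
```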

How should we structure the prompts for a supervisor agent to minimise hallucinated delegation decisions?

Supervisor prompts should include a precise, unambiguous description of each available sub-agent's capabilities and limitations, along with explicit decision criteria for when to delegate versus handle a task directly. Providing worked examples of correct delegation decisions as few-shot examples significantly improves reliability. Avoid giving the supervisor a long list of broadly described tools — specificity in agent descriptions is one of the most effective ways to reduce erroneous routing.
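The following sketch shows one way to assemble a supervisor prompt along these lines. Each sub-agent gets explicit capabilities and limitations, a "handle directly" option is offered, and few-shot decisions are included. The model's reply is also validated against the known options. The prompt wording, the `HANDLE_DIRECTLY` token and the function names are all hypothetical, and the resulting string would be passed to whichever model client you use.

```python
from typing import Dict, List, Tuple

def build_supervisor_prompt(agents: Dict[str, Tuple[str, str]],
                            examples: List[Tuple[str, str]]) -> str:
    """agents maps name -> (capabilities, limitations); examples are
    (task, correct decision) pairs used as few-shot demonstrations."""
    lines = ["You are a routing supervisor. Choose exactly one option below.",
             "Reply with the option name only. If no sub-agent fits, reply HANDLE_DIRECTLY.",
             "", "Available sub-agents:"]
    for name, (can, cannot) in sorted(agents.items()):
        lines.append(f"- {name}: handles {can}. Does NOT handle {cannot}.")
    lines += ["", "Examples:"]
    for task, decision in examples:
        lines.append(f"Task: {task}\nDecision: {decision}")
    return "\n".join(lines)

def parse_decision(reply: str, agents: Dict[str, Tuple[str, str]]) -> str:
    # Reject anything that is not a known option instead of guessing.
    choice = reply.strip()
    if choice == "HANDLE_DIRECTLY" or choice in agents:
        return choice
    raise ValueError(f"invalid routing decision: {reply!r}")

AGENTS = {"contracts": ("commercial contract review", "tax or employment questions")}
prompt = build_supervisor_prompt(AGENTS, [("Review this NDA", "contracts")])
print(prompt)
```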

What observability tooling do UK teams typically use to monitor multi-agent systems in production?

LangSmith is widely used by teams already in the LangChain ecosystem, offering agent trace visualisation and token-level cost attribution. Weights & Biases provides broader ML observability including agent run tracking. For organisations with existing observability infrastructure, OpenTelemetry-compatible instrumentation can surface agent traces in tools such as Grafana or Datadog. The choice should align with your existing engineering toolchain rather than requiring a parallel monitoring stack.

How long does it typically take to build and stabilise a production-ready multi-agent system?

For a well-scoped use case with a defined task domain, a development team can typically reach a functional prototype within four to eight weeks. However, stabilising a multi-agent system for production — including observability, failure handling, cost guardrails, and user acceptance testing — typically adds a further two to three months. Teams that underestimate the stabilisation phase are disproportionately likely to encounter the cost and reliability issues described in the article.
