An interesting thing about AI agents is how many ways you can build them without any of them being wrong. An agent is as much a result of engineering choices and biases as it is an actual product. Often those choices can't be "wrong" the way a math problem can, where you know 2+2 is not equal to 5. It's a matter of choosing the right tool and the right approach for the right situation.
In my experience, two choices come very early in a project: whether to adopt a single- or multi-agent architecture, and whether to use emergent or explicit routing.
The dilemma when choosing the number of agents often comes from the nature and complexity of the tasks you want to separate. Say Agent 1 has two responsibilities. If task A is very different from task B, shouldn't you just create a separate agent to handle the latter? But now you have to handle the extra complexity of agent-to-agent communication, separating concerns, information handoff, security, and so on. If you decide it isn't worth it, maybe you just create two different tools and add both to the one agent? But then how does the agent know which tool to use, and when?

And really, there's no obvious right way to do it. Multi-agent architectures have better PR and tend to be seen as more sophisticated, thanks to the marvel of watching AI agents talk to each other like humans, each a specialist in its task (which can look like a cleaner separation of concerns). But if the objectives are not drastically different, you can accomplish a lot with one agent and well-defined tools. Obviously, you also want to avoid either extreme: a "god" agent that is expected to do everything (and, due to context pollution, will certainly be good at nothing), or an army of agents where some exist only for the most trivial tasks. It's all about balance.
The second architectural decision comes from how you want the agent to operate: its orchestration. Earlier, I mentioned emergent versus explicit routing.
In emergent routing, a single large model handles everything. It implicitly decides how to respond based on its internal reasoning. There's no explicit router; the intelligence emerges from the model itself. The architecture is simpler, but less controllable and harder to debug, because LLMs are still a bit of a black box. You can instruct the model, but ultimately it decides what to do. It can get very creative and amaze you, but it can also be dumb (see my previous post about the time travel bug) or, worse, dangerous (see the horror stories online about agents wiping databases and leaking personal data to the public).
Its less-known sibling, explicit routing, is more restrained. A separate router (a rule-based system or a smaller model) decides which tool, prompt, or sub-model to use. This is often called tool-based routing, model routing, or an orchestrated architecture. It's more controllable and modular, and easier to optimize for cost, performance, and reliability. But it's less creative, limited to the purpose of your design, and a lot to maintain.
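To make the contrast concrete, here's a minimal sketch of an explicit router. The keyword rules and handler names are hypothetical stand-ins; in a real system, a smaller classifier model could replace the lookup table.

```python
from typing import Callable

# Hypothetical handlers standing in for tools, prompts, or sub-models.
def analyze_content(query: str) -> str:
    return f"content analysis of: {query}"

def research_market(query: str) -> str:
    return f"market research for: {query}"

def answer_directly(query: str) -> str:
    return f"general answer to: {query}"

# Explicit routing table: keyword rules mapped to handlers.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("pdf", "image", "spreadsheet"), analyze_content),
    (("market", "trend", "competitor"), research_market),
]

def route(query: str) -> str:
    lowered = query.lower()
    for keywords, handler in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return handler(query)
    return answer_directly(query)  # fallback when no rule matches
```

The upside is that every routing decision is inspectable and testable; the downside is exactly the one described above: the router only knows about the cases you wrote down.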

One project gave me the opportunity to try multi-agent architecture for the first time. The system needed to transform materials into detailed customer personas for a client. The workflow had three distinct phases: content analysis (extracting signals from PDFs, images, and spreadsheets), market research (enriching the analysis with current market data), and persona generation (creating culturally authentic customer profiles).
Each phase required different capabilities — the first needed fast multimodal processing, the second needed web search and reasoning, the third needed creative writing. Rather than building one agent that did everything, I split it into three specialist agents coordinated by an orchestrator. The orchestrator decided when to call each specialist, how many segments to create based on content complexity, and whether outputs met quality thresholds before proceeding.
This was a deliberate departure from my previous work. My earlier agents were single-agent systems with tools — one model handling everything, calling tools as needed. That works well when the task is homogeneous. But when you have genuinely different cognitive tasks requiring different model strengths, the multi-agent pattern starts making sense. And my curiosity got the best of me.
My initial implementation, using an orchestrator and three specialist agents-as-tools, leaned heavily into the explicit nature of multi-agent systems, maybe as a result of my being a novice in this architecture. Each specialist had a hardcoded system prompt. The orchestrator called them with rigid parameters: create_segments(analysis: str, num_segments: int = 4). The agents did their job, returned their output, and forgot everything.
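That first version looked roughly like this sketch, with the specialists stubbed out. The function names mirror the ones mentioned above, but the bodies are hypothetical placeholders for what would be LLM calls in the real system.

```python
# Stateless specialists: each produces output and forgets everything.
def analyze_content(files: list[str]) -> str:
    return f"analysis of {len(files)} files"

def create_segments(analysis: str, num_segments: int = 4) -> list[str]:
    # Rigid parameters: the orchestrator can only ask for N segments,
    # never for a targeted refinement of an existing one.
    return [f"segment {i + 1} from {analysis}" for i in range(num_segments)]

def generate_personas(segments: list[str]) -> list[str]:
    return [f"persona for {s}" for s in segments]

# The orchestrator is a fixed pipeline: feedback like "focus more on
# younger customers" forces a full rerun from scratch.
def run_pipeline(files: list[str]) -> list[str]:
    analysis = analyze_content(files)
    segments = create_segments(analysis, num_segments=4)
    return generate_personas(segments)

personas = run_pipeline(["report.pdf", "photos.zip"])
```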
This worked, but I quickly ran into the shortcomings. No agent had any memory of what the others produced, and none could refine its own output. If the user said "focus more on younger customers," the whole pipeline had to run again from scratch. The researcher agent couldn't check whether it had already analyzed this market last week. The persona generator couldn't see the content analyst's original findings to add nuance. The rigid interfaces meant the orchestrator couldn't ask for targeted refinements like "keep the first two personas but make the third more conservative." It could only say "create 4 personas" and hope for the best.
I realised I was leaning too much into explicit routing. But multi-agent systems are explicit by nature, so there are two layers to consider. Infrastructure is always explicit: you have to define tools, register agents, and wire up clients. Someone writes the @tool decorators and the dispatch logic, and that plumbing is inherently explicit code. Routing decisions, however, can be emergent. The orchestrator gets handed a set of tools, and the model decides which to invoke based on conversational context. Nobody writes if "refine" in user_input: call_persona_agent(). That reasoning emerges from the LLM.
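A minimal sketch of those two layers, assuming a hypothetical @tool decorator rather than any particular framework's: the registration is explicit code, while choosing which tool to call is left to the model reading the serialized schemas.

```python
import json

# Explicit layer: tools are registered by hand, with declared descriptions.
TOOLS: dict[str, dict] = {}

def tool(description: str):
    def register(fn):
        TOOLS[fn.__name__] = {"fn": fn, "description": description}
        return fn
    return register

@tool("Refine an existing persona based on user feedback")
def refine_persona(feedback: str) -> str:
    return f"persona refined: {feedback}"

@tool("Run market research for a customer segment")
def research_market(segment: str) -> str:
    return f"research on: {segment}"

# Emergent layer: nobody writes `if "refine" in user_input: ...`.
# The tool schemas go into the prompt, and the LLM picks the tool.
def build_prompt(user_input: str) -> str:
    schemas = json.dumps(
        {name: t["description"] for name, t in TOOLS.items()}, indent=2
    )
    return (
        f"Available tools:\n{schemas}\n\n"
        f"User request: {user_input}\n"
        "Reply with the name of the tool to call."
    )
```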
Can a multi-agent system be fully emergent? Not really. Agents need defined capabilities and boundaries. Tool schemas must be declared for the LLM to reason about them. Security, cost controls, and guardrails require explicit constraints. You can’t have an agent spontaneously discover and wire up new sub-agents at runtime, not safely anyway.
So I made changes to loosen the grip where it mattered. First, long-term memory: each agent got access to a shared knowledge base where findings were stored across sessions. The researcher could now check past work before starting fresh. Second, short-term memory: a shared namespace within each session so agents could see each other's output in real time. The analyst's findings were visible to the researcher. The researcher's segments were visible to the persona generator. Third, flexible interfaces: rigid parameters became task descriptions. Instead of generate_personas(num_personas: int), it became generate_personas(task: str). The orchestrator could now pass nuanced instructions like "refine the first segment to emphasize sustainability concerns."
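The three changes can be sketched like this, with in-memory dicts as hypothetical stand-ins for the real knowledge base and session store:

```python
# Long-term memory: shared across sessions (a database in practice).
knowledge_base: dict[str, str] = {}

# Short-term memory: a namespace shared by agents within one session.
session: dict[str, str] = {}

def research_agent(task: str) -> str:
    # Check past work before starting fresh.
    if task in knowledge_base:
        return knowledge_base[task]
    finding = f"fresh research for: {task}"
    knowledge_base[task] = finding
    session["research"] = finding  # visible to the other agents
    return finding

def generate_personas(task: str) -> str:
    # Flexible interface: a free-form task description instead of a
    # rigid num_personas int, so the orchestrator can pass nuance.
    context = session.get("research", "no prior research")
    return f"personas for '{task}' using {context}"

research_agent("eco-conscious shoppers")
result = generate_personas(
    "refine the first segment to emphasize sustainability concerns"
)
```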
Now the system is a mix of both approaches. The infrastructure stays explicit with tool definitions, agent capabilities, and memory namespaces all coded. But the dispatch is emergent. The orchestrator decides which agent to call and when, based on conversational context. The model reasons about sequencing and refinement while the architecture enforces the guardrails.
This brings us back to the architectural choices I outlined at the start. Single agent versus multi-agent. Emergent versus explicit routing. The diagrams make them look like binary decisions, but they’re not. My system started as multi-agent with explicit routing and evolved into multi-agent with hybrid routing. The agent count stayed the same. What changed was how much I trusted the model to make decisions versus how much I hardcoded.
These choices exist on a spectrum, and you can move along it. You start explicit because it’s predictable and debuggable. You loosen the grip as you understand where flexibility helps. Shared memory let my agents build on each other’s work. Flexible task descriptions let the orchestrator request refinements instead of full reruns. The infrastructure stayed explicit because it has to. The dispatch became emergent because it could.
There's no obvious right answer. Multi-agent has a better reputation but adds complexity. Emergent routing is elegant but harder to debug. What I learned is that you don't have to pick one and stick with it. You architect for where you are, then iterate toward where you need to be.













