Imperium

3 architecture patterns for production AI agents

Victoria P. 9 min

We've built 12+ AI agents in 2025-2026. Here are three architectures that actually work in production.

1. Single-agent + tools

One LLM + a set of tools (search, DB, API calls). Simplest option. Fits when:

- Task is repetitive (customer support, code review)
- Context length is predictable
- Latency isn't critical (<10 sec)

Stack: Claude Opus + tool_use API. Example: our in-house support.
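
A minimal sketch of the single-agent loop described above. Everything here is an assumption for illustration: `ModelFn` stands in for the real model client (for example a call to a tool_use API), and the message shapes, tool registry, and `maxSteps` cap are hypothetical names, not a real SDK.

```typescript
// Hypothetical shapes: a real model client would bring its own types.
type ToolCall = { tool: string; input: string };
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelFn = (history: Message[]) => Promise<ModelTurn>;
type Tool = (input: string) => Promise<string>;

// One model, a fixed tool set, and a hard cap on steps so the loop always ends.
async function runAgent(
  model: ModelFn,
  tools: Map<string, Tool>,
  userMessage: string,
  maxSteps = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === "final") return turn.text;

    const tool = tools.get(turn.call.tool);
    // An unknown tool is reported back to the model instead of crashing the loop.
    const result = tool
      ? await tool(turn.call.input)
      : `error: unknown tool "${turn.call.tool}"`;
    history.push({ role: "assistant", content: JSON.stringify(turn.call) });
    history.push({ role: "tool", content: result });
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}
```

The step cap is one simple way to keep latency and cost bounded; the right limit depends on the task.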

2. Multi-agent orchestration

An orchestrator + N specialized agents. Each agent is an expert in its area. Fits when:

- Task is complex and needs reasoning from different "experts"
- User expects deep output (research paper, plan)
- You're OK paying 3-5x tokens

Stack: Claude Sonnet for orchestration, Opus for complex sub-tasks. Example: a RAG system for a law firm (draft → review → risk-check).
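
A rough sketch of the orchestrator pattern, using the draft → review → risk-check chain as the example. The `Planner` and `Specialist` types are hypothetical: the planner stands in for the orchestrating model, and each specialist would wrap its own model call and prompt. Passing only the previous agent's output forward is an assumption made to keep the sketch small.

```typescript
// Hypothetical types: each specialist hides its own model and prompt.
type Specialist = (task: string, previous: string) => Promise<string>;
// Stands in for the orchestrating model: picks which specialists run, in order.
type Planner = (task: string, available: string[]) => Promise<string[]>;
type StepResult = { agent: string; output: string };

async function orchestrate(
  task: string,
  planner: Planner,
  specialists: Map<string, Specialist>,
): Promise<StepResult[]> {
  const plan = await planner(task, Array.from(specialists.keys()));
  const results: StepResult[] = [];
  let previous = "";
  for (const name of plan) {
    const specialist = specialists.get(name);
    if (!specialist) throw new Error(`planner chose unknown agent "${name}"`);
    const output = await specialist(task, previous);
    results.push({ agent: name, output });
    previous = output; // the next agent sees only the preceding agent's output
  }
  return results;
}
```

Every entry in the plan is a separate model call on top of the planner's own, which is where the extra token spend comes from.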

3. Workflow with deterministic steps

Not an "agent" but a pipeline. The LLM is called at fixed points, with deterministic code between them. Fits when:

- You need guaranteed quality
- You need debug-ability and monitoring
- You need compliance (GDPR, HIPAA)

Stack: TypeScript backend, LLM as a "function" among others. Example: our Eilo — calendar suggestions come from an LLM, but Google Calendar writes go through deterministic code.
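
To make the "LLM as a function" idea concrete, here is a small sketch in the spirit of the calendar example. It is not Eilo's actual code: `EventSuggestion`, `SuggestFn`, and `WriteFn` are hypothetical, with `WriteFn` standing in for whatever performs the calendar write, and the JSON format of the suggestion is assumed.

```typescript
// Hypothetical event shape; a real calendar API has its own schema.
type EventSuggestion = { title: string; start: Date; end: Date };
type SuggestFn = (request: string) => Promise<string>; // the LLM step, called at one fixed point
type WriteFn = (event: EventSuggestion) => Promise<void>; // stands in for the calendar write

// Deterministic gate: parse and validate the model's raw text before anything is written.
function parseSuggestion(raw: string): EventSuggestion | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { title, start, end } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.trim() === "") return null;
  if (typeof start !== "string" || typeof end !== "string") return null;
  const startDate = new Date(start);
  const endDate = new Date(end);
  if (Number.isNaN(startDate.getTime()) || Number.isNaN(endDate.getTime())) return null;
  if (endDate.getTime() <= startDate.getTime()) return null; // must end after it starts
  return { title: title.trim(), start: startDate, end: endDate };
}

async function scheduleFromRequest(
  request: string,
  suggest: SuggestFn,
  write: WriteFn,
): Promise<"written" | "rejected"> {
  const raw = await suggest(request);
  const event = parseSuggestion(raw);
  if (event === null) return "rejected"; // the write path never sees unvalidated output
  await write(event);
  return "written";
}
```

Because `parseSuggestion` is plain code, it can be unit-tested and logged like any other function, independent of the model.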

What we default to

Our default is the workflow: it's the easiest to monitor and the cheapest at scale. We go multi-agent only when the client has a $$$$ budget and the task is creative, and single-agent for fast MVPs.
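
These defaults can be written down as a small decision function. The field names are hypothetical, and the precedence (a fast MVP wins over a creative task with a big budget) is an assumption on top of the text, chosen so that a day-one build never lands on multi-agent.

```typescript
type Architecture = "workflow" | "single-agent" | "multi-agent";

// Hypothetical project attributes, named for illustration only.
interface Project {
  isFastMvp: boolean;
  isCreativeTask: boolean;
  hasLargeBudget: boolean;
}

function defaultArchitecture(p: Project): Architecture {
  if (p.isFastMvp) return "single-agent";
  // Multi-agent needs both conditions; either one alone is not enough.
  if (p.isCreativeTask && p.hasLargeBudget) return "multi-agent";
  return "workflow";
}
```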

Main rule: DON'T write a multi-agent system on day one. Start with a workflow, and replace LLM calls with deterministic code wherever you can.