3 architecture patterns for production AI agents
We've built 12+ AI agents in 2025-2026. Here are three architectures that actually work in production.
1. Single-agent + tools
One LLM + a set of tools (search, DB, API calls). Simplest option. Fits when:
- Task is repetitive (customer support, code review)
- Context length is predictable
- Latency isn't critical (responses of up to 10 sec are acceptable)
Stack: Claude Opus + tool_use API. Example: our in-house support.
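A minimal sketch of the loop behind this pattern: the model either asks for a tool or returns a final answer, and the code keeps feeding tool results back until it finishes or hits a step limit. All names here (`runSingleAgent`, `Model`, `lookupOrder`, the transcript shape) are hypothetical, and the model is injected as a plain synchronous function so the sketch runs without a network call. A real backend would make an async call to the provider's SDK instead.

```typescript
// Hypothetical shapes for illustration; real provider SDKs look different.
type Entry = { role: "user" | "tool_call" | "tool_result"; content: string };
type ModelTurn =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; text: string };
type Model = (transcript: Entry[]) => ModelTurn;
type Tool = (input: string) => string;
type ToolSet = { [name: string]: Tool | undefined };

function runSingleAgent(
  model: Model,
  tools: ToolSet,
  userMessage: string,
  maxSteps = 5,
): string {
  const transcript: Entry[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);
    if (turn.kind === "final") return turn.text;
    const tool = tools[turn.tool];
    // Unknown tools are reported back to the model instead of crashing the loop.
    const result = tool ? tool(turn.input) : `error: unknown tool "${turn.tool}"`;
    transcript.push({ role: "tool_call", content: `${turn.tool}(${turn.input})` });
    transcript.push({ role: "tool_result", content: result });
  }
  // The step limit keeps a confused model from looping (and billing) forever.
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// Stub model: looks up the order first, then answers from the tool result.
const stubModel: Model = (transcript) => {
  const lastResult = transcript.filter((e) => e.role === "tool_result").pop();
  return lastResult
    ? { kind: "final", text: `Your order status: ${lastResult.content}` }
    : { kind: "tool_call", tool: "lookupOrder", input: "A-1001" };
};

const supportTools: ToolSet = {
  lookupOrder: (orderId) => (orderId === "A-1001" ? "shipped" : "not found"),
};

const answer = runSingleAgent(stubModel, supportTools, "Where is my order A-1001?");
console.log(answer); // prints "Your order status: shipped"
```

The step limit is the one piece worth keeping even in a throwaway MVP: it caps both latency and token spend when the model never reaches a final answer.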
2. Multi-agent orchestration
An orchestrator + N specialized agents. Each agent is an expert in its area. Fits when:
- Task is complex and needs reasoning from different "experts"
- User expects deep output (research paper, plan)
- You're OK paying for 3-5x the tokens
Stack: Claude Sonnet for orchestration, Opus for complex sub-tasks. Example: a RAG system for a law firm (draft → review → risk-check).
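As a rough sketch of the orchestration idea: an orchestrator decides which specialist runs next, and each specialist sees the work done so far. The names (`runOrchestration`, `Specialist`, `Orchestrator`) are hypothetical. Both the orchestrator and the specialists are stubbed as plain synchronous functions here, whereas in a real system each would be its own LLM call, which is where the extra tokens go.

```typescript
// Hypothetical shapes: every function below stands in for a separate LLM call.
type Specialist = (task: string, priorWork: string[]) => string;
// Returns the name of the next specialist, or null when the work is done.
type Orchestrator = (task: string, completed: string[]) => string | null;
type Specialists = { [name: string]: Specialist | undefined };

function runOrchestration(
  orchestrator: Orchestrator,
  specialists: Specialists,
  task: string,
  maxCalls = 6,
): { completed: string[]; outputs: string[] } {
  const completed: string[] = [];
  const outputs: string[] = [];
  for (let i = 0; i < maxCalls; i++) {
    const next = orchestrator(task, completed);
    if (next === null) return { completed, outputs };
    const specialist = specialists[next];
    if (!specialist) throw new Error(`Unknown specialist: ${next}`);
    // Each specialist receives the outputs produced before it.
    outputs.push(specialist(task, outputs));
    completed.push(next);
  }
  throw new Error(`Orchestrator did not finish within ${maxCalls} calls`);
}

// Stub orchestrator that walks the draft → review → risk-check sequence.
const legalOrder = ["draft", "review", "riskCheck"];
const legalOrchestrator: Orchestrator = (_task, completed) =>
  legalOrder[completed.length] ?? null;

const legalSpecialists: Specialists = {
  draft: (task) => `DRAFT(${task})`,
  review: (_task, prior) => `REVIEWED(${prior[prior.length - 1]})`,
  riskCheck: (_task, prior) => `RISK-CHECKED(${prior[prior.length - 1]})`,
};

const legalRun = runOrchestration(legalOrchestrator, legalSpecialists, "NDA for vendor");
console.log(legalRun.outputs[2]); // prints "RISK-CHECKED(REVIEWED(DRAFT(NDA for vendor)))"
```

The `maxCalls` budget is an assumption of this sketch, added because every extra hop is another model call to pay for.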
3. Workflow with deterministic steps
Not an "agent" but a pipeline. The LLM is called at fixed points, with deterministic code between the calls. Fits when:
- You need guaranteed quality
- You need debug-ability and monitoring
- You need compliance (GDPR, HIPAA)
Stack: TypeScript backend, with the LLM as one "function" among others. Example: our Eilo, where calendar suggestions come from an LLM but Google Calendar writes go through deterministic code.
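A minimal sketch of that split, assuming the LLM step returns a JSON string with `start` and `end` fields. The LLM output is treated as untrusted input: deterministic code parses and validates it, and only a valid slot reaches the write step. `SuggestSlot`, `WriteEvent`, `parseSlot`, and `scheduleMeeting` are hypothetical names, and the writer is a stub, not the Google Calendar API.

```typescript
interface Slot {
  start: string; // ISO 8601 timestamp
  end: string; // ISO 8601 timestamp
}
// LLM step (hypothetical): returns raw text that is expected to be JSON.
type SuggestSlot = (request: string) => string;
// Deterministic step (hypothetical): writes the event and returns its id.
type WriteEvent = (title: string, slot: Slot) => string;
type ScheduleResult =
  | { ok: true; eventId: string }
  | { ok: false; reason: string };

// Validate the LLM output with plain code before anything is written.
function parseSlot(raw: string): Slot | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { start, end } = data as { start?: unknown; end?: unknown };
  if (typeof start !== "string" || typeof end !== "string") return null;
  const startMs = Date.parse(start);
  const endMs = Date.parse(end);
  if (isNaN(startMs) || isNaN(endMs) || endMs <= startMs) return null;
  return { start, end };
}

function scheduleMeeting(
  request: string,
  title: string,
  suggest: SuggestSlot,
  write: WriteEvent,
): ScheduleResult {
  const slot = parseSlot(suggest(request)); // fixed point where the LLM is called
  if (slot === null) return { ok: false, reason: "invalid LLM suggestion" };
  return { ok: true, eventId: write(title, slot) }; // deterministic write
}

// Stubs so the sketch runs offline.
const writtenEvents: Slot[] = [];
const stubWrite: WriteEvent = (_title, slot) => {
  writtenEvents.push(slot);
  return `evt-${writtenEvents.length}`;
};
const goodSuggest: SuggestSlot = () =>
  '{"start":"2026-03-02T10:00:00Z","end":"2026-03-02T10:30:00Z"}';
const badSuggest: SuggestSlot = () => "sometime next week";

const goodResult = scheduleMeeting("30 min sync", "Sync", goodSuggest, stubWrite);
const badResult = scheduleMeeting("30 min sync", "Sync", badSuggest, stubWrite);
console.log(goodResult.ok, badResult.ok); // prints "true false"
```

Because the write path never sees unvalidated model output, a bad suggestion becomes a logged failure you can monitor rather than a wrong calendar entry.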
What we default to
Our default is the workflow: it's the easiest to monitor and the cheapest at scale. We go multi-agent only when the client has a $$$$ budget and the task is creative, and single-agent for fast MVPs.
Main rule: DON'T write a multi-agent system on day one. Start with a workflow, and take the LLM out of every step that plain code can handle.
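The defaults above can be written down as a small decision helper. This is only one possible reading: the `ProjectProfile` fields are hypothetical, and checking the MVP case before the multi-agent case is an assumption taken from the "not on day one" rule.

```typescript
type Architecture = "workflow" | "single-agent" | "multi-agent";

// Hypothetical project profile; the field names are illustrative only.
interface ProjectProfile {
  fastMvp: boolean;
  largeBudget: boolean;
  creativeTask: boolean;
}

function pickArchitecture(profile: ProjectProfile): Architecture {
  // Assumption: an MVP never starts as multi-agent, whatever the budget.
  if (profile.fastMvp) return "single-agent";
  if (profile.largeBudget && profile.creativeTask) return "multi-agent";
  // Everything else falls through to the default.
  return "workflow";
}

const defaultChoice = pickArchitecture({
  fastMvp: false,
  largeBudget: true,
  creativeTask: false,
});
console.log(defaultChoice); // prints "workflow"
```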