You want to build an application — mobile, web, whatever — where the backend is an AI agent. Not a chatbot. An agent that reads files, runs commands, makes decisions, calls tools. Something like Donna (Eric’s personal CRM assistant running on Cursor/OpenClaw) but exposed as an API that multiple users can hit from their phones.
What’s the best architecture? Should you use Cursor’s headless CLI? Wrap Claude in a while-loop? Use a framework like LangGraph? Pay for Devin’s API? Roll your own?
We just proved `cursor-agent -p -f` works headless. But “works on my Mac” and “works as a product” are different questions. This thesis maps the landscape, names the real options, and picks one.
This isn’t academic. Eric is building multiple products that need agentic backends right now.
The contrarian angle: most discourse treats this as a framework selection problem. It’s not. It’s an architecture pattern question — and the answer is simpler than the industry wants you to believe.
The first mistake everyone makes is treating “agentic backend” as one category. There are five distinct layers, and conflating them leads to bad decisions.
| Layer | What It Does | Leaders | Key Numbers |
|---|---|---|---|
| 1. Model API | Raw LLM with tool_use / function calling | Claude API, GPT API, Gemini | Anthropic $14B ARR1; OpenAI Agents SDK 18.9K stars2 |
| 2. Agent Framework | Orchestration, memory, multi-step workflows | LangGraph, CrewAI, Mastra | LangChain 127K stars, $135M raised3; CrewAI 43.9K stars4 |
| 3. IDE/CLI Agent | Code-native agent with file/terminal access | Cursor, Claude Code, Codex CLI | Cursor $1B ARR5; Claude Code ~$2.5B ARR contribution6 |
| 4. Infrastructure | Sandboxing, browsers, tools, compute | E2B, Browserbase, Composio | E2B $21M raised7; Composio 26.5K stars, $29M raised8 |
| 5. Runtime/Gateway | Persistent agent hosting, messaging channels | OpenClaw, AgentProtocol | OpenClaw 193K stars9; MCP 78.6K stars10 |
Strip away marketing and there are exactly four ways people build agentic backends today. Here they are, from simplest to most complex:
A single LLM in a tool-calling loop. No framework. No orchestration layer. Just: send message → model responds with tool calls → execute tools → send results back → repeat until done.
Cost: $0.05–0.30 per turn (Claude Sonnet). Latency: 2–15s per tool cycle. Complexity: ~100–300 lines of code.
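The loop above fits in ~30 lines. A minimal sketch in Python — in production `call_model` would be a real API call (e.g. Anthropic’s `client.messages.create(..., tools=[...])`), but here both the model and the single `get_weather` tool are stubs so the sketch is self-contained, and the message shapes are simplified rather than the API’s actual `tool_use`/`tool_result` blocks:

```python
def get_weather(city: str) -> str:
    # Hypothetical example tool; any function the model may call goes here.
    return f"72F and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_model(messages):
    # Stub standing in for a frontier-model API call. It "decides" to call
    # the tool once, then produces a final text answer.
    if "tool_result" in str(messages[-1]["content"]):
        return {"type": "text", "text": f"Answer based on: {messages[-1]['content']}"}
    return {"type": "tool_use", "name": "get_weather", "input": {"city": "Austin"}}

def run_agent(prompt: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):              # hard cap: never loop forever
        reply = call_model(messages)
        if reply["type"] == "text":         # no more tool calls: done
            return reply["text"]
        result = TOOLS[reply["name"]](**reply["input"])   # execute the tool
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"tool_result: {result}"})
    raise RuntimeError("turn limit exceeded")

print(run_agent("What's the weather in Austin?"))
```

Everything else — frameworks, runtimes, gateways — is layered on top of exactly this loop.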
Use LangGraph, CrewAI, or Mastra to manage state, routing, multi-step workflows, and human-in-the-loop patterns.
When it makes sense: Multi-agent coordination, durable long-running tasks, compliance-heavy workflows with audit trails. When it doesn’t: Anything a single agent can handle in one conversation.
Use Cursor’s headless CLI or Claude Code’s SDK to run code-capable agents. This is what we just tested:
```bash
echo "" | cursor-agent -p -f --output-format json "your prompt"
```
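Driving the headless CLI from a scheduler or webhook handler needs only a thin subprocess wrapper. A sketch — the flags mirror the one-liner above, but the CLI’s JSON output schema is not assumed here, so the wrapper returns the parsed object as-is; the `echo` stand-in exists only so the sketch runs without the CLI installed:

```python
import json
import subprocess

def run_headless(prompt, cmd=("cursor-agent", "-p", "-f", "--output-format", "json")):
    # Pipe an empty stdin, mirroring the `echo "" |` in the shell one-liner.
    proc = subprocess.run(
        list(cmd) + [prompt],
        input="", capture_output=True, text=True, timeout=600, check=True,
    )
    # The CLI's JSON schema is undocumented here, so return the parsed
    # object instead of guessing at its fields.
    return json.loads(proc.stdout)

# Stand-in command so the sketch runs without cursor-agent installed:
print(run_headless('{"status": "ok"}', cmd=("echo",)))
```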
Deploy a persistent agent on a runtime like OpenClaw that handles messaging channels (WhatsApp, Telegram, Discord), session management, and long-running state.
What OpenClaw gives you: WhatsApp/Telegram/Discord integration, persistent agent process, plugin system, Docker deployment. What it costs you: Known instability (message loss during gateway restarts), fast-moving project with breaking changes, CVE security concerns9.
Our position: The anti-framework camp is right for most applications. The pro-framework camp is right for complex enterprise workflows. The mistake is thinking you need to decide upfront. Start with a while-loop. Add framework when you hit a specific wall (durability, multi-agent, audit trails). Most people never hit that wall.
Cursor shipped a Background Agents API and headless CLI. Claude Code has a full programmatic SDK. This raises the question: can you use coding agents as general-purpose app backends?
What works: Dev automation, CI/CD integration, cron-triggered tasks, internal tools.
What doesn’t: High-concurrency customer-facing apps. These tools are per-developer priced, code-optimized, and single-tenant by design. Cursor CLI is neither designed nor priced for running 1,000 concurrent user sessions.
Our position: Use Cursor CLI for your own automation (Eric’s prmupdate, deploys, research runs). Don’t use it as the production backend for a multi-user app.
Model Context Protocol has 78.6K stars and adoption from every major vendor10. But a 623-point HN post called out the spec as apparently LLM-generated17: the transport layer should be WebSockets rather than SSE-on-SSE, security is an afterthought, and benchmarks show only 16% task completion.
Our position: MCP is winning by default, not by merit. As one HN commenter said: “I take a bad standard that can evolve, over no standard at all.” It moved to Linux Foundation governance. Build MCP-compatible tools. Don’t bet your architecture on MCP internals being stable.
The biggest surprise in our research. A single “agentic” user request triggers 8–15 internal LLM calls. One team budgeted $4K/month and hit $11.2K in 3 weeks because of recursive loops18.
| Task Type | Cost/Request | Calls/Request |
|---|---|---|
| Simple chat + 1 tool | $0.02–0.06 | 2–3 |
| Research task | $0.15–0.50 | 5–10 |
| Code audit | $1.00–5.85 | 10–25 |
| Autonomous multi-step | $2.00–15.00+ | 15–50+ |
Rule: Always hard-cap decision loops. No agent gets unlimited retries. Budget per-task, not per-month.
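A per-task budget guard is a few lines of code. A sketch, with illustrative Sonnet-class prices ($3/$15 per million input/output tokens) and made-up limits — the exact numbers are assumptions, the pattern is the point:

```python
class BudgetExceeded(Exception):
    pass

class TaskBudget:
    """Hard cap on spend and call count for a single agent task."""

    def __init__(self, max_usd: float, max_calls: int):
        self.max_usd, self.max_calls = max_usd, max_calls
        self.spent_usd, self.calls = 0.0, 0

    def charge(self, input_tokens: int, output_tokens: int):
        # Illustrative Sonnet-class pricing: $3 / $15 per million tokens.
        self.spent_usd += input_tokens * 3e-6 + output_tokens * 15e-6
        self.calls += 1
        if self.spent_usd > self.max_usd or self.calls > self.max_calls:
            raise BudgetExceeded(f"${self.spent_usd:.2f} after {self.calls} calls")

budget = TaskBudget(max_usd=0.50, max_calls=25)     # budget per task, not per month
budget.charge(input_tokens=4_000, output_tokens=1_200)  # charge after each model call
print(f"spent so far: ${budget.spent_usd:.4f}")
```

Call `charge` after every model call inside the loop; a raised `BudgetExceeded` kills the task instead of the monthly bill.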
“Just vector DB it” fails at scale — embeddings handle mutation, deletion, and selective recall poorly. The production pattern emerging: structured JSON state plus summarize-and-compress for the context window. For durable long-running tasks: Temporal or Inngest. Write paths matter more than read paths.
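A minimal sketch of that structured-state-plus-compression pattern — `summarize` stubs out what would be a cheap model call, and the 20-turn threshold and 10-turn retention are arbitrary:

```python
import json

def summarize(turns):
    # Stub: in production this would be a small/cheap LLM summarization call.
    return f"[summary of {len(turns)} earlier items]"

class AgentState:
    def __init__(self):
        self.facts = {}      # structured state: mutable, deletable, queryable
        self.summary = ""    # compressed history
        self.turns = []      # recent verbatim turns

    def remember(self, key, value):
        self.facts[key] = value   # unlike a vector index, this supports mutation

    def add_turn(self, turn, keep_recent=20):
        self.turns.append(turn)
        if len(self.turns) > keep_recent:
            # Fold all but the last 10 turns into the running summary.
            old, self.turns = self.turns[:-10], self.turns[-10:]
            self.summary = summarize([self.summary] + old)

    def to_context(self):
        # What actually goes into the model's context window.
        return json.dumps({"facts": self.facts, "summary": self.summary,
                           "recent": self.turns})

state = AgentState()
state.remember("user_name", "Eric")
for i in range(25):
    state.add_turn(f"turn {i}")
print(len(state.turns), bool(state.summary))
```

The write path is explicit here: `remember` and `add_turn` decide what survives, rather than hoping a retrieval query finds the right chunk later.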
HFS Research coined “agentic-washing.”19 AI21 called most agents “glorified if-else statements.” Klarna’s AI customer service reversal is the canonical proof point — CEO admitted quality tanked and re-hired humans20. 78% of enterprise agent pilots didn’t reach production.21
The market is bifurcating: simple agent tasks (chat + tools) are commoditizing toward direct API calls. Complex orchestration (multi-agent, durable workflows) is consolidating around LangGraph and Temporal. The middle ground — “framework that wraps a simple agent” — is dying.
| Architecture | What Happened | Lesson |
|---|---|---|
| Fully autonomous agents (AutoGPT, BabyAGI) | Loops burned money, produced garbage | Humans in the loop aren’t optional |
| Multi-agent communication | “Politeness loops” — agents thanking each other, $4K wasted18 | Agent-to-agent is a research problem, not a product pattern |
| AI hardware (Humane Pin, Rabbit R1) | Both dying — no distribution, no ecosystem | Hardware needs software moat first |
| Framework-heavy builds | Octomind: 12 months on LangChain, ripped it out14 | Frameworks add friction before they add value |
| Dual foundation-model-plus-agent | Adept acqui-hired by Amazon | Don’t train your own model AND build the agent |
| Standalone agent products | OpenAI Operator folded into ChatGPT after 7 months | Agent products get absorbed into platforms |
Grade: 70% insight, 30% oversimplification. The while-loop thesis is correct for most builders today. But it understates the operational overhead. You won’t use a framework for the LLM orchestration — but you’ll want infrastructure for observability, session management, and crash recovery. Those are infrastructure problems, not framework problems.
Stop asking “which framework.” Ask these three questions:
| Users | Architecture | Deploy |
|---|---|---|
| Just me | Cursor CLI headless (`cursor-agent -p -f`) | Your Mac + cron/webhook |
| 1–10 (pilot) | While-loop + Claude API + your tools | Railway / Fly.io / VPS |
| 10–1000 | While-loop + Claude API + session store (Redis/Supabase) | Railway / Fly.io |
| 1000+ | While-loop + Claude API + Temporal/Inngest for durability | AWS/GCP with auto-scaling |
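The session store in the 10–1000 tier needs only save/load with a TTL. A sketch backed by an in-process dict for illustration — in production you would swap the dict for Redis or a Supabase table, keeping the same interface:

```python
import json
import time

class SessionStore:
    """Maps session_id -> serialized message history, with expiry."""

    def __init__(self):
        self.store = {}  # session_id -> (expires_at, serialized messages)

    def save(self, session_id, messages, ttl_s=3600):
        self.store[session_id] = (time.time() + ttl_s, json.dumps(messages))

    def load(self, session_id):
        entry = self.store.get(session_id)
        if entry is None or entry[0] < time.time():
            return []                  # expired or unknown: start a fresh session
        return json.loads(entry[1])

sessions = SessionStore()
sessions.save("user-42", [{"role": "user", "content": "hi"}])
print(sessions.load("user-42"))
```

The agent loop stays stateless between requests: load the history, run the loop, save the history back.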
| Channel | Best Option | Notes |
|---|---|---|
| WhatsApp | OpenClaw (free) or Twilio + custom (paid) | OpenClaw has gateway instability; Twilio is reliable but $$$ |
| Web / Mobile app | Custom API (Express/Fastify) | Full control, cleanest architecture |
| Telegram | Bot API is free, reliable, well-documented | Easiest channel to start with |
| Discord | Discord.js + your agent loop | Good for communities |
| Need | Best Option | Cost |
|---|---|---|
| Read/write files | Direct filesystem or S3 | Free / pennies |
| Run code | E2B sandboxes | $21/mo base7 |
| Browse web | Browserbase / Stagehand | $99/mo22 |
| Many external APIs | Composio (800+ toolkits) | Free tier available8 |
| Custom tools | MCP servers (build your own) | Free (open protocol) |
Use `cursor-agent -p -f` for cron jobs, prmupdate, and research runs — it is the right tool for Eric-only workflows.

What to watch: Claude’s extended thinking plus tool_use becoming real-time. Today there’s ~2–5s of latency per tool cycle. When that drops to sub-second (inference optimization plus edge deployment), the “agent as backend” becomes indistinguishable from a traditional API. That’s when agents stop being a UX compromise and start being the default architecture. Timeline: 12–18 months.
`cursor-agent -p -f` (already working).

The winning architecture for agentic backends is a while-loop calling a frontier model with tool_use. Not a framework. Not an IDE-as-backend. Not a multi-agent swarm. A loop.
Every product above $50M ARR uses this pattern. Cursor is this pattern. Anthropic recommends this pattern. Teams that started with frameworks are ripping them out.
The framework layer has value for enterprise orchestration (durability, audit trails, multi-agent). But 90% of builders — including Eric — don’t need it. Build the loop, add infrastructure (observability, session management) when you need it, and invest your real engineering time in the thing no framework gives you: domain-specific conversation design, tools, and data.
The moat is not the agent architecture. The moat is the SOUL.md.
One-sentence version: “The best agentic backend is 200 lines of code calling Claude with your tools — everything else is a premature abstraction.”