The Team: 10 Agents, Named and Accountable
Let me introduce you to the team. These are not chatbot wrappers. Each agent has a written job definition, operating principles, an anti-brief (things they explicitly refuse to do), and a learning schedule. They coordinate through a Supabase database, hand work off to each other, and flag blockers for me to resolve.
| Agent | Role | What They Actually Do |
|---|---|---|
| Holden | Chief Revenue Officer | Revenue briefs, pipeline monitoring, strategic recommendations. Holden identified our pipeline state autonomously: EUR 40K MRR, $45K active pipeline, 4 deals in play. He surfaces what I should focus on each week. |
| Bobbie | Strategic Account Executive | Copy gating, deal record management, follow-up drafting. Nothing goes out to a prospect without Bobbie reviewing it for positioning and tone. |
| Amos | Outbound Specialist | Apollo sequence building, prospecting lists, enrollment. Amos built our Shopify outbound sequences — value-first A/B variants that passed Bobbie's copy gate. |
| Alex | Growth Manager | ICP research, content strategy, portfolio management. Alex completed a PE/Portfolio CTO ICP analysis that identified a new high-LTV segment I hadn't considered. |
| Dawes | Personal Brand Agent | LinkedIn profile audits, content recommendations, post drafting. The least mature agent — still finding his voice. |
| Prax | Website & SEO | Keyword research, blog post production, technical SEO. Prax runs the entire content engine for growthanalyticsengine.com — 43 pages and counting. |
| Elvi | Internal Product Manager | System audits, agent registry, knowledge management. Elvi keeps the agent team itself running — documentation, process improvements, internal tooling specs. |
| Naomi | Lead Developer | Builds, deployments, infrastructure. Naomi runs on the Mac Mini — our always-on execution server — handling scheduled tasks and technical builds. |
| Anna | Chief of Staff | Weekly planning, blocker surfacing, coordination. Anna drafted the W15 plan that included the line: "Total Gregor time: ~1.5 hours. Everything else is agent-owned." |
| Cotyar | Finance Monitor | Still a stub. Supposed to track invoicing, cash flow, runway. Hasn't become useful yet. I'll be honest about why below. |
Each agent is a Claude Code session with persistent memory, defined tools, and file system access. They read from and write to a shared knowledge base — a Git repo I call the "brain." They coordinate through Supabase tables that track handovers, thread state, and completion status.
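To make the coordination concrete, here's a minimal sketch of the handover pattern using the supabase-py client. The table and column names are illustrative, not our production schema:

```python
import os
from supabase import create_client  # pip install supabase

# Hypothetical table/column names for illustration; not the production schema.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def hand_off(from_agent: str, to_agent: str, task: str, thread_id: str) -> None:
    """One agent finishes its part and files a handover row for the next."""
    supabase.table("handovers").insert({
        "from_agent": from_agent,
        "to_agent": to_agent,
        "task": task,
        "thread_id": thread_id,
        "status": "pending",
    }).execute()

def pending_work(agent: str) -> list[dict]:
    """Each agent session starts by checking what's been handed to it."""
    res = (
        supabase.table("handovers")
        .select("*")
        .eq("to_agent", agent)
        .eq("status", "pending")
        .execute()
    )
    return res.data

# Example: Amos hands a drafted sequence to Bobbie for the copy gate.
hand_off("amos", "bobbie", "Review Shopify sequence A/B variants", "thread-shopify-outbound")
```

The point is less the schema than the discipline: work never changes hands in chat. It changes hands through a row that another session can query, which is what makes handovers auditable.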
What Actually Works: The Wins Nobody Expected
The surprise wasn't that agents could write drafts or do research. Everyone knows that. The surprise was what they could do when coordinated.
Holden's Pipeline Awareness
Holden, the CRO agent, pulled together our full pipeline state without me touching a spreadsheet. EUR 40K MRR, $45K active pipeline, 4 deals at various stages. He didn't just report numbers — he recommended which deals I should personally advance and which could wait. That's judgment, not just data retrieval.
Alex's ICP Discovery
Alex researched a PE/Portfolio CTO segment I hadn't prioritized. He analyzed the characteristics of our best clients, identified patterns in deal size and engagement length, and came back with a researched brief on why portfolio companies rolling up D2C brands need our exact service. That research would have taken me a full afternoon. Alex did it in one session while I was on a client call.
Amos + Bobbie: The Quality Chain
Here's where multi-agent coordination shows its value. Amos built outbound sequences targeting Shopify merchants. But instead of those going straight to Apollo, Bobbie gated them — reviewing every email for positioning accuracy, tone, and whether the value proposition actually matched our offer. Two sequences passed. One got sent back for revision. That feedback loop between agents is something I didn't design upfront — it emerged from giving each agent clear ownership of their domain.
Anna's Weekly Planning
Anna's W15 plan wasn't just a task list. She synthesized inputs from Holden (revenue priorities), Alex (growth opportunities), and Amos (outbound status) into a prioritized week with clear ownership. The plan included a "not-doing" list — things we explicitly decided to skip. That's the kind of strategic work I used to spend Sunday evenings on.
Prax's Content Machine
Prax has been the quiet workhorse. The growthanalyticsengine.com site is up to 43 pages, including 19 blog posts and 21 guides, all targeting PostHog frustration keywords with zero competition. The publishing workflow is fully agent-owned: JSON data files, Python generators, static HTML, auto-deployed via Cloudflare Pages. I review the first 3 articles in any new cluster for voice calibration, then Prax self-gates.
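The generator itself doesn't need to be clever. Here's a stripped-down sketch of the JSON-to-HTML step; the file layout and template are illustrative, not the production code:

```python
import json
from pathlib import Path

# Minimal sketch: each JSON file in content/ becomes one static page in dist/.
TEMPLATE = """<!doctype html>
<html><head><title>{title}</title>
<meta name="description" content="{description}"></head>
<body><article><h1>{title}</h1>{body}</article></body></html>"""

def build_site(data_dir: str = "content", out_dir: str = "dist") -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in Path(data_dir).glob("*.json"):
        page = json.loads(post.read_text())
        # Expects "title", "description", "body", and "slug" keys in the JSON.
        html = TEMPLATE.format(**page)
        (out / f"{page['slug']}.html").write_text(html)

if __name__ == "__main__":
    build_site()  # Cloudflare Pages then deploys dist/ on every git push
```

Static output is the whole trick: there's no server to break at 3am, and a bad agent run produces a bad diff you can review, not a live incident.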
The Honest Failures: What Broke, What's Still Broken
This is the part most people skip. I'm not going to.
106 Orphaned Attio Tasks
Early on, I set up a daily-sales-prep workflow that generated Slack messages and created tasks in Attio, our CRM. The problem: it produced wall-of-text DMs that were hard to action, and it created 106 tasks that nobody — including me — ever completed. The quote from our internal sales playbook: "Don't use Attio for task management. Attio = CRM data only. Urgent alerts go to Slack. Execution goes to Claude Code sessions."
That was a lesson in tool fit. The agent was doing exactly what I told it to do. The architecture was wrong.
Empty Apollo Sequences
For over two weeks, Amos had sequences marked "active" in Apollo with zero contacts enrolled. Nobody noticed. The sequences looked live in every dashboard. They weren't. The contacts hadn't been loaded because an enrollment step was failing silently.
This is the scary part of AI agent operations: things can look like they're working when they're not. With a human, you'd notice that nobody's getting emails because the human would be writing them. With an agent, the system says "active" and you move on.
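The fix we landed on is a dumb liveness check: never trust a status field, count the thing the status implies. A sketch; the `fetch_sequences` helper is hypothetical and would wrap whatever your outbound tool's API actually returns:

```python
def fetch_sequences() -> list[dict]:
    """Hypothetical helper: pull sequence status and enrollment counts
    from your outbound tool's API. Replace with a real API call."""
    raise NotImplementedError

def check_sequence_health() -> list[str]:
    """'Active' is a claim, not a fact. Verify the claim with counts."""
    alerts = []
    for seq in fetch_sequences():
        if seq["status"] == "active" and seq.get("enrolled_contacts", 0) == 0:
            alerts.append(f"Sequence '{seq['name']}' is active with 0 contacts enrolled")
    return alerts

# Run daily; pipe non-empty results to Slack instead of trusting dashboards.
```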
LinkedIn Monitoring: Technically Fragile
Dawes was supposed to monitor LinkedIn for signals — job changes, posts from prospects, engagement opportunities. The reality: browser automation requires auth, pages load dynamically, results vary by session. High failure rate. We're still using it, but I can't pretend it's reliable. It works maybe 60% of the time.
The GTM Dashboard That Ran on Stale Data
We built a GTM Hub dashboard that was supposed to give me a real-time view of pipeline, outbound, and content performance. The sync button was a stub. For three weeks, I was looking at numbers that were three weeks old and making decisions based on them. The lesson from our internal playbook: "If something hasn't worked in months, the architecture is wrong."
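The guard we added afterward is equally simple: every synced dataset carries a timestamp, and anything past a freshness threshold gets flagged before it renders. A minimal sketch:

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # illustrative threshold; tune per data source

def freshness_banner(last_synced_at: datetime) -> str | None:
    """Return a warning if the data behind a dashboard panel is stale.
    Stale numbers shown without a warning are worse than no numbers."""
    age = datetime.now(timezone.utc) - last_synced_at
    if age > MAX_AGE:
        return f"STALE DATA: last synced {age.days}d {age.seconds // 3600}h ago"
    return None
```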
Fireflies Integration: Needs a Human in the Loop
We tried to automate the flow from Fireflies (meeting transcripts) to Attio (CRM records). It works beautifully in theory: call happens, transcript arrives, agent extracts deal signals, updates CRM. In practice, it requires an interactive approval loop — someone needs to confirm "yes, this was a sales conversation" vs. "this was an internal standup." Doesn't work unattended.
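So the current shape is a narrow approval gate rather than full automation: the agent extracts, a human answers one yes/no question, and only then does the CRM write happen. A sketch with illustrative function stubs, not our actual integration code:

```python
def extract_deal_signals(transcript: dict) -> dict:
    """Hypothetical agent step: pull next steps, budget hints, objections."""
    ...

def update_crm(signals: dict) -> None:
    """Hypothetical CRM write (Attio, in our case)."""
    ...

def sync_transcript(transcript: dict) -> None:
    # The agent does the cheap, redoable work; the human gates the
    # expensive-to-undo write. Internal calls are skipped, not synced.
    signals = extract_deal_signals(transcript)
    answer = input(f"Was '{transcript['title']}' a sales conversation? [y/N] ")
    if answer.strip().lower() == "y":
        update_crm(signals)
```

One question per call is cheap. Cleaning a CRM polluted with standup notes is not.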
Cotyar: The Agent That Never Launched
Cotyar was supposed to be our Finance Monitor — tracking invoicing, cash flow, and runway. He's still a stub. Why? Because our CFO Allen handles invoicing, and the financial data lives across too many disconnected systems (bank accounts, accounting software, spreadsheets) for an agent to access reliably. Some roles genuinely need human systems access that agents can't have yet.
The Economics: What This Actually Costs
Let me be transparent about the numbers, because the "AI replaces humans" narrative always glosses over costs.
What I Spend
The agent team runs on Claude Max at the 20x tier. That's the primary cost. Beyond that: Supabase free tier for coordination, Cloudflare free tier for hosting, and the various SaaS tools the agents interact with (Apollo, Attio, etc.) which I'd be paying for regardless.
The Mac Mini running Naomi costs electricity and sits in my Amsterdam apartment. It's an always-on execution server — scheduled tasks, deploys, the kind of work that needs to happen at 3am without me being awake.
What I Save
Here's the comparison I keep coming back to: for the sales and marketing execution my agent team handles, the human alternative would be 2-3 junior hires. In the Netherlands, that's EUR 3,000-4,500 per person per month fully loaded. So EUR 6,000-13,500 per month for humans vs. a fraction of that for agents.
But the comparison isn't quite fair. The agents don't have judgment for ambiguous situations. They can't read a room on a client call. They can't build genuine relationships. What they can do is execute at 2am, never forget a follow-up sequence, research 50 companies in the time a human researches 5, and produce a first draft of anything in minutes.
The Time Equation
This is what actually matters. My weekly time on sales and GTM execution: roughly 1.5 hours. That's reviewing agent outputs, approving sequences, making strategic calls, and handling the things only I can do (discovery calls, relationship messages, pricing decisions).
Before the agent team, I was spending 15-20 hours per week on the same surface area. Most of that was research, drafting, CRM hygiene, and sequence setup. None of it required my judgment — I was just the only person available to do it.
The leverage ratio — my framework for tracking this — went from about 1.5x to somewhere between 2x and 3x. Still below my target of 3-5x, but the trajectory is clear.
What I'd Do Differently If Starting Over
Three months in, here's what I know now that I wish I'd known at the start.
1. Start With Workflows, Not Agents
I treated every automation as an "agent" — autonomous, conversational, judgment-heavy. Most of what I automated should have been workflows instead: deterministic, structured, repeatable. The daily sales prep? That's a workflow. The Slack ticket sweep? Workflow. The post-call deal detection? Workflow. Save the expensive, autonomous agent behavior for things that actually need judgment: discovery call strategy, revenue path brainstorming, deal negotiation prep.
Workflows can run on fast, cheap models (Haiku). Agents need the expensive models (Opus). Proper routing means more automation at lower cost.
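In practice the routing can start as a lookup table consulted before each run. Task names and tier labels below are illustrative:

```python
# Illustrative routing table: deterministic workflows go to the fast/cheap
# tier, judgment-heavy agent work goes to the expensive tier.
WORKFLOWS = {"daily_sales_prep", "slack_ticket_sweep", "post_call_deal_detection", "sitemap_update"}
AGENT_TASKS = {"discovery_call_strategy", "revenue_path_brainstorm", "negotiation_prep"}

def pick_model(task: str) -> str:
    if task in WORKFLOWS:
        return "haiku-tier"  # fast, cheap, fine for structured, repeatable work
    if task in AGENT_TASKS:
        return "opus-tier"   # expensive, reserved for genuine judgment
    return "haiku-tier"      # default cheap; escalate only when quality demands it
```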
2. Fix Inputs, Not Outputs
This principle comes from John Rush, who runs 26 startups with AI agents. When agent output is bad, the instinct is to add more review layers, more quality gates, more human oversight. The real fix is almost always in the inputs: better prompts, better context files, better data structure. Our 106 orphaned Attio tasks weren't a quality problem — they were an architecture problem. The input was wrong.
3. Build the Coordination Layer First
I built individual agents first and added coordination later. That's backwards. The Supabase handover table, the thread state tracking, the memory system — that should have been day-one infrastructure. Without coordination, you don't have a team. You have 10 individuals, none of whom know what the others are doing.
4. Expect 60% Reliability, Not 95%
AI agents are not software. They don't have the reliability of a cron job or a well-tested API integration. Things that work Tuesday might not work Thursday because the model interpreted the same prompt differently. Build for that. Have fallbacks. Check outputs. Don't assume "active" means "working."
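Operationally, that means wrapping agent calls the way you'd wrap a flaky network call: validate the output, retry a bounded number of times, and escalate to a human instead of proceeding silently. A sketch:

```python
import time

def run_with_verification(run_agent, verify, max_attempts: int = 3):
    """Treat agent output like an untrusted API response: validate it,
    retry on failure, escalate instead of silently moving on."""
    for attempt in range(1, max_attempts + 1):
        output = run_agent()
        if verify(output):           # e.g. schema check, non-empty check, count check
            return output
        time.sleep(2 ** attempt)     # brief backoff; the same prompt can succeed on retry
    raise RuntimeError("Agent output failed verification; flag a human")
```

The `verify` function is where the empty-Apollo-sequence lesson lives: it checks the thing the output claims, not just that output arrived.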
5. Define What the Human Does, Not Just What the Agent Does
My biggest mistake was defining agent roles without defining my own role clearly enough. The result: agents would produce work, I'd feel obligated to review everything, and the leverage ratio stayed low. Once I defined my role as strictly "strategy, relationships, and approval gates" — and explicitly listed what I would NOT do — the system started working.
The Verdict: Is It Worth It?
Yes. But not for the reasons you'd expect from reading AI Twitter.
It's not worth it because AI agents are brilliant autonomous workers who never make mistakes. They're not. Holden sometimes gives me a revenue brief with numbers from last month. Amos builds sequences that look polished but have nobody enrolled. Dawes still can't reliably monitor LinkedIn.
It's worth it because of the time reallocation. I went from 15-20 hours per week on GTM execution to 1.5 hours. Those freed hours go into discovery calls, client relationships, and strategic thinking — the things that actually move revenue. The constraint in my business was never delivery quality or team capability. It was demand generation. And demand generation is relationship-driven work that requires me to be present, not buried in sequence setup and CRM hygiene.
The agent team didn't make me superhuman. It made me available for the work that only I can do.
Our business does EUR 40K per month with a team of 4 humans and 10 AI agents. We're targeting EUR 1M for 2026. The gap isn't in capacity or systems — it's in pipeline. And pipeline comes from the work I can now actually do, because I'm not drowning in the work agents handle.
If you're a consultant, freelancer, or agency operator spending more than half your time on execution instead of relationships and strategy, an AI agent team is worth building. Not because the agents are perfect — but because being freed from the work they can do imperfectly is still better than being trapped doing it yourself.
The messy middle is real. The economics work. The failures are part of the system. And the system, imperfect as it is, runs whether I feel disciplined or not.
That's the actual point.
Frequently Asked Questions
How long did it take to set up the 10-agent AI team?
The individual agent definitions took about 2 weeks of iterative work — writing job definitions, testing prompts, calibrating outputs. The coordination layer (Supabase handovers, thread state, memory system) took another week. But the team wasn't truly operational until week 4-5, after the first round of failures forced redesigns. Expect at least a month before things stabilize.
What AI model do the agents use?
Claude (Anthropic) via Claude Code. Different tasks use different model tiers: strategic work (Holden's revenue briefs, Anna's weekly plans) runs on the most capable model. Deterministic workflows (data formatting, sitemap updates) could run on cheaper, faster models. Getting this routing right is one of the biggest cost optimization levers.
Can AI agents fully replace human employees?
No. My agents can't build genuine client relationships, read a room during a sales call, make ambiguous judgment calls about company strategy, or access most financial systems. They replace the execution layer — research, drafting, scheduling, monitoring, publishing. The human layer (strategy, relationships, approvals) is still essential. The goal isn't replacement. It's reallocation of human time to higher-leverage work.
What happens when an AI agent makes a mistake?
It depends on the mistake. Some are caught by other agents (Bobbie gates Amos's outbound copy). Some are caught by automated checks (empty sequences, failed deployments). Some aren't caught for weeks (the stale GTM dashboard). The honest answer: you need monitoring, you need quality gates, and you need to accept that 60% reliability is the current ceiling for autonomous agent work. Build your system around that reality.
This article was drafted by an AI agent and reviewed by Gregor Spielmann. The source material, frameworks, and experiences are real. The writing is AI-assisted. Learn how this site works.