Most people automate the wrong things first.
They pick the task that sounds coolest ("AI-generated customer proposals!") instead of the one that actually saves money ("classify incoming emails by priority"). Six weeks later, the cool automation is broken and they're still sorting emails by hand.
The fix: score every task on two dimensions before you touch it. Reversibility × Judgment. That's it. Those two numbers tell you everything.
The Two Dimensions
Reversibility (1–5): How easy is it to undo if the AI gets it wrong?
- 5 (fully reversible): Draft a tweet, sort a file, generate a report. If it's bad, delete it. Zero consequences.
- 4 (easy to fix): Send an internal Slack message, reschedule a post. Mildly annoying to undo.
- 3 (moderate effort): Send a customer email, publish a blog post. Correctable but someone might have seen it.
- 2 (hard to reverse): Process a refund, change pricing, update billing. Real money moves.
- 1 (irreversible): Delete data, send legal documents, post something that goes viral for the wrong reasons.
Judgment Required (1–5): How much human-level thinking does the task need?
- 1 (zero judgment): Format conversion, scheduling posts at set times, file organization.
- 2 (pattern matching): Email classification, content categorization, data extraction.
- 3 (informed decisions): Writing drafts, reply triage, basic analysis.
- 4 (strategic thinking): Pricing decisions, customer escalation, partnership evaluation.
- 5 (pure intuition): Brand direction, crisis response, major business pivots.
The 5 Tiers
Plot your scores. The combination tells you the tier:
| Tier | Reversibility | Judgment | Action |
|---|---|---|---|
| 1: Full Auto | 4–5 | 1–2 | Automate completely. No human in the loop. |
| 2: Auto + Log | 3–5 | 2–3 | Automate but log everything. Spot-check weekly. |
| 3: AI Drafts, Human Approves | 2–3 | 3–4 | AI does the work. Human reviews before execution. |
| 4: Human Does, AI Assists | 1–2 | 4–5 | Human drives. AI provides data and suggestions. |
| 5: Human Only | 1 | 5 | Don't automate. The risk isn't worth the time saved. |
The key insight: Tier 1 tasks are where you start. Not Tier 3, not Tier 4. The boring, reversible, low-judgment stuff. It's not exciting. It's profitable.
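The tier bands in the table overlap at the edges, so turning them into code forces a tie-breaking rule. Here's one minimal sketch; the exact boundaries are a judgment call, tuned here to break ties toward the more cautious (higher) tier:

```python
def tier_for(reversibility: int, judgment: int) -> int:
    """Map (reversibility, judgment) scores to a delegation tier (1-5).

    Checks the most restrictive tiers first so ambiguous scores
    fall toward more human oversight, not less.
    """
    if reversibility <= 1 and judgment >= 5:
        return 5  # Human Only
    if reversibility <= 2 and judgment >= 4:
        return 4  # Human Does, AI Assists
    if reversibility >= 4 and judgment <= 2:
        return 1  # Full Auto
    if reversibility >= 4 and judgment <= 3:
        return 2  # Auto + Log
    return 3      # AI Drafts, Human Approves

print(tier_for(5, 1))  # post scheduled tweets -> 1
print(tier_for(2, 4))  # pricing changes -> 4
print(tier_for(1, 5))  # legal agreements -> 5
```

Adjust the boundaries to match how your own matrix shakes out; the point is that the mapping is mechanical once the two scores exist.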
A Filled-In Matrix
Here's the actual delegation matrix from my business. Every task I do, scored and classified:
Tier 1: Full Auto (95 min/day saved)
| Task | Rev. | Judg. | Time Saved | AI Cost |
|---|---|---|---|---|
| Post scheduled tweets | 5 | 1 | 15 min/day | $0.00 |
| Pull daily analytics | 5 | 1 | 10 min/day | $0.00 |
| Generate dashboard | 5 | 1 | 20 min/day | $0.00 |
| Check payment links | 5 | 1 | 5 min/day | $0.00 |
| Monitor file freshness | 5 | 1 | 10 min/day | $0.00 |
| Classify emails by priority | 5 | 2 | 15 min/day | $0.01 |
| Queue content from calendar | 5 | 2 | 20 min/day | $0.02 |
Notice: most Tier 1 tasks don't even need AI. They're pure Python. Posting a tweet from a queue is a file read + HTTP request. No language model required. This is important — the most valuable automation often has zero AI cost.
Tier 2: Auto + Log (45 min/day saved)
| Task | Rev. | Judg. | Time Saved | AI Cost |
|---|---|---|---|---|
| Draft tweets from prompts | 4 | 3 | 25 min/day | $0.02 |
| Triage engagement mentions | 4 | 2 | 10 min/day | $0.003 |
| Generate weekly report | 5 | 3 | 30 min/week | $0.04 |
| Draft reply to mentions | 4 | 3 | 15 min/day | $0.008 |
These are auto-executed but everything gets logged. I review the tweet drafts weekly (takes 10 minutes). If quality drifts, I update the voice config. I don't approve each one individually — that defeats the point.
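The execute-and-log pattern is simple to sketch. The log file name and sample size below are arbitrary; the structure is what matters — every run appends an audit entry, and the weekly review just samples from that file:

```python
import json
import random
import time
from pathlib import Path

LOG = Path("tier2_audit.jsonl")  # illustrative path

def run_logged(action_name: str, fn, *args):
    """Execute a Tier 2 action unattended, appending an audit entry."""
    result = fn(*args)
    entry = {
        "ts": time.time(),
        "action": action_name,
        "args": list(args),
        "result": str(result)[:500],  # truncate large outputs
    }
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return result

def weekly_sample(k: int = 10) -> list[dict]:
    """Pull a random sample of logged runs for the weekly spot-check."""
    entries = [json.loads(line) for line in LOG.read_text().splitlines() if line.strip()]
    return random.sample(entries, min(k, len(entries)))
```

Ten minutes with `weekly_sample()` tells you whether quality is drifting, without gating any individual run.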
Tier 3: AI Drafts, Human Approves (20 min/day saved)
| Task | Rev. | Judg. | Time Saved | AI Cost |
|---|---|---|---|---|
| Customer support emails | 3 | 3 | 15 min/day | $0.003 |
| Blog post drafts | 3 | 4 | 60 min/week | $0.06 |
| Email sequence copy | 3 | 3 | 30 min/week | $0.02 |
| Product page updates | 3 | 3 | 20 min/week | $0.01 |
These sit in an approval queue. AI drafts go into pending/. I review them in a batch — 15 minutes every other day. Approve, edit, or reject. The system learns from edits over time (rejection rate dropped from 30% to 8% in 6 weeks).
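A file-based approval queue can be as small as three functions. The directory names here are illustrative (matching the `pending/` convention above); approve/reject is just a file move, which makes the rejection rate trivial to compute:

```python
from pathlib import Path

BASE = Path("queue")  # illustrative root directory

def submit_draft(name: str, text: str) -> Path:
    """AI drafts land in pending/ awaiting batch review."""
    path = BASE / "pending" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path

def review(name: str, approved: bool) -> None:
    """Move a reviewed draft to approved/ or rejected/."""
    dest_dir = BASE / ("approved" if approved else "rejected")
    dest_dir.mkdir(parents=True, exist_ok=True)
    (BASE / "pending" / name).rename(dest_dir / name)

def rejection_rate() -> float:
    """Fraction of reviewed drafts that were rejected."""
    ok = len(list((BASE / "approved").glob("*")))
    bad = len(list((BASE / "rejected").glob("*")))
    return bad / (ok + bad) if (ok + bad) else 0.0
```

Tracking `rejection_rate()` over time is also what tells you when a task is ready to graduate to Tier 2.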
Tier 4: Human Does, AI Assists
| Task | Rev. | Judg. | AI Role |
|---|---|---|---|
| Pricing changes | 2 | 4 | Pull competitor data, model scenarios |
| Refund decisions (unusual) | 2 | 4 | Surface customer history, suggest resolution |
| Partnership evaluation | 2 | 5 | Research, summarize, score fit |
| Product roadmap | 2 | 5 | Aggregate feedback, identify patterns |
Tier 5: Human Only
| Task | Rev. | Judg. | Why |
|---|---|---|---|
| Brand voice changes | 1 | 5 | Defines everything downstream |
| Legal agreements | 1 | 5 | Irreversible liability |
| Crisis communication | 1 | 5 | One wrong word is catastrophic |
| Major pivots | 1 | 5 | Bet-the-business decisions |
How to Score Your Own Tasks
Step 1: List every recurring task in your business. All of them. Even the 2-minute ones.
Step 2: Score each on both dimensions. Be honest — most people overrate the judgment their tasks require. "Writing social posts" feels like a 5 until you realize you do the same pattern every day.
Step 3: Sort by tier. Count the minutes saved in Tier 1 and Tier 2.
```python
# delegation_scorer.py — automate the scoring
import json

def score_task(task_description: str) -> dict:
    """Score a task on reversibility and judgment required."""
    prompt = f"""Score this business task on two dimensions (1-5 each):

Task: {task_description}

REVERSIBILITY (1=irreversible, 5=fully reversible):
- 5: Can delete/redo with zero consequences
- 4: Easy to fix, minor inconvenience
- 3: Correctable but someone may have seen it
- 2: Real money or trust moves, hard to undo
- 1: Cannot be undone

JUDGMENT REQUIRED (1=none, 5=pure intuition):
- 1: Mechanical formatting, scheduling, filing
- 2: Pattern matching, classification, extraction
- 3: Informed decisions with clear criteria
- 4: Strategic thinking, weighing tradeoffs
- 5: Gut calls, crisis response, identity decisions

Return JSON: {{"reversibility": N, "judgment": N, "tier": N, "reasoning": "..."}}"""
    # Route to Haiku — this is classification work.
    # call_model is your LLM client wrapper (model name, prompt -> response text).
    response = call_model("haiku", prompt)
    return json.loads(response)

# Score a full task list
tasks = [
    "Post scheduled tweets to X",
    "Reply to customer support emails",
    "Update product pricing",
    "Generate weekly analytics report",
    "Write blog post draft",
    "Process refund request",
    "Review partnership proposal",
]
for task in tasks:
    result = score_task(task)
    print(f"Tier {result['tier']}: {task}")
    print(f"  Rev={result['reversibility']} Judg={result['judgment']}")
```
Cost to score 20 tasks: $0.006. Six tenths of a cent for a complete automation roadmap.
The $2,333/Month Calculation
Tier 1 saves 95 minutes per day. Tier 2 saves another 45. That's 140 minutes — 2 hours and 20 minutes — every single day.
At $50/hour (conservative for someone running a business):
- Daily savings: $116.67
- Monthly savings (20 business days): $2,333
- AI cost to achieve this: $1.61/month
- ROI: 1,449x
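The arithmetic, as a few lines you can rerun with your own hourly rate:

```python
# Value of Tier 1 + Tier 2 time savings at a given hourly rate.
minutes_per_day = 95 + 45          # Tier 1 + Tier 2
hourly_rate = 50.0                 # conservative for an owner-operator
daily = minutes_per_day / 60 * hourly_rate
monthly = daily * 20               # 20 business days
ai_cost = 1.61                     # monthly model spend
roi = monthly / ai_cost
print(f"${daily:.2f}/day, ${monthly:.0f}/month, ROI {roi:.0f}x")
# → $116.67/day, $2333/month, ROI 1449x
```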
That's not the ROI on fancy automation. That's the ROI on automating the boring stuff — the tasks nobody wants to do, that take 2 minutes each but happen 47 times a day.
The 3 Rules
Rule 1: Start with Tier 1, always. Every Tier 1 task you automate is pure profit. No approval queues, no quality reviews, no hand-wringing. Just scripts running on cron doing predictable work.
Rule 2: Tier 2 before Tier 3. People jump to the exciting Tier 3 tasks (AI writing customer emails!) before automating Tier 2 (AI classifying those emails). Classify first, draft second. Get the pipeline right before you turn on the tap.
Rule 3: Never skip to Tier 4. If you haven't automated Tier 1 and 2, you have no business asking AI for "strategic advice." You're still spending 2 hours a day on work that a cron job handles. Fix that first.
The Anti-Pattern: Automating Backwards
Here's what going backwards looks like:
- Week 1: Build an AI that writes product descriptions (Tier 3). Spend 4 hours tweaking prompts.
- Week 2: Realize you're still manually posting them to the website. Build an auto-poster (Tier 1).
- Week 3: Discover the AI descriptions have errors you didn't catch. Build a quality check (Tier 2).
- Week 4: Notice you built the pipeline backwards and half the descriptions were published without review.
The right order: auto-poster first (Tier 1, $0), quality check second (Tier 2, $0.003/check), AI descriptions last (Tier 3, $0.01/description). Same result. No publishing errors. Half the time.
What Changes Over Time
Tasks graduate between tiers as your system gets smarter:
| Task | Week 1 | Week 4 | Week 12 | What Changed |
|---|---|---|---|---|
| Tweet drafts | Tier 3 | Tier 2 | Tier 2 | Voice calibration improved, 92% no-edit rate |
| Support emails | Tier 3 | Tier 3 | Tier 2 | Template library covers 80% of cases |
| Refund decisions | Tier 4 | Tier 4 | Tier 3 | Clear policy + history makes most cases obvious |
| Blog posts | Tier 4 | Tier 3 | Tier 3 | Voice bible + examples reduced edit time 70% |
Don't try to force tasks down tiers. Let them graduate naturally as your prompts, templates, and guardrails improve. If tweet drafts are still getting 30% rejection at Week 4, they're not ready for Tier 2 yet — your voice config needs work.
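"Graduate naturally" can still be a rule rather than a feeling. One sketch: promote a task only after its weekly rejection rate stays under a threshold for several consecutive weeks. The 10% threshold and 3-week streak are illustrative, not prescriptive:

```python
def ready_to_graduate(weekly_rejection_rates: list[float],
                      threshold: float = 0.10,
                      streak_weeks: int = 3) -> bool:
    """True if the last `streak_weeks` weeks all stayed under `threshold`."""
    recent = weekly_rejection_rates[-streak_weeks:]
    return len(recent) == streak_weeks and all(r < threshold for r in recent)

print(ready_to_graduate([0.30, 0.18, 0.09, 0.07, 0.08]))  # True
print(ready_to_graduate([0.30, 0.25, 0.09]))              # False
```

Requiring a streak, not a single good week, keeps one lucky batch from promoting a task that isn't ready.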
The Template
Copy this. Fill it in. It's the fastest way to a clear automation roadmap:
```json
{
  "business": "YOUR BUSINESS",
  "date": "2026-03-14",
  "tasks": [
    {
      "name": "Task description",
      "frequency": "daily|weekly|monthly",
      "current_time_min": 15,
      "reversibility": 5,
      "judgment": 2,
      "tier": 1,
      "ai_model": "none|haiku|sonnet|opus",
      "estimated_ai_cost_monthly": 0.00,
      "status": "not_started|in_progress|automated",
      "notes": ""
    }
  ],
  "summary": {
    "total_tasks": 0,
    "tier_1_count": 0,
    "tier_1_time_saved_daily_min": 0,
    "estimated_monthly_value": 0,
    "estimated_monthly_ai_cost": 0,
    "roi_multiple": 0
  }
}
```
Fill in the tasks. Sort by tier. Automate Tier 1 this week. That's the whole strategy.
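The summary block doesn't need to be filled in by hand. A sketch of computing it from the task list, assuming the JSON structure above (monthly value here counts Tier 1 time only and converts weekly/monthly tasks to daily-minute equivalents; extend as needed):

```python
# Assumes ~5 business days/week and ~20/month for the conversions.
PER_DAY = {"daily": 1.0, "weekly": 1 / 5, "monthly": 1 / 20}

def summarize(matrix: dict, hourly_rate: float = 50.0) -> dict:
    """Compute the summary block from a filled-in delegation matrix."""
    tasks = matrix["tasks"]
    tier1 = [t for t in tasks if t["tier"] == 1]
    daily_min = sum(t["current_time_min"] * PER_DAY[t["frequency"]] for t in tier1)
    monthly_value = daily_min / 60 * hourly_rate * 20
    ai_cost = sum(t["estimated_ai_cost_monthly"] for t in tasks)
    return {
        "total_tasks": len(tasks),
        "tier_1_count": len(tier1),
        "tier_1_time_saved_daily_min": round(daily_min, 1),
        "estimated_monthly_value": round(monthly_value, 2),
        "estimated_monthly_ai_cost": round(ai_cost, 2),
        "roi_multiple": round(monthly_value / ai_cost, 1) if ai_cost else 0,
    }
```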
Want the complete delegation system? The Operator Playbook includes the full matrix, the scoring script, the approval queue code, and the trust ladder that lets tasks graduate between tiers automatically.