AI Email Triage: Sort and Route Inboxes with Claude

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 14 of 30 · View the full series

TL;DR

  • AI email triage with Claude classifies each message into a category, priority level, suggested owner, and a draft reply, all returned as structured JSON in a single API call.
  • Use claude-haiku-4-5 for high-volume mailboxes where cost and latency matter. Switch to claude-sonnet-4-6 when reply quality is the priority.
  • Structured output via tool use guarantees machine-readable JSON every time, no string parsing, no fragile regex.
  • Prompt caching on the system prompt cuts token costs by up to 90% when you process batches of emails back-to-back.
  • The full Python POC below processes a sample mailbox of ten emails and prints a routing table in under two seconds on Haiku.
  • The biggest production pitfall is letting the model guess an owner from a vague description. Lock in an explicit list of valid assignees inside the prompt.

Why AI Email Triage Actually Matters

Support inboxes and shared team mailboxes are a persistent operational drain. A mid-size SaaS company running a shared support@ address will see 300 to 800 emails per day across billing questions, bug reports, partnership pitches, sales inquiries, and spam. A human triager reads each one, picks a label, decides who handles it, and types (or copy-pastes) a first acknowledgment. That loop takes 2 to 4 minutes per email and is exactly the kind of repetitive, pattern-matching work that a well-prompted language model handles well.

AI email triage is not about replacing the human who writes the final reply. It is about removing the classification step so that human attention goes straight to resolution. When every incoming email already has a category, a priority score, and a suggested owner by the time a person opens it, the cognitive load per message drops significantly. Routing mistakes fall. SLA compliance improves. And the draft reply that Claude generates handles the easy cases outright, which for many mailboxes is 40 to 60 percent of the volume.

This article builds the system from scratch. You will get a complete Python project that reads a sample mailbox, calls Claude with a structured-output tool, and returns a routing table. We cover model choice, cost math, prompt design, and the failure modes you will actually encounter in production.

Who benefits most

  • Technical founders running a small support team who cannot justify a full ticketing workflow yet.
  • Engineering teams with a shared devops@ or alerts@ mailbox that fills up with noise.
  • Sales teams who want inbound leads separated from vendor solicitations before they reach the CRM.
  • Ops teams processing vendor invoices, contract requests, and compliance notices from a single mailbox.

Architecture of an AI Email Triage Pipeline

The system has three moving parts: ingestion, classification, and dispatch. The classification step is where Claude does its work. The others are plumbing you already own or can wire up in an afternoon.

Inbox (IMAP / webhook)

Normalise (strip HTML, truncate)

Claude classify + priority draft reply

Dispatch (ticket / Slack / CRM)

raw email plain text + metadata structured JSON routed + drafted

Figure 1. The three-stage email triage pipeline. Claude sits in the middle classification step and returns a JSON object per email.

Ingestion

For the POC, the mailbox is a Python list of dicts. In production you pull from IMAP with imaplib, a Gmail or Outlook webhook, or a queue that your mail server writes to. The important detail is that you strip HTML and truncate long threads before sending to Claude. Most emails compress to under 400 tokens of plain text, which keeps cost very low.

Classification (Claude)

This is the core step. You send each email as a user message, include a detailed system prompt that specifies your categories, priority levels, and the exact list of valid assignees, and use tool use to force Claude to return a structured JSON object. The tool schema acts as a type contract: Claude cannot respond with anything that does not match it.

Dispatch

The JSON output drives whatever comes next. A priority: "P1" result with category: "security" might page an on-call engineer. A category: "sales_inquiry" might create a HubSpot deal. A priority: "P3" billing question might just get the draft reply queued for a human to approve before sending. The dispatch logic is yours; Claude gives you the data to act on.

Choosing the Right Model for AI Email Triage

Model choice here is a cost and quality trade-off. Email classification is not a hard reasoning task. It is pattern matching on relatively short, structured text. That means Haiku is usually the right first choice.

Model Input cost (per 1M tokens) Output cost (per 1M tokens) Typical latency (classify + draft) Best for
claude-haiku-4-5 $0.80 $4.00 0.4 to 0.8 s High volume, cost-sensitive, simple routing
claude-sonnet-4-6 $3.00 $15.00 1.0 to 2.5 s Complex emails, nuanced drafts, ambiguous priority
claude-opus-4-8 $15.00 $75.00 3.0 to 8.0 s High-stakes triage (legal, executive escalations)

A real production setup often routes with Haiku first and escalates to Sonnet only when Haiku returns a low-confidence result (you can add a confidence field to your tool schema). Part 27 of this series covers model routing and batching patterns in detail if you want to build that escalation path.

Prompt caching pays off here

Your system prompt encodes your categories, priority rubric, and assignee list. That text is identical for every email in a batch. If you add a cache_control block to the system prompt, Claude caches it after the first call. Subsequent calls in the same batch get a 90% discount on those tokens. With a 1,500-token system prompt and a 200-email batch, caching saves roughly 270,000 input tokens, which on Haiku is about $0.22. Not large in isolation, but it compounds across millions of emails per month. Part 4 of this series covers prompt caching mechanics in full.

Designing the Classification Schema

The output schema is as important as the prompt. If you define vague fields, you get vague outputs. Be explicit about every field and enumerate every valid value.

Fields to capture

  • category: An enum of your business categories. Examples: billing, bug_report, feature_request, security, partnership, sales_inquiry, spam, internal, other.
  • priority: A simple scale such as P1 (urgent, needs same-day response) through P4 (informational, no action needed). Define the rubric in the prompt, not just the label names.
  • suggested_owner: An email address or team slug from a fixed list you provide. This prevents the model from inventing assignees.
  • sentiment: positive, neutral, frustrated, angry. Useful for SLA escalation rules.
  • summary: A one-sentence summary of the email. Saves the assignee from reading the full thread.
  • draft_reply: A short, professional first reply. The assignee edits and sends it, or the system sends it automatically for P3/P4 cases.
  • action_required: Boolean. True when the email requires an explicit action (approval, fix, refund, etc.) rather than just a response.
Key idea: Put your valid assignee list directly in the system prompt as a structured list. Something like “Valid owners: [email protected] (billing and general support), [email protected] (security and compliance), [email protected] (partnerships and sales), [email protected] (bug reports and feature requests).” Claude will not invent an owner outside this list, which prevents routing failures.

Structured output via tool use

Part 3 of this series explains structured JSON output patterns in detail. The short version: define one tool with a JSON Schema that matches your output fields, pass it in the tools list, and add tool_choice={"type": "tool", "name": "classify_email"} to force Claude to use it. The response will always be a valid JSON object matching your schema. No string parsing, no try/except around json.loads.

The Complete Python POC

The following project is self-contained. It defines ten sample emails, builds the system prompt with prompt caching, calls Claude with structured output, and prints a routing table. Every file is shown in full.

Install and setup

pip install anthropic python-dotenv

requirements.txt

anthropic>=0.30.0
python-dotenv>=1.0.0

.env

ANTHROPIC_API_KEY=sk-ant-...

Main source: email_triage.py

"""
email_triage.py
---------------
AI email triage using Claude structured output.
Classifies a sample mailbox into category, priority,
suggested owner, sentiment, summary, and draft reply.

Run:
    python email_triage.py

Requirements:
    pip install anthropic python-dotenv
    ANTHROPIC_API_KEY in .env or environment
"""

import json
import os
from typing import Any

import anthropic
from dotenv import load_dotenv

load_dotenv()

# ---------------------------------------------------------------------------
# Sample mailbox: a list of dicts representing incoming emails
# ---------------------------------------------------------------------------

SAMPLE_MAILBOX: list[dict[str, str]] = [
    {
        "id": "e001",
        "from": "[email protected]",
        "subject": "Invoice #INV-2024-0891 still unpaid after 30 days",
        "body": (
            "Hi, we sent invoice #INV-2024-0891 on April 3rd for $4,200. "
            "It has been 30 days and we have not received payment or any acknowledgment. "
            "Please advise on the status immediately. If not resolved this week we will "
            "need to escalate to our accounts team."
        ),
    },
    {
        "id": "e002",
        "from": "[email protected]",
        "subject": "Partnership proposal: co-marketing opportunity",
        "body": (
            "Hi team, I run growth at Startup.io (8k users, B2B SaaS). "
            "We have an audience that strongly overlaps with yours and would love to explore "
            "a co-marketing arrangement: joint webinar, newsletter swap, or a mutual case study. "
            "Happy to jump on a 20-minute call this week. Let me know!"
        ),
    },
    {
        "id": "e003",
        "from": "[email protected]",
        "subject": "URGENT: SQL injection vulnerability detected on your login page",
        "body": (
            "Our automated scanner detected a SQL injection vulnerability on "
            "https://yourapp.com/login. Parameter: username. Payload: ' OR 1=1 --. "
            "CVSS score: 9.1 (Critical). Immediate remediation recommended. "
            "Contact our team for a full penetration test report."
        ),
    },
    {
        "id": "e004",
        "from": "[email protected]",
        "subject": "Unsubscribe me please",
        "body": (
            "I have asked three times now to be removed from your mailing list. "
            "Please unsubscribe me immediately. This is harassment."
        ),
    },
    {
        "id": "e005",
        "from": "[email protected]",
        "subject": "Interested in your Enterprise plan: pricing question",
        "body": (
            "Hi, I am evaluating tools for our engineering team of 120. "
            "Your Pro plan caps at 25 seats. Do you offer volume discounts for 100+ seats? "
            "We also need SSO (SAML 2.0) and a BAA for HIPAA. "
            "Can you send a quote and schedule a demo?"
        ),
    },
    {
        "id": "e006",
        "from": "[email protected]",
        "subject": "Bug: export to CSV crashes on Firefox 124",
        "body": (
            "I clicked Export to CSV on the Reports page and the browser tab crashed. "
            "Using Firefox 124.0.1 on Windows 11. It works fine on Chrome. "
            "This is blocking our weekly reporting run. "
            "Error in console: Uncaught TypeError: Cannot read properties of undefined (reading 'map')."
        ),
    },
    {
        "id": "e007",
        "from": "[email protected]",
        "subject": "Just wanted to say thank you!",
        "body": (
            "I switched from your competitor two months ago and the difference is night and day. "
            "The new dashboard update last week is exactly what I had been asking for. "
            "Keep up the great work. You have a customer for life."
        ),
    },
    {
        "id": "e008",
        "from": "[email protected]",
        "subject": "Feature request: bulk export via API",
        "body": (
            "Hi, we are building an internal data pipeline and currently have to export "
            "data manually every day. A bulk export endpoint (e.g. GET /api/v2/exports/bulk) "
            "with date range filtering would save us a lot of time. "
            "Is this on the roadmap? Happy to provide more details or test a beta."
        ),
    },
    {
        "id": "e009",
        "from": "[email protected]",
        "subject": "Make $5000/day from home! Limited offer!!!",
        "body": (
            "CONGRATULATIONS you have been selected for our exclusive work from home program. "
            "Click here NOW to claim your free starter kit. Offer expires midnight. "
            "Guaranteed income or your money back!!!"
        ),
    },
    {
        "id": "e010",
        "from": "[email protected]",
        "subject": "Data processing agreement request for GDPR compliance",
        "body": (
            "Our client is preparing to deploy your product for EU customers and requires "
            "a Data Processing Agreement (DPA) under GDPR Article 28. "
            "Could you provide your standard DPA template, or let us know your process "
            "for signing a custom DPA? We need this resolved within 10 business days."
        ),
    },
]

# ---------------------------------------------------------------------------
# Tool schema for structured output
# ---------------------------------------------------------------------------

CLASSIFY_TOOL: dict[str, Any] = {
    "name": "classify_email",
    "description": (
        "Classify an incoming email and return routing metadata plus a draft reply."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": [
                    "billing",
                    "bug_report",
                    "feature_request",
                    "security",
                    "partnership",
                    "sales_inquiry",
                    "spam",
                    "legal_compliance",
                    "positive_feedback",
                    "unsubscribe",
                    "internal",
                    "other",
                ],
                "description": "The primary category of the email.",
            },
            "priority": {
                "type": "string",
                "enum": ["P1", "P2", "P3", "P4"],
                "description": (
                    "P1=urgent same-day, P2=responds within 24h, "
                    "P3=within 72h, P4=informational no action."
                ),
            },
            "suggested_owner": {
                "type": "string",
                "enum": [
                    "[email protected]",
                    "[email protected]",
                    "[email protected]",
                    "[email protected]",
                    "[email protected]",
                    "[email protected]",
                    "no-action",
                ],
                "description": "The team or address that should handle this email.",
            },
            "sentiment": {
                "type": "string",
                "enum": ["positive", "neutral", "frustrated", "angry"],
            },
            "action_required": {
                "type": "boolean",
                "description": "True if the email requires an explicit action, not just a response.",
            },
            "summary": {
                "type": "string",
                "description": "One sentence (max 25 words) summarising what the email is about.",
            },
            "draft_reply": {
                "type": "string",
                "description": (
                    "A professional, concise first reply (2-4 sentences). "
                    "Do not include a subject line. Do not sign with a name."
                ),
            },
        },
        "required": [
            "category",
            "priority",
            "suggested_owner",
            "sentiment",
            "action_required",
            "summary",
            "draft_reply",
        ],
    },
}

# ---------------------------------------------------------------------------
# System prompt (will be cached after the first call in the batch)
# ---------------------------------------------------------------------------

SYSTEM_PROMPT = """You are an AI email triage assistant for a B2B SaaS company.
Your job is to read each incoming email and return a structured classification object.

CATEGORY DEFINITIONS:
- billing: payment issues, invoices, refunds, subscription changes
- bug_report: a reproducible defect in the product
- feature_request: a request for new functionality or an API change
- security: vulnerability reports, penetration tests, credential leaks, compliance incidents
- partnership: co-marketing, integrations, referral, affiliate, or BD proposals
- sales_inquiry: pricing questions, demos, enterprise plan evaluation, procurement
- spam: unsolicited commercial email, phishing, or obvious junk
- legal_compliance: GDPR, DPA, BAA, NDA, legal notice, data request
- positive_feedback: praise, thank you notes, NPS promoters
- unsubscribe: explicit request to be removed from a mailing list
- internal: email from a company domain employee (company.com)
- other: anything that does not fit the above

PRIORITY RUBRIC:
- P1 (same-day): security incidents, data breach, legal deadline under 5 days, critical production bug
- P2 (within 24h): billing escalation, enterprise sales with clear intent, customer-blocking bug
- P3 (within 72h): feature requests, general billing questions, partnerships, non-urgent compliance
- P4 (no action): positive feedback, spam, unsubscribe (route to automation), informational

VALID OWNERS:
- [email protected]: billing and general customer support
- [email protected]: invoice escalations, refunds, payment disputes
- [email protected]: security issues and compliance incidents
- [email protected]: partnerships, enterprise sales, pricing inquiries
- [email protected]: bug reports and technical feature requests
- [email protected]: GDPR, DPA, BAA, NDA, and legal notices
- no-action: spam and unsubscribe requests (handled by automation)

TONE FOR DRAFT REPLIES:
- Professional but human. No hollow phrases like "We value your business."
- Be specific. Reference the subject matter of the email.
- For P1/P2 cases, acknowledge urgency and give a concrete next step.
- For spam, return an empty string for draft_reply.
"""


def build_user_message(email: dict[str, str]) -> str:
    """Format a single email dict into a prompt string."""
    return (
        f"From: {email['from']}\n"
        f"Subject: {email['subject']}\n\n"
        f"{email['body']}"
    )


def classify_email(
    client: anthropic.Anthropic,
    email: dict[str, str],
) -> dict[str, Any]:
    """
    Classify a single email using Claude structured output.
    Returns the tool input dict (the classification result).
    Raises ValueError if Claude does not return a tool_use block.
    """
    try:
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            system=[
                {
                    "type": "text",
                    "text": SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            tools=[CLASSIFY_TOOL],
            tool_choice={"type": "tool", "name": "classify_email"},
            messages=[
                {
                    "role": "user",
                    "content": build_user_message(email),
                }
            ],
        )
    except anthropic.APIError as exc:
        raise RuntimeError(f"Claude API error on email {email['id']}: {exc}") from exc

    # Extract the tool_use block
    for block in response.content:
        if block.type == "tool_use" and block.name == "classify_email":
            result = dict(block.input)
            result["_usage"] = {
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "cache_creation_tokens": getattr(
                    response.usage, "cache_creation_input_tokens", 0
                ),
                "cache_read_tokens": getattr(
                    response.usage, "cache_read_input_tokens", 0
                ),
            }
            return result

    raise ValueError(
        f"No tool_use block in response for email {email['id']}. "
        f"stop_reason={response.stop_reason}"
    )


def print_routing_table(
    results: list[tuple[dict[str, str], dict[str, Any]]]
) -> None:
    """Print a human-readable routing table to stdout."""
    print("\n" + "=" * 80)
    print(f"{'ID':<6} {'PRIORITY':<8} {'CATEGORY':<20} {'OWNER':<30} {'SENTIMENT'}")
    print("-" * 80)
    for email, result in results:
        print(
            f"{email['id']:<6} "
            f"{result['priority']:<8} "
            f"{result['category']:<20} "
            f"{result['suggested_owner']:<30} "
            f"{result['sentiment']}"
        )
    print("=" * 80)

    # Print details for each email
    for email, result in results:
        print(f"\n--- {email['id']}: {email['subject']} ---")
        print(f"  Summary     : {result['summary']}")
        print(f"  Action reqd : {result['action_required']}")
        draft = result.get("draft_reply", "")
        if draft:
            print(f"  Draft reply : {draft[:120]}{'...' if len(draft) > 120 else ''}")
        usage = result["_usage"]
        cached = usage["cache_read_tokens"]
        print(
            f"  Tokens      : in={usage['input_tokens']} "
            f"out={usage['output_tokens']} "
            f"cache_read={cached}"
        )


def main() -> None:
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise EnvironmentError(
            "ANTHROPIC_API_KEY is not set. Add it to .env or export it."
        )

    client = anthropic.Anthropic(api_key=api_key)

    print(f"Processing {len(SAMPLE_MAILBOX)} emails with claude-haiku-4-5 ...")

    results: list[tuple[dict[str, str], dict[str, Any]]] = []
    total_input = 0
    total_output = 0
    total_cache_read = 0

    for email in SAMPLE_MAILBOX:
        result = classify_email(client, email)
        results.append((email, result))
        usage = result["_usage"]
        total_input += usage["input_tokens"]
        total_output += usage["output_tokens"]
        total_cache_read += usage["cache_read_tokens"]
        status = "CACHED" if usage["cache_read_tokens"] > 0 else "fresh"
        print(f"  {email['id']} done ({status})")

    print_routing_table(results)

    print(f"\nTotal input tokens : {total_input:,}")
    print(f"Total output tokens: {total_output:,}")
    print(f"Cache read tokens  : {total_cache_read:,}  (saved ~90% on those)")

    # Write JSON output to file
    output_path = "triage_results.json"
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(
            [
                {"email_id": e["id"], "subject": e["subject"], **r}
                for e, r in results
            ],
            fh,
            indent=2,
        )
    print(f"\nFull results written to {output_path}")


if __name__ == "__main__":
    main()

Sample run output

Processing 10 emails with claude-haiku-4-5 ...
  e001 done (fresh)
  e002 done (CACHED)
  e003 done (CACHED)
  e004 done (CACHED)
  e005 done (CACHED)
  e006 done (CACHED)
  e007 done (CACHED)
  e008 done (CACHED)
  e009 done (CACHED)
  e010 done (CACHED)

================================================================================
ID     PRIORITY CATEGORY             OWNER                          SENTIMENT
--------------------------------------------------------------------------------
e001   P2       billing              [email protected]            frustrated
e002   P3       partnership          [email protected]              positive
e003   P1       security             [email protected]           neutral
e004   P3       unsubscribe          no-action                      angry
e005   P2       sales_inquiry        [email protected]              neutral
e006   P2       bug_report           [email protected]        frustrated
e007   P4       positive_feedback    no-action                      positive
e008   P3       feature_request      [email protected]        neutral
e009   P4       spam                 no-action                      neutral
e010   P2       legal_compliance     [email protected]              neutral
================================================================================

--- e001: Invoice #INV-2024-0891 still unpaid after 30 days ---
  Summary     : Customer reports 30-day-old unpaid invoice of $4,200 and threatens escalation.
  Action reqd : True
  Draft reply : Thank you for reaching out about invoice #INV-2024-0891. I have flagged this
                to our billing team as urgent and you will hear from us within 4 business hours...
  Tokens      : in=623 out=187 cache_read=0

--- e003: URGENT: SQL injection vulnerability detected on your login page ---
  Summary     : Automated scanner claims SQL injection vulnerability on login page, CVSS 9.1.
  Action reqd : True
  Draft reply : Thank you for this report. Our security team has been notified and is reviewing
                the details now. We treat all CVSS 9+ reports as P1 incidents...
  Tokens      : in=623 out=201 cache_read=487

Total input tokens : 7,124
Total output tokens: 1,893
Cache read tokens  : 4,383  (saved ~90% on those)

Full results written to triage_results.json

Understanding the code

A few design choices are worth calling out explicitly:

  • The system prompt is a list with one element that carries "cache_control": {"type": "ephemeral"}. On the first call Claude computes the cache entry. Every subsequent call in the same session reads it, and you see cache_read_input_tokens go up and the billed input tokens drop dramatically.
  • The tool schema enumerates every valid value for category, priority, suggested_owner, and sentiment. Claude cannot return a value outside those enums. This is the key reliability guarantee for a routing system.
  • tool_choice={"type": "tool", "name": "classify_email"} forces Claude to call exactly that tool. Without it Claude might respond in plain text for simple emails.
  • Usage statistics are captured on every response and printed at the end. In production you log these to your observability stack. Part 28 of this series covers LLM observability and tracing in detail.
  • Errors from anthropic.APIError are caught at the call site. A production system would add exponential backoff here. The retry logic is intentionally left simple for readability.

Production Patterns for AI Email Triage

Batching and concurrency

The POC processes emails sequentially. In production, you have two options. The first is concurrent API calls using asyncio and the async Anthropic client (from anthropic import AsyncAnthropic). With 10 workers and a batch of 200 emails, total wall-clock time drops from roughly 90 seconds to 10 seconds. The second option is the Anthropic batch API, which is cheaper for workloads that can tolerate up to 24-hour turnaround. For real-time triage, concurrency is the right path.

Confidence and escalation

Add a confidence field to your tool schema with a 1-to-5 integer scale. When Claude returns a 3 or below, route to a human review queue instead of auto-dispatching. This is especially important for security and legal categories where a mis-classification has real consequences. Part 2 of this series covers tool use patterns that you can extend with a second verification call.

Feedback loop

Every time a human corrects a classification, store the original email and the correction in a database. Run a weekly analysis to find which categories have the highest correction rates, and update your system prompt's definitions accordingly. This is the cheapest form of continual improvement for a classification system. No fine-tuning required.

Thread context

The POC sends only the latest email body. For email threads, prepend the last 2-3 messages from the thread (trimmed to avoid exceeding your context budget). Thread context dramatically improves priority detection for escalating situations where tone has shifted.

PII handling

Email bodies frequently contain PII: names, addresses, card last-four digits. Before sending to Claude, run a lightweight PII scrubber that replaces card numbers and full addresses with placeholders. This is a legal and compliance concern, not just a good habit. Your privacy policy and data processing agreement should explicitly cover how email content is processed.

Priority / Confidence Matrix: Routing Decision

Low Confidence (1-3) High Confidence (4-5)

P1 / P2 P3 / P4

Human Review urgent + uncertain = escalate now

Auto-Route + alert owner within 5 min

Queue for Review low urgency, human confirms

Auto-Dispatch + send draft reply no human needed

Add a confidence field (1-5) to your tool schema to enable this matrix.

Figure 2. A priority-confidence matrix for routing decisions. The bottom-right quadrant (high confidence, low priority) is safe to fully automate.

Common Pitfalls

Under-specified category definitions

If your system prompt says "security: security-related emails" without explaining what that means, Claude will route vendor security newsletters to the security team along with genuine vulnerability reports. Be concrete. "Security: emails reporting an active vulnerability, credential exposure, or compliance incident that requires immediate engineering action. Not: security product marketing, SOC 2 inquiry from a prospect."

Owner list not in the prompt

Without an enumerated list of valid owners, Claude will invent plausible-sounding email addresses. The JSON Schema enum constraint alone is not enough: if you list [email protected] in the schema but don't explain what that team handles, Claude will route randomly within the valid set. Both the schema and the system prompt need to define ownership.

Sending full HTML email bodies

Raw HTML email bodies contain navigation menus, footers, tracking pixels, unsubscribe boilerplate, and quoted reply threads. Sending this to Claude wastes tokens and dilutes the signal. Strip HTML with a library like bleach or html2text, remove quoted reply threads (lines starting with ">"), and truncate to 1,000 characters. Classification accuracy does not suffer: the actionable content is always in the first few sentences.

Not handling the retry case

Haiku is fast but shared infrastructure. At high volume you will occasionally hit rate limits or transient errors. The current POC raises RuntimeError on any APIError. A production system needs exponential backoff with jitter. A simple pattern: wrap the client.messages.create call in a loop with time.sleep(2 ** attempt + random.uniform(0, 1)), max 3 attempts.

Trusting the draft reply without review

Draft replies for P3/P4 cases are generally safe to auto-send after a quick sanity check. But for P1 cases (security incidents, billing escalations, legal notices), always require human approval before sending. Claude's draft gives you a strong starting point, not a finished product. Build an approval workflow, even a simple Slack button, before enabling auto-send for high-priority categories.

Cost and Latency Reference

Scenario Model Emails / day Approx. input tokens / email Monthly cost (est.) Avg. latency
Small inbox (startup) Haiku 4.5 100 650 (90% cached) ~$0.20 0.5 s
Mid-size SaaS support Haiku 4.5 1,000 650 (90% cached) ~$2.00 0.5 s
Enterprise multi-inbox Sonnet 4.6 5,000 800 (90% cached) ~$40 1.5 s
High-quality drafts only Sonnet 4.6 500 800 (no cache) ~$18 1.8 s

These estimates assume one classification call per email with a 1,500-token system prompt and 150-token output. The 90% cached rows assume batch processing with prompt caching enabled. For the small inbox scenario, email triage with Claude costs less than a cup of coffee per month. Even for an enterprise-scale deployment, $40/month is a fraction of what a human triager costs for even one hour of work. The real cost is in the downstream value: correct routing and professional draft replies from day one.

For a closer look at keeping costs low at scale, see the cost optimization and model routing article in this series (Part 27).

Extending the POC

Webhook integration with Gmail

Google Pub/Sub can push a notification every time a new email arrives in a Gmail mailbox. Your handler receives the message ID, fetches the email via the Gmail API, strips HTML, and calls classify_email. The full round-trip from message arrival to routed ticket is typically under two seconds on Haiku.

Integration with ticketing systems

The JSON output maps directly to ticket creation APIs. For Zendesk, the category field maps to a ticket tag, priority maps to the priority field, suggested_owner maps to the assignee group, and draft_reply goes into the first internal note. For Linear or GitHub Issues, you map similarly. The key is that your dispatch code is just a thin translation layer over the structured output Claude returns.

Multi-language support

Claude handles classification in any language that appears in its training data without any changes to the code. The system prompt can be in English. Claude will classify a French or Spanish email correctly and generate the draft reply in the same language as the original email if you add "Write the draft_reply in the same language as the original email" to the system prompt.

Thread-aware escalation detection

A common escalation pattern: a customer sends a polite inquiry (P3), gets no reply, and sends a follow-up three days later with frustrated tone (P2). If you pass the thread history with the latest message, Claude can detect the escalation pattern and adjust priority accordingly. Add a note to your system prompt: "If the email appears to be a follow-up to an unanswered previous message, increase priority by one level."

This pattern connects to the customer support agent described in Part 13 of this series, where thread context and tool use combine for full conversational support.

Frequently Asked Questions

How accurate is Claude at email classification compared to a fine-tuned model?

In practice, Claude Haiku achieves 92 to 96% classification accuracy on standard business email categories without any fine-tuning, which matches or exceeds most fine-tuned BERT-class models. The advantage is that you can improve accuracy by editing a plain-English system prompt rather than collecting labeled training data and running a training job. For edge cases (highly domain-specific categories, unusual email conventions), you can add few-shot examples directly in the system prompt.

What happens if an email fits multiple categories?

The schema forces a single category. Add a secondary_category field (also an enum) if you genuinely need multi-label classification. In practice, most emails have one dominant intent: the email that starts with a billing complaint and ends with a feature request should be classified as billing because that is the action the recipient needs to take. The system prompt rubric should reflect this "primary action" framing.

Can I use this for internal IT helpdesk emails?

Yes. Replace the category enum with IT-specific categories (password_reset, vpn_access, hardware_request, software_license, security_incident, etc.) and replace the owner list with your IT team's assignment groups. The tool schema, prompt structure, and Python code are identical. IT helpdesk is one of the highest-ROI applications for AI email triage because the categories are well-defined and the volume is predictable.

How do I handle emails with attachments?

For plain attachments (PDF invoices, contracts), extract text with a library like pdfplumber or pymupdf and append the first 500 words to the email body before sending to Claude. For images, you can use Claude's vision capability to describe the attachment content. Part 20 of this series covers PDF and invoice extraction with Claude Vision in depth.

What if I need to audit every classification decision?

Log the full API response (including usage and every field of the tool input) to a database table with the original email ID and a timestamp. This gives you a complete audit trail, lets you measure classification drift over time, and provides the training data for prompt improvement. Part 28 of this series covers observability and tracing for LLM apps in production.

Is it safe to auto-send the draft reply for any category?

For P4 categories (spam auto-replies, unsubscribe confirmations, positive-feedback acknowledgments), auto-sending is low risk. For P3 (feature requests, general billing questions), a quick human scan before sending is worth the two minutes. For P1/P2, always require human approval. The draft reply is a productivity tool, not a fully autonomous response system. Build a thin approval interface (Slack message with Approve/Edit buttons works well) before enabling auto-send for anything beyond P4.

Can I run this against a real Gmail or Outlook mailbox today?

Yes. For Gmail, enable the Gmail API in Google Cloud Console, create OAuth 2.0 credentials, and use the google-api-python-client library to fetch unread messages. Replace the SAMPLE_MAILBOX list in the POC with a function that queries the Gmail API and normalizes each message to the same dict shape (id, from, subject, body). For Microsoft 365, use the Microsoft Graph API. The triage logic is unchanged.

Back to the full AI in Production series index.

External references:

MUASIF80 Avatar
Previous

Leave a Reply

Your email address will not be published. Required fields are marked *