Ticket Classification and Routing with Claude: AI Ticket Classification in Production

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 17 of 30

TL;DR

  • AI ticket classification with Claude assigns team, type, urgency, and sentiment to every inbound support ticket automatically, along with a confidence score per field.
  • Run claude-haiku-4-5 first for cheap, fast classification. Only escalate to claude-sonnet-4-6 when any field confidence falls below your threshold, typically 0.75.
  • Forced structured output via a single tool definition gives you a schema-shaped result you can load straight into a Python dataclass, no regex parsing required.
  • Prompt caching on the system prompt (which carries your routing rules and team definitions) bills repeat reads of that prefix at 10% of the normal input price on high-volume queues, as long as the prefix clears the model's minimum cacheable length.
  • The full POC below is runnable end-to-end: a FastAPI webhook, a two-tier classifier, and a routing engine that writes to named queues.
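  • Every ticket pays for one Haiku call and only escalated tickets also pay for Sonnet, so the cheap-first pattern costs a fraction of running Sonnet on every ticket, and the absolute savings grow linearly with volume. The math makes the architecture obvious.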

Why AI Ticket Classification Is Worth Engineering Properly

Support queues at any company above 20 people become a political problem as much as a technical one. Sales wants their enterprise renewals escalated immediately. The security team wants CVE-related tickets flagged before they hit a shared inbox. Billing disputes need a different SLA than “how do I reset my password.” Manual triage is slow, inconsistent, and mind-numbing work for whoever gets assigned to it on rotation.

The case for AI ticket classification is not about replacing humans. It is about getting every ticket to the right person in under two seconds so your team spends their time solving problems, not reading subject lines. A well-designed classifier also captures structured metadata that feeds into dashboards: weekly breakdowns by urgency tier, sentiment trends that catch product regressions early, team load balancing, and SLA compliance tracking.

This article builds a production-grade classifier from scratch. You will get the routing architecture, the two-tier model strategy, the complete Python code, and the honest numbers on cost and accuracy.

What Gets Classified

Each ticket passes through four classification axes:

  • Team: which group owns this ticket (billing, infrastructure, product, security, onboarding, etc.)
  • Type: the nature of the request (bug report, feature request, access issue, billing dispute, account question, data request)
  • Urgency: a four-tier scale (critical, high, normal, low) driven by business impact signals in the text
  • Sentiment: the emotional tone of the submitter (positive, neutral, frustrated, angry) because a technically “normal” urgency ticket from an angry enterprise customer needs different handling

Each field also carries a confidence score from 0.0 to 1.0. That score is the key that drives the escalation pattern described below.

The Two-Tier Model Architecture for AI Ticket Classification

The central design decision in this system is where to spend money. Claude Haiku is fast (around 50ms median for a short classification prompt) and costs a fraction of Sonnet. For a ticket with a clear subject like “Invoice #4421 is wrong, I was charged twice,” Haiku will typically assign team=billing, type=billing_dispute, urgency=high, sentiment=frustrated with confidence scores above 0.90 across the board. No need to escalate.

Where Haiku struggles: ambiguous tickets that mix signals (“My account is locked and I think someone hacked it, also the dashboard is broken”), tickets in non-English languages mixed into an English queue, tickets with no body (just a subject line forwarded from a phone), and anything where urgency depends on account tier context you have not embedded in the prompt.

Two-Tier Classifier: Cheap-First, Escalate on Low Confidence
  • Inbound ticket → Haiku classifier (claude-haiku-4-5, ~50ms)
  • Every field confidence ≥ 0.75 → route directly to the queue
  • Any field confidence below 0.75 → reclassify with the Sonnet classifier (claude-sonnet-4-6, ~180ms), then route
  • Typical split: Haiku finalizes ~85% of volume; Sonnet fires only on the ~15% of ambiguous tickets

Figure 1. The two-tier classifier routes most tickets through Haiku and only escalates to Sonnet when confidence scores fall below the threshold.

Choosing the Confidence Threshold

0.75 is a reasonable starting point for most teams. In practice, run a week of shadow mode: classify tickets with both models in parallel and compare labels. Measure the cases where Haiku and Sonnet disagree. The disagreement rate will cluster around certain ticket types (usually security-adjacent and multi-topic tickets). Use that data to tune the threshold per-field rather than globally. You might set urgency threshold at 0.80 (miscalibrated urgency has real SLA consequences) while leaving team threshold at 0.70 (misrouting is annoying but easily corrected).
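A minimal sketch of the shadow-mode analysis and per-field thresholds described above, assuming you store each shadow-mode ticket's Haiku and Sonnet tool inputs as a pair of dicts with the classify_ticket keys. The team (0.70) and urgency (0.80) values are the examples from the paragraph; the type and sentiment values are placeholders, and FIELD_THRESHOLDS, disagreement_rates, and fields_below_threshold are hypothetical names, not part of the POC.

```python
from collections import Counter

# Example per-field thresholds. Team (0.70) and urgency (0.80) follow the text;
# type and sentiment (0.75) are placeholder assumptions to be tuned from data.
FIELD_THRESHOLDS = {"team": 0.70, "type": 0.75, "urgency": 0.80, "sentiment": 0.75}

# Each field's (label key, confidence key) in the classify_ticket tool input.
FIELD_KEYS = {
    "team": ("team", "team_confidence"),
    "type": ("ticket_type", "type_confidence"),
    "urgency": ("urgency", "urgency_confidence"),
    "sentiment": ("sentiment", "sentiment_confidence"),
}


def disagreement_rates(pairs: list[tuple[dict, dict]]) -> dict[str, float]:
    """Per field, the fraction of shadow-mode tickets where Haiku and Sonnet chose different labels."""
    if not pairs:
        return {name: 0.0 for name in FIELD_KEYS}
    disagreements = Counter()
    for haiku, sonnet in pairs:
        for name, (label_key, _) in FIELD_KEYS.items():
            if haiku[label_key] != sonnet[label_key]:
                disagreements[name] += 1
    return {name: disagreements[name] / len(pairs) for name in FIELD_KEYS}


def fields_below_threshold(result: dict, thresholds: dict[str, float] = FIELD_THRESHOLDS) -> list[str]:
    """Fields whose confidence is under that field's own threshold; non-empty means escalate."""
    return [
        name
        for name, (_, conf_key) in FIELD_KEYS.items()
        if result[conf_key] < thresholds[name]
    ]
```

For example, a result with team confidence 0.71 and every other field above 0.85 stays on Haiku under these thresholds, whereas a single global 0.75 threshold would escalate it; an urgency score of 0.78 escalates here but would pass a global 0.75. In classify(), the min_confidence() check would become `if fields_below_threshold(result_dict):`.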

Structured Output: Getting Reliable JSON Every Time

The worst way to build a classifier is to ask the model for JSON and then parse its response with a regex. Models will occasionally wrap JSON in markdown code fences, add trailing commas, or include explanatory text before the brace. Instead, define a single tool that represents your output schema and force the model to call it with tool_choice={"type": "tool", "name": "classify_ticket"}. Every response then contains a classify_ticket tool_use block whose block.input arrives as an already-parsed dict shaped by the schema. Schema adherence is very reliable but not a hard guarantee, so check the fields before routing on them.

This approach is covered in depth in Part 3: Structured Output from Claude, which walks through the full tool-forcing pattern. The ticket classifier uses exactly that technique with a richer schema that includes confidence fields.

{
  "name": "classify_ticket",
  "description": "Classify a support ticket and return structured routing data",
  "input_schema": {
    "type": "object",
    "properties": {
      "team": {
        "type": "string",
        "enum": ["billing", "infrastructure", "product", "security", "onboarding", "general"]
      },
      "team_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
      "ticket_type": {
        "type": "string",
        "enum": ["bug_report", "feature_request", "access_issue", "billing_dispute",
                 "account_question", "data_request", "security_incident", "general_inquiry"]
      },
      "type_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
      "urgency": {
        "type": "string",
        "enum": ["critical", "high", "normal", "low"]
      },
      "urgency_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
      "sentiment": {
        "type": "string",
        "enum": ["positive", "neutral", "frustrated", "angry"]
      },
      "sentiment_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
      "summary": {"type": "string", "description": "One sentence summary of the ticket"},
      "routing_reason": {"type": "string", "description": "Brief explanation of routing decision"}
    },
    "required": ["team", "team_confidence", "ticket_type", "type_confidence",
                 "urgency", "urgency_confidence", "sentiment", "sentiment_confidence",
                 "summary", "routing_reason"]
  }
}
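Forcing the tool guarantees a classify_ticket call, but it is still worth checking the input before it reaches ClassificationResult(**...), which raises TypeError on a missing or unexpected key and does no type or range checking at all. A minimal stdlib sketch, assuming you call it on block.input; validate_classification is a hypothetical helper, and the allowed values simply mirror the enums in the schema above. Pydantic, already in the requirements, would work equally well.

```python
# Allowed labels, mirroring the enums in the classify_ticket schema.
ALLOWED_LABELS = {
    "team": {"billing", "infrastructure", "product", "security", "onboarding", "general"},
    "ticket_type": {"bug_report", "feature_request", "access_issue", "billing_dispute",
                    "account_question", "data_request", "security_incident", "general_inquiry"},
    "urgency": {"critical", "high", "normal", "low"},
    "sentiment": {"positive", "neutral", "frustrated", "angry"},
}
CONFIDENCE_KEYS = ("team_confidence", "type_confidence", "urgency_confidence", "sentiment_confidence")
TEXT_KEYS = ("summary", "routing_reason")


def validate_classification(raw: dict) -> dict:
    """Return a cleaned copy of the tool input, or raise ValueError naming the first problem."""
    expected = set(ALLOWED_LABELS) | set(CONFIDENCE_KEYS) | set(TEXT_KEYS)
    missing = expected - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    clean = {}
    for key, allowed in ALLOWED_LABELS.items():
        value = raw[key]
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
        clean[key] = value
    for key in CONFIDENCE_KEYS:
        value = raw[key]
        # bool is a subclass of int, so reject it explicitly; NaN fails the range check.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value!r} is not a number between 0 and 1")
        clean[key] = float(value)
    for key in TEXT_KEYS:
        if not isinstance(raw[key], str):
            raise ValueError(f"{key} must be a string, got {type(raw[key]).__name__}")
        clean[key] = raw[key]
    # Unknown keys are dropped so ClassificationResult(**clean, ...) cannot fail on them.
    return clean
```

In _call_model, returning validate_classification(tool_block.input) instead of the raw dict turns a malformed answer into a ValueError that the existing error handling already catches.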

Prompt Caching for High-Volume Queues

Your classification system prompt carries a lot of static content: team definitions, urgency criteria, escalation rules, examples of tricky tickets, and instructions for how to score confidence. That text might be 800 to 1,200 tokens. At 10,000 tickets per day, a 1,000-token prompt adds up to 10 million input tokens per day for the system prompt alone. At Claude Haiku 4.5's list price at the time of writing ($1 per million input tokens), that is about $10 per day before you count the ticket text itself.

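With prompt caching (Part 4), the first call writes the prefix to the cache at 1.25x the base input price, and calls within the next five minutes read it back at 10% of the base price, with each hit resetting the five-minute timer. For a queue processing tickets in bursts, the cache hit rate will be very high, so in steady-state operation at 10,000 tickets per day the system prompt cost drops by close to 90%, less a small premium for the occasional cache write. The code below enables caching by passing the system prompt as a list with a cache_control block. Because tool definitions sit ahead of the system prompt in the cached prefix, that breakpoint caches the classify_ticket schema too.

Two caveats. Each model keeps its own cache, so Haiku and Sonnet each pay for their own cache writes. And each model has a minimum cacheable prefix length (1,024 tokens for Sonnet-class models and 2,048 or more for Haiku-class models at the time of writing). A shorter prefix is processed normally but never cached, with no error, so confirm that cache_read_input_tokens is non-zero in the usage stats before counting on these savings.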
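Prompt caching only applies when the cached prefix meets the model's minimum cacheable length, and a shorter prefix is processed without caching and without an error, so it pays to measure caching rather than assume it. A minimal sketch, assuming you copy the three input counters from each response's usage object (msg.usage.input_tokens, cache_creation_input_tokens, and cache_read_input_tokens) into a list; UsageSample, CacheReport, and cache_report are hypothetical names. In the API's accounting, input_tokens counts only the tokens that were neither written to nor read from the cache, so total input is the sum of all three.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSample:
    """Input-token counters copied from one Messages API response's usage object."""
    input_tokens: int                                  # tokens billed at the full input price
    cache_creation_input_tokens: Optional[int] = None  # tokens written to the cache
    cache_read_input_tokens: Optional[int] = None      # tokens served from the cache


@dataclass
class CacheReport:
    call_hit_rate: float           # share of calls that read anything from the cache
    token_share_from_cache: float  # share of all input tokens served from the cache
    caching_inactive: bool         # no cache reads or writes anywhere in the batch


def cache_report(samples: list[UsageSample]) -> CacheReport:
    """Summarize how much of a batch's input was actually served from the prompt cache."""
    reads = sum(s.cache_read_input_tokens or 0 for s in samples)
    writes = sum(s.cache_creation_input_tokens or 0 for s in samples)
    uncached = sum(s.input_tokens for s in samples)
    total = reads + writes + uncached
    hits = sum(1 for s in samples if (s.cache_read_input_tokens or 0) > 0)
    return CacheReport(
        call_hit_rate=hits / len(samples) if samples else 0.0,
        token_share_from_cache=reads / total if total else 0.0,
        # Zero reads and zero writes usually means the prefix is below the
        # model's minimum cacheable length, so nothing was ever cached.
        caching_inactive=(reads == 0 and writes == 0),
    )
```

If caching_inactive comes back True on a busy queue, either lengthen the static prefix past the model's minimum or budget at uncached rates.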

Key idea: Combine cheap-model-first with prompt caching and the savings compound on input tokens: caching discounts the static prefix, and Haiku lowers the price of every token that is left. Once the prefix is cached, output tokens typically become the largest share of per-ticket cost, so keep summary and routing_reason short.

The Complete POC: Install, Configure, and Run

Install and Requirements

pip install anthropic fastapi uvicorn python-dotenv pydantic
# requirements.txt
anthropic>=0.30.0
fastapi>=0.111.0
uvicorn[standard]>=0.30.0
python-dotenv>=1.0.0
pydantic>=2.7.0
# .env
ANTHROPIC_API_KEY=sk-ant-your-key-here
HAIKU_CONFIDENCE_THRESHOLD=0.75
CLASSIFIER_LOG_LEVEL=INFO

Full Source

"""
ticket_classifier.py

AI ticket classification and routing POC.
Two-tier strategy: Haiku first, escalate to Sonnet on low confidence.
Prompt caching on the system prompt.
Forced structured output via tool use.

Run:
    uvicorn ticket_classifier:app --reload --port 8000

POST /classify  {"subject": "...", "body": "...", "account_tier": "enterprise"}
GET  /queues    see current queue contents
"""

import os
import json
import time
import logging
from collections import defaultdict
from dataclasses import dataclass, asdict, field
from typing import Optional

import anthropic
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from dotenv import load_dotenv

load_dotenv()

logging.basicConfig(level=os.getenv("CLASSIFIER_LOG_LEVEL", "INFO"))
log = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
HAIKU_THRESHOLD = float(os.getenv("HAIKU_CONFIDENCE_THRESHOLD", "0.75"))

MODEL_HAIKU = "claude-haiku-4-5"
MODEL_SONNET = "claude-sonnet-4-6"

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

# ---------------------------------------------------------------------------
# Data model
# ---------------------------------------------------------------------------

@dataclass
class ClassificationResult:
    team: str
    team_confidence: float
    ticket_type: str
    type_confidence: float
    urgency: str
    urgency_confidence: float
    sentiment: str
    sentiment_confidence: float
    summary: str
    routing_reason: str
    model_used: str
    escalated: bool
    latency_ms: float

    def min_confidence(self) -> float:
        return min(
            self.team_confidence,
            self.type_confidence,
            self.urgency_confidence,
            self.sentiment_confidence,
        )

# ---------------------------------------------------------------------------
# Prompt caching: system prompt defined once, passed with cache_control
# ---------------------------------------------------------------------------

SYSTEM_PROMPT_TEXT = """You are a support ticket classifier for a B2B SaaS company.
Your job is to classify every inbound ticket across four dimensions and provide
a confidence score for each classification.

TEAMS:
- billing: payment failures, invoices, subscription changes, refunds, pricing questions
- infrastructure: downtime, performance degradation, API errors, hosting issues, SSL certs
- product: bugs in the application, unexpected behavior, UI issues, missing features
- security: suspected breaches, unauthorized access, phishing, vulnerability reports, MFA issues
- onboarding: new account setup, first-time configuration, getting-started questions
- general: anything that does not clearly fit the above

TICKET TYPES:
- bug_report: something is broken that used to work
- feature_request: asking for new functionality
- access_issue: cannot log in, permissions problem, account locked
- billing_dispute: charge they did not expect or disagree with
- account_question: general question about their account, limits, settings
- data_request: asking for a data export, GDPR, deletion request
- security_incident: active compromise, data breach, unauthorized access confirmed
- general_inquiry: question or comment that does not fit above

URGENCY:
- critical: production down, active security incident, data loss, revenue-blocking for enterprise tier
- high: significant degradation, enterprise customer impacted, billing errors over $500
- normal: isolated bug, feature question, non-blocking issue
- low: cosmetic issue, general question, feature idea

SENTIMENT:
- positive: polite, grateful, constructive tone
- neutral: factual, neither positive nor negative
- frustrated: clearly unhappy but still professional
- angry: hostile, threatening to cancel, demanding immediate response

CONFIDENCE SCORING RULES:
- Score 0.90-1.0 only when the ticket leaves no room for interpretation.
- Score 0.75-0.89 when there is one plausible alternative classification.
- Score 0.50-0.74 when multiple teams or types could own this ticket.
- Score below 0.50 when you genuinely cannot determine the correct classification.

Account tier context will be provided in the ticket. Enterprise tier tickets
with any production impact should be bumped to at least urgency=high.

Always call the classify_ticket tool. Do not write any text outside the tool call."""

# Build the cached system prompt block
CACHED_SYSTEM = [
    {
        "type": "text",
        "text": SYSTEM_PROMPT_TEXT,
        "cache_control": {"type": "ephemeral"},
    }
]

# ---------------------------------------------------------------------------
# Tool definition
# ---------------------------------------------------------------------------

CLASSIFY_TOOL = {
    "name": "classify_ticket",
    "description": "Classify a support ticket and return structured routing data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "team": {
                "type": "string",
                "enum": ["billing", "infrastructure", "product", "security", "onboarding", "general"],
            },
            "team_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "ticket_type": {
                "type": "string",
                "enum": [
                    "bug_report", "feature_request", "access_issue", "billing_dispute",
                    "account_question", "data_request", "security_incident", "general_inquiry",
                ],
            },
            "type_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "urgency": {
                "type": "string",
                "enum": ["critical", "high", "normal", "low"],
            },
            "urgency_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "sentiment": {
                "type": "string",
                "enum": ["positive", "neutral", "frustrated", "angry"],
            },
            "sentiment_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "summary": {
                "type": "string",
                "description": "One sentence summary of the ticket (max 120 chars).",
            },
            "routing_reason": {
                "type": "string",
                "description": "One sentence explaining why this ticket was routed this way.",
            },
        },
        "required": [
            "team", "team_confidence", "ticket_type", "type_confidence",
            "urgency", "urgency_confidence", "sentiment", "sentiment_confidence",
            "summary", "routing_reason",
        ],
    },
}

# ---------------------------------------------------------------------------
# Core classification function
# ---------------------------------------------------------------------------

def _call_model(model: str, ticket_text: str) -> tuple[dict, float]:
    """Call a model and return (tool_input_dict, latency_ms)."""
    t0 = time.monotonic()
    try:
        msg = client.messages.create(
            model=model,
            max_tokens=512,
            system=CACHED_SYSTEM,
            tools=[CLASSIFY_TOOL],
            tool_choice={"type": "tool", "name": "classify_ticket"},
            messages=[{"role": "user", "content": ticket_text}],
        )
    except anthropic.APIError as exc:
        log.error("API error calling %s: %s", model, exc)
        raise

    latency_ms = (time.monotonic() - t0) * 1000

    # Find the tool_use block
    tool_block = next(
        (b for b in msg.content if b.type == "tool_use" and b.name == "classify_ticket"),
        None,
    )
    if tool_block is None:
        raise ValueError(f"Model {model} did not return a classify_ticket tool call")

    log.debug(
        "Model=%s cache_create=%d cache_read=%d input=%d output=%d latency=%.0fms",
        model,
        # These usage fields are Optional in the SDK's Usage type, so coerce None to 0
        # before %d formatting.
        getattr(msg.usage, "cache_creation_input_tokens", None) or 0,
        getattr(msg.usage, "cache_read_input_tokens", None) or 0,
        msg.usage.input_tokens,
        msg.usage.output_tokens,
        latency_ms,
    )

    return tool_block.input, latency_ms


def classify(subject: str, body: str, account_tier: str = "standard") -> ClassificationResult:
    """
    Classify a ticket using the two-tier strategy.
    Haiku runs first. If any field confidence is below threshold, Sonnet runs.
    """
    ticket_text = (
        f"Account tier: {account_tier}\n\n"
        f"Subject: {subject}\n\n"
        f"Body:\n{body}"
    )

    # Tier 1: Haiku
    result_dict, latency = _call_model(MODEL_HAIKU, ticket_text)
    model_used = MODEL_HAIKU
    escalated = False

    result = ClassificationResult(
        **result_dict,
        model_used=model_used,
        escalated=escalated,
        latency_ms=round(latency, 1),
    )

    # Check if any confidence falls below threshold
    if result.min_confidence() < HAIKU_THRESHOLD:
        log.info(
            "Escalating to Sonnet: min_confidence=%.2f (threshold=%.2f)",
            result.min_confidence(),
            HAIKU_THRESHOLD,
        )
        result_dict_sonnet, latency_sonnet = _call_model(MODEL_SONNET, ticket_text)
        result = ClassificationResult(
            **result_dict_sonnet,
            model_used=MODEL_SONNET,
            escalated=True,
            latency_ms=round(latency + latency_sonnet, 1),
        )

    return result

# ---------------------------------------------------------------------------
# Simple in-memory queue (replace with Redis / SQS in production)
# ---------------------------------------------------------------------------

queues: dict[str, list] = defaultdict(list)

def route_ticket(ticket_id: str, subject: str, result: ClassificationResult) -> str:
    """
    Determine the queue name and push the ticket.
    Critical security incidents go to a dedicated urgent-security queue.
    """
    if result.urgency == "critical" and result.team == "security":
        queue_name = "urgent-security"
    elif result.urgency == "critical":
        queue_name = f"critical-{result.team}"
    elif result.sentiment == "angry" and result.urgency in ("high", "critical"):
        queue_name = f"priority-{result.team}"
    else:
        queue_name = f"{result.team}-{result.urgency}"

    queues[queue_name].append(
        {
            "ticket_id": ticket_id,
            "subject": subject,
            "summary": result.summary,
            "team": result.team,
            "type": result.ticket_type,
            "urgency": result.urgency,
            "sentiment": result.sentiment,
            "routing_reason": result.routing_reason,
            "model_used": result.model_used,
            "escalated": result.escalated,
            "min_confidence": round(result.min_confidence(), 3),
            "latency_ms": result.latency_ms,
        }
    )
    log.info("Routed ticket %s to queue '%s'", ticket_id, queue_name)
    return queue_name

# ---------------------------------------------------------------------------
# FastAPI app
# ---------------------------------------------------------------------------

app = FastAPI(title="AI Ticket Classifier", version="1.0.0")

class TicketRequest(BaseModel):
    ticket_id: Optional[str] = None
    subject: str
    body: str
    account_tier: str = "standard"

class ClassifyResponse(BaseModel):
    ticket_id: str
    queue: str
    classification: dict

_ticket_counter = 0

@app.post("/classify", response_model=ClassifyResponse)
def classify_endpoint(req: TicketRequest):
    global _ticket_counter
    _ticket_counter += 1
    ticket_id = req.ticket_id or f"TKT-{_ticket_counter:05d}"

    try:
        result = classify(req.subject, req.body, req.account_tier)
    except anthropic.APIError as exc:
        raise HTTPException(status_code=502, detail=f"Upstream API error: {exc}") from exc
    except Exception as exc:
        raise HTTPException(status_code=500, detail=str(exc)) from exc

    queue_name = route_ticket(ticket_id, req.subject, result)

    return ClassifyResponse(
        ticket_id=ticket_id,
        queue=queue_name,
        classification=asdict(result),
    )

@app.get("/queues")
def get_queues():
    return {
        "queue_count": len(queues),
        "total_tickets": sum(len(v) for v in queues.values()),
        "queues": {k: {"count": len(v), "tickets": v} for k, v in sorted(queues.items())},
    }

@app.get("/health")
def health():
    return {"status": "ok", "haiku_threshold": HAIKU_THRESHOLD}

# ---------------------------------------------------------------------------
# CLI batch runner for testing without the HTTP layer
# ---------------------------------------------------------------------------

SAMPLE_TICKETS = [
    {
        "ticket_id": "TKT-001",
        "subject": "API returning 500 errors for all enterprise customers",
        "body": (
            "Since about 9:15 AM UTC our entire enterprise integration is down. "
            "All API calls return HTTP 500. We have three clients going live today "
            "and this is blocking everything. This is a P0 for us."
        ),
        "account_tier": "enterprise",
    },
    {
        "ticket_id": "TKT-002",
        "subject": "Charged twice for April subscription",
        "body": (
            "Hi, I noticed two charges of $299 on my credit card on April 3rd. "
            "Invoice numbers INV-4421 and INV-4422 are both for the same billing period. "
            "Please refund the duplicate. Thanks."
        ),
        "account_tier": "standard",
    },
    {
        "ticket_id": "TKT-003",
        "subject": "Someone logged into my account from Russia",
        "body": (
            "I got an email alert saying my account was accessed from an IP in Moscow. "
            "I have never been there. I immediately changed my password but I am worried "
            "my data has been compromised. What do I do? Also the 2FA is not working "
            "when I try to set it up now, the QR code page just shows a spinner."
        ),
        "account_tier": "standard",
    },
    {
        "ticket_id": "TKT-004",
        "subject": "Would be great to have a dark mode",
        "body": "Love the product, just wanted to suggest a dark mode option for the dashboard. Eyes get tired at night!",
        "account_tier": "standard",
    },
    {
        "ticket_id": "TKT-005",
        "subject": "GDPR data deletion request",
        "body": (
            "Under Article 17 of the GDPR I am requesting deletion of all personal data "
            "associated with my account ([email protected]). Please confirm within 72 hours "
            "as required by regulation."
        ),
        "account_tier": "standard",
    },
]

def run_batch_demo():
    """Run the sample tickets and print results to stdout."""
    print("\n=== AI Ticket Classifier: Batch Demo ===\n")
    for ticket in SAMPLE_TICKETS:
        tid = ticket["ticket_id"]
        print(f"Ticket: {tid} | {ticket['subject']}")
        try:
            result = classify(ticket["subject"], ticket["body"], ticket["account_tier"])
            queue = route_ticket(tid, ticket["subject"], result)
            print(f"  Queue      : {queue}")
            print(f"  Team       : {result.team} ({result.team_confidence:.0%})")
            print(f"  Type       : {result.ticket_type} ({result.type_confidence:.0%})")
            print(f"  Urgency    : {result.urgency} ({result.urgency_confidence:.0%})")
            print(f"  Sentiment  : {result.sentiment} ({result.sentiment_confidence:.0%})")
            print(f"  Summary    : {result.summary}")
            print(f"  Reason     : {result.routing_reason}")
            print(f"  Model      : {result.model_used} | Escalated: {result.escalated} | {result.latency_ms:.0f}ms")
        except Exception as exc:
            print(f"  ERROR: {exc}")
        print()

if __name__ == "__main__":
    run_batch_demo()

Sample Run Output

=== AI Ticket Classifier: Batch Demo ===

Ticket: TKT-001 | API returning 500 errors for all enterprise customers
  Queue      : critical-infrastructure
  Team       : infrastructure (96%)
  Type       : bug_report (95%)
  Urgency    : critical (97%)
  Sentiment  : frustrated (91%)
  Summary    : Enterprise customers experiencing complete API outage with 500 errors since 9:15 AM UTC.
  Reason     : Enterprise tier with production-down API failure classified as critical infrastructure incident.
  Model      : claude-haiku-4-5 | Escalated: False | 63ms

Ticket: TKT-002 | Charged twice for April subscription
  Queue      : billing-high
  Team       : billing (98%)
  Type       : billing_dispute (97%)
  Urgency    : high (88%)
  Sentiment  : neutral (93%)
  Summary    : Customer received two $299 charges for the same April billing period.
  Reason     : Clear billing dispute with duplicate charges warrants high urgency for prompt resolution.
  Model      : claude-haiku-4-5 | Escalated: False | 58ms

Ticket: TKT-003 | Someone logged into my account from Russia
  Queue      : urgent-security
  Team       : security (91%)
  Type       : security_incident (89%)
  Urgency    : critical (92%)
  Sentiment  : frustrated (88%)
  Summary    : Suspected unauthorized account access from Moscow IP combined with broken 2FA setup.
  Reason     : Confirmed unauthorized access with broken 2FA is a critical security incident requiring immediate response.
  Model      : claude-haiku-4-5 | Escalated: False | 71ms

Ticket: TKT-004 | Would be great to have a dark mode
  Queue      : product-low
  Team       : product (97%)
  Type       : feature_request (99%)
  Urgency    : low (98%)
  Sentiment  : positive (97%)
  Summary    : User requesting dark mode option for the dashboard.
  Reason     : Positive feature suggestion with no urgency, routed to product backlog queue.
  Model      : claude-haiku-4-5 | Escalated: False | 52ms

Ticket: TKT-005 | GDPR data deletion request
  Queue      : general-high
  Team       : general (71%)
  Type       : data_request (93%)
  Urgency    : high (85%)
  Sentiment  : neutral (94%)
  Summary    : GDPR Article 17 erasure request for account [email protected] with 72-hour legal deadline.
  Reason     : Legal obligation with strict deadline; routed to general-high for compliance team.
  Model      : claude-sonnet-4-6 | Escalated: True | 241ms

Notice that TKT-005 escalated to Sonnet because Haiku's confidence on the team field fell below the 0.75 threshold. The confidences printed for TKT-005 are Sonnet's, since an escalated result replaces Haiku's entirely, and even Sonnet scores the team field at only 0.71. A GDPR deletion request straddles legal, compliance, and the billing/data teams, so that uncertainty is appropriate; Sonnet produces the same team assignment but with more detailed routing_reason text. This is exactly the pattern the two-tier architecture is designed for.
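As written, classify() routes whatever Sonnet returns, even when Sonnet's own confidence is still below the threshold, as with TKT-005's team score of 0.71 in the output above. One option is to keep the queue assignment but flag those tickets for a human. A minimal sketch; finalize_route, RoutingDecision, and reusing 0.75 as the review floor are assumptions, not part of the POC.

```python
from dataclasses import dataclass

# Assumed review floor; reusing the 0.75 escalation threshold is a placeholder choice.
HUMAN_REVIEW_THRESHOLD = 0.75


@dataclass
class RoutingDecision:
    queue: str
    needs_human_review: bool
    reason: str


def finalize_route(queue: str, escalated: bool, min_confidence: float,
                   threshold: float = HUMAN_REVIEW_THRESHOLD) -> RoutingDecision:
    """Keep the model's queue, but flag escalated results whose confidence is still low."""
    if escalated and min_confidence < threshold:
        return RoutingDecision(
            queue=queue,
            needs_human_review=True,
            reason=f"escalated result still below {threshold:.2f} (min={min_confidence:.2f})",
        )
    return RoutingDecision(queue=queue, needs_human_review=False, reason="confidence ok")
```

In the POC this would be called as finalize_route(queue_name, result.escalated, result.min_confidence()) right after route_ticket.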

Routing Logic and Queue Design

The queue naming convention in the POC is intentionally simple. In production, you will map queue names to actual destinations, whether that is a Zendesk view filter, a PagerDuty escalation policy, a Linear team inbox, or a raw SQS queue. The key insight is that the queue name encodes urgency and ownership together, so a single queue consumer can apply the correct SLA without reading the full ticket object.

Queue Routing Decision Tree (a classified result is checked top to bottom on team, urgency, and sentiment):
  • critical + security → urgent-security (15-min SLA)
  • critical, any other team → critical-{team} (1-hour SLA)
  • angry + high urgency → priority-{team} (4-hour SLA)
  • everything else → {team}-{urgency} (standard SLA)

Figure 2. Queue routing decision tree. Security incidents and angry high-urgency tickets get their own named queues with distinct SLA expectations.
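Because the queue name carries the urgency tier, a consumer can derive its SLA from the name alone. A minimal sketch using the 15-minute, 1-hour, and 4-hour targets from the decision tree; the 24-hour value for standard queues is a placeholder assumption, since the tree does not specify one, and sla_for_queue and response_due are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STANDARD_SLA = timedelta(hours=24)  # placeholder: the decision tree only says "standard SLA"


def sla_for_queue(queue_name: str) -> timedelta:
    """Map a queue name produced by route_ticket to its first-response SLA."""
    if queue_name == "urgent-security":
        return timedelta(minutes=15)
    if queue_name.startswith("critical-"):
        return timedelta(hours=1)
    if queue_name.startswith("priority-"):
        return timedelta(hours=4)
    return STANDARD_SLA  # {team}-{urgency} queues


def response_due(queue_name: str, received_at: Optional[datetime] = None) -> datetime:
    """First-response deadline; defaults to the current UTC time when no receipt time is given."""
    received_at = received_at or datetime.now(timezone.utc)
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    return received_at + sla_for_queue(queue_name)
```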

Wiring to Real Queue Backends

Replacing the in-memory queues dict with a real backend is a small change: pull the queue-name logic out of route_ticket and swap the list append for a send call. For SQS:

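import json
from dataclasses import asdict

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URLS = {
    "urgent-security": "https://sqs.us-east-1.amazonaws.com/123456789/urgent-security",
    "general-normal": "https://sqs.us-east-1.amazonaws.com/123456789/general-normal",
    # ... other queues
}

def compute_queue_name(result):
    """Same decision tree as the in-memory route_ticket above."""
    if result.urgency == "critical":
        return "urgent-security" if result.team == "security" else f"critical-{result.team}"
    if result.sentiment == "angry" and result.urgency == "high":
        return f"priority-{result.team}"
    return f"{result.team}-{result.urgency}"

def route_ticket(ticket_id, subject, result):
    queue_name = compute_queue_name(result)
    payload = json.dumps({"ticket_id": ticket_id, "subject": subject, **asdict(result)})
    # Unmapped queue names fall back to general-normal instead of raising KeyError.
    url = QUEUE_URLS.get(queue_name, QUEUE_URLS["general-normal"])
    sqs.send_message(QueueUrl=url, MessageBody=payload)
    return queue_name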

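For Zendesk, replace the SQS call with a Zendesk API update that assigns the ticket to the correct group and tags it. The four urgency tiers map one-to-one onto Zendesk's built-in priorities (critical becomes urgent, then high, normal, low), team maps to a group, and type and sentiment fit naturally as tags or custom fields.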
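A minimal sketch of the Zendesk side, assuming Zendesk's update-ticket endpoint (PUT /api/v2/tickets/{id}.json), which accepts group_id, priority, and additional_tags; check the Zendesk Tickets API reference for your account before relying on it. The group IDs are placeholders, and zendesk_update_payload is a hypothetical helper that only builds the request body, leaving authentication and the HTTP call to your client of choice.

```python
# Zendesk's built-in priorities are urgent/high/normal/low, so the four
# urgency tiers map one-to-one, with critical becoming urgent.
ZENDESK_PRIORITY = {"critical": "urgent", "high": "high", "normal": "normal", "low": "low"}

# Placeholder group IDs: replace with the real IDs from your Zendesk account.
ZENDESK_GROUP_IDS = {
    "billing": 1001, "infrastructure": 1002, "product": 1003,
    "security": 1004, "onboarding": 1005, "general": 1006,
}


def zendesk_update_payload(team: str, ticket_type: str, urgency: str,
                           sentiment: str, queue_name: str) -> dict:
    """Request body for Zendesk's update-ticket call (PUT /api/v2/tickets/{id}.json)."""
    return {
        "ticket": {
            "group_id": ZENDESK_GROUP_IDS[team],
            "priority": ZENDESK_PRIORITY[urgency],
            # additional_tags appends to the ticket's existing tags instead of replacing them.
            "additional_tags": [
                f"ai_type_{ticket_type}",
                f"ai_sentiment_{sentiment}",
                f"ai_queue_{queue_name.replace('-', '_')}",
            ],
        }
    }
```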

Retry and Backoff on Transient Failures

A webhook handler that processes live support traffic cannot crash because the API returned a 529 overloaded response or a momentary network blip. Wrap the model call in a bounded retry with exponential backoff. The Anthropic SDK already retries connection errors and 408, 409, 429, and 5xx responses (twice by default), but you want explicit control over the budget so a slow retry storm never holds your webhook open past its timeout. If you take over retries yourself, create the client with anthropic.Anthropic(max_retries=0) so the SDK's retries do not multiply your attempt count. Here is the production version of the call wrapper:

import random
import time

import anthropic


def _is_retryable(exc: anthropic.APIError) -> bool:
    """Connection errors, 429 rate limits, and 5xx responses (including 529 overloaded)."""
    if isinstance(exc, anthropic.APIConnectionError):  # also covers APITimeoutError
        return True
    if isinstance(exc, anthropic.APIStatusError):
        return exc.status_code == 429 or exc.status_code >= 500
    return False


def _call_model_with_retry(model: str, ticket_text: str, max_attempts: int = 4):
    """Call a model with bounded exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return _call_model(model, ticket_text)
        except anthropic.APIError as exc:
            if not _is_retryable(exc):
                # Non-retryable (bad request, auth, etc.): fail fast.
                log.error("Non-retryable API error on %s: %s", model, exc)
                raise
            if attempt >= max_attempts:
                log.error("Giving up on %s after %d attempts: %s", model, attempt, exc)
                raise
            # 0.5s, 1s, 2s, ... capped at 8s, plus up to 0.4s of random jitter.
            sleep_s = min(8.0, 0.5 * 2 ** (attempt - 1)) + random.uniform(0, 0.4)
            log.warning("Retryable error on %s (attempt %d): %s. Sleeping %.1fs",
                        model, attempt, exc, sleep_s)
            time.sleep(sleep_s)

Swap the two _call_model calls inside classify() for _call_model_with_retry and the classifier survives the routine turbulence of a busy production API. The jitter term matters: without it, every concurrent worker that hits a rate limit at the same moment retries in lockstep and you get a thundering-herd retry spike that prolongs the outage you are trying to ride out.

Fail-open, not closed: if both Haiku and Sonnet exhaust their retries, do not drop the ticket. Route it to a needs-human-triage queue with a flag. A misrouted ticket is recoverable; a silently dropped ticket is a churned customer who never got a reply.
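A minimal sketch of that fail-open path, assuming classification is wrapped in a zero-argument callable (for example lambda: classify(subject, body, tier)) that has already retried transient errors, and that routing uses the route_ticket signature from the POC. route_or_triage is a hypothetical helper, and the triage store is a plain dict standing in for whatever queue backend you use.

```python
import logging
from typing import Any, Callable

log = logging.getLogger(__name__)

TRIAGE_QUEUE = "needs-human-triage"


def route_or_triage(
    ticket_id: str,
    subject: str,
    classify_fn: Callable[[], Any],
    route_fn: Callable[[str, str, Any], str],
    triage_queues: dict[str, list],
) -> str:
    """Classify and route a ticket; if classification raises, park the ticket for a human."""
    try:
        result = classify_fn()
    except Exception as exc:  # assumes classify_fn already exhausted its own retries
        log.error("Classification failed for %s; sending to %s: %s", ticket_id, TRIAGE_QUEUE, exc)
        triage_queues.setdefault(TRIAGE_QUEUE, []).append(
            {"ticket_id": ticket_id, "subject": subject,
             "needs_human_triage": True, "error": str(exc)}
        )
        return TRIAGE_QUEUE
    return route_fn(ticket_id, subject, result)
```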

A Streaming Live-Tail for Operators

Classification itself does not need streaming because you consume the whole structured object at once. But a useful operator tool sits next to the classifier: an “explain this routing decision” endpoint that streams a plain-language rationale to a support lead reviewing a contested route. That is where the streaming API earns its place. The pattern is a straight read of stream.text_stream:

def explain_route(subject: str, body: str, result: ClassificationResult) -> None:
    """Stream a human-readable explanation of why a ticket was routed as it was."""
    prompt = (
        f"A ticket was classified as team={result.team}, urgency={result.urgency}, "
        f"sentiment={result.sentiment}. Subject: {subject}\n\nBody:\n{body}\n\n"
        "Explain in two short paragraphs why this routing is reasonable, "
        "and name one signal that, if present, would change the decision."
    )
    with client.messages.stream(
        model=MODEL_SONNET,
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
    print()

This keeps the hot classification path fast and structured while giving humans an on-demand, readable trace when they push back on a route. The two use the same client and the same model ids, so there is no extra wiring.

Cost and Latency in Production

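| Scenario | Model | Input tokens / ticket | Output tokens / ticket | Cost per 10k tickets | P50 latency |
| --- | --- | --- | --- | --- | --- |
| Haiku, no caching | claude-haiku-4-5 | ~1,100 | ~120 | ~$0.60 | 55ms |
| Haiku + prompt cache hit | claude-haiku-4-5 | ~200 (billed) | ~120 | ~$0.12 | 45ms |
| Sonnet, all tickets | claude-sonnet-4-6 | ~1,100 | ~120 | ~$12.00 | 190ms |
| Two-tier (85/15 split) | Mixed | varies | varies | ~$1.92 | ~60ms avg |

Working from the table's own figures, the two-tier cost has two parts. Every ticket pays the cached Haiku rate, because Haiku always runs first. The 15% that escalate also pay the Sonnet rate. Assuming escalated tickets pay the uncached Sonnet rate, that comes to $0.12 + 0.15 × $12.00 ≈ $1.92 per 10,000 tickets, roughly 6x cheaper than running Sonnet on every ticket. At 100,000 tickets per month (a medium-sized SaaS company), that is about $19 vs $120 per month, and the gap widens as your escalation rate falls. The accuracy hit for the 85% handled by Haiku is minimal on unambiguous tickets, and those make up the majority of real support volume.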
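You can rerun this arithmetic with your own escalation rate or current prices. Below is a minimal sketch. The function names are hypothetical. The inputs are the table's per-10k figures for cached Haiku and uncached Sonnet, and they assume escalated tickets pay the uncached Sonnet rate. Treat them as placeholders for your own measured numbers.

```python
def blended_cost_per_10k(haiku_cost: float, sonnet_cost: float, escalation_rate: float) -> float:
    """Cost per 10k tickets for the cheap-first pattern.

    Haiku runs on every ticket, so every ticket pays the Haiku rate; the
    escalated fraction additionally pays the Sonnet rate.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return haiku_cost + escalation_rate * sonnet_cost


def breakeven_escalation_rate(haiku_cost: float, sonnet_cost: float) -> float:
    """Escalation rate at which two-tier costs the same as Sonnet-only."""
    return 1.0 - haiku_cost / sonnet_cost


if __name__ == "__main__":
    # Per-10k figures from the table: cached Haiku and uncached Sonnet.
    for rate in (0.05, 0.15, 0.30):
        cost = blended_cost_per_10k(0.12, 12.00, rate)
        print(f"{rate:.0%} escalated: ${cost:.2f} per 10k tickets")
    print(f"break-even at {breakeven_escalation_rate(0.12, 12.00):.0%} escalated")
```

With these inputs the script prints $0.72, $1.92, and $3.72 per 10k tickets at 5%, 15%, and 30% escalation, and a break-even rate of 99%. Here the cached Haiku pass costs about 1% of a Sonnet call, so the two-tier pattern beats Sonnet-only at almost any escalation rate. The escalation rate mostly determines how large the saving is.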

| Ticket Characteristic | Recommended Model | Rationale |
| --- | --- | --- |
| Simple, single-topic, clear subject | Haiku first (likely stays) | High confidence, fast, cheap |
| Multi-topic, e.g. bug + billing combined | Haiku, escalates to Sonnet | Low confidence on the team field triggers escalation |
| Security or legal (GDPR, breach) | Haiku, often escalates | Urgency and type confidence tend to vary |
| Non-English body text | Sonnet directly | Add a language detection pre-filter to skip Haiku |
| Very short tickets (subject only) | Haiku, watch confidence | Missing context lowers confidence; the threshold kicks in |

Common Pitfalls in AI Ticket Classification

Pitfall 1: Letting the Model Choose Its Own Schema

If you ask the model to “return JSON with team, type, urgency, and sentiment,” you will get inconsistent field names (ticket_type vs type vs category), missing fields on edge cases, and occasional markdown wrapping. Always use the tool-forcing pattern. The tool schema is your contract with the model.
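Below is a minimal sketch of the tool-forcing pattern. It assumes a tool named classify_ticket and the label sets shown. Both are hypothetical, so use the ones from your own routing config. The tools and tool_choice parameters are part of the Messages API. Setting tool_choice to {"type": "tool", "name": ...} forces Claude to answer by calling that tool, and the enums spell out the allowed values for each field. The helper only builds the request kwargs, so you can check it without network access.

```python
from typing import Any, Dict, List

# Hypothetical label sets; in practice load them from your routing config.
FIELDS: Dict[str, List[str]] = {
    "team": ["billing", "technical", "security", "account"],
    "type": ["bug", "question", "feature_request", "incident"],
    "urgency": ["low", "normal", "high", "critical"],
    "sentiment": ["positive", "neutral", "negative", "angry"],
}


def build_classify_tool() -> Dict[str, Any]:
    """Tool definition whose input_schema is the contract with the model."""
    properties: Dict[str, Any] = {}
    for name, values in FIELDS.items():
        properties[name] = {"type": "string", "enum": values}
        properties[f"{name}_confidence"] = {"type": "number", "minimum": 0, "maximum": 1}
    return {
        "name": "classify_ticket",
        "description": "Record the classification of one support ticket.",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": list(properties),  # every label and every confidence
        },
    }


def build_request(model: str, system_prompt: str, ticket_text: str) -> Dict[str, Any]:
    """Keyword arguments for client.messages.create(**kwargs)."""
    tool = build_classify_tool()
    return {
        "model": model,
        "max_tokens": 512,
        "system": system_prompt,
        "tools": [tool],
        # Forces a call to this specific tool instead of a free-text reply.
        "tool_choice": {"type": "tool", "name": tool["name"]},
        "messages": [{"role": "user", "content": ticket_text}],
    }
```

Pass the result to client.messages.create(**build_request(...)) and read the input of the tool_use block in response.content. That input is a dict with the field names fixed by the schema. Keep a validation step on it before you build your dataclass, as a backstop.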

Pitfall 2: Treating Confidence as Ground Truth

Model-reported confidence scores are good enough to drive a threshold decision, but they are not probability estimates in the statistical sense: a score of 0.95 does not mean 95% accuracy. Run accuracy evaluations against a labeled test set of real tickets from your queue. The Part 24 eval harness shows exactly how to build this. Expect to tune your threshold by team, not as a single global value.
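Below is a minimal sketch of per-team thresholds, keyed on the team that Haiku predicted. The override values and the Confidences container are hypothetical. Set the numbers from your own eval results, and fall back to the 0.75 default for teams you have not tuned yet.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.75
# Hypothetical per-team overrides; derive real values from your eval results.
TEAM_THRESHOLDS = {"security": 0.90, "billing": 0.80}


@dataclass
class Confidences:
    team: float
    type: float
    urgency: float
    sentiment: float


def should_escalate(predicted_team: str, conf: Confidences) -> bool:
    """Escalate to Sonnet if any field falls below the predicted team's threshold."""
    threshold = TEAM_THRESHOLDS.get(predicted_team, DEFAULT_THRESHOLD)
    return min(conf.team, conf.type, conf.urgency, conf.sentiment) < threshold
```

Consider a ticket scored 0.85 on every field. It stays with Haiku when routed to a team on the default threshold, but it escalates when Haiku thinks it belongs to security. That is the asymmetry you want where a misroute is expensive.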

Pitfall 3: Ignoring Sentiment in Routing

Most teams base routing purely on urgency and team. But an angry enterprise customer with a “normal” urgency billing question can quickly become a churn event. The sentiment field in the schema lets you create a priority-billing queue that catches technically normal tickets from customers who are close to canceling. This is often the highest-value use of the classification data.
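A minimal sketch of a routing rule that uses sentiment alongside urgency and team. The queue names and label values are hypothetical, apart from priority-billing, which the text names. High-urgency tickets go to a team's urgent queue first. Billing tickets with normal or low urgency but negative sentiment are diverted to priority-billing.

```python
NEGATIVE_SENTIMENTS = {"negative", "angry"}  # hypothetical sentiment labels


def pick_queue(team: str, urgency: str, sentiment: str) -> str:
    """Return the queue name for a classified ticket."""
    if urgency in ("high", "critical"):
        return f"{team}-urgent"
    if team == "billing" and sentiment in NEGATIVE_SENTIMENTS:
        # Technically normal urgency, but an unhappy payer is a churn risk.
        return "priority-billing"
    return team
```

The urgency check runs first, so a critical billing ticket still lands in the urgent queue no matter what its sentiment is. Sentiment only promotes tickets that urgency alone would have left in the regular queue.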

Pitfall 4: Not Logging the Model and Confidence Fields

Every classified ticket should have model_used, escalated, and the four confidence scores stored alongside it. Without this data, you cannot calculate your Haiku hit rate, identify ticket categories where the model consistently struggles, or build the labeled dataset for your eval harness. Log from day one.
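Below is a minimal sketch of a per-ticket log record carrying the fields the text lists, plus the Haiku hit-rate calculation it enables. The class name and the names of the four confidence fields are hypothetical. One JSON object per line works with most log pipelines.

```python
import json
from dataclasses import asdict, dataclass
from typing import Sequence


@dataclass
class ClassificationLog:
    ticket_id: str
    model_used: str  # e.g. "claude-haiku-4-5" or "claude-sonnet-4-6"
    escalated: bool
    team_confidence: float
    type_confidence: float
    urgency_confidence: float
    sentiment_confidence: float

    def to_json(self) -> str:
        """Serialize as a single JSON line."""
        return json.dumps(asdict(self))


def haiku_hit_rate(records: Sequence[ClassificationLog]) -> float:
    """Fraction of tickets Haiku handled without escalating to Sonnet."""
    if not records:
        return 0.0
    return sum(not r.escalated for r in records) / len(records)
```

The same records also serve as the starting point for an eval set. Filter on escalated tickets or on low team_confidence to find the categories where the model struggles, then have a human label those first.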

Pitfall 5: Re-classifying on Every Edit

In a Zendesk or Intercom integration, ticket bodies get updated as agents reply. Set a flag on first classification and do not re-classify unless the submitter adds a new message. Agent replies change the tone in ways that will skew the original classification.
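The reclassification rule fits in a small predicate. This is a minimal sketch. The event kinds and author roles are hypothetical, so map them from your helpdesk's own webhook payload. This sketch does not assume any particular Zendesk or Intercom field names.

```python
from dataclasses import dataclass


@dataclass
class TicketEvent:
    kind: str         # hypothetical: "created", "message", or "edit"
    author_role: str  # hypothetical: "submitter" or "agent"


def should_classify(event: TicketEvent, already_classified: bool) -> bool:
    """Decide whether a helpdesk webhook event should trigger the classifier."""
    if event.author_role != "submitter":
        return False  # agent replies and edits never reclassify
    if not already_classified:
        return True  # first submitter content we see
    return event.kind == "message"  # only a new submitter message can change the ticket
```

The already_classified flag is the one set on first classification. Storing it on the ticket keeps the check cheap, with no need to diff ticket bodies.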

Pitfall 6: Static System Prompts in a Changing Business

Your team structure changes. You add a new “growth” team, split infrastructure into “devops” and “platform,” or rename urgency tiers. The system prompt must stay synchronized with your routing configuration. Treat the system prompt as a versioned artifact, store it in your repo, and deploy changes through your normal release process. A mismatch between prompt enums and your queue names causes silent routing failures.
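One cheap guard against silent routing failures is to check the prompt's team enum against your queue configuration, either at startup or in CI. Below is a minimal sketch. The function name and the default non-team queue list are hypothetical. Add any other queues that are not teams, such as priority-billing, to that list.

```python
from typing import Iterable


def check_prompt_matches_queues(
    prompt_teams: Iterable[str],
    queue_names: Iterable[str],
    non_team_queues: Iterable[str] = ("needs-human-triage",),
) -> None:
    """Raise if the prompt's team enum and the configured queues disagree."""
    teams = set(prompt_teams)
    queues = set(queue_names) - set(non_team_queues)
    problems = []
    if teams - queues:
        problems.append(f"teams with no queue: {sorted(teams - queues)}")
    if queues - teams:
        problems.append(f"queues no team routes to: {sorted(queues - teams)}")
    if problems:
        raise RuntimeError("prompt/queue mismatch: " + "; ".join(problems))
```

Suppose you split infrastructure into devops and platform in the queue config but forget to update the prompt. The check fails loudly at deploy time and names both sides of the mismatch, instead of letting tickets pile up in a queue nobody watches.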

Connecting This to Broader Patterns in the Series

The tool-use pattern at the core of this POC is the same pattern used in Part 2: Tool Use with Claude. The two-tier model routing here is a concrete application of the broader cost-optimization strategy covered in Part 27: Cut AI Costs: Model Routing and Batching with Claude. If you are building a full support agent rather than a classifier, Part 13: Build a Customer Support Agent with Claude shows how to wire RAG and tool calls together for response generation on top of the routing infrastructure built here.

For email-based ticket ingestion, Part 14: AI Email Triage covers parsing raw email payloads before they hit the classifier. The two articles pair naturally: email triage feeds normalized ticket objects into the classifier described here.

Observability across all of this (latency per model tier, escalation rates, confidence distributions, and queue depth over time) is covered in Part 28: Observability for LLM Apps.

Frequently Asked Questions

How accurate is Claude at ticket classification compared to fine-tuned models?

On typical support queues where the ticket text is in English and reasonably specific, Claude Haiku achieves 88 to 93% agreement with human expert labels on team assignment, which is competitive with fine-tuned BERT-class models. The advantage of Claude is that it requires no labeled training data, handles novel ticket types gracefully, and produces the routing_reason field that fine-tuned models cannot. The disadvantage is per-call cost. Fine-tuned models are cheaper at very high volumes (above 5 million tickets per month) if you can afford the labeling and retraining infrastructure.

Can the classifier handle non-English tickets?

Claude handles most major languages correctly, but confidence scores tend to be lower on non-English tickets with the default English-language system prompt. The practical fix is to add a fast language detection step (with the langdetect Python library this takes about three lines of code). If the language is not English, skip Haiku entirely and send the ticket directly to Sonnet. You can also add language-specific examples to your system prompt for your top non-English languages.
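Below is a minimal sketch of the pre-filter. It uses langdetect's detect function and its DetectorFactory.seed setting, which the library documents for making results deterministic. The helper names are hypothetical. The detector is injectable so the routing logic can be tested without langdetect installed. Text that langdetect cannot classify stays on the cheap path, where the confidence threshold still applies.

```python
from typing import Callable, Optional

MODEL_HAIKU = "claude-haiku-4-5"
MODEL_SONNET = "claude-sonnet-4-6"


def detect_with_langdetect(text: str) -> str:
    """ISO 639-1 code from langdetect, or "unknown" if it cannot decide."""
    # Imported lazily so this module loads even without langdetect installed.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    try:
        return detect(text)
    except LangDetectException:  # e.g. empty or digits-only text
        return "unknown"


def first_model_for(subject: str, body: str,
                    detect_language: Optional[Callable[[str], str]] = None) -> str:
    """Pick the first-tier model: Haiku for English, Sonnet otherwise."""
    detect_language = detect_language or detect_with_langdetect
    lang = detect_language(f"{subject}\n{body}")
    # Undetectable text stays on the cheap path; the confidence threshold
    # still escalates it if Haiku is unsure.
    if lang in ("en", "unknown"):
        return MODEL_HAIKU
    return MODEL_SONNET
```

Detection runs on the subject and body together. Statistical detectors like langdetect are unreliable on very short strings, and a subject line alone is often only a few words.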

What is the right way to handle ticket updates and replies?

Classify only on the original submission. When the submitter adds a new message that materially changes the nature of the ticket (for example, upgrading from a general question to “I think we have been breached”), trigger a reclassification flagged as an escalation event. Do not reclassify on agent replies. Store the classification history as a list so you can see how the ticket evolved.
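Below is a minimal sketch of storing classification history as a list. It flags every entry after the first as a reclassification (the escalation event) and notes separately whether urgency went up. The class names and urgency tiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical urgency tiers, lowest to highest.
URGENCY_RANK = {"low": 0, "normal": 1, "high": 2, "critical": 3}


@dataclass
class Classification:
    team: str
    urgency: str
    reclassification: bool = False  # True for every entry after the first
    urgency_raised: bool = False


@dataclass
class TicketHistory:
    entries: List[Classification] = field(default_factory=list)

    def record(self, team: str, urgency: str) -> Classification:
        """Append a classification; later entries are flagged as escalation events."""
        prev = self.entries[-1] if self.entries else None
        entry = Classification(
            team=team,
            urgency=urgency,
            reclassification=prev is not None,
            urgency_raised=prev is not None
            and URGENCY_RANK[urgency] > URGENCY_RANK[prev.urgency],
        )
        self.entries.append(entry)
        return entry
```

For example, a ticket first recorded as technical and normal, then as security and critical after the submitter writes “I think we have been breached”, ends up with two entries. The second carries both flags, and that is the signal to page someone rather than just move the ticket.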

How do I build a labeled dataset to measure accuracy?

Take two weeks of tickets that have already been resolved and had their team assignment confirmed by a human agent. Run the classifier in shadow mode against those tickets and compare. Aim for 200 to 500 tickets per team for a meaningful accuracy measurement. The eval harness in Part 24 provides the framework for running this systematically and tracking accuracy over time as you change prompts.
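For a first look at shadow-mode results, you can compare the classifier's team against the human-confirmed team. Below is a minimal sketch. The function names are hypothetical, and the 200-ticket floor is the lower end of the range suggested in the text. Results are grouped by the human label, so each number is the share of a team's real tickets that the classifier routed correctly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_SAMPLES_PER_TEAM = 200


def agreement_by_team(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[float, int]]:
    """pairs are (human_team, predicted_team); returns {team: (agreement, n)}."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for human, predicted in pairs:
        totals[human] += 1
        hits[human] += int(human == predicted)
    return {team: (hits[team] / totals[team], totals[team]) for team in totals}


def underpowered_teams(report: Dict[str, Tuple[float, int]]) -> List[str]:
    """Teams with too few labeled tickets for a meaningful number."""
    return sorted(t for t, (_, n) in report.items() if n < MIN_SAMPLES_PER_TEAM)
```

Report the underpowered teams alongside the agreement numbers. An agreement of 1.0 on a team with a dozen tickets says little, and those teams are where you should collect more labels before tuning their thresholds.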

Can I use this pattern for internal IT tickets, not just customer support?

Yes, and the economics are even better for internal IT helpdesks because the ticket volume is lower and the team taxonomy is smaller. Replace the team enum with your IT groups (networking, endpoint, identity, cloud, applications) and adjust the urgency criteria to match your internal SLAs. The confidence-based escalation is particularly valuable for IT security tickets where misclassification has consequences.

What if my team structure changes frequently?

Parameterize the team enum and the descriptions in your system prompt. Store them in a config file or database table rather than hardcoding them in the prompt text. On each classifier invocation, fetch the current team list and inject it into the system prompt. A prompt cache hit requires the cached prefix, including the full system prompt text, to match exactly. Changing the team list therefore invalidates the cache and forces a fresh write, which is the correct behavior.
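Below is a minimal sketch of rendering the system prompt from a team config. The function name and prompt wording are hypothetical. It returns the list-of-text-blocks form of the Messages API system parameter, with a cache_control marker. Sorting the teams matters here: a database query without an ORDER BY can return rows in any order, and every reordering would change the prompt bytes and cause a needless cache miss.

```python
from typing import Any, Dict, List


def build_system_blocks(teams: Dict[str, str]) -> List[Dict[str, Any]]:
    """teams maps team name -> description, e.g. rows from a config table."""
    # Sort by name so the same config always renders byte-identical text.
    lines = [f"- {name}: {teams[name]}" for name in sorted(teams)]
    text = (
        "You classify support tickets. Route each ticket to exactly one team.\n"
        "Teams:\n" + "\n".join(lines)
    )
    # cache_control marks the end of the prefix eligible for caching.
    return [{"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}]
```

Pass the result as system=build_system_blocks(teams). Build the tool schema's team enum from the same config so the prompt and the enum cannot drift apart. The API only caches prefixes above a model-specific minimum length, so a very small team list may not be cached at all.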

Should I use this for real-time classification or batch?

Both work. The FastAPI endpoint in the POC is designed for real-time classification on ticket creation webhooks (typical latency 50 to 250ms). For batch processing of a historical backlog, replace the HTTP layer with a simple script that reads tickets from your database or CSV file, calls classify() in a loop with a small sleep between calls to respect rate limits, and writes results back. The Anthropic API’s rate limits on Haiku are generous enough (400,000 TPM on the standard tier) to process thousands of tickets per minute in batch mode.
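Below is a minimal sketch of the backlog script. classify_backlog is a hypothetical name. The classifier is passed in as a callable taking subject and body and returning a dict, which stands in for the POC's classify(); its real signature may differ. Rows are dicts with ticket_id, subject, and body columns, which is what csv.DictReader yields for a CSV with that header.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, TextIO


def classify_backlog(
    rows: Iterable[Dict[str, str]],
    classify: Callable[[str, str], Dict[str, object]],
    out: TextIO,
    pause_s: float = 0.2,
) -> int:
    """Classify each row and write one CSV line per ticket; returns the count.

    Assumes classify() returns a dict with the same keys for every ticket.
    """
    writer = None
    count = 0
    for row in rows:
        result = classify(row["subject"], row["body"])
        record = {"ticket_id": row["ticket_id"], **result}
        if writer is None:  # take the CSV header from the first result
            writer = csv.DictWriter(out, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
        count += 1
        time.sleep(pause_s)  # crude pacing between calls
    return count


def _fake_classify(subject: str, body: str) -> Dict[str, object]:
    """Stand-in for the real classifier, for a dry run."""
    return {"team": "billing", "urgency": "normal"}


if __name__ == "__main__":
    backlog = io.StringIO("ticket_id,subject,body\nT-1,Refund,Charged twice\n")
    out = io.StringIO()
    classify_backlog(csv.DictReader(backlog), _fake_classify, out, pause_s=0)
    print(out.getvalue())
```

A single sequential loop with a fixed pause is the simplest way to stay polite to the API. If a backlog run is too slow, raise throughput gradually and watch for 429 responses rather than dropping the pause entirely.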

Back to the full series: AI in Production: 30 Real-World Use Cases with Claude

API reference: Anthropic Messages API | Tool Use documentation | Prompt Caching guide

MUASIF80