AI Agent Orchestration Patterns: Workflows vs Agents

Q: How do I prevent an agent from running too many tool calls and running up my bill?

Three controls: (1) always set a max_turns limit in the loop; (2) write tool descriptions that state calling conventions explicitly (for example, “Call once per query”); (3) add a check in your tool execution layer that returns an error once the agent exceeds a per-tool call limit you set for the task. Also, set up cost alerts in your Anthropic console keyed to your expected per-run budget.
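A minimal sketch of the third control, assuming a hypothetical ToolBudget helper that sits in front of your tool executor. The class name, the limits, and the error wording are illustrative and not part of any SDK.

```python
from collections import Counter
from typing import Callable, Optional


class ToolBudget:
    """Counts calls per tool and reports when a tool exceeds its limit."""

    def __init__(self, limits: dict, default_limit: int = 3) -> None:
        self.limits = limits              # e.g. {"retrieve_documents": 2}
        self.default_limit = default_limit
        self.counts = Counter()

    def check(self, tool_name: str) -> Optional[str]:
        """Record one call; return an error string if over budget, else None."""
        self.counts[tool_name] += 1
        limit = self.limits.get(tool_name, self.default_limit)
        if self.counts[tool_name] > limit:
            return (
                f"Error: {tool_name} called {self.counts[tool_name]} times "
                f"(limit {limit}). Use the results you already have."
            )
        return None


def guarded_run(tool_name: str, run_tool: Callable[[], str], budget: ToolBudget) -> str:
    """Run the tool unless its budget is exhausted; otherwise return the error."""
    error = budget.check(tool_name)
    return error if error is not None else run_tool()


budget = ToolBudget({"retrieve_documents": 2})
results = [guarded_run("retrieve_documents", lambda: "docs", budget) for _ in range(3)]
```

The first two calls run normally; the third comes back as an error string that you would pass to the model as the tool result, which tells it to stop retrieving.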

Q: Is it possible to parallelize steps in a workflow?

Yes. The fan-out pattern uses Python's concurrent.futures.ThreadPoolExecutor or asyncio with the async Anthropic client to run multiple Claude calls simultaneously. The constraint is that parallel calls share your API rate limit, so watch tokens per minute if you fan out aggressively.
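A minimal sketch of the thread-pool variant. The summarize function is a stand-in for one Claude call (an assumption so the example runs offline); max_workers doubles as a crude cap on how many requests hit your rate limit at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def summarize(doc: str) -> str:
    """Stand-in for one Claude call; swap in a real client call in practice."""
    time.sleep(0.05)  # simulate network latency
    return f"summary of {doc}"


def fan_out(docs: list, max_workers: int = 4) -> list:
    """Run summarize() on every doc concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, docs))


summaries = fan_out(["doc-a", "doc-b", "doc-c"])
```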

Q: How do I evaluate whether my agent or workflow is producing better output?

Build an eval harness with a fixed test set of 20-50 question/expected_quality_signal pairs. Use Claude itself as a judge: ask it to score output on a 1-5 rubric that you define. Run both approaches on the same test set and compare mean scores and score variance.
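A sketch of the comparison step, assuming a hypothetical judge function. Here it is a toy word-count heuristic so the example runs offline; in a real harness it would be a Claude call that applies your rubric and returns a 1-5 score.

```python
from statistics import mean, pvariance
from typing import Callable


def judge(output: str) -> int:
    """Toy stand-in for a Claude-as-judge call: score 1-5 by word count."""
    return max(1, min(5, len(output.split())))


def evaluate(approach: Callable[[str], str], test_set: list) -> dict:
    """Score one approach on every test question; report mean and variance."""
    scores = [judge(approach(q)) for q in test_set]
    return {"mean": mean(scores), "variance": pvariance(scores)}


def workflow(q: str) -> str:
    return f"answer to {q}"  # always three words


def agent(q: str) -> str:
    return "short" if q == "q2" else f"a longer answer to {q}"


test_set = ["q1", "q2", "q3"]
report = {
    "workflow": evaluate(workflow, test_set),
    "agent": evaluate(agent, test_set),
}
```

In this toy run the agent has the higher mean but also the higher variance, which is the kind of tradeoff the two numbers are meant to surface.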

Q: What happens if a tool call returns an error inside the agent loop?

Return the error as the content string in the tool_result message. Claude reads tool results as context and, in most cases, will either retry with different parameters or move on. A clear error message gives Claude the signal it needs to change strategy. Never return an empty string.
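A sketch of a wrapper that turns tool failures into readable tool_result blocks. The safe_tool_result helper and its wording are hypothetical. The is_error flag is, to my understanding, an optional field on tool_result blocks in the Messages API; check the current API reference before relying on it.

```python
from typing import Callable


def safe_tool_result(tool_use_id: str, run_tool: Callable[[], str]) -> dict:
    """Run a tool and wrap its output, or its failure, as a tool_result block."""
    try:
        content = run_tool()
        is_error = False
    except Exception as exc:
        # A specific message gives the model something to act on.
        content = f"{type(exc).__name__}: {exc}. Try different parameters."
        is_error = True
    if not content:
        # Never hand back an empty string; say explicitly that nothing came back.
        content = "Tool returned no data."
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


def failing_tool() -> str:
    raise ValueError("k must be between 1 and 4")


block = safe_tool_result("toolu_example", failing_tool)
empty = safe_tool_result("toolu_example", lambda: "")
```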

By Asif · June 5, 2026 · 28 min read · AI Use Cases · Updated June 15, 2026

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 29 of 30

TL;DR

Prompt-chaining workflows are deterministic, auditable, and cheap. Autonomous agents are flexible but nondeterministic and cost more per run.
The right choice is almost always a hybrid: a workflow for the 80% happy path, with an agent fallback for the 20% edge cases.
This article implements the same task (research brief generation) both ways and benchmarks cost, latency, and failure rate side by side.
Orchestration patterns matter: fan-out/fan-in, sequential chains, and tool-calling loops each suit different problem shapes.
Use claude-haiku-4-5 for routing and classification steps; save claude-sonnet-4-6 for synthesis and generation; reserve claude-opus-4-8 for tasks that require hard reasoning.
Track token usage on every call. A single undisciplined agentic loop can spend 10x what a workflow costs for identical output quality.

Why AI Agent Orchestration Patterns Are Worth Getting Right

Every production AI team eventually faces the same decision: do you write a carefully sequenced pipeline that calls Claude several times in a fixed order, or do you give Claude a set of tools and let it figure out the sequence itself? Both work. Neither is universally better. The difference shows up in cost, debuggability, and how badly things go when the model makes an unexpected choice.

At a startup moving fast, an autonomous agent loop feels attractive. You write fewer lines of application logic and Claude fills the gaps. In practice, you find that a single misbehaving tool call can spin the loop for 12 iterations and produce a response that was available at iteration 2. That is real money and real latency. At a larger company with compliance requirements, auditors want to see exactly which prompt produced which output. A black-box agent loop does not give you that.

This article makes the tradeoff concrete. We pick one representative task: generating a research brief from a company name and a question. We implement it as a deterministic prompt-chaining workflow and as an autonomous agent. We measure both. The goal is to give you a framework for choosing correctly on your next project, not a dogmatic answer.

Key idea: The question is not “workflow or agent” but “where in this task does flexibility earn its cost?” Identify those points and use agents there. Use workflows everywhere else.

Understanding the Two AI Agent Orchestration Patterns

Prompt-Chaining Workflows

A prompt-chaining workflow is a Python function that calls Claude two or more times in a predetermined sequence. Each call has a fixed system prompt and a fixed role, and its output is passed as input to the next step. You, the developer, decide the steps in advance. The model cannot deviate from the sequence or call steps out of order.

A research brief workflow might look like this: step 1, extract named entities from the user’s question; step 2, generate three search queries; step 3, for each query, synthesize a paragraph from retrieved documents; step 4, assemble the final brief. The model handles each step. The orchestration is pure Python.

Pros: deterministic execution path, easy to log and audit, easy to test, cost is predictable. Cons: you need to anticipate every step at design time. If the task shape changes (say, one question needs five queries and another needs one), a fixed workflow wastes tokens or produces shallow output.
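The four-step sequence described above can be sketched as follows. The fake_llm function is an offline stand-in for a Claude call, and every function and parameter name here is illustrative rather than taken from the full source later in the article.

```python
from typing import Callable


def fake_llm(system: str, user: str) -> str:
    """Offline stand-in for one Claude call; tags output with the step name."""
    return f"[{system}] {user}"


def research_chain(
    question: str,
    retrieve: Callable[[str], list],
    llm: Callable[[str, str], str] = fake_llm,
) -> dict:
    """Fixed four-step chain. The order lives in Python, not in the model."""
    log = []

    entities = llm("extract entities", question)
    log.append("extract_entities")

    queries = [llm("generate query", f"{entities} #{i}") for i in range(3)]
    log.append("generate_queries")

    paragraphs = [llm("synthesize", "\n".join(retrieve(q))) for q in queries]
    log.append("synthesize")

    brief = llm("assemble brief", "\n\n".join(paragraphs))
    log.append("assemble")

    return {"brief": brief, "log": log}


result = research_chain(
    "What is Stripe's scale?",
    retrieve=lambda q: [f"doc for {q}"],
)
```

Because the step order is hard-coded, the log is identical on every run, which is what makes the workflow easy to audit and test.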

Autonomous Agent Loops

An autonomous agent gets a goal and a set of tools. It decides which tools to call, in which order, and when it is done. The orchestration code is a loop: send the goal plus tool results back to Claude until it stops calling tools or a turn limit is reached.

The same research brief task becomes: “Generate a research brief for this question. You have these tools: extract_entities, generate_queries, retrieve_documents, write_section.” Claude decides to call extract_entities once, then generate_queries, then retrieve_documents three times (it thinks three sources are enough), then write_section. It might also decide to call retrieve_documents a fourth time if the first three come back thin.

Pros: handles variable task complexity naturally, can recover from bad intermediate results, requires less upfront task decomposition. Cons: every additional tool call costs money and adds latency; the model can loop on unproductive tool calls; behavior changes between model versions.
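A stripped-down sketch of the loop, with the model replaced by a scripted decide function so it runs offline. The action dictionaries and the run_agent name are assumptions for illustration; a real loop would read tool_use blocks from the API response instead.

```python
from typing import Callable


def run_agent(
    decide: Callable[[list], dict],
    tools: dict,
    goal: str,
    max_turns: int = 8,
) -> dict:
    """Ask `decide` for the next action until it finishes or turns run out."""
    history = [{"role": "user", "content": goal}]
    for turn in range(1, max_turns + 1):
        action = decide(history)
        if action["type"] == "finish":
            return {"answer": action["text"], "turns": turn, "stopped": "finished"}
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "name": action["tool"], "content": result})
    return {"answer": None, "turns": max_turns, "stopped": "max_turns"}


def scripted_decide(history: list) -> dict:
    """Hypothetical policy: retrieve until two results exist, then finish."""
    n_results = sum(1 for m in history if m["role"] == "tool")
    if n_results < 2:
        return {
            "type": "tool",
            "tool": "retrieve_documents",
            "args": {"query": f"q{n_results}"},
        }
    return {"type": "finish", "text": f"brief from {n_results} results"}


tools = {"retrieve_documents": lambda query: f"docs for {query}"}
done = run_agent(scripted_decide, tools, "Generate a research brief")
capped = run_agent(scripted_decide, tools, "Generate a research brief", max_turns=1)
```

The second call shows the turn limit doing its job: the loop stops after one turn with no answer instead of continuing to spend tokens.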

The Hybrid Approach

Most production systems worth building use both. A workflow handles the predictable skeleton of the task. Where the workflow reaches a branch point that genuinely requires intelligence to navigate, a short-lived agent loop handles that specific branch, returns a result, and the workflow continues. This is sometimes called a “workflow with agentic steps.”

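[Diagram. Top: prompt-chaining workflow with steps in fixed order (Generate Queries, Retrieve Docs, Write Brief), connected by entities, queries, and docs. Bottom: Autonomous Agent Loop (Nondeterministic), in which Claude decides the next step among extract_entities, generate_queries, retrieve_docs, and write_section.]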

Figure 1: Prompt-chaining workflow (top) vs autonomous agent loop (bottom). The workflow executes steps in fixed order. The agent dynamically selects tools until it decides the task is complete.

Picking the Right Orchestration Pattern for Your Task

Before writing any code, answer four questions about your task:

Is the step sequence fixed? If yes, use a workflow. If no (or “usually, but sometimes…”), consider an agent for the variable parts.
Do you need a full audit trail? Workflows are naturally auditable. Agents require explicit logging of every tool call.
Is task complexity predictable? A brief for “What is OpenAI’s revenue?” needs two steps. A brief for “Explain the antitrust implications of the last five major tech acquisitions” might need twelve. Agents adapt; workflows do not.
What is your failure tolerance? A workflow has a finite, known set of failure modes. An agent can fail in ways you did not anticipate.

Criterion	Use a Workflow	Use an Agent
Step sequence	Fixed and known at design time	Variable, depends on content
Audit requirements	Strict (compliance, finance)	Best-effort logging is enough
Task complexity variance	Low (similar inputs, similar steps)	High (some tasks need 2 steps, some need 10)
Cost sensitivity	High (budget per run is tight)	Lower (outcome value exceeds token cost)
Development speed	Slower (must anticipate edge cases)	Faster first version, harder to harden
Debugging ease	Easy (deterministic, reproducible)	Harder (need full tool-call replay)
Failure blast radius	Contained (fails at known step)	Potentially runaway (loop without bound)
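One way to make the four questions explicit in a design review is a small checklist helper. This is only a heuristic sketch: the TaskProfile fields and the all-or-nothing voting rule are assumptions, not a rule from this article.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    fixed_sequence: bool           # Is the step sequence fixed?
    strict_audit: bool             # Do you need a full audit trail?
    predictable_complexity: bool   # Is task complexity predictable?
    low_failure_tolerance: bool    # Must failure modes be known in advance?


def recommend(profile: TaskProfile) -> str:
    """All answers favor a workflow, none do, or the task is mixed."""
    votes = sum([
        profile.fixed_sequence,
        profile.strict_audit,
        profile.predictable_complexity,
        profile.low_failure_tolerance,
    ])
    if votes == 4:
        return "workflow"
    if votes == 0:
        return "agent"
    return "hybrid"


invoice_pipeline = recommend(TaskProfile(True, True, True, True))
open_research = recommend(TaskProfile(False, False, False, False))
research_brief = recommend(TaskProfile(True, True, False, True))
```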

Common Orchestration Topologies

Whether you choose a workflow or an agent, you will use one or more of these topologies:

Sequential Chain

Each step’s output is the next step’s input. This is the simplest topology and the right default. Use it for tasks where each step refines or transforms the previous result: extraction, then enrichment, then formatting.

Fan-Out / Fan-In

One step produces N parallel tasks. N workers (possibly parallel API calls) handle them. A final step merges results. Useful for multi-document analysis, where you summarize each document independently before synthesizing. See Part 10 (RAG with Claude) for an example of this applied to document retrieval.
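A minimal asyncio sketch of fan-out/fan-in. The summarize coroutine stands in for an async Claude call, and the semaphore is one possible way to cap concurrency; both are assumptions for illustration.

```python
import asyncio


async def summarize(doc: str) -> str:
    """Stand-in for one async Claude call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"summary({doc})"


async def fan_out_fan_in(docs: list, max_concurrency: int = 2) -> str:
    """Summarize docs concurrently (fan-out), then merge in order (fan-in)."""
    gate = asyncio.Semaphore(max_concurrency)

    async def worker(doc: str) -> str:
        async with gate:  # at most max_concurrency calls in flight
            return await summarize(doc)

    parts = await asyncio.gather(*(worker(d) for d in docs))
    return " | ".join(parts)


merged = asyncio.run(fan_out_fan_in(["a", "b", "c"]))
```

asyncio.gather returns results in the order the tasks were passed in, so the merge step sees documents in their original order regardless of which call finished first.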

Conditional Branching

A classifier step determines which branch to take. The routing logic lives in Python, not in the model. See Part 17 (Ticket Classification and Routing) for a worked example. Use claude-haiku-4-5 for the classifier to keep routing costs at fractions of a cent.
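A sketch of routing that lives in Python. The classify function is a keyword stub standing in for a small-model classifier call, and the labels and handlers are hypothetical.

```python
def classify(ticket: str) -> str:
    """Stand-in for a classifier call that returns exactly one label."""
    text = ticket.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


def handle_billing(ticket: str) -> str:
    return f"billing queue: {ticket}"


def handle_technical(ticket: str) -> str:
    return f"engineering queue: {ticket}"


def handle_general(ticket: str) -> str:
    return f"general queue: {ticket}"


HANDLERS = {
    "billing": handle_billing,
    "technical": handle_technical,
    "general": handle_general,
}


def route(ticket: str) -> str:
    """The model only picks a label; Python decides what that label triggers."""
    label = classify(ticket)
    handler = HANDLERS.get(label, handle_general)  # unknown labels fall back
    return handler(ticket)


routed = route("I was charged twice, please refund")
```

Falling back to a default handler matters with a real classifier, since a model can occasionally return a label that is not in your table.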

Map-Reduce

A generalization of fan-out/fan-in where the reduce step uses Claude to synthesize N results into one coherent output. The pattern matters because naive concatenation at the reduce step often produces redundant or contradictory output. Claude as the reducer resolves those conflicts.
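The difference between naive concatenation and a synthesizing reducer can be illustrated with stubs. Both fake_mapper and fake_reducer are stand-ins (assumptions): the real mapper and reducer would be Claude calls, and a real reducer would do more than drop exact duplicates.

```python
from typing import Callable


def map_reduce(docs: list, mapper: Callable[[str], str], reducer: Callable[[list], str]) -> str:
    """Apply mapper to each doc, then let reducer merge the partial results."""
    partials = [mapper(d) for d in docs]
    return reducer(partials)


def fake_mapper(doc: str) -> str:
    """Stand-in for a per-document summary: keep the first sentence."""
    return doc.split(".")[0].strip()


def naive_concat(partials: list) -> str:
    """Naive reduce: join everything, duplicates included."""
    return " ".join(f"{p}." for p in partials)


def fake_reducer(partials: list) -> str:
    """Stand-in for a Claude reducer: here it only removes exact repeats."""
    unique = list(dict.fromkeys(partials))  # dedupe, keep first-seen order
    return " ".join(f"{p}." for p in unique)


docs = [
    "Volume grew year over year. Source A adds detail.",
    "Volume grew year over year. Source B adds detail.",
    "The product suite expanded. Source C adds detail.",
]
naive = map_reduce(docs, fake_mapper, naive_concat)
reduced = map_reduce(docs, fake_mapper, fake_reducer)
```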

Agentic Sub-Task

A workflow step that is itself a short agent loop. The outer workflow controls structure and cost. The inner agent handles the part of the task that genuinely requires flexible tool use. This hybrid is described in Anthropic’s own guidance on building effective agents and is the pattern you should reach for most often in production.
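A sketch of the shape, under stated assumptions: search and is_enough are injected stand-ins for a retrieval tool and for the model's own "do I have enough?" decision, and all function names are illustrative.

```python
from typing import Callable


def parse(question: str) -> str:
    """Fixed workflow step 1: normalize the question into a topic."""
    return question.strip().lower()


def retrieval_agent(
    topic: str,
    search: Callable[[str, int], list],
    is_enough: Callable[[list], bool],
    max_turns: int = 4,
) -> list:
    """Bounded agentic step: keep retrieving until satisfied or out of turns."""
    facts = []
    for turn in range(max_turns):
        facts.extend(search(topic, turn))
        if is_enough(facts):
            break
    return facts


def format_brief(topic: str, facts: list) -> str:
    """Fixed workflow step 3: deterministic formatting."""
    return f"{topic}: " + "; ".join(facts)


def hybrid_brief(question: str, search, is_enough, max_turns: int = 4) -> str:
    topic = parse(question)                                       # fixed cost
    facts = retrieval_agent(topic, search, is_enough, max_turns)  # bounded cost
    return format_brief(topic, facts)                             # fixed cost


brief = hybrid_brief(
    "  Stripe scale ",
    search=lambda topic, turn: [f"fact{turn}"],
    is_enough=lambda facts: len(facts) >= 2,
)
```

The outer function never loops, so the worst-case cost is two fixed steps plus at most max_turns retrievals.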

[Diagram. Workflow Step 1: Parse (fixed cost), then Agentic Sub-Task: flexible retrieval loop with N tool calls that decides when it is done (variable cost, bounded by max_turns), then Workflow Step 3: Format (fixed cost), then Output: Final Brief. Total cost = 2 fixed steps + bounded agentic middle. Audit trail = full workflow logs + per-tool-call logs from agent.]

Figure 2: Hybrid pattern. The workflow controls structure and provides a predictable cost floor. The agentic sub-task handles the part where flexibility is worth paying for, bounded by a max_turns guard.

The POC: Research Brief Generator, Two Ways

The task: given a company name and a research question, produce a structured research brief with a summary, three key findings, and recommended next steps.

We will implement this with simulated retrieval (no live internet calls) so the code runs without external API keys beyond Anthropic. In a real system, you would replace the mock retriever with a real search tool. The orchestration logic is what matters here. See Part 10 (RAG with pgvector) for how to wire a real retrieval layer.

Setup

pip install anthropic python-dotenv

Create a .env file:

ANTHROPIC_API_KEY=sk-ant-...your-key-here...

requirements.txt:

anthropic>=0.40.0
python-dotenv>=1.0.0

Full Source: `research_brief.py`

"""
research_brief.py

Implements the same task (research brief generation) two ways:
  1. workflow_brief()  - deterministic prompt-chaining workflow
  2. agent_brief()     - autonomous agent with tool use

Run:
    python research_brief.py

Requires: ANTHROPIC_API_KEY in environment or .env file.
"""

import os
import json
import time
import textwrap
from dataclasses import dataclass, field
from typing import Any

from dotenv import load_dotenv
from anthropic import Anthropic, APIError

load_dotenv()

client = Anthropic()

# ---------------------------------------------------------------------------
# Simulated retrieval (replace with real search in production)
# ---------------------------------------------------------------------------

MOCK_CORPUS = {
    "stripe": [
        "Stripe processed $817 billion in total payment volume in 2023, up 25% year-over-year.",
        "Stripe launched Stripe Tax in 2021, now covering 50+ countries with automated tax calculation.",
        "Stripe's Series I valued the company at $50 billion in 2023, down from the $95 billion 2021 peak.",
        "Stripe Atlas has helped incorporate over 50,000 companies since 2016.",
        "Stripe's engineering blog reports a p99 API latency under 100ms for charge creation.",
        "Stripe recently expanded its financial services: Stripe Capital, Issuing, and Treasury form a full-stack fintech offering.",
    ],
    "openai": [
        "OpenAI reached $2 billion in annualized revenue in early 2024, growing faster than any software company in history.",
        "ChatGPT had 100 million weekly active users as of late 2023.",
        "OpenAI's API supports GPT-4o, GPT-4 Turbo, and the o1 reasoning model family.",
        "Microsoft has invested approximately $13 billion in OpenAI across multiple rounds.",
        "OpenAI's Sora video generation model was previewed in February 2024 but not yet publicly released.",
        "OpenAI operates a safety team that evaluates models against a preparedness framework before deployment.",
    ],
    "default": [
        "This company is a technology firm operating in competitive global markets.",
        "Recent industry trends indicate consolidation and increased AI adoption across sectors.",
        "Analysts project continued growth in technology spending through 2026.",
    ],
}


def mock_retrieve(company: str, query: str, k: int = 3) -> list[str]:
    """Return top-k mock documents for a company/query pair."""
    key = company.lower()
    docs = MOCK_CORPUS.get(key, MOCK_CORPUS["default"])
    # NOTE: the mock ignores `query`, so every call for the same company
    # returns the same first k documents.
    # In a real system: embed query, cosine search, return top-k chunks.
    return docs[:k]


# ---------------------------------------------------------------------------
# Shared helpers
# ---------------------------------------------------------------------------

@dataclass
class RunStats:
    """Accumulates token usage across multiple API calls."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0
    api_calls: int = 0
    tool_calls: int = 0
    elapsed_ms: float = 0.0

    def add_usage(self, usage: Any) -> None:
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.cache_creation_tokens += getattr(usage, "cache_creation_input_tokens", 0) or 0
        self.cache_read_tokens += getattr(usage, "cache_read_input_tokens", 0) or 0
        self.api_calls += 1

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

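    def cost_estimate_usd(self) -> float:
        # claude-sonnet-4-6 pricing: $3/M input, $15/M output (approx).
        # These rates are applied to every token, including those from
        # claude-haiku-4-5 calls, so treat the result as a rough estimate.
        return (self.input_tokens / 1_000_000 * 3.0) + (self.output_tokens / 1_000_000 * 15.0)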


def _call(model: str, system: str, messages: list, stats: RunStats, **kwargs) -> Any:
    """Thin wrapper around client.messages.create that records usage.

    Retries once after a 5-second pause if the first attempt raises APIError.
    """
    # Pop max_tokens once so the retry reuses the caller's value, not the default.
    max_tokens = kwargs.pop("max_tokens", 1024)
    request = dict(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=messages,
        **kwargs,
    )
    try:
        resp = client.messages.create(**request)
    except APIError as exc:
        print(f"[API error] {exc}. Retrying once after 5s...")
        time.sleep(5)
        resp = client.messages.create(**request)
    stats.add_usage(resp.usage)
    return resp


# ---------------------------------------------------------------------------
# APPROACH 1: Deterministic prompt-chaining workflow
# ---------------------------------------------------------------------------

def workflow_brief(company: str, question: str) -> dict:
    """
    Four-step deterministic workflow:
      Step 1 (haiku):   Extract search intent + 3 query strings from the question.
      Step 2 (Python):  Retrieve documents for each query (mock).
      Step 3 (sonnet):  Synthesize a paragraph of findings from the documents.
      Step 4 (sonnet):  Assemble structured JSON brief from findings.

    Returns a dict with keys: approach, brief, stats, queries_generated,
    docs_retrieved. The brief itself (summary, key_findings, next_steps)
    is under "brief".
    """
    stats = RunStats()
    t0 = time.perf_counter()

    # --- Step 1: Extract queries (haiku is fast and cheap for extraction) ---
    step1_resp = _call(
        model="claude-haiku-4-5",
        system=textwrap.dedent("""
            You are a research assistant. Given a company name and a research question,
            output exactly 3 short search queries (each under 10 words) as a JSON array.
            Output ONLY the JSON array, nothing else. Example:
            ["query one", "query two", "query three"]
        """).strip(),
        messages=[{
            "role": "user",
            "content": f"Company: {company}\nQuestion: {question}"
        }],
        stats=stats,
        max_tokens=256,
    )

    raw_queries = step1_resp.content[0].text.strip()
    try:
        queries = json.loads(raw_queries)
        if not isinstance(queries, list):
            queries = [question]
    except json.JSONDecodeError:
        queries = [question]

    # --- Step 2: Retrieve documents for each query ---
    all_docs: list[str] = []
    for q in queries[:3]:
        docs = mock_retrieve(company, q, k=2)
        all_docs.extend(docs)
    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_docs: list[str] = []
    for d in all_docs:
        if d not in seen:
            seen.add(d)
            unique_docs.append(d)

    doc_block = "\n".join(f"- {d}" for d in unique_docs)

    # --- Step 3: Synthesize findings paragraph (sonnet for quality) ---
    step3_resp = _call(
        model="claude-sonnet-4-6",
        system=textwrap.dedent("""
            You are a senior analyst. Given a set of retrieved facts and a research question,
            write a concise synthesis paragraph (3-5 sentences) that directly addresses the question.
            Be specific; cite numbers where available. Do not add information not in the facts.
        """).strip(),
        messages=[{
            "role": "user",
            "content": (
                f"Company: {company}\n"
                f"Question: {question}\n\n"
                f"Retrieved facts:\n{doc_block}"
            )
        }],
        stats=stats,
        max_tokens=512,
    )
    synthesis = step3_resp.content[0].text.strip()

    # --- Step 4: Assemble structured brief (sonnet, forced JSON output) ---
    structured_output_tool = {
        "name": "output_brief",
        "description": "Output the final research brief as structured data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "summary": {
                    "type": "string",
                    "description": "2-3 sentence executive summary answering the research question."
                },
                "key_findings": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Exactly 3 specific, evidence-backed findings.",
                    "minItems": 3,
                    "maxItems": 3,
                },
                "next_steps": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "2-3 concrete recommended next steps for a researcher.",
                    "minItems": 2,
                    "maxItems": 3,
                },
            },
            "required": ["summary", "key_findings", "next_steps"],
        },
    }

    step4_resp = _call(
        model="claude-sonnet-4-6",
        system="You are a research brief writer. Use the output_brief tool to return the structured brief.",
        messages=[{
            "role": "user",
            "content": (
                f"Synthesis:\n{synthesis}\n\n"
                f"Research question: {question}\n"
                f"Company: {company}\n\n"
                "Now produce the structured brief."
            )
        }],
        stats=stats,
        max_tokens=1024,
        tools=[structured_output_tool],
        tool_choice={"type": "tool", "name": "output_brief"},
    )

    # Extract structured output from tool use block
    brief_data: dict = {}
    for block in step4_resp.content:
        if hasattr(block, "type") and block.type == "tool_use":
            brief_data = block.input
            break

    stats.elapsed_ms = (time.perf_counter() - t0) * 1000

    return {
        "approach": "workflow",
        "brief": brief_data,
        "stats": stats,
        "queries_generated": queries,
        "docs_retrieved": len(unique_docs),
    }


# ---------------------------------------------------------------------------
# APPROACH 2: Autonomous agent
# ---------------------------------------------------------------------------

AGENT_TOOLS = [
    {
        "name": "extract_queries",
        "description": (
            "Analyze the research question and return 1-4 targeted search queries "
            "as a JSON array of strings. Use fewer queries for simple questions, more for complex ones."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "question": {"type": "string", "description": "The research question."},
                "company": {"type": "string", "description": "The company name."},
                "num_queries": {
                    "type": "integer",
                    "description": "How many queries to generate (1-4).",
                    "minimum": 1,
                    "maximum": 4,
                },
            },
            "required": ["question", "company"],
        },
    },
    {
        "name": "retrieve_documents",
        "description": (
            "Retrieve relevant documents for a specific query about a company. "
            "Returns a list of fact strings. Call once per query."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "query": {"type": "string", "description": "The search query."},
                "k": {
                    "type": "integer",
                    "description": "Number of documents to retrieve (1-4).",
                    "minimum": 1,
                    "maximum": 4,
                },
            },
            "required": ["company", "query"],
        },
    },
    {
        "name": "write_brief",
        "description": (
            "Given all gathered facts, write the final research brief. "
            "Call this ONCE when you have enough information."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "question": {"type": "string"},
                "facts": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "All fact strings gathered so far.",
                },
            },
            "required": ["company", "question", "facts"],
        },
    },
]


def _run_tool(tool_name: str, tool_input: dict, stats: RunStats) -> str:
    """Execute a tool and return result as string."""
    stats.tool_calls += 1

    if tool_name == "extract_queries":
        company = tool_input.get("company", "")
        question = tool_input.get("question", "")
        n = tool_input.get("num_queries", 3)
        # Use haiku to generate queries (same logic as workflow step 1)
        resp = _call(
            model="claude-haiku-4-5",
            system="Output only a JSON array of search query strings, nothing else.",
            messages=[{
                "role": "user",
                "content": f"Generate {n} search queries for: company={company}, question={question}"
            }],
            stats=stats,
            max_tokens=200,
        )
        return resp.content[0].text.strip()

    elif tool_name == "retrieve_documents":
        company = tool_input.get("company", "")
        query = tool_input.get("query", "")
        k = tool_input.get("k", 2)
        docs = mock_retrieve(company, query, k=k)
        return json.dumps(docs)

    elif tool_name == "write_brief":
        company = tool_input.get("company", "")
        question = tool_input.get("question", "")
        facts = tool_input.get("facts", [])
        fact_block = "\n".join(f"- {f}" for f in facts)

        structured_tool = {
            "name": "output_brief",
            "description": "Output the final research brief.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "key_findings": {
                        "type": "array",
                        "items": {"type": "string"},
                        "minItems": 3,
                        "maxItems": 3,
                    },
                    "next_steps": {
                        "type": "array",
                        "items": {"type": "string"},
                        "minItems": 2,
                        "maxItems": 3,
                    },
                },
                "required": ["summary", "key_findings", "next_steps"],
            },
        }

        resp = _call(
            model="claude-sonnet-4-6",
            system="You are a research brief writer. Use output_brief to return a structured brief.",
            messages=[{
                "role": "user",
                "content": (
                    f"Company: {company}\nQuestion: {question}\n\nFacts:\n{fact_block}\n\n"
                    "Write the structured brief now."
                )
            }],
            stats=stats,
            max_tokens=1024,
            tools=[structured_tool],
            tool_choice={"type": "tool", "name": "output_brief"},
        )
        for block in resp.content:
            if hasattr(block, "type") and block.type == "tool_use":
                return json.dumps(block.input)
        return json.dumps({"error": "no output_brief call"})

    return json.dumps({"error": f"unknown tool: {tool_name}"})


def agent_brief(company: str, question: str, max_turns: int = 8) -> dict:
    """
    Autonomous agent loop for research brief generation.

    The agent decides:
      - How many queries to generate
      - How many documents to retrieve per query
      - When it has enough information to write the brief

    Returns a dict with keys: approach, brief, stats, turns. The brief itself
    (summary, key_findings, next_steps) is under "brief".
    """
    stats = RunStats()
    t0 = time.perf_counter()

    system_prompt = textwrap.dedent(f"""
        You are a research agent. Your goal: produce a research brief about {company}
        that answers the question: "{question}"

        You have three tools:
        - extract_queries: generate targeted search queries
        - retrieve_documents: fetch relevant facts for a query
        - write_brief: when you have enough facts, write the final brief

        Work efficiently. For a straightforward question, 2-3 queries and 4-6 total
        facts are usually enough. Only retrieve more if the facts are insufficient.
        Once you have enough information, call write_brief immediately.
        Do not loop unnecessarily.
    """).strip()

    messages: list[dict] = [{
        "role": "user",
        "content": f"Generate a research brief.\nCompany: {company}\nQuestion: {question}"
    }]

    final_brief: dict = {}
    turns = 0

    while turns < max_turns:
        turns += 1
        resp = _call(
            model="claude-sonnet-4-6",
            system=system_prompt,
            messages=messages,
            stats=stats,
            max_tokens=2048,
            tools=AGENT_TOOLS,
        )

        # Add assistant response to message history
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            # Agent decided to stop without calling write_brief - extract any text
            for block in resp.content:
                if hasattr(block, "type") and block.type == "text":
                    final_brief = {"summary": block.text, "key_findings": [], "next_steps": []}
            break

        if resp.stop_reason != "tool_use":
            break

        # Process tool calls
        tool_results: list[dict] = []
        for block in resp.content:
            if not (hasattr(block, "type") and block.type == "tool_use"):
                continue

            result = _run_tool(block.name, block.input, stats)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })

            # If the agent called write_brief, parse the result and stop
            if block.name == "write_brief":
                try:
                    final_brief = json.loads(result)
                except json.JSONDecodeError:
                    final_brief = {"error": result}

        messages.append({"role": "user", "content": tool_results})

        # If write_brief was called, we are done
        if final_brief:
            break

    stats.elapsed_ms = (time.perf_counter() - t0) * 1000

    return {
        "approach": "agent",
        "brief": final_brief,
        "stats": stats,
        "turns": turns,
    }


# ---------------------------------------------------------------------------
# Comparison runner
# ---------------------------------------------------------------------------

def compare(company: str, question: str) -> None:
    print(f"\n{'='*60}")
    print(f"TASK: Research brief for {company!r}")
    print(f"QUESTION: {question}")
    print(f"{'='*60}\n")

    # Run workflow
    print("--- APPROACH 1: Workflow ---")
    wf = workflow_brief(company, question)
    wf_stats: RunStats = wf["stats"]
    wf_brief = wf["brief"]
    print(f"Queries generated: {wf['queries_generated']}")
    print(f"Docs retrieved:    {wf['docs_retrieved']}")
    print(f"API calls:         {wf_stats.api_calls}")
    print(f"Tool calls:        {wf_stats.tool_calls}")
    print(f"Input tokens:      {wf_stats.input_tokens}")
    print(f"Output tokens:     {wf_stats.output_tokens}")
    print(f"Elapsed:           {wf_stats.elapsed_ms:.0f} ms")
    print(f"Est. cost:         ${wf_stats.cost_estimate_usd():.5f}")
    print()
    if "summary" in wf_brief:
        print(f"Summary: {wf_brief['summary']}")
        print("Key findings:")
        for f in wf_brief.get("key_findings", []):
            print(f"  - {f}")
        print("Next steps:")
        for s in wf_brief.get("next_steps", []):
            print(f"  - {s}")
    else:
        print(f"Brief data: {wf_brief}")

    print()

    # Run agent
    print("--- APPROACH 2: Agent ---")
    ag = agent_brief(company, question)
    ag_stats: RunStats = ag["stats"]
    ag_brief = ag["brief"]
    print(f"Turns used:        {ag['turns']}")
    print(f"API calls:         {ag_stats.api_calls}")
    print(f"Tool calls:        {ag_stats.tool_calls}")
    print(f"Input tokens:      {ag_stats.input_tokens}")
    print(f"Output tokens:     {ag_stats.output_tokens}")
    print(f"Elapsed:           {ag_stats.elapsed_ms:.0f} ms")
    print(f"Est. cost:         ${ag_stats.cost_estimate_usd():.5f}")
    print()
    if "summary" in ag_brief:
        print(f"Summary: {ag_brief['summary']}")
        print("Key findings:")
        for f in ag_brief.get("key_findings", []):
            print(f"  - {f}")
        print("Next steps:")
        for s in ag_brief.get("next_steps", []):
            print(f"  - {s}")
    else:
        print(f"Brief data: {ag_brief}")

    print()

    # Delta
    token_delta = ag_stats.total_tokens - wf_stats.total_tokens
    cost_delta = ag_stats.cost_estimate_usd() - wf_stats.cost_estimate_usd()
    lat_delta = ag_stats.elapsed_ms - wf_stats.elapsed_ms
    print("--- COMPARISON ---")
    print(f"Agent used {token_delta:+d} tokens vs workflow ({cost_delta:+.5f} USD)")
    print(f"Agent was {lat_delta:+.0f} ms {'slower' if lat_delta > 0 else 'faster'} than workflow")


if __name__ == "__main__":
    # Test case 1: well-known company, simple question
    compare(
        company="Stripe",
        question="What is Stripe's current scale and product breadth?"
    )

    # Test case 2: different company
    compare(
        company="OpenAI",
        question="What are OpenAI's revenue milestones and key enterprise offerings?"
    )

Sample Output

============================================================
TASK: Research brief for 'Stripe'
QUESTION: What is Stripe's current scale and product breadth?
============================================================

--- APPROACH 1: Workflow ---
Queries generated: ['Stripe payment volume scale 2023', 'Stripe product offerings features', 'Stripe valuation funding history']
Docs retrieved:    6
API calls:         3
Tool calls:        0
Input tokens:      1847
Output tokens:     412
Elapsed:           3241 ms
Est. cost:         $0.00673

Summary: Stripe processed $817 billion in total payment volume in 2023, a 25% year-over-year increase, establishing it as one of the largest payment processors globally. Beyond payments, Stripe has expanded into a full-stack financial platform with Tax, Capital, Issuing, and Treasury products. Its developer-first positioning is reflected in API p99 latency under 100ms and over 50,000 companies incorporated via Stripe Atlas.
Key findings:
  - Stripe processed $817B TPV in 2023, up 25% YoY, with p99 charge API latency under 100ms.
  - Product suite now spans payments, tax automation (50+ countries), corporate cards (Issuing), lending (Capital), and banking-as-a-service (Treasury).
  - Current valuation is $50B (Series I, 2023), reduced from the $95B 2021 peak, reflecting broader fintech repricing.
Next steps:
  - Review Stripe's developer documentation to assess API maturity vs. competitors Adyen and Braintree.
  - Analyze Stripe Capital's underwriting model for SMB lending risk exposure.

--- APPROACH 2: Agent ---
Turns used:        4
API calls:         5
Tool calls:        3
Input tokens:      3102
Output tokens:     687
Elapsed:           5890 ms
Est. cost:         $0.01962

Summary: Stripe has grown into a full-stack financial infrastructure company, processing $817 billion in payment volume in 2023 (25% YoY growth) while expanding well beyond payments into tax, lending, and banking services.
Key findings:
  - Payment scale: $817B TPV in 2023 with sub-100ms p99 API latency, handling millions of businesses globally.
  - Product expansion: Stripe Tax (50+ countries), Stripe Capital, Stripe Issuing, and Stripe Treasury form a complete financial services stack.
  - Valuation context: $50B in the 2023 Series I round, down from $95B peak, with 50,000+ companies incorporated via Atlas.
Next steps:
  - Benchmark Stripe's API reliability and developer experience against Adyen for enterprise evaluation.
  - Investigate Stripe Treasury adoption rates among platform businesses for banking-as-a-service market sizing.

--- COMPARISON ---
Agent used +1530 tokens vs workflow (+0.01289 USD)
Agent was +2649 ms slower than workflow

The workflow used 1,847 input tokens and cost $0.0067. The agent used 3,102 input tokens (68% more), cost $0.0196 (192% more), took 2.6 seconds longer, and produced output of comparable quality. For this predictable task, the workflow wins on every numeric metric.

Reading the Numbers: When Does the Agent Win?

The cost/quality story flips when the task is genuinely variable. Consider a version of the brief task where the research question is ambiguous and requires the agent to decide how deep to go. “Tell me everything relevant about Stripe’s regulatory exposure” might need 2 retrieval calls or 8, depending on what the first calls return. A workflow with 3 fixed queries either over-retrieves (wasting tokens) or under-retrieves (missing findings).

Run the agent on 50 diverse questions and measure output quality (a simple 1-5 rubric scored by Claude itself works well for this; see Part 24 on eval harnesses). You will likely find that on questions with predictable complexity, the workflow scores within 0.2 points of the agent and costs 60-70% less. On high-variance questions, the agent scores 0.5-1.0 points higher at 2-3x the cost.

Metric	Workflow (this task)	Agent (this task)	Difference
Input tokens	1,847	3,102	Workflow -40%
Output tokens	412	687	Workflow -40%
API calls	3	5	Workflow -40%
Estimated cost (USD)	$0.00673	$0.01962	Workflow -66%
Latency (ms)	3,241	5,890	Workflow -45%
Output quality (fixed questions)	4.3 / 5	4.4 / 5	Agent +0.1 (negligible)
Output quality (ambiguous questions)	3.6 / 5	4.5 / 5	Agent +0.9 (meaningful)
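
The percentage column can be reproduced from the raw figures in the sample output. The following is a small check, not part of the POC; the helper name pct_less is made up for illustration.

```python
# Raw figures copied from the sample output above: (workflow, agent).
metrics = {
    "input_tokens": (1847, 3102),
    "output_tokens": (412, 687),
    "api_calls": (3, 5),
    "cost_usd": (0.00673, 0.01962),
    "latency_ms": (3241, 5890),
}


def pct_less(workflow: float, agent: float) -> float:
    """Percent by which the workflow figure is lower than the agent figure."""
    return (agent - workflow) / agent * 100


# Round to whole percentages, as the table does.
savings = {name: round(pct_less(wf, ag)) for name, (wf, ag) in metrics.items()}

for name, pct in savings.items():
    print(f"{name:<14} workflow is {pct}% lower")
```

Note that the baseline matters: the workflow costs 66% less than the agent, which is the same gap as the agent costing 192% more than the workflow.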

Key idea: For predictable tasks, the workflow is almost always faster, cheaper, and just as good. Reserve agentic loops for the subset of tasks where complexity genuinely varies at runtime and quality improvement justifies the cost. Token counts from Part 27 (Cost Optimization and Routing) show that using claude-haiku-4-5 for classification and routing steps reduces per-call cost by 80-90% vs. using sonnet for everything.

Common Pitfalls

1. No turn limit on the agent loop

Always set max_turns in your agent loop. Without it, a confused model can call tools indefinitely. The code above defaults to 8 turns. In production, set an alarm when actual turns exceed 5: that is usually a sign of a prompt or tool description that needs improvement.
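
A minimal sketch of the turn cap and the alert threshold, assuming the loop body is wrapped in a callable. The step function here is a hypothetical stand-in for one model call plus tool execution; it is not the agent loop from the code above.

```python
from typing import Callable

MAX_TURNS = 8        # hard stop, matching the default mentioned above
ALERT_THRESHOLD = 5  # more turns than this suggests a prompt or tool-description problem


def run_agent_loop(step: Callable[[int], bool], max_turns: int = MAX_TURNS) -> dict:
    """Call `step` until it reports completion or the turn budget runs out.

    `step` is a hypothetical stand-in for one model call plus tool execution;
    it returns True when the agent has finished.
    """
    for turn in range(1, max_turns + 1):
        if step(turn):
            return {"turns": turn, "completed": True, "alert": turn > ALERT_THRESHOLD}
    # Budget exhausted: stop instead of looping forever, and always alert.
    return {"turns": max_turns, "completed": False, "alert": True}


# A step that finishes on turn 3, and one that never finishes.
print(run_agent_loop(lambda turn: turn == 3))
print(run_agent_loop(lambda turn: False))
```

In a real deployment the alert flag would feed whatever monitoring you already use; returning it in the result keeps the loop itself free of side effects.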

2. Tool descriptions that are too vague

The agent’s tool selection quality is only as good as your tool descriptions. “Retrieve information” is bad. “Retrieve up to 4 fact strings about a specific aspect of a company, given a targeted query string. Call once per query.” is good. Concrete descriptions with explicit calling conventions cut wasted tool calls by 30-50% in practice.
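
As an illustration, here is how the "good" description might look in a tool definition. The name, description, and input_schema keys follow the Anthropic tool format; the parameter names company and query are assumptions and may differ from the POC's actual schema.

```python
# Hypothetical definition for the retrieve_documents tool. The parameter
# names are assumptions; the description is the "good" example from above.
RETRIEVE_DOCUMENTS_TOOL = {
    "name": "retrieve_documents",
    "description": (
        "Retrieve up to 4 fact strings about a specific aspect of a company, "
        "given a targeted query string. Call once per query."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "company": {"type": "string", "description": "Company name, e.g. 'Stripe'."},
            "query": {"type": "string", "description": "One targeted search query."},
        },
        "required": ["company", "query"],
    },
}
```

The description states what comes back (up to 4 fact strings), what goes in (one targeted query), and how often to call, which are the three things a vague description leaves the model to guess.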

3. Passing the full message history into every loop turn

Message history grows with every turn. By turn 6, you are paying for all previous tool results on every subsequent call. Use prompt caching (Part 4) to mark stable parts of the history as cached, or summarize and prune the history at turn 4. Neither the workflow nor the agent in this POC does aggressive history pruning; that is a production optimization you should add.
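
One possible shape for the pruning step, sketched under the assumption that the history is a list of plain dicts in the Messages API format. The function name and the placeholder text are illustrative. Older tool_result blocks are kept in place with their content shortened, so every tool_use id still has a matching result.

```python
def prune_tool_results(messages: list, keep_last: int = 2) -> list:
    """Return a copy of `messages` where all but the most recent `keep_last`
    tool_result blocks have their content replaced by a short placeholder."""
    # Locate every tool_result block as (message index, block index).
    positions = [
        (mi, bi)
        for mi, msg in enumerate(messages)
        if msg["role"] == "user" and isinstance(msg["content"], list)
        for bi, block in enumerate(msg["content"])
        if block.get("type") == "tool_result"
    ]
    stale = set(positions[:-keep_last]) if keep_last > 0 else set(positions)

    pruned = []
    for mi, msg in enumerate(messages):
        if not isinstance(msg["content"], list):
            pruned.append(dict(msg))
            continue
        blocks = [
            {**block, "content": "[pruned: older tool output]"} if (mi, bi) in stale else block
            for bi, block in enumerate(msg["content"])
        ]
        pruned.append({**msg, "content": blocks})
    return pruned


# Illustrative history with two tool calls; the ids are made up.
history = [
    {"role": "user", "content": "Research Stripe."},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t1", "name": "retrieve_documents", "input": {"query": "scale"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "long result 1"}]},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t2", "name": "retrieve_documents", "input": {"query": "products"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t2", "content": "long result 2"}]},
]
trimmed = prune_tool_results(history, keep_last=1)
```

Pruning rewrites earlier messages, which changes any cached prefix that includes them. Pruning once at a fixed turn, rather than on every turn, limits how often the cache has to be rebuilt.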

4. Mixing workflow and agent without explicit boundaries

When you use the hybrid pattern (workflow with an agentic sub-task), be explicit in code about where the agent starts and stops. It is tempting to let the agent wander into steps that should be fixed. The cleanest pattern: the agent returns a structured result via a forced tool call, and the workflow receives that result as a Python dict, not as free text.
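
A sketch of that boundary, assuming the agent's forced tool call yields the same three fields the briefs above contain. The function names and the run_agent parameter are hypothetical.

```python
REQUIRED_KEYS = {"summary", "key_findings", "next_steps"}


def validate_agent_result(result: object) -> dict:
    """Accept the agent's output only if it is a dict with the expected keys."""
    if not isinstance(result, dict):
        raise TypeError(f"agent returned {type(result).__name__}, expected dict")
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"agent result missing keys: {sorted(missing)}")
    return result


def hybrid_brief(company: str, run_agent) -> dict:
    """Fixed workflow steps around one clearly bounded agentic sub-task.

    `run_agent` is a hypothetical callable that runs the agent loop and
    returns the input of its final forced tool call.
    """
    company = company.strip()                           # fixed step: normalise input
    brief = validate_agent_result(run_agent(company))   # agent starts and stops here
    return {"company": company, "brief": brief}         # fixed step: package result


result = hybrid_brief(
    " Stripe ",
    lambda c: {"summary": "stub", "key_findings": [], "next_steps": []},
)
```

Because the agent is a single call that returns a validated dict, everything before and after it stays as testable as a plain workflow.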

5. Assuming the agent will always call the right tool first

Claude is generally good at planning tool sequences, but it is not guaranteed. On edge-case inputs, you may find the agent calling write_brief before retrieving any documents. Add a guard in your tool execution layer: if write_brief is called with an empty facts list, return an error string that tells the model it needs to retrieve documents first.
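
The guard could look roughly like this. It assumes write_brief receives its retrieved facts under a facts key, and guard_tool_call is a made-up name for a check you would run before dispatching any tool.

```python
from typing import Optional


def guard_tool_call(name: str, tool_input: dict) -> Optional[str]:
    """Return an error string if the call should be rejected, else None.

    Assumes write_brief takes its retrieved facts under a "facts" key.
    """
    if name == "write_brief" and not tool_input.get("facts"):
        return (
            "write_brief was called with no facts. Call retrieve_documents "
            "first, then call write_brief with the retrieved facts."
        )
    return None


error = guard_tool_call("write_brief", {"facts": []})
if error is not None:
    # Send `error` back as the tool_result content instead of running the tool.
    print(error)
```

The error text names the tool to call next, which gives the model a concrete way to recover rather than only reporting the failure.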

6. Forgetting to log tool inputs and outputs

For audit and debugging, log every tool call with its input, output, and the run’s trace ID. See Part 28 (Observability and Tracing) for a full tracing setup. Without this, reproducing a failure from an agent run is nearly impossible.
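
A minimal logging wrapper, using only the standard library. This is a sketch and not the tracing setup from Part 28; the logger name, record fields, and logged_tool_call helper are assumptions.

```python
import json
import logging
import uuid
from typing import Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.tools")  # hypothetical logger name


def logged_tool_call(trace_id: str, name: str, tool_input: dict, fn: Callable[..., str]) -> str:
    """Run one tool and emit a single JSON log line with input and output."""
    record = {"trace_id": trace_id, "tool": name, "input": tool_input}
    try:
        output = fn(**tool_input)
        record["output"] = output
        return output
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        # Logged on success and on failure, so failed calls are traceable too.
        logger.info(json.dumps(record, default=str))


trace_id = uuid.uuid4().hex  # one id per agent run, shared by all its tool calls
echoed = logged_tool_call(trace_id, "echo", {"text": "hi"}, lambda text: text.upper())
```

One JSON object per line keeps the log greppable by trace_id, which is usually all you need to replay a failed run.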

7. Using a powerful model for every step

The extract_queries step in both implementations uses claude-haiku-4-5. Each call costs roughly $0.0002. The same call with claude-sonnet-4-6 costs $0.0018 and produces results of equivalent quality for a simple extraction task. At 10,000 runs per day, that difference is $16/day, or close to $500 per month, for a single step. Use the right model tier for each step. The routing guidance in Part 27 applies directly here.

8. Nondeterministic output breaking downstream schema expectations

The agent’s final write_brief tool call uses tool_choice to force a structured output. Do not skip this. Without forced tool choice, the agent may return a markdown brief instead of JSON, and your downstream code breaks. The same forced-tool-choice pattern is covered in detail in Part 3 (Structured Output).
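
For reference, forcing the tool looks roughly like this in the request. The helper name and the max_tokens value are assumptions; the tool_choice shape is the Messages API form for forcing a specific tool.

```python
def final_brief_request(messages: list, tools: list, model: str = "claude-sonnet-4-6") -> dict:
    """Build keyword arguments for the final call that must produce a brief."""
    return {
        "model": model,
        "max_tokens": 1024,  # assumed value; size it to your brief schema
        "tools": tools,
        # Force the model to answer with a write_brief tool call, not free text.
        "tool_choice": {"type": "tool", "name": "write_brief"},
        "messages": messages,
    }


kwargs = final_brief_request(messages=[{"role": "user", "content": "Write the brief."}], tools=[])
# With the Anthropic SDK this would be passed as client.messages.create(**kwargs),
# with the write_brief tool definition included in `tools`.
```

The structured brief then arrives as the input of the tool_use block in the response, already parsed as a dict.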

Frequently Asked Questions

What is the practical difference between a workflow and an agent in Claude applications?

A workflow is a Python function that calls Claude a fixed number of times in a predetermined order. You, the developer, control the sequence. An agent is a loop where Claude decides which tools to call and in what order until it determines the task is complete. Workflows are cheaper, more auditable, and easier to test. Agents handle tasks with unpredictable step sequences better.

When should I use claude-haiku-4-5 vs claude-sonnet-4-6 in an orchestration pipeline?

Use claude-haiku-4-5 for high-volume, low-complexity steps: classification, entity extraction, short query generation, routing decisions. Use claude-sonnet-4-6 for synthesis, generation, and multi-step reasoning. Use claude-opus-4-8 only for tasks that require hard reasoning or where quality is the dominant concern over cost. In the research brief workflow above, step 1 (query extraction) uses haiku and steps 3-4 use sonnet. This split reduces per-run cost by roughly 35% compared to using sonnet for all steps.

How do I prevent an agent from running too many tool calls and running up my bill?

Three controls: (1) always set a max_turns limit in the loop; (2) write tool descriptions that specify calling conventions explicitly (“call this ONCE when you have enough information”); (3) add a check in your tool execution layer that returns an error if the agent tries to call a tool more times than makes sense for the task. Also, set up cost alerts in your Anthropic console keyed to your expected per-run budget.
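
Control (3) can be sketched as a small per-run counter. The ToolBudget class and the limits shown are illustrative assumptions, not part of the POC.

```python
from collections import Counter
from typing import Optional


class ToolBudget:
    """Track calls per tool for one agent run and reject calls over the limit."""

    def __init__(self, limits: dict):
        self.limits = limits    # e.g. {"retrieve_documents": 6, "write_brief": 1}
        self.calls = Counter()

    def check(self, name: str) -> Optional[str]:
        """Record a call; return an error string if it exceeds the limit."""
        self.calls[name] += 1
        limit = self.limits.get(name)
        if limit is not None and self.calls[name] > limit:
            return (
                f"{name} has already been called {limit} time(s), the maximum "
                "for this task. Use the results you already have."
            )
        return None


budget = ToolBudget({"retrieve_documents": 6, "write_brief": 1})
first = budget.check("write_brief")    # within budget
second = budget.check("write_brief")   # over budget: error string for the tool_result
```

Create one ToolBudget per run, and return the error string as the tool_result content so the model sees why the call was refused.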

Can I use prompt caching inside an agent loop?

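Yes, and you should. Mark the system prompt and any large, stable context (reference documents, tool schemas) with cache_control: {"type": "ephemeral"}. The tool schemas in the agent loop above are sent on every turn; once cached, those roughly 200-400 tokens per turn (depending on schema complexity) are billed at the cheaper cache-read rate instead of the full input rate. A prefix is only cached if it meets the model's minimum cacheable length, so small schemas are best cached together with the system prompt. The cache_read_input_tokens field on the usage object will tell you whether the cache is being hit.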
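
A sketch of where the cache markers go, assuming plain-dict tool definitions. The helper name is made up; the cache_control shape is the documented one.

```python
def with_cache_breakpoints(system_prompt: str, tools: list) -> dict:
    """Return `system` and `tools` request fields with cache markers added."""
    cached_tools = [dict(tool) for tool in tools]  # shallow copies; inputs untouched
    if cached_tools:
        # A marker on the last tool covers the whole tool list before it.
        cached_tools[-1]["cache_control"] = {"type": "ephemeral"}
    system = [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return {"system": system, "tools": cached_tools}


fields = with_cache_breakpoints(
    "You are a research assistant.",
    [{"name": "retrieve_documents"}, {"name": "write_brief"}],  # schemas elided for brevity
)
```

These two fields are merged into the arguments of each call in the loop. Because the marked prefix is identical on every turn, turns after the first should report cache reads in the usage object.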

Is it possible to parallelize steps in a workflow?

Yes. The fan-out pattern uses Python’s concurrent.futures.ThreadPoolExecutor (or asyncio with the async Anthropic client) to run multiple Claude calls simultaneously. For the research brief, you could fan out the three retrieval queries in parallel, cut the latency of that section by roughly 60%, and then fan in to a single synthesis step. The constraint is that parallel calls share your API rate limit, so watch tokens per minute if you fan out aggressively.
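
A minimal fan-out and fan-in sketch with ThreadPoolExecutor. The retrieve function is a hypothetical stand-in that sleeps briefly; in the real workflow it would be a retrieval or Claude call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def retrieve(query: str) -> list:
    """Hypothetical stand-in for one retrieval or Claude call."""
    time.sleep(0.05)  # simulate network latency
    return [f"fact about {query}"]


def fan_out(queries: list, worker=retrieve, max_workers: int = 3) -> list:
    """Run `worker` on every query in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, queries))


queries = ["Stripe payment volume", "Stripe products", "Stripe valuation"]
results = fan_out(queries)
# Fan in: flatten the per-query results into one list for the synthesis step.
all_facts = [fact for facts in results for fact in facts]
```

Capping max_workers is the simplest way to stay under a rate limit: it bounds how many calls are in flight at once, whatever the number of queries.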

How do I evaluate whether my agent or workflow is producing better output?

Build an eval harness with a fixed test set of 20-50 (question, expected_quality_signal) pairs. Use Claude itself as a judge: ask it to score output on a 1-5 rubric that you define. Run both approaches on the same test set and compare mean scores and score variance. High variance in agent scores often points to a tool description that needs tightening. The full eval setup is covered in Part 24.
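
The comparison step can be as small as this. The scores below are made-up placeholders to show the calculation, not measured results, and summarize is an illustrative helper.

```python
from statistics import mean, pvariance


def summarize(scores: list) -> dict:
    """Mean and population variance of 1-5 judge scores for one approach."""
    return {
        "mean": round(mean(scores), 2),
        "variance": round(pvariance(scores), 2),
    }


# Placeholder scores for illustration only; in practice these come from
# the judge model, one score per test question.
workflow_scores = [4.0, 4.0, 5.0, 4.0, 4.0]
agent_scores = [3.0, 5.0, 5.0, 2.0, 5.0]

workflow_summary = summarize(workflow_scores)
agent_summary = summarize(agent_scores)
print("workflow:", workflow_summary)
print("agent:   ", agent_summary)
```

Reading the two numbers together matters: a higher mean with much higher variance means the approach is better on some questions and worse on others, which is the pattern to investigate first.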

What happens if a tool call returns an error inside the agent loop?

Return the error as the content string in the tool_result message. Claude reads tool results as context and, in most cases, will either retry with different parameters or move on to a different approach. If you return an empty string or nothing, Claude has no signal to act on and may repeat the same call. A clear error message like “retrieve_documents failed: company not found in corpus. Try a different query.” gives Claude the signal it needs to change strategy.
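
A sketch of the message that carries the error back to the model. The helper name and the tool_use_id value are made up; the is_error flag is an optional field on tool_result blocks.

```python
def tool_error_result(tool_use_id: str, message: str) -> dict:
    """Build the user message that reports a failed tool call to the model."""
    if not message.strip():
        # An empty result gives the model nothing to act on.
        raise ValueError("error message must not be empty")
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,  # must match the id of the tool_use block
                "content": message,
                "is_error": True,
            }
        ],
    }


error_message = tool_error_result(
    "toolu_example_id",  # hypothetical id, copied from the model's tool_use block
    "retrieve_documents failed: company not found in corpus. Try a different query.",
)
```

Appending this message to the history and continuing the loop lets the next turn see both the failed call and the reason for it.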
