Model Context Protocol: Connect Claude to Your Tools

Q: What Python version is required for the MCP package?

The mcp package requires Python 3.10 or newer. It uses asyncio throughout, and some type annotation syntax requires 3.10+. The anthropic SDK supports 3.8+, so the binding constraint is the MCP package.

Q: Can I run the MCP server as a long-running daemon instead of a subprocess?

Yes. The mcp package provides an SSE transport that runs the server as an HTTP service. You can point multiple clients at it simultaneously. For single-user use cases, stdio is simpler.

Q: How does Claude discover which tool to call?

No special prompting is needed. Claude reads the name, description, and inputSchema of each tool and decides based on the user's request. Tool selection accuracy depends on description quality.

Q: Are MCP tool results included in Claude's context window?

Yes. Each tool result counts against the context window. Summarize or truncate large payloads on the server side before returning to avoid slow responses and high costs on subsequent calls.

Q: Can I use MCP with the structured output pattern?

Yes. Structured output works alongside MCP tools. Define the output schema as one of the tools the server exposes, or as an additional inline tool passed alongside MCP tools, then force it with tool_choice.

By Asif·June 5, 2026·22 min read·AI Use Cases·Updated June 15, 2026

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 23 of 30 · View the full series

TL;DR

The Model Context Protocol (MCP) is an open standard that lets Claude communicate with external tools and data sources through a well-defined JSON-RPC interface, without custom glue code per integration.
MCP uses a client/server architecture. The server exposes tools and resources; Claude acts as the client that discovers and calls them at runtime.
The stdio transport runs the MCP server as a subprocess communicating over stdin/stdout, making it trivially embeddable in any Python or Node.js workflow.
Building a minimal MCP server takes about 50 lines of Python using the mcp package, and Claude can call its tools with no special prompt engineering.
MCP removes the N-times-M integration problem: write one server per tool, connect it to any MCP-compatible client including Claude Desktop, the Anthropic SDK, and third-party agents.
Production concerns include input validation on the server side, timeout handling in the client, and clear tool descriptions that help Claude pick the right tool.

Why the Model Context Protocol Claude Integration Matters

Every AI application eventually hits the same wall: you have data or actions locked inside a database, an API, or a legacy service, and getting Claude to interact with them requires writing bespoke tool definitions, plumbing, and glue code for each case. You end up with a pile of one-off function stubs that only work with one model provider, one SDK version, and one deployment topology.

The Model Context Protocol (MCP) is Anthropic’s answer to this. Released in late 2024 and already adopted by several major developer tools, MCP defines a single open standard that any host (Claude Desktop, the Anthropic Python SDK, third-party agents) can speak to any server (your database connector, your file system tool, your internal API). Write the server once; attach it to any client.

For technical founders and backend engineers, this matters because:

You can expose your internal tooling to Claude without rewriting it for each project.
Your MCP server works identically whether Claude is running inside Claude Desktop, a FastAPI background job, or an autonomous agent loop.
The protocol is model-agnostic. Other LLM providers are adopting it, so your server investment is not tied to one vendor.
The stdio transport requires zero network infrastructure for local development.

This article walks you through the full picture: what MCP is architecturally, how the stdio transport works, a complete minimal Python server, and a client that has Claude call it. By the end you will have a running POC you can extend with your own tools.

MCP Architecture: Hosts, Clients, and Servers

The three roles

MCP separates concerns into three roles that communicate over JSON-RPC 2.0:

Host: The application that embeds an LLM. Claude Desktop is the canonical host. Your Python script that uses the Anthropic SDK is also a host when you wire it up to MCP.
Client: A component inside the host that manages one connection to one MCP server. The host can maintain multiple clients, one per server.
Server: A process that exposes capabilities: tools (callable functions), resources (readable data), and prompts (reusable templates). This is the part you write.

The server has no knowledge of Claude or any specific LLM. It just handles JSON-RPC requests. The client (inside the host) translates the server’s tool list into the format Claude expects and routes tool calls back and forth.

Claude (Anthropic SDK)

MCP Client (manages conn)

Transport: stdio (subprocess pipe)

MCP Server (subprocess)

Tools callable funcs

Resources readable data

Prompts (reusable templates)

JSON-RPC 2.0

Figure 1: MCP architecture. The host embeds both Claude (via the Anthropic SDK) and an MCP client. The client spawns the server as a subprocess and communicates via stdin/stdout pipes (stdio transport).

What servers expose

MCP servers can expose three primitive types:

Primitive	What it is	Claude sees it as
Tool	A callable function with an input schema	A tool in the `tools` array, callable via tool use
Resource	A URI-addressable piece of data (file, DB row, API response)	Context injected into the conversation
Prompt	A reusable message template with arguments	A prompt template the host can invoke

For most production integrations, tools are the workhorse. You define a function, give it a JSON Schema input spec, and Claude will call it when it decides the function is relevant. Resources are useful when you want to inject data into context without Claude explicitly requesting it.

The stdio Transport: Simplest Possible Wiring

How stdio works

MCP supports multiple transports. The stdio transport is the simplest: the host process spawns the server as a subprocess and writes JSON-RPC messages to the server’s stdin, reading responses from its stdout. Stderr is available for logging without polluting the protocol channel.

This approach has a few practical advantages:

No port allocation, no network firewall rules, no TLS certificates for local dev.
The server process lifecycle is tied to the host. When the host exits, the server subprocess exits too.
Works identically on Linux, macOS, and Windows.
Claude Desktop uses stdio for all its built-in server configurations.

The alternative transports (HTTP with Server-Sent Events, WebSocket) are for cases where the server runs remotely or needs to be shared across multiple clients simultaneously. Start with stdio; migrate later if you need it.

Message flow

When a host connects to a stdio server, the initialization handshake looks like this:

Host sends initialize with its protocol version and capabilities.
Server responds with its own version, capabilities, and server info.
Host sends initialized to confirm.
Host calls tools/list to get the full tool catalog.
Host passes that catalog to Claude as the tools parameter in a messages API call.
If Claude decides to use a tool, the host receives a tool_use block, calls tools/call on the MCP server, and feeds the result back to Claude as a tool_result content block.

MCP Server

Claude API

1. initialize server info + capabilities

2. initialized (notify)

3. tools/list [tool definitions]

4. messages.create (tools=…) stop_reason=tool_use

5. tools/call tool result

6. messages.create (tool_result) final text response

Figure 2: Sequence diagram for a single tool call over stdio transport. Steps 1-3 are the MCP handshake. Steps 4-6 are the Claude API round-trip that executes the tool.

Key idea: Your Python host code does not manually implement JSON-RPC. The mcp Python package handles the protocol, serialization, and subprocess management. You write the server as plain Python functions decorated with @server.tool(), and the client as a few lines using ClientSession. The protocol is invisible at the application layer.

The Minimal MCP Server POC: Complete Runnable Code

What we are building

The POC has two files:

server.py: An MCP server exposing one tool (get_system_stats) and one resource (a static “about” text). It runs as a standalone process communicating over stdio.
client.py: A host script that spawns the server, fetches its tool list, feeds those tools to Claude (Sonnet 4.6), and handles the tool-use round-trip automatically.

The tool returns basic system information (CPU percent, memory used, disk usage). This is concrete enough to show real data flowing through the protocol, and simple enough that you can swap it for any real tool in your stack.

Prerequisites and installation

pip install anthropic mcp psutil


# requirements.txt
anthropic>=0.40.0
mcp>=1.0.0
psutil>=5.9.0


# .env
ANTHROPIC_API_KEY=sk-ant-...

The MCP server (server.py)


#!/usr/bin/env python3
"""
server.py  --  Minimal MCP server over stdio transport.

Exposes:
  Tool:     get_system_stats   (returns CPU / memory / disk usage)
  Resource: about://server     (static description text)

Run standalone for debugging:
  python server.py
But in production it is spawned as a subprocess by client.py.
"""

import json
import psutil
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import (
    Tool,
    TextContent,
    Resource,
    ReadResourceResult,
)


# Create the server instance with a name clients will see.
app = Server("system-stats-server")


# ---------------------------------------------------------------------------
# Tool: get_system_stats
# ---------------------------------------------------------------------------

@app.list_tools()
async def list_tools() -> list[Tool]:
    """Advertise all tools this server provides."""
    return [
        Tool(
            name="get_system_stats",
            description=(
                "Return current CPU usage percentage, total and available "
                "memory in megabytes, and disk usage for the root filesystem. "
                "Use this whenever the user asks about system performance, "
                "resource consumption, or available capacity."
            ),
            inputSchema={
                "type": "object",
                "properties": {
                    "include_per_cpu": {
                        "type": "boolean",
                        "description": (
                            "If true, include per-core CPU percentages "
                            "in addition to the overall average."
                        ),
                        "default": False,
                    }
                },
                "required": [],
            },
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Dispatch tool calls by name and return results."""
    if name == "get_system_stats":
        include_per_cpu = arguments.get("include_per_cpu", False)

        # Gather stats. cpu_percent(interval=0.1) blocks briefly for accuracy.
        cpu_overall = psutil.cpu_percent(interval=0.1)
        mem = psutil.virtual_memory()
        disk = psutil.disk_usage("/")

        result: dict = {
            "cpu_percent_overall": cpu_overall,
            "memory_total_mb": round(mem.total / 1024 / 1024, 1),
            "memory_available_mb": round(mem.available / 1024 / 1024, 1),
            "memory_used_percent": mem.percent,
            "disk_total_gb": round(disk.total / 1024 / 1024 / 1024, 2),
            "disk_used_gb": round(disk.used / 1024 / 1024 / 1024, 2),
            "disk_free_gb": round(disk.free / 1024 / 1024 / 1024, 2),
            "disk_used_percent": disk.percent,
        }

        if include_per_cpu:
            per_cpu = psutil.cpu_percent(interval=0.1, percpu=True)
            result["cpu_percent_per_core"] = per_cpu

        return [TextContent(type="text", text=json.dumps(result, indent=2))]

    # Unknown tool -- return an error string rather than raising, so the
    # client sees a graceful tool_result instead of a protocol error.
    return [TextContent(type="text", text=f"Unknown tool: {name}")]


# ---------------------------------------------------------------------------
# Resource: about://server
# ---------------------------------------------------------------------------

@app.list_resources()
async def list_resources() -> list[Resource]:
    """Advertise resources this server provides."""
    return [
        Resource(
            uri="about://server",
            name="Server description",
            description="A short description of what this MCP server does.",
            mimeType="text/plain",
        )
    ]


@app.read_resource()
async def read_resource(uri: str) -> ReadResourceResult:
    """Handle resource read requests by URI."""
    if uri == "about://server":
        text = (
            "system-stats-server v1.0\n"
            "Provides real-time CPU, memory, and disk metrics via psutil.\n"
            "Tool: get_system_stats\n"
            "Transport: stdio\n"
        )
        return ReadResourceResult(
            contents=[TextContent(type="text", text=text)]
        )
    raise ValueError(f"Unknown resource URI: {uri}")


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

async def main():
    """Start the server using stdio transport."""
    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options(),
        )


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

The host client (client.py)


#!/usr/bin/env python3
"""
client.py  --  Host that:
  1. Spawns server.py as a subprocess (stdio transport).
  2. Fetches its tool list via MCP.
  3. Passes those tools to Claude (claude-sonnet-4-6).
  4. Handles the tool_use / tool_result round-trip automatically.
  5. Prints the final answer.

Usage:
  python client.py "How much free disk space do I have?"
  python client.py "Show me per-core CPU usage right now"
"""

import asyncio
import json
import os
import sys

import anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


CLAUDE_MODEL = "claude-sonnet-4-6"
MAX_TOKENS = 1024
SERVER_SCRIPT = os.path.join(os.path.dirname(__file__), "server.py")


async def run(user_question: str) -> None:
    # --- 1. Spawn the MCP server and open a session ---
    server_params = StdioServerParameters(
        command="python",
        args=[SERVER_SCRIPT],
        env=None,  # inherit the current environment
    )

    async with stdio_client(server_params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            # MCP handshake: initialize + initialized + capabilities exchange
            await session.initialize()

            # --- 2. Fetch the tool list from the MCP server ---
            tools_result = await session.list_tools()
            mcp_tools = tools_result.tools  # list of mcp.types.Tool objects

            # Convert MCP tool objects to the shape the Anthropic SDK expects.
            anthropic_tools = [
                {
                    "name": t.name,
                    "description": t.description or "",
                    "input_schema": t.inputSchema,
                }
                for t in mcp_tools
            ]

            # --- 3. First call to Claude ---
            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

            messages = [{"role": "user", "content": user_question}]

            try:
                response = client.messages.create(
                    model=CLAUDE_MODEL,
                    max_tokens=MAX_TOKENS,
                    system=(
                        "You are a helpful system-monitoring assistant. "
                        "Use the available tools to answer questions about "
                        "the current machine's resource usage. "
                        "Always call the tool before answering; "
                        "do not guess at values."
                    ),
                    tools=anthropic_tools,
                    messages=messages,
                )
            except anthropic.APIError as exc:
                print(f"Anthropic API error: {exc}", file=sys.stderr)
                return

            # --- 4. Agentic loop: handle tool_use blocks ---
            # We allow up to 5 iterations to avoid infinite loops.
            iterations = 0
            while response.stop_reason == "tool_use" and iterations < 5:
                iterations += 1

                # Collect all tool_use blocks from this response.
                tool_uses = [
                    block
                    for block in response.content
                    if block.type == "tool_use"
                ]

                # Append Claude's response (with tool_use blocks) to history.
                messages.append({"role": "assistant", "content": response.content})

                # Execute each requested tool call via MCP and collect results.
                tool_results = []
                for block in tool_uses:
                    print(
                        f"  [tool call] {block.name}({json.dumps(block.input)})",
                        file=sys.stderr,
                    )
                    try:
                        mcp_result = await session.call_tool(
                            block.name, arguments=block.input
                        )
                        # mcp_result.content is a list of TextContent objects.
                        result_text = "\n".join(
                            c.text for c in mcp_result.content if hasattr(c, "text")
                        )
                    except Exception as exc:
                        result_text = f"Tool execution error: {exc}"

                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result_text,
                        }
                    )

                # Send tool results back to Claude.
                messages.append({"role": "user", "content": tool_results})

                try:
                    response = client.messages.create(
                        model=CLAUDE_MODEL,
                        max_tokens=MAX_TOKENS,
                        system=(
                            "You are a helpful system-monitoring assistant. "
                            "Use the available tools to answer questions about "
                            "the current machine's resource usage."
                        ),
                        tools=anthropic_tools,
                        messages=messages,
                    )
                except anthropic.APIError as exc:
                    print(f"Anthropic API error on follow-up: {exc}", file=sys.stderr)
                    return

            # --- 5. Print the final text answer ---
            for block in response.content:
                if hasattr(block, "text"):
                    print(block.text)


def main():
    question = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "What is the current system resource usage?"
    asyncio.run(run(question))


if __name__ == "__main__":
    main()

Project structure


mcp-poc/
  server.py          # MCP server (the tool provider)
  client.py          # Host: spawns server, calls Claude, handles tool loop
  requirements.txt
  .env               # ANTHROPIC_API_KEY=sk-ant-...

Sample run


$ cd mcp-poc
$ source .env   # or: export ANTHROPIC_API_KEY=sk-ant-...
$ python client.py "How much free disk space do I have, and is memory pressure high?"

  [tool call] get_system_stats({})

Your system currently has 312.4 GB of free disk space out of a total 500.1 GB (37.5% used).
Memory pressure is moderate: you are using 68.3% of 16,384.0 MB total RAM,
leaving about 5,202.4 MB available. CPU is at 12.4% overall.
Nothing looks critically constrained right now, though if memory climbs
above 85-90% you may start seeing swap activity.

The stderr line shows the tool call happening. The final answer is Claude’s prose interpretation of the raw JSON the server returned. Notice that Claude did not hallucinate numbers; it waited for the tool result before writing a single figure.

Model Context Protocol Claude: Choosing the Right Model Tier

Not every MCP integration needs Sonnet. The model choice depends on what the tools return and how much reasoning the response requires.

Scenario	Recommended model	Why
System monitoring queries, simple lookups	claude-sonnet-4-6	Fast, cheap, accurate enough for structured data interpretation
High-volume log classification via MCP tool	claude-haiku-4-5	10-20x cheaper, sub-second latency for short inputs
Multi-tool agent with planning and code generation	claude-opus-4-8	Best tool selection when tool descriptions overlap or inputs are ambiguous
RAG + tool use (large context injection)	claude-sonnet-4-6 with prompt caching	Cache the system prompt + tool descriptions; pay once per cache TTL

For the MCP server itself, model choice is irrelevant. The server is a plain Python process; it never calls an LLM. Only the client (host) side involves model selection.

Extending the Server: Adding More Tools and Resources

Adding a second tool

To add a list_processes tool that returns the top-N processes by CPU usage, append it to the list_tools return value and add a branch in call_tool:


# In list_tools(), add to the returned list:
Tool(
    name="list_processes",
    description=(
        "Return the top N processes ranked by CPU usage. "
        "Use when the user asks what is consuming CPU or wants a process list."
    ),
    inputSchema={
        "type": "object",
        "properties": {
            "top_n": {
                "type": "integer",
                "description": "Number of processes to return (default 5).",
                "default": 5,
            }
        },
        "required": [],
    },
)

# In call_tool(), add a branch:
if name == "list_processes":
    top_n = arguments.get("top_n", 5)
    procs = []
    for p in psutil.process_iter(["pid", "name", "cpu_percent", "memory_percent"]):
        try:
            procs.append(p.info)
        except psutil.NoSuchProcess:
            pass
    procs.sort(key=lambda x: x.get("cpu_percent") or 0, reverse=True)
    return [TextContent(type="text", text=json.dumps(procs[:top_n], indent=2))]

The client requires no changes. Claude will automatically discover the new tool on its next tools/list call at session startup.

Making a real business integration

Replacing psutil with a real integration looks the same. A Postgres MCP server would have tools like run_select_query (read-only, schema-validated inputs) and get_table_schema. The key discipline: keep tools focused, write precise descriptions, and validate inputs on the server side. Claude’s tool selection accuracy depends almost entirely on description quality, not on any special prompting.

If you have already built function-calling integrations following Part 2: Tool Use with Claude, wrapping them in an MCP server is straightforward. The JSON Schema input format is identical; you are just changing where the dispatch code lives.

Connecting to Claude Desktop

Claude Desktop reads MCP server configurations from a JSON file. Once your server.py works via the client.py POC, you can register it in Claude Desktop with no code changes:


// On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
// On Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "system-stats": {
      "command": "python",
      "args": ["/absolute/path/to/mcp-poc/server.py"]
    }
  }
}

After restarting Claude Desktop, the tools appear in the sidebar under “Connected tools.” Users can ask “what’s my disk usage?” in natural language and Claude will call your server automatically.

This is the design goal of MCP: write the server once, attach it to any host. The exact same server.py file works with your Python client, Claude Desktop, and any other MCP-compatible agent framework without modification.

Common Pitfalls

1. Weak tool descriptions

Claude decides which tool to call based purely on the description and the input schema. If the description is vague (“gets some data”), Claude will either not call it when it should, or call it for unrelated queries. Write descriptions like you are writing a docstring for a colleague who does not have the source: what does it return, when should you call it, and what inputs does it accept.

2. Forgetting input validation on the server

Claude passes block.input directly to your server. Even though you defined an input schema, Claude can occasionally send unexpected keys or wrong types (especially with Haiku). Always validate on the server side, and return a descriptive error string rather than raising an exception. An exception in call_tool propagates as a protocol error and breaks the session.

3. No iteration cap in the agent loop

The client.py above caps the loop at 5 iterations. Without a cap, a badly described tool can cause Claude to call the same tool repeatedly if the result is not what it expected. Set the cap explicitly and log a warning when it is hit.

4. Using sys.stdout for logging in the server

The stdio transport uses stdout as the protocol channel. Any print() call or logging to stdout in the server will corrupt the JSON-RPC stream and break the connection. Route all server-side logging to stderr: import sys; print("debug", file=sys.stderr), or use the standard logging module configured to write to a file.

5. Blocking calls in async handlers

The mcp server runs on asyncio. Calling psutil.cpu_percent(interval=1.0) inside an async handler blocks the event loop for a full second. For the POC it is fine at 0.1 s. For production servers with multiple concurrent clients, move blocking calls into asyncio.to_thread() or reduce the interval.


# Safe pattern for blocking I/O inside an async MCP handler:
import asyncio

cpu = await asyncio.to_thread(psutil.cpu_percent, interval=0.5)

6. Assuming the MCP server is always running

With stdio transport the server is spawned fresh per client session. It is not a long-running daemon. Any state you accumulate in server-side globals is lost when the session ends. If you need persistence (a connection pool, a cache), store it in a database or a shared file, not in a module-level variable.

Cost and Latency Considerations

MCP adds a small overhead compared to inline tool definitions: the initialization handshake (two round-trips over stdin/stdout) adds a few milliseconds at session start. After that, each tool call adds one IPC round-trip (sub-millisecond on localhost) plus whatever your tool itself takes.

The dominant cost is the Claude API call count. Each messages.create call costs tokens. With MCP you typically need two calls per tool use: one to get the tool call, one to get the final answer. The strategies from Part 4: Prompt Caching apply directly. Cache your system prompt and tool definitions; they are identical across calls in the same session.

Component	Latency (typical)	Cost driver
MCP session init (stdio)	10-30 ms	None (local IPC only)
tools/list round-trip	1-5 ms	None
Claude API call (Haiku 4.5, 512 in / 256 out)	400-800 ms	~$0.0002 per call
Claude API call (Sonnet 4.6, 512 in / 512 out)	1.5-3 s	~$0.005 per call
Tool execution (psutil stats)	100-200 ms	None
Total (Sonnet, one tool call, no cache)	3-7 s	~$0.01 (two API calls)

For interactive use cases where a user is waiting for a response, Sonnet 4.6 with streaming (covered in Part 26: Streaming Responses with Claude) gives a much better perceived latency: the first tokens appear within a second even if the full response takes three.

For high-volume automated pipelines (classify 10,000 log lines per hour via an MCP tool), switch to Haiku 4.5 and batch the requests. The per-token cost difference between Haiku and Sonnet is roughly 10-20x. See Part 27: Cut AI Costs for a worked routing example.

MCP Versus Inline Tool Definitions: When to Use Each

MCP is not always the right choice. Here is a clear decision framework:

Use inline tools (passing tools=[...] directly in messages.create) when: you have one or two tools, they are specific to a single script, and you do not need them accessible from Claude Desktop or other agents.
Use an MCP server when: the same tools need to work across multiple clients (Desktop + API + agent), you have a catalog of 5 or more tools, or you want to share a tool server within a team without distributing Python glue code.
Use MCP with HTTP transport (SSE or WebSocket) when: the server needs to be hosted remotely, shared by multiple concurrent clients, or integrated with a web-based host.

If you are already using the agentic loop pattern from Part 22: Autonomous Agent Loop, MCP is a natural extension: your agent’s tools become MCP tools, and the agent itself becomes an MCP host. The agent loop code looks the same; only the tool dispatch path changes.

Frequently Asked Questions

What Python version is required for the MCP package?

The mcp package requires Python 3.10 or newer. It uses asyncio throughout, and some type annotation syntax (union types with |) requires 3.10+. The anthropic SDK supports 3.8+, so the binding constraint for this POC is the MCP package.

Can I run the MCP server as a long-running daemon instead of a subprocess?

Yes. The mcp package also provides an SSE (Server-Sent Events) transport that runs the server as an HTTP service. You can point multiple clients at it simultaneously. The trade-off is that you need to manage the server process independently (systemd, Docker, etc.) and handle authentication. For most single-user and single-project use cases, stdio is simpler.

How does Claude discover which tool to call? Is there any special prompting needed?

No special prompting is needed. Claude reads the name, description, and inputSchema of each tool and decides based on the user’s request which (if any) to call. The only thing you control is the quality of those descriptions. A tool with a precise, example-driven description will be selected more reliably than one with a vague one-liner.

What happens if the MCP server crashes mid-session?

With stdio transport, the host detects the subprocess exit because the read stream closes. The ClientSession will raise an exception on the next call_tool or list_tools call. You should wrap the async with stdio_client(...): block in a try/except and either re-spawn the server or return an error to the user. The server exit does not affect the Claude API session; you can still call client.messages.create without MCP tools if needed.

Are MCP tool results included in Claude’s context window?

Yes. Each tool result you send back as a tool_result content block is part of the messages array and counts against the context window. If your tools return large payloads (thousands of lines of logs, large JSON blobs), summarize or truncate on the server side before returning. Claude has a 200K token context, but very large tool results slow down the response and increase cost on every subsequent call in the same conversation.

Can I use MCP with the structured output pattern?

Yes. Structured output (defining one tool whose input schema represents your desired output, then forcing it with tool_choice={"type": "tool", "name": "..."}) works alongside MCP tools. You would define the output schema as one of the tools the MCP server exposes (or as an additional inline tool passed alongside the MCP tools). The pattern is covered in Part 3: Structured Output from Claude.

Does the MCP server need to know which Claude model is being used?

No. The MCP server is completely decoupled from the LLM. It handles JSON-RPC requests; it has no idea whether Claude, GPT-4, or any other model is on the other side. This decoupling is one of the main design goals: you can swap models in the host without touching the server code.

Back to the full AI in Production series.

Connect Claude to Your Tools with MCP (Model Context Protocol)