AI Contract Analysis with Claude: Evidence-Cited Answers

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 11 of 30

TL;DR

  • AI contract analysis with Claude can surface auto-renewal traps, liability caps, and termination clauses in seconds, each answer backed by a verbatim quoted clause and its character offset in the source document.
  • The POC uses Claude’s structured output (tool use) to return a typed JSON object per question: found, answer, quote, char_start, char_end. No post-processing regex needed.
  • Prompt caching cuts input costs by roughly 55-85% (more on longer contracts with more questions) when you run multiple questions against the same contract, because the contract text is reused from cache after the first call.
  • The pattern is model-agnostic: swap claude-sonnet-4-6 for claude-opus-4-8 on particularly dense legalese, or claude-haiku-4-5 for bulk triage at scale.
  • Character offsets let you highlight the exact span in a UI, feed it into a downstream redlining tool, or build an audit trail for compliance teams.
  • The complete, runnable Python project is below: one file, no external NLP libraries, about 150 lines of real code.

Why AI Contract Analysis Is Now a Realistic Engineering Problem

Every SaaS company signs vendor agreements. Every startup closes customer MSAs. Legal counsel is expensive and slow, and most contracts follow predictable patterns: auto-renewal with a 30-day opt-out window, liability capped at 12 months of fees, termination for cause with a 30-day cure period. The structure is standard. The risk is in the details.

Before LLMs, extracting those details reliably required either a trained lawyer reading every page or a brittle regex pipeline that broke on any nonstandard phrasing. Now, with a 200k-token context window and instruction-following that actually works, a model like Claude can read an entire enterprise agreement in one shot and answer specific risk questions in plain English, each answer pinned to the exact clause that supports it.

That last part, the citation with character offset, is what separates a useful tool from a liability. If the model says “the liability cap is 12 months of fees” but cannot show you the sentence it read, you have no way to verify it. If it gives you char_start=4821 and char_end=4976, you can slice the original document, highlight the span, and show it to your counsel for confirmation. The AI does the triage; the human does the final review. That division of labor is practical and defensible.
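As a minimal sketch of what "slice the original document and highlight the span" can look like, the helper below wraps the cited span in [[ ]] markers and keeps a little surrounding context. The function name, the marker style, and the demo text are illustrative assumptions, not part of the POC.

```python
def highlight_span(contract: str, start: int, end: int, context: int = 40) -> str:
    """Return the cited span wrapped in [[ ]] with some surrounding context."""
    if not (0 <= start < end <= len(contract)):
        raise ValueError(
            f"invalid span {start}:{end} for a document of length {len(contract)}"
        )
    before = contract[max(0, start - context):start]
    after = contract[end:end + context]
    return f"{before}[[{contract[start:end]}]]{after}"


# Hypothetical three-sentence "contract" used only for this demo.
doc = "Fees are due in 30 days. Liability is capped at 12 months of fees. Delaware law applies."
start = doc.find("Liability")
end = doc.find(" Delaware")
print(highlight_span(doc, start, end, context=10))
```

A UI would swap the [[ ]] markers for a highlight element, but the slicing logic stays the same.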

This article builds a complete Python POC for AI contract analysis: load a contract, ask three canonical risk questions, and get back a structured answer for each, with a verbatim quote and offsets. We also cover cost, pitfalls, and when to reach for a more powerful model.

[Figure 1 diagram. Boxes: Contract (.txt / .pdf); Risk Questions (auto-renewal, cap…); Claude sonnet-4-6, tool_choice forced; Structured Answer (answer: string, quote: verbatim clause, char_start / char_end). Annotation: Prompt cache: contract reused.]

Figure 1: Data flow for evidence-cited AI contract analysis. The contract text is cached on the first question; subsequent questions reuse the cache, cutting per-question cost by ~85%.

Who Actually Needs This and What It Saves

The typical buyer is a small legal or procurement team that reviews dozens of vendor contracts per month. Before an automated tool, a paralegal or junior associate might spend 30-60 minutes per contract skimming for the three or four clauses that matter most to the business. At $80-150/hour fully loaded, that is $40-150 per contract review, not counting the risk that a clause gets missed at 4pm on a Friday.

With an AI contract analysis pipeline, the same extraction takes 8-15 seconds and costs roughly $0.08-0.30 per contract (depending on document length and how many questions you ask). The human reviews the extracted clauses, not the raw 40-page document. That review is still essential, but it takes 5 minutes instead of 45.

For a startup processing 50 vendor renewals per quarter, the arithmetic is obvious. For a law firm or legal tech platform, the same infrastructure becomes a product feature.

The Three Questions That Matter Most

Every contract review starts with the same checklist. The questions in this POC are not arbitrary:

  1. Auto-renewal: Does the contract renew automatically? What is the opt-out window? A 30-day notice requirement buried on page 14 has cost companies millions in unwanted subscription renewals.
  2. Liability cap: What is the maximum financial exposure for either party? Caps expressed as a multiple of fees paid or a fixed dollar amount are standard, but the exact wording controls whether the cap applies to indemnification claims.
  3. Termination for cause: Under what conditions can either party terminate, and what cure period applies? “Immediate termination” versus “30-day cure” is the difference between a manageable dispute and a lost customer.

The POC is straightforward to extend. Add questions about data privacy obligations, SLA remedies, IP ownership, or non-solicitation clauses. The code structure stays the same.

Designing the Extraction Schema

The most important design decision in this project is not which model to use. It is defining the output schema precisely. Claude’s tool-use mechanism lets you force a structured JSON response, and the schema you define becomes the contract between the model and your application code.

For each question we want five fields:

  • answer: a plain-English summary of what the clause says, written for a non-lawyer
  • quote: the verbatim text from the contract that supports the answer, character-for-character
  • char_start: integer index of the first character of the quote in the full contract string
  • char_end: integer index one past the last character of the quote (exclusive), matching Python slice semantics
  • found: boolean, false when the contract genuinely does not contain a relevant clause

The found field matters. Contracts sometimes omit standard clauses. If the model cannot locate evidence, it should say so clearly rather than hallucinate a plausible-sounding clause. By including found: false as a valid output state, you give the model a clean exit and signal to the consumer that a human review is required for that specific question.
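One way to make the two valid output states explicit in application code is a small typed wrapper that rejects inconsistent combinations, for example found=true with an empty quote. This is a sketch under assumptions: ClauseResult and parse_clause_result are hypothetical names that do not appear in the POC, and the rules simply mirror the field descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseResult:
    found: bool
    answer: str
    quote: str
    char_start: int
    char_end: int


def parse_clause_result(data: dict) -> ClauseResult:
    """Build a ClauseResult from the tool input, rejecting inconsistent states."""
    result = ClauseResult(
        found=bool(data["found"]),
        answer=str(data["answer"]),
        quote=str(data["quote"]),
        char_start=int(data["char_start"]),
        char_end=int(data["char_end"]),
    )
    if result.found:
        # A positive answer must carry evidence and a plausible span.
        if not result.quote:
            raise ValueError("found=true requires a non-empty quote")
        if not (0 <= result.char_start < result.char_end):
            raise ValueError("found=true requires 0 <= char_start < char_end")
    elif result.quote or result.char_start != -1 or result.char_end != -1:
        # The "clean exit" state: no quote, both offsets set to -1.
        raise ValueError("found=false requires an empty quote and offsets of -1")
    return result
```

This only checks internal consistency. Whether the quote really sits at those offsets is a separate check against the contract text.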

Key idea: Character offsets only work if the model quotes text verbatim. Prompt the model explicitly to copy the clause character-for-character, not to paraphrase it. Then verify the quote by checking contract[char_start:char_end] == quote in your code. Discard any response where that check fails, and log it for manual review.

Why Tool Use Instead of Raw JSON Prompting

You could ask Claude to “respond with JSON” and parse the result. That works in simple cases but fails in production because the model may include a preamble, trailing commentary, or a code fence around the JSON. Tool use eliminates all of that. When you pass tool_choice={"type": "tool", "name": "extract_clause"}, the model is forced to call exactly that tool and populate its input schema. The response is always machine-parseable, with no string manipulation required.

See Part 3: Structured Output from Claude for a broader treatment of this pattern, and Part 2: Tool Use with Claude for the mechanics of the tool-call loop in detail.

Prompt Caching: The Cost Multiplier

A typical enterprise agreement runs 5,000-15,000 words. At about 6 characters per word (spaces included) and roughly 4 characters per token, that is roughly 7,500-22,500 input tokens for the contract alone, repeated for every question you ask. If you have 10 questions per contract, you pay for the contract 10 times.

Prompt caching solves this. You mark the contract text block with {"type": "ephemeral"} in the cache_control field. On the first call, Claude processes and caches it. On every subsequent call with the same block, you get a cache hit: the cost drops from the standard input token rate to roughly 10% of that. Creation costs slightly more than a regular input token, but across 10 questions per document the total spend drops by about 80-85%.
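The arithmetic behind that claim can be sketched in a few lines. The multipliers here are assumptions for illustration (a cache write at about 1.25x the normal input rate, a cache read at about 0.1x), and the model deliberately ignores output tokens and the short per-question text, so treat the result as a back-of-envelope estimate that will not match any pricing table exactly.

```python
def caching_savings(
    contract_tokens: int,
    questions: int,
    write_multiplier: float = 1.25,  # assumed cost of a cache write vs. normal input
    read_multiplier: float = 0.10,   # assumed cost of a cache read vs. normal input
) -> float:
    """Fraction of contract-token input cost saved by caching (negative = costs more)."""
    if contract_tokens <= 0 or questions <= 0:
        raise ValueError("contract_tokens and questions must be positive")
    uncached = contract_tokens * questions
    # One write on the first question, then one read per remaining question.
    cached = contract_tokens * (write_multiplier + read_multiplier * (questions - 1))
    return 1 - cached / uncached


for n in (1, 3, 5, 10):
    print(f"{n:>2} questions: {caching_savings(5_000, n):.0%} saved")
```

Under these assumptions a single question is slightly more expensive with caching than without, and the saving climbs toward the 80% range as the question count approaches ten. Caching only pays off when you ask several questions per document.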

This is covered in depth in Part 4: Prompt Caching with Claude. The POC below implements caching from the start, not as an afterthought.

One caching caveat: a cached block has to clear a minimum size before the API will store it. The floor depends on the model (on the order of 1,024 tokens, higher for some models), so check the current documentation for the model you deploy. A two-page NDA may sit below that floor, in which case you simply pay the normal input rate and see cache_creation_input_tokens=0. Real enterprise agreements are well above the threshold, so caching pays off exactly where document length hurts the most. Check msg.usage.cache_read_input_tokens on the second question to confirm the cache actually engaged.

Scenario | Contract tokens | Questions | Cost without cache | Cost with cache | Saving
--- | --- | --- | --- | --- | ---
Short NDA (3 pages) | ~1,200 | 3 | ~$0.011 | ~$0.005 | ~55%
MSA (15 pages) | ~5,000 | 5 | ~$0.075 | ~$0.018 | ~76%
Enterprise SaaS (40 pages) | ~14,000 | 10 | ~$0.42 | ~$0.065 | ~85%

Table 1: Approximate costs using claude-sonnet-4-6 at June 2026 pricing. Cache creation is billed once per document per session. Numbers are illustrative; actual costs depend on output length and cache hit rate.

The Complete POC: AI Contract Analysis with Evidence Citations

Project Structure

The project is a single Python file plus a .env and a sample contract. No frameworks, no vector databases. The only dependencies are the Anthropic SDK and python-dotenv for loading the API key.

pip install anthropic python-dotenv

File layout:

contract-analyzer/
  contract_analyzer.py   # main script
  sample_contract.txt    # paste your contract here
  .env                   # ANTHROPIC_API_KEY=sk-ant-...
  requirements.txt

requirements.txt

anthropic>=0.28.0
python-dotenv>=1.0.0

.env

ANTHROPIC_API_KEY=sk-ant-your-key-here

contract_analyzer.py (complete source)

"""
AI Contract Analysis with Evidence-Cited Answers
Part 11 of "AI in Production: 30 Real-World Use Cases with Claude"

Reads a contract file, asks a set of risk questions, and returns structured
answers each with a verbatim quoted clause and character offsets.
Uses prompt caching so the contract is only processed once per session.
"""

import json
import sys
from pathlib import Path

import anthropic
from dotenv import load_dotenv

load_dotenv()

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

MODEL = "claude-sonnet-4-6"
MAX_TOKENS = 1024

# The risk questions we ask for every contract.
# Extend this list to cover IP ownership, data privacy, non-solicitation, etc.
RISK_QUESTIONS = [
    {
        "id": "auto_renewal",
        "question": (
            "Does this contract auto-renew? If so, what notice period is required "
            "to opt out, and by what date must notice be given before the renewal date?"
        ),
    },
    {
        "id": "liability_cap",
        "question": (
            "What is the maximum liability cap stated in this contract? "
            "Is it expressed as a fixed dollar amount, a multiple of fees, or something else? "
            "Does the cap apply to all claims or are there exclusions (e.g. gross negligence, IP infringement)?"
        ),
    },
    {
        "id": "termination_for_cause",
        "question": (
            "Under what conditions can either party terminate this contract for cause? "
            "Is there a cure period? How many days does the breaching party have to remedy the breach "
            "before termination becomes effective?"
        ),
    },
]

# ---------------------------------------------------------------------------
# Tool definition: the schema Claude must populate for each answer
# ---------------------------------------------------------------------------

EXTRACT_CLAUSE_TOOL = {
    "name": "extract_clause",
    "description": (
        "Record the answer to a contract risk question together with the verbatim "
        "supporting clause and its exact character position in the source document."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "found": {
                "type": "boolean",
                "description": (
                    "True if the contract contains a clause that answers the question. "
                    "False if no relevant clause exists."
                ),
            },
            "answer": {
                "type": "string",
                "description": (
                    "Plain-English summary of what the relevant clause says. "
                    "Write for a non-lawyer. If found is false, explain what is absent."
                ),
            },
            "quote": {
                "type": "string",
                "description": (
                    "The verbatim text of the clause, copied character-for-character from "
                    "the contract. Must be an exact substring of the contract text. "
                    "Empty string if found is false."
                ),
            },
            "char_start": {
                "type": "integer",
                "description": (
                    "Zero-based index of the first character of the quote in the full "
                    "contract text. -1 if found is false."
                ),
            },
            "char_end": {
                "type": "integer",
                "description": (
                    "Zero-based index of the character immediately after the last character "
                    "of the quote (Python slice semantics). -1 if found is false."
                ),
            },
        },
        "required": ["found", "answer", "quote", "char_start", "char_end"],
    },
}

# ---------------------------------------------------------------------------
# Core analysis function
# ---------------------------------------------------------------------------


def analyze_contract(contract_text: str, questions: list[dict]) -> list[dict]:
    """
    Ask each risk question against the contract and return structured answers.

    The contract text is sent as a cached content block. On the first call,
    Claude processes and caches it. On subsequent calls within the same session
    the cache is read, cutting input token cost by ~85%.

    Returns a list of result dicts, one per question, with keys:
        id, question, found, answer, quote, char_start, char_end,
        quote_verified, cache_creation_tokens, cache_read_tokens,
        input_tokens, output_tokens
    """
    client = anthropic.Anthropic()
    results = []

    # The system prompt stays the same for every question.
    system_prompt = (
        "You are a precise legal document analyst. "
        "You will be given a contract and a specific risk question. "
        "Your job is to find the single most relevant clause that answers the question "
        "and record it using the extract_clause tool. "
        "CRITICAL: the 'quote' field must be copied verbatim, character-for-character, "
        "from the contract. Do not paraphrase, summarize, or alter punctuation. "
        "The char_start and char_end must correspond exactly to the position of "
        "the quote in the contract text (zero-based Python string indexing). "
        "If no relevant clause exists, set found=false and char_start=char_end=-1."
    )

    for q in questions:
        user_content = [
            # The contract block is marked for caching.
            # On the first question Claude creates the cache entry.
            # On every subsequent question it reads from cache.
            {
                "type": "text",
                "text": (
                    "CONTRACT TEXT (use only this text when locating character offsets):\n\n"
                    + contract_text
                ),
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": f"\nRISK QUESTION:\n{q['question']}",
            },
        ]

        try:
            response = client.messages.create(
                model=MODEL,
                max_tokens=MAX_TOKENS,
                system=system_prompt,
                tools=[EXTRACT_CLAUSE_TOOL],
                tool_choice={"type": "tool", "name": "extract_clause"},
                messages=[{"role": "user", "content": user_content}],
            )
        except anthropic.APIError as exc:
            print(f"  [ERROR] API call failed for '{q['id']}': {exc}", file=sys.stderr)
            results.append({"id": q["id"], "question": q["question"], "error": str(exc)})
            continue

        # With tool_choice forced, the response normally contains exactly one
        # tool_use block. We still guard against its absence (for example, a
        # response cut short by max_tokens).
        tool_block = next(
            (b for b in response.content if b.type == "tool_use"),
            None,
        )
        if tool_block is None:
            results.append({
                "id": q["id"],
                "question": q["question"],
                "error": "No tool_use block in response",
            })
            continue

        data = tool_block.input  # already a dict; no json.loads needed

        # Verify the quote is a true substring at the claimed offsets.
        # This is a critical integrity check: if it fails, the model
        # fabricated or paraphrased the clause, or miscounted the offsets.
        quote_verified = False
        if data.get("found") and data.get("quote"):
            start = data.get("char_start", -1)
            end = data.get("char_end", -1)
            if start >= 0 and end > start:
                actual_slice = contract_text[start:end]
                quote_verified = (actual_slice == data["quote"])

        usage = response.usage
        results.append({
            "id": q["id"],
            "question": q["question"],
            "found": data.get("found", False),
            "answer": data.get("answer", ""),
            "quote": data.get("quote", ""),
            "char_start": data.get("char_start", -1),
            "char_end": data.get("char_end", -1),
            "quote_verified": quote_verified,
            # "or 0" covers the case where the field is present but None.
            "cache_creation_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
            "cache_read_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
        })

    return results


# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------


def print_report(contract_path: str, results: list[dict]) -> None:
    """Print a human-readable analysis report to stdout."""
    print("=" * 70)
    print("CONTRACT ANALYSIS REPORT")
    print(f"File: {contract_path}")
    print("=" * 70)

    for r in results:
        print(f"\n{'=' * 70}")
        print(f"QUESTION [{r['id']}]")
        print(f"  {r['question']}")
        print()

        if "error" in r:
            print(f"  ERROR: {r['error']}")
            continue

        status = "FOUND" if r["found"] else "NOT FOUND"
        print(f"  Status  : {status}")
        print(f"  Answer  : {r['answer']}")

        if r["found"] and r["quote"]:
            print(f"\n  Clause  : \"{r['quote']}\"")
            print(f"  Offsets : char {r['char_start']} to {r['char_end']}")
            verified_label = "PASS" if r["quote_verified"] else "FAIL (review manually)"
            print(f"  Verified: {verified_label}")

        cache_hit = r.get("cache_read_tokens", 0)
        cache_created = r.get("cache_creation_tokens", 0)
        print(
            f"\n  Tokens  : in={r.get('input_tokens', 0)} out={r.get('output_tokens', 0)} "
            f"cache_create={cache_created} cache_read={cache_hit}"
        )

    print("\n" + "=" * 70)


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------


def main():
    if len(sys.argv) < 2:
        print("Usage: python contract_analyzer.py <path-to-contract.txt>", file=sys.stderr)
        sys.exit(1)

    contract_path = Path(sys.argv[1])
    if not contract_path.exists():
        print(f"Error: file not found: {contract_path}", file=sys.stderr)
        sys.exit(1)

    contract_text = contract_path.read_text(encoding="utf-8")
    print(f"Loaded contract: {len(contract_text):,} characters")
    print(f"Asking {len(RISK_QUESTIONS)} risk questions with model {MODEL}...")
    print()

    results = analyze_contract(contract_text, RISK_QUESTIONS)
    print_report(str(contract_path), results)

    # Also write machine-readable JSON for downstream tooling.
    output_path = contract_path.with_suffix(".analysis.json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    print(f"\nJSON results saved to: {output_path}")


if __name__ == "__main__":
    main()

Sample Contract (for testing)

Paste this into sample_contract.txt to run the POC immediately:

MASTER SERVICES AGREEMENT

This Master Services Agreement ("Agreement") is entered into as of January 1, 2026,
between Acme Corp ("Customer") and Vendor Inc ("Vendor").

1. TERM AND RENEWAL

This Agreement shall commence on the Effective Date and continue for an initial term
of twelve (12) months ("Initial Term"). Upon expiration of the Initial Term, this
Agreement shall automatically renew for successive one-year periods ("Renewal Terms"),
unless either party provides written notice of non-renewal to the other party at least
thirty (30) days prior to the end of the then-current term. Vendor shall send a
renewal reminder notice no later than sixty (60) days before the renewal date.

2. FEES AND PAYMENT

Customer shall pay Vendor the fees set forth in the applicable Order Form within
thirty (30) days of invoice. All fees are non-refundable except as expressly set
forth herein.

3. LIMITATION OF LIABILITY

IN NO EVENT SHALL EITHER PARTY BE LIABLE TO THE OTHER FOR ANY INDIRECT, INCIDENTAL,
SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES. EACH PARTY'S TOTAL CUMULATIVE LIABILITY
ARISING OUT OF OR RELATED TO THIS AGREEMENT SHALL NOT EXCEED THE TOTAL FEES PAID BY
CUSTOMER TO VENDOR IN THE TWELVE (12) MONTHS IMMEDIATELY PRECEDING THE CLAIM. The
foregoing limitations shall not apply to (a) either party's indemnification obligations,
(b) damages arising from gross negligence or willful misconduct, or (c) claims arising
from infringement of the other party's intellectual property rights.

4. TERMINATION FOR CAUSE

Either party may terminate this Agreement immediately upon written notice if the other
party materially breaches this Agreement and fails to cure such breach within thirty
(30) days after receipt of written notice specifying the breach in reasonable detail.
Notwithstanding the foregoing, Vendor may terminate this Agreement immediately without
a cure period if Customer fails to make any undisputed payment when due and such failure
continues for fifteen (15) business days after written notice of non-payment.

5. CONFIDENTIALITY

Each party agrees to maintain the confidentiality of the other party's Confidential
Information using at least the same degree of care it uses to protect its own confidential
information, but in no event less than reasonable care.

6. GOVERNING LAW

This Agreement shall be governed by the laws of the State of Delaware.

IN WITNESS WHEREOF, the parties have executed this Agreement as of the date first written above.

Sample Run and Output

$ python contract_analyzer.py sample_contract.txt

Loaded contract: 2,241 characters
Asking 3 risk questions with model claude-sonnet-4-6...

======================================================================
CONTRACT ANALYSIS REPORT
File: sample_contract.txt
======================================================================

======================================================================
QUESTION [auto_renewal]
  Does this contract auto-renew? If so, what notice period is required
  to opt out, and by what date must notice be given before the renewal date?

  Status  : FOUND
  Answer  : Yes, the contract auto-renews annually. To prevent renewal,
            either party must provide written notice of non-renewal at
            least 30 days before the end of the current term. The vendor
            is also required to send a reminder notice 60 days before renewal.

  Clause  : "this Agreement shall automatically renew for successive one-year
            periods ("Renewal Terms"), unless either party provides written
            notice of non-renewal to the other party at least thirty (30)
            days prior to the end of the then-current term."
  Offsets : char 329 to 560
  Verified: PASS

  Tokens  : in=892 out=187 cache_create=412 cache_read=0

======================================================================
QUESTION [liability_cap]
  What is the maximum liability cap stated in this contract? ...

  Status  : FOUND
  Answer  : Each party's total liability is capped at the fees paid by
            the Customer to Vendor in the prior 12 months. The cap does
            not apply to indemnification obligations, gross negligence,
            willful misconduct, or IP infringement claims.

  Clause  : "EACH PARTY'S TOTAL CUMULATIVE LIABILITY ARISING OUT OF OR
            RELATED TO THIS AGREEMENT SHALL NOT EXCEED THE TOTAL FEES PAID
            BY CUSTOMER TO VENDOR IN THE TWELVE (12) MONTHS IMMEDIATELY
            PRECEDING THE CLAIM."
  Offsets : char 1021 to 1213
  Verified: PASS

  Tokens  : in=892 out=201 cache_create=0 cache_read=412

======================================================================
QUESTION [termination_for_cause]
  Under what conditions can either party terminate this contract for cause? ...

  Status  : FOUND
  Answer  : Either party may terminate for a material breach that is not
            cured within 30 days of written notice. Vendor has a shorter
            path for non-payment: 15 business days notice with no 30-day
            cure period.

  Clause  : "Either party may terminate this Agreement immediately upon
            written notice if the other party materially breaches this
            Agreement and fails to cure such breach within thirty (30)
            days after receipt of written notice specifying the breach
            in reasonable detail."
  Offsets : char 1491 to 1730
  Verified: PASS

  Tokens  : in=892 out=215 cache_create=0 cache_read=412

======================================================================

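JSON results saved to: sample_contract.analysis.json

Notice the token pattern: the first question creates the cache (cache_create=412), and the second and third questions read it (cache_read=412). You pay to create the cache once, then read it cheaply for every subsequent question in the session. Treat these token counts as illustrative: a contract as short as this sample sits below the minimum cacheable size described earlier, so a real run against it may report cache_create=0 and cache_read=0. The pattern shown is what you should see on full-length agreements.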

Extending the POC to PDF Contracts

Most contracts arrive as PDFs. Two practical paths exist:

  1. Extract text before calling Claude: Use pdfminer.six or pypdf to extract plain text. This is fast and preserves character offsets that can be mapped back approximately to the PDF page coordinates if you track the extraction positions.
  2. Pass pages as images: Use Claude’s vision capability (see Part 20: Extract Data from PDFs and Invoices Using Claude Vision) to send each page as a base64-encoded PNG. This handles scanned PDFs where text extraction produces garbage, but character offsets become page/region coordinates instead of string indices.

For digitally-created PDFs (the majority of enterprise contracts), text extraction is the right choice. The pdfminer.six library preserves reading order reliably on well-structured legal documents.

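pip install pdfminer.six

from pathlib import Path

from pdfminer.high_level import extract_text


def load_contract(path: str) -> str:
    """Load contract text from a .txt or .pdf file."""
    if path.lower().endswith(".pdf"):
        # pdfminer.six returns the document's text as a single string
        return extract_text(path)
    # read_text opens and closes the file for us
    return Path(path).read_text(encoding="utf-8")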

Scaling Up: Running Against a Batch of Contracts

The single-file POC is the right starting point. For a production system processing hundreds of contracts per week, the architecture needs two changes:

Parallelism with async

The Anthropic Python SDK has an async client (AsyncAnthropic). Wrap analyze_contract in an async function, use asyncio.gather to run multiple contracts concurrently, and respect the rate limit with a semaphore:

import asyncio
from anthropic import AsyncAnthropic

async def analyze_batch(contracts: list[dict], concurrency: int = 5) -> list[list[dict]]:
    client = AsyncAnthropic()
    sem = asyncio.Semaphore(concurrency)

    async def analyze_one(contract: dict) -> list[dict]:
        async with sem:
            # async version of analyze_contract; same structure, different client
            return await analyze_contract_async(client, contract["text"], RISK_QUESTIONS)

    return await asyncio.gather(*[analyze_one(c) for c in contracts])
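One possible shape for analyze_contract_async is sketched below. It is an assumption, not the article's implementation: the keyword arguments model, tool, and system_prompt are introduced here so the block stands alone (in the script they would be MODEL, EXTRACT_CLAUSE_TOOL, and the system prompt, bound once with functools.partial or passed through), and the client only needs an awaitable messages.create, which AsyncAnthropic provides. Questions for one contract run sequentially on purpose, so the first call can write the cache before the others try to read it.

```python
import asyncio


async def analyze_contract_async(client, contract_text: str, questions: list[dict], *,
                                 model: str, tool: dict, system_prompt: str,
                                 max_tokens: int = 1024) -> list[dict]:
    """Ask the questions one at a time so the first call can populate the cache."""
    results = []
    for q in questions:
        response = await client.messages.create(
            model=model,
            max_tokens=max_tokens,
            system=system_prompt,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=[{"role": "user", "content": [
                # Same cached contract block as the synchronous version.
                {"type": "text", "text": contract_text,
                 "cache_control": {"type": "ephemeral"}},
                {"type": "text", "text": f"\nRISK QUESTION:\n{q['question']}"},
            ]}],
        )
        block = next((b for b in response.content if b.type == "tool_use"), None)
        if block is None:
            results.append({"id": q["id"], "error": "No tool_use block in response"})
            continue
        data = dict(block.input)
        start, end = data.get("char_start", -1), data.get("char_end", -1)
        # Same integrity check as the POC: the slice must equal the quote exactly.
        verified = (
            bool(data.get("found"))
            and 0 <= start < end
            and contract_text[start:end] == data.get("quote")
        )
        results.append({"id": q["id"], **data, "quote_verified": verified})
    return results
```

Concurrency then lives only at the contract level, in analyze_batch, which keeps per-contract cache behavior predictable.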

Storing results for audit

Write results to Postgres or any JSON-capable store. Include contract_hash (SHA-256 of the contract text), model_version, analyzed_at, and the raw response for traceability. A compliance team will want to know exactly which model version produced each extraction, and whether the quote was verified at the time of analysis.
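A minimal sketch of such an audit record, using only the standard library, might look like this. The function name and the exact key layout are assumptions; the fields themselves are the ones listed above.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(contract_text: str, model_version: str, results: list[dict]) -> dict:
    """Bundle per-question results with the metadata an auditor will ask for."""
    return {
        # The hash ties the record to the exact text that was analyzed.
        "contract_hash": hashlib.sha256(contract_text.encode("utf-8")).hexdigest(),
        "model_version": model_version,
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }


record = build_audit_record("abc", "claude-sonnet-4-6", [{"id": "auto_renewal", "quote_verified": True}])
print(json.dumps(record, indent=2))
```

Because the record is plain JSON, it can go into a JSONB column or an object store without further transformation.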

For a production API wrapper around this kind of workload, see Part 30: Ship a Production AI Microservice with FastAPI and Claude.

[Figure 2 diagram: Production Contract Analysis Pipeline. Stages: Ingest (.pdf / .docx / .txt, text extraction); Job Queue (Redis / BullMQ, rate-limited); Async Workers (analyze_contract(), prompt cache active, structured JSON out); Results Store (Postgres / S3, audit trail); Integrity Check (contract[start:end] == quote); Manual Review (quote_verified=false, flagged for human).]

Figure 2: Production pipeline for batch contract analysis. Contracts arrive as files, are queued for async workers, and results are stored with an integrity check. Responses where quote_verified=false are routed to a human review queue.

Model Selection for Contract Analysis

Use case | Recommended model | Reason | Approx. cost per 10-page contract
--- | --- | --- | ---
Standard MSA / SaaS agreements | claude-sonnet-4-6 | Strong instruction following, fast, good at precise quoting | $0.01-0.05
Complex enterprise agreements, M&A schedules, bespoke indemnity carve-outs | claude-opus-4-8 | Better multi-step reasoning on dense legalese; catches indirect implications | $0.08-0.30
Initial triage: “does this document even contain a liability clause?” | claude-haiku-4-5 | Cheapest and fastest; adequate for binary yes/no classification before deeper analysis | $0.001-0.005
High-volume bulk review (500+ contracts/day) | claude-haiku-4-5 for triage, sonnet-4-6 for flagged | Two-tier routing: cheap model filters, capable model goes deep only where needed | $0.005-0.02 blended

Table 2: Model selection guide for contract analysis workloads. Two-tier routing is the best cost-performance pattern at scale; see Part 27: Cut AI Costs: Model Routing and Batching for the implementation pattern.
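The two-tier idea in the last row can be sketched independently of any API. Here triage and deep_analyze are hypothetical callables you would supply yourself (for example, a yes/no call to the cheaper model and the full extraction call to the stronger one); nothing below is the implementation from Part 27.

```python
from typing import Callable


def two_tier_review(
    contract_text: str,
    questions: list[dict],
    triage: Callable[[str, dict], bool],
    deep_analyze: Callable[[str, dict], dict],
) -> list[dict]:
    """Run the cheap triage check first; call the capable model only on hits."""
    results = []
    for q in questions:
        if triage(contract_text, q):
            results.append(deep_analyze(contract_text, q))
        else:
            # Record the skip explicitly so a reviewer can see that only the
            # triage model looked at this question.
            results.append({"id": q["id"], "found": False, "triage_only": True})
    return results
```

The triage_only flag matters because a false negative from the cheap model would otherwise be indistinguishable from a genuinely absent clause.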

Common Pitfalls in AI Contract Analysis

1. The model paraphrases instead of quotes verbatim

This is the most common failure mode. The model returns an answer like “the liability is capped at annual fees” which is accurate but is NOT the verbatim text. Your character offset validation catches this: contract[start:end] != quote will be true, and you flag it for manual review. The system prompt’s CRITICAL instruction helps, but it is not a guarantee. Always validate.

2. Off-by-one errors in character offsets

If the model gets the quote right but the offsets are off by one or two characters, the verification check fails. This can happen when the model counts from 1 instead of 0, or includes/excludes a leading space. The fix is to add a fuzzy fallback: if exact match fails, search for the quote string in the document with contract.find(quote) and use that offset instead. Log the discrepancy either way.

def verify_or_recover_offsets(contract: str, quote: str, start: int, end: int):
    """Return (verified, char_start, char_end) using fallback search if needed."""
    if start >= 0 and end > start and contract[start:end] == quote:
        return True, start, end
    # Fallback: find the first occurrence of the exact quote string
    idx = contract.find(quote)
    if idx != -1:
        return False, idx, idx + len(quote)  # False = used fallback, not model offsets
    return False, -1, -1  # quote not found at all; discard
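A related case the exact-match fallback misses is a quote that differs from the contract only in whitespace, for example when the source has hard line breaks inside a clause and the model returns single spaces. The sketch below is one way to handle it; the function name is hypothetical, and it treats any run of whitespace as equivalent.

```python
import re


def find_quote_ignoring_whitespace(contract: str, quote: str):
    """Return (start, end) of the first match that differs only in whitespace, or None."""
    tokens = quote.split()
    if not tokens:
        return None
    # Escape each word literally and allow any whitespace run between words.
    pattern = r"\s+".join(re.escape(tok) for tok in tokens)
    match = re.search(pattern, contract)
    return (match.start(), match.end()) if match else None


contract = "unless either party provides written\nnotice of non-renewal at least\nthirty (30) days prior"
quote = "written notice of non-renewal at least thirty (30) days"
span = find_quote_ignoring_whitespace(contract, quote)
print(span, repr(contract[span[0]:span[1]]))
```

When this path is used, store the slice taken from the contract rather than the model's version of the quote, and keep the result flagged as recovered instead of verified.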

3. Splitting clauses across section boundaries

A clause sometimes references another section (“as set forth in Section 12.3(b)”). The direct quote is accurate but incomplete without context. Add a post-processing step: if the answer references another section by number, fetch that section text and include it in the result for human review.
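That post-processing step could start from something like the sketch below. It assumes references are written as "Section N", "Section N.M", or "Section N.M(x)", and that top-level headings look like the sample contract's ("4. TERMINATION FOR CAUSE" at the start of a line). Both patterns and both function names are assumptions you would adapt to your own documents.

```python
import re
from typing import Optional

# Assumed reference style: "Section 12.3(b)", "Section 4", ...
SECTION_REF = re.compile(r"\bSection\s+(\d+(?:\.\d+)*)((?:\([A-Za-z0-9]+\))*)")
# Assumed heading style: a number, a dot, then the title, at the start of a line.
HEADING = re.compile(r"^(\d+)\.[ \t]+\S.*$", re.MULTILINE)


def find_section_refs(text: str) -> list[str]:
    """Return referenced section identifiers in order of appearance, without duplicates."""
    refs: list[str] = []
    for m in SECTION_REF.finditer(text):
        ref = m.group(1) + m.group(2)
        if ref not in refs:
            refs.append(ref)
    return refs


def get_top_level_section(contract: str, ref: str) -> Optional[str]:
    """Return the text of the top-level section that contains ref, or None."""
    wanted = ref.split(".")[0].split("(")[0]  # "12.3(b)" -> "12"
    headings = list(HEADING.finditer(contract))
    for i, h in enumerate(headings):
        if h.group(1) == wanted:
            end = headings[i + 1].start() if i + 1 < len(headings) else len(contract)
            return contract[h.start():end].strip()
    return None
```

Returning the whole top-level section is deliberately coarse: the reviewer sees a little too much context instead of too little.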

4. Assuming the model reads the whole document

Claude’s 200k context window fits most contracts, but very long agreements (full commercial leases, government contracts, collective bargaining agreements) can approach or exceed it. Check len(contract_text) / 4 as a rough token estimate before each call. If the estimate exceeds 150,000, chunk the document and run questions against each chunk, then merge results.
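Chunking interacts with the offsets: an offset returned for a chunk is relative to that chunk, so it has to be shifted by the chunk's starting position before it can be checked against the full contract. A minimal sketch, with hypothetical function names and default sizes chosen only for illustration (400,000 characters is about 100,000 tokens by the len / 4 estimate):

```python
def chunk_with_offsets(text: str, max_chars: int = 400_000, overlap: int = 2_000):
    """Yield (chunk_start, chunk_text) pairs covering text, with overlapping edges."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        yield start, text[start:end]
        if end == len(text):
            break
        # Step back by `overlap` so a clause cut at a boundary appears whole
        # in the next chunk.
        start = end - overlap


def to_global(chunk_start: int, char_start: int, char_end: int) -> tuple[int, int]:
    """Convert chunk-relative offsets to offsets in the full contract."""
    return chunk_start + char_start, chunk_start + char_end
```

After conversion, run the usual contract[start:end] == quote check against the full text. When merging, expect duplicates from the overlap regions and prefer results with found=true.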

5. No fallback when the clause is genuinely absent

Not every contract has a liability cap. Not every SaaS agreement has a formal auto-renewal clause. If you do not handle found=false gracefully in your UI, the consumer assumes the answer is buried somewhere and they missed it. Make “no clause found” a first-class result that surfaces clearly, and tell the reviewer what standard practice would typically say if the clause were present.

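6. Misreading how the cache is keyed and when it expires

The cache is keyed on the exact content of the prompt prefix (tool definitions, system prompt, and the contract block), not on the client object or the Python process. If the contract text changes, the next call is simply a cache miss followed by a fresh cache write; you cannot get a hit on a stale contract, and creating a new Anthropic() client changes nothing. The real failure modes run the other way. An ephemeral entry expires after a few minutes without a read (about five by default), so questions for the same contract that are spread out over time each pay the creation price again. Any byte-level difference in the prefix, such as re-extracted PDF text with different whitespace or an edited system prompt, also forces a miss. In a long-running async worker, run all questions for one contract back to back and watch cache_read_input_tokens to confirm the hits.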

Cost and Latency at a Glance

For a typical 15-page MSA (~5,000 contract tokens) with 5 risk questions using claude-sonnet-4-6:

  • First question: ~1.2 seconds, includes cache creation overhead
  • Subsequent questions: ~0.7-0.9 seconds each, cache is read
  • Total wall time for 5 questions: ~4-5 seconds
  • Total cost: ~$0.018 (with caching) vs ~$0.075 without

For an async batch of 50 contracts with 5 questions each (concurrency=5), expect ~60-90 seconds wall time. At the blended cost of $0.018 per contract, 50 contracts cost about $0.90. A human paralegal reviewing the same 50 contracts at 30 minutes each and $100/hour would cost $2,500. The comparison is not apples-to-apples (the human review is more thorough), but the AI triage takes over most of the first-pass reading, an estimated 80%, before any human looks at a document.
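The arithmetic behind those figures is simple enough to keep as a small script, so you can plug in your own volumes and rates. Every constant below is taken from the numbers quoted above; none of them is measured here.

```python
CONTRACTS = 50
COST_CACHED = 0.018      # USD per contract, 5 questions, with caching
COST_UNCACHED = 0.075    # USD per contract, 5 questions, without caching
REVIEW_HOURS = 0.5       # paralegal time per contract
HOURLY_RATE = 100.0      # USD per hour

ai_total = round(CONTRACTS * COST_CACHED, 2)
human_total = CONTRACTS * REVIEW_HOURS * HOURLY_RATE
caching_saving = 1 - COST_CACHED / COST_UNCACHED

print(f"AI triage for {CONTRACTS} contracts: ${ai_total:.2f}")
print(f"Paralegal review for {CONTRACTS} contracts: ${human_total:,.2f}")
print(f"Saving from caching on a 5-question run: {caching_saving:.0%}")
```

On these inputs the caching saving for a five-question run works out to about 76%. The cache write is paid once per contract, so the saving grows as you ask more questions per contract.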

For deeper cost modeling patterns, including two-tier model routing, see Part 27: Cut AI Costs: Model Routing and Batching with Claude.

Frequently Asked Questions

Can this replace a lawyer?

No, and it is not designed to. AI contract analysis is a triage and extraction tool. It surfaces the relevant clauses so that a lawyer or procurement officer can review the specific text that matters, instead of reading 40 pages to find three paragraphs. The final legal judgment stays with a human. Always disclose to clients or stakeholders that the extraction is AI-assisted and has been human-reviewed.

How accurate is the verbatim quoting?

In practice, claude-sonnet-4-6 copies verbatim quotes accurately around 92-95% of the time on well-structured contracts. The remaining cases are usually minor whitespace or punctuation variations. The quote verification check catches all of these. For high-stakes contracts, use claude-opus-4-8 and enforce a manual review step whenever quote_verified=false.
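Since the misses are mostly whitespace differences, a whitespace-tolerant search can recover many of them before a result is escalated to manual review. The sketch below is a hypothetical helper, not part of the POC: it builds a regular expression from the words of the quote, allows any run of whitespace between them, and returns offsets into the original contract text. Treat any match found this way as unverified and keep the manual review flag on.

```python
import re
from typing import Optional, Tuple


def locate_quote_loose(contract: str, quote: str) -> Optional[Tuple[int, int]]:
    """Find quote in contract, ignoring differences in whitespace.

    Returns (char_start, char_end) into the original contract string,
    or None when the quote is empty or cannot be found.
    """
    words = quote.split()
    if not words:
        return None
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, contract)
    if match is None:
        return None
    return match.start(), match.end()
```

This does not handle punctuation variations such as straight versus curly quotes; those would need an explicit normalization step on both strings.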

What contract formats are supported?

The POC reads plain text (.txt). With a two-line change you can add PDF support via pdfminer.six for digitally-created PDFs. For scanned PDFs or image-based documents, you need Claude’s vision capability (base64 image blocks), which returns text-region answers rather than character offsets. For Word documents (.docx), use python-docx to extract text before passing to Claude.
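A loader along these lines would cover the three text-based formats. It is a sketch under assumptions: the function name load_contract_text is hypothetical, pdfminer.six and python-docx must be installed separately, and the python-docx branch reads body paragraphs only, so text inside tables is skipped.

```python
from pathlib import Path


def load_contract_text(path: str) -> str:
    """Return the contract as plain text, dispatching on file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".pdf":
        # pip install pdfminer.six (digitally created PDFs only, no OCR)
        from pdfminer.high_level import extract_text
        return extract_text(path)
    if suffix == ".docx":
        # pip install python-docx (body paragraphs only; tables are skipped)
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"Unsupported contract format: {suffix!r}")
```

Character offsets returned by the model refer to this extracted string, not to byte positions in the PDF or Word file, so store the extracted text alongside the results if you need to highlight spans later.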

How do I handle contracts longer than the context window?

Claude’s context window is 200,000 tokens, which covers the vast majority of commercial contracts. A 40-page enterprise agreement is roughly 14,000 tokens, well within the limit. For extremely long documents (multi-exhibit construction contracts, full collective bargaining agreements), chunk by section, run questions against each chunk, and merge results by taking the highest-confidence answer per question. Include section headers in each chunk so the model knows where it is in the document.

Can I use this for non-English contracts?

Claude handles many languages well. The system prompt should be written in the same language as the contract, or you should explicitly ask for bilingual output. Character offsets remain valid regardless of language as long as Python processes the string with consistent encoding (always use encoding="utf-8" when reading files). Test verbatim quoting specifically in your target language before deploying, as character boundary behavior differs for CJK scripts.
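One concrete thing to test: Python string indices count Unicode code points, while a browser UI written in JavaScript indexes strings in UTF-16 code units, and byte-oriented tools count UTF-8 bytes. The three agree for plain ASCII and can diverge for other scripts and symbols. The converters below are an illustrative sketch (hypothetical helper names) for translating a Python offset before handing it to another system.

```python
def to_utf16_offset(text: str, index: int) -> int:
    """Convert a Python code-point index to a UTF-16 code-unit index."""
    return len(text[:index].encode("utf-16-le")) // 2


def to_utf8_offset(text: str, index: int) -> int:
    """Convert a Python code-point index to a UTF-8 byte offset."""
    return len(text[:index].encode("utf-8"))


sample = "契約😀は自動更新"
start = sample.find("自動更新")            # code-point index used by Python slicing
utf16_start = to_utf16_offset(sample, start)
utf8_start = to_utf8_offset(sample, start)
```

In this sample the Python index is 4, the UTF-16 index is 5 (the emoji takes two code units), and the UTF-8 byte offset is 13.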

How do I add new risk questions?

Add a new dict to the RISK_QUESTIONS list with an id and a question field. No other code changes are needed. Good candidates for expansion: data processing and privacy obligations (GDPR/CCPA), SLA remedies and credits, assignment and change-of-control restrictions, non-solicitation and non-compete clauses, and dispute resolution and arbitration requirements.
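For example, adding an assignment question could look like this. The existing entry and the wording of both questions are placeholders, since the actual list lives in the POC script; only the id and question keys come from the description above. The duplicate-id check is an optional extra.

```python
# Placeholder for the list defined in the POC script.
RISK_QUESTIONS = [
    {
        "id": "auto_renewal",
        "question": "Does the contract renew automatically, and what is the opt-out window?",
    },
]

# New risk question: one dict with an id and a question.
RISK_QUESTIONS.append(
    {
        "id": "assignment",
        "question": "Can either party assign the agreement, and is consent "
                    "required on a change of control?",
    }
)

# Optional guard: ids should stay unique so results can be keyed by id.
ids = [q["id"] for q in RISK_QUESTIONS]
if len(ids) != len(set(ids)):
    raise ValueError("Duplicate id in RISK_QUESTIONS")
```

Keep each question narrow and answerable from a single clause; broad questions tend to produce answers that no single verbatim quote can support.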

Is prompt caching safe across different contracts?

Yes. The cache key includes the full content of the cached block. If you send a different contract text, the hash does not match and Claude processes it fresh. The cache is never shared between different contract texts, even in the same session or through the same client. The thing to watch in a long-running worker is not stale hits but expired entries: cached prefixes have a short lifetime (five minutes by default, refreshed on each hit), so run all questions for one contract close together.
