AI Invoice Data Extraction with Claude Vision

By Asif·June 5, 2026·22 min read·AI Use Cases·Updated June 15, 2026

Series
AI in Production: 30 Real-World Use Cases with Claude

Part 20 of 30

TL;DR

Claude’s vision API accepts base64-encoded images (PNG, JPEG, WEBP, GIF), which covers invoice scans, PDF page renders, and photos taken on a phone.
AI invoice data extraction with Claude returns typed, structured JSON for vendor, line items, totals, and tax in a single API call, with zero training data or fine-tuning required.
Forcing a tool call with tool_choice and a JSON schema gives you typed, schema-shaped objects instead of free text that you then have to parse yourself.
A built-in math validator catches transposition errors and rounding bugs before the record ever reaches your accounting system.
End-to-end latency on a single invoice page with claude-sonnet-4-6 is typically 2 to 4 seconds; cost is under $0.02 per page at current pricing.
The complete, runnable Python project below handles PDF rendering, image encoding, extraction, math validation, and JSON output in a single file of under 400 lines.

The Business Case for AI Invoice Data Extraction

Accounts payable teams still spend a staggering amount of time keying data from PDFs into ERP systems. A typical mid-size company processes between 500 and 5,000 invoices per month. At one to five minutes of manual entry per invoice, that is anywhere from roughly 8 to 420 person-hours per month, not counting the cost of errors caught during audit.

Traditional OCR (Tesseract, AWS Textract, Google Document AI) solves part of the problem: it converts pixels to text. But it does not understand that “Net 30” is a payment term, that “4 x $12.50” should equal “$50.00”, or that the line item labeled “HST” is Canada's harmonized sales tax (13% in Ontario). That gap between raw text and structured, validated business data is exactly where Claude’s vision capability earns its cost.

With AI invoice data extraction using Claude, you send the invoice image and get back a typed JSON object: vendor name, address, line items with descriptions and unit prices, subtotal, tax breakdown, and grand total. The model reasons about the document layout rather than pattern-matching fixed coordinates, which means one extraction prompt handles hundreds of different invoice templates without any template configuration.

Who benefits most

Finance and AP teams at companies receiving invoices from many different vendors with inconsistent formats.
SaaS platforms (spend management, procurement, expense tools) that need to parse user-uploaded receipts and bills.
Accounting firms doing batch bookkeeping for small business clients who email PDFs.
Logistics and construction companies where subcontractors send hand-formatted or hand-written invoices.

What this is not

This approach is not a replacement for a full AP automation platform when you need multi-approval workflows, ERP write-back, or PO matching at scale. It is the extraction layer: the piece that turns an image into a structured record that your existing workflow can then act on.

How Claude Vision Works for Document Extraction

Claude’s multimodal API is straightforward. In a standard messages call, the content field of a user message can be a list of content blocks rather than a plain string. An image block looks like this:

{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "<base64-encoded bytes>"
  }
}

That block can sit alongside a text block in the same message. Claude sees the image and the instruction together, reasons about what it sees, and returns the result. No separate vision endpoint, no separate key, no SDK configuration difference. It is the same client.messages.create call you already use.

Supported media types

Media type string	Covers	Notes
`image/jpeg`	JPEG photos, scans	Most common for phone receipts
`image/png`	PNG exports from PDF renderers	Best quality for text clarity
`image/webp`	WebP exports	Good compression, lossless option
`image/gif`	Single-frame GIF	Rarely used for documents
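The POC below picks the media type from the file extension. If uploads can arrive with misleading names, a slightly more defensive option is to sniff the standard file signatures of the four formats in the table. A minimal sketch; the function names are illustrative, not part of the Anthropic SDK:

```python
import base64
from typing import Optional

# Standard magic-byte signatures for the image formats in the table above.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_media_type(raw: bytes) -> Optional[str]:
    """Return the media type for PNG/JPEG/GIF/WebP bytes, or None if unsupported."""
    for signature, media_type in _SIGNATURES:
        if raw.startswith(signature):
            return media_type
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return None


def image_block_from_bytes(raw: bytes) -> dict:
    """Build a base64 image content block, refusing formats outside the table."""
    media_type = sniff_media_type(raw)
    if media_type is None:
        raise ValueError("Unsupported image format (expected PNG, JPEG, GIF or WebP)")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(raw).decode("ascii"),
        },
    }
```

A TIFF scan (which starts with `II*\x00` or `MM\x00*`) returns None here, so you can convert it to PNG instead of sending a mislabeled block.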

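PDFs cannot be sent in an image block; none of the supported image media types covers them. (The Messages API also has a separate document content block that accepts PDFs directly; this article sticks with rendered page images.) The standard approach, and the one used in the POC below, is to render each PDF page to a PNG using pdf2image (which wraps poppler), then send each page as a separate image block. For multi-page invoices you either send all pages in one request or process them sequentially and merge the extracted data.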
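If you process pages one at a time instead of in a single message, you need a merge step. Below is one possible policy, sketched under the assumption that every page was extracted with the same schema: concatenate line items in page order, take totals from the last page that reports them (totals usually sit at the end), and keep the first non-null value for every other field. `merge_page_extractions` is a hypothetical helper, and real documents may need smarter rules.

```python
from typing import Any


def merge_page_extractions(pages: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-page invoice dicts (same schema) into one invoice dict.

    Policy (an assumption, not a standard):
    - line_items: concatenated in page order
    - totals: taken from the LAST page that reports them
    - every other field: first non-null value seen
    """
    merged: dict[str, Any] = {"line_items": []}
    for page in pages:
        for key, value in page.items():
            if key == "line_items":
                merged["line_items"].extend(value or [])
            elif key == "totals":
                if value:
                    merged["totals"] = value
            elif merged.get(key) is None and value is not None:
                merged[key] = value
    return merged
```

Run the math validator on the merged dict, not on each page, since a single page rarely reconciles on its own.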

Figure 1: End-to-end pipeline from PDF to structured JSON using Claude Vision (input: invoice PDF or image; step 1: pdf2image renders each page; step 2: base64-encode the bytes; step 3: Claude Sonnet 4.6 returns typed JSON with vendor, line items, totals, and tax). PDF pages are rendered to PNG, base64-encoded, and sent as image content blocks.

Structuring the Extraction with Tool Use

Asking Claude to “return JSON” in a text response works, but the formatting can vary: sometimes the JSON is wrapped in a markdown code fence, sometimes it is followed by an explanation. The cleaner approach is structured output via tool use: you describe the schema as a tool, and Claude returns a clean Python dict with no parsing gymnastics.

You define one tool that represents your invoice schema and pass tool_choice={"type": "tool", "name": "extract_invoice"}, which forces Claude to populate that schema rather than write prose. The response then has stop_reason == "tool_use", and the extracted data is the input of the tool_use content block. The model follows the schema closely, but this is not a hard guarantee, so it is still worth checking required fields before trusting the result. This pattern is covered in depth in Part 2 of this series.
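A minimal sketch of that required-field check. It walks the `required` lists of a JSON Schema shaped like the POC's `input_schema` and handles only nested `object` and `array` types, which is all that schema uses; it treats null as missing because no required field in the POC schema is nullable. For full JSON Schema validation, a dedicated library such as `jsonschema` is the better tool.

```python
from typing import Any


def missing_required(schema: dict, data: Any, path: str = "$") -> list[str]:
    """Return JSON paths of required fields that are absent or null.

    Deliberately minimal: understands 'type' (string or list), 'required',
    'properties' and 'items'. Not a general JSON Schema validator.
    """
    problems: list[str] = []
    types = schema.get("type")
    types = types if isinstance(types, list) else [types]

    if isinstance(data, dict) and "object" in types:
        for key in schema.get("required", []):
            if data.get(key) is None:
                problems.append(f"{path}.{key}")
        for key, sub_schema in schema.get("properties", {}).items():
            if data.get(key) is not None:
                problems.extend(missing_required(sub_schema, data[key], f"{path}.{key}"))
    elif isinstance(data, list) and "array" in types:
        for i, item in enumerate(data):
            problems.extend(missing_required(schema.get("items", {}), item, f"{path}[{i}]"))
    return problems
```

With the POC, `missing_required(INVOICE_TOOL["input_schema"], invoice)` returns an empty list for a complete extraction; anything else is a candidate for human review.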

The invoice schema

A production invoice schema needs to capture:

Vendor block: name, address, email, phone, VAT/GST registration number.
Invoice metadata: invoice number, issue date, due date, currency code.
Line items: array of objects, each with description, quantity, unit price, and line total.
Totals: subtotal, tax breakdown (name, rate, amount), shipping, grand total.
Payment terms: net days, bank details if present (the POC below keeps these in a single free-text payment_terms field).

The math validator then recomputes everything from the line items and flags any discrepancy larger than $0.01 (to allow for legitimate rounding). A discrepancy usually means the model misread a digit, the invoice itself has an arithmetic error, or a line item was cut off at a page boundary.

Key idea: The model does not need a separate “OCR step” followed by a “parsing step.” It reads the image, understands the layout and meaning together, and returns a structured object in one shot. This is what makes Claude vision genuinely different from Textract plus a regex parser.

The Complete POC: AI Invoice Data Extraction

Install and requirements

pip install anthropic pdf2image pillow python-dotenv

You also need poppler installed on your system for pdf2image to work:

Ubuntu/Debian: sudo apt-get install poppler-utils
macOS: brew install poppler
Windows: download a poppler binary and add bin/ to your PATH.

requirements.txt

anthropic>=0.28.0
pdf2image>=1.17.0
Pillow>=10.0.0
python-dotenv>=1.0.0

.env.example

# Copy to .env and fill in your key. Never commit .env to source control.
ANTHROPIC_API_KEY=sk-ant-...

Full source: invoice_extractor.py

"""
invoice_extractor.py
--------------------
AI invoice data extraction using Claude Vision.

Usage:
    python invoice_extractor.py path/to/invoice.pdf
    python invoice_extractor.py path/to/invoice.png

Outputs a JSON object to stdout and a .json sidecar file next to the source.
"""

import base64
import json
import os
import sys
from decimal import Decimal, ROUND_HALF_UP
from pathlib import Path
from typing import Optional

import anthropic
from dotenv import load_dotenv

# ── Optional: use pdf2image only when the input is a PDF ──────────────────────
try:
    from pdf2image import convert_from_path
    PDF_SUPPORT = True
except ImportError:
    PDF_SUPPORT = False

load_dotenv()

CLIENT = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
MODEL = "claude-sonnet-4-6"

# ─────────────────────────────────────────────────────────────────────────────
# Schema definition (used as the tool Claude must populate)
# ─────────────────────────────────────────────────────────────────────────────

INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": (
        "Extract all structured data from an invoice image into the defined schema. "
        "Return null for any field that is not present on the invoice. "
        "All monetary amounts must be numbers (floats), not strings. "
        "Dates must be ISO 8601 strings (YYYY-MM-DD) where possible."
    ),
    "input_schema": {
        "type": "object",
        "required": ["vendor", "invoice_number", "issue_date", "currency", "line_items", "totals"],
        "properties": {
            "vendor": {
                "type": "object",
                "required": ["name"],
                "properties": {
                    "name":       {"type": "string"},
                    "address":    {"type": ["string", "null"]},
                    "email":      {"type": ["string", "null"]},
                    "phone":      {"type": ["string", "null"]},
                    "tax_id":     {"type": ["string", "null"],
                                   "description": "VAT / GST / EIN / ABN or equivalent"}
                }
            },
            "bill_to": {
                "type": ["object", "null"],
                "properties": {
                    "name":    {"type": ["string", "null"]},
                    "address": {"type": ["string", "null"]}
                }
            },
            "invoice_number": {"type": "string"},
            "purchase_order": {"type": ["string", "null"]},
            "issue_date":     {"type": "string", "description": "ISO 8601 or as printed"},
            "due_date":       {"type": ["string", "null"]},
            "currency":       {"type": "string", "description": "ISO 4217 code, e.g. USD"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["description", "quantity", "unit_price", "line_total"],
                    "properties": {
                        "description": {"type": "string"},
                        "quantity":    {"type": "number"},
                        "unit_price":  {"type": "number"},
                        "line_total":  {"type": "number"}
                    }
                }
            },
            "totals": {
                "type": "object",
                "required": ["subtotal", "grand_total"],
                "properties": {
                    "subtotal":  {"type": "number"},
                    "discount":  {"type": ["number", "null"]},
                    "shipping":  {"type": ["number", "null"]},
                    "taxes": {
                        "type": ["array", "null"],
                        "items": {
                            "type": "object",
                            "required": ["name", "amount"],
                            "properties": {
                                "name":   {"type": "string",
                                           "description": "e.g. VAT, GST, HST, Sales Tax"},
                                "rate":   {"type": ["number", "null"],
                                           "description": "as a decimal, e.g. 0.10 for 10%"},
                                "amount": {"type": "number"}
                            }
                        }
                    },
                    "grand_total": {"type": "number"}
                }
            },
            "payment_terms": {"type": ["string", "null"]},
            "notes":         {"type": ["string", "null"]}
        }
    }
}

# ─────────────────────────────────────────────────────────────────────────────
# Image helpers
# ─────────────────────────────────────────────────────────────────────────────

def load_image_as_base64(path: Path) -> tuple[str, str]:
    """Return (media_type, base64_data) for a PNG/JPEG/WEBP file."""
    suffix = path.suffix.lower()
    media_map = {
        ".png":  "image/png",
        ".jpg":  "image/jpeg",
        ".jpeg": "image/jpeg",
        ".webp": "image/webp",
        ".gif":  "image/gif",
    }
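    media_type = media_map.get(suffix)
    if media_type is None:
        # Fail loudly instead of sending, say, a TIFF or HEIC file labeled as PNG.
        raise ValueError(f"Unsupported image type {suffix!r}; convert it to PNG or JPEG first")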
    raw = path.read_bytes()
    return media_type, base64.standard_b64encode(raw).decode("utf-8")


def pdf_to_images(pdf_path: Path, dpi: int = 200) -> list[tuple[str, str]]:
    """
    Render each PDF page to a PNG and return a list of (media_type, base64_data).
    dpi=200 is a good balance between legibility and token count.
    """
    if not PDF_SUPPORT:
        raise RuntimeError(
            "pdf2image is not installed. Run: pip install pdf2image\n"
            "You also need poppler: https://poppler.freedesktop.org/"
        )
    from io import BytesIO  # stdlib; only the PDF path needs it

    pages = convert_from_path(str(pdf_path), dpi=dpi)
    results = []
    for page in pages:
        # Convert the PIL Image to PNG bytes in memory
        buf = BytesIO()
        page.save(buf, format="PNG")
        b64 = base64.standard_b64encode(buf.getvalue()).decode("utf-8")
        results.append(("image/png", b64))
    return results


def image_block(media_type: str, data: str) -> dict:
    """Build a Claude vision image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": data,
        }
    }

# ─────────────────────────────────────────────────────────────────────────────
# Extraction
# ─────────────────────────────────────────────────────────────────────────────

def extract_invoice(image_blocks: list[dict]) -> dict:
    """
    Send one or more image blocks to Claude with the invoice extraction tool.
    Returns the raw extracted dict from the tool input.
    Raises RuntimeError if Claude does not call the tool.
    """
    # Build the user message: all image blocks first, then the instruction text
    content: list[dict] = list(image_blocks)
    content.append({
        "type": "text",
        "text": (
            "Extract all invoice data from the image(s) above using the extract_invoice tool. "
            "If multiple pages are shown, treat them as a single invoice document. "
            "For any monetary amount, return a number not a string. "
            "For dates, convert to YYYY-MM-DD format where the format is unambiguous."
        )
    })

    try:
        response = CLIENT.messages.create(
            model=MODEL,
            max_tokens=2048,
            tools=[INVOICE_TOOL],
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=[{"role": "user", "content": content}],
        )
    except anthropic.APIError as exc:
        raise RuntimeError(f"Claude API error: {exc}") from exc

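    # A response cut off at max_tokens may carry an incomplete tool input.
    if response.stop_reason == "max_tokens":
        raise RuntimeError("Extraction hit max_tokens; increase max_tokens for long invoices")

    # Find the tool_use block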
    for block in response.content:
        if block.type == "tool_use" and block.name == "extract_invoice":
            return block.input

    raise RuntimeError(
        f"Claude did not call extract_invoice. stop_reason={response.stop_reason}"
    )

# ─────────────────────────────────────────────────────────────────────────────
# Math validation
# ─────────────────────────────────────────────────────────────────────────────

class ValidationResult:
    def __init__(self):
        self.errors: list[str] = []
        self.warnings: list[str] = []

    @property
    def passed(self) -> bool:
        return len(self.errors) == 0

    def __str__(self):
        lines = []
        if self.errors:
            lines.append("ERRORS:")
            lines.extend(f"  - {e}" for e in self.errors)
        if self.warnings:
            lines.append("WARNINGS:")
            lines.extend(f"  - {w}" for w in self.warnings)
        if not lines:
            lines.append("All math checks passed.")
        return "\n".join(lines)


def _d(value: Optional[float], default: float = 0.0) -> Decimal:
    """Convert a float (or None) to a Decimal with 2dp for safe comparison."""
    if value is None:
        return Decimal(str(default))
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def validate_math(invoice: dict) -> ValidationResult:
    """
    Validate the arithmetic on an extracted invoice dict.

    Checks:
    1. Each line item: quantity * unit_price == line_total (within $0.01)
    2. Sum of line totals == subtotal (within $0.01)
    3. subtotal + taxes + shipping - discount == grand_total (within $0.01)
    """
    result = ValidationResult()
    totals = invoice.get("totals") or {}
    line_items = invoice.get("line_items") or []
    tolerance = Decimal("0.01")

    # --- Check 1: line item arithmetic ---
    computed_subtotal = Decimal("0.00")
    for i, item in enumerate(line_items):
        qty   = _d(item.get("quantity"))
        price = _d(item.get("unit_price"))
        total = _d(item.get("line_total"))
        # Multiply the unrounded values: rounding a unit price such as 0.125
        # to 0.13 first would raise false alarms on large quantities.
        raw_qty   = Decimal(str(item.get("quantity") or 0))
        raw_price = Decimal(str(item.get("unit_price") or 0))
        expected = (raw_qty * raw_price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        if abs(expected - total) > tolerance:
            result.errors.append(
                f"Line item {i+1} ({item.get('description', '?')!r}): "
                f"{qty} x {price} = {expected}, but line_total is {total}"
            )
        computed_subtotal += total

    # --- Check 2: subtotal ---
    stated_subtotal = _d(totals.get("subtotal"))
    if abs(computed_subtotal - stated_subtotal) > tolerance:
        result.errors.append(
            f"Subtotal mismatch: sum of line totals = {computed_subtotal}, "
            f"stated subtotal = {stated_subtotal}"
        )

    # --- Check 3: grand total ---
    tax_total = Decimal("0.00")
    for tax in (totals.get("taxes") or []):
        tax_total += _d(tax.get("amount"))

    shipping = _d(totals.get("shipping"))
    discount = _d(totals.get("discount"))
    computed_grand = (stated_subtotal + tax_total + shipping - discount).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    stated_grand = _d(totals.get("grand_total"))

    if abs(computed_grand - stated_grand) > tolerance:
        result.errors.append(
            f"Grand total mismatch: computed {computed_grand} "
            f"(subtotal={stated_subtotal} + tax={tax_total} + shipping={shipping} - discount={discount}), "
            f"stated grand total = {stated_grand}"
        )

    return result

# ─────────────────────────────────────────────────────────────────────────────
# Entry point
# ─────────────────────────────────────────────────────────────────────────────

def process_file(file_path: str) -> dict:
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")

    print(f"Processing: {path.name}", file=sys.stderr)

    if path.suffix.lower() == ".pdf":
        print("  Rendering PDF pages...", file=sys.stderr)
        images = pdf_to_images(path)
        print(f"  {len(images)} page(s) rendered.", file=sys.stderr)
        blocks = [image_block(mt, data) for mt, data in images]
    else:
        mt, data = load_image_as_base64(path)
        blocks = [image_block(mt, data)]

    print("  Sending to Claude for extraction...", file=sys.stderr)
    invoice = extract_invoice(blocks)

    print("  Validating math...", file=sys.stderr)
    validation = validate_math(invoice)

    output = {
        "invoice": invoice,
        "validation": {
            "passed": validation.passed,
            "errors": validation.errors,
            "warnings": validation.warnings,
        }
    }

    # Write sidecar JSON
    out_path = path.with_suffix(".extracted.json")
    out_path.write_text(json.dumps(output, indent=2, ensure_ascii=False), encoding="utf-8")
    print(f"  Saved: {out_path}", file=sys.stderr)

    return output


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python invoice_extractor.py <invoice.pdf|invoice.png>", file=sys.stderr)
        sys.exit(1)

    result = process_file(sys.argv[1])
    print(json.dumps(result, indent=2, ensure_ascii=False))

    if not result["validation"]["passed"]:
        print("\nValidation FAILED:", file=sys.stderr)
        for err in result["validation"]["errors"]:
            print(f"  {err}", file=sys.stderr)
        sys.exit(2)

Sample run

Given a PNG scan of a simple invoice, the command is:

python invoice_extractor.py sample_invoice.png

Realistic output (stderr lines omitted for brevity):

{
  "invoice": {
    "vendor": {
      "name": "Apex Web Solutions Ltd.",
      "address": "4 Innovation Drive, Toronto, ON M5H 2N2",
      "email": "[email protected]",
      "phone": "+1-416-555-0192",
      "tax_id": "GST/HST: 812 345 678 RT0001"
    },
    "bill_to": {
      "name": "Riverstone Media Inc.",
      "address": "88 Queen Street West, Suite 900, Toronto, ON M5H 2M9"
    },
    "invoice_number": "INV-2026-0451",
    "purchase_order": "PO-RMI-20260115",
    "issue_date": "2026-05-28",
    "due_date": "2026-06-27",
    "currency": "CAD",
    "line_items": [
      {
        "description": "Website redesign - Discovery and wireframes",
        "quantity": 1,
        "unit_price": 2400.00,
        "line_total": 2400.00
      },
      {
        "description": "UI/UX design (8 page templates)",
        "quantity": 8,
        "unit_price": 350.00,
        "line_total": 2800.00
      },
      {
        "description": "Frontend development - React SPA",
        "quantity": 40,
        "unit_price": 120.00,
        "line_total": 4800.00
      },
      {
        "description": "Hosting setup and DNS migration",
        "quantity": 1,
        "unit_price": 250.00,
        "line_total": 250.00
      }
    ],
    "totals": {
      "subtotal": 10250.00,
      "discount": null,
      "shipping": null,
      "taxes": [
        {
          "name": "HST",
          "rate": 0.13,
          "amount": 1332.50
        }
      ],
      "grand_total": 11582.50
    },
    "payment_terms": "Net 30",
    "notes": "Please reference invoice number on all payments. EFT preferred."
  },
  "validation": {
    "passed": true,
    "errors": [],
    "warnings": []
  }
}

Testing the math validator

If the invoice has a typo (say the React SPA line total reads $4900 instead of $4800, while the printed subtotal and grand total are correct), the validator catches it immediately:

Validation FAILED:
  Line item 3 ('Frontend development - React SPA'): 40.00 x 120.00 = 4800.00, but line_total is 4900.00
  Subtotal mismatch: sum of line totals = 10350.00, stated subtotal = 10250.00

You can then flag the invoice for manual review rather than writing the wrong number to your ledger.

Handling Multi-Page Invoices

Most invoices fit on one page, but large purchase orders or detailed service contracts can run to two or three pages. The approach in the POC already handles this: pdf_to_images returns one image per page, and all pages are sent as separate image blocks in a single message. Claude treats the sequence as one document.

Token cost for multi-page

Anthropic's vision documentation (check the current figures for your model) estimates image cost at roughly (width x height) / 750 input tokens, and the API first scales down, preserving aspect ratio, any image whose long edge exceeds 1,568 px or that would cost more than about 1,600 tokens. A typical letter-size invoice rendered at 200 DPI is about 1,700 x 2,200 pixels, so it is downscaled and costs at most about 1,600 tokens per page. At claude-sonnet-4-6's $3.00 per million input tokens, that is roughly $0.005 per page for the image, plus the text tokens for the schema and instruction and the output tokens for the extracted JSON.

For a 500-invoice-per-month batch (about 600 pages), the vision token cost alone is around $3. At 2,000 invoices per month it is roughly $12. Compare that to the hourly rate of a person doing manual entry.

Volume (invoices/month)	Pages (est.)	Vision tokens (~1,600/page)	Approx. vision cost (Sonnet 4.6)	Manual entry cost (at $20/hr, 3 min each)
100	100	160,000	~$0.50	~$100
500	600	960,000	~$2.90	~$500
2,000	2,500	4,000,000	~$12	~$2,000
10,000	13,000	20,800,000	~$62	~$10,000
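To sanity-check figures like these for your own page sizes, here is a small estimator. It assumes the sizing rule from Anthropic's vision documentation at the time of writing (downscale until the long edge is at most 1,568 px and the cost is at most about 1,600 tokens; cost is roughly width x height / 750). Those constants may change or differ by model, so treat them as assumptions, and remember the result covers image tokens only.

```python
import math

# Assumed constants, taken from Anthropic's vision docs at the time of writing.
MAX_LONG_EDGE_PX = 1568
MAX_IMAGE_TOKENS = 1600
PIXELS_PER_TOKEN = 750


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image after server-side downscaling."""
    scale = min(
        1.0,
        MAX_LONG_EDGE_PX / max(width, height),
        # Largest scale that keeps width*height/750 at or under the token cap.
        math.sqrt(MAX_IMAGE_TOKENS * PIXELS_PER_TOKEN / (width * height)),
    )
    w, h = int(width * scale), int(height * scale)
    return math.ceil(w * h / PIXELS_PER_TOKEN)


def monthly_vision_cost(pages: int, width: int, height: int,
                        usd_per_million_input: float = 3.00) -> float:
    """Image-token cost only; schema, prompt and output tokens are extra."""
    return pages * estimate_image_tokens(width, height) * usd_per_million_input / 1_000_000


print(estimate_image_tokens(1700, 2200))               # letter page rendered at 200 DPI
print(round(monthly_vision_cost(600, 1700, 2200), 2))  # ~500 invoices, ~600 pages
```

Comparing the estimate against `response.usage.input_tokens` on a few real calls is a quick way to confirm the assumed constants still hold for your model.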

Integrating with a Batch Pipeline

The single-file POC above is the right starting point for a proof of concept. For a production pipeline you will want a few additions.

Deduplication

Hash the raw file bytes (SHA-256) before sending. Store the hash in your database. If you see the same hash twice, skip the extraction and return the cached result. Duplicate invoices are common in email-based AP workflows.
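A minimal sketch of that deduplication step. The `extract` callable stands in for whatever wraps your API call (for example a thin wrapper around the POC's extraction), and the in-memory dict is a placeholder for a database table keyed by hash; both are assumptions for illustration.

```python
import hashlib
from typing import Callable


def file_sha256(data: bytes) -> str:
    """Hex SHA-256 of the raw file bytes (hash the file, not the base64 text)."""
    return hashlib.sha256(data).hexdigest()


def extract_with_dedup(data: bytes, cache: dict[str, dict],
                       extract: Callable[[bytes], dict]) -> tuple[dict, bool]:
    """Return (result, was_cached); only calls `extract` for unseen files."""
    key = file_sha256(data)
    if key in cache:
        return cache[key], True
    result = extract(data)
    cache[key] = result
    return result, False
```

This only catches byte-identical files. A re-scanned copy of the same invoice hashes differently, so a second check on (vendor, invoice_number) after extraction is a useful backstop.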

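Prompt caching for the tool schema

The tool definition (the JSON schema) is roughly 800 tokens. If you are processing invoices in a tight loop, add a cache_control breakpoint to the tool definition (on the last tool, if you define several), as described in Part 4 of this series. On a cache hit you pay 10% of the input token price for those tokens. One caveat: the API only caches prefixes above a minimum length (1,024 tokens for recent Sonnet models at the time of writing; check the docs for your model), so an 800-token schema on its own may fall short. Putting the breakpoint on a fixed system prompt instead caches the tools and that system prompt together, which can clear the threshold. At high volumes this saves a noticeable amount.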
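A minimal sketch of marking the tool for caching. It uses a trimmed stand-in for the POC's `INVOICE_TOOL` so the snippet runs on its own, and `with_cache_breakpoint` is an illustrative helper name. If your schema is below the minimum cacheable length, the same `{"type": "ephemeral"}` marker can go on a system prompt block instead.

```python
import copy

# Trimmed stand-in for the POC's INVOICE_TOOL so this snippet is self-contained.
INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": "Extract all structured data from an invoice image.",
    "input_schema": {"type": "object", "properties": {}},
}


def with_cache_breakpoint(tools: list[dict]) -> list[dict]:
    """Copy `tools` and mark the last definition with an ephemeral cache breakpoint.

    The breakpoint caches the prompt prefix up to and including that tool,
    so marking the last tool covers every tool definition before it.
    """
    marked = copy.deepcopy(tools)
    marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


# In the POC this would replace tools=[INVOICE_TOOL] (not executed here):
# response = CLIENT.messages.create(model=MODEL, max_tokens=2048,
#     tools=with_cache_breakpoint([INVOICE_TOOL]),
#     tool_choice={"type": "tool", "name": "extract_invoice"},
#     messages=[{"role": "user", "content": content}])
# A non-zero response.usage.cache_read_input_tokens indicates a cache hit.
```

Copying the list keeps the module-level schema untouched, so the same definition can still be used without caching elsewhere.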

Concurrency

The Anthropic API allows concurrent requests, subject to your organization's rate limits. Use asyncio with AsyncAnthropic or a thread pool. At Sonnet 4.6’s typical 2-4 second latency, 10 concurrent requests let you process 150-300 invoices per minute from a single process (each of the 10 slots completes 15 to 30 requests per minute).
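A minimal asyncio sketch of bounded concurrency. `extract_one` is a hypothetical coroutine; in practice it would build the image blocks and `await` an `AsyncAnthropic().messages.create(...)` call with the same arguments as the POC. A semaphore caps how many requests are in flight at once.

```python
import asyncio
from typing import Awaitable, Callable


async def extract_all(paths: list[str],
                      extract_one: Callable[[str], Awaitable[dict]],
                      max_concurrency: int = 10) -> list:
    """Run extract_one over all paths with at most max_concurrency in flight.

    Results come back in the same order as `paths`. Failures are returned as
    exception objects (return_exceptions=True) so one bad file does not abort
    the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> dict:
        async with semaphore:
            return await extract_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths), return_exceptions=True)
```

Call it with `asyncio.run(extract_all(paths, extract_one))`, then send any exception entries to a retry or review queue.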

Routing by confidence

If validation fails (math errors) or any required field is null, route the invoice to a human review queue rather than writing it to the ERP; if you add per-field confidence scores to the schema, low scores can trigger the same route. This is the same routing pattern covered in Part 14 on email triage: reduce each extraction to a pass/fail decision and queue it accordingly.
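A minimal sketch of that pass/fail decision, assuming the result has the shape returned by the POC's `process_file()` (`{"invoice": ..., "validation": ...}`). The route names are placeholders for your own queues.

```python
from typing import Any

# Top-level fields the POC schema marks as required.
REQUIRED_FIELDS = ("vendor", "invoice_number", "issue_date", "currency",
                   "line_items", "totals")


def route(output: dict[str, Any]) -> str:
    """Return "erp" or "human_review" for one process_file()-style result."""
    invoice = output.get("invoice") or {}
    validation = output.get("validation") or {}
    if not validation.get("passed", False):
        return "human_review"   # the math did not reconcile
    if any(invoice.get(name) in (None, "", []) for name in REQUIRED_FIELDS):
        return "human_review"   # a required field came back empty
    return "erp"
```

Keeping the rule this blunt is deliberate: anything ambiguous goes to a person, and you can loosen the rule once you have measured how often reviewers simply approve what the model extracted.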

Figure 2: Production batch pipeline (incoming invoice → hash check; duplicates take the cache-hit path and skip the API; new files go to Claude Vision extract_invoice → math validation; pass → ERP write, fail → human review). Deduplication avoids redundant API calls; math validation gates whether the record goes directly to the ERP or to a human review queue.

AI Invoice Data Extraction: Common Pitfalls

1. Image resolution too low

Rendering a PDF at 72 DPI gives a blurry image where small text (especially numeric digits) becomes ambiguous. 150 DPI is the minimum; 200 DPI is the sweet spot for most invoices. 300 DPI mostly adds file size and upload time without meaningfully improving extraction quality on normal printed text. The exception is handwritten invoices, where 300 DPI helps.
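The pixel size of a rendered page is just the page size in inches times the DPI, which is where figures like 1,700 x 2,200 (letter at 200 DPI) come from. A small helper for checking the trade-off, assuming the standard US Letter (8.5 x 11 in) and ISO A4 (210 x 297 mm) sizes; renderers may round by a pixel differently:

```python
# Page sizes in millimetres: US Letter is 8.5 x 11 in, A4 is ISO 216.
PAGE_SIZES_MM = {"letter": (215.9, 279.4), "a4": (210.0, 297.0)}


def render_size(page: str, dpi: int) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at `dpi`."""
    width_mm, height_mm = PAGE_SIZES_MM[page]
    return round(width_mm / 25.4 * dpi), round(height_mm / 25.4 * dpi)


for dpi in (72, 150, 200, 300):
    print(dpi, render_size("letter", dpi), render_size("a4", dpi))
```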

2. Trusting extracted amounts without validation

Even a highly accurate model will occasionally misread a digit (8 vs 3, 6 vs 0). Without the math validator you would silently write the wrong number. Always recompute from first principles and flag discrepancies. The POC’s validator tolerates differences of up to $0.01 on purpose, because legitimate invoices sometimes differ by a cent depending on the rounding convention used.

3. Sending oversized images

A 300 DPI A4 scan is 2,480 x 3,508 pixels (about 8.7 megapixels), and a lossless PNG that size can run to several megabytes. The API scales down any image whose long edge exceeds 1,568 px before the model sees it, so those extra pixels cost upload time and bandwidth without adding legibility. Downsample to 200 DPI (1,654 x 2,339 px) or smaller before sending to keep requests small and fast.
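A minimal sketch of resizing before upload. The two limits are assumptions taken from Anthropic's vision documentation at the time of writing (long edge at most 1,568 px, and about 1,600 tokens, i.e. roughly 1.2 megapixels at width x height / 750); `shrink_for_upload` expects a Pillow image and is only defined, not exercised, here.

```python
import math

# Assumed limits, from Anthropic's vision docs at the time of writing.
MAX_LONG_EDGE_PX = 1568
MAX_PIXELS = 1600 * 750          # ~1,600 tokens at width*height/750


def upload_size(width: int, height: int) -> tuple[int, int]:
    """Largest aspect-preserving size within both limits; never upscales."""
    scale = min(1.0,
                MAX_LONG_EDGE_PX / max(width, height),
                math.sqrt(MAX_PIXELS / (width * height)))
    return max(1, int(width * scale)), max(1, int(height * scale))


def shrink_for_upload(img):
    """Return a resized copy of a Pillow image (requires Pillow).

    Image.resize returns a new image, so the original is left untouched.
    """
    target = upload_size(*img.size)
    return img if target == img.size else img.resize(target)
```

Resizing locally does not change what the model sees compared with letting the API downscale; it only avoids uploading pixels that would be discarded.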

4. Mixing currencies silently

If your vendor operates in one currency and your ERP stores amounts in another, the extracted currency field is critical. Always check it before writing amounts to the ledger. A common failure mode: vendor invoices in CAD, your system assumes USD, and amounts are quietly off by 30%.
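A minimal guard for that check, assuming the extracted dict uses the POC's `currency` and `totals` fields. It deliberately refuses rather than converts: applying an exchange rate is a separate, auditable step (or a human decision). The exception and function names are illustrative.

```python
class CurrencyMismatch(ValueError):
    """Raised instead of posting amounts in the wrong currency."""


def amounts_for_ledger(invoice: dict, ledger_currency: str) -> dict:
    """Return the invoice totals only if its currency matches the ledger's."""
    currency = (invoice.get("currency") or "").strip().upper()
    if not currency:
        raise CurrencyMismatch("Invoice has no currency; refusing to assume one")
    if currency != ledger_currency.strip().upper():
        raise CurrencyMismatch(
            f"Invoice is in {currency}, ledger is {ledger_currency.strip().upper()}"
        )
    return invoice["totals"]
```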

5. Assuming one-to-one page-to-invoice mapping

Some AP workflows combine multiple invoices into one PDF (scanned batch). The extraction model will try to interpret all pages as one invoice. Add a pre-classification step (a cheap Haiku call) that decides whether a document is a single invoice or a batch, before sending to the extraction model.
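A sketch of that pre-classification step, under these assumptions: a hypothetical `classify_document` tool, the `claude-haiku-4-5` model name from the pricing table below, and 1-based page numbers. `classification_request` only builds the keyword arguments for `client.messages.create(...)`; `split_by_first_pages` turns the model's answer into page groups you can extract separately.

```python
CLASSIFY_TOOL = {
    "name": "classify_document",
    "description": "Decide whether the pages form one invoice or a batch of several.",
    "input_schema": {
        "type": "object",
        "required": ["kind", "first_pages"],
        "properties": {
            "kind": {"type": "string", "enum": ["single_invoice", "batch", "not_an_invoice"]},
            "first_pages": {
                "type": "array", "items": {"type": "integer"},
                "description": "1-based page numbers where each invoice starts",
            },
        },
    },
}


def classification_request(image_blocks: list[dict], model: str = "claude-haiku-4-5") -> dict:
    """Keyword arguments for client.messages.create() for the classification call."""
    return {
        "model": model,
        "max_tokens": 256,
        "tools": [CLASSIFY_TOOL],
        "tool_choice": {"type": "tool", "name": "classify_document"},
        "messages": [{
            "role": "user",
            "content": list(image_blocks) + [{
                "type": "text",
                "text": "Classify these pages with the classify_document tool.",
            }],
        }],
    }


def split_by_first_pages(page_count: int, first_pages: list[int]) -> list[list[int]]:
    """Turn first_pages=[1, 3, 4] with 5 pages into [[1, 2], [3], [4, 5]]."""
    starts = sorted({p for p in first_pages if 1 <= p <= page_count}) or [1]
    if starts[0] != 1:
        starts.insert(0, 1)
    bounds = starts + [page_count + 1]
    return [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]
```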

6. Not handling PDF password protection

pdf2image / poppler cannot render password-protected PDFs without the password. Catch the PDFPageCountError that pdf2image raises when poppler cannot read the file, and route these files to a separate queue. In practice, about 2-5% of vendor invoices sent over email are accidentally password-protected.
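A sketch of that routing, assuming pdf2image and poppler are installed (imports happen inside the function so the module loads without them). `UnreadablePDF` is an illustrative exception name; `userpw` is pdf2image's option for supplying a user password when you have one. Note that PDFPageCountError also fires for corrupt files, so the separate queue should not assume every entry is encrypted.

```python
from pathlib import Path
from typing import Optional


class UnreadablePDF(RuntimeError):
    """PDF could not be rendered (often password protection); send to a separate queue."""


def render_pdf(pdf_path: Path, dpi: int = 200, password: Optional[str] = None):
    """Render pages with pdf2image, turning its page-count failure into UnreadablePDF."""
    from pdf2image import convert_from_path
    from pdf2image.exceptions import PDFPageCountError

    try:
        return convert_from_path(str(pdf_path), dpi=dpi, userpw=password)
    except PDFPageCountError as exc:
        # poppler's pdfinfo could not read the file: missing/wrong password or corruption.
        raise UnreadablePDF(f"{pdf_path.name}: {exc}") from exc
```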

7. Latency expectations in interactive flows

At 2-4 seconds per call this works well in a background job. If you are building a real-time upload-and-preview UX (the user uploads an invoice and sees the extracted fields immediately), stream the response so the UI can show progress as the tool input arrives instead of a blank spinner. See Part 26 on streaming for the pattern.
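When streaming a forced tool call, the tool input arrives as `content_block_delta` events whose `delta` has type `input_json_delta` and a `partial_json` string. A minimal sketch that accumulates those fragments, assuming the raw events are available as dicts (the Python SDK's `client.messages.stream()` helper also exposes these deltas; check its docs for the exact event objects). Showing individual fields before the JSON is complete would additionally need a tolerant partial-JSON parser, which this sketch does not include.

```python
import json
from typing import Iterable, Iterator


def tool_input_progress(events: Iterable[dict]) -> Iterator[str]:
    """Yield the accumulated tool-input JSON text after each input_json_delta.

    Intermediate values are usually NOT valid JSON; treat them as a progress
    hint for the UI, not as data.
    """
    buffer = ""
    for event in events:
        delta = event.get("delta", {})
        if event.get("type") == "content_block_delta" and delta.get("type") == "input_json_delta":
            buffer += delta.get("partial_json", "")
            yield buffer


def final_tool_input(events: Iterable[dict]) -> dict:
    """Consume the stream and parse the complete tool input."""
    text = ""
    for text in tool_input_progress(events):
        pass
    return json.loads(text) if text else {}
```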

Cost and Latency

The figures below use current public list pricing. Vision tokens are charged at the same rate as text input tokens.

Model	Input (per M tokens)	Output (per M tokens)	Typical latency (1 invoice page)	Best for
claude-haiku-4-5	$1.00	$5.00	0.8 – 1.5s	Simple, well-formatted invoices at very high volume
claude-sonnet-4-6	$3.00	$15.00	2 – 4s	Production default: handles varied layouts and ambiguous scans
claude-opus-4-8	$15.00	$75.00	6 – 12s	Complex tables, hand-written or heavily decorated PDFs

For most AP automation use cases Sonnet is the right default. Switch to Haiku only after verifying extraction accuracy on your specific invoice corpus. The quality gap matters more than the cost gap at reasonable volumes.

Consider enabling prompt caching on the tool schema if you process invoices in batches. The schema is about 800 tokens; with caching, those tokens cost $0.30 per million on a cache read versus $3.00 on a cold call (cache writes cost 1.25x the base rate, $3.75 per million). At 10,000 invoices per month the savings on schema tokens alone is around $20.
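The arithmetic behind that estimate, as a small function. It assumes the multipliers in Anthropic's pricing documentation at the time of writing (cache writes 1.25x the base input price, cache reads 0.1x) and ignores the minimum cacheable prefix length; `cache_writes` is how many times the cache had to be written or rewritten, for example after it expired between batches.

```python
def schema_cache_savings(calls: int, schema_tokens: int = 800,
                         base_usd_per_mtok: float = 3.00,
                         cache_writes: int = 1) -> float:
    """USD saved on schema tokens by caching, versus sending the schema cold every call."""
    per_token = base_usd_per_mtok / 1_000_000
    cold = calls * schema_tokens * per_token
    cached = (cache_writes * schema_tokens * per_token * 1.25          # writes
              + (calls - cache_writes) * schema_tokens * per_token * 0.10)  # reads
    return cold - cached


print(round(schema_cache_savings(10_000), 2))   # one write, then all reads
```

If every call misses (for example, requests spaced further apart than the cache lifetime), `cache_writes` equals `calls` and the result goes negative, which is why caching pays off in tight batches rather than for a trickle of invoices.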

Beyond Invoices: Related Document Types

The same base64 image block pattern works for any document where you need structured data out of a visual layout:

Purchase orders: same schema, different fields (ship-to vs bill-to, requested delivery date).
Bank statements: extract transaction rows with date, description, debit, credit, running balance.
Receipts: simpler schema (merchant, date, items, total, payment method).
Packing lists: line items with SKU, quantity, weight, dimensions.
Delivery notes: confirm items shipped vs items ordered.

Each case needs its own schema and its own validation logic (for bank statements you validate that debits and credits reconcile to the stated running balance). The extraction pattern stays the same. For a broader look at what Claude’s vision capability can do, the next article in this series covers multimodal image understanding use cases in depth.
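For the bank statement case, the reconciliation check can follow the same Decimal-based approach as the invoice validator. A minimal sketch, assuming each extracted row has optional `debit` and `credit` amounts and a stated `balance`, and that you know the opening balance; the field names are illustrative.

```python
from decimal import Decimal


def check_running_balance(opening: str, rows: list[dict]) -> list[str]:
    """Verify each row's stated balance equals previous balance + credit - debit.

    Values are converted via str() to Decimal. Returns human-readable problems;
    an empty list means the statement reconciles.
    """
    problems = []
    balance = Decimal(str(opening))
    for i, row in enumerate(rows, start=1):
        credit = Decimal(str(row.get("credit") or 0))
        debit = Decimal(str(row.get("debit") or 0))
        balance = balance + credit - debit
        stated = Decimal(str(row["balance"]))
        if abs(balance - stated) > Decimal("0.01"):
            problems.append(f"Row {i}: expected balance {balance}, statement shows {stated}")
            balance = stated  # resync so one bad row doesn't flag every later row
    return problems
```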

If you want to take the pipeline further by having the model call external functions (for example, looking up vendor IDs in your ERP), the tool use patterns in Part 2 and the autonomous agent loop in Part 22 show how to chain those steps.

Frequently Asked Questions

Does this work on handwritten invoices?

Yes, but accuracy drops compared to printed text. Claude handles clean handwriting well. Messy cursive or poor lighting will produce more extraction errors. For handwritten documents, render at 300 DPI instead of 200, and consider running a secondary pass where you ask Claude to double-check any fields it was uncertain about (you can prompt it to include a confidence flag per field in the schema).

Can I process invoices that are already text-layer PDFs (not scans)?

Yes. Rendering a text-layer PDF to an image and sending it through the vision API works well, though you are paying for vision tokens when you could pay cheaper text tokens. For text-layer PDFs you can extract the text with a library like pdfplumber and send the text directly to Claude without any image blocks. The vision approach is better for scanned documents or PDFs where the layout matters (tables, multi-column formatting) and a raw text extract would lose structure.
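A sketch of the text route, assuming pdfplumber is installed (it is not in the POC's requirements.txt). `choose_input` is a rough heuristic whose threshold is a guess to tune on your own corpus: scanned PDFs usually have no text layer at all, or only a few stray characters. The tools and tool_choice from the POC stay the same; only the message content changes.

```python
def extract_text_layer(pdf_path: str) -> str:
    """Concatenate the text layer of every page (requires pdfplumber)."""
    import pdfplumber  # pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def choose_input(text: str, pages: int = 1, min_chars_per_page: int = 200) -> str:
    """'text' if the PDF seems to have a usable text layer, else 'vision'."""
    return "text" if len(text.strip()) >= min_chars_per_page * pages else "vision"


def text_messages(text: str) -> list[dict]:
    """Messages payload that sends extracted text instead of image blocks."""
    return [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": "Extract all invoice data from this invoice text using the "
                    "extract_invoice tool.\n\n<invoice_text>\n" + text + "\n</invoice_text>",
        }],
    }]
```

Because pdfplumber's plain text loses table alignment, a document whose line items come out jumbled is a good candidate to send down the vision path anyway.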

What if an invoice spans multiple pages and one page has the header while another has the line items?

Send all pages as separate image blocks in the same message, in page order. Claude reads them as a single document and assembles the complete invoice from the combined context. The pdf_to_images function in the POC already does this automatically.

How do I handle invoices in languages other than English?

Claude handles most major languages. The key change is the output: you should specify in your prompt that all field names and metadata should still be in English (or whatever your target schema language is), but free-text fields like description can be kept in the source language or translated depending on your preference. Add an explicit instruction like “Return all field names in English. Keep line item descriptions in the original language.” in your extraction prompt.

Is there a file size limit for images?

The Anthropic API accepts base64-encoded images up to approximately 5 MB per image block (after encoding). A 200 DPI A4 PNG is usually 500 KB to 1.5 MB, well within limits. If you hit the limit, downsample or use JPEG compression. An oversized payload is rejected by the API, and the SDK raises the error as an exception.
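A small pre-flight check, assuming the ~5 MB figure applies to the base64 text and reading it conservatively as 5,000,000 bytes. Base64 turns every 3 input bytes into 4 output characters, so the encoded size can be computed without encoding. `to_jpeg_bytes` is an optional fallback that requires Pillow and is only defined here.

```python
from io import BytesIO

# Conservative reading of the ~5 MB per-image limit, applied to the base64 text.
MAX_ENCODED_BYTES = 5_000_000


def base64_length(raw_len: int) -> int:
    """Length of standard (padded) base64 output for raw_len input bytes."""
    return 4 * ((raw_len + 2) // 3)


def fits_image_limit(raw: bytes, limit: int = MAX_ENCODED_BYTES) -> bool:
    """True if the image bytes stay under the limit once base64-encoded."""
    return base64_length(len(raw)) <= limit


def to_jpeg_bytes(img, quality: int = 85) -> bytes:
    """Re-encode a Pillow image as JPEG (requires Pillow); often much smaller than PNG."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```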

How accurate is the extraction compared to purpose-built OCR tools like Textract?

On well-formatted, digitally generated PDF invoices, Claude’s accuracy is comparable to Textract. On scanned invoices with varied layouts (handwriting, stamps, low contrast), Claude tends to outperform template-based OCR because it reasons about context. The real difference is that Claude understands the semantics: it knows a number after “HST” is a tax amount, not a product code, without any template configuration. The tradeoff is cost: Textract is cheaper at very high volumes. A hybrid approach (Textract for structured, Claude for unstructured) is common in production.

Can I add a confidence score per field?

Add a confidence property (number between 0 and 1) to each field in your schema, and ask Claude to populate it. This is a soft signal, not a calibrated probability, but it usefully identifies which fields Claude found ambiguous. Route invoices where any required field has confidence below 0.8 to human review.
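One way to add those scores without rewriting the whole schema is to wrap selected properties as `{"value": ..., "confidence": ...}` objects. A minimal sketch, assuming you pass it the `properties` dict of a schema like the POC's; the helper names are illustrative. Wrapped fields change shape, so downstream code (including the math validator) must read `field["value"]` for them.

```python
import copy


def with_confidence(properties: dict, fields: list[str]) -> dict:
    """Copy a schema's `properties`, wrapping each named field as
    {"value": <original schema>, "confidence": number in [0, 1]}."""
    props = copy.deepcopy(properties)
    for name in fields:
        props[name] = {
            "type": "object",
            "required": ["value", "confidence"],
            "properties": {
                "value": props[name],
                "confidence": {"type": "number", "minimum": 0, "maximum": 1,
                               "description": "Self-reported certainty, not a calibrated probability"},
            },
        }
    return props


def low_confidence(extracted: dict, fields: list[str], threshold: float = 0.8) -> list[str]:
    """Names of wrapped fields whose confidence is missing or below threshold."""
    flagged = []
    for name in fields:
        conf = (extracted.get(name) or {}).get("confidence")
        if conf is None or conf < threshold:
            flagged.append(name)
    return flagged
```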
