TL;DR
- An AI pull request summary tool reads a Git diff and emits a structured markdown comment covering intent, risk, and test impact in seconds.
- The full Python script calls Claude via the Anthropic SDK, works with both local diff files and the GitHub REST API, and outputs a ready-to-post PR comment body.
- A GitHub Actions snippet wires this into your CI pipeline so every PR gets a summary automatically on open or synchronize.
- Claude Sonnet 4.6 is the right model tier: it handles 4,000-line diffs accurately at roughly $0.002 per PR, well below any manual review overhead.
- Prompt caching on the system prompt cuts repeated costs by 70 to 90 percent when you process many PRs in a batch or re-run on the same diff.
- Common failure modes (hallucinated file names, skipped large hunks, comment spam on re-runs) are all fixable with small guardrails shown in the code.
Why Teams Reach for an AI Pull Request Summary Tool
Code review is expensive. A 2023 survey by LinearB found that the median engineer spends 3.7 hours per week reviewing PRs, and roughly 30 percent of that time is just orientation: reading the diff, tracing which files changed, asking the author what the intent was. That 30 percent is the part an AI pull request summary eliminates.
The pattern is simple. When a PR opens, your CI pipeline posts a comment that tells every reviewer: what changed and why, what could break, and which tests need attention. Reviewers skip the orientation phase and get straight to judgment. Authors stop answering the same “what does this do?” questions in Slack.
This is not about replacing code review. It is about making the first two minutes of every review faster. Senior engineers who tried this at companies like Stripe and Shopify report the same thing: the AI summary is wrong about 10 percent of the time, but even a wrong summary that a reviewer can quickly correct is faster than reading a 300-line diff cold.
Who Benefits Most
- Teams with mixed seniority: Junior reviewers get a starting point. Senior engineers get confirmation when the summary matches their read.
- High-velocity repos: If your team merges 20 or more PRs per day, reviewers cannot afford to context-switch fully into every diff. A summary keeps throughput high.
- Asynchronous teams across time zones: The summary is in the PR before anyone wakes up. No waiting for the author to come online and explain.
- Repos with broad scope: A monorepo PR that touches frontend, backend, and Terraform config benefits most because the surface area is wide and the summarizer can call out cross-cutting concerns.
What a Good AI Pull Request Summary Contains
Not all PR summaries are useful. The worst ones just list the files that changed, which you could read from the diff header yourself. A genuinely useful summary has three sections:
Intent
What problem does this PR solve? What behavior changes for users or systems? This is often obvious from the commit message and title, but the AI can synthesize it from the code when the commit message is vague (“fix bug”, “WIP”, “cleanup”).
Risk
Which changes are non-trivial? Database migrations, dependency upgrades, auth changes, rate limiter tweaks, cache invalidation logic, anything that modifies external API contracts. A reviewer who knows where the risk is concentrated can spend more time there.
Test Impact
What existing tests could break? What new tests are added? What scenarios are not covered? This is the section that saves the most time because it focuses the reviewer’s attention on gaps rather than on tests that already pass.
Architecture of the AI Pull Request Summary Pipeline
The flow has four stages. First, a trigger: either a GitHub Actions workflow responding to pull_request events, or a developer running the script locally for a quick check. Second, diff retrieval: the script calls the GitHub API or reads a local patch file. Third, Claude call: the diff goes into the user turn with a structured system prompt that enforces the three-section output. Fourth, comment post: the result goes back to GitHub via the issues comments API, replacing any previous bot comment so the PR thread stays clean.
Why Claude Handles This Well
Large diffs are not trivial for language models. A 500-file monorepo PR can produce 20,000 lines of diff. Claude’s 200k-token context window means you rarely need to truncate, but you do need to prioritize. The script in this article sends the full diff up to a token budget, then summarizes the remaining files by name only. Claude handles the mixed context gracefully because it can distinguish between fully-seen hunks and file-name-only references.
If you want to understand how Claude handles structured tool output more generally, the structured output guide in Part 3 covers the same pattern with JSON schemas. For this use case, plain markdown output is more practical because GitHub renders it natively in comments.
Building the AI Pull Request Summary Script
Requirements and Installation
The script depends on three packages: the Anthropic SDK, the PyGithub client for GitHub API access, and python-dotenv for local environment variable loading. Python 3.10 or newer is required.
pip install anthropic PyGithub python-dotenv
Create a requirements.txt file in your project root:
anthropic>=0.39.0  # the script calls messages.count_tokens, which 0.28 does not provide
PyGithub>=2.3.0
python-dotenv>=1.0.0
Create a .env file for local development. Never commit this file:
# .env (add this to .gitignore)
ANTHROPIC_API_KEY=sk-ant-...your-key-here...
GITHUB_TOKEN=ghp_...your-pat-here...
# Optional: override defaults
CLAUDE_MODEL=claude-sonnet-4-6
MAX_DIFF_TOKENS=40000
The Full Python Script
Below is the complete, runnable script. It handles both a local diff file (useful for testing) and live GitHub API access. The --dry-run flag prints the comment to stdout instead of posting it, which is essential for CI debugging.
#!/usr/bin/env python3
"""
pr_summarizer.py

Fetches a PR diff (from a local file or the GitHub API), summarizes intent,
risk, and test impact using Claude, and emits a markdown PR comment body.

Usage (local diff file):
    python pr_summarizer.py --diff-file path/to/changes.patch --dry-run

Usage (GitHub API):
    python pr_summarizer.py --repo owner/repo --pr 42
    python pr_summarizer.py --repo owner/repo --pr 42 --dry-run
"""
import argparse
import os
import sys
import textwrap
from datetime import datetime, timezone

import anthropic
from dotenv import load_dotenv

load_dotenv()
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN")
CLAUDE_MODEL = os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-6")
# Approximate token budget for the diff content.
# At ~3 chars/token, 40000 tokens is about 120 KB of diff.
MAX_DIFF_TOKENS = int(os.environ.get("MAX_DIFF_TOKENS", "40000"))
# The bot leaves a comment with this marker so it can find and replace it
# on subsequent runs (avoids comment spam on re-pushes).
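# The marker is a hidden HTML comment. It must be non-empty: an empty string
# is "in" every comment body, so the bot would overwrite the first comment it finds.
BOT_COMMENT_MARKER = "<!-- ai-pr-summary-bot -->"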
SYSTEM_PROMPT = textwrap.dedent("""
    You are a senior staff engineer writing a technical pull-request summary.
    Your audience is experienced engineers who need a fast orientation before
    reviewing the code.

    Given a Git diff, write a structured PR summary in three sections:

    ## Intent
    What does this PR do and why? Synthesize from the diff even if the commit
    message is vague. Focus on the user-visible or system-level behavior change,
    not on implementation details.

    ## Risk Assessment
    Identify specific risk areas: database migrations, dependency changes,
    authentication/authorization changes, external API contract changes, cache
    or rate-limiter logic, performance-sensitive paths. For each risk, write one
    to three sentences. If there are no significant risks, say so explicitly.

    ## Test Impact
    Which existing tests could be affected? What new tests are added? What
    scenarios are visibly not covered? Be specific about file names and
    function names when you can see them in the diff.

    Rules:
    - Write in plain prose with no filler phrases.
    - Use specific file names, function names, and line ranges from the diff.
    - Keep each section to 3 to 8 sentences. Never pad.
    - If you cannot see a complete file because it was truncated, say so.
    - Do NOT write an introductory sentence like "Here is the summary".
    - Start each section directly with the section heading.
""").strip()
# ---------------------------------------------------------------------------
# Diff fetching
# ---------------------------------------------------------------------------
def fetch_diff_from_file(path: str) -> str:
    """Read a local .patch or .diff file."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def fetch_diff_from_github(repo_name: str, pr_number: int) -> tuple[str, str, str]:
    """
    Return (diff_text, pr_title, pr_body) from the GitHub API.

    Requires PyGithub and a GITHUB_TOKEN that can read the repository
    (the repo scope for a classic PAT).
    """
    try:
        from github import Auth, Github
    except ImportError:
        sys.exit("PyGithub is required for GitHub mode: pip install PyGithub")
    if not GITHUB_TOKEN:
        sys.exit("GITHUB_TOKEN is not set. Export it or add it to .env")

    # Auth.Token is the PyGithub 2.x way to pass a token
    gh = Github(auth=Auth.Token(GITHUB_TOKEN))
    repo = gh.get_repo(repo_name)
    pr = repo.get_pull(pr_number)

    # Build a unified diff string from the changed files
    diff_parts = []
    for f in pr.get_files():
        header = f"diff --git a/{f.filename} b/{f.filename}\n"
        header += f"--- a/{f.filename}\n+++ b/{f.filename}\n"
        # patch is None for binary files and for very large file diffs
        patch = f.patch or ""
        diff_parts.append(header + patch)
    return "\n".join(diff_parts), pr.title, pr.body or ""
# ---------------------------------------------------------------------------
# Token-budget truncation
# ---------------------------------------------------------------------------
def truncate_diff(diff: str, max_tokens: int, client: anthropic.Anthropic) -> str:
    """
    Truncate the diff to fit within max_tokens.

    First counts tokens; if over budget, removes file hunks from the end
    one at a time, appending a note that names the omitted files.
    """
    # Split on the "diff --git" boundary so we can drop complete file hunks
    chunks = []
    current = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))

    total_files = len(chunks)
    kept = list(chunks)
    while kept:
        candidate = "".join(kept)
        # count_tokens does not run the model, but each call is still a
        # network round trip, so this loop is slow on very large PRs
        count = client.messages.count_tokens(
            model=CLAUDE_MODEL,
            messages=[{"role": "user", "content": candidate}],
        )
        if count.input_tokens <= max_tokens:
            break
        kept.pop()

    omitted = total_files - len(kept)
    result = "".join(kept)
    if omitted > 0:
        # Name the dropped files so the model can say what it did not see
        omitted_names = [
            chunk.splitlines()[0].rsplit(" b/", 1)[-1]
            for chunk in chunks[len(kept):]
        ]
        result += (
            f"\n\n[TRUNCATED: {omitted} additional file(s) omitted to stay within "
            f"the token budget. The diff above covers {len(kept)} of {total_files} "
            f"changed files. Omitted: {', '.join(omitted_names)}]"
        )
    return result
# ---------------------------------------------------------------------------
# Claude call
# ---------------------------------------------------------------------------
def summarize_diff(diff: str, pr_title: str, pr_body: str,
                   client: anthropic.Anthropic) -> str:
    """
    Call Claude and return the markdown summary string.
    Uses prompt caching on the system prompt for cost efficiency.
    """
    user_content = f"PR title: {pr_title}\n\nPR description:\n{pr_body}\n\nDiff:\n{diff}"
    try:
        msg = client.messages.create(
            model=CLAUDE_MODEL,
            max_tokens=1024,
            system=[
                {
                    "type": "text",
                    "text": SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=[{"role": "user", "content": user_content}],
        )
    except anthropic.APIError as exc:
        sys.exit(f"Claude API error: {exc}")

    summary = msg.content[0].text

    # Log token usage so you can monitor costs in CI
    usage = msg.usage
    # These fields can be missing or None when caching is not in effect
    cache_hit = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_create = getattr(usage, "cache_creation_input_tokens", 0) or 0
    print(
        f"[pr-summarizer] tokens in={usage.input_tokens} out={usage.output_tokens} "
        f"cache_hit={cache_hit} cache_create={cache_create}",
        file=sys.stderr,
    )
    return summary
# ---------------------------------------------------------------------------
# Comment assembly
# ---------------------------------------------------------------------------
def build_comment(summary: str, pr_title: str, model: str) -> str:
    """
    Wrap the raw summary in a GitHub-flavoured markdown comment block
    that includes the bot marker for idempotent replacement.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"{BOT_COMMENT_MARKER}\n"
        f"## AI Pull Request Summary\n\n"
        f"{summary}\n\n"
        f"---\n"
        f"*Generated by [pr-summarizer](https://skillsuites.com/ai-pull-request-summary-claude/) "
        f"using {model} · {timestamp}*"
    )
# ---------------------------------------------------------------------------
# GitHub comment posting
# ---------------------------------------------------------------------------
def post_or_update_comment(repo_name: str, pr_number: int, body: str) -> None:
    """
    Find an existing bot comment and update it, or create a new one.
    This keeps the PR thread clean on repeated runs (e.g., after a force push).
    """
    try:
        from github import Auth, Github
    except ImportError:
        sys.exit("PyGithub is required for GitHub mode: pip install PyGithub")
    if not GITHUB_TOKEN:
        sys.exit("GITHUB_TOKEN is not set.")

    gh = Github(auth=Auth.Token(GITHUB_TOKEN))
    repo = gh.get_repo(repo_name)
    issue = repo.get_issue(pr_number)  # PRs are issues for the comments API

    existing = None
    for comment in issue.get_comments():
        if BOT_COMMENT_MARKER in comment.body:
            existing = comment
            break

    if existing:
        existing.edit(body)
        print(f"[pr-summarizer] Updated comment #{existing.id}", file=sys.stderr)
    else:
        new_comment = issue.create_comment(body)
        print(f"[pr-summarizer] Created comment #{new_comment.id}", file=sys.stderr)
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main() -> None:
    parser = argparse.ArgumentParser(description="AI pull request summarizer using Claude")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--diff-file", metavar="PATH",
                        help="Path to a local .patch or .diff file")
    source.add_argument("--repo", metavar="OWNER/REPO",
                        help="GitHub repository (e.g. acme/backend)")
    parser.add_argument("--pr", type=int, metavar="NUMBER",
                        help="Pull request number (required with --repo)")
    parser.add_argument("--pr-title", default="(no title)",
                        help="PR title when using --diff-file")
    parser.add_argument("--pr-body", default="",
                        help="PR description text when using --diff-file")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print comment to stdout instead of posting")
    args = parser.parse_args()

    if args.repo and not args.pr:
        parser.error("--pr is required when using --repo")
    if args.diff_file and not args.dry_run:
        # A local diff has no PR to post to, so only a dry run makes sense
        parser.error("--diff-file requires --dry-run")
    if not ANTHROPIC_API_KEY:
        sys.exit("ANTHROPIC_API_KEY is not set. Export it or add it to .env")

    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

    # Fetch the diff
    if args.diff_file:
        diff = fetch_diff_from_file(args.diff_file)
        pr_title = args.pr_title
        pr_body = args.pr_body
    else:
        diff, pr_title, pr_body = fetch_diff_from_github(args.repo, args.pr)

    if not diff.strip():
        print("[pr-summarizer] Diff is empty. Nothing to summarize.", file=sys.stderr)
        sys.exit(0)

    # Truncate if needed
    diff = truncate_diff(diff, MAX_DIFF_TOKENS, client)

    # Summarize
    summary = summarize_diff(diff, pr_title, pr_body, client)

    # Build comment
    comment_body = build_comment(summary, pr_title, CLAUDE_MODEL)

    if args.dry_run:
        print(comment_body)
    else:
        post_or_update_comment(args.repo, args.pr, comment_body)
    print("[pr-summarizer] Done.", file=sys.stderr)


if __name__ == "__main__":
    main()
Sample Run with Realistic Input and Output
Given a local patch file for a PR that adds JWT expiry refresh to an API:
$ python pr_summarizer.py \
    --diff-file changes.patch \
    --pr-title "Add sliding JWT refresh window" \
    --pr-body "Tokens were expiring mid-session. This adds a 5-min refresh window." \
    --dry-run
The script prints to stdout:
<!-- ai-pr-summary-bot -->
## AI Pull Request Summary
## Intent
This PR introduces a sliding refresh window for JWT access tokens. When a
request arrives with a token that expires within the next 300 seconds, the
middleware issues a new token with a fresh expiry and attaches it to the
response via the `X-Refreshed-Token` header. Sessions that were previously
cut off mid-flight will now continue transparently. The change affects
`middleware/auth.py`, `services/token_service.py`, and the FastAPI app
factory in `app/main.py`.
## Risk Assessment
The `TokenService.maybe_refresh()` method in `services/token_service.py` now
runs on every authenticated request. At 10,000 RPS this adds one `datetime.utcnow()`
call and a comparison per request: negligible. The bigger risk is the refresh
window itself: a stolen token is valid for up to 5 extra minutes after it
would have expired under the old behaviour. This trade-off should be
explicitly documented in the security model. The `SECRET_KEY` rotation path
in `config/settings.py` is unchanged, so key rollover still invalidates all
tokens immediately.
## Test Impact
`tests/test_auth_middleware.py` gains 4 new parametrized cases covering
the boundary: tokens expiring in 301, 300, 1, and 0 seconds. The existing
test `test_expired_token_returns_401` now fails under the new logic for the
300s case and must be updated to expect a 200 with the refresh header.
No tests cover the concurrent refresh scenario where two simultaneous
requests both hit the refresh window: this is an uncovered race condition.
---
*Generated by pr-summarizer using claude-sonnet-4-6 · 2026-06-04 09:12 UTC*
Token usage logged to stderr:
[pr-summarizer] tokens in=1842 out=387 cache_hit=0 cache_create=312
[pr-summarizer] Done.
GitHub Actions Integration
The script becomes most valuable when it runs automatically. The workflow below triggers on pull_request events (opened, synchronized, reopened), posts the summary, and is idempotent: re-pushing to the same PR replaces the bot comment rather than adding a new one.
# .github/workflows/pr-summary.yml
name: AI Pull Request Summary

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  pull-requests: write   # needed to post comments
  contents: read

jobs:
  summarize:
    name: Summarize PR with Claude
    runs-on: ubuntu-latest
    # Skip dependabot PRs and draft PRs to save cost
    if: |
      github.actor != 'dependabot[bot]' &&
      github.event.pull_request.draft == false
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history; optional here, the script reads the diff from the GitHub API

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip

      - name: Install dependencies
        run: pip install anthropic PyGithub python-dotenv

      - name: Run PR summarizer
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          CLAUDE_MODEL: claude-sonnet-4-6
          MAX_DIFF_TOKENS: "40000"
        run: |
          python pr_summarizer.py \
            --repo "${{ github.repository }}" \
            --pr "${{ github.event.pull_request.number }}"
Add ANTHROPIC_API_KEY as a repository secret in Settings, then Secrets and variables, then Actions. The GITHUB_TOKEN is automatically provided by Actions with the pull-requests: write permission declared in the workflow file. No other setup is needed.
The if: condition that skips Dependabot PRs is not optional politeness. Dependabot can open dozens of PRs in a single session during a bulk dependency update. Without the guard, you will burn API credits and hit the Anthropic rate limit on PRs that nobody will review manually anyway.
Prompt Design for Consistent AI Pull Request Summaries
The quality of your summaries depends almost entirely on the system prompt. After experimenting with several approaches, the rules that matter most are:
Enforce structure, not style
If you ask Claude to “write a helpful summary,” you get different shapes every time. If you specify exactly three sections with exact heading names, every summary looks the same, and reviewers build a scanning habit. The system prompt above uses imperative rules (“Start each section directly with the section heading”) rather than vague instructions (“be clear and concise”).
Name the audience explicitly
The prompt says “your audience is experienced engineers who need a fast orientation.” Without this, Claude sometimes writes summaries in a tutorial style, explaining what a database transaction is. With it, summaries assume knowledge and stay dense.
Use negative examples for common failure modes
The rule “Do NOT write an introductory sentence like ‘Here is the summary’” prevents a specific GPT-era habit that Claude otherwise sometimes inherits. Adding concrete prohibitions for the exact behaviors you want to suppress is more reliable than general instructions to be direct.
Prompt caching saves real money
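Because the system prompt is the same on every call, you can cache it using Anthropic’s prompt caching feature. The code above wraps the system content in a block with cache_control set to type ephemeral. After the first call writes the cache, subsequent calls that arrive while the cache entry is still alive read the cached prompt at about 10 percent of the normal input-token price (the cache write itself is billed at a small premium). The API only caches prompts above a model-specific minimum length, so check the cache_hit and cache_create values the script logs to confirm that caching is actually taking effect for a prompt this short. On a team running 50 PRs per day, caching the 300-token system prompt saves roughly $0.15 per day at Sonnet 4.6 pricing. Across a year that is over $50 for a single rule you can add in two lines. The prompt caching walkthrough in Part 4 explains the mechanics in detail.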
Model Selection and Cost Estimation
| Model | Good for | Typical diff size | Cost per PR (est.) | Latency |
|---|---|---|---|---|
| claude-haiku-4-5 | Very high volume, simple diffs | Under 200 lines | $0.00015 | 1-2 s |
| claude-sonnet-4-6 | Most production PRs | 200-4,000 lines | $0.0018 | 3-6 s |
| claude-opus-4-8 | Architecture reviews, large refactors | 4,000+ lines | $0.022 | 8-20 s |
The estimates above assume an average diff of 500 input tokens and 350 output tokens with no cache hit. At 50 PRs per day on Sonnet 4.6, that works out to about $2.70 per month (50 PRs × 30 days × $0.0018). At that price, the tool is a rounding error in any engineering budget. Haiku is fast and cheap but misses subtlety in risk assessment. Opus is thorough, but at roughly 12 times the per-PR cost of Sonnet you want to route only the genuinely complex PRs there. A simple heuristic: if the diff has more than 2,000 changed lines or touches more than 10 files, upgrade to Opus. This is the same routing pattern covered in Part 27 on model routing.
| Scenario | Recommended model | Reason |
|---|---|---|
| Hotfix: 1 file, 10 lines | claude-haiku-4-5 | Speed matters; depth not needed |
| Feature: 5-20 files, mixed code + tests | claude-sonnet-4-6 | Best balance for daily throughput |
| Refactor: 50+ files, renamed packages | claude-sonnet-4-6 | Large context handled well |
| Architecture PR: DB schema + API contract + frontend | claude-opus-4-8 | Cross-cutting risk analysis worth the cost |
| Dependabot single-package bump | Skip entirely | Use the if: guard in Actions |
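The routing heuristic above can be sketched as a small function. This is a minimal illustration rather than part of the script: the names count_diff_stats and choose_model are hypothetical, the Opus thresholds come from the heuristic in this section, and the Haiku rule (a single file under 200 changed lines) is an assumption loosely based on the tables. The line count is approximate, since a removed line that itself begins with "--" is treated as a file header.

```python
def count_diff_stats(diff: str) -> tuple[int, int]:
    """Return (files_changed, lines_changed) for a unified diff."""
    files = 0
    lines = 0
    for line in diff.splitlines():
        if line.startswith("diff --git"):
            files += 1
        elif line.startswith(("+++", "---")):
            continue  # per-file header lines, not content changes
        elif line.startswith(("+", "-")):
            lines += 1
    return files, lines


def choose_model(diff: str, max_lines: int = 2000, max_files: int = 10,
                 small_lines: int = 200) -> str:
    """Pick a model ID from the tables above based on diff size."""
    files, lines = count_diff_stats(diff)
    if lines > max_lines or files > max_files:
        return "claude-opus-4-8"
    if files <= 1 and lines < small_lines:
        # Assumption: only tiny single-file changes go to Haiku
        return "claude-haiku-4-5"
    return "claude-sonnet-4-6"
```

In the script, the result could replace the fixed CLAUDE_MODEL value before the Claude call, for example by passing the chosen model into summarize_diff().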
Common Pitfalls and How to Avoid Them
Hallucinated file names
Claude sometimes cites file names that are not in the diff, usually when it is pattern-matching from context clues. The fix is to prepend the user message with an explicit list of changed files: "Changed files: auth.py, api.py, tests/test_api.py". This grounds the model in what it actually has access to. If you also add the rule “only cite file names from the list above” to your system prompt, hallucination rate drops to near zero in practice.
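A minimal sketch of that grounding step follows. The helper names changed_files and with_file_list are hypothetical and not part of the script above; the parsing assumes standard "diff --git a/path b/path" headers and may misread paths that themselves contain " b/".

```python
def changed_files(diff: str) -> list[str]:
    """Collect file paths from the 'diff --git a/... b/...' header lines."""
    files: list[str] = []
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            path = line.rsplit(" b/", 1)[-1]
            if path not in files:
                files.append(path)
    return files


def with_file_list(diff: str) -> str:
    """Prepend an explicit list of changed files to the diff text."""
    files = changed_files(diff)
    listing = ", ".join(files) if files else "(none detected)"
    return f"Changed files: {listing}\n\n{diff}"
```

One way to use it would be to call with_file_list(diff) where summarize_diff() builds user_content, and to pair it with the "only cite file names from the list above" rule in the system prompt.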
Comment spam on re-push
Without the BOT_COMMENT_MARKER pattern, every CI run on the same PR posts a new comment. On a PR where the author pushes five times to address feedback, the thread gets five bot summaries. The script above uses a hidden HTML comment as a marker and replaces the comment in-place. This is the same approach used by popular bots like Codecov and danger.js.
Token overflow on large diffs
A monorepo PR touching hundreds of files can exceed 100,000 tokens. Rather than hard-truncating at a character count (which can cut in the middle of a hunk and produce garbage), the truncate_diff() function in the script above removes whole file hunks from the end and reports the count. Claude can say “I could not see file X; the diff was truncated” rather than producing a confused summary of half a file.
Slow CI when summarizing draft PRs
Draft PRs are often pushed frequently during active development. Running the summarizer on every push to a draft PR wastes credits and adds CI noise. Add github.event.pull_request.draft == false to your if: condition as shown in the workflow above.
Private repositories and token permissions
Depending on repository and organization settings, the GITHUB_TOKEN provided by Actions may have only read access to pull requests by default. You must explicitly declare pull-requests: write in the workflow’s permissions: block, or the comment-posting step will return a 403 with a message ("Resource not accessible by integration") that looks unrelated to permissions.
Extending the Summarizer
Once the basic tool runs cleanly, there are several natural extensions worth considering.
Structured output with tool use
Instead of asking Claude to write markdown with specific headings, you can force structured JSON output using tool use. Define a tool with an input_schema that has intent, risk, and test_impact as required string fields, then pass tool_choice={"type":"tool","name":"pr_summary"}. This gives you a parsed dict you can use to, for example, set a GitHub label (“risk: high”) or feed into a dashboard. The tool use guide in Part 2 covers this pattern with a complete example. For a more detailed look at forcing JSON schemas, see Part 3 on structured output.
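As a rough sketch of that pattern, the block below defines the tool and pulls the parsed fields out of the response. The names PR_SUMMARY_TOOL, summarize_structured, and extract_tool_input are hypothetical; the client argument is assumed to be an anthropic.Anthropic instance, and the system prompt would need to be reworded so that it no longer asks for markdown headings.

```python
PR_SUMMARY_TOOL = {
    "name": "pr_summary",
    "description": "Record a structured summary of a pull request.",
    "input_schema": {
        "type": "object",
        "properties": {
            "intent": {"type": "string"},
            "risk": {"type": "string"},
            "test_impact": {"type": "string"},
        },
        "required": ["intent", "risk", "test_impact"],
    },
}


def extract_tool_input(content_blocks, tool_name: str) -> dict:
    """Return the input dict of the first tool_use block with the given name."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return dict(block.input)
    raise ValueError(f"No tool_use block named {tool_name!r} in the response")


def summarize_structured(client, model: str, system_prompt: str,
                         user_content: str) -> dict:
    """Force a pr_summary tool call and return its parsed arguments."""
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        system=system_prompt,
        tools=[PR_SUMMARY_TOOL],
        tool_choice={"type": "tool", "name": "pr_summary"},
        messages=[{"role": "user", "content": user_content}],
    )
    return extract_tool_input(msg.content, "pr_summary")
```

The returned dict has the keys intent, risk, and test_impact, which can then be rendered to markdown for the comment or passed on to other tooling.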
Risk severity scoring
Add a fourth field to the output schema: risk_level as an enum of low, medium, high. Use this to auto-assign a GitHub label or to route the PR to a mandatory senior reviewer when the level is high. The code review bot in Part 5 uses a similar severity routing pattern.
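The severity field and its routing could look like the sketch below. The property name risk_level and its three values come from the text; the label strings and the function names risk_label and needs_senior_review are assumptions chosen for illustration.

```python
# Schema fragment to add under "properties" in the tool's input_schema
RISK_LEVEL_PROPERTY = {
    "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
}

# Hypothetical label names; use whatever labels exist in your repository
RISK_LABELS = {
    "low": "risk: low",
    "medium": "risk: medium",
    "high": "risk: high",
}


def risk_label(summary: dict) -> str:
    """Map the model's risk_level field to a GitHub label name."""
    level = str(summary.get("risk_level", "")).lower()
    if level not in RISK_LABELS:
        raise ValueError(f"Unexpected risk_level: {level!r}")
    return RISK_LABELS[level]


def needs_senior_review(summary: dict) -> bool:
    """True when the PR should be routed to a mandatory senior reviewer."""
    return str(summary.get("risk_level", "")).lower() == "high"
```

With PyGithub, the label can then be applied through the issue object the script already fetches, for example issue.add_to_labels(risk_label(summary)).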
Streaming output for long diffs
When you are testing interactively on large PRs, streaming lets you see the intent section immediately rather than waiting for the full 1,000-token response. The Anthropic SDK streaming API wraps the call in a context manager:
with client.messages.stream(
    model=CLAUDE_MODEL,
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": user_content}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Replace the client.messages.create call in summarize_diff() with this block when you want real-time output. For CI use, non-streaming is simpler because you need the full text before posting.
Slack or Teams integration
The build_comment() function returns a plain string. You can send the same string to a Slack webhook with a minor reformatting step. On high-traffic repos, routing the summary to a dedicated #pr-summaries channel gives the team an asynchronous feed of all active PRs without logging into GitHub.
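A minimal sketch of that reformatting step, assuming a Slack incoming webhook that accepts a JSON payload with a text field. The function names to_slack_text and post_to_slack are hypothetical, and the webhook URL is something you would supply yourself (for example from a secret); the conversion only covers the constructs the bot comment actually uses.

```python
import json
import re
import urllib.request


def to_slack_text(comment_body: str) -> str:
    """Convert the GitHub comment body into Slack-flavoured text."""
    # Drop the hidden bot marker (an HTML comment)
    text = re.sub(r"<!--.*?-->\n?", "", comment_body, flags=re.DOTALL)
    # Slack has no headings, so render "## Title" as bold "*Title*"
    text = re.sub(r"^#{1,6}\s+(.*)$", r"*\1*", text, flags=re.MULTILINE)
    # Markdown links [label](url) become <url|label>
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"<\2|\1>", text)
    return text.strip()


def post_to_slack(webhook_url: str, comment_body: str) -> None:
    """POST the converted summary to a Slack incoming webhook."""
    payload = json.dumps({"text": to_slack_text(comment_body)}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

Text wrapped in single asterisks is italic on GitHub but bold in Slack, so the footer line will render with different emphasis there.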
Frequently Asked Questions
Will the AI pull request summary replace human code review?
No. The summarizer handles orientation: telling reviewers what changed and where to look. It does not check correctness, identify logical bugs, verify that business requirements are met, or exercise judgment about whether an approach is the right one. Think of it as a structured commit message written by a bot that actually read the diff, not as a reviewer.
How accurate is the risk assessment?
On diffs where the risky code is visible in the context window, accuracy is high. Claude correctly identifies database migrations, auth changes, and external API modifications at roughly the same rate a junior engineer would. The main failure mode is false negatives on implicit risks: a change that is safe in isolation but risky because of how it interacts with another system Claude has not seen. Reviewers should treat the risk section as a starting checklist, not an exhaustive audit.
What happens with binary files, lock files, and generated code?
Binary files appear as a note in the diff header but have no patch content, so Claude will note the file changed but cannot say what changed. Lock files (package-lock.json, poetry.lock) are usually noise: you should filter them out before sending the diff by removing hunks where the filename ends in .lock or is exactly package-lock.json. Generated code (OpenAPI stubs, Protobuf outputs) is similar: filter by filename pattern before sending. The truncation logic in this article removes whole file hunks, so adding a pre-filter step before truncation is clean.
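The pre-filter described above can be sketched as follows. The names NOISE_PATTERNS, split_file_chunks, and filter_noise are hypothetical; the two lock-file patterns come from the text, while the generated-code pattern is only an example you would replace with your own.

```python
import fnmatch

# "*.lock" and "package-lock.json" follow the text; "*_pb2.py" is an example
NOISE_PATTERNS = ("*.lock", "package-lock.json", "*_pb2.py")


def split_file_chunks(diff: str) -> list[str]:
    """Split a unified diff into one chunk per 'diff --git' section."""
    chunks: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def filter_noise(diff: str, patterns=NOISE_PATTERNS) -> str:
    """Drop whole file chunks whose base filename matches a noise pattern."""
    kept = []
    for chunk in split_file_chunks(diff):
        header = chunk.splitlines()[0]
        name = header.rsplit(" b/", 1)[-1].rsplit("/", 1)[-1]
        is_noise = any(fnmatch.fnmatchcase(name, p) for p in patterns)
        if header.startswith("diff --git") and is_noise:
            continue
        kept.append(chunk)
    return "".join(kept)
```

Running filter_noise(diff) before truncate_diff() means the token budget is spent on hand-written code rather than on lock-file churn.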
Can I use this with GitLab or Bitbucket instead of GitHub?
Yes. The diff format is standard Git unified diff regardless of hosting platform. The only GitHub-specific parts are the diff fetch (GitLab has a /merge_requests/:iid/diffs endpoint with the same shape) and the comment post (GitLab uses /merge_requests/:iid/notes). Replace the PyGithub calls with your platform’s SDK or raw HTTP calls and everything else stays the same. The BOT_COMMENT_MARKER pattern works identically.
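As an illustration of the GitLab side, the sketch below turns the per-file entries returned by the merge request diffs endpoint into the same unified diff text the rest of the script expects. It assumes each entry carries old_path, new_path, and diff fields; the function name gitlab_diffs_to_unified is hypothetical, and fetching the entries (with your GitLab SDK or raw HTTP) is left out.

```python
def gitlab_diffs_to_unified(entries: list[dict]) -> str:
    """Build unified diff text from GitLab merge request diff entries."""
    parts = []
    for entry in entries:
        old_path = entry["old_path"]
        new_path = entry["new_path"]
        header = (
            f"diff --git a/{old_path} b/{new_path}\n"
            f"--- a/{old_path}\n"
            f"+++ b/{new_path}\n"
        )
        # The diff field can be empty, for example for binary files
        parts.append(header + (entry.get("diff") or ""))
    return "\n".join(parts)
```

The result can be passed to truncate_diff() and summarize_diff() unchanged, since both only rely on the "diff --git" section boundaries.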
How do I stop the bot from running on merge commits and release PRs?
Add label filters or branch filters to your workflow trigger. For example, adding branches-ignore: ['release/**'] under the pull_request trigger skips release branches. You can also add a label check: if the PR has a no-ai-summary label, exit early. Check github.event.pull_request.labels in the workflow if: condition.
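The same checks can also live in the script as an early exit, which is handy when running it outside Actions. This is a sketch under stated assumptions: should_skip is a hypothetical helper, the no-ai-summary label comes from the text, and the release/* pattern mirrors the branch filter example.

```python
import fnmatch


def should_skip(label_names: list[str], base_branch: str,
                skip_label: str = "no-ai-summary",
                ignored_branches: tuple[str, ...] = ("release/*",)) -> bool:
    """Return True when the PR should not be summarized."""
    if skip_label in label_names:
        return True
    # fnmatch's "*" also matches "/", so release/* covers nested branch names
    return any(fnmatch.fnmatchcase(base_branch, pat) for pat in ignored_branches)
```

With PyGithub, the inputs would come from the pull request object the script already loads: the label names from [label.name for label in pr.labels] and the target branch from pr.base.ref.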
What is the right max_tokens setting for the output?
1,024 tokens covers most summaries with room to spare. A three-section summary at the density shown in the sample output is typically 300 to 500 tokens. If you find summaries being cut off mid-sentence (which means the model hit the limit), increase to 2,048. Do not set it higher than needed: you pay for output tokens, and a padded 2,000-token summary is not more useful than a tight 400-token one.
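To detect a cut-off summary programmatically rather than by eye, you can check the stop_reason on the response, which is "max_tokens" when the model hit the output limit. A small sketch follows; the helper name summary_with_truncation_note is hypothetical, and msg is assumed to be the object returned by client.messages.create.

```python
def summary_with_truncation_note(msg, limit: int) -> str:
    """Return the summary text, flagging it if the output limit was hit."""
    text = msg.content[0].text
    if msg.stop_reason == "max_tokens":
        text += (
            f"\n\n*Note: this summary was cut off at the "
            f"{limit}-token output limit.*"
        )
    return text
```

In summarize_diff() this would replace the plain msg.content[0].text read, so reviewers can see when a summary is incomplete and you know it is time to raise the limit.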
Does the bot need write access to post comments?
Yes. The GitHub Actions GITHUB_TOKEN needs pull-requests: write declared in the workflow’s permissions: block. If you are using a personal access token (PAT) stored as a secret instead, it needs the repo scope. Fine-grained PATs can be scoped to pull request comments only, which is the minimum viable permission for this use case.
Related articles in this series worth reading alongside this one:
- Part 1: What Claude Can Do in Production: A Practical Guide for 2026
- Part 2: Tool Use with Claude: Build Your First Function-Calling POC in Python
- Part 4: Prompt Caching with Claude: Cut Token Costs by 90 Percent
- Part 8: AI Test Generation: Write Pytest Cases with Claude