TL;DR
- AI infrastructure as code review with Claude can catch security misconfigurations, overly open security groups, and missing tags that manual reviewers often miss under deadline pressure.
- You can send a raw .tf file or a JSON plan (terraform show -json on a saved plan) to Claude and get back structured JSON findings, ready to plug into CI pipelines or Slack alerts.
- The POC uses Claude's tool use to get structured output, so findings are machine-readable and consistent in shape across runs.
- Prompt caching lowers the cost of the repeated system prompt when you review several files within a few minutes of each other. Cache reads are billed at one-tenth of the base input price.
- Claude flags issues in five categories: security, cost, tagging, reliability (including drift risk), and naming. You control which categories matter.
- End-to-end: install, configure, run, and parse findings with a single Python script whose core review function is about 50 lines.
Why Your Terraform Reviews Need a Second Set of Eyes
Every team that operates cloud infrastructure has a Terraform review horror story. A security group with 0.0.0.0/0 ingress on port 22 that slipped through a Friday-afternoon PR. An S3 bucket with public ACLs because the author copy-pasted from a three-year-old blog post. An RDS instance with deletion_protection = false that nobody noticed until the staging environment vanished. These are not hypothetical: they are recurring patterns in incident post-mortems across companies of every size.
Manual Terraform review is slow and inconsistent. Even the best engineers miss things when they are reviewing their fifteenth PR of the week. Static analysis tools like tfsec and Checkov are valuable, but they check against predefined rule sets, and adding a new rule means writing a policy in code. They cannot reason about your specific context: “this environment is production, cost anomalies matter, and every resource must carry a cost-center tag or finance will reject the bill.”
AI infrastructure as code review with Claude fills that gap. You send Claude the Terraform source or plan output, describe your organization’s standards, and get back a structured list of findings with severity, category, and remediation advice. Claude can apply contextual judgment that rule-based scanners cannot.
This article builds a complete, runnable POC. It uses the Anthropic Python SDK, Claude's tool-use feature for reliably structured JSON output, and optional prompt caching to keep costs low when reviewing large plans repeatedly. If you have not read Part 3 on structured JSON output or Part 4 on prompt caching, skim those first, because this article builds on both patterns.
What AI Infrastructure as Code Review Actually Checks
Before writing code, it helps to be specific about what categories of problems Claude can find in Terraform. The list below comes from real review sessions, not marketing copy.
Security Misconfigurations
- Security groups with overly broad ingress rules (0.0.0.0/0 on administrative ports).
- S3 buckets missing server_side_encryption_configuration or with acl = "public-read".
- RDS instances with publicly_accessible = true.
- IAM roles with Action = "*" or Resource = "*" in inline policies.
- ECS tasks running with host-network mode or privileged containers.
- Secrets passed as plain environment variables instead of via SSM or Secrets Manager references.
- KMS keys with key rotation disabled.
Cost and Sizing Issues
- EC2 instances with unexpectedly large instance types (a t3.2xlarge for what looks like a dev environment).
- NAT gateways deployed per-AZ in environments where a single NAT would suffice.
- RDS instances without auto_minor_version_upgrade or with Multi-AZ enabled in non-production.
- CloudWatch log groups with no retention_in_days (they accumulate forever).
- Provisioned DynamoDB capacity with no autoscaling policy.
Tagging and Governance
- Resources missing required tags (env, owner, cost-center, managed-by).
- Tag values that do not match your naming convention.
- Resources that are not covered by a default_tags provider block.
Reliability and Drift Risk
- lifecycle { prevent_destroy = false } on stateful resources.
- Hardcoded AMI IDs that will drift as AWS updates base images.
- Missing depends_on that could cause race conditions during apply.
- Resources referencing data sources that may not exist in all environments.
System Architecture of the Reviewer
The flow is simple by design. Your CI job reads the Terraform source (or the JSON plan produced by terraform show -json), calls the Python script, and receives an array of finding objects. Each finding carries enough data to fail a PR check or route to the correct team.
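The routing half of that flow can be sketched as follows. This minimal example groups findings by their category field and maps each category to an owning team. The channel names in CATEGORY_OWNERS are hypothetical placeholders that the reviewer does not define; swap in your own Slack channels or CODEOWNERS groups.

```python
from collections import defaultdict

# Hypothetical mapping from finding category to the team that owns it.
CATEGORY_OWNERS = {
    "security": "#sec-oncall",
    "cost": "#finops",
    "tagging": "#platform",
    "reliability": "#sre",
    "naming": "#platform",
}


def route_findings(findings: list[dict], default_owner: str = "#platform") -> dict[str, list[str]]:
    """Group finding IDs by owning team, based on each finding's category."""
    routed: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        # Unknown or missing categories fall back to the default owner.
        owner = CATEGORY_OWNERS.get(finding.get("category", ""), default_owner)
        routed[owner].append(finding.get("id", "N/A"))
    return dict(routed)


if __name__ == "__main__":
    sample = [
        {"id": "SEC-001", "category": "security"},
        {"id": "COST-001", "category": "cost"},
        {"id": "TAG-001", "category": "tagging"},
    ]
    print(route_findings(sample))
```

Keeping the mapping in data rather than code means a new team or category only requires a one-line change.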
The Claude Tool-Use Pattern for Structured Findings
The key design decision in this POC is using Claude's tool-use feature (covered in Part 2) rather than asking Claude to produce JSON in the response text. With tool use, Claude fills in the tool's input schema instead of writing prose, and the SDK returns that input as an already-parsed Python dict. There is no JSON to extract from free text and no parse step that can fail. Schema conformance is reliable in practice but not strictly guaranteed by default, so check required fields and enum values before a finding is allowed to gate a merge.
The tool is named report_findings. Its schema describes an array of finding objects plus a summary and two severity counts. Passing tool_choice={"type": "tool", "name": "report_findings"} forces Claude to call that tool and only that tool. As a result, the tool_use block's input is always a dict shaped by our schema, never free-form text.
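Before trusting a payload in a CI gate, you can run a cheap sanity check on it. The sketch below is a minimal, standard-library-only check that assumes the FINDING_SCHEMA fields defined in tf_reviewer.py. It also recomputes the severity counts from the findings instead of trusting the model-reported critical_count and high_count. The helper name validate_result is hypothetical and is not part of the SDK or the POC.

```python
SEVERITIES = {"critical", "high", "medium", "low", "info"}
CATEGORIES = {"security", "cost", "tagging", "reliability", "naming"}
REQUIRED = ("id", "severity", "category", "resource", "title", "description", "remediation")


def validate_result(result: dict) -> list[str]:
    """Return a list of problems in a report_findings payload (empty if it looks valid)."""
    findings = result.get("findings")
    if not isinstance(findings, list):
        return ["'findings' is missing or not a list"]
    problems: list[str] = []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding[{i}] is not an object")
            continue
        missing = [key for key in REQUIRED if not finding.get(key)]
        if missing:
            problems.append(f"finding[{i}] missing or empty: {missing}")
        if finding.get("severity") not in SEVERITIES:
            problems.append(f"finding[{i}] has invalid severity {finding.get('severity')!r}")
        if finding.get("category") not in CATEGORIES:
            problems.append(f"finding[{i}] has invalid category {finding.get('category')!r}")
    # Recompute the counts rather than trusting the model's own tallies.
    dicts = [f for f in findings if isinstance(f, dict)]
    critical = sum(1 for f in dicts if f.get("severity") == "critical")
    high = sum(1 for f in dicts if f.get("severity") == "high")
    if result.get("critical_count") != critical:
        problems.append(f"critical_count={result.get('critical_count')} but {critical} critical findings")
    if result.get("high_count") != high:
        problems.append(f"high_count={result.get('high_count')} but {high} high findings")
    return problems
```

A non-empty return value is a good reason to fail the job with an error code rather than a findings code, because the review itself, not the Terraform, is suspect.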
Complete POC: Terraform Security and Cost Reviewer
Installation
```bash
pip install anthropic python-dotenv
```
requirements.txt
```
anthropic>=0.28.0
python-dotenv>=1.0.0
```
.env Example
```
# .env (never commit this file)
ANTHROPIC_API_KEY=sk-ant-...
```
sample_infra.tf (Test Input)
```hcl
# sample_infra.tf
# Intentionally contains several problems for the reviewer to find.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Problem 1: No default_tags block on the provider - tags must be set per-resource.

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Web server security group"
  vpc_id      = "vpc-0abc1234def56789"

  # Problem 2: SSH open to the world
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Problem 3: All traffic egress - acceptable but worth flagging
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Problem 4: Missing tags
}

resource "aws_instance" "app" {
  ami                    = "ami-0c55b159cbfafe1f0" # Problem 5: hardcoded AMI
  instance_type          = "t3.2xlarge"            # Problem 6: suspiciously large for a sample
  vpc_security_group_ids = [aws_security_group.web.id]
  # Problem 7: no key_name - SSH access via security group is open but no key pair is set

  tags = {
    Name = "app-server"
    # Missing: env, owner, cost-center
  }
}

resource "aws_s3_bucket" "data" {
  bucket = "my-company-data-bucket"
  # Problem 8: no encryption configuration
  # Problem 9: no versioning
}

resource "aws_s3_bucket_acl" "data_acl" {
  bucket = aws_s3_bucket.data.id
  acl    = "public-read" # Problem 10: PUBLIC READ on a bucket named "data"
}

resource "aws_db_instance" "postgres" {
  identifier          = "app-postgres"
  engine              = "postgres"
  engine_version      = "15.3"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  db_name             = "appdb"
  username            = "admin"
  password            = "hardcoded-password-123" # Problem 11: hardcoded secret
  publicly_accessible = true                     # Problem 12: public DB
  skip_final_snapshot = true                     # Problem 13: no snapshot on destroy
  deletion_protection = false                    # Problem 14: deletion protection off

  tags = {
    Name = "app-postgres"
    # Missing required tags
  }
}

resource "aws_cloudwatch_log_group" "app" {
  name = "/app/logs"
  # Problem 15: no retention_in_days - logs accumulate forever
}
```
tf_reviewer.py (Full Source)
```python
#!/usr/bin/env python3
"""
tf_reviewer.py
--------------
AI infrastructure as code review using Claude.
Reads a Terraform file (or a JSON plan) and returns findings as structured JSON.

Usage:
    python tf_reviewer.py path/to/infra.tf
    python tf_reviewer.py path/to/plan.json --format plan

To produce plan.json:
    terraform plan -out=tfplan
    terraform show -json tfplan > plan.json
"""
import argparse
import json
import sys
import textwrap
from pathlib import Path

import anthropic
from dotenv import load_dotenv

load_dotenv()

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
MODEL = "claude-sonnet-4-6"
MAX_TOKENS = 4096

# The JSON schema for a single finding
FINDING_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "description": "Short unique identifier, e.g. SEC-001"
        },
        "severity": {
            "type": "string",
            "enum": ["critical", "high", "medium", "low", "info"],
            "description": "Severity level"
        },
        "category": {
            "type": "string",
            "enum": ["security", "cost", "tagging", "reliability", "naming"],
            "description": "Problem category"
        },
        "resource": {
            "type": "string",
            "description": "The Terraform resource address, e.g. aws_instance.app"
        },
        "attribute": {
            "type": "string",
            "description": "The specific attribute or block that has the problem"
        },
        "title": {
            "type": "string",
            "description": "One-line title of the finding"
        },
        "description": {
            "type": "string",
            "description": "Detailed explanation of why this is a problem"
        },
        "remediation": {
            "type": "string",
            "description": "Concrete Terraform snippet or action to fix the issue"
        }
    },
    "required": ["id", "severity", "category", "resource", "title", "description", "remediation"]
}

# Tool definition: Claude must call this tool with all findings
REPORT_TOOL = {
    "name": "report_findings",
    "description": (
        "Report all security, cost, tagging, reliability, and naming findings "
        "discovered in the Terraform source or plan. Call this tool exactly once "
        "with the complete list of findings."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "findings": {
                "type": "array",
                "items": FINDING_SCHEMA,
                "description": "Array of all findings. May be empty if no issues are found."
            },
            "summary": {
                "type": "string",
                "description": "One-paragraph executive summary of the review results."
            },
            "critical_count": {
                "type": "integer",
                "description": "Number of critical severity findings."
            },
            "high_count": {
                "type": "integer",
                "description": "Number of high severity findings."
            }
        },
        "required": ["findings", "summary", "critical_count", "high_count"]
    }
}

# ---------------------------------------------------------------------------
# System prompt (cached for cost efficiency on large files reviewed repeatedly)
# ---------------------------------------------------------------------------
SYSTEM_PROMPT = textwrap.dedent("""
    You are a senior cloud security and cost engineer who reviews Terraform infrastructure-as-code.
    Your job is to find ALL problems in the provided Terraform source or plan JSON, categorized as:

    SECURITY (highest priority):
    - Security groups with 0.0.0.0/0 ingress on any port, especially 22 (SSH), 3389 (RDP), 5432 (Postgres), 3306 (MySQL)
    - S3 buckets with public ACLs or missing server-side encryption
    - RDS or ElastiCache instances with publicly_accessible = true
    - IAM policies with Action="*" or Resource="*"
    - Hardcoded secrets, passwords, or API keys in resource attributes
    - Resources missing encryption at rest or in transit settings
    - KMS keys with key rotation disabled
    - ECS tasks running privileged containers or with host network mode

    COST:
    - Oversized instance types for the apparent use case
    - CloudWatch log groups without retention_in_days (they accumulate indefinitely)
    - NAT gateways deployed unnecessarily per-AZ in non-production environments
    - Provisioned DynamoDB capacity without autoscaling
    - RDS Multi-AZ enabled in non-production environments
    - Resources without lifecycle rules that could accumulate storage costs

    TAGGING:
    - Resources missing required tags: env, owner, cost-center, managed-by
    - Provider blocks missing default_tags
    - Tag values that look like placeholders or test values

    RELIABILITY:
    - Stateful resources without lifecycle { prevent_destroy = true }
    - RDS instances with skip_final_snapshot = true or deletion_protection = false
    - Hardcoded AMI IDs that will drift
    - Missing depends_on for resources with implicit dependencies

    NAMING:
    - Resource names that do not follow a clear convention
    - Ambiguous or generic names that will cause confusion at scale

    Rules for your review:
    1. Be thorough. Check every resource block.
    2. Assign severity: critical (immediate risk, e.g. exposed secrets or fully open SGs),
       high (significant risk), medium (should fix soon), low (best practice), info (FYI).
    3. For each finding, give a concrete remediation: an actual Terraform snippet or specific action.
    4. Do NOT invent problems that are not actually in the code.
    5. Call report_findings exactly once with the complete array.
""").strip()

# ---------------------------------------------------------------------------
# Core reviewer function
# ---------------------------------------------------------------------------
def review_terraform(
    terraform_content: str,
    file_label: str = "infra.tf",
    input_format: str = "hcl",
) -> dict:
    """
    Send Terraform content to Claude for AI infrastructure as code review.
    input_format is "hcl" for .tf source or "plan" for a JSON plan
    (terraform show -json output).
    Returns the parsed tool input dict from Claude's report_findings call.
    """
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Build the user message, telling Claude which kind of input it is reading
    if input_format == "plan":
        kind, fence_lang = "Terraform JSON plan", "json"
    else:
        kind, fence_lang = "Terraform source file", "hcl"
    user_message = (
        f"Please review the following {kind} (`{file_label}`) "
        "and call report_findings with all issues you discover.\n\n"
        f"```{fence_lang}\n{terraform_content}\n```"
    )

    # Mark the system prompt as cacheable (saves tokens on repeated calls)
    system_block = [
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }
    ]

    try:
        response = client.messages.create(
            model=MODEL,
            max_tokens=MAX_TOKENS,
            system=system_block,
            tools=[REPORT_TOOL],
            tool_choice={"type": "tool", "name": "report_findings"},
            messages=[
                {"role": "user", "content": user_message}
            ]
        )
    except anthropic.APIError as exc:
        print(f"[ERROR] Claude API call failed: {exc}", file=sys.stderr)
        sys.exit(2)  # 2 = tool error; 1 is reserved for "findings above threshold"

    # A response cut off at max_tokens may carry an incomplete findings list
    if response.stop_reason == "max_tokens":
        print("[WARN] Response hit MAX_TOKENS; findings may be incomplete.", file=sys.stderr)

    # Extract the tool_use block
    tool_block = None
    for block in response.content:
        if block.type == "tool_use" and block.name == "report_findings":
            tool_block = block
            break

    if tool_block is None:
        print("[ERROR] Claude did not call report_findings. Response:", file=sys.stderr)
        print(response.model_dump_json(indent=2), file=sys.stderr)
        sys.exit(2)

    # Log token usage (useful for cost tracking); cache fields may be absent or None
    usage = response.usage
    cache_creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    print(
        f"[INFO] Tokens: input={usage.input_tokens} output={usage.output_tokens} "
        f"cache_created={cache_creation} cache_read={cache_read}",
        file=sys.stderr
    )

    return tool_block.input

# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------
SEVERITY_ICON = {
    "critical": "[CRITICAL]",
    "high": "[HIGH] ",
    "medium": "[MEDIUM] ",
    "low": "[LOW] ",
    "info": "[INFO] ",
}


def print_findings(result: dict) -> None:
    """Pretty-print findings to stdout in a CI-friendly format."""
    findings = result.get("findings", [])
    summary = result.get("summary", "")
    critical = result.get("critical_count", 0)
    high = result.get("high_count", 0)

    print("\n" + "=" * 70)
    print(" TERRAFORM AI REVIEW RESULTS")
    print("=" * 70)
    print(f"\nSummary: {summary}\n")
    print(f"Critical findings : {critical}")
    print(f"High findings : {high}")
    print(f"Total findings : {len(findings)}\n")
    print("-" * 70)

    # Sort by severity for readability
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}
    sorted_findings = sorted(findings, key=lambda f: severity_order.get(f.get("severity", "info"), 5))

    for finding in sorted_findings:
        sev = finding.get("severity", "info")
        icon = SEVERITY_ICON.get(sev, "[???] ")
        print(f"{icon} {finding.get('id', 'N/A')} | {finding.get('resource', 'unknown')}")
        print(f" Category : {finding.get('category', '?')}")
        print(f" Title : {finding.get('title', '')}")
        print(f" Detail : {finding.get('description', '')}")
        print(f" Fix : {finding.get('remediation', '')}")
        print()

    print("=" * 70)

# ---------------------------------------------------------------------------
# Exit code logic for CI integration
# ---------------------------------------------------------------------------
def exit_code(result: dict, fail_on: str = "high") -> int:
    """
    Return non-zero if any finding meets or exceeds the fail_on severity.
    fail_on: 'critical' | 'high' | 'medium' | 'low' | 'never'
    """
    if fail_on == "never":
        return 0
    order = ["critical", "high", "medium", "low", "info"]
    threshold = order.index(fail_on) if fail_on in order else 1
    for finding in result.get("findings", []):
        sev = finding.get("severity", "info")
        if sev in order and order.index(sev) <= threshold:
            return 1
    return 0

# ---------------------------------------------------------------------------
# Main entry point
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(description="AI Terraform reviewer powered by Claude")
    parser.add_argument("terraform_file", help="Path to .tf file or JSON plan")
    parser.add_argument(
        "--format", choices=["hcl", "plan"], default="hcl",
        help="Input format: 'hcl' for .tf source, 'plan' for a JSON plan (terraform show -json output)"
    )
    parser.add_argument(
        "--output-json", metavar="FILE",
        help="Write raw JSON findings to this file in addition to stdout"
    )
    parser.add_argument(
        "--fail-on", choices=["critical", "high", "medium", "low", "never"],
        default="high",
        help="Exit with code 1 if any finding at or above this severity is found (default: high)"
    )
    args = parser.parse_args()

    tf_path = Path(args.terraform_file)
    if not tf_path.exists():
        print(f"[ERROR] File not found: {tf_path}", file=sys.stderr)
        sys.exit(2)

    terraform_content = tf_path.read_text(encoding="utf-8")
    print(f"[INFO] Reviewing {tf_path.name} ({len(terraform_content)} chars)...", file=sys.stderr)

    result = review_terraform(terraform_content, file_label=tf_path.name, input_format=args.format)

    # Print human-readable summary
    print_findings(result)

    # Optionally write raw JSON
    if args.output_json:
        out_path = Path(args.output_json)
        out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        print(f"[INFO] JSON findings written to {out_path}", file=sys.stderr)

    # Also print JSON to stdout for piping
    print("\n--- RAW JSON ---")
    print(json.dumps(result, indent=2))

    sys.exit(exit_code(result, fail_on=args.fail_on))


if __name__ == "__main__":
    main()
```
Sample Run and Output
$ python tf_reviewer.py sample_infra.tf --fail-on high
[INFO] Reviewing sample_infra.tf (2847 chars)...
[INFO] Tokens: input=1843 output=2104 cache_created=612 cache_read=0
======================================================================
TERRAFORM AI REVIEW RESULTS
======================================================================
Summary: This Terraform configuration has 4 critical and 5 high severity
findings that must be remediated before deploying to any environment.
The most urgent issues are a hardcoded database password, a publicly
accessible RDS instance, an S3 bucket with public-read ACL, and SSH
open to 0.0.0.0/0. Cost and tagging gaps are also present.
Critical findings : 4
High findings : 5
Total findings : 15
----------------------------------------------------------------------
[CRITICAL] SEC-001 | aws_db_instance.postgres
Category : security
Title : Hardcoded database password in plain text
Detail : The password attribute contains a literal string. This will
be stored in Terraform state (which may be stored in S3 or
a remote backend) and in version control history.
Fix : Use a data source: password = data.aws_secretsmanager_secret_version.db.secret_string
or a variable marked sensitive = true with the value supplied
via environment variable TF_VAR_db_password.
[CRITICAL] SEC-002 | aws_s3_bucket_acl.data_acl
Category : security
Title : S3 bucket set to public-read
Detail : A bucket named 'data' is explicitly granted public-read ACL.
Any object uploaded to this bucket is world-readable.
Fix : Remove the aws_s3_bucket_acl resource entirely. Add:
resource "aws_s3_bucket_public_access_block" "data" {
bucket = aws_s3_bucket.data.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
[CRITICAL] SEC-003 | aws_db_instance.postgres
Category : security
Title : RDS instance publicly accessible
Detail : publicly_accessible = true exposes the database endpoint on
a public IP. Combined with no VPC-level network isolation
visible in this file, this is a direct exposure risk.
Fix : Set publicly_accessible = false. Access the DB from application
servers within the same VPC using private subnets.
[CRITICAL] SEC-004 | aws_security_group.web
Category : security
Title : SSH (port 22) open to 0.0.0.0/0
Detail : Any IP on the internet can attempt SSH connections. This is
one of the most commonly exploited misconfigurations.
Fix : Restrict to your bastion or VPN CIDR:
cidr_blocks = ["10.0.0.0/8"]
or use AWS Systems Manager Session Manager and remove port 22 entirely.
[HIGH] SEC-005 | aws_s3_bucket.data
Category : security
Title : S3 bucket missing server-side encryption
Detail : No server_side_encryption_configuration block is defined.
All objects are stored unencrypted.
Fix : resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
bucket = aws_s3_bucket.data.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
}
}
}
[HIGH] SEC-006 | aws_s3_bucket.data
Category : security
Title : S3 bucket versioning not enabled
Detail : Without versioning, overwritten or deleted objects cannot be
recovered. For a bucket named 'data' this is particularly risky.
Fix : resource "aws_s3_bucket_versioning" "data" {
bucket = aws_s3_bucket.data.id
versioning_configuration { status = "Enabled" }
}
[HIGH] REL-001 | aws_db_instance.postgres
Category : reliability
Title : skip_final_snapshot = true and deletion_protection = false
Detail : Deleting this RDS instance (accidentally or via terraform destroy)
will leave no snapshot. Data is irrecoverable.
Fix : Set deletion_protection = true and skip_final_snapshot = false.
Add final_snapshot_identifier = "app-postgres-final-$(timestamp)".
[HIGH] COST-001 | aws_cloudwatch_log_group.app
Category : cost
Title : CloudWatch log group has no retention policy
Detail : Logs will accumulate indefinitely. At $0.03/GB/month this adds
up quickly for a busy application.
Fix : Add retention_in_days = 30 (or your compliance-required value).
[HIGH] COST-002 | aws_instance.app
Category : cost
Title : Instance type t3.2xlarge may be oversized
Detail : t3.2xlarge (8 vCPU / 32 GB RAM) costs ~$240/month on-demand.
Without context this appears large. Confirm this is intentional.
Fix : Verify sizing. If this is a dev/staging instance, consider
t3.medium or t3.large with auto-stop schedules.
[MEDIUM] TAG-001 | aws_instance.app
Category : tagging
Title : Missing required tags: env, owner, cost-center
Detail : Only the Name tag is set. Finance teams cannot allocate costs
without cost-center. Ops cannot identify owners without owner.
Fix : tags = { Name = "app-server", env = "production",
owner = "platform-team", cost-center = "eng-infra" }
[MEDIUM] TAG-002 | aws_db_instance.postgres
Category : tagging
Title : Missing required tags: env, owner, cost-center
Detail : Same tagging gap as the EC2 instance.
Fix : Add the same required tag set to the RDS resource.
[MEDIUM] TAG-003 | provider.aws
Category : tagging
Title : No default_tags block on the AWS provider
Detail : Without default_tags, every resource must set tags individually.
A provider-level default_tags block ensures consistency.
Fix : provider "aws" { region = "us-east-1"
default_tags { tags = { managed-by = "terraform",
project = "my-app" } } }
[MEDIUM] REL-002 | aws_instance.app
Category : reliability
Title : Hardcoded AMI ID will drift
Detail : ami-0c55b159cbfafe1f0 is a specific AMI version. When AWS
retires it or you move regions, this breaks.
Fix : Use a data source: data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter { name = "name" values = ["amzn2-ami-hvm-*-x86_64-gp2"] }
}
[LOW] REL-003 | aws_security_group.web
Category : security
Title : Security group has no tags
Detail : Untagged security groups are hard to audit and correlate.
Fix : Add tags = { Name = "web-sg", env = "production" }
[INFO] SEC-007 | aws_instance.app
Category : security
Title : No key_name set on EC2 instance
Detail : SSH is open via the security group but no key pair is assigned.
This may be intentional (e.g. using SSM only) but is worth confirming.
Fix : If SSH is needed: key_name = aws_key_pair.deploy.key_name.
If SSM is used exclusively: remove port 22 ingress from the SG.
======================================================================
--- RAW JSON ---
{
"findings": [
{
"id": "SEC-001",
"severity": "critical",
"category": "security",
"resource": "aws_db_instance.postgres",
"attribute": "password",
"title": "Hardcoded database password in plain text",
"description": "The password attribute contains a literal string...",
"remediation": "Use data.aws_secretsmanager_secret_version or a sensitive variable..."
}
// ... 14 more findings
],
"summary": "This Terraform configuration has 4 critical and 5 high severity findings...",
"critical_count": 4,
"high_count": 5
}
The exit code from the script is 1 (because high-severity findings exist), which causes a CI job to fail the PR. Pass --fail-on critical if you only want to block on critical findings during early adoption.
Wiring AI Infrastructure as Code Review Into CI
GitHub Actions Example
```yaml
# .github/workflows/terraform-review.yml
name: Terraform AI Review

on:
  pull_request:
    paths:
      - '**.tf'
      - '**.tfvars'

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install anthropic python-dotenv

      - name: Run Terraform AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python tf_reviewer.py infra/main.tf \
            --output-json findings.json \
            --fail-on high

      - name: Upload findings artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: terraform-review-findings
          path: findings.json
```
GitLab CI/CD Example
```yaml
terraform-ai-review:
  image: python:3.12-slim
  stage: test
  before_script:
    - pip install anthropic python-dotenv
  script:
    - python tf_reviewer.py infra/main.tf --fail-on high
  rules:
    - changes:
        - "**/*.tf"
```
Understanding the Cost and Latency Profile
| Scenario | Model | Approx Input Tokens | Approx Output Tokens | Latency | Cost per Run |
|---|---|---|---|---|---|
| Small .tf file (200 lines) | claude-sonnet-4-6 | 1,800 | 2,000 | 8-12 s | ~$0.024 |
| Large .tf file (1,000 lines) | claude-sonnet-4-6 | 6,000 | 3,500 | 15-25 s | ~$0.060 |
| terraform plan JSON (~2,000 lines) | claude-sonnet-4-6 | 12,000 | 4,000 | 20-35 s | ~$0.118 |
| Same large file, cached system prompt | claude-sonnet-4-6 | 6,000 (1,800 cached) | 3,500 | 12-20 s | ~$0.043 |
| Quick classification only | claude-haiku-4-5 | 6,000 | 2,000 | 5-10 s | ~$0.009 |
At these prices, even if you run the reviewer on every PR that touches a .tf file, a team making 50 PRs per month against a typical-sized module spends roughly $3 to $6 per month. That is less than one hour of engineer time, which is what a thorough manual review actually costs.
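The prompt caching feature (enabled by "cache_control": {"type": "ephemeral"} on the system prompt in our code) pays off when you review multiple files in the same process, or the same file several times, within the cache lifetime. That lifetime is five minutes by default and is refreshed on each cache hit. On the second and subsequent calls, the cached prefix (the tool definition plus the system prompt) is read from cache at one-tenth of the base input-token price. The first call pays a 25 percent premium to write it. Caching only applies above a model-specific minimum prompt length, so check the cache_created and cache_read values the script logs. If both stay at zero, the prompt is too short to be cached and you are paying normal input rates. For a large review job processing 20 modules in a single pipeline run, caching adds up to real savings.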
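The Cost per Run column in the table above is a rough estimate, and prices change. To recompute costs from the usage fields the script already logs, you could use a small helper like the one below. It is a sketch that assumes illustrative prices of $3 per million input tokens and $15 per million output tokens; check Anthropic's current pricing page for the model you actually use. It applies the standard prompt-caching multipliers: cache writes cost 1.25x the base input price and cache reads cost 0.1x. In the API's usage object, input_tokens counts only the uncached portion of the input. The function name estimate_cost_usd is hypothetical.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cache_creation_tokens: int = 0,
    cache_read_tokens: int = 0,
    input_price_per_mtok: float = 3.00,   # assumed USD per million input tokens
    output_price_per_mtok: float = 15.00,  # assumed USD per million output tokens
) -> float:
    """Estimate the cost of one Messages API call from its usage fields.

    input_tokens is the uncached input; cache writes are billed at 1.25x
    and cache reads at 0.1x the base input price.
    """
    per_in = input_price_per_mtok / 1_000_000
    per_out = output_price_per_mtok / 1_000_000
    return (
        input_tokens * per_in
        + cache_creation_tokens * per_in * 1.25
        + cache_read_tokens * per_in * 0.10
        + output_tokens * per_out
    )


if __name__ == "__main__":
    # Usage figures from the sample run earlier in this article.
    cost = estimate_cost_usd(1843, 2104, cache_creation_tokens=612)
    print(f"${cost:.4f}")  # prints $0.0394 with the assumed prices
```

If your recomputed figures differ from the table, trust the recomputation with current prices. Output tokens usually dominate, so capping MAX_TOKENS and asking for terser remediation text are the most effective cost levers.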
Model Choice: When to Use Which Tier
| Model | Best For This Use Case | When to Avoid |
|---|---|---|
| claude-haiku-4-5 | High-volume pre-screening: quickly flag files that have any issue before a deeper review | Final security sign-off; complex policy evaluation; large plan files |
| claude-sonnet-4-6 | Standard PR review, production-grade findings, the default for this POC | Rarely; this model handles almost all Terraform review tasks well |
| claude-opus-4-8 | Auditing very large, complex modules with many interdependencies; compliance certification reviews | Routine PR checks (cost is higher; Sonnet is sufficient for most cases) |
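A practical pattern for cost control: run claude-haiku-4-5 first. If it returns any critical or high findings, escalate to claude-sonnet-4-6 for the full detailed report. On repositories where most PRs are clean, this two-stage approach can cut costs by an estimated 60-70 percent. The trade-off is that a clean Haiku pass becomes the final word on those PRs, so anything the smaller model misses is never escalated.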
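The escalation logic itself is small. The sketch below takes the review call as a parameter so that it stays testable. It assumes a hypothetical wrapper with the signature review(terraform_content, model) -> dict; tf_reviewer.py as written does not have this, because it uses the module-level MODEL constant. You would need to turn that constant into a parameter first.

```python
from typing import Callable

# Model IDs as used elsewhere in this article.
SCREEN_MODEL = "claude-haiku-4-5"
DEEP_MODEL = "claude-sonnet-4-6"

# Hypothetical signature: (terraform_content, model) -> report_findings input dict
ReviewFn = Callable[[str, str], dict]


def two_stage_review(terraform_content: str, review: ReviewFn) -> dict:
    """Screen with the cheaper model; escalate to the stronger model only when needed."""
    first_pass = review(terraform_content, SCREEN_MODEL)
    severities = {f.get("severity") for f in first_pass.get("findings", [])}
    if severities & {"critical", "high"}:
        # Something serious was flagged: get the full, detailed report.
        result = review(terraform_content, DEEP_MODEL)
        result["reviewed_by"] = DEEP_MODEL
        return result
    first_pass["reviewed_by"] = SCREEN_MODEL
    return first_pass
```

Recording reviewed_by in the result makes it easy to audit later how often the cheap pass was the final word.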
Common Pitfalls When Building Terraform AI Reviews
Sending the Wrong Input Format
The Terraform source (.tf files) and the JSON plan contain very different information. Note that terraform plan -json does not write the plan itself; it streams machine-readable log messages. To get the JSON plan, save a plan and render it: run terraform plan -out=tfplan, then terraform show -json tfplan > plan.json. The JSON plan is more complete because it includes the resolved values of variables, data sources, and computed attributes (values known only after apply are marked as unknown). A security group whose CIDR block comes from a variable looks clean in the source file but reveals 0.0.0.0/0 in the plan. Where possible, feed the JSON plan for the most accurate review. If your pipeline does not run terraform plan before the review, at minimum send all .tf files in the module, not just main.tf.
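A full JSON plan also carries a lot of material a reviewer does not need, such as the configuration echo and prior state. Trimming it to the planned changes keeps token counts down. The sketch below keeps only each entry in resource_changes with its address, type, actions, and planned after values, and it drops unchanged ("no-op") resources. This is one reasonable reduction, not the only one. Dropping no-ops means the review focuses on what the PR changes rather than on everything under management.

```python
import json


def summarize_plan(plan: dict) -> list[dict]:
    """Keep only what a reviewer needs from a JSON plan: address, actions, planned values."""
    changes = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if actions == ["no-op"]:
            continue  # unchanged resources add tokens but little review value
        changes.append({
            "address": rc.get("address"),
            "type": rc.get("type"),
            "actions": actions,
            # null for deletions; values unknown until apply are listed in after_unknown, not here
            "after": change.get("after"),
        })
    return changes


if __name__ == "__main__":
    plan = {"resource_changes": [
        {"address": "aws_s3_bucket.data", "type": "aws_s3_bucket",
         "change": {"actions": ["create"], "after": {"bucket": "my-company-data-bucket"}}},
        {"address": "aws_instance.old", "type": "aws_instance",
         "change": {"actions": ["no-op"], "after": {}}},
    ]}
    print(json.dumps(summarize_plan(plan), indent=2))
```

You would pass json.dumps(summarize_plan(plan)) to the reviewer with --format plan semantics in place of the raw file.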
Trusting Findings Blindly in Production Gates
Claude can hallucinate. It occasionally flags things that are not problems (false positives) or misunderstands context-specific design choices. Use the AI reviewer as a signal, not as a final arbiter. The recommended pattern: AI review runs automatically and posts findings as PR comments; a human must explicitly dismiss critical findings before merging. This keeps the process fast while keeping humans in the loop for consequential decisions.
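For the "post findings as PR comments" step, you first need to turn the report_findings payload into a comment body. The sketch below renders it as a Markdown table sorted by severity. Posting the comment through the GitHub REST API, the gh CLI, or GitLab's notes API is left out, because it depends on your platform. The function name findings_to_markdown is hypothetical.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def findings_to_markdown(result: dict) -> str:
    """Render a report_findings payload as a Markdown PR comment body."""
    findings = sorted(
        result.get("findings", []),
        key=lambda f: SEVERITY_RANK.get(f.get("severity", "info"), 5),
    )
    lines = ["## Terraform AI review", "", result.get("summary", ""), ""]
    if not findings:
        lines.append("No findings.")
        return "\n".join(lines)
    lines += ["| Severity | ID | Resource | Title |", "|---|---|---|---|"]
    for f in findings:
        # Escape pipes so a title cannot break the table layout.
        title = f.get("title", "").replace("|", "\\|")
        lines.append(f"| {f.get('severity', '?')} | {f.get('id', 'N/A')} | `{f.get('resource', '?')}` | {title} |")
    lines += ["", "Critical findings must be explicitly dismissed by a reviewer before merge."]
    return "\n".join(lines)
```

Keeping the detailed description and remediation out of the table keeps the comment scannable. Reviewers can open the uploaded findings.json artifact for the full text.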
Sending Secrets to the API
If your Terraform files contain hardcoded secrets (which is itself a finding), those secrets will be sent to Anthropic's API. The same applies to JSON plans, because terraform show -json prints sensitive values in plain text. Before sending any file, consider running a secrets scanner like TruffleHog or detect-secrets and redacting values. Better still, fix the hardcoded secret first. The Claude API is covered by Anthropic's data usage policies and does not train on API calls by default, but it is still better practice not to transmit secrets unnecessarily.
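As a last line of defense, you could mask obvious literals before the API call. The sketch below is a deliberately naive regex heuristic, not a substitute for detect-secrets or TruffleHog. It replaces the quoted value of any attribute whose name contains password, secret, token, api_key, or access_key. The attribute list is an assumption you should adapt. References such as var.db_password are left alone because they are not quoted literals.

```python
import re

# Naive heuristic: secret-looking attribute names assigned a quoted literal on one line.
SECRET_ATTR = re.compile(
    r'^([ \t]*\w*(?:password|secret|token|api_key|access_key)\w*[ \t]*=[ \t]*)"[^"\n]*"',
    re.IGNORECASE | re.MULTILINE,
)


def redact_literals(hcl: str) -> str:
    """Replace quoted literal values of secret-looking attributes with a placeholder."""
    return SECRET_ATTR.sub(r'\1"REDACTED"', hcl)


if __name__ == "__main__":
    print(redact_literals('password = "hardcoded-password-123"\nusername = "admin"\n'))
```

Because the placeholder is still a quoted literal, the reviewer can continue to flag the hardcoded-secret pattern without ever seeing the real value.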
Context Window Limits on Very Large Plans
The JSON plan for a large infrastructure project can exceed 100,000 tokens. Claude Sonnet's context window can hold this, but cost grows linearly with input size. For very large plans, consider chunking by resource type or by module, running a separate review per chunk, and aggregating findings in your script. The tf_reviewer.py scaffold above is easy to extend for this: loop over chunks and concatenate the findings arrays.
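Here is one way to implement the chunk-and-aggregate loop for a JSON plan, sketched under the assumption that chunks follow module boundaries. Entries in resource_changes that belong to a child module carry a module_address field, and root-module entries omit it. The review parameter is any callable that takes the chunk text and returns a report_findings dict, for example lambda text: review_terraform(text, file_label="plan chunk"). Finding IDs are prefixed with the module so they stay unique after merging.

```python
import json
from collections import defaultdict
from typing import Callable


def chunk_by_module(resource_changes: list[dict]) -> dict[str, list[dict]]:
    """Group JSON-plan resource_changes by module_address ("root" for the root module)."""
    chunks: dict[str, list[dict]] = defaultdict(list)
    for rc in resource_changes:
        chunks[rc.get("module_address", "root")].append(rc)
    return dict(chunks)


def review_in_chunks(resource_changes: list[dict], review: Callable[[str], dict]) -> dict:
    """Review each module's changes separately, then merge the findings arrays."""
    chunks = chunk_by_module(resource_changes)
    merged: list[dict] = []
    for module, changes in chunks.items():
        result = review(json.dumps(changes, indent=2))
        for finding in result.get("findings", []):
            finding["id"] = f"{module}:{finding.get('id', 'N/A')}"  # unique across chunks
            merged.append(finding)
    return {
        "findings": merged,
        "summary": f"Merged review of {len(chunks)} module chunk(s).",
        # Recount from the merged list instead of summing per-chunk model tallies.
        "critical_count": sum(f.get("severity") == "critical" for f in merged),
        "high_count": sum(f.get("severity") == "high" for f in merged),
    }
```

The merged dict has the same top-level keys as a single review, so print_findings and exit_code from tf_reviewer.py work on it unchanged. One trade-off is that cross-module issues, such as a security group in one module opened by a rule in another, are harder for a per-chunk review to see.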
Not Providing Organizational Context
The default system prompt in this POC applies general best practices. If your organization has specific rules (“all EC2 instances must use a specific set of AMIs”, “Multi-AZ is required on production RDS”, “VPC flow logs must be enabled”), add them to the system prompt. The more specific you are, the more relevant and actionable the findings become. Generic review prompts produce generic findings.
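One way to add organization rules without losing the caching benefit is to append them to the system prompt rather than the user message. That way they stay part of the stable, cacheable prefix, and a longer prompt is also more likely to clear the minimum cacheable length. The sketch below builds the system blocks for messages.create. The function name and the example rules, taken from the paragraph above, are illustrative.

```python
# Illustrative rules taken from the examples in the paragraph above.
ORG_RULES = [
    "All EC2 instances must use an AMI from the approved set.",
    "Multi-AZ is required on production RDS instances.",
    "VPC flow logs must be enabled.",
]


def build_system_blocks(base_prompt: str, org_rules: list[str]) -> list[dict]:
    """Append organization rules to the base prompt as one cacheable system block."""
    rules = "\n".join(f"- {rule}" for rule in org_rules)
    text = f"{base_prompt}\n\nORGANIZATION-SPECIFIC RULES:\n{rules}"
    return [{"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}]
```

In tf_reviewer.py, you would pass system=build_system_blocks(SYSTEM_PROMPT, ORG_RULES) instead of the hand-built system_block. Keep the rule list stable between runs, because any change invalidates the cache.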
Ignoring the Exit Code
The script exits with code 1 when findings meet the severity threshold. If your CI pipeline does not check exit codes or uses || true to suppress failures, the review becomes informational-only. Wire the exit code to an actual gate: the job must fail, and the failure must block the PR merge.
For more on integrating Claude outputs into automated pipelines, see the Part 5 code review bot and Part 6 on PR summaries, which follow the same structured-output pattern.
Frequently Asked Questions
Can I review Terraform plan output instead of source files?
Yes, and it is often better. Save a plan with terraform plan -out=tfplan, convert it with terraform show -json tfplan > plan.json, and pass that file to tf_reviewer.py. The JSON plan contains resolved variable values, computed attributes, and the final resource graph, which gives Claude much more to work with than the raw HCL source. Security group CIDR blocks that come from variables are invisible in source but fully resolved in the plan. The script handles both formats: the --format plan flag tells Claude that the input is a JSON plan rather than HCL.
How does this compare to tfsec or Checkov?
Static analysis tools like tfsec and Checkov work from a fixed rule database. They are fast, deterministic, and free. Claude-based review is contextual, handles novel patterns, and can apply organization-specific rules described in natural language. The right answer is both: run tfsec/Checkov in a separate step for fast, zero-cost baseline checks, and use the Claude reviewer for deeper contextual analysis, cost reasoning, and custom governance rules. They complement each other rather than compete.
Will Claude miss findings that tfsec would catch?
On well-known, standard misconfigurations, Claude is generally thorough. But it can miss things, especially in complex module compositions where the issue only becomes visible after full resolution. This is why the two-tool approach (static analyzer plus AI reviewer) is the recommended pattern. Treat Claude’s findings as additive to, not a replacement for, your existing static analysis step.
How do I handle Terraform modules that reference external sources?
Claude can only review what you send it. If your root module uses source = "terraform-aws-modules/vpc/aws", Claude sees that reference but not the module’s internals. For thorough coverage: either send the .terraform directory’s cached module source alongside your root module, or focus the review on the arguments your root module passes to the child module (which Claude can still assess for security-sensitive argument values).
Is my Terraform source code stored by Anthropic?
Per Anthropic’s usage policy, API inputs are not used to train models by default. Anthropic retains API request data for a limited period for safety monitoring and abuse prevention. If you are working with highly sensitive infrastructure (defense, financial services, regulated data), review the current Anthropic data usage policy at anthropic.com/privacy and consult your legal team before sending production Terraform plans through any external API.
Can I run this as a scheduled audit rather than a PR check?
Yes. Point the script at your entire Terraform repository, collect all .tf files, review each module directory, and aggregate findings into a weekly report. This is useful for catching configuration drift: your Terraform source may look clean, but someone applied an out-of-band change via the console, and the next terraform plan will show a diff. Combine AI review with terraform plan drift detection for a comprehensive scheduled audit.
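For the scheduled audit, the first step is to discover the module directories. The sketch below walks a repository, treats every directory that contains .tf files as one module, and skips the .terraform caches that terraform init downloads. It also concatenates each module's files with a file-name header so findings can be traced back to a file. Both function names are hypothetical.

```python
from pathlib import Path


def collect_module_dirs(repo_root: str) -> dict[Path, list[Path]]:
    """Map each directory that contains .tf files to its sorted list of .tf files."""
    modules: dict[Path, list[Path]] = {}
    for tf_file in sorted(Path(repo_root).rglob("*.tf")):
        if ".terraform" in tf_file.parts:
            continue  # skip provider and module caches created by terraform init
        modules.setdefault(tf_file.parent, []).append(tf_file)
    return modules


def module_source(files: list[Path]) -> str:
    """Concatenate a module's files, labelling each so findings can be traced back."""
    return "\n\n".join(
        f"# --- file: {p.name} ---\n{p.read_text(encoding='utf-8')}" for p in files
    )
```

A weekly job could then call review_terraform(module_source(files), file_label=str(module_dir)) for each entry and write the combined results to a report.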
What happens if Claude returns unexpected output?
Because the POC uses tool_choice to force Claude to call report_findings, the output is reliably shaped by the schema, though it is still worth validating fields before gating a merge on them. If the API returns an error or Claude fails to call the tool (which should not happen with tool_choice forced), the script detects the failure, prints an error to stderr, and exits with code 2. For transient API errors, the Anthropic Python SDK already retries a limited number of times, configurable through max_retries on the client. For more elaborate retry logic, see the guardrails article (Part 25).
Other articles in this series that apply the same structured-output and tool-use patterns: Part 2: Tool Use with Claude, Part 3: Structured JSON Output, Part 4: Prompt Caching, and Part 5: AI Code Review Bot. Back to the full AI in Production series.
External references: Anthropic tool use documentation, Terraform plan JSON format reference, tfsec documentation, Checkov by Bridgecrew.