In Part 1, we built the foundation — the "what" and "how" of Terraform component testing with two working code examples. Now we go further. In this article, we will plug testing into your CI/CD pipeline, talk honestly about what is worth testing and what is not, explore the pitfalls that trip up most teams, and then spend a significant amount of time on the part that genuinely excites me right now: how AI, AI Agents, and Retrieval-Augmented Generation (RAG) are beginning to transform the way infrastructure tests are written, maintained, and evolved.

That last section is not speculation. I am drawing from patterns we are already seeing in production environments and from my own exploration building AI-assisted infrastructure tooling at Oracle. Let us get into it.

Integrating Testing Into Your CI/CD Pipeline

Testing in isolation is a great start. But the real value comes when these tests run automatically on every pull request to your module repository. Here is how a mature pipeline looks for a Terraform module library.

[Figure: flowchart of the pipeline. Trigger: PR / push to the main branch. Stage 1 (fast, ~30 seconds total): terraform fmt -check, terraform validate, tflint provider rules, checkov security policies. Stage 2 (slower, 3–10 minutes): terraform test component tests and Terratest integration tests against a dedicated test account (AWS account / OCI tenancy). Gate: all checks pass and the merge is allowed; any failure blocks the PR until the module is fixed and re-pushed.]

Fig 3. A production-grade CI/CD pipeline for Terraform module testing. Static checks gate the costlier component tests.

A few important design decisions in this pipeline are worth calling out:

  1. Separate test cloud account: Never run component tests against your production or staging environment. Use a dedicated AWS account or OCI tenancy purely for testing. Resources are created and destroyed frequently — you need clean isolation and predictable costs.
  2. Stage gating: Static analysis is cheap and fast. Component tests are slower and actually deploy infrastructure. Only trigger Stage 2 if Stage 1 is fully green. A PR with a format error does not need to spin up a VPC to know it should be rejected.
  3. Parallelise where possible: Checkov, tflint, and fmt checks have no interdependencies — run them all in parallel. Terratest tests can also be parallelised with Go's t.Parallel() directive as long as resource names are unique per run.
  4. Tag every test resource: Every resource created during tests should carry ManagedBy = "terraform-test" and Environment = "test". Set up a billing alert on those tags so you can catch runaway test infrastructure immediately (a provider-level sketch follows this list).
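
A minimal sketch of point 4, assuming the AWS provider and a hypothetical var.test_run_id injected by the CI job; on OCI the same idea applies via freeform_tags on each resource:

# providers.tf in the test fixture (illustrative only)
provider "aws" {
  region = "eu-west-2"   # example region

  # Every resource the test deploys inherits these tags, so a
  # tag-scoped billing alert catches runaway test infrastructure.
  default_tags {
    tags = {
      ManagedBy   = "terraform-test"
      Environment = "test"
      TestRunId   = var.test_run_id   # hypothetical, passed in per CI run
    }
  }
}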

What Should You Actually Test?

One of the most common questions I get is: "This sounds expensive — we are deploying and destroying real resources for every PR. What is actually worth testing?" Good question. Not every Terraform resource warrants a component test. Here is my practical rule of thumb:

Common Pitfalls to Avoid

Having done this across several organisations, I have seen the same mistakes come up again and again. Save yourself the pain:

· · ·
The Frontier

How AI, AI Agents & RAG Are Transforming Terraform Testing

From manually writing test assertions to intelligent systems that generate, audit, and evolve your test suite — here is where this is heading.

Let me set the scene. You have a module library with forty shared Terraform modules. Writing tests for all forty is several weeks of work. Keeping those tests updated as modules evolve is an ongoing tax on the platform team. When a new compliance requirement comes in — say, your organisation needs to enforce FIPS 140-2 encryption across all storage — you need to audit which modules comply and write new tests for those that do not. Traditionally, that is a manual, tedious, error-prone process.

AI is beginning to change all three of these problems: test generation, test maintenance, and compliance-driven test creation. Let us break down each one with concrete examples.

✍️

AI Test Generation

LLMs read your module code and auto-generate .tftest.hcl or Terratest files with meaningful assertions.

📚

RAG Policy Compliance

Index your org's security docs and let an LLM generate Checkov custom checks directly from your policy language.

🔍

Drift Detection Agent

An autonomous agent compares Terraform state against real cloud resources and suggests remediation test cases.

🔄

Test Maintenance Agent

When a module changes, an agent analyses the diff and updates affected test assertions automatically via PR.

1. AI-Assisted Test Generation

The most immediately practical application is using an LLM to generate the first version of your test file. You feed it your Terraform module code, and it produces a complete .tftest.hcl with meaningful assertions — not just boilerplate.

For example, here is a prompt pattern that works well in practice when integrating this into a GitHub Copilot workflow or an internal developer portal built on an LLM API:

# System prompt for the LLM
You are a senior Platform Engineer specialising in Terraform.
Given a Terraform module, generate a complete .tftest.hcl file that:
1. Tests the module's security contract (not implementation details)
2. Uses uuid() for unique resource names to allow parallel test runs
3. Includes at minimum: one happy-path run block and one edge-case run block
4. Adds a descriptive error_message for every assert block
5. Follows Terraform 1.6+ native test syntax

Focus assertions on: encryption, access controls, tagging compliance,
and any output values that consumers of this module will depend on.

# User message
Here is my module code:
[paste module main.tf, variables.tf, outputs.tf]

Generate the test file.

In my own experiments using Claude via the Anthropic API, this approach produces test files that cover around 70–80% of meaningful assertions on the first pass, with only minor corrections needed for provider-specific edge cases. That is the difference between a two-hour task and a fifteen-minute task — for each module.
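
To make that concrete, here is the shape of file such a prompt tends to produce for a simple object storage module. This is an illustrative sketch rather than verbatim model output, and the names involved (bucket_name, acl, aws_s3_bucket_server_side_encryption_configuration.this, output.bucket_arn) are assumptions about a hypothetical module:

# tests/bucket.tftest.hcl (illustrative AI-drafted first pass)

variables {
  bucket_name = "tftest-${uuid()}"   # unique name so repeated or parallel CI runs do not collide
}

run "happy_path_encrypted_bucket" {
  command = apply

  assert {
    condition     = aws_s3_bucket_server_side_encryption_configuration.this.rule[0].apply_server_side_encryption_by_default[0].sse_algorithm == "aws:kms"
    error_message = "Bucket must use SSE-KMS, not the platform-managed default."
  }

  assert {
    condition     = output.bucket_arn != ""
    error_message = "Module must expose the bucket ARN for downstream consumers."
  }
}

run "rejects_public_acl" {
  command = plan

  variables {
    acl = "public-read"
  }

  expect_failures = [var.acl]   # assumes the module has a validation block on var.acl
}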

🤖 Real-World Application: At Oracle, teams working on OCI-based Terraform modules are beginning to explore LLM-assisted test scaffolding as part of the internal developer platform. The pattern is simple: the module author pushes code, a pipeline job calls an LLM API with the module source, and a draft test PR is automatically raised for the author to review and refine. The author writes the module; the AI writes the first draft of the tests.

2. RAG-Powered Compliance Test Generation

This is the one I am most excited about, and it is more sophisticated than simple code generation. Retrieval-Augmented Generation (RAG) allows you to connect an LLM to your organisation's actual documentation — your security policy PDFs, your compliance frameworks, your internal architecture decision records — and use that knowledge to generate tests grounded in your specific requirements.

Here is how the architecture looks in practice:

[Figure: architecture diagram. Step 1, Ingest: policy sources (SOC 2 / ISO 27001, internal security policy, architecture ADRs, CIS Benchmarks) are chunked and embedded into a vector store such as Chroma or pgvector. Step 2, Retrieve: relevant policy context is pulled for the Terraform module under review. Step 3, Generate: the LLM reads the module code plus the retrieved policy context and produces a Checkov custom check and .tftest.hcl assertions. Step 4, Enforce: policy language becomes test code, automatically.]

Fig 4. RAG-powered compliance test generation. Policy documents are embedded, retrieved at query time, and used to generate context-aware Checkov checks and test assertions.

For example, say your company's information security policy document contains the sentence: "All object storage resources must use server-side encryption with organisation-managed keys (SSE-KMS), not platform-managed keys (SSE-S3)." Traditionally, a platform engineer reads that, interprets it, and manually writes a Checkov custom check. With a RAG pipeline:

  1. The policy PDF is chunked and embedded into a vector store (Chroma, pgvector, or OCI Search with OpenSearch).
  2. When a new Terraform module for object storage is committed, the pipeline retrieves the relevant policy chunks — "storage", "encryption", "key management" — via semantic search.
  3. The LLM receives both the module code and the retrieved policy context, and generates a Checkov custom check that enforces SSE-KMS specifically.
  4. The generated check is raised as a PR for a platform engineer to review before merging into the shared Checkov policy library.

The human is still in the loop — reviewing the generated check before it enforces anything. But the translation from policy language to code is automated. That is a genuinely new capability.

# AI-generated Checkov custom check — from policy context
# Policy: "All S3 buckets must use SSE-KMS with a CMK, not AES256"

from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

class S3BucketKMSEncryptionRequired(BaseResourceCheck):
    def __init__(self):
        name = "Ensure S3 bucket uses KMS CMK — policy ref: SEC-STORE-003"
        id   = "CKV_CUSTOM_S3_001"
        super().__init__(name=name, id=id,
                         categories=[CheckCategories.ENCRYPTION],
                         supported_resources=["aws_s3_bucket_server_side_encryption_configuration"])

    def scan_resource_conf(self, conf):
        rules = conf.get("rule", [])
        for rule in rules:
            sse_config = rule.get("apply_server_side_encryption_by_default", [{}])
            algorithm  = sse_config[0].get("sse_algorithm", [""])
            kms_key    = sse_config[0].get("kms_master_key_id", [""])
            # Must be aws:kms AND have a non-empty CMK ARN
            if algorithm[0] == "aws:kms" and kms_key[0]:
                return CheckResult.PASSED
        return CheckResult.FAILED

scanner = S3BucketKMSEncryptionRequired()

Notice the reference SEC-STORE-003 in the check name. The LLM was able to pull the policy reference number from the retrieved document chunk and embed it directly in the check's identifier. A platform engineer reviewing this can immediately trace the check back to its source policy — something that manual checks often fail to do consistently.
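
For reference, this is the kind of module configuration the generated check is looking for. A minimal sketch, assuming the module takes an organisation-managed key ARN via a var.kms_key_arn input:

# Passes CKV_CUSTOM_S3_001: aws:kms plus an explicit customer-managed key
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.kms_key_arn   # organisation-managed CMK
    }
  }
}

# A rule with sse_algorithm = "AES256" (SSE-S3) and no kms_master_key_id
# would return CheckResult.FAILED.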

3. AI Agents for Drift Detection and Test Suggestion

The most autonomous pattern — and the one that feels most futuristic but is already technically achievable — is the drift-detection AI agent. Here is the problem it solves: Terraform state and real cloud infrastructure diverge. Someone makes a manual change in the AWS console. A cloud provider updates a default setting. A resource gets modified by another automation tool. Your terraform plan shows a diff. But what tests should you write to prevent this class of drift from happening again?

[Figure: the agent loop. 1. Detect: Terraform state no longer matches the actual cloud resource (terraform plan or a drift tool). 2. Analyse: an LLM classifies the drift's root cause and retrieves related policy. 3. Generate: the agent writes a new assert block that catches this class of failure, turning drift into a prevention test. 4. Raise PR: the test is added to the module test file and a human reviews before merge. Every drift incident becomes a permanent regression test, automatically.]

Fig 5. The AI drift detection agent loop. Infrastructure incidents become permanent test assertions with minimal human effort.

The agent workflow looks like this in practice. A scheduled job runs terraform plan against all modules nightly and detects unexpected diffs. For each diff, it packages the before/after state change and sends it to an AI agent. The agent — which has access to your RAG policy store, your module code, and your existing test suite — does three things: it classifies the root cause (manual console change? provider API change? missing guard in the module?), it generates a new assert block for the .tftest.hcl that would have caught this specific drift, and it raises a pull request against the module repository with the suggested fix. A platform engineer reviews and merges. The test suite grows smarter with every incident.
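
To make the "generate" step concrete, suppose the nightly plan showed that versioning on a bucket had been switched off through the console. The PR the agent raises might add an assert along these lines; the resource address (aws_s3_bucket_versioning.this) is a hypothetical stand-in for whatever the module actually names it:

run "versioning_stays_enabled" {
  command = apply

  assert {
    condition     = aws_s3_bucket_versioning.this.versioning_configuration[0].status == "Enabled"
    error_message = "Versioning must remain Enabled; a drift incident showed it being disabled manually in the console."
  }
}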

This is the compound interest of infrastructure testing — every failure makes the suite better at catching the next failure.

4. LLM-Powered Test Maintenance

Finally, there is the maintenance problem. Modules evolve. A new input variable is added. A resource type is replaced with a newer one. Your tests, if not updated, either break or — worse — continue to pass but no longer test the right thing.

An LLM can be integrated directly into your PR review pipeline. When a PR modifies a Terraform module, a CI step calls an LLM with the module diff and the current test file and asks a simple question: "Given this change to the module, are there any assertions in the test file that are now incorrect, incomplete, or missing?" The response is posted as an automated PR comment, flagging specific assert blocks that may need updating.
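
A prompt along the lines of the earlier pattern works here too. This is only a sketch of what that CI step might send, not a prescribed implementation:

# System prompt for the test-maintenance review step
You are reviewing a pull request that modifies a Terraform module.
You receive: (1) the module diff, (2) the current .tftest.hcl file.
For each assert block, say whether it is still correct, now incorrect,
or whether the diff introduces behaviour the tests no longer cover.
Reply as a short list of findings suitable for a PR comment.
Do not rewrite the tests; only flag what needs a human's attention.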

🔑 Key Insight: AI does not replace your test suite. It removes the activation energy required to build and maintain one. The platform engineer still writes the final version, reviews every generated assertion, and owns the quality of the test suite. AI just eliminates the blank-page problem and the tedious diff-analysis — the two biggest reasons infrastructure tests do not get written or updated in practice.

The Business Case for Testing Infrastructure

I understand the pushback. "We are moving fast. We do not have time for this." I have heard it from every team I have worked with — at PALO-IT, at British Telecom, and now building data platform infrastructure at Oracle. And every time, the counterargument is the same: the real Tuesday-afternoon incident that opened Part 1 of this series.

"You do not pay for testing. You pay for not testing — just later, and with interest."

The cost of writing a component test for your S3 module is perhaps two hours of engineering time — or thirty minutes with AI assistance — and a few dollars of cloud compute for each CI run. The cost of a publicly accessible compliance bucket, a deleted RDS instance without a backup, or an IAM role with overly permissive policies? That is measured in sleepless nights, incident reports, eroded customer trust, and sometimes regulatory fines.

Beyond risk mitigation, there is a developer productivity argument too. When engineers can confidently make changes to shared modules knowing a test suite will catch regressions, they move faster. The fear of breaking something downstream is one of the biggest silent taxes on platform engineering teams. Testing removes that fear. AI-assisted testing removes the cost of building the safety net in the first place.

Your Complete Testing Checklist

[Figure: four-panel checklist. STATIC (every commit, under 1 minute): terraform fmt, terraform validate, tflint, checkov, terrascan. COMPONENT (every PR, 3–10 minutes): deploy the module, assert outputs, assert resource config via the cloud API, auto destroy, AI-generated drafts. INTEGRATION (nightly / pre-release, 15–40 minutes): multi-module stack, network flows, cross-service IAM, endpoint health. AI + GOVERNANCE (continuous): RAG policy checks, AI test generation, drift agent PRs, infracost checks, Sentinel / OPA.]

Fig 6. The complete Terraform testing checklist. Column 4 (AI + Governance) is the emerging frontier — start with columns 1–2 and grow from there.

Where to Start If You Have Nothing Today

Do not try to implement everything at once. Here is a practical three-week roadmap that I have seen work across different team sizes and cloud providers:

Week 1

Add Checkov to your CI pipeline

No cloud credentials needed, zero cost, immediate value. It will likely flag things you did not know were wrong. Fix them. Build the habit of treating infrastructure as code that must pass automated checks.

Week 2

Add terraform test to your most critical shared module

Pick the one with the widest blast radius — your VPC module, your IAM baseline, your standard object storage module. Write three to five assertions. Use an LLM to draft the first version. Get it running in CI against a dedicated test account.

Week 3

Extend the pattern and enforce it

Add tests to your next three most-used modules. Make component tests a required check for PR approval on any shared module. Enable the Checkov GitHub App for automatic PR annotations. You now have a real testing culture.

Once that foundation is in place, Terratest, RAG-powered compliance checks, and AI agents can come when the complexity genuinely warrants them. Do not over-engineer up front. A simple working test beats a perfectly designed but absent one every single time.

· · ·

We started Part 1 with a Tuesday afternoon incident — a silently misconfigured S3 bucket that three weeks later became an auditor's problem. We have spent two articles building the answer to that problem: a layered testing strategy, real working code, a production-grade CI/CD pipeline, and a look at how AI is beginning to make the whole process faster, smarter, and more self-healing.

Infrastructure testing is no longer a nice-to-have. At the pace modern platform teams ship, it is the difference between confidence and chaos. The tooling is mature. The patterns are proven. The AI assistance is arriving. There has never been a better time to start.

Happy coding!! If you have questions, have a testing pattern that worked well for your team, or want to share a war story about infrastructure gone wrong — I would love to hear from you. Drop a comment or reach me on LinkedIn.

In my next article in this series, I will show how to implement Terraform module versioning and release workflows — because testing is only valuable when paired with a proper way to version, publish, and consume your tested modules across teams. Stay connected and see you next time.