Mustafa Erbay
Technology · 9 min read

The Terraform Plan Mystery: Automation That Deletes the Wrong Resource

Take a deep look at Terraform plan's surprise resource deletions and the strategies for protecting your automation pipelines from these kinds of failures.

Terraform, one of the indispensable tools in modern infrastructure management, embraces the “Infrastructure as Code” (IaC) philosophy and lets us define and manage our cloud resources through code. That brings infrastructure work into automation, makes it consistent, and minimizes human error. But all that automation power comes with serious responsibility too. Misreading a terraform plan output, or letting automation apply blindly, can cause real disasters.

Today I’m going deep on the terraform plan command — one of Terraform’s most powerful features — to unpack the mysteries that lead automation to delete the wrong resources, and walk through, in detail, the strategies you can use to keep these unfortunate events from happening. Understanding this critical material should be a priority for every DevOps engineer and cloud architect who manages infrastructure.

What Is Terraform Plan and Why Does It Matter?

The terraform plan command sits at the heart of Terraform; it’s a preview tool that shows you what will happen before any changes are applied. It compares the current state of your infrastructure (state) with your Terraform configuration and, based on that comparison, walks through which resources will be created, updated, or destroyed.

That preview is a vital step for catching potential errors or unwanted changes. Before you run an apply, it lets the whole team verify the expected diff, which helps you avoid unexpected outages or data loss in production.

Reading Plan Output in Detail

terraform plan output uses symbols to indicate what will change:

  • + (Create): A new resource will be created.
  • - (Destroy): An existing resource will be destroyed.
  • ~ (Update in-place): An existing resource will be updated without being recreated.
  • -/+ (Destroy then Create): A resource will first be destroyed and then recreated with the new parameters. You usually see this when you change an argument that can’t be updated in place; the plan marks that argument with “# forces replacement”. When create_before_destroy is set, the order flips and the marker becomes +/-.

When you read any terraform plan output, you need to pay particular attention to Destroy (-) and Destroy then Create (-/+) markers. These mean an existing resource will be removed, which often means significant data loss or downtime. Whenever a destroy is on the plan, you have to double-check that this is genuinely what you intended.

# Example plan output (simplified)

Terraform will perform the following actions:

  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami                         = "ami-0abcdef1234567890"
      + instance_type               = "t2.micro"
      + tags                        = {
          + "Name" = "new-web-server"
        }
    }

  # aws_s3_bucket.data_bucket will be destroyed
  - resource "aws_s3_bucket" "data_bucket" {
      - acl                         = "private"
      - bucket                      = "my-old-data-bucket-123"
      - force_destroy               = false
      - id                          = "my-old-data-bucket-123"
        # (remaining attributes omitted)
    }

  # aws_instance.db_server will be updated in-place
  ~ resource "aws_instance" "db_server" {
        id                            = "i-0123456789abcdef0"
      ~ instance_type                 = "t2.medium" -> "t2.large"
        # (unchanged attributes hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.

In this example, a web_server will be created, a data_bucket will be destroyed, and a db_server will be updated. The fact that data_bucket is on the chopping block is something that needs careful review.

Automation and Terraform: Where They Meet

Terraform really shines once it’s wired into CI/CD (Continuous Integration / Continuous Delivery) pipelines. Automated pipelines let developers test, validate, and deploy code changes to cloud environments automatically. That integration makes infrastructure changes faster, repeatable, and less error-prone.

A typical Terraform flow inside a CI/CD pipeline might include the following steps:

  1. terraform init: Pulls down the required providers and modules.
  2. terraform validate: Checks the config files for syntax and semantic errors.
  3. terraform plan: Previews the upcoming changes and, when run with -out, saves them to a plan file (for example, plan.tfplan).
  4. Approval Step: Someone reviews the plan output and approves it manually or automatically.
  5. terraform apply: Applies the approved plan to the cloud.

Automating these steps cuts down on manual intervention, increases deployment speed, and reduces the risk of human error. But the potential dangers of all that automation can’t be ignored either.

The Risks of Automatic Plan Approval

The biggest risk in automated pipelines is using terraform apply -auto-approve blindly. Run without a saved plan file, that command computes a fresh plan and applies it immediately, skipping the interactive confirmation, so nobody reviews what is about to change. If that plan contains an unintended destroy and the automation doesn’t catch it, the resource is gone before anyone notices.

A developer might accidentally remove a resource block from code, or set a count value to 0. If the CI/CD pipeline runs terraform plan on that change and goes straight to apply without an approval gate, the relevant resource will be destroyed in an instant. For high-priority or critical resources, that’s an unacceptable outcome.

A Wrong-Resource-Deletion Scenario: Case Study

Here’s a scenario. A company is using Terraform to manage a critical database server (say, an aws_db_instance) running in production. While working on a new feature, the development team — by mistake or because of a misunderstanding — comments out or completely deletes the aws_db_instance block in the Terraform configuration that defines the database server.

The CI/CD pipeline picks up the change. When terraform plan runs, the output looks like this:

# Example plan output: database is being destroyed

Terraform will perform the following actions:

  # aws_db_instance.main_db will be destroyed
  - resource "aws_db_instance" "main_db" {
      - allocated_storage              = 20
      - engine                         = "mysql"
      - id                             = "my-production-db-identifier"
      - instance_class                 = "db.t3.medium"
      - skip_final_snapshot            = false
      # (remaining attributes omitted)
    }

Plan: 0 to add, 0 to change, 1 to destroy.

If the pipeline is set up to auto-approve this plan via apply -auto-approve, the production database is destroyed instantly. That can lead to hours of downtime, data loss, and serious financial damage. An incident like this typically calls for a Root Cause Analysis (RCA) and puts the organization’s safety practices under the microscope.

Triggers and Root Cause Analysis

The most common triggers and root causes behind wrong-resource deletions:

  1. Accidental Resource Removal (Removing from HCL): The simplest, most common cause is a developer accidentally removing a resource block from Terraform code.
  2. count or for_each Changes: When a resource’s count meta-argument drops from 1 to 0, or a key is removed from a for_each map, the corresponding resource is destroyed. This is a frequent gotcha, especially in dynamic resource definitions.
    resource "aws_instance" "web" {
      count = var.enable_web_server ? 1 : 0 # If var.enable_web_server is false, the server gets destroyed
      # ...
    }
  3. State Drift: This is a mismatch between the Terraform state file and the actual infrastructure. For example, if a managed resource is deleted manually, the next plan refreshes state, notices the resource is missing, and proposes to recreate it. A resource that was created manually and never imported is invisible to Terraform, so it is neither tracked nor protected. The riskier case is a managed resource that has been changed manually: the plan can show unexpected diffs, and if the changed attribute can’t be updated in place, Terraform proposes a replacement (destroy then create) to bring it back in line with the code.
  4. Module Updates: A new version of a Terraform module you’re consuming can introduce internal changes that trigger unexpected destroy operations. This can happen when a module’s outputs or resource definitions shift.
  5. Provider Version Changes: New versions of Terraform providers sometimes change the default behavior or attributes of resources, which can cause unexpected destroy or replace actions.
  6. Incorrect Variable Values: Environment variables, tfvars files, or variables misconfigured in CI/CD can affect the count or for_each values of resources and cause them to be destroyed.
  7. Resource Renaming: Terraform tracks each resource by its address in the configuration (type and name), not by its cloud-side ID. Renaming a resource block (e.g., from aws_instance.old_name to aws_instance.new_name) is therefore interpreted as destroying the old resource and creating a new one. To prevent that, use terraform state mv or moved blocks.
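
For the first cause in the list, it helps to separate an accidental removal from a deliberate decision to stop managing a resource. The following is a minimal sketch, assuming Terraform 1.7 or later, in which a removed block takes the place of the deleted resource block and tells Terraform to forget the object instead of destroying it. The address aws_db_instance.main_db is borrowed from the case study above.

```hcl
# Assumes Terraform >= 1.7. The resource "aws_db_instance" "main_db" block
# has been deleted from the configuration and replaced with this block.
removed {
  from = aws_db_instance.main_db

  lifecycle {
    # Drop the instance from state but leave the real database running.
    destroy = false
  }
}
```

With a block like this in place, the plan should report the database as no longer managed rather than counting it under "to destroy". A bare deletion of the resource block, with no removed block, still plans a destroy.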

Strategies for Preventing Disaster

To prevent wrong-resource deletions, you need a multi-layer defense. Some effective approaches:

Thorough Plan Review and Approval Mechanisms

No matter how fast your automation gets, human oversight on critical changes is a must.

  • Manual Review: For every deployment to production in particular, the terraform plan output should be carefully reviewed and signed off by a teammate or architect.
  • Pull-Request-Based Approvals: If your Terraform code is managed in a Git repo, every change should go through a Pull Request (or Merge Request). The CI/CD pipeline runs terraform plan when a PR opens and posts the output as a comment on the PR. That way, the team can review the code and the plan output together before approving.
  • Policy-as-Code (PaC): Tools like OPA (Open Policy Agent) or HashiCorp Sentinel let you automatically check Terraform plans against specific security or compliance policies. For instance, you can define a rule like “destroying aws_db_instance resources in production must be blocked.”

State Management and Drift Detection

Having Terraform state that’s accurate and current is critical for plan correctness.

  • Use Remote State: Instead of keeping the state file locally, the recommendation is to store it remotely in S3, Azure Blob Storage, Google Cloud Storage, Terraform Cloud, or similar. That keeps state centralized, secure, and versioned.
  • Drift Detection (-refresh-only): The terraform refresh command updates Terraform’s state to match the actual infrastructure, but it writes to state without any review step and has been deprecated in favor of the -refresh-only planning mode. terraform plan -refresh-only shows what has drifted without proposing any creates or destroys based on the current configuration, and terraform apply -refresh-only records that drift in state once you accept it. Running drift detection regularly lets you catch unexpected changes early.
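
As a rough illustration of the remote state recommendation, here is a minimal S3 backend sketch. The bucket name, state key, and lock table name are hypothetical, and the DynamoDB lock table is an assumption that fits many existing setups; newer Terraform releases offer other locking options, so check the backend documentation for the version you run.

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket; enable versioning on it so earlier state
    # revisions can be recovered.
    bucket = "example-terraform-state"
    key    = "production/terraform.tfstate"
    region = "eu-central-1"

    # Encrypt the state object at rest.
    encrypt = true

    # Hypothetical DynamoDB table used for state locking, so two pipeline
    # runs cannot write to the same state at once.
    dynamodb_table = "example-terraform-locks"
  }
}
```

Using a separate key (or a separate bucket) per environment keeps a mistake in staging state from touching production state.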

Designing a Safe CI/CD Pipeline

The pipeline itself should be built to prevent unintended changes.

  • Environment Separation: Dev, staging, and production environments should be fully isolated from each other and managed by separate Terraform states. Validate changes in staging before deploying to production.
  • Least-Privilege Principle: The CI/CD agent or user should only have the minimum permissions required to perform Terraform operations. The destroy permission in particular should be restricted in critical environments.
  • Conditional auto-approve: auto-approve should only be used under specific conditions (for example, only in staging or only after a manual approval).
  • Use a Plan File: Save the plan with terraform plan -out=tfplan and apply it later with terraform apply tfplan. That prevents another code change from sneaking in between plan and apply, so what runs is exactly what was reviewed. If the state has changed since the plan was created, Terraform rejects the saved plan as stale instead of applying it.
  • Pin Versions: Pin the Terraform version, provider versions, and module versions (~> 1.0, = 1.0.0) inside your code. That avoids incompatibilities and bugs from unexpected version bumps.
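
The version pinning advice above might look like the following sketch. The version numbers are illustrative, and the module source example-org/network/aws is a hypothetical registry address used only to show where a module pin goes.

```hcl
terraform {
  # Accept 1.5.x patch releases only (illustrative version).
  required_version = "~> 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Accept any 5.x release, but never a new major version.
      version = "~> 5.0"
    }
  }
}

module "network" {
  # Hypothetical module address; replace with the module you actually use.
  source = "example-org/network/aws"

  # Exact pin: a module upgrade becomes an explicit, reviewable code change.
  version = "= 1.0.0"

  # ... module inputs ...
}
```

Committing the .terraform.lock.hcl file that terraform init generates also helps, since it records the exact provider versions selected so that CI and local runs resolve the same ones.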

Resource Locking and Protection (Lifecycle Rules)

Both Terraform and the cloud providers offer mechanisms for protecting resources from accidental deletion.

  • The prevent_destroy Meta-Argument: In Terraform, you can set prevent_destroy = true inside a resource’s lifecycle block so that any plan that would destroy that resource, whether it comes from terraform destroy or terraform apply, fails with an error. This is a very valuable safety layer for critical databases and storage buckets, but it only works while the resource block itself remains in the configuration.
    resource "aws_s3_bucket" "sensitive_data_bucket" {
      bucket = "my-critical-data-bucket"
    
      lifecycle {
        prevent_destroy = true
      }
    }
    With that setting active, Terraform will throw an error any time it tries to destroy this bucket. To actually delete it, you have to flip the setting back to false.
  • Cloud Provider Deletion Protection: Provider-side features like AWS RDS deletion protection, Azure resource locks (CanNotDelete), and Google Cloud Storage retention policies give you another layer of safety. They also protect against accidental deletions that originate outside Terraform.
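
To show how the two layers can be combined on the database from the case study, here is a sketch that sets both the provider-side flag and the Terraform-side lifecycle rule. The variable names and the final snapshot identifier are assumptions made for the example.

```hcl
variable "db_username" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "main_db" {
  identifier        = "my-production-db-identifier"
  engine            = "mysql"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = var.db_username
  password          = var.db_password

  # AWS-side guard: the RDS API rejects delete calls while this is true,
  # no matter which tool sends them.
  deletion_protection = true

  # Take a final snapshot if the instance is ever deleted on purpose.
  skip_final_snapshot       = false
  final_snapshot_identifier = "my-production-db-final"

  lifecycle {
    # Terraform-side guard: any plan that would destroy this resource fails.
    prevent_destroy = true
  }
}
```

The provider-side flag is stored on the database itself, so it should still block the delete call even if the Terraform-side setting disappears from the code.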

Code Examples and Best Practices

Using prevent_destroy

Here’s an example of using the prevent_destroy setting on a critical S3 bucket:

# main.tf
resource "aws_s3_bucket" "production_assets" {
  bucket = "my-company-prod-assets-2026"
  acl    = "private"

  tags = {
    Environment = "Production"
    ManagedBy   = "Terraform"
  }

  # Prevent this bucket from being destroyed accidentally by Terraform
  lifecycle {
    prevent_destroy = true
  }
}

# vars.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "eu-central-1"
}

provider "aws" {
  region = var.aws_region
}

With that configuration, any plan that would destroy the bucket fails while the resource block is still in place. That covers terraform destroy as well as changes that force a replacement, such as changing the bucket name. The error is raised during planning, before anything is touched, and looks roughly like this:

$ terraform plan
# (after changing the bucket argument, which forces replacement)

Error: Instance cannot be destroyed

  on main.tf line 2:
   2: resource "aws_s3_bucket" "production_assets" {

Resource aws_s3_bucket.production_assets has lifecycle.prevent_destroy set,
but the plan calls for this resource to be destroyed. To avoid this error and
continue with the plan, either disable lifecycle.prevent_destroy or reduce the
scope of the plan using the -target flag.

One important caveat: prevent_destroy lives inside the resource block. If someone removes the entire aws_s3_bucket.production_assets block from HCL, the setting is removed along with it, and Terraform plans and applies the destroy without complaint. For the accidental-removal scenario, rely on plan review, policy checks, and provider-side deletion protection rather than on prevent_destroy alone.

terraform plan -out and a Safe terraform apply Flow

You can use a plan file in your CI/CD pipeline to build a safer deployment flow:

# Example GitLab CI/CD pipeline configuration (simplified)

stages:
  - init
  - validate
  - plan
  - apply

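variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform
  TF_PLAN_FILE: plan.tfplan

# Each job starts from a fresh checkout, so the .terraform directory from one
# job is not available in the next. Initialize at the start of every job.
default:
  before_script:
    - cd $TF_ROOT
    - terraform init -input=false

init:
  stage: init
  script:
    - terraform providers # Confirm that init resolved the required providers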

validate:
  stage: validate
  script:
    - cd $TF_ROOT
    - terraform validate
    - terraform fmt -check=true # Check the code formatting

plan:
  stage: plan
  script:
    - cd $TF_ROOT
    - terraform plan -out=$TF_PLAN_FILE
    - terraform show -no-color $TF_PLAN_FILE # Render the plan output in a readable form
  artifacts:
    paths:
      - ${TF_ROOT}/${TF_PLAN_FILE}
    expire_in: 1 day # Keep the plan file

apply_review:
  stage: apply
  needs:
    - plan
  script:
    - echo "Terraform plan has been generated. Please review the artifacts and approve the deployment."
    - echo "To apply, trigger the 'apply_production' job manually."
  when: manual # A stage that requires manual approval

apply_production:
  stage: apply
  needs:
    - apply_review
  script:
    - cd $TF_ROOT
    - terraform apply $TF_PLAN_FILE
  when: manual # apply via manual trigger

This pipeline produces a .tfplan file in the plan job and stores it as an artifact. The apply_review job is manual and reminds the user to review the plan. The apply_production job is also manual and depends on apply_review, so it is meant to be triggered only after the plan has been reviewed and approved. Because it applies the saved plan file, only the reviewed changes are applied, which avoids the risk of a blind apply -auto-approve.

Safe Use of count and for_each

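Using count or for_each to manage dynamic resources can lead to surprise deletions if you’re not careful. With count, Terraform tracks instances by index, so removing an element from the middle of a list shifts every later index, and Terraform plans to change or destroy resources you never meant to touch. With for_each, renaming a key looks like one resource disappearing and a new one appearing.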

  • moved Blocks: Since Terraform 1.1, moved blocks let you tell Terraform that resources weren’t actually deleted — they just moved — when you do things like rename resources or change for_each keys.
    # Old:
    # resource "aws_instance" "web_server" { ... }
    
    # New:
    resource "aws_instance" "app_server" { # We changed the resource name
      # ...
    }
    
    # Tell Terraform via a moved block:
    moved {
      from = aws_instance.web_server
      to   = aws_instance.app_server
    }
  • Stable Keys with for_each: When you use for_each, make sure the keys in your collection are stable. Using a map instead of a list, and ensuring the map keys won’t change, can keep resources from being accidentally destroyed.
# Wrong (risk of deletion if order changes):
# variable "instance_names" {
#   type = list(string)
#   default = ["web1", "web2"]
# }
# resource "aws_instance" "web" {
#   count = length(var.instance_names)
#   tags = {
#     Name = var.instance_names[count.index]
#   }
#   # ...
# }

# Right (safer because the keys are stable):
variable "web_servers" {
  type = map(object({
    instance_type = string
    ami_id        = string
  }))
  default = {
    "prod-web-01" = { instance_type = "t3.medium", ami_id = "ami-0abcdef123" }
    "prod-web-02" = { instance_type = "t3.medium", ami_id = "ami-0abcdef123" }
  }
}

resource "aws_instance" "web" {
  for_each = var.web_servers
  instance_type = each.value.instance_type
  ami           = each.value.ami_id
  tags = {
    Name = each.key
  }
  # ...
}

In this example, using for_each with a map ties each server to a unique key (e.g., “prod-web-01”). As long as you don’t remove a key from the map, Terraform won’t destroy a server even if the order of the others changes.
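
If the instances were originally created with the count-based version, switching to for_each changes their addresses from numeric indexes to string keys, and Terraform would plan to destroy and recreate them. A sketch of how moved blocks can bridge that migration, assuming index 0 was "web1" and should become "prod-web-01", and index 1 should become "prod-web-02":

```hcl
# Map the old count-based addresses onto the new for_each keys so the
# existing instances are kept rather than destroyed and recreated.
# The index-to-key pairing below is an assumption for this example.
moved {
  from = aws_instance.web[0]
  to   = aws_instance.web["prod-web-01"]
}

moved {
  from = aws_instance.web[1]
  to   = aws_instance.web["prod-web-02"]
}
```

After adding blocks like these, the plan should show the instances as moved to their new addresses, with no destroy for them. The same pattern applies when a for_each key has to be renamed later.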

Conclusion

Terraform is a powerful tool that has revolutionized infrastructure management. But being aware of that power, and understanding the risks that come with it, is the key to using it responsibly. The terraform plan command is one of the most valuable tools we have for managing those risks. Reading its output carefully, avoiding blind apply automation, and using protective mechanisms like prevent_destroy are the foundation for preventing wrong-resource-deletion disasters.

It’s natural to chase speed and efficiency when automating your infrastructure, but you should never compromise on safety and consistency. A solid CI/CD pipeline, paired with a thorough review process and well-defined policies, lets you reap all of Terraform’s benefits while minimizing the potential risks. Don’t forget — well-planned and well-supervised automation is the foundation of smooth, safe infrastructure.

So, how do you manage your Terraform plans inside your automation pipelines? What strategies do you use to prevent wrong-resource-deletion scenarios? Don’t be shy — drop your thoughts and experiences in the comments!

Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have been working on system architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience that holds up in the field.
