The Terraform Plan Mystery: Automation That Deletes the Wrong Resource
Terraform, one of the indispensable tools in modern infrastructure management, embraces the “Infrastructure as Code” (IaC) philosophy and lets us define and manage our cloud resources through code. That brings infrastructure work into automation, makes it consistent, and minimizes human error. But all that automation power comes with serious responsibility too. Misreading a terraform plan output, or letting automation apply blindly, can cause real disasters.
Today I’m going deep on the terraform plan command — one of Terraform’s most powerful features — to unpack the mysteries that lead automation to delete the wrong resources, and walk through, in detail, the strategies you can use to keep these unfortunate events from happening. Understanding this critical material should be a priority for every DevOps engineer and cloud architect who manages infrastructure.
What Is Terraform Plan and Why Does It Matter?
The terraform plan command sits at the heart of Terraform; it’s a preview tool that shows you what will happen before any changes are applied. It compares the current state of your infrastructure (state) with your Terraform configuration and, based on that comparison, walks through which resources will be created, updated, or destroyed.
That preview is a vital step for catching potential errors or unwanted changes. Before you run an apply, it lets the whole team verify the expected diff, which helps you avoid unexpected outages or data loss in production.
Reading Plan Output in Detail
terraform plan output uses symbols to indicate what will change:
- + (Create): A new resource will be created.
- - (Destroy): An existing resource will be destroyed.
- ~ (Update in-place): An existing resource will be modified without being recreated.
- -/+ (Destroy then Create): A resource will first be destroyed and then recreated with the new parameters. You usually see this when you change an immutable property of a resource. When create_before_destroy is set, the order flips and the marker becomes +/-.
When you read any terraform plan output, you need to pay particular attention to Destroy (-) and Destroy then Create (-/+) markers. These mean an existing resource will be removed, which often means significant data loss or downtime. Whenever a destroy is on the plan, you have to double-check that this is genuinely what you intended.
# Example plan output (simplified)
Terraform will perform the following actions:
  # aws_instance.web_server will be created
  + resource "aws_instance" "web_server" {
      + ami           = "ami-0abcdef1234567890"
      + instance_type = "t2.micro"
      + tags          = {
          + "Name" = "new-web-server"
        }
    }
  # aws_s3_bucket.data_bucket will be destroyed
  - resource "aws_s3_bucket" "data_bucket" {
      - acl           = "private"
      - bucket        = "my-old-data-bucket-123"
      - force_destroy = false
      - id            = "my-old-data-bucket-123"
        # (remaining attributes omitted)
    }
  # aws_instance.db_server will be updated in-place
  ~ resource "aws_instance" "db_server" {
        id            = "i-0123456789abcdef0"
      ~ instance_type = "t2.medium" -> "t2.large"
        # (unchanged attributes hidden)
    }
Plan: 1 to add, 1 to change, 1 to destroy.
In this example, a web_server will be created, a data_bucket will be destroyed, and a db_server will be updated. The fact that data_bucket is on the chopping block is something that needs careful review.
Automation and Terraform: Where They Meet
Terraform really shines once it’s wired into CI/CD (Continuous Integration / Continuous Delivery) pipelines. Automated pipelines let developers test, validate, and deploy code changes to cloud environments automatically. That integration makes infrastructure changes fast, repeatable, and less error-prone.
A typical Terraform flow inside a CI/CD pipeline might include the following steps:
- terraform init: Pulls down the required providers and modules and configures the backend.
- terraform validate: Checks the config files for syntax and semantic errors.
- terraform plan: Previews the upcoming changes and, when run with -out, saves them to a plan file (for example, a .tfplan file).
- Approval Step: Someone reviews the plan output and approves it manually, or an automated check approves it.
- terraform apply: Applies the approved plan to the cloud.
Automating these steps cuts down on manual intervention, increases deployment speed, and reduces the risk of human error. But the potential dangers of all that automation can’t be ignored either.
The Risks of Automatic Plan Approval
The biggest risk in automated pipelines is using terraform apply -auto-approve blindly. Run without a saved plan file, that command computes a fresh plan and applies it immediately, with no human review or approval at all. If that plan contains an unintended destroy and nothing in the automation catches it, the resource is gone before anyone has looked at the diff.
A developer might accidentally remove a resource block from code, or set a count value to 0. If the CI/CD pipeline runs terraform plan on that change and goes straight to apply without an approval gate, the relevant resource will be destroyed in an instant. For high-priority or critical resources, that’s an unacceptable outcome.
A Wrong-Resource-Deletion Scenario: Case Study
Here’s a scenario. A company is using Terraform to manage a critical database server (say, an aws_db_instance) running in production. While working on a new feature, the development team — by mistake or because of a misunderstanding — comments out or completely deletes the aws_db_instance block in the Terraform configuration that defines the database server.
The CI/CD pipeline picks up the change. When terraform plan runs, the output looks like this:
# Example plan output: database is being destroyed
Terraform will perform the following actions:
  # aws_db_instance.main_db will be destroyed
  - resource "aws_db_instance" "main_db" {
      - allocated_storage   = 20
      - engine              = "mysql"
      - id                  = "my-production-db-identifier"
      - instance_class      = "db.t3.medium"
      - skip_final_snapshot = false
        # (remaining attributes omitted)
    }
Plan: 0 to add, 0 to change, 1 to destroy.
If the pipeline is set up to auto-approve this plan via apply -auto-approve, the production database is destroyed instantly. That can lead to hours of downtime, data loss, and serious financial damage. An incident like this typically calls for a Root Cause Analysis (RCA) and puts the organization’s safety practices under the microscope.
Triggers and Root Cause Analysis
The most common triggers and root causes behind wrong-resource deletions:
- Accidental Resource Removal (Removing from HCL): The simplest, most common cause is a developer accidentally removing a resource block from Terraform code.
- count or for_each Changes: When a resource’s count meta-argument drops from 1 to 0, or a key is removed from a for_each map, the corresponding resource is destroyed. This is a frequent gotcha, especially in dynamic resource definitions.
    resource "aws_instance" "web" {
      count = var.enable_web_server ? 1 : 0 # If var.enable_web_server is false, the server gets destroyed
      # ...
    }
- State Drift: This is a mismatch between the Terraform state file and the actual infrastructure. For example, if a resource is deleted manually but still appears in state, the next plan’s refresh notices that it is gone and proposes to recreate it. A resource that was created manually and never imported is invisible to Terraform, so Terraform leaves it alone. The dangerous case is a managed resource that has been changed manually: the plan can show unexpected diffs, including a replace (-/+) when the drifted attribute can’t be updated in place.
- Module Updates: A new version of a Terraform module you’re consuming can introduce internal changes that trigger unexpected destroy operations. This can happen when a module’s outputs or resource definitions shift.
- Provider Version Changes: New versions of Terraform providers sometimes change the default behavior or attributes of resources, which can cause unexpected destroy or replace actions.
- Incorrect Variable Values: Environment variables, tfvars files, or variables misconfigured in CI/CD can affect the count or for_each values of resources and cause them to be destroyed.
- Resource Renaming or ID Changes: Terraform tracks each resource by its address (type plus name), not by the identity of the real object. Renaming a resource block (e.g., from aws_instance.old_name to aws_instance.new_name) is therefore interpreted as destroying the old resource and creating a new one, even though nothing about the real object changed. To prevent that, use terraform state mv or moved blocks.
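One way to blunt the “incorrect variable values” trigger is to validate inputs before they ever reach count. The following is a minimal sketch, not something taken from the scenario above: the variable name web_server_count and the lower bound are hypothetical, and it relies on Terraform’s variable validation blocks (available since Terraform 0.13).

```hcl
# Hypothetical input: how many web servers this environment should run.
variable "web_server_count" {
  description = "Number of web servers (hypothetical example variable)"
  type        = number

  # No default on purpose: a missing value fails the run instead of
  # silently falling back to a number that could remove servers.

  validation {
    # Reject values that would scale the fleet down to nothing.
    condition     = var.web_server_count >= 1
    error_message = "The web_server_count value must be at least 1. Scaling to zero would destroy every web server."
  }
}

resource "aws_instance" "web" {
  count         = var.web_server_count
  ami           = "ami-0abcdef1234567890" # placeholder AMI ID, as in the earlier example
  instance_type = "t2.micro"
}
```

With this in place, a tfvars typo that sets the value to 0 fails validation during terraform plan instead of producing a plan that destroys the servers. It only guards the case you anticipated, so it complements plan review rather than replacing it.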
Strategies for Preventing Disaster
To prevent wrong-resource deletions, you need a multi-layer defense. Some effective approaches:
Thorough Plan Review and Approval Mechanisms
No matter how fast your automation gets, human oversight on critical changes is a must.
- Manual Review: For every deployment to production in particular, the terraform plan output should be carefully reviewed and signed off by a teammate or architect.
- Pull-Request-Based Approvals: If your Terraform code is managed in a Git repo, every change should go through a Pull Request (or Merge Request). The CI/CD pipeline runs terraform plan when a PR opens and posts the output as a comment on the PR. That way, the team can review the code and the plan output together before approving.
- Policy-as-Code (PaC): Tools like OPA (Open Policy Agent) or HashiCorp Sentinel let you automatically check Terraform plans against specific security or compliance policies. For instance, you can define a rule like “destroying aws_db_instance resources in production must be blocked.”
State Management and Drift Detection
Having Terraform state that’s accurate and current is critical for plan correctness.
- Use Remote State: Instead of keeping the state file locally, store it remotely in S3, Azure Blob Storage, Google Cloud Storage, Terraform Cloud, or similar. That keeps state centralized and shared, and most remote backends offer locking, encryption, and versioning on top.
- Drift Detection (-refresh-only): terraform plan -refresh-only compares Terraform’s state with the actual infrastructure and reports what has drifted, without proposing any creates or destroys based on the current configuration. terraform apply -refresh-only then writes the detected changes into state once you have reviewed them. The older terraform refresh command performs the same sync without a review step and is deprecated for that reason. Running drift detection regularly lets you catch unexpected changes early.
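As an illustration of the remote state recommendation, here is a minimal sketch of an S3 backend block. The bucket name and key are hypothetical, and the use_lockfile setting assumes Terraform 1.10 or newer (older setups typically used a DynamoDB table for locking instead).

```hcl
terraform {
  backend "s3" {
    bucket       = "example-terraform-state"      # hypothetical bucket name
    key          = "production/terraform.tfstate" # one key (or bucket) per environment
    region       = "eu-central-1"
    encrypt      = true # server-side encryption for the state object
    use_lockfile = true # S3-native state locking; assumes Terraform 1.10+
  }
}
```

Versioning is not part of the backend block: it is switched on for the bucket itself, which is what lets you roll back to an earlier state file after a bad apply.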
Designing a Safe CI/CD Pipeline
The pipeline itself should be built to prevent unintended changes.
- Environment Separation: Dev, staging, and production environments should be fully isolated from each other and managed by separate Terraform states. Validate changes in staging before deploying to production.
- Least-Privilege Principle: The CI/CD agent or user should only have the minimum permissions required to perform Terraform operations. Permissions that delete resources in particular should be restricted in critical environments.
- Conditional auto-approve: -auto-approve should only be used under specific conditions (for example, only in staging or only after a manual approval).
- Use a Plan File: The plan file produced by terraform plan -out=tfplan should later be applied with terraform apply tfplan. That prevents another code change from sneaking in between plan and apply and changing what gets applied. Terraform applies exactly the saved actions, and it rejects the plan file as stale if the state has changed since the plan was made.
- Pin Versions: Pin the Terraform version, provider versions, and module versions (~> 1.0, = 1.0.0) inside your code. That avoids incompatibilities and bugs from unexpected version bumps.
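The version pinning advice can be sketched as a terraform block like the one below. The specific version numbers are illustrative assumptions, not recommendations from this article; pick the ones you have actually tested.

```hcl
terraform {
  # Example constraint: any 1.x release from 1.5.0 upward, never 2.0.
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0" allows any 5.x release but never 6.0.
      version = "~> 5.0"
    }
  }
}
```

Committing the .terraform.lock.hcl file that terraform init generates pins the exact provider versions on top of these ranges. Module blocks that pull from a registry accept a version argument in the same constraint syntax.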
Resource Locking and Protection (Lifecycle Rules)
Both Terraform and the cloud providers offer mechanisms for protecting resources from accidental deletion.
- The prevent_destroy Meta-Argument: In Terraform, you can set prevent_destroy = true inside a resource’s lifecycle block. Any plan that would destroy that resource, whether it comes from terraform destroy or from a replace during terraform apply, then fails with an error. This is a very valuable safety layer for critical databases and storage buckets.
    resource "aws_s3_bucket" "sensitive_data_bucket" {
      bucket = "my-critical-data-bucket"
      lifecycle {
        prevent_destroy = true
      }
    }
  With that setting active, Terraform will throw an error any time it tries to destroy this bucket. To actually delete it, you have to set the value to false or remove it. One caveat: the protection lives inside the resource block, so deleting the entire block from the configuration removes the protection along with it, and Terraform will plan the destroy as usual.
- Cloud Provider Deletion Protection: Provider-side features like AWS RDS Deletion Protection, Azure resource locks (for example, a delete lock on an Azure SQL Database), and Google Cloud Storage retention policies give you another layer of safety. They protect against accidental deletions that can come from outside Terraform too.
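To show how the two layers combine, here is a hedged sketch of the production database from the case study with both Terraform-side and provider-side protection. The username and snapshot name are hypothetical, and manage_master_user_password assumes a reasonably recent AWS provider.

```hcl
resource "aws_db_instance" "main_db" {
  identifier        = "my-production-db-identifier"
  engine            = "mysql"
  instance_class    = "db.t3.medium"
  allocated_storage = 20

  username                    = "dbadmin" # hypothetical master username
  manage_master_user_password = true      # RDS stores the password in Secrets Manager

  # Provider-side guard: the RDS API rejects delete requests while this is
  # true, including requests made outside Terraform.
  deletion_protection = true

  # Take a snapshot if the instance is ever deleted on purpose.
  skip_final_snapshot       = false
  final_snapshot_identifier = "main-db-final-snapshot" # hypothetical name

  # Terraform-side guard: any plan that would destroy this resource fails.
  lifecycle {
    prevent_destroy = true
  }
}
```

The point of keeping both is that they fail differently. If the whole block is deleted from the code, prevent_destroy disappears with it, but deletion_protection is stored on the instance itself, so the delete call is still refused by AWS.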
Code Examples and Best Practices
Using prevent_destroy
Here’s an example of using the prevent_destroy setting on a critical S3 bucket:
# main.tf
resource "aws_s3_bucket" "production_assets" {
  bucket = "my-company-prod-assets-2026"
  acl    = "private" # Deprecated since AWS provider v4; newer configs use aws_s3_bucket_acl
  tags = {
    Environment = "Production"
    ManagedBy   = "Terraform"
  }
  # Prevent this bucket from being destroyed accidentally by Terraform
  lifecycle {
    prevent_destroy = true
  }
}
# vars.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "eu-central-1"
}
provider "aws" {
  region = var.aws_region
}
With that configuration, any change that would destroy the bucket while the resource block is still present, such as running terraform destroy or changing the bucket name (which forces a replacement), makes Terraform fail during planning and refuse the destroy. Removing the whole aws_s3_bucket.production_assets block from HCL does not trigger this error, because the lifecycle setting is removed along with the block:
# Example output (simplified; exact wording varies by Terraform version)
$ terraform plan
# (after changing the bucket argument, which forces a replacement)
Error: Instance cannot be destroyed
  on main.tf line 2:
   2: resource "aws_s3_bucket" "production_assets" {
Resource aws_s3_bucket.production_assets has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or reduce the scope of the plan using the -target option.
terraform plan -out and a Safe terraform apply Flow
You can use a plan file in your CI/CD pipeline to build a safer deployment flow:
# Example GitLab CI/CD pipeline configuration (simplified)
stages:
  - init
  - validate
  - plan
  - apply
variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform
  TF_PLAN_FILE: plan.tfplan
# Each job starts from a clean workspace, so every job that needs providers
# or the backend runs terraform init itself.
init:
  stage: init
  script:
    - cd $TF_ROOT
    - terraform init -input=false
validate:
  stage: validate
  script:
    - cd $TF_ROOT
    - terraform init -input=false
    - terraform validate
    - terraform fmt -check=true # Check the code formatting
plan:
  stage: plan
  script:
    - cd $TF_ROOT
    - terraform init -input=false
    - terraform plan -input=false -out=$TF_PLAN_FILE
    - terraform show -no-color $TF_PLAN_FILE # Render the plan output in a readable form
  artifacts:
    paths:
      - ${TF_ROOT}/${TF_PLAN_FILE}
    expire_in: 1 day # Keep the plan file for one day
apply_review:
  stage: apply
  needs:
    - plan
  script:
    - echo "Terraform plan has been generated. Please review the artifacts and approve the deployment."
    - echo "To apply, trigger the 'apply_production' job manually."
  when: manual # A job that requires manual approval
apply_production:
  stage: apply
  needs:
    - plan # Needed so the plan file artifact is downloaded into this job
    - apply_review
  script:
    - cd $TF_ROOT
    - terraform init -input=false
    - terraform apply -input=false $TF_PLAN_FILE
  when: manual # apply via manual trigger
This pipeline produces a .tfplan file in the plan stage and stores it as an artifact. The apply_review job is manual and reminds the user to review the plan. The apply_production job is also manual, so it can only be triggered after the plan is reviewed and approved. Because terraform apply with a saved plan file runs without a confirmation prompt, those manual gates are the approval step, and the pipeline never needs apply -auto-approve.
Safe Use of count and for_each
Using count or for_each to manage dynamic resources can lead to surprise deletions if you’re not careful. Changing the order or the keys of resources can cause Terraform to interpret the change as “this old resource is gone, a new one will be created.”
- moved Blocks: Since Terraform 1.1, moved blocks let you tell Terraform that resources weren’t actually deleted, they just moved, when you do things like rename resources or change for_each keys.
    # Old:
    # resource "aws_instance" "web_server" { ... }
    # New:
    resource "aws_instance" "app_server" { # We changed the resource name
      # ...
    }
    # Tell Terraform via a moved block:
    moved {
      from = aws_instance.web_server
      to   = aws_instance.app_server
    }
- Stable Keys with for_each: When you use for_each, make sure the keys in your collection are stable. Using a map instead of a list, and ensuring the map keys won’t change, can keep resources from being accidentally destroyed.
# Wrong (risk of deletion if order changes):
# variable "instance_names" {
#   type    = list(string)
#   default = ["web1", "web2"]
# }
# resource "aws_instance" "web" {
#   count = length(var.instance_names)
#   tags = {
#     Name = var.instance_names[count.index]
#   }
#   # ...
# }
# Right (safer because the keys are stable):
variable "web_servers" {
  type = map(object({
    instance_type = string
    ami_id        = string
  }))
  default = {
    "prod-web-01" = { instance_type = "t3.medium", ami_id = "ami-0abcdef123" }
    "prod-web-02" = { instance_type = "t3.medium", ami_id = "ami-0abcdef123" }
  }
}
resource "aws_instance" "web" {
  for_each      = var.web_servers
  instance_type = each.value.instance_type
  ami           = each.value.ami_id
  tags = {
    Name = each.key
  }
  # ...
}
In this example, using for_each with a map ties each server to a unique key (e.g., “prod-web-01”). As long as you don’t remove a key from the map, Terraform won’t destroy a server even if the order of the others changes.
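Switching an existing resource from count to for_each is itself an address change, so without help Terraform would plan to destroy the indexed instances and create the keyed ones. A sketch of how moved blocks can bridge that migration follows. It assumes, purely for illustration, that index 0 was the server you now call prod-web-01 and index 1 the one you now call prod-web-02.

```hcl
# Before: instances were addressed by index, e.g. aws_instance.web[0].
# After:  instances are addressed by key,   e.g. aws_instance.web["prod-web-01"].

# Assumed mapping: old index 0 becomes key "prod-web-01".
moved {
  from = aws_instance.web[0]
  to   = aws_instance.web["prod-web-01"]
}

# Assumed mapping: old index 1 becomes key "prod-web-02".
moved {
  from = aws_instance.web[1]
  to   = aws_instance.web["prod-web-02"]
}
```

After adding the blocks, run terraform plan and confirm that it reports the instances as moved and shows no destroy before you apply. If the plan still lists a destroy, the index-to-key mapping is probably wrong.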
Conclusion
Terraform is a powerful tool that has revolutionized infrastructure management. But being aware of that power, and understanding the risks that come with it, is the key to using it responsibly. The terraform plan command is one of the most valuable tools we have for managing those risks. Reading its output carefully, avoiding blind apply automation, and using protective mechanisms like prevent_destroy are the foundation for preventing wrong-resource-deletion disasters.
It’s natural to chase speed and efficiency when automating your infrastructure, but you should never compromise on safety and consistency. A solid CI/CD pipeline, paired with a thorough review process and well-defined policies, lets you reap all of Terraform’s benefits while minimizing the potential risks. Don’t forget — well-planned and well-supervised automation is the foundation of smooth, safe infrastructure.
So, how do you manage your Terraform plans inside your automation pipelines? What strategies do you use to prevent wrong-resource-deletion scenarios? Don’t be shy — drop your thoughts and experiences in the comments!