Mustafa Erbay
Tutorials · 10 min read

Docker Disk Storage Wars: A Guide to Data Integrity on VPS

I explain how I manage Docker disk space on my own VPS, ensure data integrity, and the problems I've encountered.

Last month, on the morning of April 28th, I woke up to a “Disk Space Critical” email from my own VPS. When I ran df -h, the root filesystem (/) was 100% full. As I suspected, Docker was the culprit again. When you run more than 13 containers on the same server, it only takes one of them getting out of control before the whole system starts swapping and processes get OOM-killed.

This is a familiar scenario for anyone who hosts multiple applications on their own server and keeps running out of disk. In this post, I will explain how I manage these “Docker Disk Storage Wars” on my VPS, how I ensure data integrity, and how I optimize disk space. My goal is to share practical solutions drawn from problems I have hit myself.

Why Do Docker Disk Storage Wars Happen?

The disk space on my VPS filling up suddenly has almost become a routine for me. Most of the time, the root of the problem lies in unexpected log growth from a container, unnecessary images, or build caches. For example, at one point, the build caches of my Next.js applications reached 33 GB, and on top of that, unused images ate up another 23 GB. That’s when the disk hit 100%.

Such situations, especially if you are using a VPS with limited resources, can lead to serious performance degradation and even service interruptions. A container’s disk I/O spiking can lead to kcompactd using 92% CPU and sshd being unable to accept new connections. Therefore, it is necessary to understand the root of the problem well.

Common Disk Space Consumers

Knowing the elements that consume the most disk space in the Docker ecosystem is the first step to solving the problem. In my experience, the leading ones are:

  • Dangling Images and Volumes: Unused or disconnected images and volumes can take up gigabytes of space without you realizing it. This becomes inevitable, especially if you rebuild images frequently.
  • Build Cache: Even if you don’t use multi-stage builds, Docker creates intermediate layers at every build step. These caches can accumulate and reach massive sizes. My 33 GB build cache problem was a perfect example of this.
  • Container Logs: Especially “chatty” applications or services running in debug mode can grow log files uncontrollably. Gigabytes of logs can accumulate within a few days; I’ve even witnessed critical services of a bank’s internal platform stop because of this.
  • Ephemeral Data and Temporary Files: Temporary files created by applications during runtime can become permanent fixtures on the disk if not cleaned up properly.

To detect these issues, the docker system df command is very useful. This command provides a detailed summary of Docker’s disk usage, allowing me to understand which component is taking up how much space.

docker system df

The output of this command shows how much disk space each Docker component is using, broken down into “Images”, “Containers”, “Local Volumes”, and “Build Cache”. The “Reclaimable” column is the one to watch: it shows how much space Docker could free by removing objects that are not currently in use.
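When the summary points at containers, a per-file view of the logs usually narrows things down. The helper below is a minimal sketch rather than anything Docker ships: largest_json_logs is a hypothetical name, and it assumes GNU find, the json-file log driver, and the default data root under /var/lib/docker.

```shell
# List the N largest json-file container logs, biggest first.
# Assumptions: GNU find (for -printf), the json-file log driver, and the
# default data root; pass another directory as $1 if yours differs.
largest_json_logs() {
    # Print "<size in bytes> <path>" per log file, then sort numerically.
    find "${1:-/var/lib/docker/containers}" -type f -name '*-json.log' \
        -printf '%s %p\n' 2>/dev/null \
        | sort -rn \
        | head -n "${2:-5}"
}
```

Run it as root (for example, largest_json_logs /var/lib/docker/containers 10), since the container directories are not world-readable. For a per-image and per-volume breakdown, docker system df -v prints the verbose version of the same summary.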

Data Integrity and Persistent Storage Strategies

One of the most important issues when running applications in Docker is ensuring that data remains persistent even if containers die. Since I host my own sites and side projects (like hesapciyiz.com, spamkalkani.com), it’s critical that data isn’t lost. A wrong configuration or automatic cleanup can cause your data to vanish instantly.

Therefore, it is essential to correctly understand Docker’s storage mechanisms and develop proactive strategies. I generally prefer using volumes because they are easier to manage and are a persistent storage solution designed within Docker itself.

Docker Volumes vs. Bind Mounts

Docker offers two main methods for making data persistent: volumes and bind mounts. Both have their own advantages and disadvantages; the choice depends a bit on the use case.

  • Volumes: Storage managed by Docker, usually located under /var/lib/docker/volumes. Docker creates and manages volumes, and they persist until you remove them explicitly, even after the containers using them are gone. They are ideal for sharing data between containers and are abstracted from the details of the host operating system. This is my preferred method. It’s a must-have, especially for stateful applications like postgres or redis.
  • Bind Mounts: Directly maps a specific directory on the host system into the container. It provides direct access to files and directories on the host system. It’s very useful in development environments for scenarios like mounting source code into the container. However, it creates a dependency on the host system’s path and can carry security risks (like the container accessing sensitive files on the host system).
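The two options look like this on the command line. This is an illustrative sketch: the container name db, the volume name pgdata, the image tags, and the password are placeholders, and the DOCKER variable is my own addition so the commands can be printed (DOCKER=echo) instead of executed.

```shell
# Named volume: Docker decides where the data lives on the host.
# "db", "pgdata", and the password are placeholder values.
run_postgres_with_volume() {
    ${DOCKER:-docker} run -d --name db \
        --mount type=volume,source=pgdata,target=/var/lib/postgresql/data \
        -e POSTGRES_PASSWORD=change-me \
        postgres:16
}

# Bind mount: a host directory is mapped into the container as-is.
# Mounting it read-only limits what the container can do to host files.
run_dev_with_bind_mount() {
    src_dir="${1:-$PWD}"
    ${DOCKER:-docker} run --rm -it \
        --mount type=bind,source="$src_dir",target=/app,readonly \
        -w /app node:20-alpine sh
}
```

Removing the db container leaves pgdata in place, whereas the bind mount only works on hosts where the source directory exists at that exact path.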

Persistent Data Backup Strategies

Data integrity doesn’t just end with freeing up disk space; it’s also important to be able to restore data in case of a disaster. Automatic backups play a critical role in my setup.

  • Volume Backups: You can back up volumes by using the docker cp command or by running backup tools from within a container. I have a cron job that regularly takes a pg_dump of my postgres databases and sends it to remote storage.
  • Snapshots: Using the snapshot features offered by your VPS provider is a quick and easy way to take a backup of the entire disk image. However, this might not be sufficient for database consistency, so application-specific backups are still necessary.
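Both kinds of backup can be sketched in a few lines of shell. This is not my exact cron job: the function names, the alpine helper image, and the container, user, and database names passed in are all assumptions, and DOCKER can be set to echo for a dry run.

```shell
# Archive a named volume into <dest_dir>/<volume>-<date>.tar.gz using a
# throwaway container. The volume is mounted read-only.
# dest_dir must be an absolute host path, otherwise Docker treats it as a
# volume name.
backup_volume() {
    volume="$1"
    dest_dir="$2"
    stamp="$(date +%Y%m%d)"
    ${DOCKER:-docker} run --rm \
        -v "$volume":/data:ro \
        -v "$dest_dir":/backup \
        alpine tar czf "/backup/${volume}-${stamp}.tar.gz" -C /data .
}

# Logical dump of one PostgreSQL database from a running container.
# The dump is compressed only if pg_dump succeeded; the result is <out_file>.gz.
dump_postgres() {
    container="$1"; db_user="$2"; db_name="$3"; out_file="$4"
    ${DOCKER:-docker} exec "$container" pg_dump -U "$db_user" "$db_name" \
        > "$out_file" && gzip -f "$out_file"
}
```

A file-level archive of a volume that a database is actively writing to may not be consistent, which is why the pg_dump route is the safer one for postgres; the tar approach suits uploads, configuration, and other plain files.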

Managing Disk Space Wisely

Now let’s get to the main point: how do I deal with these disk space issues? For me, proactive management and automation are the keywords. Considering I manage more than 13 containers on a VPS, it’s impossible to do everything manually.

Regular Cleaning Routines

The first thing to do manually is to use Docker’s own cleanup commands. The docker system prune command is a one-stop solution: it removes stopped containers, unused networks, dangling images, and unused build cache.

# Removes stopped containers, unused networks, dangling images, and unused build cache
docker system prune

# Removes all images not used by any container, not only dangling ones
docker image prune -a

# Removes unused volumes (on Docker 23.0+, only anonymous ones unless you add --all)
docker volume prune
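A gentler variant is to prune by age, so that anything created recently survives. The sketch below uses the until filter that the container, image, and builder prune commands accept; prune_older_than is a hypothetical wrapper name, the 168h default is an arbitrary choice, and DOCKER=echo gives a dry run.

```shell
# Prune only objects older than a given age (default: 168h, i.e. 7 days).
# Volumes are deliberately left alone.
prune_older_than() {
    age="${1:-168h}"
    ${DOCKER:-docker} container prune -f --filter "until=$age"
    ${DOCKER:-docker} image prune -af --filter "until=$age"
    ${DOCKER:-docker} builder prune -f --filter "until=$age"
}
```

Calling prune_older_than 24h after a busy build day clears yesterday's leftovers while keeping the layers that today's builds still reuse.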

Automated Cleanup Mechanisms

Manual cleaning is fine, but forgetfulness is human nature. For sysadmins like me, automation is a must. I handle this with cron or systemd timers.

Here is an example similar to the systemd timer and service files I use:

/etc/systemd/system/docker-prune.service:

[Unit]
Description=Clean up old Docker images, containers and volumes
Wants=network-online.target
After=network-online.target docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker system prune -af --volumes

/etc/systemd/system/docker-prune.timer:

[Unit]
Description=Run Docker prune weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

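After creating these two files, I activate the timer with sudo systemctl daemon-reload, sudo systemctl enable docker-prune.timer, and sudo systemctl start docker-prune.timer. From then on, unused Docker resources are cleaned up automatically every week. Be aware that -a removes every image not used by a container and that --volumes also deletes volumes no container references (on Docker 23.0 and later, only anonymous ones), so drop --volumes if such volumes might hold data you need. The _work/_temp directories of my GitHub Actions self-hosted runner also get their share of such automatic cleanups; it can be annoying sometimes, but I prefer it over a full disk.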
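If a blanket weekly prune feels too aggressive, the service can call a small guard script that prunes only when the disk is actually filling up. This is a sketch under assumptions: the function names and the 80% default are mine, and the script path you would put in ExecStart (something like /usr/local/bin/docker-prune-guard.sh) is hypothetical.

```shell
# Print the used percentage of the filesystem holding a path, e.g. "73".
# Relies on the POSIX "df -P" layout: column 5 of line 2 is the capacity.
disk_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Succeeds (exit 0) when usage ($1) is at or above the threshold ($2).
above_threshold() {
    [ "$1" -ge "$2" ]
}

# Prune only when the root filesystem is at least <threshold> percent full.
# Volumes are not touched. DOCKER=echo prints the command instead.
prune_if_needed() {
    threshold="${1:-80}"
    used="$(disk_usage_percent /)"
    if above_threshold "$used" "$threshold"; then
        ${DOCKER:-docker} system prune -af
    else
        echo "disk at ${used}%, below ${threshold}%: nothing to do"
    fi
}
```

With the timer switched to a daily schedule, this keeps image and build caches warm on quiet weeks and still reacts within a day when something starts eating the disk.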

Log Management

Container logs are the silent enemies of disk space. As soon as a service starts giving errors, logs begin to accumulate and fill the disk in a short time. To prevent this, I use logging options in my docker-compose.yml file.

version: '3.8'
services:
  my_app:
    image: my_app_image
    ports:
      - "80:80"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

This configuration caps each log file at 10 MB and keeps at most 3 files, so a container holds roughly 30 MB of logs at most. This has largely prevented log bloating on my VPS.
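The same limits can be set once for the whole daemon, so containers without their own logging section are covered too. The sketch below only writes the fragment to a scratch file; write_log_defaults and the default file name are hypothetical, and merging the result into /etc/docker/daemon.json is left as a manual step so existing settings are not overwritten.

```shell
# Write daemon-wide json-file log limits to a file. The default target is a
# scratch path, so an existing /etc/docker/daemon.json is never clobbered.
write_log_defaults() {
    target="${1:-./daemon.json.example}"
    cat > "$target" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

After merging these keys into daemon.json and restarting the Docker daemon, the defaults apply only to containers created from then on; existing containers keep their old log settings until they are recreated. docker inspect --format '{{.HostConfig.LogConfig}}' followed by a container name shows what a given container actually uses.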

Disk Space Monitoring and Alerts

No matter how good automatic cleaning is, unexpected situations can always arise. Therefore, it’s very important to constantly monitor disk space and receive alerts when it reaches critical levels.

I use Prometheus and Grafana in my systems. I monitor disk usage with Node Exporter and send notifications via Discord or email when certain thresholds are exceeded (e.g., 80% or 90% fullness). This “preflight resource guard” mechanism allows me to intervene much earlier than my Pipeline-health monitor sending a “DEGRADED” email.

# Alert if disk usage is over 80%
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20

With this simple PromQL query, you can monitor disk fullness and manage alerts via Alertmanager.
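To turn the query into an alert, it has to live in a Prometheus rule file. The following is a minimal sketch, not my production rule: the alert name, the 10m hold time, the severity label, and the file name are all assumptions, and the helper function just writes the YAML so it can be checked before use.

```shell
# Write a Prometheus alerting rule that fires when / has had less than 20%
# free space for 10 minutes. The quoted heredoc keeps $labels literal.
write_disk_alert_rule() {
    target="${1:-./disk-alerts.yml}"
    cat > "$target" <<'EOF'
groups:
  - name: disk
    rules:
      - alert: RootDiskAlmostFull
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 20% free on / ({{ $labels.instance }})"
EOF
}
```

The file can be validated with promtool check rules disk-alerts.yml and then listed under rule_files in prometheus.yml; routing the alert to Discord or email is Alertmanager's job.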

Capacity Planning and Performance

Predicting future needs and optimizing performance is just as important as freeing up disk space. Especially in a situation where my Astro build consumes 2.5 GB of RAM and the system has 7.6 GB of RAM, using resources efficiently becomes critical.

Image Optimization

Smaller Docker images take up less disk space and are faster to push and pull. This both saves disk space and speeds up deployment processes.

  • Multi-stage Builds: This technique separates the requirements used in the build phase (compilers, SDKs, etc.) from the final image. For example, when building a Go application, you can use a base image containing the Go compiler and then build the final image on a much smaller scratch or alpine image containing only the executable.
  • .dockerignore File: The .dockerignore file works like .gitignore. It lists files that should be left out of the build context (node_modules, .git, .env, etc.), which shrinks the context sent to the builder and keeps those files out of images built with a broad COPY.
  • Minimal Base Images: Choosing minimal base images like alpine or distroless prevents unnecessary libraries and tools from being included in the image.
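The first two points can be illustrated with a Go project. This is a sketch under assumptions: it presumes a Go module with its main package at the repository root, the golang:1.22 tag and the /out/app path are arbitrary choices, and write_go_build_files is a hypothetical helper that only writes the two files.

```shell
# Write an example multi-stage Dockerfile and a .dockerignore into a directory.
write_go_build_files() {
    dir="${1:-.}"
    cat > "$dir/Dockerfile" <<'EOF'
# Stage 1: full Go toolchain, used only for compiling.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: empty base image that carries only the static binary.
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
EOF
    cat > "$dir/.dockerignore" <<'EOF'
.git
.env
node_modules
EOF
}
```

After docker build -t my_app_image . the result can be compared with a single-stage build using docker image ls; the compiler and source tree stay behind in the build stage and never reach the final image.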

Storage Drivers and Performance

Docker uses various storage drivers to manage containers and images. overlay2 is the default and most recommended driver today. It is quite successful in terms of performance and disk space usage efficiency.

If you have an old Docker installation or are using a different driver, you can check your current driver with the docker info command. Usually, you won’t have any issues with overlay2, but if you’re experiencing performance problems, checking the driver can be a good starting point.
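Rather than scanning the full docker info output, a format template can pull out just the fields of interest. A small sketch: storage_summary is a hypothetical name, and DOCKER=echo prints the command instead of running it.

```shell
# Show the active storage driver and the directory Docker stores its data in.
storage_summary() {
    ${DOCKER:-docker} info --format 'driver={{.Driver}} root={{.DockerRootDir}}'
}
```

Knowing the data root also tells you which filesystem to watch with df -h, which matters when /var/lib/docker sits on its own partition.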

Swap Usage and Disk I/O

I’ve encountered many OOM scenarios on my VPS. Especially when a container’s memory consumption increases, the system falling into swap and then disk I/O hitting the ceiling locks up the entire system. I remember writing sleep 360 last month, getting OOM-killed, and having to switch to polling-wait. This situation shows how critical disk performance is for overall system health.

Heavy swapping drives disk I/O up, which slows the system down even further. To break out of this vicious cycle, it’s necessary to both manage disk space and optimize memory consumption. Security measures, such as blacklisting the algif_aead module in response to kernel vulnerabilities like CVE-2026-31431, are also indirectly important for the stable operation of the system.
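Two small helpers make this cycle easier to spot and to contain. They are a sketch, not my exact tooling: the function names are hypothetical, the container name and limit are whatever you pass in, and DOCKER=echo prints the docker command instead of running it.

```shell
# Print swap currently in use, in kB, from a meminfo-style file
# (default: /proc/meminfo).
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# Cap a running container's memory. Setting --memory-swap to the same value
# as --memory means the container gets no swap on top of its RAM limit.
limit_container_memory() {
    ${DOCKER:-docker} update --memory "$2" --memory-swap "$2" "$1"
}
```

For example, limit_container_memory my_app 512m makes the kernel kill the offending process inside that one container when it exceeds its limit, instead of letting it drag the whole host into swap.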

Conclusion

Operating on your own server with Docker brings many challenges that make us say, “it happens.” Disk storage wars are just one of them. However, with the right strategies, proactive monitoring, and automation, it is possible to manage these issues. My experiences on my own VPS show that regular cleaning, smart log management, and capacity planning are indispensable for a stable and high-performance system.

I hope this guide helps you deal with Docker disk issues on your own VPS or servers. Remember, the important thing is not to panic when you see the problem, but to understand the root cause and produce permanent solutions. If you have similar stories or different solutions regarding this topic, I’d love to hear them. In the next post, I’ll talk about a bug I encountered in my AI generation pipeline and how I solved it.

Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have worked on system architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience that has proven itself in the field.
