Mustafa Erbay
Life · Written by human · 11 min read

Swap Fire on My 7.6GB VPS: A Nightmare That Started with a Kernel

Swap usage on my VPS suddenly spiked. I detail the root cause, solution, and lessons learned from this issue that began with a kernel CVE patch.


Why Did Swap Usage Spike on My 7.6GB VPS?

This morning, I received an alert from my server: swap usage had risen abnormally. For a VPS with 7.6 GB of RAM, this was an unexpected situation. Normally, I barely used my swap space. To understand the situation, I immediately connected to the server and checked memory with the htop command.

In the htop output, I saw that much more swap space was being used than I expected. This indicated that the running applications’ memory needs had increased, or there was a memory leak somewhere. My first suspect was a recent kernel update I had performed.
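htop shows totals, but to tell whether one application or something else is behind the growth, it helps to see which processes actually have pages in swap. The sketch below is one way to do that, assuming a Linux /proc filesystem: it reads the VmSwap field from each /proc/<pid>/status file. The function name top_swap_users and its optional directory and count arguments are hypothetical, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: list the processes with the most memory in swap (kB),
# based on the VmSwap field of /proc/<pid>/status.
# $1 = proc root (defaults to /proc, overridable only for testing)
# $2 = number of entries to show (defaults to 10)
top_swap_users() {
    proc_root=${1:-/proc}
    for status_file in "$proc_root"/[0-9]*/status; do
        [ -r "$status_file" ] || continue   # processes can exit mid-scan
        # Kernel threads have no VmSwap line, so they print nothing.
        awk '/^Name:/   { sub(/^Name:[ \t]*/, ""); name = $0 }
             /^VmSwap:/ { if ($2 > 0) print $2, name }' "$status_file"
    done | sort -rn | head -n "${2:-10}"
}
```

Running `top_swap_users` with no arguments scans /proc and prints the largest swap users, biggest first. If the list is spread thinly across many processes instead of being dominated by one that keeps growing, the pressure may be coming from elsewhere, such as kernel memory.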

Kernel Update and Swap Usage

A few days ago, I had updated the Linux kernel on my server to the latest stable version. These updates typically patch security vulnerabilities and improve performance. However, they can sometimes lead to unexpected side effects. Issues related to kernel modules, in particular, can have significant impacts on the system’s overall memory management.

The increase in swap usage after this update couldn’t be a coincidence. I immediately started examining the server’s logs. I specifically looked at /var/log/syslog and journalctl outputs, trying to catch clues that the kernel was encountering errors during certain memory allocation operations.
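A quick way to narrow the kernel log down to memory-related messages is to filter it for a few phrases the kernel prints under memory pressure. This is a rough sketch; the pattern list is my own assumption and far from complete, and a leak can grow silently without logging anything. The function name kernel_mem_warnings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that mention common
# memory-pressure events (allocation failures, OOM killer activity).
# Reads the files given as arguments, or stdin when there are none.
kernel_mem_warnings() {
    grep -E -i 'page allocation failure|out of memory|oom-kill' "$@"
}
```

For example, `journalctl -k -b | kernel_mem_warnings` filters kernel messages from the current boot, and `dmesg | kernel_mem_warnings` does the same for the kernel ring buffer.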

Finding the Root Cause: The Debugging Process

After reviewing the logs, I noticed errors tied to one specific kernel module. It belongs to the kernel's crypto subsystem and had recently been patched to close a CVE (Common Vulnerabilities and Exposures) entry. The patch appeared to have introduced a regression in the module's memory management.

To understand this error more clearly, I also used the dmesg command. The dmesg output confirmed that the kernel was having trouble with memory allocation and deallocation. In some cases, memory the kernel allocated was never freed. Because kernel memory cannot be swapped out, the leaked allocations steadily shrank the RAM available to applications, and the kernel pushed their pages into swap instead.
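One way to check for this kind of kernel-side leak is to watch unreclaimable slab memory, which /proc/meminfo reports as SUnreclaim. The sketch below assumes you save a copy of /proc/meminfo and compare it with a later one; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print unreclaimable slab memory (SUnreclaim, in kB)
# from /proc/meminfo or a saved copy of it. Fails if the field is missing.
unreclaimable_slab_kb() {
    awk '/^SUnreclaim:/ { print $2; found = 1 } END { exit !found }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: growth in SUnreclaim (kB) between an earlier
# snapshot ($1) and a later one ($2).
slab_growth_kb() {
    echo $(( $(unreclaimable_slab_kb "$2") - $(unreclaimable_slab_kb "$1") ))
}
```

For example, run `cp /proc/meminfo /tmp/meminfo.before`, wait a few hours, then run `slab_growth_kb /tmp/meminfo.before /proc/meminfo`. Steady growth without a matching rise in application memory is a hint to open `slabtop` and look for the cache that keeps expanding.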

Which CVE and Which Module?

Through my research, I traced the problem to the algif_aead kernel module. This module exposes the kernel's AEAD (authenticated encryption with associated data) ciphers to user-space programs through the AF_ALG socket interface. A recently released security patch had closed a CVE in this module, but the patch itself introduced a memory leak. It was a classic case of a “fix” turning into a “break.”
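Before blaming a specific module, it is worth confirming that it is actually loaded on the affected machine. This sketch checks /proc/modules, which is where lsmod gets its data; the function name module_loaded and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if module $1 appears in
# /proc/modules (or in the file given as $2), fail otherwise.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

Usage: `module_loaded algif_aead && echo loaded`. A module built directly into the kernel will not show up in /proc/modules; on distributions that ship /boot/config-$(uname -r), the CONFIG_CRYPTO_USER_API_AEAD line there shows whether it was built in (=y) or as a loadable module (=m).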

At this point, I realized that this problem was not limited to my VPS alone and could occur on similar systems. Kernel-level issues like these can pose significant risks, especially for servers hosting high-traffic or sensitive applications.

Temporary Solution: Managing Swap Usage

Since the root cause was at the kernel level, finding a quick permanent solution was difficult. I had to wait for the kernel developers to release a new patch. In the meantime, I needed to implement temporary solutions to ensure the server’s stability.

First, I lowered the swappiness value so the kernel would be less eager to swap. vm.swappiness controls how strongly the kernel prefers swapping out application (anonymous) memory over dropping file-backed page cache when it needs to reclaim RAM. A lower value makes it drop cache first, so application memory stays in RAM longer.

Additionally, I reduced overall memory pressure by changing when some memory-intensive applications ran or by switching to lighter alternatives. Although this was only a temporary measure, it helped bring swap usage under control.

Swap Management with sysctl Settings

To adjust the swappiness value, I used the sysctl command. To make it permanent, I also added the necessary settings to the /etc/sysctl.conf file.

# Check the current swappiness value
cat /proc/sys/vm/swappiness

# Lower swappiness to 10 on the running system (the default is usually 60)
sudo sysctl vm.swappiness=10

# Persist the setting across reboots, then reload /etc/sysctl.conf
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

These settings reduced the server’s tendency to use swap space. However, they did not solve the underlying problem, since swappiness does nothing to stop a kernel memory leak; they only alleviated the symptoms. The real solution would come with a new kernel patch that corrected the faulty one.

Permanent Solution: New Kernel Patch and Aftermath

A few days later, the kernel developers released a new patch that fixed the memory leak in the algif_aead module. I immediately applied this patch to my server and restarted my system.

With the new kernel version, swap usage returned to normal. Seeing swap usage drop to almost zero in the htop output was a relief. This experience once again demonstrated how critical kernel updates are and how they can sometimes lead to unexpected problems.

Lessons Learned and Future Steps

One of the most important lessons I learned from this incident is the need to manage kernel updates carefully in production environments. After applying a patch, it’s important to monitor the server closely for a period and observe any potential side effects.

Furthermore, I realized how crucial it is to have an automated monitoring system in place for my servers. Receiving automatic alerts when critical metrics like swap usage change suddenly helps me detect problems early.
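A small check like the following could act as that kind of alert. It is a minimal sketch under my own assumptions: the 50% default and the SWAP_ALERT_THRESHOLD variable are arbitrary choices, and the function only prints a message and returns 1, leaving the actual notification to whatever tool you already use.

```shell
#!/bin/sh
# Hypothetical swap alert check. Reads SwapTotal/SwapFree from /proc/meminfo
# (or the file given as $1), prints a status line, and returns 1 when usage
# reaches SWAP_ALERT_THRESHOLD percent (default 50).
check_swap() {
    threshold=${SWAP_ALERT_THRESHOLD:-50}
    pct=$(awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END { printf "%d\n", (total ? (total - free) * 100 / total : 0) }
    ' "${1:-/proc/meminfo}")
    if [ "$pct" -ge "$threshold" ]; then
        echo "ALERT: swap usage at ${pct}% (threshold ${threshold}%) on $(uname -n)"
        return 1
    fi
    echo "OK: swap usage at ${pct}%"
}
```

Scheduled from cron or a systemd timer, the non-zero exit status is the hook for sending the notification.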

In the future, I plan to test kernel updates in a staging environment before deploying them to production. That should catch regressions like this one before they reach my production servers and cause downtime. This “swap fire” incident on my small VPS was a reminder of the continuous learning and adaptation that system administration requires.


Frequently Asked Questions

Common questions readers have about this article.

How do I detect and start monitoring suddenly increased swap usage after a kernel update?
I first checked real-time memory and swap usage with `htop` and `free -m`. Then I filtered the kernel log with `journalctl -k` to look for warnings and errors logged after the update. I ran `vmstat 5`, which samples memory and swap activity every five seconds, to pin down when swap started filling up. I also examined `/proc/swaps` to make sure the active swap files and their sizes were what I expected. Together, these steps helped me tell whether the problem came from a memory leak in a kernel module or from an application.
Is it more advantageous to lower the `swappiness` value or completely disable swap space to reduce swap usage?
When I lowered `swappiness` to 10, the kernel kept application memory in RAM longer and delayed moving it to swap, which reduced disk I/O and improved performance. Disabling swap completely, however, means the OOM killer starts terminating processes when RAM runs out, which is risky in unexpected situations like a memory leak. In my experience, keeping swap enabled with a low `swappiness` value (10-20) strikes a good balance: swap stays available as a safety net for critical operations, and I can add a temporary swap file if more headroom is needed. Fine-tuning the setting is therefore safer than disabling swap entirely.
If the swap issue recurs after a kernel patch, what steps should I follow, and how do I perform a rollback?
If the problem persists, I first note the running kernel version with `uname -r` and then install the previous stable kernel using `apt-get install linux-image-` followed by the version (Debian/Ubuntu) or `yum downgrade kernel` (CentOS). I then restart the system, select the old kernel from the GRUB menu, and observe the swap behavior again. If the old version has no issues, I examine the new kernel's release notes and the relevant CVE patch to decide which modules to blacklist. I also test memory allocation settings, such as `sysctl vm.overcommit_memory=1`. After rolling back, I focus on isolating the source of the problem and make sure to create a checklist for future updates.
Is the view that swap is entirely bad and should never be used correct?
Based on my experience, the generalization that swap is entirely bad is incorrect. Swap acts as a safety net that prevents the system from crashing when RAM is full and plays a critical role, especially with memory-intensive applications. However, heavy swap usage can lead to disk I/O latency and performance degradation, so swap should be considered a 'last resort.' With the right settings (e.g., low `swappiness` and adequate swap size), swap can enhance system stability while minimizing performance loss. Therefore, optimizing swap according to your needs and workload is a healthier approach than disabling it entirely.

Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security and Software

Since 2006, I have been working on system architecture, networking, server infrastructure, the setup of large-scale systems, software, and system security. On this blog, I share technical experience that has proven itself in the field.
