Storage I/O Latency Battles in Legacy Virtualization

“Storage I/O Latency” Battles in Legacy Virtualization Infrastructure

Virtualization technologies are at the heart of how modern IT teams manage workloads and optimize resources. On legacy virtualization infrastructure in particular, however, performance problems are common, and “Storage I/O Latency” tops the list. Virtualized environments are extremely sensitive to disk I/O (Input/Output) operations, and high latency can slow applications to a crawl or render them completely unusable.

In this post I’ll examine in detail the root causes of Storage I/O Latency on legacy virtualization infrastructure, the methods for detecting these problems, and, most importantly, the strategies you can apply to win these battles. The aim is to give you practical knowledge and solutions for getting the best possible performance out of the infrastructure you already have.

What Is Storage I/O Latency and Why Does It Matter?

Storage I/O Latency is the time it takes for a read or write request, sent by a VM or application to the storage system, to complete: the interval from the moment the request is issued to the moment the storage system acknowledges its completion. It is typically measured in milliseconds (ms), and lower values mean faster, more efficient storage access.
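To make the definition concrete, here is a minimal Python sketch that times small synchronous writes: the clock starts when the write is issued and stops when fsync returns, which approximates "request issued to request acknowledged." The function name and defaults are illustrative assumptions, and the result includes OS and file-system overhead, not only the device.

```python
import os
import tempfile
import time
from typing import Optional


def measure_write_latency_ms(block_size: int = 4096, samples: int = 20,
                             directory: Optional[str] = None) -> float:
    """Average latency (ms) of write + fsync on a temporary file (illustrative helper)."""
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    total_s = 0.0
    try:
        for _ in range(samples):
            start = time.perf_counter()             # request issued
            os.write(fd, payload)
            os.fsync(fd)                            # block until the storage stack acknowledges it
            total_s += time.perf_counter() - start  # request acknowledged
    finally:
        os.close(fd)
        os.remove(path)
    return total_s / samples * 1000.0


if __name__ == "__main__":
    print(f"average sync write latency: {measure_write_latency_ms():.2f} ms")
```

Point `directory` at a file system backed by the datastore you want to measure; the system temp directory may be a RAM-backed tmpfs, which would report misleadingly low numbers.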

That latency is a critical performance indicator in virtualized environments. High I/O latency can slow database queries, stretch out web app response times, and seriously hurt user experience in virtual desktop (VDI) environments. On legacy systems these problems are more likely, because many VMs share the same storage resources and the underlying technologies are less optimized.

Sources of Latency in Legacy Virtualization Infrastructure

Many factors contribute to Storage I/O Latency on legacy virtualization infrastructure. They usually lie along a complex chain that runs from the physical storage units up through the virtualization layer, so solving the problem starts with correctly identifying which link in that chain is responsible.

Physical Disk Limitations

Legacy systems typically rely on Hard Disk Drive (HDD) based storage. Because HDDs have mechanical parts, their latency is unavoidably higher than that of SSDs: the disks’ rotational speed (RPM) and the seek time of their read/write heads directly limit I/O performance.
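The cost of those mechanical parts can be estimated with simple arithmetic: on average, a random I/O waits for one seek plus half a platter revolution, plus a small transfer time. The sketch below uses assumed seek times and an assumed 150 MB/s transfer rate; these are illustrative figures, not vendor specifications.

```python
def hdd_random_io(rpm: int, avg_seek_ms: float, block_kib: float = 4.0,
                  transfer_mb_s: float = 150.0) -> tuple[float, float]:
    """Estimate service time (ms) and IOPS for one random I/O on a single HDD."""
    rotational_ms = 0.5 * 60_000 / rpm                        # wait half a revolution on average
    transfer_ms = (block_kib / 1024) / transfer_mb_s * 1000   # time to move the block itself
    service_ms = avg_seek_ms + rotational_ms + transfer_ms
    return service_ms, 1000 / service_ms


# Assumed, typical-looking seek times; check your drive's datasheet.
for rpm, seek_ms in [(7200, 8.5), (10_000, 4.5), (15_000, 3.5)]:
    ms, iops = hdd_random_io(rpm, seek_ms)
    print(f"{rpm:>6} RPM: ~{ms:.1f} ms per random I/O, ~{iops:.0f} IOPS")
```

Under these assumptions a single 7.2k RPM disk handles only about 80 random IOPS, and a 15k RPM disk about 180, which is why spindle count matters so much on HDD arrays.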

In addition, when multiple VMs perform heavy I/O on the same disk group, you run into “spindle contention”: the disk heads must constantly seek between unrelated I/O requests, which drags performance down. Older disk arrays usually have fewer spindles (physical disks), so this happens more often.
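A textbook way to see why contention hurts is to model a disk as a single M/M/1 queue, where mean response time equals service time divided by (1 - utilization). This is an idealized assumption (real arrays have caches, multiple spindles, and bursty arrivals), but it shows how latency climbs steeply as more VMs push the same spindles toward saturation. The 12.7 ms service time and 20 IOPS per VM are assumed example values.

```python
def mm1_response_ms(service_ms: float, offered_iops: float) -> float:
    """Mean response time (ms) of one disk modeled as an M/M/1 queue (idealized)."""
    utilization = offered_iops * service_ms / 1000  # fraction of time the disk is busy
    if utilization >= 1:
        return float("inf")  # demand exceeds what the spindle can serve
    return service_ms / (1 - utilization)


SERVICE_MS = 12.7  # assumed: ~7.2k RPM disk, random 4 KiB I/O
IOPS_PER_VM = 20   # assumed per-VM random I/O demand

for vms in range(1, 5):
    latency = mm1_response_ms(SERVICE_MS, vms * IOPS_PER_VM)
    print(f"{vms} VM(s): ~{latency:.1f} ms average response time")
```

With these numbers, the fourth VM pushes utilization past 100%, and the model's latency becomes unbounded.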

SAN (Storage Area Network) Bottlenecks

The Storage Area Network (SAN) architecture lets many virtual servers access a centralized storage system. But the SAN itself can also be a source of bottlenecks. Old or low-capacity Host Bus Adapter (HBA) cards can limit the data flow from the server to the storage.

The capacity, port speeds, and configuration of the Fibre Channel or iSCSI switches also matter. A SAN switch with insufficient bandwidth, or one that is overloaded, hurts the I/O performance of the whole system. A storage controller without enough processing power or memory raises overall latency as well. Finally, misconfigured multipathing policies (Round Robin, Fixed, MRU) can leave available I/O paths unused, producing more latency.
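To illustrate why the path selection policy matters, the sketch below counts how I/Os land on two paths under a simplified Fixed policy (everything on the preferred path) and a simplified Round Robin policy that rotates to the next path after a set number of I/Os. The path names and the `ios_per_switch` parameter are hypothetical; MRU is omitted because, like Fixed, it keeps I/O on a single active path until a failover.

```python
from collections import Counter
from itertools import cycle


def distribute(num_ios: int, paths: list[str], policy: str,
               ios_per_switch: int = 1) -> Counter:
    """Count how many I/Os land on each path under a simplified path selection policy."""
    counts: Counter = Counter()
    if policy == "fixed":
        counts[paths[0]] = num_ios  # every I/O uses the preferred path
        return counts
    if policy == "round_robin":
        path_iter = cycle(paths)
        current = next(path_iter)
        for i in range(num_ios):
            if i > 0 and i % ios_per_switch == 0:
                current = next(path_iter)  # rotate to the next active path
            counts[current] += 1
        return counts
    raise ValueError(f"unknown policy: {policy}")


paths = ["vmhba1:C0:T0:L1", "vmhba2:C0:T0:L1"]  # hypothetical path names
print("Fixed:      ", dict(distribute(10_000, paths, "fixed")))
print("Round Robin:", dict(distribute(10_000, paths, "round_robin", ios_per_switch=1000)))
```

Under Fixed, the second path sits idle while the first carries all the load; Round Robin spreads the same I/O evenly.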

The Effect of the Virtualization Layer

The hypervisor layer (VMware ESXi, Microsoft Hyper-V) sits between the physical hardware and the VMs and adds some overhead to every I/O. Older hypervisor versions tend to be less optimized than modern ones, which can add further latency to I/O operations.

The structure and block sizes of file systems such as VMFS (Virtual Machine File System) or NTFS also affect performance. In addition, misconfigured resource management mechanisms such as Storage I/O Control (SIOC), or similar features on other virtualization platforms, can let some VMs degrade the performance of others. VM snapshots can also seriously drag I/O performance down: once a snapshot exists, new writes go to a delta disk, and reads may have to consult that delta and its parent disks, so every I/O involves extra work.
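The snapshot effect can be shown with a deliberately simplified model: each delta disk stores only the blocks written after the snapshot, and a read falls back through the chain of parents until it finds the block. Real hypervisors track allocated blocks with metadata tables rather than walking the chain this naively, and the `Disk` class here is hypothetical, but the principle (writes land in the newest delta, reads may touch several files) is the same.

```python
from typing import Optional


class Disk:
    """A base disk or snapshot delta disk holding only the blocks written to it."""

    def __init__(self, parent: Optional["Disk"] = None):
        self.blocks: dict[int, bytes] = {}
        self.parent = parent

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data  # new writes always land in the newest delta

    def read(self, lba: int) -> tuple[Optional[bytes], int]:
        """Return (data, number of disks consulted) for a logical block address."""
        disk, lookups = self, 0
        while disk is not None:
            lookups += 1
            if lba in disk.blocks:
                return disk.blocks[lba], lookups
            disk = disk.parent  # not here: fall back to the parent in the chain
        return None, lookups


base = Disk()
base.write(0, b"original")
snap1 = Disk(parent=base)   # first snapshot: base disk is now frozen
snap2 = Disk(parent=snap1)  # second snapshot
snap2.write(1, b"new data")
print(snap2.read(1))  # found in the newest delta
print(snap2.read(0))  # has to walk back to the base disk
```

Every additional snapshot in the chain is another place a read may have to look, which is why long-lived or deeply nested snapshots are so costly.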

Network Layer Issues (in iSCSI cases)

If iSCSI is used for the storage connection, the network layer also plays a critical role. The speed (1 GbE vs. 10 GbE) and capacity of the network cards determine how much iSCSI traffic a host can carry. Bottlenecks, packet loss, or high latency on the network directly affect iSCSI-based storage performance.
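Whether a link is the bottleneck comes down to simple arithmetic: throughput equals IOPS times I/O size, compared with the link's line rate. The sketch assumes 90% of the line rate is usable after Ethernet, TCP, and iSCSI overhead; that efficiency factor and the example workload are assumptions, not measurements.

```python
def link_utilization(iops: float, io_kib: float, link_gbps: float,
                     efficiency: float = 0.9) -> float:
    """Fraction of a link's usable bandwidth consumed by an iSCSI workload (estimate)."""
    workload_mb_s = iops * io_kib * 1024 / 1e6            # bytes/s -> MB/s (decimal)
    usable_mb_s = link_gbps * 1e9 / 8 / 1e6 * efficiency  # line rate minus assumed overhead
    return workload_mb_s / usable_mb_s


# Assumed workload: 3,000 IOPS of 32 KiB I/O from all VMs on one host.
for gbps in (1, 10):
    print(f"{gbps} GbE: {link_utilization(3000, 32, gbps):.0%} of usable bandwidth")
```

At roughly 87% utilization on 1 GbE, queuing on the link alone would add noticeable latency; the same workload barely registers on 10 GbE.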

Jumbo Frames (typically an MTU of 9000 bytes) can boost performance in certain scenarios, but every device in the path, including host NICs, switches, and storage ports, must support them and be configured with the same MTU. Correctly configured NIC teaming and load balancing also spread iSCSI traffic across multiple network paths and improve performance. A misconfigured network can produce effects similar to a SAN bottleneck.
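One reason jumbo frames help is that fewer, larger frames carry the same data, so hosts and switches process fewer packets and headers per I/O. The sketch counts frames for a 64 KiB I/O, assuming 40 bytes of IPv4 and TCP headers per frame and ignoring iSCSI PDU headers and TCP options; it is an illustration, not a precise wire-level model.

```python
import math

IP_TCP_HEADER_BYTES = 40  # assumed: IPv4 (20) + TCP (20), no options


def frames_per_io(io_bytes: int, mtu: int) -> int:
    """Number of Ethernet frames needed to carry one I/O payload at a given MTU."""
    payload_per_frame = mtu - IP_TCP_HEADER_BYTES
    return math.ceil(io_bytes / payload_per_frame)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_io(64 * 1024, mtu)} frames per 64 KiB I/O")
```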

Tools for Detecting and Monitoring Latency Issues

Before solving problems, it’s essential to understand the current state and correctly identify the sources of latency. Even on legacy infrastructure there are several monitoring tools and metrics you can use.

Hypervisor-Level Monitoring

The management interfaces of virtualization platforms offer powerful tools for monitoring I/O performance:

VMware vCenter and vRealize Operations (vROps): These provide extensive performance graphs for VMs, hosts, and datastores. The “Disk Read Latency (ms)” and “Disk Write Latency (ms)” metrics in particular show storage latency directly. IOPS (Input/Output Operations Per Second) and throughput (MB/s) values also help you understand storage utilization. Sustained average latency above roughly 20-30 ms is generally a sign of a problem.
Hyper-V Manager / System Center Virtual Machine Manager (SCVMM): Similar metrics can be monitored in Hyper-V environments, for example through Performance Monitor counters on the host. Counters such as Disk Queue Length, Avg. Disk sec/Read, and Avg. Disk sec/Write are the key indicators of disk latency.

These tools let you identify which VMs or which storage units are experiencing the most latency.
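Once you know which datastore is slow, the next question is where the latency accumulates. On ESXi, esxtop splits the latency a VM sees (GAVG) into time spent in the device or array (DAVG) and time queued in the hypervisor kernel (KAVG), with GAVG ≈ DAVG + KAVG. The sketch below triages a datastore from those two numbers; the 20 ms and 2 ms thresholds are commonly quoted rules of thumb, treated here as assumptions to tune for your environment, and the `LatencySample` and `diagnose` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatencySample:
    """Latency readings for one datastore or device (hypothetical structure)."""
    name: str
    device_ms: float  # time in the storage device/array (esxtop DAVG)
    kernel_ms: float  # time queued in the hypervisor kernel (esxtop KAVG)

    @property
    def guest_ms(self) -> float:
        return self.device_ms + self.kernel_ms  # latency seen by the VM (esxtop GAVG)


def diagnose(sample: LatencySample, guest_limit_ms: float = 20.0,
             kernel_limit_ms: float = 2.0) -> str:
    """Rough triage: 'ok', 'hypervisor' (queuing in the host), or 'storage'."""
    if sample.guest_ms <= guest_limit_ms:
        return "ok"
    if sample.kernel_ms > kernel_limit_ms:
        return "hypervisor"  # look at queue depths, SIOC, host contention
    return "storage"         # look at the array, SAN fabric, disks


for s in (LatencySample("ds-gold", 4.0, 0.2),
          LatencySample("ds-archive", 27.0, 0.4),
          LatencySample("ds-busy", 12.0, 11.0)):
    print(f"{s.name}: {s.guest_ms:.1f} ms -> {diagnose(s)}")
```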

OS-Level Monitoring

Monitoring I/O performance from inside the VMs is also important for understanding the effect at the application layer.

Windows Performance Monitor (Perfmon): On Windows-based VMs, Perfmon provides many valuable counters under the LogicalDisk and PhysicalDisk objects. Counters such as Avg. Disk sec/Read, Avg. Disk sec/Write, and Current Disk Queue Length show disk performance from inside the VM. Note that the Avg. Disk sec counters are reported in seconds, so a value of 0.025 means 25 ms. These values can differ from hypervisor-level latency, because the guest’s own file system and OS caching are in play.
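Linux (iostat, vmstat): On Linux-based VMs, the iostat -x 1 command shows extended disk I/O statistics every second, including read/write throughput, the average time I/O requests take to complete (await, split into r_await and w_await in newer versions), and device utilization (%util). vmstat, on the other hand, gives an overview of system resources, including I/O activity and the share of CPU time spent waiting on I/O.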
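iostat derives its await figures from the cumulative counters in /proc/diskstats, and you can do the same when you want to log latency from a script. The sketch below compares two snapshots: average read latency is the change in milliseconds spent reading divided by the change in completed reads (likewise for writes), and utilization is the share of the interval the device was busy. The field positions follow the kernel's documented /proc/diskstats layout; the helper names are illustrative.

```python
# Field indexes in a /proc/diskstats line (0-based, after splitting on whitespace).
NAME, READS, READ_MS, WRITES, WRITE_MS, IO_MS = 2, 3, 6, 7, 10, 12


def parse_diskstats(text: str) -> dict[str, list[int]]:
    """Map device name -> [reads, read_ms, writes, write_ms, io_ms] from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        stats[fields[NAME]] = [int(fields[i]) for i in (READS, READ_MS, WRITES, WRITE_MS, IO_MS)]
    return stats


def latency_between(before: str, after: str, device: str, interval_s: float):
    """Return (r_await_ms, w_await_ms, util_percent) for device between two snapshots."""
    r0, rms0, w0, wms0, io0 = parse_diskstats(before)[device]
    r1, rms1, w1, wms1, io1 = parse_diskstats(after)[device]
    reads, writes = r1 - r0, w1 - w0
    r_await = (rms1 - rms0) / reads if reads else 0.0
    w_await = (wms1 - wms0) / writes if writes else 0.0
    util = 100.0 * (io1 - io0) / (interval_s * 1000)  # busy ms / elapsed ms
    return r_await, w_await, util


def read_diskstats() -> str:
    """Read the live counters (Linux only)."""
    with open("/proc/diskstats") as f:
        return f.read()
```

On a live Linux system, call `read_diskstats()` twice, one second apart, and pass both strings to `latency_between` along with the device name (for example "sda").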

These tools are essential for understanding how a particular application or service is being affected by storage performance.

Storage-Unit Level Monitoring

The management interfaces of SAN or NAS devices give you the most detailed information about the storage system’s overall health and performance.

SAN Management Interfaces: Interfaces from storage vendors like Dell EMC, NetApp, and HPE 3PAR provide controller utilization, disk queue depth, cache hit ratio, port performance, and per-disk-group I/O metrics. That data is critical for determining whether the storage system itself is a bottleneck.
Disk Queue Depth: The number of outstanding I/O requests queued at a storage controller or an individual disk. Persistently high queue depth indicates that the storage system is struggling to keep up with demand.
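The cache hit ratio reported by the array translates directly into average latency, as a weighted mix of cache-speed and disk-speed responses. The 0.5 ms and 10 ms figures below are assumed example values, not measurements; the point is how strongly a drop in hit ratio moves the average.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average read latency given the fraction of reads served from controller cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1 - hit_ratio) * backend_ms


CACHE_MS, DISK_MS = 0.5, 10.0  # assumed cache-hit and HDD-miss latencies
for ratio in (0.95, 0.80, 0.50):
    print(f"hit ratio {ratio:.0%}: ~{effective_latency_ms(ratio, CACHE_MS, DISK_MS):.2f} ms")
```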

Monitoring at all these levels lets you pinpoint exactly where the latency starts and which link in the chain is the weak one.

Strategies and Solutions for Winning Latency Battles

Fighting Storage I/O Latency on legacy virtualization infrastructure takes a multi-faceted approach that includes both hardware improvements and software configuration optimizations.

Hardware Improvements

Some hardware changes you can make within your existing budget and infrastructure can deliver a significant boost in performance:

Move to SSD/Flash Storage: One of the most effective solutions is moving from HDD-based storage to Solid State Drive (SSD) or hybrid (HDD+SSD tiered) storage. SSDs offer much lower latency and much higher IOPS because they don’t have mechanical parts. If switching all your storage isn’t feasible, just creating SSD tiers for the most critical VMs or using an SSD-based caching solution can make a big difference.
Faster HBAs and SAN Switches: Replacing the HBAs in your servers with faster cards (8 Gbps or 16 Gbps Fibre Channel, 10 GbE iSCSI) can speed up data flow. Similarly, upgrading SAN switches to higher capacity and faster ports removes bottlenecks.
Increase Storage Controller Capacity: The storage unit’s controller manages all I/O operations. If the controller doesn’t have enough processing power and memory (cache), latency is unavoidable. If possible, moving to a more powerful controller or adding cache to the existing one can lift performance.
Add More Spindles (Disks) for IOPS Headroom: On HDD-based systems, adding more physical disks raises total IOPS capacity. The effect is more pronounced when you add disks to striped RAID levels like RAID 10.
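Adding spindles raises raw IOPS, but what the hosts actually get depends on the RAID write penalty. The sketch uses the commonly cited penalties (2 back-end I/Os per small random write for RAID 10, 4 for RAID 5, 6 for RAID 6) and an assumed 180 IOPS per disk with 30% writes; arrays with write-back cache can do better, so treat the result as a planning estimate.

```python
# Commonly cited back-end I/Os per small random write (rule of thumb).
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def host_iops(disks: int, iops_per_disk: float, raid: str, write_ratio: float) -> float:
    """Front-end IOPS a disk group can sustain for a random read/write mix."""
    raw = disks * iops_per_disk  # total back-end IOPS of all spindles
    cost_per_io = (1 - write_ratio) + write_ratio * WRITE_PENALTY[raid]
    return raw / cost_per_io


for disks in (8, 16):
    for raid in ("RAID 10", "RAID 5", "RAID 6"):
        print(f"{disks} disks, {raid}: ~{host_iops(disks, 180, raid, 0.3):.0f} IOPS at 30% writes")
```

Doubling the spindle count doubles the result for every RAID level, while the RAID level itself determines how much of that raw capacity survives a write-heavy mix.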

Configuration Optimizations

Hardware changes aren’t always feasible. But software configuration optimizations can deliver real gains too:

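RAID Level Choice: For critical applications, prefer performance-oriented RAID levels like RAID 10. If capacity efficiency (and, with RAID 6, tolerance of two simultaneous disk failures) is the priority, you can use RAID 5 or RAID 6, but remember the write penalty: each small random write costs roughly four back-end I/Os on RAID 5 and six on RAID 6, versus two on RAID 10.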
Datastore/LUN Sizing and Alignment: Size datastores or LUNs according to the workload. Make sure guest partitions and virtual disks are aligned with the block boundaries of the underlying storage. With misalignment, a single guest I/O can straddle two back-end blocks, requiring extra reads or writes on every operation.
Properly Configuring Multipathing Policies: On SAN connections, enable a load-balancing policy such as Round Robin, using NMP (Native Multipathing) in VMware, vendor plugins such as PowerPath, or MPIO on Windows/Hyper-V hosts. That distributes I/O traffic across multiple paths and provides both performance and fault tolerance.
I/O Scheduler Settings: Inside Linux-based VMs, and on Linux-based hypervisor hosts, the I/O scheduler (Noop, Deadline, CFQ; or none, mq-deadline, BFQ, and Kyber on newer multi-queue kernels) can affect disk performance. Noop/none or Deadline is typically recommended for virtualized environments, because the hypervisor and storage array already reorder I/O.
VM Disk Types: Creating VM disks as Thick Provision Eager Zeroed allocates and zeroes all disk space up front. That avoids the zero-on-first-write penalty and can improve write performance, but consumes the full disk size immediately. Thin Provisioning offers flexibility but can add latency on first writes to new blocks, since space must be allocated and zeroed as the disk grows.
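Alignment is easy to check arithmetically. Older partitioning tools started the first partition at sector 63, which is not a multiple of 4 KiB, while modern tools start at sector 2048 (1 MiB). The sketch assumes 512-byte logical sectors and a 4 KiB back-end block; substitute your array's actual block or stripe size for `boundary_bytes`. On Linux, for example, a partition's start sector can be read from /sys/block/<disk>/<partition>/start (reported in 512-byte sectors).

```python
def is_aligned(start_sector: int, sector_bytes: int = 512, boundary_bytes: int = 4096) -> bool:
    """True if a partition starting at start_sector begins on a boundary_bytes boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


def backend_blocks_touched(offset_bytes: int, io_bytes: int, block_bytes: int = 4096) -> int:
    """How many back-end blocks a single I/O at offset_bytes spans."""
    first = offset_bytes // block_bytes
    last = (offset_bytes + io_bytes - 1) // block_bytes
    return last - first + 1


for start in (63, 2048):
    offset = start * 512
    print(f"start sector {start}: aligned={is_aligned(start)}, "
          f"a 4 KiB I/O at the partition start touches {backend_blocks_touched(offset, 4096)} block(s)")
```

On the misaligned partition, every 4 KiB guest I/O spans two back-end blocks, doubling the physical work.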

Virtualization Layer Optimizations

Some adjustments at the hypervisor level can also reduce I/O latency:

Minimize or Plan Snapshot Use: Snapshots are useful as rollback points, but they seriously drag I/O performance down. Avoid keeping snapshots for long periods in production environments, and delete or consolidate them once they are no longer needed. For backup and similar operations, consider alternatives to snapshots when possible (VM cloning or storage-based snapshots, for example).
Use Storage I/O Control (SIOC) or Similar Mechanisms Effectively: Features like VMware SIOC distribute a datastore’s I/O resources across VMs according to their shares when the datastore becomes congested, preventing “noisy neighbor” problems in which one heavy-I/O VM degrades the others.
Allocate Sufficient Resources (CPU/RAM) to VMs: Insufficient CPU or RAM can prevent the VM from processing I/O operations effectively. Allocating sufficient resources to VMs indirectly improves I/O performance.
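Disk Queue Depth Settings: In some cases, you may need to tune queue depth at the HBA, LUN, or virtual SCSI adapter level. Queue depth controls how many I/O requests can be outstanding to the storage system at once; too low a value throttles throughput, while too high a value can simply move the queue onto an already saturated array.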
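Queue depth, IOPS, and latency are tied together by Little's law: the average number of I/Os in flight equals throughput times response time. The sketch uses it to estimate the IOPS ceiling a given queue depth imposes; the queue depth of 32 and the latencies are assumed example values. It also explains why raising queue depth only helps while the back end has spare capacity: once the array is saturated, extra outstanding I/Os just wait longer.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average I/Os in flight = throughput x response time."""
    return iops * latency_ms / 1000


def iops_ceiling(queue_depth: int, latency_ms: float) -> float:
    """Maximum IOPS achievable if at most queue_depth I/Os can be outstanding."""
    return queue_depth * 1000 / latency_ms


QUEUE_DEPTH = 32  # assumed per-LUN queue depth
for latency in (1.0, 5.0, 20.0):
    print(f"{latency:>4} ms per I/O -> at most ~{iops_ceiling(QUEUE_DEPTH, latency):,.0f} IOPS")
```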

Application-Level Improvements

Sometimes the issue isn’t the storage infrastructure itself but the way applications are using the storage:

Database Optimization: Database servers are at the top of the I/O-intensive workload list. Optimizing indexes, tuning queries, and keeping database log files on a separate LUN or datastore can significantly improve I/O performance.
Application Caching: Using application-level caching mechanisms reduces reads of frequently accessed data from disk, lowering I/O load and lifting performance.
Separating Log Files: OS and application log files do constant write operations. Moving those log files to a separate disk or datastore with a less critical I/O load can protect the performance of the main application disks.
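A minimal illustration of application caching, using Python's `functools.lru_cache` in front of a hypothetical `Storage` class that counts how many reads reach the disk. A real application would also need cache invalidation when the underlying data changes, which this sketch ignores.

```python
from functools import lru_cache


class Storage:
    """Hypothetical backing store that counts how many reads actually hit disk."""

    def __init__(self) -> None:
        self.reads = 0

    def read(self, key: str) -> str:
        self.reads += 1
        return f"value-for-{key}"


def make_cached_reader(storage: Storage, maxsize: int = 1024):
    """Wrap storage.read in an in-memory LRU cache."""
    @lru_cache(maxsize=maxsize)
    def cached_read(key: str) -> str:
        return storage.read(key)  # only runs on a cache miss
    return cached_read


storage = Storage()
get = make_cached_reader(storage)
for key in ["orders", "users", "orders", "orders", "products", "users"]:
    get(key)
print(f"6 lookups, {storage.reads} disk reads, {get.cache_info()}")
```

Half of the lookups are served from memory here; with real workloads that repeatedly read the same hot data, the share of I/O removed from the storage system can be much larger.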

Looking Ahead: Modern Approaches

While these optimizations on legacy infrastructure deliver significant benefits, in the long run a move to modern technologies is unavoidable.

HCI (Hyperconverged Infrastructure): HCI solutions (Nutanix, VMware vSAN) combine compute and storage in a single server cluster, shortening the I/O path and reducing latency. The distributed storage architecture offers high performance and scalability.
All-Flash Storage Systems: Storage systems built entirely from SSDs deliver sub-millisecond latency and very high IOPS (into the millions on high-end arrays), providing enough performance even for the heaviest workloads.
NVMe over Fabrics (NVMe-oF): NVMe-based storage offers much lower latency and higher bandwidth than traditional SCSI-based SAN protocols. NVMe-oF extends the NVMe protocol across a network fabric (such as Fibre Channel, RDMA, or TCP) and is becoming a foundation for next-generation data centers.

These modern technologies solve the “Storage I/O Latency” battles at the root and remove many of the difficulties faced on legacy systems. But applying the strategies above to get the best out of your current infrastructure is essential for ensuring performance continuity during the migration period or over the long term.

Conclusion

Fighting “Storage I/O Latency” on legacy virtualization infrastructure is a complex process that takes constant attention and effort. But by applying the comprehensive strategies covered in this article, you can get significantly better performance out of your current systems. Understanding the root causes of the problems, detecting them with the right monitoring tools, and then optimizing at the hardware, software, and application levels — that’s the key to winning these battles.

Remember: every environment is unique, and finding the best solution requires continuous monitoring, testing, and tuning. With a proactive approach, you can deliver a smooth, high-performance experience to your users and applications even on legacy virtualization infrastructure. And as you move to the technologies of the future, understanding the fundamental I/O principles forms the foundation for building more solid, efficient systems.
