In shared storage systems such as virtual datacenters executing many virtual machines (VMs), multiple hosts may share the same set of storage devices and/or the same set of storage input/output (I/O) paths. Avoiding fabric congestion with some existing systems is difficult at least because of the complexity of, and changes in, fabric topology, I/O load, and I/O path selection. For example, VMs, virtual disks, hosts, storage devices (or other storage areas), and Fibre Channel links may be added and removed dynamically.
Further, advanced disk technology such as solid-state disks (SSDs) provides better random I/O performance than other types of disks. SSDs are being used as caches, front-end tiers, and/or complete spindle replacements. With SSDs, it may be possible to achieve as much as 3 Gbytes/sec of random disk I/O traffic, leading to an increase in I/O bandwidth per storage device or logical unit number (LUN). Additionally, high-throughput sequential I/O operations such as backups, cloning, and template deployment may saturate fabric links and/or cause failure.
Responding to dynamic events by manually determining optimum paths for the individual hosts is difficult, unreliable, error-prone, and unlikely to provide effective load balancing. Further, some of the existing systems attempt to load balance by multi-pathing, throttling I/O, performing LUN path selection techniques, migrating workloads from one host to another, or migrating data from one LUN or datastore to another. Such existing systems, however, fail to distinguish between LUN congestion and link congestion. As such, these existing systems cannot suggest alternate paths for accessing a LUN when the current path is congested. Additionally, many of the existing systems operate at the host level and thus cannot produce a global optimum, nor can they make or recommend topology changes or alternate paths to remedy bottlenecks.