The present invention relates to computer Input/Output (I/O) systems and devices and, more specifically, to real-time monitoring of system performance to identify SAN and I/O device conditions that cause performance degradation.
Storage Area Networks (SANs) can consist of a number of physically separate Fibre Channel switches, with hundreds and possibly thousands of ports, connected together to form a single logical fabric. Although a single logical fabric can consist of many physical switches with redundant inter-switch links (ISLs), the switch fabric as a whole is a single point of failure because the intelligence that manages the fabric, for example the name server, can fail. Clients that require continuous availability for accessing devices from computer systems over the SAN will typically configure redundant paths from the host to the storage devices through a fabric and also deploy redundant fabrics. Poor performance in a fabric can have many causes that do not result in an explicit error being detected. For example, the firmware that manages ISL traffic can have bugs, and high traffic can cause congestion, which in turn can cause secondary effects where I/O traffic is delayed. The target storage subsystems can also have errors that cause I/O delays on specific channel paths. The host processor and operating system may also have errors in their path selection algorithms, leading to congestion and resulting in unnecessarily high average I/O service times.
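By way of illustration only, the kind of condition described above, in which a specific channel path shows elevated average I/O service time without any explicit error being reported, can be sketched as a comparison of per-path averages. The sketch below is not the method of the invention; the function name `flag_slow_paths`, the path labels, the sample values, and the threshold `ratio` are all hypothetical and chosen only for the example.

```python
from statistics import mean, median


def flag_slow_paths(samples_by_path, ratio=2.0):
    """Return the paths whose mean service time exceeds `ratio` times the
    median of all per-path means.

    `samples_by_path` maps a path label to a list of I/O service times
    (any consistent unit). The threshold rule is an assumption made for
    this example, not a rule taken from the text.
    """
    # Average service time per path; paths with no samples are ignored.
    averages = {
        path: mean(times)
        for path, times in samples_by_path.items()
        if times
    }
    if not averages:
        return []

    # Use the median of the per-path averages as the baseline so that a
    # single slow path does not inflate the reference value.
    baseline = median(averages.values())
    return sorted(
        path for path, avg in averages.items() if avg > ratio * baseline
    )


# Hypothetical service-time samples (milliseconds) for four channel paths.
samples = {
    "path_A": [1.0, 1.2, 0.9],
    "path_B": [1.1, 1.0, 1.3],
    "path_C": [4.8, 5.5, 6.1],
    "path_D": [0.8, 1.1, 1.0],
}

print(flag_slow_paths(samples))  # prints ['path_C']
```

In this example no path reports an error, yet path_C stands out because its average service time is several times that of its peers. A median baseline is used here rather than a mean so that the slow path itself does not mask the deviation.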