In complex computer storage environments involving, e.g., multiple hosts and/or storage systems, providing efficient input/output (I/O) operations to and from physical storage remains an important task for system administrators. Noticeable performance degradation often occurs as a result of slow I/O operations within a storage device. The ability to quickly analyze and resolve the problem can be critical to maintaining a high-performing system.
On host systems (i.e., servers) where the physical disks are directly attached to the system initiating I/O, currently available performance analysis tools on a host can be used to evaluate I/O performance at the level of each physical disk, since the operating system initiates the I/O on the physical disk, and can track the response time.
However, with the advent of technologies such as RAID (redundant array of inexpensive disks) disk systems, data is divided and stored among a set of physical disks but appears to the host system as a single disk. Because the RAID system presents a virtual disk (often called a LUN) to the host system, which treats it as a physical disk (i.e., performs I/O operations on the virtual disk), the host system cannot determine which physical disk is actually accessed when it initiates an I/O operation on such a presented disk. Instead, the virtual-to-physical mapping is performed in the RAID system, independently of the host system. In this simple case, the physical disk is one level removed from the host that initiated the I/O.
This problem is exacerbated by the use of disk virtualization systems, which can, for instance, be installed between the host system and a RAID system. In this case, the disk virtualization system creates yet another level of virtual-to-physical mapping. The virtual disk created on the RAID system is presented to the disk virtualization system, which treats the presented virtual disk as its physical disk. The disk virtualization system can combine one or more disks presented to it by RAID disk systems to create a single virtual disk, which is presented to the host system as a single disk. In this case, the actual physical disk is two systems removed from the host system that initiates the I/O.
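The two levels of mapping described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the virtualization layer concatenates the LUNs presented to it and that each LUN stripes its blocks round-robin across its physical disks (RAID-0 style); actual products use other layouts. The names `Raid0Lun`, `ConcatVirtualDisk`, and `locate` are hypothetical and do not correspond to any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid0Lun:
    """Hypothetical LUN: blocks striped round-robin over physical disks."""
    name: str
    disks: tuple          # names of the physical disks behind the LUN
    stripe_blocks: int    # blocks per stripe unit (assumed layout)
    size_blocks: int      # total blocks the LUN presents

    def resolve(self, lba):
        """Map a LUN block address to (physical disk, block on that disk)."""
        stripe = lba // self.stripe_blocks
        disk = self.disks[stripe % len(self.disks)]
        disk_lba = (stripe // len(self.disks)) * self.stripe_blocks \
            + lba % self.stripe_blocks
        return disk, disk_lba


@dataclass(frozen=True)
class ConcatVirtualDisk:
    """Hypothetical virtualization layer: concatenates the LUNs it is given."""
    luns: tuple

    def resolve(self, lba):
        """Map a host block address to (LUN, block within that LUN)."""
        for lun in self.luns:
            if lba < lun.size_blocks:
                return lun, lba
            lba -= lun.size_blocks
        raise ValueError("block address beyond end of virtual disk")


def locate(vdisk, host_lba):
    """Follow both mapping levels for one host block address."""
    lun, lun_lba = vdisk.resolve(host_lba)      # level 1: virtualization system
    disk, disk_lba = lun.resolve(lun_lba)       # level 2: RAID system
    return lun.name, disk, disk_lba


lun_a = Raid0Lun("lun_a", ("a0", "a1"), stripe_blocks=4, size_blocks=16)
lun_b = Raid0Lun("lun_b", ("b0", "b1", "b2"), stripe_blocks=4, size_blocks=24)
vdisk = ConcatVirtualDisk((lun_a, lun_b))
```

In this sketch, `locate(vdisk, 20)` resolves to a block on disk `b1` behind `lun_b`. The host itself holds only the address `20` on what it believes is a single disk; both mapping tables reside in the other systems, which is why the host cannot perform this resolution on its own.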
In these latter cases, it is not possible for the host system to determine the physical location where an I/O request is actually serviced. This causes various types of problems in performance and capacity monitoring. First, in the case of performance analysis, when the response times on one or more disks used by the host are slow, it is not possible to easily determine the exact location where the I/O was satisfied, which is key to finding the root cause of the slow I/O. In limited situations the physical location can be approximated by merging host and disk performance and topology data (see, for example, http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS1230). However, with current technology and processes, one cannot determine where a specific I/O operation was fulfilled, i.e., in which system and whether in cache or on disk.
Second, in the case of capacity planning, when many applications use a disk virtualization or RAID system, the host system cannot determine what load it is generating (e.g., I/Os per second satisfied in cache or on physical disk) in the disk virtualization or RAID system. When planning to scale up one application that shares the RAID system with many others, one must be able to determine what percentage of the RAID system load is initiated by the changing workload in order to assess whether the RAID system must be upgraded to support the added workload.
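The capacity-planning arithmetic can be sketched as follows, assuming that per-workload I/O rates on the shared RAID system were available, which is precisely the information the host cannot obtain today. The function names, the workload names, and the I/O figures are hypothetical and serve only to illustrate the calculation.

```python
def workload_share(iops_by_workload, workload):
    """Fraction of the total RAID system load initiated by one workload."""
    total = sum(iops_by_workload.values())
    if total == 0:
        return 0.0
    return iops_by_workload[workload] / total


def projected_total(iops_by_workload, workload, growth_factor):
    """Total load if one workload is scaled by growth_factor, others fixed."""
    total = sum(iops_by_workload.values())
    return total + iops_by_workload[workload] * (growth_factor - 1)


# Illustrative I/Os per second observed on the shared RAID system.
observed = {"db": 3000, "mail": 1000, "web": 1000}

share = workload_share(observed, "db")            # 0.6 of the current load
future = projected_total(observed, "db", 1.5)     # 6500.0 I/Os per second
```

Comparing the projected total with the rated capacity of the RAID system would then indicate whether an upgrade is needed. The calculation is trivial once the per-workload figures exist; the difficulty described above lies in obtaining them.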