1. Field of the Invention
This invention generally relates to the management of resources in a data processing system and more particularly to the management of a disk array storage device.
2. Description of Related Art
Many data processing systems now incorporate disk array storage devices. Each of these devices comprises a plurality of physical disks arranged into logical volumes. Data on these devices is accessible through various control input/output programs in response to commands, particularly reading and writing commands from one or more host processors. A Symmetrix 5500 series integrated cached disk array that is commercially available from the assignee of this invention is one example of such a disk array storage device. This particular array comprises multiple physical disk storage devices or drives with the capability of storing large amounts of data up to several terabytes or more. The management of such resources becomes very important because the ineffective utilization of the capabilities of such an array can affect overall data processing system performance significantly.
Generally a system administrator will, upon initialization of a direct access storage device, determine certain characteristics of the data sets to be stored. These characteristics include the data set size and volume names and, in some systems, the correspondence between a logical volume and a particular host processor in a multiple host processor system. The system administrator then uses this information to configure the disk array storage device by distributing various data sets across different physical devices, with the expectation of avoiding concurrent use of a physical device by multiple applications. Oftentimes allocations based upon this limited information are or become inappropriate. When this occurs, the original configuration can degrade overall data processing system performance dramatically.
One approach to overcoming this problem has been to propose an analysis of the operation of the disk array storage device prior to loading a particular data set and then determining an appropriate location for that data set. For example, U.S. Pat. No. 4,633,387 to Hartung et al. discloses load balancing in a multi-unit data processing system in which a host operates with multiple disk storage units through plural storage directors. In accordance with this approach a least busy storage director requests work to be done from a busier storage director. The busier storage director, as a work sending unit, supplies work to the work requesting, or least busy, storage director.
U.S. Pat. No. 5,239,649 to McBride et al. discloses a system for balancing the load on channel paths during long running applications. In accordance with the load balancing scheme, a selection of volumes is first made from those having affinity to the calling host. The load across the respective connected channel paths is also calculated. The calculation is weighted to account for different magnitudes of load resulting from different applications and to prefer the selection of volumes connected to the fewest unused channel paths. An optimal volume is selected as the next volume to be processed. The monitored load on each channel path is then updated to include the load associated with the newly selected volume, assuming that the load associated with processing the volume is distributed evenly across the respective connected channel paths. The selection of the following volume is then based on the updated load information. The method continues quickly during subsequent selection of the remaining volumes for processing.
In another approach, U.S. Pat. No. 3,702,006 to Page discloses load balancing in a data processing system capable of multi-tasking. A count is made of the number of times each I/O device is accessed by each task over a time interval between successive allocation routines. During each allocation, an analysis is made using the count and time interval to estimate the utilization of each device due to the current tasks. An estimate is also made of the anticipated utilization due to the task undergoing allocation. The estimated current and anticipated utilizations are then considered and used as a basis for attempting to allocate the data sets to the least utilized I/O devices so as to achieve balanced I/O activity.
Each of the foregoing references discloses a system in which load balancing is achieved by selecting a specific location for an individual data set based upon express or inferred knowledge about the data set. An individual data set remains on a given physical disk unless manually reconfigured. None of these systems suggests the implementation of load balancing by the dynamic reallocation or configuration of existing data sets within the disk array storage system.
Another load balancing approach involves a division of reading operations among different physical disk drives that are redundant. Redundancy has become a major factor in the implementation of various storage systems that must also be considered in configuring a storage system. U.S. Pat. No. 5,819,310 granted Oct. 6, 1998 discloses such a redundant storage system with a disclosed disk array storage device that includes two device controllers and related disk drives for storing mirrored data. Each of the disk drives is divided into logical volumes. Each device controller can effect different reading processes and includes a correspondence table that establishes the reading process to be used in retrieving data from the corresponding disk drive. Each disk controller responds to a read command that identifies the logical volume by using the correspondence table to select the appropriate reading process and by transferring data from the appropriate physical storage volume containing the designated logical volume.
Consequently, when this mirroring system is implemented, reading operations involving a single logical volume do not necessarily occur from a single physical device. Rather, read commands to different portions of a particular logical volume may be directed to any one of the mirrors for reading from preselected tracks in the logical volume. Allowing such operations can provide limited load balancing and can reduce seek times.
Other redundancy techniques and striping techniques can tend to spread the load over multiple physical drives by dividing a logical volume into sub-volumes that are stored on individual physical drives in blocks of contiguous storage locations. However, if the physical drives have multiple logical volumes, sub-volumes or other forms of blocks of contiguous storage locations, the net effect may not balance the load with respect to the totality of the physical disk drives. Thus, none of the foregoing references discloses or suggests a method for providing a dynamic reallocation of physical address space based upon actual usage.
Therefore it is an object of this invention to enable a dynamic reallocation of data in a plurality of physical disk storage devices to reduce any imbalance of load requirements on each physical disk storage device.
Another object of this invention is to determine the relative utilization of physical disk storage devices to reduce imbalances in the utilization.
In accordance with this invention, the load on a plurality of physical disk storage devices can be balanced by exchanging data blocks on two physical disk storage devices that are divided into blocks of contiguous storage locations. A list of all pairs of exchangeable data blocks on the physical disk storage devices is prepared. Disk utilization statistics are compiled for each data block in each physical disk storage device over a time interval. A configuration with a pair of data blocks on different physical disk storage devices is implemented if an analysis determines that the exchange will improve physical disk storage device operations.
In accordance with another aspect of this invention, access activity on physical disk storage devices that are divided into blocks of contiguous storage locations is balanced. The balancing is achieved by compiling disk access statistics for each block over a time interval and compiling a list of all pairs of exchangeable blocks on the physical disk storage devices. The compiled disk access statistics are used to generate a disk utilization time that represents the total time required to complete all disk accesses during the time interval. The disk utilization times are used to evaluate configurations of blocks on the physical disk storage devices with various exchanged blocks and thereby determine an optimal exchange. Thereafter the data on the two identified blocks are exchanged.
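The selection step described above can be illustrated with a minimal sketch. It assumes that a per-block disk utilization time has already been compiled and that "optimal" means the exchange that most reduces the spread between the busiest and least busy device; that criterion, the function name best_exchange and the data layout are illustrative assumptions and are not taken from this description.

```python
def best_exchange(block_times, placement, exchangeable_pairs):
    """Pick the exchangeable pair whose swap most reduces load imbalance.

    block_times: block name -> disk utilization time over the interval
    placement: block name -> physical device currently holding the block
    exchangeable_pairs: iterable of (block_a, block_b) candidates
    Returns (pair, imbalance); pair is None if no exchange improves things.
    All names and the max-minus-min criterion are hypothetical.
    """
    def imbalance(place):
        loads = {}
        for block, disk in place.items():
            loads[disk] = loads.get(disk, 0.0) + block_times[block]
        return max(loads.values()) - min(loads.values())

    best_pair, best_score = None, imbalance(placement)
    for a, b in exchangeable_pairs:
        if placement[a] == placement[b]:
            continue  # blocks on the same device: swapping changes nothing
        trial = dict(placement)
        trial[a], trial[b] = placement[b], placement[a]
        score = imbalance(trial)
        if score < best_score:  # keep only strict improvements
            best_pair, best_score = (a, b), score
    return best_pair, best_score


times = {"A": 10, "B": 6, "E": 2, "C": 4, "D": 1}
place = {"A": "d0", "B": "d0", "E": "d0", "C": "d1", "D": "d1"}
pairs = [(x, y) for x in ("A", "B", "E") for y in ("C", "D")]
print(best_exchange(times, place, pairs))
```

In this example device d0 starts with a total of 18 time units against 5 for d1; exchanging blocks A and C leaves 12 against 11, which is the smallest spread among the six candidate pairs.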
In accordance with another aspect of this invention, the activity on a plurality of physical disk storage devices is balanced. At least two of the physical disk storage devices are divided into a plurality of logical volumes and include movable read/write heads. Balancing is achieved by compiling a list of all pairs of exchangeable logical volumes on the physical disk storage devices, defining an analysis time interval comprising a plurality of subintervals and recording, as a function of time in each subinterval, disk accesses for the transfer of data and the amount of data transferred for each logical volume. Thereafter a disk utilization value is generated in response to the disk accesses and the amount of data transferred as recorded for each logical volume during each subinterval. A pair of logical volumes on different disk storage devices is then selected for an exchange based upon the disk utilization values. The selected pair is exchanged if it is determined that the exchange will improve the total operation of all the physical disk storage devices.
In accordance with still another aspect of this invention, activity on a plurality of physical disk storage devices is balanced. The physical disk storage devices store a plurality of logical volumes and are divided into zones of contiguous cylinders having different characteristic data transfer rates, wherein each logical volume can be stored in at least one zone. Each physical disk storage device also includes a storage medium that moves at a characteristic angular velocity and read/write heads that move between cylinders during seek operations. Balancing is achieved by compiling a list of all pairs of exchangeable logical volumes on the physical disk storage devices and defining an analysis time interval comprising a plurality of subintervals. The number and type of disk accesses and the amount of data transferred during each disk access are recorded, the types of disk accesses being taken from the group of random read, sequential read and disk write accesses. A weighted number of accesses is determined as the sum of the number of random reads, the number of sequential reads divided by four and the number of disk writes divided by two.
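The weighting just described reduces to a one-line calculation. The following sketch is illustrative only; the function and parameter names are hypothetical.

```python
def weighted_accesses(random_reads, sequential_reads, disk_writes):
    """Weighted access count: random reads count in full, sequential
    reads count one quarter each, and disk writes count one half each."""
    return random_reads + sequential_reads / 4 + disk_writes / 2


# 100 random reads, 40 sequential reads and 20 disk writes
print(weighted_accesses(100, 40, 20))  # 100 + 10 + 10 = 120.0
```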
For each logical volume and subinterval, a disk utilization time is generated as the sum of three values: a seek time value based upon the number of weighted accesses; a latency time value corresponding to the product of the total number of random reads, disk writes and sequential reads and the time required for the spindle to rotate one-half revolution; and a data transfer time. The data transfer time is obtained by determining the percentage of the logical volume that is located in each zone, apportioning the data transfers to the logical volume according to the determined percentages, generating a data transfer time for each zone according to the amount of data apportioned to the zone and the data transfer rate for that zone, and combining the data transfer times for each zone to obtain a data transfer time for the logical volume. The disk utilization times for each subinterval are then summed for each logical volume. A pair of logical volumes on different disk storage devices is then selected for an exchange based upon the relative disk utilization values for each logical volume. The exchange is made if it is determined that the exchange will improve the balance of operations on the physical disk storage devices.
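The disk utilization time for one logical volume in one subinterval may be sketched as follows. The description does not state how the seek time value is derived from the weighted accesses, so this sketch assumes, for illustration only, that it is the weighted access count multiplied by an average seek time. All function names, parameter names and example figures are hypothetical.

```python
def utilization_time(weighted, random_reads, sequential_reads, disk_writes,
                     bytes_transferred, zone_fractions, zone_rates,
                     avg_seek_s, rpm):
    """Seek + latency + transfer time, in seconds, for one logical
    volume during one subinterval.

    weighted: weighted access count for the subinterval
    zone_fractions: zone -> fraction of the logical volume in that zone
    zone_rates: zone -> data transfer rate in bytes per second
    avg_seek_s: assumed average seek time per weighted access
    """
    # Assumption: seek time scales linearly with the weighted accesses.
    seek = weighted * avg_seek_s
    # Latency: every access waits, on average, one-half revolution.
    half_revolution_s = 0.5 * 60.0 / rpm
    latency = (random_reads + sequential_reads + disk_writes) * half_revolution_s
    # Transfer: apportion the data by zone, divide by each zone's rate.
    transfer = sum(bytes_transferred * fraction / zone_rates[zone]
                   for zone, fraction in zone_fractions.items())
    return seek + latency + transfer


t = utilization_time(
    weighted=120, random_reads=100, sequential_reads=40, disk_writes=20,
    bytes_transferred=1_000_000,
    zone_fractions={"outer": 0.5, "inner": 0.5},
    zone_rates={"outer": 10_000_000, "inner": 5_000_000},
    avg_seek_s=0.01, rpm=6000)
print(round(t, 6))  # 1.2 s seek + 0.8 s latency + 0.15 s transfer = 2.15
```

Summing the values returned for every subinterval of the analysis time interval gives the per-volume total on which the exchange decision is based.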