1. Field of the Invention
This invention generally relates to the management of resources in a data processing system and more particularly to the management of a disk array storage device.
2. Description of Related Art
Many data processing systems now incorporate disk array storage devices. Each of these devices comprises a plurality of physical disks arranged into logical volumes. Data on these devices is accessible through various control input/output programs in response to commands, particularly reading and writing commands from one or more host processors. A Symmetrix 5500 series integrated cached disk array that is commercially available from the assignee of this invention is one example of such a disk array storage device. This particular array comprises multiple physical disk storage devices or drives with the capability of storing large amounts of data up to several terabytes or more. The management of such resources becomes very important because the ineffective utilization of the capabilities of such an array can affect overall data processing system performance significantly.
Generally a system administrator will, upon initialization of a direct access storage device, determine certain characteristics of the data sets to be stored. These characteristics include the data set size and volume names and, in some systems, the correspondence between a logical volume and a particular host processor in a multiple host processor system. The system administrator then uses this information to configure the disk array storage device by distributing the data sets across different physical devices, with the expectation of avoiding concurrent use of a physical device by multiple applications. Allocations based upon this limited information often are, or become, inappropriate. When this occurs, the original configuration can degrade overall data processing system performance dramatically.
One approach to overcoming this problem has been to propose an analysis of the operation of the disk array storage device prior to loading a particular data set and then determining an appropriate location for that data set. For example, U.S. Pat. No. 4,633,387 to Hartung et al. discloses load balancing in a multi-unit data processing system in which a host operates with multiple disk storage units through plural storage directors. In accordance with this approach a least busy storage director requests work to be done from a busier storage director. The busier storage director, as a work sending unit, supplies work to the work requesting, or least busy, storage director.
U.S. Pat. No. 5,239,649 to McBride et al. discloses a system for balancing the load on channel paths during long running applications. In accordance with the load balancing scheme, a selection of volumes is first made from those having affinity to the calling host. The load across the respective connected channel paths is also calculated. The calculation is weighted to account for different magnitudes of load resulting from different applications and to prefer the selection of volumes connected to the fewest unused channel paths. An optimal volume is selected as the next volume to be processed. The monitored load on each channel path is then updated to include the load associated with the newly selected volume, assuming that the load associated with processing the volume is distributed evenly across the respective connected channel paths. The selection of the following volume is then based on the updated load information. The method continues quickly during subsequent selection of the remaining volumes for processing.
In another approach, U.S. Pat. No. 3,702,006 to Page discloses load balancing in a data processing system capable of multi-tasking. A count is made of the number of times each I/O device is accessed by each task over a time interval between successive allocation routines. During each allocation, an analysis is made using the count and time interval to estimate the utilization of each device due to the current tasks. An estimate is also made with the anticipated utilization due to the task undergoing allocation. The estimated current and anticipated utilization are then considered and used as a basis for attempting to allocate the data sets to the least utilized I/O devices so as to achieve balanced I/O activity.
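The count-based allocation described in the preceding paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the input shapes and the fixed `service_time` per access are hypothetical and are not taken from the cited patent.

```python
def estimate_utilization(access_counts, interval_seconds, service_time=0.01):
    """Estimate the utilization of each device over one interval.

    access_counts maps device -> {task: number of accesses}.
    service_time (seconds per access) is an assumed constant.
    """
    return {
        device: sum(per_task.values()) * service_time / interval_seconds
        for device, per_task in access_counts.items()
    }


def allocate(data_sets, current_utilization, anticipated):
    """Place each data set on the device that is least utilized so far.

    anticipated maps data set name -> utilization it is expected to add.
    """
    load = dict(current_utilization)   # copy so the caller's dict is untouched
    placement = {}
    for name in data_sets:
        device = min(load, key=load.get)   # least utilized device right now
        placement[name] = device
        load[device] += anticipated[name]  # account for the new data set
    return placement


counts = {"d0": {"t1": 600, "t2": 300}, "d1": {"t1": 100}}
utilization = estimate_utilization(counts, interval_seconds=60)
placement = allocate(["a", "b"], utilization, {"a": 0.2, "b": 0.05})
```

Note that a placement made this way is fixed at allocation time; nothing in the sketch moves a data set afterwards.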
Each of the foregoing references discloses a system in which load balancing is achieved by selecting a specific location for an individual data set based upon express or inferred knowledge about the data set. An individual data set remains on a given physical disk unless manually reconfigured. None of these systems suggests the implementation of load balancing by the dynamic reallocation or configuration of existing data sets within the disk array storage system.
Another load balancing approach involves a division of reading operations among different physical disk drives that are redundant. Redundancy has become a major factor in the implementation of various storage systems that must also be considered in configuring a storage system. U.S. Pat. No. 5,819,310 granted Oct. 6, 1998 discloses such a redundant storage system with a disclosed disk array storage device that includes two device controllers and related disk drives for storing mirrored data. Each of the disk drives is divided into logical volumes. Each device controller can effect different reading processes and includes a correspondence table that establishes the reading process to be used in retrieving data from the corresponding disk drive. Each disk controller responds to a read command that identifies the logical volume by using the correspondence table to select the appropriate reading process and by transferring data from the appropriate physical storage volume containing the designated logical volume.
Consequently, when this mirroring system is implemented, reading operations involving a single logical volume do not necessarily occur from a single physical device. Rather read commands to different portions of a particular logical volume may be directed to any one of the mirrors for reading from preselected tracks in the logical volume. Allowing such operations can provide limited load balancing and can reduce seek times.
Other redundancy techniques and striping techniques can tend to spread the load over multiple physical drives by dividing a logical volume into sub-volumes that are stored on individual physical drives in blocks of contiguous storage locations. However, if the physical drives have multiple logical volumes, sub-volumes or other forms of blocks of contiguous storage locations, the net effect may not balance the load with respect to the totality of the physical disk drives. Thus, none of the foregoing references discloses or suggests a method for providing a dynamic reallocation of physical address space based upon actual usage.
Therefore it is an object of this invention to enable a dynamic reallocation of data in a plurality of physical disk storage devices to reduce any imbalance of load requirements on each physical disk storage device.
Another object of this invention is to determine the relative utilization of physical disk storage devices to reduce imbalances in the utilization.
Still another object of this invention is to provide a procedure for obtaining a value representing disk seek times in a physical disk storage device in an efficient manner that minimizes loads on resources.
In accordance with one aspect of this invention, total seek time required to access a physical disk storage device that stores data in a plurality of data blocks is obtained by collecting the number of disk accesses to each data block during a sample interval. This information is converted to disk seek time for the sample interval by generating a first sum of the accesses to all the data blocks, by generating a second sum that is the sum of all the first sums, by generating a third sum that is a sum of the squares of all the first sums and by combining the first, second and third sums to obtain the total interval required for all the disk accesses to all the data blocks in the physical disk storage device during the sample interval.
In accordance with another aspect of this invention, obtaining a total seek time required to access a physical disk storage device that stores data in a plurality of logical volumes includes, as initial steps, collecting the number of disk accesses to each logical volume during a sample interval and generating a weighted accesses value according to:
$$\text{WeightedAccesses} = N_{rm} + \frac{N_{wr}}{2} + \frac{N_{sr}}{4}$$
where $N_{rm}$, $N_{wr}$ and $N_{sr}$ represent the number of accesses of the read miss, write and sequential read types respectively. Then the method proceeds by producing, for the first sum, the values $A_i' = A_1 + A_2 + \ldots + A_i$ and $A_N' = A_1 + A_2 + \ldots + A_N$, by producing for the second sum the value:
$$\sum_{i=1}^{N} A_i'$$
and by producing for the third sum the value:
$$\sum_{i=1}^{N} (A_i')^2$$
wherein said step of combining the first, second and third sums produces a result according to:
$$\sum_{i=1}^{N} A_i' - \frac{\sum_{i=1}^{N} (A_i')^2}{A_N'}$$
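The computation described above can be sketched in Python as follows. The sketch weights the accesses to each logical volume as read misses plus half the writes plus a quarter of the sequential reads, forms the running totals that serve as the first sums, and combines the second and third sums as the second sum minus the third sum divided by the final running total. All function names are hypothetical. The `pairwise_seek_value` cross-check rests on an assumption that is not stated in this passage, namely that seek distance between volumes i and j is proportional to |i - j|; it is included only to show one reading under which the combined value makes sense.

```python
from itertools import accumulate


def weighted_accesses(read_misses, writes, sequential_reads):
    """Weight one volume's accesses: N_rm + N_wr/2 + N_sr/4."""
    return read_misses + writes / 2 + sequential_reads / 4


def seek_time_value(accesses):
    """Combine the three sums for volumes listed in physical order.

    accesses[i] is the weighted access count of one volume for one
    sample interval.
    """
    running = list(accumulate(accesses))   # first sums: A'_1 ... A'_N
    total = running[-1] if running else 0  # A'_N, all accesses on the disk
    if total == 0:
        return 0.0                         # idle disk: no seeks to account for
    second = sum(running)                  # sum of A'_i
    third = sum(a * a for a in running)    # sum of A'_i squared
    return second - third / total


def pairwise_seek_value(accesses):
    """Brute-force cross-check under the assumed |i - j| distance model."""
    total = sum(accesses)
    if total == 0:
        return 0.0
    n = len(accesses)
    pairs = sum((j - i) * accesses[i] * accesses[j]
                for i in range(n) for j in range(i + 1, n))
    return pairs / total


spread_out = seek_time_value([10, 0, 30])  # busy volumes at opposite ends
adjacent = seek_time_value([30, 10, 0])    # same counts on neighboring volumes
```

With these inputs the sketch yields 15 for the spread-out layout and 7.5 for the adjacent one. The running-total form needs only a single pass over the volumes, whereas the pairwise form visits every pair.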