Various types of storage devices are known today, such as disk storage (with mechanical read/write heads and magnetic storage elements) and semiconductor storage such as flash memory. In general, semiconductor storage devices have fast random read/write access, and disk storage devices have fast sequential read/write access. Moreover, different disk storage devices have different rotational speeds, which result in different access times. For example, a 15,000 RPM disk storage device has faster access than a 10,000 RPM disk storage device. Likewise, different types of semiconductor storage devices exhibit different performance. For example, a NAND flash memory device writes faster but reads slower than a NOR flash memory-based device. Generally, the more expensive the storage device, the faster it can read and write data, and vice versa.
Previously known data life cycle management programs managed and migrated data (in data units or granularities of files, storage extents, or storage volumes) to different types of storage devices based on the recent frequency of access of the data unit, the age of the data unit, and/or changing requirements of the data owner for the data units, among other factors. Generally, newer data is regarded as more important than older data because newer data typically experiences a greater frequency of access. Consequently, newer data is stored in more expensive, high-performance storage devices. Conversely, older data typically experiences a lower frequency of access and so is stored in less expensive, slower-access storage devices. In a previously known data life cycle management program, a data owner specified rules that determine when a data unit needs to be moved. For example, a rule may move a data unit from expensive flash memory to less expensive disk storage when the data unit's frequency of access drops below a specified threshold. A known performance monitoring system (such as the previously known IBM Tivoli Storage Productivity Center program) tracks the frequency of access of a given data unit and records the frequency of access as “metadata”. The IBM Tivoli Storage Productivity Center program records the number of I/O requests a data unit receives at regular time intervals. It also tracks the average data transfer size of each I/O request, whether there was a “cache hit” for the I/O request, whether the requested data for both reads and writes was sequential to a prior or subsequent I/O request, whether the I/O request resulted in an out-paging/de-staging of data into the cache, and the average cache hold time.
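The owner-specified demotion rule described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular life cycle management product: the tier names, the `DataUnit` type, and the choice of "average I/O count over the last few monitoring intervals" as the measure of recent access frequency are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tier names; the text only distinguishes expensive flash and cheaper disk.
FLASH = "flash"
DISK = "disk"


@dataclass
class DataUnit:
    name: str
    tier: str
    # I/O requests counted in each regular monitoring interval, oldest first.
    io_counts: list[int]


def access_frequency(unit: DataUnit, window: int) -> float:
    """Average I/O requests per interval over the most recent `window` intervals."""
    recent = unit.io_counts[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def apply_demotion_rule(unit: DataUnit, threshold: float, window: int = 3) -> str:
    """Move a unit from flash to disk when its recent access frequency
    drops below `threshold`. Returns the tier the unit should occupy."""
    if unit.tier == FLASH and access_frequency(unit, window) < threshold:
        return DISK
    return unit.tier
```

For example, a flash-resident unit with interval counts `[900, 120, 40, 30, 20]` has a recent average of 30 over the last three intervals, so with a threshold of 50 the rule returns `"disk"`. Averaging over a short window, rather than using only the latest interval, is one way to avoid demoting data because of a single quiet interval.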
It was previously known to transfer data from a source storage device to another storage device for reasons other than life cycle management. For example, one storage device may be close to capacity, and data needs to be transferred to another storage device with greater available capacity (and the same or different access speed) to accommodate expected growth of the data. As another example, an application which cannot readily access its data in its current storage device needs faster access to the data, so the data is transferred to another storage device (of the same or different access speed) that is readily accessible by the application. It was also known to transfer data from a source storage device to another storage device for load balancing and storage consolidation.
In any of the foregoing scenarios where there is a need to transfer data for any reason, it was known to either execute the transfer immediately or schedule the data transfer to execute during off peak hours when there is more network bandwidth available and when storage components are less utilized. It was previously known to monitor the status and availability of the network bandwidth and resources within the storage components using an existing monitoring program that tracks current and past performance data.
It was also known to adjust the data transfer rate if a bandwidth utilization threshold or response time threshold has been exceeded.
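Threshold-based rate adjustment of this kind can be sketched as a simple check run at each monitoring interval. The parameter names, the default thresholds, and the halve-the-rate backoff policy below are illustrative assumptions; the text states only that the rate is adjusted when either threshold is exceeded.

```python
def adjust_transfer_rate(
    current_rate_mbps: float,
    bandwidth_utilization: float,
    response_time_ms: float,
    utilization_threshold: float = 0.8,       # hypothetical: 80% of link bandwidth
    response_time_threshold_ms: float = 20.0,  # hypothetical response-time ceiling
    backoff: float = 0.5,                      # hypothetical: halve the rate
    min_rate_mbps: float = 1.0,                # never throttle below this floor
) -> float:
    """Reduce the transfer rate if either monitored threshold is exceeded;
    otherwise keep the current rate."""
    if (bandwidth_utilization > utilization_threshold
            or response_time_ms > response_time_threshold_ms):
        return max(min_rate_mbps, current_rate_mbps * backoff)
    return current_rate_mbps
```

With these defaults, a 100 Mbps transfer observed at 90% bandwidth utilization is reduced to 50 Mbps, while the same transfer at 50% utilization and a 5 ms response time is left unchanged.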
A previously known Disk Magic™ modeling program from IntelliMagic Inc. estimates the utilizations of the individual I/O components involved in a data transfer. The I/O components involved in the data transfer are (a) storage subsystems which provide LUNs to a virtual storage controller program, (b) storage pools which are groups of LUNs, and (c) the virtual storage controller which groups LUNs having similar characteristics into storage pools and virtualizes the storage locations to route I/O requests correctly regardless of where the data physically resides. A “LUN”, or logical unit number, is a single disk or set of disks viewed as a single storage unit. Components that can be over-utilized, and thereby limit I/O traffic, include the storage device adapters, host adapters and storage subsystem processors. I/O is also limited by the capacity of hard disks to perform reads and writes (based on read/write head speed, read/write time, etc.). The previously known Disk Magic modeling program was used primarily in a planning operation for data placement. To use the Disk Magic modeling program, an administrator entered the projected (a) number of I/O requests per second, (b) average transfer size per I/O request, (c) total and sequential read/write percentages, (d) average cache-hold time, (e) average read cache-hit percentage, and (f) percentage of de-stage operations for each block of data that was to be placed in the storage system. These factors all affect the utilization of components within the storage system. The Disk Magic modeling program then calculated the projected utilizations of the storage components in the various available storage arrangements, in which the data is stored in different storage pools, and identified the projected utilization for the optimum storage arrangement.
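The planning workflow above (enter projected workload factors, compute projected component utilizations for each candidate arrangement, pick the best) can be sketched as follows. Because the Disk Magic algorithm itself is proprietary, the utilization model is passed in as a function, and `toy_model` is a purely hypothetical stand-in with invented capacities. Treating "optimum" as "lowest peak component utilization" is also an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class WorkloadProfile:
    """Projected inputs (a)-(f) for one block of data to be placed."""
    io_per_second: float     # (a) I/O requests per second
    avg_transfer_kb: float   # (b) average transfer size per I/O request
    read_pct: float          # (c) total read percentage
    sequential_pct: float    # (c) sequential read/write percentage
    avg_cache_hold_s: float  # (d) average cache-hold time
    read_hit_pct: float      # (e) average read cache-hit percentage
    destage_pct: float       # (f) percentage of de-stage operations


# A model maps (arrangement, workload) -> {component name: projected utilization 0..1}.
UtilizationModel = Callable[[str, WorkloadProfile], Mapping[str, float]]


def best_arrangement(
    arrangements: Iterable[str],
    workload: WorkloadProfile,
    model: UtilizationModel,
) -> tuple[str, float]:
    """Return the arrangement whose busiest component has the lowest
    projected utilization, together with that peak utilization."""
    peak = {a: max(model(a, workload).values()) for a in arrangements}
    best = min(peak, key=peak.__getitem__)
    return best, peak[best]


def toy_model(arrangement: str, w: WorkloadProfile) -> Mapping[str, float]:
    """Hypothetical stand-in model: utilization grows linearly with I/O rate."""
    # Invented host-adapter capacities (I/O per second) for each candidate pool.
    adapter_capacity = {"pool_A": 2000.0, "pool_B": 4000.0}[arrangement]
    return {
        "host_adapter": w.io_per_second / adapter_capacity,
        "disks": w.io_per_second / (1.5 * adapter_capacity),
    }
```

For a workload of 1000 I/O requests per second, the toy model projects a peak utilization of 0.5 for `pool_A` and 0.25 for `pool_B`, so `best_arrangement(["pool_A", "pool_B"], w, toy_model)` selects `pool_B`. Ranking by the busiest component reflects the observation in the text that a single over-utilized component limits I/O traffic.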
The algorithm used by the Disk Magic modeling program to determine the utilizations of the storage components based on the foregoing factors is proprietary to IntelliMagic Inc. and unknown to the present inventors. However, the following is another algorithm to determine utilizations of storage components based on similar factors:
For an available disk storage system, an administrator obtains the following values from the disk storage system manufacturer:
Sector size (e.g. 256 bytes)
Cylinders (e.g. 1500)
Tracks per cylinder (e.g. 8)
Sectors per track (e.g. 113)
Track skew (e.g. 34 sectors)
Cylinder skew (e.g. 43 sectors)
Revolution speed (e.g. 7200 RPM)
Read fence size (e.g. 8 KB)
Sparing type (e.g. sector or track)
Cache size (e.g. 32 KB)
These parameters are used when calculating constants used in the equations that follow.
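As an illustration of the kind of constants these parameters yield, the sketch below derives capacity, revolution time, average rotational latency, and the time equivalents of the track and cylinder skews from the example values. Which derived constants a given model actually needs is not stated in the text, so this selection is an assumption; read fence size, sparing type, and cache size are omitted because their use depends on model details not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskGeometry:
    # Defaults are the example values listed above.
    sector_bytes: int = 256
    cylinders: int = 1500
    tracks_per_cylinder: int = 8
    sectors_per_track: int = 113
    track_skew_sectors: int = 34
    cylinder_skew_sectors: int = 43
    rpm: int = 7200

    @property
    def capacity_bytes(self) -> int:
        return (self.sector_bytes * self.cylinders
                * self.tracks_per_cylinder * self.sectors_per_track)

    @property
    def revolution_ms(self) -> float:
        # One revolution takes 60,000 ms / RPM.
        return 60_000.0 / self.rpm

    @property
    def avg_rotational_latency_ms(self) -> float:
        # On average the target sector is half a revolution away.
        return self.revolution_ms / 2

    @property
    def sector_time_ms(self) -> float:
        # Time for one sector to pass under the head.
        return self.revolution_ms / self.sectors_per_track

    @property
    def track_skew_ms(self) -> float:
        # Track skew expressed as time: the head-switch budget it allows.
        return self.track_skew_sectors * self.sector_time_ms

    @property
    def cylinder_skew_ms(self) -> float:
        # Cylinder skew expressed as time: the single-cylinder seek budget it allows.
        return self.cylinder_skew_sectors * self.sector_time_ms
```

With the example values, `DiskGeometry().capacity_bytes` is 347,136,000 bytes (1,356,000 sectors of 256 bytes), one revolution at 7200 RPM takes about 8.33 ms, and the average rotational latency is about 4.17 ms.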
A seek time (needed for calculating the response time of the storage subsystem) is estimated in one of the following ways:
a. A constant value
b. Based on a linear function, where x is the average seek distance in cylinders
c. Based on a curve (e.g. Lee's seek equation), where the coefficients a, b, and c are derived from the minimum, average, and maximum seek times minSeek, avgSeek, and maxSeek as follows:

$$\mathrm{seekTime}(x) = \begin{cases} 0, & \text{if } x = 0 \\ a\sqrt{x-1} + b\,(x-1) + c, & \text{if } x > 0 \end{cases}$$

where x is the seek distance in cylinders, numCyl is the number of cylinders, and

$$a = \frac{-10\,\mathrm{minSeek} + 15\,\mathrm{avgSeek} - 5\,\mathrm{maxSeek}}{3\sqrt{\mathrm{numCyl}}}, \qquad b = \frac{7\,\mathrm{minSeek} - 15\,\mathrm{avgSeek} + 8\,\mathrm{maxSeek}}{3\,\mathrm{numCyl}}, \qquad c = \mathrm{minSeek}.$$

d. Based on an HPL seek equation, where the seek time depends on the values V1 through V6 as follows:

$$\mathrm{seekTime}(\mathit{dist}) = \begin{cases} V_6, & \text{if } \mathit{dist} = 1 \\ V_2 + V_3\sqrt{\mathit{dist}}, & \text{if } \mathit{dist} < V_1 \\ V_4 + V_5 \cdot \mathit{dist}, & \text{if } \mathit{dist} \ge V_1 \end{cases}$$

where dist is the seek distance in cylinders. If V6 = −1, single-cylinder seeks are computed using the second equation. V1 is specified in cylinders, and V2 through V6 are specified in milliseconds. V1 must be a non-negative integer, V2 through V5 must be non-negative floats, and V6 must be either a non-negative float or −1.

An example (HP C2200A disk):
Seek distance < 616 cylinders: seek time in ms = 3.45 + 0.597√dist
Seek distance ≥ 616 cylinders: seek time in ms = 10.8 + 0.012·dist
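Options c and d can be sketched directly from the equations above. The function names are hypothetical. Two behaviours are assumptions not stated in the text: `hpl_seek_ms` returns 0 for a zero-distance seek, and both functions reject negative distances. In the HP C2200A usage below, V6 = −1 is chosen only for illustration because the example gives no single-cylinder value.

```python
import math


def lee_seek_ms(x: int, num_cyl: int, min_seek: float,
                avg_seek: float, max_seek: float) -> float:
    """Lee's seek equation: 0 if x == 0, else a*sqrt(x-1) + b*(x-1) + c."""
    if x < 0:
        raise ValueError("seek distance must be non-negative")
    if x == 0:
        return 0.0
    a = (-10 * min_seek + 15 * avg_seek - 5 * max_seek) / (3 * math.sqrt(num_cyl))
    b = (7 * min_seek - 15 * avg_seek + 8 * max_seek) / (3 * num_cyl)
    c = min_seek
    return a * math.sqrt(x - 1) + b * (x - 1) + c


def hpl_seek_ms(dist: int, v1: int, v2: float, v3: float,
                v4: float, v5: float, v6: float) -> float:
    """HPL seek equation; V1 in cylinders, V2..V6 in milliseconds."""
    if dist < 0:
        raise ValueError("seek distance must be non-negative")
    if dist == 0:
        return 0.0  # assumption: no head movement, no seek time
    if dist == 1:
        # V6 == -1 means single-cylinder seeks use the second (square-root) equation.
        return v6 if v6 != -1 else v2 + v3 * math.sqrt(dist)
    if dist < v1:
        return v2 + v3 * math.sqrt(dist)
    return v4 + v5 * dist


# HP C2200A example parameters from the text (V6 = -1 chosen for illustration).
HP_C2200A = dict(v1=616, v2=3.45, v3=0.597, v4=10.8, v5=0.012, v6=-1)
```

For the HP C2200A parameters, a 100-cylinder seek takes 3.45 + 0.597 × 10 = 9.42 ms, and a 1000-cylinder seek takes 10.8 + 0.012 × 1000 = 22.8 ms. In Lee's equation, c = minSeek makes a one-cylinder seek cost exactly minSeek, and the coefficients are constructed so that a seek of numCyl + 1 cylinders (x − 1 = numCyl) costs exactly maxSeek.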
The foregoing model to determine utilizations of storage components also considers the characteristics of the cache when calculating the seek time, because cache hits eliminate the seek time of an I/O request. The percentage of I/O requests that incur no seek time is given by the cache-hit percentage (passed in as input). To calculate the response time of the storage subsystem, the seek time is added to the average disk controller overhead, which is a previously calculated constant (e.g. 1.1 ms per read request, 5.1 ms per write request). The disk controller is responsible for processing I/O requests, so its utilization value can be calculated using a linear function, where x is the number of I/O requests received. Similarly, the utilization value of the processor can be calculated using a linear function of the number of I/O requests received, because the disk controller uses the processor when processing I/O requests. The host adapter sends I/O requests and processes responses, so it can also be modeled using a linear function of the number of I/O requests sent. The hard disk drive utilization value can be set to the number of I/O requests. Utilization percentages are then calculated by dividing these estimated utilization values by their respective pre-defined capacities.
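The response-time and utilization steps above can be sketched as follows. Three choices are assumptions not found in the text: the read/write controller overheads are blended by a read fraction, every component is driven by the same request count, and the slopes, intercepts, and capacities in `EXAMPLE_COMPONENTS` are invented for illustration. The hard disk drive is modeled as slope 1 and intercept 0, matching the statement that its utilization value equals the number of I/O requests.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LinearUtilization:
    """Utilization value = slope * io_requests + intercept; dividing it by
    `capacity` gives the utilization percentage."""
    slope: float
    intercept: float
    capacity: float

    def percent(self, io_requests: float) -> float:
        value = self.slope * io_requests + self.intercept
        return 100.0 * value / self.capacity


def mean_response_ms(seek_ms: float, cache_hit_pct: float, read_fraction: float,
                     read_overhead_ms: float = 1.1,
                     write_overhead_ms: float = 5.1) -> float:
    """Cache hits skip the seek; every request pays controller overhead."""
    expected_seek = (1.0 - cache_hit_pct / 100.0) * seek_ms
    overhead = (read_fraction * read_overhead_ms
                + (1.0 - read_fraction) * write_overhead_ms)
    return expected_seek + overhead


def utilization_percentages(io_requests: float,
                            components: Mapping[str, LinearUtilization]) -> dict[str, float]:
    return {name: model.percent(io_requests) for name, model in components.items()}


# Hypothetical coefficients and capacities for illustration only.
EXAMPLE_COMPONENTS = {
    "disk_controller": LinearUtilization(slope=0.5, intercept=10.0, capacity=1000.0),
    "processor": LinearUtilization(slope=0.2, intercept=5.0, capacity=500.0),
    "host_adapter": LinearUtilization(slope=0.4, intercept=0.0, capacity=800.0),
    "hard_disk": LinearUtilization(slope=1.0, intercept=0.0, capacity=2000.0),
}
```

For example, a 10 ms seek with a 40% cache-hit rate and 75% reads gives an expected seek of 6.0 ms plus a blended overhead of 2.1 ms, for a mean response time of about 8.1 ms. At 1000 I/O requests, the hypothetical components come out at 51% (controller), 41% (processor), and 50% (host adapter and hard disk).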
An object of the present invention is to determine an optimum time to conduct a data transfer after a decision is made to transfer the data (for any reason).