The present invention relates generally to computer systems, and more particularly to efficient management of data storage systems and computer system performance. The present invention relates specifically to a system and method for determining whether the activity graphs of logical volumes are correlated.
Data storage for computer systems may be provided using a variety of device types and technologies. Mass storage of computer readable data is often provided using electromagnetic storage disks, sometimes referred to as "hard disks", or "disks".
A device which controls and/or operates one or more disks is sometimes referred to as a "disk adapter." A disk adapter typically controls a set of stacked storage disks, each of which has data recorded electromagnetically in "blocks" of data which are organized onto concentric circles or "tracks." A disk "head" writes or reads the data stored on the tracks.
It is often desirable to have some form of virtual disk subsystem in order to provide flexibility in the allocation and management of disk storage. A virtual disk subsystem may be used to define what are referred to as "logical volumes", using storage capacity from one or more disks. For example, one or more disks may be organized into a group to form a storage pool from which disk space may be allocated to form the logical volumes. Each logical volume may consist of an arbitrary-size storage space defined within the related group of storage devices. Each logical volume can then be used as a separate storage space for a file system or for raw data storage by various applications.
A problem arises with respect to mapping logical volumes onto specific physical devices, due to the fact that system performance may suffer when two or more logical volumes with correlated activity graphs are co-located on one or more physical devices. In such circumstances, shared resources of the physical device or devices storing the correlated logical volumes may impose a bottleneck on system performance during periods of coincidental high activity.
Thus, it would be desirable to have a system which determines whether the activity graphs of two logical volumes are correlated, so that appropriate action may be taken to prevent system performance degradation.
A known mathematical technique for determining the correlation of two data sequences is the "correlation coefficient" method. The correlation coefficient technique is a mathematical tool that may be employed to find linear relationships between two variables, X and Y. Correlation coefficient analysis typically results in a value from −1 to +1 describing the correlation of two data sequences over the length of interest. Therefore, correlation coefficient values close to 1 indicate a positive linear association between the variables, in that as one of the variables increases, so does the other. Conversely, correlation coefficient values close to −1 indicate a negative linear association between the variables, in that as one of the variables increases, the other decreases. Correlation coefficient values close to zero indicate a lack of linear association between the variables. As it is generally known, the correlation coefficient equation may be given as follows:
$$\text{CORRELATION COEFFICIENT} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$$
where $X_i$ and $Y_i$ in this case are activity levels at the $i$th time samples, and $\bar{X}$ and $\bar{Y}$ are the means of X and Y, respectively.
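For purposes of illustration only, the correlation coefficient described above may be computed as in the following minimal Python sketch. The function name `correlation_coefficient` and the choice to return 0.0 for a graph with no variation are assumptions made for this example and are not part of the disclosure.

```python
import math


def correlation_coefficient(x, y):
    """Standard correlation coefficient of two equal-length activity sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of the deviations from the means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)
    if sum_sq_x == 0 or sum_sq_y == 0:
        # Assumption for this sketch: a perfectly flat graph has no linear
        # association with anything, so report 0.0 instead of dividing by zero.
        return 0.0
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)
```

For example, `correlation_coefficient([10, 20, 10], [1000, 2000, 1000])` yields a value of 1 (up to floating-point rounding), even though the two sequences differ in magnitude by a factor of 100, because the coefficient reflects only the linear relationship and not the absolute levels.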
The correlation coefficient technique has certain drawbacks when applied to the problem of determining whether the activity graphs of logical volumes are correlated. These drawbacks stem from the fact that the correlation coefficient approach assumes that a linear correlation exists between the two sequences being compared, which is frequently not true for activity graphs of logical volumes. Moreover, the correlation coefficient approach does not take into account the absolute activity levels of the logical volumes being compared.
For the reasons stated above, it would be desirable to have a system which determines whether the activity graphs of two logical volumes are correlated, at least in part based on their absolute activity levels, so as to detect whether the logical volumes have significant periods of coincidental high activity levels. Further, it would be desirable to have a system which determines whether the activity graphs are correlated for two logical volumes, at least partly based on the magnitude of changes in the respective activity levels, so as to detect whether the logical volumes have coincidental activity level increases of a large magnitude.
In accordance with principles of the invention, a system and method are disclosed for determining whether activity graphs for logical volumes of data storage are correlated. Activity graphs are first generated over a test period for each of the logical volumes to be compared. Each activity graph consists of a series of activity levels measured at sampling points within the test period. An activity level for a logical volume may, for example, reflect the number of read and/or write operations involving data stored within that logical volume.
In an illustrative embodiment of the disclosed system, an area ratio criterion is employed to determine whether logical volume activity graphs are correlated. The disclosed system employs the area ratio criterion to determine whether activity graphs for a selected pair of logical volumes are correlated by determining whether an area ratio of the activity graphs is at least as great as a predetermined threshold. The area ratio of the activity graphs is a ratio between a minimum area and a maximum area described by the two activity graphs. The minimum area is defined as the area lying below both of the activity graphs. The maximum area is defined as the area lying below at least one of the activity graphs. The predetermined threshold area ratio may, for example, be determined empirically for a given application, such that pairs of activity graphs which have an area ratio at least as large as the threshold area ratio value should be considered correlated.
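By way of example, the area ratio described above may be approximated as in the following Python sketch. The sketch assumes that both activity graphs are sampled at the same uniformly spaced points, so that each area can be approximated by a sum over samples. The function names, the handling of two completely idle volumes, and the threshold value used in the usage note are hypothetical.

```python
def area_ratio(graph_a, graph_b):
    """Ratio of the area under both graphs to the area under at least one."""
    if len(graph_a) != len(graph_b):
        raise ValueError("activity graphs must have the same number of samples")
    # Minimum area: at each sample, the portion lying below both graphs.
    min_area = sum(min(a, b) for a, b in zip(graph_a, graph_b))
    # Maximum area: at each sample, the portion lying below either graph.
    max_area = sum(max(a, b) for a, b in zip(graph_a, graph_b))
    if max_area == 0:
        # Assumption for this sketch: two idle volumes are not reported
        # as correlated.
        return 0.0
    return min_area / max_area


def area_ratio_correlated(graph_a, graph_b, threshold):
    """True if the area ratio is at least as great as the given threshold."""
    return area_ratio(graph_a, graph_b) >= threshold
```

For instance, for the sample graphs `[10, 50, 20]` and `[20, 40, 20]` the minimum area is 70 and the maximum area is 90, giving an area ratio of about 0.78, which would satisfy a hypothetical threshold of 0.7.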
In another illustrative embodiment, other correlation criteria are employed to determine if the activity graphs of two logical volumes are correlated. The set of correlation criteria which may be employed further includes a peak ratio criterion. The peak ratio criterion indicates that one activity graph is correlated to another activity graph if a peak ratio described by the two activity graphs is at least as large as a predetermined peak ratio value. The peak ratio for the two activity graphs being compared is a ratio between the number of coincidental peaks in the two graphs, and the difference between the total number of peaks in the two graphs and the number of coincidental peaks. The number of coincidental peaks may, for example, be defined as the number of peaks in one of the activity graphs which occur at the same time as peaks in the other activity graph. Further, for purposes of illustration, a peak within an activity graph may be defined as an activity level sample which is above the average activity level for the activity graph, plus a predetermined offset value. Alternatively, or in addition, a peak within an activity graph may be defined as an activity level at least as large as the product of the average activity level for the activity graph multiplied by a predetermined multiplier value. Appropriate predetermined offset and/or multiplier values may be determined empirically for specific applications.
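The peak ratio described above may be sketched in Python as follows. This example assumes that the "total number of peaks in the two graphs" is the sum of the peak counts of the individual graphs, and that peaks are coincidental when they fall on the same sample index. The function names, the parameter defaults, and the result returned when neither graph has a peak are hypothetical.

```python
def find_peaks(levels, offset=0.0, multiplier=None):
    """Return the set of sample indices regarded as peaks.

    With a multiplier, a peak is a sample at least as large as
    average * multiplier; otherwise it is a sample above average + offset.
    """
    if not levels:
        raise ValueError("activity graph must contain at least one sample")
    average = sum(levels) / len(levels)
    if multiplier is not None:
        return {i for i, v in enumerate(levels) if v >= average * multiplier}
    return {i for i, v in enumerate(levels) if v > average + offset}


def peak_ratio(graph_a, graph_b, offset=0.0, multiplier=None):
    """Coincidental peaks divided by (total peaks minus coincidental peaks)."""
    if len(graph_a) != len(graph_b):
        raise ValueError("activity graphs must have the same number of samples")
    peaks_a = find_peaks(graph_a, offset, multiplier)
    peaks_b = find_peaks(graph_b, offset, multiplier)
    coincidental = len(peaks_a & peaks_b)
    denominator = len(peaks_a) + len(peaks_b) - coincidental
    if denominator == 0:
        # Assumption for this sketch: no peaks in either graph means the
        # criterion is not satisfied.
        return 0.0
    return coincidental / denominator
```

Under the stated assumption, the denominator equals the number of distinct sample times at which either graph has a peak, so the ratio ranges from 0 (no coincidental peaks) to 1 (every peak is coincidental).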
The correlation criteria employed by the disclosed system may further include a sharp peak criterion. The sharp peak criterion indicates that one activity graph is correlated to another in the event that the two activity graphs have a coincidental sharp peak. A coincidental sharp peak may only occur in the event that the highest activity levels of both logical volumes, during the relevant test period, occur at the same time. Additionally, a coincidental sharp peak may only occur in the event that these highest activity levels are also at least as great as the product of a next highest activity level within each respective activity graph and a predetermined multiplier. Furthermore, a coincidental sharp peak may only occur in the event that a value of a correlation coefficient for the two activity graphs is greater than a predetermined threshold correlation coefficient value.
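The three conditions for a coincidental sharp peak may be checked as in the following Python sketch. The function names, the default multiplier of 2.0, and the default threshold correlation coefficient of 0.5 are hypothetical values chosen for illustration; appropriate values would be selected for a specific application. The "next highest activity level" is taken here to be the second-largest sample in the graph, which is an assumption of this example.

```python
import math


def _correlation_coefficient(x, y):
    """Standard correlation coefficient; returns 0.0 for a flat graph."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)
    if sum_sq_x == 0 or sum_sq_y == 0:
        return 0.0
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)


def has_coincidental_sharp_peak(graph_a, graph_b,
                                multiplier=2.0, min_coefficient=0.5):
    """Check the three sharp peak conditions for a pair of activity graphs."""
    if len(graph_a) != len(graph_b) or len(graph_a) < 2:
        raise ValueError("graphs must have equal length and at least 2 samples")
    # Condition 1: the highest activity levels occur at the same sample time.
    index_a = max(range(len(graph_a)), key=graph_a.__getitem__)
    index_b = max(range(len(graph_b)), key=graph_b.__getitem__)
    if index_a != index_b:
        return False
    # Condition 2: in each graph, the highest level is at least the next
    # highest level multiplied by the predetermined multiplier.
    for levels in (graph_a, graph_b):
        highest, next_highest = sorted(levels, reverse=True)[:2]
        if highest < next_highest * multiplier:
            return False
    # Condition 3: the correlation coefficient exceeds the threshold value.
    return _correlation_coefficient(graph_a, graph_b) > min_coefficient
```

For example, the graphs `[5, 6, 40, 5]` and `[3, 2, 30, 4]` both reach their highest level at the third sample, each maximum is more than twice the next highest level, and the graphs are strongly linearly related, so the sketch reports a coincidental sharp peak.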
A set of correlation criteria employed by the disclosed system may further include a correlation coefficient criterion. Such a correlation coefficient criterion employs the standard correlation coefficient technique to determine a correlation coefficient for the two graphs.
In the event that the disclosed system determines that two activity graphs are correlated by some correlation criterion, an indication of that determination may be generated, so that corrective action may be performed. Such an indication may take the form of a message sent to, or posted for, a resource/configuration manager or entity, which may then take any form of corrective action that may be appropriate.
In this way, the disclosed system may advantageously determine whether the activity graphs of logical volumes are correlated, at least in part based on absolute activity levels, and may thereby detect whether the logical volumes have significant periods of coincidental high activity levels. Additionally, the disclosed system may determine whether the activity graphs of two logical volumes are correlated partly based on the magnitude of change in their respective activity levels over time, so as to detect whether the logical volumes have coincidental activity level increases of a large magnitude.
Different combinations of all or some of the disclosed correlation criteria may be used to determine whether activity graphs of logical volumes are correlated. Such combinations of the disclosed correlation criteria may further be used to determine the level of correlation between activity graphs of logical volumes. In this way, some or all of the disclosed correlation criteria may be used to determine how "strong" a correlation exists between the activity graphs of logical volumes.
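One possible way of combining the criteria into a level of correlation is sketched below in Python. The disclosure does not prescribe a particular combination. The scoring rule shown here (the fraction of evaluated criteria that are satisfied), the function name, and the criterion labels are purely hypothetical and are offered only to illustrate the idea.

```python
def correlation_strength(criteria_results):
    """Fraction of evaluated criteria that indicate a correlation.

    criteria_results maps a criterion label to True (satisfied) or False.
    The equal weighting of all criteria is an assumption of this sketch.
    """
    if not criteria_results:
        raise ValueError("at least one criterion result is required")
    satisfied = sum(1 for met in criteria_results.values() if met)
    return satisfied / len(criteria_results)


# Hypothetical results for one pair of logical volumes.
results = {
    "area_ratio": True,
    "peak_ratio": True,
    "sharp_peak": False,
    "correlation_coefficient": True,
}
strength = correlation_strength(results)
```

With the hypothetical results above, three of the four criteria are satisfied, so the sketch reports a strength of 0.75. Other combinations, such as weighting particular criteria more heavily, could equally be used.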