1. Field of the Invention
The present invention relates generally to computer storage systems and pertains more particularly to an apparatus for and a method of non-linear constraint optimization in a storage system configuration.
2. Discussion of the Prior Art
Storage systems for computer networks can contain a large number of storage devices having a wide variety of characteristics and a nearly arbitrary interconnection scheme. The configuration and management of the storage system is central to the functioning of the computer network. In very large networks, the inherent difficulties in configuring and managing the storage system are compounded by the sheer scale of the network. The situation has reached the point where the time needed to configure a new storage system can be several months and the cost of managing the storage system can be several times the purchase cost.
Large computer networks can contain a large number of host computers connected to the storage system. In such networks, many application programs may be running concurrently on one or more of the host computers and each application program has a certain level of service that it requires from the storage system in order to run well. The storage allocation problem is to optimally lay out data accessed by the application programs on the optimal set of storage devices in the storage system. A solution to the problem is referred to as an assignment plan.
The optimality of the resulting overall assignment plan is evaluated based on an objective function. For example, an objective function may be to minimize the cost of the storage system. An objective function may be to maximize the performance of the storage system. Other objective functions include balancing the load, maximizing the availability, and minimizing the physical footprint. One of ordinary skill in the art will realize that there are many other possible objective functions and that, often, multiple and competing objective functions will have to be balanced.
A specific piece of data is referred to as a workload unit. Associated with every workload unit is a set of standards. Standards include both the workload unit characteristics and the application program access characteristics. For example, a standard may be the size of the workload unit. A standard may be the access speed or the access frequency of the application program. Other standards include request size, request rate, run count, phasing behavior, on time, off time, and maximum amount of data loss. One of ordinary skill in the art will realize that there are many other possible standards.
Likewise, associated with every storage device is a set of characteristics. Characteristics include both performance measures and physical descriptions. For example, a characteristic may be the quantity of storage available on or the access speed of the storage device. A characteristic may be the size or the weight of the storage device. Other characteristics include position time, transfer rate, cost, outage frequency, outage length, and data loss rate. One of ordinary skill in the art will realize that there are many other possible characteristics.
For the storage allocation problem, the questions of whether the various workload unit standards are compatible with the various storage device characteristics serve as constraints on the solution. Often, the constraints are linear inequalities, that is, expressions of the form

$$\sum_{w_i \in W} a_i w_i < f(d) \qquad \text{Eq. (1)}$$

where the values of $a_i$ and $f(d)$ are constants for a given storage device, for example, quantity of storage, outage frequency, and data loss rate. However, some of the constraints are non-linear, for example, access speed and utilization. Further, some of the constraints are not inequalities, for example, existence of proper interconnect cabling. This mixture of constraints further complicates the matter because, if one could assume that all of the constraints were of one type, then one could tailor the solution to that type of constraint. In particular, a number of good solutions would be available if the constraints were all linear. Unfortunately, that is not necessarily the case here.
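As an illustration only, the two kinds of constraint described above might be checked as follows. This is a minimal sketch, not a method disclosed in the text: the function names `linear_constraint_met` and `cabling_exists`, the host and device names, and the numeric values are all hypothetical. The first function tests a linear inequality of the form of Eq. (1); the second tests a constraint that is not an inequality at all.

```python
from typing import Sequence, Set, Tuple


def linear_constraint_met(coefficients: Sequence[float],
                          workload_values: Sequence[float],
                          device_limit: float) -> bool:
    """Check a linear inequality: sum(a_i * w_i) < f(d)."""
    if len(coefficients) != len(workload_values):
        raise ValueError("coefficients and workload_values must match in length")
    total = sum(a * w for a, w in zip(coefficients, workload_values))
    return total < device_limit


def cabling_exists(host: str, device: str,
                   links: Set[Tuple[str, str]]) -> bool:
    """Check a constraint that is not an inequality: is there a link at all?"""
    return (host, device) in links


# Hypothetical data: two workload units of 40 and 50 units of storage,
# a device limit of 100, and a single host-to-device link.
sizes = [40, 50]
fits = linear_constraint_met([1, 1], sizes, 100)

links = {("host-a", "disk-1")}
reachable = cabling_exists("host-a", "disk-1", links)
```

Because the second check has no numeric form, a solver built only for linear inequalities could not express it directly, which is the difficulty the passage above describes.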
So, the storage allocation problem can be viewed on at least two levels. First, whether a particular workload unit can be assigned to a particular storage device, that is, whether the constraints are met. Second, whether a particular workload unit should be assigned to a particular storage device given the resulting overall assignment plan, that is, whether the objective function is optimized.
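The two levels can be sketched in code as a feasibility test followed by a comparison under the objective function. The sketch below is an assumption-laden illustration, not the claimed solver: it enumerates a small, hand-written list of candidate plans, and the names `is_feasible`, `choose_plan`, `within_capacity`, `total_cost`, and all of the data are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A plan maps each workload unit name to a storage device name.
Plan = Dict[str, str]


def is_feasible(plan: Plan,
                constraints: Iterable[Callable[[Plan], bool]]) -> bool:
    """Level one: can the units be assigned this way (all constraints met)?"""
    return all(constraint(plan) for constraint in constraints)


def choose_plan(plans: Iterable[Plan],
                constraints: List[Callable[[Plan], bool]],
                objective: Callable[[Plan], float]) -> Optional[Plan]:
    """Level two: among feasible plans, pick the one minimizing the objective."""
    feasible = [p for p in plans if is_feasible(p, constraints)]
    if not feasible:
        return None
    return min(feasible, key=objective)


# Hypothetical workload sizes, device capacities, and device costs.
SIZES = {"w1": 40, "w2": 50}
CAPACITY = {"d1": 100, "d2": 60}
COST = {"d1": 5, "d2": 8}


def within_capacity(plan: Plan) -> bool:
    """Capacity constraint: total size on each device is below its capacity."""
    used: Dict[str, int] = {}
    for unit, device in plan.items():
        used[device] = used.get(device, 0) + SIZES[unit]
    return all(total < CAPACITY[device] for device, total in used.items())


def total_cost(plan: Plan) -> float:
    """Objective: cost of the devices actually used by the plan."""
    return sum(COST[device] for device in set(plan.values()))


candidates = [
    {"w1": "d1", "w2": "d1"},  # both units on d1
    {"w1": "d1", "w2": "d2"},  # one unit per device
    {"w1": "d2", "w2": "d2"},  # both units on d2, which exceeds its capacity
]
best = choose_plan(candidates, [within_capacity], total_cost)
```

Because constraints are passed as arbitrary predicates, nothing in this sketch requires them to be linear. Exhaustive enumeration is used only to keep the example short and would not scale to a realistic storage system.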
There exist many standard optimization problems that are similarly structured to the storage allocation problem. However, none of them are an exact match. As a result, none of them provide a model upon which to reach a solution.
One standard optimization problem similar to the storage allocation problem is the classic bin packing problem. In the classic bin packing problem, the challenge is to fit a set of n items, I={i1, i2, . . . in}, having fixed sizes, S={s1, s2, . . . sn}, into a set of m bins, B={b1, b2, . . . bm}, having fixed capacities, C={c1, c2, . . . cm}. The objective function is to use the minimum number of bins possible given the constraint that the sum of the sizes of a set of items in a bin must be less than or equal to the capacity of the bin. So, in the classic bin packing problem, there is only one objective function and one constraint. In the storage allocation problem, however, there may be multiple objective functions and multiple constraints. As a result, solutions to the classic bin packing problem cannot be used directly to solve the storage allocation problem.
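For reference, one well-known heuristic for the classic bin packing problem is first-fit decreasing. The sketch below is offered only to make the single-constraint structure concrete. It is a heuristic that does not guarantee the minimum number of bins, and the function name `first_fit_decreasing` and the example data are assumptions introduced here.

```python
from typing import List, Optional, Sequence


def first_fit_decreasing(sizes: Sequence[int],
                         capacities: Sequence[int]) -> Optional[List[int]]:
    """Place items, largest first, into the first bin with enough room.

    Returns a list giving the bin index chosen for each item, or None if
    some item does not fit in any bin under this heuristic.
    """
    remaining = list(capacities)
    assignment: List[int] = [-1] * len(sizes)
    # Visit item indices in order of decreasing size.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for i in order:
        for b, free in enumerate(remaining):
            if sizes[i] <= free:  # the one capacity constraint
                assignment[i] = b
                remaining[b] -= sizes[i]
                break
        else:
            return None  # no bin could hold this item
    return assignment


# Hypothetical example: six items and three bins of capacity 10.
item_sizes = [4, 8, 1, 4, 2, 1]
bin_capacities = [10, 10, 10]
packing = first_fit_decreasing(item_sizes, bin_capacities)
```

The inner test `sizes[i] <= free` is the only constraint the heuristic knows about, which is why it cannot be applied directly when several constraints, some of them non-linear, must hold at once.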
Another standard optimization problem similar to the storage allocation problem is the integer knapsack problem. In the integer knapsack problem, the challenge is to fit a set of n items, I={i1, i2, . . . in}, having a fixed size, S={s1, s2, . . . sn}, and a defined value, V={v1, v2, . . . vn}, into a knapsack having a fixed size k. The objective function is to maximize the value of a set of items placed into the knapsack given the constraint that the sum of the sizes of the set of items in the knapsack must be less than or equal to the capacity of the knapsack. Again, in the integer knapsack problem, there is only one objective function and one constraint. Alternatively, there is a variant of the integer knapsack problem called the multidimensional knapsack problem which takes into account multiple capacity dimensions. An example solution is given in Management Science by Yoshiaki Toyoda (Toyoda) in an article entitled “A Simplified Algorithm for Obtaining Approximate Solutions to Zero-One Programming Problems.” Even so, both of the knapsack problems differ from the storage allocation problem in two significant ways. First, the knapsack problems assume that the capacity dimensions can be captured as linear constraints. However, as noted above, this may not always be the case in the storage allocation problem and cannot be assumed. Second, the knapsack problems assume a fixed number of knapsacks. However, in the storage allocation problem, an objective function to choose the best set of storage devices may require that storage devices be added or that storage devices remain unused. As a result, solutions to the knapsack problems cannot be used directly to solve the storage allocation problem.
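As a point of comparison, the single-knapsack case in which each item is either placed or left out can be solved by a standard dynamic program when sizes are non-negative integers. The sketch below shows that textbook technique, not the Toyoda algorithm cited above. The function name `knapsack_max_value` and the example data are hypothetical.

```python
from typing import Sequence


def knapsack_max_value(sizes: Sequence[int],
                       values: Sequence[int],
                       capacity: int) -> int:
    """Maximum total value of items whose total size is at most capacity.

    Each item is used at most once; sizes must be non-negative integers.
    """
    # best[c] holds the best value achievable with total size at most c.
    best = [0] * (capacity + 1)
    for size, value in zip(sizes, values):
        # Iterate capacities downward so each item is counted only once.
        for c in range(capacity, size - 1, -1):
            best[c] = max(best[c], best[c - size] + value)
    return best[capacity]


# Hypothetical example: four items and a knapsack of size 7.
best_value = knapsack_max_value([1, 3, 4, 5], [1, 4, 5, 7], 7)
```

The table `best` is indexed by a single linear capacity and the number of knapsacks is fixed at one, which mirrors the two assumptions that the passage above identifies as not holding for the storage allocation problem.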
One computer-specific optimization problem similar to the storage allocation problem is the standard file allocation problem. In the standard file allocation problem, the challenge is to place a set of n files, F={f1, f2, . . . fn}, having a fixed size, S={s1, s2, . . . sn}, and a set of m tasks, T={t1, t2, . . . tm}, onto a set of k nodes, N={n1, n2, . . . nk}, having a fixed capacity, C={c1, c2, . . . ck}, where each task needs to access at least one file and each file is accessed by at least one task. The objective function is to minimize the file transmission costs of running the tasks given the constraint that the sum of the sizes of a set of files on a node must be less than or equal to the capacity of the node. Again, in the standard file allocation problem, there is only one objective function. Alternatively, there are a number of variants of the standard file allocation problem. Even so, all of the file allocation problems differ from the storage allocation problem in that they assume a fixed number of nodes. However, in the storage allocation problem, an objective function to choose the best set of storage devices may require that storage devices be added or that storage devices remain unused. As a result, solutions to the file allocation problems cannot be used directly to solve the storage allocation problem.
In addition to the above optimization problems, a number of narrow solutions have been proposed to address individual aspects of the problem. An example of a narrow solution is presented in U.S. Pat. No. 5,345,584 issued to Hill entitled “System for Managing Data Storage Based on Vector-Summed Size-Frequency Vectors for Data Sets, Devices, and Residual Storage on Devices.” This patent discloses a method that only considers the capacity and access speed of the storage device. Generally, all of the narrow solutions fail to provide a method for using arbitrary constraints or arbitrary objective functions.
Given the preceding state of the art, it is not surprising to find that configuration and management of the storage system has historically been accomplished through ad hoc solutions to the storage allocation problem. A formalization of the storage allocation problem and a collection of a range of solutions would be a significant advancement.
It is a primary purpose of the present invention to provide an improved apparatus for and method of non-linear constraint optimization in a storage system configuration.
In accordance with the primary aspect of the present invention, the objective function for a storage system is determined, the workload units are selected and their standards are determined, and the storage devices are selected and their characteristics are determined. These selections and determinations are then used by a constraint based solver through non-linear constraint integer optimization to generate an assignment plan for the workload units to the storage devices.