The impetus for this invention arose from work performed in furtherance of the Karmarkar method for optimizing the performance of commercial systems or enterprises. To gain understanding of this invention and its significance, the description below parallels the description of an invention for which an application for patent, entitled "Preconditioned Conjugate Gradient Method", was filed in the U.S. Patent and Trademark Office on even date herewith.
The need for optimization of systems arises in a broad range of technological and industrial areas. Examples of such a need include the assignment of transmission facilities in telephone transmission systems, oil tanker scheduling, control of the product mix in a factory, deployment of industrial equipment, inventory control, and others. In these examples a plurality of essentially like parameters are controlled to achieve an optimum behavior or result. Sometimes, the parameters controlling the behavior of a system have many different characteristics but their effect is the same; to wit, they combine to define the behavior of the system. An example of that is the airline scheduling task. Not only must one take account of such matters as aircraft, crew, and fuel availability at particular airports, but it is also desirable to account for different costs at different locations, the permissible routes, desirable route patterns, arrival and departure time considerations vis-a-vis one's own airline and competitor airlines, the prevailing travel patterns to and from different cities, etc. Two common denominators of all of these applications are the existence of many parameters or variables that can be controlled, and the presence of an objective: to select values for the variables so that, in combination, an optimum result is achieved.
The relationships describing the permissible values of the various variables and their relationship to each other form a set of constraint relationships. Optimization decisions are typically subject to constraints. Resources, for example, are always limited in overall availability and, sometimes, the usefulness of a particular resource in a specific application is limited. The challenge, then, is to select values of the parameters of the system so as to satisfy all of the constraints and concurrently optimize its behavior, i.e., bring the level of "goodness" of the objective function to its maximum attainable level. Stated in other words, given a system where resources are limited, the objective is to allocate resources in such a manner as to optimize the system's performance.
One method of characterizing optimization tasks is via the linear programming model. Such a model consists of a set of linear equalities and inequalities that represent the quantitative relationships between the various possible system parameters, their constraints, and their costs (or benefits). Describing complex systems, such as a commercial endeavor, in terms of a system of linear equations often results in extremely large numbers of variables and constraints placed on those variables. Until recently, artisans were unable to explicitly solve many of the optimization tasks that were facing them primarily because of the large size of the task.
The best known prior art approach to solving allocation problems posed as linear programming models is known as the simplex method. It was invented by George B. Dantzig in 1947, and described in Linear Programming and Extensions, by George B. Dantzig, Princeton University Press, Princeton, N.J., 1963. In the simplex method, the first step is to select an initial feasible allocation as a starting point. The simplex method gives a particular method for identifying successive new allocations, where each new allocation improves the objective function compared to the immediately previous identified allocation, and the process is repeated until the identified allocation can no longer be improved.
The operation of the simplex method can be illustrated diagrammatically. In two-dimensional systems the solutions of a set of linear constraint relationships are given by a polygon of feasible solutions. In a three-dimensional problem, linear constraint relationships form a three-dimensional polytope of feasible solutions. As may be expected, optimization tasks with more than three variables form higher dimensional polytopes. FIG. 1 depicts a polytope 10 contained within a multi-dimensional hyperspace (the representation is actually shown in three dimensions for lack of means to represent higher dimensions). It has a plurality of facets, such as facet 11, and each of the facets is a graphical representation of a portion of one of the constraint relationships in the formal linear programming model. That is, each linear constraint defines a hyperplane in the multi-dimensional space of polytope 10, and a portion of that plane forms a facet of polytope 10. Polytope 10 is convex, in the sense that a line joining any two points of polytope 10 lies within or on the surface of the polytope.
It is well known that there exists a solution of a linear programming model which maximizes (or minimizes) an objective function, and that the solution lies at a vertex of polytope 10. The strategy of the simplex method is to successively identify from each vertex the adjacent vertices of polytope 10, and select each new vertex (each representing a new feasible solution of the optimization task under consideration) so as to bring the feasible solution closer, as measured by the objective function, to the optimum point 21. In FIG. 1, the simplex method might first identify vertex 12 and then move in a path 13 from vertex to vertex (14 through 20) until arriving at the optimum point 21.
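The claim that the optimum lies at a vertex can be checked on a small two-variable model. The sketch below is not the simplex method; it simply enumerates every vertex of the feasible polygon by intersecting pairs of constraint boundaries and compares objective values. The constraint set, the objective coefficients, and the function names `vertices_2d` and `best_vertex` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def vertices_2d(constraints, eps=1e-9):
    """Feasible vertices of {(x, y): a*x + b*y <= r for every (a, b, r)}."""
    points = []
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:  # parallel boundary lines never intersect
            continue
        # Cramer's rule for the intersection of the two boundary lines.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the intersection only if it satisfies every constraint.
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            points.append((x, y))
    return points


def best_vertex(constraints, c):
    """Vertex maximizing c[0]*x + c[1]*y, found by exhaustive comparison."""
    return max(vertices_2d(constraints),
               key=lambda p: c[0] * p[0] + c[1] * p[1])


# Hypothetical model: x >= 0, y >= 0, x + y <= 4, x + 3y <= 6.
constraints = [(-1, 0, 0), (0, -1, 0), (1, 1, 4), (1, 3, 6)]
optimum = best_vertex(constraints, c=(3, 5))
```

Exhaustive enumeration of this kind is workable only for toy models, since the number of candidate intersections grows combinatorially with the number of constraints; the simplex method instead walks from one vertex to an adjacent, better one.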
The simplex method is thus constrained to move on the surface of polytope 10 from one vertex of polytope 10 to an adjacent vertex along an edge. In linear programming problems involving thousands, hundreds of thousands, or even millions of variables, the number of vertices on the polytope increases correspondingly, and so does the length of path 13. Moreover, there are so-called "worst case" problems where the topology of the polytope is such that a substantial fraction of the vertices must be traversed to reach the optimum vertex.
As a result of these and other factors, the average computation time needed to solve a linear programming model by the simplex method appears to grow at least proportionally to the square of the number of constraints in the model. For even moderately-sized allocation problems, this time is often so large that using the simplex method is simply not practical. This occurs, for example, where the constraints change before an optimum allocation can be computed, or the computation facilities necessary to optimize allocations using the model are simply not available at a reasonable cost. Optimum allocations could not generally be made in "real time" (i.e., sufficiently fast) to provide more or less continuous control of an ongoing process, system or apparatus.
To overcome the computational difficulties in the above and other methods, N. K. Karmarkar invented a new method, and apparatus for carrying out his method, that substantially improves the process of resource allocation. In accordance with Karmarkar's method, which is disclosed in U.S. Pat. No. 4,744,028 issued May 10, 1988, a starting feasible solution is selected within polytope 10, and a series of moves is made, each in the direction that, locally, yields the greatest change toward the optimum vertex of the polytope. At each move a step of computable size is taken in that direction, and the process repeats until a point is reached that is close enough to the desired optimum point to permit identification of the optimum point.
Describing the Karmarkar invention more specifically, a point in the interior of polytope 10 is used as the starting point. Using a change of variables which preserves linearity and convexity, the variables in the linear programming model are transformed so that the starting point is substantially at the center of the transformed polytope and all of the facets are more or less equidistant from the center. The objective function is also transformed. The next point is selected by moving in the direction of steepest change in the transformed objective function by a distance (in a straight line) constrained by the boundaries of the polytope (to avoid leaving the polytope interior). Finally, an inverse transformation is performed on the new allocation point to return that point to the original variables, i.e., to the space of the original polytope. Using the transformed new point as a new starting point, the entire process is repeated.
Karmarkar describes two related "rescaling" transformations for moving a point to the center of the polytope. The first uses a projective transformation, and the second uses an affine transformation. These lead to closely related procedures, which we call projective scaling and affine scaling, respectively. The projective scaling procedure is described in detail in N. K. Karmarkar's paper, "A New Polynomial Time Algorithm for Linear Programming", Combinatorica, Vol. 4, No. 4, 1984, pp. 373-395, and the affine scaling method is described in the aforementioned N. Karmarkar '028 patent and in U.S. Pat. No. 4,744,026 issued May 10, 1988 to Vanderbei.
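A single affine scaling move can be sketched in a few lines. The sketch below follows the commonly published textbook form of primal affine scaling, not necessarily the exact steps of the cited patents. It assumes a model of the form minimize c'x subject to Ax = b with every x_i > 0 (the positivity requirement is an assumption of this sketch), takes D = diag(x), and uses A D^2 c as the right-hand side of the linear system. The names `solve`, `affine_scaling_step`, and `gamma` are hypothetical.

```python
def solve(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (aug[i][n] - s) / aug[i][i]
    return u


def affine_scaling_step(A, c, x, gamma=0.9):
    """One move from an interior point x (A x = b, x > 0) toward lower cost."""
    m, n = len(A), len(x)
    d2 = [xi * xi for xi in x]  # diagonal of D^2, with D = diag(x)
    # Q = A D^2 A^T and right-hand side A D^2 c (assumed form).
    Q = [[sum(A[i][k] * d2[k] * A[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]
    rhs = [sum(A[i][k] * d2[k] * c[k] for k in range(n)) for i in range(m)]
    u = solve(Q, rhs)
    # Reduced cost r = c - A^T u; direction dx = -D^2 r keeps A x unchanged.
    r = [c[k] - sum(A[i][k] * u[i] for i in range(m)) for k in range(n)]
    dx = [-d2[k] * r[k] for k in range(n)]
    # Largest step that keeps x positive, shortened by the factor gamma.
    ratios = [-x[k] / dx[k] for k in range(n) if dx[k] < 0]
    if not ratios:  # nothing decreases: optimal or unbounded; sketch stops
        return x[:]
    alpha = gamma * min(ratios)
    return [x[k] + alpha * dx[k] for k in range(n)]


# Hypothetical model: minimize x1 + 2*x2 + 3*x3 subject to x1 + x2 + x3 = 1.
A, c = [[1.0, 1.0, 1.0]], [1.0, 2.0, 3.0]
x = [1 / 3, 1 / 3, 1 / 3]
for _ in range(5):  # a production code would use a convergence test instead
    x = affine_scaling_step(A, c, x)
cost = sum(ci * xi for ci, xi in zip(c, x))
```

Each move stays strictly inside the feasible region because the step is cut short of the boundary by `gamma`. The loop is deliberately limited to a few moves: once some components of x become very small, rounding error in the computed direction can be magnified by the large step length, so a practical implementation needs a stopping rule.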
The advantages of the Karmarkar invention derive primarily from the fact that each step is radial within the polytope rather than circumferential on the polytope surface and, therefore, many fewer steps are necessary to converge on the optimum point.
To proceed with the Karmarkar method, it is best to set up the optimization task in matrix notation, transformed to the following canonical form:

$$
\begin{aligned}
\text{minimize:} \quad & c^{T}x \\
\text{subject to:} \quad & Ax = b
\end{aligned}
\tag{1}
$$

In the above statement of the task,
$x=(x_1, x_2, \ldots, x_n)$ is a vector of the system attributes which, as a whole, describe the state of the system;
$n$ is the number of such system attributes;
$c=(c_1, c_2, \ldots, c_n)$ is a vector describing the objective function which minimizes costs, where "cost" is whatever adversely affects the performance of the system;
$c^{T}$ is the transpose of vector $c$;
$A=(a_{11}, a_{12}, \ldots, a_{ij}, \ldots, a_{mn})$ is an $m$ by $n$ matrix of constraint coefficients;
$b=(b_1, b_2, \ldots, b_m)$ is a vector of $m$ constraint limits.
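The canonical form can be made concrete with a small numerical instance. The sketch below evaluates the objective and checks the equality constraints Ax = b for two candidate allocations; the matrix, vectors, and function names are hypothetical, and only the equality constraints stated above are checked.

```python
def objective(c, x):
    """Cost of allocation x: the inner product of c and x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def satisfies_constraints(A, b, x, tol=1e-9):
    """True when every row of A x equals the matching entry of b."""
    return all(
        abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) <= tol
        for row, bi in zip(A, b)
    )


# Hypothetical instance with m = 2 constraints and n = 3 attributes.
A = [[1, 1, 1],
     [1, -1, 0]]
b = [10, 2]
c = [2, 3, 1]

x_first = [4, 2, 4]   # satisfies both constraints
x_second = [5, 3, 2]  # also satisfies both, at a higher cost

costs = (objective(c, x_first), objective(c, x_second))
```

Both candidates satisfy Ax = b, so the choice between them is made purely on cost; an optimization method searches the whole feasible set for the allocation with the lowest such value.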
In carrying out the method first invented by Karmarkar, various computational steps are required. These are depicted in FIG. 2, which is similar to one of the drawings in the aforementioned Karmarkar patent application. The various vectors and matrices referred to in FIG. 2 are not essential to the understanding of the invention disclosed herein, and therefore are not discussed further. We wish to merely note that the step which is computationally most demanding is the step in block 165 that requires the use of the matrix inverse $(AD^{2}A^{T})^{-1}$. Developing that inverse is tantamount to solving (for the unknown $u$) the positive definite system of linear equations

$$
AD^{2}A^{T}u = p \tag{2}
$$

or

$$
Qu = p,
$$

where $D$ is an affine scaling diagonal matrix, $p$ is $AD$, and $Q=AD^{2}A^{T}$.
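The structure of Q explains why the system in equation (2) is positive definite. For any vector v, the quadratic form v'Qv equals the squared length of D A'v, which is positive whenever A has full row rank, D has no zero diagonal entries, and v is non-zero. The sketch below forms Q for a small hypothetical A and D and confirms the identity numerically; the function names and data are assumptions made for illustration.

```python
def normal_matrix(A, d):
    """Q = A D^2 A^T, where D is the diagonal matrix with entries d."""
    m, n = len(A), len(d)
    return [[sum(A[i][k] * d[k] ** 2 * A[j][k] for k in range(n))
             for j in range(m)]
            for i in range(m)]


def quadratic_form(Q, v):
    """v^T Q v."""
    return sum(v[i] * Q[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


def scaled_norm_squared(A, d, v):
    """Squared length of D A^T v, computed without forming Q."""
    n = len(d)
    at_v = [sum(A[i][k] * v[i] for i in range(len(A))) for k in range(n)]
    return sum((d[k] * at_v[k]) ** 2 for k in range(n))


# Hypothetical data: m = 2 constraints, n = 3 variables.
A = [[1, 1, 1],
     [1, -1, 0]]
d = [1, 2, 3]
Q = normal_matrix(A, d)
v = [1, 1]
lhs, rhs = quadratic_form(Q, v), scaled_norm_squared(A, d, v)
```

Because Q is symmetric and positive definite, the system can be attacked with methods specialized for such matrices, without ever forming the inverse explicitly.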
A number of methods are known in the art for solving a system of linear equations. These include the various direct methods such as the Gaussian elimination method, and various iterative methods such as the relaxation methods and the conjugate gradient method. These methods are well known, but for completeness of this description they are described herein in abbreviated form.
The Gaussian elimination method for solving a system of linear equations is the method most often taught in school. It comprises a collection of steps, where two linear equations are combined at each step to eliminate one variable. Proceeding in this manner, an equation is arrived at that contains a single variable; a solution for that variable is computed; and solutions for the other variables are derived by back substitution. It can be shown that the process of eliminating variables to reach an equation with a single variable is a transformation of the given matrix, which has non-zero values at arbitrary locations, into a matrix that contains nothing but zeros in the lower triangular half. The Gaussian elimination method is poorly suited for solving a very large system of linear equations because the process of transforming the original matrix into the matrix with a zero lower triangular portion introduces many non-zero terms into the upper triangular portion of the matrix. This does not present a problem in applications of the method to small systems, but in large systems it represents a major drawback. For example, in a large system of $10^{6}$ equations in $10^{6}$ unknowns, the potential number of non-zero terms in the upper triangular portion is approximately $5\times10^{11}$. Presently there is no hope of dealing with such a large number of non-zero terms, even in terms of just storing the values. Fortunately, physical systems of the type whose optimization is desired are sparse, which means that the bulk of the terms in the matrix are zero. A method that does not exhibit "fill-in", as the Gaussian elimination method does, would have to deal with many fewer non-zero terms. For example, a system that contains only five non-zero terms in each column presents a total of only $5\times10^{6}$ non-zero terms in the above example. That is much more manageable.
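Fill-in is easy to observe on a small example. The sketch below applies Gaussian elimination without pivoting to a hypothetical "arrow" matrix, whose only non-zero terms lie on the diagonal and in the first row and column, and counts the non-zero terms in the upper triangle before and after. The matrix pattern, its values, and the function names are assumptions chosen to make the effect visible; dense Python lists are used only for clarity.

```python
def arrow_matrix(n):
    """Sparse test matrix: diagonal plus a full first row and first column."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = 10.0
        M[0][i] = M[i][0] = 1.0
    M[0][0] = 10.0  # restore the corner overwritten by the loop above
    return M


def eliminate(M):
    """Reduce a copy of M to upper triangular form (no pivoting)."""
    U = [row[:] for row in M]
    n = len(U)
    for k in range(n):
        for i in range(k + 1, n):
            if U[i][k] == 0.0:
                continue
            f = U[i][k] / U[k][k]
            for j in range(k + 1, n):
                U[i][j] -= f * U[k][j]
            U[i][k] = 0.0  # eliminated entry
    return U


def upper_nonzeros(M, eps=1e-12):
    """Number of non-zero terms on or above the diagonal."""
    n = len(M)
    return sum(1 for i in range(n) for j in range(i, n) if abs(M[i][j]) > eps)


n = 6
before = upper_nonzeros(arrow_matrix(n))
after = upper_nonzeros(eliminate(arrow_matrix(n)))
```

For this pattern the first elimination step subtracts a multiple of the full first row from every other row, so the upper triangle goes from 2n - 1 non-zero terms to the full n(n + 1)/2. Reordering the rows and columns so that the dense row and column come last would avoid the fill-in for this particular pattern, which is one reason ordering matters in sparse direct methods.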
The use of relaxation methods in connection with large matrices is also not recommended because there is no assurance that a solution will be reached in reasonable time. The Gauss-Seidel method, for example, works by guessing a first approximate solution and computing from that a new approximate solution that is closer to the true solution of the system. Successive iterations eventually yield the actual solution to the system, but the number of required iterations is highly dependent on the initial choice.
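The Gauss-Seidel iteration described above can be sketched as follows. The 2 by 2 system, the tolerance, the two starting guesses, and the function name `gauss_seidel` are hypothetical; the matrix is chosen to be symmetric and diagonally dominant so that the iteration converges.

```python
def gauss_seidel(Q, p, u0, tol=1e-10, max_iter=10_000):
    """Iterate on Q u = p from the guess u0; return (u, sweeps used)."""
    u = list(u0)
    n = len(p)
    for sweep in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            # Solve equation i for u[i], using the newest values of the rest.
            s = sum(Q[i][j] * u[j] for j in range(n) if j != i)
            new = (p[i] - s) / Q[i][i]
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:  # successive approximations have stopped moving
            return u, sweep
    return u, max_iter


# Hypothetical system whose exact solution is u = (1/11, 7/11).
Q = [[4.0, 1.0],
     [1.0, 3.0]]
p = [1.0, 2.0]

u_near, sweeps_near = gauss_seidel(Q, p, u0=[0.0, 0.0])
u_far, sweeps_far = gauss_seidel(Q, p, u0=[100.0, -100.0])
```

Both runs reach the same solution, but the run that starts farther away needs more sweeps. On this tiny, well-conditioned system the difference is small; on large, poorly conditioned systems the number of sweeps can become impractical.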
In many of the above described methods, as well as in the novel conjugate gradient method disclosed below, it is necessary to perform a multiplication of a matrix by a vector. The matrix is relatively invariant, i.e., does not change with each iteration, while the vector changes often, e.g., with each iteration. There are standard techniques for performing a multiplication of a matrix by a vector, but those techniques do not take advantage of the data characteristics which may permit a faster and therefore more efficient realization of the desired product.
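For reference, one standard technique for multiplying a sparse matrix by a vector stores only the non-zero terms, row by row, in what is commonly called compressed sparse row (CSR) form. The sketch below shows that conventional approach, not the technique of the present disclosure; the function names and the sample data are hypothetical.

```python
def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # the non-zero term itself
                col_idx.append(j)  # the column it came from
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = M x, touching only the stored non-zero terms of M."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Hypothetical 3 by 3 matrix with four non-zero terms.
dense = [[4, 0, 0],
         [0, 0, 2],
         [1, 0, 3]]
values, col_idx, row_ptr = to_csr(dense)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

Because the matrix stays fixed while the vector changes from one iteration to the next, the conversion cost is paid once and each subsequent product costs time proportional to the number of non-zero terms rather than to the full size of the matrix.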