1. Field of the Invention
This invention relates generally to computers, and more particularly, to computer program products and methods for causing a computer to function in a particular efficient fashion.
2. Description of the Related Art
Modern computers contain microprocessors, which are essentially the brains of the computer. In operation, the computer uses the microprocessor to run a computer program.
The computer program might be written in a high-level computer language, such as C or C++, using statements similar to English, which statements are then translated (by another program called a compiler) into numerous machine-language instructions. Or the program might be written in assembly language, and then translated (by another program called an assembler) into machine-language instructions. In practice, every computer language above assembly language is a high-level language.
Each computer program contains numerous instructions, which tell the computer what precisely it must do, to achieve the desired goal of the program. The computer runs a particular computer program by executing the instructions contained in that program.
Frequently the goal of the program is to solve complicated real world problems which can be described in mathematical terms. Modern microprocessors permit such programs to be rapidly executed using techniques such as pipelining and speculative execution.
Modern microprocessors use a design technique called a pipeline, in which the output of one process serves as input to a second, the output of the second process serves as input to a third, and so on, often with more than one process occurring during a particular computer clock cycle.
Pipelining is a method used in some microprocessors of fetching and decoding instructions in which, at any given time, several program instructions are in various stages of being fetched or decoded. Ideally, pipelining speeds execution time by ensuring that the microprocessor does not have to wait for instructions; when it completes execution of one instruction, the next is ready and waiting. In order to have the next instruction that is to be executed ready and waiting in the pipeline, the microprocessor somehow must predict what that instruction will be.
Branch prediction is a technique used in some microprocessors to guess whether or not a particular path in a program--called a branch--will be taken during program execution, and to fetch instructions from the appropriate location. When a branch instruction is executed, it and the next instruction executed are stored in a buffer. This information is used to predict which way the instruction will branch the next time it is executed. When the prediction is correct, executing a branch does not cause a pipeline break, so the system is not slowed down by the need to retrieve the next instruction. When the prediction is incorrect, a pipeline break does occur, and the system is slowed down because it then needs to locate and retrieve the next instruction. Such incorrect predictions are sometimes called branch mispredictions.
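As a rough illustration of the buffer-based prediction described above, the following Python sketch models a predictor that remembers, for each branch, the instruction that followed it the last time it executed. The class name `LastTargetPredictor`, the dictionary standing in for the buffer, and the addresses are all hypothetical. Real microprocessors use more elaborate hardware schemes, so this is only a toy model.

```python
class LastTargetPredictor:
    """Toy model: remember which instruction followed each branch last time."""

    def __init__(self):
        self.buffer = {}          # branch address -> last observed next address
        self.mispredictions = 0

    def execute(self, branch_addr, actual_next):
        """Record one execution of a branch; return True if it was predicted."""
        predicted = self.buffer.get(branch_addr)   # None if never seen before
        correct = predicted == actual_next
        if not correct:
            # Assumption: a first encounter (no prediction) is counted as a miss.
            self.mispredictions += 1
        self.buffer[branch_addr] = actual_next
        return correct


predictor = LastTargetPredictor()
# Hypothetical branch at address 100: jumps back to 40 four times, then falls
# through to 104.
outcomes = [40, 40, 40, 40, 104]
results = [predictor.execute(100, nxt) for nxt in outcomes]
print(results, predictor.mispredictions)
```

In this model the repeated jumps are predicted once the buffer has seen them, while the final change of direction is not, which is the kind of event the text calls a branch misprediction.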
Speculative execution is a technique used in some microprocessors in which certain instructions are executed and results made available before the results are actually needed by the program, so that the results are ready and waiting when the program needs them. Which instructions are to be executed speculatively is based on the guesses made about which branches in the program will be taken. In general, when a branch is mispredicted and instructions speculatively executed based on that incorrect branch prediction, the results of the speculatively executed instructions must be discarded, and consequently the computer time and resources used to obtain the now discarded results are wasted.
Real-world problems frequently can be expressed mathematically using a group of equations generally referred to as a system of simultaneous equations. Those equations, in turn, can be expressed in what is sometimes called matrix form, described more fully below. A computer can then be used to manipulate and perform calculations with the matrices, and solve the problem.
A matrix is a set of numbers arranged in rows and columns so as to form a rectangular array. The numbers are called the elements of the matrix. If there are $m$ rows and $n$ columns, the matrix is said to be an "$m$ by $n$" matrix, written "$m \times n$". For example,
$$\begin{bmatrix} 1 & 3 & 8 \\ 2 & -4 & 5 \end{bmatrix}$$
is a $2 \times 3$ matrix; it has two rows and three columns. A matrix with $m$ rows and $m$ columns is called a square matrix of order $m$. An ordinary number can be regarded as a $1 \times 1$ matrix; thus, the number 3 can be thought of as the matrix $[3]$.
In a common notation, a capital letter denotes a matrix, and the corresponding small letter with a double subscript denotes an element of that matrix. Thus, $a_{ij}$ is the element in the $i$th row and the $j$th column of the matrix $A$. If $A$ is the $2 \times 3$ matrix shown above, then $a_{11}$ equals 1, $a_{12}$ equals 3, $a_{13}$ equals 8, $a_{21}$ equals 2, $a_{22}$ equals -4, and $a_{23}$ equals 5. Under certain conditions described more fully below, matrices can be added and multiplied as individual entities.
Matrices occur naturally in systems of simultaneous equations. In the following system for the unknowns $x$ and $y$,
$$2x + 3y = 7$$
$$3x + 4y = 10$$
the array of numbers
$$\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$$
is a matrix whose elements are the coefficients of the unknowns. The solution of the equations depends on these numbers, on their particular arrangement, and on the constants on the right-hand side. If the constants 7 and 10 were interchanged, the solution would not be the same.
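The dependence of the solution on these numbers can be checked with a short worked example. The sketch below solves the two-equation system by Cramer's rule, a technique chosen here for brevity and not named in the text. The function name `solve_2x2` is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a, b):
    """Solve the 2x2 system a * [x, y] = b by Cramer's rule (exact arithmetic)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    x = Fraction(b[0] * a[1][1] - a[0][1] * b[1], det)
    y = Fraction(a[0][0] * b[1] - b[0] * a[1][0], det)
    return x, y


coefficients = [[2, 3], [3, 4]]          # coefficients of 2x+3y and 3x+4y
original = solve_2x2(coefficients, [7, 10])
swapped = solve_2x2(coefficients, [10, 7])
print(original, swapped)
```

With the constants as given, the solution is x = 2, y = 1. With 7 and 10 interchanged, it becomes x = -19, y = 16.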
A matrix $A$ can be multiplied by an ordinary number $c$, which is called a scalar. The product is denoted by $cA$ or $Ac$, and is the matrix whose elements are $c\,a_{ij}$.
The multiplication of a matrix $A$ by a matrix $B$ to yield a matrix $C$ is defined only when the number of columns of the matrix $A$ equals the number of rows of the matrix $B$. To determine the element $c_{ij}$, which is in the $i$th row and the $j$th column of the product, the first element in the $i$th row of $A$ is multiplied by the first element in the $j$th column of $B$, the second element in the row by the second element in the column, and so on until the last element in the row is multiplied by the last element of the column; the sum of all these products gives the element $c_{ij}$. In symbols, for the situation where $A$ has $n$ columns and $B$ has $n$ rows,
$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}.$$
The matrix C has as many rows as A, and as many columns as B. Thus if A has m rows and n columns, and B has n rows and p columns, then C has m rows and p columns.
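The row-by-column rule and the resulting dimensions can be sketched in a few lines of Python. The function name `matmul` and the matrix `B` below are hypothetical illustrations. `A` uses the element values of the 2 by 3 example matrix listed earlier in the text.

```python
def matmul(a, b):
    """Multiply an m-by-n matrix a by an n-by-p matrix b (lists of rows)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of a must equal rows of b")
    p = len(b[0])
    # c[i][j] is the sum over k of a[i][k] * b[k][j].
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(a))
    ]


A = [[1, 3, 8], [2, -4, 5]]        # 2 x 3
B = [[1, 0], [0, 1], [2, 1]]       # 3 x 2 (hypothetical values)
C = matmul(A, B)                   # 2 x 2
print(C)
```

Here `A` has 2 rows and `B` has 2 columns, so `C` is 2 by 2, consistent with the rule that the product has as many rows as the first factor and as many columns as the second.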
When $B$ has only one column, that is, $p = 1$, $B$ is sometimes referred to as a column vector, or simply a vector. In a common notation, a single subscript is used to denote elements of a vector. Thus, $v_i$ is the $i$th element of the vector $V$.
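The multiplication of a matrix $A$ by a vector $V$ to yield a vector $D$ is defined only when the number of columns of the matrix $A$ equals the number of elements of the vector $V$. Thus, multiplying an $m \times n$ matrix $A$ by an $n$-element vector $V$ yields an $m$-element vector $D$, the elements of which are indicated below, where the symbol "*" denotes multiplication.
$$\begin{aligned} d_1 &= a_{11} * v_1 + a_{12} * v_2 + \cdots + a_{1n} * v_n \\ d_2 &= a_{21} * v_1 + a_{22} * v_2 + \cdots + a_{2n} * v_n \\ &\;\;\vdots \\ d_m &= a_{m1} * v_1 + a_{m2} * v_2 + \cdots + a_{mn} * v_n \end{aligned}$$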
The individual elements of a matrix may be zero or non-zero. A matrix in which the non-zero elements amount to a very small percentage of the total number of elements is sometimes referred to as a sparse matrix. Sparse matrices occur frequently in practice. Problems such as structural analysis, network flow analysis, difference approximations to differential equations, finite element analysis, financial modeling, fluid dynamics, and so forth, all lead to sparse matrices. Because sparse matrices, and particularly large sparse matrices, frequently occur, techniques have been developed to take advantage of the large number of zeros contained in the sparse matrix, to avoid unnecessary computation and unnecessary storage.
When computers are used for sparse matrix computations, the sparse matrix usually is stored in a compressed form to reduce the storage requirements. In one such known compressed form, only the non-zero elements of the matrix are stored, along with the row and column location for each non-zero element.
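A minimal sketch of this kind of compressed form, assuming three parallel lists and 0-based indices (both are choices made for this illustration, not details from the text), might look as follows. The function name `to_coordinate_form` and the sample matrix are hypothetical.

```python
def to_coordinate_form(dense):
    """Return (values, rows, cols) for the non-zero elements, scanned row by row."""
    values, rows, cols = [], [], []
    for i, row in enumerate(dense):
        for j, element in enumerate(row):
            if element != 0:
                values.append(element)   # the non-zero element itself
                rows.append(i)           # its row location (0-based)
                cols.append(j)           # its column location (0-based)
    return values, rows, cols


dense = [
    [5, 0, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 3],
]
values, rows, cols = to_coordinate_form(dense)
print(values, rows, cols)
```

For this 4 by 4 example, only 4 of the 16 elements are stored, each with its row and column location.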
In one known prior art method, the non-zero elements of each row of the sparse matrix are stored linearly in a first array, and a second array is used to keep track of the locations in the first array corresponding to the end of each row of the sparse matrix. A third array is used to keep track of the column location in the sparse matrix for each element in the first array. A known prior art method for computing the product of such a sparse matrix with a vector is illustrated in FIG. 1, and sample code is set forth below; in each the first array is called "matrix", the second array is called "end_of_row", the third array is called "column", and the resulting vector is called "result".
      do row = 1, number_of_rows
         result(row) = 0.0
         do i = (end_of_row(row-1)+1), end_of_row(row)
            result(row) = result(row) + matrix(i) * vector(column(i))
         end do
      end do
When using this prior art technique to compute the product of a sparse matrix with a vector, it is necessary to determine the column index of each element in the first array, and compute its product with the corresponding element in the vector. This product is then accumulated until the end of the row is reached. Once the end of the row is reached, the accumulator is cleared, and the process is repeated for the next row. This is done until all the rows are processed.
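The same procedure can be sketched in Python for experimentation. The array names follow the sample code, but the helper `compress_rows`, the function name `sparse_matvec`, the 0-based indexing, and the leading zero stored in `end_of_row` (standing in for end_of_row(0)) are assumptions made for this illustration.

```python
def compress_rows(dense):
    """Build the three arrays of the row-wise storage scheme from a dense matrix."""
    matrix, column, end_of_row = [], [], [0]   # end_of_row[0] acts as end_of_row(0)
    for row in dense:
        for j, element in enumerate(row):
            if element != 0:
                matrix.append(element)         # non-zero values, row after row
                column.append(j)               # column location of each value
        end_of_row.append(len(matrix))         # where this row ends in `matrix`
    return matrix, column, end_of_row


def sparse_matvec(matrix, column, end_of_row, vector):
    """Multiply the compressed sparse matrix by a vector, one row at a time."""
    result = []
    for row in range(1, len(end_of_row)):              # outer loop over rows
        total = 0.0                                    # accumulator cleared per row
        for i in range(end_of_row[row - 1], end_of_row[row]):   # inner loop
            total += matrix[i] * vector[column[i]]
        result.append(total)
    return result


dense = [
    [5, 0, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 3],
]
matrix, column, end_of_row = compress_rows(dense)
result = sparse_matvec(matrix, column, end_of_row, [1.0, 2.0, 3.0, 4.0])
print(result)
```

The trip count of the inner loop is `end_of_row[row] - end_of_row[row - 1]`, which varies from row to row (here 1, 1, 0 and 2) and is known only from the data.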
The prior art method illustrated in FIG. 1 and in the sample code above includes two DO loops: an outer DO loop and an inner DO loop. The inner DO loop, denoted by reference numeral 210 in FIG. 2, includes, in general, steps 130, 140, 145 and 150 of FIG. 1; the outer DO loop, denoted by reference numeral 220 in FIG. 2, includes, in general, steps 110, 120, 155 and 160 of FIG. 1.
The inner DO loop is data dependent. That is, the number of times the inner loop calculations are performed is determined by the number of non-zero elements in each row of the sparse matrix. A particular row might have a small number of elements, or a large number of elements; the number of elements is not known until the calculations are made. This results in branch mispredictions caused by the microprocessor predicting the next computation will be in the inner loop when, in reality, because of the data, another branch of the program--the branch for the outer DO loop--must be executed next.
In the illustrated prior art method, such branch mispredictions can occur at the end of each row of the sparse matrix, that is, at the end of each inner DO loop. Such branch mispredictions in modern microprocessors result in lost performance.
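To make the row-end behavior concrete, the following sketch counts how often a deliberately simplified predictor would be wrong on the inner-loop test. It assumes a predictor that always guesses "stay in the inner loop", which is a simplification of real hardware, and the function name `count_inner_loop_branches` and the sample `end_of_row` values are hypothetical.

```python
def count_inner_loop_branches(end_of_row):
    """Return (branches, mispredictions) for an always-'stay in loop' predictor.

    end_of_row[k] is the number of stored elements up to and including row k,
    with end_of_row[0] == 0 (0-based variant of the array in the sample code).
    """
    branches = 0
    mispredictions = 0
    for row in range(1, len(end_of_row)):
        i = end_of_row[row - 1]
        while True:
            branches += 1
            stays_in_loop = i < end_of_row[row]    # actual outcome of the loop test
            if not stays_in_loop:
                mispredictions += 1                # guessed "stay", actually exits
                break
            i += 1
    return branches, mispredictions


# Hypothetical matrix with rows holding 1, 1, 0 and 2 non-zero elements.
branches, mispredictions = count_inner_loop_branches([0, 1, 2, 2, 4])
print(branches, mispredictions)
```

Under this assumption there is one misprediction per row regardless of row length, so matrices with many short rows spend a larger share of their inner-loop branches on mispredictions.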
The present invention is directed to overcoming, or at least reducing, the effects of one or more of the problems mentioned above.