1. Field of the Invention
The present invention relates to an apparatus and method for solving large dense systems of linear equations on a parallel processing computer. More particularly, this invention relates to an efficient parallel processing method which solves dense systems of linear equations (those containing approximately 25,000 variables), wherein the system efficiently utilizes the computing and input/output resources of the parallel processing computer and minimizes idle time of those resources.
2. Prior Art
For some applications, such as determining the radar cross-section of aircraft or simulating flow across an airfoil, very large systems of linear equations are used. These systems sometimes contain on the order of 25,000 variables or more. Solving them therefore requires solving very large matrices comprising approximately 25,000 × 25,000 elements or more. This is typically difficult to accomplish on prior art linear algebra computers due to their inherent limitations, such as processing power or input/output device speed. More general limitations, such as bandwidth and the cost of super-computing resources, have also prevented the solution of these problems. Parallel computing architectures have been used for solving certain types of problems, including systems of linear equations, since many small processors may perform certain operations more efficiently than one large high-speed processor, such as that present in a typical supercomputer. The practical limitations in partitioning the problem to be solved on parallel-processing machines, however, have hindered the usefulness of this architecture.
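The scale of such matrices can be illustrated with a short storage calculation. The sketch below is illustrative only: the element sizes (8 bytes for a double-precision real value, 16 bytes for a double-precision complex value) are assumptions, since this background does not state an element format, and the function name `dense_matrix_bytes` is hypothetical. The 100,000-variable case is the larger system size mentioned elsewhere in this background.

```python
def dense_matrix_bytes(n, bytes_per_element=8):
    """Storage needed for a dense n-by-n matrix, in bytes.

    bytes_per_element is an assumption: 8 for a double-precision real,
    16 for a double-precision complex value.
    """
    return n * n * bytes_per_element


for n in (25_000, 100_000):
    real_gb = dense_matrix_bytes(n) / 1e9          # decimal gigabytes
    complex_gb = dense_matrix_bytes(n, 16) / 1e9
    print(f"n = {n:,}: {real_gb:,.0f} GB real, {complex_gb:,.0f} GB complex")
```

Under these assumptions a 25,000 × 25,000 matrix occupies about 5 GB as real values, which indicates why holding the full matrix in the main memory of each node is impractical.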
In a parallel processing machine, it is necessary to break down a problem (such as the operations to solve a large matrix representing a system of linear equations) into a series of discrete problems in order for the system to generate a solution. Segmenting such a problem is a nontrivial task, and the segmentation must be carefully designed in order to maximize processing by each of the parallel nodes and minimize the input/output operations which must be performed. This is done so that the system does not spend the majority of its time performing input/output operations (thereby becoming "I/O bound") while the computing resources in the system remain substantially idle. Therefore, one goal in designing such systems and partitioning such problems is to maximize the amount of time that the system is processing. Another goal of parallel processing architectures in implementing very large matrix solutions is to balance the I/O capacity of the system so that the system does not become totally "compute-bound" (i.e., processing only, with no input/output operations) while the I/O units remain idle.
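The balance between the two conditions can be illustrated with a simple timing model. The sketch below is not taken from any particular system: it assumes that each partition of the problem requires a fixed compute time and a fixed input/output time, and that, when overlapped, the two proceed concurrently so that one step lasts as long as the slower of the two. The function name `utilization` and the timings are hypothetical.

```python
def utilization(compute_s, io_s, overlapped=True):
    """Return (processor busy fraction, I/O busy fraction) for one step.

    Assumption: with overlap, a step lasts as long as the slower activity;
    without overlap, the two activities run one after the other.
    """
    step = max(compute_s, io_s) if overlapped else compute_s + io_s
    return compute_s / step, io_s / step


# Hypothetical per-partition timings, in seconds.
cases = [("I/O bound", 1.0, 4.0), ("balanced", 2.0, 2.0), ("compute bound", 4.0, 1.0)]
for label, compute_s, io_s in cases:
    cpu, dev = utilization(compute_s, io_s)
    print(f"{label:>13}: processors {cpu:.0%} busy, I/O units {dev:.0%} busy")
```

In this model the processors are idle three quarters of the time in the I/O-bound case, the I/O units are idle three quarters of the time in the compute-bound case, and neither resource is idle when the two times are equal.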
Another problem with parallel processing computers is that large portions of a matrix (perhaps the entire matrix) need to be loaded into the main memory of each parallel processing node in order to compute the solution. Given that very large matrices require vast amounts of computer main memory, certain systems perform matrix solutions "out-of-core." In other words, elements within the matrix that are not needed at a particular time may be written off to disk, and/or spooled to tape, for later retrieval and operation upon in the main memory of each node in the parallel processing system. Such a system may be termed an "out-of-core solver." Such an arrangement provides an easy means of maintaining backups (since the matrix elements are constantly being spooled off to tape, disk or other similar media), as well as providing a natural checkpoint at which computation may resume if computer system power is lost, a system malfunction occurs, or some other event occurs which halts processing. Given the long time period for solving very large matrices (up to a full real-time month for some very large systems, for example, those containing 100,000 × 100,000 elements), a loss of system power or another type of malfunction may otherwise prove fatal to solving the matrix.
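The out-of-core access pattern can be illustrated with a minimal sketch. It is not a solver and is not the method of any particular system: it uses a matrix-vector product, substituted here for brevity, to show a matrix residing on disk while only a limited number of rows is held in main memory at a time. The file layout (row-major, 8-byte floating-point values), the function names, and the tiny 4 × 4 example are all assumptions made for illustration.

```python
import array
import os
import tempfile


def write_matrix(path, rows):
    """Store a dense matrix on disk, row by row, as raw 8-byte floats."""
    with open(path, "wb") as f:
        for row in rows:
            array.array("d", row).tofile(f)


def matvec_out_of_core(path, n, x, block_rows):
    """Compute y = A x while holding at most block_rows rows of A in memory."""
    y = []
    with open(path, "rb") as f:
        for start in range(0, n, block_rows):
            count = min(block_rows, n - start)
            block = array.array("d")
            block.fromfile(f, count * n)  # read one block of rows from disk
            for r in range(count):
                row = block[r * n:(r + 1) * n]
                y.append(sum(a * b for a, b in zip(row, x)))
    return y


n = 4
rows = [[float(i * n + j) for j in range(n)] for i in range(n)]
x = [1.0, 0.0, -1.0, 2.0]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    write_matrix(path, rows)
    y = matvec_out_of_core(path, n, x, block_rows=3)
print(y)
```

Because the matrix is always present on disk in this arrangement, the stored data is the same data from which an interrupted computation would be resumed.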