The present invention relates to a parallel processing system and, more particularly, to a parallel processing system for performing an application function by a plurality of processing units contained within a single network.
A typical parallel processing system comprises a plurality of processing units which are interconnected within a single network and which together compute the result of a solvable problem. An advantage of the parallel processing system is the increased computing power derived from combining the processing power of several processing units. Many prior art parallel processing systems require that a known, unchanging arrangement of processing units be used. A function which is to be parallel processed is divided and communicated to a plurality of units such that each unit processes a portion of the entire function. Once each processing unit has completed processing its portion of the function, the processed portions are combined to provide a completed function. However, many prior art parallel processing systems arbitrarily divide the application function among the plurality of processing units, such that each processing unit completes its portion of the function at a different time. This leads to inefficient use of processing time, since units which finish early sit idle until the slowest unit completes. In addition, many of the processing units in the network typically are not utilized, which further lowers the efficiency of the system.
An important aspect of a parallel processing system is its ability to withstand equipment failures in individual processing units and to continue operating. If a parallel processing system is unable to detect a failure in one of its processing units, the system will be unable to complete the function being processed. Since the likelihood of a failure occurring during the computation of a given function is significant, a parallel processing system which is unable to detect failures is ineffective. On the other hand, if the parallel processing system can detect failures within a single processing unit, but the time necessary to check each processing unit is significant, the overhead of the fault detection feature will offset any benefit derived from using the parallel processing system.
There is a need for a parallel processing system which is capable of effectively dividing and processing an application function within a minimal amount of time in a processing environment that may change between successive executions of the application function. In addition, the parallel processing system should not affect the normal operations of each processing unit, so that when a particular processing unit is not involved in a parallel process, the processing unit may engage in normal processing operations. The parallel processing system should be capable of running on any operating system and should include a plurality of processing units contained within a single network. A master processing unit should control the operating flow of the application function and divide the function among a plurality of slave processing units such that each slave processing unit completes its portion of the function at approximately the same time. Once each slave processing unit completes its portion of the application function, the processed data should be transferred back to the master processing unit. In the event that a failure occurs in one of the processing units, the master processing unit should reassign the portion of the function allotted to the failed processing unit to another processing unit so that the application function can still be completed.
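The division and reassignment behavior described above can be illustrated with a minimal sketch. It is not the method of the invention. It assumes that each slave processing unit has a known integer speed rating and that the application function consists of a countable number of equal work items, neither of which is stated in the text. The names divide_work, reassign_failed, and the unit labels are hypothetical.

```python
from typing import Dict


def divide_work(total_items: int, speeds: Dict[str, int]) -> Dict[str, int]:
    """Split total_items among units in proportion to their speed ratings.

    Assumption: speeds are positive integers (relative ratings), so a unit
    rated 2 is expected to process items twice as fast as a unit rated 1.
    """
    total_speed = sum(speeds.values())
    # Integer division keeps the arithmetic exact; each share is rounded down.
    shares = {u: total_items * s // total_speed for u, s in speeds.items()}
    # Hand out the items lost to rounding, one each, fastest units first.
    leftover = total_items - sum(shares.values())
    for unit in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[unit] += 1
    return shares


def reassign_failed(
    shares: Dict[str, int], speeds: Dict[str, int], failed: str
) -> Dict[str, int]:
    """Move the failed unit's whole portion to one surviving unit.

    Assumption: the master picks the survivor with the smallest estimated
    completion time (items / speed). The text only says "another unit".
    """
    survivors = [u for u in shares if u != failed]
    target = min(survivors, key=lambda u: shares[u] / speeds[u])
    new_shares = {u: shares[u] for u in survivors}
    new_shares[target] += shares[failed]
    return new_shares


# Hypothetical network: two units of equal speed and one twice as fast.
speeds = {"a": 1, "b": 1, "c": 2}
shares = divide_work(100, speeds)
# Estimated completion time per unit; equal values mean no unit sits idle.
times = {u: shares[u] / speeds[u] for u in shares}
after_failure = reassign_failed(shares, speeds, failed="c")
```

With these ratings each unit's estimated completion time is the same, which is the property the master processing unit aims for. A fuller treatment would also have to measure unit speed and detect the failure itself, both of which are outside this sketch.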