An inherently serial data processing program is one in which a data processing unit processes program steps one at a time, and each successive step of the program further changes the results of the preceding step. Thus, each program step changes the total state of the data processing result in a serial fashion. Such inherently serial programs are often computationally intensive. For example, data mining programs analyze large quantities of data records in order to find previously hidden trends or groupings in the data. In certain types of data mining algorithms, each record of the database is analyzed one after another (serially), and the resultant model (used, e.g., in trend prediction) is continuously updated based on the result of each of the serial computations.
While such serial processing gives a very precise result (since every data record contributes to continuously altering the resultant model), it requires a great deal of processing time.
Parallel processing techniques are known in which a plurality of data processing units is provided and each processing unit is assigned, for example, its own mutually exclusive set of local data items (which may be data records) to process. This greatly shortens the overall computation time. If such a technique were applied to an inherently serial process, however, the results would differ considerably from those of the serial case, which has discouraged the use of parallel techniques for inherently serial processes where a specific result is desired.
That is, in such an inherently serial process, where a specific result is required, each processing step is always performed using the state of the resultant model as modified by the immediately preceding processing step. This gives the specific required result, since every data record is processed with full knowledge of how each preceding data record has altered the resultant data model. However, if the data records were divided among a plurality of parallel processing units, each processing unit would process its local data independently of the results being attained by the other processing units. The data records would therefore not be processed with full knowledge of how every preceding data record had altered the resultant data model. Accordingly, parallel processing will not give the same final result as serial processing.
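As a purely illustrative sketch, the divergence between serial and partitioned processing can be reproduced with a toy model. The example assumes a one-dimensional sequential (online) k-means clustering model, chosen here only for illustration and not taken from the text: each record moves its nearest centroid toward itself, so the handling of each record depends on the model state left by earlier records. The names `update`, `serial`, and `partitioned`, the two-centroid starting model, the simulated workers, and the averaging merge step are all hypothetical assumptions.

```python
"""Toy contrast between one serial pass and a naive partitioned pass.

Assumed model (illustrative only): 1-D sequential k-means, where each
record moves the nearest centroid toward itself.
"""
from typing import List, Sequence, Tuple

# (centroids, per-centroid observation counts)
Model = Tuple[List[float], List[int]]


def fresh_model(initial: Sequence[float]) -> Model:
    # Each initial centroid is counted as one seed observation.
    return list(initial), [1] * len(initial)


def update(model: Model, x: float) -> None:
    """Fold one record into the model in place (one serial step)."""
    centroids, counts = model
    # Which centroid wins depends on the state left by earlier records.
    k = min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))
    counts[k] += 1
    centroids[k] += (x - centroids[k]) / counts[k]


def serial(records: Sequence[float], initial: Sequence[float]) -> List[float]:
    model = fresh_model(initial)
    for x in records:  # every step sees all preceding updates
        update(model, x)
    return model[0]


def partitioned(records: Sequence[float], initial: Sequence[float],
                workers: int) -> List[float]:
    size = -(-len(records) // workers)  # ceiling division
    parts = [records[i:i + size] for i in range(0, len(records), size)]
    local_models = []
    # Workers are simulated one after another here; each one sees only
    # its own partition, as if running concurrently on separate units.
    for part in parts:
        model = fresh_model(initial)
        for x in part:
            update(model, x)
        local_models.append(model[0])
    # Hypothetical merge step: average each centroid across local models.
    return [sum(c) / len(local_models) for c in zip(*local_models)]


if __name__ == "__main__":
    data = [4.0, 7.0, 5.0, 9.0, 1.0, 3.0]
    start = [0.0, 10.0]
    print("serial:     ", serial(data, start))
    print("partitioned:", partitioned(data, start, workers=2))
```

With this sample data, the serial pass ends with centroids of about 2.6 and 8.67, while two partitions merged by averaging end with about 2.17 and 9.0. With `workers=1` the two functions agree, since the single partition is processed exactly as in the serial case.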
Thus, parallel processing techniques have not generally been applied to inherently serial processes where a specific result is required. Parallel techniques have instead been used to run other types of programs, in which it is less important to take into account the effect of each successively processed data record. Inherently serial processes in which a specific result is required have therefore traditionally suffered from long computation times, because each data item must be processed serially.