Many time-critical processes must be completed within a very narrow time window, and many processes must be performed on a very large number of data objects. Such processes can be found, for example, in the field of funds management. At fiscal year end, most funds management systems execute a program to carry forward commitments (e.g., debits and credits) into the next fiscal year. Such a procedure is typically performed only once a year and, thus, may require processing of very large amounts of data.
Generally, large processing systems capable of parallel processing are used for such purposes to manage the very large numbers of data objects. A typical problem with parallel processing in such systems, however, is the division of data into packages of appropriate size. Generally, a carry-forward process does not have an ‘a priori’ criterion for dividing data. For instance, the process may use different funds management account (“account”) assignment combinations for which commitments are posted. Those postings, and their corresponding accounts, may be read from total tables. The distribution of data between the various accounts, however, is unknown. That is, it is not clear how much data requires processing within each account. Further, because completed postings do not need to be carried forward, certain accounts may not include any postings requiring processing.
When processing data objects organized into tables, a selected table is split into packages of appropriate size, and a remote function call is submitted for each package so that the packages are processed in parallel. The remote function call, however, must not interrupt the table pointers by which the data objects on each account are selected. Thus, the selection of the data objects on each account is made in a main process, and the processing of the selected data objects is performed in parallel processes.
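The division of labor described above can be illustrated with a minimal sketch. It is not taken from any particular funds management system: the function names `split_into_packages`, `carry_forward`, and `run`, the in-memory table layout, and the use of a thread pool as a stand-in for remote function calls are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_packages(rows, package_size):
    """Split a list of selected rows into consecutive packages."""
    return [rows[i:i + package_size] for i in range(0, len(rows), package_size)]


def carry_forward(package):
    """Hypothetical worker: sum the amounts of the postings in one package."""
    return sum(amount for _posting_id, amount in package)


def run(table, package_size=2, workers=4):
    """table maps an account to a list of (posting_id, amount) tuples (assumed layout)."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Selection stays in the main process and runs sequentially,
        # account by account; only the processing is handed off.
        for _account, postings in table.items():
            for package in split_into_packages(postings, package_size):
                # Stand-in for submitting a remote function call per package.
                futures.append(pool.submit(carry_forward, package))
        # Collect results in submission order.
        return [future.result() for future in futures]
```

In this sketch an account without open postings, such as an empty list, simply produces no packages, which mirrors the observation that some accounts may not require any processing.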
In a large data processing system, data object processing may be distributed between many parallel processes. Because the parallel processes can be supplied with packages only as fast as the main process selects them, the sequentially executed selection in the main process then determines the total processing time (i.e., the total runtime), and only a limited number of parallel processes are kept busy at any one time.
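The effect of a sequential selection step on the total runtime can be made concrete with a small simulation. This is a simplified model under stated assumptions: every package takes the same `select_time` to select and the same `process_time` to process, and the function name `simulate` and its parameters are hypothetical.

```python
import heapq


def simulate(num_packages, select_time, process_time, workers):
    """Return (total_runtime, peak_busy_workers) for one sequential
    selector feeding a pool of parallel workers."""
    clock = 0.0
    busy = []      # min-heap of finish times of packages being processed
    peak = 0
    finish = 0.0
    for _ in range(num_packages):
        clock += select_time              # main process selects the next package
        while busy and busy[0] <= clock:  # release workers that are done by now
            heapq.heappop(busy)
        if len(busy) == workers:          # all workers busy: wait for the earliest
            clock = heapq.heappop(busy)
        end = clock + process_time
        heapq.heappush(busy, end)
        peak = max(peak, len(busy))
        finish = max(finish, end)
    return finish, peak
```

For example, `simulate(100, 1, 3, 50)` returns `(103.0, 3)` in this model: although 50 workers are available, at most three are ever busy, and the runtime is essentially the 100 time units spent on sequential selection.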
For systems with a very large number of data objects and correspondingly large processing capacity, it may be more effective to pass only the account to a parallel process, which then executes the selection itself. Even in this case, however, the total runtime will be determined by the sum of the time spent on data selection and data processing for the account having the most data.
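The lower bound imposed by the largest account can be sketched as a simple scheduling calculation. The sketch assumes that each account is handled entirely by one parallel process and that `account_times` holds, per account, the combined selection and processing time; the function name `makespan` and the greedy longest-first assignment are illustrative choices, not part of the described systems.

```python
import heapq


def makespan(account_times, workers):
    """Greedy schedule: assign each account, longest first, to the worker
    that becomes free earliest, and return the resulting total runtime."""
    loads = [0.0] * workers               # accumulated time per worker
    heapq.heapify(loads)
    for t in sorted(account_times, reverse=True):
        earliest = heapq.heappop(loads)   # worker that is free first
        heapq.heappush(loads, earliest + t)
    return max(loads)
```

With `account_times = [100, 5, 5, 5, 5, 5]`, both `makespan(account_times, 4)` and `makespan(account_times, 16)` return `100.0`: adding workers does not help, because no schedule of whole accounts can finish before the largest account does.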
Another possible approach is to start a separate batch process for each account, with each batch process using parallel processing for the data processing. This approach, however, would produce a very large number of separate result logs (e.g., thousands), which would be very difficult for a user to evaluate.