1. Field of the Invention
The present invention relates to multiprocessing computer database systems. More particularly, the invention concerns a method, apparatus, and article of manufacture for performing an aggregate database processing task with multiple concurrently operating task execution units, using incremental and on-demand sub-task allocation. This is also referred to as "straw model" data processing.
2. Description of the Related Art
Most database tasks cannot be judged by their accuracy alone. Task completion time is of paramount importance to purchasers, testers, developers, and other users of computer programs. The time to complete a task or sub-task has a number of names, including throughput, response time, execution time, and task completion time. By any name, engineers and scientists are constantly seeking new ways to minimize their applications' task completion time.
Reducing task completion time is especially challenging in "multiprocessing environments", which refer to computer systems where multiple computers, digital data processing units, threads, or other processes perform concurrent tasks. For ease of discussion, the term "program execution threads" is used to refer to these and other arrangements, where some or all of a hardware device performs a specific task by executing an instruction stream. "Threads" is a known shorthand term. In multi-thread systems, one important consideration is how to apportion the work among the various concurrently operating threads. Typically, workload is apportioned by somehow logically dividing the data to be processed. For example, a block of data might be divided evenly into a number of parts equal to the number of available threads, so that each thread can independently process a separate portion of the data. This is called "static" or "a priori" partitioning.
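By way of illustration only, static partitioning of the kind just described might be sketched as follows. The function name static_partition and its parameters are hypothetical and do not come from any particular system; the sketch assumes the data is an in-memory list and that each thread receives one contiguous part.

```python
def static_partition(items, num_threads):
    """Divide items into num_threads contiguous parts of near-equal size.

    Hypothetical helper for illustration: the division is fixed before any
    processing begins, which is what makes it "static" or "a priori".
    """
    base, extra = divmod(len(items), num_threads)
    parts = []
    start = 0
    for i in range(num_threads):
        # The first `extra` parts take one additional item so that
        # every item is assigned to exactly one part.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


# Ten data items divided among three threads.
parts = static_partition(list(range(10)), 3)
```

In this sketch the parts are balanced by item count only. Nothing in the division accounts for how long each item takes to process.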
Such apportionment of work is rarely completely balanced. One reason is that the extent of the data may be difficult to ascertain, or the data's processing time may be otherwise difficult to predict. Furthermore, even equisized data subparts may require comparatively different processing times. As a result of this lack of balance, some threads finish early, and other threads labor for longer periods. Still, the overall job is not complete until all threads have finished, including the last to finish its respective task. Thus, even though some threads complete their work early, other threads linger on, delaying completion of the ultimate task. One name for this phenomenon is "workload skew". For users seeking faster execution of their multi-thread application programs, workload skew is a significant problem area.
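The effect of workload skew on completion time can be made concrete with a small numerical sketch. The per-item costs below are invented for illustration, and the helper name thread_loads is hypothetical. The sketch assumes an even, contiguous division of items among threads and that each thread's time is simply the sum of its items' costs.

```python
def thread_loads(costs, num_threads):
    """Return the total processing cost assigned to each thread when the
    items are divided into contiguous parts of near-equal item count."""
    base, extra = divmod(len(costs), num_threads)
    loads = []
    start = 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return loads


# Illustrative costs: nine items, the last three far costlier than the rest.
costs = [1, 1, 1, 1, 1, 1, 9, 9, 9]
loads = thread_loads(costs, 3)

# The job finishes only when the slowest thread finishes.
completion_time = max(loads)

# A perfectly balanced division would take the mean load instead.
balanced_time = sum(costs) / 3
```

Here each thread receives three items, yet the loads are 3, 3, and 27 cost units. The job completes after 27 units, whereas a perfectly balanced division of the same work would complete after 11.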