The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for allocating worker threads by utilizing configuration caching such that the allocation is performed in constant time.
Parallel processing systems and parallel programming are becoming more prevalent in today's computing environment. With such systems and programming, more than one computation or operation can be executed at substantially the same time. As a result, the speed at which these computations and operations are completed is greatly increased, and the parallel processing system provides greater throughput.
Various standards for parallel processing have been developed. One standard is the OpenMP Application Program Interface (API). The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms. Jointly defined by a group of major computer hardware and software vendors, OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.
With OpenMP, as with other parallel processing standards, threads are selected to run a parallel task each time that a parallel region of code, such as a parallel loop construct in the code, is encountered during processing. A series of tasks must be accomplished when creating a parallel region. First, it is determined whether the parallel region can proceed in parallel and how many threads should be allocated for the particular parallel region. Next, the particular thread(s) to use to process the parallel region of code are selected. Finally, the selected threads must be informed of where to obtain the work to be performed so that the selected threads can execute the code associated with the parallel region.
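By way of illustration only, a parallel loop construct of the kind referred to above might be written in C as follows. The function name sum_array, the threshold of 1000 iterations, and the request for 4 threads are assumptions chosen for this example. The if, num_threads, and reduction clauses are standard OpenMP clauses; the first two roughly correspond to deciding whether the region proceeds in parallel and how many threads it should use.

```c
#include <assert.h>
#include <stddef.h>

/* Sums an array inside an OpenMP parallel region.
 * Build with OpenMP enabled (e.g. -fopenmp on GCC or Clang). Without it the
 * pragma is ignored and the loop runs serially, producing the same result. */
long sum_array(const int *values, size_t count)
{
    assert(values != NULL || count == 0);

    long sum = 0;
    long n = (long)count;

    /* if(...):          run in parallel only when the loop is large enough
     *                   (the threshold 1000 is an arbitrary example value)
     * num_threads(...): request 4 threads for this region (example value)
     * reduction(...):   each thread keeps a private partial sum; the partial
     *                   sums are combined when the region ends */
    #pragma omp parallel for if(n > 1000) num_threads(4) reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

Each call to sum_array encounters the parallel region anew, so the runtime repeats its thread selection work on every call.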
Performing such thread selection is time consuming, with the amount of time needed to perform the selection being proportional to the number of threads selected to execute the code in the parallel region. While this may be manageable when processors can execute at most 4, 8, or even 16 threads in parallel, it becomes a significant source of overhead in machines with a large number of threads executing in parallel, e.g., 64, 128, or even more.
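The proportional cost described above can be seen in a minimal sketch of a conventional selection loop. This sketch is not taken from any particular OpenMP runtime. The types work_item and worker_slot, the function select_workers, and the steps counter are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor telling a selected thread where to obtain its work. */
typedef struct {
    void (*run)(void *arg);
    void *arg;
} work_item;

/* Hypothetical per-thread slot in a runtime's pool of worker threads. */
typedef struct {
    bool busy;               /* true once the worker is assigned to a region */
    const work_item *work;   /* where the worker finds the work to perform */
} worker_slot;

/* Selects up to `wanted` idle workers and points each one at `work`.
 * Returns the number of workers selected. `*steps` receives the number of
 * pool slots visited, which grows with the number of threads selected. */
size_t select_workers(worker_slot *pool, size_t pool_size, size_t wanted,
                      const work_item *work, size_t *steps)
{
    assert(pool != NULL && steps != NULL);

    size_t selected = 0;
    *steps = 0;
    for (size_t i = 0; i < pool_size && selected < wanted; i++) {
        (*steps)++;                  /* one visit per slot examined */
        if (!pool[i].busy) {
            pool[i].busy = true;     /* claim the worker for this region */
            pool[i].work = work;     /* tell it where to obtain its work */
            selected++;
        }
    }
    return selected;
}
```

With an idle pool, the loop visits one slot for every thread requested, so selecting 128 threads takes 32 times as many visits as selecting 4. This per-region cost is repeated each time a parallel region is encountered.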