1. Field of the Invention
This invention relates to methods for assessing parallel programming solutions, and particularly to the diagnostic examination of the execution and performance of parallel threads of execution that are implemented within a parallel computing environment.
2. Description of Background
Before our invention, parallel computing systems (e.g., symmetric multiprocessing (SMP) systems) were conventionally used to divide a task into smaller sub-tasks across multiple processors in order to balance the processing workload of the task; the results of this division included decreased task processing times and faster acquisition of computational results. Dividing the task into sub-tasks effectively created parallel processes, or threads, which were executed simultaneously on multiple computing resources of a system.
To optimize the use of parallel computing systems, parallel programming models have been developed. In particular, parallel programming is focused on the separation, or partitioning, of projects into separate tasks, and on the allocation of those tasks to different processors. Communication between multiple task processes traditionally has been facilitated by communications programming protocol models that are directed to distributed memory systems (e.g., by the use of Message Passing Interface (MPI)) or to shared memory multiprocessing systems (e.g., by the use of OpenMP).
In many instances, when a communications programming protocol model is implemented to set forth instructions for accomplishing a predetermined parallel processing task, the programmed instructions may not be completely performed. For example, in the event that a segment of code is specified to perform a loop operation, the predetermined number of loops is evenly distributed among the same number of execution threads. The reasoning behind distributing the looping function among differing processing threads is to utilize multiple resources within a system, and thus to diminish the computational processing time. However, there may be occurrences when not all of the prescribed loop execution threads are running, and the computational processing time is therefore not optimized. Such occurrences may be due to various causes (e.g., problems with a kernel or compiler, user error, or defects in an SMP library).
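By way of illustration only, the loop distribution described above, and the condition in which some prescribed threads perform no work, can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the diagnostic technique of the invention: the function names `partition`, `run_parallel_loop`, and `idle_threads` are hypothetical, the contiguous near-equal chunking is an assumed distribution scheme, and Python's standard `threading` module stands in for an SMP runtime such as OpenMP.

```python
import threading


def partition(total_iterations, num_threads):
    """Split range(total_iterations) into num_threads contiguous,
    near-equal chunks (assumed scheme; a real runtime may differ)."""
    base, extra = divmod(total_iterations, num_threads)
    chunks = []
    start = 0
    for t in range(num_threads):
        # The first `extra` threads receive one additional iteration.
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def run_parallel_loop(body, total_iterations, num_threads):
    """Run body(i) for every iteration, one chunk per thread, and
    return the number of iterations each thread actually executed."""
    counts = [0] * num_threads

    def worker(thread_index, chunk):
        for i in chunk:
            body(i)
            # Each thread writes only to its own slot, so no lock is needed.
            counts[thread_index] += 1

    threads = [
        threading.Thread(target=worker, args=(t, chunk))
        for t, chunk in enumerate(partition(total_iterations, num_threads))
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return counts


def idle_threads(counts):
    """Return the indices of prescribed threads that executed no iterations."""
    return [t for t, c in enumerate(counts) if c == 0]


if __name__ == "__main__":
    results = [0] * 8

    def square(i):
        results[i] = i * i

    counts = run_parallel_loop(square, total_iterations=8, num_threads=4)
    print("iterations per thread:", counts)
    print("idle threads:", idle_threads(counts))
```

The per-thread counts are the point of the sketch: a non-empty result from `idle_threads` indicates that the loop was not executed across all prescribed threads. The sketch only records which threads did work; it does not measure processing time, and in CPython the threads do not execute bytecode simultaneously.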
Therefore, a need exists for an automated diagnostic performance technique that can be used to determine whether a parallelized code segment is fully executed as prescribed within a parallel programming environment, and further to provide performance estimates of specific parallelized code segments throughout an entire program.