Multi-threading frameworks, such as Open Multi-Processing (OpenMP), Intel® Threading Building Blocks (TBB), Intel® Cilk++, Intel® C++ for Throughput Computing (Ct), and Microsoft® Parallel Patterns Library (PPL), allow parallelism to improve the performance of a multi-threaded application. The advantage of the multi-threaded application can be observed on computer systems that have multiple central processing units (CPUs), or CPUs with multiple cores, as each thread of the multi-threaded application uses one of the CPUs/cores for concurrent execution.
However, if the multi-threading framework is used incorrectly to execute the multi-threaded application, the advantage of parallelism may be compromised. FIG. 1A illustrates a prior-art code 100 of a parallel for-loop. The granularity of the function foo( ) is set as one, so that each call to foo( ) is scheduled as a separate unit of work. Depending on how long the function foo( ) takes to execute, the advantage of parallelism may be compromised, as a granularity of one is too fine: when each call is short, the cost of scheduling the call can outweigh the work it performs.
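The figure itself is not reproduced in this text. As a minimal sketch of the pattern described, assuming OpenMP as the framework, the loop below assigns iterations to threads in chunks of one. The body of foo and the name run_fine_grained are hypothetical and are not taken from code 100.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for foo(): a very small amount of work per call.
inline double foo(std::size_t i) {
    return static_cast<double>(i) * 0.5;
}

// Parallel for-loop with a chunk size (granularity) of one: every
// iteration is handed out as its own unit of work.
// If the file is compiled without OpenMP support, the pragma is ignored
// and the loop simply runs serially with the same result.
std::vector<double> run_fine_grained(std::size_t n) {
    std::vector<double> out(n);
    const long long count = static_cast<long long>(n);
    #pragma omp parallel for schedule(static, 1)
    for (long long i = 0; i < count; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        out[idx] = foo(idx);
    }
    return out;
}
```

With a body as short as this foo, each unit of work is a single multiplication, which is the situation in which a granularity of one is likely to be too fine.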
FIG. 1B illustrates a prior-art code 130 of a parallel for-loop with dynamic scheduling. The granularity of the function foo( ) is set as three. Dynamic scheduling incurs distribution overheads and, depending on how long the function foo( ) takes to execute, the advantage of parallelism may be compromised, as a granularity of three is too fine.
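A comparable sketch for the dynamically scheduled loop, again assuming OpenMP, is given below. The chunk size is exposed as a parameter so that a granularity of three can be compared with a coarser one. The names foo_dyn and sum_dynamic are hypothetical and do not come from code 130.

```cpp
// Hypothetical stand-in for foo(); the real body is not given in the text.
inline long long foo_dyn(int i) {
    return static_cast<long long>(i) * i;
}

// Parallel for-loop with dynamic scheduling: threads request `chunk`
// iterations at a time while the loop is running. A small chunk means
// many requests, each of which carries distribution overhead.
// `chunk` is assumed to be positive. Without OpenMP support the pragma
// is ignored and the loop runs serially with the same result.
long long sum_dynamic(int n, int chunk) {
    long long total = 0;
    (void)chunk;  // avoids an unused-parameter warning in serial builds
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += foo_dyn(i);
    }
    return total;
}
```

Calling sum_dynamic(n, 3) corresponds to the granularity of three described above, while a larger chunk value reduces the number of distribution requests for the same n.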
FIG. 1C illustrates a prior-art code 150 of work tasks being spawned or created from only one thread. Depending on how large the variable N is, the prior-art code 150 can have a linear spawning problem with significant active stealing overheads. For example, when the variable N is set to more than 100, this execution scales much worse than an execution with recursive spawning.
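The difference between linear and recursive spawning can be sketched as follows, assuming OpenMP tasks as the spawning mechanism. The framework used by code 150 is not stated in this text, and the names task_body, spawn_linear, spawn_range, and spawn_recursive are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical unit of work performed by each spawned task.
inline int task_body(int i) {
    return 2 * i;
}

// Linear spawning: a single thread creates all n tasks in a loop, so
// every other thread must obtain its work from that one thread.
// n is assumed to be non-negative.
std::vector<int> spawn_linear(int n) {
    std::vector<int> out(static_cast<std::size_t>(n));
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) shared(out)
            out[static_cast<std::size_t>(i)] = task_body(i);
        }
    }  // tasks are complete once the parallel region ends
    return out;
}

// Recursive spawning: each task splits its range in half and spawns two
// child tasks, so task creation is spread across the threads.
void spawn_range(std::vector<int>& out, int lo, int hi) {
    if (hi - lo <= 1) {
        if (hi > lo) {
            out[static_cast<std::size_t>(lo)] = task_body(lo);
        }
        return;
    }
    const int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(out)
    spawn_range(out, lo, mid);
    #pragma omp task shared(out)
    spawn_range(out, mid, hi);
    #pragma omp taskwait
}

std::vector<int> spawn_recursive(int n) {
    std::vector<int> out(static_cast<std::size_t>(n));
    #pragma omp parallel
    #pragma omp single
    spawn_range(out, 0, n);
    return out;
}
```

Both functions produce the same result. They differ only in which threads create the tasks, which is the property the linear spawning problem concerns.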
The prior-art codes 100, 130, and 150 illustrate possible scenarios where the multi-threading framework can be used incorrectly or ineffectively by the multi-threaded application.