Advances in hardware technology have produced System-on-Chip (SOC) devices with multiple cores. SOCs with 12 to 64 hardware threads currently exist, and chips with 50+ cores are planned (Intel Unveils New Product Plans for High-Performance Computing, http://www.intel.com/pressroom/archive/releases/2010/20100531comp.htm). In addition, multiprocessing computing systems have been developed that use from two to thousands of processors (Multiprocessing Systems, IEEE Transactions on Computers, Vol. C-25, No. 12, December 1976, http://63.84.220.100/csdl/trans/tc/1976/12/01674594.pdf). To take advantage of these multiprocessor or multithreaded systems, operating systems divide applications into discrete tasks that can be processed separately.
However, there are trade-offs between the number of threads used and the overhead that accompanies multiple threads. When tasks on separate threads complete, their results usually must be coordinated with tasks running on other threads. This creates difficult timing issues because tasks often take different amounts of time to complete. Subsequent tasks may also require results from current tasks, which creates difficult synchronization issues. Power consumption is another issue: if all hardware threads are running at full capacity, the chip's power delivery systems may be over-taxed, and the chip may generate more heat than can be safely dissipated. One of the key challenges for parallel runtime systems is therefore how to use hardware threads efficiently while providing the best performance.
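The coordination problem described above can be illustrated with a minimal sketch. It is not taken from any particular runtime system; the function names (stage_one, stage_two, run_pipeline) and the sleep durations are hypothetical and chosen only for illustration. Several first-stage tasks of unequal duration run on a thread pool, and a subsequent task cannot begin until every one of their results is available.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def stage_one(task_id, seconds):
    """Hypothetical task whose running time differs from its siblings."""
    time.sleep(seconds)  # stand-in for real work of varying length
    return task_id * 10


def stage_two(partial_results):
    """Hypothetical subsequent task that needs every stage-one result."""
    return sum(partial_results)


def run_pipeline(durations, max_workers):
    """Run one stage-one task per duration, then combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(stage_one, i, d) for i, d in enumerate(durations)]
        # result() blocks until the corresponding task has finished, so this
        # line is the synchronization point: the slowest task sets the pace.
        partials = [f.result() for f in futures]
    return stage_two(partials)


if __name__ == "__main__":
    print(run_pipeline([0.03, 0.01, 0.02], max_workers=3))
```

With three workers the first stage takes roughly as long as its slowest task, while with one worker the tasks run back to back. The combined result is the same in both cases. Only the elapsed time and the number of threads kept busy change, which is the trade-off a parallel runtime system has to manage.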