Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of processing elements, such as cores, hardware threads, or logical processors.
The ever-increasing number of processing elements on integrated circuits enables more software threads to be executed. However, many single-threaded applications still exist, which utilize a single processing element while wasting the processing power of the other available processing elements. Alternatively, programmers may create multi-threaded code to be executed in parallel. However, the multi-threaded code may not be optimized for the number of available processing elements. In either case, once code is replicated for parallel execution, duplicated instructions may be executed on multiple processing elements, which potentially results in minimal performance gain and an increase in power/energy consumption.