Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores, multiple hardware threads, and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single physical processor die, where the processor die may include any number of cores, hardware threads, or logical processors.
The ever-increasing number of processing elements (cores, hardware threads, and logical processors) on integrated circuits enables more tasks to be accomplished in parallel. However, processors composed entirely of out-of-order cores may be inefficient in power, performance, or both under some circumstances. As a result, some hardware-software co-designed systems have been developed to confront the power-performance efficiency problem. In such a system, a wide, simple in-order processor may be utilized, while software optimizes and schedules programs to run efficiently on the in-order hardware.
Yet hardware-software co-designed systems are typically associated with two adverse impacts: (1) translation and/or optimization of code utilizing a binary translator may slow down some applications with short-running tasks and tight response-time constraints (a binary translation glass jaw); and (2) an in-order processor may not perform well for some styles of programs that are better suited for parallel execution (an in-order glass jaw).