Conventional computer processor architectures are typically pipelined to increase throughput. Pipelining is used to break the execution of instructions into a plurality of stages so that different instructions can be processed by different stages at the same time. Processor pipeline designs can be broadly classified into one of two categories: out-of-order pipelines and in-order pipelines.
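By way of illustration only, the throughput benefit of pipelining can be sketched with a simple cycle count. The sketch assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur. The function names are hypothetical and are not taken from any particular processor design.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Cycles needed when each instruction passes through every stage
    before the next instruction is allowed to start."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Cycles needed when a new instruction enters the first stage every
    cycle (idealized assumption: one cycle per stage, no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles to drain through the
    # pipeline; each later instruction completes one cycle after the last.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, s = 100, 5
    print("unpipelined:", unpipelined_cycles(n, s))
    print("pipelined:  ", pipelined_cycles(n, s))
```

Under these assumptions, the pipelined cycle count approaches one instruction per cycle as the number of instructions grows, whereas the unpipelined count grows with the product of instructions and stages.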
Conventional out-of-order pipeline designs are usually used in high-performance applications. Such designs are capable of issuing multiple instructions per cycle and of completing instructions out of order, so that a processor can continue to execute instructions even while a long latency event, such as a data cache miss incurred by a prior instruction, is being handled. This higher performance, however, comes with a significant tradeoff in terms of power consumption. Several of the functional units that a conventional out-of-order pipeline design requires in order to support out-of-order execution typically consume a significant amount of power. Certain applications benefit significantly from the use of out-of-order pipelining, so the increased power cost may be justified for those applications. However, other applications may not benefit sufficiently from out-of-order pipelining to justify the increased power cost associated with such designs.
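The effect of a long latency event on the two types of issue can be sketched with a toy model, offered only as an illustration under stated assumptions. Each instruction is a hypothetical (latency, dependencies) pair, at most one instruction issues per cycle in both models, and dependencies refer only to earlier instructions. Issue width, window size, and all other microarchitectural details are ignored.

```python
def in_order_cycles(program):
    """Single-issue, in-order model: an instruction cannot issue before
    its predecessor has issued, so a stalled instruction blocks all
    younger instructions behind it."""
    finish = []
    next_issue = 0
    for latency, deps in program:
        ready = max((finish[d] for d in deps), default=0)
        start = max(next_issue, ready)
        finish.append(start + latency)
        next_issue = start + 1
    return max(finish, default=0)


def out_of_order_cycles(program):
    """Single-issue, out-of-order model: each cycle, the oldest
    instruction whose dependencies have completed is issued, even if
    older instructions are still waiting."""
    finish = {}
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        for i in pending:
            latency, deps = program[i]
            if all(d in finish and finish[d] <= cycle for d in deps):
                finish[i] = cycle + latency
                pending.remove(i)
                break  # at most one issue per cycle
        cycle += 1
    return max(finish.values(), default=0)


# Hypothetical program: a load that misses the data cache (10 cycles),
# one instruction that depends on the load, and two independent ones.
PROGRAM = [
    (10, []),   # 0: load with cache miss
    (1, [0]),   # 1: uses the loaded value
    (1, []),    # 2: independent
    (1, []),    # 3: independent
]

if __name__ == "__main__":
    print("in-order:    ", in_order_cycles(PROGRAM))
    print("out-of-order:", out_of_order_cycles(PROGRAM))
```

In this toy program the in-order model finishes in 13 cycles and the out-of-order model in 11, because the two independent instructions execute while the cache miss is outstanding. The model says nothing about power consumption, which is the tradeoff discussed above.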
Conventional in-order pipeline designs are typically a better choice for applications where power efficiency rather than high performance is the primary goal. Processors supported by in-order pipelines typically have lower performance than processors supported by out-of-order pipelines. However, the functional units that are required to manage out-of-order pipelining are typically not required in in-order pipeline designs, and as a result, in-order pipeline-based designs typically consume much less power than comparable out-of-order pipeline-based designs.
Processor designers are therefore typically forced to choose between in-order and out-of-order pipelines based upon the expected workloads and power requirements of particular processor designs. Where a particular processor design is instead used to handle workloads that are better suited for the other type of pipeline, performance suffers.
With the ability to incorporate multiple processing cores on the same processor chip, however, it is also possible to incorporate both types of pipelines in the same processor design. With such a design, heterogeneous processing cores are integrated onto the same chip so that applications that are best suited for a particular type of pipeline are executed on processing cores best suited for such applications. However, where an application has different portions that are better suited for different pipeline designs, moving an application between different processing cores often introduces a significant latency overhead, thereby limiting the frequency and benefits of migration for many applications.
Another conventional approach includes simultaneously executing copies of an application on both a processing core with an out-of-order pipeline and a processing core with an in-order pipeline, so that different portions of the application that are better suited for one type of pipeline will be executed by the processing core having that type of pipeline. This conventional approach, however, is not energy efficient because multiple processing cores are executing redundant copies of the same application. In addition, resources that could have otherwise been used to execute other applications are tied up handling the redundant execution, so overall performance is reduced.
A need therefore continues to exist in the art for an improved manner of efficiently supporting both in-order and out-of-order pipelining for different workloads.