The present disclosure relates generally to computer systems, and more particularly to methods and computer systems performing superscalar out-of-order processing.
Modern central processing units (CPUs) commonly have a superscalar out-of-order architecture. A superscalar CPU implements a form of instruction-level parallelism within a single processor, which increases throughput, i.e. the number of instructions executed per unit of time. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor, such as an arithmetic logic unit, a shifter, or a multiplier. The superscalar aspect brings the benefit of ‘workload optimization’ (e.g. a single instruction, multiple data (SIMD) engine is good at vector processing). Out-of-order execution is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted on costly delays, such as waiting for operands to arrive from memory. A processor executes instructions in an order governed by the availability of input data, not necessarily in their original order in a program. In doing so, the processor avoids sitting idle while a preceding instruction completes or while its data is retrieved, and instead processes subsequent instructions that are able to run immediately and independently. Out-of-order execution can be viewed as a hardware-based dynamic recompilation that improves instruction scheduling. The out-of-order aspect brings high processing performance and the benefit of auto-parallelization of independent code segments.
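The difference between in-order and out-of-order issue described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a description of any particular processor: the `Instr` record, the `simulate` function, the register names, the latencies, and the issue width of two are all hypothetical, and each register is assumed to be written at most once, as if register renaming had already been applied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str        # register written by the instruction
    srcs: tuple      # registers read by the instruction
    latency: int     # cycles until the result becomes available


def simulate(program, out_of_order, width=2):
    """Return the cycle at which the last result becomes available."""
    # Registers produced by the program are unavailable until issued;
    # any other register is treated as an input that is ready at cycle 0.
    ready_at = {ins.dest: math.inf for ins in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        issued = []
        for ins in pending:
            if len(issued) == width:
                break  # superscalar issue width reached for this cycle
            if all(ready_at.get(src, 0) <= cycle for src in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, ready_at[ins.dest])
                issued.append(ins)
            elif not out_of_order:
                break  # in-order: a stalled instruction blocks all later ones
        pending = [ins for ins in pending if ins not in issued]
        cycle += 1
    return finish


# Hypothetical program: a slow load, a dependent add, and independent work.
PROGRAM = [
    Instr("load", "r1", (), 4),
    Instr("add", "r2", ("r1", "r0"), 1),
    Instr("mul", "r3", ("r4", "r5"), 3),
    Instr("sub", "r6", ("r3", "r4"), 1),
]

print(simulate(PROGRAM, out_of_order=False))  # 8
print(simulate(PROGRAM, out_of_order=True))   # 5
```

In this sketch the in-order model waits for the load before it can start the independent multiply, whereas the out-of-order model issues the multiply alongside the load, so the same four instructions finish in 5 cycles instead of 8.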
However, the superscalar out-of-order processing architecture has its limitations. For example, only a limited number of processing units is available (e.g. two integer processing units, one floating-point unit (FPU), and one SIMD processing unit). Some of the processing units are hard-wired (e.g. an integer processing unit cannot be changed at runtime into a floating-point processing unit, because it is built from CMOS transistors). In addition, the size of the out-of-order instruction window is finite and fixed.
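As a rough illustration of these limits, the following Python sketch counts the cycles needed to issue a stream of mutually independent, single-cycle instructions when the unit mix and the window size are fixed. It is a simplified model under stated assumptions: the `UNITS` table mirrors the example counts given above, while the `cycles_needed` function, the instruction mix, and the treatment of the window as the oldest `window_size` unissued instructions are hypothetical.

```python
# Fixed, hard-wired unit mix (counts taken from the example in the text).
UNITS = {"int": 2, "fp": 1, "simd": 1}


def cycles_needed(program, window_size, units=UNITS):
    """Cycles to issue every instruction; window_size must be at least 1."""
    pending = list(program)
    cycles = 0
    while pending:
        free = dict(units)  # every unit is free at the start of a cycle
        remaining = []
        for position, kind in enumerate(pending):
            # Only instructions inside the window are visible to the
            # scheduler, and each needs a free unit of its own kind.
            if position < window_size and free[kind] > 0:
                free[kind] -= 1
            else:
                remaining.append(kind)
        pending = remaining
        cycles += 1
    return cycles


# Hypothetical workload: six floating-point operations, then six integer ones.
PROGRAM = ["fp"] * 6 + ["int"] * 6

print(cycles_needed(PROGRAM, window_size=12))  # 6
print(cycles_needed(PROGRAM, window_size=2))   # 9
```

With a window of 12 the scheduler can overlap the integer work with the floating-point work, but the single FPU still sets a floor of 6 cycles because the idle integer units cannot take over floating-point operations. With a window of 2 the integer instructions are not even visible until most of the floating-point ones have issued, and the same workload takes 9 cycles.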
Therefore, heretofore unaddressed needs still exist in the art to address the aforementioned deficiencies and inadequacies.