In modern microprocessors, many techniques are used to increase performance. Pipelining is a technique for exploiting parallelism between different instructions that have similar stages of execution. These stages are typically referred to as, for example, instruction-fetch, decode, operand-read, execute, and write-back. By performing work for multiple pipeline stages in parallel for a sequence of instructions, the effective machine cycle time may be reduced and parallelism between the stages of instructions in the sequence may be exploited.
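The overlap of stages described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function names and the five-stage list are illustrative only and are not taken from any particular processor.

```python
# Idealized pipeline timing: one cycle per stage, no stalls (an assumption).
STAGES = ["fetch", "decode", "operand-read", "execute", "write-back"]


def unpipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds one cycle.
    return num_stages + num_instructions - 1


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] names the stage occupied in cycle c."""
    total = pipelined_cycles(num_instructions, len(stages))
    rows = []
    for i in range(num_instructions):
        row = [""] * total
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipeline in cycle i
        rows.append(row)
    return rows
```

Under these assumptions, four instructions take 20 cycles without pipelining and 8 cycles with it, because in any given cycle up to five instructions each occupy a different stage.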
Superscalar execution is a technique that attempts to initiate more than one instruction in a single machine cycle and typically employs multiple pipelines to execute these instructions in parallel, thereby further exploiting parallelism between independent instructions in an instruction sequence. The scheduling of instructions onto these pipelines may be performed statically (by a compiler prior to program execution) or dynamically (by processor hardware during program execution). It will be appreciated that dependencies exist between successive pipeline stages, which may cause delays (sometimes called bubbles) in the execution flow.
An architectural technique that has been used to increase processor performance is a decoupling of memory accesses from execution of operations, which is sometimes referred to as a load/store architecture. In accordance with such architectures, memory accesses are performed by specific load or store instructions and other operations use register (rather than memory) operands. Some architectures provide a register file for addressing memory and a separate register file for holding data. In alternative architectures, the memory access instruction stream may be architecturally separate from, and executed in parallel with, the execution instruction stream, where coordination and synchronization are performed either by software or by architectural queues.
Such architectural approaches have been used to restrict the kind of resources used by instructions of a particular type. Typically, compiler techniques are relied upon to statically generate only instructions which adhere to those restrictions and manage coordination and synchronization. One drawback to such an approach is that preexisting programs may need to be recompiled to comply with the new restrictions. In general, no reduction in required resources can be guaranteed for programs that do not comply.
In other alternative architectures, a micro-architecture is employed where instructions may be decoded into multiple micro-operations (sometimes called u-ops) such that memory accesses are performed by specific load or store micro-operations and other micro-operations fetch operands from internal (and architecturally invisible) registers. It will also be appreciated that there exist inherent dependencies between the micro-operations that load operands into these internal registers and the micro-operations that fetch operands from the internal registers. Such dependencies may also cause delays in the execution flow.
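The decoding described above can be sketched as follows. This is a simplified illustration and not a model of any real decoder: the tuple instruction format, the `MicroOp` type, and the internal register name `t0` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    kind: str    # "load" or "alu" in this sketch
    dest: str    # register written by the micro-operation
    srcs: tuple  # registers (or, for a load, the memory address) read


def decode(instruction):
    """Split a hypothetical (opcode, dest, src) instruction into micro-operations.

    A bracketed source such as "[0x100]" denotes a memory operand.
    """
    opcode, dest, src = instruction
    if src.startswith("["):
        temp = "t0"  # hypothetical internal, architecturally invisible register
        return [
            MicroOp("load", temp, (src.strip("[]"),)),  # memory access only
            MicroOp("alu", dest, (dest, temp)),         # register operands only
        ]
    return [MicroOp("alu", dest, (dest, src))]


def dependencies(uops):
    """Return (producer, consumer) index pairs where a later micro-operation
    reads a register that an earlier micro-operation writes."""
    deps = []
    for j, consumer in enumerate(uops):
        for i in range(j):
            if uops[i].dest in consumer.srcs:
                deps.append((i, j))
    return deps
```

For example, `decode(("ADD", "r1", "[0x100]"))` yields a load into `t0` followed by an ALU micro-operation that reads `t0`, and `dependencies` reports the pair `(0, 1)`: the ALU micro-operation cannot begin until the load has delivered its result.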
In alternative micro-architectures, to further exploit any parallelism between independent micro-operations, the micro-operations may be executed out-of-sequential-order, and yet retired in the original sequential order. When more and more parallelism is exploited through such techniques, it will be appreciated that the number of concurrent micro-operations and the complexity of resources to manage communication and synchronization between the micro-operations may also increase.
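Out-of-order execution with in-order retirement can be illustrated with a small dataflow sketch. It assumes unlimited execution units, that register renaming has already removed false dependencies, and that each micro-operation has a fixed latency; the tuple format and the names used are hypothetical.

```python
def simulate(uops):
    """Simulate dataflow execution with in-order retirement.

    uops: list of (name, dest, srcs, latency) tuples in program order.
    Returns (completion_order, retirement), where retirement is a list of
    (name, retire_cycle) pairs in program order.
    """
    last_writer = {}  # register -> index of the latest micro-operation writing it
    finish = []       # cycle in which each micro-operation completes
    for i, (name, dest, srcs, latency) in enumerate(uops):
        # A micro-operation may start once the producers of its sources finish.
        ready = max(
            (finish[last_writer[s]] for s in srcs if s in last_writer),
            default=0,
        )
        finish.append(ready + latency)
        last_writer[dest] = i

    names = [u[0] for u in uops]
    # Completion follows finish time, so it may differ from program order.
    order = sorted(range(len(uops)), key=lambda i: (finish[i], i))
    completion_order = [names[i] for i in order]

    # Retirement follows program order: a micro-operation retires only after
    # it has completed and every earlier micro-operation has retired.
    retire_cycle = []
    for i in range(len(uops)):
        previous = retire_cycle[-1] if retire_cycle else 0
        retire_cycle.append(max(finish[i], previous))
    return completion_order, list(zip(names, retire_cycle))
```

With the program `[("load_a", "t0", ("addr_a",), 3), ("add", "r1", ("r1", "t0"), 1), ("mul", "r2", ("r3", "r4"), 1)]`, the independent `mul` completes first, yet it retires last, after `load_a` and `add`. Even in this small case the results of completed micro-operations must be held until retirement, which hints at why the tracking resources grow as more parallelism is exploited.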
Modern micro-architectures may include policies and structures to support concurrent execution of instructions, including register renaming, speculation, out-of-order execution, pipelining, multithreading, and dynamic superscalar execution of micro-operations. In some cases, hundreds or thousands of storage, scheduling, execution and communication resources are employed to support such advanced processing techniques.
In order to provide processors at the best possible price/performance ratio, it is desirable to reduce the resources required while supporting processing techniques that are as advanced as possible.
Techniques are desired to exploit parallelism between independent operations and efficiently manage dependencies, communication and synchronization. It would be desirable to find a microarchitecture that can dynamically reduce the required execution resources for preexisting programs, and at the same time, can support processing techniques that are as advanced as possible.