Development of processors, such as microprocessors and microcontrollers, is ongoing. There is a large demand for microprocessors that process instructions faster, to reduce the execution time of a program, and more efficiently, to reduce overall power consumption. Techniques such as out-of-order processing, in which instructions are executed in an order other than that provided by the program, have improved the performance of current processors. Even though the performance of processors has improved in recent years, there is still room for further improvement, as illustrated in the following example.
FIG. 1A illustrates a block diagram of a prior art processor system 100. In general, the processor system 100 retrieves program instructions initially stored in a main memory 102 by way of a system bus 104, and performs the execution of the program instructions. The processor system 100 consists of an instruction-retrieval front end including an instruction cache 108, a prefetch buffer 110, and a prefetch logic 106. The processor system 100 further consists of a pre-processing stage including an instruction decoding logic 112 and a branch prediction logic 113. Finally, the processor system 100 consists of an execution processing stage including an allocator 114, a register alias table/reorder buffer (RAT/ROB) 115, a real (architectural) register file (RRF) 116, an instruction selection logic 118, an execution logic unit 120, and a retirement logic unit 122.
In operation, the instruction-retrieval front end of the processor system 100 functions to place instructions in the pipeline for execution. Specifically, the prefetch logic 106 periodically issues requests for instructions from the main memory 102 by way of the system bus 104. In response to these requests, instruction data is transferred to the instruction cache 108. The prefetch logic 106 also causes sequential instruction data of a certain size (e.g., 16 bytes of instruction data at a time) to transfer from the instruction cache 108 to the prefetch buffer 110. The prefetch buffer 110 stores a certain amount of sequential instruction data (e.g., 32 bytes). When the prefetch buffer 110 has some empty slots, a signal is sent to the prefetch logic 106 instructing it to transfer another set of instructions from the instruction cache 108 to the prefetch buffer 110 (e.g., 16 bytes at a time).
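The refill behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: the class name `PrefetchBuffer`, its methods, and the rule that a refill is requested once at least 16 bytes are free are assumptions made for the example; the 32-byte capacity and 16-byte transfer size are the example values given above.

```python
from collections import deque


class PrefetchBuffer:
    """Hypothetical model of prefetch buffer 110 fed from instruction cache 108."""

    CAPACITY = 32  # bytes held by the buffer (example value from the text)
    CHUNK = 16     # bytes moved per transfer (example value from the text)

    def __init__(self, icache: bytes):
        self.icache = icache   # stands in for instruction cache 108
        self.fetch_addr = 0    # next sequential address to transfer
        self.data = deque()    # bytes currently held in the buffer

    def needs_refill(self) -> bool:
        # Assumption: "some empty slots" means room for one whole chunk.
        return self.CAPACITY - len(self.data) >= self.CHUNK

    def refill(self) -> int:
        """Model of prefetch logic 106: move chunks while there is room.

        Returns the number of bytes transferred.
        """
        moved = 0
        while self.needs_refill() and self.fetch_addr < len(self.icache):
            chunk = self.icache[self.fetch_addr:self.fetch_addr + self.CHUNK]
            self.data.extend(chunk)
            self.fetch_addr += len(chunk)
            moved += len(chunk)
        return moved

    def consume(self, n: int) -> bytes:
        """Hand up to n bytes, oldest first, to the decoding stage."""
        n = min(n, len(self.data))
        return bytes(self.data.popleft() for _ in range(n))
```

In this sketch, the decoder calls `consume`, and the front end calls `refill` whenever `needs_refill` reports free space for another 16-byte transfer.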
The pre-processing stage of the processor system 100 generally entails preparing the instruction data for subsequent processing by the execution stage. Specifically, the instruction decoding logic 112 receives the 32 bytes of instruction data from the prefetch buffer 110 and identifies the actual instructions within the instruction data by marking boundaries between instructions. If the processor system 100 processes sub-instructions such as micro-ops (i.e., fixed-length RISC instructions), then the instruction decoding logic 112 translates the identified instructions into micro-ops. If the instruction received is a branch, the address from which the instruction was accessed is sent to the branch prediction logic 113 to predict where the program will branch. The branch prediction logic 113, based on its prediction, instructs the prefetch logic 106 to sequentially transfer the corresponding instructions to the prefetch buffer 110.
The execution stage of the processor system 100 generally entails queuing, scheduling, executing, and retiring the instructions. The allocator 114 sequentially adds new instructions to the end of the reorder buffer (ROB) 115. The register alias table (RAT) portion of the RAT/ROB 115 assigns alias registers to function as real registers 116 for instructions that use source operands. The register alias table (RAT) keeps track of which real register 116 each alias register corresponds to.
As shown in FIG. 1B, each reorder buffer (ROB) entry includes a first field to indicate whether the corresponding instruction has been executed, a second field to store the memory address of the instruction to branch to if the corresponding instruction is a branch, a third field to store the corresponding instruction, and a fourth field to identify the corresponding alias registers holding the source operands for the corresponding instruction. The reorder buffer (ROB) 115 is a cyclic buffer having a start-of-buffer pointer that points to the first entry of the reorder buffer (ROB) 115, such as entry four (4) as shown, and an end-of-buffer pointer that points to the last buffer entry, such as entry 36 as shown. Thus, the entry pointed to by the start-of-buffer pointer contains the oldest instruction in the reorder buffer (ROB) 115, and the entry pointed to by the end-of-buffer pointer contains the youngest instruction in the reorder buffer (ROB) 115.
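The entry layout and cyclic pointers described above can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the names `RobEntry` and `ReorderBuffer`, the field names, and the use of a start index plus an entry count (from which the end-of-buffer index is derived) are all hypothetical choices for illustration, not details taken from FIG. 1B.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RobEntry:
    executed: bool = False                # first field: executed flag
    branch_target: Optional[int] = None   # second field: branch address, if a branch
    instruction: str = ""                 # third field: the instruction itself
    source_aliases: List[int] = field(default_factory=list)  # fourth field


class ReorderBuffer:
    """Hypothetical cyclic-buffer model of reorder buffer (ROB) 115."""

    def __init__(self, size: int):
        self.entries: List[Optional[RobEntry]] = [None] * size
        self.start = 0   # start-of-buffer pointer: oldest instruction
        self.count = 0   # number of live entries

    @property
    def end(self) -> int:
        """End-of-buffer pointer: youngest instruction (meaningful if count > 0)."""
        return (self.start + self.count - 1) % len(self.entries)

    def allocate(self, entry: RobEntry) -> int:
        """Append a new instruction after the youngest entry, wrapping around."""
        if self.count == len(self.entries):
            raise OverflowError("reorder buffer full")
        idx = (self.start + self.count) % len(self.entries)
        self.entries[idx] = entry
        self.count += 1
        return idx

    def retire_oldest(self) -> Optional[RobEntry]:
        """Remove the oldest entry only if it has executed; otherwise return None."""
        if self.count == 0 or not self.entries[self.start].executed:
            return None
        entry = self.entries[self.start]
        self.entries[self.start] = None
        self.start = (self.start + 1) % len(self.entries)
        self.count -= 1
        return entry
```

Because both pointers advance modulo the buffer size, the end-of-buffer index can be numerically smaller than the start-of-buffer index once allocation wraps around.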
The instruction selection logic 118 selects and queues the instructions to be executed. The instructions can be selected out of order. The criterion used by the instruction selection logic 118 to select an instruction is whether all prior conditions have been met for the instruction to execute. The execution logic unit 120 executes the instructions in the order selected by the instruction selection logic 118. After the instruction has been successfully executed, the retirement logic unit 122 sets the executed flag in the reorder buffer (ROB) 115. If and when the executed instruction becomes the oldest instruction in the reorder buffer (ROB) 115, the instruction is committed, and the retirement logic unit 122 causes the register result of the executed instruction to be copied from the corresponding alias register to the designated real register 116.
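To make the interaction between out-of-order execution and in-order retirement concrete, the following sketch counts the alias-to-real register copies performed at commit. It is a simplified illustration only: the names `Instr`, `execute`, and `retire_ready`, the register names `r1` and `r2`, and the use of a plain list for the reorder buffer are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instr:
    name: str
    dest_real: str   # designated real register (hypothetical name)
    alias: int       # alias register that receives the result
    value: int       # result produced when the instruction executes
    executed: bool = False


def execute(instr: Instr, alias_regs: Dict[int, int]) -> None:
    """Write the result to the alias register and set the executed flag."""
    alias_regs[instr.alias] = instr.value
    instr.executed = True


def retire_ready(rob: List[Instr], alias_regs: Dict[int, int],
                 rrf: Dict[str, int]) -> int:
    """Commit from the oldest end while the oldest instruction has executed.

    Each commit copies alias register -> real register (one read, one write).
    Returns the number of copies performed.
    """
    copies = 0
    while rob and rob[0].executed:
        instr = rob.pop(0)
        rrf[instr.dest_real] = alias_regs[instr.alias]
        copies += 1
    return copies


# Program order: i0, i1, i2. Execution order in this example: i2, i0, i1.
rob = [Instr("i0", "r1", 0, 10), Instr("i1", "r2", 1, 20), Instr("i2", "r1", 2, 30)]
i0, i1, i2 = rob
alias_regs: Dict[int, int] = {}
rrf: Dict[str, int] = {}

execute(i2, alias_regs)
first = retire_ready(rob, alias_regs, rrf)   # i2 is not the oldest, so nothing commits
execute(i0, alias_regs)
second = retire_ready(rob, alias_regs, rrf)  # only i0 commits
execute(i1, alias_regs)
third = retire_ready(rob, alias_regs, rrf)   # i1 and then i2 commit
```

In this example every committed instruction costs one copy, including the copy of `i0`'s result into `r1`, which `i2` later overwrites.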
This copying is a source of inefficiency in the processor system 100. The copying is expensive in terms of power consumption because it involves both read and write operations. Reducing the number of copies from alias registers to the real register file (RRF) 116 could result in lower power consumption, extended battery life, and a less sophisticated cooling system for the processor.