A computer processor (processing unit), such as an integrated circuit (IC) based microprocessor, generally comprises a control unit, which directs the operation of the system, and one or more execution units, for example, arithmetic logic units (ALUs), which perform computational operations. The design of a processor involves the selection of one or more register sets, communication paths between these registers, and a means of directing and controlling how these operate. Normally, a processor is directed by a program, which consists of a series of instructions that are kept in a main memory. Each instruction is a group of bits, usually one or more words in length, specifying an operation to be carried out by the processor. In general, the basic cycle of a processor comprises the following steps: (a) fetch an instruction from memory into an instruction register; (b) decode the instruction (i.e., determine what it indicates should be done; each instruction indicates an operation to be performed and the data to which the operation should be applied); (c) carry out the operation specified by the instruction; and (d) determine where the next instruction is located. Normally, the next instruction is the one immediately following the current one.
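By way of illustration only, the basic cycle of steps (a) through (d) can be sketched in Python as follows. The tuple-based instruction format, the opcodes `li`, `add`, `jnz`, and `halt`, and the function name `run` are hypothetical and are assumed solely for this sketch; they do not correspond to any particular processor.

```python
def run(program, registers=None):
    """Toy in-order processor loop (hypothetical instruction format)."""
    regs = dict(registers or {})
    pc = 0  # index of the next instruction in "main memory"
    while pc < len(program):
        instruction = program[pc]        # (a) fetch into an instruction register
        opcode, *operands = instruction  # (b) decode: operation and its data
        next_pc = pc + 1                 # (d) default: the immediately following one

        # (c) carry out the operation specified by the instruction
        if opcode == "li":               # load an immediate value into a register
            dest, value = operands
            regs[dest] = value
        elif opcode == "add":            # dest = a + b
            dest, a, b = operands
            regs[dest] = regs[a] + regs[b]
        elif opcode == "jnz":            # jump to target if the register is non-zero
            reg, target = operands
            if regs[reg] != 0:
                next_pc = target
        elif opcode == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")

        pc = next_pc
    return regs
```

In this sketch, only the assumed `jnz` instruction departs from the normal case in which the next instruction is the one immediately following the current one.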
However, high performance processors, such as superscalar processors in which two or more scalar operations are performed in parallel, may be designed to execute instructions out of order, that is, in an order that is not consistent with that defined by the software driving the microprocessor. In these processors, instructions are executed when they can be executed, as opposed to when they appear in the sequence defined by the program. Moreover, after execution of out of order instructions, the results are ultimately reordered to correspond with the program order, prior to passing the results back to the program.
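The reordering of results can be illustrated, under simplifying assumptions, by the following Python sketch. The function name `retire_in_program_order` and the representation of each completed instruction as a `(program_index, result)` pair are hypothetical; the sketch merely shows that a result is released only after the results of all earlier instructions in program order have been released.

```python
def retire_in_program_order(completions):
    """Release results in program order.

    `completions` yields (program_index, result) pairs in the order in which
    instructions finished executing, which may differ from program order.
    """
    pending = {}         # finished results waiting on an earlier instruction
    next_to_retire = 0   # program index of the oldest unreleased instruction
    retired = []
    for index, result in completions:
        pending[index] = result
        # Drain every result that is now contiguous with what was released.
        while next_to_retire in pending:
            retired.append((next_to_retire, pending.pop(next_to_retire)))
            next_to_retire += 1
    return retired
```

For example, if the third instruction finishes first, its result is held until the first and second instructions have also finished.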
Out of order execution of instructions poses peculiar obstacles from a design perspective. One such obstacle involves quickly locating instructions that are ready to be executed and permitting such ready instructions to access execution resources, as appropriate.
More specifically, in some implementations of out of order processors, instructions are fetched and then placed in respective slots of a queue (i.e., a temporary storage means), from which the instructions are launched, or executed, in an out of order sequence. Each slot has a memory that is capable of temporarily storing information about an instruction and has some local logic functionality to support the memory. Typically, some type of control logic associated with the queue determines which instructions will be launched from the queue, and when, during a launch cycle. When each instruction is executed, this control logic causes the instruction to communicate with one or more execution resources, for example, ALUs or memory ports. During each launch cycle, this control logic may cause more than one instruction to launch into execution, depending upon the nature and extent of the execution resources. In some present day designs of microprocessors, up to four instructions are launched during each launch cycle.
In determining which instructions should be launched during a launch cycle, the control logic evaluates a number of criteria, including, for example, the age of an instruction in the queue (i.e., generally older instructions should be executed before newer instructions wherever possible), instruction dependencies, etc. An instruction, called a “dependent” instruction, is dependent upon another instruction, called a “producer” instruction, when the dependent instruction operates upon an operand or result that is produced by the producer instruction. Generally, dependent instructions are placed after their producer instructions in program order, and therefore, in a typical processor that executes instructions in order, the dependent instructions are executed after their producer instructions. However, in a processor that executes instructions out of order, unless safeguards are implemented, it is possible that a dependent instruction may be executed prior to the producer instruction upon which it depends. Thus, the control logic will not permit an instruction to execute if it is dependent upon a producer instruction that has not yet executed.
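A minimal sketch of these two criteria, age and dependency, is given below in Python. The `Slot` class, its fields `seq` and `producers`, and the function `select_for_launch` are hypothetical names assumed for illustration, as is the convention that a lower sequence number denotes an older instruction. The default limit of four launches per cycle reflects the figure mentioned above for some present day designs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    seq: int                            # program-order number; lower means older (assumed)
    producers: frozenset = frozenset()  # seq numbers of producers this instruction needs


def select_for_launch(slots, executed, max_launch=4):
    """Pick the oldest instructions whose producers have all executed.

    `executed` is the set of seq numbers of instructions that have executed.
    """
    # Dependency criterion: every producer must already have executed.
    ready = [s for s in slots if s.producers <= executed]
    # Age criterion: prefer older instructions wherever possible.
    ready.sort(key=lambda s: s.seq)
    return ready[:max_launch]
```

In this sketch, an instruction whose producer has not yet executed is simply passed over for the current launch cycle, so a younger independent instruction may launch ahead of it.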
Another type of conflict present in many memory systems, such as on-chip SRAM (static random access memory) cache, involves contention for memory banks. Often, memory is organized into two or more banks, each of which can be accessed independently but can supply only one word of data during a launch cycle. U.S. Pat. No. 5,761,713 to Lesartre describes an example of such a cache. If two or more accesses are presented to the memory, then they can all execute if their addresses are for different banks. If more than one access addresses the same bank of memory, all but one will need to wait. The control logic can use bank conflicts as another criterion to qualify whether an instruction is ready to launch.
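The bank conflict criterion can be sketched as follows. The mapping from address to bank (word index modulo the number of banks), the 4-byte word size, and the function name `arbitrate_banks` are assumptions made for this example only; the cache of the cited patent may map addresses to banks differently.

```python
def arbitrate_banks(addresses, num_banks=2, word_bytes=4):
    """Grant at most one access per bank during a launch cycle.

    `addresses` is assumed to be listed in priority order (e.g., oldest first).
    Returns (granted, stalled) lists of addresses.
    """
    busy = set()  # banks already claimed in this cycle
    granted, stalled = [], []
    for addr in addresses:
        bank = (addr // word_bytes) % num_banks  # assumed interleaving scheme
        if bank in busy:
            stalled.append(addr)   # same bank as an earlier access: must wait
        else:
            busy.add(bank)
            granted.append(addr)   # different bank: may execute this cycle
    return granted, stalled
```

With two banks and the assumed mapping, accesses to addresses 0 and 4 fall in different banks and may both proceed, whereas accesses to addresses 0 and 8 fall in the same bank and one of them waits.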
When an instruction is ready to be executed (i.e., it has no operand dependencies and no cache conflicts, among other things), a local launch logic element associated with each slot (and thus with each instruction) indicates this ready condition. During a cycle, a part of the control logic of the queue, sometimes referred to as arbitration logic, seeks out ready instructions by analyzing the information from the launch logic elements associated with the slots and instructions, allocates one or more ports to the ready instructions, and causes the ready instructions to launch into execution. Historically, this arbitration part of the control logic associated with the queue has been complex, slow, and space-consuming, all of which are very undesirable in the context of an integrated circuit (IC) microprocessor. An example of the foregoing logic, in the context of a memory queue, is set forth in U.S. Pat. No. 5,761,713 to Lesartre (see arbitration logic of FIGS. 5A-5D), which is incorporated herein by reference. The logic described in the aforementioned patent is hierarchical, in a sense, and requires much combinational logic. Thus, there is a need in the art for a better form of control logic for locating instructions that are ready to be executed during a launch cycle and permitting such ready instructions to access execution resources, as appropriate.
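The task performed by the arbitration logic can be modeled behaviorally as shown below. This Python sketch is not the hierarchical logic of the aforementioned patent, nor any particular hardware design; the function name `arbitrate`, the assumption that slot 0 holds the oldest instruction, and the numbering of ports from zero are all hypothetical. A sequential scan is used here only for clarity, whereas an actual circuit would perform the equivalent selection in combinational logic.

```python
def arbitrate(ready_flags, num_ports):
    """Allocate ports to ready slots, scanning from slot 0 (assumed oldest).

    `ready_flags[i]` is True when the launch logic of slot i signals ready.
    Returns a dict mapping each granted slot index to its port number.
    """
    grants = {}
    for slot, ready in enumerate(ready_flags):
        if len(grants) == num_ports:
            break                      # every port is already allocated
        if ready:
            grants[slot] = len(grants)  # hand out the next free port
    return grants
```

For instance, with ready signals from slots 1, 2, and 4 and two available ports, the sketch grants ports to slots 1 and 2 and leaves slot 4 to wait for a later launch cycle.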