1. Field of the Invention
This invention relates in general to the field of instruction execution in a pipeline processing system, and more particularly to a method and apparatus which fast fills micro instructions having register generic operands with specific register values while the micro instructions are within an instruction queue.
2. Description of the Related Art
Modern computer systems utilize a number of different processor architectures to perform program execution. In conventional microprocessor based systems, a computer program is made up of a number of macro instructions that are provided to the microprocessor for execution. The microprocessor decodes each macro instruction into a sequence of micro instructions, i.e., simple machine instructions that the hardware in the microprocessor can understand, and executes all of the micro instructions in the sequence before decoding another macro instruction.
A macro instruction is typically of the form: OPCODE OPERAND1, OPERAND2, where OPCODE specifies the type of operation to be performed, such as add, multiply or nor, and OPERAND1 and OPERAND2 specify the data upon which the operation is to be performed. It should be appreciated that operands 1 and 2 specify data located either in a register within the microprocessor, such as R1, R2, R3, etc., or in a location in memory that contains data.
A more specific example of a macro instruction that performs addition on two values, one located in memory [M], and the other located in a register R within the processor, and stores the result in memory, is:
ADD [M],R
This macro instruction is fetched by a microprocessor and provided to a control unit within the microprocessor that translates or decodes the macro instruction into a sequence of micro instructions, or instruction primitives, that the execution unit within the microprocessor understands. The micro instruction sequence generated by the control unit is:
LOAD TEMP,[M]
ADD TEMP,R
STORE [M],TEMP
Since the execution unit of the microprocessor cannot operate directly on data within memory, the microprocessor first loads the data from memory [M] into a temporary register. The microprocessor then adds the contents of the temporary register to the contents in register R. And finally, the microprocessor stores the result of the add back into memory [M].
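The load, add and store steps described above can be illustrated with a minimal sketch. It assumes a toy model in which memory and the register file are plain Python dictionaries; the function name `run_add_mem_reg`, the address 0x100 and the sample values are hypothetical and chosen only for illustration.

```python
def run_add_mem_reg(memory, registers, mem_addr, reg_name):
    """Apply the three micro instructions for ADD [M],R to a toy machine state."""
    temp = memory[mem_addr]             # LOAD TEMP,[M]: bring the memory operand into a temporary register
    temp = temp + registers[reg_name]   # ADD TEMP,R: add the register operand to the temporary
    memory[mem_addr] = temp             # STORE [M],TEMP: write the sum back to memory
    return memory

# Hypothetical machine state: one memory word and one register.
memory = {0x100: 7}
registers = {"R": 5}
run_add_mem_reg(memory, registers, 0x100, "R")
print(memory[0x100])  # 12
```

In this sketch the register R is left unchanged and only the memory location receives the result, which matches the behavior the macro instruction is described as having.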
The control unit within the microprocessor typically includes a control ROM which contains micro instruction sequences, and translate/decode logic which decodes the macro instructions, and addresses the control ROM to provide the appropriate micro instruction sequence for each macro instruction. For the example above, the micro instruction sequence would be stored in the control ROM at a designated address. When the control unit of the microprocessor received a macro instruction of the form indicated above, it would address the control ROM which would, in turn, provide the micro instruction sequence to an execution unit.
As microprocessors became more powerful, their macro instruction sets increased in both size and complexity. Thus the size of the control ROM which contained the micro instruction sequences associated with the macro instructions grew accordingly. However, the growth in size of the control ROM has been exponential rather than linear. As discussed above, for each macro instruction, a sequence of micro instructions is provided in the control ROM. But a single sequence of micro instructions is not capable of handling all variations of a macro instruction. For example, the opcode for the instruction above is ADD. This opcode instructs the execution unit to add two values together, but by itself, does not determine which two values. It is the operands 1 and 2 that specify the values upon which the add function will operate. In a simple instance, a microprocessor may have as many as eight different data registers (R1-R8) upon which the ADD operation might operate, with complex microprocessors having even more. To accommodate all operand permutations for the ADD instruction, at least 56 [n!/(n-k)!, with n = 8 registers and k = 2 operands] different micro instruction sequences would need to be provided for in the control ROM. And, this does not include any operands which specify memory locations for the data. Thus, if the control ROM were to provide operand specific micro instruction sequences for each macro instruction, and for all combinations of operands, the size of the control ROM would be enormous.
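The figure of 56 can be checked with a short calculation. The sketch below assumes the count refers to ordered pairs of two distinct registers drawn from R1-R8, which is what n!/(n-k)! gives for n = 8 and k = 2; the function name `ordered_operand_selections` is hypothetical.

```python
from itertools import permutations
from math import factorial

def ordered_operand_selections(n_registers, n_operands=2):
    """Count ordered selections of distinct registers: n!/(n-k)!."""
    return factorial(n_registers) // factorial(n_registers - n_operands)

registers = [f"R{i}" for i in range(1, 9)]           # R1 through R8
count = ordered_operand_selections(len(registers))
enumerated = list(permutations(registers, 2))         # every (OPERAND1, OPERAND2) pair of distinct registers
print(count, len(enumerated))  # 56 56
```

Under this assumption, pairs that name the same register twice are not counted, which is consistent with the text describing 56 as a lower bound.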
To overcome this problem, register generic micro instruction sequences are often provided by the control ROM. For the above ADD [M],R macro instruction, the control ROM would provide the following micro instruction sequence:
LOAD TEMP,X
ADD TEMP,Y
STORE X,TEMP
where X and Y do not specify any particular operand. The control ROM provides these micro instructions, one at a time, to the translate/decode logic. The translate/decode logic takes these register generic micro instructions, and fills in the appropriate operands specified by the macro instruction. By allowing the translate/decode logic to fill in register generic operands, the size and complexity of the control ROM is dramatically reduced. However, with the advantage of decreased size and complexity of the control ROM comes the disadvantage of decreased performance. Now, the control unit not only has to look up the appropriate micro instruction sequence for each macro instruction, but in addition, has to fill in appropriate operands for each micro instruction. This fill in process requires additional processor time, which means that execution of the micro instruction sequence is delayed.
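The fill-in step can be sketched as a simple substitution. The example assumes the register generic sequence LOAD TEMP,X / ADD TEMP,Y / STORE X,TEMP is stored as a list of tuples and that the macro instruction supplies a binding for X and Y; the names `GENERIC_ADD` and `fill_in` are hypothetical and do not describe any particular translate/decode implementation.

```python
# Register generic sequence as it might be held in a control ROM (illustrative encoding).
GENERIC_ADD = [
    ("LOAD", "TEMP", "X"),
    ("ADD", "TEMP", "Y"),
    ("STORE", "X", "TEMP"),
]

def fill_in(template, bindings):
    """Replace generic operands (X, Y) with the operands named by the macro instruction."""
    return [
        (opcode, bindings.get(dst, dst), bindings.get(src, src))
        for opcode, dst, src in template
    ]

# Operands taken from the macro instruction ADD [M],R.
filled = fill_in(GENERIC_ADD, {"X": "[M]", "Y": "R"})
for opcode, dst, src in filled:
    print(f"{opcode} {dst},{src}")
```

One stored template serves every operand combination, which is the ROM size saving described above; the cost is that the substitution must be performed for each micro instruction before it can execute.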
In more advanced computer systems, another type of microprocessor, called a "pipeline" processor, is used. A pipeline processor decodes macro instructions, similar to those of the conventional microprocessor discussed above, into a sequence of micro instructions. However, the micro instructions are overlapped during execution to improve performance. Such overlapping of micro instructions during execution is known as "pipelining". Pipelining is a key implementation technique used to make fast microprocessors.
A pipeline is like an assembly line. Each step in a pipeline operates in parallel with other steps, though on a different micro instruction. Like the assembly line, different steps are completing different parts of a macro instruction in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
Flow of instructions through a pipeline is typically controlled by a system clock, or processor clock signal. For example, during a first clock cycle, a first macro instruction may be fetched from memory. By the end of the clock cycle, the first macro instruction is placed into a buffer which feeds a translate/decode stage. During a second clock cycle, a second macro instruction may be fetched and placed into the buffer. In addition, and in parallel to the second macro instruction fetch, the first macro instruction is "read" by the translate/decode logic, and translated into a sequence of micro instructions. By the end of the second clock cycle, a first micro instruction in the sequence is provided to the instruction register. During a third clock cycle, the first micro instruction is provided to later stages in the pipeline, and a second micro instruction is stored in the instruction register. This pipeline process continues indefinitely as long as macro instructions can be fetched into the buffer during each clock cycle, and as long as the translate/decode logic can provide micro instructions to later stages in the pipeline during each clock cycle.
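The cycle-by-cycle overlap described above can be tabulated with a small sketch. It makes two simplifying assumptions that are not in the text: the pipeline is reduced to three stages (fetch, translate/decode, later stages), and each macro instruction yields exactly one micro instruction. The names `STAGES` and `occupancy` are hypothetical.

```python
STAGES = ["fetch", "translate/decode", "later stages"]

def occupancy(n_instructions, n_cycles):
    """For each clock cycle (1-based), map each busy stage to the instruction number it holds."""
    table = []
    for cycle in range(1, n_cycles + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            instruction = cycle - depth        # instruction i enters stage `depth` in cycle i + depth
            if 1 <= instruction <= n_instructions:
                row[stage] = instruction
        table.append(row)
    return table

table = occupancy(n_instructions=3, n_cycles=5)
for cycle, row in enumerate(table, start=1):
    print(cycle, row)
```

In the third cycle of this model all three stages are busy with different instructions, which is the steady state the paragraph describes.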
If we apply the idea of providing register generic micro instructions to a pipeline processor, the result is as follows. During a first clock cycle, a first macro instruction may be fetched from memory. By the end of the clock cycle, the first macro instruction is placed into a buffer which feeds a translate/decode stage. During a second clock cycle, a second macro instruction may be fetched and placed into the buffer. In addition, and in parallel to the second macro instruction fetch, the first macro instruction is "read" by the translate/decode logic, and the control ROM is addressed to provide the appropriate micro instruction sequence. By the end of the second clock cycle, a first micro instruction in the sequence is provided to the instruction register by the control ROM. In this case, however, the micro instruction that is provided is register generic, i.e., it does not specify any particular operands. During a third clock cycle, the translate/decode logic must fill in the specific operands designated by the macro instruction. During a fourth cycle, the first micro instruction is provided to later stages in the pipeline, and a second micro instruction is stored in the instruction register. However, this micro instruction also needs to be filled in with register specific operands.
Thus, each time a register generic micro instruction is provided by the control ROM to the instruction register, translate/decode logic is required to fill in register specific operands. In pipeline processors, such register specific fill in of control ROM generated micro instructions requires at least one clock cycle per micro instruction. Such additional processing requirement adds delays or holes in the pipeline. And, every delay or hole in the pipeline increases the time required to execute the micro instruction sequence. Processor performance is affected accordingly. For a background on techniques used to fill in register generic operands, please see U.S. patent application Ser. No. 5,717,910 entitled "METHOD AND APPARATUS FOR REGISTER ADDRESS FILL-IN OF REGISTER GENERIC MICROCODE INSTRUCTIONS" by Glenn Henry and Terry Parks, which is incorporated herein by reference.
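The cost of the fill-in cycle can be estimated with a rough timing sketch. It assumes the cycle numbering used in the walkthroughs above (an operand specific micro instruction first reaches the later stages in cycle 3) and assumes each fill-in takes exactly one extra cycle, the minimum the text states; the function name `issue_cycles` is hypothetical.

```python
def issue_cycles(n_micro, fill_in_cycles=0):
    """Clock cycles in which each micro instruction reaches the later pipeline stages."""
    first = 3 + fill_in_cycles     # cycle 3 without fill-in, cycle 4 with a one-cycle fill-in
    step = 1 + fill_in_cycles      # each following micro instruction pays the fill-in cost again
    return [first + i * step for i in range(n_micro)]

specific = issue_cycles(3)                     # operand specific micro instructions
generic = issue_cycles(3, fill_in_cycles=1)    # register generic micro instructions
extra = generic[-1] - specific[-1]             # cycles lost to holes for a three-instruction sequence
print(specific, generic, extra)  # [3, 4, 5] [4, 6, 8] 3
```

Under these assumptions a three-instruction sequence such as the ADD example finishes issuing three cycles later, one hole per micro instruction.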
One technique that has been used to overcome delays or holes in the pipeline, i.e., holes resulting from the fetch and translate stages of the pipeline, is to introduce an instruction queue between the translate/decode logic and the instruction register. The instruction queue can act as a buffer to hold micro instructions generated by the translate/decode logic, before they are needed by the instruction register. If the instruction queue is able to "get ahead" of the pipeline, then later stalls or delays in the pipeline may be overcome by providing the micro instructions from the instruction queue. For a general background on instruction queues, please see U.S. patent application Ser. No. 5,619,667 entitled "METHOD AND APPARATUS FOR FAST FILL OF TRANSLATOR INSTRUCTION QUEUE", by Glenn Henry and Terry Parks, which is incorporated herein by reference. However, heretofore instruction queues have not been used to store micro instructions having register generic operands, nor has there been provided any mechanism which allows register generic micro instructions to be filled in while in an instruction queue.
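How a queue that gets ahead can hide later translator delays may be illustrated with a toy simulation. It is a sketch under stated assumptions, not a model of either referenced design: the translator produces at most one micro instruction per cycle, the later stages consume at most one per cycle, and an entry produced in one cycle is available in the next. The function `simulate`, its parameters and the stall pattern are hypothetical.

```python
from collections import deque

def simulate(n_cycles, capacity, translator_stalls, pipeline_stalls):
    """Return (issued, holes): micro instructions issued and cycles in which nothing was available."""
    queue = deque()
    issued = holes = 0
    for cycle in range(n_cycles):
        # Later stages take the oldest queued micro instruction unless they are stalled.
        if cycle not in pipeline_stalls:
            if queue:
                queue.popleft()
                issued += 1
            else:
                holes += 1
        # Translator adds a micro instruction unless it is stalled or the queue is full.
        if cycle not in translator_stalls and len(queue) < capacity:
            queue.append(cycle)
    return issued, holes

# Later stages busy in cycles 0-2 (queue can get ahead); translator delayed in cycles 5-6.
no_queue = simulate(10, capacity=1, translator_stalls={5, 6}, pipeline_stalls={0, 1, 2})
with_queue = simulate(10, capacity=4, translator_stalls={5, 6}, pipeline_stalls={0, 1, 2})
print(no_queue, with_queue)  # (5, 2) (7, 0)
```

With a single-entry buffer the two translator delays appear as two holes, whereas the four-entry queue filled during the early cycles supplies micro instructions throughout the delay.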