1. Field of the Invention
This invention is related to the field of integrated circuits, and more particularly, to storing and processing microcoded instructions within an integrated circuit.
2. Description of the Related Art
Instructions processed in an integrated circuit are encoded as a sequence of ones and zeros. For some processor architectures, instructions may be encoded with a fixed length, such as a certain number of bytes. For other architectures, such as the x86 architecture, the length of instructions may vary. The x86 integrated circuit architecture specifies a variable length instruction set (i.e., an instruction set in which various instructions are each specified by differing numbers of bytes). For example, the 80386 and later versions of x86 integrated circuits employ between 1 and 15 bytes to specify a particular instruction. Instructions have an opcode, which may be 1–2 bytes, and additional bytes may be added to specify addressing modes, operands, and additional details regarding the instruction to be executed. The x86 integrated circuit architecture is one example of an architecture having complex instructions that may be implemented in microcode.
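To make the idea of variable-length encoding concrete, the Python sketch below splits a byte stream into instructions using a toy length table. It covers only a few prefix-free, ModRM-free 32-bit-mode x86 opcodes (NOP, PUSHA, RET, PUSH r32, MOV r32 imm32, JMP rel8, CALL rel32). The table and the function name `split_instructions` are illustrative assumptions. A real x86 length decoder must also handle prefixes, ModRM/SIB bytes, displacements, and multi-byte opcodes.

```python
# Toy length table for a few 32-bit-mode x86 opcodes with no prefixes and no
# ModRM byte. Real x86 length decoding is far more involved.
OPCODE_LENGTHS = {
    0x90: 1,  # NOP
    0x60: 1,  # PUSHA/PUSHAD
    0xC3: 1,  # RET
    0xEB: 2,  # JMP rel8: opcode + 1-byte displacement
    0xE8: 5,  # CALL rel32: opcode + 4-byte displacement
}
# PUSH r32 (0x50-0x57) is one byte; MOV r32, imm32 (0xB8-0xBF) is five bytes.
for reg in range(8):
    OPCODE_LENGTHS[0x50 + reg] = 1
    OPCODE_LENGTHS[0xB8 + reg] = 5


def split_instructions(code: bytes) -> list:
    """Split a byte stream into instructions using the toy length table."""
    out = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode not in OPCODE_LENGTHS:
            raise ValueError(f"opcode {opcode:#04x} not in toy table")
        length = OPCODE_LENGTHS[opcode]
        if pos + length > len(code):
            raise ValueError("truncated instruction")
        out.append(code[pos:pos + length])
        pos += length
    return out


# NOP; MOV EAX, 1; JMP $; RET
stream = bytes([0x90, 0xB8, 0x01, 0x00, 0x00, 0x00, 0xEB, 0xFE, 0xC3])
for insn in split_instructions(stream):
    print(len(insn), insn.hex())
```

The four instructions in the example stream occupy 1, 5, 2, and 1 bytes. This is why instruction boundaries in such an architecture cannot be found without at least partially decoding each instruction.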
Certain instructions within the x86 instruction set are quite complex, specifying multiple operations to be performed. For example, the PUSHA instruction specifies that each of the eight x86 general-purpose registers be pushed onto a stack defined by the value in the ESP register. Thus, a PUSHA instruction specifies that a store operation be performed for each register, and the ESP register may be decremented between store operations to generate the address for the next store operation.
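The Python sketch below illustrates how one PUSHA instruction expands into a sequence of elementary store operations. It assumes a 32-bit operand size, so each push moves ESP down by 4 bytes. It uses the architectural push order EAX, ECX, EDX, EBX, original ESP, EBP, ESI, EDI. The `(address, source)` tuples and the name `expand_pusha` are hypothetical. An actual microcode routine might form the addresses differently, for example by updating ESP only once at the end.

```python
# PUSHA with a 32-bit operand size pushes these registers in this order.
# The ESP slot receives the value ESP held before PUSHA started.
PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def expand_pusha(esp):
    """Expand PUSHA into (address, source) store operations.

    ESP is decremented by 4 before each store. Returns the list of stores
    and the final ESP value.
    """
    original_esp = esp
    stores = []
    for reg in PUSHA_ORDER:
        esp -= 4  # decrement ESP to form the address of this store
        source = f"ESP={original_esp:#x}" if reg == "ESP" else reg
        stores.append((esp, source))
    return stores, esp


stores, final_esp = expand_pusha(0x1000)
for address, source in stores:
    print(f"store {source} -> [{address:#x}]")
print(f"final ESP = {final_esp:#x}")
```

Starting from ESP = 0x1000, the sketch emits eight stores, from address 0xffc down to 0xfe0, and leaves ESP at 0xfe0.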
While it may be possible to implement hardware to execute any instruction directly, the cost of such implementation in terms of the number of transistors required and/or die area needed may be prohibitive in some cases. In the case of an instruction set like the x86 instruction set mentioned above, which is rich in complex instructions, the hardware required to execute all instructions directly may be enormous. In fact, current integrated circuit production methods may not be adequate to produce a single chip capable of executing all x86 instructions directly in hardware. Fortunately, other methods for executing complex instructions have been developed, such as decomposing a complex instruction, referred to as a microcoded instruction, into a set of more elementary operations, referred to herein as microcode. Microcode may be executed directly on hardware that is far less complex than that necessary to execute the complex instructions.
Microcoded instructions are transmitted to a microcode instruction unit within the integrated circuit, which decodes the complex microcoded instruction and produces two or more less-complex microcode operations for execution by the integrated circuit. The simpler microcode operations corresponding to the microcoded instruction are typically stored in a read-only memory (ROM) associated with the microcode unit. Thus, microcoded instructions are often referred to as microcode ROM (MROM) instructions.
Less complex instructions are typically directly decoded by hardware decode units within the integrated circuit. The terms “directly-decoded instruction”, “fastpath instruction” or “non-complex instruction” may be used interchangeably herein to refer to an instruction that is decoded and executed by the integrated circuit without the aid of a microcode instruction unit. Directly-decoded instructions are decoded into component operations via hardware decode, without the intervention of a microcode instruction unit, and these operations are executed by functional units included within the integrated circuit.
An integrated circuit may decode or partially decode an instruction encoding to determine if an instruction is a fastpath instruction or an MROM instruction. If the instruction is an MROM instruction, the integrated circuit's microcode instruction unit retrieves the corresponding microcode routine from the integrated circuit's microcode ROM. Multiple clock cycles may be used to transfer the entire set of microcode operations within the ROM that correspond to the MROM instruction. Once the microcode operations are output from the microcode ROM unit, these operations are typically included within the operation stream that is dispatched to one or more devices that schedule operations for execution. Thus, typical microcode ROM units, in effect, perform instruction expansion on the microcoded instruction.
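The dispatch behavior described above can be modeled roughly in Python. In this sketch, fastpath instructions pass straight through, and MROM instructions are expanded from a ROM table over several cycles. Everything here is an illustrative assumption: the table names `MICROCODE_ROM` and `FASTPATH`, the ROM output width of three operations per cycle (`OPS_PER_CYCLE`), and the rule of one fastpath instruction per cycle.

```python
# Hypothetical microcode ROM contents: each MROM mnemonic maps to its routine.
MICROCODE_ROM = {
    "PUSHA": ["store EAX", "store ECX", "store EDX", "store EBX",
              "store ESP", "store EBP", "store ESI", "store EDI"],
}
# Hypothetical fastpath instructions, each decoded by hardware into one operation.
FASTPATH = {"MOV": ["mov"], "ADD": ["add"]}
OPS_PER_CYCLE = 3  # assumed number of operations the ROM can output per cycle


def dispatch(program):
    """Return a list of (cycle, operations) groups sent toward the scheduler."""
    groups = []
    cycle = 0
    for mnemonic in program:
        if mnemonic in FASTPATH:
            # Directly decoded: no microcode unit involvement.
            groups.append((cycle, FASTPATH[mnemonic]))
            cycle += 1
        elif mnemonic in MICROCODE_ROM:
            # MROM instruction: the routine is read out over several cycles.
            routine = MICROCODE_ROM[mnemonic]
            for start in range(0, len(routine), OPS_PER_CYCLE):
                groups.append((cycle, routine[start:start + OPS_PER_CYCLE]))
                cycle += 1
        else:
            raise ValueError(f"unknown instruction {mnemonic!r}")
    return groups


for cycle, ops in dispatch(["MOV", "PUSHA", "ADD"]):
    print(cycle, ops)
```

Under these assumptions, three instructions become ten operations spread over five cycles. The single PUSHA occupies three of those cycles, which illustrates the instruction expansion performed by the microcode ROM unit.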
The microcode operations output from the microcode ROM may be elementary to the point that they can complete execution at a rate of one operation per execution cycle (clock cycle) of the processing or execution unit. In order to operate at maximum efficiency (maximum number of instructions executed per unit time), the execution unit may need to be supplied with elementary operations at a rate of one operation per clock cycle. This may become problematic when attempting to implement an integrated circuit capable of executing an instruction set rich in complex instructions (like the x86 instruction set) and at the same time having the highest possible clock speed, as described in more detail below.
Current technology enables the production of integrated circuits that may be run at speeds such that multiple clock cycles may be required to physically propagate a signal across the die. For example, an integrated circuit may have a die size such that one dimension is 20 millimeters. The propagation time for signals across the die may be on the order of 30 mm/nanosecond (approximately one tenth the speed of light in vacuum). If the execution unit of such a processor is capable of being clocked at a rate of 6 GHz, then it may take as many as 4 clock cycles to propagate a signal across the die.
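The four-cycle figure follows directly from the numbers given above: a 20 mm die dimension, a signal speed of 30 mm/ns, and a 6 GHz clock.

```latex
t_{\text{cross}} = \frac{20\ \text{mm}}{30\ \text{mm/ns}} \approx 0.667\ \text{ns},
\qquad
T_{\text{clk}} = \frac{1}{6\ \text{GHz}} \approx 0.167\ \text{ns},
\qquad
\frac{t_{\text{cross}}}{T_{\text{clk}}} = \frac{20}{30} \times 6 = 4\ \text{cycles}.
```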
This propagation delay may be significant in the design of an integrated circuit's microcode ROM. As stated previously, an instruction set containing large numbers of complex instructions (such as the x86 instruction set) may require that large amounts of microcode be stored in microcode ROM. This implies that a large portion of the integrated circuit's die area may be occupied by the microcode ROM. Typically, in order to access an operation contained in a microcode ROM, some other device generates the address of the line within the microcode ROM that contains the desired operations. The address signals propagate from the generating device to the microcode ROM. Logic within the microcode ROM decodes the address signals to activate the line containing the desired operation, and the data from that line is made available at the output of the microcode ROM. For a microcode ROM that occupies a large area on the processor die, the signal propagation delay described above can result in a significantly greater delay in outputting data from a line whose memory elements are located far from the decode logic. Because a single access time is specified for all lines of the ROM, the time needed to make data available from the cells farthest from the decode logic sets the lower bound on that access time. The data signals must then propagate from the microcode ROM to the device that will make use of the data. The total time required to complete this process determines the speed at which the microcode ROM can be accessed.
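The access path described above can be summarized as a sum of four delays: address propagation, address decode, the worst-case delay within the array, and data propagation. The Python sketch below rounds that sum up to whole clock cycles. The 30 mm/ns signal speed and the 6 GHz clock come from the preceding discussion. The function name `rom_access`, the wire lengths, the row distances, the decode times, and the assumption that signals inside the array travel at the same 30 mm/ns are all hypothetical.

```python
import math


def rom_access(addr_wire_mm, data_wire_mm, row_distances_mm,
               decode_ns, wire_mm_per_ns=30.0, clock_ghz=6.0):
    """Estimate ROM access latency as (nanoseconds, whole clock cycles).

    The array term uses the row farthest from the decode logic, because a
    single access time must cover every line in the ROM.
    """
    addr_ns = addr_wire_mm / wire_mm_per_ns            # address travels to the ROM
    array_ns = max(row_distances_mm) / wire_mm_per_ns  # worst-case (farthest) row
    data_ns = data_wire_mm / wire_mm_per_ns            # data travels to the consumer
    total_ns = addr_ns + decode_ns + array_ns + data_ns
    cycle_ns = 1.0 / clock_ghz
    return total_ns, math.ceil(total_ns / cycle_ns)


# Hypothetical small, nearby ROM versus a large ROM spread across the die.
small = rom_access(0.3, 0.3, [0.4, 0.8, 1.2], decode_ns=0.08)
large = rom_access(3.0, 3.0, [3.0, 6.0, 9.0], decode_ns=0.1)
print(small, large)
```

With these illustrative numbers, the small ROM completes an access in about 0.14 ns (one cycle). The large ROM needs about 0.6 ns, which rounds up to four cycles, and most of that difference comes from wire delay rather than decode time.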
It is typically desired that a microcode ROM have an access time of a small number (e.g., one or two) clock cycles in order to match the rate at which the ROM can output operations with the rate at which the operations are consumed by other components. A microcode ROM with enough capacity to store all the microcode routines of a complex instruction set may occupy such a large area on the processor die that it cannot achieve the desired access time. If a microcode ROM cannot output operations at least as fast as those operations can be executed, the processor will operate at less than maximum efficiency.
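The efficiency loss can be estimated with a simple rate comparison. The Python sketch below assumes accesses are not overlapped: each access delivers a fixed number of operations, and the next access begins only after the current one completes. It also assumes the execution unit consumes one operation per cycle. The function name `rom_efficiency` and both assumptions are illustrative, not taken from the description above.

```python
def rom_efficiency(ops_per_access, access_cycles, consume_rate=1.0):
    """Fraction of the execution unit's operation demand that the ROM meets.

    Assumes accesses are not overlapped: each access delivers ops_per_access
    operations, and the next access starts only after access_cycles cycles.
    """
    supply_rate = ops_per_access / access_cycles  # operations per clock cycle
    delivered = min(supply_rate, consume_rate)    # cannot exceed what is consumed
    return delivered / consume_rate


for cycles in (1, 2, 4):
    print(cycles, rom_efficiency(ops_per_access=1, access_cycles=cycles))
```

Under these assumptions, a ROM that delivers one operation per access keeps the execution unit fully busy only at a one-cycle access time. At a four-cycle access time, the execution unit is supplied only a quarter of the time.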