1. Field of the Invention
This invention is related to the field of processors and, more particularly, to microcode patching within processors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Microprocessor designers often design their products in accordance with the x86 microprocessor architecture in order to take advantage of its widespread acceptance in the computer industry. Because the x86 microprocessor architecture is pervasive, many computer programs are written in accordance with the architecture. x86-compatible microprocessors may execute these computer programs, thereby becoming more attractive to computer system designers who desire x86-capable computer systems. Such computer systems are often well received within the industry due to the wide range of available computer programs.
The x86 microprocessor architecture specifies a variable length instruction set (i.e. an instruction set in which various instructions employ differing numbers of bytes to specify that instruction). For example, the 80386 and later versions of x86 microprocessors employ between 1 and 15 bytes to specify a particular instruction. Instructions have an opcode, which may be 1-2 bytes, and additional bytes may be added to specify addressing modes, operands, and additional details regarding the instruction to be executed. Certain instructions within the x86 instruction set are quite complex, specifying multiple operations to be performed. For example, the PUSHA instruction specifies that each of the x86 registers be pushed onto a stack defined by the value in the ESP register. The corresponding operations are a store operation for each register, and decrements of the ESP register between each store operation to generate the address for the next store operation.
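The decomposition of PUSHA into simpler operations can be illustrated with a minimal sketch. The sketch assumes 32-bit registers (the PUSHAD form), models memory as a dictionary, and uses a hypothetical function name, `expand_pusha`; it is an illustration of the store and decrement sequence described above, not a model of any particular microcode implementation.

```python
# Register push order for the 32-bit form of PUSHA (PUSHAD).
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def expand_pusha(regs, memory):
    """Perform PUSHA as a series of ESP decrements and stores.

    regs   -- dict mapping register names to 32-bit values (modified in place)
    memory -- dict mapping addresses to stored values (modified in place)
    Returns the list of simple operations that were performed.
    """
    original_esp = regs["ESP"]  # the value of ESP before the first push is stored
    ops = []
    for name in PUSHAD_ORDER:
        # Decrement ESP to generate the address for the next store.
        regs["ESP"] -= 4
        ops.append(("dec_esp", 4))
        # Store the register; the ESP slot receives the pre-instruction value.
        value = original_esp if name == "ESP" else regs[name]
        memory[regs["ESP"]] = value
        ops.append(("store", name, regs["ESP"]))
    return ops
```

In this sketch one complex instruction yields sixteen simple operations (eight decrements and eight stores), which is why such an instruction is a natural candidate for microcode rather than direct hardware decoding.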
Often, complex instructions are classified as microcode read only memory (MROM) instructions. MROM instructions are transmitted to a microcode instruction unit within the microprocessor, which decodes the complex MROM instruction and produces two or more simpler microcode instructions for execution by the microprocessor. The simpler microcode instructions corresponding to the MROM instruction are typically stored in a read-only memory (ROM) within the microcode unit. The microcode instruction unit determines an address within the ROM at which the microcode instructions are stored, and transfers the microcode instructions out of the ROM beginning at that address. Multiple clock cycles may be used to transfer the entire set of instructions within the ROM that correspond to the MROM instruction.
Different instructions may require differing numbers of microcode instructions to effectuate their corresponding functions. Additionally, the number of microcode instructions corresponding to a particular MROM instruction may vary according to the addressing mode of the instruction, the operand values, and/or the options included with the instruction. The microcode instruction unit issues the microcode instructions into the instruction processing pipeline of the microprocessor. The microcode instructions are thereafter executed in a similar fashion to other instructions. It is noted that the microcode instructions may be instructions defined within the instruction set, or may be custom instructions defined for the particular microprocessor.
Conversely, less complex instructions are decoded by hardware decode units within the microprocessor, without intervention by the microcode unit. The terms "directly-decoded instruction" and "fastpath instruction" will be used herein to refer to instructions which are decoded and executed by the microprocessor without the aid of a microcode instruction unit. As opposed to MROM instructions which are reduced to simpler instructions which may be handled by the microprocessor, directly-decoded instructions are decoded and executed via hardware decode and functional units included within the microprocessor.
New microprocessor designs typically are produced in iterative steps. Microprocessor prototypes are fabricated on silicon chips, and then are tested using various techniques to determine if the processor design, as fabricated, will perform satisfactorily. As errors are detected, the microprocessor design is modified and new prototypes are produced embodying the modified design. This seemingly continuous process of designing, fabricating and testing a processor design is referred to as "debugging."
One of the portions of the microprocessor design that requires debugging is the microcode. As the microprocessor is tested, errors may be discovered in the microcode instructions. Because of the limited access to the microcode, the microcode is typically changed only when new prototypes are produced for successive designs. Furthermore, when errors are found in the microcode, all related debugging is typically stopped, because it is inefficient to modify the processor hardware when the associated microcode will be revised. Consequently, further debugging in related areas may be halted until the new prototypes are produced.
When errors (or bugs) are found in microcode instructions, these errors are documented to system designers. Typically, the system designers run simulations to find ways to change the microcode to correct the errors detected. These changes cannot be effectively tested until the next prototype is produced with the changes to the microcode embedded in the internal ROM of the subsequent processor prototype. A problem with this approach is that the changes to the microcode cannot be easily or completely verified in the system environment before the changes are committed to silicon. This procedure can greatly increase the cost and time expended during the design process, as unverified changes are made to the microcode and incorporated in a subsequent prototype of the microprocessor, only to fail.
It may also be desirable to enter production with a processor even though the processor microcode still has some "bugs". In this situation, it may be desirable to somehow distribute microcode "fixes" to users along with the processor. Also, it may be desirable to be able to somehow "patch" processor microcode if microcode bugs or other bugs are discovered after a processor has already shipped to customers. Thus, it may be desirable to distribute or update microcode patches after a processor is in production.
One conventional way to address the above concerns is to incorporate a technique for patching existing instructions with substitute microcode instructions. When an instruction that needs to be patched is encountered, the instruction fetching mechanism of the microprocessor accesses the substitute microcode instruction from external memory and loads the substitute microcode instruction into the instruction cache. As used herein, the term "external memory" refers to any storage device external to the microprocessor. The substitute microcode instruction, or patched microcode instruction, is then dispatched into the instruction processing pipeline as a substitute for the existing instruction.
Unfortunately, fetching patched microcode instructions from external memory requires a significant portion of the microprocessor to be redesigned. The instruction fetching and alignment mechanisms are designed for x86 type instructions, not microcode instructions. Microcode instructions are typically a different length than x86 instructions and are encoded differently. Therefore, the instruction fetching mechanism, instruction cache and other circuitry are not designed to handle microcode instructions. To implement the above described patching mechanism, this circuitry must be redesigned to accommodate patched microcode instructions.
Another problem with fetching microcode patches from external memory, or even from internal caches, is performance. In many conventional processors, the width of data returned by memory or cache accesses is smaller than the width of microcode instructions fetched from the microcode ROM of the processor. Thus, if a microcode patch is fetched from external memory or from a cache, multiple memory accesses will be required to load a patched microcode instruction, as compared to a single wide fetch from the processor's microcode ROM. Furthermore, the latency for memory accesses is typically much longer than for fetches from the internal microcode ROM. Thus, microcode patches fetched from external memory or cache typically have an adverse effect on processor performance since fetching such a patch typically requires more and slower accesses.
One prior art processor loads microcode patches from system memory into the processor before the patches are needed. Loading the patch data is triggered by a write to model specific register (MSR) 079h with some other register pointing to the patch data in memory. If the patch is successfully loaded, MSR 08Bh is loaded with a patch identification (ID). This technique may avoid having to fetch a patch from external memory when the patch is needed.
Another problem with conventional microcode patch mechanisms concerns triggering the patch. One technique has been to provide a tag memory in the processor having one bit for every location in the microcode ROM. If a particular microcode ROM location is to be patched, then the corresponding bit is set in the tag memory. However, for typical microcode ROM sizes, this technique may require thousands of bits of tag memory. Additionally, timing may be complicated to access all the bits of the tag memory for each microcode ROM fetch in order to check if a patch is enabled.
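The storage cost of the tag memory technique can be seen in a minimal sketch. The class name `TagMemory` and the ROM size of 4096 lines used in the usage note are assumptions chosen only for illustration; the source does not give a ROM size.

```python
class TagMemory:
    """One tag bit per microcode ROM location (hypothetical model)."""

    def __init__(self, rom_lines):
        self.rom_lines = rom_lines
        # One bit per ROM location, packed eight to a byte.
        self.bits = bytearray((rom_lines + 7) // 8)

    def mark(self, address):
        """Set the tag bit so that the ROM location is treated as patched."""
        self.bits[address >> 3] |= 1 << (address & 7)

    def is_patched(self, address):
        """Check the tag bit; this check is needed on every ROM fetch."""
        return bool(self.bits[address >> 3] & (1 << (address & 7)))

    def size_in_bits(self):
        """Tag storage grows linearly with the size of the ROM."""
        return self.rom_lines
```

Under the assumed size, `TagMemory(4096).size_in_bits()` is 4096, so thousands of tag bits are held even when only a handful of ROM locations are ever patched.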
Another technique is to flag instruction set opcodes that are to be patched. For space and timing efficiency, this technique has been implemented so that flagged opcode bins cover multiple opcodes, resulting in "patching" opcodes that did not need to be patched. Additionally, if the microcode that needs to be patched does not correspond to an instruction set opcode (for example, an exception handler), it cannot be patched. Thus, this technique lacks granularity and is limited to only patching microcode corresponding to instruction set opcodes.
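The loss of granularity caused by opcode bins can be shown with a short sketch. The bin size of four opcodes (`BIN_SHIFT = 2`) and the function names are assumptions made for illustration; the source does not state how the bins were formed.

```python
BIN_SHIFT = 2  # assumption: each bin covers 2**2 = 4 adjacent opcodes


def flag_opcode(flagged_bins, opcode):
    """Flag the bin containing the opcode that needs a patch."""
    flagged_bins.add(opcode >> BIN_SHIFT)


def is_flagged(flagged_bins, opcode):
    """An opcode is treated as patched if its whole bin is flagged."""
    return (opcode >> BIN_SHIFT) in flagged_bins
```

Flagging opcode 0x60 in this model also flags 0x61, 0x62 and 0x63, which share its bin, while 0x64 falls in the next bin and is unaffected.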
An amount of random access memory (RAM) may be provided in a processor for implementing microcode patches. The patch RAM may be loaded by a microcode routine that is part of the normal microcode contained in a microcode ROM unit of the processor. When the processor powers up, it uses only its internal ROM microcode if no patches are installed. However, if patches are installed and if a microcode line is accessed for which a patch is enabled, the patch is executed instead of the microcode line.
A patch may be enabled by setting a match register with the address of the microcode instruction line in the microcode ROM that is to be patched. A processor may include several such match registers. Whenever the microcode ROM address matches the contents of one of the match registers, control is transferred to the patch RAM. The patch RAM may have a plurality of fixed entry points each corresponding to one of the match registers. Thus, when an MROM address matches a match register, control is passed to the patch RAM at the fixed entry point corresponding to the matching match register. To disable a match register, its contents may be written with a value that will never match any ROM address, e.g. -1.
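The match register arrangement may be sketched as follows. The class name `MatchRegisters`, its method names, and the entry point addresses in the usage note are hypothetical; only the behavior (one fixed entry point per register, and -1 as a value that never matches) comes from the description above.

```python
DISABLED = -1  # never equals a valid (non-negative) ROM address


class MatchRegisters:
    """A bank of match registers, each tied to a fixed patch RAM entry point."""

    def __init__(self, entry_points):
        # entry_points[i] is the fixed patch RAM entry point for register i.
        self.entry_points = list(entry_points)
        self.values = [DISABLED] * len(self.entry_points)

    def enable(self, index, rom_address):
        """Patch the ROM line at rom_address using register number index."""
        self.values[index] = rom_address

    def disable(self, index):
        """Write a value that will never match any ROM address."""
        self.values[index] = DISABLED

    def lookup(self, rom_address):
        """Return the entry point for a matching register, or None."""
        for index, value in enumerate(self.values):
            if value == rom_address:
                return self.entry_points[index]
        return None
```

For example, with `regs = MatchRegisters([0x1000, 0x1010])` and `regs.enable(1, 0x2A4)`, a call to `regs.lookup(0x2A4)` returns the second entry point, 0x1010, and any other address returns `None`. In hardware the comparisons would occur in parallel; the loop here is only a software stand-in.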
Whenever a match is detected between an MROM address and a match register, the microcode instruction line from the ROM is disabled and control is transferred to the appropriate entry point in the patch RAM. In some embodiments, a delay slot may also be issued from the ROM while control is being transferred to the fixed entry point in the patch RAM. Thus, there may be a two cycle bubble in the MROM unit pipeline whenever control is transferred from the microcode ROM to the patch RAM since both the matching address line and the delay slot line from the ROM are cancelled. In a preferred embodiment, the patch RAM is a contiguously addressed extension of the microcode ROM. Therefore, regular microcode jump or branch instructions may be used when exiting a patch routine to return to the ROM. Thus, when exiting a patch routine there is no need to cancel any instructions and patch routines may be exited and MROM operation resumed with no delay.
In a preferred embodiment, the microcode patch routines are initially loaded into system memory. A microcode patch RAM loader routine is called and executed to load patch RAM data from the system memory into the processor's patch RAM. This is typically done by a command from the basic input/output system (BIOS) or the operating system software shortly after power-up or reset of the processor.
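The benefit of addressing the patch RAM as a contiguous extension of the microcode ROM can be sketched in a few lines. The line encoding (tuples of a kind and an argument) and the function names `fetch` and `run` are assumptions made for illustration and do not reflect any actual microcode format.

```python
def fetch(address, rom, patch_ram):
    """Read one line from a single address space: ROM first, then patch RAM."""
    if address < len(rom):
        return rom[address]
    return patch_ram[address - len(rom)]


def run(start, rom, patch_ram):
    """Execute lines from start until an ("end", None) line is reached.

    Lines are ("op", name), ("jump", target_address) or ("end", None).
    Returns the names of the operations executed, in order.
    """
    executed = []
    address = start
    while True:
        kind, arg = fetch(address, rom, patch_ram)
        if kind == "end":
            return executed
        if kind == "jump":
            # The same jump works whether the target is in ROM or patch RAM.
            address = arg
        else:
            executed.append(arg)
            address += 1
```

With a four-line ROM and a patch routine `[("op", "fixed"), ("jump", 2)]` placed at address 4, the routine returns to ROM line 2 using an ordinary jump. No special return mechanism is modeled because none is needed when the two memories share one address space.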
One embodiment of a microcode patching device may include a first memory configured to store a plurality of microcode instruction lines. The first memory is configured to provide microcode instruction lines as accessed by an address provided to the first memory. The first memory provides addressed microcode instruction lines to a decode unit. The device also may include one or more match registers each configured to store a value indicating one of the microcode instruction lines in the first memory. A second memory may also be included and configured to store one or more microcode patch routines. A control unit causes one of the patch routines from the second memory to be provided to the decode unit if the address provided to the first memory matches the value stored in one of the match registers. The microcode instruction line provided by the first memory may be cancelled and the indicated patch routine executed instead. Patch routines may be located at a fixed entry point where each fixed entry point corresponds to a different match register so that when an address matches the value stored in a match register, control is transferred to the fixed entry point in the second memory corresponding to the matching match register. The first memory may include a microcode patch loader routine for loading the microcode patch routines from a third memory into the second memory. The microcode patch loader routine may be configured to cause one of the microcode patch routines to be executed if a flag is set in the third memory when the patch loader routine loads the microcode patch routines from the third memory to the second memory.
A method for patching microcode in a processor may include generating an address to access a microcode memory and comparing that address to values stored in one or more match registers. If the address does not match the value in any of the match registers, the method includes executing a microcode instruction line from the microcode memory as indicated by the address. If the address does match the value in one of the match registers, the method includes executing a microcode patch routine stored in a patch memory. The patch routine may be executed instead of the microcode instruction line from the microcode memory that was indicated by the address. In one embodiment, both the microcode instruction line as indicated by the address and a next line are dispatched from the microcode memory even if the address matches the value in one of the match registers. If a match occurs, the method includes canceling both the microcode instruction line and the next line dispatched from the microcode memory.
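The method may be sketched as a single dispatch step, including the embodiment in which both the addressed line and the next line are dispatched and then cancelled on a match. The function name `dispatch`, the dictionary result, and the use of plain lists for the match registers and entry points are assumptions for illustration.

```python
def dispatch(address, rom, match_values, entry_points):
    """Model one microcode memory access.

    match_values[i] is the ROM address held in match register i and
    entry_points[i] is the patch memory entry point paired with it.
    """
    line = rom[address]
    # The next line is dispatched as well (the delay slot), when it exists.
    next_line = rom[address + 1] if address + 1 < len(rom) else None

    for index, value in enumerate(match_values):
        if value == address:
            # Match: cancel both dispatched lines and run the patch instead.
            return {
                "executed": None,
                "cancelled": [line, next_line],
                "redirect": entry_points[index],
            }
    # No match: the addressed line executes normally.
    return {"executed": line, "cancelled": [], "redirect": None}
```

The two cancelled lines in the matching case correspond to the cost of entering a patch in such an embodiment; the non-matching case carries no penalty.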
The method may further include loading one or more microcode patch routines from a system memory into the patch memory. This loading may include calling a patch loader routine in the microcode memory. Upon completion of loading the one or more microcode patch routines, the patch loader routine checks if a flag was set in the system memory and branches to one of the microcode patch routines located at a fixed location in the patch memory if the flag is set. The loading may include reading a header stored in the system memory where the header indicates values for the match registers. Other information, such as patch IDs, checksums, and the above-mentioned flag, may be indicated in the header. Match registers may be disabled by indicating in the header that a particular match register should be set with a value, such as -1, that will not match any address of the microcode memory.
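A loader of this kind may be sketched as follows. The header layout (a 32-bit patch ID, a 32-bit checksum, a one-byte flag, and four signed 32-bit match register values), the additive checksum, and the name `load_patch` are all assumptions made for illustration; the description above names the header fields but does not define their encoding.

```python
import struct

# Hypothetical header: patch ID, checksum, flag byte, 3 pad bytes,
# then four signed 32-bit match register values (-1 disables a register).
HEADER = struct.Struct("<IIB3x4i")


def load_patch(image, patch_ram, match_registers):
    """Copy patch routines from a system memory image into the patch RAM.

    image           -- bytes: header followed by the patch routine data
    patch_ram       -- bytearray modeling the patch RAM (modified in place)
    match_registers -- list of four values (modified in place)
    Returns (patch_id, run_flag) on success, or None if the image is rejected.
    """
    patch_id, checksum, run_flag, *match_values = HEADER.unpack_from(image, 0)
    body = image[HEADER.size:]

    # Assumed checksum: sum of the body bytes, truncated to 32 bits.
    if sum(body) & 0xFFFFFFFF != checksum:
        return None
    if len(body) > len(patch_ram):
        return None

    patch_ram[:len(body)] = body
    match_registers[:] = match_values
    # If run_flag is true, the loader would next branch to the patch routine
    # at the fixed location in the patch memory.
    return patch_id, bool(run_flag)
```

In this sketch the match registers are written only after the patch data passes the checksum, so a rejected image leaves any previously installed patches untouched. That ordering is a design choice of the sketch rather than something stated in the description.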
In one embodiment, a processor includes a microcode memory configured to store a plurality of microcode instruction lines. An address is provided to the microcode memory and the microcode memory provides the microcode instruction line indicated by the address. A patch memory is included and configured to store one or more microcode patch routines. Each microcode patch routine is located in the patch memory at a fixed entry point. Each fixed entry point is matched to a different address of the microcode memory to be patched. A control unit may also be included and configured to determine if the address provided to the microcode memory is for a microcode line in the microcode memory that is to be patched. If the address is for a line that is to be patched, the control unit causes the one of the patch routines from the patch memory that is located at the fixed entry point corresponding to the address provided to the microcode memory to be executed instead of the microcode line in the microcode memory. The processor may also include one or more match registers, wherein each match register is configured to store a value indicating the address of one of the microcode instruction lines in the microcode memory. The control unit determines if the address is for a microcode line in the microcode memory that is to be patched by comparing the address provided to the microcode memory to the values stored in the match registers.