1. Field of the Invention
The present invention relates to increasing the speed and efficiency of a microprocessor while maintaining its compatibility with the instruction set architecture. More particularly, the present invention is a technique for decreasing the utilization of processing resources needed to execute particular microprocessor instructions. A parcel cache is provided which stores decoded instructions, i.e. parcels or micro-ops. This allows the decode mechanism in the microprocessor to retrieve a micro-op from the parcel cache and eliminates the necessity of repeatedly decoding often used instructions, such as those which are associated with loop and repeat operations.
2. Description of Related Art
In the computer industry there is a constant demand for ever faster and more efficient systems. Computer processing efficiency is largely dependent on the speed and resource utilization of the microprocessor that controls the basic functions of the computer system. Those microprocessors manufactured by the Intel Corporation execute a specific microprocessor instruction set architecture (ISA), whose instructions are commonly referred to as x86 instructions. Other Intel ISA compatible microprocessors include those manufactured by Advanced Micro Devices, Inc., National Semiconductor and others. These Intel ISA microprocessors command a huge percentage of the marketplace and have caused a correspondingly large amount of software to be written for them. Due to this large amount of Intel ISA software, microprocessor developers cannot change the programmer visible aspects of the instruction set, since doing so may cause this large quantity of existing software (legacy software) to become inoperable.
Therefore, a challenge to microprocessor developers has been to improve the efficiency of the microprocessor without changing the manner in which the ISA is implemented on the processor. For example, many instructions in the Intel architecture require sub-operations to be performed before the instruction can be completed. If the number of sub-operations, i.e. micro-ops, can be minimized or their performance optimized, without changing the ISA or programmer visible registers, then performance of the microprocessor can be enhanced.
Typically, instructions in the Intel ISA are complex. Therefore, many transistors and much time are spent on decoding an x86 CISC (complex instruction set computer) instruction into one or more simpler RISC (reduced instruction set computer) operations (micro-ops or instruction parcels). The motivation for converting an x86 instruction into RISC operations is to remove the variable length nature of an Intel ISA instruction and simplify the execution engine. The x86 instructions are complex because they tend to perform a lot of work in a single instruction. That is, each CISC instruction has a substantial amount of functionality encoded therein. In addition, to achieve good code density, these instructions are coded using variable opcode lengths. Hence, the complexity of x86 instructions puts a large burden on the front end of the processor pipeline with respect to logic complexity, timing and number of pipeline stages. A Pentium II processor (Pentium is a trademark of Intel Corporation) uses five (5) pipeline stages to fetch and decode the CISC x86 instructions. These 5 stages represent a significant fraction of the total pipeline stages for the microprocessor operation.
Some of the complex instructions in the Intel ISA which perform a substantial amount of work and correspondingly require a lot of fetching and decoding overhead include the LOOP, LOOPcc, REP, REPZ, REPNZ and REP MOVS instructions. These instructions will decrement a value in a register, such as a general purpose register (GPR) or the like, and then make a comparison to determine if the resulting value is equal to zero. For example, each time the LOOP instruction is executed a count register (ECX or CX) is decremented and checked for zero. If the count equals zero, then the loop is terminated and program execution continues with the instruction following the LOOP. When the count is not zero, a jump (branch) is performed to a destination operand or instruction at a target address, usually the first instruction in the loop. The LOOP instruction does not modify the programmer visible condition code(s) in the flags register. Whether the ECX or CX register is used depends on the size of the address. For 32 bit applications ECX is used and for 16 bit applications CX is used. Thus, the comparison operation will need to check at least 16 and possibly 32 bit locations, which requires significant processing resources, such as the hardware logic needed to perform the actual compare function. The LOOPcc instruction also decrements the (E)CX register and compares the decremented value to zero, but allows the loop to be exited early by checking a condition code in the flags register. In either case, the compare logic is required to compare the decremented value in the (E)CX register with zero. Similarly, the REP instruction(s) will decrement the count register (E)CX and repeat a string operation, e.g. load string, while the value is not equal to zero.
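The decrement-and-test behavior of the LOOP instruction described above can be illustrated with a small software model. This is only a sketch for explanation: the function names `loop_step` and `run_loop` and the `address_size` parameter are hypothetical and do not describe any actual hardware implementation.

```python
def loop_step(count, address_size=32):
    """Model one execution of LOOP: decrement the count register, then test for zero.

    Returns (new_count, branch_taken). The flags register is not modelled,
    since LOOP does not modify the condition codes.
    """
    # ECX is used for 32 bit addressing, CX for 16 bit addressing.
    mask = (1 << address_size) - 1
    new_count = (count - 1) & mask
    # The branch back to the loop target is taken while the count is non-zero.
    return new_count, new_count != 0


def run_loop(initial_count, address_size=32):
    """Return how many times the loop body executes for a given initial count."""
    count = initial_count
    iterations = 0
    while True:
        iterations += 1  # the loop body runs, then LOOP is executed
        count, branch_taken = loop_step(count, address_size)
        if not branch_taken:
            return iterations
```

In this model, `run_loop(3)` executes the body three times, and each pass executes the LOOP instruction once, which is why a conventional front end would fetch and decode the same instruction once per iteration.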
It can be seen that each of these complex instructions must be fetched and may be decoded into multiple micro-ops, or parcels, each time the instructions in the loop are iteratively processed or the string instructions are repeated. Each iteration will require the five (5) stage fetch/decode pipeline to be entered, causing a significant amount of processing resources to be expended.
Thus, in conventional systems a significant amount of the processor resources must be used to fetch and decode the complex x86 instructions. Particularly in the case of repetitive type instructions, a substantial increase in efficiency could be realized if the fetch and decode resources were not required to continuously process the same instructions at the expense of other instructions waiting to be fetched.
Therefore, it can be seen that a need exists for a microprocessor that executes the Intel instruction set architecture and maintains compatibility with software written for the Intel ISA, while efficiently executing those instructions using fewer hardware resources.
In particular, it would be advantageous for a microprocessor to be able to fetch a complex instruction, decode that instruction into its associated RISC micro-operations, and store the micro-ops in an easily accessible memory for later use. Thus, the overhead of continually re-fetching and decoding various complex instructions can be saved and overall microprocessor efficiency increased.
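The general idea of decoding once and reusing the stored micro-ops can be sketched in software as follows. Everything here is an illustrative assumption rather than the disclosed hardware: the `ParcelCache` class, the `toy_decode` function, the choice to index by instruction address, and the micro-op names are all hypothetical, and practical concerns such as finite capacity, replacement and invalidation are omitted.

```python
class ParcelCache:
    """Toy model of a store for decoded instructions (parcels or micro-ops)."""

    def __init__(self, decode):
        self._decode = decode      # stands in for the full decode pipeline
        self._entries = {}         # instruction address -> tuple of micro-ops
        self.decode_count = 0      # how many times the decoder was invoked

    def fetch_micro_ops(self, address, instruction):
        """Return the micro-ops for an instruction, decoding only on a miss."""
        parcels = self._entries.get(address)
        if parcels is None:
            parcels = tuple(self._decode(instruction))
            self._entries[address] = parcels
            self.decode_count += 1
        return parcels


def toy_decode(instruction):
    """Hypothetical decoder: the micro-op names are made up for illustration."""
    if instruction == "LOOP target":
        return ["dec ecx", "branch_if_nonzero target"]
    return [instruction.lower()]


# Four iterations of a loop reach the same LOOP instruction four times,
# but the decoder in this model is invoked only on the first pass.
cache = ParcelCache(toy_decode)
for _ in range(4):
    micro_ops = cache.fetch_micro_ops(0x100, "LOOP target")
```

After the four passes, `cache.decode_count` is 1, which reflects the saving described above: later iterations reuse the stored micro-ops instead of re-entering the decode path.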