The present invention relates to microprocessors and more particularly to a mechanism for updating a program counter value of a microprocessor.
Microprocessors are processors which are implemented on one or a very small number of semiconductor chips. Advances in semiconductor chip technology continue to increase circuit densities and speeds within microprocessors; however, the interconnection between the microprocessor and external memory remains constrained by packaging technology. Although on-chip interconnections are extremely cheap, off-chip connections are very expensive. Any technique intended to improve microprocessor performance must take advantage of increasing circuit densities and speeds while remaining within the constraints of packaging technology and the physical separation between the processor and its external memory. While increasing circuit densities provide a path to ever more complex designs, the operation of the microprocessor must remain simple and clear so that users can understand how to use it.
While the majority of existing microprocessors are targeted toward scalar computation, superscalar microprocessors are the next logical step in the evolution of microprocessors. The term superscalar describes a computer implementation that improves performance by executing scalar instructions concurrently. Scalar instructions are the type of instructions typically found in general-purpose microprocessors. Using today's semiconductor processing technology, a single processor chip can incorporate high-performance techniques that were once applicable only to large-scale scientific processors. However, many of the techniques applied to large-scale processors are either inappropriate for scalar computation or too expensive to be applied to microprocessors.
A microprocessor runs application programs. An application program comprises a group of instructions. In running the application program, the processor fetches and executes the instructions in some sequence. There are several steps involved in executing even a single instruction, including fetching the instruction, decoding it, assembling its operands, performing the operations specified by the instruction, and writing the results of the instruction to storage. The execution of instructions is controlled by a periodic clock signal. The period of the clock signal is the processor cycle time.
The time taken by a processor to complete a program is determined by three factors: the number of instructions required to execute the program; the average number of processor cycles required to execute an instruction; and the processor cycle time. Processor performance is improved by reducing the time taken by the processor to complete the program, which requires reducing one or more of these factors.
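The three factors combine multiplicatively: run time = instruction count × average cycles per instruction × cycle time. A minimal sketch of that product follows; the instruction count, cycles-per-instruction values and 10 ns cycle time are hypothetical figures chosen only to show how lowering one factor shortens the run time.

```python
def execution_time_seconds(instruction_count: int,
                           cycles_per_instruction: float,
                           cycle_time_seconds: float) -> float:
    """Program run time as the product of the three factors described above."""
    return instruction_count * cycles_per_instruction * cycle_time_seconds


# Hypothetical figures for illustration only.
baseline = execution_time_seconds(1_000_000, 4.0, 10e-9)   # 10 ns cycle time
pipelined = execution_time_seconds(1_000_000, 1.5, 10e-9)  # lower CPI, same cycle time
print(f"baseline:  {baseline * 1e3:.1f} ms")
print(f"pipelined: {pipelined * 1e3:.1f} ms")
```

Because the factors multiply, halving any one of them halves the run time, provided the other two do not grow in exchange.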
One way to improve the performance of a microprocessor is to overlap the steps of different instructions, using a technique called pipelining. To pipeline instructions, the various steps of instruction execution are performed by independent units called pipeline stages. Pipeline stages are separated by clocked registers. The steps of different instructions are executed independently in different pipeline stages. By overlapping instructions, and thus permitting the processor to handle more than one instruction at a time, pipelining reduces the effective number of cycles required to execute an instruction, though not the total time required to execute any single instruction. This is achieved without increasing, and often while decreasing, the processor cycle time. Pipelining typically reduces the average number of cycles per instruction by as much as a factor of three. However, when a branch instruction is executed, the pipeline may stall until the outcome of the branch is known and the correct instruction is fetched for execution. This delay is known as the branch-delay penalty. Increasing the number of pipeline stages also typically increases the branch-delay penalty relative to the average number of cycles per instruction.
A pipelined scalar microprocessor completes at most one instruction per processor cycle. A superscalar processor reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar processor by allowing concurrent execution of instructions in the same pipeline stage as well as concurrent execution of instructions in different pipeline stages. The term superscalar emphasizes multiple concurrent operations on scalar quantities as distinguished from multiple concurrent operations on vectors or arrays as is common in scientific computing.
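The overlap described above can be quantified with the idealized textbook cycle count for a k-stage pipeline: the first instruction needs k cycles to fill the pipeline and each later instruction completes one cycle after its predecessor. The sketch below assumes one cycle per stage and models stalls, such as branch-delay penalties, simply as extra bubble cycles; the function names are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int,
                     stall_cycles: int = 0) -> int:
    # The first instruction takes num_stages cycles to fill the pipeline;
    # each later instruction completes one cycle after its predecessor.
    # stall_cycles stands in for bubbles such as branch-delay penalties.
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


n, k = 100, 5
print("unpipelined:", unpipelined_cycles(n, k))
print("pipelined:  ", pipelined_cycles(n, k))
# Hypothetical: 10 taken branches, each costing a 2-cycle branch-delay penalty.
print("with stalls:", pipelined_cycles(n, k, stall_cycles=10 * 2))
```

Each instruction still spends k cycles in the pipeline, which is why pipelining lowers the effective cycles per instruction without shortening any single instruction's latency.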
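Extending the same idealized model, a processor that can place up to `width` instructions in each pipeline stage per cycle moves instructions through the pipeline in groups. The sketch below assumes every instruction is independent, which is exactly the assumption the next paragraph questions; under it, the average cycles per instruction approaches 1/width rather than 1. The function name and the width parameter are hypothetical.

```python
import math


def superscalar_cycles(num_instructions: int, num_stages: int, width: int) -> int:
    # Ideal case: up to `width` independent instructions enter each stage per
    # cycle, so groups of `width` instructions move through the pipeline together.
    if num_instructions == 0:
        return 0
    groups = math.ceil(num_instructions / width)
    return num_stages + groups - 1


for w in (1, 2, 4):
    cycles = superscalar_cycles(1000, 5, w)
    print(f"width {w}: {cycles} cycles, CPI {cycles / 1000:.3f}")
```

With a width of 1 this reduces to the pipelined scalar case, which can complete at most one instruction per cycle.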
While superscalar processors are conceptually simple, there is more to achieving increased performance than widening a processor's pipeline. Widening the pipeline makes it possible to execute more than one instruction per cycle, but there is no guarantee that any given sequence of instructions can take advantage of this capability. Instructions are not independent of one another but are interrelated; these interrelationships prevent some instructions from occupying the same pipeline stage. Furthermore, the processor's mechanisms for decoding and executing instructions can make a big difference in its ability to discover instructions that can be executed simultaneously.
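One common interrelationship is a register dependence: an instruction that reads a register cannot share a stage with an earlier instruction that writes it. The sketch below is a hypothetical in-order grouping rule for a two-wide pipeline that checks only read-after-write and write-after-write register conflicts; it ignores other constraints such as functional-unit conflicts, memory dependences and branches. The `Instr` record and the three-operand `add` syntax are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str              # human-readable form, used only for printing
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


def conflicts(earlier: Instr, later: Instr) -> bool:
    # Read-after-write: the later instruction needs the earlier one's result.
    raw = earlier.dest is not None and earlier.dest in later.srcs
    # Write-after-write: both instructions write the same register.
    waw = earlier.dest is not None and earlier.dest == later.dest
    return raw or waw


def issue_groups(program: List[Instr], width: int = 2) -> List[List[Instr]]:
    """Greedily group instructions, in order, that may share a pipeline stage."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in program:
        full = len(current) == width
        blocked = any(conflicts(prev, instr) for prev in current)
        if current and (full or blocked):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("add r4, r1, r5", "r4", ("r1", "r5")),  # reads r1: depends on previous
    Instr("add r6, r7, r8", "r6", ("r7", "r8")),  # independent of the r4 write
    Instr("add r9, r6, r1", "r9", ("r6", "r1")),  # reads r6: depends on previous
]
for slot, group in enumerate(issue_groups(program)):
    print(slot, [i.text for i in group])
```

Here four instructions need three issue groups on a two-wide pipeline, so the dependences alone keep the processor from reaching its ideal rate.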
Superscalar techniques largely concern the processor organization independent of the instruction set and other architectural features. Thus, one of the attractions of superscalar techniques is the possibility of developing a processor that is code compatible with an existing architecture. Many superscalar techniques apply equally well to either reduced instruction set computer (RISC) or complex instruction set computer (CISC) architectures. However, because many RISC architectures are highly regular, superscalar techniques were first applied to RISC processor designs.
The program counter (PC), also called an instruction pointer (IP), holds the memory address of instructions as they are fetched from memory and executed. The mechanism that maintains and updates the program counter value, itself commonly referred to as the program counter, includes an incrementer, a selector and a register. As each instruction is fetched and decoded, the incrementer forms the address of the next sequential instruction by adding the byte length of the current instruction to the current program counter value, and this next sequential address is placed in the register. When a branch is taken, the selector chooses the address of the target instruction instead of the incremented value, and this target address is placed in the register.
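The incrementer, selector and register described above can be sketched as a small software model. This is an illustrative sketch, not a hardware description: the class and method names are hypothetical, and the 32-bit wrap-around address width is an assumption.

```python
from typing import Optional

ADDRESS_MASK = 0xFFFF_FFFF  # assumption: 32-bit addresses that wrap around


class ProgramCounter:
    """Incrementer, selector and register modeled as one object (illustrative only)."""

    def __init__(self, start: int) -> None:
        self.register = start & ADDRESS_MASK

    def advance(self, instr_length: int, branch_target: Optional[int] = None) -> int:
        # Incrementer: current PC plus the byte length of the current instruction.
        incremented = (self.register + instr_length) & ADDRESS_MASK
        # Selector: a taken branch supplies its target instead of the incremented value.
        selected = incremented if branch_target is None else branch_target & ADDRESS_MASK
        # Register: latch the selected address as the next fetch address.
        self.register = selected
        return selected


pc = ProgramCounter(0x1000)
print(hex(pc.advance(3)))                        # next sequential address
print(hex(pc.advance(2, branch_target=0x2000)))  # taken branch
print(hex(pc.advance(4)))                        # sequential again from the target
```

Passing the instruction's byte length, rather than a fixed step, reflects the variable-length instructions implied by the description.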
The program counter value serves two purposes. The program counter value provides the memory address of the next instruction to be fetched and executed. The program counter value also identifies the address of an instruction that encountered a problem which halted the execution of the instruction stream. This address may be used for debugging purposes or for possible continuation of execution of the instruction stream after corrective action is taken.
In a pipelined microprocessor, the program counter value is maintained at the beginning of the pipeline, where it provides the instruction fetch address; this value is referred to as the fetch PC value. The fetch PC value points to instructions entering the pipeline. As instructions propagate along the pipeline stages, subsequent instructions are fetched and placed in the pipeline. Accordingly, the fetch PC value does not correspond to instructions in any stage of the pipeline other than the first. Because most problems that stop the execution of an instruction stream tend to be detected near or at the end of the pipeline rather than at the beginning, the program counter value for an instruction must be maintained while the instruction executes; this value is referred to as the execute PC value.
Two methods are known for maintaining the execute PC value. In the first method, the PC value of an instruction accompanies the instruction down the pipeline. With this method, each pipeline stage requires additional storage to hold the execute PC value, so the amount of additional storage required is proportional to the number of pipeline stages. The second method duplicates the PC circuit at the end of the pipeline. In this method, only the length information of the instruction accompanies the instruction in the pipeline. As each non-branch instruction completes, its length value is added to the execute PC value to provide the execute PC value for the next instruction. As each branch instruction completes, the target address of the branch, rather than the incremented value, is provided as the execute PC value.
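The mismatch between the fetch PC value and the instructions deeper in the pipeline can be seen in a short simulation. The sketch assumes a hypothetical four-stage pipeline that fetches one instruction per cycle, takes no branches and never stalls; the instruction lengths and start address are made up. Each cycle it reports the fetch PC alongside the address of the instruction in the last stage.

```python
from collections import deque
from typing import List, Optional, Tuple

NUM_STAGES = 4  # hypothetical pipeline depth


def simulate_fetch(lengths: List[int], start_pc: int = 0x100
                   ) -> List[Tuple[int, Optional[int]]]:
    """Per cycle, return (fetch PC, address of the instruction in the last stage)."""
    fetch_pc = start_pc
    # stages[0] is the first stage; stages[-1] is the last stage.
    stages: deque = deque([None] * NUM_STAGES, maxlen=NUM_STAGES)
    trace = []
    for length in lengths:
        # A new instruction enters the first stage; because maxlen is set,
        # the instruction leaving the last stage drops off the right end.
        stages.appendleft(fetch_pc)
        fetch_pc += length  # sequential fetch only; no branches in this sketch
        trace.append((fetch_pc, stages[-1]))
    return trace


for fetch_pc, last_stage in simulate_fetch([2, 3, 1, 4, 2]):
    last = "empty" if last_stage is None else hex(last_stage)
    print(f"fetch PC {fetch_pc:#x}, last stage holds {last}")
```

Once the pipeline is full, a problem detected in the last stage belongs to an instruction several fetches behind the fetch PC value, which is why a separate execute PC value is needed.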
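Both methods can be compared in a small model that only tracks the addresses they produce. This is a sketch under stated assumptions: the `Retired` record and the `ExecutePC` class are hypothetical names, a branch that is not taken is treated as a sequential instruction, and a taken branch's target is assumed to be known by the time the branch completes. The first function computes each address at fetch and carries it with the instruction; the second reconstructs it at the end of the pipeline from lengths and branch targets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Retired:
    """What reaches the end of the pipeline for one instruction (hypothetical)."""
    length: int                          # byte length of the instruction
    branch_target: Optional[int] = None  # set only for a taken branch


def addresses_method1(stream: List[Retired], start_pc: int) -> List[int]:
    # First method: the PC is computed at fetch and travels with the instruction,
    # so every pipeline stage needs room for a full address.
    pc, tagged = start_pc, []
    for instr in stream:
        tagged.append(pc)
        pc = instr.branch_target if instr.branch_target is not None else pc + instr.length
    return tagged


class ExecutePC:
    """Second method: a duplicate PC circuit at the end of the pipeline."""

    def __init__(self, start_pc: int) -> None:
        self.value = start_pc  # execute PC of the next instruction to complete

    def complete(self, instr: Retired) -> int:
        """Return the completing instruction's address, then advance."""
        current = self.value
        if instr.branch_target is not None:
            self.value = instr.branch_target   # branch: use the target address
        else:
            self.value = current + instr.length  # non-branch: add the length
        return current


def addresses_method2(stream: List[Retired], start_pc: int) -> List[int]:
    epc = ExecutePC(start_pc)
    return [epc.complete(instr) for instr in stream]


stream = [Retired(2), Retired(3, branch_target=0x200), Retired(1), Retired(4)]
print([hex(a) for a in addresses_method1(stream, 0x100)])
print([hex(a) for a in addresses_method2(stream, 0x100)])
```

Both methods report the same addresses; they differ in what travels down the pipeline. The first carries a full address in every stage, while the second carries only a length field per stage plus one execute PC register at the end.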