A. Field of Invention
The present invention pertains generally to computer systems and more particularly to the manner of executing successive instructions in a computer system.
B. Description of the Background
Digital computer systems operate by executing a series of successive instructions that control the functions of the computer system. Typically, a new instruction step is initiated on each cycle of the processor, so that several sequential instructions are being executed by the processor simultaneously. This process of executing a series of successive instruction steps simultaneously is known as a "pipeline." For any given set of successive instruction steps, the instruction currently being executed is referred to as the "current instruction step," and each subsequent instruction is identified by its position relative to the current instruction step. For example, the instruction executed immediately after the current instruction is referred to as the "first subsequent instruction step." If that instruction step is a "load instruction," it is referred to as the "first subsequent load instruction step." Similarly, if it is an "add and branch if true" instruction, it may be referred to as the "first subsequent branch instruction step." The instruction step after the first subsequent instruction step is referred to as the "second subsequent instruction step," and so on. If, for any reason, an instruction step must be skipped in the series of successive instruction steps, the pipeline is said to incur a "penalty of a one instruction step delay." This can occur, for example, when a load instruction needs data from a previous load instruction that has not yet finished retrieving the necessary data from a data cache. More generally, instruction step delays may occur when the "operand address" of a subsequent instruction, i.e., the general register address from which that instruction retrieves its data, is the same as the "target address" of a previous instruction, i.e., the address to which the previous instruction stores or writes its data.
If the previous instruction has not yet stored its data in the general registers at the target address, and the target address is the same as the operand address of a subsequent instruction, then the subsequent instruction cannot yet be executed and a delay penalty is imposed.
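The operand/target comparison described above can be sketched in Python as a simple interlock check. This is a minimal, illustrative model rather than the mechanism of any particular processor: the `Instr` record and the `load_use_penalty` function are hypothetical names, registers are modeled as numbered integers, and the timing rule (data fetched by a load becomes usable `cache_cycles + 1` instruction steps after the load) is an assumption chosen to reproduce the penalties the text gives for one-cycle and two-cycle data caches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for illustration."""
    op: str                         # e.g. "load", "add"
    target: Optional[int]           # register written (the "target address")
    operands: Tuple[int, ...] = ()  # registers read (the "operand addresses")


def load_use_penalty(prev: Instr, nxt: Instr, distance: int, cache_cycles: int = 1) -> int:
    """Return the number of instruction steps lost before `nxt` can execute.

    `distance` is 1 if `nxt` is the first subsequent instruction after `prev`,
    2 if it is the second subsequent instruction, and so on.
    Assumed timing: data loaded by `prev` is usable by the instruction
    `cache_cycles + 1` steps later.
    """
    if distance < 1 or cache_cycles < 1:
        raise ValueError("distance and cache_cycles must be at least 1")
    if prev.op != "load" or prev.target is None:
        return 0  # only loads are modeled as producing late data
    if prev.target not in nxt.operands:
        return 0  # operand address differs from target address: no hazard
    return max(0, cache_cycles + 1 - distance)


# A load into r3 followed by a load whose address comes from r3.
ld1 = Instr("load", target=3, operands=(1,))
ld2 = Instr("load", target=4, operands=(3,))
print(load_use_penalty(ld1, ld2, distance=1))                  # 1
print(load_use_penalty(ld1, ld2, distance=2))                  # 0
print(load_use_penalty(ld1, ld2, distance=1, cache_cycles=2))  # 2
```

In hardware this comparison is made on register fields every cycle; the function only reproduces the step counts, showing that the penalty disappears once the dependent instruction is far enough behind the load or reads a different register.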
A similar situation occurs during a "branch instruction," i.e., an instruction that directs the processor to branch to a new location rather than continue along the sequence of successive instructions that it would otherwise follow. When a branch instruction is executed, it generates a new address from which the next instruction is retrieved. The processor requires a certain period of time to retrieve the branch instruction, decode it, and determine the new address to which it branches. Since these functions take several cycles to accomplish, a number of instruction step delays may be imposed on the pipeline. Typically, the branch instruction is placed in the pipeline so that the first subsequent instruction after the branch instruction is an instruction that would normally be executed in the series of successive instruction steps, so that useful work is performed while the branch address is being determined. A pipeline designed in this manner is referred to as a "one state delay branch architecture." Hence, if the branch address is not ready until the third subsequent instruction step, only the second subsequent instruction step is lost, and a penalty of only a one instruction step delay is imposed in a one state delay branch architecture.
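The step counting behind the one state delay branch architecture can be expressed as a short Python function. It is a sketch under stated assumptions: `branch_penalty` and its parameters are hypothetical names, and the model assumes that every subsequent step before the branch address becomes usable is either a delay slot filled with a useful instruction or a lost step.

```python
def branch_penalty(address_ready_step: int, delay_slots: int = 1) -> int:
    """Return the number of instruction steps lost after a branch.

    address_ready_step: the subsequent step (1 = first subsequent) that can
        first be fetched from the branch address.
    delay_slots: subsequent steps filled with instructions that execute
        anyway (1 in a one state delay branch architecture).
    """
    if address_ready_step < 1 or delay_slots < 0:
        raise ValueError("address_ready_step must be >= 1 and delay_slots >= 0")
    # Steps 1 .. address_ready_step - 1 occur before the branch address is
    # usable; delay slots fill some of them and the rest are lost.
    return max(0, address_ready_step - 1 - delay_slots)


print(branch_penalty(3))                 # 1: one state delay branch, address ready at step 3
print(branch_penalty(3, delay_slots=0))  # 2: same timing without a delay slot
print(branch_penalty(4))                 # 2: address ready one step later
```

The last call assumes that a two-cycle instruction cache delays the branch address by one further step, to the fourth subsequent instruction step; under that assumption it reproduces the two instruction step penalty reported for a two-cycle instruction cache.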
In designing a microprocessor chip, it is advantageous to have the chip operate as fast as possible, executing instructions and processing information in the shortest possible time. In achieving this goal, it is desirable to minimize the penalties for instruction step delays; otherwise, the time gained by decreasing the cycle time may be lost to instruction step delays. One method that has been used to significantly decrease the access time for retrieving instructions or data has been to use instruction caches and data caches, which comprise high speed memory connected directly between the microprocessor chip and slower memory such as dynamic random access memory (DRAM). These cache memories can comprise high speed flip-flops that are expensive and occupy a relatively large amount of space in the system for the amount of data stored. Despite these limitations, cache memory has a relatively fast response time, typically an access time on the order of 15 to 30 nanoseconds or more, which allows data or instructions to be fetched more rapidly than from DRAM. Access time, in this regard, may include the full cycle time from issuing an address until an instruction is received back by the processor and decoded.
Although cache memory in the form of instruction cache and data cache is significantly faster than DRAM, it still limits the speed at which a microprocessor can operate. Conventional microprocessor architectures are designed for an instruction cache and data cache access time of one cycle. With a one cycle response time, a penalty of only one instruction step delay is imposed for subsequent load instructions, i.e., the data is available for the second subsequent load instruction. Similarly, a penalty of only one instruction step delay is imposed for branch instructions in a one state delay branch architecture, i.e., the branch address is available for the third subsequent instruction step. The problem with allowing only a one cycle response time to fetch instructions from the instruction cache and data from the data cache is that the cycle period of the processor must be long enough to accommodate the response time of the instruction cache and the data cache. Many other steps, however, can be executed in the microprocessor in a much shorter period than the response time of the instruction cache and data cache memories. Hence, the entire system must be slowed down to operate at the slower speed dictated by the caches. With currently available instruction caches and data caches, typical microprocessors presently operate in the range of 15 to 20 megahertz, and the limit of operation using current pipeline techniques may be on the order of 40 megahertz.
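The relationship between cache access time and clock rate described above is simple arithmetic: if a cache access must complete within `n` cycles, the cycle period can be no shorter than the access time divided by `n`. The Python sketch below uses a hypothetical function name, `max_clock_mhz`, and ignores all other logic delays, so it gives an upper bound rather than an achievable frequency. The 25 ns access time in the example is an assumption, chosen because it corresponds to the 40 megahertz limit mentioned in the text.

```python
def max_clock_mhz(access_time_ns: float, cycles_allowed: int = 1) -> float:
    """Upper bound on clock frequency (MHz) when a cache access of
    `access_time_ns` nanoseconds must fit in `cycles_allowed` cycles."""
    if access_time_ns <= 0 or cycles_allowed < 1:
        raise ValueError("access time must be positive and cycles_allowed >= 1")
    min_period_ns = access_time_ns / cycles_allowed  # shortest allowable cycle
    return 1000.0 / min_period_ns                      # 1/ns is GHz; x1000 gives MHz


print(max_clock_mhz(25.0))                    # 40.0 (one-cycle access)
print(max_clock_mhz(25.0, cycles_allowed=2))  # 80.0 (two-cycle access)
print(max_clock_mhz(50.0))                    # 20.0
```

Doubling `cycles_allowed` doubles the bound, which is the speed-up a two-cycle cache access aims for; the additional pipeline penalties such an approach incurs are not captured by this calculation.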
One method considered by the inventors of the present invention to overcome these limitations, but not known to be prior art, was to allow two cycle periods for the instruction cache and data cache access time while performing the other operations in a one cycle time period. Although this allows the microprocessor to operate at a much faster rate, i.e., double the rate at which it could operate if the cache memories were allowed an access time of only one cycle, penalties of more than one instruction step delay were imposed on the pipeline, so that the approach did not yield a net decrease in the response time of the microprocessor system. For example, when executing load instructions, a penalty of a two instruction step delay was imposed when the data cache access time was set at two cycles. Similarly, for branch instructions, a penalty of a two instruction step delay was imposed in a one state delay branch architecture when the instruction cache access time was set at two cycles.
Hence, prior art microprocessors have been limited by the response time of cache memory, and allowing a two cycle response time for cache memory does not solve this problem, since additional instruction step delay penalties are then imposed on the pipeline.