In the field of microprocessors, the number of instructions executed per second is a primary performance measure. As is well known in the art, many factors in the design and manufacture of a microprocessor impact this measure. For example, the execution rate depends quite strongly on the clock frequency of the microprocessor. The frequency of the clock applied to a microprocessor is limited, however, by power dissipation concerns and by the switching characteristics of the transistors in the microprocessor.
The architecture of the microprocessor is also a significant factor in its execution rate. For example, many modern microprocessors utilize a "pipelined" architecture to improve their execution rate where many of their instructions require multiple clock cycles for execution. According to conventional pipelining techniques, each microprocessor instruction is segmented into several stages, and separate circuitry is provided to perform each stage of the instruction. The execution rate of the microprocessor is thus increased by overlapping the execution of different stages of multiple instructions in each clock cycle. In this way, one multiple-cycle instruction may be completed in each clock cycle.
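The benefit of overlapping stages can be illustrated with a simple cycle count. The following is a minimal sketch under idealized assumptions (every instruction passes through the same number of single-cycle stages, and no stalls occur); the function names and the example figures are illustrative only and do not come from any particular microprocessor.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction takes n_stages cycles to complete; after that,
    one instruction completes per cycle (idealized: no stalls).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Illustrative figures: 100 instructions through a 5-stage pipeline.
print(unpipelined_cycles(100, 5))  # 500
print(pipelined_cycles(100, 5))    # 104
```

Once the pipeline is full, the cycle count grows by one per additional instruction, which is the sense in which one multiple-cycle instruction completes in each clock cycle.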
By way of further background, some microprocessor architectures are of the "superscalar" type, where multiple instructions are issued in each clock cycle for execution in parallel. Assuming no dependencies among instructions, the increase in instruction throughput is proportional to the degree of scalability, i.e., the number of instructions issued in each clock cycle.
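The proportionality noted above can be sketched with an idealized cycle count. This is an assumption-laden illustration (no dependencies, no resource conflicts, every issue slot filled); the function name `ideal_superscalar_cycles` and the figures used are hypothetical.

```python
import math


def ideal_superscalar_cycles(n_instructions, issue_width, n_stages):
    """Idealized cycle count when issue_width instructions enter the pipeline per cycle.

    Assumes no dependencies among instructions and no stalls.
    """
    if n_instructions == 0:
        return 0
    issue_groups = math.ceil(n_instructions / issue_width)
    return n_stages + (issue_groups - 1)


# Illustrative figures: 100 instructions, 5 stages.
print(ideal_superscalar_cycles(100, 1, 5))  # 104 (scalar pipeline)
print(ideal_superscalar_cycles(100, 2, 5))  # 54  (two instructions issued per cycle)
```

For long instruction sequences the pipeline fill time becomes negligible, so doubling the issue width approximately halves the cycle count.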
Another known technique for improving the execution rate of a microprocessor and the system in which it is implemented is the use of a cache memory. Conventional cache memories are small high-speed memories that store program instructions and data from memory locations which are likely to be accessed in performing later instructions, as determined by a selection algorithm. Since the cache memory can be accessed in a reduced number of clock cycles (often a single cycle) relative to main system memory, the effective execution rate of a microprocessor utilizing a cache is much improved over that of a non-cache system. Many cache memories are located on the same integrated circuit chip as the microprocessor itself, providing further performance improvement.
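The effect of a cache on memory access time can be estimated with a weighted average. The sketch below assumes a single-level cache in which a hit costs `cache_cycles` and a miss costs `memory_cycles`; the hit rate and cycle counts shown are illustrative assumptions, not measured values.

```python
def average_access_cycles(hit_rate, cache_cycles, memory_cycles):
    """Average cycles per memory access for a single-level cache.

    hit_rate is the fraction of accesses served by the cache (0.0 to 1.0).
    A miss is assumed to cost memory_cycles in total.
    """
    return hit_rate * cache_cycles + (1.0 - hit_rate) * memory_cycles


# Illustrative figures: 95% hit rate, 1-cycle cache, 20-cycle main memory.
with_cache = average_access_cycles(0.95, 1, 20)
without_cache = average_access_cycles(0.0, 1, 20)
print(round(with_cache, 2), without_cache)
```

Under these assumed figures the average access drops from 20 cycles to roughly 2 cycles, which is why even a small cache with a high hit rate improves the effective execution rate.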
With each of these architecture-related performance improvement techniques, however, certain events may occur that degrade microprocessor performance. For example, in both the pipelined and the superscalar architectures, multiple instructions may require access to the same internal circuitry at the same time, in which case one of the instructions will have to wait (i.e., "stall") until the priority instruction is serviced by the circuitry.
One such conflict often occurs where one instruction requests a write to memory (including cache) at the same time that another instruction requests a read from the memory. If the instructions are serviced on a "first-come-first-served" basis, the later-arriving instruction will have to wait for the completion of the prior instruction before it is granted memory access. These and other stalls are, of course, detrimental to microprocessor performance.
It has been discovered that, for most instruction sequences (i.e., programs), reads from memory or cache are generally more time-critical than writes to memory or cache, especially where a large number of general-purpose registers are provided in the microprocessor architecture. This is because the instructions and input data are necessary at specific times in the execution of the program in order for the program to execute in an efficient manner; in contrast, since writes to memory are merely writing the result of the program execution, the actual time at which the writing occurs is not as critical since the execution of later instructions may not depend upon the result.
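The difference between first-come-first-served servicing and servicing that favors reads can be illustrated with a small simulation. This is a minimal sketch assuming a single memory port that completes one access per cycle; the function `total_read_wait` and the request trace are hypothetical. It also ignores the case where a read targets an address with a pending write, which a real design would have to detect.

```python
def total_read_wait(requests, reads_first):
    """Sum of cycles that read requests spend waiting for a single memory port.

    requests: list of (arrival_cycle, kind) sorted by arrival; kind is 'R' or 'W'.
    reads_first: if True, a pending read is served before any pending write;
                 if False, requests are served strictly in arrival order.
    """
    pending = []
    next_request = 0
    cycle = 0
    wait = 0
    while next_request < len(requests) or pending:
        # Collect every request that has arrived by this cycle.
        while next_request < len(requests) and requests[next_request][0] <= cycle:
            pending.append(requests[next_request])
            next_request += 1
        if pending:
            index = 0
            if reads_first:
                # Oldest pending read, or the oldest request if no read is pending.
                index = next((k for k, r in enumerate(pending) if r[1] == 'R'), 0)
            arrival, kind = pending.pop(index)
            if kind == 'R':
                wait += cycle - arrival
        cycle += 1
    return wait


# Two writes arrive just ahead of two reads.
trace = [(0, 'W'), (0, 'W'), (0, 'R'), (1, 'R')]
print(total_read_wait(trace, reads_first=False))  # 4
print(total_read_wait(trace, reads_first=True))   # 0
```

In this trace the reads stall for a total of four cycles behind the writes under first-come-first-served ordering, and not at all when reads are favored; the writes are simply completed later, when the port is otherwise idle.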
By way of further background, write buffers have been provided in microprocessors, such write buffers being logically located between on-chip cache memory and the bus to main memory. These conventional post-cache write buffers receive data from the cache for a write-through or write-back operation; the contents of the post-cache write buffer are written to main memory under the control of the bus controller, at times when the bus becomes available.
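The behavior of such a post-cache write buffer can be sketched as a bounded first-in-first-out queue that is drained only when the bus is free. The class name `PostCacheWriteBuffer`, its methods, and the buffer depth are assumptions made for illustration; main memory is modeled as a plain dictionary.

```python
from collections import deque


class PostCacheWriteBuffer:
    """Bounded FIFO holding writes from the cache until the bus is available."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()

    def post(self, address, data):
        """Accept a write from the cache; return False if the buffer is full."""
        if len(self.entries) >= self.depth:
            return False  # the cache would have to stall in this case
        self.entries.append((address, data))
        return True

    def drain_one(self, main_memory, bus_available):
        """Write the oldest entry to main memory if the bus is free."""
        if bus_available and self.entries:
            address, data = self.entries.popleft()
            main_memory[address] = data
            return True
        return False


memory = {}
buffer = PostCacheWriteBuffer(depth=2)
buffer.post(0x100, 7)
buffer.post(0x104, 9)
buffer.drain_one(memory, bus_available=False)  # bus busy: nothing is written
buffer.drain_one(memory, bus_available=True)   # oldest entry reaches memory
print(memory)  # {256: 7}
```

The cache can continue operating while entries wait in the buffer; it stalls only if the buffer fills before the bus controller has drained it.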
By way of further background, some pipelined architecture microprocessors operate according to speculative execution in order to keep the pipeline full despite conditional branch or jump instructions being present in the program sequence. Speculative execution requires that predictive branching be performed, where the microprocessor predicts whether the conditional branch will be taken or not taken according to an algorithm; the predicted path is then executed in the pipeline. It is important that the results of speculatively executed instructions not be written to memory or cache, because if the prediction is incorrect, it may be difficult or impossible to recover from the incorrectly performed memory write.
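One way to keep speculative results out of memory is to hold writes in a buffer and tag each with the number of unresolved branch predictions it depends on. The sketch below is an illustrative assumption rather than a description of any specific design: the class `SpeculativeWriteBuffer`, its method names, and the tagging scheme are hypothetical, and memory is modeled as a dictionary.

```python
class SpeculativeWriteBuffer:
    """Holds writes until every branch prediction they depend on is resolved."""

    def __init__(self):
        self.entries = []  # each entry is [address, data, speculation_level]
        self.level = 0     # number of currently unresolved predicted branches

    def predict_branch(self):
        """Begin executing down a predicted path."""
        self.level += 1

    def write(self, address, data):
        """Buffer a write, tagged with the current speculation level (0 = not speculative)."""
        self.entries.append([address, data, self.level])

    def resolve_oldest_branch(self, predicted_correctly):
        """Resolve the oldest outstanding prediction."""
        if self.level == 0:
            raise ValueError("no unresolved branch")
        if predicted_correctly:
            # Every speculative write now depends on one fewer prediction.
            for entry in self.entries:
                if entry[2] > 0:
                    entry[2] -= 1
            self.level -= 1
        else:
            # All writes issued after the mispredicted branch are discarded.
            self.entries = [e for e in self.entries if e[2] == 0]
            self.level = 0

    def retire(self, memory):
        """Write non-speculative entries to memory in program order."""
        while self.entries and self.entries[0][2] == 0:
            address, data, _ = self.entries.pop(0)
            memory[address] = data


memory = {}
wb = SpeculativeWriteBuffer()
wb.write(0x10, 1)      # not speculative
wb.predict_branch()
wb.write(0x20, 2)      # depends on one prediction
wb.predict_branch()
wb.write(0x30, 3)      # depends on two predictions
wb.resolve_oldest_branch(predicted_correctly=True)
wb.resolve_oldest_branch(predicted_correctly=False)
wb.retire(memory)
print(memory)  # {16: 1, 32: 2}
```

Because the write to 0x30 never left the buffer, recovering from the misprediction requires only discarding the tagged entries; memory never holds a wrong-path value.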
Another type of situation can occur where instructions, including writes to memory, are processed in a pipeline, and an earlier instruction has an exception condition (e.g., divide-by-zero) for which the program execution should be immediately stopped. In such a case, memory writes requested by later instructions already in the pipeline should not be performed.
It is an object of the present invention to provide a microprocessor architecture which buffers the writing of data from the CPU core into a write buffer, prior to retiring of the data to a cache, and in which recovery from speculative execution or exceptions can be readily performed.
It is a further object of the present invention to provide such an architecture which prevents the writing of data to memory during a speculative execution sequence.
It is a further object of the present invention to provide such an architecture which allows for multiple degrees of speculative execution.
Other objects and advantages of the present invention will be apparent to those of ordinary skill in the art having reference to the following specification in combination with the drawings.