1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to memory operand fetching within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. On the other hand, superpipelined microprocessor designs divide instruction execution into a large number of subtasks which can be performed quickly. A pipeline stage is assigned to each subtask. By overlapping the execution of many instructions within the pipeline, superpipelined microprocessors attempt to achieve high performance. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Superscalar microprocessors demand high memory bandwidth due to the number of instructions executed concurrently and due to the increasing clock frequency (i.e. shortening clock cycle) employed by the superscalar microprocessors. Many of the instructions include memory operations to fetch (read) and update (write) memory operands in addition to the operation defined for the instruction. The memory operands must be fetched from or conveyed to memory, and each instruction must originally be fetched from memory as well. Similarly, superpipelined microprocessors demand high memory bandwidth because of the high clock frequency employed by these microprocessors and the attempt to begin execution of a new instruction each clock cycle. It is noted that a given microprocessor design may employ both superscalar and superpipelined techniques in an attempt to achieve the highest possible performance characteristics.
Microprocessors are often configured into computer systems which have a relatively large, relatively slow main memory. Typically, multiple dynamic random access memory (DRAM) modules comprise the main memory system. The large main memory provides storage for a large number of instructions and/or a large amount of data for use by the microprocessor, providing faster access to the instructions and/or data than may be achieved from disk storage, for example. However, the access times of modern DRAMs are significantly longer than the clock cycle length of modern microprocessors. The memory access time for each set of bytes being transferred to the microprocessor is therefore long. Accordingly, the main memory system is not a high bandwidth system. Microprocessor performance may suffer due to a lack of available memory bandwidth.
In order to relieve the bandwidth requirements on the main memory system, microprocessors typically employ one or more caches to store the most recently accessed data and instructions. Caches perform well when the microprocessor is executing programs which exhibit locality of reference. Particularly with respect to data (i.e. memory operands used by instructions), many programs have memory access patterns which exhibit locality of reference. A memory access pattern exhibits locality of reference if a memory operation to a particular byte of main memory indicates that memory operations to other bytes located within the main memory at addresses near the address of the particular byte are likely. Generally, a "memory access pattern" is a set of consecutive memory operations performed in response to a program or a code sequence within a program. The addresses of the memory operations within the memory access pattern may have a relationship to each other. For example, the memory access pattern may or may not exhibit locality of reference.
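By way of illustration only, the notion of locality of reference described above may be sketched in Python as follows. The function name `fraction_near_previous`, the 64-byte `window`, and the example addresses are hypothetical and are chosen solely for this sketch; the measure shown (nearness to the immediately preceding address) is one simple proxy for locality, not a definition taken from the description.

```python
def fraction_near_previous(addresses, window=64):
    """Return the fraction of memory operations whose address lies within
    `window` bytes of the immediately preceding operation's address.

    `window` is an assumed value used only for illustration.
    """
    if len(addresses) < 2:
        return 0.0
    pairs = zip(addresses, addresses[1:])
    near = sum(1 for prev, cur in pairs if abs(cur - prev) < window)
    return near / (len(addresses) - 1)


# A pattern that walks through adjacent 4-byte operands.
sequential = [0x1000 + 4 * i for i in range(16)]

# A pattern whose operands are 4096 bytes apart.
scattered = [0x1000 + 4096 * i for i in range(16)]

sequential_locality = fraction_near_previous(sequential)
scattered_locality = fraction_near_previous(scattered)
```

Under these assumptions, every operation in the sequential pattern falls near its predecessor, while none in the scattered pattern does.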
When programs exhibit locality of reference, cache hit rates (i.e. the percentage of memory operations for which the requested byte or bytes are found within the caches) are high and the bandwidth required from the main memory is correspondingly reduced. When a memory operation misses in the cache, the cache line (i.e. a block of contiguous data bytes) including the accessed data is fetched from main memory and stored into the cache. A different cache line may be discarded from the cache to make room for the newly fetched cache line.
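The behavior described above may be sketched, by way of example only, with a small cache model in Python. The class name `LineCache`, the fully associative organization, the least-recently-used replacement choice, and the sizes (four lines of 32 bytes) are assumptions made for this sketch and are not drawn from the description.

```python
from collections import OrderedDict


class LineCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line number -> placeholder, LRU first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Perform one memory operation; return True on a cache hit."""
        line = address // self.line_size
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        # Miss: the whole line containing the address is fetched.
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # discard the LRU line
        self.lines[line] = True
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LineCache(num_lines=4, line_size=32)
for address in range(0, 128, 4):  # 32 consecutive 4-byte operands
    cache.access(address)
```

In this sketch, each 32-byte line is fetched once on its first access and then serves the next seven operations, so only 4 of the 32 operations require main memory.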
Unfortunately, certain code sequences (for example, certain loops) within a program may have a memory access pattern which does not exhibit locality of reference or which may otherwise hamper the ability of the cache to relieve the bandwidth required from the main memory. For example, code sequences may access a datum once and not return to access that datum or other data within the same cache line as the datum (in other words, the code sequence may not exhibit locality of reference). If the datum misses in the cache, the cache line containing the datum is fetched from main memory and stored into the cache. Another cache line of data which may be accessed again in the future may be discarded from the cache to store the newly fetched cache line, even though the newly fetched cache line is not going to be accessed again in the near future.
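The displacement effect described above may be illustrated with the following Python sketch. The function name `count_misses_lru`, the least-recently-used replacement choice, the cache dimensions, and the address values are hypothetical and serve only to make the example concrete.

```python
from collections import OrderedDict


def count_misses_lru(addresses, num_lines=4, line_size=32):
    """Count misses in a hypothetical fully associative LRU cache."""
    lines = OrderedDict()  # line number -> placeholder, LRU first
    misses = 0
    for address in addresses:
        line = address // line_size
        if line in lines:
            lines.move_to_end(line)  # hit: mark as most recently used
            continue
        misses += 1
        if len(lines) >= num_lines:
            lines.popitem(last=False)  # discard a line to make room
        lines[line] = True
    return misses


# Four cache lines that the code sequence returns to.
reused = [0, 32, 64, 96]

# Four data, each in its own cache line, each accessed exactly once.
accessed_once = [0x10000 + 32 * i for i in range(4)]

misses_without_stream = count_misses_lru(reused + reused)
misses_with_stream = count_misses_lru(reused + accessed_once + reused)
```

Under these assumptions, the once-accessed data displace all four reused lines, so the second pass over the reused lines misses on every access.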
In other cases, a microprocessor may be configured to convey a write memory operation which misses the cache to the main memory for storage without allocating storage in the cache for the cache line corresponding to the write memory operation. However, the cache line to which the write memory operation is directed may be accessed again within the code sequence. Since the cache line is not allocated and stored into the cache, the subsequent accesses also miss the cache.
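The effect of not allocating a cache line on a write miss may be sketched, for illustration only, as follows. The function name `count_misses_write_policy`, the flag `allocate_on_write_miss`, the unbounded cache capacity, and the operation list are assumptions of this sketch rather than features of any particular microprocessor.

```python
def count_misses_write_policy(operations, line_size=32,
                              allocate_on_write_miss=False):
    """Count cache misses for a list of (kind, address) operations.

    Capacity is treated as unbounded so that only the allocation
    policy, and not replacement, affects the result.
    """
    cached_lines = set()
    misses = 0
    for kind, address in operations:
        line = address // line_size
        if line in cached_lines:
            continue  # hit
        misses += 1
        # Reads always allocate; writes allocate only if the flag is set.
        if kind == "read" or allocate_on_write_miss:
            cached_lines.add(line)
    return misses


# Four operations directed to the same 32-byte cache line.
operations = [
    ("write", 0x100),
    ("write", 0x104),
    ("read", 0x108),
    ("read", 0x10C),
]

misses_no_allocate = count_misses_write_policy(operations)
misses_allocate = count_misses_write_policy(
    operations, allocate_on_write_miss=True)
```

In this sketch, the policy that does not allocate on a write miss incurs three misses on the same cache line, whereas allocating on the first write miss incurs one.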