1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to an apparatus for maintaining instruction cache coherency.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
In order to further increase performance, superscalar microprocessors typically include one or more caches for storing instructions and data. A cache is a storage device configured onto the same semiconductor substrate as the microprocessor, or coupled nearby. The cache may be accessed more quickly than a main memory system coupled to the microprocessor. Generally speaking, a cache stores data and instructions from the main memory system in blocks referred to as cache lines. A cache line comprises a plurality of contiguous bytes. The contiguous bytes are typically aligned in main memory such that the first of the contiguous bytes resides at an address having a certain number of low order bits set to zero. The certain number of low order bits is sufficient to uniquely identify each byte within the cache line. The remaining bits of the address form a tag which may be used to refer to the entire cache line. As used herein, the term "address" refers to a value indicative of the storage location within main memory corresponding to one or more bytes of information.
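The address decomposition described above can be illustrated with a minimal sketch. The 32-byte line size and the names `LINE_SIZE`, `OFFSET_BITS`, `split_address`, and `line_base` are assumptions chosen for illustration only; the description does not specify them. The sketch follows the description in treating all bits above the byte offset as the tag.

```python
# Illustrative only: the 32-byte line size is an assumption, not a value from the text.
LINE_SIZE = 32                            # bytes per cache line (power of two)
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # low-order bits that select a byte in the line


def split_address(address):
    """Split a byte address into (tag, offset within the cache line)."""
    offset = address & (LINE_SIZE - 1)    # low-order bits identify the byte
    tag = address >> OFFSET_BITS          # remaining bits refer to the whole line
    return tag, offset


def line_base(address):
    """Address of the first byte of the line; its low-order offset bits are zero."""
    return address & ~(LINE_SIZE - 1)
```

Under these assumptions, two addresses lie in the same cache line exactly when `split_address` returns the same tag for both.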
Microprocessors may be configured with a single cache which stores both instructions and data, but are more typically configured with separate instruction and data caches. The caches are typically designed to be coherent with respect to main memory. In particular, coherency requires that when bytes stored in main memory are modified, the modified bytes are conveyed in response to subsequent accesses to those bytes. The modified bytes are conveyed in response to subsequent accesses even if the bytes were stored into the cache prior to the modifications. Modifications may be performed by the microprocessor, or may be performed by another microprocessor or device coupled into a computer system with the microprocessor.
Modifications performed by external devices (i.e. devices outside of the microprocessor) are often detected by "snooping". Snooping refers to a process in which the microprocessor compares addresses presented to the main memory system to the tag addresses representing bytes stored in the caches. If a match occurs during snooping, the cache line is updated according to the nature of the main memory access. For example, the cache line may be invalidated in the cache upon detection of a modification of bytes within the cache line. A subsequent access to the cache line causes the modified bytes to be fetched from the main memory system. It is noted that the snooping address comparison is typically performed on a cache line basis (i.e. only that portion of the addresses which uniquely identifies the cache line affected by the main memory access is compared).
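The snoop-and-invalidate behavior described above may be sketched in software as follows. This is a simplified model and not a description of any particular hardware: the class `SnoopedCache`, its methods, and the assumed 32-byte line size (5 offset bits) are hypothetical names and values introduced for illustration.

```python
OFFSET_BITS = 5  # assumed: 32-byte cache lines, so 5 low-order offset bits


class SnoopedCache:
    """Toy cache model keyed by line tag (hypothetical, for illustration only)."""

    def __init__(self):
        self.lines = {}  # maps line tag -> bytes held for that line

    def fill(self, address, data):
        """Store a line fetched from main memory."""
        self.lines[address >> OFFSET_BITS] = data

    def contains(self, address):
        """True if the line holding this address is currently cached."""
        return (address >> OFFSET_BITS) in self.lines

    def snoop(self, address, is_write):
        """Compare a main memory access against the cached tags.

        Only the line-identifying portion of the address is compared.
        On a write to a cached line, the line is invalidated.
        Returns True if a line was invalidated.
        """
        tag = address >> OFFSET_BITS
        if is_write and tag in self.lines:
            del self.lines[tag]  # a later access must refetch from main memory
            return True
        return False


cache = SnoopedCache()
cache.fill(0x1220, b"\x90" * 32)
invalidated = cache.snoop(0x1234, is_write=True)  # write to a byte in the same line
```

In this sketch a write to any byte of a cached line invalidates the whole line, which reflects the line-granular comparison noted above.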
Coherency is somewhat less complicated for instruction caches than for data caches. Instruction caches are typically not modified with respect to main memory by the microprocessor. Therefore, coherency may be maintained by detecting updates through snooping and invalidating the corresponding cache lines. Additionally, modifications performed by the microprocessor to main memory locations stored in the instruction cache are detected and the corresponding instruction cache lines discarded. These microprocessor-performed modifications are detected to allow the correct execution of "self-modifying code", in which a portion of a computer program updates another portion of that computer program during execution.
The instructions comprising a particular program sequence are fetched from the cache into an instruction processing pipeline within the microprocessor. An instruction processing pipeline generally comprises one or more pipeline stages in which a portion of instruction processing is performed. Typically, instruction processing involves at least the following processing functions: decoding an instruction to determine the required operations, fetching operands for the instruction (either from memory or from registers included within the microprocessor), executing the instruction, and storing the result of the execution into a destination specified by the instruction. An instruction flows through at least the pipeline stages which perform instruction processing functions required by that instruction. Certain pipeline stages may be bypassed by a particular instruction if the processing performed by the bypassed stages is not required by the particular instruction. For example, pipeline stages which perform cache and memory accesses may be bypassed by instructions which do not access memory. When an instruction reaches the end of the instruction processing pipeline, the microprocessor has completed the actions defined for that instruction.
In a superscalar microprocessor, portions of the instruction processing pipeline comprise multiple parallel pipeline stages. The parallel stages allow multiple instructions to be concurrently processed within a particular pipeline stage. Typically, as many as 20-40 or more instructions may be within the instruction processing pipeline of a superscalar microprocessor during a particular clock cycle. Unfortunately, this vast number of instructions presents a problem for cache coherency (either for external accesses or for updates performed by store instructions executed by the microprocessor). If memory locations corresponding to instructions within the instruction processing pipeline are modified, these instructions should be discarded from the instruction processing pipeline and the modified instructions fetched. In particular, instructions may be fetched from a particular cache line and that cache line may be discarded by the instruction cache prior to the instructions being executed. Searching the instruction cache for an address being updated is not sufficient for detecting such instructions within the instruction processing pipeline. Including logic for coherency checking at each pipeline stage would be prohibitive in both occupied silicon area and complexity. A mechanism for detecting updates to instructions within the instruction processing pipeline and for responding appropriately is desired.
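The difficulty described above can be illustrated with a small sketch. It models only the problem and not any solution: the names `line_tag`, `cache_hit`, and `in_flight_from_line`, the example addresses, and the assumed 32-byte line size are hypothetical and introduced for illustration.

```python
OFFSET_BITS = 5  # assumed: 32-byte cache lines


def line_tag(address):
    """Line-identifying portion of an address."""
    return address >> OFFSET_BITS


def cache_hit(cached_tags, address):
    """Search the instruction cache for the line containing this address."""
    return line_tag(address) in cached_tags


def in_flight_from_line(pipeline, address):
    """Fetch addresses of in-flight instructions that came from the updated line."""
    tag = line_tag(address)
    return [fetch_addr for fetch_addr in pipeline if line_tag(fetch_addr) == tag]


# Instructions are fetched from the line at 0x1220 and enter the pipeline.
cached_tags = {line_tag(0x1220)}
pipeline = [0x1224, 0x1228, 0x3000]  # fetch addresses of in-flight instructions

# The instruction cache then discards that line before the instructions execute.
cached_tags.discard(line_tag(0x1220))

# An update to 0x1230 now misses in the cache, yet two in-flight instructions
# were fetched from the affected line.
update_address = 0x1230
missed_by_cache_search = not cache_hit(cached_tags, update_address)
affected = in_flight_from_line(pipeline, update_address)
```

The list scan in `in_flight_from_line` is a software convenience; performing an equivalent comparison at every pipeline stage in hardware is the approach characterized above as prohibitive in area and complexity.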