1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to an apparatus for maintaining instruction cache coherency.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
In order to further increase performance, superscalar microprocessors typically include one or more caches for storing instructions and data. A cache is a storage device configured onto the same semiconductor substrate as the microprocessor, or coupled nearby. The cache may be accessed more quickly than a main memory system coupled to the microprocessor. Generally speaking, a cache stores data and instructions from the main memory system in blocks referred to as cache lines. A cache line comprises a plurality of contiguous bytes. The contiguous bytes are typically aligned in main memory such that the first of the contiguous bytes resides at an address having a certain number of low order bits set to zero. The certain number of low order bits is sufficient to uniquely identify each byte within the cache line. The remaining bits of the address form a tag which may be used to refer to the entire cache line. As used herein, the term "address" refers to a value indicative of the storage location within main memory corresponding to one or more bytes of information.
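The split of an address into a tag and a byte offset can be illustrated with a minimal sketch. The 32-byte line size and the names `LINE_SIZE`, `split_address`, and `line_base` are assumptions chosen for illustration only; the text above does not specify a line size.

```python
# Illustrative only: LINE_SIZE is an assumed value, not taken from the text.
LINE_SIZE = 32                            # bytes per cache line (power of two)
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # low order bits that select a byte


def split_address(address: int) -> tuple[int, int]:
    """Return (tag, offset): the tag names the cache line, the offset the byte."""
    tag = address >> OFFSET_BITS
    offset = address & (LINE_SIZE - 1)
    return tag, offset


def line_base(address: int) -> int:
    """Address of the first byte of the line (low order bits set to zero)."""
    return address & ~(LINE_SIZE - 1)
```

Under these assumptions, `split_address(0x1234)` yields tag `0x91` and offset `0x14`, and `line_base(0x1234)` is `0x1220`, an address whose five low order bits are zero.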
Microprocessors may be configured with a single cache which stores both instructions and data, but are more typically configured with separate instruction and data caches. The caches are typically designed to be coherent with respect to main memory. In particular, coherency requires that when bytes stored in main memory are modified, the modified bytes are conveyed in response to subsequent accesses to those bytes. The modified bytes are conveyed in response to subsequent accesses even if the bytes were stored into the cache prior to the modifications. Modifications may be performed by the microprocessor, or may be performed by another microprocessor or device coupled into a computer system with the microprocessor.
Modifications performed by external devices (i.e. devices outside of the microprocessor) are often detected by "snooping". Snooping refers to a process in which the microprocessor compares addresses presented to the main memory system to the tag addresses representing bytes stored in the caches. If a match occurs during snooping, the cache line is updated according to the nature of the main memory access. For example, the cache line may be invalidated in the cache upon detection of a modification of bytes within the cache line. A subsequent access to the cache line causes the modified bytes to be fetched from the main memory system. It is noted that the snooping address comparison is typically performed on a cache line basis (i.e. only that portion of the addresses which uniquely identifies the cache line affected by the main memory access is compared).
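Snoop invalidation on a cache line basis may be modeled in software as follows. This is a sketch under stated assumptions, not a description of any particular hardware: the class `SnoopedCache`, its method names, and the 32-byte line size are hypothetical.

```python
LINE_SIZE = 32  # assumed bytes per cache line


class SnoopedCache:
    """Toy cache keyed by line address; compares only the line-identifying bits."""

    def __init__(self) -> None:
        self.lines: dict[int, bytes] = {}  # line number -> cached bytes

    def fill(self, address: int, data: bytes) -> None:
        """Store the line containing 'address'."""
        self.lines[address // LINE_SIZE] = data

    def contains(self, address: int) -> bool:
        return address // LINE_SIZE in self.lines

    def snoop_write(self, address: int) -> bool:
        """Invalidate the matching line on an external write; True on a hit."""
        return self.lines.pop(address // LINE_SIZE, None) is not None
```

A write to any byte of a cached line (for example `0x101F` when the line at `0x1000` is cached) invalidates that line, so a later access misses and refetches the modified bytes from main memory.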
Coherency is somewhat less complicated for instruction caches than for data caches. Instruction caches are typically not modified with respect to main memory by the microprocessor. Therefore, coherency may be maintained by detecting updates through snooping and invalidating the corresponding cache lines. Additionally, modifications performed by the microprocessor to main memory locations stored in the instruction cache are detected and the corresponding instruction cache lines discarded. These microprocessor-performed modifications are detected to allow the correct execution of "self-modifying code", in which a portion of a computer program updates another portion of that computer program during execution.
The instructions comprising a particular program sequence are fetched from the cache into an instruction processing pipeline within the microprocessor. An instruction processing pipeline generally comprises one or more pipeline stages in which a portion of instruction processing is performed. Typically, instruction processing involves at least the following processing functions: decoding an instruction to determine the required operations, fetching operands for the instruction (either from memory or from registers included within the microprocessor), executing the instruction, and storing the result of the execution into a destination specified by the instruction. An instruction flows through at least the pipeline stages which perform instruction processing functions required by that instruction. Certain pipeline stages may be bypassed by a particular instruction if the processing performed by the bypassed stages is not required by the particular instruction. For example, pipeline stages which perform cache and memory accesses may be bypassed by instructions which do not access memory. When an instruction reaches the end of the instruction processing pipeline, the microprocessor has completed the actions defined for that instruction.
In a superscalar microprocessor, portions of the instruction processing pipeline comprise multiple parallel pipeline stages. The parallel stages allow multiple instructions to be concurrently processed within a particular pipeline stage. Typically, as many as 20-40 or more instructions may be within the instruction processing pipeline of a superscalar microprocessor during a particular clock cycle. Unfortunately, this vast number of instructions presents a problem for cache coherency (either for external accesses or for updates performed by store instructions executed by the microprocessor). If memory locations corresponding to instructions within the instruction processing pipeline are modified, these instructions should be discarded from the instruction processing pipeline and the modified instructions fetched. In particular, instructions may be fetched from a particular cache line and that cache line may be discarded by the instruction cache prior to the instructions being executed. Searching the instruction cache for an address being updated is not sufficient for detecting such instructions within the instruction processing pipeline. Including logic for coherency checking at each pipeline stage would be prohibitive in both occupied silicon area and complexity. A mechanism for detecting updates to instructions within the instruction processing pipeline and for responding appropriately is desired.
The problems outlined above are in large part solved by a microprocessor employing a core snoop buffer apparatus in accordance with the present invention. The core snoop buffer stores addresses of pages from which instructions have been fetched but not yet retired (i.e. the instructions are outstanding within the instruction processing pipeline). Addresses corresponding to memory locations being modified are compared to the addresses stored in the core snoop buffer on a page basis. If a match is detected, then instructions are flushed from the instruction processing pipeline and refetched. In this manner, the instructions executed to the point of modifying registers or memory are correct in self-modifying code or multiprocessor environments. Advantageously, instructions may be speculatively fetched and executed, and yet still are coherent with respect to changes to memory. Additionally, the number of pages from which instructions are concurrently outstanding within the microprocessor is typically small compared to the number of cache lines outstanding or the number of instructions outstanding. Therefore, a relatively small hardware structure may be employed to perform the instruction coherency functionality.
Several embodiments of the core snoop buffer are shown. In one embodiment, addresses of pages along with a count of the outstanding instructions from each page are stored. Such an embodiment efficiently uses the storage locations by storing each page address in at most one storage location. The corresponding counts are incremented as additional instructions enter the instruction processing pipeline and decremented as instructions exit the instruction processing pipeline. In another embodiment, a FIFO buffer is employed which stores the page addresses in the order that instructions from the pages are fetched. A particular page address may be stored in more than one buffer location. However, deleting entries from the buffer comprises detecting an instruction which is retired from a different page than a previously retired instruction. The least recently added entry in the FIFO is removed upon such detection. These embodiments as well as other embodiments serve different desired levels of complexity and performance.
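The two embodiments just described may be contrasted with a behavioral sketch. It is a software model under stated assumptions, not the claimed hardware: the 4 KiB page size, in-order retirement, the rule that the FIFO gains an entry only when the fetched page changes, and all names (`CountedSnoopBuffer`, `FifoSnoopBuffer`, `page_of`) are hypothetical.

```python
from collections import deque

PAGE_SHIFT = 12  # assumed 4 KiB pages


def page_of(address: int) -> int:
    return address >> PAGE_SHIFT


class CountedSnoopBuffer:
    """Each page appears at most once, with a count of outstanding instructions."""

    def __init__(self) -> None:
        self.counts: dict[int, int] = {}

    def dispatch(self, address: int) -> None:
        page = page_of(address)
        self.counts[page] = self.counts.get(page, 0) + 1

    def retire(self, address: int) -> None:
        page = page_of(address)
        self.counts[page] -= 1
        if self.counts[page] == 0:  # no instructions from this page remain
            del self.counts[page]

    def snoop_hit(self, update_address: int) -> bool:
        return page_of(update_address) in self.counts


class FifoSnoopBuffer:
    """Pages are queued in fetch order; a page may occupy several entries."""

    def __init__(self) -> None:
        self.pages: deque[int] = deque()
        self.last_fetched: int | None = None
        self.last_retired: int | None = None

    def dispatch(self, address: int) -> None:
        page = page_of(address)
        if page != self.last_fetched:  # assumption: one entry per page change
            self.pages.append(page)
            self.last_fetched = page

    def retire(self, address: int) -> None:
        page = page_of(address)
        if self.last_retired is not None and page != self.last_retired:
            self.pages.popleft()  # remove the least recently added entry
        self.last_retired = page

    def snoop_hit(self, update_address: int) -> bool:
        return page_of(update_address) in self.pages
```

The counted form needs an adder per entry but never duplicates a page; the FIFO form tolerates duplicates in exchange for simpler removal logic, which only has to notice a change of page between consecutive retirements.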
Broadly speaking, the present invention contemplates an apparatus for snooping updates to instructions which are within an instruction processing pipeline of a microprocessor. The apparatus comprises a first bus, an instruction storage, a buffer, a plurality of comparators, and a control unit. The first bus is configured to convey a first address indicative of a first memory location which is being updated. Included for storing a plurality of instructions, the instruction storage is divided into a plurality of cache lines into which the plurality of instructions are stored. A cache line comprises a particular number of consecutive instruction bytes. The buffer is configured to store a plurality of addresses, wherein each one of the plurality of addresses identifies at least two consecutive cache lines of instructions. The plurality of addresses encompasses memory locations corresponding to a second plurality of instructions which are within the instruction processing pipeline. Coupled to the first bus and the buffer, each one of the plurality of comparators receives one of the plurality of addresses. The comparators are configured to compare a subset of the first address to the plurality of addresses, and to assert signals indicating that the comparison indicates equality. The control unit is coupled to the buffer and to the plurality of comparators, and is configured to store each one of the plurality of addresses into the buffer when at least one instruction encompassed by one of the plurality of addresses is dispatched into the instruction processing pipeline.
The present invention further contemplates a method for snooping updates to instructions which are within an instruction processing pipeline of a microprocessor, comprising several steps. An address indicative of a plurality of instructions is stored in a buffer. The address is stored when the plurality of instructions enter the instruction processing pipeline. An update address indicative of a memory location being updated is compared to the address stored in the buffer. The plurality of instructions are flushed from the instruction processing pipeline if the compare indicates that the update address corresponds to the address. The address is discarded from the buffer when the plurality of instructions exit the instruction processing pipeline.
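The steps of the method (store on entry, compare on update, flush on match, discard on exit) can be traced in a small simulation. The sketch makes simplifying assumptions that are not taken from the text: addresses are tracked per 4 KiB page, a match flushes every in-flight instruction, and instructions exit in order. The class `Pipeline` and its method names are hypothetical.

```python
PAGE_SHIFT = 12  # assumed 4 KiB granularity for buffered addresses


class Pipeline:
    """Toy pipeline that tracks in-flight fetch addresses and their pages."""

    def __init__(self) -> None:
        self.in_flight: list[int] = []  # fetch addresses, oldest first
        self.buffer: set[int] = set()   # page addresses being tracked

    def enter(self, addresses: list[int]) -> None:
        """Store the page address as instructions enter the pipeline."""
        self.in_flight.extend(addresses)
        self.buffer.update(a >> PAGE_SHIFT for a in addresses)

    def update(self, update_address: int) -> list[int]:
        """Compare an update address; on a match, flush and return the victims."""
        if (update_address >> PAGE_SHIFT) not in self.buffer:
            return []
        flushed, self.in_flight = self.in_flight, []
        self.buffer.clear()  # flushed instructions must be refetched
        return flushed

    def exit_oldest(self) -> int:
        """Retire the oldest instruction; discard its page if none remain."""
        address = self.in_flight.pop(0)
        page = address >> PAGE_SHIFT
        if all((a >> PAGE_SHIFT) != page for a in self.in_flight):
            self.buffer.discard(page)
        return address
```

For example, after `enter([0x1000, 0x1004, 0x2000])`, an update to `0x3000` flushes nothing, while an update to `0x2FF0` made while the instruction at `0x2000` is still in flight returns that instruction for refetching.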