Multiprocessor systems employ two or more computer processors that can communicate with each other, such as over a bus or a general interconnect network. In such systems, each processor may have its own memory cache (or cache store) that is separate from the main system memory that the individual processors can access. A cache connected to a processor can often provide faster access to data than the main system memory. Caches are useful because they tend to reduce the latency of data accesses that hit in the cache, and they reduce the number of requests to system memory. In particular, a write-back cache enables a processor to write changes to data in the cache without simultaneously updating the contents of memory. Modified data can be written back to memory at a later time.
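By way of illustration only, the write-back behavior described above can be sketched in a few lines of Python. The class name `WriteBackCache`, its fields, and the use of a dictionary to stand in for main memory are assumptions made for this sketch and do not describe any particular hardware design.

```python
class WriteBackCache:
    """Toy write-back cache in front of a dict that stands in for main memory."""

    def __init__(self, memory):
        self.memory = memory      # backing store: address -> value
        self.lines = {}           # cached copies: address -> value
        self.dirty = set()        # addresses modified in the cache but not yet in memory
        self.memory_reads = 0     # number of read requests that reached main memory

    def read(self, addr):
        if addr not in self.lines:           # cache miss: fetch from main memory
            self.memory_reads += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]              # cache hit: no memory request needed

    def write(self, addr, value):
        self.lines[addr] = value             # update only the cached copy
        self.dirty.add(addr)                 # remember that memory is now stale

    def write_back(self):
        for addr in sorted(self.dirty):      # flush modified data at a later time
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


memory = {0x10: 1}
cache = WriteBackCache(memory)
cache.read(0x10)            # miss: one request goes to memory
cache.read(0x10)            # hit: served from the cache
cache.write(0x10, 99)       # changes the cache only
stale = memory[0x10]        # memory still holds the old value
cache.write_back()
fresh = memory[0x10]        # memory now holds the modified value
```

In this sketch the second read is served without a memory request, and memory holds the old value until `write_back` is called, which is the window in which other processors could observe inconsistent data.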
Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time. If proper coherency protocols are not implemented, however, inconsistent copies of data can be generated.
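The single-writer rule can be illustrated with a simplified invalidation scheme. The following Python sketch is one possible arrangement under stated assumptions, not a description of any specific protocol: the class `CoherentSystem`, the `owner` table, and the choice to invalidate other copies on every write are all hypothetical.

```python
class CoherentSystem:
    """Toy invalidation-based coherency: at most one processor holds a modified copy."""

    def __init__(self, num_processors, memory):
        self.memory = memory                                    # address -> value
        self.caches = [dict() for _ in range(num_processors)]   # one cache per processor
        self.owner = {}                                         # address -> id of processor with modified copy

    def read(self, pid, addr):
        owner = self.owner.get(addr)
        if owner is not None and owner != pid:
            # Another processor holds modified data: write it back first
            # so that the reader receives the latest value.
            self.memory[addr] = self.caches[owner][addr]
            del self.owner[addr]
        if addr not in self.caches[pid]:
            self.caches[pid][addr] = self.memory[addr]
        return self.caches[pid][addr]

    def write(self, pid, addr, value):
        # Invalidate every other cached copy so that only this
        # processor can modify the data at this time.
        for other, cache in enumerate(self.caches):
            if other != pid:
                cache.pop(addr, None)
        self.caches[pid][addr] = value
        self.owner[addr] = pid


memory = {0: 5}
system = CoherentSystem(2, memory)
a = system.read(0, 0)       # processor 0 caches the value 5
b = system.read(1, 0)       # processor 1 caches the value 5
system.write(0, 0, 7)       # processor 1's copy is invalidated
c = system.read(1, 0)       # processor 1 re-fetches and sees 7
```

Without the invalidation step in `write`, processor 1 would continue to read its cached value of 5 after processor 0 wrote 7, which is the kind of inconsistent copy a coherency protocol is intended to prevent.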
Modern microprocessors employ instruction pipelines in order to increase program execution speeds. A superscalar processor is a processor that issues multiple independent instructions into multiple pipelines or execution units, allowing multiple instructions to execute in parallel. A pre-fetch engine includes an instruction fetch unit that fetches program instructions, which are translated into micro-operations by a decoder and assigned a sequence number by an allocation unit. The instructions are streamed into multiple execution units that execute in parallel. Once executed, the instructions can be retired.
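The fetch, decode, and allocation stages described above can be sketched as follows. This is an illustrative model only: the decode table, the instruction names, and the round-robin distribution across execution units are assumptions made for the sketch, and the sketch omits the dependency checks a real superscalar processor performs before issuing instructions in parallel.

```python
from itertools import count

# Hypothetical decode table: each program instruction expands
# into one or more micro-operations.
DECODE_TABLE = {
    "add": ["alu_add"],
    "load_add": ["mem_load", "alu_add"],
    "store": ["mem_store"],
}


def front_end(program, num_units):
    """Fetch, decode, and number micro-operations, then spread them over execution units."""
    sequence = count()                        # allocation unit: increasing sequence numbers
    units = [[] for _ in range(num_units)]    # one queue per execution unit
    for instruction in program:               # fetch in program order
        for micro_op in DECODE_TABLE[instruction]:   # decode into micro-operations
            seq = next(sequence)
            # Simplification: stream round-robin into the execution units.
            units[seq % num_units].append((seq, micro_op))
    return units


units = front_end(["load_add", "add", "store"], num_units=2)
# units[0] holds sequence numbers 0 and 2; units[1] holds 1 and 3.
```

The sequence numbers record program order, so results produced by different execution units can later be matched back to the order in which the instructions were fetched.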
Microprocessors employ either an in-order pipeline, which retires instructions in strict program order, or an out-of-order pipeline, which executes instructions out of order to increase program execution speed but requires the results to be re-ordered before instructions are retired. In a multiprocessor system that employs a cache coherency protocol, either pipeline type will stall while a source request is being issued as a result of a cache miss. The trend is for the ratio of memory latency to processor cycle time to grow in future microprocessor applications. As a result, cache misses serviced by the system account for an increasing portion of the execution time of an application.
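The effect of a growing latency-to-cycle-time ratio can be made concrete with a simple model. The sketch below assumes that every cache miss stalls the pipeline for the full memory latency and that all other work proceeds at a fixed number of cycles per instruction; the function name and the numeric values are illustrative assumptions, not measurements.

```python
def miss_stall_fraction(base_cpi, misses_per_instruction, miss_latency_cycles):
    """Fraction of execution time spent stalled on cache misses.

    base_cpi: cycles per instruction when every access hits in the cache
    misses_per_instruction: average number of cache misses per instruction
    miss_latency_cycles: memory latency expressed in processor cycles
    """
    stall_cpi = misses_per_instruction * miss_latency_cycles
    return stall_cpi / (base_cpi + stall_cpi)


# Same miss rate, with memory latency growing relative to the cycle time.
fractions = [
    round(miss_stall_fraction(1.0, 0.01, latency), 3)
    for latency in (100, 200, 400)
]
# fractions == [0.5, 0.667, 0.8]
```

Under these assumptions, one miss per hundred instructions already accounts for half of the execution time at a latency of 100 cycles, and for 80 percent at 400 cycles, even though the miss rate itself has not changed.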