1. Field of the Invention
This invention relates to computer systems, and, more particularly, to memory coherence protocols in multiprocessor systems.
2. Description of the Related Art
In order to increase the performance of computer systems, system designers often look to techniques that increase the amount of concurrent or parallel processing that occurs within the system. For example, within a microprocessor, the ability to execute multiple instructions in parallel may be increased in a fine-grained fashion, by adding independent functional units and related execution resources using superscalar implementation techniques, or in a coarse-grained fashion, by replicating individual processor cores within the microprocessor. Parallelism may also be augmented at other levels of abstraction within the computer system, for example by providing multiple microprocessors within the system (such systems are also referred to as multiprocessor systems), or by integrating multiple discrete systems or subsystems via a network or other type of interconnect to create a still more complex parallel system.
In parallel systems that provide access to shared memory, two or more independent, concurrently executing processor tasks may attempt to access the same addressable memory location at the same time. For example, one task may attempt to write the location while another attempts to read it. Absent some technique to predictably order or regulate such concurrent memory accesses, unpredictable or erroneous execution behavior may result: the two tasks may produce different computational results depending on whether the write occurs before or after the read, and without such regulation that order may be effectively arbitrary. Similar problems may occur if different processors in a multiprocessor system locally cache shared data, since a copy modified in one processor's cache may diverge from stale copies held by other processors.
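The order dependence described above can be illustrated with a small, hypothetical Python sketch. It does not model real concurrency; it simply replays the two possible orderings of one write and one read to the same location. The names `run`, `writer` and `reader` are illustrative only.

```python
# Hypothetical illustration: two tasks share one addressable location "x".
# Task W overwrites the location; task R reads it and computes a result.
# With no ordering mechanism, either interleaving could occur.

def run(order):
    memory = {"x": 1}                  # shared location, initial value 1
    result = {}

    def writer():
        memory["x"] = 10               # task W writes a new value

    def reader():
        result["r"] = memory["x"] * 2  # task R computes from what it reads

    steps = {"W": writer, "R": reader}
    for name in order:                 # replay the chosen interleaving
        steps[name]()
    return result["r"]


if __name__ == "__main__":
    print(run(["W", "R"]))  # write before read: prints 20
    print(run(["R", "W"]))  # read before write: prints 2
```

The same two tasks yield 20 or 2 depending only on ordering, which is the kind of nondeterminism a coherence or ordering mechanism is meant to prevent.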
Generally, the problems that may arise when multiple tasks, processors or other types of agents attempt to concurrently access and/or modify shared data may be referred to as memory coherence problems, in that, in the absence of ordering or control, shared data may become incoherent with respect to the agents sharing it. Designers of systems in which coherence problems may arise frequently employ some type of coherence mechanism through which access to memory is governed by well-defined, coordinated procedures. For example, a coherence protocol such as the MESI protocol may be employed to prevent coherence problems by prescribing a closed set of possible coherence states (i.e., Modified, Exclusive, Shared or Invalid), one of which applies at any given time to each addressable quantum of memory, or “unit of coherence”, and by further prescribing specific actions to be undertaken by or on behalf of a processor, task or other agent in response to various types of memory activity, such as read or write activity.
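As an illustration only, the MESI state changes for a single unit of coherence, as seen by one cache, might be sketched as a transition function. The event names and the bus-transaction labels (`BusRd`, `BusRdX`, `BusUpgr`, `Flush`) follow common textbook usage and are assumptions here, not part of any particular implementation; write-back and data-supply details are reduced to a single label.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Event(Enum):
    LOCAL_READ = "local read"
    LOCAL_WRITE = "local write"
    REMOTE_READ = "remote read"    # another agent's read, observed by this cache
    REMOTE_WRITE = "remote write"  # another agent's write/ownership request, observed


def mesi_next(state, event, others_have_copy=False):
    """Return (next_state, action) for one unit of coherence in one cache.

    action is a textbook-style label for the bus transaction this cache
    issues (hypothetical naming), or None. others_have_copy only matters
    on a read miss, where it selects Shared versus Exclusive.
    """
    M, E, S, I = State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID
    if event is Event.LOCAL_READ:
        if state is I:
            return (S if others_have_copy else E), "BusRd"
        return state, None                 # read hit: state unchanged
    if event is Event.LOCAL_WRITE:
        if state is I:
            return M, "BusRdX"             # fetch the line and invalidate others
        if state is S:
            return M, "BusUpgr"            # invalidate the other sharers
        return M, None                     # E or M: no bus traffic needed
    if event is Event.REMOTE_READ:
        if state is M:
            return S, "Flush"              # supply/write back the dirty data
        if state is E:
            return S, None
        return state, None                 # S stays S, I stays I
    if event is Event.REMOTE_WRITE:
        if state is M:
            return I, "Flush"
        return I, None
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    s = State.INVALID
    for ev in (Event.LOCAL_READ, Event.LOCAL_WRITE,
               Event.REMOTE_READ, Event.REMOTE_WRITE):
        s, action = mesi_next(s, ev)
        print(ev.value, "->", s.value, action)
```

Running the trace moves the line from Invalid to Exclusive on a read miss with no other sharers, to Modified on a local write, to Shared when another agent reads it, and back to Invalid when another agent writes it.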
Memory coherence protocols typically rely on the activity of one agent being visible to other agents so that they may respond appropriately. Such visibility is often provided through the use of shared buses across which memory transactions may be broadcast to the various processors or other agents that enforce the coherence protocol. For example, a write request to a particular memory address may be broadcast across a bus to multiple processors within a system, thereby notifying each processor that any copy it holds of the data at that address may no longer be valid.
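The broadcast-invalidate behavior can be sketched with a toy model in which every attached cache observes every write on a shared bus. This is a hypothetical, simplified write-through invalidate scheme; the classes `Cache` and `SharedBus`, their methods, and the `demo` helper are invented for illustration.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}               # address -> locally cached value (valid copies only)

    def snoop_write(self, addr):
        # Another agent is writing addr: any local copy may be stale, so drop it.
        self.lines.pop(addr, None)


class SharedBus:
    def __init__(self, memory):
        self.memory = memory          # address -> value in system memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, requester, addr):
        if addr not in requester.lines:           # miss: load from memory
            requester.lines[addr] = self.memory.get(addr, 0)
        return requester.lines[addr]

    def write(self, requester, addr, value):
        # Broadcast: every other attached cache observes the write.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(addr)
        requester.lines[addr] = value
        self.memory[addr] = value     # write-through, to keep the sketch simple


def demo():
    bus = SharedBus({0x40: 1})
    a, b = Cache("A"), Cache("B")
    bus.attach(a)
    bus.attach(b)
    bus.read(a, 0x40)                 # both caches now hold address 0x40
    bus.read(b, 0x40)
    bus.write(a, 0x40, 99)            # broadcast invalidates B's copy
    still_cached = 0x40 in b.lines
    refetched = bus.read(b, 0x40)     # B misses and reloads the new value
    return still_cached, refetched


if __name__ == "__main__":
    print(demo())  # (False, 99)
```

The model depends on `SharedBus.write` reaching every cache; with point-to-point links no single path sees all writes, which is the difficulty discussed next.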
The performance of shared buses tends to scale poorly as the number of attached devices increases, since each additional device adds electrical loading and contends for the same bus. To improve operating frequency, point-to-point connections may be used in place of shared buses. However, this may increase the complexity and operating overhead required to maintain memory coherence, since memory transactions occurring over a particular point-to-point connection may no longer be globally visible to the rest of the system. Memory coherence may still be enforced, for example, by requiring data to be loaded from system memory into a local cache before it is read or modified, thus making the various caches in the system, rather than a shared bus, the loci of coherence activity. However, requiring that data be loaded into a processor's cache to ensure coherence may be particularly wasteful when that data is destined to be overwritten, for example as part of a Direct Memory Access (DMA) transfer from an input/output (I/O) device, because the values fetched from memory are discarded without ever being used.
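To make this cost concrete, the following hypothetical sketch counts the bytes read from system memory for a DMA write under a simplified model. The function name `dma_write_traffic`, the 64-byte line size, and the comparison policy (fetching only lines the transfer covers partially) are assumptions chosen for illustration, not a description of any particular system.

```python
def dma_write_traffic(start, length, line_size=64, fetch_before_write=True):
    """Return bytes read from system memory for a DMA write of `length`
    bytes at address `start`, under a simplified, hypothetical model.

    fetch_before_write=True: every cache line touched by the transfer is
    first loaded from memory, as in the load-before-modify approach.
    fetch_before_write=False: a comparison policy in which only partially
    covered lines are loaded, since fully overwritten lines need no old data.
    """
    if length <= 0:
        return 0
    first = start // line_size
    last = (start + length - 1) // line_size
    if fetch_before_write:
        return (last - first + 1) * line_size
    fetched = 0
    for line in range(first, last + 1):
        lo = line * line_size
        hi = lo + line_size
        fully_covered = start <= lo and start + length >= hi
        if not fully_covered:
            fetched += 1              # partial line: old bytes must be kept
    return fetched * line_size


if __name__ == "__main__":
    print(dma_write_traffic(0, 4096))
    print(dma_write_traffic(0, 4096, fetch_before_write=False))
    print(dma_write_traffic(32, 4096))
    print(dma_write_traffic(32, 4096, fetch_before_write=False))
```

For an aligned 4096-byte transfer with 64-byte lines, the fetch-before-write policy reads 4096 bytes that are immediately overwritten, while the comparison policy reads none; for the same length starting at byte offset 32, the counts are 4160 and 128 bytes respectively.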