1. Field of the Invention
This invention relates in general to the field of microprocessors, and more specifically to a method and apparatus for providing latency-independent coherence between data in multiple caches associated with multiple processing complexes.
2. Background Art
A microprocessor-based computing system typically includes a processor, a memory complex, an interface to input/output (I/O) devices, and I/O devices such as a keyboard, mouse, graphical display, hard disk, network interface, etc. In early computing systems, the memory complex communicated directly with the processor whenever the processor needed to read or write data. However, because the speed of the memory complex has not kept pace with speed advances in processors, it became necessary to place a relatively small, high-speed memory between the memory complex and the processor to store data currently being operated on by the processor. This high-speed memory is known as cache memory, or simply cache.
When a processor needs to read data from, or write data to, memory, and the data associated with the read/write is within the cache, the processor can perform the read/write quickly. However, if the data associated with the read/write is not within the cache, the processor must wait until the data is retrieved from the memory complex and placed into the cache before the read/write is completed. One skilled in the art will appreciate that this description of cache operation is general, and that many advances have been made to improve cache performance when there is a “miss” in the cache. However, for purposes of this background, it is sufficient to understand that caches store a subset of the data that exists within the memory complex.
Further advances in computing have led to processor-based systems where multiple processors operate together and share a common memory complex. However, to obtain the speed advantage of caches, each of the processors has its own cache between it and the memory complex. When two or more caches are used, it is possible that two or more instances of the same data might reside outside the memory complex. In this situation, a methodology must exist to ensure that the value of each of these instances is always the same. That is, from the viewpoint of each of the processors, a specific address in memory should refer to data that has only one value, whether or not the data resides in one or more caches. The area of technology associated with the methodology to ensure consistency of data between caches is known as coherence.
In general, coherence methodology requires that whenever data is read from a shared memory complex into one or more caches, the data be “tagged” with information which indicates: 1) what the memory address of the data is; and 2) what the coherence state of the data is. Coherence states are typically: 1) Invalid, indicating that the data in the cache at a particular address is no longer valid; 2) Shared, indicating that the data in the cache can be read but not written to; 3) Exclusive, indicating that the data can be read or written to; or 4) Modified, indicating that the data can be read and written to, and has already been written to by its associated processor. These four coherence states (Modified, Exclusive, Shared, Invalid) are known collectively as MESI.
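By way of illustration only, the tagging described above can be sketched in Python as follows. The names State, CacheLine, can_read, can_write and write are hypothetical and chosen for this sketch; they are not drawn from any particular implementation. The sketch assumes that a write to an Exclusive line promotes it to Modified, which is the conventional MESI behavior.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """The four MESI coherence states."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class CacheLine:
    """A cached datum tagged with its memory address and coherence state."""
    address: int   # memory address the data was read from
    state: State   # coherence state of this cached copy
    data: int = 0

    def can_read(self) -> bool:
        # Any state other than Invalid may be read.
        return self.state is not State.INVALID

    def can_write(self) -> bool:
        # Only Exclusive or Modified lines may be written directly.
        return self.state in (State.EXCLUSIVE, State.MODIFIED)

    def write(self, value: int) -> None:
        if not self.can_write():
            raise PermissionError(
                f"line 0x{self.address:x} is {self.state.name}; cannot write"
            )
        self.data = value
        # Assumption of this sketch: a written line becomes Modified.
        self.state = State.MODIFIED
```

In this sketch, a Shared line answers True to can_read and False to can_write, so a processor wishing to write it would first have to obtain the line in an Exclusive state by some mechanism not shown here.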
To implement a MESI methodology in a multiprocessor system, the current state of the art requires that each of the processors be coupled together, and to the memory complex, via a common bus architecture that allows the processors to “snoop” each other's caches to ensure coherency. That is, if one processor wishes to load data from the memory complex into its cache, it must first ensure that the data does not reside in another processor's cache in a modified or exclusive state. If the data does reside in another processor's cache in a modified state, for example, that data must first be written back to the memory complex to ensure that the requesting processor gets the latest data. Further, each of the processors is required, according to the common bus protocol, to respond to snoops within a predetermined period of time so that the requesting processor is not stalled indefinitely.
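The snoop-before-load sequence described above may be sketched, under stated assumptions, as follows. The names Cache, snoop_read and load are hypothetical. The sketch assumes that a snooped Modified or Exclusive line is downgraded to Shared, which is one common policy and is not stated in the description above. It also models every snoop as an immediate, synchronous call, which corresponds to the fixed response period of a common bus.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Cache:
    """A per-processor cache; lines maps address -> [state, value]."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_read(self, address, memory):
        """Respond to a peer's read; return True if a valid copy is held."""
        line = self.lines.get(address)
        if line is None or line[0] is State.INVALID:
            return False
        if line[0] is State.MODIFIED:
            # Write the latest data back so the requester sees it.
            memory[address] = line[1]
        # Assumed policy: Modified and Exclusive copies drop to Shared.
        line[0] = State.SHARED
        return True


def load(requester, others, memory, address):
    """Load address into the requester's cache after snooping all peers."""
    line = requester.lines.get(address)
    if line is not None and line[0] is not State.INVALID:
        return line[1]  # cache hit, no snoop needed
    # Every peer must be snooped, so build the full list before testing it.
    held = [peer.snoop_read(address, memory) for peer in others]
    state = State.SHARED if any(held) else State.EXCLUSIVE
    requester.lines[address] = [state, memory[address]]
    return memory[address]
```

For example, if cache A holds address 0x40 in a Modified state with value 99 while memory still holds 1, a call to load for cache B first causes A to write 99 back to memory, after which both caches hold the line as Shared.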
The inventors of the present invention view the requirement of a common bus architecture, and of conformity to predetermined response periods, within the context of coherency in multiprocessor systems, as disadvantageous to newer computing architectures. More specifically, within the context of system-on-chip architectures, it may be desirable to utilize several processor cores, each of which shares a common memory, but where the interface between the processor cores and the common memory does not utilize a common bus architecture. In such a context, the response period of each of the interfaces may be different, and the protocol associated with ensuring coherency may be different.
Therefore, what is needed is a method and apparatus that allow multiple processors and/or I/O devices to share a common memory, using one or more interfaces, while ensuring memory coherency without regard to the latencies associated with any of the interfaces.