Computer systems commonly have a plurality of components, such as processors, memory, and input/output devices, and a shared bus for transferring information among two or more of the components. The components commonly are coupled to the bus in the form of component modules, each of which may contain one or more processors, memory, and/or input/output devices. Information is transmitted on the bus among component modules during bus "cycles," each bus cycle being a period of time during which a module has control of the bus and is permitted to transfer, or drive, a limited quantity of information on the bus. The module having control of the bus during a given cycle is referred to as the bus owner.
Component modules generally communicate with one another via the bus in the form of "transactions" taking one or more cycles to complete, such as "read" and "write" transactions. For example, in a typical read transaction, a module will send signals on the bus to the main memory controller or another module identifying data that it needs to obtain and requesting that the identified data be sent to it. The responding module then processes the request and returns the data during one or more subsequent cycles. Many conventional buses accommodate "split transactions" in which a response need not immediately follow a request. For example, after a module initiates a read transaction, the module relinquishes control of the bus, allowing the bus to be used for other purposes until the responding module is ready to return the requested data. At that time, the responding module obtains control of the bus and sends the requested data to the requesting module.
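By way of illustration only, the split read transaction described above can be sketched as a cycle-by-cycle list of bus usage. The function name `split_read_timeline` and the parameter `response_latency_cycles` are hypothetical and chosen for this sketch; a real bus would arbitrate ownership in hardware.

```python
def split_read_timeline(response_latency_cycles):
    """Return a per-cycle description of bus usage for one split read.

    Hypothetical model: the requester owns the bus for one cycle, gives it
    up while the responder prepares the data, and the responder later
    obtains the bus for one cycle to return the data.
    """
    timeline = ["requester drives read request"]
    # The bus is relinquished and may carry unrelated transactions.
    timeline += ["bus free for other transactions"] * response_latency_cycles
    timeline.append("responder drives requested data")
    return timeline


timeline = split_read_timeline(3)
for cycle, usage in enumerate(timeline):
    print(cycle, usage)
```

In this sketch the intermediate cycles stay available to other modules, which is the benefit a split transaction is intended to provide.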
In many computer systems, software running on the system is executed by two or more main processor modules that share a main memory. The main processors generally are coupled directly to the shared bus. The main memory generally is coupled to the bus through a main memory controller. If a processor is to read data from main memory or write data to main memory, it must communicate with the main memory controller. Systems of this type are often referred to as "shared memory multiprocessor" systems.
A processor module or input/output module may also have a cache memory, which stores frequently used data values for quick access by the module. Ordinarily, a cache memory stores both the frequently used data and the addresses where these data items are stored in main memory. When the module seeks data from an address in memory, it requests that data from its cache memory using the address associated with the data. The cache memory checks to see whether it holds data associated with that address. If so, the cache memory can return the requested data directly to the processor. If the cache memory does not contain the desired information (i.e., if a "cache miss" occurs), a regular memory access ordinarily occurs. Cache memory is typically useful when main memory (generally RAM) accesses are slow compared to the microprocessor speed, because cache memory is faster than main memory.
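The lookup just described can be sketched as follows. This is a minimal model, not a description of any particular cache: the class name `Cache` and its counters are hypothetical, and copying the value into the cache after a miss is an assumption of the sketch.

```python
class Cache:
    """Toy cache that maps main-memory addresses to data values."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value
        self.lines = {}                 # cached copies: address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Check whether the cache holds data associated with the address.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Cache miss: perform a regular memory access.
        self.misses += 1
        value = self.main_memory[address]
        # Assumption of this sketch: keep a copy for later accesses.
        self.lines[address] = value
        return value


memory = {0x100: 7, 0x104: 9}
cache = Cache(memory)
first = cache.read(0x100)   # miss, served from main memory
second = cache.read(0x100)  # hit, served from the cache
```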
In the case of a shared memory multi-processor system in which each processor has cache memory, the situation is somewhat more complex. In such a system, the data needed for a particular transaction may be stored in one or more cache memories, and/or in the main memory. The data in a cache memory may have been operated on by a processor, resulting in a value that is different from the value stored in main memory. It is generally necessary for software executing on the processors to utilize the most current values for data associated with particular addresses. Thus, whenever a processor seeks data that may have been used by other processors, it is necessary to implement a "cache coherency scheme," which is a process for making certain that data provided to processors is current.
In a typical coherency scheme, when data is requested by a module, each module having cache memory performs a "coherency check" of its cache memory to determine whether it has data associated with the requested address and reports the results of its coherency check. Each module also generally keeps track of and reports the status of the data stored in its cache memory in relation to the data associated with the same address stored in main memory and other cache memories. For example, a module may report that its data is "private" (i.e., the data is only available to that module) or that the data is "shared" (i.e., the data may reside in more than one cache memory at the same time). A module may also report whether its data is "clean" (i.e., the same as the data associated with the same address stored in main memory) or "dirty" (i.e., the data has been operated on after it was obtained). Ordinarily, only one private-dirty copy of data is permitted at any given time. A "coherent transaction" is any transaction, for example a memory read, that requires a check of all memories to determine the source of the data to be delivered to the requesting processor.
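One way to picture the status reporting described above is the following sketch. The names `LineStatus`, `ModuleCache`, and `check_all_modules` are hypothetical, and representing the status as two flags is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineStatus:
    private: bool  # True: available only to this module; False: shared
    dirty: bool    # True: operated on after it was obtained; False: clean


class ModuleCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> LineStatus

    def coherency_check(self, address) -> Optional[LineStatus]:
        """Report the status of the line, or None if the line is not held."""
        return self.lines.get(address)


def check_all_modules(modules, address):
    """Collect one coherency report per module for a coherent transaction."""
    reports = {m.name: m.coherency_check(address) for m in modules}
    dirty_owners = [name for name, status in reports.items()
                    if status is not None and status.private and status.dirty]
    # Ordinarily only one private-dirty copy is permitted at a time.
    if len(dirty_owners) > 1:
        raise ValueError("more than one private-dirty copy reported")
    return reports


module_a = ModuleCache("A")
module_a.lines[0x40] = LineStatus(private=True, dirty=True)
module_b = ModuleCache("B")
reports = check_all_modules([module_a, module_b], 0x40)
```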
Coherent transactions generally can be issued during any available bus cycle. Some modules, however, may be busy internally and unable to immediately perform a coherency check for the transaction, and cache coherency checks may take several cycles to complete. To accommodate the rate at which coherent transactions can be issued, modules sometimes have a cache coherency queue for storing coherent transactions until a coherency check can be performed.
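A cache coherency queue of the kind mentioned above might be modeled as a simple first-in, first-out buffer. The class `CoherencyQueue` and its method names are hypothetical, and a transaction is represented here merely as an (identifier, address) pair.

```python
from collections import deque


class CoherencyQueue:
    """Holds coherent transactions until the module can check its cache."""

    def __init__(self, cached_addresses):
        self.cached_addresses = set(cached_addresses)
        self.pending = deque()

    def enqueue(self, transaction_id, address):
        # Called whenever a coherent transaction is issued on the bus,
        # even if the module is busy internally.
        self.pending.append((transaction_id, address))

    def process_one(self):
        """Perform the oldest pending coherency check, if any."""
        if not self.pending:
            return None
        transaction_id, address = self.pending.popleft()
        return transaction_id, address in self.cached_addresses


queue = CoherencyQueue(cached_addresses=[0x40])
queue.enqueue(1, 0x40)  # issued while the module is busy
queue.enqueue(2, 0x80)  # issued one cycle later
result_1 = queue.process_one()
result_2 = queue.process_one()
result_3 = queue.process_one()
```

The queue lets transactions be issued at the bus rate while each module drains its checks at its own rate.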
The results of the coherency checks performed by each module are analyzed and the most current data is provided to the module that requested the data. For example, if no cache memories have a copy of the requested data, the data will be supplied by main memory. If a module has a private-dirty copy, it generally will supply the data. When the data is supplied, each module typically updates the status of the data in its cache memory. For example, if a private-dirty copy of data is copied into main memory, it may become a clean copy.
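The analysis of coherency check results can be sketched as below. The function `choose_source` and the report format are hypothetical. The text above states only the no-copy and private-dirty cases; letting main memory supply the data in every other case is an assumption of this sketch.

```python
def choose_source(reports, memory_value):
    """Pick the supplier of the requested data.

    reports maps a module name to None (no copy held) or to a tuple
    (private, dirty, value) describing that module's cached copy.
    """
    for name, report in reports.items():
        if report is None:
            continue
        private, dirty, value = report
        if private and dirty:
            # A private-dirty copy is the most current; its holder supplies it.
            return name, value
    # Assumption: in all remaining cases main memory supplies the data.
    return "main memory", memory_value


no_copies = choose_source({"A": None, "B": None}, memory_value=5)
dirty_copy = choose_source({"A": (True, True, 8), "B": None}, memory_value=5)
```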
Delays in the cache coherency hardware in the modules can cause ordering problems in multi-processor systems. As explained above, hardware must ensure that for any particular data request, the most up-to-date version of data is supplied. This can be difficult in a heavily pipelined system, since there is an inevitable delay in responding to transactions that have been issued on the bus.
One potential problem occurs if a module issues a coherent read of a particular data line at about the same time that a second module writes-back a dirty copy of the same line. Since the dirty copy is the most up-to-date, it should be supplied in response to the coherent read. However, if memory responds to the read before the write is executed, and the second module (i.e., the module writing-back the dirty copy of the line) does not detect the conflict when performing a coherency check because it already "gave up" the line, the original requestor would get incorrect "stale" data from memory. This "ordering" problem obviously can cause incorrect results when the processors operate on incorrect data.
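The ordering problem described above can be reproduced in a few lines. This is a deliberately simplified model: the address `0x40`, the values, and the `pending_writes` list standing in for a write not yet executed by memory are all hypothetical.

```python
memory = {0x40: 1}            # main memory still holds the old value
second_module_cache = {0x40: 2}  # second module holds the dirty, current value

# The second module issues a write-back and gives up the line immediately.
# The write is pending; memory has not executed it yet.
pending_writes = [(0x40, second_module_cache.pop(0x40))]

# The first module issues a coherent read of the same line.
# The second module's coherency check finds nothing, so no conflict is seen.
second_module_reports_hit = 0x40 in second_module_cache

# Memory responds to the read before executing the pending write.
data_returned_to_requestor = memory[0x40]

# Only now is the write-back executed.
for address, value in pending_writes:
    memory[address] = value

print(data_returned_to_requestor, memory[0x40])
```

The requestor receives the stale value even though the current value reaches memory shortly afterward.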
Prior systems have avoided the above ordering problem using various techniques. Each prior technique has disadvantages. Some systems have only allowed a single coherent transaction to be issued at a time, and no new coherent transaction may be issued until all coherency reporting has been completed on the first coherent transaction. This technique ensures that transactions are processed in the appropriate order and that up-to-date data is supplied at the cost of decreasing usable bus bandwidth, thus limiting performance.
Other systems require that modules check outstanding transactions for potential conflicts before issuing a new transaction. For instance, before a processor could issue a write-back of a cache line, the processor would check to make sure there were no outstanding coherent reads of the same line. This restriction also slows down the potential transaction issue rate, thereby limiting performance, and increases complexity in the modules.
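The conflict check used by such prior systems might look like the following sketch. The class `OutstandingReadTracker` and its methods are hypothetical; the sketch is meant only to show the extra bookkeeping each module must carry.

```python
class OutstandingReadTracker:
    """Tracks coherent reads that have been issued but not yet completed."""

    def __init__(self):
        self.outstanding_reads = set()

    def read_issued(self, address):
        self.outstanding_reads.add(address)

    def read_completed(self, address):
        self.outstanding_reads.discard(address)

    def may_issue_write_back(self, address):
        # A write-back is held off while a coherent read of the same
        # line is still outstanding.
        return address not in self.outstanding_reads


tracker = OutstandingReadTracker()
tracker.read_issued(0x40)
blocked = tracker.may_issue_write_back(0x40)
unrelated = tracker.may_issue_write_back(0x80)
tracker.read_completed(0x40)
allowed = tracker.may_issue_write_back(0x40)
```

Every module must observe and record all coherent reads on the bus to make this check, which is the added complexity and issue-rate cost noted above.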
Accordingly, there is a need for a coherency scheme for a pipelined split transaction bus that does not limit the rate at which coherent transactions can be issued, and in which each module can process cache coherency checks at its own rate.