Computer systems commonly have a plurality of components, such as processors, memory, and input/output devices, and a shared bus for transferring information among two or more of the components. Typically, the components are coupled to the bus in the form of component modules, each of which may contain one or more processors, memory, and/or input/output devices. Information is transmitted on the bus among component modules during bus cycles, each bus cycle being a period of time during which a selected module is permitted to transfer, or drive, a limited quantity of information on the bus. Modules commonly send transactions on the bus to other modules to perform operations such as reading and writing data.
One class of computer system has two or more main processor modules for executing software running on the system (or one or more processor modules and one or more coherent input/output modules) and a shared main memory that is used by all of the processors and coherent input/output modules in the system. The main memory is generally coupled to the bus through a main memory controller. In many cases, one or more of the processors also have a cache memory, which stores recently used data values for quick access by the processor.
Ordinarily, a cache memory stores both the frequently used data and the addresses where these data items are stored in main memory. When the processor seeks data from an address in memory, it requests that data from the cache memory using the address associated with the data. The cache memory checks to see whether it holds data associated with that address. If so, the cache memory returns the requested data directly to the processor. If the cache memory does not contain the desired information (i.e., a "cache miss" occurs), the cache requests the data from main memory and stalls the processor while it is waiting for the data. Because cache memory is faster than main memory, this strategy results in improved system performance.
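The lookup described above can be illustrated with a minimal sketch. The class names `MainMemory` and `Cache`, the dictionary-based storage, and the hit and miss counters are assumptions made for illustration only; an actual cache memory is implemented in hardware and has a limited capacity, which this sketch does not model.

```python
class MainMemory:
    """Hypothetical backing store mapping addresses to data values."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.reads = 0  # number of (slow) main-memory accesses

    def read(self, address):
        self.reads += 1
        return self.contents[address]


class Cache:
    """Holds recently used data together with its main-memory address."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: the data is returned directly to the processor.
            self.hits += 1
            return self.lines[address]
        # Cache miss: the processor would stall while main memory is read.
        self.misses += 1
        data = self.memory.read(address)
        self.lines[address] = data
        return data


memory = MainMemory({0x10: "a", 0x20: "b"})
cache = Cache(memory)
values = [cache.read(0x10), cache.read(0x10), cache.read(0x20)]
```

In this sketch the second read of address 0x10 is satisfied by the cache, so main memory is accessed only once for that address.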
In the case of a shared memory multi-processor computer in which each processor has cache memory, the situation is somewhat more complex. In such a computer, the most current data may be stored in one or more cache memories, or in the main memory. Software executing on the processors must utilize the most current values for data associated with particular addresses. Thus, a "cache coherency scheme" must be implemented to ensure that all copies of data for a particular address are the same.
In a typical write-back coherency scheme, when data is requested by a module, each module having cache memory performs a "coherency check" of its cache memory to determine whether it has data associated with the requested address and reports the results of its coherency check. Each module also generally reports the status of the data stored in its cache memory in relation to the data associated with the same address stored in main memory and other cache memories. For example, a module may report that its data is "private" (i.e., the data value is only usable by this module) or that the data is "shared" (i.e., the data may reside in more than one cache memory at the same time). A module may also report whether its data is "clean" (i.e., the same as the data associated with the same address stored in main memory) or "dirty" (i.e., the data has been changed after it was obtained).
The results of the coherency checks performed by each module are analyzed by a selected processor and the most current data is provided to the module that requested the data. A "coherent transaction" is any transaction that requires a check of other caches to see whether data associated with a memory address is stored in the other caches, or to verify that data is current. Most reads and some writes to memory are coherent transactions. Those skilled in the art are familiar with many types of coherent transactions, such as a conventional read private, and non-coherent transactions, such as a conventional write-back.
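The coherency check and the analysis of its results described above can be sketched as follows. The names `CacheLine`, `CoherencyReport`, `CachingModule`, and `resolve_read` are hypothetical, and the rule used to select the most current data (a dirty cached copy takes precedence over main memory) is a simplifying assumption rather than a description of any particular coherency scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheLine:
    data: int
    private: bool  # True: usable only by this module; False: may be shared
    dirty: bool    # True: changed after it was obtained


@dataclass
class CoherencyReport:
    module: str
    present: bool
    private: Optional[bool] = None
    dirty: Optional[bool] = None


class CachingModule:
    def __init__(self, name, lines):
        self.name = name
        self.lines = lines  # address -> CacheLine

    def coherency_check(self, address):
        """Report whether this cache holds the address, and in what status."""
        line = self.lines.get(address)
        if line is None:
            return CoherencyReport(self.name, present=False)
        return CoherencyReport(self.name, True, line.private, line.dirty)


def resolve_read(address, modules, main_memory):
    """Collect every module's report, then pick the most current data.

    Assumption: a dirty cached copy is newer than main memory; otherwise
    main memory supplies the data.
    """
    reports = [m.coherency_check(address) for m in modules]
    for module, report in zip(modules, reports):
        if report.present and report.dirty:
            return module.lines[address].data, module.name, reports
    return main_memory[address], "main memory", reports


main_memory = {0x40: 1, 0x80: 5}
modules = [
    CachingModule("cpu0", {0x80: CacheLine(5, private=False, dirty=False)}),
    CachingModule("cpu1", {0x40: CacheLine(2, private=True, dirty=True)}),
]
value_a, source_a, reports_a = resolve_read(0x40, modules, main_memory)
value_b, source_b, _ = resolve_read(0x80, modules, main_memory)
```

Here the read of address 0x40 is satisfied from the dirty copy held by the hypothetical module cpu1, while the read of address 0x80 is satisfied from main memory because the only cached copy is clean.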
In many conventional coherency schemes, reporting the results of coherency checks requires a significant amount of communication between the modules and the coherency processor that makes the final decision on how a memory request is to be satisfied. Each module having a cache memory must be informed of a required coherency check and must report the result of its coherency check to the coherency processor. Even if the number of communications is reduced, conventional means of processing and reporting the results of coherency checks are often slow. Coherency checks must be carried out in a manner that does not substantially reduce the effective bandwidth of the shared bus used by the modules for the inter-module communications.
To reduce the impact of memory latency delays, many conventional buses are "split transaction" buses; that is, a transaction does not need to be processed immediately after it is placed on the bus. For example, after a memory read transaction is issued on the bus, the module that issued the read relinquishes the bus, allowing other modules to use the bus for other transactions. When the requested data is available, the responding module for the read obtains control of the bus, and then transmits the data. It is often possible for modules in a shared bus system to initiate transactions faster than they can be serviced by the responding module, or faster than coherency checks can be performed by the other modules. For example, input/output devices often operate at a much slower speed than microprocessors and, thus, modules connecting input/output devices to the bus may be slow to respond. Similarly, main memory accesses are relatively slow, and it is possible for the processor modules to request data faster than it can be read from the main memory. Cache coherency checks may also be slow because the coherency checking processors in a module may be busy with other operations. Thus, it is often necessary to either slow down initiation of new transactions by modules or to handle the overflow of transactions when too many transactions are initiated in too short a time for them to be adequately processed or for coherency checks to be performed.
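The behavior of a split transaction bus can be illustrated with a small cycle-by-cycle simulation. The function `run_split_bus`, the fixed latency `MEMORY_LATENCY`, and the rule that a ready response takes the bus ahead of a new request are all assumptions chosen for illustration; real buses use an arbitration scheme that this sketch does not attempt to model.

```python
from collections import deque

MEMORY_LATENCY = 3  # assumed cycles between a read request and data being ready


def run_split_bus(requests, contents):
    """Simulate reads on a split transaction bus.

    requests: list of (module, address) pairs, issued in order.
    Returns a log of (cycle, driver, payload) for each cycle the bus is driven.
    """
    to_issue = deque(requests)
    pending = deque()  # (ready_cycle, requester, address) awaiting data
    log = []
    cycle = 0
    while to_issue or pending:
        if pending and pending[0][0] <= cycle:
            # The responding module obtains the bus and transmits the data.
            _, requester, address = pending.popleft()
            log.append((cycle, "memory", ("data", requester, contents[address])))
        elif to_issue:
            # The requester drives its read, then relinquishes the bus.
            module, address = to_issue.popleft()
            log.append((cycle, module, ("read", address)))
            pending.append((cycle + MEMORY_LATENCY, module, address))
        # Otherwise the bus is idle for this cycle.
        cycle += 1
    return log


bus_log = run_split_bus(
    [("cpu0", 0x10), ("cpu1", 0x20)],
    {0x10: 7, 0x20: 9},
)
```

In this sketch the second module issues its read in cycle 1, while the first read is still outstanding, and the two responses return in cycles 3 and 4. If requests were issued faster than the assumed latency allows them to be serviced, the `pending` queue would grow, which is the overflow condition discussed above.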
A typical prior art method for dealing with transaction overflow uses a "busy-abort" mechanism to handle the situation in which too many transactions of some type are initiated too quickly. When the responding module for the transaction sees a new transaction request that it cannot respond to immediately, the responding module sends back a "busy-abort" signal indicating that the transaction cannot be serviced at that time (e.g., an input/output module is occupied or a processor module having a cache memory cannot perform a coherency check fast enough). The requesting module then aborts its request and tries again at a later time. This approach increases design complexity because the requesting module must retain the transaction information until all possibility of receiving a "busy-abort" response has passed. In addition, if two transactions must be executed in a particular order, the second transaction generally cannot be issued until all possibility of receiving a "busy-abort" response has passed. Finally, aborted transactions result in processing delays and waste bus time.
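A minimal sketch of the "busy-abort" mechanism follows. The `Responder` class, its fixed queue capacity, the string responses, and the `issue_with_retry` function are hypothetical and serve only to show why the requesting module must retain each transaction until it is accepted.

```python
class Responder:
    """Hypothetical responding module that can queue a limited number of transactions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def receive(self, transaction):
        if len(self.queue) >= self.capacity:
            # The transaction cannot be serviced at this time.
            return "busy-abort"
        self.queue.append(transaction)
        return "accepted"

    def service_one(self):
        """Finish the oldest queued transaction, if any."""
        return self.queue.pop(0) if self.queue else None


def issue_with_retry(transaction, responder, max_attempts=10):
    """Issue a transaction, retrying after each busy-abort.

    The requester keeps `transaction` for the whole loop, since any attempt
    may be aborted. Returns the number of aborted attempts.
    """
    aborts = 0
    for _ in range(max_attempts):
        if responder.receive(transaction) == "accepted":
            return aborts
        aborts += 1
        # Assumption: the responder completes one transaction before the retry.
        responder.service_one()
    raise RuntimeError("transaction was never accepted")


responder = Responder(capacity=1)
first_aborts = issue_with_retry("write A", responder)
second_aborts = issue_with_retry("write B", responder)
```

In this sketch the second transaction is aborted once and reissued, which consumes an additional bus cycle and delays its completion.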
An alternative approach is to require handshaking between modules after each transaction to confirm whether a transaction can be processed by the responding module. This approach also results in processing delays and unnecessary design complexity.
Accordingly, there is a need for a means of handling multiple transactions that a computer system cannot immediately process without imposing unnecessary processing delays or design complexity on the system.