The invention pertains generally to multiprocessor systems and more particularly to the prevention of livelock due to resource or coherency conflicts. With today's technology and design improvements, microprocessor operating frequencies may far exceed the frequency capabilities of multiprocessor system buses. One method of improving the system bus bandwidth and mitigating this frequency differential is to use wider buses. However, this solution adds to the microprocessor cost because of the additional pins required, and increases the cost of any memory controller attached to the system bus as well. Another way to improve system bus bandwidth is to use a narrow bus, but increase the bus speed by using point-to-point, unidirectional nets. When using these types of buses, a system bus switch is required to connect all of the microprocessor system buses. This system bus switch can be implemented as part of the memory controller.
If a system bus switch, including a memory controller, as described above is used to connect processors, it is possible to design the multiprocessor system to use snooping bus coherency protocols. In other words, the switch can be designed to accept commands from the processors, perform arbitration on these requests, and serially source these commands back to all processors as snoop commands, allowing each processor to see each command from every other processor. Each processor generates a snoop response to the snooped commands, sending these responses to the system bus switch. The snoop response generated by each processor is that processor's response to the snooped command, the response being a function of that processor's cache state for the requested coherency block, or a function of resource availability to perform the action requested, if any. The system bus switch logically combines the individual processor snoop responses, and sends the logical combination of all the processors' snoop responses back to each processor. In such a system, the amount of time from the sourcing of the snoop command to the return of the combination of all the processor snoop responses can be several bus timing cycles. This number of bus cycles may be large and is usually longer than the desired snoop rate (the number of bus cycles between successive snoop commands). Two types of problems occur in systems where the time from a snoop command to that snoop command's combined snoop response is larger than the time between successive snoop commands.
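The combining step and the resulting overlap can be illustrated with a minimal sketch. The response names, their priority ordering, and the function names below are assumptions made only for illustration; the text above states only that the switch logically combines the individual responses and that the response latency usually exceeds the snoop interval.

```python
from enum import IntEnum


class SnoopResponse(IntEnum):
    # Hypothetical response codes; a higher value wins when combining (assumption).
    NULL = 0
    SHARED = 1
    MODIFIED = 2
    RETRY = 3


def combine_snoop_responses(responses):
    """Return the highest-priority response among all processor responses."""
    return max(responses, default=SnoopResponse.NULL)


def max_snoops_in_flight(response_latency_cycles, snoop_interval_cycles):
    """Upper bound on snoop commands still awaiting a combined response.

    Uses ceiling division: a latency of 8 cycles with a new snoop every
    2 cycles leaves up to 4 commands outstanding at once.
    """
    return -(-response_latency_cycles // snoop_interval_cycles)
```

Under these assumptions, a single RETRY from any processor forces the combined response to RETRY, and any result greater than 1 from `max_snoops_in_flight` indicates that successive snoop commands overlap.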
One problem is caused by coherency conflicts, where two or more processors in the multiprocessor system are attempting to perform conflicting operations on the same coherency block. A coherency block is defined herein as the smallest block of memory for which the processor will maintain cache coherency information. Usually, the coherency block size is the cache line size for the processor. An example of such a conflict would be a situation where two processors are attempting to do stores to the same coherency block. These stores would typically be to different byte locations in the coherency block. The stores must be logically serialized so that both stores are correctly reflected in the final coherency block result. In a snooping system allowing pipelined operations, the chronological bus sequence for each store is (1) the store command is snooped, (2) each processor sources its snoop response on the snoop response out bus, and (3) the combined snoop response is sourced to each processor (on the snoop response in bus). A complication arises when the bus sequences for the two stores from different processors (such as A and B shown in FIG. 2, described later) overlap such that A's combined snoop response in occurs after B's snoop response out. In this case, other system processors would be forced to respond to the B snoop command before seeing the combined response for the A snoop. Since the response to snoop command B could be dependent on the combined response for snoop command A, this sequence must be avoided.
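The overlap condition described above can be expressed as a simple check. The following is a sketch only; the `SnoopTiming` record, its field names, and the cycle numbers in the usage note are hypothetical and are not taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnoopTiming:
    # Hypothetical record of the three bus events for one snooped command.
    block_address: int       # address of the coherency block targeted
    snoop_cycle: int         # (1) command is snooped
    response_out_cycle: int  # (2) processors source their snoop responses
    response_in_cycle: int   # (3) combined response is sourced to processors


def has_coherency_hazard(first, second):
    """True if `second` must be answered before `first` is resolved.

    `first` is the earlier-snooped command. The hazard exists when both
    commands target the same coherency block and the combined response in
    for `first` occurs after the snoop response out for `second`.
    """
    same_block = first.block_address == second.block_address
    overlapped = first.response_in_cycle > second.response_out_cycle
    return same_block and overlapped
```

For example, with store A as `SnoopTiming(0x1000, 0, 2, 8)` and store B as `SnoopTiming(0x1000, 2, 4, 10)`, B's response out (cycle 4) precedes A's combined response in (cycle 8), so `has_coherency_hazard(A, B)` reports a hazard.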
Another problem occurs with snoop commands overlapped as described above when processors limit snoop command rates because of resource or pacing requirements. If a sequence of such rate-limited command types appears on the bus and the snoop commands are overlapped, a system livelock can occur. As defined herein, system livelock is a repetitive sequence of snoop command retries. This can happen if different snooping processors are forced to retry different commands, with the result that every command is retried by some processor. If such a livelock can occur, some mechanism to break it must be present.
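A toy model may help show how such a livelock can arise. Everything in the sketch below is an assumption made for illustration: two processors, a fixed busy period after a processor accepts a command, and the rule that a processor commits its resources when it responds, before it sees the combined response.

```python
def simulate_snoops(num_slots, slot_gap, busy_slots=2, initially_busy=(0,)):
    """Return the combined outcome of each snooped command in a toy model.

    One command is snooped every `slot_gap` slots. A processor that accepts
    a command is busy for `busy_slots` slots and retries anything snooped
    while busy. Because it responds before seeing the combined response, it
    commits resources even to a command that another processor retries.
    """
    busy_until = [0, 0]
    for p in initially_busy:
        busy_until[p] = 1  # this processor is busy during slot 0 only
    outcomes = []
    for slot in range(0, num_slots, slot_gap):
        responses = []
        for p in range(2):
            if slot < busy_until[p]:
                responses.append("retry")
            else:
                busy_until[p] = slot + busy_slots
                responses.append("accept")
        # Any single retry makes the combined response a retry.
        outcomes.append("retry" if "retry" in responses else "accept")
    return outcomes
```

In this model, `simulate_snoops(6, slot_gap=1)` yields a retry for every command, because the two processors alternate in being busy, while `simulate_snoops(6, slot_gap=2)` yields one retry followed by accepted commands once the overlap is removed.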
One approach that attempts to avoid these problems is the use of a non-pipelined bus, at least non-pipelined as far as the snoop command to combined snoop response in time is concerned. However, this restriction limits the system bus snoop rate, usually resulting in a performance problem (degradation) in multiprocessor systems.
For pipelined buses, one prior art method of solving the above-mentioned problem is to use additional bus signals to support an additional retry protocol. This retry protocol can be used to retry snoop commands to the same addresses that fall within the snoop response out to snoop response in time window. While this approach is feasible on a single multiprocessor bus, the technique is more complicated in a multiple bus system such as that shown in FIG. 1, described later. In addition, this prior art method requires that the address arbiter (if one exists) monitor the bus retries to detect the case where the bus gets into a sequence of repetitive retries due to conflicts with the prior bus commands. Some method must be implemented to break such a repetitive sequence when detected. One such way is for the arbiter, upon detection of such a sequence, to temporarily slow the snoop rate to remove the snoop command overlaps and break the repetitive retry sequence.
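The arbiter behavior described for this prior art method might be sketched as follows. The class name, the retry threshold, and the gap values are hypothetical; the text above says only that the arbiter monitors retries and temporarily slows the snoop rate when a repetitive sequence is detected.

```python
class SnoopArbiter:
    """Toy arbiter that widens the snoop gap after repeated retries."""

    def __init__(self, normal_gap=1, slow_gap=4, retry_threshold=3):
        self.normal_gap = normal_gap            # cycles between snoops normally
        self.slow_gap = slow_gap                # cycles between snoops when slowed
        self.retry_threshold = retry_threshold  # retries in a row that trigger slowing
        self.consecutive_retries = 0

    def record_combined_response(self, retried):
        """Update the retry history and return the gap before the next snoop."""
        if retried:
            self.consecutive_retries += 1
        else:
            self.consecutive_retries = 0
        if self.consecutive_retries >= self.retry_threshold:
            return self.slow_gap
        return self.normal_gap
```

With the default values assumed here, three retries in a row cause the arbiter to return the wider gap, and the first non-retried command restores the normal snoop rate. The cost of this approach is visible in the sketch: the arbiter must carry extra state and the bus runs below its maximum snoop rate whenever the slow gap is in effect.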
Since these solutions either reduce the system snoop rate or add complexity to the processors and system bus arbiter, what is needed is a simpler way to maximize the system snoop rate while solving these conflict problems caused by a pipelined snoop bus.